National Repository of Grey Literature
Detection of Duplicates in Huge Web Databases
Sadloň, Vladimír ; Galamboš, Leo (advisor) ; Kopecký, Michal (referee)
This master's thesis analyses methods for detecting duplicate documents and the possibilities of integrating them with a web search engine. It gives an overview of commonly used methods and selects one of them: approximation of the Jaccard similarity measure combined with shingling. The chosen method is adapted for implementation in the Egothor web search engine environment. The aim of the thesis is to present this implementation, describe its features, and find the parameters best suited to running the detection in real time. An important feature of the described method is that it also supports dynamic changes to the collection of indexed documents.
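To illustrate the general idea named in the abstract, here is a minimal sketch of shingling combined with a MinHash approximation of the Jaccard similarity. It is not the thesis's Egothor implementation, which this record does not show. The function names, the word-level shingle size `k`, the number of hash functions, and the use of seeded BLAKE2b hashes are all assumptions made for illustration.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-word shingles (contiguous word windows) of text."""
    words = text.lower().split()
    if len(words) < k:
        # Too short for a full window: treat the whole text as one shingle.
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity |A & B| / |A | B| of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _seeded_hash(seed, shingle):
    """64-bit hash of a shingle; each seed acts as a different hash function."""
    data = f"{seed}:{shingle}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(shingle_set, num_hashes=128):
    """Signature = the minimum hash value per hash function (assumed scheme)."""
    if not shingle_set:
        raise ValueError("cannot build a signature for an empty shingle set")
    return [min(_seeded_hash(seed, s) for s in shingle_set)
            for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; this estimates Jaccard."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


if __name__ == "__main__":
    doc_a = "the quick brown fox jumps over the lazy dog"
    doc_b = "the quick brown fox jumps over the lazy cat"
    set_a, set_b = shingles(doc_a), shingles(doc_b)
    print("exact Jaccard:", jaccard(set_a, set_b))
    print("MinHash estimate:",
          estimate_jaccard(minhash_signature(set_a), minhash_signature(set_b)))
```

Because each document is reduced to a fixed-length signature, two documents can be compared without keeping their full shingle sets, and a signature can be computed for a newly added document independently of the rest of the collection. More hash functions give a tighter estimate at the cost of more computation per document.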
Martha E. Williams (1934-2007), her work and her significance for information science
Dvořáková, Drahomíra ; Bratková, Eva (advisor) ; Vlasák, Rudolf (referee)
The purpose of the thesis is to describe the life and work of Martha E. Williams in relation to library and information science. The thesis introduces her private life, her professional development at the Illinois Institute of Technology Research Institute in Chicago and at the University of Illinois at Urbana-Champaign, her activities in professional library associations, and the academic awards she received. The core of the thesis is a thorough analysis of the individual works, activities, and projects that Martha E. Williams led or significantly participated in. She contributed to the development of databases and the information industry, and worked as editor of the Computer-Readable Databases (CRDB) directory and of the serial Annual Review of Information Science and Technology (ARIST). Furthermore, she devoted herself systematically to the transparency of information retrieval, database classification and evaluation, usage data, the analysis of governmental and private databases, and the role of libraries and information centers in an era of rapid database expansion.
