National Repository of Grey Literature

Near duplicate detection in large document collections
Benčík, Daniel ; Kopecký, Michal (referee) ; Pecina, Pavel (advisor)
This thesis deals with the problem of detecting documents that are so similar to one another that they can be considered (nearly) identical, in collections containing up to millions of documents. Its main aim is to compare new, fast algorithms designed for this task with existing algorithms, which cannot be applied to large collections because of their computational complexity. The thesis includes implementations of both the new and the existing methods, together with applications designed to compare these methods experimentally.
