National Repository of Grey Literature
Analysis and Data Extraction from a Set of Documents Merged Together
Jarolím, Jordán ; Bartík, Vladimír (referee) ; Kreslíková, Jitka (advisor)
This thesis deals with mining relevant information from documents and with automatically splitting multiple documents that have been merged together. It describes the design and implementation of software for extracting data from documents and for automatically splitting merged documents. The thesis covers methods for acquiring textual data from scanned documents, named entity recognition, document clustering, their supporting algorithms, and metrics for automatic document splitting. It then explains the algorithm of the implemented software and describes the tools and techniques the software uses. Finally, the success rate of the implemented software is evaluated, and possible extensions and further development are discussed.
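To illustrate what a metric for automatic splitting can look like, the sketch below compares the word content of consecutive pages and starts a new document whenever the cosine similarity between neighbouring pages drops below a threshold. This is not the thesis's method: the tokenizer, the cosine measure, and the threshold of 0.2 are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts; a deliberately simple tokenizer (assumption)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def split_pages(pages: list[str], threshold: float = 0.2) -> list[list[int]]:
    """Group consecutive page indices into documents. A new document starts
    when a page is less similar to the previous page than the hypothetical
    `threshold`."""
    if not pages:
        return []
    vectors = [bag_of_words(p) for p in pages]
    documents = [[0]]
    for i in range(1, len(pages)):
        if cosine(vectors[i - 1], vectors[i]) < threshold:
            documents.append([i])  # content changed: start a new document
        else:
            documents[-1].append(i)  # similar content: same document
    return documents
```

For example, two invoice pages followed by two contract pages (`"Invoice number 123 total amount due"`, `"invoice 123 amount due payment terms"`, `"Employment contract between employer and employee"`, `"contract employee salary terms"`) are split into `[[0, 1], [2, 3]]`. A real system would also use layout cues and recognized entities, which this sketch ignores.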
Intelligent Processing of Bookmarks
Brhel, Miroslav ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
This thesis deals with the intelligent processing of bookmarks, mainly with clustering web pages according to the similarity of their text. As the practical part of the thesis, a system capable of sorting and clustering bookmarks was designed.
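One simple way to cluster bookmarked pages by text similarity is single-pass (leader) clustering over term-frequency vectors, sketched below. The abstract does not say which algorithm or similarity measure the system uses, so the leader algorithm, the cosine measure, the threshold of 0.3, and the example URLs are all assumptions for illustration.

```python
import math
import re
from collections import Counter


def tf_vector(text):
    """Term-frequency vector of a page's text (simple tokenizer, assumption)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_bookmarks(pages, threshold=0.3):
    """Leader clustering. `pages` maps URL -> page text. Each page joins the
    cluster whose leader is most similar, provided the similarity reaches the
    hypothetical `threshold`; otherwise it becomes the leader of a new cluster."""
    leaders = []   # term vectors of cluster leaders
    clusters = []  # lists of URLs, parallel to `leaders`
    for url, text in pages.items():
        vec = tf_vector(text)
        best, best_sim = None, -1.0
        for idx, leader_vec in enumerate(leaders):
            sim = cosine(vec, leader_vec)
            if sim >= threshold and sim > best_sim:
                best, best_sim = idx, sim
        if best is None:
            leaders.append(vec)
            clusters.append([url])
        else:
            clusters[best].append(url)
    return clusters
```

With four hypothetical bookmarks, two about Python and two about recipes, the pages fall into two clusters. The leader algorithm needs only one pass over the bookmarks, but its result depends on their order; a hierarchical or k-means approach avoids that at a higher cost.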
Classification of electronic documents using cluster analysis
Ševčík, Radim ; Řezanková, Hana (advisor) ; Svátek, Vojtěch (referee)
The current age is characterised by unprecedented growth of information, both in amount and in complexity. Most of this information is available in digital form, so it can be analysed using cluster analysis. We have tried to classify the documents from the 20 Newsgroups collection based on their content only. The aim was to assess the available clustering methods in a variety of applications. After transforming the documents into a binary vector representation, we performed several experiments in the CLUTO application and measured entropy, purity and execution time. For a small number of clusters, the direct method (generally a hierarchical method) gave the best results, whereas for a larger number of clusters the repeated bisection (divisive) method performed best. The agglomerative method proved unsuitable. Using simulation, we estimated the optimal number of clusters to be 10. For this solution, we described the features of each cluster in detail using the repeated bisection method and the i2 criterion function. Future work should focus on implementing binary clustering in programming languages such as Perl or C++. The results of this work may be of interest to web search engine developers and electronic catalogue administrators.
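Entropy and purity compare a clustering against known classes, here the newsgroup of each document. The sketch below uses the commonly cited definitions, which we assume match those CLUTO reports: for a cluster S_r of size n_r containing n_r^i documents of class i, purity is P(S_r) = max_i n_r^i / n_r and entropy is E(S_r) = -(1 / log q) * sum_i (n_r^i / n_r) log(n_r^i / n_r), where q is the number of classes; both totals are averages weighted by n_r / n. The function name and input format are hypothetical.

```python
import math
from collections import Counter


def purity_and_entropy(clusters, labels):
    """clusters[k] is the cluster id and labels[k] the true class (e.g. the
    newsgroup) of document k. Returns (purity, entropy); higher purity and
    lower entropy indicate better agreement with the classes."""
    n = len(labels)
    q = len(set(labels))  # number of true classes
    by_cluster = {}
    for c, label in zip(clusters, labels):
        by_cluster.setdefault(c, Counter())[label] += 1

    purity = 0.0
    entropy = 0.0
    for counts in by_cluster.values():
        n_r = sum(counts.values())
        # Cluster purity: share of the cluster taken by its largest class.
        purity += (n_r / n) * (max(counts.values()) / n_r)
        # Cluster entropy, normalised by log q so that it lies in [0, 1].
        if q > 1:
            e_r = -sum((m / n_r) * math.log(m / n_r) for m in counts.values())
            entropy += (n_r / n) * e_r / math.log(q)
    return purity, entropy
```

For instance, putting four documents from two equally sized classes into a single cluster gives purity 0.5 and entropy 1.0, while separating them perfectly gives purity 1.0 and entropy 0.0.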