National Repository of Grey Literature: 9 records found
Determination of basic form of words
Šanda, Pavel ; Burget, Radim (referee) ; Karásek, Jan (advisor)
Lemmatization is an important preprocessing step for many text mining applications. The lemmatization process is similar to stemming, with the difference that it determines not only the word stem but also tries to determine the basic form of the word, using the Brute Force and Suffix Stripping methods. The main aim of this paper is to present methods for the algorithmic improvement of Czech lemmatization. The training data set created for this work is part of its content and can be freely used in student and academic works dealing with similar problems.
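To illustrate how the two named methods can work together, here is a minimal Python sketch. It assumes a tiny hand-written lexicon for the Brute Force step (an exact lookup of the full word form) and a few hypothetical suffix rules for the Suffix Stripping step. The lexicon, the rules and the function name `lemmatize` are illustrative only and are not taken from the thesis or its training set.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon for the Brute Force step: full word form -> lemma.
LEXICON: Dict[str, str] = {
    "lidé": "člověk",  # irregular plural, not derivable by suffix rules
    "dětí": "dítě",
}

# Hypothetical Suffix Stripping rules as (suffix, replacement); illustrative only.
SUFFIX_RULES: List[Tuple[str, str]] = [
    ("ami", "a"),  # ženami -> žena
    ("ech", ""),   # hradech -> hrad
    ("y", "a"),    # ženy -> žena
]


def lemmatize(word: str, min_stem: int = 2) -> str:
    """Return a guessed basic form: lexicon lookup first, then suffix rules."""
    w = word.lower()
    # 1) Brute Force: exact lookup of the whole word form.
    if w in LEXICON:
        return LEXICON[w]
    # 2) Suffix Stripping: try the longest matching suffix first.
    for suffix, replacement in sorted(SUFFIX_RULES, key=lambda r: len(r[0]), reverse=True):
        # Only strip if a stem of at least `min_stem` characters remains.
        if w.endswith(suffix) and len(w) - len(suffix) >= min_stem:
            return w[: len(w) - len(suffix)] + replacement
    # 3) No rule applies: return the (lower-cased) word unchanged.
    return w
```

Trying the longest suffix first avoids stripping only a short ending when a longer rule matches, and the `min_stem` guard keeps very short words from being reduced to almost nothing.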
Methods for Mining Association Rules from Data
Uhlíř, Martin ; Burget, Radek (referee) ; Bartík, Vladimír (advisor)
The aim of this thesis is to implement the Multipass-Apriori method for mining association rules from text data. After an introduction to the field of knowledge discovery, the specific aspects of text mining are discussed. Preprocessing is a very important part of the mining process; for text data, stemming and a stop-word dictionary are necessary. The next part of the thesis deals with the meaning, use and generation of association rules. The main part focuses on the description of the Multipass-Apriori method, which was implemented. Based on the tests performed, the optimal way of dividing the data into partitions and the best way of sorting the itemsets were determined. As part of the testing, the Multipass-Apriori method was compared with the Apriori method.
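The abstract does not describe how Multipass-Apriori partitions its work, so the sketch below shows only the baseline Apriori method it was compared against. It assumes each document has already been reduced to a set of stemmed, stop-word-filtered words. The function names `apriori` and `confidence` and the example data are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(transactions: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Return every itemset occurring in at least `min_support` transactions."""
    # Level 1: count single items.
    counts: Dict[FrozenSet[str], int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets ...
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # ... keeping only those whose (k-1)-subsets are all frequent.
                if len(union) == k and all(
                    frozenset(sub) in frequent for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Support counting: one pass over the transactions per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


def confidence(freq: Dict[FrozenSet[str], int], antecedent: FrozenSet[str],
               consequent: FrozenSet[str]) -> float:
    """Confidence of the rule antecedent -> consequent: supp(X u Y) / supp(X)."""
    return freq[antecedent | consequent] / freq[antecedent]
```

For example, with the documents `{"data", "mining", "rule"}`, `{"data", "mining"}`, `{"data", "rule"}` and `{"mining", "text"}` and a minimum support of 2, the rule `rule -> data` has confidence 1.0, because every document containing "rule" also contains "data".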
Information Retrieval in Czech Wikipedia
Balgar, Marek ; Bartík, Vladimír (referee) ; Chmelař, Petr (advisor)
The main task of this Master's thesis is to understand the issues of information retrieval and text classification. The research focuses mainly on text data, semantic dictionaries and especially on knowledge inferred from Wikipedia. The thesis also describes the implementation of a querying system based on the acquired knowledge. Finally, the properties and possible improvements of the system are discussed.
Application for Text Summarization
Mička, Jakub ; Zendulka, Jaroslav (referee) ; Bartík, Vladimír (advisor)
This work focuses on the implementation of a web application that serves as a tool for the automatic summarization of English text. Summarization is performed using the TextRank and Latent Semantic Analysis methods, both of which are improved by named entity recognition. The main contribution of this work is showing that combining named entity recognition with Latent Semantic Analysis, and especially with TextRank, leads to higher-quality summaries. The quality of the summaries was verified using ROUGE metrics.
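A minimal sketch of sentence-level TextRank, assuming the text is already split into sentences. It uses the word-overlap similarity from the original TextRank paper and a damping factor of 0.85. The named entity recognition improvement described in the abstract is not modelled here, and `textrank` and its parameters are illustrative names.

```python
import math
import re
from typing import List


def _words(sentence: str) -> List[str]:
    return re.findall(r"\w+", sentence.lower())


def similarity(a: List[str], b: List[str]) -> float:
    """Shared words normalised by log sentence lengths (TextRank-style overlap)."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    if denom == 0:  # both sentences have a single word
        return 0.0
    return len(set(a) & set(b)) / denom


def textrank(sentences: List[str], top_n: int = 2, d: float = 0.85, iters: int = 50) -> List[str]:
    toks = [_words(s) for s in sentences]
    n = len(sentences)
    # Symmetric weighted graph: edge weight = sentence similarity.
    w = [[similarity(toks[i], toks[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in w]
    scores = [1.0] * n
    # Weighted PageRank iteration: WS(i) = (1-d) + d * sum_j w_ji / out(j) * WS(j).
    for _ in range(iters):
        scores = [
            (1 - d) + d * sum(w[j][i] / out_sum[j] * scores[j] for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    # Keep the top_n sentences, returned in their original order.
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]
```

A sentence sharing no words with the others receives only the base score `1 - d`, so it is ranked last.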
Methods of Text Document Summarization
Pokorný, Lubomír ; Očenášek, Pavel (referee) ; Bartík, Vladimír (advisor)
This thesis deals with single-document summarization of text data. Part of it is devoted to data preparation, mainly normalization. Several stemming algorithms are listed, and lemmatization is also described. The main part is devoted to Luhn's summarization method and its extension using the WordNet dictionary. The Oswald summarization method is also described and applied. The designed and implemented application automatically generates abstracts using these methods. A set of experiments was developed that verified the correct functionality of the application and of the extended Luhn summarization method.
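Luhn's method scores each sentence by its densest cluster of significant words. Below is a minimal sketch that assumes significant words are simply non-stop-words occurring at least `min_freq` times (Luhn also used an upper frequency cut-off, omitted here), and that a cluster may contain at most four insignificant words between two significant ones. The WordNet extension mentioned in the abstract is not shown, and the stop-word list and function name are hypothetical.

```python
import re
from collections import Counter
from typing import List, Set

# Hypothetical minimal stop-word list; a real system would use a full one.
STOP_WORDS: Set[str] = {"the", "a", "of", "and", "is", "in", "to", "it"}


def luhn_summarize(text: str, top_n: int = 1, min_freq: int = 2, max_gap: int = 4) -> List[str]:
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]
    # Significant words: frequent non-stop-words across the whole document.
    freq = Counter(w for toks in tokenized for w in toks if w not in STOP_WORDS)
    significant = {w for w, c in freq.items() if c >= min_freq}

    def score(tokens: List[str]) -> float:
        positions = [i for i, w in enumerate(tokens) if w in significant]
        if not positions:
            return 0.0
        best = 0.0
        start = prev = positions[0]
        count = 1
        for p in positions[1:]:
            if p - prev - 1 <= max_gap:  # small gap: extend the current cluster
                count += 1
            else:  # gap too large: close the cluster and start a new one
                best = max(best, count ** 2 / (prev - start + 1))
                start, count = p, 1
            prev = p
        # Cluster score = (significant words)^2 / cluster length in words.
        return max(best, count ** 2 / (prev - start + 1))

    scores = [score(t) for t in tokenized]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(ranked)]
```

Squaring the count of significant words rewards sentences where they appear close together, rather than sentences that are merely long.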