| |
|
Information Extraction from Wikipedia
Krištof, Tomáš ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
This bachelor's thesis addresses information extraction from unstructured text. The first part summarizes the basic techniques used for information extraction. The design and implementation of a system for extracting information from Wikipedia are then described. The final part analyses the results of the experiments.
|
|
Text Messages Manager for Android
Bloudíček, Jan ; Otrusina, Lubomír (referee) ; Kouřil, Jan (advisor)
This thesis describes the creation of an application for mobile devices that manages short text messages and e-mail on the Android platform. Text messages can also be sent through SMS gateways. The work explains the basic concepts and technologies for developing Android applications and describes the analysis phase, the design of the architecture and user interface, and the implementation and testing of the program.
|
|
Analysis of Social Media Content Discussing Czech Mobile Operators
Pavlů, Jan ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
The main topic of this thesis is sentiment analysis of posts obtained from social networks. The posts concern Czech mobile network operators. Data visualization is also an essential part of the implemented system. Sentiment analysis is performed using machine learning techniques: downloaded posts are cleaned, lemmatized and transformed into feature vectors, and the Stochastic Gradient Descent algorithm is used for classification. The analysed data are visualized in charts and as a list of posts. The system also provides tools for text categorization. The accuracy, precision, recall and F1 score of the sentiment analysis are each about 75%. The accuracy of post categorization is high (about 80%), but its precision, recall and F1 score are low (about 30%), which is why post categorization is not done automatically. The benefit of the system is that it automatically collects data from different sources, analyses them and displays them. It also provides tools for manually changing the sentiment and categories, so that users can help improve the system's performance.
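The gap between high accuracy and low precision, recall and F1 reported for post categorization is typical when a category is rare. The following minimal sketch illustrates the effect; the label counts are invented for illustration and are not taken from the thesis, and `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = in category)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Invented example: 100 posts, only 10 of which belong to the category.
y_true = [1] * 10 + [0] * 90
# The classifier finds 3 of the 10 and raises 7 false alarms.
y_pred = [1] * 3 + [0] * 7 + [1] * 7 + [0] * 83

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Because the many correctly rejected posts dominate the accuracy figure, accuracy stays high even though most posts that actually belong to the category are missed.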
|
|
Game Portal in Cloud
Pečínka, Zdeněk ; Otrusina, Lubomír (referee) ; Škoda, Petr (advisor)
This thesis deals with the development of a cloud-based gaming portal designed for a selected PaaS provider. It presents the possibilities offered by cloud computing and summarizes the advantages and disadvantages of PaaS providers. It then covers the design and implementation of the application, focusing on its scalability and modularity. The result is a gaming portal offering the games Tanks and Ships, which can be played against other users or against a computer-controlled opponent. The thesis also evaluates the scalability of the critical components, based on the results of stress tests.
|
|
Word Sense Clustering
Hošták, Viliam Samuel ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
This thesis deals with the semantic similarity of words. It describes and compares existing models that are currently used for this purpose. It discusses the design and implementation of a system for corpus preprocessing, semantic modelling and retrieval of semantically related words. The resulting system supports the distributional semantic models Word2vec, FastText and GloVe.
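Retrieval of semantically related words from a distributional model is commonly done by ranking words by the cosine similarity of their vectors. The sketch below assumes that approach; the abstract does not state which similarity measure the thesis uses, and the three-dimensional vectors and the names `cosine` and `most_similar` are made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(word, vectors, topn=3):
    """Return the topn words closest to `word`, best match first."""
    target = vectors[word]
    scored = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:topn]


# Toy vectors; real Word2vec, FastText or GloVe vectors have hundreds of dimensions.
VECTORS = {
    "king": [0.90, 0.80, 0.10],
    "queen": [0.85, 0.90, 0.15],
    "apple": [0.10, 0.20, 0.95],
    "pear": [0.15, 0.10, 0.90],
}

print(most_similar("king", VECTORS, topn=2))
```

With trained embeddings, the same ranking would be computed over the whole vocabulary, usually with a vectorized or approximate nearest-neighbour search rather than a Python loop.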
|
|
Syntactic Analyzer for Czech Language
Beneš, Vojtěch ; Otrusina, Lubomír (referee) ; Kouřil, Jan (advisor)
This master's thesis describes the theoretical foundations, solution design and implementation of a constituency (phrasal) parser for the Czech language, based on grouping parts of speech into phrases. The program uses a manually built and annotated sample Czech corpus to generate a probabilistic context-free grammar through machine learning at runtime. The parser, based on an extended CKY algorithm, then decides whether an input Czech sentence can be generated by the created grammar and, if so, constructs the most probable derivation tree. This result is compared with the expected parse to evaluate the parser's success rate.
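The core idea of parsing with a probabilistic context-free grammar can be sketched with the standard probabilistic CKY algorithm. This is a minimal illustration, not the extended variant used in the thesis: the toy English grammar, its probabilities and the function name `cky_parse` are assumptions, and the grammar is given directly in Chomsky normal form rather than learned from a corpus.

```python
from collections import defaultdict

# Toy PCFG in Chomsky normal form (hypothetical rules and probabilities).
BINARY = {  # (left child, right child) -> [(parent, rule probability)]
    ("NP", "VP"): [("S", 1.0)],
    ("DET", "N"): [("NP", 1.0)],
    ("V", "NP"): [("VP", 1.0)],
}
LEXICAL = {  # word -> [(tag, rule probability)]
    "the": [("DET", 1.0)],
    "dog": [("N", 0.5)],
    "cat": [("N", 0.5)],
    "sees": [("V", 1.0)],
}


def cky_parse(words, start="S"):
    """Return (probability, tree) of the best parse, or None if there is none."""
    n = len(words)
    # best[(i, j)][symbol] = (best probability, backpointer) for words[i:j]
    best = defaultdict(dict)
    for i, word in enumerate(words):
        for tag, prob in LEXICAL.get(word, []):
            best[(i, i + 1)][tag] = (prob, word)

    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                for left, (p_left, _) in best[(i, k)].items():
                    for right, (p_right, _) in best[(k, j)].items():
                        for parent, p_rule in BINARY.get((left, right), []):
                            prob = p_rule * p_left * p_right
                            if prob > best[(i, j)].get(parent, (0.0, None))[0]:
                                best[(i, j)][parent] = (prob, (k, left, right))

    if start not in best[(0, n)]:
        return None  # the sentence is not generated by the grammar

    def build(i, j, symbol):
        # Follow backpointers to rebuild the most probable derivation tree.
        _, back = best[(i, j)][symbol]
        if isinstance(back, str):
            return (symbol, back)
        k, left, right = back
        return (symbol, build(i, k, left), build(k, j, right))

    return best[(0, n)][start][0], build(0, n, start)


result = cky_parse("the dog sees the cat".split())
print(result)
```

The function returns `None` when the start symbol cannot span the whole input, which corresponds to the case where the sentence cannot be generated by the grammar.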
|
|
Semantic Similarity Methods in Folksonomies
Kadlec, Jan ; Otrusina, Lubomír (referee) ; Schmidt, Marek (advisor)
This bachelor's thesis was written during a study stay at Aalborg University in Denmark and was prepared in English. Folksonomies are a new, user-driven approach to classification and an important part of Web 2.0. They are also the only approach able to keep pace with today's rate of web expansion, because they hand responsibility for classification over to users. If folksonomies contain a sufficient amount of data, they can be put to many uses. This thesis focuses on semantic similarity methods in folksonomies. Its goal was to test a number of methods on a sample of three datasets: delicious.com, Last.fm and medworm.com. This was done with the help of grounding data from WordNet, the Open Directory Project and a health-oriented ontology. The results of this work indicate that semantic similarity methods can be used to measure similarity successfully in many domains.
|
| |
|
Entity Knowledge Base Creation from Czech Wikipedia
Sychra, Martin ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
The aim of this thesis is to propose and implement a system for automatic extraction of named entities from Czech Wikipedia, to create a knowledge base consisting of these entities, and to evaluate the results of the created system. The first part explains the basic notions of the field and discusses related work. The main part proposes several extraction methods and details their implementation. The following types of entities are extracted: people, places, events and organizations. The final part presents the results, i.e., the success rate of the individual methods for each entity type and statistics on the extraction of individual entities across the whole Czech Wikipedia.
|