National Repository of Grey Literature: 21 records found (records 12-21)
Comparison of approaches to text classification
Knížek, Jan ; Hana, Jiří (advisor) ; Vidová Hladká, Barbora (referee)
The focus of this thesis is short text classification. Short text is the prevailing form of text on e-commerce and review platforms such as Yelp, Tripadvisor or Heureka. As online communication grows in popularity, it is becoming infeasible for users to filter information manually, so recognising the relevant information in text is increasingly important. Classifying reviews is especially challenging because they have limited structure, use informal language, contain many errors and rely heavily on context and common knowledge. One possible application of machine learning is to filter data automatically and show users only the relevant pieces of information. We work with restaurant reviews from Yelp and aim to predict their usefulness: most restaurants have a relatively large number of reviews, yet only a few are truly useful. Our objective is to compare machine learning methods for predicting usefulness.
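The abstract does not list the methods compared, so the following is only a hypothetical illustration of one common baseline for this kind of task: a multinomial Naive Bayes classifier over bag-of-words features with add-one smoothing, assuming each review carries a binary useful/not-useful label. The tokenizer, class names and toy reviews are invented for the sketch.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive lowercase whitespace tokenization (an assumption for this sketch).
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior of the class
            total = sum(self.word_counts[label].values())
            for w in tokenize(doc):
                # Smoothed log likelihood of each token given the class.
                score += math.log((self.word_counts[label][w] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data, not taken from the Yelp corpus.
docs = [
    "detailed review of the menu and prices",
    "great food",
    "the menu lists prices and portion sizes",
    "nice",
]
labels = ["useful", "not_useful", "useful", "not_useful"]
model = NaiveBayes().fit(docs, labels)
print(model.predict("menu prices and portions"))
```

Working in log space avoids numeric underflow when many token probabilities are multiplied, which matters even for short reviews once the vocabulary grows.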
An online collaborative platform for the development of empirical grammars
Garcia Sevilla, Antonio Fernando ; Rosen, Alexandr (advisor) ; Hana, Jiří (referee)
Modern science has seen the rise of group research projects and other large collaborative endeavours, in what has been called "Big Science". Computational linguistics is no exception, and the development of large linguistic resources in particular is a task best suited to collaborative approaches. This document describes the design and implementation of an environment for doing computational linguistics online: a software tool with which formal grammars and other computational linguistic resources can be developed collaboratively. The application supports HPSG as an example paradigm of this kind of work.
Native Language Identification of L2 Speakers of Czech
Tydlitátová, Ludmila ; Hana, Jiří (advisor) ; Vidová Hladká, Barbora (referee)
Native Language Identification is the task of identifying an author's native language based on their writing in a second language. The vast majority of previous work has focused on English as the second language. In this thesis, we work with 3,715 essays written in Czech by non-native speakers. We use machine learning methods to determine whether an author's native language belongs to the Slavic language group. By training models with different feature and parameter settings, we reached an accuracy of 78%.
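The abstract does not say which features were used. Character n-grams are a feature family often used in native language identification, so the following is only a hypothetical illustration of extracting them from an essay, assuming lowercased text padded with spaces and n = 3.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams of a lowercased text padded with spaces."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


# Hypothetical fragment of a learner essay.
features = char_ngrams("Já jsem")
print(features)
```

Padding with spaces lets the n-grams capture word beginnings and endings, where transfer effects from the native language (for example in inflectional endings) may show up.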
Automatic assignment of diagnosis to medical reports
Lachata, Adrián ; Hana, Jiří (advisor) ; Vidová Hladká, Barbora (referee)
The goal of the thesis is to measure the success rate of automatically assigning diagnosis codes (ICD-10) to Czech medical text reports. We used machine learning and text classification algorithms such as Naive Bayes and decision trees. The WEKA program was used for classification. Feature selection and data preprocessing were performed by a program we created specifically for this purpose. Its key features are feature selection based on information gain (IG) or pointwise mutual information (PMI), text lemmatization, and stopword generation by inverse document frequency (IDF). We took a closer look at the I10 diagnosis, but results were also processed for H660, J00, K30 and Z001. Out of interest, we also include a comparison of automatic versus manual (doctor-made) assignment of I10 on a sample of one hundred reports. Our data set comprised about one million medical reports.
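Stopword generation by IDF and PMI-based feature selection can be sketched as follows. This is a minimal sketch under stated assumptions: reports are already lemmatized token lists, words whose IDF falls below a hypothetical threshold are treated as stopwords, and PMI is computed at document level between a word's presence and a diagnosis code. The threshold, reports and labels are invented; the thesis's own program may compute these differently.

```python
import math


def idf(docs):
    """Inverse document frequency, log(N / df(w)), for every word in the corpus."""
    df = {}
    for doc in docs:
        for w in set(doc):
            df[w] = df.get(w, 0) + 1
    return {w: math.log(len(docs) / c) for w, c in df.items()}


def stopwords_by_idf(docs, threshold):
    # Words found in (almost) every report get a low IDF and carry little signal.
    return {w for w, value in idf(docs).items() if value < threshold}


def pmi(docs, labels, word, label):
    """Document-level PMI between the presence of `word` and the class `label`."""
    n = len(docs)
    has_word = [word in doc for doc in docs]
    p_w = sum(has_word) / n
    p_l = labels.count(label) / n
    p_wl = sum(h and l == label for h, l in zip(has_word, labels)) / n
    if p_wl == 0:
        return -math.inf  # the word never occurs with this label
    return math.log(p_wl / (p_w * p_l))


# Hypothetical lemmatized reports with hypothetical diagnosis codes.
reports = [
    ["pacient", "tlak", "hypertenze"],
    ["pacient", "kašel", "rýma"],
    ["pacient", "tlak", "bolest"],
    ["pacient", "rýma"],
]
codes = ["I10", "J00", "I10", "J00"]
print(stopwords_by_idf(reports, threshold=0.5))
print(pmi(reports, codes, "tlak", "I10"))
```

A word such as "pacient" that appears in every report gets IDF log(4/4) = 0 and is dropped, while "tlak" (pressure), which appears only with I10 here, gets a positive PMI of log 2 for that code.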
An iOS implementation of the Shannon switching game
Macík, Miroslav ; Vidová Hladká, Barbora (advisor) ; Hana, Jiří (referee)
The Shannon switching game is a two-player graph game invented by the American mathematician Claude Shannon. iOS is an operating system designed for the iPhone, the iPod music player and the iPad tablet. The thesis describes existing implementations of the game as well as a new implementation for iOS created as part of this work. This implementation lets the user play against a computer opponent and also supports a two-player game, either on the same device or over the Internet. The thesis also gives a detailed account of the algorithms the computer opponent uses to choose its next move and of the techniques for generating the game board.
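In the usual formulation of the game, one player (Short) claims edges trying to connect two distinguished vertices, while the other (Cut) deletes edges to prevent this. Short wins once the claimed edges join the two vertices; Cut wins once the remaining edges can no longer join them. A minimal sketch of that win check using union-find connectivity follows; the board representation and function names are hypothetical and not taken from the iOS implementation.

```python
class DisjointSet:
    """Union-find with path halving."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def connected(n, edges, s, t):
    ds = DisjointSet(n)
    for a, b in edges:
        ds.union(a, b)
    return ds.find(s) == ds.find(t)


def winner(n, edges, claimed, deleted, s, t):
    """Return "Short", "Cut" or None while the game is still open.

    Edges are (u, v) tuples; claimed/deleted must use the same orientation as `edges`.
    """
    if connected(n, claimed, s, t):
        return "Short"  # claimed edges already join s and t
    remaining = [e for e in edges if e not in deleted]
    if not connected(n, remaining, s, t):
        return "Cut"  # no path between s and t can be completed any more
    return None


# Hypothetical board: a 4-cycle with terminals 0 and 3.
board = [(0, 1), (1, 3), (0, 2), (2, 3)]
print(winner(4, board, claimed=[(0, 1)], deleted=[(2, 3)], s=0, t=3))
```

Rebuilding the union-find after every move is quadratic in the worst case but trivial for phone-sized boards; an incremental structure would suit much larger graphs better.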
Language Modelling for German
Tlustý, Marek ; Bojar, Ondřej (advisor) ; Hana, Jiří (referee)
The thesis deals with language modelling for German. The main focus is on the specific features of German that are problematic for standard n-gram models. First, the statistical methods of language modelling are described and the relevant phenomena of German are explained. The thesis then proposes its own variants of n-gram language models aimed at addressing these problems. The models are trained both with standard n-gram methods and with the maximum entropy method using n-gram features. The two approaches are compared by their correlation with hand-evaluated sentence fluency and by an automatic measure, perplexity. Their computational requirements are also compared. Next, the thesis presents a set of its own features that count grammatical errors in selected phenomena. Their success is verified by their ability to predict the hand-evaluated fluency. Two kinds of models are used for this: maximum entropy models, and the thesis's own models that classify using only the medians of phenomenon values computed from the training data.
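Perplexity is the exponentiated average negative log probability that a model assigns to held-out text, PP = exp(-(1/N) Σ log P(w_i | w_{i-1})) for a bigram model. A minimal sketch follows, assuming a bigram model with add-one smoothing and sentence markers `<s>` and `</s>`. This is a deliberate simplification; the thesis's smoothing and maximum entropy models are not reproduced here.

```python
import math
from collections import Counter


def train_bigram(sentences):
    histories, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s + ["</s>"]
        histories.update(tokens[:-1])  # counts of each word used as a history
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    vocab = {w for s in sentences for w in s} | {"</s>"}  # predictable tokens
    return histories, bigrams, vocab


def perplexity(sentences, model):
    histories, bigrams, vocab = model
    log_prob, n = 0.0, 0
    for s in sentences:
        tokens = ["<s>"] + s + ["</s>"]
        for prev, w in zip(tokens[:-1], tokens[1:]):
            # Add-one smoothed P(w | prev); unseen words are not handled specially.
            p = (bigrams[(prev, w)] + 1) / (histories[prev] + len(vocab))
            log_prob += math.log(p)
            n += 1
    return math.exp(-log_prob / n)


# Hypothetical one-sentence corpus.
corpus = [["der", "Hund", "bellt"]]
model = train_bigram(corpus)
print(perplexity(corpus, model))
print(perplexity([["bellt", "Hund", "der"]], model))
```

The scrambled sentence gets a higher perplexity than the training sentence, which is the behaviour a fluency-oriented comparison relies on; on German, free word order and rich inflection make such word-level n-gram statistics sparse.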
An HPSG-based Formal Grammar of a Core Fragment of Georgian Implemented in TRALE
Abzianidze, Lasha ; Rosen, Alexandr (advisor) ; Hana, Jiří (referee)
Georgian is remarkably different from Indo-European languages. It exhibits several linguistic phenomena that are challenging from both theoretical and computational points of view. In addition, it is low-resourced and insufficiently studied computationally. In this thesis, we model the morphology and syntax of a core fragment of the language in a formal grammar written in the HPSG framework, one of the most powerful grammar frameworks available today. We also implement the grammar in TRALE, a grammar implementation platform that is faithful to "hand-written" HPSG-based grammars. This is the first application of HPSG to Georgian.
Semantic disambiguation using Distributional Semantics
Prodanovic, Srdjan ; Hana, Jiří (advisor) ; Vidová Hladká, Barbora (referee)
In statistical models of semantics, word meanings are based solely on their distributional properties. The core resource here is a single lexicon that can be used for a variety of tasks: word meanings are represented as vectors in a vector space, and word similarities as distances between their vector representations. Using these similarities, the suitability of terms in a particular context can be computed and used for a range of tasks, one of which is word sense disambiguation. In this thesis, several different approaches to vector space models were investigated and implemented in order to cross-evaluate their performance on the word sense disambiguation task on the Prague Dependency Treebank.
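A common choice of similarity between word vectors is the cosine of the angle between them. A minimal sketch of similarity-based sense selection follows, assuming each sense is represented by a hypothetical prototype vector and the context by the average of its word vectors. The tiny three-dimensional vectors are invented, and the thesis's actual models may differ.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    # Component-wise mean of the context word vectors.
    return [sum(xs) / len(vectors) for xs in zip(*vectors)]


def disambiguate(context_words, word_vectors, sense_vectors):
    """Pick the sense whose vector is most similar to the context centroid."""
    ctx = centroid([word_vectors[w] for w in context_words if w in word_vectors])
    return max(sense_vectors, key=lambda s: cosine(ctx, sense_vectors[s]))


# Hypothetical vectors for the ambiguous word "bank".
word_vectors = {
    "money": [0.9, 0.1, 0.0],
    "loan": [0.8, 0.0, 0.2],
    "water": [0.1, 0.9, 0.0],
    "shore": [0.0, 0.8, 0.3],
}
sense_vectors = {"finance": [1.0, 0.0, 0.0], "river": [0.0, 1.0, 0.0]}
print(disambiguate(["money", "loan"], word_vectors, sense_vectors))
```

Cosine ignores vector length, so frequent and rare context words with the same distributional profile contribute in the same direction.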
Automatic acquisition of inflectional paradigms with minimal supervision
Klíč, Radoslav ; Hana, Jiří (advisor) ; Hlaváčová, Jaroslava (referee)
The thesis presents a semi-supervised morphology learner developed by extending Paramor (Monson, 2009), an unsupervised system, to accept easy-to-obtain, manually provided data in the form of inflected forms with marked morpheme boundaries. In addition, a hierarchical clustering framework that allows multiple sources of information to be combined was developed as part of the thesis. The approach was tested on Czech, Slovene, German and Catalan and showed an increased F-measure compared with the Paramor baseline.
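For segmentation tasks, the F-measure is commonly computed over predicted morpheme boundaries as the harmonic mean of precision and recall. A minimal sketch follows, assuming each word's segmentation is given as a set of boundary positions (character offsets). This representation is an assumption, not necessarily the evaluation used in the thesis or by Paramor.

```python
def boundary_f1(gold, predicted):
    """F1 over morpheme-boundary positions, pooled across all gold words.

    Words present only in `predicted` are ignored.
    """
    tp = fp = fn = 0
    for word, gold_b in gold.items():
        pred_b = predicted.get(word, set())
        tp += len(gold_b & pred_b)
        fp += len(pred_b - gold_b)
        fn += len(gold_b - pred_b)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# Hypothetical segmentations: hrad|u and žen|ami (gold) versus že|nami (predicted).
gold = {"hradu": {4}, "ženami": {3}}
predicted = {"hradu": {4}, "ženami": {2}}
print(boundary_f1(gold, predicted))
```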
Detection of suspicious annotations
Václ, Jan ; Vidová Hladká, Barbora (advisor) ; Hana, Jiří (referee)
This work describes a machine learning approach to checking part-of-speech annotation and presents its implementation, a system called MissTagger. The checking procedure covers both error detection and error correction. MissTagger employs a simplified instance-based learning algorithm in which the words of the text are treated as instances. Part-of-speech tags from a context of fixed length are selected as features; no lexical information is included. The words whose tags make up this context are chosen based on either the linear or the dependency-tree structure of the sentence. The approach is evaluated experimentally on two languages, Czech and English.
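A minimal sketch of this idea, using an exact-match, memory-based variant. Each token is an instance whose features are the tags of a fixed-size linear window around it (the token's own tag and word are excluded). The training data records which tags occur with each context, and a token is flagged when its tag differs from the majority tag for its context, with that majority tag offered as a correction. The window size, padding symbol and majority rule are assumptions for illustration. MissTagger's actual algorithm may differ, and the dependency-tree context variant is not shown.

```python
from collections import Counter, defaultdict

PAD = "<pad>"  # hypothetical padding tag for sentence edges


def contexts(tags, width=1):
    """Yield (context tuple, tag) per position; the context holds only neighbouring tags."""
    padded = [PAD] * width + list(tags) + [PAD] * width
    for i, tag in enumerate(tags):
        j = i + width
        yield tuple(padded[j - width:j] + padded[j + 1:j + 1 + width]), tag


def train(tagged_sentences, width=1):
    table = defaultdict(Counter)  # context -> counts of tags seen in it
    for tags in tagged_sentences:
        for ctx, tag in contexts(tags, width):
            table[ctx][tag] += 1
    return table


def check(tags, table, width=1):
    """Return (position, found_tag, suggested_tag) for each suspicious position."""
    suspicious = []
    for i, (ctx, tag) in enumerate(contexts(tags, width)):
        if ctx in table:  # unseen contexts give no evidence either way
            best, _ = table[ctx].most_common(1)[0]
            if best != tag:
                suspicious.append((i, tag, best))
    return suspicious


# Hypothetical tagged training data (tags only, no lexical information).
training = [["DET", "NOUN", "VERB"]] * 3 + [["DET", "ADJ", "NOUN", "VERB"]]
table = train(training)
print(check(["DET", "VERB", "VERB"], table))
```

Using only tags keeps the instance space small. The trade-off is that the checker cannot tell apart two words that share a context but genuinely require different tags.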

See also: similar author names
1 HÁNA, Jiří
1 HÁNA, Jonatan
1 Hána, J.
9 Hána, Jan