National Repository of Grey Literature: 22 records found (records 1 - 10 shown)
Determination of basic form of words
Šanda, Pavel ; Burget, Radim (referee) ; Karásek, Jan (advisor)
Lemmatization is an important preprocessing step for many text-mining applications. It is similar to stemming, except that it determines not only the word stem but also tries to determine the basic form of the word, here using the Brute Force and Suffix Stripping methods. The main aim of this paper is to present methods for the algorithmic improvement of Czech lemmatization. The training data set that was created is part of this paper and can be freely used for student and academic work on similar problems.
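As a rough illustration of how a brute-force lookup and suffix stripping can be combined, here is a minimal Python sketch. It is not the thesis's implementation: `LEMMA_TABLE`, `SUFFIX_RULES`, and `lemmatize` are hypothetical names, and the three toy rules are assumptions chosen for the example that will misfire on many real Czech words.

```python
# Hypothetical lookup table of known word forms (the "brute force" step).
LEMMA_TABLE = {"psi": "pes", "lidé": "člověk"}

# Hypothetical (suffix, replacement) rules, tried in order (the "suffix stripping" step).
SUFFIX_RULES = [("ami", "a"), ("em", ""), ("y", "a")]


def lemmatize(word, min_stem=3):
    """Return the table lemma if the form is known, otherwise apply the first matching rule."""
    word = word.lower()
    if word in LEMMA_TABLE:
        return LEMMA_TABLE[word]
    for suffix, replacement in SUFFIX_RULES:
        stem = word[: len(word) - len(suffix)]
        # Require a minimum stem length so that short words are not over-stripped.
        if word.endswith(suffix) and len(stem) >= min_stem:
            return stem + replacement
    return word  # No rule applies: return the word unchanged.
```

Under these toy rules, `lemmatize("ženami")` gives `"žena"` and `lemmatize("hradem")` gives `"hrad"`, while an irregular form such as `"psi"` is resolved only through the lookup table.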
Slovak Pattern-based Morphology
Klocok, Andrej ; Dytrych, Jaroslav (referee) ; Smrž, Pavel (advisor)
The aim of this thesis is to get acquainted with methods of morphological analysis and the representation of data in morphological dictionaries, and to create a system based on technical patterns for the inflectional morphology of the Slovak language. Three tools are derived from this system: a morphological analyzer, which lemmatizes input words and determines their pattern and morphological tag; a tool for comparing and evaluating stemmers, which evaluates them against a derivative dictionary; and a tool for reconstructing diacritics, which was created as an auxiliary tool. The last chapters of the thesis assess the individual tools, compare the morphological analyzer with an available alternative, evaluate two implementations of Slovak stemmers with the stemmer evaluation tool, and outline further development of the tools.
Fast Adaptation of Codenames Computer Assistant for New Languages
Jareš, Petr ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
This thesis extends an artificial player for the word-association game Codenames so that support for new languages can be added easily. The system can play Codenames as a guessing player, as a clue giver, or, by combining both roles, as a player of the Duet version. Languages are analyzed with the neural toolkit Stanza, which is language independent and enables automated processing of many languages. It was used mainly for lemmatization and part-of-speech tagging when selecting clues in the game. Several models were tested for evaluating word associations; the best results were achieved by the Pointwise Mutual Information method and the predictive model fastText. The system supports playing Codenames in 36 languages comprising 8 different alphabets.
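For readers unfamiliar with Pointwise Mutual Information, the standard definition is PMI(x, y) = log2(p(x, y) / (p(x) p(y))). The Python sketch below estimates it from document-level co-occurrence counts. How the thesis actually counts co-occurrences is not stated in the abstract, so the counting scheme and the function name `pmi_scores` are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(documents):
    """Return PMI (base 2) for every unordered word pair that co-occurs in a document."""
    n = len(documents)
    word_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        words = set(doc)  # Count each word at most once per document.
        word_counts.update(words)
        pair_counts.update(frozenset(p) for p in combinations(sorted(words), 2))
    scores = {}
    for pair, joint in pair_counts.items():
        a, b = tuple(pair)
        p_joint = joint / n
        p_a, p_b = word_counts[a] / n, word_counts[b] / n
        scores[pair] = math.log2(p_joint / (p_a * p_b))
    return scores


# Toy corpus (hypothetical data): each inner list is one tokenized document.
docs = [["cat", "dog"], ["cat", "dog"], ["cat", "fish"], ["bird", "tree"]]
scores = pmi_scores(docs)
```

A higher score means two words appear together more often than their individual frequencies would predict, which is what makes the measure usable as a word-association score.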
Application for Text Summarization
Mička, Jakub ; Zendulka, Jaroslav (referee) ; Bartík, Vladimír (advisor)
This work focuses on the implementation of a web application that serves as a tool for automatic summarization of English text. Summaries are produced by the TextRank and Latent Semantic Analysis methods, both of which are improved by named entity recognition. The main contribution of this work is showing that using named entity recognition with Latent Semantic Analysis, and especially with TextRank, leads to higher-quality summaries. The quality of the summaries was verified with ROUGE metrics.
Automatic Creation of Dictionaries from Translations
Sumbalová, Lenka ; Kouřil, Jan (referee) ; Smrž, Pavel (advisor)
The aim of this bachelor thesis was to build a system for the automatic creation of dictionaries from translations. It describes the implementation of a system that generates a Czech-English dictionary from an aligned parallel corpus and summarizes the results. It also analyzes the CzEng parallel corpus, which was used as the data source for the dictionaries, and explains the theoretical concepts related to this topic.
Parallel Corpus Manager
Kouřil, Jan ; Dytrych, Jaroslav (referee) ; Smrž, Pavel (advisor)
The goal of this diploma project was to implement a parallel corpus manager that can align parallel texts in different languages and insert them into a corpus, where several further processing functions are provided. The program offers automatic text alignment and interactive editing of the result. The aligned texts are then inserted into the corpus. The program can work with multiple corpora; a parallel corpus is always identified by a pair of languages. Within a corpus, it is possible to search by many categories, view and edit particular selections, lemmatize and morphologically tag given texts, sort selections, import and export data, edit the corpus in many ways for easier navigation, and add new expressions to the managed dictionaries. The individual chapters give an introduction to corpus issues and describe the theory of aligning parallel texts, morphological tagging and lemmatization, the external tools used in the program, the most common subtitle formats, and the implementation of solutions to particular problems.
Recognition of emotions in Czech texts
Červenec, Radek ; Smékal, Zdeněk (referee) ; Burget, Radim (advisor)
With advances in information and communication technologies over the past few years, the amount of information stored in the form of electronic text documents has been growing rapidly. Since the human ability to effectively process and analyze large amounts of information is limited, there is an increasing demand for tools that automatically analyze these documents and benefit from their emotional content. Such systems have extensive applications. The purpose of this work is to design and implement a system for identifying expressions of emotion in Czech texts. The proposed system is based mainly on machine learning methods, and therefore the design and creation of a training set are described as well. The training set is then used to create a classifier model using an SVM. To improve classification results, additional components were integrated into the system, such as a lexical database, a lemmatizer, and a derived keyword dictionary. The thesis also presents the results of classifying text documents into defined emotion classes and evaluates various approaches to categorization.
Czech-English Translation
Petrželka, Jiří ; Schmidt, Marek (referee) ; Smrž, Pavel (advisor)
This diploma thesis describes the principles of statistical machine translation and demonstrates how to build a statistical machine translation system with Moses. In the preparatory phase, freely available bilingual Czech-English corpora are examined. An empirical analysis of the running time of multithreaded word-alignment tools shows that MGIZA++ can achieve up to a fivefold speedup, while PGIZA++ can achieve up to an eightfold speedup (compared with GIZA++). Three methods of morphological pre-processing of the Czech training data are tested using simple non-factored models. While simple lemmatization can lower BLEU, more sophisticated approaches mostly raise it. The positive effects of morphological pre-processing fade as the corpus grows. The relationship between other corpus characteristics (size, genre, additional data) and the resulting BLEU is measured empirically. The final system is trained on the CzEng 0.9 corpus and evaluated on a test sample from the WMT 2010 workshop.
Slovak Lemmatization
Lipták, Šimon ; Dytrych, Jaroslav (referee) ; Smrž, Pavel (advisor)
The aim of this bachelor thesis was to become familiar with the tools and methods for morphological analysis and lemmatization of words, to design and implement a system for lemmatizing Slovak words that are not in the dictionary and then writing out their forms, and to process Slovak data for the implementation of stemming. Finally, the predictions are scored on the basis of testing and compared with available alternatives.
