National Repository of Grey Literature
Language Modelling for German
Tlustý, Marek ; Bojar, Ondřej (advisor) ; Hana, Jiří (referee)
The thesis deals with language modelling for German. Its main concern is the specifics of the German language that are troublesome for standard n-gram models. First, the statistical methods of language modelling are described and the relevant language phenomena of German are explained. The thesis then proposes its own variants of n-gram language models aimed at addressing these problems. The models are trained both with standard n-gram methods and with the maximum-entropy method using n-gram features. The two approaches are compared by correlating hand-evaluated sentence fluency with an automatic measure, perplexity; their computational requirements are compared as well. Next, the thesis presents its own set of features that represent the counts of grammatical errors for selected phenomena. Their success is verified by their ability to predict the hand-evaluated fluency, using both maximum-entropy models and custom models that classify solely by the medians of the feature values computed from the training data.
Coarse Word Representations in Machine Translation into Czech
Tlustý, Marek ; Bojar, Ondřej (advisor) ; Mareček, David (referee)
In this thesis we deal with coarse word representations in machine translation from German and Hungarian into Czech. First, we compare different tools for splitting German and Hungarian compounds; for Hungarian we additionally design several variants of noun splitting. We then experiment with word classes, combining word splitting with several configurations of word classes; in particular, we use bilingual classes. On this basis we compare translation from German and Hungarian into Czech, evaluating the outputs with the automatic metrics BLEU and METEOR; the best configurations are also evaluated manually. It turns out that splitting German compounds and Hungarian nouns alone does not lead to much better results when translating into Czech, but in combination with word classes there is a noticeable improvement.