National Repository of Grey Literature: 2 records found
German Compounds in Transformer Models
Neumannová, Kristýna ; Bojar, Ondřej (advisor) ; Zeman, Daniel (referee)
German is known for its highly productive word formation processes, particularly compounding and derivation. In this thesis, we focus on German nominal compounds and how they are represented in machine translation (MT) outputs. Although compounds are important in German text, commonly used MT evaluation metrics such as BLEU do not adequately capture their usage. The aim of this thesis was to investigate how Transformer models generate German compounds and to explore the conditions that lead to their production. Our analysis revealed that MT systems tend to produce fewer compounds than human translators. However, because German compounding is so productive, compounds cannot be reliably identified from a fixed list. We therefore identified novel compounds manually, and even then, human translations still contained more compounds than MT outputs. We trained our own Transformer model for English-German translation and conducted experiments to examine various factors that influence the production of compounds, including word segmentation and the frequency of compounds in the training data. Additionally, we explored the use of forced decoding and the effect of providing the model with the first words of a sentence during translation. Our findings highlight the...
Identification and analysis of Czech equivalents of German compounds
Neumannová, Kristýna ; Ševčíková, Magda (advisor) ; Zeman, Daniel (referee)
This bachelor thesis deals with the automatic identification of Czech equivalents of German nominal compounds and their linguistic analysis. Compounding is a word formation process used in both languages; however, it is much more productive in German than in Czech, where derivation predominates. The first part of the thesis addresses the identification of Czech counterparts of German compounds using parallel corpora and tools for phrase-based statistical machine translation. After identification, the Czech equivalents were divided into one-word, two-word and multi-word equivalents and analysed according to their part-of-speech tags. Over half of the German nominal compounds correspond to a sequence of two or more words in Czech, most of which consist of an adjective and a noun. The morphological structure of the one-word equivalents was also studied, and these equivalents were classified as either compounds or derivatives; in the derivatives, the second part of the German compound corresponds to a suffix in the Czech counterpart.

See also: similar author names
7 NEUMANNOVÁ, Kateřina
1 Neumannová, Karin
2 Neumannová, Karolína
7 Neumannová, Kateřina
4 Neumannová, Klára