National Repository of Grey Literature
Automatic dictionary acquisition from parallel corpora
Popelka, Jan ; Pecina, Pavel (advisor) ; Mareček, David (referee)
In this work, an extensible word-alignment framework is implemented from scratch. It is based on a discriminative method that combines a wide range of lexical association measures and other features, and it requires only a small amount of manually word-aligned data to optimize the parameters of the model. The optimal alignment is found as a minimum-weight edge cover; selected suboptimal alignments are used to estimate the confidence of each alignment link. The feature combination is tuned over many experiments with respect to the evaluation results, which are compared to those of GIZA++. The best-trained model is used to word-align a large Czech-English parallel corpus, and a bilingual lexicon is extracted from the links of highest confidence. Single-word translation equivalents are sorted by their significance, and lexicons of different sizes are extracted by taking the top N translations. The precision of the lexicons is evaluated automatically as well as manually, by judging random samples.
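The final step the abstract describes, building a lexicon from the highest-confidence alignment links and taking the top N translation pairs, can be illustrated with a minimal sketch. This is not the thesis's actual implementation; the link format, the confidence threshold, and the summed-confidence significance score are illustrative assumptions.

```python
# Hypothetical sketch of top-N lexicon extraction from confidence-scored
# alignment links. The link tuple format, threshold, and scoring scheme
# are assumptions, not the method used in the thesis.
from collections import defaultdict

def extract_lexicon(links, n, min_confidence=0.9):
    """links: iterable of (source_word, target_word, confidence) tuples.
    Returns the n most significant translation pairs."""
    scores = defaultdict(float)
    for src, tgt, conf in links:
        if conf >= min_confidence:
            # accumulate confidence over the corpus as a simple
            # significance score for the translation pair
            scores[(src, tgt)] += conf
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [pair for pair, _ in ranked[:n]]

links = [
    ("pes", "dog", 0.98), ("pes", "dog", 0.95),
    ("pes", "hound", 0.91), ("kočka", "cat", 0.97),
    ("pes", "tail", 0.40),  # low-confidence link is discarded
]
print(extract_lexicon(links, 2))  # → [('pes', 'dog'), ('kočka', 'cat')]
```

Varying `n` yields the lexicons of different sizes mentioned in the abstract; larger N trades precision for coverage.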