National Repository of Grey Literature: 62 records found (showing records 43 - 52)
Measures of Machine Translation Quality
Macháček, Matouš ; Bojar, Ondřej (advisor) ; Kuboň, Vladislav (referee)
Title: Measures of Machine Translation Quality Author: Matouš Macháček Department: Institute of Formal and Applied Linguistics Supervisor: RNDr. Ondřej Bojar, Ph.D. Abstract: We explore both manual and automatic methods of machine translation evaluation. We propose a manual evaluation method in which annotators rank only translations of short segments instead of whole sentences. This results in easier and more efficient annotation. We have conducted an annotation experiment and evaluated a set of MT systems using this method. The obtained results are very close to the official WMT14 evaluation results. We also use the collected database of annotations to automatically evaluate new, unseen systems and to tune parameters of a statistical machine translation system. The evaluation of unseen systems, however, does not work and we analyze the reasons. To explore the automatic methods, we organized the Metrics Shared Task held during the Workshop on Statistical Machine Translation in 2013 and 2014. We report the results of the last shared task, discuss various metaevaluation methods and analyze some of the participating metrics. Keywords: machine translation, evaluation, automatic metrics, annotation
Comparison of Czech-Russian Machine Translation Methods
Bílek, Karel ; Kuboň, Vladislav (advisor) ; Bojar, Ondřej (referee)
In this thesis, I present several methods of Czech-to-Russian machine translation, including both historical approaches and more modern ones, and both phrase-based and rule-based systems. I first briefly describe the linguistic background of Czech and Russian, and their common history and differences. Then, I describe automating, building and improving some of the machine translation systems, together with their comparison, using both an automated metric and a limited human annotation. I also describe the creation of several corpora of Czech-Russian parallel data and Russian monolingual data.
Japanese-Czech Machine Translation
Variš, Dušan ; Bojar, Ondřej (advisor) ; Popel, Martin (referee)
Machine translation (MT) using deep sentence analysis is not as widespread as other MT methods; however, we believe that some of its aspects can contribute to the overall translation quality. It is also important to try out deep MT methods with various language pairs. In our case, we experiment with the language pair Japanese-Czech. As a part of this task, we also had to collect and process the necessary parallel data. Because very little such data is available, we were forced to devise approaches to tackle this problem. Our system is based on the same principles as the TectoMT translation system; therefore, it was implemented within the same platform. In the process, we tried to capture at least some basic linguistic phenomena characteristic of Japanese. As a part of our research, we also compared our system with a simple phrase-based baseline.
Implementing Machine Translation in an SME
Hermanová, Barbora ; Svoboda, Tomáš (advisor) ; Špirk, Jaroslav (referee)
The thesis deals with the topic of implementation of machine translation (MT) in an SME with an emphasis on legal translation. The theoretical part brings together the existing research relevant for this topic, focusing in particular on the specifics of MT between Czech and English, the task of post-editing (PEMT), including the skills and competences required from post-editors, recommendations for, and experience with, MT implementation, MT evaluation, PEMT productivity and translator attitudes towards MT. In its empirical part, the thesis draws on a case study of MT implementation in a Czech language service provider (LSP), with a focus on selecting a suitable MT tool and incorporating it in the workflow. Furthermore, an experiment is performed with professional translators, aimed at measuring productivity of translation and post-editing in terms of the time spent in the respective tasks and analysing and comparing selected aspects of the output produced by the translators and post-editors participating in the experiment. The analytical model employed is an error-based human evaluation model. Lastly, a questionnaire is used to ascertain the experience of translators/post-editors with MT and their attitudes towards this technology. The thesis ultimately provides a set of findings that can be used...
Automatic post-editing of phrase-based machine translation outputs
Rosa, Rudolf ; Mareček, David (advisor) ; Žabokrtský, Zdeněk (referee)
We present Depfix, a system for automatic post-editing of phrase-based English-to-Czech machine translation outputs, based on linguistic knowledge. First, we analyzed the types of errors that a typical machine translation system makes. Then, we created a set of rules and a statistical component that correct errors that are common or serious and can have a potential to be corrected by our approach. We use a range of natural language processing tools to provide us with analyses of the input sentences. Moreover, we reimplemented the dependency parser and adapted it in several ways to parsing of statistical machine translation outputs. We performed both automatic and manual evaluations which confirmed that our system improves the quality of the translations.
CAT Tools in German - Czech Translation
Handšuhová, Jana ; Svoboda, Tomáš (advisor) ; Špirk, Jaroslav (referee)
This thesis deals with specialized translation software, the mastery of which is becoming one of the basic requirements of successful translation work. The theoretical part describes the historical development, classification and main functions of translation memory systems. The thesis further attempts to determine the criteria for the effective use of CAT tools and explores the text types and sorts for which translation memory systems are most commonly used in the translation process. The functional view of the language-based text typology and the principles on which translation memory systems work are also discussed. The practical part compares the result of a translation process (translation as a product) with and without CAT tools. The corpus of parallel texts (original translation) is subjected to a translation analysis. This analysis identifies the levels which are affected by differences between translations made with and without CAT tools. The differences in the actual translation process with and without CAT tools which are not empirically verifiable are analysed based on a survey conducted amongst translators. Then, the findings of the empirical part are summarized and systematized. The last chapter deals with the expected development in the translation market, the...
Feature Selection for Factored Phrase-Based Machine Translation
Tamchyna, Aleš ; Bojar, Ondřej (advisor) ; Popel, Martin (referee)
In the presented work we investigate factored models for machine translation. We provide a thorough theoretical description of this machine translation paradigm. We describe a method for evaluating the complexity of factored models and verify its usefulness in practice. We present a software tool for automatic creation of machine translation experiments and for searching the space of possible configurations. In the experimental part of the work we verify our analyses and give some insight into the potential of factored systems. We indicate some of the possible directions that lead to improvement in translation quality; however, we conclude that it is not possible to explore these options in a fully automatic way.
Metrics for Optimizing Statistical Machine Translation
Macháček, Matouš ; Bojar, Ondřej (advisor) ; Popel, Martin (referee)
State-of-the-art MT systems use a so-called log-linear model, which combines several components to predict the probability of the translation of a given sentence. Each component has its own weight in the log-linear model. These weights are generally trained to optimize BLEU, but there are many alternative automatic metrics and some of them correlate better with human judgments than BLEU. We explore various metrics (PER, WER, CDER, TER, BLEU and SemPOS) in terms of correlation with human judgments. The SemPOS metric is examined in more detail and we propose some approximations and variants. We use the examined metrics to train a Czech-to-English MT system using the MERT method and explore how optimizing toward various automatic evaluation metrics affects the resulting model.
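Editorial illustration, not taken from the thesis: the weighted combination of components described in the abstract above can be sketched roughly as follows. The feature names ("lm", "tm", "word_penalty"), the candidate translations and all numeric values are hypothetical, and the normalization over a small candidate list is a simplifying assumption rather than the author's implementation.

```python
import math

def log_linear_score(features, weights):
    """Unnormalized log-linear score: sum of weight * feature value."""
    return sum(weights[name] * value for name, value in features.items())

def best_candidate(candidates, weights):
    """Pick the candidate translation with the highest weighted score."""
    return max(candidates, key=lambda c: log_linear_score(c["features"], weights))

def posterior(candidates, weights):
    """Normalize exponentiated scores over the given candidate list."""
    scores = [log_linear_score(c["features"], weights) for c in candidates]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidates with made-up component scores (log-probabilities
# for the language and translation models, a word count for the penalty).
candidates = [
    {"text": "candidate A", "features": {"lm": -4.0, "tm": -2.0, "word_penalty": 5}},
    {"text": "candidate B", "features": {"lm": -3.0, "tm": -3.5, "word_penalty": 4}},
]

# Two hypothetical weight settings, as tuning toward different metrics might yield.
weights_1 = {"lm": 0.5, "tm": 1.0, "word_penalty": -0.2}
weights_2 = {"lm": 2.0, "tm": 0.1, "word_penalty": 0.0}

choice_1 = best_candidate(candidates, weights_1)["text"]
choice_2 = best_candidate(candidates, weights_2)["text"]
```

With these made-up numbers the two weight settings select different candidates, which is why the metric used to tune the weights can change the output of the resulting system.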
Automatic Alignment of Tectogrammatical Trees from Czech-English Parallel Corpus
Mareček, David
Title: Automatic Alignment of Tectogrammatical Trees from Czech-English Parallel Corpus Author: David Mareček Department: Institute of Formal and Applied Linguistics Supervisor: Ing. Zdeněk Žabokrtský, Ph.D. Abstract: The goal of this thesis is to implement and evaluate a software tool for automatic alignment of Czech and English tectogrammatical trees. The task is to find corresponding nodes between two trees that represent an English sentence and its Czech translation. A large number of aligned trees acquired from parallel corpora can be used for training transfer models for machine translation systems. It is also useful for linguists studying translation equivalents in the two languages. The thesis also describes the word alignment annotation process. The manual word alignment was necessary for evaluation of the aligner. The results of our experiments show that shifting the alignment task from the word layer to the tectogrammatical layer both (a) increases the interannotator agreement on the task and (b) allows us to construct a feature-based algorithm which uses sentence structure and which outperforms the GIZA++ aligner in terms of f-measure on aligned tectogrammatical node pairs. This is probably caused by the fact that tectogrammatical representations of Czech and English sentences are much closer...
Rich Features in Phrase-Based Machine Translation
Kos, Kamil ; Žabokrtský, Zdeněk (referee)
Title: Rich Features in Phrase-Based Machine Translation Author: Kamil Kos Department: Institute of Formal and Applied Linguistics Supervisor: RNDr. Ondřej Bojar, Ph.D. Supervisor's e-mail address: bojar@ufal.mff.cuni.cz Keywords: machine translation, quality evaluation, source-context model, suffix array Abstract: In this thesis we investigate several methods for improving the quality of statistical machine translation (MT) by using linguistically rich information. First, we describe SemPOS, a metric that uses a shallow semantic representation of sentences to evaluate the translation quality. We show that even though this metric has high correlation with human assessment of translation quality, it is not directly suitable for system parameter optimization. Second, we extend the log-linear model used in statistical MT with an additional source-context model that helps to better distinguish among possible translation options and select the most promising translation for a given context.
