National Repository of Grey Literature: 38 records found, showing records 29-38
Web Interface for the Treex Framework
Sedlák, Michal ; Popel, Martin (advisor) ; Rosa, Rudolf (referee)
This work deals with a web application called Treex::Web, which serves as a web interface for the NLP framework Treex. The work addresses several shortcomings of Treex (e.g., the absence of a graphical user interface and a complicated installation) and offers Treex::Web as a possible solution. The work begins with an introduction to the Treex framework itself. The following chapters describe the user interface of Treex::Web (chapter 3) and the implementation of the whole web application (chapter 4). The conclusion compares NLP frameworks similar to Treex and their web interfaces.
Tool for comparison and evaluation of machine translation
Klejch, Ondřej ; Popel, Martin (advisor) ; Tamchyna, Aleš (referee)
This bachelor thesis describes the development of MT-ComparEval, a tool for the comparison and evaluation of machine translation. The tool compares translations according to several criteria: automatic machine translation quality metrics computed on whole documents or on single sentences; sentence-level comparison of two translations, with confirmed, improving and worsening n-grams highlighted; and summaries of the most improving and worsening n-grams for the whole document. When comparing two translations, MT-ComparEval also plots a chart of the absolute differences of the metrics computed on single sentences and a chart of the values obtained by paired bootstrap resampling.
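Paired bootstrap resampling, mentioned above, can be illustrated with a minimal Python sketch. It assumes a sentence-level metric whose document score is simply the mean of the sentence scores (a simplification: BLEU is a corpus-level metric and would instead be recomputed from resampled n-gram statistics). The function name `paired_bootstrap`, its parameters and the toy scores are hypothetical and are not taken from MT-ComparEval.

```python
import random

def paired_bootstrap(scores_a, scores_b, n_samples=1000, seed=0):
    """Return the fraction of bootstrap samples in which A beats B and vice versa.

    scores_a[i] and scores_b[i] are sentence-level scores of two systems on the
    same source sentence i; the document score is assumed to be their mean.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    rng = random.Random(seed)
    n = len(scores_a)
    wins_a = wins_b = 0
    for _ in range(n_samples):
        # Resample n sentences with replacement; both systems share the same indices.
        idx = [rng.randrange(n) for _ in range(n)]
        mean_a = sum(scores_a[i] for i in idx) / n
        mean_b = sum(scores_b[i] for i in idx) / n
        if mean_a > mean_b:
            wins_a += 1
        elif mean_b > mean_a:
            wins_b += 1
    return wins_a / n_samples, wins_b / n_samples

# Toy sentence-level scores for two hypothetical systems.
sys_a = [0.42, 0.55, 0.31, 0.60, 0.48]
sys_b = [0.40, 0.50, 0.35, 0.52, 0.47]
wins_a, wins_b = paired_bootstrap(sys_a, sys_b, n_samples=1000)
print(f"A better in {wins_a:.1%} of samples, B better in {wins_b:.1%}")
```

A value of `wins_a` close to 1.0 (for example above 0.95) is commonly read as evidence that system A is better on this test set; sharing the same resampled indices for both systems is what makes the test "paired".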
Popularity Meter
Hajič, Jan ; Bojar, Ondřej (advisor) ; Popel, Martin (referee)
Being able to automatically track a person's popularity in the newspapers is an idea appealing not just to those in the media spotlight. While sentiment (subjectivity) analysis is a rapidly growing subfield of computational linguistics, no data from the news domain are yet available for Czech. We have therefore started building a manually annotated polarity corpus of sentences from Czech news texts; however, these texts have proven rather unwieldy for such processing. We have also designed a classifier that should be able to track popularity based on this corpus; the classifier has been tested on a corpus of product reviews of domestic appliances, and some preliminary testing has been done on the nascent news corpus. As a model, we simply extract a unigram polarity lexicon from the data. We then use three related methods for identifying lemma polarity and a number of simple filters for feature selection. On the domestic appliance data, our simplest model has achieved results comparable to the state of the art; however, the properties of Czech news texts and the preliminary results suggest that a more linguistically oriented approach might be preferable.
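The idea of a unigram polarity lexicon can be sketched as follows. This is one simple, hypothetical way of scoring lemma polarity (relative frequency in positive versus negative sentences, with a minimum-count filter as a crude form of feature selection); it is not one of the three methods used in the thesis, which the abstract does not specify. Input sentences are assumed to be already lemmatized, and the toy data are made up.

```python
from collections import Counter

def build_lexicon(labelled, min_count=2):
    """Hypothetical unigram polarity lexicon.

    labelled: list of (lemmas, label) pairs, label in {"pos", "neg", "neu"}.
    Polarity of a lemma = (positive - negative sentence count) / total count;
    lemmas seen in fewer than min_count sentences are dropped.
    """
    pos, neg, total = Counter(), Counter(), Counter()
    for lemmas, label in labelled:
        for lemma in set(lemmas):  # count each lemma at most once per sentence
            total[lemma] += 1
            if label == "pos":
                pos[lemma] += 1
            elif label == "neg":
                neg[lemma] += 1
    return {lemma: (pos[lemma] - neg[lemma]) / n
            for lemma, n in total.items() if n >= min_count}

def classify(lemmas, lexicon, threshold=0.0):
    """Sum the polarities of known lemmas and compare against a threshold."""
    score = sum(lexicon.get(lemma, 0.0) for lemma in lemmas)
    if score > threshold:
        return "pos"
    if score < -threshold:
        return "neg"
    return "neu"

data = [
    (["good", "phone"], "pos"),
    (["good", "price"], "pos"),
    (["bad", "phone"], "neg"),
    (["bad", "service"], "neg"),
    (["phone"], "neu"),
]
lexicon = build_lexicon(data)
print(classify(["bad", "price"], lexicon))  # prints: neg
```

Here "price" and "service" occur only once and are filtered out, so the decision for `["bad", "price"]` rests on "bad" alone.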
Hybrid Machine Translation Approaches for Low-Resource Languages
Kamran, Amir ; Popel, Martin (advisor) ; Kuboň, Vladislav (referee)
In recent years, corpus-based machine translation systems have produced significant results for a number of language pairs. However, for low-resource languages such as Urdu, purely statistical or purely example-based methods do not perform well. On the other hand, rule-based approaches require a huge amount of time and resources to develop the rules, which makes them impractical in most scenarios. Hybrid machine translation systems, which combine the strengths of different approaches to achieve quality translation, might be one way to overcome these problems. The goal of the thesis is to explore different combinations of approaches and to evaluate their performance against the standard corpus-based methods currently in use. This includes: 1. the use of syntax-based and dependency-based reordering rules with Statistical Machine Translation; 2. the automatic extraction of lexical and syntactic rules using statistical methods to facilitate Transfer-Based Machine Translation. The novel element of the proposed work is an algorithm that automatically learns reordering rules for English-to-Urdu statistical machine translation. Moreover, this approach can be extended to learn lexical and syntactic rules for building a rule-based machine translation system.
Word prediction using language models
Koutný, Michal ; Popel, Martin (advisor) ; Novák, Michal (referee)
The thesis uses n-gram language models to improve text entry on a QWERTY keyboard by means of word prediction. Related solutions are briefly introduced, followed by the theoretical background of the work. The analysis in the next part divides the problem into four tasks: language model training, incorporating the model into word prediction, a GUI component, and an evaluation framework. The implementation combines Python and C++. The corpora used come from Czech (19 M words) and English (84 M words) Wikipedia articles. A small corpus of Czech educational texts was used to test domain adaptation. Quality metrics are defined and various configurations are measured. On the test data, the best solutions reduced keystrokes per character to 0.44 for English and 0.55 for Czech.
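Keystrokes per character (KSPC) is the number of key presses divided by the number of characters produced, so values below 1.0 mean that prediction saves typing. The sketch below simulates it under stated assumptions: a hypothetical unigram prefix predictor stands in for the thesis's n-gram models, selecting a suggestion costs one keystroke and also inserts the trailing space, and every word is followed by exactly one space. None of the names come from the thesis.

```python
from collections import Counter

def make_unigram_predictor(corpus_words, k=3):
    """Hypothetical predictor: the k most frequent corpus words starting with a prefix."""
    counts = Counter(corpus_words)

    def predict(prefix):
        candidates = [w for w in counts if w.startswith(prefix)]
        candidates.sort(key=lambda w: (-counts[w], w))  # by frequency, then alphabetically
        return candidates[:k]

    return predict

def keystrokes_per_character(text, predict):
    """Simulated KSPC for typing `text` with a word-prediction list.

    Assumptions: each word is followed by one space; after typing each prefix
    (including the empty one) the user may pick the word from predict(prefix),
    which costs one keystroke and inserts the space.
    """
    words = text.split()
    if not words:
        raise ValueError("text must contain at least one word")
    keystrokes = characters = 0
    for word in words:
        cost = len(word) + 1  # fallback: type every letter, then the space
        for typed in range(len(word) + 1):
            if word in predict(word[:typed]):
                cost = typed + 1  # letters typed so far + one selection keystroke
                break
        keystrokes += cost
        characters += len(word) + 1
    return keystrokes / characters

corpus = ["the", "the", "the", "cat", "car"]
predict = make_unigram_predictor(corpus, k=1)
print(keystrokes_per_character("the cat", predict))  # prints: 0.625
```

In this toy run "the" is offered before any letter is typed (1 keystroke), while "cat" is shadowed by "car" until it is typed in full (4 keystrokes), giving 5 keystrokes for 8 characters.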
Feature Selection for Factored Phrase-Based Machine Translation
Tamchyna, Aleš ; Bojar, Ondřej (advisor) ; Popel, Martin (referee)
In this work we investigate factored models for machine translation. We provide a thorough theoretical description of this machine translation paradigm. We describe a method for evaluating the complexity of factored models and verify its usefulness in practice. We present a software tool for automatically creating machine translation experiments and searching the space of possible configurations. In the experimental part of the work we verify our analyses and give some insight into the potential of factored systems. We indicate some of the possible directions that lead to improvements in translation quality; however, we conclude that it is not possible to explore these options in a fully automatic way.
Metrics for Optimizing Statistical Machine Translation
Macháček, Matouš ; Bojar, Ondřej (advisor) ; Popel, Martin (referee)
State-of-the-art MT systems use a so-called log-linear model, which combines several components to predict the probability of a translation of a given sentence. Each component has its own weight in the log-linear model. These weights are generally trained to optimize BLEU, but there are many alternative automatic metrics, and some of them correlate better with human judgments than BLEU does. We explore various metrics (PER, WER, CDER, TER, BLEU and SemPOS) in terms of their correlation with human judgments. The SemPOS metric is examined in more detail, and we propose some approximations and variants of it. We use the examined metrics to train a Czech-to-English MT system with the MERT method and explore how optimizing toward various automatic evaluation metrics affects the resulting model.
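In a log-linear model, each candidate translation e of a source sentence f receives the score Σ_i λ_i · h_i(e, f), where h_i are the component (feature) values and λ_i their weights; probabilities are obtained by exponentiating and normalizing, p(e | f) = exp(score(e, f)) / Σ_e' exp(score(e', f)). The sketch below computes this over a small n-best list. The feature names and values are hypothetical, and the weight tuning itself (MERT) is not shown.

```python
import math

def loglinear_score(features, weights):
    """Weighted sum of component values: sum_i lambda_i * h_i(e, f)."""
    return sum(weights[name] * value for name, value in features.items())

def translation_probabilities(candidates, weights):
    """Turn log-linear scores over an n-best list into probabilities (softmax).

    candidates maps each candidate translation to its feature dict.
    """
    scores = {e: loglinear_score(h, weights) for e, h in candidates.items()}
    best = max(scores.values())  # subtract the max score for numerical stability
    exps = {e: math.exp(s - best) for e, s in scores.items()}
    z = sum(exps.values())
    return {e: v / z for e, v in exps.items()}

# Hypothetical features: log-probabilities from a translation model ("tm")
# and a language model ("lm").
nbest = {
    "the house is small": {"tm": -1.0, "lm": -2.0},
    "the house is little": {"tm": -2.0, "lm": -1.0},
}
weights = {"tm": 1.0, "lm": 0.5}
probs = translation_probabilities(nbest, weights)
print(max(probs, key=probs.get))  # prints: the house is small
```

Changing the weights changes which candidate wins; MERT searches for the weights under which the top-scoring candidates maximize the chosen metric (e.g., BLEU or TER) on a development set, which is why the choice of metric shapes the resulting model.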
Možnosti zlepšení strojového překladu z angličtiny do češtiny (Possibilities for Improving English-to-Czech Machine Translation)
Popel, Martin ; Bojar, Ondřej (referee) ; Žabokrtský, Zdeněk (advisor)
This thesis describes English-Czech machine translation as implemented in the TectoMT system. The transfer uses deep-syntactic dependency (tectogrammatical) trees and exploits the annotation scheme of the Prague Dependency Treebank. The primary goal of the thesis is to improve translation quality using both rule-based and statistical methods. First, we present a manual annotation of translation errors in 250 sentences and the subsequent identification of frequent errors, their types and sources. The main part of the thesis describes the design and implementation of modifications in the three translation phases: analysis, transfer and synthesis. The most prominent modification is a novel approach to the transfer phase based on Hidden Markov Tree Models (a tree modification of Hidden Markov Models). The improvements are evaluated in terms of BLEU and NIST scores.

See also: similar author names
1 POPEL, Milan