National Repository of Grey Literature: 53 records found (records 31-40)
Measures of Machine Translation Quality
Macháček, Matouš ; Bojar, Ondřej (advisor) ; Kuboň, Vladislav (referee)
Title: Measures of Machine Translation Quality Author: Matouš Macháček Department: Institute of Formal and Applied Linguistics Supervisor: RNDr. Ondřej Bojar, Ph.D. Abstract: We explore both manual and automatic methods of machine translation evaluation. We propose a manual evaluation method in which annotators rank only translations of short segments instead of whole sentences, which makes annotation easier and more efficient. We conducted an annotation experiment and evaluated a set of MT systems using this method; the obtained results are very close to the official WMT14 evaluation results. We also use the collected database of annotations to automatically evaluate new, unseen systems and to tune the parameters of a statistical machine translation system. The evaluation of unseen systems, however, does not work, and we analyze the reasons why. To explore the automatic methods, we organized the Metrics Shared Task held during the Workshop on Statistical Machine Translation in 2013 and 2014. We report the results of the most recent shared task, discuss various metaevaluation methods, and analyze some of the participating metrics. Keywords: machine translation, evaluation, automatic metrics, annotation
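One metaevaluation method used at the system level in WMT metrics tasks is to correlate each metric's scores for the participating systems with the human scores for the same systems, for example with Pearson's r. The following is a minimal sketch, assuming the per-system human and metric scores have already been computed; the function names and the dictionary-based input are illustrative choices, not the shared task's actual tooling.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


def system_level_correlation(human, metric):
    """Correlate human and metric scores over the systems present in both dicts."""
    systems = sorted(human.keys() & metric.keys())
    return pearson([human[s] for s in systems], [metric[s] for s in systems])
```

A metric whose scores rank and space the systems the same way as the human judgements gets r close to 1; segment-level metaevaluation typically needs a rank-based statistic instead, which this sketch does not cover.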
Comparison of Methods of Czech-Russian Machine Translation
Bílek, Karel ; Kuboň, Vladislav (advisor) ; Bojar, Ondřej (referee)
In this thesis, I present several methods of Czech-to-Russian machine translation, including both historical and more modern approaches, and both phrase-based and rule-based systems. I first briefly describe the linguistic background of Czech and Russian, their common history and their differences. Then I describe automating, building and improving some of the machine translation systems, and compare them using both an automated metric and a limited human annotation. I also describe the creation of several corpora of Czech-Russian parallel data and Russian monolingual data.
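The abstract does not name the automated metric. BLEU is a common choice for comparing MT systems, so here is a minimal sketch of corpus-level BLEU, assuming whitespace tokenization, a single reference per sentence, uniform n-gram weights up to 4 and no smoothing. It illustrates the kind of comparison described, not necessarily the metric or implementation used in the thesis.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with one reference per sentence and uniform weights."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    # Without smoothing, any zero n-gram precision makes BLEU zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)
```

Counts are pooled over the whole test set before taking precisions, which is why corpus BLEU is not the average of sentence-level scores.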
Joining Segments in Czech Complex Sentences
Čech, Josef ; Kuboň, Vladislav (advisor) ; Krůza, Oldřich (referee)
Title: Joining segments in Czech sentences Author: Bc. Josef Čech Department: Institute of Formal and Applied Linguistics Supervisor: doc. RNDr. Vladislav Kuboň Ph.D. e-mail: vk@ufal.mff.cuni.cz Abstract: This thesis follows up on the segmentation of complex sentences into linguistically motivated units (segments) and on their mutual relations. These relations can be used in further processing of the segments. The main purpose of mapping the relations is to join segments into a higher-level unit, the clause. In theory, it should then be possible to analyze each clause of a complex sentence separately, and analyzing a set of clauses should be faster than analyzing the whole complex sentence. Segments are identified using linguistic separators and a rule-based approach, which proves useful for problematic relations between neighbouring segments. This thesis aims to show that a rule-based approach is the best solution for joining segments into clauses. A positional tag for segments was also developed as part of this thesis; this tag is intended to be used in methods dealing with segments instead of the segment itself. Keywords: segment, clause, tag, joining segments, syntactic analysis
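Joining is needed because an embedded clause can split another clause into several segments, as in "Muž, který přišel, je můj otec" ("The man who came is my father"). The sketch below shows one hypothetical joining rule, not the thesis's rule set: it assumes each segment already carries an embedding level and a finite-verb flag (both hypothetical inputs, e.g. derived from morphological tags), and it attaches a verbless segment to the next segment on the same level.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    level: int             # embedding depth, 0 = main-clause level
    has_finite_verb: bool  # hypothetical input, e.g. derived from morphological tags


def join_segments(segments):
    """Group segments into clauses using one toy rule: while a clause still lacks
    a finite verb, attach the next unused segment on the same level, skipping
    deeper (embedded) segments in between."""
    used = [False] * len(segments)
    clauses = []
    for i, seg in enumerate(segments):
        if used[i]:
            continue
        used[i] = True
        parts = [seg.text]
        has_verb = seg.has_finite_verb
        j = i + 1
        while not has_verb and j < len(segments):
            nxt = segments[j]
            if nxt.level < seg.level:
                break  # the embedding context has ended; stop searching
            if nxt.level == seg.level and not used[j]:
                used[j] = True
                parts.append(nxt.text)
                has_verb = nxt.has_finite_verb
            j += 1
        clauses.append(" ".join(parts))
    return clauses
```

For the example sentence, the segments "Muž" (level 0, no verb), "který přišel" (level 1) and "je můj otec" (level 0) yield the clauses "Muž je můj otec" and "který přišel".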
Development Environment Extending the Dialog Management Options of AIML
Brodec, Václav ; Kuboň, Vladislav (advisor) ; Plátek, Ondřej (referee)
The AIML language was created with the goal of authoring simple chatbots, so it lacks some of the features of advanced dialog systems. One of them is support for dialog management, which is beneficial in many of the applications the language has spread into thanks to its popularity. This thesis solves the problem of implementing dialog management in pure AIML by using augmented transition networks for design and code generation. The result is a development environment that supports the chosen solution, making it easier to design more complex bots while maintaining compatibility with standard interpreters.
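A common way to emulate dialog states in pure AIML is the `topic` predicate: categories are grouped under `<topic name="...">`, and a template switches state with `<think><set name="topic">...</set></think>`. The sketch below generates such AIML from a list of arcs. It covers only the finite-state part of a transition network (no registers or sub-network calls, which augmented transition networks add) and is an illustration under that assumption, not the code generator from the thesis; the function name and arc format are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def network_to_aiml(transitions):
    """Turn (state, pattern, response, next_state) arcs into AIML categories.

    Each dialog state becomes an AIML <topic>; following an arc sets the
    'topic' predicate to the target state, so on the next user input only
    the arcs leaving that state can match.
    """
    root = ET.Element("aiml", version="1.0.1")
    by_state = defaultdict(list)
    for state, pattern, response, next_state in transitions:
        by_state[state].append((pattern, response, next_state))
    for state, arcs in by_state.items():
        topic = ET.SubElement(root, "topic", name=state.upper())
        for pattern, response, next_state in arcs:
            category = ET.SubElement(topic, "category")
            ET.SubElement(category, "pattern").text = pattern.upper()
            template = ET.SubElement(category, "template")
            template.text = response
            # Silently switch the dialog state for the next turn.
            think = ET.SubElement(template, "think")
            ET.SubElement(think, "set", name="topic").text = next_state.upper()
    return ET.tostring(root, encoding="unicode")
```

A real generator would also have to handle the initial state, before any topic has been set, which this sketch leaves out.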
Methods for Creating Subjectivity Lexicon for Indonesian
Franky, ; Bojar, Ondřej (advisor) ; Kuboň, Vladislav (referee)
In this work, we created subjectivity lexicons of positive and negative expressions for Indonesian by automatically translating English lexicons and by taking the intersection and union of the translation results. We compared the performance of the resulting lexicons using a simple prediction method that compares the number of occurrences of positive and negative expressions in a sentence. We also experimented with weighting the expressions by their frequency and relative frequency in unannotated data. A modified prediction method based on machine learning was later used to better incorporate information that the simple prediction cannot capture. We showed that the lexicons reach high recall but low precision when predicting whether a sentence is evaluative (positive or negative) or not (neutral). Scoring the expressions improves either recall or precision, but at the cost of a comparable decrease in the other measure. The machine learning prediction was able to minimize the sensitivity of the performance to the size of the lexicon, but further experiments are required to determine the best choice of prediction method.
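The lexicon combination and the simple count-based prediction can be sketched as follows. This assumes two translated versions of the same English lexicon are available as sets of single-word Indonesian expressions (multi-word expressions and the frequency weighting are left out), and that ties, including sentences with no lexicon hits, are labeled neutral; the function names are hypothetical.

```python
def combine_lexicons(translations_a, translations_b, mode="union"):
    """Combine two translated lexicons (sets of expressions) by union or intersection."""
    if mode == "union":
        return translations_a | translations_b
    if mode == "intersection":
        return translations_a & translations_b
    raise ValueError(f"unknown mode: {mode}")


def predict(sentence, positive, negative):
    """Label a sentence by comparing counts of positive and negative lexicon hits."""
    tokens = sentence.lower().split()
    pos = sum(tok in positive for tok in tokens)
    neg = sum(tok in negative for tok in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"  # ties and sentences without any hits
```

The union favours recall (more expressions can fire) and the intersection favours precision (only expressions both translations agree on), which mirrors the trade-off reported above.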
Hybrid Machine Translation Approaches for Low-Resource Languages
Kamran, Amir ; Popel, Martin (advisor) ; Kuboň, Vladislav (referee)
In recent years, corpus-based machine translation systems have produced significant results for a number of language pairs. However, for low-resource languages such as Urdu, purely statistical or purely example-based methods do not perform well. Rule-based approaches, on the other hand, require a large amount of time and resources to develop the rules, which makes them impractical in most scenarios. Hybrid machine translation systems may be one way to overcome these problems, combining the strengths of different approaches to achieve quality translation. The goal of the thesis is to explore different combinations of approaches and to evaluate their performance against the standard corpus-based methods currently in use. This includes:
1. Using syntax-based and dependency-based reordering rules with Statistical Machine Translation.
2. Automatically extracting lexical and syntactic rules using statistical methods to facilitate Transfer-Based Machine Translation.
The novel element of the proposed work is an algorithm that automatically learns reordering rules for English-to-Urdu statistical machine translation. Moreover, this approach can be extended to learn lexical and syntactic rules for building a rule-based machine translation system.
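Reordering rules rearrange the English source so that its word order is closer to Urdu before statistical translation: Urdu is verb-final and uses postpositions. The sketch below applies two hand-written toy rules to a chunked clause; the chunk labels and input format are hypothetical, and unlike the thesis, which aims to learn such rules automatically, nothing here is learned from data.

```python
def reorder_svo_to_sov(chunks):
    """Apply two toy reordering rules to a chunked English clause.

    chunks: list of (label, tokens) pairs, e.g. ("NP", ["an", "apple"]).
    Rule 1: a PP's preposition moves behind its noun phrase (postposition).
    Rule 2: verb chunks ("V") move to the end of the clause (verb-final).
    """
    reordered, verbs = [], []
    for label, tokens in chunks:
        if label == "PP" and len(tokens) > 1:
            tokens = tokens[1:] + tokens[:1]
        if label == "V":
            verbs.append(tokens)
        else:
            reordered.append(tokens)
    return [tok for chunk in reordered + verbs for tok in chunk]
```

For "Ali eats an apple in the garden" chunked as NP, V, NP, PP, the output order is "Ali an apple the garden in eats"; the relative order of the non-verb chunks is kept unchanged, which a learned model could also adjust.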
An Implementation of Methods of Structural Analysis of Czech Complex Sentences
Dutkevič, Jiří ; Kuboň, Vladislav (advisor) ; Holan, Tomáš (referee)
Title: An Implementation of Methods of Structural Analysis of Czech Complex Sentences Author: Jiří Dutkevič Department: Institute of Formal and Applied Linguistics Supervisor: doc. RNDr. Vladislav Kuboň, Ph.D., Institute of Formal and Applied Linguistics Abstract: This paper discusses the automated analysis of complex sentences in Czech. It summarizes the results of preceding research, uses the method described therein for splitting complex sentences into segments with a well-defined set of separators, and proposes three methods for automatically assigning levels to segments (levels which also describe the relations between segments), based on rules presented in that research. The first method directly applies the rules presented in the referenced research papers, the second uses a genetic algorithm, and the third makes use of a neural network. The paper includes an implementation of these methods and an analysis of the results on manually annotated data from the Prague Dependency Treebank.
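The two steps, splitting at separators and assigning levels, can be illustrated with a deliberately small sketch. It assumes a pre-tokenized sentence, uses only punctuation as separators, and applies one toy level rule (a segment opened by a subordinating conjunction or relative pronoun from a tiny illustrative list gets level 1, everything else level 0). The real separator set and level rules from the referenced research are richer and handle deeper embedding; none of the three methods in the paper is reproduced here.

```python
PUNCT_SEPARATORS = {",", ";", ":", "(", ")", "."}
# Tiny illustrative list of Czech subordinating conjunctions and relative pronouns.
SUBORDINATORS = {"že", "když", "protože", "který", "která", "které"}


def split_segments(tokens):
    """Split a tokenized sentence into segments at punctuation separators."""
    segments, current = [], []
    for tok in tokens:
        if tok in PUNCT_SEPARATORS:
            if current:
                segments.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        segments.append(current)
    return segments


def assign_levels(segments):
    """Toy rule: segments opened by a subordinating word get level 1, others level 0."""
    return [1 if seg[0].lower() in SUBORDINATORS else 0 for seg in segments]
```

For "Muž , který přišel , je můj otec ." this gives the segments "Muž", "který přišel", "je můj otec" with levels 0, 1, 0.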
Searching Czech Structured Data using Stemming
Tattermusch, Jan ; Hlaváčová, Jaroslava (advisor) ; Kuboň, Vladislav (referee)
This work describes and implements a component for full-text searching with support for Czech diacritics restoration and stemming. Diacritics restoration is based on statistical principles and is context-dependent. The work presents five stemmers ready for immediate use (two algorithmic and three hybrid) and discusses their properties. The component is implemented using the Apache Lucene library and provides a simple interface for querying and for inserting, deleting and updating indexed documents. Stored documents consist of named fields with predefined data types. Besides regular full-text queries, the component also supports non-trivial queries with additional constraints and provides a way to customize how the query result score is computed. The component's performance is sufficient for medium-load applications: approximately 50 queries per second on a repository containing 2.7 million documents. The contribution of stemming and diacritics restoration to the quality of full-text searching was measured using MAP (mean average precision) and is significant.
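Mean average precision rewards rankings that place relevant documents early. For one query, average precision is the mean of the precision values at the ranks where relevant documents appear, divided over all relevant documents (so relevant documents never retrieved count as zero); MAP averages this over queries. A minimal sketch, assuming relevance judgements are given as sets of document IDs; the IDs and function names are illustrative.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked result list against a set of relevant IDs."""
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for k, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / k  # precision at rank k
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """MAP over queries; runs is a list of (ranked_ids, relevant_ids) pairs."""
    if not runs:
        raise ValueError("need at least one query")
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

Comparing MAP for the same queries with and without stemming or diacritics restoration is one way to quantify their contribution, as the abstract describes.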
Machine Translation of Related Asian Languages
Larasati, Septina Dian ; Kuboň, Vladislav (advisor) ; Petkevič, Vladimír (referee)
This thesis presents the development of an MT system between Indonesian and Malaysian. The system uses an almost direct translation method that exploits the similarity of the two languages; this method was previously used for a number of European language pairs. The thesis also describes the effort to build language resources from scratch, since both languages are under-resourced.
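The core of direct translation between closely related languages is word-by-word lexicon lookup, with unknown words copied unchanged because many words are shared. The sketch below shows only that core; the three lexicon entries are illustrative examples of well-known Indonesian/Malaysian differences, the function name is hypothetical, and the additional processing that makes the thesis's method "almost" direct is not detailed in the abstract and is not modeled here. Punctuation attached to tokens is also not handled.

```python
# Illustrative Indonesian -> Malaysian entries; a real system needs a full lexicon.
ID_TO_MS = {
    "kantor": "pejabat",  # office
    "uang": "wang",       # money
    "mobil": "kereta",    # car
}


def translate_id_to_ms(sentence, lexicon=ID_TO_MS):
    """Word-by-word translation: look each token up, copy it unchanged if absent."""
    out = []
    for tok in sentence.split():
        key = tok.lower()
        if key in lexicon:
            target = lexicon[key]
            if tok[:1].isupper():
                target = target.capitalize()  # keep sentence-initial capitals
            out.append(target)
        else:
            out.append(tok)  # shared vocabulary passes through unchanged
    return " ".join(out)
```

For example, "Saya pergi ke kantor" becomes "Saya pergi ke pejabat": only the one word that differs between the two languages is replaced.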