National Repository of Grey Literature: 53 records found (records 11 - 20)
Machine Translation of Related Asian Languages
Larasati, Septina Dian ; Kuboň, Vladislav (advisor) ; Petkevič, Vladimír (referee)
This thesis presents the development of an MT system between Indonesian and Malaysian. The system uses an almost direct translation method that exploits the similarity of the two languages. This method was previously used on a number of European language pairs. The thesis also describes the attempts to build language resources from scratch, since both languages are under-resourced.
Hybrid Machine Translation Approaches for Low-Resource Languages
Kamran, Amir ; Popel, Martin (advisor) ; Kuboň, Vladislav (referee)
In recent years, corpus-based machine translation systems have produced significant results for a number of language pairs. However, for low-resource languages like Urdu, purely statistical or purely example-based methods do not perform well. On the other hand, rule-based approaches require a huge amount of time and resources for the development of rules, which makes them impractical in most scenarios. Hybrid machine translation systems might be one way to overcome these problems, since they can combine the best of different approaches to achieve quality translation. The goal of the thesis is to explore different combinations of approaches and to evaluate their performance against the standard corpus-based methods currently in use. This includes: 1. Use of syntax-based and dependency-based reordering rules with Statistical Machine Translation. 2. Automatic extraction of lexical and syntactic rules using statistical methods to facilitate Transfer-Based Machine Translation. The novel element of the proposed work is an algorithm that automatically learns reordering rules for English-to-Urdu statistical machine translation. Moreover, this approach can be extended to learn lexical and syntactic rules to build a rule-based machine translation system.
The resemblance analysis of Czech texts
Cvengroš, Petr ; Holan, Tomáš (advisor) ; Kuboň, Vladislav (referee)
In the present work we study means of comparing two Czech texts. The design and the implementation principles of a program for comparing two Czech texts are described. The program compares texts using a set of various criteria, and new criteria are easy to add. The program is able to learn: it configures itself on texts given by the user. In the first part of the work we describe the algorithms for comparing texts and for learning. The next part is dedicated to some interesting parts of the implementation. The conclusion discusses the use of the program on real texts.
An Implementation of Methods of Structural Analysis of Czech Complex Sentences
Dutkevič, Jiří ; Kuboň, Vladislav (advisor) ; Holan, Tomáš (referee)
Title: An Implementation of Methods of Structural Analysis of Czech Complex Sentences Author: Jiří Dutkevič Department: Institute of Formal and Applied Linguistics Supervisor: doc. RNDr. Vladislav Kuboň, Ph.D., Institute of Formal and Applied Linguistics Abstract: This paper discusses automated analysis of complex sentences in the Czech language. It summarizes the results of preceding research, uses the method described therein for splitting complex sentences into segments using a well-defined set of separators, and proposes three methods of automated assignment of levels to segments (which also describe relations between the segments) in sentences, based on rules presented in the research. The first method directly applies the rules presented in the referenced research papers, the second method uses a genetic algorithm, and the third makes use of a neural network. This paper includes an implementation of these methods and an analysis of the results using manually annotated data from the Prague Dependency Treebank.
Deep analysis in IQA: evaluation on real users dialogues.
Ratkovic, Zorana ; Kuboň, Vladislav (advisor) ; Hoffmannová, Petra (referee)
Interactive Question Answering (IQA) is a natural and cohesive way for a user to obtain information by interacting with a system using natural language. With the advancement in Natural Language Processing, research in the field of IQA has started to focus on the role of semantics and the discourse structure in these systems. The need for a deeper analysis, which examines the syntax and semantics of the questions and the answers, is evident. Using this deeper analysis allows us to model the context of the interaction. I will look at a current closed-domain IQA system which is based on Linear Regression modeling. This system uses superficial and non-semantically motivated features. I propose adding deep analysis and semantic features in order to improve the system and show the need for such analysis. Particular attention will be placed on the so-called follow-up questions (questions that the user poses after having received some answer from the system) and the role of context. I propose that adding the linguistically heavy features will prove beneficial, thereby showing the need for such analysis in IQA systems.
Automatic Evaluation of Parallel Bilingual Data Quality
Kolovratník, David ; Kuboň, Vladislav (advisor) ; Pecina, Pavel (referee)
Statistical machine translation depends heavily on a huge amount of parallel bilingual data, which is used to train a translation model. The translation model takes the place of rule-based transfer, in some systems even of lexical transfer. It is believed that the quality of the translation may be improved with more training data. I tried the opposite: providing less data and observing how the translation score changes. I selected the sentence pairs to remain in the corpus according to a key: first randomly, then according to the sentence length ratio, and finally according to the number of word pairs that a dictionary lists as translation pairs. I show that selection according to a suitable criterion slows the decline of the NIST and BLEU scores as the corpus shrinks, and in some cases may even lead to a better score. Decreasing the corpus size also leads to faster evaluation and lower space requirements. This may be useful when implementing a machine translation system on small devices with limited system resources.
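The two non-random selection keys mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the whitespace tokenization, the exact scoring formulas, and the toy English-Czech data are all assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Set, Tuple

SentencePair = Tuple[str, str]  # (source sentence, target sentence)


def length_ratio_score(pair: SentencePair) -> float:
    """Return a score in [0, 1]; 1.0 means both sides have equal token counts.

    Assumption: tokens are whitespace-separated words.
    """
    src_len = len(pair[0].split())
    tgt_len = len(pair[1].split())
    if src_len == 0 or tgt_len == 0:
        return 0.0
    return min(src_len, tgt_len) / max(src_len, tgt_len)


def make_dictionary_score(
    dictionary: Dict[str, Set[str]],
) -> Callable[[SentencePair], float]:
    """Build a key that counts source tokens with a known translation
    present on the target side (hypothetical scoring, for illustration)."""

    def score(pair: SentencePair) -> float:
        tgt_tokens = set(pair[1].lower().split())
        hits = sum(
            1
            for tok in pair[0].lower().split()
            if dictionary.get(tok, set()) & tgt_tokens
        )
        return float(hits)

    return score


def select_pairs(
    corpus: List[SentencePair],
    key: Callable[[SentencePair], float],
    keep_fraction: float,
) -> List[SentencePair]:
    """Keep the best-scoring fraction of the corpus (at least one pair).

    sorted() is stable, so pairs with equal scores keep corpus order.
    """
    keep = max(1, int(len(corpus) * keep_fraction))
    return sorted(corpus, key=key, reverse=True)[:keep]


# Toy data, invented for this example.
corpus = [
    ("the dog sleeps", "pes spí"),
    ("hello", "toto je velmi dlouhá věta bez vztahu"),
    ("a cat", "kočka"),
    ("good morning", "dobré ráno"),
]
toy_dictionary = {
    "dog": {"pes"},
    "cat": {"kočka"},
    "good": {"dobré"},
    "morning": {"ráno"},
}

by_length = select_pairs(corpus, length_ratio_score, 0.5)
by_dictionary = select_pairs(corpus, make_dictionary_score(toy_dictionary), 0.25)
```

In a sketch like this, each reduced corpus would then be used to train a separate translation model, and the NIST and BLEU scores of the resulting systems would be compared across corpus sizes.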
Semantic relation extraction from unstructured data in the business domain
Rampula, Ilana ; Pecina, Pavel (advisor) ; Kuboň, Vladislav (referee)
Text analytics in the business domain is a growing field in research and practical applications. We chose to concentrate on Relation Extraction from unstructured data provided by a corporate partner. Analyzing text from this domain requires a different approach, one that accounts for irregularities and domain-specific attributes. In this thesis, we present two methods for relation extraction. The Snowball system and the Distant Supervision method were both adapted for the unique data. The methods were implemented to use both structured and unstructured data from the company's database. Keywords: Information Retrieval, Relation Extraction, Text Analytics, Distant Supervision, Snowball
Measures of Machine Translation Quality
Macháček, Matouš ; Bojar, Ondřej (advisor) ; Kuboň, Vladislav (referee)
Title: Measures of Machine Translation Quality Author: Matouš Macháček Department: Institute of Formal and Applied Linguistics Supervisor: RNDr. Ondřej Bojar, Ph.D. Abstract: We explore both manual and automatic methods of machine translation evaluation. We propose a manual evaluation method in which annotators rank only translations of short segments instead of whole sentences. This results in easier and more efficient annotation. We have conducted an annotation experiment and evaluated a set of MT systems using this method. The obtained results are very close to the official WMT14 evaluation results. We also use the collected database of annotations to automatically evaluate new, unseen systems and to tune parameters of a statistical machine translation system. The evaluation of unseen systems, however, does not work and we analyze the reasons. To explore the automatic methods, we organized Metrics Shared Task held during the Workshop of Statistical Machine Translation in years 2013 and 2014. We report the results of the last shared task, discuss various metaevaluation methods and analyze some of the participating metrics. Keywords: machine translation, evaluation, automatic metrics, annotation
Iterative Improving of Transcribed Speech Recordings Exploiting Listener's Feedback
Krůza, Jan Oldřich ; Kuboň, Vladislav (advisor) ; Müller, Luděk (referee) ; Pollák, Petr (referee)
This Ph.D. thesis deals with making a corpus of audio recordings of a single speaker accessible to the wide public and the interested community. The work has been motivated by the existence of a set of perishing recordings of the Czech philosopher Karel Makoň on magnetophone tapes. The aim is to conserve the material for future generations and to make it accessible using digital technologies, in particular by publishing the recordings online and enabling users to search through them. The thesis introduces the creation of a system for transcribing a large set of speech recordings employing a lay community. The solution designed is based on obtaining a baseline low-quality transcription by means of automated speech recognition and developing an application that collects corrections of the automatic transcription in a fashion that makes them usable as training data for further improvement of said transcription. The spoken corpus itself is described. The author and his works, the topics covered in the talks, the process of recording and digitization, as well as the resulting transcription are introduced. Next, the development of a system for automated transcription of the corpus, from collecting data, to acoustic and...
Artificial Neural Network for Opinion Target Identification in Czech
Glončák, Vladan ; Kuboň, Vladislav (advisor) ; Mírovský, Jiří (referee)
The main focus of this thesis is to use neural networks, specifically long short-term memory cells, for identifying opinion targets in Czech data. A side product is a new version of the dataset for opinion target identification. For comparison, previously obtained results for other languages and results obtained with probabilistic methods are listed. The experiment was successful: the results achieved are above trivial baseline models and comparable with the results achieved previously.
