National Repository of Grey Literature: 53 records found (records 1 - 10)
Iterative Improving of Transcribed Speech Recordings Exploiting Listener's Feedback
Krůza, Jan Oldřich ; Kuboň, Vladislav (advisor) ; Pollák, Petr (referee) ; Müller, Luděk (referee)
This Ph.D. thesis deals with making a corpus of audio recordings of a single speaker accessible to the wider public and to an interested community. The work was motivated by the existence of a set of deteriorating recordings of the Czech philosopher Karel Makoň on magnetic tapes. The aim is to preserve the material for future generations and to make it accessible using digital technologies, in particular by publishing the recordings online and enabling users to search through them. The thesis introduces a system for transcribing a large set of speech recordings with the help of a lay community. The solution is based on obtaining a low-quality baseline transcription by means of automatic speech recognition and developing an application that collects corrections of the automatic transcription in a form usable as training data for further improving that transcription. The spoken corpus itself is described: the author and his works, the topics covered in the talks, the process of recording and digitization, and the resulting transcription. Next, the development of a system for automated transcription of the corpus, from collecting data, to acoustic and...
Automatic Checking of Translation
Šimlovič, Juraj ; Kuboň, Vladislav (advisor) ; Dědek, Jan (referee)
Translation memories are becoming increasingly popular with professional translators, especially in software localization and the translation of technical and official documents. Although commercial systems that employ translation memories provide some limited capabilities for automatic checking of translations, these are mostly of the simple search-and-replace type, and none of these systems provides reasonable means of applying Czech morphology while checking. Professional translators could benefit from an automatic tool that provides more advanced rule-based checking capabilities and takes Czech and even English morphology into account. Checking not only for correct use of terminology, but also for illicit translations and the use of forbidden terms, would be useful. This thesis investigates the types of mistakes translators tend to make. A review of existing solutions for automatic translation checking for different languages is provided. An application is then proposed and developed that attempts to find some of the most frequent mistakes made in translations into Czech, taking morphology into account while searching.
Automatic Simplification of Texts for Translation
Prokopová, Magdalena ; Kuboň, Vladislav (advisor) ; Zeman, Daniel (referee)
This thesis describes one of the areas where automatic simplification can be used: simplification of texts for machine translation. We start by comparing methods of automatic simplification and controlled language, describing their similarities and differences. Further on, we focus only on automatic simplification used as a preprocessing step for machine translation. We describe which issues can be solved and address some of them using our own system, ASOFT. A text preprocessed by ASOFT is intended to be translated by the machine translation system PC Translator. We evaluate the output of PC Translator using two automatic methods, the BLEU and NIST scores, and one method of human evaluation. Finally, we propose other issues that can be addressed by means of automatic simplification.
Searching Czech Structured Data using Stemming
Tattermusch, Jan ; Hlaváčová, Jaroslava (advisor) ; Kuboň, Vladislav (referee)
This work describes and implements a component for full-text searching with support for Czech diacritics restoration and stemming. Diacritics restoration is based on statistical principles and is context dependent. This work presents five stemmers ready for immediate use (two algorithmic stemmers and three hybrid stemmers) and discusses their properties. The component is implemented using the Apache Lucene library and provides a simple interface for querying and for insertions, deletions, and updates of indexed documents. Stored documents consist of named fields with predefined data types. Besides regular full-text queries, the component also supports non-trivial queries with additional constraints and provides a way to customize how the query result score is computed. The component's performance is sufficient for medium-load applications: approximately 50 queries per second with a repository that contains 2.7 million documents. The contribution of stemming and diacritics restoration to the quality of full-text searching was measured using MAP and is significant.
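The abstract above reports search quality in terms of MAP (mean average precision). As a reminder of what that metric computes, here is a minimal sketch in Python. It is not the thesis's evaluation code: the function names, the document identifiers, and the assumption that each ranked list contains no duplicates are all illustrative.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked result list.

    ranked_ids: document ids in the order the search returned them
                (assumed to contain no duplicates).
    relevant_ids: ids judged relevant for the query.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over (ranked_ids, relevant_ids) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical toy queries, not data from the thesis.
toy_runs = [
    (["d1", "d2", "d3"], {"d1", "d3"}),
    (["d2", "d1"], {"d1"}),
]
toy_map = mean_average_precision(toy_runs)
```

Comparing this value for the same queries with and without stemming or diacritics restoration is one way such a contribution could be quantified.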
Segmentation analysis of Czech sentences
Procházka, Jan ; Kuboň, Vladislav (advisor) ; Holan, Tomáš (referee)
The objective of this work is to implement a segmentation analysis method for Czech, including the creation of a list of separators. A method for dividing long sentences into clauses is also proposed and implemented. The implementation uses the Czech "Free" Morphology by Jan Hajič. The program is written in Python. The method was debugged on a 62-sentence corpus and tested on an 80-sentence corpus.
Software localization and translation tools
Dolejš, Jan ; Nemejovský, Jan (advisor) ; Kuboň, Vladislav (referee)
The theoretical part gives an overview of the history of the localisation industry and defines basic terms before going on to cover the localisation tools and companies available. It then defines the localisation process and its individual phases and provides for a classification of the translation tools available. Finally, it outlines their potential development. The practical part sets the theory against the Internet browser Mozilla Firefox v2.0 localisation case study. It deals with the practical aspects unique for localisation, i.e. the definition of text strings to be localized, data recycling from previous versions and the application of translation tools. It subsequently looks at the phases that follow localisation, i.e. the testing of the localised application and the evaluation of the localisation process. The analysis proves that an open-source community is in all respects able to provide for a product localisation on the same quality level offered by established software producers. The thesis also includes a Glossary of terms, List of relevant Internet links, Microsoft and Apple Product glossaries, Code-pages with Czech characters, a Mozilla Firefox v2.0 Product Glossary and a DVD-ROM containing trial versions of selected translation tools and Firefox browser resource files.
Comparison of Methods of Czech-Russian Machine Translation
Bílek, Karel ; Kuboň, Vladislav (advisor) ; Bojar, Ondřej (referee)
In this thesis, I present several methods of Czech-to-Russian machine translation, including both historical approaches and more modern ones, and both phrase-based and rule-based systems. I first briefly describe the linguistic background of Czech and Russian, their common history, and their differences. Then I describe automating, building, and improving some of the machine translation systems, together with their comparison, using both an automated metric and a limited human annotation. I also describe the creation of several corpora of Czech-Russian parallel data and Russian monolingual data.
Methods for Creating Subjectivity Lexicon for Indonesian
Franky ; Bojar, Ondřej (advisor) ; Kuboň, Vladislav (referee)
In this work, we created subjectivity lexicons of positive and negative expressions for the Indonesian language by automatically translating English lexicons and by intersecting and unioning the translation results. We compared the performance of the resulting lexicons using a simple prediction method that compares the number of occurrences of positive and negative expressions in a sentence. We also experimented with weighting the expressions by their frequency and relative frequency in unannotated data. A modification of the prediction method using machine learning was later used to better incorporate information that cannot be captured by the simple prediction. We showed that the lexicons were able to reach high recall but low precision when predicting whether a sentence is evaluative (positive or negative) or not (neutral). Scoring the expressions improves either recall or precision, but with a comparable decrease in the other measure. The machine learning prediction was able to minimize the sensitivity of the performance to the size of the lexicon, but further experiments are required to explore the best choice of prediction method.
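The two ideas named in the abstract above, combining translated lexicons by intersection or union and predicting polarity by comparing counts of positive and negative expressions, can be sketched roughly as follows. This is an illustration under stated assumptions, not the thesis's implementation: the function names and toy word lists are hypothetical, expressions are treated as single whitespace-separated tokens, and a tie between the counts is assumed to mean "neutral".

```python
def combine_lexicons(lexicon_a, lexicon_b, mode="union"):
    """Combine two translated lexicons (collections of expressions)."""
    a, b = set(lexicon_a), set(lexicon_b)
    if mode == "union":
        return a | b
    if mode == "intersection":
        return a & b
    raise ValueError("mode must be 'union' or 'intersection'")


def predict_polarity(sentence, positive, negative):
    """Count-based prediction: compare positive and negative matches.

    Assumption: single-token expressions and lowercase matching;
    equal counts (including zero matches) are labelled 'neutral'.
    """
    tokens = sentence.lower().split()
    pos_count = sum(1 for token in tokens if token in positive)
    neg_count = sum(1 for token in tokens if token in negative)
    if pos_count > neg_count:
        return "positive"
    if neg_count > pos_count:
        return "negative"
    return "neutral"


# Hypothetical toy lexicons, standing in for two translation outputs.
positive_a = {"bagus", "baik"}
positive_b = {"bagus", "hebat"}
negative_a = {"buruk", "jelek"}
negative_b = {"buruk"}

positive_union = combine_lexicons(positive_a, positive_b, "union")
positive_strict = combine_lexicons(positive_a, positive_b, "intersection")
negative_union = combine_lexicons(negative_a, negative_b, "union")

label = predict_polarity("film ini bagus dan hebat", positive_union, negative_union)
```

In this sketch the union lexicon matches more sentences while the intersection keeps only expressions both sources agree on, which mirrors the recall versus precision trade-off the abstract describes.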
The Exploitation of Linguistic Information in EBMT
Týnovský, Miroslav ; Kuboň, Vladislav (advisor) ; Žabokrtský, Zdeněk (referee)
Example-based machine translation (EBMT) is a corpus-driven method of machine translation. It builds the translation by analogy between the input text and a translation already made. The benefit of using linguistic knowledge within EBMT is the subject of this thesis. Two language pairs are covered: Czech-English and Czech-German. The thesis covers the gathering of annotated parallel Czech-German data, the design and implementation of an experimental EBMT system, and the effort to improve it using linguistic knowledge. A detailed evaluation and comparison of the baseline EBMT system and the linguistically enhanced system are described. The evaluation was done using both automatic and human evaluation methods. The three automatic evaluation methods are BLEU, NIST and METEOR. The linguistic enhancement of the baseline EBMT system includes comparisons of the input sentence with the examples in the translation memory based on morphology and syntax.
