National Repository of Grey Literature: 53 records found (records 21 - 30)
Context-Dependent Dictionary for Translators
Fanta, Petr ; Bojar, Ondřej (advisor) ; Kuboň, Vladislav (referee)
When manually translating short texts, such as those found on social networks or microblogs (e.g., Twitter), translators are often forced to gather additional information from various sources. This information typically concerns less common words, domain-specific terms, or numerous abbreviations. The aim of this thesis is to design and implement a system which automatically creates a minimal context-dependent dictionary for a given short message. The system identifies suitable dictionary entries in the text to be translated and retrieves their definitions, translations, and examples from available open sources, or extracts them automatically from a parallel corpus. The resulting dictionary is ideally sufficient for human translators to understand the message and to choose an appropriate translation equivalent (including technical terms). An empirical evaluation is based on statistics that track how often users were satisfied with the proposed entries, how often the entries were incorrect, and to what extent the system correctly identified their relevance to the input text.
Iterative Improving of Transcribed Speech Recordings Exploiting Listener's Feedback
Krůza, Jan Oldřich ; Kuboň, Vladislav (advisor)
This Ph.D. thesis deals with making a corpus of audio recordings of a single speaker accessible to the wider public and the interested community. The work was motivated by the existence of a set of deteriorating recordings of the Czech philosopher Karel Makoň on magnetic tapes. The aim is to conserve the material for future generations and to make it accessible using digital technologies, in particular by publishing the recordings online and enabling users to search through them. The thesis introduces the creation of a system for transcribing a large set of speech recordings with the help of a lay community. The solution is based on obtaining a baseline low-quality transcription by means of automated speech recognition and developing an application that collects corrections of the automatic transcription in a form usable as training data for further improvement of that transcription. The spoken corpus itself is described: the author and his works, the topics covered in the talks, the process of recording and digitization, and the resulting transcription are introduced. Next, the development of a system for automated transcription of the corpus, from collecting data, to acoustic and...
Iterative Improving of Transcribed Speech Recordings Exploiting Listener's Feedback
Krůza, Jan Oldřich ; Kuboň, Vladislav (advisor) ; Müller, Luděk (referee) ; Pollák, Petr (referee)
This Ph.D. thesis deals with making a corpus of audio recordings of a single speaker accessible to the wider public and the interested community. The work was motivated by the existence of a set of deteriorating recordings of the Czech philosopher Karel Makoň on magnetic tapes. The aim is to conserve the material for future generations and to make it accessible using digital technologies, in particular by publishing the recordings online and enabling users to search through them. The thesis introduces the creation of a system for transcribing a large set of speech recordings with the help of a lay community. The solution is based on obtaining a baseline low-quality transcription by means of automated speech recognition and developing an application that collects corrections of the automatic transcription in a form usable as training data for further improvement of that transcription. The spoken corpus itself is described: the author and his works, the topics covered in the talks, the process of recording and digitization, and the resulting transcription are introduced. Next, the development of a system for automated transcription of the corpus, from collecting data, to acoustic and...
Predicting Stock Market Trends from News Articles (Předpovídání trendů akciového trhu z novinových článků)
Serebryannikova, Anastasia ; Kuboň, Vladislav (advisor) ; Vidová Hladká, Barbora (referee)
In this work we attempted to predict the upward or downward movement of the S&P 500 index from news articles published by Bloomberg and Reuters. We employed an SVM classifier and conducted multiple experiments aimed at better understanding the shape of the data and the specifics of the task. As a result, we established common evaluation settings for all our subsequent experiments. We then tried incorporating various features into the model and also replicated several approaches previously suggested in the literature. We were able to identify some non-trivial dependencies in the data, which helped us achieve high accuracy on the development set. However, none of the models that we built showed comparable performance on the test set. We have come to the conclusion that whereas some trends or patterns can be identified in a particular dataset, such findings are usually barely transferable to other data. The experiments that we conducted support the idea that the stock market changes at random and that a high quality of prediction may only be achieved on particular sets of data and under very special settings, but not for the task of stock market prediction in general.
Comparing Machine Translation Output (and the Way it Changes over Time)
Kyselová, Soňa ; Svoboda, Tomáš (advisor) ; Kuboň, Vladislav (referee)
This diploma thesis focuses on machine translation (MT), which has been studied for a relatively long time in linguistics (and later also in translation studies) and which in recent years has also come to the attention of the broader public. The thesis aims to explore the quality of machine translation output and the way it changes over time. The theoretical part first deals with machine translation in general, namely basic definitions, a brief history, and approaches to machine translation, and then describes online machine translation systems and evaluation methods. Finally, this part provides a methodological model for the empirical part. Using a set of texts translated with MT, the empirical part seeks to check how online machine translation systems deal with the translation of different text types and whether the quality of MT output improves over time. To do so, an analysis of text type, semantics, lexicology, stylistics, and pragmatics is carried out, as well as a rating of the general applicability of the translation. The final part of the thesis compares and summarizes the results of the analysis. Based on this comparison, conclusions are drawn and the general tendencies that emerged from the empirical part of the thesis are stated.
Artificial Neural Network for Opinion Target Identification in Czech
Glončák, Vladan ; Kuboň, Vladislav (advisor) ; Mírovský, Jiří (referee)
The main focus of this thesis is to use neural networks, specifically long short-term memory cells, for identifying opinion targets in Czech data. A side product is a new version of a dataset for opinion target identification. For comparison, previously obtained results for other languages and for probabilistic methods are listed. The experiment was successful: the achieved results are above trivial baseline models and comparable with the results achieved previously.
Linguistic Issues in Machine Translation between Czech and Russian
Klyueva, Natalia ; Kuboň, Vladislav (advisor) ; Panevová, Jarmila (referee) ; Strossa, Petr (referee)
In this thesis we analyze machine translation between Czech and Russian from the perspective of a linguist. We work with two types of machine translation systems - rule-based (TectoMT) and statistical (Moses). We experiment with different setups of these two systems in order to achieve the best possible quality. One of the questions we address in our work is whether the relatedness of the two languages has some impact on machine translation. We explore the output of our two experimental systems and two commercial systems: PC Translator and Google Translate. We make a linguistically motivated classification of errors for the language pair and describe each type of error in detail, analyzing whether it occurred due to some difference between Czech and Russian or is caused by the system architecture. We then compare the usage of some specific linguistic phenomena in the two languages and state how the individual systems cope with mismatches. For some errors, we suggest ways to correct them, and in several cases we implement those suggestions. In particular, we focus on one specific error type - surface valency. We research the mismatches between Czech and Russian valency, extract a lexicon of surface valency frames, incorporate the lexicon into the TectoMT translation pipeline and present...
Semantic relation extraction from unstructured data in the business domain
Rampula, Ilana ; Pecina, Pavel (advisor) ; Kuboň, Vladislav (referee)
Text analytics in the business domain is a growing field in both research and practical applications. We chose to concentrate on relation extraction from unstructured data provided by a corporate partner. Analyzing text from this domain requires a different approach, one that accounts for irregularities and domain-specific attributes. In this thesis, we present two methods for relation extraction. The Snowball system and the Distant Supervision method were both adapted to this unique data. The methods were implemented to use both structured and unstructured data from the company's database. Keywords: Information Retrieval, Relation Extraction, Text Analytics, Distant Supervision, Snowball
