National Repository of Grey Literature : 23 records found
Entity Relationship Extraction
Šimečková, Zuzana ; Straka, Milan (advisor) ; Straňák, Pavel (referee)
Relationship extraction is the task of extracting semantic relationships between entities from text. We create a Czech Relationship Extraction Dataset (CERED) using distant supervision on Wikidata and Czech Wikipedia. We detail the methodology we used and the pitfalls we encountered. Then we use CERED to fine-tune a neural network model for relationship extraction. We base our model on BERT, a language model pre-trained on extensive unlabeled data. We demonstrate that our model performs well on existing English relationship extraction datasets (SemEval-2010 Task 8, TACRED) and report the results we achieved on CERED.
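Distant supervision builds a labeled dataset by aligning knowledge-base facts with sentences. A minimal sketch of the core labeling heuristic follows, assuming naive substring matching of entity names and a hand-written toy triple list (`TOY_KB`, `distant_label` are hypothetical names); it is not the CERED pipeline, which works on actual Wikidata and Czech Wikipedia data.

```python
from typing import Iterable

# Hypothetical toy knowledge base of (subject, relation, object) triples,
# standing in for facts that would come from Wikidata.
TOY_KB = [
    ("Karel Čapek", "place of birth", "Malé Svatoňovice"),
    ("Karel Čapek", "sibling", "Josef Čapek"),
]


def distant_label(
    sentences: Iterable[str], kb: list[tuple[str, str, str]]
) -> list[tuple[str, str, str, str]]:
    """Distant-supervision heuristic: if a sentence mentions both the subject
    and the object of a KB triple, label the sentence with that relation.

    Plain substring matching is an assumption that stands in for real
    entity linking; no noise filtering is done.
    """
    examples = []
    for sentence in sentences:
        for subj, rel, obj in kb:
            if subj in sentence and obj in sentence:
                examples.append((sentence, subj, obj, rel))
    return examples
```

Every co-occurrence becomes a positive example, even when the sentence does not actually express the relation. That label noise is the usual price of distant supervision.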
Talk-Level Domain Adaptation of Speech Recognition
Srdečný, Vojtěch ; Bojar, Ondřej (advisor) ; Straňák, Pavel (referee)
This thesis explores talk-level domain adaptation for automatic speech recognition (ASR) and machine translation (MT) systems. A brief overview of an existing ASR domain adaptation method is provided. A method for MT domain adaptation that uses an unsupervised MT system is proposed, together with a metric to evaluate the quality of the adaptation process. The domain adaptation was applied to the unsupervised MT system for five different domains, and the results are presented and discussed.
CLARIN-DSpace repository at LINDAT/CLARIN : LINDAT/CLARIN FAIR repository for language data
Straňák, Pavel ; Košarko, Ondřej ; Mišutka, Jozef
We will present a software solution for, and our experience with, running a digital repository for language data and natural language processing tools: LINDAT/CLARIN. We will present its unique support for licensing, with an emphasis on Open Access, and how we support all four key FAIR principles. We will show the submission workflow, including license choice, approval, and publishing of submissions by editors, as well as the repository administration environment, including license definition, signing, and access control. We will also present the repository's integration with other services and statistics of its operation.
Fulltext: Stranak_Kosarko_Misutka_fulltext (PDF)
Slides: Stranak_prezentace_EN (PDF)
Video: Stranak_video (MP4)
English grammar checker and corrector: the determiners
Auersperger, Michal ; Pecina, Pavel (advisor) ; Straňák, Pavel (referee)
Correction of articles in English texts is approached as an article generation task, i.e. each noun phrase is assigned a class corresponding to the definite, indefinite, or zero article. Supervised machine learning methods are used first to replicate and then to improve upon the best result reported in the literature known to the author. Through feature engineering and a different choice of learning method, an error reduction of about 34% is achieved. The resulting model is further compared to the performance of expert annotators. Although the comparison is not straightforward due to differences in the data, the results indicate that the trained model's performance is comparable to human-level performance when measured on in-domain data. On the other hand, the model does not generalize well to different types of data. Using a large-scale language model to predict an article (or no article) for each word of the text did not prove successful.
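Treating article correction as generation means each training example is a noun phrase with its article stripped, paired with one of the three target classes. A minimal sketch of that data-preparation step follows, assuming the noun phrases are already chunked and tokenized; the `Article` enum and `label_noun_phrase` function are hypothetical, and the thesis's actual features and learning method are not shown.

```python
from enum import Enum


class Article(Enum):
    """The three target classes of the article generation task."""
    DEFINITE = "the"
    INDEFINITE = "a/an"
    ZERO = ""


def label_noun_phrase(np_tokens: list[str]) -> tuple[list[str], Article]:
    """Split a tokenized noun phrase into (article-free tokens, target class).

    The classifier would see only the article-free tokens (plus context
    features) and learn to predict the class.
    """
    if not np_tokens:
        raise ValueError("empty noun phrase")
    first = np_tokens[0].lower()
    if first == "the":
        return np_tokens[1:], Article.DEFINITE
    if first in ("a", "an"):
        return np_tokens[1:], Article.INDEFINITE
    return np_tokens, Article.ZERO
```

Merging "a" and "an" into one class keeps the label set at three. The a/an choice can be made later from the following word's pronunciation.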
Natural Language Correction
Náplava, Jakub ; Straka, Milan (advisor) ; Straňák, Pavel (referee)
The goal of this thesis is to explore the area of natural language correction and to design and implement neural network models for tasks ranging from general grammar correction to the specific task of diacritization. The thesis opens with a description of existing approaches to natural language correction. Existing datasets are reviewed and two new datasets are introduced: a manually annotated dataset for grammatical error correction based on CzeSL (Czech as a Second Language) and an automatically created spelling correction dataset. The main part of the thesis then presents the design and implementation of three models and evaluates them on several natural language correction datasets. In contrast to existing statistical systems, the proposed models learn all their knowledge from training data; therefore, they need neither a manually specified error model or candidate generation mechanism, nor any additional linguistic information such as part-of-speech tags. Our models significantly outperform existing systems on the diacritization task. For spelling and basic grammar correction in Czech, our models achieve the best results on two of the three datasets. Finally, considering general grammatical correction for English, our models achieve results which are...
Creating a Bilingual Dictionary using Wikipedia
Ivanova, Angelina ; Zeman, Daniel (advisor) ; Straňák, Pavel (referee)
Title: Creating a Bilingual Dictionary using Wikipedia
Author: Angelina Ivanova
Department/Institute: Institute of Formal and Applied Linguistics (32-ÚFAL)
Supervisor of the master thesis: RNDr. Daniel Zeman, Ph.D.
Abstract: Machine-readable dictionaries play an important role in computational linguistics. They have gained popularity in fields such as machine translation and cross-language information extraction. In this thesis we investigate the quality and content of bilingual English-Russian dictionaries generated from the Wikipedia link structure. Wiki-dictionaries differ dramatically from traditional dictionaries: the recall of basic terminology from Mueller's dictionary was 7.42%. Machine translation experiments with the Wiki-dictionary incorporated into the training set resulted in a small but statistically significant drop in translation quality compared to the experiment without the Wiki-dictionary. We hypothesized that the main reason was the domain difference between the dictionary and the corpus, and found some evidence for this: on a test set collected from Wikipedia articles, the model with the incorporated dictionary performed better. In this work we show how big the difference between the dictionaries developed from the Wikipedia link structure and the traditional...
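A minimal sketch of how a dictionary can be read off Wikipedia's link structure, and how recall against a reference word list is computed, follows. It assumes, as our own interpretation rather than the thesis's stated method, that the "link structure" means English-to-Russian interlanguage links plus English redirects treated as synonyms. The input dicts and both functions are hypothetical, already-extracted toy data, not a MediaWiki API.

```python
def build_wiki_dictionary(
    langlinks: dict[str, str], redirects: dict[str, str]
) -> dict[str, set[str]]:
    """Map each (lowercased) English term to its Russian translations.

    langlinks: English article title -> Russian article title
    redirects: English redirect title -> English target article title
    """
    dictionary: dict[str, set[str]] = {}
    for en, ru in langlinks.items():
        dictionary.setdefault(en.lower(), set()).add(ru)
    # A redirect is treated as a synonym: it inherits the translation
    # of the article it points to, if that article has one.
    for alias, target in redirects.items():
        if target in langlinks:
            dictionary.setdefault(alias.lower(), set()).add(langlinks[target])
    return dictionary


def recall(wiki_dict: dict[str, set[str]], reference_terms: list[str]) -> float:
    """Share of reference headwords that have an entry in the Wiki-dictionary."""
    terms = [t.lower() for t in reference_terms]
    if not terms:
        return 0.0
    return sum(t in wiki_dict for t in terms) / len(terms)
```

Because Wikipedia titles are mostly named entities and topical terms, a headword-recall check like this one is a simple way to quantify how far such a dictionary is from a general-purpose one.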
Voice command for a TV set
Černý, Patrik ; Straňák, Pavel (advisor) ; Peterek, Nino (referee)
Title: Voice command for a TV set
Author: Patrik Černý
Department: Institute of Formal and Applied Linguistics
Supervisor: Mgr. Pavel Straňák, Ph.D.
Abstract: The goal of this thesis is to create voice control for a television set intended for people with speech and movement disorders. This is achieved by connecting a computer to the television. Voice control is based on the well-known dynamic time warping algorithm. It has been shown that, due to large and frequent changes in sound intensity, voice control of a television is quite a complex task. The word recognition success rate of the final application is not very high, but it is sufficient for the purpose. Thanks to the application's design, the program can easily be extended with techniques that improve recognition effectiveness.
Keywords: voice control, word recognition, dynamic time warping, television
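Dynamic time warping compares two sequences that may be spoken at different speeds by finding the cheapest monotonic alignment between them. Isolated-word recognition then picks the stored command template with the smallest DTW distance. The sketch below is the textbook algorithm on scalar features with an absolute-difference local cost. This is an assumption for brevity: a real recognizer compares frames of acoustic feature vectors, and the thesis's features are not described here. `dtw_distance` and `recognize` are hypothetical names.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW distance with an absolute-difference local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def recognize(utterance: Sequence[float], templates: dict[str, Sequence[float]]) -> str:
    """Return the command whose stored template is closest under DTW."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))
```

For example, `recognize([1, 2, 2, 3, 3], {"on": [1, 2, 3], "off": [5, 5, 1]})` returns `"on"`, because the stretched utterance aligns with the `"on"` template at zero cost.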
Today's news
Jankovský, Petr ; Holan, Tomáš (advisor) ; Straňák, Pavel (referee)
The project deals with the design and implementation of a program based on frequency analysis of text. The results should provide a quick overview of the articles currently published in newspapers. The program downloads current articles from newspaper websites. For each defined section and each article, it can list the most frequent n-tuples of words. The user can define a dictionary of uninteresting (banned) words and a dictionary of phrases. The implementation solves several problems with downloading articles from servers with differing page structures, such as encoding problems and distinguishing articles from advertisements. The work shows that simple frequency analysis can yield interesting results.
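Listing the most frequent word n-tuples with a banned-word dictionary reduces to tokenizing, filtering, and counting sliding windows. A minimal sketch follows, assuming the article text has already been downloaded and stripped of markup. `top_ngrams` is a hypothetical name, and the phrase dictionary is not modeled.

```python
import re
from collections import Counter
from typing import Iterable


def top_ngrams(
    text: str, n: int = 2, k: int = 5, banned: Iterable[str] = ()
) -> list[tuple[tuple[str, ...], int]]:
    """Return the k most frequent word n-grams in text, ignoring banned words."""
    stop = {w.lower() for w in banned}
    # \w matches Unicode letters, so Czech words with diacritics stay whole.
    tokens = [t for t in re.findall(r"\w+", text.lower()) if t not in stop]
    # Zip n shifted copies of the token list to slide an n-word window.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(grams).most_common(k)
```

Removing banned words before forming n-grams lets two content words that were separated only by a banned word form a pair. Filtering n-grams that contain a banned word is the stricter alternative.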
Annotation of Multiword Expressions in the Prague Dependency Treebank
Straňák, Pavel ; Hajič, Jan (advisor) ; Pala, Karel (referee) ; Pecina, Pavel (referee)
This thesis explores the annotation of multiword expressions in the Prague Dependency Treebank 2.0. We explain what we understand as multiword expressions (MWEs), review the state of PDT 2.0 with respect to MWEs, and present our annotation. We describe the data format developed for the annotation, the annotation tool, and other software developed to allow visualisation and searching of the data. We also present the annotation lexicon SemLex and an analysis of the annotation.
Advanced Czech Spell-Checker (Pokročilý korektor češtiny)
Richter, Michal ; Straňák, Pavel (advisor) ; Žabokrtský, Zdeněk (referee)
The aim of this work is to implement a Czech spell-checker using several language models and a lexical morphological analyser in order to offer proper correction suggestions and also to find real-word spelling errors (spelling errors that happen to be in the lexicon). The system should also be able to restore diacritics in Czech text. Mac OS X was chosen as the target platform for the application. During the implementation, emphasis was put especially on a memory-efficient representation of the above-mentioned statistical models. The work begins with a gentle introduction to Hidden Markov Models, language models, and the Viterbi algorithm. The actual system implementation and the training of the statistical models are discussed next. In the final part of the work, the achieved results are evaluated and discussed in depth.
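Diacritics restoration fits the HMM/Viterbi framing. Each word without diacritics is an observation, its possible diacritized forms are hidden states, and a language model scores transitions. A minimal sketch follows, assuming a single bigram model, uniform emissions over lexicon candidates, and a fixed back-off log-probability for unseen bigrams. All of these are simplifications: the thesis combines several language models and a morphological analyser. `viterbi_diacritize` and its inputs are hypothetical.

```python
def viterbi_diacritize(
    words: list[str],
    candidates: dict[str, list[str]],
    bigram_logp: dict[tuple[str, str], float],
    unk_logp: float = -20.0,
) -> list[str]:
    """Return the most probable diacritized sequence under a bigram model.

    words: input words without diacritics
    candidates: stripped word -> possible diacritized forms (toy lexicon)
    bigram_logp: (previous form, form) -> log P(form | previous); "<s>" starts a sentence
    """
    def lp(prev: str, cur: str) -> float:
        return bigram_logp.get((prev, cur), unk_logp)

    # best[state] = (log-probability, path) of the best path ending in that state
    best: dict[str, tuple[float, list[str]]] = {"<s>": (0.0, [])}
    for w in words:
        forms = candidates.get(w) or [w]  # unknown words pass through unchanged
        new_best = {}
        for form in forms:
            # Keep only the best-scoring predecessor for this state (Viterbi step).
            score, path = max(
                ((s + lp(prev, form), p) for prev, (s, p) in best.items()),
                key=lambda sp: sp[0],
            )
            new_best[form] = (score, path + [form])
        best = new_best
    return max(best.values(), key=lambda sp: sp[0])[1]
```

Because only the best path into each candidate form is kept at each step, the cost is linear in sentence length. It is not exponential in the number of candidate combinations.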

See also: similar author names
3 Straňák, Peter
4 Straňák, Petr