National Repository of Grey Literature: 78 records found
Generating music symbols using neural networks
Havelka, Jonáš ; Pecina, Pavel (advisor) ; Hajič, Jan (referee)
We create more training data for the optical music recognition (OMR) task by generating artificial images of music symbols. We build on Mashcima and the model that J. Mayer trained on it. We take the Rebelo dataset (a dataset of music symbol images), adjust it with several computer vision methods, and train generative neural networks (above all, variational and adversarial autoencoders) on it. By replacing some of the original images in the Mashcima input with images generated by these networks, the model generalizes better: at the cost of slightly worse results on the original dataset (CVC-MUSCIMA), we obtain much better results on the PrIMuS dataset. We also obtain very realistic synthetic images of music symbols.
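A minimal sketch of the kind of generative model mentioned above, a variational autoencoder for small symbol images, is shown below; it assumes PyTorch and 32x32 grayscale inputs, and the layer sizes and class names are illustrative, not the thesis's actual architecture.

    # Minimal variational autoencoder for 32x32 grayscale symbol images (illustrative only).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SymbolVAE(nn.Module):
        def __init__(self, latent_dim=16):
            super().__init__()
            self.enc = nn.Linear(32 * 32, 256)
            self.mu = nn.Linear(256, latent_dim)
            self.logvar = nn.Linear(256, latent_dim)
            self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, 32 * 32), nn.Sigmoid())

        def forward(self, x):
            h = F.relu(self.enc(x.view(x.size(0), -1)))
            mu, logvar = self.mu(h), self.logvar(h)
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
            return self.dec(z), mu, logvar

    def vae_loss(recon, x, mu, logvar):
        # Reconstruction term plus KL divergence to the standard normal prior.
        bce = F.binary_cross_entropy(recon, x.view(x.size(0), -1), reduction="sum")
        kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return bce + kld

New symbol images can then be sampled by decoding random latent vectors, which is how generated images could stand in for some originals in the Mashcima input.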
Semi-supervised learning in Optical Music Recognition
Mayer, Jiří ; Pecina, Pavel (advisor) ; Straka, Milan (referee)
Optical music recognition (OMR) is a niche subfield of computer vision, where some labeled datasets exist, but there is an order of magnitude more unlabeled data available. Recent advances in the field happened largely thanks to the adoption of deep learning. However, such neural networks are trained using labeled data only. Semi-supervised learning is a set of techniques that aim to incorporate unlabeled data during training to produce more capable models. We have modified a state-of-the-art object detection architecture and designed a semi-supervised training scheme to utilize unlabeled data. These modifications have successfully allowed us to train the architecture in an unsupervised setting, and our semi-supervised experiments indicate improvements to training stability and reduced overfitting.
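The abstract does not spell out the training scheme; the sketch below shows one common semi-supervised approach, pseudo-labeling, assuming a PyTorch classifier and batches named labeled_batch and unlabeled_batch. It is an illustration only, not the modified detection architecture from the thesis.

    # Illustrative pseudo-labeling step: confident predictions on unlabeled data become training targets.
    import torch
    import torch.nn.functional as F

    def semi_supervised_step(model, optimizer, labeled_batch, unlabeled_batch, threshold=0.9):
        x_l, y_l = labeled_batch
        loss = F.cross_entropy(model(x_l), y_l)          # supervised term

        with torch.no_grad():
            probs = F.softmax(model(unlabeled_batch), dim=1)
            conf, pseudo = probs.max(dim=1)              # pseudo-labels and their confidence
        mask = conf > threshold                          # keep only confident predictions
        if mask.any():
            logits_u = model(unlabeled_batch[mask])
            loss = loss + F.cross_entropy(logits_u, pseudo[mask])  # unsupervised term

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()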
Automatic dictionary acquisition from parallel corpora
Popelka, Jan ; Pecina, Pavel (advisor) ; Mareček, David (referee)
In this work, an extensible word-alignment framework is implemented from scratch. It is based on a discriminative method that combines a wide range of lexical association measures and other features, and requires only a small amount of manually word-aligned data to optimize the parameters of the model. The optimal alignment is found as a minimum-weight edge cover; selected suboptimal alignments are used to estimate the confidence of each alignment link. The feature combination is tuned over many experiments with respect to the evaluation results, which are compared to GIZA++. The best trained model is used to word-align a large Czech-English parallel corpus, and a bilingual lexicon is extracted from the links of highest confidence. Single-word translation equivalents are sorted by their significance, and lexicons of different sizes are extracted by taking the top N translations. The precision of the lexicons is evaluated automatically and also manually by judging random samples.
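As a rough illustration of scoring alignment links with a lexical association measure, the sketch below computes the Dice coefficient from co-occurrence counts and selects links greedily; the thesis itself combines many features discriminatively and solves a minimum-weight edge cover, which this simplification does not reproduce.

    # Illustrative Dice-score word alignment (greedy, not a minimum-weight edge cover).
    from collections import Counter
    from itertools import product

    def dice_scores(bitext):
        cooc, src_cnt, tgt_cnt = Counter(), Counter(), Counter()
        for src_sent, tgt_sent in bitext:
            src_cnt.update(set(src_sent))
            tgt_cnt.update(set(tgt_sent))
            cooc.update(product(set(src_sent), set(tgt_sent)))
        return {pair: 2 * c / (src_cnt[pair[0]] + tgt_cnt[pair[1]]) for pair, c in cooc.items()}

    def align_greedy(src_sent, tgt_sent, scores, min_score=0.1):
        candidates = sorted(((scores.get((s, t), 0.0), i, j)
                             for i, s in enumerate(src_sent)
                             for j, t in enumerate(tgt_sent)), reverse=True)
        links, used_src, used_tgt = [], set(), set()
        for score, i, j in candidates:
            if score >= min_score and i not in used_src and j not in used_tgt:
                links.append((i, j, score))          # the score doubles as a crude link confidence
                used_src.add(i); used_tgt.add(j)
        return links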
Text summarization
Majliš, Martin ; Pecina, Pavel (advisor) ; Schlesinger, Pavel (referee)
The present work explains the basic principles of automatic summarization and its evaluation, and the fundamental concepts used in this field. It also includes a description of a system for automatic text summarization and evaluation, CSummaK (Czech Summarization Kit). The system provides basic algorithms for creating sentence-extract summaries (Centroid, Lead, Position, Random, Relevance Measure, etc.) and for their evaluation (Precision, Recall, F-Measure, etc.), whose description is also part of this work. The system was used to produce automatic extracts from news articles. Another system was developed for obtaining reference extracts; it allows users to create extracts from news articles on-line. The work also evaluates the quality of the individual algorithms and their combinations with different parameters, together with a discussion of possible practical applications.
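The evaluation measures listed above compare the set of sentences a system extracts against a reference extract; a minimal sketch (not CSummaK code) follows.

    # Precision/Recall/F-measure of a sentence extract against a reference extract (illustrative).
    def extract_prf(system_sentences, reference_sentences):
        system, reference = set(system_sentences), set(reference_sentences)
        overlap = len(system & reference)
        precision = overlap / len(system) if system else 0.0
        recall = overlap / len(reference) if reference else 0.0
        f_measure = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
        return precision, recall, f_measure

    # Example: the system picked sentences 1, 3, 5; the reference extract contains 1, 2, 3.
    print(extract_prf([1, 3, 5], [1, 2, 3]))  # (0.666..., 0.666..., 0.666...)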
Probabilistic translation dictionary
Rouš, Jan ; Žabokrtský, Zdeněk (advisor) ; Pecina, Pavel (referee)
In this work we present a method of semi-automatic training of a probabilistic translation dictionary using large automatically annotated parallel corpora. Based on a study of translation errors and of the role of the translation dictionary within the TectoMT translation system in general, we propose models of various complexity. These basic models were combined into hierarchical models designed to reduce the impact of the data-sparsity problem. Various extensions were implemented to deal with common lexical errors. The dictionary, along with the extensions, was compared to the former approach on test data, and the results show improved translation quality.
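The hierarchical models mentioned above back off to coarser conditioning when counts are sparse; below is a minimal sketch of such a backoff lookup, with hypothetical table names (by_lemma_tag, by_lemma) and maximum-likelihood probabilities, not the thesis's actual model.

    # Illustrative hierarchical (backoff) translation dictionary: prefer the finer model,
    # fall back to a coarser one when the finer context was never seen.
    from collections import Counter, defaultdict

    class BackoffDictionary:
        def __init__(self):
            self.by_lemma_tag = defaultdict(Counter)   # (src_lemma, src_tag) -> Counter of tgt lemmas
            self.by_lemma = defaultdict(Counter)       # src_lemma -> Counter of tgt lemmas

        def add(self, src_lemma, src_tag, tgt_lemma):
            self.by_lemma_tag[(src_lemma, src_tag)][tgt_lemma] += 1
            self.by_lemma[src_lemma][tgt_lemma] += 1

        def translate(self, src_lemma, src_tag):
            for table, key in ((self.by_lemma_tag, (src_lemma, src_tag)),
                               (self.by_lemma, src_lemma)):
                counts = table.get(key)
                if counts:
                    total = sum(counts.values())
                    return [(t, c / total) for t, c in counts.most_common()]
            return []                                  # unseen lemma: leave to other components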
Automatic construction of semantic networks
Kirschner, Martin ; Pecina, Pavel (advisor) ; Holub, Martin (referee)
The presented work explores the possibilities of automatic construction and expansion of semantic networks using machine learning methods. The main focus is on the feature extraction procedure for the data set. The work presents a robust method of semantic relation retrieval, based on the distributional hypothesis and trained on data from the Czech WordNet. We also show the first results for the Czech language in this area of research. Part of the thesis is a set of software tools for processing and evaluating the input data, together with an overview and discussion of its results on real-world data. The resulting tools can process data on the order of hundreds of millions of words. The research part of the thesis used morphologically and syntactically annotated Czech data, but the methods are not language-dependent.
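One way to read the distributional hypothesis mentioned above is that words occurring in similar contexts are semantically related; the sketch below builds context-window co-occurrence vectors and compares them with cosine similarity, as one possible feature for a relation classifier, not the thesis's exact feature set.

    # Illustrative distributional similarity: co-occurrence vectors within a window, cosine similarity.
    from collections import Counter, defaultdict
    from math import sqrt

    def cooccurrence_vectors(sentences, window=2):
        vectors = defaultdict(Counter)
        for sent in sentences:
            for i, word in enumerate(sent):
                for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                    if j != i:
                        vectors[word][sent[j]] += 1
        return vectors

    def cosine(u, v):
        dot = sum(u[k] * v[k] for k in u if k in v)
        norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
        return dot / norm if norm else 0.0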
Entity retrieval on Wikipedia in the scope of the gikiCLEF track
Duarte Torres, Sergio Raul ; Pecina, Pavel (advisor) ; Žabokrtský, Zdeněk (referee)
This thesis presents a system to retrieve entities specified by a question or description given in natural language; the description indicates the entity type and the properties that the entities need to satisfy. This task is analogous to the one proposed in the GikiCLEF 2009 track. The system is fed with the Spanish Wikipedia collection of 2008, and every entity is represented by a Wikipage. We propose three novel methods to perform query expansion for the entity retrieval problem. We also introduce a novel method that employs the English Yago and DBpedia semantic resources to determine the target named-entity type; this method improves on previous approaches in which the target NE type was based solely on Wikipedia categories. We show that our system obtains promising results when evaluated on the GikiCLEF 2009 topic list and compared with the other participants of the track.
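A minimal sketch of category-based query expansion in this spirit: the query is extended with terms taken from the Wikipedia categories (or Yago/DBpedia type labels) of pages retrieved in a first pass. The function names and the retrieval interface are assumptions, not the thesis's implementation.

    # Illustrative query expansion: add frequent category terms from a first-pass retrieval run.
    from collections import Counter

    def expand_query(query_terms, first_pass_pages, page_categories, top_k=5):
        """first_pass_pages: page ids from an initial search;
        page_categories: dict mapping page id -> list of category/type labels."""
        category_terms = Counter()
        for page in first_pass_pages:
            for label in page_categories.get(page, []):
                category_terms.update(label.lower().split())
        expansion = [t for t, _ in category_terms.most_common() if t not in query_terms][:top_k]
        return list(query_terms) + expansion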
On the Possibility of ESP Data Use in Natural Language Processing
Knopp, Tomáš ; Vidová Hladká, Barbora (advisor) ; Pecina, Pavel (referee)
The aim of this bachelor thesis is to explore the image-label database produced by the ESP game from the natural language processing (NLP) point of view. The ESP game is an online game in which human players do useful work: they label images. The output of the ESP game is thus a database of images and their labels. What interests us is whether the data collected in the process of labeling images are of any use in NLP tasks. Specifically, we are interested in the tasks of automatic coreference resolution, extension of the lexical database WordNet, idiom detection, and collocation detection. In this bachelor thesis we deal with the first two of them: automatic coreference resolution and exploring the potential benefits for the lexical database WordNet.
Mining texts at the discourse level
Van de Moosdijk, Sara Francisca ; Pecina, Pavel (advisor) ; Novák, Michal (referee)
Linguistic discourse refers to the meaning of larger text segments and could be very useful for guiding text mining tasks such as document selection or summarization. The aim of this project is to apply discourse information to Knowledge Discovery in Databases. As far as we know, this is the first attempt at combining these two very different fields, so the goal is to create a basis for this type of knowledge extraction. We approach the problem by extracting discourse relations using unsupervised methods, and then model the data using pattern structures in Formal Concept Analysis. Our method is applied to a corpus of medical articles compiled from PubMed. The medical data can be further enhanced with concepts from the UMLS Metathesaurus, which are combined with the UMLS Semantic Network to serve as an ontology in the pattern structures. The results show that, despite a large amount of noise, the method is promising and could be applied to domains other than the medical one.
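In a pattern structure over sets of concepts, the shared description of a group of documents is the meet of their individual descriptions; below is a minimal sketch with plain set intersection standing in for the ontology-aware similarity operation the thesis builds from UMLS.

    # Illustrative pattern-structure operators for documents described by sets of concepts.
    def meet(descriptions):
        """Common description of a set of documents (intersection of their concept sets)."""
        descriptions = list(descriptions)
        common = set(descriptions[0])
        for d in descriptions[1:]:
            common &= set(d)
        return common

    def extent(pattern, doc_descriptions):
        """All documents whose description contains the given pattern."""
        return [doc for doc, concepts in doc_descriptions.items() if pattern <= set(concepts)]

    docs = {"d1": {"Aspirin", "Headache", "Analgesic"},
            "d2": {"Ibuprofen", "Headache", "Analgesic"},
            "d3": {"Insulin", "Diabetes"}}
    shared = meet([docs["d1"], docs["d2"]])      # {'Headache', 'Analgesic'}
    print(shared, extent(shared, docs))          # -> ['d1', 'd2']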
Evaluation methods of systems for unsegmented speech retrieval
Galuščáková, Petra ; Pecina, Pavel (advisor) ; Hoffmannová, Petra (referee)
This work describes the methods currently used for the evaluation of speech retrieval. Techniques used for speech retrieval are explained, as well as the methods used to evaluate this retrieval. Special attention is paid to the processing of unsegmented records. The main aim of the work is to verify whether the methods currently used for evaluating speech retrieval are appropriate, and to modify them if needed. Empirical methods based on the user's perception of speech retrieval are used for this verification. The modified metrics are compared with the original ones.
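For unsegmented recordings there is no fixed document boundary to judge, so evaluation typically credits a retrieved time point by how close it lands to the true start of a relevant passage; the sketch below implements such a distance-penalized relevance score as an illustration only, not the specific modified metrics from the thesis.

    # Illustrative distance-penalized relevance for unsegmented speech retrieval:
    # a retrieved jump-in point earns partial credit that decays with its distance
    # (in seconds) from the nearest true start of a relevant passage.
    def penalized_relevance(retrieved_seconds, relevant_starts, tolerance=30.0):
        if not relevant_starts:
            return 0.0
        distance = min(abs(retrieved_seconds - start) for start in relevant_starts)
        return max(0.0, 1.0 - distance / tolerance)

    def average_score(ranked_points, relevant_starts, tolerance=30.0):
        scores = [penalized_relevance(p, relevant_starts, tolerance) for p in ranked_points]
        return sum(scores) / len(scores) if scores else 0.0

    print(average_score([12.0, 95.0, 200.0], relevant_starts=[10.0, 210.0]))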

See also: similar author names
3 Pecina, Petr