National Repository of Grey Literature: 40 records found
Hybrid Deep Question Answering
Aghaebrahimian, Ahmad ; Holub, Martin (advisor) ; Kordik, Pavel (referee) ; Pecina, Pavel (referee)
Title: Hybrid Deep Question Answering Author: Ahmad Aghaebrahimian Institute: Institute of Formal and Applied Linguistics Supervisor: RNDr. Martin Holub, Ph.D., Institute of Formal and Applied Linguistics Abstract: As one of the oldest tasks of Natural Language Processing, Question Answering is one of the most exciting and challenging research areas, with many scientific and commercial applications. Question Answering, a discipline at the intersection of computer science, statistics, linguistics, and cognitive science, is concerned with building systems that automatically retrieve answers to questions posed by humans in natural language. This doctoral dissertation presents the author's research in this discipline. It describes his work toward a hybrid Question Answering system consisting of two engines, one for Question Answering over structured data and one over unstructured data. The structured engine comprises a state-of-the-art Question Answering system based on knowledge graphs. The unstructured engine consists of a state-of-the-art sentence-level Question Answering system and a word-level Question Answering system with results close to human performance. This work also introduces a new Question Answering dataset for word- and sentence-level questions. Starting from a...
Deep contextualized word embeddings from character language models for neural sequence labeling
Lief, Eric ; Pecina, Pavel (advisor) ; Kocmi, Tom (referee)
A family of Natural Language Processing (NLP) tasks, such as part-of-speech (PoS) tagging, Named Entity Recognition (NER), and Multiword Expression (MWE) identification, involves assigning labels to sequences of words in text (sequence labeling). Most modern machine learning approaches to sequence labeling use word embeddings, learned representations of text in which words with similar meanings have similar representations. Recently, contextualized word embeddings have garnered much attention because, unlike pretrained context-insensitive embeddings such as word2vec, they are able to capture word meaning in context. In this thesis, I evaluate the performance of different embedding setups (context-sensitive and context-insensitive word embeddings, as well as task-specific word, character, lemma, and PoS embeddings) on the three above-mentioned sequence labeling tasks, using a deep learning model (BiLSTM) and Portuguese datasets.
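Sequence labeling, as framed above, means producing exactly one label per token. For span-style tasks such as NER, a common convention is the BIO encoding (B = beginning of a span, I = inside, O = outside). The sketch below is an illustration of that encoding only, not code from the thesis; the function name and the example sentence are hypothetical.

```python
from typing import List, Tuple

def spans_to_bio(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert entity spans (start, end_exclusive, type) into one BIO label per token."""
    labels = ["O"] * len(tokens)          # every token starts outside any span
    for start, end, etype in spans:
        labels[start] = f"B-{etype}"      # first token of the span
        for i in range(start + 1, end):
            labels[i] = f"I-{etype}"      # remaining tokens of the span
    return labels

tokens = ["A", "Universidade", "de", "Coimbra", "fica", "em", "Portugal"]
print(spans_to_bio(tokens, [(1, 4, "ORG"), (6, 7, "LOC")]))
# ['O', 'B-ORG', 'I-ORG', 'I-ORG', 'O', 'O', 'B-LOC']
```

A BiLSTM tagger is then trained to predict one such label for each position, which is what makes PoS tagging, NER, and MWE identification instances of the same problem.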
Grounding Natural Language Inference on Images
Vu Trong, Hoa ; Pecina, Pavel (advisor) ; Libovický, Jindřich (referee)
Grounding Natural Language Inference on Images. Hoa Trong Vu, July 20, 2018. Abstract: Despite the surge of research interest in problems involving linguistic and visual information, the use of multimodal data for Natural Language Inference remains largely unexplored. Natural Language Inference, regarded as a basic step towards Natural Language Understanding, is extremely challenging due to the inherent complexity of human languages. However, we believe this issue can be alleviated by using multimodal data. Given an image and its description, our proposed task is to determine whether a natural language hypothesis contradicts, entails, or is neutral with regard to the image and its description. To address this problem, we develop a multimodal framework based on the Bilateral Multi-perspective Matching framework. The data are collected by mapping the SNLI dataset to the Flickr30k image dataset. The resulting dataset, which is publicly available, has more than 565k instances. Experiments on this dataset show that the multimodal model outperforms the state-of-the-art textual model.
Information retrieval and navigation in audio-visual archives
Galuščáková, Petra ; Pecina, Pavel (advisor) ; Jones, Gareth (referee) ; Ircing, Pavel (referee)
The thesis examines issues associated with interactive retrieval of relevant segments from audio and video. Text-based methods for searching audio-visual archives using automatic transcripts, subtitles, and metadata are described first. Search quality is analyzed with respect to video segmentation methods. Navigation using multimodal hyperlinks between video segments is then examined, as are methods for automatically detecting the most informative anchoring segments for subsequent hyperlinking. Finally, the described text-based search, hyperlinking, and anchoring methods are presented in working form, incorporated into an online graphical user interface.
Splitting word compounds
Oberländer, Jonathan ; Pecina, Pavel (advisor) ; Hlaváčová, Jaroslava (referee)
Unlike English, languages such as German, Dutch, the Scandinavian languages, or Greek form compounds not as multi-word expressions but by combining the parts of the compound into a new word without any orthographic separation. This poses problems for a variety of tasks, such as Statistical Machine Translation or Information Retrieval. Most previous work on splitting compounds into their parts, or "decompounding", has focused on German. In this work, we create a new, simple, unsupervised system for automatic decompounding for three representative compounding languages: German, Swedish, and Hungarian. A multilingual evaluation corpus in the medical domain is created from the EMEA corpus and annotated with respect to compounding. Finally, several variants of our system are evaluated and compared to previous work.
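The abstract does not spell out its algorithm. For orientation, a well-known unsupervised baseline for decompounding (Koehn and Knight, 2003) chooses the split whose parts have the highest geometric mean of corpus frequencies, keeping the word whole if no split beats its own frequency. The sketch below is a simplified two-part version of that baseline with an optional German linking "s"; it is not claimed to be the system from the thesis, and the function name and frequency counts are hypothetical.

```python
import math
from typing import Dict, List, Tuple

def split_compound(word: str, freq: Dict[str, int], min_len: int = 3,
                   fillers: Tuple[str, ...] = ("", "s")) -> List[str]:
    """Return the two-part split (or the unsplit word) with the highest
    geometric mean of the corpus frequencies of its parts."""
    word = word.lower()
    best_parts = [word]
    best_score = float(freq.get(word, 0))   # score of leaving the word unsplit
    for i in range(min_len, len(word) - min_len + 1):
        head, tail = word[:i], word[i:]
        for filler in fillers:
            # Optionally strip a linking element (e.g. German "s") from the first part.
            if filler and not head.endswith(filler):
                continue
            first = head[: len(head) - len(filler)]
            if len(first) < min_len:
                continue
            f1, f2 = freq.get(first, 0), freq.get(tail, 0)
            if f1 == 0 or f2 == 0:
                continue                      # both parts must be attested words
            score = math.sqrt(f1 * f2)        # geometric mean of two frequencies
            if score > best_score:
                best_score, best_parts = score, [first, tail]
    return best_parts

freq = {"arbeit": 500, "zeit": 800, "arbeitszeit": 20}  # hypothetical counts
print(split_compound("Arbeitszeit", freq))  # ['arbeit', 'zeit']
```

Because it needs only a word-frequency list, this kind of method carries over to Swedish or Hungarian once language-specific linking elements are supplied.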
English grammar checker and corrector: the determiners
Auersperger, Michal ; Pecina, Pavel (advisor) ; Straňák, Pavel (referee)
Correction of articles in English texts is approached as an article generation task, i.e. each noun phrase is assigned a class corresponding to the definite, indefinite, or zero article. Supervised machine learning methods are used first to replicate and then to improve upon the best result reported in the literature known to the author. Through feature engineering and a different choice of learning method, an error reduction of about 34% is achieved. The resulting model is further compared to the performance of expert annotators. Although the comparison is not straightforward due to differences in the data, the results indicate that the performance of the trained model is comparable to human-level performance when measured on in-domain data. On the other hand, the model does not generalize well to other types of data. Using a large-scale language model to predict an article (or no article) for each word of the text has not proved successful.
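To make the three-class framing concrete: training examples for such a classifier can be derived from well-formed text by recording the article of each noun phrase as its class and removing it, so that the model must restore it. The sketch assumes noun phrases have already been chunked (the chunker is not shown), and the names are hypothetical; it illustrates the task setup only, not the features or learning method used in the thesis.

```python
from typing import List, Tuple

# The three target classes described in the abstract.
DEFINITE, INDEFINITE, ZERO = "the", "a/an", "zero"

def article_label(np_tokens: List[str]) -> Tuple[str, List[str]]:
    """Map a noun phrase to its article class and strip the article,
    producing a (label, article-less noun phrase) training pair."""
    if not np_tokens:
        raise ValueError("empty noun phrase")
    first = np_tokens[0].lower()
    if first == "the":
        return DEFINITE, np_tokens[1:]
    if first in ("a", "an"):
        return INDEFINITE, np_tokens[1:]
    return ZERO, list(np_tokens)      # no article present: zero-article class

print(article_label(["The", "old", "house"]))  # ('the', ['old', 'house'])
```

At correction time, the predicted class is compared with the article actually written, and a mismatch is flagged as a likely error.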
Text simplification in Czech
Burešová, Karolína ; Pecina, Pavel (advisor) ; Bejček, Eduard (referee)
This thesis deals with text simplification in Czech, in particular with lexical simplification. Several strategies for complex word identification, substitution generation, and substitution ranking are implemented and evaluated. Substitution generation is attempted both in a dictionary-based and in an embedding-based manner. Experiments involving human participants are also presented; they aim to gain insight into perceived simplicity/complexity and its factors. The experiments conducted and evaluated include sentence pair comparison and manual text simplification. Both the evaluation results of the various strategies and the outcomes of the experiments with humans are described, and some future work is suggested.
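The three stages named above (complex word identification, substitution generation, substitution ranking) can be chained as in the sketch below. It makes deliberately simple, assumed choices: a frequency threshold for identification, a hypothetical synonym dictionary for generation, and frequency for ranking. The toy Czech frequencies are invented for illustration, and the thesis evaluates several other strategies, including embedding-based generation.

```python
from typing import Dict, List

def simplify(tokens: List[str], freq: Dict[str, int],
             synonyms: Dict[str, List[str]], threshold: int = 1000) -> List[str]:
    """Replace rare (assumed complex) words with their most frequent synonym."""
    out = []
    for tok in tokens:
        # 1) Complex word identification: words at or above the threshold are kept.
        if freq.get(tok, 0) >= threshold:
            out.append(tok)
            continue
        # 2) Substitution generation: dictionary-based candidates.
        candidates = synonyms.get(tok, [])
        # 3) Substitution ranking: most frequent candidate wins, but only if it
        #    is more frequent than the original word; otherwise keep the original.
        best = max(candidates, key=lambda c: freq.get(c, 0), default=tok)
        out.append(best if freq.get(best, 0) > freq.get(tok, 0) else tok)
    return out

freq = {"automobil": 50, "auto": 9000, "vůz": 3000, "jel": 4000}  # toy counts
synonyms = {"automobil": ["auto", "vůz"]}                          # hypothetical dictionary
print(simplify(["automobil", "jel"], freq, synonyms))  # ['auto', 'jel']
```

Swapping the dictionary lookup for nearest neighbours in an embedding space would turn this into the embedding-based variant of substitution generation.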
Query expansion for medical information retrieval
Bibyna, Feraena ; Pecina, Pavel (advisor) ; Holub, Martin (referee)
One of the challenges in medical information retrieval is the terminology gap between the documents (commonly written by medical professionals using medical jargon) and the queries (commonly composed by non-professionals using layman's terms). In this thesis, we investigate the effect of query expansion using a domain-specific knowledge resource to deal with this challenge. We use the Unified Medical Language System (UMLS), a repository of biomedical vocabularies, and utilize two of its resources: the Metathesaurus and the Semantic Network. We use the query set and document set provided by the CLEF eHealth organizers. The query sets, provided for the medical information retrieval shared task, represent two different use cases of medical information retrieval. We experiment with query expansion using synonymous terms and non-synonymous concepts, blind relevance feedback, field weighting, and linear interpolation of different systems.
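One of the techniques listed, linear interpolation of different systems, fuses the document scores of two retrieval runs (for example, a baseline run and a run with an expanded query) as λ·A + (1 − λ)·B. The sketch assumes min-max score normalization and a weight `lam` tuned on held-out queries; both are common choices but are assumptions here, not details taken from the thesis, and the function names are hypothetical.

```python
from typing import Dict

def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one run's scores to [0, 1] so that two runs become comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 0.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def interpolate(run_a: Dict[str, float], run_b: Dict[str, float],
                lam: float = 0.5) -> Dict[str, float]:
    """Fuse two runs as lam * A + (1 - lam) * B on normalized scores.
    A document missing from one run contributes 0 from that run."""
    a, b = min_max(run_a), min_max(run_b)
    docs = set(a) | set(b)
    return {d: lam * a.get(d, 0.0) + (1 - lam) * b.get(d, 0.0) for d in docs}

fused = interpolate({"d1": 10.0, "d2": 5.0, "d3": 0.0}, {"d2": 2.0, "d3": 4.0}, lam=0.7)
print(sorted(fused, key=fused.get, reverse=True))  # ['d1', 'd2', 'd3']
```

Normalization matters here because raw retrieval scores from different query formulations are generally on different scales.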
Semantic relation extraction from unstructured data in the business domain
Rampula, Ilana ; Pecina, Pavel (advisor) ; Kuboň, Vladislav (referee)
Text analytics in the business domain is a growing field in both research and practical applications. We chose to concentrate on Relation Extraction from unstructured data provided by a corporate partner. Analyzing text from this domain requires a different approach, one that accounts for irregularities and domain-specific attributes. In this thesis, we present two methods for relation extraction, the Snowball system and the Distant Supervision method, both adapted to this particular data. The methods were implemented to use both structured and unstructured data from the company's database. Keywords: Information Retrieval, Relation Extraction, Text Analytics, Distant Supervision, Snowball
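Distant supervision, one of the two methods named, creates training data automatically: any sentence that mentions both entities of a known fact is labeled with that fact's relation. In the sketch below, a hypothetical toy knowledge base with invented entity names stands in for the company's structured data, and naive substring matching stands in for proper entity recognition; both are simplifying assumptions, and this is not the adapted method from the thesis.

```python
from typing import Dict, List, Tuple

def distant_labels(sentences: List[str],
                   kb: Dict[Tuple[str, str], str]) -> List[Tuple[str, str, str, str]]:
    """Label every sentence mentioning both entities of a known pair with the
    pair's relation (the distant supervision assumption; noisy by design)."""
    examples = []
    for sent in sentences:
        for (e1, e2), relation in kb.items():
            if e1 in sent and e2 in sent:   # naive entity matching
                examples.append((sent, e1, e2, relation))
    return examples

kb = {("Acme Corp", "Berlin"): "headquartered_in"}   # hypothetical fact
sentences = ["Acme Corp opened its Berlin office.", "Berlin hosts a trade fair."]
for example in distant_labels(sentences, kb):
    print(example)
```

The single labeled example here also shows the method's known weakness: opening an office in Berlin does not actually express the "headquartered_in" relation, so the automatically produced labels are noisy.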
Towards concept visualization through image generation
Nguyen, Tien Dat ; Pecina, Pavel (advisor) ; Žabokrtský, Zdeněk (referee)
Title: Toward concept visualization through image generation Author: Tien Dat Nguyen Department: Institute of Formal and Applied Linguistics Supervisors: Pavel Pecina (Charles University in Prague), Angeliki Lazaridou, Raffaella Bernardi, Marco Baroni (University of Trento) Abstract: Computational linguistics and computer vision share a common way of embedding the semantics of linguistic/visual units: vector representations. In addition, high-quality semantic representations can be constructed effectively thanks to recent advances in neural network methods. Nevertheless, the understanding of these representations remains limited, so they need to be assessed in an intuitive way. Cross-modal mapping is a mapping between the semantic vector embeddings of words and the visual representations of the corresponding objects in images. Inverting an image representation involves learning to invert visual vectors (SIFT, HOG and CNN features) in order to reconstruct the original image. The goal of this project is to build a complete pipeline in which word representations are transformed into image vectors using cross-modal mapping, and these vectors are projected into pixel space using inversion. This suggests that there might be a groundbreaking way to inspect and evaluate the semantics encoded in word representations by...
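The first stage of the pipeline, cross-modal mapping, is often implemented as a linear map from word embeddings to visual feature vectors, estimated by least squares from paired word/image examples. The sketch below assumes such a linear map; the thesis may use a different estimator (for example, a regularized or neural one), the function name is hypothetical, and the image-inversion stage that projects predicted vectors into pixel space is not shown.

```python
import numpy as np

def fit_linear_map(word_vecs: np.ndarray, image_vecs: np.ndarray) -> np.ndarray:
    """Least-squares W minimizing ||word_vecs @ W - image_vecs||^2.
    Row i of each matrix holds the word vector and image vector of one concept."""
    W, *_ = np.linalg.lstsq(word_vecs, image_vecs, rcond=None)
    return W

# Synthetic paired data (random, for illustration only): 50 concepts,
# 10-dimensional word vectors, 6-dimensional visual vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
W_true = rng.normal(size=(10, 6))
Y = X @ W_true

W = fit_linear_map(X, Y)
predicted_visual = X[:1] @ W   # predicted visual vector for the first word
print(W.shape)                 # (10, 6)
```

Once W is fitted, the visual vector predicted for any word embedding `v` is simply `v @ W`; that predicted vector is what the inversion stage would then turn into an image.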
