National Repository of Grey Literature: 6 records found
Extraction of multilingual valency frames from dependency treebanks
Faryad, Ján ; Zeman, Daniel (advisor) ; Lopatková, Markéta (referee)
Multilingual valency dictionaries provide helpful information about the correspondence of valency frames (verbs and their arguments) across languages. This work aims to develop a program that automatically creates a multilingual valency dictionary from parallel treebanks annotated according to Universal Dependencies. The task includes monolingual extraction of valency frames and their cross-lingual linking. Various methods for solving the task are analysed and implemented. The work includes both a general, language-independent approach and additional language-specific extensions, provided in particular for English, Czech and Slovak. The methods for linking the valency frames use word alignment, morphological and syntactic information contained in the UD annotation, or the similarity of verbs between related languages. The quality of the solution is evaluated by multiple established metrics on manually annotated data or by comparison with an existing valency dictionary.
Consistency of Linguistic Annotation
Aggarwal, Akshay ; Zeman, Daniel (advisor) ; Lopatková, Markéta (referee)
Thesis abstract (Akshay Aggarwal, July 2020): This thesis attempts to correct some errors and inconsistencies in different treebanks. The inconsistencies can be related to linguistic constructions, failure of the annotation guidelines, failure to understand the guidelines on the annotator's part, or random errors caused by annotators, among others. We propose a metric to attest the POS annotation consistency of different treebanks in the same language when the annotation guidelines remain the same. We offer solutions to some previously identified inconsistencies within the scope of the Universal Dependencies project, and check the viability of a proposed inconsistency detection tool in a low-resource setting. The solutions discussed in the thesis are language-neutral and intended to work efficiently with multiple languages.
Using syntactic information to identify evaluated entities
Glončák, Vladan ; Hajič, Jan (advisor) ; Helcl, Jindřich (referee)
Opinion Target Extraction (OTE) is a well-established subtask of sentiment analysis. While detecting sentiment polarity is useful in itself, the ability to extract the targets of the opinions allows for more thorough decision making. For example, the owner of a restaurant needs to know whether the guests are complaining about the food, the ambience, or any other aspect of the establishment. Although lexical information is crucial for the task, syntactic structures have the potential to help decide correctly among multiple candidate entities. Rules based on such structures have been used previously for the task. The objective of this thesis is to investigate whether syntactic information influences the behavior of state-of-the-art models, such as recurrent neural networks, on the OTE task. We did not find any substantial evidence to suggest that adding syntactic information influences the behavior of the models.
Building a dependency treebank for Yorùbá using parallel data
Oluokun, Adedayo ; Zeman, Daniel (advisor) ; Rosa, Rudolf (referee)
The goal of this thesis is to create a dependency treebank for Yorùbá, a language with very few pre-existing machine-readable resources. The treebank follows the Universal Dependencies (UD) annotation standard; certain language-specific guidelines for Yorùbá were also specified. Known techniques for porting resources from resource-rich languages were tested, in particular projection of annotation across parallel bilingual data. Manual annotation is not the main focus of this thesis; nevertheless, a small portion of the data was verified manually in order to evaluate the annotation quality. In addition, a model was trained on the manual annotation using UDPipe.
Syntactic analysis of code-switched texts
Ravishankar, Vinit ; Zeman, Daniel (advisor) ; Mareček, David (referee)
Abstract (Vinit Ravishankar, July 2018): The aim of this thesis is twofold. First, we attempt to dependency parse existing code-switched corpora solely by training on monolingual dependency treebanks. To do so, we design a dependency parser and experiment with a variety of methods to improve upon the baseline established by raw training on monolingual treebanks; these methods range from treebank modification to network modification. On this task, we obtain state-of-the-art results for most evaluation criteria for our evaluation language pairs, Hindi/English and Komi/Russian. We beat our own baselines by a significant margin, whilst simultaneously beating most scores on similar tasks in the literature. The second part of the thesis introduces the relatively understudied task of predicting code-switching points in a monolingual utterance; we provide several architectures that attempt to do so, and offer one of them as our baseline, in the hope that it will stand as the state of the art for future tasks.
Coreference resolution for Universal Dependencies
Faryad, Ján ; Novák, Michal (advisor) ; Rosa, Rudolf (referee)
Title: Coreference resolution for Universal Dependencies. Author: Ján Faryad. Department: Institute of Formal and Applied Linguistics. Supervisor: Mgr. Michal Novák. Abstract: Coreference is an important tool for maintaining text coherence. Up to now, there has been no way to mark it in Universal Dependencies (UD), a project for the universal description of morphology and dependency syntax. This work presents a way to mark coreference in the UD project. It also includes a conversion of data with coreference annotation from the corpora PDT 3.0 and OntoNotes 5.0, using the UDPipe tool for automatic analysis of text in the UD style. This work also aims to implement a system for automatic resolution of pronoun coreference using machine learning. Finally, the quality of the system is evaluated in a simple way. The design of the program emphasizes language independence and compatibility with the Udapi interface, which is used for processing UD data. Keywords: coreference resolution, coreference, anaphora, Universal Dependencies, UD
