National Repository of Grey Literature: 30 records found (records 11 - 20)
Lexical-semantic Conversions in the Valency Lexicon
Kettnerová, Václava ; Lopatková, Markéta (advisor) ; Panevová, Jarmila (referee) ; Karlík, Petr (referee)
In this thesis, we provide an adequate lexicographic representation of lexical-semantic conversion. By lexical-semantic conversion we understand the relation between semantically similar syntactic structures that are based on separate lexical units of the same verb lexeme. These relations are associated with various changes in the valency structure of verbs: they may affect the number of valency complementations, their type and obligatoriness, as well as their morphemic forms. These changes arise from differences in the mapping of situational participants onto valency complementations. On the basis of a semantic and syntactic analysis of two types of Czech lexical-semantic conversions, the locative conversion and the Bearer of action-Location conversion, we propose to represent lexical units forming syntactic variants in the relation of lexical-semantic conversion by separate valency frames stored in the data component of the lexicon. The relevant valency frames are assigned a special attribute -conv whose value identifies the type of lexical-semantic conversion. The rule component of the lexicon then consists of general rules determining changes in the correspondence between situational participants and valency complementations. This proposal is primarily designed for the valency lexicon of Czech verbs, VALLEX....
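To make the proposed division of labour between the data and rule components concrete, the following is a minimal sketch of how two valency frames related by a lexical-semantic conversion might be stored with a -conv attribute. The field names, the frame format and the "load" example are illustrative assumptions, not the actual VALLEX schema.

```python
from dataclasses import dataclass

# Minimal sketch of a data-component entry: two valency frames of one verb
# lexeme linked by a lexical-semantic conversion via a "-conv" attribute.
# Field names and values are hypothetical, not the real VALLEX representation.

@dataclass
class ValencyFrame:
    lexical_unit: str          # gloss of the lexical unit
    complementations: list     # (functor, obligatoriness) pairs
    conv: str | None = None    # type of lexical-semantic conversion, if any

# Locative conversion, e.g. "load hay onto the truck" vs. "load the truck
# with hay": the mapping of participants onto complementations differs.
frame_a = ValencyFrame(
    lexical_unit="load something onto something",
    complementations=[("ACT", "obl"), ("PAT", "obl"), ("DIR3", "obl")],
    conv="locative",
)
frame_b = ValencyFrame(
    lexical_unit="load something with something",
    complementations=[("ACT", "obl"), ("PAT", "obl"), ("MEANS", "opt")],
    conv="locative",
)

# The rule component would then hold general rules that, for a given value of
# conv, describe how situational participants remap onto complementations.
```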
Assessing the impact of manual corrections in the Groningen Meaning Bank
Weck, Benno ; Lopatková, Markéta (advisor) ; Vidová Hladká, Barbora (referee)
The Groningen Meaning Bank (GMB) project develops a corpus with rich syntactic and semantic annotations. Annotations in the GMB are generated semi-automatically and stem from two sources: (i) initial annotations from a set of standard NLP tools, and (ii) corrections and refinements by human annotators. On the part-of-speech level of annotation, for example, there are currently 18,000 such corrections, so-called Bits of Wisdom (BOWs). To apply this information to boost the NLP processing, we experiment with using the BOWs to retrain the part-of-speech tagger and find that the tagger can be improved to correct up to 70% of the identified errors on held-out data. Moreover, an improved tagger helps to raise the performance of the parser. Preferring sentences with a high rate of verified tags in retraining proved to be the most reliable strategy. With a simulated active learning experiment using Query-by-Uncertainty (QBU) and Query-by-Committee (QBC), we show that selectively sampling sentences for retraining yields better results with less data than random selection. In an additional pilot study we find that a standard maximum-entropy part-of-speech tagger can be augmented so that it uses already known tags to enhance its tagging decisions on an entire sequence without first retraining a new model.
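As a rough illustration of the Query-by-Uncertainty selection strategy mentioned in the abstract, the sketch below ranks sentences by the tagger's own confidence and picks the least confident ones for retraining. The tagger interface (`tag_distributions`) and the confidence measure are assumptions for illustration, not the thesis's implementation.

```python
import random

def sequence_confidence(tagger, sentence):
    """Average probability the tagger assigns to its own best tag per token."""
    probs = [max(dist.values()) for dist in tagger.tag_distributions(sentence)]
    return sum(probs) / len(probs)

def select_for_retraining(tagger, pool, k, strategy="qbu"):
    """Pick k sentences from an unlabeled pool for annotation and retraining."""
    if strategy == "qbu":
        # Query-by-Uncertainty: take the sentences the model is least sure about.
        ranked = sorted(pool, key=lambda s: sequence_confidence(tagger, s))
        return ranked[:k]
    # Baseline: random selection, which QBU/QBC outperformed in the experiments.
    return random.sample(pool, k)
```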
Consistency of Linguistic Annotation
Aggarwal, Akshay ; Zeman, Daniel (advisor) ; Lopatková, Markéta (referee)
This thesis attempts to correct some errors and inconsistencies in different treebanks. The inconsistencies can be related to linguistic constructions, failures of the annotation guidelines, failure to understand the guidelines on the annotator's part, or random errors made by annotators, among others. We propose a metric to assess the POS annotation consistency of different treebanks in the same language when the annotation guidelines remain the same. We offer solutions to some previously identified inconsistencies within the scope of the Universal Dependencies project, and check the viability of a proposed inconsistency detection tool in a low-resource setting. The solutions discussed in the thesis are language-neutral, intended to work efficiently with multiple languages.
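The abstract does not spell out the proposed consistency metric, so the following is only an illustrative sketch of one plausible formulation: comparing per-word-form POS tag distributions of two treebanks of the same language and averaging their overlap. It should not be read as the metric actually proposed in the thesis.

```python
from collections import Counter

def tag_distributions(treebank):
    """treebank: iterable of (word_form, pos_tag) pairs -> {form: Counter}."""
    dists = {}
    for form, tag in treebank:
        dists.setdefault(form, Counter())[tag] += 1
    return dists

def pos_consistency(treebank_a, treebank_b):
    """Mean overlap of normalized tag distributions over shared word forms."""
    da, db = tag_distributions(treebank_a), tag_distributions(treebank_b)
    shared = set(da) & set(db)
    if not shared:
        return 0.0
    overlaps = []
    for form in shared:
        na, nb = sum(da[form].values()), sum(db[form].values())
        overlap = sum(min(da[form][t] / na, db[form][t] / nb)
                      for t in set(da[form]) | set(db[form]))
        overlaps.append(overlap)
    return sum(overlaps) / len(overlaps)
```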
Lexicographic treatment of the valency aspects of verbal diatheses
Vernerová, Anna ; Lopatková, Markéta (advisor) ; Ivanová, Martina (referee) ; Petkevič, Vladimír (referee)
Title: Lexicographic treatment of the valency aspects of verbal diatheses
Author: Anna Vernerová
Department: Institute of Formal and Applied Linguistics
Supervisor: doc. RNDr. Markéta Lopatková, Ph.D., Institute of Formal and Applied Linguistics
Keywords: valency, diathesis, passive participle
Abstract: Diatheses, both those formed with the passive participle (the passive, the simple and possessive resultative, and the recipient diathesis) and the so-called reflexive passive (deagentization), have been the topic of a number of studies in both Czech and international linguistics, yet a thorough lexicographic treatment of them for Czech has so far been missing. In this dissertation, I address the treatment of diatheses formed with the passive participle, and of related verbonominal constructions, in the grammatical component of the valency lexicon VALLEX. The main topic of the thesis is preceded by a short historical introduction and a detailed summary of the concept of valency in Functional Generative Description.
Verbal Valency in a Cross-Linguistic Perspective
Šindlerová, Jana ; Lopatková, Markéta (advisor) ; Petkevič, Vladimír (referee) ; Malá, Markéta (referee)
In the thesis, we examine differences in the argument structure of verbs between Czech and English. In the first part, we describe the process of building the CzEngVallex lexicon. In the second part, based on the aligned data of the Prague Czech-English Dependency Treebank, we compare the valencies of verbal translation equivalents and comment on their differences. We classify the differences according to their underlying causes: the causes can lie in the linguistic structure of the two languages, they can include translatological reasons, or they can be grounded in the character of the descriptive linguistic theory used.
Semantic information from FrameNet and the possibility of its transfer to Czech data
Limburská, Adéla ; Lopatková, Markéta (advisor) ; Holub, Martin (referee)
The thesis focuses on transferring FrameNet annotation from English to Czech and on the possibilities of using the resulting data for automatic frame prediction in Czech. The first part, annotation transfer, was performed in two ways. First, a parallel corpus of English sentences and their human-created Czech translations (PCEDT) was used. Second, a much larger parallel corpus was created using machine translation of FrameNet example sentences; this corpus was then used to transfer the annotation as well. The resulting data were partially evaluated and some of the automatically detectable errors were filtered out. Subsequently, the data were used as input for two machine learning methods, decision trees and support vector machines. Since neither of the machine learning experiments brought impressive results, further manual correction of the data annotation was performed, which helped increase the accuracy of the prediction. However, as the accuracy reported in related papers is notably higher, the thesis also discusses different approaches to feature selection and the possibility of further improving the prediction results using these methods.
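For orientation, a minimal sketch of frame prediction with the two learners named in the abstract (decision trees and SVMs) is given below. The feature set (lemma, POS, dependency relation of the target) and the data format are assumptions for illustration, not the features used in the thesis.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import cross_val_score

def featurize(instance):
    # Hypothetical features of a frame-evoking target; not the thesis's feature set.
    return {"lemma": instance["lemma"],
            "pos": instance["pos"],
            "deprel": instance["deprel"]}

def evaluate(instances, frames):
    """instances: list of feature dicts, frames: list of gold frame labels."""
    X = DictVectorizer().fit_transform(featurize(i) for i in instances)
    for name, clf in [("decision tree", DecisionTreeClassifier()),
                      ("SVM", SVC(kernel="rbf"))]:
        scores = cross_val_score(clf, X, frames, cv=5)
        print(f"{name}: mean accuracy {scores.mean():.3f}")
```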
Automatic linking of lexicographic sources and corpus data
Bejček, Eduard ; Lopatková, Markéta (advisor) ; Horák, Aleš (referee) ; Žabokrtský, Zdeněk (referee)
Along with the increasing development of language resources - i.e., new lexicons, lexical databases, corpora, and treebanks - the need for their efficient interlinking is growing. With such linking, one can easily benefit from all of their properties and information. Considering the convergence of resources, universal lexicographic formats are frequently discussed. In the present thesis, we investigate and analyse methods for interlinking language resources automatically. We introduce a system for interlinking lexicons (such as VALLEX, PDT-Vallex, FrameNet or SemLex) that offer information on the syntactic properties of their entries. The system is automated and can be used repeatedly with newer versions of lexicons under development. We also design a method for identifying multiword expressions in parsed text based on syntactic information from the SemLex lexicon. An output that verifies the feasibility of the methods used is, among others, the mapping between the VALLEX and PDT-Vallex lexicons, resulting in tens of thousands of annotated treebank sentences from the PDT and PCEDT treebanks being added into VALLEX.
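As a rough illustration of automatic lexicon interlinking, the sketch below pairs entries that share a lemma and scores candidate pairs by the overlap of their valency slots. The matching criterion and the entry format are assumptions chosen for illustration; they are not the algorithm used in the thesis.

```python
def slot_overlap(slots_a, slots_b):
    """Jaccard overlap of two sets of valency slots, e.g. {'ACT', 'PAT'}."""
    a, b = set(slots_a), set(slots_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def link_lexicons(lexicon_a, lexicon_b, threshold=0.5):
    """lexicon_*: dict mapping lemma -> list of (entry_id, slots). Returns links."""
    links = []
    for lemma, entries_a in lexicon_a.items():
        for id_a, slots_a in entries_a:
            for id_b, slots_b in lexicon_b.get(lemma, []):
                score = slot_overlap(slots_a, slots_b)
                if score >= threshold:
                    links.append((id_a, id_b, score))
    return links
```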
Quantifying Determiners from the Distributional Semantics View
Gutiérrez Vasques, María Ximena ; Lopatková, Markéta (advisor) ; Vidová Hladká, Barbora (referee)
Title: Quantifying Determiners from the Distributional Semantics View
Author: Maria Ximena Gutierrez Vasques
Department: Institute of Formal and Applied Linguistics
Supervisor: doc. RNDr. Markéta Lopatková, Ph.D.
Abstract: Distributional semantics is a modern approach to capturing the semantics of natural language. One of the topics that has not yet received sufficient attention within this approach is the possibility of automatically detecting logical relations such as entailment. This thesis builds on the work of Baroni, Bernardi, Do and Shan (2012), who study the entailment relation between quantifying expressions. The cited work detects entailment using SVM classifiers trained on semantic vectors representing the entailment relation. The experiments described there did not focus on tuning the parameters of the SVM classifier; in this thesis we therefore return to the original experiments on entailment between quantified noun phrases, propose new classifier configurations and optimize the parameter settings. We compare the achieved prediction accuracy with the original results and show that an SVM classifier with a quadratic polynomial kernel achieves better results....
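The sketch below shows the classifier setup named in the abstract, an SVM with a quadratic polynomial kernel whose parameters are tuned by grid search. Representing a quantifier pair by concatenating the two distributional vectors is an assumption made here for illustration, not necessarily the setup of Baroni et al. (2012) or of the thesis.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

def entailment_features(vec_q1, vec_q2):
    """Represent a candidate pair (q1, q2) as one feature vector (assumed encoding)."""
    return np.concatenate([vec_q1, vec_q2])

def tune_quadratic_svm(X, y):
    """X: feature matrix of quantifier pairs, y: 1 if q1 entails q2 else 0."""
    grid = {"C": [0.1, 1, 10, 100], "coef0": [0, 1]}
    svm = SVC(kernel="poly", degree=2, gamma="scale")  # quadratic polynomial kernel
    search = GridSearchCV(svm, grid, cv=5, scoring="accuracy")
    return search.fit(X, y)
```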
Processing of Turkic Languages
Ciddi, Sibel ; Zeman, Daniel (advisor) ; Lopatková, Markéta (referee)
This thesis presents several combined methods for the morphological processing of Turkic languages, such as Turkish, which pose a specific set of challenges for computational processing, and also aims to make larger data sets publicly available. Because of the highly productive, agglutinative morphology of Turkish, data sparsity, besides the lack of publicly available large data sets, imposes difficulties in natural language processing, especially when relying on purely statistical methods. We therefore evaluate a publicly available rule-based morphological analyzer, TRmorph, which is based on finite-state transducers. In order to enhance the efficiency of this analyzer and to expand its lexicon, we combine statistical and heuristics-based methods for named entity processing (and the construction of gazetteers), the morphological disambiguation task and multiword expression processing. The experimental results obtained so far indicate that the use of heuristic methods provides a promising increase in coverage for text processed by TRmorph, while the statistical approach serves as a back-up for more fine-grained tasks that may not be captured by the pattern-based heuristic approach. In this way, our proposed combined approach enhances the efficiency of a morphological analyzer based purely on FST...
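A minimal sketch of the combined pipeline described in the abstract: try the FST-based analysis first, fall back to a gazetteer-based heuristic for named entities, and only then to a statistical guesser. The function names `fst_analyze`, `gazetteer` and `statistical_guess` are placeholders for illustration; they are not the actual TRmorph or tool interfaces.

```python
def analyze_token(token, fst_analyze, gazetteer, statistical_guess):
    analyses = fst_analyze(token)
    if analyses:                      # covered by the FST lexicon
        return analyses
    if token.title() in gazetteer:    # heuristic: known named entity
        return [f"{token}<Noun><Prop>"]
    return statistical_guess(token)   # statistical back-up for the rest
```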
