National Repository of Grey Literature: 30 records found (showing 1 - 10)
Extraction of multilingual valency frames from dependency treebanks
Faryad, Ján ; Zeman, Daniel (advisor) ; Lopatková, Markéta (referee)
Multilingual valency dictionaries provide helpful information about the correspondence of valency frames (verbs and their arguments) across various languages. This work aims at developing a program that automatically creates a multilingual valency dictionary, based on parallel treebanks annotated according to Universal Dependencies. This task includes monolingual extraction of valency frames and their cross-lingual linking. Various methods for solving the task are analysed and implemented. The work includes both a general, language-independent approach and additional language-specific extensions, provided in particular for English, Czech and Slovak. The methods for linking the valency frames include using word alignment, morphological and syntactic information contained in the UD annotation, or similarity of verbs between related languages. The quality of the solution is evaluated by multiple established metrics on manually annotated data or by comparison with an existing valency dictionary.
Typical Usage Patterns of English Verbs
Smejkalová, Lenka ; Holub, Martin (advisor) ; Lopatková, Markéta (referee)
Corpus Pattern Analysis (CPA) is a corpus-based method that explores typical usage patterns of verbs in a text corpus, and describes the meaning of verbs by means of contextual preferences defined both syntactically and semantically [1]. CPA in conjunction with the British National Corpus (BNC) is currently used to create The Pattern Dictionary of English Verbs (PDEV) [1, 2]. The thesis describes the current status of the PDEV, presents a thorough analysis of available data on typical usage patterns and explores possible applications of the PDEV for automatic lexical analysis. In this thesis, procedures usable in further PDEV development have been designed and implemented. The first of them automatically extracts arguments of verbs from the output of English syntactic analysis. The second one uses the extracted arguments to create lists of lexical units that realize semantic types. The last procedure uses these lists to automatically recognize typical usage patterns of verbs. The thesis also evaluates inter-annotator agreement, automatic extraction of verb arguments from English sentences, and the effectiveness of the proposed procedures in the extraction of lexical units that realize semantic types and in automatic recognition of typical usage patterns.
Automatic linking of lexicographic sources and corpus data
Bejček, Eduard ; Lopatková, Markéta (advisor) ; Horák, Aleš (referee) ; Žabokrtský, Zdeněk (referee)
Along with the increasing development of language resources - i.e., new lexicons, lexical databases, corpora, treebanks - the need for their efficient interlinking is growing. With such a linking, one can easily benefit from all their properties and information. Considering the convergence of resources, universal lexicographic formats are frequently discussed. In the present thesis, we investigate and analyse methods of interlinking language resources automatically. We introduce a system for interlinking lexicons (such as VALLEX, PDT-Vallex, FrameNet or SemLex) that offer information on syntactic properties of their entries. The system is automated and can be used repeatedly with newer versions of lexicons under development. We also design a method for identification of multiword expressions in a parsed text based on syntactic information from the SemLex lexicon. An output that verifies feasibility of the used methods is, among others, the mapping between the VALLEX and the PDT-Vallex lexicons, resulting in tens of thousands of annotated treebank sentences from the PDT and the PCEDT treebanks added into VALLEX.
Mapping the Prague Dependency Treebank Annotation Scheme onto Robust Minimal Recursion Semantics
Jakob, Max ; Lopatková, Markéta (advisor) ; Štěpánek, Jan (referee)
This thesis investigates the correspondence between two semantic formalisms, namely the tectogrammatical layer of the Prague Dependency Treebank 2.0 (PDT) and Robust Minimal Recursion Semantics (RMRS). It is a first attempt to relate the dependency-based annotation scheme of PDT to a compositional semantics approach like RMRS. An iterative mapping algorithm is developed that converts PDT trees into RMRS structures by associating an RMRS with each node in the dependency tree. Therefore, composition rules are formulated and the complex relation between dependency in PDT and semantic heads in RMRS is analyzed in detail. It turns out that structure and dependencies, morphological categories and some coreferences can be preserved in the target structures. Furthermore, valency and free modifications are distinguished using the valency dictionary of PDT as an additional resource. The evaluation result of 81% recall shows that systematically correct underspecified target structures can be obtained by a rule-based mapping approach, which is an indicator that RMRS is capable of representing Czech data. This finding is novel as Czech, with its free word order and rich morphology, is typologically different from the languages that have used RMRS thus far.
Verb Valency Frames Disambiguation
Semecký, Jiří ; Hajič, Jan (advisor) ; Krbec, Pavel (referee) ; Lopatková, Markéta (referee)
Semantic analysis has become a bottleneck of many natural language applications. Machine translation, automatic question answering, dialog management, and others rely on high-quality semantic analysis. Verbs are central elements of clauses with a strong influence on the realization of whole sentences. Therefore the semantic analysis of verbs plays a key role in the analysis of natural language. We believe that solid disambiguation of verb senses can boost the performance of many real-life applications. In this thesis, we investigate the potential of statistical disambiguation of verb senses. Each verb occurrence can be described by diverse types of information. We investigate which information is worth considering when determining the sense of verbs. Different types of classification methods are tested with regard to the topic. In particular, we compare the Naive Bayes classifier, decision trees, a rule-based method, maximum entropy, and support vector machines. The proposed methods are thoroughly evaluated on two different Czech corpora, VALEVAL and the Prague Dependency Treebank. Significant improvement over the baseline is observed.
Valency of Verbs in the Prague Dependency Treebank
Urešová, Zdeňka ; Hajičová, Eva (advisor) ; Lopatková, Markéta (referee) ; Ondrejovič, Slavo (referee)
Title: Valency of verbs in the Prague Dependency Treebank Author: PhDr. Zdeňka Urešová Department: Institute of Formal and Applied Linguistics MFF UK Supervisor: Prof. PhDr. Eva Hajičová, DrSc. Abstract: This dissertation describes PDT-Vallex, a valency lexicon of Czech verbs, and its relation to the annotation of the Prague Dependency Treebank (PDT). The PDT-Vallex lexicon was created during the annotation of the PDT and it is a valuable source of verbal valency information available both for linguistic research and for computerized natural language processing. In this thesis, we describe not only the structure and design of the lexicon (which is closely related to the notion of valency as developed in the Functional Generative Description of language) but also the relation between the PDT-Vallex and the PDT. The explicit and full-coverage linking of the lexicon to the treebank prompted us to pay special attention to diatheses; we propose formal transformation rules for diatheses to handle their surface realization even when the canonical forms of verb arguments as captured in the lexicon do not correspond to the forms of these arguments actually appearing in the corpus.
Question and Answer Classifier for closed domain Interactive Question Answering
Dinh, Le Thanh ; Lopatková, Markéta (advisor) ; Schlesinger, Pavel (referee)
Natural language processing has made great progress thanks to the application of statistical approaches and to the large amount of data available to train the systems. This progress is driven by several evaluation campaigns, which allow systems to be compared and progress to be measured. These evaluations are mostly based on data sets artificially developed by the organizers of such campaigns. In our work we show that, though useful, these data sets are biased and that there is a need to develop data generated in a more natural setting by real users. As a case study, we consider the classification of questions. In particular, we look at the classification of question types needed in Question Answering systems, and the classification of follow-up questions into topic continuation and topic shift needed in Interactive Question Answering. We evaluate classifiers first on TREC data and then on a corpus of real users' data. In both cases the performance of the classifiers drops significantly, showing the need to work on more user-centered systems. The results also show that the classifiers could be better fine-tuned by taking into account the new challenges that real users' data pose to NLP systems. We leave this for future research.
Automatic Determination of Semantic Preferences for Verb Valency Complementations (Automatické určování sémantických preferencí pro slovesná valenční doplnění)
Vandas, Karel ; Lopatková, Markéta (advisor) ; Vidová Hladká, Barbora (referee)
Verb valency plays an important role in the description of the behaviour of verbs and connects the surface realisation of language with its semantics. A verb itself usually has several readings, and its complementations help to identify the correct one. So far, valency complementations of verbs have mostly been studied from a morphological and syntactic point of view. The purpose of this thesis is to examine the possibilities of automatic identification of semantic preferences for valency complementations of verbs. The thesis discusses the performance of a system with different levels of available verb valency information in connection with cluster analysis. The thesis contains an evaluation section that compares the available methods.
Quantifying Determiners from the Distributional Semantics View
Gutiérrez Vasques, María Ximena ; Lopatková, Markéta (advisor) ; Vidová Hladká, Barbora (referee)
Title: Quantifying Determiners from the Distributional Semantics View Author: Maria Ximena Gutierrez Vasques Department: Institute of Formal and Applied Linguistics Thesis supervisor: doc. RNDr. Markéta Lopatková, Ph.D. Abstract: Distributional semantics is a modern approach to capturing the semantics of natural language. One topic that has not yet received sufficient attention within this approach is the automatic detection of logical relations such as entailment. This thesis builds on the work of Baroni, Bernardi, Do and Shan (2012), who study the entailment relation between quantifying expressions. The cited work detects entailment using SVM classifiers trained on semantic vectors representing the entailment relation. The experiments described there did not focus on tuning the parameters of the SVM classifier; in this thesis we therefore return to the original experiments on the entailment relation between quantified noun phrases, propose new classifier configurations, and optimize the parameter settings. We compare the prediction accuracy achieved with the original results and show that an SVM classifier with a quadratic polynomial kernel achieves better results....
Form and function of nouns in Czech: relation between nominal case and syntactic function. Based on a synchronic written corpus of Czech (SYN2005)
Jelínek, Tomáš ; Petkevič, Vladimír (advisor) ; Lopatková, Markéta (referee) ; Uličný, Oldřich (referee)
Case in Czech is the basic morphological means by which nouns express their function in a sentence. The objective of this thesis is to describe, from a frequency point of view, the relation between form and function of nouns, or, more precisely, how frequently cases (both simple and prepositional) are used to realise syntactic functions in sentences. The thesis is based on one of the largest corpora of written synchronic Czech: the 100-million-token corpus SYN2005. In order to obtain data on frequencies of syntactic functions of nouns in relation to their cases, we annotated the corpus SYN2005 with a dependency syntactic annotation. For this annotation, we adopted the format of the analytical layer of the Prague Dependency Treebank. The syntactic annotation was performed by a stochastic parser: the MST parser. Since the reliability of this annotation was not high enough, we built an automatic correction module, which identifies errors of syntactic annotation in the output of the stochastic parser and corrects these errors by means of linguistic rules. We implemented 26 different rules, but annotation errors were reduced by merely 6-8%. However, this correction module can be further developed. It can be used to correct the output of any dependency parser trained on the data from...
