Hajič, Jan - Search Results - Digital Repository

	Lexical Association Measures: Collocation Extraction Pecina, Pavel ; Hajič, Jan (advisor) Abstract of doctoral thesis: This thesis is devoted to an empirical study of lexical association measures and their application to collocation extraction. We focus on two-word (bigram) collocations only. We compiled a comprehensive inventory of 82 lexical association measures and present their empirical evaluation on four reference data sets: dependency bigrams from the manually annotated Prague Dependency Treebank, surface bigrams from the same source, instances of the previous from the Czech National Corpus provided with automatically assigned lemmas and part-of-speech tags, and distance verb-noun bigrams from the automatically part-of-speech tagged Swedish Parole Corpus. Collocation candidates in the reference data sets were manually annotated and identified as collocations and non-collocations. The evaluation scheme is based on measuring the quality of ranking collocation candidates according to their chance to form collocations. The methods are compared by precision-recall curves and mean average precision scores adopted from the field of information retrieval. Tests of statistical significance were also performed. Further, we study the possibility of combining lexical association measures and present empirical results of several... Detailed record
	Automatic annotation of English on the tectogrammatical level Toman, Josef ; Hajič, Jan (advisor) ; Žabokrtský, Zdeněk (referee) The tectogrammatical layer is very complex, and its annotation is difficult and expensive. Unlike other corpora, the Prague English Dependency Treebank (PEDT) is based on data for which a syntactic annotation already exists, even though a fundamentally different one. The goal of this work is to propose and implement methods of automatic annotation that use the available data and (preferably) minimize the effort needed for manual annotation. A high-quality evaluation is important so that the contribution of the methods used can be verified. Dozens of modules, which focus on various aspects of annotation, were created. Analyzing their activity is complicated and required a complex system to be built. The analyses produced with it are very detailed. The outcome is positive and encourages continuing and extending the work. Detailed record
	Speech Interface for Corpus Annotation Tools Přikryl, Leoš ; Hajič, Jan (advisor) ; Peterek, Nino (referee) The thesis considers the design and implementation of a natural-language (speech) interface for the corpus annotation tools used at the Institute of Formal and Applied Linguistics (TrEd and its additional modules). Existing modules for automatic speech recognition from the University of West Bohemia in Pilsen are used. Detailed record
	Analytical and Tectogrammatical Analysis of a Natural Language Klimeš, Václav ; Hajič, Jan (advisor) ; Pala, Karel (referee) ; Ribarov, Kiril (referee) The thesis presents tools for analysis at the analytical and tectogrammatical layers that the Prague Dependency Treebank is based on. The tools for analytical annotation consist of two parsers and a tool for assigning syntactic tags. Although the performance of the parsers is far below that of state-of-the-art parsers, both can be considered a certain contribution to parsing, since the methods they are based on are novel. The tool for assigning syntactic tags makes 15% fewer errors than a tool previously used for this purpose. The tool developed for tectogrammatical annotation is the only one that can currently perform this task in such breadth. Although other, specialized tools may perform better on some of its particular subtasks, my tool makes 29% and 47% fewer errors for the Czech language than the combination of existing tools for annotating the tectogrammatical structure and deep functors, respectively, which are the core of the tectogrammatical layer. The proposed tools are designed so that they can be used for other languages as well. Detailed record
	Speech Recognition of Czech Using Finite-State Machines Podveský, Petr ; Hajič, Jan (advisor) ; Psutka, Josef (referee) ; Krbec, Pavel (referee) Speech recognition has become a thriving field with many real-life applications. Voice dialing in cell phones, voice control in embedded devices, speech-driven interactive manuals and many other utilities rely on solid speech recognition software. We believe that research in speech recognition can boost the performance of many applications related to the area. The thesis concentrates on automatic large-vocabulary continuous-speech recognition of Czech. Czech differs from English in a few aspects. We focus on these differences and propose new language-dependent techniques. In particular, rich morphology is investigated and its impact on speech recognition is studied. Out-of-vocabulary (OOV) words are identified as one of the major sources of deteriorating recognition performance. New language modeling techniques are proposed to alleviate the problem of OOV words. The proposed language models are tested in speech recognition systems on diverse speech corpora. The obtained results validate the original approach to language modeling. Significant overall speech recognition improvement is observed. Detailed record
	Using Dependency Tree Structure for Czech-English Machine Translation Čmejrek, Martin ; Hajič, Jan (advisor) ; Pala, Karel (referee) ; Ircing, Pavel (referee) Detailed record
	Statistical Methods in Czech-English Machine Translation Cuřín, Jan ; Hajič, Jan (advisor) ; Rosen, Alexandr (referee) ; Žabokrtský, Zdeněk (referee) Detailed record
	Rules for analyzing anaphora in Czech Nguy, Giang Linh ; Hajičová, Eva (referee) ; Hajič, Jan (advisor) With the increasing importance of natural language processing, there is a growing body of research on automatic anaphora resolution. This thesis contributes to that research. The aim of the work is to propose a set of rules for anaphora resolution in Czech. The set consists of handwritten rules as well as rules developed with the aid of the machine learning system C4.5. Annotated data from the Prague Dependency Treebank were used for training and testing the rules; the following types of anaphora are captured in these data: pronominal anaphora, control, reciprocity, and dependency relation of adjuncts. Our work focuses on these types of anaphora. The rules are evaluated using standard methods based on recall and precision. Detailed record
	Capturing a Sentence Structure by a Dependency Relation in an Annotated Syntactical Corpus (Tools Guaranteeing Data Consistence) Štěpánek, Jan ; Panevová, Jarmila (advisor) ; Hajič, Jan (referee) ; Pognan, Patrice (referee) Detailed record
	Rule-Based Morphological Disambiguation Květoň, Pavel ; Hajič, Jan (advisor) ; Oliva, Karel (referee) ; Rosen, Alexandr (referee) Detailed record
