National Repository of Grey Literature
Syntax in methods for information retrieval
Kravalová, Jana ; Pecina, Pavel (advisor) ; Holub, Martin (referee)
In recent years, the application of language modeling to information retrieval has been studied quite extensively. Although language models of any type can be used with this approach, published experiments have employed and described only traditional n-gram models based on surface word order (often only unigram language models). The goal of this thesis is to design, implement, and evaluate (on Czech data) a method that extends a language model with syntactic information obtained automatically from documents and queries. We attempt to incorporate syntactic information into language models and experimentally compare this approach with unigram and bigram models based on surface word order. We also empirically compare methods for smoothing, stemming, and lemmatization, as well as the effectiveness of using stopwords and pseudo-relevance feedback. We analyze these retrieval methods and describe their performance in detail.
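The unigram baseline mentioned in the abstract can be illustrated with a minimal query-likelihood sketch. The abstract does not say which smoothing methods were compared, so Jelinek-Mercer interpolation is used here purely as an assumed example; the function name `query_log_likelihood`, the parameter `lam`, and the toy documents are all hypothetical.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, collection_counts, collection_len, lam=0.5):
    """Log-probability of the query under a smoothed unigram document model.

    Assumption: Jelinek-Mercer smoothing, where `lam` is the weight given to
    the collection model and (1 - lam) the weight given to the document model.
    """
    doc_counts = Counter(doc)
    doc_len = len(doc)
    log_p = 0.0
    for term in query:
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_coll = collection_counts[term] / collection_len
        p = (1.0 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            # Term unseen in the whole collection: the query has zero probability.
            return float("-inf")
        log_p += math.log(p)
    return log_p


# Toy collection (hypothetical data, for illustration only).
docs = {
    "d1": "language models for retrieval".split(),
    "d2": "word alignment for translation".split(),
}
collection_counts = Counter(term for doc in docs.values() for term in doc)
collection_len = sum(collection_counts.values())

query = "language retrieval".split()
ranking = sorted(
    docs,
    key=lambda name: query_log_likelihood(
        query, docs[name], collection_counts, collection_len
    ),
    reverse=True,
)
print(ranking)
```

Documents are ranked by how likely their smoothed model is to generate the query. Smoothing keeps a document that lacks a query term from receiving zero probability outright, as long as the term occurs somewhere in the collection.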
Automatic word alignment
Kravalová, Jana ; Pecina, Pavel (advisor) ; Novák, Václav (referee)
Word alignment is a crucial component of modern machine translation systems. Given a sentence in two languages, the task is to determine which words in one language are the most likely translations of words in the other. As an alternative to the classical generative approach (IBM models), new methods based on discriminative training and maximum-weight bipartite matching algorithms for complete bipartite graphs have been proposed in recent years. The graph vertices represent words in the source and target languages. The edges are weighted by measures of association estimated from parallel training data. This work focuses on an effective implementation of a maximum-weight bipartite matching algorithm, the implementation of scoring procedures for graph vertices, and basic experiments and their evaluation.
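The matching formulation described in the abstract can be shown on a toy example. The sketch below finds a maximum-weight matching by exhaustive search over all assignments, which is only feasible for very short sentences and is not the efficient algorithm the thesis implements. The function name `best_alignment` and the association scores are hypothetical.

```python
from itertools import permutations


def best_alignment(weights):
    """Return (total_weight, pairs) of a maximum-weight bipartite matching.

    weights[i][j] is the association score between source word i and
    target word j. Every source word is matched to a distinct target word,
    so the source side must not be longer than the target side.
    Exhaustive search: factorial time, for illustration only.
    """
    n_src = len(weights)
    n_tgt = len(weights[0])
    if n_src > n_tgt:
        raise ValueError("source side must not be longer than target side")
    best_total, best_pairs = float("-inf"), []
    for perm in permutations(range(n_tgt), n_src):
        total = sum(weights[i][j] for i, j in enumerate(perm))
        if total > best_total:
            best_total = total
            best_pairs = list(enumerate(perm))
    return best_total, best_pairs


# Hypothetical association scores for a two-word sentence pair.
weights = [
    [0.9, 0.8],
    [0.8, 0.1],
]
total, pairs = best_alignment(weights)
print(pairs)
```

With these scores, greedily taking the single strongest edge (source 0 with target 0) forces the weak edge (source 1 with target 1). The matching that maximizes the total weight pairs source 0 with target 1 and source 1 with target 0 instead, which is why the problem is solved globally rather than edge by edge.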
