|
Processing of Turkic Languages
Ciddi, Sibel ; Zeman, Daniel (advisor) ; Lopatková, Markéta (referee)
This thesis presents several combined methods for the morphological processing of Turkic languages such as Turkish, which pose a specific set of challenges for computational processing, and also aims to make larger data sets publicly available. Because of the highly productive, agglutinative morphology of Turkish, data sparsity, along with the lack of publicly available large data sets, poses difficulties for natural language processing, especially for approaches that rely on purely statistical methods. Therefore, we evaluate TRmorph, a publicly available rule-based morphological analyzer based on finite state transducers (FST). In order to enhance the efficiency of this analyzer and to expand its lexicon, we combine statistical and heuristics-based methods for named entity processing (and the construction of gazetteers), morphological disambiguation, and multiword expression processing. Experimental results obtained so far indicate that the heuristic methods provide a promising increase in coverage for the text processed by TRmorph, while the statistical approach serves as a back-up for more fine-grained tasks that the pattern-based heuristic approach may not capture. In this way, our proposed combined approach enhances the efficiency of a morphological analyzer based purely on FST...
|
|
Word order reconstruction
Dvořák, Tomáš ; Vidová Hladká, Barbora (advisor) ; Mírovský, Jiří (referee)
Word order reconstruction is the re-arrangement of words to obtain a grammatically correct sentence. It is a very useful task for applications of natural language processing, machine translation, speech recognition, or the construction of artificial communication partners. We present a corpus-based approach to the task of word order reconstruction. We use two methods: a morphological method and a syntactic method. Both methods use output from an external module. The approach is designed independently of the application in which word order reconstruction can help improve overall performance. Czech and English are used as the object languages.
|
|
Finding the answer in the answers
Záhumenský, Jakub ; Vidová Hladká, Barbora (advisor) ; Bojar, Ondřej (referee)
Title: Searching for the answer in answers. Author: Jakub Záhumenský. Contact: zahumensky.jakub@gmail.com. Department: Institute of Formal and Applied Linguistics. Supervisor: Mgr. Barbora Vidová Hladká, Ph.D. Supervisor's contact: hladka@ufal.mff.cuni.cz. Abstract: We design a question-answering system, Interviewer, that enables users to virtually interview a given person by asking questions as similar as possible to questions that journalists have already asked. The interviews with the given person posted on the web are collected as a corpus of (question, answer) pairs. The user asks a question, and the Interviewer system searches the questions in the corpus and returns the answer that belongs to the most similar question. Question matching is based on frequency analysis and on natural language processing applications, namely tagging and parsing. We work with the interviews with Václav Havel posted on his personal page.
|
|
Slovak Lemmatization
Lipták, Šimon ; Dytrych, Jaroslav (referee) ; Smrž, Pavel (advisor)
The aim of this bachelor's thesis was to become familiar with the tools and methods for morphological analysis and lemmatization of words, to design and implement a system for the lemmatization of Slovak words that are not in the dictionary and for generating their forms, and to process Slovak data for the implementation of stemming. Finally, the quality of the predictions was evaluated by testing and compared with available alternatives.
|
|
Web as a Source for Automatic Creation of Morphological Dictionary
Bulka, Pavol ; Matějka, Pavel (referee) ; Smrž, Pavel (advisor)
The formation of words in natural language is based on rules, which are generally complex. It is often very difficult or even impossible to describe them precisely in a formal way. That is why we use a morphological dictionary to process natural language. In this thesis we discuss the creation of a morphological dictionary from the web of the Slovak top-level domain. We cover web crawling, data processing for morphological analysis, and the data structures involved. The document also explains the basic principles and concepts of morphological analysis. The final system described in this thesis produces a morphological dictionary. This dictionary can be used in various applications, for example spell checking and machine translation.
|
|
Information Technologies in Psychology
Ličko, Jozef ; Grézl, František (referee) ; Smrž, Pavel (advisor)
We focus on recognizing an author's characteristic traits from his or her written text. In particular, this thesis deals with the implementation of the Kreitler psychosemantics method. The result of our work includes our own vocabulary, which is used to assign one of the parameters of the method. The implemented solution is successful when used on the set of words that served as the source for constructing the vocabulary.
|
|
Fairy tales by Siegfried Kapper
ČÁSLAVOVÁ, Kateřina
This bachelor's thesis analyses The Maritime Fairy Tales by Siegfried Kapper and compares them with folk fairy tales on the basis of Fairy Tale Morphology by V. J. Propp, with prominent representatives of Czech fairy tale writing of the period (Erben, Kulda and Němcová), and with the period's prestigious H. Ch. Andersen. It specifies their bibliography, deals with the period reception of the work, and tries to determine its benefit for a contemporary reader.
|
| |