National Repository of Grey Literature 2 records found  Search took 0.01 seconds. 
Processing of Turkic Languages
Ciddi, Sibel ; Zeman, Daniel (advisor) ; Hlaváčová, Jaroslava (referee)
Title: Processing of Turkic Languages Author: Sibel Ciddi Department: Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague Supervisor: RNDr. Daniel Zeman, Ph.D. Abstract: This thesis presents several methods for the morpholog- ical processing of Turkic languages, such as Turkish, which pose a specific set of challenges for natural language processing. In order to alleviate the problems with lack of large language resources, it makes the data sets used for morphological processing and expansion of lex- icons publicly available for further use by researchers. Data sparsity, caused by highly productive and agglutinative morphology in Turkish, imposes difficulties in processing of Turkish text, especially for meth- ods using purely statistical natural language processing. Therefore, we evaluated a publicly available rule-based morphological analyzer, TRmorph, based on finite state methods and technologies. In order to enhance the efficiency of this analyzer, we worked on expansion of lexicons, by employing heuristics-based methods for the extraction of named entities and multi-word expressions. Furthermore, as a prepro- cessing step, we introduced a dictionary-based recognition method for tokenization of multi-word expressions. This method complements...
Processing of Turkic Languages
Ciddi, Sibel ; Zeman, Daniel (advisor) ; Lopatková, Markéta (referee)
This thesis aims to present several combined methods for the morphological processing of Turkic languages, such as Turkish, which pose a specific set of challenges for computational processing, and also aims to make larger data sets publicly available. Because of the highly productive, agglutinative morphology in Turkish, data sparsity---besides the lack of the publicly available large data sets---impose difficulties in natural language processing, especially with regards to relying on purely statistical methods. Therefore, we evaluate a publicly available rule-based morphological analyzer, TRmorph, based on finite state transducers. In order to enhance the efficiency of this analyzer, and to expand its lexicon; we combine statistical and heuristics-based methods for the named entity processing (and construction of gazetteers), morphological disambiguation task and the multiword expression processing. Experiment results obtained so far point out that the use of heuristic-methods provides promising coverage increase for the text being processed by TRmorph, while the statistical approach is used as a back-up for more fine-grained tasks that may not be captured by pattern-based heuristics approach. This way, our proposed combined approach enhances the efficiency of a morphological analyzer based purely on FST...

Interested in being notified about new results for this query?
Subscribe to the RSS feed.