keywords:"Zpracování přirozeného jazyka" (Natural Language Processing) - Search Results - Digital Repository



	The Most Frequent Word n-Grams Holec, Matúš ; Szőke, Igor (referee) ; Smrž, Pavel (advisor) This thesis deals with the design and implementation of an efficient system for extracting word n-grams from text. The system is based on batch processing, so it can process large text corpora. The first part describes the principles of existing n-gram extraction methods. The next part describes the implemented system and how it is accelerated by parallelizing the batch processing. The last part compares the efficiency of the designed system with available implementations and compares the time complexity of the sequential and parallelized approaches.
	Processing Czech in Python Novotný, Zdeněk ; Schmidt, Marek (referee) ; Smrž, Pavel (advisor) This bachelor's thesis presents several approaches to processing the Czech language. The first part gives a general description of the NLTK system; some of the functions described later were inspired by NLTK functions. The thesis then describes functions that handle the inflection of various word classes in Czech. The next part focuses on processing Czech text, in which individual sentences and other parts are found and marked. The last part describes the possibility of applying transformation rules to each part of the text. The results of applying the rules can be displayed graphically.
	Czech-English Translation Petrželka, Jiří ; Schmidt, Marek (referee) ; Smrž, Pavel (advisor) This master's thesis describes the principles of statistical machine translation and demonstrates how to build the Moses statistical machine translation system. In the preparatory phase, freely available bilingual Czech-English corpora are surveyed. An empirical analysis of the time requirements of multithreaded word alignment tools shows that MGIZA++ can achieve up to a fivefold speedup and PGIZA++ up to an eightfold speedup (compared with GIZA++). Three methods of morphological pre-processing of the Czech training data are tested using simple non-factored models. While simple lemmatization can lower BLEU, more sophisticated approaches mostly raise it. The positive effects of morphological pre-processing fade as the corpus size grows. The relationship between other corpus characteristics (size, genre, additional data) and the resulting BLEU is measured empirically. The final system is trained on the CzEng 0.9 corpus and evaluated on a test sample from the WMT 2010 workshop.
	Similarity Search in Document Collections Jordanov, Dimitar Dimitrov ; Plchot, Oldřich (referee) ; Smrž, Pavel (advisor) The main goal of this thesis is to assess the performance of the freely distributed Semantic Vectors package and the MoreLikeThis class from the Apache Lucene package. The thesis compares these two approaches and introduces methods that may improve search quality.
	Data Mining in Social Networks Raška, Jiří ; Očenášek, Pavel (referee) ; Bartík, Vladimír (advisor) This thesis deals with knowledge discovery from social media, focusing on feature-based opinion mining from user reviews. The theoretical part describes methods of opinion mining and natural language processing. The main parts of the thesis are the design and implementation of a library for opinion mining based on the Stanford Parser and the WordNet lexicon. Dependency grammar was used for feature identification, implicit features were mined with the CoAR method, and opinions were classified with a supervised algorithm. Finally, experiments with the implemented library and examples of its usage are presented.
	Document Classification Marek, Tomáš ; Škoda, Petr (referee) ; Otrusina, Lubomír (advisor) This thesis deals with document classification, especially text classification methods. Its main goal is to analyze and describe two document classification algorithms of choice and to implement them. The chosen algorithms are the Bayes classifier and a classifier based on support vector machines (SVM), both of which were analyzed and implemented in the practical part of the thesis. Another main goal is to create and select optimal text features, those that best describe the input text and thus lead to the best classification results. The thesis concludes with a set of tests comparing the efficiency of the chosen classifiers under various conditions.
	Named Entity Recognition Rylko, Vojtěch ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor) This master's thesis describes the history and theoretical background of named-entity recognition and the implementation of a C++ system for named entity recognition and disambiguation. The system uses a local disambiguation method and statistics generated from the Wikilinks web dataset. Various experiments and tests are performed with the implemented system and with alternative implementations. These experiments show that the system is sufficiently successful and fast. The system participates in the Entity Recognition and Disambiguation Challenge 2014.
	Syntactic Analyzer for Czech Language Beneš, Vojtěch ; Otrusina, Lubomír (referee) ; Kouřil, Jan (advisor) This master's thesis describes the theoretical basics, solution design, and implementation of a constituency (phrasal) parser for the Czech language, based on grouping parts of speech into phrases. The program works with a manually built and annotated Czech sample corpus to generate a probabilistic context-free grammar through machine learning at runtime. The parser implementation, based on an extended CKY algorithm, then decides whether an input Czech sentence can be generated by the created grammar and, in positive cases, constructs the most probable derivation tree. This result is then compared with the expected parse to evaluate the parser's success rate.
	Extraction of Semantic Relations from Text Schmidt, Marek ; Burget, Radek (referee) ; Smrž, Pavel (advisor) Extraction of semantic relations from English text is the topic of this thesis. It focuses on exploiting a dependency parser. A method based on syntactic patterns is proposed and evaluated, along with several statistical methods over syntactic features. The methods are applied to the extraction of the hypernymy relation and evaluated against the WordNet thesaurus. A system for extracting semantic relations from text is designed and implemented based on these methods.
	Word Sense Disambiguation Kraus, Michal ; Glembek, Ondřej (referee) ; Smrž, Pavel (advisor) This master's thesis deals with word sense disambiguation for Czech. It informs the reader about the history of the task and introduces the algorithms used: the naive Bayes classifier, the AdaBoost classifier, the maximum entropy method, and decision trees. The methods used are clearly demonstrated. The following parts describe the data used, and the last part describes the results achieved. The thesis closes with some ideas for improving the system.
