National Repository of Grey Literature: 17 records found
Language Identification of Text Document
Cakl, Jan ; Pešán, Jan (referee) ; Szőke, Igor (advisor)
The thesis deals with language identification of text documents. The final program implements three language identification methods. The first is based on N-gram frequency statistics, the second uses Markov chains, and the third uses a simulated neural network. The program is implemented in Python.
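As an illustration of the first method named in this abstract, the following is a minimal sketch of language identification by character N-gram frequency statistics. It is not the thesis author's implementation: the function names, the cosine similarity measure, and the toy training samples are assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def ngram_profile(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in lower-cased, whitespace-normalized text."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def identify(text: str, profiles: dict) -> str:
    """Return the language whose profile is most similar to the text's profile."""
    query = ngram_profile(text)
    return max(profiles, key=lambda lang: cosine(query, profiles[lang]))


# Hypothetical toy training samples; a real system would use large corpora.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the cat",
    "cs": "příliš žluťoučký kůň úpěl ďábelské ódy a pes běžel domů",
}
PROFILES = {lang: ngram_profile(sample) for lang, sample in SAMPLES.items()}

print(identify("the dog and the fox", PROFILES))
print(identify("kůň běžel domů", PROFILES))
```

With profiles this small the result is only indicative; accuracy depends heavily on the amount of training text per language.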
Plagiarism Identification
Menšík, Jakub ; Dytrych, Jaroslav (referee) ; Smrž, Pavel (advisor)
The aim of this work is to design and implement a system for plagiarism identification based on a huge dataset.
Duplicate Text Identification
Pekař, Tomáš ; Kouřil, Jan (referee) ; Smrž, Pavel (advisor)
The aim of this work is to design and implement a system for duplicate text identification. The application should be able to index documents and to search documents in the index. We deal with document preprocessing, fragmentation, and indexing. Furthermore, we analyze methods for duplicate text identification, which are linked to strategies for selecting substrings. The thesis also describes the basic data structures that can be used to index n-grams.
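The indexing and searching steps described in this abstract can be sketched as follows. This is a hedged example, not the thesis's design: it assumes word 3-grams as the selected substrings, a hash-based inverted index as the data structure, and Jaccard similarity as the duplicate score. The class and function names are hypothetical.

```python
from collections import defaultdict


def shingles(text: str, n: int = 3) -> set:
    """Return the set of word n-grams (as tuples) found in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


class NgramIndex:
    """Inverted index from word n-grams to the documents that contain them."""

    def __init__(self, n: int = 3) -> None:
        self.n = n
        self.postings = defaultdict(set)  # n-gram -> ids of documents containing it
        self.sizes = {}                   # document id -> number of distinct n-grams

    def add(self, doc_id: str, text: str) -> None:
        grams = shingles(text, self.n)
        self.sizes[doc_id] = len(grams)
        for gram in grams:
            self.postings[gram].add(doc_id)

    def search(self, text: str) -> dict:
        """Return the Jaccard similarity for every indexed document sharing an n-gram."""
        grams = shingles(text, self.n)
        shared = defaultdict(int)
        for gram in grams:
            for doc_id in self.postings.get(gram, ()):
                shared[doc_id] += 1
        return {
            doc_id: hits / (len(grams) + self.sizes[doc_id] - hits)
            for doc_id, hits in shared.items()
        }


index = NgramIndex()
index.add("a", "the cat sat on the mat")
index.add("b", "dogs bark loudly at night")
print(index.search("the cat sat on a rug"))
```

Documents that share no n-gram with the query are never visited, which is the main reason to index n-grams rather than compare every pair of documents.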
Mining of Textual Data from the Web for Speech Recognition
Kubalík, Jakub ; Plchot, Oldřich (referee) ; Mikolov, Tomáš (advisor)
The initial goal of this project was to study language modelling for speech recognition and techniques for obtaining textual data from the Web. The text presents the basic techniques of speech recognition and describes in more detail language models based on statistical methods. In particular, the work deals with criteria for evaluating the quality of language models and speech recognition systems. The text further describes data mining models and techniques, especially information retrieval. It then presents the problems associated with obtaining data from the Web and, in contrast, introduces the Google search engine. Part of the project was the design and implementation of a system for obtaining text from the Web, which is described in detail. However, the main goal of the work was to verify whether data obtained from the Web can bring any benefit to speech recognition. The described techniques therefore try to find the optimal way to use data obtained from the Web to improve sample language models as well as models deployed in real recognition systems.
ChatBot Based on Language Modelling
Plaga, Michal ; Szőke, Igor (referee) ; Skála, František (advisor)
The thesis deals with a chatbot based on language modeling. The main part of the thesis is the implementation of a chatbot on social networks. The thesis also compares the chatbot with other existing chatbots and covers the use of language modeling in a chatbot application.
Modelling Prosodic Dynamics for Speaker Recognition
Jančík, Zdeněk ; Fapšo, Michal (referee) ; Matějka, Pavel (advisor)
Most current automatic speaker recognition systems extract speaker-dependent features by looking at short-term spectral information. This approach ignores long-term information. I explored an approach that uses the fundamental frequency and energy trajectories for each speaker. This approach models prosodic dynamics on single phonemes or syllables. It is known from the literature that prosodic systems do not work as well as acoustic ones, but they improve the overall system when fused. I verified this assumption by fusing my results with a state-of-the-art acoustic system from BUT. Data from standard evaluation campaigns organized by the National Institute of Standards and Technology are used for all experiments.
Language Modelling for German
Tlustý, Marek ; Bojar, Ondřej (advisor) ; Hana, Jiří (referee)
The thesis deals with language modelling for German. The main concerns are the specifics of the German language that are troublesome for standard n-gram models. First, the statistical methods of language modelling are described and the relevant language phenomena of German are explained. The thesis then proposes its own variants of n-gram language models that aim to mitigate these problems. The models are trained using standard n-gram methods as well as the maximum entropy method with n-gram features. Both approaches are compared using the correlation between hand-evaluated sentence fluency and an automatic evaluation measure, perplexity. The computational requirements are also compared. Next, the thesis presents a set of its own features that represent the counts of grammatical errors for chosen phenomena. Their success rate is verified by their ability to predict the hand-evaluated fluency. Maximum entropy models are used, as well as custom models that classify using only the medians of phenomena values computed from training data.
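To make the perplexity measure mentioned in this abstract concrete, here is a minimal sketch of a bigram language model with per-sentence perplexity. It is an illustration under stated assumptions, not the models from the thesis: add-one smoothing, whitespace tokenization, and the two toy German training sentences are all choices made for this example.

```python
from collections import Counter
from math import exp, log


def train_bigram(sentences):
    """Collect history and bigram counts with sentence boundary markers."""
    histories, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        histories.update(tokens[:-1])            # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}
    return histories, bigrams, len(vocab)


def perplexity(sentence, model):
    """Perplexity of one sentence under the add-one smoothed bigram model."""
    histories, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    log_prob = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, word)] + 1) / (histories[prev] + vocab_size)
        log_prob += log(p)
    return exp(-log_prob / (len(tokens) - 1))


# Hypothetical toy corpus; real models are trained on far more text.
model = train_bigram(["der Hund schläft", "die Katze schläft"])
print(perplexity("der Hund schläft", model))   # word order seen in training
print(perplexity("schläft Hund der", model))   # scrambled order, higher perplexity
```

Lower perplexity means the model finds the sentence more predictable, which is why it can be compared against hand-evaluated fluency.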
Suggester implementation for the OpenGrok search engine
Hornáček, Adam ; Kotal, Vladimír (advisor) ; Kofroň, Jan (referee)
The suggester functionality is an important feature of modern search engines. The aim of the thesis is to implement it for the OpenGrok project. The OpenGrok search engine is based on Apache Lucene and supports its query syntax. The presented suggester implementation supports this query syntax and provides suggestions not only for prefixes but also for wildcards, regular expressions, and phrases. The implementation also takes into account the possibility of grouping queries: if one query is already specified and the user is typing another, the first query restricts the suggestions for the second. The promotion of specific suggestions is based on the underlying Lucene index data structure and on users' previous searches.
