National Repository of Grey Literature: 17 records found (records 11 - 17)
Language Identification of Text Document
Cakl, Jan ; Pešán, Jan (referee) ; Szőke, Igor (advisor)
The thesis deals with language identification of text documents. The final program includes three different methods of language identification. The first method is based on N-gram frequency statistics. The second uses Markov chains, and the last uses a simulated neural network. The result is implemented in Python.
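The N-gram frequency method mentioned in this abstract can be illustrated with a minimal sketch. It assumes ranked character-trigram profiles compared with an "out-of-place" distance, which is one common way to do this and not necessarily the variant used in the thesis. The function names, the profile size, and the training samples are hypothetical.

```python
from collections import Counter

def ngram_profile(text, n=3, top=300):
    """Map the `top` most frequent character n-grams to their rank (0 = most frequent)."""
    # Normalise whitespace and pad so that word boundaries produce n-grams too.
    padded = " " + " ".join(text.lower().split()) + " "
    counts = Counter(padded[i:i + n] for i in range(len(padded) - n + 1))
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts, key=lambda g: (-counts[g], g))
    return {gram: rank for rank, gram in enumerate(ranked[:top])}

def out_of_place(doc_profile, lang_profile, max_penalty=300):
    """Sum of rank differences; n-grams missing from the language get max_penalty."""
    return sum(
        abs(rank - lang_profile[gram]) if gram in lang_profile else max_penalty
        for gram, rank in doc_profile.items()
    )

def identify(text, profiles, n=3):
    """Return the language whose profile is closest to the text's profile."""
    doc = ngram_profile(text, n)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))

# Hypothetical, far too small training samples; real profiles need much more text.
profiles = {
    "en": ngram_profile("the quick brown fox jumps over the lazy dog and the cat "
                        "sat on the mat with the other animals"),
    "cs": ngram_profile("příliš žluťoučký kůň úpěl ďábelské ódy a pes běžel "
                        "přes louku k řece kde se napil vody"),
}
print(identify("the dog and the cat", profiles))
```

Short inputs yield few n-grams, so a distance of this kind becomes less reliable as the text gets shorter.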
ChatBot Based on Language Modelling
Plaga, Michal ; Szőke, Igor (referee) ; Skála, František (advisor)
The thesis deals with a chatbot based on language modelling. The main part of the thesis is the implementation of a chatbot on social networks. It also compares the chatbot with other existing chatbots and discusses the use of language modelling in chatbot applications.
Mining of Textual Data from the Web for Speech Recognition
Kubalík, Jakub ; Plchot, Oldřich (referee) ; Mikolov, Tomáš (advisor)
The initial goal of this project was to study language modelling for speech recognition and techniques for obtaining text data from the Web. The text introduces the basic techniques of speech recognition and describes in more detail language models based on statistical methods. In particular, the work deals with criteria for evaluating the quality of language models and speech recognition systems. The text further describes models and techniques of data mining, especially information retrieval. It then presents the problems associated with obtaining data from the Web and, in contrast, introduces the Google search engine. Part of the project was the design and implementation of a system for obtaining text from the Web, which is described in due detail. However, the main goal of the work was to verify whether data obtained from the Web can bring any benefit to speech recognition. The described techniques thus try to find the optimal way to use data obtained from the Web to improve both sample language models and models deployed in real recognition systems.
Modelling Prosodic Dynamics for Speaker Recognition
Jančík, Zdeněk ; Fapšo, Michal (referee) ; Matějka, Pavel (advisor)
Most current automatic speaker recognition systems extract speaker-dependent features by looking at short-term spectral information. This approach ignores long-term information. I explored an approach that uses the fundamental frequency and energy trajectories for each speaker. This approach models prosody dynamics on single phonemes or syllables. It is known from the literature that prosodic systems do not work as well as acoustic ones, but they improve the system when fused. I verified this assumption by fusing my results with a state-of-the-art acoustic system from BUT. Data from standard evaluation campaigns organized by the National Institute of Standards and Technology are used for all experiments.
Duplicate Text Identification
Pekař, Tomáš ; Kouřil, Jan (referee) ; Smrž, Pavel (advisor)
The aim of this work is to design and implement a system for duplicate text identification. The application should be able to index documents and to search for documents in the index. The work deals with document preprocessing, fragmentation, and indexing. It further analyzes methods for duplicate text identification, which are also linked to strategies for selecting substrings. The thesis includes a description of the basic data structures that can be used to index n-grams.
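As an illustration of indexing n-grams for duplicate detection, the following sketch builds an inverted index over word trigrams ("shingles") and ranks indexed documents by Jaccard similarity to a query text. This is an assumed, simplified design and not the system described in the thesis. The class name `ShingleIndex`, the whitespace tokenisation, and the threshold are hypothetical choices.

```python
from collections import defaultdict

def shingles(text, n=3):
    """Return the set of overlapping word n-grams of a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

class ShingleIndex:
    def __init__(self, n=3):
        self.n = n
        self.postings = defaultdict(set)  # shingle -> ids of documents containing it
        self.sizes = {}                   # document id -> number of distinct shingles

    def add(self, doc_id, text):
        grams = shingles(text, self.n)
        self.sizes[doc_id] = len(grams)
        for gram in grams:
            self.postings[gram].add(doc_id)

    def query(self, text, threshold=0.5):
        """Return (doc_id, jaccard) pairs at or above the threshold, best first."""
        grams = shingles(text, self.n)
        shared = defaultdict(int)
        for gram in grams:
            for doc_id in self.postings.get(gram, ()):
                shared[doc_id] += 1
        results = []
        for doc_id, common in shared.items():
            union = len(grams) + self.sizes[doc_id] - common
            score = common / union
            if score >= threshold:
                results.append((doc_id, score))
        return sorted(results, key=lambda pair: -pair[1])

index = ShingleIndex()
index.add("a", "the quick brown fox jumps over the lazy dog")
index.add("b", "completely unrelated sentence about language models here")
print(index.query("the quick brown fox jumps over the lazy cat"))
```

Only documents sharing at least one shingle with the query are scored, which is what makes the inverted index useful: the query never touches unrelated documents.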
Automatic Recognition of Encoding and Language
Hron, Michal ; Pinkas, Otakar (advisor) ; Pavlíčková, Jarmila (referee)
Processing simple or complex texts (MIME type - application) often requires automatic recognition of encoding and language. Some types of files or pages contain internal information about the encoding method. There might be some conflicts, however, e.g. between the HTTP header and the meta tag. Sometimes it may be useful to verify the accuracy of the file encoding even when the encoding is known. If the encoding cannot be identified from such information, it is necessary to use a method of automatic recognition of encoding and language. One such method is the n-gram method. It has been used many times to categorize texts in many programs and in various programming languages. Based on test results, it seems that the automatic recognition of Czech and other Slavic languages is less successful than the recognition of Western languages. Determining the reasons and searching for better solutions is therefore still beneficial. The length of the input text and the use of various languages in one document are important parameters. This thesis does not consider texts consisting of sentences in several different languages. In addition to a basic analysis of the topic, the thesis also includes a software solution to particular problems in the form of independent programs or plug-ins.
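The conflict between the HTTP header and the meta tag mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis software: the helper names and regular expressions are hypothetical, the patterns cover only common declaration forms, and a trial decode is used as a rough plausibility check.

```python
import codecs
import re

# Simplified patterns; real-world declarations can be more varied.
HEADER_RE = re.compile(r'charset\s*=\s*"?([\w.:-]+)', re.IGNORECASE)
META_RE = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)', re.IGNORECASE)

def normalize(name):
    """Return Python's canonical codec name, or None if the label is unknown."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None

def declared_encodings(content_type, body):
    """Extract the encodings declared in the Content-Type header and the meta tag."""
    header = HEADER_RE.search(content_type or "")
    meta = META_RE.search(body[:4096])  # assumption: declaration sits near the start
    header_enc = normalize(header.group(1)) if header else None
    meta_enc = normalize(meta.group(1).decode("ascii")) if meta else None
    return header_enc, meta_enc

def check_encoding(content_type, body):
    """Report whether the two declarations conflict and which ones decode cleanly."""
    header_enc, meta_enc = declared_encodings(content_type, body)
    conflict = None not in (header_enc, meta_enc) and header_enc != meta_enc
    decodable = {}
    for enc in {header_enc, meta_enc} - {None}:
        try:
            body.decode(enc)
            decodable[enc] = True
        except UnicodeDecodeError:
            decodable[enc] = False
    return conflict, decodable

page = '<html><head><meta charset="iso-8859-2"></head><body>příliš</body></html>'
body = page.encode("iso-8859-2")
print(check_encoding("text/html; charset=UTF-8", body))
```

A failed decode rules an encoding out, but a successful one proves little for single-byte encodings, where nearly every byte sequence decodes to something. That gap is where statistical methods such as n-grams become necessary.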
Automated Sentiment Analysis
Zeman, Matěj ; Kincl, Tomáš (advisor) ; Přibil, Jiří (referee)
The goal of my master's thesis is to describe automated sentiment analysis, its methods, and cross-domain problems, and to test an already existing model. I applied this model to data from the Czech-Slovak film database website CSFD.cz, the Czech e-shop MALL.cz, and Databazeknih.cz, one of the biggest Czech websites about books, to contribute to solving the cross-domain issue using n-grams and the analytic software RapidMiner.
