keywords:"n-gram" - Search Results - Digital Repository

	Detection of Unwanted Requests on the Web (Detekce nežádoucích požadavků na webu) Slovák, Michal ; Setinský, Jiří (referee) ; Hranický, Radek (advisor) This thesis deals with the development of a classifier that uses machine learning methods to detect unwanted requests to a web server. This approach requires creating an annotated dataset and analyzing the common features and characteristics of illegitimate requests that can be used to categorize them. The thesis also deals with the selection of an appropriate classification algorithm. The resulting model achieves a weighted F1 score of 99.95 % and is reliable and fast, making it suitable for practical deployment.
	Language Identification of Text Document Cakl, Jan ; Pešán, Jan (referee) ; Szőke, Igor (advisor) The thesis deals with language identification of a text document. The final program includes three different methods of language identification. The first method is based on n-gram frequency statistics. The second uses Markov chains, and the last uses a simulated neural network. The result is implemented in Python.
	Plagiarism Identification Menšík, Jakub ; Dytrych, Jaroslav (referee) ; Smrž, Pavel (advisor) The aim of this work is to design and implement a system for plagiarism identification based on a very large dataset.
	Duplicate Text Identification Pekař, Tomáš ; Kouřil, Jan (referee) ; Smrž, Pavel (advisor) The aim of this work is to design and implement a system for duplicate text identification. The application should be able to index documents and to search documents in the index. The work deals with document preprocessing, fragmentation, and indexing. It also analyzes methods for duplicate text identification, which are linked to strategies for selecting substrings. The thesis includes a description of the basic data structures that can be used to index n-grams.
	Mining of Textual Data from the Web for Speech Recognition Kubalík, Jakub ; Plchot, Oldřich (referee) ; Mikolov, Tomáš (advisor) The initial goal of this project was to study language modelling for speech recognition and techniques for obtaining textual data from the Web. The text introduces basic speech recognition techniques and describes in more detail language models based on statistical methods. In particular, the work deals with criteria for evaluating the quality of language models and speech recognition systems. The text further describes data mining models and techniques, especially information retrieval. It then presents the problems connected with obtaining data from the Web and, in contrast, introduces the Google search engine. Part of the project was the design and implementation of a system for obtaining text from the Web, which is described in due detail. However, the main goal of the work was to verify whether data obtained from the Web can bring any benefit to speech recognition. The described techniques thus try to find the optimal way to use data obtained from the Web to improve both sample language models and models deployed in real recognition systems.
	ChatBot Based on Language Modelling Plaga, Michal ; Szőke, Igor (referee) ; Skála, František (advisor) The thesis deals with a chatbot based on language modelling. The main part of the thesis is the implementation of a chatbot on social networks. The thesis also compares the chatbot with other existing chatbots and describes the use of language modelling in a chatbot application.
	Modelling Prosodic Dynamics for Speaker Recognition Jančík, Zdeněk ; Fapšo, Michal (referee) ; Matějka, Pavel (advisor) Most current automatic speaker recognition systems extract speaker-dependent features by looking at short-term spectral information. This approach ignores long-term information. I explored an approach that uses the fundamental frequency and energy trajectories of each speaker. This approach models prosodic dynamics over single phonemes or syllables. It is known from the literature that prosodic systems do not work as well as acoustic ones, but that they improve the overall system when the two are fused. I verified this assumption by fusing my results with a state-of-the-art acoustic system from BUT. Data from standard evaluation campaigns organized by the National Institute of Standards and Technology are used for all experiments.
	Language Modelling for German Tlustý, Marek ; Bojar, Ondřej (advisor) ; Hana, Jiří (referee) The thesis deals with language modelling for German. Its main concern is the specifics of the German language that are troublesome for standard n-gram models. First, the statistical methods of language modelling are described and the relevant language phenomena of German are explained. The thesis then proposes its own variants of n-gram language models that aim to mitigate these problems. The models are trained both with standard n-gram methods and with the maximum entropy method using n-gram features. The two approaches are compared by how well they correlate with hand-evaluated sentence fluency and by an automatic measure, perplexity. Their computational requirements are compared as well. Next, the thesis presents its own set of features that represent the counts of grammatical errors for selected phenomena. Their success rate is verified by their ability to predict the hand-evaluated fluency. For this purpose, maximum entropy models are used alongside the author's own models, which classify using only the medians of phenomena values computed from the training data.
	Suggester implementation for the OpenGrok search engine Hornáček, Adam ; Kotal, Vladimír (advisor) ; Kofroň, Jan (referee) The suggester functionality is an important feature of modern search engines. The aim of the thesis is to implement it for the OpenGrok project. The OpenGrok search engine is based on Apache Lucene and supports its query syntax. The presented suggester implementation supports this query syntax and provides suggestions not only for prefixes but also for wildcards, regular expressions, and phrases. The implementation also takes into account the possibility of grouping queries: if one query is already specified and the user is typing another, the first query restricts the suggestions for the second. The promotion of specific suggestions is based on the underlying Lucene index data structure and on users' previous searches.
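
Several of the records above rely on n-gram frequency statistics, for example the first method in "Language Identification of Text Document". As a rough illustration only, and not the method of any listed thesis, the idea can be sketched in Python as follows. The function names, the choice of character trigrams, the overlap score, and the two toy training sentences are all assumptions made for this example.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return a Counter of overlapping character n-grams in lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def overlap(a, b):
    """Count the n-gram occurrences shared by two profiles (multiset intersection)."""
    return sum((a & b).values())


def identify(text, profiles, n=3):
    """Return the label of the profile that shares the most n-grams with text."""
    sample = char_ngrams(text, n)
    return max(profiles, key=lambda label: overlap(sample, profiles[label]))


# Toy training data, invented for this example only (hypothetical profiles).
profiles = {
    "en": char_ngrams("the thesis deals with language identification of a text document"),
    "cs": char_ngrams("práce se zabývá identifikací jazyka textového dokumentu"),
}

print(identify("this text deals with a document", profiles))
print(identify("práce se zabývá jazykem", profiles))
```

A realistic system would build its profiles from much larger corpora and would normally use a normalized score, such as relative frequencies or rank distances, rather than a raw overlap count.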
