Straňák, Pavel - Search Results - Digital Repository


	English grammar checker and corrector: the determiners Auersperger, Michal ; Pecina, Pavel (advisor) ; Straňák, Pavel (referee) Correction of articles in English texts is approached as an article generation task, i.e. each noun phrase is assigned a class corresponding to the definite, indefinite or zero article. Supervised machine learning methods are used to first replicate and then improve upon the best reported result in the literature known to the author. Through feature engineering and a different choice of learning method, an error reduction of about 34% is achieved. The resulting model is further compared to the performance of expert annotators. Although the comparison is not straightforward due to differences in the data, the results indicate that the performance of the trained model is comparable to human-level performance when measured on in-domain data. On the other hand, the model does not generalize well to different types of data. Using a large-scale language model to predict an article (or no article) for each word of the text has not proved successful. Detailed record
	Natural Language Correction Náplava, Jakub ; Straka, Milan (advisor) ; Straňák, Pavel (referee) The goal of this thesis is to explore the area of natural language correction and to design and implement neural network models for a range of tasks, from general grammar correction to the specific task of diacritization. The thesis opens with a description of existing approaches to natural language correction. Existing datasets are reviewed and two new datasets are introduced: a manually annotated dataset for grammatical error correction based on CzeSL (Czech as a Second Language) and an automatically created spelling correction dataset. The main part of the thesis then presents the design and implementation of three models, and evaluates them on several natural language correction datasets. In comparison to existing statistical systems, the proposed models learn all knowledge from training data; therefore, they do not require an error model or a candidate generation mechanism to be set manually, nor do they need any additional linguistic information such as part-of-speech tags. Our models significantly outperform existing systems on the diacritization task. Considering the spelling and basic grammar correction tasks for Czech, our models achieve the best results for two out of the three datasets. Finally, considering the general grammatical correction for English, our models achieve results which are... Detailed record
	Creating a Bilingual Dictionary using Wikipedia Ivanova, Angelina ; Zeman, Daniel (advisor) ; Straňák, Pavel (referee) Title: Creating a Bilingual Dictionary using Wikipedia Author: Angelina Ivanova Department/Institute: Institute of Formal and Applied Linguistics (32-ÚFAL) Supervisor of the master thesis: RNDr. Daniel Zeman Ph.D. Abstract: Machine-readable dictionaries play an important role in the research area of computational linguistics. They have gained popularity in such fields as machine translation and cross-language information extraction. In this thesis we investigate the quality and content of bilingual English-Russian dictionaries generated from the Wikipedia link structure. Wiki-dictionaries differ dramatically from traditional dictionaries: the recall of the basic terminology on Mueller's dictionary was 7.42%. Machine translation experiments with the Wiki-dictionary incorporated into the training set resulted in a rather small, but statistically significant drop in the quality of the translation compared to the experiment without the Wiki-dictionary. We supposed that the main reason was the domain difference between the dictionary and the corpus, and got some evidence that on the test set collected from Wikipedia articles the model with the incorporated dictionary performed better. In this work we show how big the difference between the dictionaries developed from the Wikipedia link structure and the traditional... Detailed record
	Voice command for a TV set Černý, Patrik ; Straňák, Pavel (advisor) ; Peterek, Nino (referee) Title: Voice command for a TV set Author: Patrik Černý Department: Institute of Formal and Applied Linguistics Supervisor: Mgr. Pavel Straňák, Ph.D. Abstract: The goal of this thesis is to create television voice control intended for people with speech and movement disorders. This is achieved by interconnecting a computer and a television. The voice control is based on the well-known dynamic time warping algorithm. It is shown that, due to large and frequent changes in sound intensity, voice control of a television is quite a complex task. The word recognition success rate of the final application is not very high, but it is sufficient for the purpose. Owing to the application design, the program can easily be extended with techniques that can improve recognition effectiveness. Keywords: voice control, word recognition, dynamic time warping, television Detailed record
	Today's news Jankovský, Petr ; Holan, Tomáš (advisor) ; Straňák, Pavel (referee) The project deals with the design and implementation of a program based on frequency analysis of text. The results should provide a quick overview of articles currently published in newspapers. The program downloads the current articles from newspaper web sites. For each defined section and each article, it can list the most frequent n-tuples of words. There is an option to define a dictionary of uninteresting (banned) words and a dictionary of phrases. The implementation solves some problems with downloading articles from servers with differing structures, such as problems with encoding and with distinguishing articles from advertisements. The work reveals that simple frequency analysis can bring interesting results. Detailed record
	Annotation of Multiword Expressions in the Prague Dependency Treebank Straňák, Pavel ; Hajič, Jan (advisor) ; Pala, Karel (referee) ; Pecina, Pavel (referee) This thesis explores annotation of multiword expressions in the Prague Dependency Treebank 2.0. We explain what we understand as multiword expressions (MWEs), review the state of PDT 2.0 with respect to MWEs and present our annotation. We describe the data format developed for the annotation, the annotation tool, and other software developed to allow for visualisation and searching of the data. We also present the annotation lexicon SemLex and an analysis of the annotation. Detailed record
	Pokročilý korektor češtiny [Advanced Czech Spell-Checker] Richter, Michal ; Straňák, Pavel (advisor) ; Žabokrtský, Zdeněk (referee) The aim of this work is to implement a Czech spell-checker using several language models and a lexical morphological analyser in order to offer proper correction suggestions and also to find real-word spelling errors (spelling errors that happen to be in the lexicon). The system should also be able to add diacritics to Czech text. Mac OS X was chosen as the target platform for the application. During the implementation, emphasis was put especially on memory-efficient representation of the above-mentioned statistical models. In the beginning, a gentle introduction to Hidden Markov Models, language models and the Viterbi algorithm is given. The actual system implementation and the training of the statistical models are discussed next. In the final part of the work, the achieved results are evaluated and discussed in depth. Detailed record
	N-gram language model for a Czech spellchecker Richter, Michal ; Bojar, Ondřej (referee) ; Straňák, Pavel (advisor) The aim of this thesis is to explore the possibilities of using n-gram language models for spellchecking Czech texts and to implement an extension to the spellchecker that is able to find misspelled words which are themselves valid Czech words. A further aim was to implement a simple web application presenting the extended spellchecker. The influence of lemmatization and morphological analysis of words on the hit rate of finding misspelled words was also examined. The methods of language modelling used in the thesis are described first. A description of the spellchecking procedure using language models follows. The next part shows how the data for language model training were obtained. The following part presents the evaluation of the language models created. The final part shows the results achieved for each spellchecking option. Detailed record
	Automatické čištění HTML dokumentů [Automatic Cleaning of HTML Documents] Marek, Michal ; Straňák, Pavel (referee) ; Pecina, Pavel (advisor) This paper describes a system for automatic cleaning of HTML documents, which was used for Charles University's participation in CLEANEVAL 2007. CLEANEVAL is a shared task and competitive evaluation of automatic systems for cleaning arbitrary web pages, with the goal of preparing web data for use as a corpus in the area of computational linguistics and natural language processing. We approach this task as a sequence-labeling problem; our experimental system is based on Conditional Random Fields and exploits, for each block of text, a set of features extracted from the textual content and HTML structure of the analyzed web pages. Detailed record
	Software for a Czech-Chinese and Chinese-Czech dictionary Hudeček, Jan ; Straňák, Pavel (referee) ; Homola, Petr (advisor) The Czech-Chinese and Chinese-Czech dictionary is an electronic dictionary which can be used by both a beginner and a seasoned translator. It allows searching in both directions and a full-text search for a given expression. Data access is hybrid - the program checks if it can access the database - if it fails it reads the data files. Moreover, users can change the data source at run-time. The program builds indexes on the data file, speeding up searches considerably. Indexes can be hash tables or binary trees. Asynchronous multithreaded I/O was implemented to enhance the comfort of the GUI. The .NET framework and MS SQL Server as a platform guarantee rapid development, deployment and scalability - for example, adding a web application to the project would be quite easy. At the same time, the design of the system allows for future improvements - for instance, editing the dictionary from the GUI. Detailed record
