National Repository of Grey Literature: 64 records found (1 - 10)
Automatic detection and attribution of quotes
Ustinova, Evgeniya ; Hana, Jiří (advisor) ; Vidová Hladká, Barbora (referee)
Quotation extraction and attribution are important practical tasks for the media, but most existing solutions are monolingual. In this work, I present a complex machine learning-based system for the extraction and attribution of direct and indirect quotations, which is trained on English data and tested on Czech and Russian data. The Czech and Russian test datasets were manually annotated as part of this study. The system is compared against a rule-based baseline model. The baseline model demonstrates better precision in extracting quotation elements, but low recall. The machine learning-based model is better overall at extracting both separate elements of quotations and full quotations.
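The precision/recall trade-off described in this abstract can be illustrated with a minimal sketch. The quotation spans below are invented for illustration and are not data from the thesis, and exact span matching is an assumption, since the abstract does not state the matching criterion used in the evaluation.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of extracted items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical (start, end) character spans of quotations in one document.
gold_spans = {(0, 25), (40, 80), (95, 130), (150, 190)}
# A cautious extractor: few predictions, all of them correct.
rule_based = {(0, 25)}
# A more permissive extractor: more predictions, one of them wrong.
learned = {(0, 25), (40, 80), (95, 130), (200, 220)}

print(precision_recall(rule_based, gold_spans))
print(precision_recall(learned, gold_spans))
```

In this toy setting the cautious extractor has perfect precision but misses most quotations, while the permissive one gives up some precision for much higher recall.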
Analysis and visualization of OCR output
Nová, Kateřina ; Vidová Hladká, Barbora (advisor) ; Mírovský, Jiří (referee)
Optical Character Recognition (OCR) is the process of converting text in images into machine-readable text. We run three OCR systems (Tesseract, Ocrad and GOCR) on an original multilingual OCR dataset and perform statistical and linguistic analysis of the results in order to compare the tested systems and investigate typical OCR errors.
Measuring readability of technical texts
Kriukova, Anna ; Cinková, Silvie (advisor) ; Vidová Hladká, Barbora (referee)
Title: Measuring readability of technical texts Author: Anna Kriukova Faculty of Mathematics and Physics: Institute of Formal and Applied Linguistics Supervisor: Mgr. Cinková Silvie, Ph.D., Institute of Formal and Applied Linguistics Abstract: This research explores various approaches to measuring the readability of technical texts. The data I work with is provided by Hyperskill, an online educational platform dedicated mostly to Computer Science, where I did my internship. In the first part of my research, I examine classical readability formulas and try to find correlations between their values and the user statistics available for the texts. The results show no high correlations; thus, the standard formulas are not suitable for the task. The second part of the research is dedicated to experiments with machine learning algorithms. First, I use four sets of features to predict the average rating, completion time, and completion rate of a step. Then, I introduce a rule-based algorithm, which relies on students' comments, to split the texts into well-written and poorly written ones. However, a binary classifier trained on this division performs poorly and is not used in the final pipeline. The system suggested as the outcome of my work employs the user statistics' prediction for new texts and...
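As a rough illustration of what a classical readability formula computes, here is a minimal sketch of the Flesch Reading Ease score. The abstract does not name the formulas that were examined, so the choice of this formula is an assumption, and the syllable counter is a crude vowel-group heuristic rather than a linguistically accurate one. The example sentences are invented.

```python
import re


def count_syllables(word):
    """Crude heuristic: count groups of consecutive vowels (at least one)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


easy = "The cat sat on the mat. It was warm."
hard = "Polymorphic inheritance hierarchies necessitate comprehensive architectural documentation."
print(flesch_reading_ease(easy) > flesch_reading_ease(hard))
```

The score depends only on sentence length and syllable counts, which may be one reason such formulas correlate weakly with user statistics for technical texts.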
Prague Dependency Treebank as a Czech grammar practice book (Prague Dependency Treebank as an exercise book of Czech)
Kučera, Ondřej ; Vidová Hladká, Barbora (advisor) ; Panevová, Jarmila (referee)
The Prague Dependency Treebank (PDT) is one of the top language corpora in the world. The aim of this work is to introduce a software system that builds an exercise book of Czech using the PDT data. Two kinds of exercises are provided: morphology (selecting correct parts of speech and their morphological categories) and sentence parsing (selecting analytical functions and dependencies between them). The PDT data cannot be used directly, though, because of the differences between the academic approach to sentence parsing and the approach used in schools. Some of the sentences have to be discarded completely, and several transformations have to be applied to the others in order to convert the original representation to the form that students are used to from school.
Syntactically-based classification of Czech sentences
Kríž, Vincent ; Vidová Hladká, Barbora (advisor) ; Mírovský, Jiří (referee)
Classification of syntactically meaningful sentences is a very useful task for applications of natural language processing, for example machine translation, search engines and question answering systems. Theoretical linguistic research considers language to be a system of layers. In our project, the term 'to-be-meaningful' will be specified with respect to this point of view. Namely, the morphological and syntactic layers will be considered. A knowledge-based algorithm classifying a string of Czech words as either meaningful or meaningless will be proposed and implemented. Before being classified, strings will be pre-processed by external modules. Czech will be used as the object language.
Automatic combinations of feature templates
Dubovský, Jakub ; Novák, Václav (advisor) ; Vidová Hladká, Barbora (referee)
Searching for useful combinations of features and feature templates is not a simple task, yet combination is a valuable tool for increasing the accuracy of machine learning. This paper suggests an algorithm for the automatic search for useful combinations of categorical features and their templates. The use of simulated annealing and a modified genetic algorithm for the search process is studied. The construction of an evaluation function for assessing a categorical feature template is presented as well. Features and feature templates are combined separately and together. The best increase in accuracy reached by the suggested procedures on the datasets used is around 0.1 percentage points. Experiments were made on just two datasets; thus, further testing of the algorithm on other datasets is needed to verify its usefulness in general. However, the experiments indicate that it can be considered the basis of a usable algorithm. A simple command-line application is part of the work. It was developed and used for experimentation.
An iOS implementation of the Shannon switching game
Macík, Miroslav ; Vidová Hladká, Barbora (advisor) ; Brom, Cyril (referee)
The Shannon switching game is a logical game for two players. It is played on a graph with two marked nodes. The first player's goal is to connect these two nodes without being stopped by the other player; the second player wins by preventing the connection. The game was created by the American mathematician Claude Shannon. Independently, David Gale created essentially the same game, called Bridg-It or Gale. iOS is an operating system created by Apple Inc. It is designed for iPhone mobile phones, iPod music players and iPad tablets. Development for this operating system is based on the Objective-C programming language and the Cocoa Touch framework.
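The winning condition for the connecting player can be sketched as a reachability check on the graph. This is an illustrative Python sketch only; the thesis targets iOS with Objective-C, and the function name, the edge representation and the example graph are assumptions made here, not taken from the thesis.

```python
from collections import deque


def connected(claimed_edges, source, target):
    """Breadth-first search over the edges claimed by the connecting player."""
    neighbours = {}
    for a, b in claimed_edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical position: the connecting player has claimed A-B and B-C.
claimed = [("A", "B"), ("B", "C")]
print(connected(claimed, "A", "C"))
print(connected(claimed, "A", "D"))
```

The same check, run on the edges that the second player has not yet removed, would tell whether a connection is still possible at all.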
On the Possibility of ESP Data Use in Natural Language Processing
Knopp, Tomáš ; Vidová Hladká, Barbora (advisor) ; Pecina, Pavel (referee)
The aim of this bachelor thesis is to explore the image label database produced by the ESP game from the natural language processing (NLP) point of view. The ESP game is an online game in which human players do useful work: they label images. The output of the ESP game is a database of images and their labels. What interests us is whether the data collected in the process of labeling images will be of any use in NLP tasks. Specifically, we are interested in the tasks of automatic coreference resolution, extension of the lexical database WordNet, idiom detection, and collocation detection. In this bachelor thesis we deal with the first two of them: automatic coreference resolution and exploring the potential benefits for the lexical database WordNet.
Deep Automatic Analysis of English (Hloubková automatická analýza angličtiny)
Dušek, Ondřej ; Hajič, Jan (advisor) ; Vidová Hladká, Barbora (referee)
This thesis contains an account of our studies of deep or semantic analysis of English, particularly as described using predicate-argument structure description. Our main goal is to create a system for automatic inference of semantic relations between predicates and arguments - semantic role labeling. We developed a framework for parallel processing of our experiments, integrating third-party machine learning tools and implementing well-known as well as novel procedures. We investigated the current approaches to the problem and proposed several improvements, such as new classification features, separate handling of adverbial modifiers or special treatment for rare predicates. Based on our research, we designed and implemented our own semantic analysis system, consisting of predicate disambiguation and argument classification subtasks. We evaluated our solution using the CoNLL 2009 Shared Task English corpus.
Disambiguation of Czech Morphology Using Markov Models
Dufková, Kateřina ; Podveský, Petr (advisor) ; Vidová Hladká, Barbora (referee)
In my bachelor thesis I focus on the disambiguation of Czech morphology. This task is particularly important in natural language translation, where it is part of preprocessing the text to be translated in order to eliminate ambiguity in part of speech and other morphological categories. Such ambiguity would cause problems in subsequent phases of translation or an unacceptable growth in translation time. I chose a statistical approach to this problem, which, compared with other possible methods, is faster, more universal and able to select a word category in all cases. My application KDTagger, which I created as part of this bachelor thesis, is based on the theory of Hidden Markov Models. My aim was to create a program that is universal with respect to operating system and manner of use. KDTagger allows experts to adjust every important linguistic parameter while remaining comfortable to use for beginners. My work also includes extensive testing of KDTagger, which I performed on Czech newspaper texts from the Prague Dependency Treebank version 2.0. The program can, however, be applied to an arbitrary natural language without even the smallest change.
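To illustrate how a Hidden Markov Model can select one tag per word, here is a minimal sketch of Viterbi decoding for a bigram HMM. The abstract does not describe KDTagger's internals, so the decoding algorithm, the function signature, the tag set and all probabilities below are assumptions chosen for illustration, not the thesis implementation.

```python
def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most probable tag sequence for `words` under a bigram HMM.

    Unseen transitions and emissions get the small probability `floor`.
    """
    if not words:
        return []
    # best[i][t] = (probability of the best path ending in tag t at position i,
    #               previous tag on that path)
    best = [{t: (start_p.get(t, floor) * emit_p[t].get(words[0], floor), None)
             for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            prob, prev = max(
                (best[-1][p][0] * trans_p[p].get(t, floor), p) for p in tags
            )
            column[t] = (prob * emit_p[t].get(word, floor), prev)
        best.append(column)
    # Backtrack from the most probable final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return list(reversed(path))


# Toy model with invented probabilities (hypothetical, for illustration only).
tags = ["NOUN", "VERB"]
start_p = {"NOUN": 0.8, "VERB": 0.2}
trans_p = {"NOUN": {"NOUN": 0.3, "VERB": 0.7},
           "VERB": {"NOUN": 0.6, "VERB": 0.4}}
emit_p = {"NOUN": {"fish": 0.6, "swim": 0.1},
          "VERB": {"fish": 0.3, "swim": 0.5}}

print(viterbi(["fish", "swim"], tags, start_p, trans_p, emit_p))
```

For longer sentences, log-probabilities are usually summed instead of multiplying raw probabilities, to avoid numerical underflow.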
