National Repository of Grey Literature: 224 records found (records 213-222 shown)
Word Sense Disambiguation
Kraus, Michal ; Glembek, Ondřej (referee) ; Smrž, Pavel (advisor)
The master's thesis deals with word sense disambiguation of Czech words. The reader is introduced to the history of the task, and the algorithms used are presented: the naive Bayes classifier, the AdaBoost classifier, the maximum entropy method, and decision trees. The methods used are clearly demonstrated. The following parts of the thesis describe the data used. The last part of the thesis describes the results achieved, and the thesis concludes with some ideas for improving the system.
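As an illustration of the first classifier listed, here is a minimal sketch of a multinomial naive Bayes word sense classifier. It assumes that each training example is a bag of context words labeled with a sense. The class name `NaiveBayesWSD`, the Laplace smoothing parameter `alpha`, and the toy English "bank" data are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesWSD:
    """Multinomial naive Bayes over bag-of-context-words features (sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                    # Laplace (add-alpha) smoothing
        self.sense_counts = Counter()         # how often each sense was seen
        self.word_counts = defaultdict(Counter)  # sense -> context word counts
        self.vocab = set()

    def fit(self, examples):
        """examples: iterable of (context_words, sense) pairs."""
        for context, sense in examples:
            self.sense_counts[sense] += 1
            for w in context:
                self.word_counts[sense][w] += 1
                self.vocab.add(w)
        return self

    def predict(self, context):
        """Return the sense maximizing log P(sense) + sum log P(word | sense)."""
        total = sum(self.sense_counts.values())
        v = len(self.vocab)
        best_sense, best_score = None, -math.inf
        for sense, count in self.sense_counts.items():
            score = math.log(count / total)
            denom = sum(self.word_counts[sense].values()) + self.alpha * v
            for w in context:
                score += math.log((self.word_counts[sense][w] + self.alpha) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


# Hypothetical toy data for the ambiguous word "bank".
train = [
    (["money", "deposit", "loan"], "finance"),
    (["river", "water", "shore"], "river"),
    (["loan", "interest"], "finance"),
    (["fishing", "river"], "river"),
]
clf = NaiveBayesWSD().fit(train)
print(clf.predict(["deposit", "interest"]))
```

Working in log space avoids numerical underflow when contexts are long; the other classifiers named above (AdaBoost, maximum entropy, decision trees) would consume the same kind of context features.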
Information Extraction from Biomedical Texts
Knoth, Petr ; Burget, Radek (referee) ; Smrž, Pavel (advisor)
Recently, there has been much effort to make biomedical knowledge, typically stored in scientific articles, more accessible and interoperable. In fact, the unstructured nature of such texts makes it difficult to apply knowledge discovery and inference techniques. Annotating information units in these texts with semantic information is the first step towards making the knowledge machine-analyzable. In this work, we first study methods for automatic information extraction from natural language text. We then discuss the main benefits and drawbacks of state-of-the-art information extraction systems and, as a result, adopt a machine learning approach to automatically learn extraction patterns in our experiments. Unfortunately, machine learning techniques often require a large amount of training data, which can sometimes be laborious to gather. To address this problem, we investigate weakly supervised (bootstrapping) techniques. Finally, our experiments show that our machine learning methods performed reasonably well and significantly better than the baseline. Moreover, in the weakly supervised learning task we were able to substantially reduce the amount of labeled data needed to train the extraction system.
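To show what bootstrapping extraction patterns can look like, here is a minimal sketch in the generic DIPRE style. It is not the thesis's own method. A few seed argument pairs yield textual patterns (the words between the two arguments), and those patterns are then applied to harvest new pairs. The function name `bootstrap`, the single-token-argument assumption, and the toy drug/enzyme sentences are all hypothetical.

```python
import re


def bootstrap(sentences, seed_pairs, iterations=2):
    """DIPRE-style bootstrapping: seed pairs -> patterns -> new pairs (sketch)."""
    pairs = set(seed_pairs)
    patterns = set()
    for _ in range(iterations):
        # 1. Learn patterns: the text between the two arguments of a known pair.
        for s in sentences:
            for a, b in pairs:
                i, j = s.find(a), s.find(b)
                if i != -1 and j > i:
                    middle = s[i + len(a):j].strip()
                    if middle:
                        patterns.add(middle)
        # 2. Apply patterns to harvest new pairs (single-token arguments assumed).
        for s in sentences:
            for p in patterns:
                m = re.search(r"(\w+)\s+" + re.escape(p) + r"\s+(\w+)", s)
                if m:
                    pairs.add((m.group(1), m.group(2)))
    return pairs, patterns


# Hypothetical toy corpus; only one labeled pair is given as a seed.
sentences = [
    "aspirin inhibits COX1 in platelets.",
    "ibuprofen inhibits COX2 strongly.",
    "ibuprofen blocks COX2 too.",
    "naproxen blocks COX2 as well.",
]
pairs, patterns = bootstrap(sentences, [("aspirin", "COX1")])
print(sorted(pairs), sorted(patterns))
```

Starting from one seed, the loop learns "inhibits", uses it to find (ibuprofen, COX2), then learns "blocks" from that new pair and finds (naproxen, COX2). Practical systems also score patterns and pairs so that noisy patterns do not cause semantic drift. That step is omitted here.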
Word Sense Clustering
Haljuk, Petr ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
This Bachelor's thesis deals with the semantic similarity of words. It describes the design and implementation of a system that searches for the most similar words and measures the semantic similarity of words. The system uses the Word2Vec model from the Gensim library and learns the relations among words from the Common Crawl corpus.
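With Word2Vec, the similarity of two words is usually measured as the cosine similarity of their embedding vectors. Gensim's `most_similar` ranks neighbours this way. The sketch below shows only that ranking step in plain Python. The three-dimensional `vectors` dictionary is hypothetical toy data. A real system would use vectors trained on a corpus such as Common Crawl.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(word, vectors, topn=3):
    """Rank all other words by cosine similarity to `word`."""
    query = vectors[word]
    scores = [(w, cosine(query, v)) for w, v in vectors.items() if w != word]
    return sorted(scores, key=lambda x: x[1], reverse=True)[:topn]


# Hypothetical toy embeddings standing in for trained Word2Vec vectors.
vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.82, 0.15],
    "apple": [0.1, 0.2, 0.9],
    "pear": [0.12, 0.18, 0.88],
}
print(most_similar("king", vectors, topn=2))
```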
Automatic Link Detection in Parts of Audiovisual Documents
Sychra, Marek ; Černocký, Jan (referee) ; Szőke, Igor (advisor)
This paper deals with topic detection, specifically link detection (finding similarities among a group of short documents according to their topic) and story segmentation (finding the boundaries between topically different parts of a long document). The main motivation for the research was a practical application using presentation materials from lectures at FIT (linking parts of different lectures and courses). Link detection is solved by text and word analysis, which includes learning the meaning and importance of each word. Story segmentation builds on this when searching for boundaries. Both parts of the problem (link detection and story segmentation) achieved very good results when tested on a standard dataset (world news reports). When evaluated on lecture materials, the success rate was lower but still good.
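The abstract does not say how word importance is computed. One common choice is TF-IDF weighting, and the sketch below assumes it. Two documents are treated as linked when their TF-IDF cosine similarity exceeds a threshold. A boundary is placed between adjacent blocks whose similarity drops below another threshold, which is a simplified, TextTiling-like heuristic. The function names, both thresholds, and the toy token lists are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists -> list of {term: tf * idf} dicts."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(d).items()} for d in docs]


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def linked(a, b, vecs, threshold=0.2):
    """Link detection: are documents a and b on the same topic?"""
    return cosine(vecs[a], vecs[b]) >= threshold


def segment(blocks, threshold=0.1):
    """Story segmentation: indices where a new topical segment starts."""
    vecs = tfidf_vectors(blocks)
    return [i + 1 for i in range(len(blocks) - 1)
            if cosine(vecs[i], vecs[i + 1]) < threshold]


# Hypothetical toy documents: two about machine learning, two about politics.
docs = [
    "neural network training gradient".split(),
    "gradient descent network training".split(),
    "election vote parliament".split(),
    "parliament vote coalition".split(),
]
vecs = tfidf_vectors(docs)
print(linked(0, 1, vecs), linked(0, 2, vecs), segment(docs))
```

Real segmenters typically use depth scores over sliding windows instead of a fixed threshold. The fixed threshold here keeps the idea visible.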
Query Answering over Wikipedia for Mobile Devices on the Android Platform
Kováč, Andrej ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
This bachelor thesis deals with the development of a system for query answering over Wikipedia for mobile devices running the Android operating system. This technical report describes the theoretical background of the topic as well as the implementation of the server system and the client-side application. Part of the thesis is dedicated to testing the system, and the final part outlines the potential for future development.
Dialogue System for Human-Robot Communication
Birger, Mark ; Materna, Zdeněk (referee) ; Smrž, Pavel (advisor)
This thesis explores the problem area of spoken dialogue systems. A dialogue system framework was developed for the rapid implementation of spoken dialogue interfaces for existing robotics software. The framework allows a dialogue flow to be described in a special markup format, which supports manipulating scoped variables and controlling the flow of software written in a general-purpose programming language according to the user's input phrase. The markup language is designed for asynchronous function execution and subsequent manipulation of the running functions, which allows the robot to solve tasks simultaneously. The framework uses the Link Grammar Parser for natural language processing. Using this framework, a dialogue system instance for controlling the PR2 robot was implemented.
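The idea of phrase-triggered asynchronous actions that share scoped variables can be sketched with Python's `asyncio`. The thesis's markup format is not reproduced here. The `DialogueFlow` class, its `on`/`hear`/`cancel` methods, and the robot actions are hypothetical. Phrases are matched as exact strings instead of being parsed with the Link Grammar Parser.

```python
import asyncio


class DialogueFlow:
    """Maps user phrases to asynchronous robot actions (hypothetical sketch)."""

    def __init__(self):
        self.scope = {}      # dialogue-scoped variables shared by all actions
        self.handlers = {}   # phrase -> coroutine function
        self.tasks = {}      # phrase -> running asyncio.Task

    def on(self, phrase):
        """Decorator registering a coroutine as the handler for a phrase."""
        def register(func):
            self.handlers[phrase] = func
            return func
        return register

    def hear(self, phrase):
        """Start the matching action without waiting for it to finish."""
        handler = self.handlers.get(phrase)
        if handler is None:
            return False
        self.tasks[phrase] = asyncio.create_task(handler(self.scope))
        return True

    def cancel(self, phrase):
        """Stop a running action, e.g. when the user says "stop"."""
        task = self.tasks.pop(phrase, None)
        if task:
            task.cancel()


flow = DialogueFlow()


@flow.on("move arm")
async def move_arm(scope):
    await asyncio.sleep(0.01)  # stand-in for a real robot command
    scope["arm"] = "moved"


@flow.on("look around")
async def look_around(scope):
    await asyncio.sleep(0.01)
    scope["head"] = "scanned"


async def main():
    flow.hear("move arm")
    flow.hear("look around")  # runs concurrently with the arm action
    await asyncio.gather(*flow.tasks.values())
    return flow.scope


result = asyncio.run(main())
print(result)
```

Because `hear` only schedules a task, the dialogue loop stays responsive while several robot actions run at once. This is the property the abstract attributes to the markup language.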
Word Sense Clustering
Jadrníček, Zbyněk ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
This thesis focuses on the problem of the semantic similarity of words in the English language. First, the reader is introduced to the theory of word sense clustering; then the selected methods and tools related to the topic are described. In the practical part, we design and implement a system for determining semantic similarity using the Word2Vec tool, focusing in particular on biomedical texts from the MEDLINE database. At the end of the thesis, we discuss the results achieved and give some ideas for improving the system.
Extraction of unspecified relations from the web
Ovečka, Marek ; Svátek, Vojtěch (advisor) ; Labský, Martin (referee)
The subject of this thesis is non-specific knowledge extraction from the web. In recent years, tools that improve the results of this type of knowledge extraction have been created. The aim of this thesis is to become familiar with these tools, test them, and propose ways to use their results. The thesis describes and compares these tools, and extraction is carried out using OLLIE. Based on the results of the extractions, two methods of enriching extractions using named entity recognition are proposed. The first method modifies the weights of extractions, and the second enriches extractions with named entities. The thesis also proposes an ontology that captures the structure of enriched extractions. In the last part, a practical experiment demonstrates the proposed methods. Future research in this field would be useful in the areas of extracting and categorizing relational phrases.
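The two enrichment methods can be pictured with a small sketch. The `Extraction` record, the fixed `boost` added per recognized argument, and the dictionary-based `ner` lookup are hypothetical simplifications. They do not reproduce OLLIE's output format or the thesis's actual weighting scheme. Method 1 raises an extraction's weight when its arguments are named entities, and method 2 attaches the entity types to the extraction.

```python
from dataclasses import dataclass, field


@dataclass
class Extraction:
    arg1: str
    rel: str
    arg2: str
    confidence: float
    entities: dict = field(default_factory=dict)  # argument -> entity type


def enrich(extractions, ner, boost=0.1):
    """ner: hypothetical lookup mapping a surface string to an entity type."""
    enriched = []
    for ex in extractions:
        entities = {arg: ner[arg] for arg in (ex.arg1, ex.arg2) if arg in ner}
        # Method 1: raise the weight for each argument recognized as an entity.
        conf = min(1.0, ex.confidence + boost * len(entities))
        # Method 2: keep the entity types alongside the extraction.
        enriched.append(Extraction(ex.arg1, ex.rel, ex.arg2, conf, entities))
    return enriched


# Hypothetical extractions and entity dictionary.
ner = {"Prague": "LOCATION", "Charles University": "ORGANIZATION"}
extractions = [
    Extraction("Charles University", "is located in", "Prague", 0.7),
    Extraction("it", "has", "many students", 0.6),
]
result = enrich(extractions, ner)
for ex in result:
    print(ex)
```

The enriched records map directly onto an ontology with classes for the relation, its arguments, and the argument entity types.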
Eight ICT trends that will change libraries
Černý, Michal
The information society and rapidly developing ICT are transforming everything around us, from education and transport to, for example, libraries. The paper presents eight technologies that, within ten years, will begin to change libraries almost beyond recognition: the Internet of Things; big data; public multimedia displays; enterprise social networks; the cloud; new mobile networks; natural language processing; and semantic technologies. What will this change mean for libraries? How will their position in the information society change?
Entity Classification Using Wikipedia and WordNet
Kliegr, Tomáš ; Rauch, Jan (advisor) ; Berka, Petr (referee) ; Smrž, Pavel (referee) ; Žabokrtský, Zdeněk (referee)
This dissertation addresses the problem of classifying entities in text represented by noun phrases. The goal of this thesis is to develop a method for automated classification of entities appearing in datasets consisting of short textual fragments. The emphasis is on unsupervised and semi-supervised methods that allow fine-grained target classes and require no labeled instances for training. The set of target classes is either user-defined or determined automatically. Our initial attempt to address the entity classification problem is the Semantic Concept Mapping (SCM) algorithm. SCM maps the noun phrases representing the entities, as well as the target classes, to WordNet. Graph-based WordNet similarity measures are used to assign the closest class to the noun phrase. If a noun phrase does not match any WordNet concept, the Targeted Hypernym Discovery (THD) algorithm is executed. The THD algorithm uses lexico-syntactic patterns to extract a hypernym from the Wikipedia article defining the noun phrase. This hypernym is then used to map the noun phrase to a WordNet synset, but it can also be regarded as the classification result by itself, resulting in an unsupervised classification system. The SCM and THD algorithms were designed for English. While adapting these algorithms to other languages is conceivable, we decided to develop the Bag of Articles (BOA) algorithm, which is language-agnostic because it is based on the statistical Rocchio classifier. Since this algorithm uses Wikipedia as a source of data for classification, it does not require any labeled training instances. WordNet is used in a novel way to compute term weights. It is also used as a positive term list and for lemmatization. A disambiguation algorithm utilizing global context is also proposed. We consider the BOA algorithm to be the main contribution of this dissertation.
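BOA is described as building on the Rocchio classifier. The core of Rocchio classification is a prototype (centroid) vector per class, with a new document assigned to the class whose prototype is closest in cosine similarity. The sketch below shows only that core, assuming plain term-frequency vectors. It has no WordNet-based term weights, no lemmatization, and no negative-example term of the full Rocchio formula. The toy "class articles" stand in for Wikipedia texts and are hypothetical.

```python
import math
from collections import Counter


def normalize(vec):
    """Scale a sparse vector to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def centroid(docs):
    """Rocchio prototype: mean of the length-normalized document vectors."""
    total = Counter()
    for d in docs:
        for t, w in normalize(Counter(d)).items():
            total[t] += w
    return {t: w / len(docs) for t, w in total.items()}


def classify(doc, prototypes):
    """Assign `doc` to the class whose prototype has the highest cosine score."""
    q = normalize(Counter(doc))

    def score(proto):
        p = normalize(proto)
        return sum(w * p.get(t, 0.0) for t, w in q.items())

    return max(prototypes, key=lambda c: score(prototypes[c]))


# Hypothetical token lists standing in for Wikipedia articles about each class.
class_articles = {
    "castle": [["fortress", "walls", "king", "medieval"], ["tower", "walls", "medieval"]],
    "lake": [["water", "shore", "fish"], ["water", "swimming", "shore"]],
}
prototypes = {c: centroid(docs) for c, docs in class_articles.items()}
print(classify(["medieval", "walls", "tourists"], prototypes))
```

Because the prototypes come from unlabeled article text rather than annotated examples, the approach needs no labeled training instances, which matches the motivation given for BOA.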
The proposed algorithms are evaluated experimentally on the WordSim353 dataset, which is used for evaluation in the Word Similarity Computation (WSC) task, and on the Czech Traveler dataset, which was designed specifically for the purpose of our research. On WordSim353, BOA achieves a Spearman correlation of 0.72 with human judgment. This is close to the 0.75 correlation of the ESA algorithm, which, to the author's knowledge, is the best-performing algorithm on this gold-standard dataset among those that do not require training data. The advantage of BOA over ESA is that it requires less preprocessing of the Wikipedia data. While SCM underperforms on the WordSim353 dataset, it outperforms BOA on the Czech Traveler dataset, which was designed specifically for our entity classification problem. This discrepancy requires further investigation. In a standalone evaluation of THD on the Czech Traveler dataset, the algorithm returned a correct hypernym for 62% of entities.
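The WordSim353 figures above are Spearman rank correlations between system similarity scores and human judgments. Spearman's coefficient is the Pearson correlation of the two rank vectors, with tied values sharing their average rank. A self-contained sketch follows. The `human` and `system` lists are made-up numbers for illustration, not values from WordSim353 or from the dissertation.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up human ratings and system scores for four word pairs.
human = [7.0, 8.5, 1.5, 3.0]
system = [0.6, 0.9, 0.1, 0.2]
print(spearman(human, system))
```

Only the ordering matters. A system whose scores rank the word pairs exactly as humans do reaches a correlation of 1, even if its raw scores are on a different scale.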
