National Repository of Grey Literature: 65 records found (showing records 1 - 10)
Information Extraction from Wikipedia
Valušek, Ondřej ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
This thesis deals with the automatic extraction of entity types and their attributes from English Wikipedia articles. Several machine learning approaches are presented. In addition, important attributes are extracted, such as the date of birth in articles about people or the area in articles about lakes, among many others. Using the system presented in this thesis, one can generate a well-structured knowledge base from a file of Wikipedia articles (called a dump file) and a small training set containing a few correctly classified articles. Such a knowledge base can then be used for the semantic enrichment of text. During this process, a file of so-called definition words is generated. Definition words are features extracted by natural text analysis, and they could also be used in ways other than those described in this thesis. There is also a component that determines which articles were added, deleted or modified between the creation of two different knowledge bases.
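The comparison of two knowledge bases mentioned in the abstract can be illustrated with a minimal sketch. It assumes each knowledge base is a mapping from article title to a dictionary of attributes; the function name `diff_knowledge_bases`, the data layout and the toy entries are hypothetical and not taken from the thesis.

```python
def diff_knowledge_bases(old, new):
    """Return (added, deleted, modified) article titles.

    Assumption: each knowledge base is a dict {title: attributes_dict}.
    """
    old_titles, new_titles = set(old), set(new)
    added = sorted(new_titles - old_titles)
    deleted = sorted(old_titles - new_titles)
    # An article counts as modified if it exists in both and its attributes differ.
    modified = sorted(t for t in old_titles & new_titles if old[t] != new[t])
    return added, deleted, modified


# Toy data for illustration only.
old_kb = {
    "Example Person": {"type": "person", "date_of_birth": "1900-01-01"},
    "Example Lake": {"type": "lake", "area_km2": 10.0},
    "Old Article": {"type": "event"},
}
new_kb = {
    "Example Person": {"type": "person", "date_of_birth": "1900-01-01"},
    "Example Lake": {"type": "lake", "area_km2": 12.5},
    "New Article": {"type": "person"},
}

added, deleted, modified = diff_knowledge_bases(old_kb, new_kb)
print(added, deleted, modified)
```

Comparing whole attribute dictionaries keeps the sketch short; a real component would likely compare article revisions or hashes instead.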
Identifying Entity Types and Attributes Across Languages
Švub, Daniel ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
The aim of this thesis is to analyze articles in the Wikipedia internet encyclopedia and to convert their natural-language text into a structured database of persons, places and other entities. The core of the implemented program is the determination of the type of an entity based on its typical characteristics, and the extraction of the most important attributes of this entity in the Czech and Slovak languages. The result is a knowledge base that allows simple searching and sorting of information. Thanks to its easy extensibility, the program can be extended to identify other types of entities and other features, and to support other languages.
Keyword Suggestion in the Central Portal of Czech Libraries
Balaga, Róbert ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
This thesis deals with methods of keyphrase extraction from documents, focusing specifically on documents from the Central Portal of Czech Libraries. Several statistical, linguistic and graph-based methods have been implemented. A new method that combines the statistical and linguistic approaches is also proposed. The individual methods have been tested and analyzed using standard evaluation metrics, with the proposed method achieving a recall of 30 percent.
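As an illustration of what a statistical extraction method and a recall measurement can look like, here is a minimal sketch that ranks single words by TF-IDF and scores them against a gold keyword list. It is not the combined method proposed in the thesis; the function names, the whitespace tokenization and the toy documents are assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_keywords(docs, doc_index, top_k=3):
    """Rank the words of one document by TF-IDF (single words, not phrases)."""
    tokenized = [d.lower().split() for d in docs]
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each word once per document
    tokens = tokenized[doc_index]
    term_freq = Counter(tokens)
    scores = {
        term: (count / len(tokens)) * math.log(len(docs) / doc_freq[term])
        for term, count in term_freq.items()
    }
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


def recall(predicted, gold):
    """Fraction of gold keywords that appear among the predictions."""
    gold = set(gold)
    return len(gold & set(predicted)) / len(gold) if gold else 0.0


# Toy corpus for illustration only.
docs = [
    "library catalogue search library portal",
    "portal search engine index",
    "history of medieval towns",
]
keywords = tfidf_keywords(docs, doc_index=0, top_k=2)
print(keywords, recall(keywords, ["library", "portal"]))
```

Recall only counts how many gold keywords were found, so it is usually reported together with precision.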
Analysis of Social Media Content Discussing Czech Mobile Operators
Pavlů, Jan ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
The main topic of this thesis is sentiment analysis of posts obtained from social networks. The posts concern Czech mobile network operators. Data visualization is also an essential part of the implemented system. The sentiment analysis is performed using machine learning techniques. Downloaded posts are cleaned, lemmatized and transformed into feature vectors. The Stochastic Gradient Descent algorithm is used for classification. The analyzed data are visualized in charts and as a list of posts. The system also provides tools for text categorization. The accuracy, precision, recall and F1 score of the sentiment analysis are about 75%. The accuracy of post categorization is high (about 80%), but its precision, recall and F1 score are low (about 30%). For this reason, post categorization is not performed automatically. The benefit of the system is that it automatically collects data from different sources, analyzes them and displays them. It also provides tools for manually changing the sentiment and categories of posts, so that the system can be improved with the help of its users.
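A gap between high accuracy and low precision, recall and F1, as reported for post categorization, can arise when a category is rare. The abstract does not state the cause, so the following sketch only shows how such numbers can occur under an assumed class imbalance; the function `binary_metrics` and the toy labels are hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = in category)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


# Toy data: 20 posts, only 4 of which belong to the category.
y_true = [1] * 4 + [0] * 16
y_pred = [1, 0, 0, 0] + [1] + [0] * 15

accuracy, precision, recall, f1 = binary_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Here the classifier is right on most posts simply because most posts are outside the category, while it finds only one of the four posts that belong to it.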
Identifying Entity Types Based on Information Extraction from Wikipedia
Rusiňák, Petr ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
This paper presents a system for identifying the entity types of Wikipedia articles (e.g. people or sports events) that can be used to identify any arbitrary entity type. The input files for this system are a list of several pages that belong to the entity type and a list of several pages that do not. These lists are used to generate features, which in turn serve to produce the list of all pages belonging to the entity type. The features can be based both on structured information in Wikipedia, such as templates and categories, and on unstructured information found by analyzing the natural text of the first sentence of the article, where a defining noun that represents what the article is about is located. The system supports pages written in Czech and English and can be extended to support other languages.
Semantic Enrichment Component
Doležal, Jan ; Otrusina, Lubomír (referee) ; Dytrych, Jaroslav (advisor)
This master's thesis describes the Semantic Enrichment Component (SEC), which searches for entities (e.g., persons or places) in an input text document and returns information about them. The goals of this component are to create a single interface for named entity recognition tools, to enable parallel document processing, to save memory while using the knowledge base, and to speed up access to its content. To achieve these goals, the output format of the named entity recognition tools was specified, a tool for storing the preprocessed knowledge base in shared memory was implemented, and a client-server scheme was used to build the component.
Interfaces for Faceted Search in Indexed Wikipedia
Cilip, Peter ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
The main aim of this thesis is to study existing faceted search systems and to design a new system based on faceted search in an index of Wikipedia. The thesis first surveys existing faceted search solutions. The new system, which is the output of this thesis, was designed with the mistakes and failures of these solutions in mind. The system is described in terms of its design and implementation. The product of the thesis is an application interface and a graphical interface. The application interface can be integrated into an existing information system, where it can be used as a multidimensional filter. The graphical interface demonstrates how the application interface can be used in a real system. The system was created with a focus on usefulness and simplicity, for use in existing information systems.
Annotation Editor for Semantic Analysis of Text
Šťastná, Barbora ; Otrusina, Lubomír (referee) ; Dytrych, Jaroslav (advisor)
The thesis introduces the most important terms related to text annotation and presents several electronic tools for annotating electronic documents. It describes an annotation editor that provides the user with a graphical interface and allows web documents to be annotated. It also proposes modifications to the editor that would make it more intuitive, efficient, and user-friendly. The thesis continues with a description of the implementation of these modifications and their testing.
Word Sense Clustering
Hošták, Viliam Samuel ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
This thesis deals with the semantic similarity of words. It describes and compares existing models that are currently used for this purpose. It discusses the design and implementation of a system for corpus preprocessing, semantic modelling and retrieval of semantically related words. The resulting system supports the distributional semantic models Word2vec, FastText and GloVe.
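Retrieval of semantically related words from a distributional model is commonly done by comparing word vectors with cosine similarity. The sketch below assumes this approach and uses hand-made three-dimensional toy vectors; it does not load a real Word2vec, FastText or GloVe model, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(word, vectors, top_k=2):
    """Return the top_k (word, similarity) pairs closest to the given word."""
    target = vectors[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: -pair[1])
    return scored[:top_k]


# Toy vectors for illustration only; real models use hundreds of dimensions.
toy_vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.9, 0.15],
    "car": [0.1, 0.2, 0.95],
    "truck": [0.15, 0.1, 0.9],
}

print(most_similar("car", toy_vectors, top_k=1))
print(most_similar("king", toy_vectors, top_k=1))
```

With trained embeddings the same lookup would run over the whole vocabulary, usually with a vectorized or approximate nearest-neighbour search.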
Comparison of Annotation Tools
Prexta, Dávid ; Otrusina, Lubomír (referee) ; Dytrych, Jaroslav (advisor)
This work deals with the comparison of annotation tools on various data sets and with obtaining comparison results that are useful for improving the knowledge base of the annotators. The thesis analyzes existing solutions and their drawbacks, from which the proposal for a new solution is derived. The remaining sections deal with the design, implementation and testing of the resulting tool, which is evaluated in the conclusion, where possible future extensions are also suggested.
