National Repository of Grey Literature: 77 records found (records 46 - 55)
Identifying Entity Types and Attributes Across Languages
Švub, Daniel ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
The goal of this thesis is to analyze articles in the Wikipedia internet encyclopedia and to convert their natural-language text into a structured database of persons, places and other entities. The core of the implemented program determines the type of each entity from its typical characteristics and extracts the entity's most important attributes in the Czech and Slovak languages. The result is a knowledge base that allows simple searching and sorting of information. Because the program is easily extensible, identification of other entity types and other features can be added, as well as support for other languages.
Administration Interface of an Information Extraction System
Gongol, Jakub ; Bartík, Vladimír (referee) ; Burget, Radek (advisor)
This thesis covers information extraction from the Web. The main objective is the design and implementation of an administration interface for an extraction system, implemented as a web application on the Java platform. The application provides an editor for specifying extraction tasks in the form of interactive graphs. It can upload and process an ontology from a file and generate a graph according to selected ontology properties. The solution ensures integration with the FITLayout tool.
Extracting Information from Medical Texts
Zvára, Karel ; Svátek, Vojtěch (advisor) ; Veselý, Arnošt (referee) ; Skalská, Hana (referee)
The aim of my work was to identify the specific features of Czech medical reports with respect to the possibility of extracting specific information from them. For my work, I had a total of 268 anonymized narrative medical reports from two outpatient departments. I studied standards for preserving electronic health records and for transferring clinical information between healthcare information systems. I also participated in the process of implementing an electronic medical record in the field of dentistry. First, I tried to process narrative medical reports using natural language processing (NLP) tools. I came to the conclusion that narrative medical reports in the Czech language are very different from typical Czech text, especially because they mostly contain short telegraphic phrases and lack typical Czech sentence structure. They also contain many misspellings, acronyms and abbreviations. Another problem was the absence of Czech translations of the main international classification systems. Therefore I decided to continue the research by developing a method for pre-processing the input text for translation and its semantic annotation. The main objective of this part of the research was to propose a method and support software for interactive correction...
Web Page Segmentation Algorithms Based on Clustering
Lengál, Tomáš ; Bartík, Vladimír (referee) ; Burget, Radek (advisor)
This report deals with the segmentation of web pages, which is an important discipline of information extraction. In the first part, we describe several general ways to implement it. We then introduce the Box Clustering Segmentation method, which takes a slightly different approach to segmentation. In the second half, we describe the implementation of this method as part of the FITLayout framework and its final testing.
Knowledge Graph Extraction from Project Documentation
Helešic, Tomáš ; Nečaský, Martin (advisor) ; Kopecký, Michal (referee)
Title: Knowledge Graph Extraction from Project Documentation Author: Bc. Tomáš Helešic Department: Department of Software Engineering Supervisor: Mgr. Martin Nečaský, Ph.D. Abstract: The goal of this thesis is to explore the possibilities of automatic information extraction from company project documentation using machine natural language processing, and to analyze the precision of linguistic processing of these documents. It further suggests methods for acquiring key terms and the dependencies between them. From these terms and dependencies it creates knowledge graphs, which are stored in an appropriate database with a search engine. The work tries to interconnect existing technologies in the form of a simple application and to test their readiness for practical use. The goal is to inspire future research in this field, identify critical parts and propose improvements. The main contribution lies in the interconnection of natural language processing, methods of information extraction and semantic searching in corporate documents. The contribution of the practical part lies in the way it identifies key information that uniquely describes each document and uses it in search. Keywords: Knowledge graphs, Information extraction, Natural language processing, Resource Description Framework
Knowledge Graph Extraction from Project Documentation
Helešic, Tomáš ; Nečaský, Martin (advisor) ; Kruliš, Martin (referee)
Title: Knowledge Graph Extraction from Project Documentation Author: Bc. Tomáš Helešic Department: Department of Software Engineering Supervisor: Mgr. Martin Nečaský, Ph.D. Abstract: With new research progress in natural language processing and information extraction from text, a new possibility is emerging: automatic knowledge acquisition and its grouping into knowledge graphs, which capture the semantic relations between entities. Data storages and query languages for these knowledge graphs already exist, and they allow more precise and relevant search than current full-text search engines. The goal of this thesis is to explore the opportunity of automatic extraction of information from project documentation using linguistic text processing, to design a proper data storage and to build a search engine over it. Keywords: Knowledge graphs, Information extraction, Natural language processing, Resource Description Framework
Processing of Czech court decisions
Maslowski, Bohdan ; Vidová Hladká, Barbora (advisor) ; Nečaský, Martin (referee)
Title: Processing of Czech court decisions Author: Bohdan Maslowski Department: Institute of Formal and Applied Linguistics Supervisor: Mgr. Barbora Vidová Hladká, Ph.D. Abstract: The objective of this thesis is a comparison of various methods for processing Czech case-law documents. In particular, it addresses two tasks: extraction of information about parties (names, roles, addresses, etc.) and document classification by two criteria, subject and result. Machine learning methods are evaluated and compared with a rule-based approach. For the purpose of training and evaluating classifiers, a corpus of 400 Czech case-law documents has been created and manually annotated. The thesis includes a web application that demonstrates the results of the different approaches and a tool for running and evaluating testing scenarios. Keywords: natural language processing, information extraction, legislative domain, machine learning, rule-based systems
Methodology and problems of data transformation and determine its importance in the integration of heterogeneous information sources
Bartoš, Ivan ; Papík, Richard (advisor) ; Dvořák, Jan (referee) ; Bureš, Miroslav (referee)
Methodology and issues of data transformation and its information value estimation during the integration of heterogeneous information sources. PhDr. Ivan BARTOŠ. Abstract: This study focuses mainly on the issue of data and information transformation. This topic is currently critical in several scientific and commercial areas. Information value, information quality and the quality of the source data differ between systems. This is due not only to the different topologies of the information sources but also to differences in how the information describing an enterprise entity is understood and stored. Such information systems (in the scope of this thesis, database systems) can perform well as stand-alone systems. The issue appears at the moment when such heterogeneous systems have to be integrated and information has to be migrated between them. The thesis is logically divided into four major parts based on these issues. The first part describes methods that can be used to classify the data quality of the source system (the one to be integrated) from which the information can be extracted. Based on the assumption that project and system documentation is commonly lacking, the methods introduced here can be used for such qualification even when the...
Semantic annotations
Dědek, Jan ; Vojtáš, Peter (advisor) ; Maynard, Diana (referee) ; Železný, Filip (referee)
Four relatively separate topics are presented in the thesis. Each topic represents one particular aspect of the Information Extraction discipline. The first two topics focus on our information extraction methods based on deep language parsing. The first topic relates to how deep language parsing was used in our extraction method in combination with manually designed extraction rules. The second topic deals with a method for automated induction of extraction rules using Inductive Logic Programming. The third topic of the thesis combines information extraction with rule-based reasoning. The core of our extraction method was experimentally reimplemented using semantic web technologies, which allows the extraction rules to be saved in so-called shareable extraction ontologies that do not depend on the original extraction tool. The last topic of the thesis deals with document classification and fuzzy logic. We investigate the possibility of using information obtained by information extraction techniques for document classification. Our implementation of the so-called Fuzzy ILP Classifier was experimentally used for the purpose of document classification.
Consistency Checking of Relations Extracted from Text
Stejskal, Jakub ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
This bachelor thesis is dedicated to machine techniques used in natural language processing and information extraction from text. It covers the general methods, starting with the processing of raw text and continuing to the extraction of relations from the processed language constructs. It also presents options for using the obtained relational data, as can be seen, for example, in the DBpedia project. Another milestone of the thesis is the design and implementation of an automated system for extracting information about entities that do not have their own article in the English version of Wikipedia. The thesis also presents algorithms developed for the extraction of named entities, for verifying whether articles exist for the extracted entities, and finally for the actual extraction of information about individual entities, which can be used during information consistency checking. Finally, it presents the results and suggestions for further development of the created system.
