National Repository of Grey Literature: 76 records found, showing records 56 - 65
Information Extraction from Loosely Structured Text
Minárik, Matej ; Bartík, Vladimír (referee) ; Burget, Radek (advisor)
Today's Web 2.0 is a web of documents rather than a web of data. Documents are mostly unstructured or only partially structured, but search engines need data in structured form to provide better search results. The main goal of this work is the extraction of structured data from partially structured documents. We analyze information extraction methods, namely classification methods, which need annotated training data to build their internal model, and methods that need no training and are instead initialized with a few examples of the data to be extracted. We propose a method for extracting therapeutic indications and active substances from medical information sheets.
Extracting text data from the webpages
Troják, David ; Morský, Ondřej (referee) ; Červenec, Radek (advisor)
This work deals with text mining from web pages and gives an overview of available programs and their methods of text extraction. Part of the work is a program written in Java that obtains text data from specific web pages and saves it to an XML file.
Metadata Extraction from Scientific Papers
Lokaj, Tomáš ; Dytrych, Jaroslav (referee) ; Otrusina, Lubomír (advisor)
This work deals with metadata extraction from scientific papers. It describes the general problem of information extraction, focusing on the processing of text documents. It also presents the program clanky2meta.py, created by the author, which searches for relevant information in scientific publications. The work closes with a comparison of systems addressing the same problem, in particular the CiteSeerX system.
Support of Information Extraction from Structured Text
Kliment, Radek ; Petřík, Patrik (referee) ; Křivka, Zbyněk (advisor)
This bachelor's thesis deals with information extraction from structured text. The application converts text from supported formats into an XML representation that is used for queries, and then creates the corresponding output. The thesis describes the individual input file formats and how each is converted into XML. The main part explains the application's functionality and implementation and includes a short user manual.
Extraction of Relations among Named Entities Mentioned in Text
Voháňka, Ondřej ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
This bachelor's thesis deals with relation extraction. It explains the basic knowledge necessary for creating an extraction system, then describes the design, implementation and comparison of three systems that work differently. The following methods were used: regular expressions, NER, and a parser.
Information Extraction from Wikipedia
Krištof, Tomáš ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
This bachelor's thesis describes the problem of information extraction from unstructured text. The first part contains a summary of basic techniques used for information extraction. Thereafter, the concept and realization of a system for information extraction from Wikipedia are described. In the last part of the thesis, the results of experiments are analysed.
Encyclopedia Expert
Krč, Martin ; Schmidt, Marek (referee) ; Smrž, Pavel (advisor)
This project focuses on a system that answers questions formulated in natural language. Firstly, the report discusses problems associated with question answering systems and some commonly employed approaches. Emphasis is laid on shallow methods, which do not require many linguistic resources. The second part describes our work on a system that answers factoid questions, utilizing Czech Wikipedia as a source of information. Answer extraction is partly based on specific features of Wikipedia and partly on pre-defined patterns. Results show that for answering simple questions, the system provides significant improvements in comparison with a standard search engine.
Framework for Information Extraction from WWW
Brychta, Filip ; Bartík, Vladimír (referee) ; Burget, Radek (advisor)
The web has developed into the largest source of electronic documents, so it would be very useful to process this information automatically. This is, however, not a trivial problem. Most documents are written in HTML (Hypertext Markup Language), which does not support semantic description of the content. The goal of this work is to create a modular system for extracting information from HTML documents and processing it further. Further processing here means storing the information in an XML document or a relational database. The system's modularity makes it possible to use various information extraction and storage methods, so the system can be used for various tasks.
Methods for Information Extraction in Text Documents
Sychra, Tomáš ; Burget, Radek (referee) ; Bartík, Vladimír (advisor)
Knowledge discovery in text documents is part of data mining. However, text documents have different properties from regular databases. This project contains an overview of methods for knowledge discovery in text documents. The most common task in this area is document classification, and various approaches to text classification are described. Finally, I present the Winnow algorithm, which should perform better than any other algorithm for classification. The project includes a description of the Winnow implementation and an overview of experimental results.
Data Extraction from Product Descriptions
Sláma, Vojtěch ; Očenášek, Pavel (referee) ; Burget, Radek (advisor)
This work concentrates on the design and implementation of automated support for data extraction from product descriptions. The system will be used for e-shop purposes. The work introduces current approaches to information extraction from HTML documents. It focuses chiefly on wrappers and methods for their induction. The visual approach to information extraction is also mentioned. System requirements and basic principles are described in the design part of the work. Next, a path tracing algorithm in the document object model is described in detail. The last section of the work evaluates the results of experiments made with the implemented system.
