National Repository of Grey Literature: 59 records found
Distributed Tool for Extraction of Information from Network Flows
Sedlák, Michal ; Grégr, Matěj (referee) ; Žádník, Martin (advisor)
This work deals with the extraction of information from flow records produced by IPFIX-based network monitoring. The goal is to design a tool for querying the stored network flows created by the open-source collector IPFIXcol2. The tool is designed with the highest possible efficiency and performance in mind, which is achieved by using appropriate data structures, thread-level parallelization, and distribution of the work across multiple machines.
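As a rough illustration of the thread-level parallelization mentioned in the abstract, the following sketch filters a list of flow records across CPU cores; the FlowRecord type and its fields are assumptions made for the example and do not come from the thesis tool.

import java.util.List;
import java.util.stream.Collectors;

// Minimal sketch (not the thesis implementation): filtering flow records
// in parallel across CPU cores. The FlowRecord type and its fields are
// illustrative assumptions; real IPFIX records carry many more elements.
public class FlowQuerySketch {
    record FlowRecord(String srcIp, String dstIp, int dstPort, long bytes) {}

    public static void main(String[] args) {
        List<FlowRecord> flows = List.of(
            new FlowRecord("10.0.0.1", "10.0.0.2", 443, 12_345),
            new FlowRecord("10.0.0.3", "10.0.0.4", 53, 120),
            new FlowRecord("10.0.0.1", "10.0.0.5", 443, 98_765));

        // Thread-level parallelism via the common fork-join pool;
        // each worker evaluates the filter on its own chunk of records.
        List<FlowRecord> https = flows.parallelStream()
            .filter(f -> f.dstPort() == 443 && f.bytes() > 1_000)
            .collect(Collectors.toList());

        https.forEach(System.out::println);
    }
}
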
Extracting text data from the webpages
Troják, David ; Morský, Ondřej (referee) ; Červenec, Radek (advisor)
This work deals with text mining from web pages and gives an overview of available programs and their text extraction methods. Part of the work is a program written in Java that obtains text data from specified web pages and saves it into an XML file.
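A minimal sketch of the workflow the abstract describes (download a page, extract the text, save it as XML); the crude tag-stripping regex and the output file name are illustrative assumptions, not the thesis program.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch only: download a page, strip markup, store the text in XML.
// A real implementation would use a proper HTML parser instead of regexes.
public class PageTextToXml {
    public static void main(String[] args) throws Exception {
        String url = args.length > 0 ? args[0] : "https://example.org/";
        HttpClient client = HttpClient.newHttpClient();
        String html = client.send(
            HttpRequest.newBuilder(URI.create(url)).GET().build(),
            HttpResponse.BodyHandlers.ofString()).body();

        // Remove scripts/styles and tags, collapse whitespace.
        String text = html
            .replaceAll("(?is)<(script|style)[^>]*>.*?</\\1>", " ")
            .replaceAll("<[^>]+>", " ")
            .replaceAll("\\s+", " ")
            .trim();

        // Attribute escaping omitted for brevity.
        String xml = "<page url=\"" + url + "\"><text>"
            + text.replace("&", "&amp;").replace("<", "&lt;")
            + "</text></page>";
        Files.writeString(Path.of("page.xml"), xml);
    }
}
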
Extraction of Relations among Named Entities Mentioned in Text
Voháňka, Ondřej ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
This bachelor's thesis deals with relation extraction. It explains the basic knowledge necessary for creating an extraction system and then describes the design, implementation and comparison of three systems that work in different ways, using the following methods: regular expressions, NER and a parser.
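The simplest of the three approaches listed, extraction with regular expressions, could look roughly like the following sketch; the pattern, the relation name and the sample text are invented for illustration and are not taken from the thesis.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy regex-based relation extractor: matches "X was born in Y" and emits
// a (subject, relation, object) triple for each occurrence.
public class RegexRelationExtractor {
    private static final Pattern BORN_IN = Pattern.compile(
        "([A-Z][a-z]+(?: [A-Z][a-z]+)*) was born in ([A-Z][a-z]+(?: [A-Z][a-z]+)*)");

    public static void main(String[] args) {
        String text = "Antonin Dvorak was born in Nelahozeves. "
            + "Karel Capek was born in Male Svatonovice.";
        Matcher m = BORN_IN.matcher(text);
        while (m.find()) {
            System.out.println(m.group(1) + " -- bornIn --> " + m.group(2));
        }
    }
}
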
Framework for Information Extraction from WWW
Brychta, Filip ; Bartík, Vladimír (referee) ; Burget, Radek (advisor)
The web environment has developed into the largest source of electronic documents, so it would be very useful to process this information automatically. This is, however, not a trivial problem: most documents are written in HTML (Hypertext Markup Language), which does not support semantic description of the content. The goal of this work is to create a modular system for extracting information from HTML documents and for further processing of that information, i.e. storing it in an XML document or a relational database. The system's modularity makes it possible to use various extraction and storage methods, so the system can be used for a variety of tasks.
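A minimal sketch of the modular split described above, with a pluggable extraction module and a pluggable storage module; the interface names and the example modules are assumptions, not the thesis API.

import java.util.Map;

// Sketch of a modular pipeline: an extraction module produces name/value
// pairs and a storage module persists them. Modules can be swapped (e.g.
// a JDBC-backed Storage) without touching the rest of the pipeline.
public class ModularPipelineSketch {
    interface Extractor { Map<String, String> extract(String html); }
    interface Storage   { void store(Map<String, String> data); }

    public static void main(String[] args) {
        Extractor titleExtractor = html -> {
            var m = java.util.regex.Pattern.compile("(?i)<title>(.*?)</title>")
                     .matcher(html);
            return m.find() ? Map.of("title", m.group(1)) : Map.of();
        };
        Storage xmlStorage = data -> data.forEach(
            (k, v) -> System.out.println("<" + k + ">" + v + "</" + k + ">"));

        xmlStorage.store(titleExtractor.extract(
            "<html><title>Example page</title></html>"));
    }
}
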
Information Extraction from Wikipedia
Krištof, Tomáš ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
This bachelor's thesis describes the issue of information extraction from unstructured text. The first part contains a summary of the basic techniques used for information extraction. Thereafter, the concept and realization of a system for information extraction from Wikipedia are described. The last part of the thesis analyses the results of the experiments.
Support of Information Extraction from Structured Text
Kliment, Radek ; Petřík, Patrik (referee) ; Křivka, Zbyněk (advisor)
This bachelor's thesis deals with information extraction from structured text. The application converts text from the supported formats into an XML representation that is used for queries, from which the corresponding output is created. The thesis describes the particular input file formats and the way they are converted into XML. The essential part explains the application's functionality and implementation, including a short user manual.
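Once the input has been converted into XML, queries can be answered with standard XPath, as in the following sketch; the sample document and element names are assumptions made for illustration.

import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Sketch of the query step: a structured input (e.g. a CSV line such as
// "Smith;Prague;42") is assumed to have been converted to XML beforehand,
// and XPath is then used to answer queries over it.
public class XmlQuerySketch {
    public static void main(String[] args) throws Exception {
        String xml = "<records>"
            + "<record><name>Smith</name><city>Prague</city><age>42</age></record>"
            + "<record><name>Novak</name><city>Brno</city><age>35</age></record>"
            + "</records>";

        XPath xpath = XPathFactory.newInstance().newXPath();
        String result = xpath.evaluate(
            "/records/record[city='Prague']/name",
            new InputSource(new StringReader(xml)));
        System.out.println(result);  // prints: Smith
    }
}
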
Data Extraction from Product Descriptions
Sláma, Vojtěch ; Očenášek, Pavel (referee) ; Burget, Radek (advisor)
This work concentrates on the design and implementation of automated support for data extraction from product descriptions; the resulting system will be used for e-shop purposes. The work introduces present approaches to information extraction from HTML documents, focusing chiefly on wrappers and methods for their induction; the visual approach to information extraction is also mentioned. System requirements and basic principles are described in the design part of the work. Next, a path tracing algorithm over the document object model is described in detail. The last section evaluates the results of experiments made with the implemented system.
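The general idea behind tracing a path in the document object model can be sketched as follows: record the chain of element names from a data node up to the root, so that the same path can later locate the value in similarly structured pages. This is an illustration only, not the algorithm from the thesis.

import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Sketch: build a root-to-node path of element names for a data node.
public class DomPathSketch {
    public static void main(String[] args) throws Exception {
        String html = "<html><body><div><span>199 CZK</span></div></body></html>";
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(html)));

        Node node = doc.getElementsByTagName("span").item(0);
        StringBuilder path = new StringBuilder();
        // Walk from the data node up to the document root, collecting names.
        for (Node n = node; n != null && n.getNodeType() == Node.ELEMENT_NODE;
             n = n.getParentNode()) {
            path.insert(0, "/" + n.getNodeName());
        }
        System.out.println(path + " -> " + node.getTextContent());
        // prints: /html/body/div/span -> 199 CZK
    }
}
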
Information Extraction from Biomedical Texts
Knoth, Petr ; Burget, Radek (referee) ; Smrž, Pavel (advisor)
Recently, there has been much effort in making biomedical knowledge, typically stored in scientific articles, more accessible and interoperable. The unstructured nature of such texts, however, makes it difficult to apply knowledge discovery and inference techniques; annotating information units with semantic information is the first step towards making the knowledge machine-analyzable. In this work, we first study methods for automatic information extraction from natural language text. We then discuss the main benefits and disadvantages of state-of-the-art information extraction systems and, as a result, adopt a machine learning approach to automatically learn extraction patterns in our experiments. Machine learning techniques, however, often require a large amount of training data, which can be laborious to gather. To reduce this burden, we investigate weakly supervised (bootstrapping) techniques. Finally, we show in our experiments that our machine learning methods performed reasonably well and significantly better than the baseline. Moreover, in the weakly supervised learning task we were able to substantially reduce the amount of labeled data needed to train the extraction system.
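The bootstrapping idea mentioned in the abstract can be illustrated with a toy loop: seed examples yield textual patterns, and those patterns then harvest new examples from unlabeled text. The corpus, seeds and relation below are invented and drastically simplified compared with the thesis.

import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy bootstrapping sketch: induce patterns from seed pairs, then apply
// them to find new pairs in unlabeled text.
public class BootstrappingSketch {
    public static void main(String[] args) {
        String corpus = "Aspirin is used to treat headache. "
            + "Ibuprofen is used to treat fever. "
            + "Paracetamol helps against pain.";
        Set<String[]> seeds = new HashSet<>();
        seeds.add(new String[] {"Aspirin", "headache"});

        // 1) Induce patterns from the context between seed pairs.
        Set<String> patterns = new HashSet<>();
        for (String[] s : seeds) {
            Matcher m = Pattern.compile(
                Pattern.quote(s[0]) + "(.+?)" + Pattern.quote(s[1]))
                .matcher(corpus);
            if (m.find()) patterns.add(m.group(1));
        }

        // 2) Apply the patterns to harvest new (drug, condition) pairs.
        for (String p : patterns) {
            Matcher m = Pattern.compile("(\\w+)" + Pattern.quote(p) + "(\\w+)")
                .matcher(corpus);
            while (m.find()) {
                System.out.println(m.group(1) + " -> " + m.group(2));
            }
        }
        // A real system would score the new pairs, add them to the seed
        // set and repeat the loop.
    }
}
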
System for Web Data Source Integration
Kolečkář, David ; Bartík, Vladimír (referee) ; Burget, Radek (advisor)
The thesis aims at designing and implementing a web application for the integration of web data sources. For data integration, a method based on a domain model of the target information system was applied. The work describes the individual methods used for extracting information from web pages and the process of designing the system architecture, including a description of the chosen technologies and tools. The main part of the work is the implementation and testing of the final web application, written in Java and the Angular framework. The outcome is a web application that allows its users to define web data sources and save the extracted data in the target database.
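A rough sketch of what a user-defined data source might look like: a mapping from domain-model attributes to locations in the page, which the system fills and stores. The names and the selector syntax are assumptions; the real application uses a richer model and a database backend.

import java.util.Map;

// Sketch: a data source definition maps domain-model attributes to page
// selectors; extraction fills a domain entity from the downloaded page.
public class DataSourceDefinitionSketch {
    record SourceDefinition(String url, Map<String, String> attributeSelectors) {}
    record Product(String name, String price) {}

    static Product extract(SourceDefinition def, Map<String, String> pageValues) {
        // In the real system pageValues would come from evaluating the
        // selectors against the downloaded page.
        return new Product(
            pageValues.get(def.attributeSelectors().get("name")),
            pageValues.get(def.attributeSelectors().get("price")));
    }

    public static void main(String[] args) {
        SourceDefinition def = new SourceDefinition(
            "https://example-shop.test/item/1",
            Map.of("name", "h1.title", "price", "span.price"));
        Map<String, String> page = Map.of(
            "h1.title", "Coffee grinder", "span.price", "799 CZK");
        System.out.println(extract(def, page));
    }
}
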
Visual Pattern Detection in Web Pages
Kotraš, Martin ; Bartík, Vladimír (referee) ; Burget, Radek (advisor)
The work addresses the extraction of information from websites by searching for visual patterns (spatial relations between areas on the page and shared visual styles of those areas), extended with new techniques that improve the results. It uses a user-specified ontological data model describing which data items are to be extracted from a given web page and how the individual items look, mainly from a textual point of view. As part of the work, a console application called VizGet was created in Java, using the FitLayout framework to obtain a visual model of the website. Testing on 7 different domains, including a list of the best movies, e-shop products and weather forecasts, showed that the application reaches an F-score above 85 % in about 75 % of the subtests, above 60 % in more than 90 % of the subtests, and exactly 100 % in 45 % of the subtests. The VizGet application can thus be deployed for practical use in non-critical applications, while remaining open to further extensions and improvements.
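A toy sketch of the two kinds of cues the abstract mentions: a spatial relation between two page areas (here "directly right of") and a matching visual style. The Area type and the thresholds are invented for the example and do not reflect the FitLayout or VizGet APIs.

import java.awt.Rectangle;

// Sketch: detect a label/value pair via a spatial relation and shared style.
public class VisualPatternSketch {
    record Area(String text, Rectangle box, String fontFamily, float fontSize) {}

    static boolean rightOf(Area a, Area b) {
        // a starts to the right of b and the two areas overlap vertically.
        return a.box().x >= b.box().x + b.box().width
            && a.box().y < b.box().y + b.box().height
            && b.box().y < a.box().y + a.box().height;
    }

    static boolean sameStyle(Area a, Area b) {
        return a.fontFamily().equals(b.fontFamily())
            && Math.abs(a.fontSize() - b.fontSize()) < 0.5f;
    }

    public static void main(String[] args) {
        Area label = new Area("Price:", new Rectangle(100, 200, 60, 20), "Arial", 14f);
        Area value = new Area("799 CZK", new Rectangle(170, 200, 80, 20), "Arial", 14f);
        if (rightOf(value, label) && sameStyle(value, label)) {
            System.out.println("price = " + value.text());
        }
    }
}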
