National Repository of Grey Literature
Centralization and maintenance of distributed information
Valčák, Richard ; Jelínek, Mojmír (referee) ; Morávek, Patrik (advisor)
The master’s thesis deals with web mining, information sources, and methods for unattended access to these sources, and it summarizes the available methods and tools. Web data mining is a useful way to acquire required information for further processing. The work focuses on the design of a system that gathers required information from given sources. The thesis consists of three parts, all of which employ the developed library: an API intended for programmers, a server application that gathers information over time (exchange rates, for instance), and an example AWT application for processing tables available on the internet.
Machine Learning Methods for Web Documents
Katrňák, Josef ; Bartík, Vladimír (referee) ; Burget, Radek (advisor)
This work aims to use machine learning techniques for the classification of specific parts of web page content. First, current methods for representing and classifying web page content using machine learning methods are described. For web page representation, the thesis focuses on the experimental tool FitLayout, whose visual representation of web pages serves as input for further processing and subsequent training of machine learning models. The work results in trained models that classify specific parts of the web page content. The model architecture is based on graph neural networks. For the experiments, a dataset of publicly available websites containing pages of products sold online is used. The advantage of the proposed and implemented approach is information extraction independent of the structure and language of a web page.
Extracting Structured Data from Czech Web Using Extraction Ontologies
Pouzar, Aleš ; Svátek, Vojtěch (advisor) ; Labský, Martin (referee)
The presented thesis deals with the task of automatic information extraction from HTML documents for two selected domains. Laptop offers are extracted from e-shops, and freely published job offers are extracted from company sites. The extraction process outputs structured data of high granularity grouped into data records, in which a corresponding semantic label is assigned to each data item. The task was performed using the extraction system Ex, which combines two approaches: manually written rules and supervised machine learning algorithms. Thanks to expert knowledge encoded as extraction rules, the lack of training data could be overcome. The rules are independent of the specific formatting structure, so one extraction model can be used for a heterogeneous set of documents. The success achieved for laptop offers showed that an extraction ontology describing one or a few product types could be combined with wrapper induction methods to automatically extract offers of all product types on a web scale with minimum human effort.
