National Repository of Grey Literature: 13 records found
Web Interface of a Document Analysis Tool
Ševčík, Adam ; Hynek, Jiří (referee) ; Burget, Radek (advisor)
This thesis deals with the development of a web interface for a web document analysis tool. The goal of the interface is to replace the old FitLayout desktop application by adding a new front end to the FitLayout back end. The application is written in JavaScript and HTML, and its visual part is built with CSS and the Bootstrap framework. Web documents are represented in RDF serialized as JSON-LD, and data are exchanged between client and server in the same format. The result enables users to analyse documents in a browser in the same way as in the desktop application.
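As a rough illustration of the exchange format mentioned in this abstract, the sketch below builds a small JSON-LD document describing one page box and round-trips it through JSON text, as a client and server might. The namespace `http://example.org/layout#`, the `Box` type, the property names, and both function names are hypothetical and are not taken from the thesis or from FitLayout.

```python
import json

# Hypothetical vocabulary: this namespace and the property names below are
# illustrative assumptions, not the actual FitLayout ontology.
EX = "http://example.org/layout#"


def box_to_jsonld(box_id, x, y, width, height, text):
    """Describe one rendered page box as a JSON-LD node (assumed schema)."""
    return {
        "@context": {"ex": EX},
        "@id": box_id,
        "@type": "ex:Box",
        "ex:x": x,
        "ex:y": y,
        "ex:width": width,
        "ex:height": height,
        "ex:text": text,
    }


def roundtrip(doc):
    """Serialize to JSON text and parse it back, as in a client/server exchange."""
    return json.loads(json.dumps(doc))


if __name__ == "__main__":
    doc = box_to_jsonld("http://example.org/page1#b1", 10, 20, 300, 40, "Hello")
    print(json.dumps(roundtrip(doc), indent=2))
```

Because JSON-LD is plain JSON with a few reserved keys such as `@context`, `@id`, and `@type`, the same payload can be handled by ordinary JSON tooling on both sides.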
Interface for a Document Analysis System with a JavaScript Client
Marcelyová, Andrea ; Rychlý, Marek (referee) ; Burget, Radek (advisor)
This thesis studies existing frameworks for creating client-side applications, the Java platform, and the FITLayout software. It then covers the design of a software architecture with a web interface, the implementation of a web application, and the testing and comparison of the desktop software with the web application. It first describes existing frameworks for developing client-side applications in JavaScript, the Java platform with a focus on creating the server component of a web application, and the FITLayout software for document segmentation and analysis. Next come the design of a software architecture that provides functionality similar to the existing graphical tools available in FITLayout and the important aspects of its implementation. The thesis closes with a description of how the web application was tested and a comparison of the desktop FITLayout software with the implemented web application.
Visual Pattern Detection in Web Pages
Kotraš, Martin ; Bartík, Vladimír (referee) ; Burget, Radek (advisor)
This work addresses information extraction from websites by searching for visual patterns (spatial relations between areas on the page and shared visual styles of those areas), extended with new techniques to improve the results. It uses a user-specified ontological data model that describes which data items are to be extracted from the given web page and what the individual items look like on the page, mainly from a textual point of view. As part of the work, a Java console application, VizGet, was created; it uses the FitLayout framework to obtain a visual model of the website. Testing the application on 7 different domains, including a list of the best movies, e-shop products, and weather forecasts, showed that about 75 % of subtests achieve an F-score above 85 %, more than 90 % of subtests achieve an F-score above 60 %, and 45 % of subtests achieve an F-score of 100 %. The VizGet application can thus be deployed for practical use in non-critical applications, and it remains open to further extensions and improvements.
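For readers unfamiliar with the metric reported in this abstract, here is a minimal sketch of how an F-score can be computed from extraction counts. It assumes the balanced F1 variant (the harmonic mean of precision and recall); the abstract does not state which variant the thesis uses, and the function name and parameters are illustrative.

```python
def f_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the balanced F-score (F1), assuming that is the variant meant.

    precision = TP / (TP + FP), recall = TP / (TP + FN),
    F1 = 2 * precision * recall / (precision + recall).
    """
    if true_positives == 0:
        # No correct extractions: precision or recall is zero (or undefined),
        # so the score is reported as 0 by convention.
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts for one subtest: 8 correct items, 2 spurious, 2 missed.
    print(f"{f_score(8, 2, 2):.2f}")
```

Under this definition, a subtest reaches 100 % only when every expected item is extracted and nothing spurious is returned.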
Web Document Annotator
Hrúz, Tomáš ; Hynek, Jiří (referee) ; Burget, Radek (advisor)
The goal of this bachelor thesis is to compare tools for annotating web documents and to create a module for the experimental tool FitLayout that implements the required annotation functionality. The solution was developed with the progressive framework Vue.js, drawing on knowledge gained from comparing existing annotation tools. The output of the work is a new stand-alone component and another component with added functionality in the FitLayout application. The result is a web application module for creating new areas in documents and an adjusted module for managing comments and tags.
Layout-based Data Extraction from Documents
Sedláček, Martin ; Bartík, Vladimír (referee) ; Burget, Radek (advisor)
This thesis deals with automated data extraction from medical reports in PDF format based on document layout analysis. The main content of the thesis is an introduction to data extraction, a comparison of existing tools and a presentation of the design and requirements of the developed tool, which will be based on the FitLayout application framework. The thesis then describes the actual implementation of the tool in Java and comments on the results achieved by the tool on real data.
Machine Learning Methods for Web Documents
Katrňák, Josef ; Bartík, Vladimír (referee) ; Burget, Radek (advisor)
This work aims to use machine learning techniques for the classification of specific parts of web page content. First, current methods for representing and classifying web page content using machine learning methods are described. For web page representation, the thesis focuses on the experimental tool FitLayout, whose visual representation of web pages serves as input for further processing and subsequent training of machine learning models. The work results in trained models that classify specific parts of the web page content. The model architecture is based on graph neural networks. For the experiments, a dataset of publicly available websites containing pages of products sold online is used. The advantage of the proposed and implemented approach is information extraction independent of the structure and language of a web page.
Vision-based Web Page Segmentation
Maštera, František ; Hynek, Jiří (referee) ; Burget, Radek (advisor)
The FitLayout library offers a suite of implemented web page segmentation algorithms along with a number of tools for their evaluation and further development. The goal of this thesis is to extend this suite with another existing algorithm. To this end, the Cormier et al. algorithm was chosen and integrated into FitLayout. The implementation was verified against the original publication. An extensive evaluation was also carried out to determine the algorithm's properties and behaviour under different circumstances; it revealed settings that improve the quality of the outputs on the tested data sample by up to 9.89 %. As a result of this thesis, the FitLayout library has been extended with a new web page segmentation algorithm that can be used in further research in this area, supported by the findings of this thesis.
Web Page Segmentation Methods
Grnáč, Martin ; Rychlý, Marek (referee) ; Burget, Radek (advisor)
This thesis focuses on segmentation methods. It discusses them at a theoretical level, describes their properties, advantages, and disadvantages. Within the scope of this work, the segmentation method Block-o-Matic was ultimately chosen and implemented within the FitLayout framework. After the implementation, it was evaluated and compared to its reference implementation.
Web Page Segmentation Methods
Grnáč, Martin ; Bartík, Vladimír (referee) ; Burget, Radek (advisor)
The aim of this work is to investigate segmentation algorithms, select an appropriate variant to implement in the FitLayout system, and then compare it with the reference segmentation algorithm already implemented in FitLayout. The thesis begins with an introduction to the problem of segmentation and a description of the FitLayout system. The next part describes and compares the segmentation algorithms that were suitable candidates for integration into FitLayout. The practical part describes the implementation and integration of the chosen algorithm and compares the two algorithms on the segmentation of different web pages.
