National Repository of Grey Literature: 5 records found
Information Extraction from Web Pages
Bukovčák, Jakub ; Rychlý, Marek (referee) ; Burget, Radek (advisor)
This master's thesis focuses on current technologies for downloading web pages and extracting structured information from them. It describes the available tools that make this process possible and easier. Another part provides an overview of technologies that can be used to create web pages, along with information about developing information systems with a web user interface based on the Java Enterprise Edition (Java EE) platform. The main part describes the design and implementation of an application used to specify and manage extraction tasks. The last part describes testing of the application on real web pages and evaluates the results achieved.
Automated Web Page Analysis
Vaňků, Nikita ; Křivka, Zbyněk (referee) ; Burget, Radek (advisor)
The aim of this thesis is to create an application for analyzing World Wide Web pages. The application is implemented using the JavaFX framework together with the graph database OrientDB. It is capable of analyzing small to medium-sized web domains and reconstructing their structure.
Interactive web crawling and data extraction
Fejfar, Petr ; Ježek, Pavel (advisor) ; Nečaský, Martin (referee)
Title: Interactive crawling and data extraction. Author: Bc. Petr Fejfar. Author's e-mail address: pfejfar@gmail.com. Department: Department of Distributed and Dependable Systems. Supervisor: Mgr. Pavel Ježek, Ph.D., Department of Distributed and Dependable Systems. Abstract: The subject of this thesis is Web crawling and data extraction from Rich Internet Applications (RIAs). The thesis starts with an analysis of modern Web pages along with the techniques used for crawling and data extraction. Based on this analysis, we designed a tool that crawls RIAs according to instructions defined by the user via a graphical interface. In contrast with other currently popular tools for RIAs, our solution is targeted at users with no programming experience, including business and analyst users. The designed solution itself is implemented in the form of an RIA, using the WebDriver protocol to automate multiple browsers according to user-defined instructions. Our tool allows the user to inspect browser sessions by displaying the pages that are being crawled simultaneously. This feature enables the user to troubleshoot the crawlers. The outcome of this thesis is a fully designed and implemented tool enabling business users to extract data from RIAs. This opens new opportunities for this type of user to collect data from Web pages for use...
