National Repository of Grey Literature
Web Browser Automation
Bastl, Vojtěch ; Polčák, Libor (referee) ; Burget, Radek (advisor)
This work deals with web browser automation, that is, with tools that allow programmatic control of a web browser. First, it discusses existing solutions, focusing on the tools of the Selenium Suite family and on PhantomJS. It then describes the internal representation of web pages in the Gecko and WebKit browser engines and the browser application interface available for client-side scripting, together with the relevant standards. The core part of the thesis is dedicated to the design and implementation of a tool that controls a browser through Selenium WebDriver and extracts data about the target web page. The work presents the internal architecture, configuration files and application interface of the designed tool. It also covers the extraction of detailed data about the page and the transformation of that data into a unified structured description. Finally, the unit tests and the tests performed on real web pages are described.
Web Page Archiving Tools
Kvačkaj, Matúš ; Rychlý, Marek (referee) ; Burget, Radek (advisor)
This bachelor thesis deals with the archiving and reproduction of web pages. The aim was to provide a tool that, given a URL and parameters, creates an archive of the page in WARC format and also generates a textual description of the page, suitable for further processing and analysis. The tool also supports the reverse process: replaying a page from a WARC archive and generating its textual description. The tool was designed to be applied to an existing dataset as part of bulk data processing. The Webis-Web-Archive-17 dataset was used, which contains approximately 10,000 WARC archives collected since 2017. To ensure maximum portability of the tool, Docker containerization was used.
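For readers unfamiliar with the WARC format mentioned in this abstract, the following is a minimal sketch of what one WARC/1.0 record looks like on disk. It is not the thesis's implementation: the helper names `build_warc_record` and `parse_warc_record` are hypothetical, the record type `resource` is chosen only for simplicity, and a real archiving tool would more likely rely on a dedicated WARC library.

```python
import uuid
from datetime import datetime, timezone


def build_warc_record(target_uri, payload, content_type="text/html"):
    """Serialize one WARC/1.0 'resource' record (hypothetical helper)."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    header_lines = [
        "WARC/1.0",                                   # version line
        "WARC-Type: resource",                        # assumed record type
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {now}",
        f"WARC-Target-URI: {target_uri}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",            # length of the block in bytes
    ]
    head = "\r\n".join(header_lines).encode("utf-8") + b"\r\n\r\n"
    # A record ends with two CRLF pairs after the content block.
    return head + payload + b"\r\n\r\n"


def parse_warc_record(record):
    """Split a single record into (header fields, content block)."""
    head, _, rest = record.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split("\r\n")
    fields = {}
    for line in lines[1:]:                            # skip the version line
        name, _, value = line.partition(":")
        fields[name.strip()] = value.strip()
    length = int(fields["Content-Length"])
    return fields, rest[:length]


if __name__ == "__main__":
    rec = build_warc_record("http://example.com/", b"<html></html>")
    fields, body = parse_warc_record(rec)
    print(fields["WARC-Target-URI"], len(body))
```

The parser reads exactly `Content-Length` bytes rather than searching for the trailing delimiter, because the content block may itself contain blank lines.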
