National Repository of Grey Literature
Automated extraction of data from HTML
Onderka, Jakub ; Koutný, Martin (referee) ; Vrba, Kamil (advisor)
This thesis deals with data extraction from web pages written in HTML. It describes methods for downloading pages from a remote server over HTTP, handling document character encoding, and extracting content from elements. It also shows ways in which website authors can prevent automated web scraping. These techniques were used to create C# applications that extract data from two Czech Police databases: Investigation for person and Investigation for cars. The applications can download data from the remote database, save it to a local database, and search or display the required data.
