National Repository of Grey Literature
Intelligent Data Scraping in a Web Browser
Maštera, František ; Bartík, Vladimír (referee) ; Burget, Radek (advisor)
The goal of this thesis is to extract data from web pages without prior knowledge of their internal structure. The page structure is recognized algorithmically from input information, supplied by the user, about the content to be extracted. Structure analysis is then followed by the content extraction itself. An average success rate of over 80% was achieved on selected sets of websites. The resulting algorithm represents a new approach to data extraction and can be deployed in practice or serve as a basis for further development.
