National Repository of Grey Literature
Main Text Extraction from Web Documents
Mrózek, Daniel ; Burget, Radek (referee) ; Bartík, Vladimír (advisor)
This thesis deals with extracting the main text from web documents in HTML format. It describes several methods already in use and how they are categorized. The goal of the practical part is to propose an algorithm for detecting the main text in HTML pages, using primarily text features in combination with positional features. Blocks are classified by a multilayer perceptron. The thesis also describes the implementation of the proposed algorithm, the testing procedure, and the results obtained.
