National Repository of Grey Literature
Cleaning, extraction of text and transformation of web pages into vertical format
Švaňa, Miloš ; Otrusina, Lubomír (referee) ; Dytrych, Jaroslav (advisor)
This thesis deals with extracting text from web pages, recognizing their important content, and transforming it into vertical format, which is a suitable input for other natural language processing tasks. It analyzes the existing solution and its components, with emphasis on their shortcomings, and describes the design and implementation of a new solution based on the knowledge gained.
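The abstract does not define vertical format. The sketch below assumes the layout common in corpus tools: one token per line, with structural tags on lines of their own. The function name `to_vertical`, the regular-expression tokenizer, and the `<doc>`/`<p>` tag names are illustrative assumptions, not the implementation described in the thesis.

```python
import re
from xml.sax.saxutils import escape

# Assumption: a token is either a run of word characters or a single
# punctuation character. A real pipeline would use a proper tokenizer.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def to_vertical(text: str) -> str:
    """Convert plain text into a simple vertical (one token per line) form."""
    lines = ["<doc>"]
    # Paragraphs are assumed to be separated by blank lines.
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        tokens = TOKEN_RE.findall(paragraph)
        if not tokens:
            continue
        lines.append("<p>")
        # Escape markup characters so tokens cannot be mistaken for tags.
        lines.extend(escape(token) for token in tokens)
        lines.append("</p>")
    lines.append("</doc>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_vertical("Hello, world!\n\nSecond paragraph."))
```

Keeping structure on separate tag lines lets downstream tools read the output line by line, without parsing running text.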
Character recognition of machine-written documents
Kindermann, Hubert ; Blažek, Jan (advisor) ; Kolomazník, Jan (referee)
In this thesis we address the problem of extracting and recognizing symbols in printed documents digitized by a scanner or camera. We introduce a noise-resistant algorithm for normalizing document lighting. We then extract individual characters from the document and recognize them with a system of feedforward multilayer neural networks. We also focus on processing the resulting set of recognized characters, which is necessary for further use of the extracted text. The last step corrects the output based on the letters surrounding each character. We have successfully implemented an automatic system containing all of the above components.
