Original title:
Active Learning pro zpracování archivních pramenů
Translated title:
Active Learning for Processing of Archive Sources
Authors:
Hříbek, David ; Zbořil, František (referee) ; Rozman, Jaroslav (advisor)
Document type:
Master’s thesis
Year:
2021
Language:
Czech
Publisher:
Vysoké učení technické v Brně. Fakulta informačních technologií
Abstract:
This thesis deals with the creation of a system that lets users upload and annotate scans of historical documents and then actively fine-tune character recognition (OCR) models on the available annotations (marked text lines and their transcriptions). The thesis describes the character recognition process, classifies the techniques involved, and presents an existing character recognition system, with particular emphasis on machine learning methods. It further explains active learning methods and proposes a way of fine-tuning OCR models from annotated scans. The remainder of the thesis covers the system design, its implementation, the available datasets, an evaluation of the recognition accuracy of a self-built OCR model, and testing of the entire system.
Keywords:
active learning; active learning in handwritten text recognition; annotation of historical document scans; machine learning; OCR; optical character recognition; supervised learning
Institution: Brno University of Technology
Document availability information: Full text is available in the Brno University of Technology Digital Library.
Original record: http://hdl.handle.net/11012/200159