Original title:
Online nástroj pro rozpoznávání tabulek v obrázcích
Translated title:
Online Tool for Recognition of Tables in Images
Authors:
Inhliziian, Bohdan ; Kišš, Martin (referee) ; Herout, Adam (advisor)
Document type:
Bachelor's thesis
Year:
2019
Language:
cze
Publisher:
Vysoké učení technické v Brně. Fakulta informačních technologií
Abstract:
This thesis addresses the problem of recognising tables in images. The goal is to convert a photographed table, uploaded through a web interface, into an XLSX file. The program is designed with an emphasis on ease of use for the potential user. The Probabilistic Hough Transform algorithm was used for line detection, and the Tesseract tool was used to detect text in cells. The program was deployed on Amazon AWS, and the web application accesses it through an API. A custom algorithm was created for merging line segments into a single line, along with an algorithm for removing lines that do not belong to the table and lines that were detected incorrectly (text, noise). The resulting solution lets users who currently transcribe data by hand from tables in documents and books use a program that does everything automatically; they only need to upload a photo to the web application.
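The line-merging step mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's algorithm, which the record does not describe. It assumes roughly horizontal segments given as (x1, y1, x2, y2) tuples, such as a probabilistic Hough transform might return. The function name `merge_horizontal_segments` and the tolerances `y_tol` and `gap_tol` are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[int, int, int, int]  # (x1, y1, x2, y2), assumed roughly horizontal


def merge_horizontal_segments(
    segments: List[Segment], y_tol: int = 5, gap_tol: int = 10
) -> List[Segment]:
    """Merge segments on the same row that overlap or nearly touch.

    y_tol and gap_tol are illustrative pixel tolerances, not values from the thesis.
    """
    # Reduce each segment to (row, x_start, x_end), using the mean y as the row.
    rows = sorted(
        ((y1 + y2) // 2, min(x1, x2), max(x1, x2)) for x1, y1, x2, y2 in segments
    )

    # Group segments whose rows are within y_tol of the previous segment's row.
    groups: List[List[Tuple[int, int, int]]] = []
    for row in rows:
        if groups and row[0] - groups[-1][-1][0] <= y_tol:
            groups[-1].append(row)
        else:
            groups.append([row])

    merged: List[Segment] = []
    for group in groups:
        y = round(sum(r[0] for r in group) / len(group))
        spans = sorted((r[1], r[2]) for r in group)
        start, end = spans[0]
        for s, e in spans[1:]:
            if s - end <= gap_tol:
                # Overlapping or separated by a small gap: extend the current line.
                end = max(end, e)
            else:
                merged.append((start, y, end, y))
                start, end = s, e
        merged.append((start, y, end, y))
    return merged


example = [(0, 100, 50, 101), (55, 102, 120, 100), (0, 200, 40, 200), (300, 100, 350, 100)]
result = merge_horizontal_segments(example)
```

With the example input, the two fragments near y = 100 that are separated by a 5-pixel gap become one line, while the distant fragment on the same row and the line at y = 200 stay separate. Vertical lines could be handled the same way after swapping the x and y coordinates.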
Keywords:
convert table to XLSX; extract text from table; Hough Transform; table detector; table recognition; table detection; angle detector; table conversion
Institution: Brno University of Technology
Document availability information: Fulltext is available in the Brno University of Technology Digital Library.
Original record: http://hdl.handle.net/11012/191387