National Repository of Grey Literature 29 records found  1 - 10nextend  jump to record: Search took 0.01 seconds. 
Vision Transformers for Facial Recognition
Strýček, Šimon ; Kišš, Martin (referee) ; Špaňhel, Jakub (advisor)
This thesis focuses on applying vision transformer-based neural networks to face recognition related tasks. It focuses on exploring modern vision transformer (ViT) architectures, experimenting with alternative data, and finding the suitable parameters to train ViTs to compete with the already established dominance of convolutional neural networks in face recognition. The goal of this work was to show the suitability of vision-transformers for face recognition. The output of this work contains results of various experiments, demonstrations of benefits and drawbacks of some of the modern and popular ViTs, the definition of an optimal setup when wanting to employ vision transformers for facial recognition, and interesting observations from working with vision transformers.
Segmentation of logical units in text
Kostelník, Martin ; Kišš, Martin (referee) ; Beneš, Karel (advisor)
Cílem projektu bylo vytvořit systém pro automatickou segmentaci textu do logických celků. Práce staví na systému PERO-OCR a cílí na zlepšení zpracovávání českých historických dokumentů a jejich vyhledávačů používaných knihovníky a vědci. Práce zahrnovala vytvoření a anotace vlastní datové sady složené celkem z 4044 stránek z knih, slovníků a novin. K problému segmentaci textu je přistoupeno inovativních přístupem, kdy je brán jako shlukovací problém jednotlivých řádků textu. Metoda je dvoufázová: nejprve probíhá detekce regionů textu pomocí modelu YOLOv8 a následuje jejich spojení grafovou neuronovou sítí. Vyhodnocení je provedeno pomocí shlukovací metriky V-measure a na testovacím datasetu dosahuje hodnot 77.93 % pro knihy, 95.79 % pro slovníky a 90.23 % pro noviny.
Page Layout Analysis with Graph Neural Networks
Otčenáš, Matej ; Kišš, Martin (referee) ; Hradiš, Michal (advisor)
The aim of this work is to experimentally test the power of graph neural networks in the comprehensive analysis of document layout. In terms of document types, the focus is primarily on newspaper articles and historical writings, such as handwritten books or medieval manuscripts. These are characterized by the complexity of their layout, lacking a fixed structure or having highly segmented text. The work deals with the creation of suitable datasets for training and testing an approach for globally ordering the sequence of reading lines on a page and assigning each line to one of the defined classes. The research also involves creating an appropriate representation of a graph that captures relationships between individual components on the page and selecting a suitable graph neural network with the appropriate parameters. Finally, the different approaches are evaluated and compared on multiple metrics suitable for the given problem, and the findings are summarized with a discussion on possible enhancements and limitations.
Convolutional Networks for Handwriting Recognition
Sladký, Jan ; Kišš, Martin (referee) ; Hradiš, Michal (advisor)
This thesis deals with handwriting recognition using convolutional neural networks. From the current methods, a network model was chosen to consist of convolutional and recurrent neural networks with the Connectist Temporal Classification. The Vertical Attention Module, which selects the relevant information in each column corresponding to the text in the figure was subsequently implemented in such a model. Then, this module was compared with other possibilities of vertical aggregation between convolutional and recurrent networks. The experiments took place on a data set containing over 80,000 lines of text from Czech letters from the 20th century. The results show that the Vertical Attention Module almost always achieves the best results on all used types of convolution networks. The resulting network achieved the best result with 8,9%  of the character error rate. The contribution of this work is a neural network with a newly introduced element that can recognize lines of text.
Convolutional Networks for Lip Reading
Kadleček, Josef ; Kišš, Martin (referee) ; Hradiš, Michal (advisor)
This thesis deals with current methods for automatic speech recognition and lip reading via neural networks. Furthermore it deals with similarities in the architectures of neural networks for audio and visual data and available datasets in the field of audiovisual automatic speech recognition. The main contribution of this thesis is set of experiments comparing different changes in neural network architecture and its impact on results. The thesis includes an implementation of a system for automatic speech recognition from audio (CER: 12.6 %) and visual (CER: 57,7 %) data. The architectures of both systems are based on features extraction via convolutional networks followed by recurrent layers LSTM, another layer of convolutions and loss function CTC. 
Text Recognition Enhanced by Writer Identity
Trněný, Matěj ; Kišš, Martin (referee) ; Kohút, Jan (advisor)
The objective of this theses was to implement a neural network for text recognition enhanced by writers identity. Adversarial learning method was selected for this purpose. Usefulness of this method was verified by experiments. This net should yield better results on data which are not similar to data contained in training data set. Accuracy of the resulting net was compared to method single-task learning and method multi-task learning. Net implementing single-task learning method has reached average character recognition error of 7, 995%, net implementing multi-task learning method has reached average error of 7, 565% and net implementing adversarial learning method has reached average error of 7, 573%. In comparison to the net implementing single-task learning multi-task learning has improvement of 5, 38% and adversarial learning has reached improvement of 5, 28%. 
Multi-Modal Text Recognition
Kabáč, Michal ; Herout, Adam (referee) ; Kišš, Martin (advisor)
The aim of this thesis is to describe and create a method for correcting text recognizer outputs using speech recognition. The thesis presents an overview of current methods for text and speech recognition using neural networks. It also presents a few existing methods of connecting the outputs of two modalities. Within the thesis, several approaches for the correction of recognizers, which are based on algorithms or neural networks, are designed and implemented. An algorithm based on the principle of searching the outputs of recognizers using levenshtain alignment was proven to be the best approach. It scans the outputs, if the uncertainty of the text recognizer character is less than the pre-selected limit. As part of the work, an annotation server was created for the text transcripts, which was used to collect recordings for the evaluation of experiments.
Convolutional Neural Networks for Security Applications
Kišš, Martin ; Hradiš, Michal (referee) ; Smrž, Pavel (advisor)
This thesis deals with design and implementation of application for person recognition in security camera. For single face rocongition are used convolutional neural networks, which creates representation of the face, and k-nearest neighbours algorithm for classification. For recognition of sequence of faces there are three algorithms implemented. On test data success of recognition reached nearly 75 %.
Improving Consistency in Text Recognition Datasets
Tvarožný, Matúš ; Hradiš, Michal (referee) ; Kišš, Martin (advisor)
This work is concerned with increasing the consistency of datasets for text recognition. This paper describes the problems that cause the inconsistency and then presents solutions to eliminate it. The effect of the properties of the polygons defining the text line boundaries and hence how the modified version of the dataset, which is composed of ideal text line variants, affected the accuracy of the model is investigated. Further, the work focuses on detecting and then removing or modifying text lines whose ground truth transcription does not match the actual text they contain. Experimentation showed that removing the visual inconsistency on the training set did not have a significant effect on the trained model, but modifying the test set improved the OCR accuracy of the model by 1.1\% CER. By modifying the dataset so that it did not contain mutually inconsistent pairs of recognized text and the corresponding ground truth, the model improved by a maximum of only 0.2\% CER after re-training. The main finding of this work is, above all, the proven beneficial effect of removing inconsistencies on test suites, thanks to which it is possible to determine a more realistic error rate of the OCR model.
Online Tool for Recognition of Tables in Images
Inhliziian, Bohdan ; Kišš, Martin (referee) ; Herout, Adam (advisor)
This work solves the problem of recognising the tables in the figures. The goal is to convert the table into an XLS file thought web application. For line detection we have used the Probablistic Hough Transform algorithm and Tesse- ract tool was used to detect text in cells. The program was stored to the Amazon AWS and accessed by the web app using the API. An algorithm for line merging has been created, as well as an algorithm for removing lines that do not belong to the table and removing wrong detected lines (text, noise). The solution provides users who manually overwrite data from tables in documents, books, use a program that does everything automatically, you only need to upload photos to a web application.

National Repository of Grey Literature : 29 records found   1 - 10nextend  jump to record:
Interested in being notified about new results for this query?
Subscribe to the RSS feed.