National Repository of Grey Literature: 30 records found
Question Answering in Czech via Machine Translation and Cross-lingual Transfer
Macková, Kateřina ; Straka, Milan (advisor) ; Mareček, David (referee)
Reading comprehension and question answering are computer science disciplines in the fields of natural language processing and information retrieval. Reading comprehension is the ability of a model to read a text, process it, and understand its meaning. One of its applications is question answering, which is concerned with building a system that can automatically find, in a given text, the answer to a question based on the content of that text. It is a well-studied task with huge training datasets in English; however, there are no Czech datasets or models for this task. This work focuses on building reading comprehension and question answering systems for Czech without requiring any manually annotated Czech training data. Our main goals are to create Czech training and development datasets; to build Czech question answering models trained on Czech data; to build Czech question answering models trained on English data using cross-lingual transfer; and to compare the results and select the best model. First, we translated the freely available English question answering datasets SQuAD 1.1 and SQuAD 2.0 into Czech to create training and development datasets. We then trained and evaluated several BERT and XLM-RoBERTa baseline models used for the question answering task in...
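SQuAD-style extractive models such as BERT and XLM-RoBERTa usually output a start score and an end score for every token, and the predicted answer is the highest-scoring valid span. A minimal sketch of that decoding step, assuming plain Python lists of logits, a hypothetical `max_answer_len` limit, and the common SQuAD 2.0 convention that the score at index 0 (the `[CLS]` position) stands for "no answer"; it is not necessarily the decoding used in the thesis, and real systems additionally mask out question tokens.

```python
from typing import Optional, Sequence, Tuple


def best_span(
    start_logits: Sequence[float],
    end_logits: Sequence[float],
    max_answer_len: int = 30,
    null_index: int = 0,
) -> Tuple[Optional[Tuple[int, int]], float]:
    """Return the best (start, end) token span and its score, or None for 'no answer'.

    Assumption: as in common SQuAD 2.0 setups, position `null_index`
    (usually the [CLS] token) scores the 'unanswerable' option.
    """
    null_score = start_logits[null_index] + end_logits[null_index]
    best: Optional[Tuple[int, int]] = None
    best_score = float("-inf")
    for i, s in enumerate(start_logits):
        if i == null_index:
            continue
        # Only end positions at or after the start, within the length limit.
        for j in range(i, min(i + max_answer_len, len(end_logits))):
            score = s + end_logits[j]
            if score > best_score:
                best_score, best = score, (i, j)
    # Predict 'no answer' when the null option beats every real span.
    if best is None or null_score > best_score:
        return None, null_score
    return best, best_score


print(best_span([0.0, 1.0, 5.0, 0.5], [0.0, 0.2, 1.0, 4.0]))  # ((2, 3), 9.0)
```

Scoring a span as the sum of its start and end scores keeps decoding quadratic only in `max_answer_len` per start position, which is why a length cap is customary.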
Natural Language Correction With Focus on Czech
Náplava, Jakub ; Straka, Milan (advisor) ; Grundkiewicz, Roman (referee) ; Dušek, Ondřej (referee)
Natural language correction, a subfield of natural language processing (NLP), is the task of automatically correcting user errors in written texts. It includes, but is not limited to, grammatical error correction, spelling error correction, and diacritics restoration. During the work on this thesis, we witnessed great advances in this field, with the emergence of new approaches to correcting user errors, new datasets, and new evaluation metrics. This thesis presents, in the form of a dissertation by publication, our contributions to this field. As Czech is the author's primary language, special focus was devoted to improving natural language correction in Czech. The main contributions are (1) the creation of the Grammar Error Correction Corpus for Czech, which comprises multiple sources of noisy texts such as essays and online discussion posts, together with an evaluation of strong neural models on this dataset and a meta-evaluation of existing metrics; (2) the development of grammatical error correction systems suited to scenarios in which only a small amount of annotated data is available; and (3) the development of two state-of-the-art models for diacritics restoration and the creation of a new multilingual diacritics restoration dataset covering 12 languages.
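Diacritics restoration is commonly trained on pairs built from ordinary, correctly written text: the input is the text with diacritics removed and the target is the original. A minimal sketch of that data preparation, assuming Unicode input and using only Python's standard `unicodedata` module; the helper names are hypothetical, not taken from the thesis.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining accents (e.g. 'č' -> 'c') by decomposing to NFD first."""
    decomposed = unicodedata.normalize("NFD", text)
    # 'Mn' = nonspacing combining marks such as the caron, acute accent, ring above.
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped)


def make_training_pair(sentence: str) -> tuple:
    """Hypothetical helper: (input without diacritics, target with diacritics)."""
    return strip_diacritics(sentence), sentence


print(make_training_pair("Příliš žluťoučký kůň"))
# ('Prilis zlutoucky kun', 'Příliš žluťoučký kůň')
```

This works for Czech letters, which all decompose into a base letter plus a combining mark; letters without such a decomposition (for example Polish "ł") would need an explicit mapping.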
Semi-supervised learning in Optical Music Recognition
Mayer, Jiří ; Pecina, Pavel (advisor) ; Straka, Milan (referee)
Optical music recognition (OMR) is a niche subfield of computer vision, where some labeled datasets exist, but there is an order of magnitude more unlabeled data available. Recent advances in the field happened largely thanks to the adoption of deep learning. However, such neural networks are trained using labeled data only. Semi-supervised learning is a set of techniques that aim to incorporate unlabeled data during training to produce more capable models. We have modified a state-of-the-art object detection architecture and designed a semi-supervised training scheme to utilize unlabeled data. These modifications have successfully allowed us to train the architecture in an unsupervised setting, and our semi-supervised experiments indicate improvements to training stability and reduced overfitting.
Persistent data structures
Kupec, Martin ; Mareš, Martin (advisor) ; Straka, Milan (referee)
This thesis discusses persistent data structures, that is, structures that preserve their own history. We focus on pointer-based structures, for which both full and partial persistence can be achieved in constant amortized time and space per operation. Persistent arrays are also discussed, but the existence of optimal persistent arrays remains an open problem. We also include specific applications of the general techniques and examples of the use of persistent data structures.
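A linked-list stack is the simplest pointer-based structure that is fully persistent for free: an update never modifies an existing node, so every old version stays valid and can itself be updated. The sketch below (class and method names are hypothetical) only illustrates what "preserving history" means; it does not implement the general node-copying techniques for arbitrary pointer-based structures that the thesis discusses.

```python
from dataclasses import dataclass
from typing import Generic, Iterator, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class _Node(Generic[T]):
    value: T
    next: Optional["_Node[T]"]


class PersistentStack(Generic[T]):
    """Fully persistent stack: every operation returns a new version and
    leaves all earlier versions intact (versions share common tails)."""

    def __init__(self, head: Optional[_Node[T]] = None, size: int = 0) -> None:
        self._head = head
        self._size = size

    def push(self, value: T) -> "PersistentStack[T]":
        # O(1) time and space: one new node pointing at the old head.
        return PersistentStack(_Node(value, self._head), self._size + 1)

    def pop(self) -> "PersistentStack[T]":
        if self._head is None:
            raise IndexError("pop from empty stack")
        return PersistentStack(self._head.next, self._size - 1)

    def top(self) -> T:
        if self._head is None:
            raise IndexError("top of empty stack")
        return self._head.value

    def __len__(self) -> int:
        return self._size

    def __iter__(self) -> Iterator[T]:
        node = self._head
        while node is not None:
            yield node.value
            node = node.next


v0: PersistentStack[int] = PersistentStack()
v1 = v0.push(1)
v2 = v1.push(2)
v3 = v1.push(3)  # branching from an old version: full persistence
print(list(v0), list(v1), list(v2), list(v3))  # [] [1] [2, 1] [3, 1]
```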
Quadratic field based cryptography
Straka, Milan ; Stanovský, David (advisor) ; Žemlička, Jan (referee)
Imaginary quadratic fields were first suggested as a setting for public-key cryptography by Buchmann and Williams as early as 1988, and more cryptographic schemes followed. Although the resulting protocols are currently not as efficient as those based on elliptic curves, they are comparable to schemes based on RSA; moreover, their security is believed to be independent of other widely used protocols, including RSA, DSA, and elliptic curve cryptography. This work gathers the current results in the field of quadratic field cryptography. It recapitulates the algebraic theory needed to work with the class group of imaginary quadratic fields. It then investigates algorithms for class group operations that are efficient both asymptotically and in practice. It also analyzes feasible cryptographic schemes and attacks on them. A library implementing the described cryptographic schemes is part of this work.
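For a fundamental discriminant Δ < 0, the class group of the imaginary quadratic field can be handled concretely through primitive positive definite binary quadratic forms (a, b, c) with Δ = b² − 4ac, where every class has exactly one reduced representative (|b| ≤ a ≤ c, and b ≥ 0 if |b| = a or a = c). A minimal sketch of form reduction, plus counting reduced forms to obtain the class number for small |Δ|, assuming plain integer triples; the function names are hypothetical, and composition of forms (the group operation itself), which any cryptographic use would also need, is omitted.

```python
from math import gcd
from typing import List, Tuple

Form = Tuple[int, int, int]  # (a, b, c) represents a*x^2 + b*x*y + c*y^2


def discriminant(f: Form) -> int:
    a, b, c = f
    return b * b - 4 * a * c


def normalize(f: Form) -> Form:
    """Substitute x -> x + r*y so that -a < b <= a (discriminant unchanged)."""
    a, b, c = f
    r = (a - b) // (2 * a)
    return a, b + 2 * r * a, a * r * r + b * r + c


def reduce_form(f: Form) -> Form:
    """Reduce a positive definite form to |b| <= a <= c, with b >= 0 if a == c."""
    a, b, c = normalize(f)
    while a > c or (a == c and b < 0):
        # Swap x and y (with a sign change), then normalize again.
        a, b, c = normalize((c, -b, a))
    return a, b, c


def reduced_forms(d: int) -> List[Form]:
    """All primitive reduced forms of discriminant d < 0; their count is the class number."""
    if not (d < 0 and d % 4 in (0, 1)):
        raise ValueError("d must be a negative discriminant")
    forms = []
    a = 1
    while 3 * a * a <= -d:  # reduced forms satisfy a <= sqrt(|d| / 3)
        for b in range(-a + 1, a + 1):
            if (b * b - d) % (4 * a) != 0:
                continue
            c = (b * b - d) // (4 * a)
            if c < a or (c == a and b < 0):
                continue
            if gcd(gcd(a, b), c) == 1:
                forms.append((a, b, c))
        a += 1
    return forms


print(reduce_form((6, 1, 1)))  # (1, 1, 6)
print(reduced_forms(-23))      # [(1, 1, 6), (2, -1, 3), (2, 1, 3)]
```

Here h(−23) = 3; the forms (2, −1, 3) and (2, 1, 3) represent mutually inverse classes.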
Functional Data Structures and Algorithms
Straka, Milan ; Dvořák, Zdeněk (advisor) ; Koucký, Michal (referee) ; Brodal, Gerth (referee)
Title: Functional Data Structures and Algorithms Author: Milan Straka Institute: Computer Science Institute of Charles University Supervisor of the doctoral thesis: doc. Mgr. Zdeněk Dvořák, Ph.D., Computer Science Institute of Charles University Abstract: Functional programming is a well-established programming paradigm and is becoming increasingly popular, even in industrial and commercial applications. Data structures used in functional languages are principally persistent, that is, they preserve previous versions of themselves when modified. The goal of this work is to broaden the theory of persistent data structures and to devise efficient implementations of data structures for use in functional languages. Arrays are unquestionably the most frequently used data structure. Despite being conceptually very simple, no persistent array with a constant-time access operation exists. We describe a simplified implementation of a fully persistent array with asymptotically optimal amortized complexity Θ(log log n) and, in particular, a nearly optimal worst-case implementation. Additionally, we show how to efficiently perform garbage collection on a persistent array. The most efficient data structures are not necessarily based on the asymptotically best structures. For that reason, we also focus on data structure...
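The Θ(log log n) persistent array described above is intricate. For intuition, a much simpler technique with weaker bounds is path copying in a balanced binary tree: it yields a fully persistent array with O(log n) time per access and O(log n) new nodes per update. The sketch below shows only that simpler technique (all names are hypothetical), not the construction from the thesis.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class _Node:
    left: Optional["_Node"] = None
    right: Optional["_Node"] = None
    value: Any = None  # only meaningful in leaves


def _build(values: List[Any], lo: int, hi: int) -> _Node:
    if hi - lo == 1:
        return _Node(value=values[lo])
    mid = (lo + hi) // 2
    return _Node(_build(values, lo, mid), _build(values, mid, hi))


def _set(node: _Node, lo: int, hi: int, i: int, value: Any) -> _Node:
    # Path copying: only the O(log n) nodes on the root-to-leaf path are copied;
    # all other subtrees are shared with the previous version.
    if hi - lo == 1:
        return _Node(value=value)
    mid = (lo + hi) // 2
    if i < mid:
        return _Node(_set(node.left, lo, mid, i, value), node.right)
    return _Node(node.left, _set(node.right, mid, hi, i, value))


class PersistentArray:
    """Fully persistent array: set() returns a new version; old versions stay usable."""

    def __init__(self, values: List[Any]) -> None:
        if not values:
            raise ValueError("this sketch needs at least one element")
        self._n = len(values)
        self._root = _build(values, 0, self._n)

    @classmethod
    def _from_root(cls, root: _Node, n: int) -> "PersistentArray":
        arr = cls.__new__(cls)
        arr._root, arr._n = root, n
        return arr

    def __len__(self) -> int:
        return self._n

    def get(self, i: int) -> Any:
        if not 0 <= i < self._n:
            raise IndexError(i)
        node, lo, hi = self._root, 0, self._n
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if i < mid:
                node, hi = node.left, mid
            else:
                node, lo = node.right, mid
        return node.value

    def set(self, i: int, value: Any) -> "PersistentArray":
        if not 0 <= i < self._n:
            raise IndexError(i)
        return PersistentArray._from_root(_set(self._root, 0, self._n, i, value), self._n)


v0 = PersistentArray([10, 20, 30, 40, 50])
v1 = v0.set(2, 99)
v2 = v0.set(4, 7)  # updating an old version again: full persistence
print([v0.get(i) for i in range(5)])  # [10, 20, 30, 40, 50]
print([v1.get(i) for i in range(5)])  # [10, 20, 99, 40, 50]
print([v2.get(i) for i in range(5)])  # [10, 20, 30, 40, 7]
```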
Factorization of polynomials over finite fields
Straka, Milan ; Žemlička, Jan (advisor) ; Stanovský, David (referee)
Title: Factoring polynomials over finite fields Author: Milan Straka Department: Department of Algebra Supervisor of the bachelor thesis: Mgr. Jan Žemlička, Ph.D. Abstract: The goal of this work is to examine the problem of decomposing a polynomial over a finite field into a product of irreducible polynomials. By describing several algorithms that find this decomposition, we show that the problem can always be solved in time polynomial in the degree of the polynomial and the number of elements of the finite field. For one of the algorithms, we describe an implementation with very good asymptotic time complexity O(…), where n is the degree of the factored polynomial over a field with q elements. A program using a simpler but in practice faster variant of this algorithm is part of the thesis. Keywords: factorization, finite fields, polynomials, algorithms
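The abstract does not name the algorithms, so the following only illustrates one classical building block, distinct-degree factorization over a prime field GF(p). It splits a squarefree polynomial into products of irreducible factors of equal degree, using the fact that x^(p^d) − x is the product of all monic irreducible polynomials over GF(p) whose degree divides d. The sketch assumes p is prime and the input is monic and squarefree; the equal-degree splitting step (for example Cantor–Zassenhaus) is omitted, and all function names are hypothetical.

```python
from typing import List, Tuple

Poly = List[int]  # coefficients modulo a prime p, lowest degree first


def trim(a: Poly) -> Poly:
    """Drop leading zero coefficients in place."""
    while a and a[-1] == 0:
        a.pop()
    return a


def sub(a: Poly, b: Poly, p: int) -> Poly:
    n = max(len(a), len(b))
    return trim([((a[i] if i < len(a) else 0) - (b[i] if i < len(b) else 0)) % p for i in range(n)])


def mul(a: Poly, b: Poly, p: int) -> Poly:
    if not a or not b:
        return []
    r = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            r[i + j] = (r[i + j] + x * y) % p
    return trim(r)


def divmod_poly(a: Poly, b: Poly, p: int) -> Tuple[Poly, Poly]:
    """Quotient and remainder of a / b; b must be trimmed and nonzero."""
    a = a[:]
    inv = pow(b[-1], p - 2, p)  # inverse of b's leading coefficient (p prime)
    q = [0] * max(len(a) - len(b) + 1, 0)
    while len(trim(a)) >= len(b):
        shift = len(a) - len(b)
        coef = a[-1] * inv % p
        q[shift] = coef
        for i, bc in enumerate(b):
            a[shift + i] = (a[shift + i] - coef * bc) % p
    return trim(q), a


def gcd_monic(a: Poly, b: Poly, p: int) -> Poly:
    a, b = trim(a[:]), trim(b[:])
    while b:
        a, b = b, divmod_poly(a, b, p)[1]
    inv = pow(a[-1], p - 2, p)
    return [c * inv % p for c in a]


def powmod(base: Poly, e: int, m: Poly, p: int) -> Poly:
    """base^e mod m by repeated squaring."""
    result, base = [1], divmod_poly(base, m, p)[1]
    while e:
        if e & 1:
            result = divmod_poly(mul(result, base, p), m, p)[1]
        base = divmod_poly(mul(base, base, p), m, p)[1]
        e >>= 1
    return result


def distinct_degree_factorization(f: Poly, p: int) -> List[Tuple[Poly, int]]:
    """Split monic squarefree f into pairs (g, d), where g is the product
    of all irreducible factors of f of degree d."""
    f = trim(f[:])
    x = [0, 1]
    h, d, result = x, 0, []
    while len(f) - 1 >= 2 * (d + 1):
        d += 1
        h = powmod(h, p, f, p)  # h = x^(p^d) mod f
        # Factors of degree < d are already removed, so this gcd collects degree-d factors.
        g = gcd_monic(sub(h, x, p), f, p)
        if len(g) > 1:
            result.append((g, d))
            f = divmod_poly(f, g, p)[0]
            h = divmod_poly(h, f, p)[1]
    if len(f) > 1:
        result.append((f, len(f) - 1))  # the remaining part is irreducible
    return result


# x^4 + x over GF(2) equals x(x + 1) * (x^2 + x + 1).
print(distinct_degree_factorization([0, 1, 0, 0, 1], 2))  # [([0, 1, 1], 1), ([1, 1, 1], 2)]
```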
Czech NLP with Contextualized Embeddings
Vysušilová, Petra ; Straka, Milan (advisor) ; Hajič, Jan (referee)
With the increasing amount of digital data in the form of unstructured text, the importance of natural language processing (NLP) is growing. The most successful technologies of recent years are deep neural networks. This work applies state-of-the-art methods, namely transfer learning with Bidirectional Encoder Representations from Transformers (BERT), to three Czech NLP tasks: part-of-speech tagging, lemmatization, and sentiment analysis. We applied a BERT model with a simple classification head to three Czech sentiment datasets (mall, facebook, and csfd) and achieved state-of-the-art results. We also explored several possible architectures for tagging and lemmatization and, using a fine-tuning approach on data from the Prague Dependency Treebank, obtained new state-of-the-art results in both tasks. Specifically, we achieved an accuracy of 98.57% for tagging, 99.00% for lemmatization, and 98.19% joint accuracy for both tasks. The best models for all tasks are publicly available.
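Joint accuracy is presumably the share of tokens for which both the predicted tag and the predicted lemma are correct, which is why it can be no higher than either individual score. A small sketch of the three metrics, assuming token-aligned gold and predicted sequences; the function name and the simplified tags in the toy example are hypothetical and are not the Prague Dependency Treebank tag set.

```python
from typing import Dict, Sequence


def tagging_accuracies(
    gold_tags: Sequence[str],
    gold_lemmas: Sequence[str],
    pred_tags: Sequence[str],
    pred_lemmas: Sequence[str],
) -> Dict[str, float]:
    n = len(gold_tags)
    if n == 0 or not (n == len(gold_lemmas) == len(pred_tags) == len(pred_lemmas)):
        raise ValueError("sequences must be non-empty and token-aligned")
    tag_ok = [g == p for g, p in zip(gold_tags, pred_tags)]
    lemma_ok = [g == p for g, p in zip(gold_lemmas, pred_lemmas)]
    return {
        "tagging": sum(tag_ok) / n,
        "lemmatization": sum(lemma_ok) / n,
        # A token counts towards joint accuracy only if tag AND lemma are correct.
        "joint": sum(t and l for t, l in zip(tag_ok, lemma_ok)) / n,
    }


# Toy data for the tokens "Psi hlasitě štěkali ." (tags simplified for illustration).
gold_tags = ["NOUN", "ADV", "VERB", "PUNCT"]
pred_tags = ["NOUN", "ADV", "NOUN", "PUNCT"]
gold_lemmas = ["pes", "hlasitě", "štěkat", "."]
pred_lemmas = ["pes", "hlasitý", "štěkat", "."]
print(tagging_accuracies(gold_tags, gold_lemmas, pred_tags, pred_lemmas))
# {'tagging': 0.75, 'lemmatization': 0.75, 'joint': 0.5}
```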
Low-resource Text Classification
Szabó, Adam ; Straka, Milan (advisor) ; Popel, Martin (referee)
The aim of the thesis is to evaluate Czech text classification tasks in low-resource settings. We introduce three datasets, two of which are publicly available and one of which was partly created by us. This dataset is based on contracts provided by the web platform Hlídač Státu. Most of its data are annotated automatically and only a small part manually. Its distinctive feature is that it contains long contracts in Czech. We achieve outstanding results with the proposed model on the publicly available datasets, which confirms that our model performs sufficiently well. In addition, we experimentally measured the effect of noisy data and of the amount of training data on these publicly available datasets. On the contracts dataset, we focused on selecting the right part of each contract and studied which part yields the best results. We found that for a dataset that contains systematic errors due to automatic annotation, it is more advantageous to classify using a shorter but more relevant part of the contract than to take a longer text from the contract and rely on BERT to learn correctly.
Adaptive Handwritten Text Recognition
Procházka, Štěpán ; Straka, Milan (advisor) ; Straňák, Pavel (referee)
The need to preserve and exchange written information is central to human society, and handwriting has satisfied this need for several millennia. Unlike optical character recognition of typeset fonts, which has been thoroughly studied in the last few decades, handwritten text recognition, being considerably harder, has received less attention. In this work, we study the capability of deep convolutional and recurrent neural networks to extract handwritten text. To mitigate the need for a large quantity of real ground-truth data, we propose a suitable synthetic data generator for model pre-training and carry out an extensive set of experiments to devise a self-training strategy that adapts the model to unannotated real handwriting. The proposed approach is compared with supervised approaches and state-of-the-art results on both established and novel datasets, achieving satisfactory performance.

See also: similar author names
3 Straka, Marek
15 Straka, Martin
2 Straka, Matej
13 Straka, Michal
2 Straka, Miroslav