National Repository of Grey Literature: 29 records found (showing 1 - 10)
Normalization of numbers into spoken form for text-to-speech systems
Růžička, Jakub ; Dušek, Ondřej (advisor) ; Peterek, Nino (referee)
Title: Normalization of numbers into spoken form for text-to-speech systems Author: Jakub Růžička Institute: Institute of Formal and Applied Linguistics Supervisor: Mgr. et Mgr. Ondřej Dušek, Ph.D., Institute of Formal and Applied Linguistics Abstract: A necessary part of any text-to-speech system is the normalization of numbers and words containing numbers. The accuracy of this process can significantly affect the quality of the resulting speech. The main goal of this work is the design and implementation of a number normalization module for Czech. Words containing digits are first assigned to one of the predefined categories. Based on the category given, possible spoken forms are subsequently generated. For the selection of the contextually correct variant, an existing language model is used. The system is distributed as a Python package and can run on Linux or in a Docker container whose configuration is part of the project. Moreover, a specialized data annotation application has been designed and written for creating the datasets for the Czech text normalization task. Two datasets with 1,882 sentences and 3,185 words requiring normalization were obtained using the data annotation service. The system achieved a sentence-level accuracy of over 80% on both datasets. We perform a detailed error...
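The three steps named in this abstract (categorize, generate spoken variants, pick one by context) can be sketched in Python as below. This is only a toy illustration, not the thesis implementation: the category names, the `VARIANTS` table, and the functions `categorize`, `candidates`, `normalize`, and `toy_score` are all hypothetical, and the bigram counter merely stands in for the existing language model the abstract mentions.

```python
import re
from typing import Callable, Dict, List

# Hypothetical toy table of spoken variants per category.
# A real system would generate these forms by rule, not look them up.
VARIANTS: Dict[str, Dict[str, List[str]]] = {
    "cardinal": {"2": ["dva", "dvě", "dvou"]},
    "ordinal": {"2.": ["druhý", "druhá", "druhého"]},
}

# Hypothetical stand-in for a language model: a set of "known" bigrams.
KNOWN_BIGRAMS = {("dvě", "ženy"), ("dva", "muži")}


def categorize(token: str) -> str:
    """Assign a token to one of the (assumed) predefined categories."""
    if re.fullmatch(r"\d+\.", token):
        return "ordinal"
    if re.fullmatch(r"\d+", token):
        return "cardinal"
    return "other"


def candidates(token: str) -> List[str]:
    """Return possible spoken forms, or the token itself if none are known."""
    return VARIANTS.get(categorize(token), {}).get(token, [token])


def toy_score(words: List[str]) -> float:
    """Score a sentence by counting known bigrams (placeholder for an LM)."""
    return float(sum((a, b) in KNOWN_BIGRAMS for a, b in zip(words, words[1:])))


def normalize(sentence: str, score: Callable[[List[str]], float]) -> str:
    """Replace each token with its best-scoring spoken variant in context."""
    words = sentence.split()
    for i, token in enumerate(words):
        options = candidates(token)
        if len(options) > 1:
            words[i] = max(
                options,
                key=lambda v: score(words[:i] + [v] + words[i + 1:]),
            )
    return " ".join(words)
```

With the toy scorer, `normalize("přišly 2 ženy", toy_score)` selects the feminine form and `normalize("přišli 2 muži", toy_score)` the masculine one; swapping in a real language model would only require passing a different `score` function.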
Restoring and improving the technical quality of audio recordings using machine learning methods
Lechovský, Adam ; Peterek, Nino (advisor) ; Dušek, Ondřej (referee)
The goal of this thesis is to train an artificial neural network which will be able to improve the technical quality of audio recordings. To achieve this, three artificial audio distortions are used to train seven different deep neural networks on pairs of distorted and undistorted audio. The resulting 21 networks are then evaluated using a number of objective and subjective measures. In the end, the networks learned to remove artificial distortions very well, but they did not learn to improve the technical quality of undistorted inputs.
Design and Implementation of Sound Recognizer of Particular Grasshopper Species
Schwarz, Jan ; Peterek, Nino (advisor) ; Hlaváčová, Jaroslava (referee)
Biologists asked us to create a system that recognizes particular grasshopper species from stridulation recordings. Currently we recognize five grasshopper species found in the Czech Republic, using HTK, a freely available toolkit for speech recognition. In addition to the acoustic model itself, we also created a website that analyses a stridulation recording and then saves the result for subsequent use. The current model is based on only a limited amount of training recordings, but its results are satisfactory. The website also serves as a data-gathering system; consequently, it is possible to further extend and improve the model.
Speech Interface for Corpus Annotation Tools
Přikryl, Leoš ; Hajič, Jan (advisor) ; Peterek, Nino (referee)
The thesis covers the design and implementation of a natural-language (speech) interface for the corpus annotation tools used at the Institute of Formal and Applied Linguistics (TrEd and its additional modules). Existing modules for automatic speech recognition from the University of West Bohemia in Pilsen are used.
Viewer of a vector map of the Czech Republic for mobile phones supporting Java
Stach, David ; Machek, Pavel (advisor) ; Peterek, Nino (referee)
This bachelor thesis focuses on creating an application for mobile phones that displays a map of the Czech Republic represented by vector data. Creating the vector data is not part of the thesis; the application uses an existing map. The main focus of the thesis is processing and displaying the vector data. The application has to use algorithms that allow fast work with the map despite the limited resources of the device.
Error detection in speech recognition
Tobolíková, Petra ; Hajič, Jan (advisor) ; Peterek, Nino (referee)
This thesis tackles the problem of error detection in speech recognition. First, principles of recent approaches to automatic speech recognition are introduced. Various deficiencies of speech recognition that cause imperfect recognition results are outlined. Currently known methods of "confidence score" computation are then described. The next chapter introduces three machine learning algorithms which were employed in the error detection methods implemented in this thesis: logistic regression, artificial neural networks and decision trees. These machine learning methods use certain attributes of the recognized words as input variables and predict an estimated confidence score value. The open source software "R" has been used throughout, showing the usage of the aforementioned methods. These methods have been tested on Czech radio and TV broadcasts. The results obtained by those methods are compared using ROC curves, standard errors and possible (oracle) WER reduction. Programming documentation of the code used in the implementation is enclosed as well. Finally, efficient word attributes for error detection are summarized.
Pronunciation Features of Czech Language - Dialect Analysis
Michlíková, Vendula ; Peterek, Nino (advisor) ; Korvas, Matěj (referee)
We implemented Výrče:SW, a tool for collecting and analysing audio recordings without the need for a supervisor's assistance. The tool allows creating a wide range of recording scenarios, including the possibility to analyse the recordings and show the results. Using the created tool, we collected Výrče:Korpus, a read audio corpus of 34 speakers and 2376 utterances of 7 hours in length. The corpus also includes questionnaires that provide information about the dialect reliability of speakers. Sufficient numbers of speakers for dialect analysis come from the Central Bohemian and Silesian dialect areas. On the two selected groups, we trained a simple monophone dialect recogniser based on Hidden Markov Models.
Distributed System for Verification of Properties of Natural Numbers
Tomisová, Martina ; Mírovský, Jiří (advisor) ; Peterek, Nino (referee)
The result of my work is a system for distributed verification of properties of natural numbers. It has two parts, a server and a client, which communicate via the HTTP protocol. The clients perform the computation; the server distributes the work (numbers) and gathers the results (properties of the given numbers). The input of one computation should be one natural number, as should the result (output). The distribution can be used to verify a given property for several natural numbers. Particular jobs can be added to the client as plugins. Two example plugins are part of the work. The first one is very simple and shows how to create plugins. The second example searches for prime numbers (and has its own arithmetic library for long numbers): the server can distribute (possibly big) numbers, and the client will verify whether a given number is a prime number.
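As a rough illustration of the plugin idea in this abstract, the Python sketch below shows a job that maps one natural number to one natural-number result. It is not taken from the thesis: the names `PLUGINS`, `is_prime_job`, `run_job`, and `distribute` are hypothetical, plain trial division stands in for the thesis's own long-number arithmetic library, and the HTTP communication between server and client is omitted entirely.

```python
from math import isqrt
from typing import Callable, Dict, List

# Hypothetical plugin signature: one natural number in, one natural number out.
Plugin = Callable[[int], int]


def is_prime_job(n: int) -> int:
    """Return 1 if n is prime, else 0, using simple trial division."""
    if n < 2:
        return 0
    if n % 2 == 0:
        return 1 if n == 2 else 0
    for d in range(3, isqrt(n) + 1, 2):
        if n % d == 0:
            return 0
    return 1


# Hypothetical registry a client could use to look up jobs by name.
PLUGINS: Dict[str, Plugin] = {"prime": is_prime_job}


def run_job(name: str, number: int) -> int:
    """Client-side stand-in: apply the named plugin to one number."""
    return PLUGINS[name](number)


def distribute(name: str, numbers: List[int]) -> Dict[int, int]:
    """Server-side stand-in: hand out numbers and gather the results."""
    return {n: run_job(name, n) for n in numbers}
```

A new job would be added by registering another function with the same signature in `PLUGINS`; in a real deployment `distribute` would send the numbers to remote clients instead of calling `run_job` locally.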
Spoken Language Translation via Phoneme Representation of the Source Language
Polák, Peter ; Bojar, Ondřej (advisor) ; Peterek, Nino (referee)
We refactor the traditional two-step approach of automatic speech recognition for spoken language translation. Instead of conventional graphemes, we use phonemes as an intermediate speech representation. Starting with the acoustic model, we revise the cross-lingual transfer and propose a coarse-to-fine method providing further speed-up and performance boost. Further, we review the translation model. We experiment with source and target encoding, boosting the robustness by utilizing the fine-tuning and transfer across ASR and SLT. We empirically document that this conventional setup with an alternative representation not only performs well on standard test sets but also provides robust transcripts and translations on challenging (e.g., non-native) test sets. Notably, our ASR system outperforms commercial ASR systems.
Multilingual speech synthesis
Nekvinda, Tomáš ; Dušek, Ondřej (advisor) ; Peterek, Nino (referee)
This work explores multilingual speech synthesis. We compare three models based on Tacotron that utilize various levels of parameter sharing. Two of them follow recent multilingual text-to-speech systems. The first one makes use of a fully-shared encoder and an adversarial classifier that removes speaker-dependent information from the encoder. The other uses language-specific encoders. We introduce a new approach that combines the best of both previous methods. It enables effective parameter sharing using a meta-learning technique, preserves the encoder's flexibility, and actively removes speaker-specific information in the encoder. We compare the three models on two tasks. The first one aims at joint multilingual training on ten languages and reveals their knowledge-sharing abilities. The second concerns code-switching. We show that our model effectively shares information across languages, and according to a subjective evaluation test, it produces more natural and accurate code-switching speech.
