National Repository of Grey Literature: 29 records found (records 11-20)
Data-driven Pronunciation Generation for ASR
Obedkova, Maria ; Plátek, Ondřej (advisor) ; Peterek, Nino (referee)
In ASR systems, dictionaries are usually used to describe the pronunciations of words in a language. These dictionaries are typically hand-crafted by linguists. One of the most significant drawbacks of dictionaries created this way is that linguistically motivated pronunciations are not necessarily the optimal ones for ASR. The goal of this research was to explore approaches to data-driven pronunciation generation for ASR. We investigated several approaches to lexicon generation and implemented a completely new data-driven solution based on pronunciation clustering. We proposed an approach for feature extraction and researched different unsupervised methods for pronunciation clustering. We evaluated the proposed approach and compared it with the current hand-crafted dictionary. The proposed data-driven approach beat the established baselines but underperformed compared with the hand-crafted dictionary, possibly because of unsatisfactory features extracted from the data or insufficient fine-tuning.
Improving text-to-speech in spoken dialogue systems by employing user's feedback
Hudeček, Vojtěch ; Žabokrtský, Zdeněk (advisor) ; Peterek, Nino (referee)
Although spoken dialogue systems have greatly improved, they still cannot handle conversations involving unknown topics. One of the problems is that they have difficulty pronouncing unknown words. We will investigate methods that can improve spoken dialogue systems by correcting the pronunciation of unknown words. This is a crucial step towards a better user experience, since, for example, mispronounced proper nouns are highly undesirable. Incorrect pronunciation is caused by an imperfect phonetic representation of the word. We aim to detect incorrectly pronounced words, use knowledge about pronunciation together with the user's feedback, and correct the transcriptions accordingly. Furthermore, the learned phonetic transcriptions can be added to the vocabulary of the speech recognition module. Thus, extracting correct pronunciations benefits both the speech recognition and the text-to-speech components of dialogue systems.
Pronunciation Validation in Speech Therapy Application
Černý, Patrik ; Peterek, Nino (advisor) ; Vidová Hladká, Barbora (referee)
Title: Pronunciation Validation in Speech Therapy Application Author: Bc. Patrik Černý Institute: Institute of Formal and Applied Linguistics Supervisor: Mgr. Nino Peterek, Ph.D., Institute of Formal and Applied Linguistics Abstract: The goal of this thesis is to design, create and test a speech validation method based on current speech recognition algorithms. The resulting software is a speech therapy application for training sounds or words, with feedback on pronunciation accuracy. Speech validation is based on CMUSphinx tools and on the generation of inaccurate pronunciations (using a phonetic dictionary). Recordings with accurate and inaccurate pronunciations have been collected for training and testing purposes. It has been shown that this design is not appropriate. Thanks to the software design, the application can easily be extended with techniques that could improve validation efficiency. Keywords: speech validation, word recognition, dyslalia, speech therapy application
Continuously Learning Analyser of Audio-Visual Recordings
Košarko, Ondřej ; Peterek, Nino (advisor) ; Klusáček, David (referee)
This thesis introduces a tool for the analysis of audiovisual recordings. The tool uses the audio and closed captions supplied by the user to prepare a text annotation. The annotation contains a transcript of the show based on the closed captions. In addition, speaker diarization is performed to mark who spoke when. The diarization is performed by a third-party library, which is evaluated on data from the DIALOG corpus, and its inner workings are described. Kaldi, a speech recognition toolkit, is used to assign the right portions of the text to the right sections of the recording. Furthermore, the thesis contains an overview of how closed captions are created, an overview of speech corpus creation, and a brief review of the literature on recording analysis.
Neural networks for automatic speaker, language, and sex identification
Do, Ngoc ; Jurčíček, Filip (advisor) ; Peterek, Nino (referee)
Title: Neural networks for automatic speaker, language, and sex identification Author: Bich-Ngoc Do Department: Institute of Formal and Applied Linguistics Supervisor: Ing. Mgr. Filip Jurčíček, Ph.D., Institute of Formal and Applied Linguistics and Dr. Marco Wiering, Faculty of Mathematics and Natural Sciences, University of Groningen Abstract: Speaker recognition is a challenging task with applications in many areas, such as access control and forensic science. Meanwhile, in recent years the deep learning paradigm and its branch, deep neural networks, have emerged as powerful machine learning techniques and achieved state-of-the-art results in many fields of natural language processing and speech technology. Therefore, the aim of this work is to explore the capability of one deep neural network model, the recurrent neural network, for speaker recognition. Our proposed systems are evaluated on the TIMIT corpus using a speaker identification task. Compared with other systems under the same test conditions, our systems could not surpass the reference ones due to the sparsity of validation data. In general, our experiments show that the best system configuration is a combination of MFCCs with their dynamic features and a recurrent neural network model. We also experiment with recurrent neural networks and convolutional neural...
Development of trainable policies for spoken dialogue systems
Le, Thanh Cong ; Jurčíček, Filip (advisor) ; Peterek, Nino (referee)
Abstract: In human-human interaction, speech is the most natural and effective means of communication. Spoken dialogue systems (SDS) aim to bring this high level of interaction to computer systems, so that with an SDS you could talk to machines rather than learn to use a mouse and keyboard to perform a task. However, because of inaccurate speech recognition and the inherent ambiguity of spoken language, the dialogue state (the user's desire) can never be known with certainty, and building such an SDS is therefore not trivial. Statistical approaches have been proposed to deal with this uncertainty by maintaining a probability distribution over every possible dialogue state. Based on these distributions, the system learns how to interact with users so as to achieve the final goal in the most effective manner. In Reinforcement Learning (RL), the learning process is understood as optimizing a policy that chooses an action conditioned on the current belief state. Since the space of dialogue...
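To make the idea of a belief state concrete, the following is a minimal sketch, not the thesis's method: it keeps a probability distribution over the values of a single hypothetical slot (a food type), updates it with Bayes' rule after each noisy recognition result, and picks an action from the belief. The slot values, the recognition-confusion probabilities and the 0.8 confirmation threshold are all illustrative assumptions; the hand-written threshold rule only stands in for the policy that an RL approach would learn.

```python
"""Toy belief-state tracking for one dialogue slot (illustrative sketch only).

All slot values, confusion probabilities and the threshold are hypothetical.
The threshold policy is hand-written; an RL approach would learn this mapping.
"""
from typing import Dict

Belief = Dict[str, float]


def update_belief(belief: Belief, observation: str,
                  confusion: Dict[str, Dict[str, float]]) -> Belief:
    """Bayes update: b'(s) is proportional to P(observation | s) * b(s)."""
    unnormalised = {s: confusion[s].get(observation, 0.0) * p
                    for s, p in belief.items()}
    total = sum(unnormalised.values())
    if total == 0.0:
        # The observation is impossible under the model; keep the old belief.
        return dict(belief)
    return {s: p / total for s, p in unnormalised.items()}


def choose_action(belief: Belief, threshold: float = 0.8) -> str:
    """Hand-crafted policy: confirm the most likely value once it is confident enough."""
    best_state, best_prob = max(belief.items(), key=lambda kv: kv[1])
    if best_prob >= threshold:
        return f"confirm:{best_state}"
    return "ask_again"


# confusion[s][o] = P(recogniser outputs o | user actually said s)  (hypothetical)
CONFUSION = {
    "italian": {"italian": 0.7, "indian": 0.3},
    "indian": {"italian": 0.2, "indian": 0.8},
}

if __name__ == "__main__":
    b0 = {"italian": 0.5, "indian": 0.5}          # uniform prior
    b1 = update_belief(b0, "italian", CONFUSION)  # italian ~0.78
    print(choose_action(b1))                      # ask_again
    b2 = update_belief(b1, "italian", CONFUSION)  # italian ~0.92
    print(choose_action(b2))                      # confirm:italian
```

A single noisy "italian" is not enough to cross the threshold, but a repeated one is; this is the kind of trade-off (ask again versus confirm) that a trained policy would optimize rather than fix by hand.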
Speech Recognition Using KALDI (Rozpoznávání řeči pomocí KALDI)
Plátek, Ondřej ; Jurčíček, Filip (advisor) ; Peterek, Nino (referee)
The topic of this thesis is the implementation of an efficient decoder for the ASR Kaldi speech recognition training system (http://kaldi.sourceforge.net/). Kaldi already ships with decoders, but they are not suitable for dialogue systems. The main goal of this thesis is to develop a real-time decoder for a dialogue system that minimizes latency and optimizes speed. Methods used for speeding up the decoder are not limited to multi-threaded decoding or the use of GPU cards for general-purpose computation. Part of this work is devoted to training an acoustic model and testing it in the "Vystadial" dialogue system.
Development of an English public transport information dialogue system
Vejman, Martin ; Jurčíček, Filip (advisor) ; Peterek, Nino (referee)
This thesis presents the development of an English spoken dialogue system based on the Alex dialogue system framework. The work describes the adaptation of the framework's components to a different domain and language. The system provides public transport information for New York. This work involves creating a statistical model and deploying a custom Kaldi speech recognizer, whose performance was better than that of the Google Speech API. The comparison was based on subjective user satisfaction collected through crowdsourcing.
Pronunciation Features of Czech Language - Dialect Analysis
Michlíková, Vendula ; Peterek, Nino (advisor) ; Korvas, Matěj (referee)
We implemented Výrče:SW, a tool for collecting and analysing audio recordings without requiring a supervisor's assistance. The tool allows creating a wide range of recording scenarios, including the possibility to analyse the recordings and show the results. Using the tool, we collected Výrče:Korpus, a read-speech audio corpus of 34 speakers and 2376 utterances, 7 hours in total. The corpus also includes questionnaires that provide information about the dialect reliability of the speakers. Sufficient numbers of speakers for dialect analysis come from the Central Bohemian and Silesian dialect areas. On these two selected groups, we trained a simple monophone dialect recogniser based on Hidden Markov Models.
Statistical Natural Language Processing Methods in Music Notation Analysis
Libovický, Jindřich ; Peterek, Nino (advisor) ; Mareček, David (referee)
The thesis summarizes research on the application of statistical methods of computational linguistics to music processing and explains the theoretical background of these applications. In the second part, methods of symbolic melody extraction are explored. A corpus of approximately 400 hours of melodies in different music styles was created, and a melody model based on language modeling techniques was trained on this corpus. In the third part of the thesis, the model is used in an attempt to develop an alternative method of audio melody extraction that uses the melody model instead of the commonly used heuristics and rules. The chosen approach works well only on simple input data and produces worse results than the commonly used methods on the MIREX contest data. On the other hand, the experiments help to understand the conceptual relationship between the development of pitch frequency (the physical melody) and the melody perceived on an abstract level in symbolic notation (the symbolic melody).
