National Repository of Grey Literature: 38 records found
Speaker Diarization
Tomášek, Pavel ; Karafiát, Martin (referee) ; Matějka, Pavel (advisor)
This work addresses the task of speaker diarization. The goal is to implement a system that can decide "who spoke when". The individual components of the implementation are described: feature extraction, voice activity detection, speaker segmentation and clustering, and postprocessing. The work also presents the results of the implemented system on test data, including a description of the evaluation. The test data come from the NIST RT Evaluations 2005 - 2007, and the lowest error rate achieved on this dataset is 18.52% DER. The results are compared with the diarization system implemented by Marijn Huijbregts from the Netherlands, who worked on the same data in 2009 and reached 12.91% DER.
Unsupervised Adaptation of Speech Recognizer
Švec, Ján ; Karafiát, Martin (referee) ; Schwarz, Petr (advisor)
The goal of this thesis is to design and test techniques for unsupervised adaptation of speech recognizers on audio data without textual transcripts. First, a training set is prepared and a baseline speech recognition system is trained. This system is used to transcribe unseen data. We experiment with an adaptation data selection process based on a measure of transcript quality. The system is then re-trained on this new set, and its accuracy is evaluated. Finally, we experiment with the amount of adaptation data.
End-to-End Speech Recognition for Low-Resource Languages
Sokolovskii, Vladislav ; Schwarz, Petr (referee) ; Karafiát, Martin (advisor)
The field of automatic speech recognition has begun to adopt end-to-end neural network solutions for building speech recognizers. However, the data-hungry nature of these systems makes it possible to build recognizers only for high-resource languages such as English, Chinese, or Spanish. In low-resource scenarios, solutions must be developed to mitigate the problem of data scarcity. One of the most effective techniques is fine-tuning a pre-trained model. The problem with existing fine-tuning approaches is that the token sets of the target and source languages usually differ. This is why previous approaches to cross-lingual transfer learning required changing the output layer, mixing tokens from different languages in the output layer, using a universal token set, or using a separate output layer for each language. This is undesirable, because cross-lingual sharing is then latent and uncontrollable in the output space when the language-specific graphemes are disjoint. This thesis therefore proposes mapping tokens to a common set before pre-training begins. The existing solution is to transliterate the source language into the target language; the new approach is romanization, in which the token set of the target language is romanized to match the English alphabet. Diacritics can then be restored from the romanized hypotheses using an additional restoration model. This has the advantage of increasing sharing in the output grapheme space.
Fast and Accurate Keyword Spotting System
Lenčéš, Marián ; Karafiát, Martin (referee) ; Schwarz, Petr (advisor)
This bachelor's thesis deals with fast and accurate detection of keywords in audio recordings. The aim was to study approaches to word detection and to create several types of language models, which were then compared with each other. We focus on the detection of keywords in spoken English audio recordings.
Speech Recognition For Selected Languages
Schmitt, Jan ; Karafiát, Martin (referee) ; Janda, Miloš (advisor)
This bachelor's thesis deals with continuous speech recognition for three languages: Bulgarian, Croatian and Swedish. It describes the basics of speech processing and recognition methods such as acoustic modeling using hidden Markov models and Gaussian mixture models. Another aim of this work is to prepare data for these languages from the GlobalPhone database so that they can be used with the speech recognition toolkits Kaldi and HTK. With the data prepared, several models are trained and tested using the Kaldi toolkit.
Automatic jazz arrangement
Chadim, Petr ; Karafiát, Martin (referee) ; Fapšo, Michal (advisor)
This thesis focuses on arranging a melody accompanied by jazz chords. It deals with creating additional harmony voices using the Block Voicing method. The division of notes into target notes and passing notes is performed using constraint programming (CSP) techniques. Passing notes are reharmonized by a dominant seventh chord or by a parallel chord. A bass part is also created using CSP. The Gecode library is used to solve the CSP. The harmony voices are arranged using Four Part Close Voicing. The resulting application is a tool for music arrangers.
New Techniques in Neural Networks Training - Connectionist Temporal Classification
Gajdár, Matúš ; Švec, Ján (referee) ; Karafiát, Martin (advisor)
This bachelor’s thesis deals with neural networks and their use in speech recognition. First, it presents the theory of speech recognition, followed by the theory of neural networks in connection with the connectionist temporal classification method. The next chapter introduces the toolkits used for training the neural networks and the experiments carried out with them to determine the impact of the connectionist temporal classification method on the precision of phoneme decoding. The last chapter summarizes the work and gives an overall evaluation of the experiments.
Set of JavaApplets Demonstrations for Speech Processing
Kudr, Michal ; Karafiát, Martin (referee) ; Černocký, Jan (advisor)
The goal of the thesis is to become familiar with the methods and techniques used in speech processing. Using the obtained knowledge, I propose three Java applets demonstrating selected methods. The thesis also contains a theoretical analysis of the selected problems.
Recurrent Neural Networks for Speech Recognition
Nováčik, Tomáš ; Karafiát, Martin (referee) ; Veselý, Karel (advisor)
This master's thesis deals with the implementation of various types of recurrent neural networks in the Lua programming language using the Torch library. It focuses on finding an optimal strategy for training recurrent neural networks and also tries to minimize the duration of training. Furthermore, various regularization techniques are investigated and implemented in the recurrent neural network architecture. The implemented recurrent neural networks are compared on a speech recognition task using the AMI dataset, where they model the acoustic information. Their performance is also compared to that of a standard feedforward neural network. The best results are achieved using the BLSTM architecture. The recurrent neural networks are also trained with the CTC objective function on the TIMIT dataset. The best result is again achieved using the BLSTM architecture.
Domain Specific Data Crawling for Language Model Adaptation
Gregušová, Sabína ; Švec, Ján (referee) ; Karafiát, Martin (advisor)
The goal of this thesis is to implement a system for automatic language model adaptation for the Phonexia ASR system. The system expects input in the form of a source text, which is analysed, and appropriate terms for web search are chosen. Every web search results in a set of documents that undergo cleaning and filtering procedures. The resulting web corpus is mixed with the Phonexia model and evaluated. In order to estimate the optimal parameters, I conducted 3 sets of experiments for Hindi, Czech and Mandarin. The results of the experiments were very favourable, and the implemented system managed to decrease perplexity and Word Error Rate in most cases.
