Národní úložiště šedé literatury (National Repository of Grey Literature): 41 records found. Showing records 11 - 20.
Punctuation Restoration in Automatic Speech Transcripts (Doplňování interpunkce do automatického přepisu řeči)
Ščavnický, Tomáš ; Veselý, Karel (reviewer) ; Szőke, Igor (supervisor)
This thesis deals with reconstructing punctuation in the output of automatic speech transcription systems. The resulting system should be able to restore punctuation in general, mostly spoken English with reasonable accuracy. Natural human language can in some cases appear nondeterministic, and the formation of strings is often subject to a large number of grammatical rules. For this reason, machine learning algorithms were chosen for punctuation prediction because of their ability to recognize complicated patterns in data. Several experiments with recurrent neural networks were carried out in order to find the most suitable model architecture. The models produced during these experiments achieve accuracy comparable to, if not better than, the works currently considered the best in the field.
Multi-Task Neural Networks for Speech Recognition
Egorova, Ekaterina ; Veselý, Karel (reviewer) ; Karafiát, Martin (supervisor)
The first part of this Master's thesis covers a theoretical investigation into the principles and usage of neural networks, including their usability for speech recognition tasks. It then summarizes the operating principles of multi-task neural networks and some recent experiments with them. The practical part of the semester project reports changes made to a neural network training tool so that it supports multi-task training. Then the preparation of the settings is described, including a number of scripts written especially for this purpose. The experiments presented in the thesis explore the idea of using articulatory characteristics of phonemes as secondary tasks for multi-task training. The experiments are conducted on two datasets of different quality and size that represent different languages - English and Vietnamese. Articulatory characteristics are occasionally combined with other secondary tasks, such as context, to see how well they function together. Networks of different sizes are compared to see how size affects the effectiveness of multi-task training. These experiments show that multi-task training with articulatory characteristics as secondary tasks can enhance training and yield better phoneme accuracy as a result. Finally, multi-task training is embedded into a speech recognition system as a feature extractor.
Automatic Speech Recognition System Continually Improving Based on Subtitled Speech Data
Kocour, Martin ; Veselý, Karel (reviewer) ; Černocký, Jan (supervisor)
Today's large vocabulary speech recognition systems are very accurate. However, tens or hundreds of hours of manually transcribed speech are needed in order to train such a system. This kind of data is often unavailable, or does not even exist for the desired language. A possible solution is to use commonly available but lower quality audiovisual data. This thesis addresses methods of processing such data for semi-supervised training of acoustic models. It then demonstrates how to continually improve already trained acoustic models by using this practically unlimited data. The work proposes a novel approach for selecting data based on similarity with the target domain.
Hybrid Isolated Word Recognizer (Hybridní rozpoznávač izolovaných slov)
Veselý, Karel ; Černocký, Jan (reviewer) ; Grézl, František (supervisor)
A speaker-independent isolated word recognizer has many practical uses. For example, it will make it possible to control by voice various next-generation home appliances that communicate with a PC. Even more interesting is the possibility of embedding it into any application, or even into the operating system, and thus extending the user interface with a new element: voice control. It can be used for control by keywords, where the response may be launching an application or any other specific action. The most interesting use of an isolated word recognizer is in electronic dictionaries. A new feature of next-generation dictionaries could be searching for words by voice. A very useful option is to obtain at the output a list of words sorted by the probability that they were uttered. This feature lets the user easily find similar words and learn to distinguish them better.
Automatic Speech Detection for VHF Channel
Nováková, Mária ; Veselý, Karel (reviewer) ; Szőke, Igor (supervisor)
A noisy environment in air traffic communication is an unavoidable problem. The communication between the control tower and the pilot should be as reliable and effective as possible. That is why voice activity detection is crucial: it lets automated systems recognise the start of each speaker's speech segment. The speakers take turns providing information by pressing the push-to-talk button. Various approaches are used to detect voice activity. Even though these methods are effective, machine learning can easily outshine them. Neural networks are widely used in voice activity detection as well as in other areas. Properly trained models are efficient and adaptable. In this thesis, a solution for voice activity detection together with push-to-talk detection is proposed. The proposed models are evaluated and compared. An adaptation of the GPVAD approach is discussed and compared to the proposed models. Neural networks will have their chance to once again prove that they are suitable for any task.
Recurrent Neural Networks with Elastic Time Context in Language Modeling
Beneš, Karel ; Veselý, Karel (reviewer) ; Hannemann, Mirko (supervisor)
This thesis describes experimental work in the field of statistical language modeling with recurrent neural networks (RNNs). A thorough literature survey on the topic is given, followed by a description of the algorithms used for training the respective models. Most of the techniques have been implemented using the Theano toolkit. Extensive experiments have been carried out with the Simple Recurrent Network (SRN), which revealed some previously unpublished findings. The best published result has not been replicated in the case of static evaluation. In the case of dynamic evaluation, the best published result was outperformed by 1 %. Then, experiments with the Structurally Constrained Recurrent Network have been conducted, but the performance could not be improved over the SRN baseline. Finally, a novel enhancement of the SRN was proposed, leading to a Randomly Sparse RNN (RS-RNN) architecture. This enhancement is based on applying a fixed binary mask to the recurrent connections, thus forcing some recurrent weights to zero. It is empirically confirmed that RS-RNN models learn the training corpus better, and a combination of RS-RNN models achieved a 30% bigger gain on test data than a combination of dense SRN models of the same size.
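The masking idea in the abstract above can be illustrated with a minimal sketch. It is not the thesis implementation (which used Theano); the function names `make_mask`, `apply_mask` and `srn_step`, the tanh activation, the keep probability of 0.5 and the layer sizes are all assumptions chosen for illustration. The mask is drawn once and then held fixed, so the same recurrent weights are forced to zero at every step.

```python
import math
import random

def make_mask(size, keep_prob, rng):
    """Draw a binary mask once; 1 keeps a recurrent connection, 0 removes it."""
    return [[1 if rng.random() < keep_prob else 0 for _ in range(size)]
            for _ in range(size)]

def apply_mask(weights, mask):
    """Element-wise product that forces the masked recurrent weights to zero."""
    return [[w * m for w, m in zip(w_row, m_row)]
            for w_row, m_row in zip(weights, mask)]

def srn_step(x, h_prev, w_in, w_rec, mask):
    """One recurrent step: h = tanh(W_in x + (W_rec * mask) h_prev)."""
    w_sparse = apply_mask(w_rec, mask)
    h = []
    for i in range(len(h_prev)):
        total = sum(w_in[i][j] * x[j] for j in range(len(x)))
        total += sum(w_sparse[i][j] * h_prev[j] for j in range(len(h_prev)))
        h.append(math.tanh(total))
    return h

# Hypothetical sizes and weights, for illustration only.
rng = random.Random(0)
hidden, inputs = 4, 3
mask = make_mask(hidden, keep_prob=0.5, rng=rng)
w_in = [[rng.uniform(-0.1, 0.1) for _ in range(inputs)] for _ in range(hidden)]
w_rec = [[rng.uniform(-0.1, 0.1) for _ in range(hidden)] for _ in range(hidden)]
h = srn_step([1.0, 0.0, 0.0], [0.0] * hidden, w_in, w_rec, mask)
```

In a training loop, one plausible way to keep the masked weights at zero is to re-apply the same mask after every weight update; whether the thesis does it this way is not stated in the abstract.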
Semi-Supervised Training of Deep Neural Networks for Speech Recognition
Veselý, Karel ; Ircing, Pavel (reviewer) ; Lamel, Lori (reviewer) ; Burget, Lukáš (supervisor)
In this thesis, we first present the theory of neural network training for speech recognition, along with our implementation, which is available as the 'nnet1' training recipe in the Kaldi toolkit. The recipe contains RBM pre-training, mini-batch frame Cross-Entropy training and sequence-discriminative sMBR training. Then we continue with the main topic of this thesis: semi-supervised training of DNN-based ASR systems. Inspired by the literature survey and our initial experiments, we investigated several problems. First, whether the confidences are better calculated per-sentence, per-word or per-frame. Second, whether the confidences should be used for data-selection or data-weighting. Both approaches are compatible with the framework of weighted mini-batch SGD training. Then we tried to get better insight into confidence calibration, more precisely whether it can improve the efficiency of semi-supervised training. We also investigated how the model should be re-tuned with the correctly transcribed data. Finally, we proposed a simple recipe that avoids a grid search of hyper-parameters, and is therefore very practical for general use with any dataset. The experiments were conducted on several datasets: for Babel Vietnamese with 10 hours of transcribed speech, the Word Error Rate (WER) was reduced by 2.5%. For Switchboard English with 14 hours of transcribed speech, the WER was reduced by 3.2%. Although we found it difficult to further improve the performance of semi-supervised training by means of enhancing the confidences, we still believe that our findings are of significant practical value: the untranscribed data are abundant and easy to obtain, and our proposed solution brings solid WER improvements and is not difficult to replicate.
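The difference between data-selection and data-weighting mentioned in the abstract above can be sketched as two ways of filling the weights of a weighted loss. This is a simplified illustration, not the Kaldi 'nnet1' recipe: the helper names, the threshold of 0.5, the toy posteriors and the normalisation by the sum of weights are all assumptions.

```python
import math

def cross_entropy(probs, label):
    """Negative log-probability the model assigns to the given label."""
    return -math.log(probs[label])

def weighted_loss(batch, weights):
    """Weighted mean of per-item cross-entropy; zero-weight items drop out."""
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(w * cross_entropy(p, y) for (p, y), w in zip(batch, weights))
    return weighted_sum / total_weight

def selection_weights(confidences, threshold):
    """Data selection: weight 1 if the confidence passes the threshold, else 0."""
    return [1.0 if c >= threshold else 0.0 for c in confidences]

def weighting_weights(confidences):
    """Data weighting: each item contributes in proportion to its confidence."""
    return list(confidences)

# Each item: (model posterior over 3 classes, automatically transcribed label).
batch = [([0.7, 0.2, 0.1], 0), ([0.1, 0.8, 0.1], 1), ([0.3, 0.3, 0.4], 2)]
confidences = [0.9, 0.6, 0.2]  # hypothetical per-item confidences

loss_selected = weighted_loss(batch, selection_weights(confidences, threshold=0.5))
loss_weighted = weighted_loss(batch, weighting_weights(confidences))
```

Both variants reduce to the same weighted sum, which is why either one fits a weighted mini-batch SGD update; they differ only in whether low-confidence items are dropped entirely or merely down-weighted.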
The Best Possible Speech Recognizer on Your Own Data
Sýkora, Tomáš ; Veselý, Karel (reviewer) ; Szőke, Igor (supervisor)
Many state-of-the-art results in different machine learning areas are presented on a day-to-day basis. By adjusting these systems to perform well on a specific subset of all general data, huge improvements in their resulting accuracy may be achieved. Domain adaptation in automatic speech recognition can lead to production-level models capable of transcribing difficult and noisy customer conversations far more accurately than general models trained on all kinds of language and speech data. In this work I present a 17% word error rate improvement in our speech recognition task over the general domain speech recognizer from Google. The improvement was achieved both by very precise annotation and preparation of domain data and by combining state-of-the-art techniques and algorithms. The described system was successfully integrated into the production environment of the Parrot transcription company, where I am a member of the initial team, and it drastically increased the performance of the human transcribers.
Music Source Separation
Holík, Viliam ; Veselý, Karel (reviewer) ; Mošner, Ladislav (supervisor)
Neural networks are used for the problem of separating music sources from recordings. One such network is Conv-TasNet. The aim of the work is to experiment with an existing implementation of this network in order to improve it. The models were trained on the MUSDB18 dataset. The experiments successively covered changing the network structure, transforming signals from the time domain to the frequency domain for the purpose of calculating the loss function, replacing the original loss function with different ones, finding the optimal learning rate for each loss function, and gradually decreasing the learning rate during training. According to the SDR metric, the best experiments were those trained with the L1 and logarithmic L2 loss functions in the time domain, using a higher initial learning rate that was gradually decreased during training. Relative to the baseline, the best models bring an improvement of more than 2.5%.
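The two time-domain losses and the decaying learning rate named in the abstract above might look as follows. The abstract does not give exact definitions, so everything here is an assumption: L1 is taken as the mean absolute error, logarithmic L2 as the base-10 logarithm of the mean squared error (with a small `eps` for stability), and the schedule as a simple exponential decay per epoch.

```python
import math

def l1_loss(estimate, target):
    """Mean absolute error between estimated and reference waveform samples."""
    return sum(abs(e - t) for e, t in zip(estimate, target)) / len(target)

def log_l2_loss(estimate, target, eps=1e-8):
    """Base-10 logarithm of the mean squared error; eps avoids log(0)."""
    mse = sum((e - t) ** 2 for e, t in zip(estimate, target)) / len(target)
    return math.log10(mse + eps)

def decayed_lr(initial_lr, epoch, decay=0.98):
    """Learning rate after `epoch` epochs of exponential decay (assumed schedule)."""
    return initial_lr * decay ** epoch

# Toy waveforms, for illustration only.
estimate = [0.0, 0.5, -0.5, 1.0]
target = [0.0, 0.5, -0.25, 0.5]
loss_l1 = l1_loss(estimate, target)
loss_log_l2 = log_l2_loss(estimate, target)
lr_at_epoch_10 = decayed_lr(0.01, epoch=10)
```

The logarithm compresses large errors, so the two losses weight loud and quiet mistakes differently; that is one reason a separate learning rate per loss function can be worth searching for.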
