National Repository of Grey Literature: 38 records found (records 11 - 20)
Automatic Keyword Detection
Mašláňová, Marcela ; Karafiát, Martin (referee) ; Smrž, Pavel (advisor)
The main goal of this work is to survey the field of automatic keyword tagging in text and to apply this background to automatically generating back-of-the-book indexes. Human-made indexes are expensive, which is why we are looking for (semi-)automatic indexing methods. The theoretical part of this thesis deals with collocations, which are an important part of generated indexes. The practical part applies selected methods to test data and summarizes the results of the experiments.
Multi-Task Neural Networks for Speech Recognition
Egorova, Ekaterina ; Veselý, Karel (referee) ; Karafiát, Martin (advisor)
The first part of this master's thesis provides a theoretical analysis of the principles of neural networks, including the possibility of using them in speech recognition. The thesis continues with a description of multi-task neural networks and related experiments. The practical part of the work comprised changes to the neural network training software that enabled multi-task training. The prepared environment, including several dedicated scripts, is also described. The experiments presented in this thesis verify the use of articulatory characteristics of speech for multi-task training. The experiments were carried out on two speech databases that differ in quality and size and represent different languages, English and Vietnamese. Articulatory characteristics were also combined with other secondary tasks, for example context, in order to verify their complementarity. A comparison is made across neural networks of different sizes so as to describe the relationship between network size and the effectiveness of multi-task training. The conclusion of the experiments is that multi-task training with articulatory characteristics as secondary tasks leads to better training of neural networks, and the result of this training can be more accurate phoneme recognition. Finally, the multi-task neural networks are tested in a speech recognition system as a feature extractor.
Activity of Neural Network in Hidden Layers - Visualisation and Analysis
Fábry, Marko ; Grézl, František (referee) ; Karafiát, Martin (advisor)
The goal of this work was to create a system capable of visualising the activation function values produced by neurons in the hidden layers of neural networks used for speech recognition. The work also describes experiments comparing visualisation methods, visualisations of neural networks with different architectures, and neural networks trained with different types of input data. The visualisation system implemented in this work is based on previous work by Khe Chai Sim and is extended with new methods of data normalization. The Kaldi toolkit was used to prepare the neural network training data. The CNTK framework was used for neural network training. The core of this work, the visualisation system, was implemented in the Python scripting language.
Grammar Based Automatic Speech Recognizer
Škorvaga, Vojtěch ; Karafiát, Martin (referee) ; Schwarz, Petr (advisor)
This work describes the development of a system for network compilation for speech recognition based on Speech Recognition Grammar Specification (SRGS) grammars, as defined by the W3C. Together with the new module, the recognizer was integrated into the FreeSwitch software telephone switch using a combination of the MRCPv2, SIP, and RTP network protocols, and then tested.
Impact of Environment Acoustics on Speech Recognition Accuracy
Paliesek, Jakub ; Karafiát, Martin (referee) ; Szőke, Igor (advisor)
This diploma thesis deals with the impact of room acoustics on the accuracy of automatic speech recognition (ASR). Experiments were evaluated on the LibriSpeech speech corpus and on ReverbDB, a database of impulse responses and noise. The ASR systems used were based on the Mini LibriSpeech recipe for Kaldi. First, it was examined how well an ASR system can learn to transcribe in selected environments when the same acoustic conditions are used during training and testing. Next, experiments were carried out with modifications of the ASR architecture in order to achieve better robustness against new conditions by using methods for adaptation to room acoustics: r-vectors and i-vectors. It was shown that the recently proposed r-vector method is beneficial even when real impulse responses are used for data augmentation.
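Data augmentation with impulse responses is commonly done by convolving clean speech with a room impulse response. The sketch below shows that operation on toy lists of samples. The function name and the sample values are hypothetical, and the code does not reproduce the thesis's Kaldi-based pipeline.

```python
from typing import List


def convolve(signal: List[float], impulse_response: List[float]) -> List[float]:
    """Full discrete convolution of a signal with an impulse response."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            # Each input sample contributes a scaled, delayed copy of the response.
            out[i + j] += s * h
    return out


if __name__ == "__main__":
    # Toy values, assumed for illustration only.
    clean = [1.0, 2.0]
    room = [1.0, 0.5]  # direct path followed by one weaker echo
    print(convolve(clean, room))
```

In practice a vectorized or FFT-based routine would replace the nested loops, since real impulse responses contain thousands of samples.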
Low-Dimensional Matrix Factorization in End-To-End Speech Recognition Systems
Gajdár, Matúš ; Grézl, František (referee) ; Karafiát, Martin (advisor)
The project covers automatic speech recognition with neural network training using low-dimensional matrix factorization. We describe time delay neural networks with factorization (TDNN-F) and without it (TDNN), implemented in PyTorch. We compare the PyTorch implementation with the Kaldi toolkit and achieve similar results in experiments with various network architectures. The last chapter describes the impact of low-dimensional matrix factorization on end-to-end speech recognition systems, as well as a modification of the system with TDNN(-F) networks. With specific network settings, we were able to achieve better results with systems using factorization. Additionally, we reduced the complexity of training by decreasing the number of network parameters with the use of TDNN(-F) networks.
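As a rough illustration of why factorization decreases the number of network parameters, the sketch below counts weights for one dense layer and for the same layer split into two low-rank factors. The layer sizes and bottleneck rank are hypothetical values chosen for illustration, not figures from the thesis, and biases and any constraints on the factors are ignored.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights in a single full n_in x n_out matrix (biases ignored)."""
    return n_in * n_out


def factorized_params(n_in: int, n_out: int, rank: int) -> int:
    """Weights when the matrix is replaced by (n_in x rank) @ (rank x n_out)."""
    return n_in * rank + rank * n_out


if __name__ == "__main__":
    # Hypothetical sizes, assumed for illustration only.
    n_in, n_out, rank = 1024, 1024, 128
    full = dense_params(n_in, n_out)
    low_rank = factorized_params(n_in, n_out, rank)
    print(f"dense: {full}, factorized: {low_rank}, ratio: {low_rank / full:.2f}")
```

The factorized form only saves parameters when the rank is smaller than n_in * n_out / (n_in + n_out).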
Speaker Diarization of Meeting Data
Tůma, Radovan ; Konečný, Matej (referee) ; Karafiát, Martin (advisor)
This work proposes a diarization system based on the Bayesian Information Criterion (BIC). It describes the background theory and briefly reviews previously used systems. The idea of this work is to apply previously proposed methods in a faster and more reliable way. The proposed system was tested on several recordings to measure its error rate. The test results are not very good, but some possible improvements are proposed.
Voice Activity Detection
Ent, Petr ; Karafiát, Martin (referee) ; Matějka, Pavel (advisor)
This thesis deals with the use of support vector machines in voice activity detection. The first part examines various types of features, their extraction and processing, and finds the optimal combination that gives the best results. The second part presents the voice activity detection system itself and the tuning of its parameters. Finally, the results are compared with two other systems based on different principles. The ERT broadcast news database was used for testing and tuning. The comparison between the systems was then carried out on a database from the NIST06 Rich Test Evaluations.
Automatic Transcription of Air-Traffic Communication to Text
Balok, Petr ; Karafiát, Martin (referee) ; Szőke, Igor (advisor)
This thesis addresses the problem of obtaining text transcriptions from audio files containing air-traffic communication and from audio files containing speech in two languages. I solved this problem using machine learning, specifically the Python toolkits NeMo and Whisper. Before fine-tuning, I got a 78 % word error rate on an ATC dataset and a 60 % word error rate on a bilingual dataset. Using these technologies, I managed to lower the word error rate to 24 % in transcriptions of air-traffic communication. I also got a 19 % word error rate for bilingual speech. The results of this thesis allow automatic transcription of air-traffic communication with a low rate of errors in the transcript. Furthermore, models trained on the bilingual dataset allow transcribing audio files that contain both English and Czech speech in one file.
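For readers unfamiliar with the metric quoted above, word error rate is conventionally the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The following is a minimal sketch of that definition. The function name and the example sentences are hypothetical, and it is not the scoring code used in the thesis.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up transcripts, assumed for illustration only.
    ref = "cleared for takeoff runway two"
    hyp = "cleared to takeoff runway"
    print(f"WER = {word_error_rate(ref, hyp):.2f}")
```

Because insertions count as errors, this ratio can exceed 1.0 when the hypothesis is much longer than the reference.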
High Level Analysis of the Psychotherapy Sessions
Polok, Alexander ; Karafiát, Martin (referee) ; Matějka, Pavel (advisor)
This work focuses on analyzing psychotherapy sessions within the DeePsy research project. It aims to design and develop features that model the session dynamics and can reveal seemingly subtle nuances. These features are automatically extracted from the source recording using neural networks. They are further processed, compared across sessions, and displayed graphically, creating a document that gives the therapist feedback about the session. Furthermore, this assistive tool can help therapists grow professionally and provide better psychotherapy in the future. A relative improvement in voice activity detection of 37.82% was achieved. The VBx diarization system was generalized to converge to two speakers with a minimal relative error rate degradation of 0.66%. An automatic speech recognition system has been trained with a 17.06% relative improvement over the best available hybrid model. Models for sentiment classification, type of therapeutic interventions, and overlapping speech detection were also trained.
