National Repository of Grey Literature
Automatic Transcription of Speech Supporting Code Switching
Bílek, Štěpán ; Karafiát, Martin (referee) ; Szőke, Igor (advisor)
This thesis addresses the issue of automatic speech recognition, focusing on the recognition of audio containing multilingual speech, known as code-switching. The problem of a lack of multilingual data for training is addressed by combining recordings in English and German. To achieve the closest approximation to real bilingual speech, a portion of the datasets is created by merging recordings of similar speakers. The Whisper model is trained and tested on the created data. In its original unadapted version, the model achieves an error rate of up to 70 %. The best models trained on combined datasets achieve error rates slightly above 7 %. The results of this study demonstrate methods for training models to achieve the best possible performance.
Human-machine collaboration - using speech processing
Kisler, Štěpán ; Hůlka, Tomáš (referee) ; Juříček, Martin (advisor)
This bachelor's thesis focuses on the design and implementation of a voice control system for the UR3 CB series collaborative robot from Universal Robots, aiming to simplify human-robot interaction. The introduction provides an overview of collaborative robotics, including its history, successful applications, and the possibilities of programming collaborative robots. Additionally, it explores speech recognition technology, covering its applications, history, and methods. The practical section compares existing speech recognition systems and selects the most suitable one for robot voice control. It also details the development of a voice control program in Python and the testing of the entire system, both in simulation and real-world conditions in a robotics laboratory.
Circulation
Reisová, Kristína ; Sikora, Erik (referee) ; Smetana, Matěj (advisor)
Urethral falls, crying tears, breathing mist, river of blood, sweat underneath, sneezing mildew, tides of saliva, bladder bays, springs under the skin, weeing spring, renal lagoon, stalactites in the nose,... Running around the water cycle, no beginning and no end. My voice is in it. When it flows, the voice sings. When evaporates it becomes steam, it whispers and when the rain starts to fall the rap begins. The water cycle is dependent on the water cycle, natural condition, and the weather. So I’m addicted to the environment, and the mood and atmosphere is what it is, and what it will be.
Automatic Transcription of Air-Traffic Communication to Text
Balok, Petr ; Karafiát, Martin (referee) ; Szőke, Igor (advisor)
This thesis addresses the problem of obtaining text transcriptions from audio files containing air-traffic communication and from audio files containing speech in two languages. I solved this problem using machine learning, specifically by using toolkits written in Python called NeMo and Whisper. Before fine-tuning, I got a 78 % word error rate on an ATC dataset and a 60 % word error rate on a bilingual dataset. Using these technologies, I managed to lower the word error rate to 24 % in transcriptions of air-traffic communication. I also got a 19 % word error rate for bilingual speech. The results of this thesis allow automatic transcription of air-traffic communication with a low rate of errors in the transcript. Furthermore, models trained on the bilingual dataset allow transcribing audio files containing both English and Czech speech in one file.
Voice Control of Industrial and Medical Devices in Noisy Environments
Vymětalíková, Lucie ; Matoušek, Radomil (referee) ; Dobrovský, Ladislav (advisor)
This diploma thesis deals with voice control of industrial and medical devices in noisy environments. Different speech recognition models and methods for noise suppression in speech signals are compared. Based on the research and conducted testing, a custom voice control system is designed. The system consists of a wake word detection model and a model for recognizing predefined commands. An audio response for the operator and script execution based on the recognized commands are also implemented in the system. A modification for automatic door opening of the OpenTube2 laboratory box was designed.
High Level Analysis of the Psychotherapy Sessions
Polok, Alexander ; Karafiát, Martin (referee) ; Matějka, Pavel (advisor)
This work focuses on analyzing psychotherapy sessions within the DeePsy research project. It aims to design and develop features that model the session dynamics, which can reveal seemingly subtle nuances. These features are automatically extracted from the source recording using neural networks. They are further processed, compared across sessions, and displayed graphically, creating a document that gives the therapist feedback about the session. Furthermore, this assistive tool can help therapists grow professionally and provide better psychotherapy in the future. A relative improvement in voice activity detection of 37.82% was achieved. The VBx diarization system was generalized to converge to two speakers with a minimum relative error rate degradation of 0.66%. An automatic speech recognition system has been trained with a 17.06% relative improvement over the best available hybrid model. Models for sentiment classification, type of therapeutic interventions, and overlapping speech detection were also trained.
