National Repository of Grey Literature
Robust Speech Activity Detection
Popková, Anna ; Plchot, Oldřich (referee) ; Matějka, Pavel (advisor)
The aim of this work is to design and build a robust speech activity detector that can detect speech in different languages, in noisy environments, and with music in the background. I solve this problem with a neural network used as a classification model that assigns the input audio recording to one of four classes: silence, speech, music, or noise. The resulting tool can detect speech in at least 12 languages. It detects speech with a musical background with up to 88 % accuracy, and its success rate on noisy data ranges from 84 % (5 dB SNR) to 88 % (20 dB SNR). The tool can be used for speech activity detection in various research areas of speech processing. The main contribution is the elimination of music, which, when not eliminated, significantly increases the error rate of systems for speaker identification or speech recognition.
Emotion Detection from Speech
Popková, Anna ; Fér, Radek (referee) ; Matějka, Pavel (advisor)
This bachelor's thesis deals with research in the field of emotion recognition, mainly from speech and marginally from other modalities (video and physiological data). It describes in detail the topology of the systems built specifically for this work. It also describes experiments leading to optimized pre-processing, regressor training, and post-processing. The data used for this research originates from the AV+EC 2015 evaluation. The results of the fusion systems producing the most precise predictions were submitted to this evaluation. Bottle-Neck features are newly tested and combine favorably with the commonly used eGeMAPS features for the recognition of arousal. For valence, two kinds of video features are used. A multi-task system (recognizing both valence and arousal) using Bottle-Neck features produces competitive results and is only 13 % (relative) behind the mentioned fusion system. This is especially appealing for applications where only audio is available.
