National Repository of Grey Literature
Robust Audio Dereverberation and Denoising
Košina, Simon ; Skácel, Miroslav (referee) ; Szőke, Igor (advisor)
The goal of this thesis was to create a speech enhancement and dereverberation model for audio recordings from aircraft VHF communication. The thesis first covers the theoretical foundations of machine learning and the types of neural networks commonly used for such tasks. It then describes the framework, the datasets and the implementation. The final chapters focus on the experiments performed and their evaluation. The thesis closes with a discussion of future work that could further improve the results.
Learning the Face Behind a Voice
Krušina, Josef ; Matějka, Pavel (referee) ; Plchot, Oldřich (advisor)
This work addresses the problem of mapping fixed representations (embeddings) of a speech signal to face embeddings and then generating a face from the mapped embedding using a generative adversarial network (GAN) trained for face generation. GANs are a type of neural network that can generate data similar to the data they were trained on. The architecture of the proposed system consists of four components: a face embedding extractor, a voice embedding extractor, an algorithm on top of a GAN that can generate a face from a face embedding, and my mapping network, which maps a voice embedding to a face embedding. The pre-trained neural networks FaceNet and SpeechBrain are adopted as embedding extractors. A model that uses a pre-trained StyleGAN2 is adopted for backward face generation. The contribution of this work is that it allows a face to be extrapolated from an audio signal alone.
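The data flow described in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the embedding sizes, the names `LinearMapper` and `voice_to_face`, and the use of a single untrained linear layer with L2 normalization in place of the mapping network, whose real architecture the abstract does not give. The extractor and generator are passed in as plain callables and stand in for the pre-trained models; the face embedding extractor is left out because this sketch covers inference only.

```python
import math
import random

VOICE_DIM = 192  # assumed voice embedding size (not stated in the abstract)
FACE_DIM = 512   # assumed face embedding size (not stated in the abstract)


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0.0 else [x / norm for x in vec]


class LinearMapper:
    """Hypothetical stand-in for the mapping network: one untrained linear layer."""

    def __init__(self, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(in_dim)
        # Random weights only; a real mapper would be trained on paired embeddings.
        self.weights = [
            [rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)
        ]
        self.bias = [0.0] * out_dim

    def __call__(self, voice_embedding):
        if len(voice_embedding) != len(self.weights[0]):
            raise ValueError("unexpected voice embedding size")
        mapped = [
            sum(w * x for w, x in zip(row, voice_embedding)) + b
            for row, b in zip(self.weights, self.bias)
        ]
        return l2_normalize(mapped)


def voice_to_face(audio, voice_extractor, mapper, face_generator):
    """Chain the inference steps: audio -> voice embedding -> face embedding -> face."""
    return face_generator(mapper(voice_extractor(audio)))
```

With real models, `voice_extractor` would wrap the speech embedding network and `face_generator` the StyleGAN2-based model; here any callables with matching input and output shapes will do.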
