|
Convolutional Networks for Lip Reading
Kadleček, Josef ; Kišš, Martin (referee) ; Hradiš, Michal (advisor)
This thesis deals with current methods for automatic speech recognition and lip reading using neural networks. It also examines the similarities between neural network architectures for audio and visual data, and surveys the available datasets for audiovisual automatic speech recognition. The main contribution of the thesis is a set of experiments comparing changes to the neural network architecture and their impact on the results. The thesis includes an implementation of a system for automatic speech recognition from audio data (character error rate, CER: 12.6 %) and from visual data (CER: 57.7 %). The architectures of both systems are based on feature extraction by convolutional networks, followed by recurrent LSTM layers, a further convolutional layer, and the CTC loss function.
|
|
Topic Detection from Spoken Speech
Škeřík, Zdeněk ; Szőke, Igor (referee) ; Schwarz, Petr (advisor)
This thesis addresses topic detection in spoken speech. The first part deals with transcribing speech to text. The thesis then describes two approaches to topic detection: a machine-learning-based approach and an expert approach that composes a very precise query describing the document topic. Both methods are tested on a set of recordings and compared.
|