Emotion Recognition from Analysis of a Person’s Speech using Deep Learning
Galba, Šimon ; Kekely, Lukáš (reviewer) ; Malik, Aamir Saeed (supervisor) Document type: Master's thesis
This thesis analyzes and implements a neural network that recognizes emotions from human speech using deep learning. It also focuses on tuning this network for greater sensitivity to a specific emotion and examines the time cost, and indirectly the financial cost, of that tuning. The work is motivated by the growing integration of artificial intelligence into biology, healthcare, and psychology. One of its goals is to study how complex it is to create specialized neural network models for these sciences, which should help make artificial intelligence models more accessible. The work builds on the implementation of the "AST: Audio Spectrogram Transformer" model, which is publicly available under the BSD 3-Clause License. The model converts an audio track into a spectrogram and then applies methods previously used for image classification and recognition. The resulting weighted accuracy is 93.5% on the EMODB dataset, 92.8% on EMOVO, and 92.9% on RAVDESS.
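The conversion of audio into an image-like input can be sketched as follows. This is a minimal illustration, not the thesis's pipeline: it uses a plain log-magnitude short-time Fourier transform in NumPy and non-overlapping 16x16 patches, whereas the published AST model works with log-mel filterbank features and overlapping patches. The function names `log_spectrogram` and `to_patches`, the frame length, the hop size, and the synthetic 440 Hz test tone are all assumptions chosen for the example.

```python
import numpy as np


def log_spectrogram(signal, frame_len=400, hop=160):
    """Log-magnitude spectrogram from a short-time Fourier transform.

    frame_len=400 and hop=160 correspond to 25 ms frames every 10 ms
    at 16 kHz (illustrative values, not taken from the thesis).
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, frame_len // 2 + 1)
    return np.log(magnitude + 1e-6)  # small offset avoids log(0)


def to_patches(spec, patch=16):
    """Cut a 2-D spectrogram into non-overlapping patch x patch tiles.

    Each tile is flattened into one row, the form in which a
    transformer would receive it before a linear embedding.
    """
    t = (spec.shape[0] // patch) * patch  # drop frames that do not fill a tile
    f = (spec.shape[1] // patch) * patch  # drop bins that do not fill a tile
    tiles = spec[:t, :f].reshape(t // patch, patch, f // patch, patch)
    return tiles.transpose(0, 2, 1, 3).reshape(-1, patch * patch)


# One second of a synthetic 440 Hz tone stands in for a speech recording.
sample_rate = 16000
time = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * time)

spec = log_spectrogram(tone)
patches = to_patches(spec)
print(spec.shape, patches.shape)
```

Each row of `patches` plays the role of one image patch, which is why architectures designed for image classification can be reused on audio.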
Audio Spectrogram Transformer; deep learning; emotion classification; speech emotion recognition; speech signal processing
