National Repository of Grey Literature
Information Combination Analysis in Multi-Channel Speaker Verification
Procházka, Jan ; Plchot, Oldřich (referee) ; Mošner, Ladislav (advisor)
This work analyzes and compares ways of combining information from multi-channel speech data for speaker verification. Fusion is studied at three levels of representation: signal, embedding, and score. At the signal level, spatial filters (beamforming) are implemented. Speech recordings are fed to a neural network (ECAPA-TDNN architecture) that extracts embeddings, i.e., vector representations of the speaker. The embeddings are then compared by a cosine similarity module, which produces real-valued scores. Score-level fusion achieves the best relative improvement over single-channel recordings (up to 70 %). Embedding-level fusion provides the most consistent results across different recording conditions.
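To make the difference between embedding-level and score-level fusion concrete, here is a minimal Python sketch. The abstract does not say how the thesis combines channels, so plain averaging is used here purely as an assumption. The function names and the toy two-dimensional vectors are hypothetical, and real ECAPA-TDNN embeddings have far more dimensions.

```python
import math

def cosine_score(a, b):
    """Cosine similarity between two embedding vectors (a real-valued score)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def embedding_level_fusion(enroll, channel_embeddings):
    """Assumed scheme: average the per-channel embeddings, then score once."""
    return cosine_score(enroll, mean_vector(channel_embeddings))

def score_level_fusion(enroll, channel_embeddings):
    """Assumed scheme: score each channel separately, then average the scores."""
    scores = [cosine_score(enroll, e) for e in channel_embeddings]
    return sum(scores) / len(scores)

# Toy data: one enrollment embedding and two per-channel test embeddings.
enroll = [1.0, 0.0]
channels = [[1.0, 0.0], [0.0, 1.0]]

emb_score = embedding_level_fusion(enroll, channels)
score_score = score_level_fusion(enroll, channels)
print(emb_score, score_score)
```

On this toy input the two strategies give different scores, which shows that the level at which channels are combined can change the verification decision even when the same averaging rule is used.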
