National Repository of Grey Literature
Evaluation and Optimization of Computational Costs in Speaker Recognition Systems
Gregušová, Sabína ; Silnova, Anna (reviewer) ; Rohdin, Johan Andréas (supervisor)
The goal of this thesis is to propose an evaluation metric that accounts for computational costs. Computational costs generally do not pose a problem in research, but they can become problematic in a commercial production system, where speed is essential. The proposed metric extends the existing NIST evaluation framework by adding parameters for the time unit and the time unit cost. The metrics are applied to a real automatic speaker verification (ASV) system, and the experiments show their potential for further research and practical use. The experiments focus on reducing the computational cost by limiting the maximum utterance length and by limiting the number of frames used for x-vector extraction. Both optimizations reduced the computational costs and achieved favorable results under the new metrics. Finally, the experimental results are compared, and each system modification is ranked according to the new metrics.
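The idea of adding a time cost to a NIST-style detection cost can be sketched as follows. The first function is the standard detection cost function (DCF). The second is only an assumption: the abstract does not state how the time unit and time unit cost enter the metric, so the additive term, the function names, and all numbers below are hypothetical.

```python
def detection_cost(p_miss, p_fa, p_target=0.01, c_miss=1.0, c_fa=1.0):
    """Standard NIST-style detection cost function (DCF)."""
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)


def cost_with_time(p_miss, p_fa, time_units, time_unit_cost, **dcf_kwargs):
    """DCF plus a cost for the processing time spent per trial.

    ASSUMPTION: the time term is simply added to the DCF. The thesis may
    combine the two quantities differently.
    """
    return detection_cost(p_miss, p_fa, **dcf_kwargs) + time_unit_cost * time_units


# Hypothetical numbers, for illustration only (not results from the thesis).
full = cost_with_time(p_miss=0.02, p_fa=0.01, time_units=10.0, time_unit_cost=0.001)
truncated = cost_with_time(p_miss=0.03, p_fa=0.01, time_units=4.0, time_unit_cost=0.001)
print(full, truncated)
```

Under these made-up values, the truncated system has a slightly higher miss rate but a lower combined cost, which is the kind of trade-off such a metric is meant to expose.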
Exploiting Uncertainty Information in Speaker Verification and Diarization
Silnova, Anna ; Šmídl, Václav (reviewer) ; Villalba Lopez, Jesus Antonio (reviewer) ; Burget, Lukáš (supervisor)
This thesis considers two models that make it possible to utilize uncertainty information in the tasks of Automatic Speaker Verification and Speaker Diarization. The first model is a modification of the widely used Gaussian Probabilistic Linear Discriminant Analysis (G-PLDA), which models the distribution of vector representations of utterances, called embeddings. In G-PLDA, the embeddings are assumed to be generated by adding a noise vector sampled from a Gaussian distribution to a speaker-dependent vector. We show that when the noise is instead assumed to be sampled from a Student's t-distribution, the PLDA model (we call this version heavy-tailed PLDA, HT-PLDA) can use the uncertainty information when making verification decisions. Our model is conceptually similar to the HT-PLDA model defined by Kenny et al. in 2010, but, as we show in this thesis, it allows for fast scoring, whereas the original HT-PLDA definition requires considerable time and computational resources for scoring. We present an algorithm for training our version of HT-PLDA as a generative model. We also consider various strategies for discriminatively training the parameters of the model. We test the performance of generatively and discriminatively trained HT-PLDA on the speaker verification task. The results indicate that HT-PLDA performs on par with the standard G-PLDA while being more robust to variations in the data pre-processing. Experiments on speaker diarization demonstrate that the HT-PLDA model not only performs better than the G-PLDA baseline but also produces better-calibrated Log-Likelihood Ratio (LLR) scores. In the second model, unlike in HT-PLDA, we do not treat the embeddings as observed data. Instead, the embeddings are normally distributed hidden variables. The embedding precision carries information about the quality of the speech segment: for clean, long segments the precision should be high, and for short, noisy utterances it should be low. We show how such probabilistic embeddings can be incorporated into the G-PLDA framework and how the parameters of the hidden embedding influence its impact when computing the likelihood with this model. In the experiments, we demonstrate how to use an existing neural network (NN) embedding extractor to provide the parameters of a probabilistic embedding distribution rather than point embeddings. We test the performance of the probabilistic embeddings model on the speaker diarization task. The results demonstrate that this model provides well-calibrated LLR scores, allowing for better diarization when no development dataset is available to tune the clustering algorithm.
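The difference between the Gaussian and the heavy-tailed noise assumption can be illustrated with a small sampling sketch. It is not the thesis's implementation: the function name, the isotropic noise, and the absence of any subspace projection are simplifying assumptions. The Student's t noise is drawn using the standard construction of a Gaussian sample divided by the square root of a scaled chi-square variable.

```python
import math
import random


def sample_embedding(speaker_vec, noise_std=1.0, dof=None, rng=random):
    """Draw one embedding as a speaker-dependent vector plus noise.

    dof=None -> Gaussian noise (the G-PLDA assumption).
    dof=nu   -> Student's t noise with nu degrees of freedom, built as a
                Gaussian draw scaled by sqrt(nu / chi2_nu); one scale is
                shared by all dimensions of the embedding.
    ASSUMPTION: isotropic noise and no subspace projection, for brevity.
    """
    scale = 1.0
    if dof is not None:
        # Gamma(shape=nu/2, scale=2) is a chi-square with nu degrees of freedom.
        chi2 = rng.gammavariate(dof / 2.0, 2.0)
        scale = math.sqrt(dof / chi2)
    return [m + scale * rng.gauss(0.0, noise_std) for m in speaker_vec]


rng = random.Random(0)          # fixed seed so the sketch is repeatable
speaker = [0.5, -1.0, 2.0]      # hypothetical speaker-dependent vector
gaussian_draw = sample_embedding(speaker, rng=rng)
heavy_tailed_draw = sample_embedding(speaker, dof=2.0, rng=rng)
```

With a small number of degrees of freedom, the heavy-tailed variant occasionally produces embeddings far from the speaker vector, which a Gaussian model would treat as very unlikely.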
