National Repository of Grey Literature
Learning Speech Separation Using Spatial Cues
Pavlus, Ján ; Mošner, Ladislav (reviewer) ; Žmolíková, Kateřina (supervisor)
This thesis examines the use of spatial cues to estimate target masks for speech separation, an idea proposed in the article "Bootstrapping single-channel source separation via unsupervised spatial clustering on stereo mixtures". The approach may make it possible to train neural-network-based speech separation systems on real-world mixtures. Two training methods, permutation invariant training and deep clustering, are described and used in experiments in which neural networks are trained with target masks estimated from spatial cues. The results of these experiments are compared with those reported in the article. The comparison shows that masks estimated from spatial information can lead to good-quality training of a speaker separation system.
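As a rough illustration of permutation invariant training, one of the two training methods the abstract names, the sketch below computes an utterance-level loss by trying every assignment of estimated sources to targets and keeping the cheapest one. It is a minimal sketch under stated assumptions, not the implementation used in the thesis: the mean squared error criterion, the plain Python lists standing in for signals or masks, and the names `mse` and `pit_loss` are all illustrative.

```python
from itertools import permutations


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_loss(estimates, targets):
    """Permutation invariant loss (illustrative helper, not from the thesis).

    Tries every assignment of estimates to targets and returns the lowest
    average MSE together with the permutation that achieves it, where
    perm[i] is the index of the target matched to estimates[i].
    """
    n = len(estimates)
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        loss = sum(mse(estimates[i], targets[p]) for i, p in enumerate(perm)) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy example: the network outputs the two sources in swapped order.
estimates = [[0.0, 1.0], [1.0, 0.0]]
targets = [[1.0, 0.0], [0.0, 1.0]]
loss, perm = pit_loss(estimates, targets)
```

Because the loss is taken over the best assignment, the output order of the network does not matter. The exhaustive search grows factorially with the number of speakers, which is acceptable for the two-speaker case.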
Adversarial Augmentation for Robust Speech Separation
Pavlus, Ján ; Černocký, Jan (reviewer) ; Žmolíková, Kateřina (supervisor)
Speech separation is the task of recovering the individual speech signals from a mixture of multiple speakers. Neural networks trained for speech separation usually work well on artificial data but often fail on real-world examples. Their behavior on real-world mixtures can be improved with training data augmentations such as noise addition. However, the power of these augmentations is limited because they have to be designed manually. In this work, a modified version of the generative adversarial network (GAN) model is explored as a way to improve this process by generating augmentations that depend on the separation performance on the augmented data. Speech separation could then become more robust with each generator and separator training step. The system was evaluated experimentally. During the experiments, the parameters were tuned to find a setting that trains the GAN model successfully without collapse. Such a setting was found, and the most robust model from the training was selected and evaluated. The results show that, when tested on the WHAM dataset, the separator trained with the GAN model achieves no significant improvement over the original separator pretrained on the WSJ0-2mix dataset. Nevertheless, a further evaluation shows that the separator trained with the GAN model is significantly more robust to the generated noises than the original one.
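The manually designed noise-addition augmentation that the abstract contrasts with the GAN approach can be sketched as mixing a noise signal into a speech signal at a chosen signal-to-noise ratio. This is only a minimal sketch of the conventional baseline, assuming signals are plain lists of samples. The function names `power` and `add_noise` are hypothetical and do not come from the thesis.

```python
import math


def power(x):
    """Average power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def add_noise(signal, noise, snr_db):
    """Mix noise into signal at the requested SNR in dB (illustrative helper).

    The noise is rescaled so that 10 * log10(P_signal / P_scaled_noise)
    equals snr_db, then added sample by sample.
    """
    noise_power = power(noise)
    if noise_power == 0:
        raise ValueError("noise must have non-zero power")
    scale = math.sqrt(power(signal) / (noise_power * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(signal, noise)]


# Toy example: at 0 dB the scaled noise has the same power as the signal.
signal = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
mixture = add_noise(signal, noise, snr_db=0.0)
```

Here the augmentation strength is fixed by hand through `snr_db`, which is the kind of manual design choice the abstract describes as limiting.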
