National Repository of Grey Literature (Národní úložiště šedé literatury). 4 records found.
Text-to-Speech Personalization
Luner, Michal ; Černocký, Jan (reviewer) ; Brukner, Jan (supervisor)
This thesis aims to develop a model that can convert input text written in Czech into speech that closely resembles a target speaker. This work is based on the VITS text-to-speech neural network model. The workflow is as follows: a Czech dataset is acquired, the neural network is trained, the trained model is then used to generate audio samples, which are evaluated using several objective metrics. A personalized dataset is developed and used to fine-tune the model, and the evaluation process is repeated. As a result, two fine-tuned models were developed. The male model achieved a MOS of 4.12, and the female model achieved a score of 3.02. The scores prove that a base model fine-tuned using a personalized dataset can achieve results close to the original audio. The contribution of this thesis is, apart from the personalized models, the pipeline for audio evaluation and dataset development, which can be easily adjusted for tasks on different data. In addition, a detailed analysis of best practices applied during the development of new datasets is provided.
Conversion of Whispered to Normal Voice
Gajda, Richard ; Černocký, Jan (reviewer) ; Brukner, Jan (supervisor)
The aim of this thesis is to develop a working program that converts whispered speech input into normal voice using vocal excitation prediction obtained from a neural network. The work is based on a study from the Indian Institute of Science in Bangalore, India. The approach to the solution is the following: to acquire a dataset from training speakers, to implement the speech parameterization using the WORLD vocoder, to implement and train the neural networks, to experiment, to evaluate the results and, finally, to propose future applications and improvements.
Non-Parallel Voice Conversion
Brukner, Jan ; Plchot, Oldřich (reviewer) ; Černocký, Jan (supervisor)
Voice conversion (VC) aims at converting the voice of a source speaker to the voice of a target speaker. It is popular in funny Internet videos but also has a number of serious use cases, such as dubbing of audiovisual material and anonymization of voice (for example for witness protection). As it can serve for spoofing of voice identification systems, it is also an important tool for developing spoofing detectors and countermeasures. VC models have mainly been trained on parallel audio (i.e. two speakers uttering the same text) and on high-quality audio material. The goal of this thesis was to investigate developing VC on non-parallel data and with low-quality signals, mainly from the publicly available VoxCeleb dataset. This work follows the state-of-the-art AutoVC architecture defined by Qian et al. It is based on neural network (NN) autoencoders, aiming to separate speech into content-dependent and speaker-dependent embeddings. The target speech is then obtained by replacing the source speaker embedding with the target speaker embedding. We have improved Qian's architecture to process low-quality audio by experimenting with different speaker embeddings (d-vectors vs. x-vectors), introducing a speaker classifier from content embeddings in an adversarial setup, and tuning the size of the content embeddings, which imposes an information bottleneck on the autoencoder. We have also defined another adversarial architecture by comparing original content embeddings with those obtained after the VC process. The results of the experiments show that non-parallel VC on low-quality data is feasible. The resulting audio was not as good as that obtained from high-quality recordings, but the speaker verification results after spoofing by the proposed system clearly showed a shift of voice characteristics toward the target speakers.
Voice Conversion (Konverze hlasu)
Brukner, Jan ; Plchot, Oldřich (reviewer) ; Černocký, Jan (supervisor)
This thesis deals with voice conversion, that is, a method that modifies the speech parameters of a source speaker to match those of a target speaker. It first describes the Voice Conversion Challenge (VCC), in which participants tried to build the best possible voice conversion system. The next part analyzes the components of the baseline system used in the VCC. Modifications that may improve the quality of the converted voice are then proposed. The implementation of these modifications is briefly described and the results of the changes are evaluated. The final part is devoted to further possibilities for improving voice conversion.
