National Repository of Grey Literature: 94 records found (showing 1 - 10)
Internet Voice Search
Belobrad, Michal ; Matějka, Pavel (referee) ; Schwarz, Petr (advisor)
This thesis is concerned with creating an application for touchscreen phones running the Bada operating system. The objective of the application is to allow users to search the web using their voice. We introduce the Samsung Wave, the phone for which this application was developed. In addition, we look at the results of the recognizer processing, auto-complete, and their combination.
Basic Geometric Conception of Aircraft
Matějka, Pavel ; Kouřil, Martin (referee) ; Šošovička, Róbert (advisor)
This bachelor's thesis characterizes the basic geometric conceptions of aircraft and analyzes their advantages and disadvantages in terms of aerodynamic, structural and functional properties. It also covers their use for particular purposes in several aircraft categories and new development trends.
Code Switching Detection in Speech
Povolný, Filip ; Glembek, Ondřej (referee) ; Matějka, Pavel (advisor)
This master's thesis deals with code-switching detection in speech. The state-of-the-art methods of language diarization are described in the first part of the thesis. The proposed method is based on an acoustic approach to language identification using a combination of GMM, i-vectors and LDA. A new Mandarin-English code-switching database was created for these experiments. The system achieves an accuracy of 89.3% on this database.
Speaker Diarization
Tomášek, Pavel ; Karafiát, Martin (referee) ; Matějka, Pavel (advisor)
This work addresses the task of speaker diarization. The goal is to implement a system that is able to decide "who spoke when". The individual components of the implementation are described. The main parts are feature extraction, voice activity detection, speaker segmentation and clustering, and finally postprocessing. This work also presents the results of the implemented system on test data, including a description of the evaluation. The test data come from the NIST RT Evaluations 2005 - 2007, and the lowest error rate achieved on this dataset is 18.52% DER. The results are compared with the diarization system implemented by Marijn Huijbregts from the Netherlands, who worked on the same data in 2009 and reached 12.91% DER.
Robust Speaker Verification
Profant, Ján ; Novotný, Ondřej (referee) ; Matějka, Pavel (advisor)
The goal of this paper is to analyze the impact of codec-degraded speech on a state-of-the-art speaker recognition system. Two feature extraction techniques are analyzed: Mel Frequency Cepstral Coefficients (MFCC) and a state-of-the-art system using Bottleneck features together with MFCC. The speaker recognition system is based on i-vectors and Probabilistic Linear Discriminant Analysis (PLDA). We compared scenarios where PLDA is trained only on clean data, then a system where we also added noisy and reverberant data, and finally codec-degraded speech. We evaluated the systems under matched conditions (data from the same codec are seen by PLDA) and mismatched conditions (PLDA does not see any data from the tested codec). We also experimented with a recently introduced technique for channel adaptation, Within-class Covariance Correction (WCC). We see a clear benefit from adding transcoded data to PLDA or WCC (with approximately the same gain) for both tested conditions (matched and mismatched).
Unsupervised Evaluation of Speaker Recognition System
Odehnal, Ondřej ; Plchot, Oldřich (referee) ; Matějka, Pavel (advisor)
This work builds on a modern speaker recognition (SID) system based on x-vectors. The goal of this bachelor's thesis is to design and experimentally evaluate techniques for evaluating a SID system using audio recordings without annotation, i.e. without knowledge of the speaker. For this purpose, an embedding is extracted from each unannotated recording. The embeddings are then used to cluster the recordings and subsequently create pseudo-annotations. The SID system is evaluated on these annotations using the equal error rate (EER) metric. The following unsupervised clustering algorithms were proposed for creating the pseudo-annotations: K-means, Gaussian mixture models (GMM) and agglomerative clustering. In testing, the best experimental procedure was K-means with the Silhouette metric, which uses cosine similarity as the distance measure. The best method achieved 5.72% EER against a reference EER of 5.15%, computed with knowledge of the annotations, on a part of the SITW dev-core-core dataset. Similar results were obtained on a part of the SITW eval-core-core dataset, with an estimated EER of 5.86% and a reference EER of 5.08%. The difference between the values is 0.57% for dev-core-core and 0.78% for eval-core-core. Further tests on the NIST SRE16 and VoxCeleb1 datasets were carried out to verify the correctness of the proposed procedure. In general, the proposed evaluation procedure had an error of approximately 1%, which is a fairly good result for an unsupervised learning algorithm.
Learning the Face Behind a Voice
Krušina, Josef ; Matějka, Pavel (referee) ; Plchot, Oldřich (advisor)
This work addresses the problem of mapping fixed representations (embeddings) of a speech signal to face embeddings and then generating a face from the mapped embedding using a generative adversarial network (GAN) that was trained for face generation. GANs are a type of neural network that can generate data similar to the data they were trained on. The architecture of the proposed system is based on four components: a face embedding extractor, a voice embedding extractor, an algorithm on top of a GAN that can generate a face from a face embedding, and my mapping network used to map a voice embedding to a face embedding. The pre-trained neural networks FaceNet and SpeechBrain are adopted as embedding extractors. A model that uses a pre-trained StyleGAN2 is adopted for backward face generation. The contribution of this work is that it allows a face to be extrapolated from an audio signal alone.
Exploring New Paths in Neural-Network-Based Speaker Recognition
Sova, Damián ; Matějka, Pavel (referee) ; Glembek, Ondřej (advisor)
Since the assignment of this work is very broad, it was necessary to focus on one particular area. This work therefore applies the Stochastic Weight Averaging optimization method to the training process of a deep neural network. The first part of the work presents the necessary theoretical background, and the second part describes the course of the experiments. The theoretical part focuses on the complete lifecycle of the training and evaluation process, including a description of each component. The practical part provides a detailed look at each experiment, intended to demonstrate the improvement in the performance of the overall speaker recognition system. The overall improvement is achieved by gradually applying various training configurations, taking the experience from previous experiments into account. The key ingredient for successful Stochastic Weight Averaging in the experiments was a sufficiently high learning rate, combined with either a gradual transition or a cyclic course of the learning rate.
Robust Speaker Verification with Deep Neural Networks
Profant, Ján ; Rohdin, Johan Andréas (referee) ; Matějka, Pavel (advisor)
The objective of this work is to study state-of-the-art speaker verification systems based on deep neural networks, called x-vectors, under various conditions, such as wideband and narrowband data, and to develop a system that is robust to an unseen language, specific noise or speech codec. This system takes a variable-length audio recording and maps it to a fixed-length embedding, which is afterwards used to represent the speaker. We compared our systems to BUT's submission to the Speakers in the Wild Speaker Recognition Challenge (SITW) from 2016, which used the previously popular statistical models, i-vectors. We observed that, when comparing single best systems, the recently published x-vectors gave a more than 4.38 times lower Equal Error Rate on the SITW core-core condition than the SITW submission from BUT. Moreover, we find that diarization substantially reduces the error rate when there are multiple speakers in the SITW core-multi condition, but we could not see the same trend on the NIST SRE 2018 VAST data.
Voice Activity Detection
Břenek, Roman ; Grézl, František (referee) ; Matějka, Pavel (advisor)
This thesis describes techniques for voice activity detection in audio recordings. It is necessary to correctly classify all non-speech segments and to recognize speech against a noisy background. The whole process of voice activity detection (VAD) is described in this thesis, i.e. digitizing the audio signal, feature extraction, training of the system, post-processing and final evaluation. Three different systems are compared in the thesis. The first one is based on phoneme recognition using a neural network; the other two are variations of Gaussian Mixture Models (GMM). Each system was tested on three data sets: Tactical Speaker Identification Speech Corpus (TSID), Ham Radio (HR) and Rich Transcription Evaluation (RT05-RT07). The best results of each system are compared with third-party results.
