National Repository of Grey Literature: 94 records found, showing records 10 - 19
Voice Activity Detection
Břenek, Roman ; Grézl, František (referee) ; Matějka, Pavel (advisor)
This thesis describes techniques for voice activity detection (VAD) in audio recordings. The task requires correctly classifying all non-speech segments and recognizing speech against a noisy background. The thesis covers the whole VAD process: digitizing the audio signal, feature extraction, training of the system, post-processing and final evaluation. Three systems are compared. The first is based on phoneme recognition using a neural network; the other two are variations of Gaussian Mixture Models (GMM). Each system was tested on three data sets: Tactical Speaker Identification Speech Corpus (TSID), Ham Radio (HR) and Rich Transcription Evaluation (RT05-RT07). The best results of each system are compared with third-party results.
Speaker Recognition Based on Long Temporal Context
Fér, Radek ; Matějka, Pavel (referee) ; Černocký, Jan (advisor)
This thesis deals with the extraction of features suitable for speaker recognition from longer temporal segments. After introducing current techniques for extracting such features, we propose and describe a new method that operates on the time scale of phonemes and uses the well-known i-vector technique. Considerable effort went into finding a suitable representation of temporal features that could make speaker recognition systems more robust, in particular by modeling prosody. Our approach does not explicitly model any specific temporal parameters of speech; instead, it uses the co-occurrence of speech frames as a source of temporal features. We test and analyze this technique on the NIST SRE 2008 speech database. Unfortunately, the results show that the technique does not bring the expected improvement in speaker recognition. We discuss and analyze this finding at the end of the thesis.
Learning the Face Behind a Voice
Kyjonka, Mojmír ; Matějka, Pavel (referee) ; Plchot, Oldřich (advisor)
This thesis deals with face reconstruction based on voice. The state of the art for this problem is investigated and a model for it is trained. The model used in this thesis is based on the work "Reconstructing faces from voices", whose architecture is based on a Generative Adversarial Network (GAN). In this work, we used the VGGFace and VoxCeleb datasets, and additionally created a small audiovisual dataset of Czech speakers. The work was implemented in the Python scripting language using the PyTorch library.
Analysis of Telephone Call of Two People
Herceková, Monika ; Schwarz, Petr (referee) ; Matějka, Pavel (advisor)
This thesis deals with the analysis of a telephone call between two people. It describes the possible patterns of speech and silence in a recording and gives reasons for the criteria used when listening to the recording. A prototype of the application proposed in the thesis for analyzing telephone calls is implemented. Possible extensions of the work are presented at the end of the thesis.
Agreements and Disagreements between Automatic and Human Speaker Recognition
Valenta, Jakub ; Matějka, Pavel (referee) ; Rohdin, Johan Andréas (advisor)
This thesis deals with the problem of speaker recognition. The term is defined and complemented with the individual methods related to it. The aim of the thesis is to point out the agreements and differences between human and automatic speaker recognition. The opening part describes the theoretical background of both areas, i.e. which aspects of human speech a human listener and an automatic system, respectively, focus on. Several experiments are then carried out to compare the two approaches. The experiments are evaluated so that it is possible to observe which test tasks a human solves better, and these findings can then be used to improve the automatic system. At the end of the thesis, such a proposed improvement of the automatic system is presented and tested. The testing was successful and higher accuracy was recorded in the evaluation. This result can therefore be used in further research and support further development in the field of automatic speaker recognition.
Intelligent Meeting Room Controlled by Voice
Bauer, Jan ; Matějka, Pavel (referee) ; Schwarz, Petr (advisor)
The aim of the thesis is to design and create a system for an intelligent room controlled by voice. The solution is based on the Phonexia Speech Engine developed by Phonexia. The system runs on a Raspberry Pi, and its core functionality is implemented in Python. The resulting solution is interesting and, with some updates, may become an intelligent assistant for meetings.
Acoustic Scene Classification from Speech
Grepl, Filip ; Beneš, Karel (referee) ; Matějka, Pavel (advisor)
This thesis deals with creating a system that recognizes the type of location in which a recording was made by analyzing the audio signal. The classifier is based on a multi-layer, fully connected neural network. The topology of the neural network is based on the baseline system provided for the DCASE competition. A dataset from this competition is also used for training and evaluating the neural network. The experiments focus in particular on the feature representation of the audio recordings and on the format of the input data of the neural network. For this purpose, Mel-filter bank, block Mel-filter bank and MFCC features are used. The experiments performed in this thesis increased classification accuracy by 6.5 % compared to the DCASE baseline system. The overall success rate of the system reached 67.5 %.
Text Dependent Speaker Verification
Fux, Jan ; Glembek, Ondřej (referee) ; Matějka, Pavel (advisor)
The goal of this Bachelor's thesis was to design a text-dependent speaker recognition system. Several systems were tested on the MIT database, which contains recordings with an average length of 0.46 s. The best recognition approach is to combine a DTW system, which uses posterior probability estimates (posteriograms) output by a phoneme recognizer, with an acoustic SID system based on i-vectors and PLDA (Probabilistic Linear Discriminant Analysis). Fusion with a neural network gives the best results in terms of EER: 17.84% for women and 16.38% for men. This is a 49.9% relative improvement for women and 54.2% for men over acoustic recognition alone.
Automatic Data Recording of Digital Satellite Broadcast
Řezníček, Ivo ; Matějka, Pavel (referee) ; Szőke, Igor (advisor)
This work aims to create a system for large-scale recording of multimedia data, especially speech data in various languages. The first issue is to find a high-quality data source; the second is to build a system for managing and storing the received data in digital form. Digital satellite transmission (the DVB-S system) is chosen as the signal source. The main system features include recording multiple streams in parallel, support for multiple cards, retrieving and storing additional information (from the Internet) and scheduling of recordings. The system will provide massive amounts of data for training a language identification system.
Computer Graphics and Video Features for Speaker Recognition
Fér, Radek ; Matějka, Pavel (referee) ; Černocký, Jan (advisor)
We describe a non-traditional method for speaker recognition that uses features and algorithms employed mainly in computer vision. Important theoretical knowledge of computer recognition is summarized first. Boosted Binary Features, a previously proposed method with roots in computer vision, are then described and explored. The method is evaluated on the standard speaker recognition databases TIMIT and NIST SRE 2010. Experimental results are given and compared to standard methods. Possible directions for future work are proposed at the end.
