National Repository of Grey Literature: 49 records found, showing records 1 - 10
Learning the Face Behind a Voice
Kyjonka, Mojmír ; Matějka, Pavel (referee) ; Plchot, Oldřich (advisor)
This thesis deals with reconstructing faces from voices. It surveys the state of the art and trains a model for the task. The model is based on the work "Reconstructing faces from voices", whose architecture builds on a Generative Adversarial Network (GAN). We used the VGGFace and VoxCeleb datasets and additionally created a small audiovisual dataset of Czech speakers. The work was implemented in Python using the PyTorch library.
Analysis of Interview Audio
Polok, Alexander ; Plchot, Oldřich (referee) ; Matějka, Pavel (advisor)
The aim of this thesis is the analysis of psychotherapeutic sessions. Classifiers describing the therapy are extracted from the audio recordings. These are then aggregated, compared with other sessions, and graphically presented in a report summarizing the conversation. In this way, therapists are provided with feedback that can serve for professional growth and better psychotherapy in the future.
Detection of Pre-Recorded Messages in Speech
Boboš, Dominik ; Matějka, Pavel (referee) ; Černocký, Jan (advisor)
Detecting pre-recorded messages in speech (colloquially "plechové huby" in Czech) is useful for any subsequent mining of information from speech data. This thesis summarizes the theory of finding similar utterances in speech and efficient approaches to comparing two sequences. Studying the identification of repeated information in audio requires a large amount of data with exactly repeating segments. We generated such a dataset by mixing pre-recorded messages into telephone calls with changes in speed, volume, and repetition. Our system handles the "known messages" and "unknown messages" scenarios by means of clustering or block-wise detection. We compared dynamic time warping (DTW), approximate string matching, and recurrence quantification analysis, and finally combined all of these techniques to obtain an accurate and efficient system.
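As a rough illustration of the dynamic time warping (DTW) comparison named in this abstract, the following minimal sketch aligns two one-dimensional sequences. It is not the thesis's implementation: the function name `dtw_distance`, the absolute-difference local cost, and the toy sequences are assumptions made for this example only.

```python
from math import inf


def dtw_distance(a, b):
    """Accumulated DTW cost between two numeric sequences (hypothetical helper)."""
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Toy data: a "message", the same message played at half speed, and an unrelated signal.
message = [1.0, 2.0, 3.0, 2.0, 1.0]
slowed = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 2.0, 2.0, 1.0, 1.0]
other = [3.0, 3.0, 3.0, 3.0, 3.0]

same_cost = dtw_distance(message, slowed)
diff_cost = dtw_distance(message, other)
```

Because the warping path may repeat elements, the slowed-down copy aligns with the original at zero cost, while the unrelated signal does not. This is the property that makes DTW attractive for messages replayed at a changed speed.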
Robust Speaker Verification with Deep Neural Networks
Profant, Ján ; Rohdin, Johan Andréas (referee) ; Matějka, Pavel (advisor)
The objective of this work is to study state-of-the-art speaker verification systems based on deep neural networks, called x-vectors, under various conditions, such as wideband and narrowband data, and to develop a system that is robust to unseen languages, specific noise, or speech codecs. The system takes a variable-length audio recording and maps it to a fixed-length embedding, which is then used to represent the speaker. We compared our systems to BUT's submission to the Speakers in the Wild Speaker Recognition Challenge (SITW) from 2016, which used the previously popular statistical models, i-vectors. When comparing single best systems, the recently published x-vectors gave an Equal Error Rate more than 4.38 times lower on the SITW core-core condition than BUT's SITW submission. Moreover, we find that diarization substantially reduces the error rate when there are multiple speakers in the SITW core-multi condition, but we did not see the same trend on the NIST SRE 2018 VAST data.
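For readers unfamiliar with the Equal Error Rate (EER) reported in this abstract, the sketch below shows one simple way to estimate it from verification scores. It is an assumption-laden illustration, not the evaluation code used in the thesis: the function name `equal_error_rate`, the threshold sweep over observed scores, and the toy scores are all hypothetical.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER by sweeping a threshold over all observed scores."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # Miss: a same-speaker trial scored below the threshold.
        miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: a different-speaker trial scored at or above the threshold.
        false_alarm = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - false_alarm)
        if best_gap is None or gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (miss + false_alarm) / 2
    return eer


# Toy scores (made up for this example): higher means "more likely the same speaker".
targets = [0.9, 0.8, 0.7, 0.3]
nontargets = [0.6, 0.4, 0.2, 0.1]
eer = equal_error_rate(targets, nontargets)
```

The EER is the operating point at which the miss rate and the false-alarm rate are equal, so a system with an EER several times lower makes correspondingly fewer errors of both kinds at that point.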
Robust Speech Activity Detection
Popková, Anna ; Plchot, Oldřich (referee) ; Matějka, Pavel (advisor)
The aim of this work is to design and create a robust speech activity detector that can detect speech in different languages, in noisy environments, and with music in the background. I decided to solve this problem with a neural network used as a classification model that assigns one of four possible classes (silence, speech, music, or noise) to the input audio recording. The resulting tool can detect speech in at least 12 languages. It detects speech with a musical background with up to 88 % accuracy, and its accuracy on noisy data ranges from 84 % (5 dB SNR) to 88 % (20 dB SNR). The tool can be used for speech activity detection in various research areas of speech processing. The main contribution is the elimination of music, which, when not eliminated, significantly increases the error rate of speaker identification and speech recognition systems.
Agreements and Disagreements between Automatic and Human Speaker Recognition
Valenta, Jakub ; Matějka, Pavel (referee) ; Rohdin, Johan Andréas (advisor)
This thesis deals with the problem of speaker recognition. The term is defined and complemented by the individual methods related to it. The aim of the thesis is to point out the agreements and differences between the human and the automatic speaker recognition process. The introduction describes the theoretical background of both areas, i.e. which aspects of human speech a human listener and an automatic system, respectively, focus on. Several experiments are then carried out to compare the two methods. The experiments are evaluated so that it can be observed which test tasks a human solves better, and these findings can subsequently be used to improve the automatic system. At the end of the thesis, such a proposed improvement of the automatic system is presented and tested. The testing was successful and a higher accuracy was recorded in the evaluation. This result can therefore be used in further research and support further development in the field of automatic speaker recognition.
Acoustic Scene Classification from Speech
Grepl, Filip ; Beneš, Karel (referee) ; Matějka, Pavel (advisor)
This thesis deals with creating a system that recognizes the type of location in which a recording was made by analyzing the audio signal. The classifier is a multi-layer, fully connected neural network whose topology is based on the baseline system provided for the DCASE competition. A dataset from this competition is also used for training and evaluating the neural network. The experiments focus in particular on the representation of the properties of the audio recordings and on the format of the input data of the neural network. For this purpose, Mel-filter bank, block Mel-filter bank, and MFCC features are used. The experiments performed in this thesis increased classification accuracy by 6.5 % compared to the DCASE baseline system. The overall system success rate reached 67.5 %.
Intelligent Meeting Room Controlled by Voice
Bauer, Jan ; Matějka, Pavel (referee) ; Schwarz, Petr (advisor)
The aim of the thesis is to design and create a system for an intelligent room controlled by voice. The solution is based on the Phonexia Speech Engine developed by Phonexia. The system runs on a Raspberry Pi. The core functionality of the system is implemented in Python. The resulting solution is certainly interesting and, with some updates, may become an intelligent assistant for meetings.
Acoustic Scene Classification from Speech
Dobrotka, Matúš ; Glembek, Ondřej (referee) ; Matějka, Pavel (advisor)
The topic of this thesis is the classification of audio recordings into 15 different acoustic scene classes that represent common scenes and places where people are situated on a regular basis. The thesis describes two approaches, based on GMM and i-vectors, and a fusion of both approaches. The best GMM system, evaluated on the evaluation dataset of the DCASE Challenge, scores 60.4%. The best i-vector system scores 68.4%. The fusion of the GMM system and the best i-vector system achieves a score of 69.3%, which would lead to 20th place in the overall system ranking of the DCASE 2017 Challenge (among 98 submitted systems from all over the world).
Designing and testing of new metal nanosubstrates for biomolecular sensors based on surface-enhanced Raman scattering (SERS) spectroscopy
Peksa, Vlastimil ; Procházka, Marek (advisor) ; Matějka, Pavel (referee) ; Richter, Ivan (referee)
Title: Designing and testing of new metal nanosubstrates for biomolecular sensors based on surface-enhanced Raman scattering (SERS) spectroscopy Author: Vlastimil Peksa Department: Institute of Physics of Charles University Supervisor: doc. RNDr. Marek Procházka, Ph.D., Institute of Physics of Charles University Abstract: This experimental methodical work aimed to optimize selected gold and silver substrates and to use them in the construction of SERS-based biosensors, including a subsequent practical application. Several types of substrates, fabricated via a combination of bottom-up techniques on solid surfaces, were tested. The properties of these substrates were examined with probe molecules, namely methylene blue, porphyrins, and tryptophan, on a confocal Raman microspectrometer. The findings on the influence of analyte application, objective focusing, and the internal intensity standard were used to optimize the measurement procedures with regard to sensitivity, accuracy, and reproducibility. Based on these findings, a method for the quantitative detection of the food dye azorubine (E 122) in commercially available drinks was developed. Its results have shown its potential as a pre-scan method for field application and preliminary testing. Keywords: Metal nanosubstrates, biomolecules,...
