National Repository of Grey Literature: 28 records found (showing 1 - 10)
Mobile platform for testing of automotive systems in Bluetooth Hands-Free communication
Mecerod, Václav ; Stifter, Jiří (referee) ; Kratochvíl, Tomáš (advisor)
This master's thesis deals with the implementation of hands-free communication systems in the automotive industry. The first chapter focuses on theoretical aspects of speech processing in embedded applications, such as noise suppression, acoustic feedback suppression, and other factors affecting the quality of hands-free systems. The second chapter presents the design of a compact, flexible mobile testing device for wireless hands-free communication modules.
Voice Conversion
Hodaň, David ; Novotný, Ondřej (referee) ; Černocký, Jan (advisor)
Voice conversion is the process of transforming the speech parameters of one speaker so that his or her speech sounds as if spoken by someone else. This thesis presents a short summary of several techniques currently used for conversion. First, the theory of voice production is described, with an emphasis on the key attributes that characterize and identify a speaker's voice. Methods for voice modification are discussed, together with the advantages and pitfalls that determine the use cases for which each method is suitable. A high-level overview of how speech is transformed between the source and the target speakers is presented. This description is subsequently used to design a voice conversion system intended to demonstrate one possible approach to the conversion problem. The conversion process consists of two phases: training and synthesis. As part of this project, a computer program for voice conversion based on the MATLAB programming environment has been developed. Its design, implementation and results are discussed.
Web-Based Lecture Browser
Žižka, Josef ; Mikolov, Tomáš (referee) ; Fapšo, Michal (advisor)
This thesis deals with a web-based lecture browser. Its goal is to facilitate access to information through modern speech and multimedia technologies. The technologies used for the browser are discussed. Video recordings play a very important role in the browser, so a large portion of this work is devoted to digital video and methods of delivering it using streaming servers. Similar existing multimedia browsers are mentioned. The reader is introduced to the browser design, including its individual components and how they are synchronized with one another. The final version of the browser is presented, and the problems that occurred during development and deployment are described. The conclusion discusses the future development of the web-based lecture browser.
Vizualization of Outputs from Speech Technologies for Contact Centers
Zhezhela, Oleksandr ; Szőke, Igor (referee) ; Schwarz, Petr (advisor)
The thesis is aimed at the visualisation of data mined by speech processing technologies. Several methods of speech data extraction were studied, and technologies for this task were analysed. The variety of metadata that can be mined from speech was defined. Existing standards and processes of call centres were also examined. Requirements for the user interface were gathered and analysed. On that basis, and after communication with call centre employees, a concept for speech data visualisation was defined and implemented. The resulting solutions were integrated into the Speech Analytics Server (SPAS).
Learning the Face Behind a Voice
Zubalík, Petr ; Mošner, Ladislav (referee) ; Plchot, Oldřich (advisor)
The main goal of this thesis is to design and implement a system that will be able to generate a face based on the speech of a given person. This problem is solved using a system composed of three convolutional neural network models. The first one is based on the ResNet architecture and is used to extract features from speech recordings. The second model is a fully convolutional neural network which converts the extracted features into the styles which form a base for the final facial image. These styles are then passed as an input to the StyleGAN generator, which creates the resulting face. The proposed system is implemented in the Python programming language using the PyTorch framework. The last chapter of the thesis discusses some of the most significant experiments performed to fine-tune and test the developed system.
Keyword Spotting Implementation to Mobil Phone (Symbian 60)
Cipr, Tomáš ; Schwarz, Petr (referee) ; Szőke, Igor (advisor)
Keyword spotting is one of the many applications of automatic speech recognition. Its purpose is to determine the points in a given utterance at which any of the specified words were spoken. Keyword spotting has great potential to enhance the performance of new applications as well as existing ones; one example is voice control of a mobile phone. With the arrival of Symbian OS on the market, it is even possible for end users to implement keyword spotting for a mobile phone on their own. The thesis describes the theoretical prerequisites for keyword spotting and its implementation. First, Symbian OS is presented with respect to the given task. Second, each step of the keyword spotting process is described. Finally, the object design of the keyword spotter is presented, followed by a description of the implementation. The thesis concludes with a review of the results and notes on possible improvements.
Voice Conversion
Brukner, Jan ; Plchot, Oldřich (referee) ; Černocký, Jan (advisor)
This thesis deals with voice conversion, a method that modifies the speech parameters of a source speaker into those of a target speaker. The thesis first describes the Voice Conversion Challenge (VCC), in which participants tried to build better voice conversion systems. Next, the components of the baseline system used in the VCC are analysed. Modifications that could improve the quality of the converted voice are proposed. The implementation of these modifications is then briefly described and the results are analysed. The final part is dedicated to further improvements of voice conversion.
Non-Parallel Voice Conversion
Brukner, Jan ; Plchot, Oldřich (referee) ; Černocký, Jan (advisor)
The goal of voice conversion (VC) is to convert the voice of a source speaker into the voice of a target speaker. The technique is popular in humorous internet videos, but it also has a number of serious applications, such as dubbing audiovisual material and voice anonymization (for example, for witness protection). Because it can be used to spoof voice identification systems, it is also an important tool for developing spoofing detectors and countermeasures. VC models were previously trained mostly on parallel data (i.e., two speakers reading the same text) and on high-quality audio. The aim of this thesis was to investigate VC on non-parallel data and on low-quality signals, in particular from the publicly available VoxCeleb database. The work builds on the modern AutoVC architecture defined by Qian et al. It is based on neural autoencoders whose goal is to separate content and speaker information into separate low-dimensional vector representations (embeddings). The target speech is then obtained by replacing the source speaker's embedding with the target speaker's embedding. Qian's architecture was improved for processing low-quality audio by experimenting with different speaker embeddings (d-vectors vs. x-vectors), by introducing a speaker classifier operating on the content embeddings in an adversarial training scheme, and by tuning the size of the content embedding, thereby defining an information bottleneck in the corresponding neural network. We also defined a further adversarial architecture that compares the original content embeddings with embeddings obtained from the converted speech. The experimental results show that non-parallel VC on low-quality data is indeed possible. The resulting audio was not of as high quality as in the case of hi-fi inputs, but speaker verification results after spoofing with the resulting system clearly showed a shift of voice characteristics toward the target speakers.
Analysis of Interview Audio
Polok, Alexander ; Plchot, Oldřich (referee) ; Matějka, Pavel (advisor)
The aim of this thesis is the analysis of psychotherapeutic sessions. Classifiers describing the therapy are extracted from the audio recordings. These are then aggregated, compared with other sessions, and graphically presented in a report summarizing the conversation. In this way, therapists are provided with feedback that can serve for professional growth and better psychotherapy in the future.
Cluster analysis in the field of pathological speech signal processing
Čapek, Karel ; Mžourek, Zdeněk (referee) ; Galáž, Zoltán (advisor)
This bachelor's thesis deals with the calculation of speech features that quantify the degradation of speech production caused by the presence of a speech pathology, and with the subsequent classification of the considered speech pathologies into several groups using the k-means algorithm. The purpose was to find groups of pathologies that, in spite of possible differences in origin, affect the phonation and articulation skills of the speakers and degrade the quality of speech. The work uses sustained phonation of the vowel "a", the most commonly used speech task in the field of pathological speech processing, because of its resistance to the demographic and linguistic characteristics of the speakers. Furthermore, a preliminary analysis was applied to the features in order to select those that best characterize the degradation of speech production. Finally, the selected features were used to find the resulting groups of pathologies using the k-means algorithm.
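As a rough illustration of the clustering step mentioned in the abstract above, the following is a minimal sketch of plain k-means (Lloyd's algorithm) in Python. It is not the thesis's implementation. The two-dimensional feature vectors, their interpretation as normalized jitter and shimmer values, the function name `kmeans`, and its parameters are all assumptions made for this example.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means on equal-length numeric feature vectors.

    Returns (labels, centroids). Hypothetical helper for illustration only.
    """
    points = [tuple(p) for p in points]
    rng = random.Random(seed)
    # Initialise centroids with k distinct data points.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids


# Made-up, already normalized (jitter, shimmer) pairs: assumed data,
# not measurements from the thesis.
features = [
    (0.10, 0.20), (0.15, 0.22), (0.12, 0.18),
    (0.90, 0.95), (0.88, 0.90), (0.92, 0.93),
]
labels, centroids = kmeans(features, k=2)
```

In practice the features would usually be standardized before clustering, since k-means relies on Euclidean distance and is sensitive to differences in scale between features.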
