National Repository of Grey Literature: 260 records found (records 11 - 20)
State of the art speech features used during the Parkinson disease diagnosis
Bílý, Ondřej ; Smékal, Zdeněk (referee) ; Mekyska, Jiří (advisor)
This work deals with the diagnosis of Parkinson's disease through analysis of the speech signal. It first describes how the speech signal is produced, then covers speech signal analysis, preprocessing, and subsequent feature extraction. Next, it describes Parkinson's disease and how the disease changes the speech signal, followed by the speech features used for diagnosis (FCR, VSA, VOT, etc.). Another part of the work deals with feature selection and reduction using learning algorithms (SVM, ANN, k-NN) and their subsequent evaluation. The last part of the thesis describes a program for computing the features, then the feature selection, and finally evaluates all the results.
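Two of the features named above, VSA (vowel space area) and FCR (formant centralization ratio), are commonly computed from the first two formants of the corner vowels /a/, /i/ and /u/. The following is a minimal sketch of those commonly cited definitions, not the program described in the thesis; the function names and the formant values are hypothetical and serve only as illustration.

```python
def vowel_space_area(f1, f2):
    """Triangular vowel space area (Hz^2) spanned by /a/, /i/, /u/ in the F1-F2 plane."""
    return 0.5 * abs(
        f1["i"] * (f2["a"] - f2["u"])
        + f1["a"] * (f2["u"] - f2["i"])
        + f1["u"] * (f2["i"] - f2["a"])
    )


def formant_centralization_ratio(f1, f2):
    """FCR: grows as the corner vowels move toward the centre of the vowel space."""
    return (f2["u"] + f2["a"] + f1["i"] + f1["u"]) / (f2["i"] + f1["a"])


# Hypothetical formant frequencies in Hz (illustrative values, not measured data).
f1 = {"a": 700.0, "i": 300.0, "u": 300.0}
f2 = {"a": 1200.0, "i": 2300.0, "u": 800.0}

vsa = vowel_space_area(f1, f2)
fcr = formant_centralization_ratio(f1, f2)
```

A smaller VSA and a larger FCR both indicate more centralized vowels, which is why the two measures are often reported together.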
Modern coding of speech signals using overcomplete models
Zapletal, Ondřej ; Průša, Zdeněk (referee) ; Rajmic, Pavel (advisor)
The theoretical part of this thesis is a study of overcomplete models. These are signal models parametrized by more variables than necessary, for which a so-called sparse solution is then computed by iterative algorithms. The goal of this analysis is to select only the significant (sparse) parameters. The theory is based on linear algebra, vector spaces, bases, and so-called frames. The practical part of the thesis describes and simulates two speech coders: a classical coder based on linear predictive speech coding, and a coder that makes use of overcomplete models of stochastic ARMA processes. Their implementation includes simulating their decoders and analyzing their reconstruction quality. The implementation uses MATLAB and a library for overcomplete models (the frames toolbox).
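The abstract does not say which iterative algorithm is used to find the sparse solution. As one well-known illustration of the idea, the sketch below uses matching pursuit, a greedy method that repeatedly picks the dictionary atom most correlated with the residual. The function names and the tiny three-atom dictionary are assumptions made for this example; the thesis itself works in MATLAB.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def matching_pursuit(signal, atoms, n_iter):
    """Greedy sparse approximation of `signal`; `atoms` must be unit-norm vectors."""
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_iter):
        correlations = [dot(residual, atom) for atom in atoms]
        # Pick the atom that explains the most of the current residual.
        best = max(range(len(atoms)), key=lambda k: abs(correlations[k]))
        coeffs[best] += correlations[best]
        residual = [r - correlations[best] * a for r, a in zip(residual, atoms[best])]
    return coeffs, residual


# Overcomplete dictionary: three unit-norm atoms in a two-dimensional space.
s = 0.5 ** 0.5
atoms = [[1.0, 0.0], [0.0, 1.0], [s, s]]
coeffs, residual = matching_pursuit([2.0, 2.0], atoms, n_iter=1)
```

Here the signal lies along the third atom, so a single coefficient represents it, whereas the two-atom orthogonal basis alone would need two.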
Speech segmentation
Andrla, Petr ; Míča, Ivan (referee) ; Sysel, Petr (advisor)
A program for segmenting speech into phonemes was created as part of this master's thesis. The program was written in MATLAB and consists of several scripts. It performs automatic segmentation. Speech segmentation is the process of identifying the boundaries between phonemes in spoken natural languages. The automatic segmentation is based on vector quantization. In the first step of the algorithm, features are extracted. Speech segments are then assigned to the calculated centroids. A position where the assigned centroid changes is marked as a phoneme boundary. Audio recordings were processed by the program, and the performance of the automatic segmentation was analysed. A detailed manual for the program was also written. The thesis briefly describes the individual speech processing methods used, their implementation in the program, and the reasons for the chosen parameter settings.
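The boundary rule described above can be sketched in a few lines. This is an illustration only, assuming the feature vectors and centroids have already been computed; the function names and the toy two-dimensional data are hypothetical, and the thesis program itself is written in MATLAB.

```python
def nearest_centroid(frame, centroids):
    """Index of the centroid closest to `frame` (squared Euclidean distance)."""
    distances = [
        sum((x - c) ** 2 for x, c in zip(frame, centroid)) for centroid in centroids
    ]
    return distances.index(min(distances))


def segment_boundaries(frames, centroids):
    """Frame indices at which the assigned centroid differs from the previous frame."""
    labels = [nearest_centroid(frame, centroids) for frame in frames]
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


# Toy feature vectors, one per frame (illustrative values only).
frames = [[0.1, 0.0], [0.2, 0.1], [0.9, 1.0], [1.1, 0.9], [0.0, 0.1]]
centroids = [[0.0, 0.0], [1.0, 1.0]]
boundaries = segment_boundaries(frames, centroids)
```

In practice the centroids would come from a codebook training step such as k-means over the extracted features, which is omitted here.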
Comparison of Accuracy of Siri, Cortana and Google
Procingerová, Lucie ; Černocký, Jan (referee) ; Szőke, Igor (advisor)
The aim of this thesis is to compare the accuracy of converting spoken word into text across several services. It focuses primarily on applications from Apple Inc., Microsoft Corporation and Google Inc., but several others, mostly available online, are also included. The document contains a description of the problem and analyzes the procedure for each service. Subsequently, the test results are analyzed and compared with the reference outputs. The thesis concludes with a discussion of these experiments.
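The abstract does not state which accuracy metric was used. A common choice when comparing recognizer output against reference transcripts is word error rate (WER), the word-level edit distance divided by the number of reference words. The sketch below assumes that metric, whitespace tokenization, and a non-empty reference; the function name and the example sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1] / len(ref)


# One deleted word ("on") and one substituted word ("light" -> "lights").
wer = word_error_rate("turn on the light", "turn the lights")
```

WER can exceed 1.0 when the hypothesis contains many inserted words, so it is an error rate and not a percentage of correct words.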
End-to-End Speech Recognition for Low-Resource Languages
Sokolovskii, Vladislav ; Schwarz, Petr (referee) ; Karafiát, Martin (advisor)
The field of automatic speech recognition has begun to adopt end-to-end neural network solutions for building speech recognizers. However, the data-hungry nature of these systems makes it possible to build recognizers only for high-resource languages such as English, Chinese, or Spanish. In low-resource scenarios, solutions must be developed that mitigate the lack of data. One of the most effective techniques is fine-tuning a pretrained model. The problem with existing fine-tuning approaches is that the token sets of the target and source languages usually differ. This is why previous approaches to cross-lingual transfer learning required changing the output layer, mixing tokens from different languages in the output layer, using a universal token set, or using a separate output layer for each language. This is undesirable, because sharing across languages is then latent and uncontrollable in the output space when the language-specific graphemes are disjoint. This thesis therefore proposes mapping tokens into a common set before pretraining begins. The existing solution is to transliterate the source language into the target language; the new approach is romanization, where the token set of the target language is romanized to match the English alphabet. Diacritics can subsequently be restored in the romanized hypotheses using an additional restoration model. This has the advantage of increasing sharing in the output grapheme space.
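For Latin-script languages with diacritics, the romanization step can be approximated by Unicode decomposition followed by dropping the combining marks. This is a simplified sketch and not the mapping used in the thesis; the function name is hypothetical, and scripts other than Latin would need a real transliteration table.

```python
import unicodedata


def strip_diacritics(text):
    """Map accented Latin letters to their base letters (e.g. 'č' -> 'c')."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


romanized = strip_diacritics("rozpoznávání řeči")
```

The mapping is lossy, which is why the abstract mentions a separate model for restoring diacritics in the recognizer output.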
Numerical simulation of human voice propagation through the vocal tract and in the space around the body
Batelka, Jiří ; Hájek, Petr (referee) ; Švancara, Pavel (advisor)
In its first chapter, this master's thesis describes the source-filter theory of voice production, the anatomy of the larynx, possible approaches to modelling voice production, and selected works that use these approaches. A brief description of selected quantities used in acoustics and of model creation follows. Models of the head alone and of the head with a female and a male torso are created, including mesh testing to determine a suitable element size. The models created in this thesis focus on describing voice propagation primarily in front of the body and on the influence of the torso on sound propagation. Including the torso results in fluctuations in the frequency domain in the range from 1 000 Hz to 8 000 Hz, more pronounced near the lower frequencies. In the transverse plane, the presence of the torso manifests as lower SPL in front of the mouth and higher SPL on the sides for several frequencies. Regions with decreased SPL in front of the mouth coincide with frequencies at which SPL on the sides is evidently higher than in the direction in front of the mouth. These observations are in agreement with other works. No significant differences were observed between models with different torsos in the transverse plane. Below the transverse plane, differences between models with different torsos can be observed; for example, for some frequencies the directivity diagrams for the model with the male torso show no decrease in SPL in front of the mouth.
Linear prediction and cepstral synthesis of speech signal in the TTS system
Mekyska, Jiří ; Stejskal, Vojtěch (referee) ; Smékal, Zdeněk (advisor)
This work deals with linear prediction and cepstral synthesis of the speech signal in TTS (Text-to-Speech) systems with the possibility of modeling prosody. The work describes the speech signal at the acoustic and phonetic levels, the principle of speech production, and the ways the speech signal can be represented in the time and frequency domains. Next, the block structure of a TTS system is presented, with a detailed description of each block. The work also describes the modeling of prosody using the three most important suprasegmental features (fundamental frequency, duration, and speech intensity). The work concludes with the design and implementation of a universal Czech TTS system based on speech synthesis in the frequency domain. The system is implemented in MATLAB.
Modification of Speech Rate
Kovářík, Aleš ; Schwarz, Petr (referee) ; Szőke, Igor (advisor)
This diploma thesis discusses modification of speech rate. The PSOLA (Pitch Synchronous OverLap Add) method was used for the rate modification. This algorithm works in the time domain. Another method, the phase vocoder, which works in the frequency domain, is also presented in an overview. This thesis extends the PSOLA method with phoneme recognition, which improves the intelligibility of the speech output by taking into account the characteristics of the phonemes being pronounced. To examine the proposed method, an application connecting PSOLA and a phoneme recognizer was developed.
Database of vocal samples of human emotions
Hlavica, Michal ; Přinosil, Jiří (referee) ; Atassi, Hicham (advisor)
This bachelor's thesis analyzes the theory of emotions: how emotions arise, how they are physiologically expressed by the human body, and how these physiological expressions and emotions are reflected in human speech. It then describes the process of speech production and the basic prosodic and acoustic parameters relevant for research. The theory of database creation is described as well, which provides a sound basis for the database itself. The database is also part of this thesis; it consists of recordings cut from television programmes and series. Another very important topic is the description of a software tool for subjective evaluation of databases, which was created as part of this thesis. It was written in C++ using the C++ Builder compiler. A short analysis of sample recordings for every emotion is also included. This analysis deals with fundamental frequency, intensity, and the first three formants.
LPC Speech Coding
Zapletal, Ondřej ; Kyselý, František (referee) ; Rajmic, Pavel (advisor)
The thesis "LPC speech coding" studies this method of parametric source coding, explains the mathematical procedures used in it (linear prediction, autocorrelation, the Levinson-Durbin algorithm, conversion to a form suitable for transmission, the Chebyshev polynomial root-searching method), and introduces the significance and application of the method in real speech encoders. The practical part of the thesis describes and simulates a simple LPC-based speech encoder, which transforms a real speech signal into a bit stream containing all the parameters needed for its subsequent reconstruction (LSF coefficients, pitch period, excitation level, voicing detection - AMDF method). One part of the thesis discusses currently used speech encoders.
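Of the procedures listed above, autocorrelation followed by the Levinson-Durbin recursion is the core of LPC analysis. The sketch below shows the textbook recursion under the sign convention e[n] = x[n] + a[1] x[n-1] + ... + a[p] x[n-p]; the function names are hypothetical, the signal is assumed to have non-zero energy (r[0] > 0), and this is not the encoder from the thesis.

```python
def autocorrelation(signal, max_lag):
    """Biased autocorrelation estimates r[0..max_lag] of one signal frame."""
    n = len(signal)
    return [
        sum(signal[i] * signal[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Return (a, error): prediction-error filter a[0..order] with a[0] = 1, and residual energy."""
    a = [1.0] + [0.0] * order
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error  # reflection coefficient of stage i
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        error *= 1.0 - k * k
    return a, error


# Autocorrelation of an ideal first-order process: r[lag] = 0.5 ** lag.
a, error = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

For this input the recursion recovers a first-order predictor (a[1] = -0.5) and leaves the second coefficient at zero, as expected for a first-order process.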
