keywords:"speech recognition" - Search Results - Digital Repository


	Automatic Generating of Subtitles by Speech Recognizer Csintalan, György ; Plchot, Oldřich (referee) ; Schwarz, Petr (advisor) This bachelor's thesis describes the development of an application for automatic generation of film subtitles using BSAPI (Brno Speech Application Interface). First, the reader is introduced to the problem and the aim of the work is described. The next section describes the speech recognition API (BSAPI) from a theoretical point of view. The following section covers extracting the audio track from video. The next chapter describes the implementation of the application. Further, experiments under different conditions are described, and possible improvements for achieving better output are discussed, for example speech enhancement using a Wiener filter. The conclusion discusses the achieved results and experiments.
	Signal processing by hidden Markov models Hampl, Jindřich ; Pfeifer, Václav (referee) ; Sigmund, Milan (advisor) One of the most common methods for isolated word recognition is based on hidden Markov models. A speech signal can be considered a sequence of successive segments, each with specific statistical parameters. A hidden Markov model is a statistical model with a finite number of states, which makes it well suited to signals such as speech. HTK is a software toolkit that is widely used for working with hidden Markov models.
	Impact of Environment Acoustics on Speech Recognition Accuracy Paliesek, Jakub ; Karafiát, Martin (referee) ; Szőke, Igor (advisor) This diploma thesis deals with the impact of room acoustics on automatic speech recognition (ASR) accuracy. Experiments were evaluated on the LibriSpeech speech corpus and on ReverbDB, a database of impulse responses and noise. The ASR systems used were based on the Mini LibriSpeech recipe for Kaldi. First, it was examined how well ASR can learn to transcribe in selected environments when the same acoustic conditions are used during training and testing. Next, the ASR architecture was modified to achieve better robustness to new conditions using methods for adaptation to room acoustics: r-vectors and i-vectors. It was shown that the recently proposed r-vector method is beneficial even when real impulse responses are used for data augmentation.
	STATISTICAL LANGUAGE MODELS BASED ON NEURAL NETWORKS Mikolov, Tomáš ; Zweig, Geoffrey (referee) ; Hajič, Jan (referee) ; Černocký, Jan (advisor) Statistical language models are an important part of many successful applications, including automatic speech recognition and machine translation (the well-known Google Translate application is one example). Traditional techniques for estimating these models are based on so-called N-grams. Despite the known shortcomings of these techniques and the enormous effort of research groups across many fields (speech recognition, machine translation, neuroscience, artificial intelligence, natural language processing, data compression, psychology, etc.), N-grams have essentially remained the most successful technique. The aim of this thesis is to present several language model architectures based on neural networks. Although these models are computationally more demanding than N-gram models, the techniques developed in this thesis make their efficient use in real applications possible. The achieved reduction in speech recognition errors compared with the best N-gram models reaches 20%. A model based on a recurrent neural network achieves the best published results on a very well-known dataset (Penn Treebank).
	Far-Field Speech Recognition Žmolíková, Kateřina ; Malenovský, Vladimír (referee) ; Černocký, Jan (advisor) Speech recognition systems today achieve fairly high accuracy. However, when speech is captured by a distant microphone and is therefore corrupted by noise and reverberation, recognition accuracy degrades considerably. This problem can be mitigated by using microphone arrays. This thesis deals with techniques that combine signals from multiple microphones so as to improve the quality of the resulting signal, and thus also the recognition accuracy. The thesis first summarizes the theory of speech recognition and presents the most commonly used algorithms for microphone array processing. Then the results of applying two beamforming methods and a method for dereverberation of multichannel signals are demonstrated and analyzed. Finally, an alternative approach to beamforming using neural networks is tested.
	Semi-Supervised Training of Deep Neural Networks for Speech Recognition Veselý, Karel ; Ircing, Pavel (referee) ; Lamel, Lori (referee) ; Burget, Lukáš (advisor) In this dissertation, we first present the theory of training neural networks for speech recognition, together with an implementation of the 'nnet1' training recipe, which is part of the open-source Kaldi toolkit. The recipe consists of unsupervised pre-training using the RBM algorithm, frame-level classifier training with the cross-entropy objective function, and sentence-level sequence training with the sMBR objective function. This is followed by the main topic of the thesis: semi-supervised training with a mix of transcribed and untranscribed data. Inspired by conference papers and initial experiments, we focused on several questions. First, whether it is better to compute confidences (i.e., the reliability of automatically obtained annotations) per sentence, per word, or per speech frame. Next, whether confidences should be used for data selection or for data weighting; both approaches are compatible with training by stochastic gradient descent, where the gradients of speech frames are multiplied by a weight. We further investigated improving semi-supervised training through confidence calibration, as well as approaches for further improving the model using correctly transcribed data. Finally, we proposed a simple recipe that does not require time-consuming tuning of training hyperparameters and is practically applicable to various datasets. Experiments were conducted on several speech datasets: for a Vietnamese recognizer with 10 transcribed hours (Babel), the error rate was reduced by 2.5%; for English with 14 transcribed hours (Switchboard), the error rate was reduced by 3.2%. We found that it is rather difficult to further improve system accuracy by modifying the confidences, but we are nevertheless convinced that our conclusions have considerable practical value: untranscribed data are easy to collect, and our proposed solution brings good accuracy improvements and is not difficult to replicate.
	Voice commands recognition in audiosignal Šrámek, Martin ; Grepl, Robert (referee) ; Krejsa, Jiří (advisor) This bachelor's thesis deals with the design and implementation of an isolated-word voice recognition system. The motivation was an interest in remote voice control of robotic mechanisms and in research on speech signal processing, a field that is widely developed today. The thesis is divided into two parts: the first summarizes existing knowledge of speech recognition, and the second applies this knowledge to the design of a system.
	Radar Interface and Its Interconnection to Air Traffic Control System Simulation Buchníčková, Tereza ; Ondřej, Karel (referee) ; Smrž, Pavel (advisor) This thesis aims to create an application for training new air traffic control officers. The system is implemented as a JavaScript web application using the jQuery and Leaflet libraries. The server part is written in Python and uses the BlueSky library for air traffic simulation. The thesis presents the theoretical background and discusses the design and implementation of the system. The result is an application that can display current air traffic, and in which the user, in the role of an air traffic control officer, can practice communicating with a pilot on simulated air traffic. The application records the voice communication and, in cooperation with an automatic speech recognition system, converts it into text displayed on the screen. Besides supporting the training of air traffic operators, the application also serves as a demonstration of the results of the KnoT and Speech research groups at the Faculty of Information Technology, Brno University of Technology.
	Voice Control of Industrial and Medical Devices in Noisy Environments Vymětalíková, Lucie ; Matoušek, Radomil (referee) ; Dobrovský, Ladislav (advisor) This diploma thesis deals with voice control of industrial and medical devices in noisy environments. Different speech recognition models and methods for noise suppression in speech signals are compared. Based on this research and the conducted tests, a custom voice control system is designed. The system consists of a wake word detection model and a model for recognizing predefined commands. The system also provides an audio response to the operator and executes scripts based on the recognized commands. A modification enabling automatic door opening of the OpenTube2 laboratory box was also designed.
	High Level Analysis of the Psychotherapy Sessions Polok, Alexander ; Karafiát, Martin (referee) ; Matějka, Pavel (advisor) This work focuses on analyzing psychotherapy sessions within the DeePsy research project. It aims to design and develop features that model the session dynamics and can reveal subtle nuances. These features are automatically extracted from the source recording using neural networks. They are further processed, compared across sessions, and displayed graphically, producing a feedback report on the session for the therapist. This assistive tool can also help therapists grow professionally and provide better psychotherapy in the future. A relative improvement in voice activity detection of 37.82% was achieved. The VBx diarization system was generalized to converge to two speakers with a minimal relative degradation in error rate of 0.66%. An automatic speech recognition system was trained with a 17.06% relative improvement over the best available hybrid model. Models for sentiment classification, type of therapeutic intervention, and overlapping speech detection were also trained.
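The Csintalan record mentions a Wiener filter as a possible speech-enhancement step. As a minimal sketch, assuming the noise power spectrum has already been estimated (for example from non-speech frames) and that the clean-speech power is approximated by subtracting it from the noisy power, a per-frequency-bin Wiener gain can be computed as snr / (1 + snr). This is a common textbook formulation, not necessarily the one used in the thesis; the function names and the `floor` parameter are hypothetical.

```python
def wiener_gains(noisy_power, noise_power, floor=0.0):
    """Per-bin Wiener gain G = snr / (1 + snr).

    noisy_power: power spectrum |Y(f)|^2 of one noisy frame
    noise_power: estimated noise power per bin (assumed > 0)
    floor      : hypothetical lower bound on the gain, limits over-suppression
    """
    gains = []
    for py, pn in zip(noisy_power, noise_power):
        # Estimated SNR: (noisy power - noise power) / noise power, clipped at 0.
        snr = max(py / pn - 1.0, 0.0)
        gains.append(max(snr / (1.0 + snr), floor))
    return gains


def enhance(noisy_spectrum, noise_power, floor=0.0):
    """Scale the complex STFT bins of one frame by their Wiener gains."""
    power = [abs(y) ** 2 for y in noisy_spectrum]
    gains = wiener_gains(power, noise_power, floor)
    return [g * y for g, y in zip(gains, noisy_spectrum)]


print(wiener_gains([4.0, 1.0, 0.5], [1.0, 1.0, 1.0]))  # [0.75, 0.0, 0.0]
```

Bins dominated by noise are driven toward zero, while bins well above the noise level pass almost unchanged; a small nonzero floor is often used to reduce the "musical noise" caused by fully zeroing bins.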
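The HMM-based isolated word recognition described in the Hampl record can be illustrated with a minimal sketch, assuming discrete (vector-quantized) observations and one small left-to-right HMM per word. Each word model scores the observation sequence with the forward algorithm, and the word with the highest likelihood wins. The function names, the two-symbol alphabet, and the "yes"/"no" model parameters are hypothetical and not taken from the thesis, which used HTK.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-probabilities."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def safe_log(p):
    """log(p), with log(0) mapped to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def forward_log_likelihood(obs, start, trans, emit):
    """Forward algorithm: log P(obs | HMM) for a discrete-output HMM.

    start[i]    -- probability of starting in state i
    trans[i][j] -- probability of moving from state i to state j
    emit[i][k]  -- probability of emitting symbol k in state i
    """
    n = len(start)
    # alpha[i] = log P(o_1 .. o_t, state at time t = i)
    alpha = [safe_log(start[i]) + safe_log(emit[i][obs[0]]) for i in range(n)]
    for o in obs[1:]:
        alpha = [
            log_sum_exp([alpha[i] + safe_log(trans[i][j]) for i in range(n)])
            + safe_log(emit[j][o])
            for j in range(n)
        ]
    return log_sum_exp(alpha)


def recognize(obs, word_models):
    """Return the word whose HMM gives obs the highest likelihood."""
    return max(word_models, key=lambda w: forward_log_likelihood(obs, *word_models[w]))


# Hypothetical two-state left-to-right word models over a two-symbol alphabet.
LEFT_TO_RIGHT = [[0.6, 0.4], [0.0, 1.0]]
word_models = {
    "yes": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.9, 0.1], [0.2, 0.8]]),
    "no": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.9], [0.8, 0.2]]),
}
print(recognize([0, 0, 1, 1], word_models))  # prints: yes
```

Working in the log domain avoids numerical underflow on long utterances. Toolkits such as HTK usually model the outputs with Gaussian mixtures over continuous feature vectors rather than with the discrete emission table assumed here.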
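The Veselý record notes that both confidence-based data selection and data weighting fit into stochastic gradient descent by multiplying each speech frame's gradient by a weight. A minimal sketch of that idea for a frame-level softmax classifier trained with cross-entropy, assuming the per-frame confidences have already been computed: for softmax with cross-entropy the gradient with respect to the logits is p - onehot(y), which is then scaled per frame. The function names, example values, and the 0.7 threshold are hypothetical; this is not the nnet1 implementation.

```python
import math


def softmax(z):
    """Softmax of one frame's logits, shifted by the max for numerical stability."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def weighted_ce_grads(logits, targets, weights):
    """Per-frame gradients of sum_t w_t * CE(softmax(z_t), y_t) w.r.t. the logits z_t.

    The unweighted gradient is p_t - onehot(y_t); each frame's gradient
    is then multiplied by its weight w_t.
    """
    grads = []
    for z, y, w in zip(logits, targets, weights):
        g = softmax(z)
        g[y] -= 1.0  # p - onehot(target)
        grads.append([w * v for v in g])
    return grads


def selection_weights(confidences, threshold):
    """Data selection as 0/1 weights: keep frames whose confidence reaches the threshold."""
    return [1.0 if c >= threshold else 0.0 for c in confidences]


# Hypothetical frames: logits from the network, targets from an automatic transcript.
logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
targets = [0, 2]
confidences = [0.95, 0.40]

weighted = weighted_ce_grads(logits, targets, confidences)  # data weighting
selected = weighted_ce_grads(logits, targets, selection_weights(confidences, 0.7))  # data selection
# In `selected`, the low-confidence second frame contributes an all-zero gradient.
```

Selection is simply the special case of weighting with 0/1 weights, which is why both strategies can share the same SGD training loop.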
