National Repository of Grey Literature: 107 records found (records 1 - 10)
Search in speech recordings based on semantic vectors
Boboš, Dominik ; Karafiát, Martin (referee) ; Schwarz, Petr (advisor)
In today's information-saturated world, efficient information retrieval methods are in high demand. This thesis summarizes methods for obtaining vector representations of text and audio, also known as semantic vectors. We took a closer look at multimodal models such as SpeechT5 and SeamlessM4T, which map these types of input into a single shared vector space. Based on these models, we built a system that lets us search data regardless of modality. To evaluate the proposed solution not only on standard keyword spotting but also on semantic search tasks, we manually annotated a dataset that captures similar semantic meanings of keywords or phrases. Finally, we conducted several experiments exploring the capabilities of the models, either by restricting the observed context during fine-tuning of the neural network or by incorporating text-to-speech (TTS) systems to improve overall performance.
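Searching "regardless of modality" works because a text query and a speech segment end up as vectors in the same space, so one similarity function can rank both. The sketch below is a minimal illustration under stated assumptions: the segment IDs and 3-dimensional vectors are hypothetical toy values standing in for encoder outputs (it does not call SpeechT5 or SeamlessM4T), and cosine similarity is assumed as the ranking measure.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vec: Sequence[float],
           index: Dict[str, List[float]],
           top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank indexed segments by cosine similarity to the query vector.

    The query may come from a text encoder or a speech encoder; because both
    are assumed to share one vector space, the ranking code is the same.
    """
    scored = [(seg_id, cosine(query_vec, vec)) for seg_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical embeddings of speech segments (toy values, not model output).
index = {
    "utt_001_seg_03": [0.9, 0.1, 0.0],
    "utt_002_seg_11": [0.1, 0.8, 0.3],
    "utt_005_seg_07": [0.7, 0.3, 0.1],
}
text_query = [1.0, 0.2, 0.0]  # hypothetical embedding of a typed query
print(search(text_query, index, top_k=2))
```

With real models, the only change to this sketch would be how the vectors are produced; a large index would typically use an approximate nearest-neighbour structure instead of the linear scan shown here.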
Search engine for the BUT website
Vŕbik, Pavol ; Veigend, Petr (referee) ; Dytrych, Jaroslav (advisor)
The goal of this thesis is to design and implement a new search for the BUT IS using a full-text search tool. The original search placed excessive load on the database and therefore needed to be replaced. Based on the analysis performed, Elasticsearch was selected as a suitable full-text search tool. Text analyzers were prepared for it to enable linguistic analysis of Czech and English. To synchronize data between the central database and Elasticsearch, a tool was implemented that runs at regular intervals and keeps the search index up to date. The result of the work is a new search integrated into the search engines in the public part of the BUT information system.
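A periodically run synchronization tool is commonly built around a watermark: each run copies only rows changed since the previous run. The abstract does not describe the tool's internals, so the sketch below is an assumption-laden illustration: the `Row` schema, the `updated_at` change stamp, the `deleted` flag and the in-memory `index` dict (standing in for an Elasticsearch index) are all hypothetical, and no Elasticsearch client API is used.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Row:
    """Hypothetical database row; `updated_at` is an increasing change stamp."""
    id: int
    title: str
    updated_at: int
    deleted: bool = False


def sync_once(rows: List[Row], index: Dict[int, dict], last_sync: int) -> int:
    """Apply rows changed after `last_sync` to `index`; return the new watermark."""
    watermark = last_sync
    for row in rows:
        if row.updated_at <= last_sync:
            continue  # already indexed in an earlier run
        if row.deleted:
            index.pop(row.id, None)  # remove documents deleted from the database
        else:
            index[row.id] = {"title": row.title}  # insert or overwrite the document
        watermark = max(watermark, row.updated_at)
    return watermark
```

A scheduler (for example cron) would call `sync_once` at a fixed interval and persist the returned watermark between runs. One design caveat of timestamp watermarks: a row committed late with an older stamp can be skipped, so real tools often re-read a small overlap window.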
TRECVid Search Information Retrieval
Čeloud, David ; Mlích, Jozef (referee) ; Chmelař, Petr (advisor)
The master's thesis deals with information retrieval and summarizes the knowledge in the field of information retrieval theory. It also gives an overview of the models used in information retrieval, the data involved, current issues and their possible solutions. The practical part focuses on implementing information retrieval methods for textual data. The last part is dedicated to experiments that validate the implementation and explore its possible improvements.
Wikipedia Page Classification
Suchý, Ondřej ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
The goal of this thesis is to design and implement a system that selects Wikipedia articles relevant to a given topic, in order to reduce the storage space taken up by its offline version. The problem was solved using information retrieval methods implemented with the Elasticsearch search engine. The system tries to determine the user's area of interest from the given keywords and selects articles from that area. This is achieved by measuring the similarity of articles and adding to the selection all articles from frequent categories. For queries over Simple English Wikipedia, the output files are usually smaller than 30 MB.
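The two-stage selection described above (similar articles first, then every article from categories that recur among them) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the similarity scores are hard-coded stand-ins for relevance scores a search engine would return, and `threshold` and `min_category_count` are hypothetical parameters.

```python
from collections import Counter
from typing import Dict, Set


def select_articles(scores: Dict[str, float],
                    categories: Dict[str, Set[str]],
                    threshold: float,
                    min_category_count: int) -> Set[str]:
    """Select articles similar to the query, then expand by frequent categories.

    scores:     article -> similarity to the user's keywords (hypothetical values)
    categories: article -> set of category names
    """
    # Stage 1: keep articles whose similarity reaches the threshold.
    selected = {a for a, s in scores.items() if s >= threshold}

    # Stage 2: count categories among the selected articles ...
    counts = Counter(c for a in selected for c in categories.get(a, set()))
    frequent = {c for c, n in counts.items() if n >= min_category_count}

    # ... and add every article that belongs to a frequent category.
    for article, cats in categories.items():
        if cats & frequent:
            selected.add(article)
    return selected


scores = {"Prague": 0.9, "Brno": 0.8, "Vltava": 0.4, "Ostrava": 0.3, "Jazz": 0.1}
categories = {
    "Prague": {"Cities in Czechia"},
    "Brno": {"Cities in Czechia"},
    "Vltava": {"Rivers"},
    "Ostrava": {"Cities in Czechia"},
    "Jazz": {"Music"},
}
print(sorted(select_articles(scores, categories, threshold=0.5, min_category_count=2)))
```

Here "Ostrava" scores below the threshold but is pulled in because its category occurs twice among the directly selected articles.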
Brno Communication Agent
Křištof, Jiří ; Fajčík, Martin (referee) ; Smrž, Pavel (advisor)
The aim of this thesis is to implement a communication agent that provides information about Brno. The communication agent uses a three-tier architecture. Machine learning and neural network techniques are used for question answering. User tests determined a success rate of 84%, and 58% of the primary users were satisfied with the system. The main benefit of the work is that it makes it easier for Brno's residents and visitors to retrieve information about the city.
Stemming Methods Used in Text Mining
Adámek, Tomáš ; Chmelař, Petr (referee) ; Bartík, Vladimír (advisor)
The main theme of this master's thesis is text mining. The work focuses on English texts and their automatic preprocessing. The main part of the thesis analyses various stemming algorithms (Lovins, Porter and Paice/Husk). Stemming is a procedure for automatically conflating semantically related terms by means of rule sets. The next part describes the design of an application for various types of stemming algorithms. The application is built on the Java platform, using the Swing graphics library and the MVC architecture. The following chapter describes the implementation of the application and of the stemming algorithms. The last part describes experiments with the stemming algorithms and compares them in terms of their effect on text classification results.
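To make "conflating terms via rule sets" concrete, here is step 1a of the Porter stemmer, which handles plural endings. It is only the first of Porter's rule steps, shown in Python for brevity rather than the Java used by the thesis application; the rules follow Porter's published step 1a (SSES → SS, IES → I, SS → SS, S → empty), applying only the longest matching suffix.

```python
def porter_step_1a(word: str) -> str:
    """Porter stemmer step 1a: strip plural endings from a lowercase word.

    Rules are checked longest suffix first, and only the first match applies:
      SSES -> SS, IES -> I, SS -> SS, S -> (empty)
    """
    if word.endswith("sses"):
        return word[:-2]  # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]  # ponies -> poni
    if word.endswith("ss"):
        return word       # caress stays caress
    if word.endswith("s"):
        return word[:-1]  # cats -> cat
    return word


print([porter_step_1a(w) for w in ["caresses", "ponies", "ties", "caress", "cats"]])
```

The full algorithm continues with further steps that use the "measure" of the stem; Lovins and Paice/Husk use different rule tables and matching strategies.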
Information Retrieval in Text Data
Tkadlčík, Luboš ; Burget, Radek (referee) ; Bartík, Vladimír (advisor)
This thesis investigates text data mining and information retrieval. It describes the most common representations of text documents and retrieval strategies. The aim of the thesis is to design and implement an application that performs information retrieval using the vector space model. The application implements three similarity measures: the cosine measure, the Jaccard coefficient and the Dice coefficient. The achieved results are assessed, and possible extensions of the project are outlined.
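The three similarity measures can be sketched over term-frequency vectors as below. This is an illustration, not the thesis's code: raw term counts and whitespace tokenization are assumed, and Jaccard and Dice are given in their weighted vector form, x·y / (|x|² + |y|² − x·y) and 2x·y / (|x|² + |y|²). With binary weights, these reduce to the familiar set-based coefficients.

```python
import math
from collections import Counter
from typing import Mapping


def tf_vector(text: str) -> Counter:
    """Term-frequency vector from naive lowercase whitespace tokenization."""
    return Counter(text.lower().split())


def dot(a: Mapping[str, float], b: Mapping[str, float]) -> float:
    """Sparse dot product over the terms of `a`."""
    return sum(w * b.get(t, 0) for t, w in a.items())


def cosine(a, b) -> float:
    denom = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / denom if denom else 0.0


def jaccard(a, b) -> float:
    ab = dot(a, b)
    denom = dot(a, a) + dot(b, b) - ab
    return ab / denom if denom else 0.0


def dice(a, b) -> float:
    denom = dot(a, a) + dot(b, b)
    return 2 * dot(a, b) / denom if denom else 0.0


query = tf_vector("vector space model")
doc = tf_vector("the vector space model of retrieval")
print(cosine(query, doc), jaccard(query, doc), dice(query, doc))
```

For this pair, the query shares 3 terms with a 6-term document, giving cosine 3/√18 ≈ 0.707, Jaccard 3/6 = 0.5 and Dice 6/9 ≈ 0.667. In practice, TF-IDF weights would usually replace raw counts.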
Library for Support of ReReSearch System Development
Heller, Stanislav ; Otrusina, Lubomír (referee) ; Šperka, Svatopluk (advisor)
At present, the development of the ReReSearch system is significantly slowed down by the mutual incompatibility of system modules, by developers often repeating known mistakes and, of course, by poor communication between developers in general. To solve this problem, a component was needed that would implement and unify frequently performed tasks in ReReSearch development and thus save ReReSearch developers' time. The result of this effort is "rrslib", a Python library intended to help everyone who works on parts of the ReReSearch project: the database, data extractors, web-based agents, crawlers, XML processing, etc. The library should make development of the ReReSearch system more consistent, faster and more reliable.
Mining of Textual Data from the Web for Speech Recognition
Kubalík, Jakub ; Plchot, Oldřich (referee) ; Mikolov, Tomáš (advisor)
The initial goal of this project was to study language modeling for speech recognition and techniques for obtaining text data from the Web. The text introduces the basic techniques of speech recognition and describes language models based on statistical methods in more detail. In particular, the thesis deals with criteria for evaluating the quality of language models and speech recognition systems. It further describes data mining models and techniques, especially information retrieval. Next, the problems associated with obtaining data from the Web are presented, and the Google search engine is presented in contrast. Part of the project was the design and implementation of a system for obtaining text from the Web, which is described in detail. However, the main goal of the work was to verify whether data obtained from the Web can bring any benefit to speech recognition. The described techniques therefore try to find the optimal way to use Web data to improve both sample language models and models deployed in real recognition systems.
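A standard criterion for the quality of a statistical language model is perplexity on held-out text, PP = exp(−(1/N) Σ log p(wᵢ)). Lower perplexity means the model predicts the text better. The abstract does not name the model type or smoothing, so the sketch below assumes a unigram model with add-one smoothing, chosen purely for brevity. All unseen words are treated as a single `<unk>` event so the probabilities still sum to one. Real recognizers use higher-order n-grams with stronger smoothing.

```python
import math
from collections import Counter
from typing import Callable, List


def train_unigram(tokens: List[str]) -> Callable[[str], float]:
    """Add-one smoothed unigram model; every unseen word maps to one <unk> event."""
    counts = Counter(tokens)
    total = sum(counts.values())
    vocab_size = len(set(counts) | {"<unk>"})

    def prob(word: str) -> float:
        return (counts.get(word, 0) + 1) / (total + vocab_size)

    return prob


def perplexity(prob: Callable[[str], float], tokens: List[str]) -> float:
    """exp of the average negative log-probability of the evaluation tokens."""
    log_sum = sum(math.log(prob(w)) for w in tokens)
    return math.exp(-log_sum / len(tokens))


model = train_unigram("a b a".split())
print(perplexity(model, ["a", "a"]), perplexity(model, ["a", "b"]))
```

In a Web-data experiment of the kind described, one would compare the perplexity on the same held-out text before and after adding Web text to the training data. Word error rate of the full recognizer is the complementary, system-level criterion.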
Information Retrieval in Czech Wikipedia
Balgar, Marek ; Bartík, Vladimír (referee) ; Chmelař, Petr (advisor)
The main task of this master's thesis is to understand the problems of information retrieval and text classification. The research focuses mainly on text data, semantic dictionaries and especially the knowledge that can be inferred from Wikipedia. The thesis also describes the implementation of a querying system based on this knowledge. Finally, the properties and possible improvements of the system are discussed.
