TRECVid Search Information Retrieval
Čeloud, David ; Mlích, Jozef (referee) ; Chmelař, Petr (advisor)
This master's thesis deals with information retrieval. It summarizes the theory of information retrieval and gives an overview of the models used, the data, and current issues together with their possible solutions. The practical part focuses on implementing information retrieval methods for textual data. The last part is dedicated to experiments that validate the implementation and to its possible improvements.
Wikipedia Page Classification
Suchý, Ondřej ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
The goal of this paper is to design and implement a system that selects Wikipedia articles relevant to a given topic, in order to reduce the amount of memory taken by an offline version of Wikipedia. The solution uses methods from information retrieval, implemented with the Elasticsearch search engine. The system tries to determine the user's area of interest from the given keywords and selects articles from that area. This is achieved by measuring the similarity of articles and adding all articles from frequent categories to the selection. The output files for queries over Simple English Wikipedia are usually smaller than 30 MB.
Information Retrieval in Text Data
Tkadlčík, Luboš ; Burget, Radek (referee) ; Bartík, Vladimír (advisor)
This thesis examines text data mining and information retrieval. It describes the most common representations of text documents and retrieval strategies. The aim of the thesis is the design and implementation of an application that performs information retrieval using the vector space model. The application implements three similarity measures: the cosine measure, the Jaccard coefficient and the Dice coefficient. The achieved results are assessed, and a possible continuation of the project is outlined.
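The three similarity measures named in this abstract can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes raw term-frequency vectors built from whitespace tokens, and it uses the set-based forms of the Jaccard and Dice coefficients (weighted vector forms also exist). The function names `term_vector`, `cosine`, `jaccard` and `dice` are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a raw term-frequency vector from whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard(a, b):
    """Set-based Jaccard coefficient: shared terms / all distinct terms."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def dice(a, b):
    """Set-based Dice coefficient: 2 * shared terms / sum of set sizes."""
    sa, sb = set(a), set(b)
    total = len(sa) + len(sb)
    return 2 * len(sa & sb) / total if total else 0.0


query = term_vector("information retrieval")
document = term_vector("information retrieval in text data")
scores = {
    "cosine": cosine(query, document),
    "jaccard": jaccard(query, document),
    "dice": dice(query, document),
}
```

All three scores lie between 0 and 1 for non-negative term frequencies, so they can be used interchangeably to rank documents against a query, although they generally produce different orderings.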
Mining of Textual Data from the Web for Speech Recognition
Kubalík, Jakub ; Plchot, Oldřich (referee) ; Mikolov, Tomáš (advisor)
The primary goal of this project was to study language modeling for speech recognition and techniques for obtaining textual data from the Web. The text presents basic speech recognition techniques and describes language models based on statistical methods in more detail. In particular, the work deals with criteria for evaluating the quality of language models and speech recognition systems. The text further describes data mining models and techniques, especially information retrieval. It then presents the problems associated with obtaining data from the Web and, in contrast, introduces the Google search engine. The project included the design and implementation of a system for obtaining text from the Web, which is described in due detail. Nevertheless, the main goal of the work was to verify whether data obtained from the Web can benefit speech recognition. The described techniques therefore try to find the optimal way to use data obtained from the Web to improve sample language models as well as models deployed in real recognition systems.
Multimodal Database Search
Krejčíř, Tomáš ; Stryka, Lukáš (referee) ; Chmelař, Petr (advisor)
The field that deals with storing and efficiently searching multimedia documents is called information retrieval. This paper describes a solution for efficient searching in collections of shots. Multimedia documents are represented as vectors in a high-dimensional space, because in such a collection of documents it is easier to define semantics as well as the search mechanisms. The work addresses similarity search in metric space, which uses distance functions such as Euclidean, Chebyshev or Mahalanobis for comparing global features, and cosine or binary rating for comparing local features. Experiments on the TRECVid dataset compare the implemented distance functions. The best distance function for global features appears to be Mahalanobis, and for local features the cosine rating.
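The three distance functions mentioned for global features can be sketched as below. This is an illustrative sketch only, not the code from the thesis: feature vectors are assumed to be plain Python lists of equal length, and the Mahalanobis distance takes a precomputed inverse covariance matrix `inv_cov` (a hypothetical parameter) rather than estimating it from data.

```python
import math


def euclidean(x, y):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def chebyshev(x, y):
    """Largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))


def mahalanobis(x, y, inv_cov):
    """sqrt(d^T * inv_cov * d), where d = x - y.

    inv_cov is the inverse covariance matrix of the feature space,
    given as a list of rows; it is assumed to be supplied by the caller.
    """
    d = [a - b for a, b in zip(x, y)]
    total = 0.0
    for i, row in enumerate(inv_cov):
        total += d[i] * sum(row[j] * d[j] for j in range(len(d)))
    return math.sqrt(total)


p, q = [0.0, 0.0], [3.0, 4.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
# Variances 9 and 16 on the two axes, no correlation between them.
scaled = [[1.0 / 9.0, 0.0], [0.0, 1.0 / 16.0]]
```

With the identity matrix the Mahalanobis distance reduces to the Euclidean distance; with a non-trivial covariance it rescales each direction by the spread of the data, which is one plausible reason it can behave differently from the other two on real features.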
Information Retrieval in Research Portals
Ďulík, Jan ; Smrž, Pavel (referee) ; Schmidt, Marek (advisor)
This paper deals with information retrieval in research portals, with a focus on retrieval in scientific publications. We define concepts related to information retrieval, classification and knowledge representation. We also present existing search tools that served as the initial inspiration for the design of the search interface. Furthermore, we describe the implementation as well as the process of collecting sample data. In the last chapter we discuss the usability of the developed web application.
Automatically Updated Bibliography
Valo, Boris ; Škoda, Petr (referee) ; Smrž, Pavel (advisor)
This paper describes the development of an application for an automatically updated bibliography. Nowadays, many Internet users search for the information they need, which is especially important in collections of scientific publications and articles. The aim of this thesis is a convenient tool that lets users create their own portal. This is achieved by storing documents and subsequently searching them using ElasticSearch. Retrieval is performed with Boolean queries, supplemented by the similarity search tool MoreLikeThis. The end of the thesis describes how the retrieval was tested and evaluated.
Video Retrieval
Černý, Petr ; Mlích, Jozef (referee) ; Chmelař, Petr (advisor)
This thesis summarizes information retrieval theory and the basics of the relational model, and focuses on data indexing in relational database systems. It concentrates on searching multimedia data and includes a description of automatic content extraction from multimedia data and of multimedia data indexing. The practical part discusses the design and implementation of a solution for improving the efficiency of similarity queries over the multidimensional vectors that describe multimedia data. The final part discusses experiments with this solution.
Information Retrieval
Šabatka, Pavel ; Bartík, Vladimír (referee) ; Chmelař, Petr (advisor)
The purpose of this thesis is to summarize theoretical knowledge in the field of information retrieval. The document covers mathematical models that can be used for information retrieval algorithms, including how to rank them. The specifics of image and text data are also examined. The practical part is an implementation of a retrieval algorithm for video shots of the TRECVid 2009 dataset based on high-level features. The algorithm is unique in using internet search engines to obtain term similarity. The work contains a detailed description of the implemented algorithm, including the tuning process and the conclusions of its testing.
Techniques of Web Pages Indexing
Tužil, Jiří ; Burget, Radek (referee) ; Kunc, Michael (advisor)
This work addresses techniques for searching for information on the Internet. It describes the structure of the presented data and its conversion into information usable in the search process. It presents various approaches based on the PageRank, HITS and SALSA algorithms used in full-text search, pointing out possible difficulties and inaccuracies as well as highlighting the advantages of these search techniques. The work shows the design, development and implementation of a sample full-text search tool.
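Of the algorithms named in this abstract, PageRank is the simplest to illustrate. The sketch below is a generic power-iteration version, not the implementation from the thesis: the damping factor of 0.85, the fixed iteration count, the uniform redistribution of rank from pages without outgoing links, and the function name `pagerank` are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Rank held by pages without outgoing links is spread evenly
    over all pages, so the ranks keep summing to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Tiny hypothetical web graph: a links to b and c, b to c, c to a.
ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
```

In this small graph page c collects links from both a and b, so it ends up with the highest rank, followed by a (which receives all of c's rank) and then b.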

