National Repository of Grey Literature: 107 records found, showing records 21 - 30
Designing a Multilingual Fact-Checking Dataset from Existing Question-Answering Data
Kamenický, Daniel ; Aparovich, Maksim (referee) ; Fajčík, Martin (advisor)
This thesis addresses the lack of multilingual fact-checking datasets that contain evidence supporting or refuting a claim. It therefore deals with converting an existing question-answering dataset into a fact-checking dataset. Two approaches to the dataset conversion are studied. The first approach creates the dataset using the monolingual pretrained seq-2-seq model T5. The model is trained on an English dataset, and its inputs and outputs are translated into the required languages. The second approach uses the multilingual model mT5, which takes input and generates output in the required language. The multilingual model requires the training datasets to be translated. Translation proved to be the main problem of this work, reaching a success rate of around 30 % in a low-resource language. Experiments showed better results for claims generated by the monolingual model combined with machine translation: claims generated by the multilingual model reached a success rate of 73 %, compared with 88 % for claims from the monolingual model. The models were evaluated with a fact-verification model based on TF-IDF. The accuracy of this model on both datasets is close to 0.5. This suggests that the resulting datasets may be challenging for fact-verification models.
Matching Images to Texts
Hajič, Jan ; Pecina, Pavel (advisor) ; Průša, Daniel (referee)
We build a joint multimodal model of text and images for automatically assigning illustrative images to journalistic articles. We approach the task as an unsupervised representation learning problem of finding a common representation that abstracts from individual modalities, inspired by the multimodal Deep Boltzmann Machine of Srivastava and Salakhutdinov. We use state-of-the-art image content classification features obtained from the Convolutional Neural Network of Krizhevsky et al. as input "images" and entire documents instead of keywords as input texts. A deep learning and experiment management library, Safire, has been developed. We have not been able to create a successful retrieval system because of difficulties with training neural networks on very sparse word observations. However, we have gained a substantial understanding of the nature of these difficulties and are thus confident that we will be able to improve in future work.
Automatic suggestion of illustrative images
Odcházel, Ondřej ; Pecina, Pavel (advisor) ; Holub, Martin (referee)
The objective of this thesis is to implement a web application for recommending stock photos. The application takes newspaper articles in Czech or English as input and, based on the text itself, suggests appropriate stock photos. The implemented application also searches for images by visual similarity. The thesis deals with theoretical aspects of keyword extraction and text language detection. It further analyzes options for efficient search over similar vectors, which are used in the search component for visually similar images. It also describes options for developing a modern web frontend and backend. The quality of the algorithm for recommending stock photos is tested on users.
Searching relevant articles in extensive collections
Vojt, Ján ; Novák, Jiří (advisor) ; Bartoš, Tomáš (referee)
Searching text in articles is usually implemented with full-text search. With more advanced techniques, however, it is possible to achieve significantly better results. The subject of this work is to create a universal library for searching extensive collections, specialized in the Czech language. The library makes use of tools capable of working with morphology while taking the importance of words into account. It also conducts an experiment with word pairs, which adds context to the search process. The success rate of this experiment is evaluated on an extensive collection of data. The resulting library is a unique tool for processing extensive collections of Czech text, and it is ready to be extended with new languages and methods.
Analysis of Information Sources for Development of Software Application for Printing Industry
Urbánek, Matyáš ; Basl, Josef (advisor) ; Lipková, Helena (referee)
This diploma thesis analyzes and describes information sources for software applications and management information systems in the printing industry. First, information needs and information overload are briefly defined. Next, the concepts of the printing industry and software for the printing industry are described, together with the related concepts of management information systems and workflow systems. In the following chapter, keywords for information retrieval are arranged in hierarchical tree structures. Core specialised journals and useful information sources were identified through information retrieval. The work covers both deep web and surface web information sources. The information objects found are described and cited according to ISO 690. The thesis concludes with categorised bibliographic lists and a simple web page design for a specialised information portal.
Semantic relation extraction from unstructured data in the business domain
Rampula, Ilana ; Pecina, Pavel (advisor) ; Kuboň, Vladislav (referee)
Text analytics in the business domain is a growing field in research and practical applications. We chose to concentrate on Relation Extraction from unstructured data provided by a corporate partner. Analyzing text from this domain requires a different approach, one that accounts for irregularities and domain-specific attributes. In this thesis, we present two methods for relation extraction. The Snowball system and the Distant Supervision method were both adapted to this unique data. The methods were implemented to use both structured and unstructured data from the company's database. Keywords: Information Retrieval, Relation Extraction, Text Analytics, Distant Supervision, Snowball
Intelligent information retrieval and its trends
Pačísková, Jana ; Papík, Richard (advisor) ; Ivánek, Jiří (referee)
This thesis focuses on information retrieval in the context of its historical development. It presents trends in the integration of intelligent features into information retrieval and thus the emergence of intelligent information retrieval. Individual intelligent elements are described in a separate chapter; the following chapter introduces their use, including specific examples. The thesis also traces research on intelligent information retrieval at selected institutions both in the Czech Republic and abroad; the results of this survey for the Czech Republic are presented in the enclosed search.
Brno Communication Agent
Křištof, Jiří ; Fajčík, Martin (referee) ; Smrž, Pavel (advisor)
The aim of this thesis is the implementation of a communication agent that provides information about Brno. The communication agent uses a three-tier architecture. Machine learning and neural network techniques are used for question answering. User tests determined a success rate of 84 %, and 58 % of the primary users were satisfied with the system. The main benefit of the work is making it easier for residents and visitors of Brno to retrieve information about the city.
Multilingual Open-Domain Question Answering
Slávka, Michal ; Dočekal, Martin (referee) ; Fajčík, Martin (advisor)
This thesis deals with automatic multilingual open-domain question answering and proposes approaches to this little-explored domain. Specifically, it examines whether: (i) using translation from English is sufficient, (ii) multilingual systems can make use of translating the question into other languages, or (iii) it is preferable to use no translation at all. I compare an English system based on the T5 model, which uses machine translation, with natively multilingual systems based on the multilingual MT5 model. The English system with machine translation slightly outperforms its monolingual counterparts on several tasks. Although this model was trained on a larger amount of data, the improvement is not sufficiently significant. This shows that using natively multilingual systems is a promising approach for future research. I also present a method for retrieving documents in different languages using the BM25 algorithm and compare it with English retrieval. Using multilingual evidence appears to be beneficial and improves the performance of the systems.
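For readers unfamiliar with BM25, the retrieval algorithm named in the abstract above, the following is a minimal sketch of the common Okapi BM25 scoring formula. It is not taken from the thesis: the function name `bm25_scores`, the parameter values `k1=1.5` and `b=0.75`, whitespace tokenization and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`.

    Hypothetical helper for illustration; k1 and b are commonly used defaults.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue  # terms absent from the document contribute nothing
            # Smoothed inverse document frequency (always non-negative).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturation with document length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Toy collection (assumed data, tokenized on whitespace).
docs = [
    "brno is a city in the czech republic".split(),
    "bm25 ranks documents for a query".split(),
    "question answering over retrieved documents".split(),
]
scores = bm25_scores("retrieved documents".split(), docs)
best = max(range(len(docs)), key=scores.__getitem__)
```

In this toy collection the third document matches both query terms and is ranked first, while the first document matches neither and scores zero.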
Syntax in methods for information retrieval
Straková, Jana
Title: Information Retrieval Using Syntax Information. Author: Bc. Jana Kravalová. Department: Institute of Formal and Applied Linguistics. Supervisor: Mgr. Pavel Pecina, Ph.D. Supervisor's e-mail address: pecina@ufal.mff.cuni.cz. Abstract: In recent years, the application of language modeling in information retrieval has been studied quite extensively. Although language models of any type can be used with this approach, only traditional n-gram models based on surface word order have been employed and described in published experiments (often only unigram language models). The goal of this thesis is to design, implement, and evaluate (on Czech data) a method which would extend a language model with syntactic information, automatically obtained from documents and queries. We attempt to incorporate syntactic information into language models and experimentally compare this approach with unigram and bigram models based on surface word order. We also empirically compare methods for smoothing, stemming and lemmatization, and the effectiveness of using stopwords and pseudo-relevance feedback. We perform a detailed analysis of these retrieval methods and describe their performance in detail. Keywords: information retrieval, language modelling, dependency syntax, smoothing
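The language-modeling approach to retrieval referred to in the abstract can be illustrated with a minimal sketch of a unigram query-likelihood model. This is not the thesis's method: Jelinek-Mercer smoothing, the weight `lam=0.5`, the function name `query_log_likelihood` and the toy documents are assumptions chosen for illustration, and no syntactic information is used.

```python
import math
from collections import Counter

def query_log_likelihood(query, doc, coll_tf, coll_len, lam=0.5):
    """Log-probability of `query` under a smoothed unigram model of `doc`.

    Hypothetical helper: mixes the document model with the collection model
    (Jelinek-Mercer smoothing), with `lam` weighting the collection model.
    """
    doc_tf = Counter(doc)
    total = 0.0
    for term in query:
        p_doc = doc_tf[term] / len(doc)       # maximum-likelihood document model
        p_coll = coll_tf[term] / coll_len     # background collection model
        p = (1 - lam) * p_doc + lam * p_coll
        if p == 0.0:
            return float("-inf")              # term unseen in the whole collection
        total += math.log(p)
    return total

# Toy collection (assumed data, tokenized on whitespace).
docs = [
    "language models for information retrieval".split(),
    "dependency syntax of czech sentences".split(),
]
coll_tf = Counter(t for d in docs for t in d)
coll_len = sum(len(d) for d in docs)

query = "czech syntax".split()
scores = [query_log_likelihood(query, d, coll_tf, coll_len) for d in docs]
# Rank document indices from most to least likely to have generated the query.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

Smoothing keeps the first document's score finite even though it contains neither query term; the second document, which contains both, is ranked first.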
