National Repository of Grey Literature: 76 records found (records 21 - 30)
Automatically Updated Web Portal
Staněk, Petr ; Škoda, Petr (referee) ; Smrž, Pavel (advisor)
This bachelor's thesis deals with the design and implementation of an automatically updated web portal that addresses the shortcomings of portals filled with other people's content. It also compares existing scientific portals and discusses the problems of extracting, storing and searching for information. The general mechanisms are demonstrated on a portal of European research projects, which removes the shortcomings of CORDIS, the official information portal for European research and development. The thesis takes an existing product as a prototype and aims to improve the quality of the extraction and to extend the system so that it detects potential problems and notifies an administrator of them. This was achieved by increasing the robustness and speed of the extractor, by logging all important events associated with the extraction, and by implementing a separate administrator section of the web portal, which informs the administrator about problems and offers tools for resolving them.
Brno Communication Agent
Jurkovič, Juraj ; Fajčík, Martin (referee) ; Smrž, Pavel (advisor)
The goal of this thesis is to explore and subsequently apply techniques and technical solutions in the development of information agents. The thesis primarily focuses on solving individual subtasks using state-of-the-art systems, interconnecting these systems, adapting them to a specific domain, and implementing the individual modules of a communication agent system. The user interface is based on the multi-platform chat application Telegram. Information extraction from user input is performed by Dialogflow. Several external services are used to fulfill user requests. Elasticsearch is used for searching structured data. For answering open-domain questions from free text, we use an R-net implementation. The resulting agent can have both its knowledge base and the range of requests it can fulfill easily extended, and it can be deployed to a chat platform of choice.
Administration Interface of an Information Extraction System
Gongol, Jakub ; Bartík, Vladimír (referee) ; Burget, Radek (advisor)
This thesis covers the subject of information extraction from the Web. The main objective is the design and implementation of an administration interface for an extraction system, implemented as a web application on the Java platform. The application provides an editor for specifying extraction tasks in the form of interactive graphs. It includes the possibility to upload and process an ontology from a file and to generate a graph according to selected ontology properties. The solution ensures integration with the FITLayout tool.
A Tool for Recognition and Verification of Spedition Orders
Kalivoda, Vojtěch ; Hradiš, Michal (referee) ; Herout, Adam (advisor)
The aim of this work is to design and implement a web tool that will facilitate the work of dispatchers at forwarding and transport companies through automated recognition of important information in orders. Thanks to the recognition, dispatchers do not have to retype all information manually, which saves time. Order recognition is based on finding entities in a document, representing their surroundings with vectors using word2vec models, and subsequent classification using convolutional neural networks. The tool can recognize 20 types of information in real time with an average success rate of 72.35 %. As part of the work, a dataset of almost 1,700 orders was collected and 141 of them were annotated. The work also includes a web application that serves as an interface for the tool and for data collection.
Extraction of information from identity documents
Hudcovský, Erik ; Lattenberg, Ivo (referee) ; Caha, Tomáš (advisor)
This thesis deals with processing information from personal documents (an ID card or passport) into a form that can be easily processed further by computers and the IT industry in general. This process is implemented by the application I developed as part of my bachelor's thesis. The application works with the scanned document, the document type and the form of the required output. As output, we get the document type in the required format. The entire application uses an external OCR (Optical Character Recognition) tool, which is integrated so that it can be easily replaced by another OCR tool. I used Tesseract in my application. This OCR tool is both the simplest and the most accurate of the free OCR tools. It also has strong community support and is still being developed. In this thesis, I also focused on testing it, both on text samples I created and on real scans of documents. The application is also available as an installation package, so it can be easily imported into other projects. The entire application is published as open source on GitHub under the free MIT license.
Automated Extraction of Information from Emails
Kanda, Rastislav ; Zbořil, František (referee) ; Vídeňský, František (advisor)
The purpose of this thesis is to become familiar with the methodology of information extraction from text and, on the basis of the acquired knowledge, to design and implement a system capable of gathering information from email messages. The proposed system should help Kiwi.com s.r.o. with processing incoming email messages from travel companies. Currently, it is possible to process those email messages automatically; however, doing so requires manually creating a template suitable for extraction. A possible alternative is the ROBULA+ algorithm, which can generate a more robust XPath locator from a given XPath locator. These locators should be more resistant to changes in the HTML structure. The ROBULA+ algorithm is the central point of the automated creation of templates suitable for parsing email messages. The implemented system achieves a satisfactory success rate (approximately 75%). This means that the system is able to find the reference to a created reservation in three out of four cases.
Methods for Information Extraction in Text Documents
Sychra, Tomáš ; Burget, Radek (referee) ; Bartík, Vladimír (advisor)
Knowledge discovery in text documents is part of data mining. However, text documents have different properties from regular databases. This project contains an overview of methods for knowledge discovery in text documents. The most frequent task in this area is document classification. Various approaches to text classification are described. Finally, I present the Winnow algorithm, which should perform better than any other algorithm for classification. The project includes a description of the Winnow implementation and an overview of experimental results.
Consistency Checking of Relations Extracted from Text
Stejskal, Jakub ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
This bachelor thesis is dedicated to automated techniques used in natural language processing and information extraction from text. It outlines the general methods, starting with the processing of raw text and continuing to the extraction of relations from processed language constructs; moreover, it presents options for using the obtained relational data, as seen for example in the DBpedia project. Another milestone of the thesis is the design and implementation of an automated system for extracting information about entities that do not have their own article in the English version of Wikipedia. The thesis also presents algorithms developed for the extraction of entities with proper names, the verification of the existence of articles for the extracted entities, and finally the actual extraction of information about individual entities, which can be used during information consistency checking. Finally, the results and suggestions for further development of the created system are presented.
Digital Steganography for Executables
Bever, Ľuboš ; Šimek, Václav (referee) ; Strnadel, Josef (advisor)
Steganography for executable files is the least common form of steganography. Research in this area subsided after a few attempts to implement it. The aim of this work is to implement existing methods and to propose a modification of them. The extensible software that was created can also be used to implement other methods. The implemented methods were properly tested, evaluated and compared. The comparison results show that the instruction substitution method used roughly corresponds to its reference value of 1/110; however, the results are highly dependent on the input binaries. The proposed extension of this method averages a data rate of 1/84, which is only 1.5 times less than the value obtained from another existing implementation in which specialized software was used to search for equivalence classes. The maximum data rate obtained from the test programs is 1/38.
Document Information Extraction
Janík, Roman ; Špaňhel, Jakub (referee) ; Hradiš, Michal (advisor)
With the advance of digitization comes the need to analyze historical documents. Named entity recognition is an important task for information extraction and data mining. The goal of this work is to develop a system for information extraction from Czech historical documents, such as newspapers, chronicles and registry books. An information extraction system was designed whose input is scanned historical documents processed by an OCR algorithm. The system is based on a modified RoBERTa model. Information extraction from Czech historical documents brings challenges in the form of the need for a suitable corpus for historical Czech. The Czech Named Entity Corpus (CNEC) and the Czech Historical Named Entity Corpus (CHNEC), together with a corpus I created myself, were used to train the system. The system achieves an F1 score of 88.85 on CNEC and 87.19 on CHNEC. This is an improvement of 1.36 F1 on CNEC and 5.19 F1 on CHNEC, and thus the best known results.
