National Repository of Grey Literature: 60 records found (records 19-28)
Extraction of Semantic Relations from Text
Pospíšil, Milan ; Schmidt, Marek (referee) ; Smrž, Pavel (advisor)
Many semi-structured documents exist today that we want to convert into a structured form. The goal of this work is to create a system that makes this task more automated. This can be a difficult problem: most of these documents are not generated by a computer, so the system has to tolerate differences between them. Some semantic understanding is also needed, which is why we chose only the domain of meeting minutes.
Extracting text data from the webpages
Mazal, Zdeněk ; Morský, Ondřej (referee) ; Fojtová, Lucie (advisor)
This work focuses on data mining, and especially text mining, from Web pages. It gives an overview of programs for downloading text and of the ways the text can be extracted. It also reviews the most frequently used programs for extracting data from the Internet. The output of this thesis is a Java program that can download text from a selection of servers and save it into an XML file.
Automatically Updated Web Portal
Staněk, Petr ; Škoda, Petr (referee) ; Smrž, Pavel (advisor)
This bachelor's thesis is dedicated to the design and implementation of an automatically updated web portal that tries to resolve the shortcomings of portals filled with other people's content. It also compares existing scientific portals and discusses the problems of extracting, storing and searching for information. The general mechanisms are demonstrated on a European research projects portal that removes the shortcomings of CORDIS, the official information portal for European research and development. The thesis takes the existing product as a prototype. Its aim is to improve the quality of the extraction and to extend the system so that it detects potential problems and notifies an administrator of them. This was achieved by making the extractor more robust and faster and by logging all important events associated with the extraction. A separate administrator section of the web portal was also implemented, which informs the administrator about problems and offers tools for solving them.
A Tool for Recognition and Verification of Spedition Orders
Kalivoda, Vojtěch ; Hradiš, Michal (referee) ; Herout, Adam (advisor)
The aim of this work is to design and implement a web tool that facilitates the work of dispatchers at forwarding and transport companies by automatically recognizing important information in orders. Thanks to the recognition, dispatchers do not have to rewrite all the information manually, which saves time. Order recognition works in three steps: entities are found in a document, their surroundings are represented as vectors using word2vec models, and the entities are then classified with convolutional neural networks. The tool can recognize 20 types of information in real time with an average success rate of 72.35 %. As part of the work, a dataset of almost 1,700 orders was collected, and 141 of them were annotated. The work also includes a web application that serves as an interface to the tool and as a means of data collection.
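The abstract names the recognition steps but does not describe them. The sketch below is a minimal illustration of the middle step under stated assumptions. It turns the surroundings of a candidate entity into a fixed-size matrix of word vectors, which is the kind of input a convolutional classifier takes. The toy embedding table stands in for a trained word2vec model. The window size, the vectors and `context_matrix` are hypothetical, and the CNN itself is omitted.

```python
from typing import Dict, List

# Hypothetical toy embeddings standing in for a trained word2vec model.
EMBED_DIM = 3
TOY_VECTORS: Dict[str, List[float]] = {
    "loading": [0.1, 0.0, 0.2],
    "date": [0.3, 0.1, 0.0],
    ":": [0.0, 0.0, 0.1],
    "unloading": [0.1, 0.2, 0.2],
}
PAD = [0.0] * EMBED_DIM  # used for positions outside the document and unknown words


def context_matrix(tokens: List[str], entity_idx: int, window: int) -> List[List[float]]:
    """Return a (2 * window) x EMBED_DIM matrix of the vectors around tokens[entity_idx].

    The entity token itself is skipped; out-of-range positions and
    out-of-vocabulary words are filled with the zero padding vector.
    """
    rows: List[List[float]] = []
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # the classifier sees only the surroundings
        i = entity_idx + offset
        if 0 <= i < len(tokens):
            rows.append(list(TOY_VECTORS.get(tokens[i].lower(), PAD)))
        else:
            rows.append(list(PAD))
    return rows


if __name__ == "__main__":
    doc = ["Loading", "date", ":", "12.5.2021", "Unloading"]
    for row in context_matrix(doc, entity_idx=3, window=2):
        print(row)
```

A fixed window keeps every input the same shape, so a batch of candidates can be stacked and passed to a 1-D convolution.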
Consistency Checking of Relations Extracted from Text
Stejskal, Jakub ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
This bachelor thesis is dedicated to automated techniques used in natural language processing and in extracting information from text. It covers general methods, starting with the processing of raw text and continuing to the extraction of relations from the processed language constructs. It also describes ways of using the resulting relational data, as seen for example in the DBpedia project. A further goal of the thesis is the design and implementation of an automated system that extracts information about entities that do not have their own article in the English Wikipedia. The thesis presents algorithms for three tasks: extracting named entities, verifying whether Wikipedia articles exist for the extracted entities, and extracting information about the individual entities, which can then be used for information consistency checking. Finally, the results are presented together with suggestions for further development of the system.
Document Information Extraction
Janík, Roman ; Špaňhel, Jakub (referee) ; Hradiš, Michal (advisor)
With the growth of digitization comes the need to analyze historical documents. Named entity recognition is an important task for information extraction and data mining. The goal of this work is to develop a system for extracting information from Czech historical documents such as newspapers, chronicles and registry books. The system takes as input scanned historical documents processed by an OCR algorithm and is based on a modified RoBERTa model. Extracting information from Czech historical documents is challenging because it requires a suitable corpus of historical Czech. The system was trained on the Czech Named Entity Corpus (CNEC) and the Czech Historical Named Entity Corpus (CHNEC), together with a corpus I created myself. It achieves an F1 score of 88.85 on CNEC and 87.19 on CHNEC. This is an improvement of 1.36 F1 on CNEC and 5.19 F1 on CHNEC, making these the best known results.
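The results above are F1 scores. A common convention for named entity recognition is exact-match scoring, where a predicted entity counts as correct only if both its span and its type match the gold annotation. Under that convention, F1 is the harmonic mean of precision P and recall R: F1 = 2PR / (P + R). Scores such as 88.85 are this value multiplied by 100. The abstract does not say exactly how CNEC and CHNEC were scored, so the sketch below illustrates the formula, not the thesis's evaluation script. `entity_f1` is a hypothetical helper.

```python
from typing import Set, Tuple

# An entity is identified by (start_token, end_token, type); exact-match scoring is assumed.
Entity = Tuple[int, int, str]


def entity_f1(gold: Set[Entity], pred: Set[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for exact-match entity predictions."""
    tp = len(gold & pred)  # predictions whose span and type both match the gold annotation
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = {(0, 1, "PER"), (4, 4, "LOC"), (7, 8, "ORG")}
    pred = {(0, 1, "PER"), (4, 4, "GPE"), (7, 8, "ORG"), (10, 10, "LOC")}
    print(entity_f1(gold, pred))
```

In the example, two of the four predictions are correct (P = 1/2) and two of the three gold entities are found (R = 2/3). This gives F1 = 4/7, or about 57.14 on the 0-100 scale. The mistyped LOC/GPE span counts as an error.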
Distributed Tool for Extraction of Information from Network Flows
Sedlák, Michal ; Grégr, Matěj (referee) ; Žádník, Martin (advisor)
This work deals with the extraction of information from flow records that are the result of network monitoring by the IPFIX system. The goal of the work is to design a tool that allows querying stored network flows created by the open-source collector IPFIXcol2. Querying is performed with the highest possible efficiency and performance in mind, which is achieved by using appropriate data structures and thread-level parallelization, as well as by using multiple machines.
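The abstract mentions thread-level parallelization but does not say which data structures the tool uses. The sketch below shows the general partition, scan and merge pattern under stated assumptions. The stored flows are split into chunks, each chunk is filtered by a worker thread, and the partial results are concatenated in their original order. The `Flow` fields are a hypothetical subset of IPFIX information elements, and `parallel_query` is an illustrative helper, not the tool's interface. In CPython the global interpreter lock limits the speedup of pure-Python filtering, so this shows the structure rather than the performance of a native implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Flow:
    # Hypothetical subset of fields from an IPFIX flow record.
    src_ip: str
    dst_ip: str
    dst_port: int
    octets: int


def _scan(chunk: List[Flow], pred: Callable[[Flow], bool]) -> List[Flow]:
    """Filter one partition of the stored flows."""
    return [f for f in chunk if pred(f)]


def parallel_query(flows: List[Flow], pred: Callable[[Flow], bool], workers: int = 4) -> List[Flow]:
    """Split flows into up to `workers` partitions, filter them concurrently, merge in order."""
    if not flows:
        return []
    size = -(-len(flows) // workers)  # ceiling division: partition length
    chunks = [flows[i:i + size] for i in range(0, len(flows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map preserves input order, so the merged result keeps the original order.
        parts = list(pool.map(_scan, chunks, [pred] * len(chunks)))
    return [f for part in parts for f in part]


if __name__ == "__main__":
    stored = [
        Flow("10.0.0.1", "10.0.0.2", 443, 1200),
        Flow("10.0.0.3", "10.0.0.2", 22, 300),
        Flow("10.0.0.1", "10.0.0.9", 443, 50),
    ]
    for flow in parallel_query(stored, lambda f: f.dst_port == 443, workers=2):
        print(flow)
```

Spreading the work across several machines, which the abstract also mentions, follows the same pattern one level up: each machine scans its own share of the stored flows, and a coordinator merges the partial results.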
Visual Pattern Detection in Web Pages
Kotraš, Martin ; Bartík, Vladimír (referee) ; Burget, Radek (advisor)
This work addresses information extraction from websites by searching for visual patterns, that is, spatial relations between areas of a web page and shared visual styles of those areas. It also extends this approach with new techniques to improve the results. It uses a user-specified ontological data model that describes which data items are to be extracted from a given web page and what the individual items look like on the page, mainly in terms of their text. As part of the work, a console application called VizGet was written in Java, using the FitLayout framework to obtain a visual model of the website. The application was tested on 7 different domains, including a list of the best movies, e-shop products and weather forecasts. It achieved an F-score above 85 % in about 75 % of subtests and above 60 % in more than 90 % of subtests, and 45 % of subtests reached an F-score of 100 %. VizGet can therefore be deployed for practical use in non-critical applications and remains open to further extensions and improvements.
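The two cues named in the abstract, shared visual style and spatial relations, can be shown with a deliberately simplified sketch. Areas that share the same style and the same left edge are grouped into a repeated, column-like pattern, such as the titles in a list of movies. The `Area` fields, the style key and the alignment rule are assumptions made for illustration. This is not the VizGet or FitLayout API, and it omits the ontological model that the thesis uses to map groups to data items.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List, Tuple


@dataclass(frozen=True)
class Area:
    # Hypothetical rendered page area: its text, position and a few style attributes.
    text: str
    x: int
    y: int
    font_size: int
    bold: bool
    color: str


StyleKey = Tuple[int, bool, str]


def style_key(a: Area) -> StyleKey:
    """Visual style used for grouping; real systems would compare more attributes."""
    return (a.font_size, a.bold, a.color)


def repeated_patterns(areas: List[Area], min_repeats: int = 2) -> List[List[Area]]:
    """Group areas sharing a visual style and a left edge; keep groups that repeat.

    Each returned group is sorted top to bottom, i.e. in reading order of the column.
    """
    groups: DefaultDict[Tuple[StyleKey, int], List[Area]] = defaultdict(list)
    for a in areas:
        groups[(style_key(a), a.x)].append(a)
    return [sorted(g, key=lambda a: a.y) for g in groups.values() if len(g) >= min_repeats]


if __name__ == "__main__":
    page = [
        Area("Best movies", 20, 10, 24, True, "#000"),  # heading: unique style, ignored
        Area("Movie A", 20, 100, 16, True, "#000"), Area("9.1", 400, 100, 12, False, "#555"),
        Area("Movie B", 20, 200, 16, True, "#000"), Area("8.7", 400, 200, 12, False, "#555"),
        Area("Movie C", 20, 300, 16, True, "#000"), Area("8.5", 400, 300, 12, False, "#555"),
    ]
    for group in repeated_patterns(page):
        print([a.text for a in group])
```

Pairing the two resulting columns row by row (areas with equal `y`) would then yield title and rating records. This is where a data model saying which column holds which item becomes necessary.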
Detecting semantic relations in texts and their integration with external data resources
Kríž, Vincent ; Vidová Hladká, Barbora (advisor)
We present a strategy to automate the extraction of semantic relations from texts. Both machine learning and rule-based techniques are investigated and the impact of different linguistic knowledge is analyzed for the various approaches. To implement the extraction system RExtractor, several natural language processing tools have been improved: from sentence splitting and tokenization modules to dependency syntax parsers. Furthermore, we created the Czech Legal Text Treebank with several layers of linguistic annotation, which is used to train and test each stage of the proposed system. As a result of the performed work, new Semantic Web resources and tools are available for automatic processing of texts.
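The abstract says the system combines rule-based techniques with dependency parsing. As a toy illustration of a rule over a dependency tree, the sketch below extracts subject-predicate-object triples. For each token, it pairs its `nsubj` children with its `obj` children. The relation labels follow the Universal Dependencies convention, which is an assumption here. The `Token` layout and `svo_triples` are hypothetical, and this is not how RExtractor's own queries are defined.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    id: int       # 1-based position in the sentence
    form: str     # surface word form
    head: int     # id of the governing token, 0 for the root
    deprel: str   # dependency relation to the head (UD-style labels assumed)


def svo_triples(sent: List[Token]) -> List[Tuple[str, str, str]]:
    """Return (subject, predicate, object) triples found directly under each token."""
    triples: List[Tuple[str, str, str]] = []
    for pred in sent:
        children = [t for t in sent if t.head == pred.id]
        subjects = [t.form for t in children if t.deprel == "nsubj"]
        objects = [t.form for t in children if t.deprel == "obj"]
        for s in subjects:
            for o in objects:
                triples.append((s, pred.form, o))
    return triples


if __name__ == "__main__":
    # "The ministry issues the decree", parsed by hand for illustration.
    sentence = [
        Token(1, "The", 2, "det"),
        Token(2, "ministry", 3, "nsubj"),
        Token(3, "issues", 0, "root"),
        Token(4, "the", 5, "det"),
        Token(5, "decree", 3, "obj"),
    ]
    print(svo_triples(sentence))
```

Real legal sentences need richer patterns, covering coordination, passives and multi-word entities, which is why parser quality matters so much for this kind of extraction.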
