National Repository of Grey Literature: 75 records found, records 11 - 20
Visual Pattern Detection in Web Pages
Kotraš, Martin ; Bartík, Vladimír (referee) ; Burget, Radek (advisor)
The work addresses information extraction from websites by searching for visual patterns, that is, spatial relations between areas on a page and shared visual styles of those areas, and extends this technique with new methods to improve results. It uses a user-specified ontological data model that describes which data items are to be extracted from a given web page and what the individual items look like, mainly in terms of their text. As part of the work, a Java console application, VizGet, was created; it uses the FitLayout framework to obtain a visual model of the website. Testing the application on 7 different domains, including lists of the best movies, e-shop products, and weather forecasts, showed that about 75 % of subtests reach an F-score above 85 % and more than 90 % of subtests reach an F-score above 60 %, while 45 % of subtests achieve an F-score of 100 %. The VizGet application can thus be deployed in non-critical practical applications, and it remains open to further extension and improvement.
Metadata Extraction from Scientific Papers
Lokaj, Tomáš ; Dytrych, Jaroslav (referee) ; Otrusina, Lubomír (advisor)
This work deals with metadata extraction from scientific papers. It describes the general problem of information extraction, focusing on the processing of text documents. It also presents clanky2meta.py, a program created by the author to search for relevant information in scientific publications. The work concludes with a comparison of systems that address the same problem, especially the CiteSeerX system.
Web Page Segmentation Algorithms Based on Clustering
Lengál, Tomáš ; Bartík, Vladimír (referee) ; Burget, Radek (advisor)
This report deals with the segmentation of web pages, an important discipline within information extraction. In the first part, we describe several general ways to implement it. We then introduce the Box Clustering Segmentation method, which takes a slightly different approach to segmentation. In the second half, we describe the implementation of this method as part of the FITLayout framework and its final testing.
Identifying Entity Types and Attributes Across Languages
Švub, Daniel ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
The aim of this thesis is to analyze articles in the Wikipedia internet encyclopedia and to convert their natural-language text into a structured database of persons, places, and other entities. The core of the implemented program is determining the type of an entity from its typical characteristics and extracting the most important attributes of that entity in the Czech and Slovak languages. The result is a knowledge base that allows simple searching and sorting of information. Thanks to its easy extensibility, identification of other entity types and other features can be added to the program, as well as support for other languages.
Methods of Information Extraction
Adamček, Adam ; Smrž, Pavel (referee) ; Kouřil, Jan (advisor)
The goal of information extraction is to retrieve relational data from texts written in natural language. Applications of the information obtained are wide-ranging: from text summarization, through ontology creation, to question answering by QA systems. This work describes the design and implementation of a system running on a computer cluster that transforms a dump of Wikipedia articles into a set of extracted information stored in a distributed RDF database, which can be queried through a purpose-built user interface.
Encyclopedia Expert
Krč, Martin ; Schmidt, Marek (referee) ; Smrž, Pavel (advisor)
This project focuses on a system that answers questions formulated in natural language. Firstly, the report discusses problems associated with question answering systems and some commonly employed approaches. Emphasis is laid on shallow methods, which do not require many linguistic resources. The second part describes our work on a system that answers factoid questions, utilizing Czech Wikipedia as a source of information. Answer extraction is partly based on specific features of Wikipedia and partly on pre-defined patterns. Results show that for answering simple questions, the system provides significant improvements in comparison with a standard search engine.
Automatic Navigation on Private Websites
Kliment, Radek ; Rychlý, Marek (referee) ; Křivka, Zbyněk (advisor)
This thesis deals with technologies related to web pages and describes navigation across them, including authentication for access to their private sections and management of the user context. It introduces the design of a mechanism for automated navigation, including a new scripting language and tools for visual description. The work also contains the design of an application that uses the mechanism and the implementation of its parts. The last chapter sums up the knowledge acquired by testing on various websites.
Extraction of Semantic Relations from Text
Pospíšil, Milan ; Schmidt, Marek (referee) ; Smrž, Pavel (advisor)
Many semi-structured documents exist today that we want to convert to a structured form. The goal of this work is to create a system that makes this task more automated. This can be a difficult problem because most of these documents are not generated by a computer, so the system has to tolerate differences between them. Some semantic understanding is also needed, which is why we chose to restrict the work to the domain of meeting minutes.
Extracting text data from the webpages
Mazal, Zdeněk ; Morský, Ondřej (referee) ; Fojtová, Lucie (advisor)
This work focuses on data mining, and especially text mining, from web pages. It gives an overview of programs for downloading text and of ways to extract it, as well as an overview of the most frequently used programs for extracting data from the Internet. The output of this thesis is a Java program that can download text from a selection of servers and save it to an XML file.
Automatically Updated Web Portal
Staněk, Petr ; Škoda, Petr (referee) ; Smrž, Pavel (advisor)
This bachelor's thesis is dedicated to the design and implementation of an automatically updated web portal that tries to resolve the shortcomings of portals filled with other people's content. It also presents a comparison of existing scientific portals and discusses the problems of extracting, saving, and searching for information. General mechanisms are demonstrated on a portal of European research projects, which removes the shortcomings of CORDIS, the official information portal for European research and development. The thesis takes the existing product as a prototype and aims to improve the quality of the extraction and to extend the system so that it detects potential problems and notifies an administrator of them. This was achieved by increasing the robustness and speed of the extractor, by logging all important events associated with the extraction, and by implementing a separate administrator section of the web portal, which informs the administrator about problems and offers tools for solving them.
