National Repository of Grey Literature 5 records found  Search took 0.01 seconds. 
Extraction of unspecified relations from the web
Ovečka, Marek ; Svátek, Vojtěch (advisor) ; Labský, Martin (referee)
The subject of this thesis is non-specific knowledge extraction from the web. In recent years, tools that improve the results of this type of knowledge extraction were created. The aim of this thesis is to become familiar with these tools, test and propose the use of results. In this thesis these tools are described and compared and extraction is carried out using OLLIE. Based on the results of the extractions, two methods of enriching extractions using name entity recognition, are proposed. The first method proposes to modify the weights of extractions and second proposes the enrichment of extractions by named entities. The paper proposed ontology, which allows to capture the structure of enriched extractions. In the last part practical experiment is carried out, in which the proposed methods are demonstrated. Future research in this field would be useful in areas of extraction and categorization of relational phrases.
Mobile personal assistants
Techl, Jan ; Sigmund, Tomáš (advisor) ; Labský, Martin (referee)
This thesis focuses on analysis, definition and description of mobile personal assistants as a phenomenon emerging in past few years. Mobile personal assistants are first mentioned in the context of computational linguistics and information needs, which is one of the motivations to use them. Main interest of this thesis is an introduction of the core technologies for the natural language communication between the assistant and its user, followed by an introduction of host environments and possible usage. The thesis also presents the limitations and risks resulting from using them, which are in some ways affecting their usability. Beside the analysis the main focus is on the design and implementation of the natural language understanding (NLU) system, which can be used in particular personal assistant application. This system is implemented as a web service and consists of an annotation scheme with a set of components. The results show that the system architecture and tools used are suitable solution for the construction of a basic NLU system, which has been created and which is in the compliance with the requested parameters. It is still difficult task to achieve high precision, which depends on many factors including the amount of training data, which was very small in this case. However, the resulting application is a solid starting point for its further development and extensions.
Extracting Structured Data from Czech Web Using Extraction Ontologies
Pouzar, Aleš ; Svátek, Vojtěch (advisor) ; Labský, Martin (referee)
The presented thesis deals with the task of automatic information extraction from HTML documents for two selected domains. Laptop offers are extracted from e-shops and free-published job offerings are extracted from company sites. The extraction process outputs structured data of high granularity grouped into data records, in which corresponding semantic label is assigned to each data item. The task was performed using the extraction system Ex, which combines two approaches: manually written rules and supervised machine learning algorithms. Due to the expert knowledge in the form of extraction rules the lack of training data could be overcome. The rules are independent of the specific formatting structure so that one extraction model could be used for heterogeneous set of documents. The achieved success of the extraction process in the case of laptop offers showed that extraction ontology describing one or a few product types could be combined with wrapper induction methods to automatically extract all product type offers on a web scale with minimum human effort.
Extrakce informací z webových stránek pomoci extrakčních ontologií
Labský, Martin ; Berka, Petr (advisor) ; Strossa, Petr (referee) ; Vojtáš, Peter (referee) ; Snášel, Václav (referee)
Automatic information extraction (IE) from various types of text became very popular during the last decade. Owing to information overload, there are many practical applications that can utilize semantically labelled data extracted from textual sources like the Internet, emails, intranet documents and even conventional sources like newspaper and magazines. Applications of IE exist in many areas of computer science: information retrieval systems, question answering or website quality assessment. This work focuses on developing IE methods and tools that are particularly suited to extraction from semi-structured documents such as web pages and to situations where available training data is limited. The main contribution of this thesis is the proposed approach of extended extraction ontologies. It attempts to combine extraction evidence from three distinct sources: (1) manually specified extraction knowledge, (2) existing training data and (3) formatting regularities that are often present in online documents. The underlying hypothesis is that using extraction evidence of all three types by the extraction algorithm can help improve its extraction accuracy and robustness. The motivation for this work has been the lack of described methods and tools that would exploit these extraction evidence types at the same time. This thesis first describes a statistically trained approach to IE based on Hidden Markov Models which integrates with a picture classification algorithm in order to extract product offers from the Internet, including textual items as well as images. This approach is evaluated using a bicycle sale domain. Several methods of image classification using various feature sets are described and evaluated as well. These trained approaches are then integrated in the proposed novel approach of extended extraction ontologies, which builds on top of the work of Embley [21] by exploiting manual, trained and formatting types of extraction evidence at the same time. The intended benefit of using extraction ontologies is a quick development of a functional IE prototype, its smooth transition to deployed IE application and the possibility to leverage the use of each of the three extraction evidence types. Also, since extraction ontologies are typically developed by adapting suitable domain ontologies and the ontology remains in center of the extraction process, the work related to the conversion of extracted results back to a domain ontology or schema is minimized. The described approach is evaluated using several distinct real-world datasets.
Extrakce informací z textu
Michalko, Boris ; Labský, Martin (advisor) ; Svátek, Vojtěch (referee) ; Nováček, Jan (referee)
Cieľom tejto práce je preskúmať dostupné systémy pre extrakciu informácií a možnosti ich použitia v projekte MedIEQ. Teoretickú časť obsahuje úvod do oblasti extrakcie informácií. Popisujem účel, potreby a použitie a vzťah k iným úlohám spracovania prirodzeného jazyka. Prechádzam históriou, nedávnym vývojom, meraním výkonnosti a jeho kritikou. Taktiež popisujem všeobecnú architektúru IE systému a základné úlohy, ktoré má riešiť, s dôrazom na extrakciu entít. V praktickej časti sa nacházda prehľad algoritmov používaných v systémoch pre extrakciu informácií. Opisujem oba typy algoritmov ? pravidlové aj štatistické. V ďalšej kapitole je zoznam a krátky popis existujúcich voľných systémov. Nakoniec robím vlastný experiment s dvomi systémami ? LingPipe a GATE na vybraných korpusoch. Meriam rôzne výkonnostné štatistiky. Taktiež som vytvoril malý slovník a regulárny výraz pre email aby som demonštroval taktiež pravidlá pre extrahovanie určitých špecifických informácií.

Interested in being notified about new results for this query?
Subscribe to the RSS feed.