National Repository of Grey Literature: 1,181 records found (showing records 1169-1178)
Extracting Structured Data from Czech Web Using Extraction Ontologies
Pouzar, Aleš ; Svátek, Vojtěch (advisor) ; Labský, Martin (referee)
The presented thesis deals with the task of automatic information extraction from HTML documents for two selected domains: laptop offers are extracted from e-shops and freely published job offers are extracted from company sites. The extraction process outputs structured data of high granularity grouped into data records, in which a corresponding semantic label is assigned to each data item. The task was performed with the extraction system Ex, which combines two approaches: manually written rules and supervised machine learning algorithms. Thanks to expert knowledge encoded as extraction rules, the lack of training data could be overcome. The rules are independent of the specific formatting structure, so a single extraction model could be used for a heterogeneous set of documents. The success achieved on laptop offers showed that an extraction ontology describing one or a few product types can be combined with wrapper induction methods to automatically extract offers of all product types at web scale with minimal human effort.
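As a rough illustration of the rule-based side of this approach (not the actual Ex rule language), the following Python sketch applies a few hypothetical, formatting-independent patterns to assign semantic labels to items of a laptop offer; the attribute names, patterns and example text are assumptions made for the sketch.

    import re

    # Hypothetical, formatting-independent extraction rules: each rule maps a
    # semantic label to a pattern over plain text.  This only mirrors the idea
    # of an extraction ontology; it is not the rule language of the Ex system.
    RULES = {
        "ram":         re.compile(r"\b(\d{1,2})\s*GB\s*(?:DDR\d)?\s*RAM\b", re.I),
        "screen_size": re.compile(r"\b(\d{2}(?:\.\d)?)\s*(?:\"|inch(?:es)?)", re.I),
        "price":       re.compile(r"\b(\d[\d\s]{2,})\s*(?:CZK|Kc)\b", re.I),
    }

    def extract_record(text):
        """Return one data record: semantic label -> extracted value (or None)."""
        record = {}
        for label, pattern in RULES.items():
            match = pattern.search(text)
            record[label] = match.group(1).strip() if match else None
        return record

    offer = 'Acer Aspire 5, 15.6" display, 8 GB DDR4 RAM, 18 990 CZK'
    print(extract_record(offer))
    # {'ram': '8', 'screen_size': '15.6', 'price': '18 990'}
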
Prediction of inpatient mortality for patients with myocardial infarction
Kratochvíl, Václav ; Kružík, H. ; Tůma, P. ; Vomlel, Jiří ; Somol, Petr
The topic of this paper is the standardization of inpatient mortality for patients with myocardial infarction, based on discovered correlations between risk factors and mortality.
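The paper itself is only summarized above; purely as an illustration of what such standardization can look like, the sketch below computes a standardized mortality ratio (observed vs. risk-adjusted expected deaths) per hospital, assuming a logistic risk model over synthetic risk factors. None of the data, model choice, or hospital structure comes from the paper.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Synthetic data: rows are MI admissions, columns are risk factors
    # (e.g. age, diabetes, shock on admission); y marks in-hospital death.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3))
    true_logit = X @ np.array([1.2, 0.8, 0.5]) - 2.0
    y = (rng.random(500) < 1.0 / (1.0 + np.exp(-true_logit))).astype(int)
    hospital = rng.integers(0, 5, size=500)        # hospital id per admission

    # Risk model fitted on pooled data -- a stand-in for the discovered
    # correlations between risk factors and mortality.
    expected_risk = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]

    # Indirect standardization: SMR = observed deaths / expected deaths.
    for h in range(5):
        mask = hospital == h
        smr = y[mask].sum() / expected_risk[mask].sum()
        print(f"hospital {h}: SMR = {smr:.2f}")
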
Information Extraction from Web Pages Using Extraction Ontologies
Labský, Martin ; Berka, Petr (advisor) ; Strossa, Petr (referee) ; Vojtáš, Peter (referee) ; Snášel, Václav (referee)
Automatic information extraction (IE) from various types of text became very popular during the last decade. Owing to information overload, there are many practical applications that can utilize semantically labelled data extracted from textual sources like the Internet, emails, intranet documents and even conventional sources like newspapers and magazines. Applications of IE exist in many areas of computer science: information retrieval systems, question answering, or website quality assessment. This work focuses on developing IE methods and tools that are particularly suited to extraction from semi-structured documents such as web pages and to situations where available training data is limited. The main contribution of this thesis is the proposed approach of extended extraction ontologies. It attempts to combine extraction evidence from three distinct sources: (1) manually specified extraction knowledge, (2) existing training data and (3) formatting regularities that are often present in online documents. The underlying hypothesis is that using extraction evidence of all three types in the extraction algorithm can help improve its extraction accuracy and robustness. The motivation for this work has been the lack of described methods and tools that exploit these types of extraction evidence at the same time. The thesis first describes a statistically trained approach to IE based on Hidden Markov Models, which is integrated with an image classification algorithm in order to extract product offers from the Internet, including textual items as well as images. This approach is evaluated on a bicycle sale domain. Several methods of image classification using various feature sets are described and evaluated as well. These trained approaches are then integrated into the proposed novel approach of extended extraction ontologies, which builds on the work of Embley [21] by exploiting manual, trained and formatting types of extraction evidence at the same time. The intended benefit of using extraction ontologies is the quick development of a functional IE prototype, its smooth transition to a deployed IE application, and the possibility to leverage each of the three types of extraction evidence. Also, since extraction ontologies are typically developed by adapting suitable domain ontologies and the ontology remains at the center of the extraction process, the work needed to convert extracted results back to a domain ontology or schema is minimized. The described approach is evaluated on several distinct real-world datasets.
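As a toy illustration of the statistically trained component (Hidden Markov Models labelling tokens of a product offer), here is a self-contained Viterbi decoder over a hand-made HMM; the label set, probabilities and emission heuristics are invented for the sketch and are not taken from the thesis.

    import numpy as np

    # Toy HMM for labelling tokens of a product offer; all numbers are
    # illustrative only.
    states = ["NAME", "PRICE", "OTHER"]
    start = np.log([0.6, 0.1, 0.3])
    trans = np.log([[0.7, 0.1, 0.2],      # NAME  -> ...
                    [0.1, 0.6, 0.3],      # PRICE -> ...
                    [0.3, 0.2, 0.5]])     # OTHER -> ...

    def emission_logprob(token):
        """Crude emission model: digits look like PRICE, capitalized
        tokens like NAME, everything else like OTHER."""
        if any(ch.isdigit() for ch in token):
            return np.log([0.1, 0.8, 0.1])
        if token[:1].isupper():
            return np.log([0.7, 0.05, 0.25])
        return np.log([0.2, 0.1, 0.7])

    def viterbi(tokens):
        n, k = len(tokens), len(states)
        delta = np.full((n, k), -np.inf)
        back = np.zeros((n, k), dtype=int)
        delta[0] = start + emission_logprob(tokens[0])
        for t in range(1, n):
            scores = delta[t - 1][:, None] + trans     # (prev, current)
            back[t] = scores.argmax(axis=0)
            delta[t] = scores.max(axis=0) + emission_logprob(tokens[t])
        path = [int(delta[-1].argmax())]
        for t in range(n - 1, 0, -1):
            path.append(int(back[t][path[-1]]))
        return [states[s] for s in reversed(path)]

    print(viterbi("Author Mountain Bike 26 inch 7990 CZK".split()))
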
Fast Dependency-Aware Feature Selection in Very-High-Dimensional Pattern Recognition Problems
Somol, Petr ; Grim, Jiří
The paper addresses the problem of making dependency-aware feature selection feasible in pattern recognition problems of very high dimensionality. The idea of individually best ranking is generalized to evaluate the contextual quality of each feature in a series of randomly generated feature subsets. Each random subset is evaluated by a criterion function of arbitrary choice (permitting functions of high complexity). Eventually, the novel dependency-aware feature rank is computed, expressing the average benefit of including a feature in feature subsets. The method is efficient and generalizes well, especially in very-high-dimensional problems where traditional context-aware feature selection methods fail due to prohibitive computational complexity or over-fitting. The method is shown to be well capable of outperforming the commonly applied individual ranking, which ignores important contextual information contained in the data.
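A minimal sketch of the idea, not the authors' implementation: features are ranked by the difference between the average criterion value of random subsets that contain them and of subsets that do not. The wrapper criterion used below (cross-validated accuracy of a naive Bayes classifier) is just one possible "criterion function of arbitrary choice".

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB

    def criterion(X, y, subset):
        # Example criterion: cross-validated accuracy on the candidate subset.
        return cross_val_score(GaussianNB(), X[:, subset], y, cv=3).mean()

    def dependency_aware_rank(X, y, n_subsets=200, seed=0):
        rng = np.random.default_rng(seed)
        d = X.shape[1]
        gain = np.zeros(d)   # summed criterion values of subsets containing f
        hits = np.zeros(d)   # number of evaluated subsets containing f
        total = 0.0
        for _ in range(n_subsets):
            subset = rng.choice(d, size=rng.integers(1, d + 1), replace=False)
            value = criterion(X, y, subset)
            gain[subset] += value
            hits[subset] += 1
            total += value
        # Average benefit of including a feature: mean criterion over subsets
        # that contain it minus mean over subsets that do not.
        mean_with = gain / np.maximum(hits, 1)
        mean_without = (total - gain) / np.maximum(n_subsets - hits, 1)
        return mean_with - mean_without

    X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                               random_state=0)
    print(np.argsort(dependency_aware_rank(X, y))[::-1][:5])  # top five features
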
Introduction to Feature Selection Toolbox 3 – The C++ Library for Subset Search, Data Modeling and Classification
Somol, Petr ; Vácha, Pavel ; Mikeš, Stanislav ; Hora, Jan ; Pudil, Pavel ; Žid, Pavel
We introduce a new standalone, widely applicable software library for feature selection (also known as attribute or variable selection), capable of reducing problem dimensionality to maximize the accuracy of data models and the performance of automatic decision rules, as well as to reduce data acquisition cost. The library can be exploited by users in research as well as in industry. Less experienced users can experiment with the provided methods and their application to real-life problems, while experts can implement their own criteria or search schemes, taking advantage of the toolbox framework. In this paper we first provide a concise survey of a variety of existing feature selection approaches. Then we focus on a selected group of methods with good general performance, as well as on tools surpassing the limits of existing libraries. We build a feature selection framework around them and design an object-based generic software library. We describe the key design points and properties of the library.
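FST3 is a C++ library and its real interfaces are not reproduced here; the following hypothetical Python sketch only illustrates the general plug-in idea of such a design, i.e. user-supplied criteria and search schemes sharing one framework, and is explicitly not the FST3 API.

    from abc import ABC, abstractmethod
    from itertools import combinations

    # Hypothetical interfaces illustrating the plug-in design idea only;
    # they are NOT the actual FST3 C++ API.
    class Criterion(ABC):
        @abstractmethod
        def evaluate(self, subset): ...

    class SearchScheme(ABC):
        @abstractmethod
        def search(self, n_features, criterion, target_size): ...

    class SumCriterion(Criterion):
        """Toy criterion: sum of per-feature weights (standing in for,
        e.g., an estimate of classification accuracy)."""
        def __init__(self, weights):
            self.weights = weights
        def evaluate(self, subset):
            return sum(self.weights[f] for f in subset)

    class ExhaustiveSearch(SearchScheme):
        """Toy search scheme: best subset of a given size by brute force."""
        def search(self, n_features, criterion, target_size):
            return max(combinations(range(n_features), target_size),
                       key=criterion.evaluate)

    best = ExhaustiveSearch().search(5, SumCriterion([0.2, 0.9, 0.1, 0.7, 0.4]), 2)
    print(best)   # (1, 3)
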
Special Issue on Hybrid Intelligent Systems 2007
Abraham, A. ; Húsek, Dušan ; Snášel, V.
Special Issue on Hybrid Intelligent Systems 2007. Neural Network World, Vol. 17, No. 6 (2007), pp. 505-688. The issue contains papers prepared specially for this issue by authors of some of the best-evaluated papers presented at HIS'07, held in Kaiserslautern, Germany, during September 17-19, 2007. Current research interests in HIS, as covered in this issue, focus on the integration of different computing paradigms such as fuzzy logic, neuro-computation, evolutionary computation, probabilistic computing, intelligent agents, machine learning, and other intelligent computing frameworks. There is also a growing interest in the role of sensors and their integration and evaluation in such frameworks. The phenomenal growth of hybrid intelligent systems and related topics motivated this special issue.
Using Imsets for Learning Bayesian Networks
Vomlel, Jiří ; Studený, Milan
This paper describes a modification of the greedy equivalence search (GES) algorithm. The presented modification is based on the algebraic approach to learning: the states of the search space are standard imsets, and each standard imset represents an equivalence class of Bayesian networks. For a given quality criterion, the database is represented by the respective data imset. This allows a very simple update of the quality criterion, since moves between states are represented by differential imsets. We exploit a direct characterization of the lower and upper inclusion neighborhoods, which allows an efficient search for the best structure in the inclusion neighborhood. The algorithm was implemented in R and is freely available.
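The imset algebra itself does not fit into a short snippet; the schematic sketch below only mirrors the shape of the search, i.e. a greedy walk over structures where every candidate move is scored by a locally computed difference of the quality criterion (the role played by differential imsets in the paper). The toy edge scores and complexity penalty are invented for the example.

    import itertools

    NODES = range(4)
    CANDIDATE_EDGES = [(a, b) for a, b in itertools.combinations(NODES, 2)]

    # Toy edge utilities standing in for a data-derived quality criterion.
    EDGE_SCORE = {(0, 1): 2.0, (1, 2): 1.5, (2, 3): -0.5,
                  (0, 2): 0.3, (0, 3): -1.0, (1, 3): 0.8}
    COMPLEXITY_PENALTY = 0.6            # discourages dense structures

    def score_difference(state, edge):
        """Criterion change caused by adding/removing a single edge."""
        sign = -1.0 if edge in state else 1.0
        return sign * (EDGE_SCORE[edge] - COMPLEXITY_PENALTY)

    def greedy_search():
        state = frozenset()             # start from the empty structure
        while True:
            moves = [(score_difference(state, e), e) for e in CANDIDATE_EDGES]
            best_gain, best_edge = max(moves)
            if best_gain <= 0:
                return state            # local optimum of the criterion
            state = state ^ {best_edge} # apply the best add/remove move

    print(sorted(greedy_search()))      # [(0, 1), (1, 2), (1, 3)]
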
On the way to learning deterministic objects
Bůcha, Jindřich
The paper deals with learning knowledge about objects, i.e. entities of the real environment. This is an important topic, often neglected by machine learning. The whole experimental approach is implemented via the integration of several areas of AI, namely machine learning, knowledge base management, and reasoning, especially Prolog-like and analogical reasoning. More specifically, the approach is based on the further generalization of already learned (generalized) rules, and on analogy.
