National Repository of Grey Literature 5 records found
Knowledge Data Discovery
Jirmásek, Tomáš ; Chmelař, Petr (referee) ; Jurka, Pavel (advisor)
This bachelor's thesis deals with knowledge discovery in databases and focuses on Bayesian classification. The main goal of the thesis was to implement one data mining method and to verify its functionality on a chosen data set. The application is implemented in the Java programming language. A MySQL database was chosen as the storage for the data set from which patterns are extracted. The information needed to start a data mining task is obtained from an input DMSL document, and the results of data mining are stored in an output DMSL document. The DMSL language had to be extended to support the implemented method, Bayesian classification.
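For orientation, Bayesian classification over categorical attributes can be sketched as follows. This is a minimal illustration in Python, not the thesis's Java implementation; the function names, the Laplace smoothing, and the toy weather data are all assumptions introduced here, and the MySQL and DMSL input/output steps are left out.

```python
from collections import Counter, defaultdict
import math


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-attribute value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(attribute index, class)][value] -> number of occurrences
    value_counts = defaultdict(Counter)
    # distinct values seen per attribute, needed for Laplace smoothing
    domains = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains


def classify(model, row):
    """Return the class with the highest (log) posterior score for row."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior of the class
        for i, value in enumerate(row):
            count = value_counts[(i, label)][value]
            # Laplace smoothing keeps unseen values from zeroing the score
            score += math.log((count + 1) / (n + len(domains[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("overcast", "hot")]
labels = ["no", "no", "yes", "yes", "yes"]
model = train_naive_bayes(rows, labels)
```

Calling `classify(model, ("rainy", "cool"))` then picks the class whose prior and smoothed attribute likelihoods give the largest product.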
Mapping of PMML and BKEF documents using PHP in the SEWEBAR CMS
Vojíř, Stanislav ; Kliegr, Tomáš (advisor) ; Zamazal, Ondřej (referee)
In the data mining process, it is necessary to prepare the source dataset, for example by choosing how to cut or group the values of continuous attributes, and to use knowledge from the problem domain. Such a preparation process can be guided by background (domain) knowledge obtained from experts. In the SEWEBAR project, we collect this knowledge from experts in a rich XML-based representation language called BKEF, using a dedicated editor, and save it into the database of our custom-tailored (Joomla!-based) CMS. Data mining tools are then able to generate, from this dataset, mining models represented in the standardized PMML format. It is then necessary to map a particular column (attribute) of the dataset (in PMML) to a relevant 'metaattribute' of the BKEF representation. This specific type of schema mapping problem is addressed in my thesis in terms of algorithms for automatic suggestion of mappings from columns to metaattributes and from values of these columns to BKEF 'metafields'. Manual correction of the mapping by the user is also supported. The implementation is written in PHP and was tested on datasets with information about courses taught at 5 universities in the U.S.A., taken from the Illinois Semantic Integration Archive. On these datasets, the auto-mapping suggestion process achieved a precision of about 70% and a recall of about 77% on unknown columns; when mapping previously user-mapped data (using the implemented learning module), the recall is between 90% and 100%.
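The precision and recall figures quoted above can be read with the usual definitions. The sketch below assumes that a suggestion counts as correct only when the suggested metaattribute equals the reference one; the function name and the sample column and metaattribute names are hypothetical and are not taken from the thesis or its test data.

```python
def precision_recall(suggested, reference):
    """Compare suggested column -> metaattribute mappings with a reference.

    precision = correct suggestions / all suggestions made
    recall    = correct suggestions / all mappings in the reference
    """
    correct = sum(1 for column, meta in suggested.items()
                  if reference.get(column) == meta)
    precision = correct / len(suggested) if suggested else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical example: 4 columns should be mapped, the tool suggests 3,
# and 2 of those suggestions match the reference mapping.
reference = {"title": "CourseName", "instr": "Lecturer",
             "room": "Location", "days": "Schedule"}
suggested = {"title": "CourseName", "instr": "Lecturer", "room": "Schedule"}

precision, recall = precision_recall(suggested, reference)
```

In this example precision is 2/3 and recall is 2/4, which shows how a tool can suggest mostly correct mappings while still missing columns.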
Transformation of PMML files into Topic Maps
Ovečka, Marek ; Kliegr, Tomáš (advisor) ; Hazucha, Andrej (referee)
The goal of my bachelor thesis is to explore the possibilities of transforming XML files into XML TopicMaps, and then to select one method and use it to implement the transformation of PMML files containing an association rule model into a TopicMap containing the Association Rule Ontology (AROn). Two different transformation methods were selected. The first is transformation of the input XML file with XSLT; examples of how this can be done are shown. The second is to use a programming engine that processes the input file and implements either its own or a standardized API for working with TopicMaps. To illustrate the creation of TopicMaps I have chosen TMAPI 2.0. The examples show basic operations such as creating a new topic, creating an association between two topics, and creating an occurrence. The thesis consists of three main chapters. The first covers the structure of the input PMML document and its parts. The second covers TopicMaps; the principles of this technology and the possibilities of working with it are briefly shown. In the third chapter I comment on all parts of the program that perform the transformation, and I describe and explain the mapping between parts of the input XML file and the components of the AROn ontology. The final program should serve as a component of the SEWEBAR project. To be fully functional, the transformation needs to be adapted to the newest structure of the input document, and the program itself needs to be adapted for integration with other components of the project.
Utilization of XML databases for retrieval of data-mining specifications
Marek, Tomáš ; Kliegr, Tomáš (advisor) ; Kosek, Jiří (referee)
The aim of this work is to create a system for querying analytical reports stored as PMML documents. Because PMML documents are structured as XML, they are stored in a native XML database. The selected XML database is available for free, and its capabilities meet the needs of the proposed solution. A search algorithm is also created to search these documents using the XQuery language; since the searched data are XML, a language for querying XML data is a natural choice. To use XQuery, the structure of the PMML document was explored, and the data links within these documents were used to ensure proper search results. The results of a search are association rules from the analytical reports stored in the PMML documents; a search request specifies the attributes that should appear in the rules, their values, and other search constraints. For the system to be complete and fully usable, a communication interface is needed through which the stored data can be accessed. For this purpose, Java and a REST(ful) architecture are used.
Indexing and searching XML documents with Lucene
Beránek, Lukáš ; Kliegr, Tomáš (advisor) ; Pinkas, Otakar (referee)
The creation of an analytical report is a process in which we try to obtain and preserve the results of data mining tasks for further use. The next step after creation is to transform the results into a user-friendly form that can be easily accessed, for example as an online HTML document in the SEWEBAR project. The increasing number of resulting documents is the main reason we need a means of searching structured data such as XML documents conforming to the PMML standard, in which the reports are currently saved. The main goal is to survey the available means for indexing and full-text searching of XML documents, targeted at searching association rules found in output documents produced by the programs LISp-Miner or Ferda. After the initial analysis and assessment of the current state, an extension for the Joomla! CMS will be created to meet the need for indexing and searching indexed data. As source files for the created Jucene extension we use analytical reports in PMML format saved in the database of the Joomla! content management system. A stored PMML document will be simplified, optimized, and transformed by means of an XSL transformation into the requested structure for better indexing, while maintaining the logical order of the document's data mining task. The transformed document will then be inserted into the Zend Lucene document index. To achieve this in the PHP environment, the DOMDocument library will be used. The created workflow will supply a user interface for working with indexed rules. It will also provide users with a means of searching association rules based on user-specified queries, which can be processed by the Zend Search Lucene framework. When rules that correspond to the user query are found, the system will score the results and display them to the user.
One of the goals is not only to create the Jucene component but also to give its users step-by-step guidance, whether they are site administrators or ordinary visitors.
