National Repository of Grey Literature: 32 records found (records 1-10 shown)
Clustering objects with the MCluster-Miner procedure of the LISp-Miner system
Pelc, Tomáš ; Šimůnek, Milan (advisor) ; Šulc, Zdeněk (referee)
This bachelor thesis deals with clustering objects with the MCluster-Miner procedure of the LISp-Miner system. The first aim of the thesis is to cluster objects with this procedure and to analyze its possible uses on different datasets. To achieve this goal, the procedure was applied to six different datasets. The second aim is to analyze and compare the implemented algorithms and similarity measures and to propose recommendations for clustering parameters. To achieve this goal, the available algorithms and similarity measures are compared based on the achieved results (the quality of the distribution of objects into clusters, the run time of the clustering task, and the number of attributes used for clustering). Based on these comparisons, recommendations for clustering parameters are proposed. The contributions of this thesis are these recommendations, the comparisons of the available algorithms and similarity measures, a summary of the current state (as of May 2017) of the MCluster-Miner module, and a demonstration of displaying the results of a clustering task in an interactive analysis of geodata. The theoretical part comprises a description of the LISp-Miner system, basic clustering principles, the clustering methods and similarity measures used by the GUHA procedure MCluster-Miner, and the MCluster-Miner module. In the practical part, the MCluster-Miner procedure is applied to six different datasets and the achieved results are summarized.
Automation of a data mining process in the data about traffic accidents in the Czech Republic
Podavka, Jan ; Šimůnek, Milan (advisor) ; Urbaniec, Krzysztof (referee)
This master thesis deals with automating the data mining process in the LISp-Miner program. The aim of this thesis is to create an automated process that analyzes analytical questions in the data about traffic accidents in the Czech Republic using the LMCL scripting language and the LM Exec module. The theoretical part describes the process of knowledge discovery in databases and the most widely used methodology for it. It also covers the topics relevant to working with LISp-Miner. The practical part focuses on traffic accidents in the Czech Republic, a description of the data used, the creation and evaluation of analytical questions, and above all a description of the created scripts. The output of the thesis is a set of scripts together with a manual on how to reuse them, so that they can be applied to current traffic accident data, not only from the Czech Republic, provided the data have the same structure.
Comparison of Approaches to Synthetic Data Generation
Šejvlová, Ludmila ; Šimůnek, Milan (advisor) ; Pavlíčková, Jarmila (referee)
The diploma thesis deals with synthetic data, selected approaches to their generation, and a practical data generation task. The goal of the thesis is to describe the selected approaches to data generation, capture their key advantages and disadvantages, and compare the approaches with each other. The practical part of the thesis describes the generation of synthetic data for teaching knowledge discovery in databases. The thesis includes a basic description of synthetic data and thoroughly explains the process of their generation. The approaches selected for further examination are random data generation, the statistical approach, data generation languages, and the ReverseMiner tool. The thesis also describes practical uses of synthetic data and the suitability of each approach for particular purposes. Within this thesis, the educational dataset Hotel SD was created using the ReverseMiner tool. The data contain relations discoverable with the SD (set-difference) GUHA procedures.
Similarity Measures for Nominal Data in Hierarchical Clustering
Šulc, Zdeněk ; Řezanková, Hana (advisor) ; Šimůnek, Milan (referee) ; Žambochová, Marta (referee)
This dissertation thesis deals with similarity measures for nominal data in hierarchical clustering that can cope with variables with more than two categories and that aim to replace the simple matching approach standardly used in this area. These similarity measures take into account additional characteristics of a dataset, such as the frequency distribution of categories or the number of categories of a given variable. The thesis has three main aims. The first is to examine selected similarity measures for nominal data and evaluate their clustering performance in hierarchical clustering of objects and of variables. To achieve this goal, four experiments dealing with both object and variable clustering were performed. They examine the clustering quality of the examined similarity measures for nominal data in comparison with commonly used similarity measures based on a binary transformation and, moreover, with several alternative methods for nominal data clustering. The comparison and evaluation are performed on real and generated datasets. The outcomes of these experiments show which similarity measures can be used in general, which perform well in particular situations, and which are not recommended for object or variable clustering. The second aim is to propose a theory-based similarity measure, evaluate its properties, and compare it with the other examined similarity measures. Based on this aim, two novel similarity measures, Variable Entropy and Variable Mutability, are proposed; the former in particular performs very well on datasets with a smaller number of variables. The third aim is to provide a convenient software implementation of the examined similarity measures for nominal data that covers the whole clustering process, from computing a proximity matrix to evaluating the resulting clusters.
This goal was achieved by creating the nomclust package for R, which is freely available.
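The simple matching approach mentioned above scores two objects by the share of variables on which their categories agree, ignoring how common those categories are. The sketch below contrasts it with one well-known frequency-based alternative from the nominal-similarity literature, the occurrence frequency (OF) measure, which gives a mismatch between two frequent categories a higher similarity than a mismatch between two rare ones. This is an illustrative Python sketch, not the thesis's Variable Entropy or Variable Mutability measures and not the nomclust code; the dataset and the function names are hypothetical, and the variables are weighted equally.

```python
import math
from collections import Counter


def simple_matching(x, y):
    """Simple matching similarity: share of variables whose categories agree."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of variables")
    return sum(a == b for a, b in zip(x, y)) / len(x)


def occurrence_frequency(x, y, data):
    """OF similarity: a mismatch between two frequent categories scores higher
    than a mismatch between two rare ones. Assumes x and y are rows of data."""
    n = len(data)
    total = 0.0
    for k, (a, b) in enumerate(zip(x, y)):
        if a == b:
            total += 1.0  # a match always contributes full similarity
        else:
            # frequencies of the categories of variable k in the whole dataset
            freq = Counter(row[k] for row in data)
            total += 1.0 / (1.0 + math.log(n / freq[a]) * math.log(n / freq[b]))
    return total / len(x)  # unweighted average over the variables


# Hypothetical nominal dataset: (colour, size, answer) for four objects.
data = [
    ("red", "small", "yes"),
    ("red", "large", "no"),
    ("blue", "small", "yes"),
    ("green", "small", "no"),
]

sm = simple_matching(data[0], data[2])
of = occurrence_frequency(data[0], data[2], data)
```

For objects 0 and 2, simple matching gives 2/3, while OF is somewhat higher because the mismatched colours are not both rare. For hierarchical clustering, such a similarity is usually turned into a dissimilarity (for example `1 - similarity`) and arranged into a proximity matrix.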
Analysis of real data for Customer Services Department
Maximilián, Michal ; Šimůnek, Milan (advisor) ; Veselý, Jiří (referee)
The goal of this bachelor thesis is to find specific relationships by analyzing real CRM data. These relationships are then to be used to draft the content of the company's new website. The analysis is carried out with the CF-Miner and KL-Miner procedures of the LISp-Miner system, an academic system for knowledge discovery in databases (KDD) based on the GUHA method. The whole analysis process is structured according to the phases of the CRISP-DM methodology. The main contribution of this thesis is finding previously unknown relationships and dependencies that can be used effectively in practice, together with an introduction to the methods and techniques used in the analysis and, last but not least, to the LISp-Miner system itself. The thesis is divided into a theoretical and an empirical section. In the first three chapters, I explain what is meant by knowledge discovery in databases and which techniques, methodologies, and procedures are used in this process. Further, I explain the individual phases of KDD according to the CRISP-DM methodology. Towards the end of the theoretical part, I describe the LISp-Miner system used for this analysis. The empirical section follows the CRISP-DM methodology: I first introduce the scope and the data to be analyzed, then prepare the data and use them to solve the analytical questions. At the end of the empirical part, I interpret the results of the individual analyses and suggest how they can be used in practice.
Options of presentation of KDD results on Web
Koválik, Tomáš ; Rauch, Jan (advisor) ; Šimůnek, Milan (referee)
This diploma thesis covers KDD analysis of data and options for presenting KDD results on the Web. The thesis is divided into three main sections that follow its overall process. The first section covers the theoretical basics needed to understand the problem: the notions of data matrix and domain knowledge, the CRISP-DM methodology, the GUHA method, the LISp-Miner system, and the implementation of the GUHA method in LISp-Miner, including a description of the core procedures 4ft-Miner and CF-Miner. The second section addresses the first goal of the thesis. It briefly summarizes the pre-analysis phase and then describes the analysis of domain knowledge in a given dataset. The third section addresses the second goal, the presentation of KDD results on the Web. It gives a brief theoretical basis for the technologies used and then describes the development of an export script that automatically generates a website from results found with the LISp-Miner system, including a description of the structure of the output and recommendations for working in LISp-Miner.
Analysis of the real data from the restaurant sector
Šimeček, Petr ; Rauch, Jan (advisor) ; Šimůnek, Milan (referee)
The aim of this thesis is to analyze real data from the restaurant sector in the center of Prague, to verify assumptions based on existing knowledge, and to explore hidden relations. The MySQL database management system was used for the initial transformation of the original data structure. The transformed data were then converted into a form that could be handled by the LMDataSource procedure of the LISp-Miner system. Association analysis was performed with the 4ft-Miner procedure of LISp-Miner. The MySQL database system was used for frequency analysis, and Microsoft Word and Excel were used to interpret the results. Some of the assumptions were confirmed, and an interesting combination of relations was discovered. The results allow the data owner to use some of the findings to optimize internal processes. In addition, the thesis points out other possible ways to analyze these data.
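The 4ft-Miner procedure mentioned above evaluates association rules of the form antecedent ≈ succedent on a four-fold contingency table with cells a (both hold), b (antecedent only), c (succedent only), and d (neither). One of its standard GUHA quantifiers is founded implication with parameters p and Base, which holds when a / (a + b) ≥ p and a ≥ Base. The Python sketch below re-implements only this core check for a single, hand-written rule; it is not LISp-Miner's implementation, which also generates candidate antecedents and succedents automatically. The order records, attribute names, and helper functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class FourFoldTable:
    a: int  # antecedent true,  succedent true
    b: int  # antecedent true,  succedent false
    c: int  # antecedent false, succedent true
    d: int  # antecedent false, succedent false


def four_fold(rows: Iterable[dict], antecedent: Callable[[dict], bool],
              succedent: Callable[[dict], bool]) -> FourFoldTable:
    """Count objects (rows) falling into each cell of the contingency table."""
    a = b = c = d = 0
    for row in rows:
        ant, suc = antecedent(row), succedent(row)
        if ant and suc:
            a += 1
        elif ant:
            b += 1
        elif suc:
            c += 1
        else:
            d += 1
    return FourFoldTable(a, b, c, d)


def founded_implication(t: FourFoldTable, p: float, base: int) -> bool:
    """GUHA founded implication: a / (a + b) >= p and a >= base."""
    return t.a + t.b > 0 and t.a / (t.a + t.b) >= p and t.a >= base


# Hypothetical restaurant orders; attribute names are illustrative only.
orders = [
    {"day": "Fri", "dessert": True},
    {"day": "Fri", "dessert": True},
    {"day": "Fri", "dessert": False},
    {"day": "Mon", "dessert": False},
    {"day": "Mon", "dessert": True},
]

table = four_fold(orders, lambda r: r["day"] == "Fri", lambda r: r["dessert"])
```

Here the rule "Friday order ≈ dessert ordered" yields a = 2, b = 1, c = 1, d = 1, so it holds with p = 0.6 and Base = 2 (confidence 2/3) but fails with p = 0.7 or Base = 3.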
Utilization of System LISp-Miner in the Analysis of the Factors Influencing the Dominance of Cyanobacteria in Phytoplankton
Hlaváčová, Tereza ; Šimůnek, Milan (advisor) ; Potužák, Jan (referee)
The aim of this work is to describe the steps involved in solving analytical questions with LISp-Miner on data from water analyses of 12 ponds in South Bohemia from 2007 to 2012. The analytical questions focus primarily on cyanobacteria and are based on instructions from the data owner, Povodí Vltavy, státní podnik. Besides describing the application of the KL-Miner, CF-Miner, and 4ft-Miner procedures to the data, the work aims to design an automated process based on the steps taken when applying these procedures. The theoretical part summarizes the basic concepts and principles of association rules and the GUHA method. The practical part follows the CRISP-DM methodology. The result is a proposed automated process for finding interesting rules in hydrobiological and hydrochemical data, together with a set of recommendations for making the database more suitable for KDD, including proposals on how to modify and prepare the data.
Analysis of Quality Control Data Using the LISp-Miner System
Štefke, Martin ; Šimůnek, Milan (advisor) ; Srogoňová, Kristína (referee)
The objective of this bachelor thesis is to analyze the occurrence of non-conforming products at SEWS Slovakia. Production defects from January 2013 to October 2014 were analyzed from the database using the academic system LISp-Miner. The initial theoretical part summarizes the different approaches to knowledge discovery in databases. The following practical part describes the treatment and processing of the data and defines the basic analytical questions. Finally, the relevant relationships found in the data with the analytical methods are presented.
Combining OLAP and data mining for analysis on trainee dataset
Borokshinova, Anastasia ; Chudán, David (advisor) ; Šimůnek, Milan (referee)
The aim of this thesis is to show how two data analysis techniques, OLAP and data mining, can be combined in a particular domain. The aim is achieved mainly by continuously comparing and cross-checking the results obtained with the two techniques. A practice dataset on loans granted to natural persons is used for the practical application. The data analysis is performed with the Power Pivot add-in for MS Excel and with the LISp-Miner system. Within LISp-Miner, the 4ft-Miner procedure is used specifically, and the analysis follows the established CRISP-DM data mining methodology. The main added value of the thesis is demonstrating how OLAP and data mining can be linked on the same dataset, which reduces the number of erroneous conclusions analysts might reach when relying on a single technique. Further benefits are the demonstration of transferring relational data into a multidimensional structure and the practical use of the 4ft-Miner procedure of the LISp-Miner system.