National Repository of Grey Literature
Cluster Analysis Module of a Data Mining System
Riedl, Pavel ; Burgetová, Ivana (referee) ; Zendulka, Jaroslav (advisor)
This master's thesis deals with the development of a module for a data mining system that is being developed at FIT. The first part describes the general knowledge discovery process and cluster analysis, including cluster validation; it also describes Oracle Data Mining and the algorithms it uses for clustering. It closes with the system itself and the technologies it uses, such as NetBeans Platform and DMSL. The second part describes the design of a clustering module and of a module used to compare its results. It also deals with the visualization of cluster analysis results and presents the results achieved.
Clustering objects with the MCluster-Miner procedure of the LISp-Miner system
Pelc, Tomáš ; Šimůnek, Milan (advisor) ; Šulc, Zdeněk (referee)
This bachelor thesis deals with clustering objects with the MCluster-Miner procedure of the LISp-Miner system. The first aim of the thesis is to cluster objects with this procedure and to analyze its possible usage on different datasets. To achieve this goal, the procedure was applied to six different datasets. The second aim is to analyze and compare the implemented algorithms and similarity measures and to propose recommendations for clustering parameters. To achieve this goal, the available algorithms and similarity measures are compared on the basis of the achieved results (the quality of the distribution of objects into clusters, the duration of the clustering task, the number of attributes used for clustering). Based on these comparisons, recommendations for clustering parameters are proposed. The contributions of the thesis are these recommendations, the comparisons of the available algorithms and similarity measures, a summary of the current state (as of May 2017) of the MCluster-Miner module, and a demonstration that the results of a clustering task can be displayed in an interactive analysis of geodata. The theoretical part describes the LISp-Miner system, basic clustering principles, the clustering methods and similarity measures used by the GUHA procedure MCluster-Miner, and the MCluster-Miner module. In the practical part, the MCluster-Miner procedure is applied to six different datasets and the achieved results are summarized.
Similarity measures for nominal data in hierarchical clustering
Šulc, Zdeněk ; Řezanková, Hana (advisor) ; Šimůnek, Milan (referee) ; Žambochová, Marta (referee)
This dissertation deals with similarity measures for nominal data in hierarchical clustering that can cope with variables with more than two categories and that aspire to replace the simple matching approach commonly used in this area. These similarity measures take into account additional characteristics of a dataset, such as the frequency distribution of categories or the number of categories of a given variable. The thesis has three main aims. The first is to examine selected similarity measures for nominal data and to evaluate their clustering performance in hierarchical clustering of objects and variables. To achieve this goal, four experiments dealing with both object and variable clustering were performed. They compare the clustering quality of the examined similarity measures with that of the commonly used similarity measures based on a binary transformation and, in addition, with several alternative methods for nominal data clustering. The comparison and evaluation are performed on real and generated datasets. The outputs of these experiments show which similarity measures can be used generally, which ones perform well in a particular situation, and which ones are not recommended for object or variable clustering. The second aim is to propose a theory-based similarity measure, evaluate its properties, and compare it with the other examined similarity measures. In pursuit of this aim, two novel similarity measures, Variable Entropy and Variable Mutability, are proposed; the former in particular performs very well on datasets with a lower number of variables. The third aim is to provide a convenient software implementation of the examined similarity measures for nominal data that covers the whole clustering process, from the computation of a proximity matrix to the evaluation of the resulting clusters. This goal was achieved by creating the nomclust package for the R software, which is freely available.
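As an illustration of the baseline the abstract refers to, the following is a minimal Python sketch of the simple matching approach and of the proximity matrix a hierarchical clustering would start from. It is not taken from the thesis or from the nomclust package; the function names and the toy data are hypothetical, and the dissimilarity is assumed to be the share of variables on which two objects differ.

```python
from itertools import combinations


def simple_matching_dissimilarity(a, b):
    """Share of variables on which two objects take different categories."""
    if len(a) != len(b):
        raise ValueError("objects must have the same number of variables")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)


def proximity_matrix(objects):
    """Symmetric matrix of pairwise simple matching dissimilarities."""
    n = len(objects)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = simple_matching_dissimilarity(objects[i], objects[j])
        matrix[i][j] = matrix[j][i] = d
    return matrix


# Hypothetical toy data: three objects described by three nominal variables.
data = [
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
]
dist = proximity_matrix(data)
```

Simple matching treats every mismatch alike; the measures examined in the thesis differ in that they also weight by characteristics such as category frequencies or the number of categories of a variable.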
Classical and recent approaches in cluster analysis
Řezanková, Hana
The paper focuses on the development of selected approaches in cluster analysis. It covers recently proposed similarity measures for objects characterized by nominal variables, the development of algorithms for k-clustering, and the development of methods for clustering large data files and categorical data. Among the algorithms for k-clustering, attention is paid to those that take into account the uncertainty in classifying objects into clusters, namely the FCM (fuzzy k-means), PCM, FPCM, RCM, RFCM and RFPCM algorithms. For large data files, the CURE, ROCK, CLARA, CLARANS and BIRCH algorithms are included; for categorical data clustering, the COOLCAT and ROCK algorithms. Two-step cluster analysis, which clusters large data sets with variables of different types, is also mentioned.
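To make the idea of uncertainty in classifying objects concrete, here is a minimal Python sketch of the standard fuzzy k-means (FCM, also called fuzzy c-means) updates on one-dimensional data. It is an illustration only, not code from the paper; the function names, the fuzzifier m = 2, the fixed iteration count and the toy data are assumptions.

```python
def fcm_memberships(points, centers, m=2.0):
    """Membership of each 1-D point in each cluster; every row sums to 1."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for x in points:
        dists = [abs(x - c) for c in centers]
        if 0.0 in dists:
            # Point coincides with a center: split full membership among
            # the coinciding centers to avoid division by zero.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(hits)
            memberships.append([h / total for h in hits])
            continue
        row = [1.0 / sum((d_j / d_k) ** exponent for d_k in dists)
               for d_j in dists]
        memberships.append(row)
    return memberships


def fcm_centers(points, memberships, m=2.0):
    """Centers as means of the points weighted by membership ** m."""
    centers = []
    for j in range(len(memberships[0])):
        weights = [u[j] ** m for u in memberships]
        weighted_sum = sum(w * x for w, x in zip(weights, points))
        centers.append(weighted_sum / sum(weights))
    return centers


def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Alternate membership and center updates for a fixed number of steps."""
    for _ in range(iterations):
        u = fcm_memberships(points, centers, m)
        centers = fcm_centers(points, u, m)
    return centers, fcm_memberships(points, centers, m)


# Hypothetical toy data: two well-separated groups on a line.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, u = fuzzy_c_means(points, centers=[0.0, 6.0])
```

Unlike a hard k-means assignment, each row of `u` distributes one unit of membership across all clusters, so an object lying between two groups receives intermediate degrees rather than a single label.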
Evaluation of Cluster Analysis Methods
Löster, Tomáš ; Řezanková, Hana (advisor) ; Berka, Petr (referee) ; Dohnal, Gejza (referee)
Cluster analysis includes a range of methods and practices that are used primarily for the classification of objects. It plays an important role in many areas. Since the resulting distribution of objects into clusters may vary depending on the selected methods and specifications, it is appropriate to assess the results obtained. This paper proposes new coefficients for evaluating these results in situations where objects are characterized by qualitative variables or by variables of different types. These coefficients can be used either to compare different methods (in terms of better outcomes) or to find the optimal number of clusters. All of them are based on the measurement of variability, which is also used for measuring the dissimilarity of objects and clusters. The newly proposed evaluation methods are applied to real data sets (of different sizes, with different numbers of variables, including variables of different types), and the behavior of the coefficients under different conditions is examined. The data sets include cases where the classification of objects into clusters is known as well as cases where it is unknown. Based on the analysis carried out, the modified CHF coefficient can be considered the best coefficient for evaluating clustering results with variables of different types. The local maximum by which the clustering results are evaluated almost always exists. The analysis shows that in most cases this value agrees with the known classification of objects into clusters. Local extremes of the other coefficients depend on the specific data set and do not always exist.
