National Repository of Grey Literature: 10 records found
Clustering objects with the MCluster-Miner procedure of the LISp-Miner system
Pelc, Tomáš ; Šimůnek, Milan (advisor) ; Šulc, Zdeněk (referee)
This bachelor thesis deals with clustering objects using the MCluster-Miner procedure of the LISp-Miner system. The first aim is to cluster objects with this procedure and to analyze its possible use on different datasets. To achieve this goal, the procedure was applied to six different datasets. The second aim is to analyze and compare the implemented algorithms and similarity measures and to propose recommendations for clustering parameters. To achieve this goal, the available algorithms and similarity measures are compared based on the achieved results (the quality of the distribution of objects into clusters, the duration of the clustering task, the number of attributes used for clustering). Based on these comparisons, recommendations for clustering parameters are proposed. The contributions of this thesis are these recommendations, the comparisons of the available algorithms and similarity measures, a summary of the current state (as of May 2017) of the MCluster-Miner module, and a demonstration of how the results of a clustering task can be displayed in the interactive analysis of geodata. The theoretical part comprises a description of the LISp-Miner system, basic clustering principles, the clustering methods and similarity measures used by the GUHA procedure MCluster-Miner, and the MCluster-Miner module. In the practical part, the MCluster-Miner procedure is applied to six different datasets and the achieved results are summarized.
Analysis of a questionnaire survey of graduates of the master degree at the University of Economics Prague
Barus, Miroslav ; Šulc, Zdeněk (advisor) ; Blatná, Dagmar (referee)
This bachelor thesis deals with the analysis of data from a questionnaire survey of graduates of the master's degree at the University of Economics in Prague. The data were obtained from the Reflex 2010 research project, conducted under the auspices of the Education Policy Center of Charles University in Prague. The aim of the thesis is to compare the performance of graduates according to various indicators. The work is divided into two parts. The first deals with the theoretical definition and description of the tests used in the analysis to compare the means and relative frequencies of the investigated features. The second part analyzes and compares the performance of graduates broken down by faculty, gender and place of employment. A further aim of the thesis is to evaluate the impact of using invalid hypothesis tests.
Descriptive statistics in R with application on real data
Pirohová, Eva ; Bašta, Milan (advisor) ; Šulc, Zdeněk (referee)
The aim of this thesis is to introduce the statistical software R and its use in descriptive statistics, and to explain the principle of entering commands and functions in R. The theoretical part of the thesis gives the reader basic information about descriptive statistics and the key principles of R. An important part of the thesis is an illustration of the functions and code to be entered into a script or a command window. The practical part shows how to apply descriptive statistics to real data using functions and code in R, including the corresponding types of graphs. Each example is followed by an explanation. The thesis contains all the important information that those starting out with basic statistical analysis and R should know. Although working with R may be rather complicated at the beginning, the thesis is written so that it can be read by a beginner in R and statistical analysis.
Statistical methods in stylometry
Dupal, Pavel ; Kaspříková, Nikola (advisor) ; Šulc, Zdeněk (referee)
The aim of this thesis is to provide an overview of some of the commonly used methods in the area of authorship attribution (stylometry). The text begins with a recap of the field's history from the end of the 19th century to the present, and the required terminology from the field of text mining is presented and explained. What follows is a list of selected methods from multidimensional statistics (principal component analysis, cluster analysis) and machine learning (Support Vector Machines, Naive Bayes) and their application to stylometric problems, including several methods created specifically for use in this field (bootstrap consensus tree, contrast analysis). Finally, these same methods are applied to a practical problem of authorship verification based on a corpus built from the works of four internet writers.
Similarity measures for nominal data in hierarchical clustering
Šulc, Zdeněk ; Řezanková, Hana (advisor) ; Šimůnek, Milan (referee) ; Žambochová, Marta (referee)
This dissertation deals with similarity measures for nominal data in hierarchical clustering that can cope with variables with more than two categories and that aspire to replace the simple matching approach standardly used in this area. These similarity measures take into account additional characteristics of a dataset, such as the frequency distribution of categories or the number of categories of a given variable. The thesis has three main aims. The first is to examine selected similarity measures for nominal data and to evaluate their clustering performance in hierarchical clustering of objects and variables. To achieve this goal, four experiments dealing with both object and variable clustering were performed. They compare the clustering quality of the examined similarity measures for nominal data with that of the commonly used similarity measures based on a binary transformation and, moreover, with several alternative methods for nominal data clustering. The comparison and evaluation are performed on real and generated datasets. The outputs of these experiments show which similarity measures can generally be used, which ones perform well in a particular situation, and which ones are not recommended for object or variable clustering. The second aim is to propose a theory-based similarity measure, evaluate its properties, and compare it with the other examined similarity measures. Based on this aim, two novel similarity measures, Variable Entropy and Variable Mutability, are proposed; the former in particular performs very well in datasets with a lower number of variables. The third aim of this thesis is to provide a convenient software implementation based on the examined similarity measures for nominal data, which covers the whole clustering process from the computation of a proximity matrix to the evaluation of the resulting clusters. This goal was also achieved by creating the nomclust package for the software R, which covers this issue, and which is freely available.
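As an illustration of the simple matching baseline mentioned in the abstract above, the following Python sketch computes a dissimilarity matrix for nominal data. It is not the nomclust implementation and does not reproduce the proposed Variable Entropy or Variable Mutability measures; the function names and the toy data are hypothetical.

```python
from itertools import combinations

def simple_matching_dissimilarity(a, b):
    """Share of variables on which two objects take different categories."""
    if len(a) != len(b):
        raise ValueError("objects must have the same number of variables")
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)

def proximity_matrix(objects):
    """Symmetric matrix of pairwise simple matching dissimilarities."""
    n = len(objects)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = simple_matching_dissimilarity(objects[i], objects[j])
        matrix[i][j] = matrix[j][i] = d
    return matrix

# Hypothetical toy data: three nominal variables per object.
objects = [
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
]
matrix = proximity_matrix(objects)
```

Simple matching treats every mismatch alike, regardless of how frequent the categories are; that is the limitation the frequency-aware measures described in the abstract aim to address.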
Cluster analysis as a tool for classification of objects
Budilová, Šárka ; Löster, Tomáš (advisor) ; Šulc, Zdeněk (referee)
Cluster analysis is a popular method of multivariate statistics. Based on mutual similarities between objects, this method classifies objects and divides them into several groups or clusters. The results of clustering can differ depending on the methods, distance measures and procedures used. The main aim of this thesis is to compare the results of several methods of cluster analysis with the known classification into classes from the original data file. In total, 15 data files were analyzed, each of which contained known information about the correct allocation of objects to groups. The success of each method was calculated by comparing the known classification with the resulting clusters. In addition to comparing the individual methods of cluster analysis, the impact of standardization and correlation on the success of each method was compared. Squared Euclidean distance was used to express the distance between objects within each cluster. The results of this thesis indicate that clustering was more successful when correlated variables were kept in the data file: the success rate was about 2 percentage points higher than when correlated variables were deleted from the data set. The methods correctly classified 69.8 % of objects before standardization and 70.8 % of objects after standardization. The results also show the great importance of standardization for Ward's method. After standardization, this method assigned the most objects to the correct classes and was about nine percentage points more successful. In the case of correlated variables, the success rate of the method is 76.4 %. Standardization also positively influences the centroid method and the farthest neighbour method. The median method, the nearest neighbour method and the average linkage method achieve higher clustering success with the original, non-standardized variables (uneven variables).
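Why standardization can change clustering results under squared Euclidean distance can be seen in a small Python sketch. The data and function names are hypothetical and are not taken from the thesis; z-scores with the sample standard deviation are assumed as the form of standardization.

```python
from statistics import mean, stdev

def squared_euclidean(a, b):
    """Sum of squared coordinate differences between two objects."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def standardize(rows):
    """Convert each variable (column) to z-scores.

    Assumes every column has non-zero variance.
    """
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    sds = [stdev(c) for c in columns]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, sds))
        for row in rows
    ]

# Hypothetical data: the second variable has much larger values.
rows = [(1.0, 1000.0), (2.0, 1010.0), (9.0, 1005.0)]
z = standardize(rows)

raw_01 = squared_euclidean(rows[0], rows[1])  # 101.0
raw_02 = squared_euclidean(rows[0], rows[2])  # 89.0
std_01 = squared_euclidean(z[0], z[1])
std_02 = squared_euclidean(z[0], z[2])
```

On the raw values, the first object is closer to the third (89 < 101); after standardization it is closer to the second. Any agglomerative method that merges the nearest objects first would therefore start differently on the two versions of the data.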
Contingency table analysis from questionnaire survey data of drivers
Velacková, Barbora ; Šulc, Zdeněk (advisor) ; Pecáková, Iva (referee)
The bachelor thesis deals with the contingency table analysis from questionnaire survey data of drivers. The data were obtained from the agency Data Collect s.r.o., which conducted the survey in 2014. The aim of the thesis is to analyse the behaviour of drivers and their habits, which could increase the risk of accidents. The thesis is divided into two main parts; in the first one, methods of contingency table analysis are described; in the second one, the presented analyses are applied to the survey data. Firstly, the behaviour of single and young drivers is analysed, then the differences between men and women drivers. Calculations were made using the software SPSS and MS Excel, in which all the graphs and tables were made.
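A basic step in contingency table analysis is the Pearson chi-square statistic for independence. The Python sketch below uses hypothetical counts, not the survey data, and the function name is an assumption for illustration.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a two-way contingency table.

    `table` is a list of rows of observed counts; all expected
    counts are assumed to be positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic

# Hypothetical 2x2 table: rows = two driver groups, columns = habit yes/no.
table = [[30, 20], [20, 30]]
statistic = chi_square_statistic(table)  # 4.0
degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)  # 1
```

With one degree of freedom, a statistic of 4.0 exceeds the 5 % critical value of about 3.841, so independence would be rejected at that level for these illustrative counts.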
Analysis of the similarity of the human development index values between European states
Šafaříková, Kristýna ; Malá, Ivana (advisor) ; Šulc, Zdeněk (referee)
The main goal of this thesis is to analyze the human development index for European countries, to carry out a cluster analysis not only of the human development index but also of other quality-of-life variables, and to find similarities between particular countries using hierarchical methods. The first part focuses on quality of life and the definition of the human development index. The human development index is one way to measure quality of life; other ways of analyzing it are also mentioned. The second part of the thesis focuses on the definition of cluster analysis, which is used to search for similarities between particular countries. Five hierarchical clustering methods are used to classify countries into clusters. The Euclidean metric is used to express the distance between countries. The similarity of countries is judged according to their assignment to clusters by the hierarchical methods. The thesis sheds light on the similarity between European countries from a quality-of-life perspective and provides statistical evidence on this topic. The results of the thesis confirm similarities between geographically close states.
Methods of analysing multivariate contingency tables
Šulc, Zdeněk ; Pecáková, Iva (advisor) ; Coufalová, Petra (referee)
This thesis deals with the relationship between two important methods of analyzing multivariate contingency tables, namely correspondence analysis and loglinear models. The thesis is divided into three parts. The first is dedicated to the basic terms of categorical data analysis, mainly to contingency tables and their distributions, with emphasis placed primarily on their multidimensional form. The second part presents the tools and techniques of both methods to the extent needed for their practical use and for the interpretation of their results. The third part contains a practical application of both methods, demonstrated on data from market research. This part describes the settings for both analyses in the statistical software SPSS and the subsequent interpretation of their outputs. A comparison of the analyzed methods in terms of their use can be found in the conclusion.
Analysis of students' results of the subject 4ST201
Šulc, Zdeněk ; Malá, Ivana (advisor) ; Helman, Karel (referee)
The main aim of this bachelor thesis is to compare and test the results of students who attended the statistics course at the University of Economics in Prague. The thesis is divided into a theoretical and a practical part. The theoretical part explains the basic methods used in the thesis. The practical part presents an analysis focused mainly on mid-term tests. The results of the thesis are summarized in the conclusion.
