National Repository of Grey Literature: 37 records found (records 11 - 20)
Evaluation of Fuzzy Clustering Results
Říhová, Elena ; Pecáková, Iva (advisor) ; Řezanková, Hana (referee) ; Žambochová, Marta (referee)
Cluster analysis is a multivariate statistical classification method that comprises many different methods and procedures. Clustering methods can be divided into hard and fuzzy; fuzzy clustering gives a more precise picture of how objects relate to clusters than hard clustering does. In practice, however, the optimal number of clusters is not known a priori and must be determined. Validity indices help to solve this problem, but there are many of them to choose from. One goal of this work is to create a structured overview of existing validity indices and techniques for evaluating fuzzy clustering results in order to find the optimal number of clusters. The main aim was to propose a new index for evaluating fuzzy clustering results, especially in cases with a large number of clusters (defined as more than five). The newly designed coefficient is based on the degrees of membership and on the Euclidean distance between the objects, i.e. on principles from both fuzzy and hard clustering. The suitability of selected validity indices was assessed on real and generated data sets for which the optimal number of clusters was known a priori. These data sets have different sizes, different numbers of variables, and different numbers of clusters. The aim of the work is regarded as fulfilled. A key contribution of this work is a new coefficient (E), which is appropriate for evaluating situations with both large and small numbers of clusters. Because the new validity index is based on the principles of both fuzzy clustering and hard clustering, it is able to correctly determine the optimal number of clusters on both small and large data sets. A second contribution of this research is a structured overview of existing validity indices and techniques for evaluating fuzzy clustering results.
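The abstract does not define the proposed coefficient E, so it cannot be reproduced here. As an illustration of what a membership-based validity index looks like, the sketch below computes Bezdek's partition coefficient, a standard index that is not the thesis's own: the mean of the squared membership degrees, ranging from 1/c (completely fuzzy) to 1 (crisp). The membership matrices are made-up illustrative values.

```python
def partition_coefficient(memberships):
    """Bezdek's partition coefficient: mean over objects of the sum of
    squared membership degrees. Each row holds one object's memberships
    across all clusters and is assumed to sum to 1."""
    n = len(memberships)
    return sum(u * u for row in memberships for u in row) / n


# Hypothetical membership matrices for three objects and two clusters.
crisp = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]   # every object fully in one cluster
fuzzy = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]   # no cluster structure at all

pc_crisp = partition_coefficient(crisp)   # upper bound, 1
pc_fuzzy = partition_coefficient(fuzzy)   # lower bound, 1/c with c = 2
```

To choose the number of clusters with such an index, one would typically run the fuzzy clustering for each candidate number of clusters and compare the index values across the resulting partitions.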
Building credit scoring models using selected statistical methods in R
Jánoš, Andrej ; Bašta, Milan (advisor) ; Pecáková, Iva (referee)
Credit scoring is an important and rapidly developing discipline. The aim of this thesis is to describe basic methods used for building and interpreting credit scoring models, and to show how these methods can be applied to design such models in the statistical software R. The thesis is organized into five chapters. Chapter one explains the term credit scoring, gives the main examples of its application, and presents the motivation for studying the topic. The next chapters introduce the three methods most often used in financial practice for building credit scoring models. Chapter two discusses the most developed of them, logistic regression. The main emphasis is on the logistic regression model, which is characterized from a mathematical point of view, and various ways to assess the quality of the model are also presented. The other two methods presented in this thesis, decision trees and random forests, are covered in chapters three and four. An important part of the thesis is a detailed application of the described models to a specific data set, Default, using R. The final, fifth chapter is a practical demonstration of building credit scoring models in R, diagnosing them, and evaluating their applicability in practice. The appendices include the R code used throughout the thesis as well as functions developed for testing the final model. The key aim of the work is to give the reader enough theoretical knowledge and practical skills to fully understand the models discussed and to apply them in practice.
Clustering and regression analysis of micro panel data
Sobíšek, Lukáš ; Pecáková, Iva (advisor) ; Komárek, Arnošt (referee) ; Brabec, Marek (referee)
The main purpose of panel studies is to analyze changes in the values of studied variables over time. In micro panel research, a large number of elements are periodically observed within a relatively short time period of just a few years, so the number of repeated measurements is small. This dissertation deals with contemporary approaches to the regression and clustering analysis of micro panel data. One approach to micro panel analysis is to take multivariate statistical models originally designed for cross-sectional data and modify them to take the within-subject correlation into account. The thesis summarizes the available tools for the regression analysis of micro panel data. The known and currently used linear mixed effects models for a normally distributed dependent variable are recapitulated. In addition, new approaches for analyzing a response variable with a non-normal distribution are presented. These approaches include the generalized marginal linear model, the generalized linear mixed effects model and the Bayesian modelling approach. Besides describing these models, the thesis includes a brief overview of their implementation in the R software. The difficulty with regression models adjusted for micro panel data is the ambiguity of their parameter estimates. The thesis proposes a way to improve the estimates through cluster analysis, and for this reason it also describes methods for the cluster analysis of micro panel data. Because few such methods are available, the main goal of the thesis is to devise a new two-step approach for clustering micro panel data. In the first step, the panel data are transformed into a static form using a set of proposed characteristics of dynamics. These characteristics represent different features of the time course of the observed variables.
In the second step, the elements are clustered by conventional spatial clustering techniques (agglomerative clustering and C-means partitioning). The clustering is based on a dissimilarity matrix of the values of the clustering variables calculated in the first step. Another goal of the thesis is to find out whether the suggested procedure improves the quality of regression models for this type of data. By means of a simulation study, the proposed procedure is compared to the procedure applied in the kml package of the R software, as well as to the clustering characteristics proposed by Urso (2004). The simulation study demonstrated better results for the proposed combination of clustering variables than for the other combinations currently used. A corresponding script written in the R language is a further contribution of the thesis. It is available on the attached CD and can be used for analyses of readers' own micro panel data.
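The two-step idea described above can be sketched as follows. This is a minimal illustration, not the thesis's procedure: the abstract does not list the proposed characteristics of dynamics, so the mean level and the least-squares slope are used here as hypothetical stand-ins, the panel values are invented, and complete linkage is an arbitrary choice of agglomerative method. In practice the features would usually be standardized before distances are computed.

```python
from math import dist


def dynamics_features(series):
    """Step 1: summarise one short trajectory as (mean level, OLS slope),
    with time points taken as 0, 1, ..., n-1."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    return (y_mean, sxy / sxx)


def agglomerate(points, k):
    """Step 2: complete-linkage agglomerative clustering on Euclidean
    dissimilarities, merging until k clusters remain."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Complete linkage: largest pairwise distance between two clusters.
        return max(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > k:
        pairs = [(linkage(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        _, i, j = min(pairs)              # closest pair of clusters
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical micro panel: four subjects, four measurements each.
panel = [
    [1.0, 2.0, 3.0, 4.0],   # rising
    [1.1, 2.1, 2.9, 4.2],   # rising
    [5.0, 4.0, 3.0, 2.0],   # falling
    [5.2, 4.1, 3.1, 1.9],   # falling
]
features = [dynamics_features(s) for s in panel]
groups = agglomerate(features, k=2)   # lists of subject indices per cluster
```

With these invented trajectories the two rising subjects end up in one cluster and the two falling subjects in the other, because the slope separates them far more than the mean level does.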
Contingency table analysis from questionnaire survey data of drivers
Velacková, Barbora ; Šulc, Zdeněk (advisor) ; Pecáková, Iva (referee)
The bachelor thesis deals with the contingency table analysis from questionnaire survey data of drivers. The data were obtained from the agency Data Collect s.r.o., which conducted the survey in 2014. The aim of the thesis is to analyse the behaviour of drivers and their habits, which could increase the risk of accidents. The thesis is divided into two main parts; in the first one, methods of contingency table analysis are described; in the second one, the presented analyses are applied to the survey data. Firstly, the behaviour of single and young drivers is analysed, then the differences between men and women drivers. Calculations were made using the software SPSS and MS Excel, in which all the graphs and tables were made.
Assessment of education quality as a factor of employability on the job market for graduates of the University of Economics in Prague
Bezděk, Jaroslav ; Fischer, Jakub (advisor) ; Pecáková, Iva (referee)
This bachelor thesis examines how graduates of the University of Economics in Prague assess their university education, with an emphasis on the Faculty of Informatics and Statistics. Its main aim is to analyse whether any item in the evaluation of university education can be considered an employability factor for graduates on the job market, even five years after graduation. The analysis is based on data from the REFLEX 2013 questionnaire survey. The quality of university education was examined using frequency tables, and employability factors were examined by calculating measures of dependence for nominal and ordinal variables. The main benefit of the thesis is that its results can be used to improve the level of education at the University of Economics in Prague. For example, it was found that graduates of the Faculty of Informatics and Statistics do not have foreign-language communication skills good enough for their present jobs.
Selection Bias Reduction in Credit Scoring Models
Ditrich, Josef ; Hebák, Petr (advisor) ; Pecáková, Iva (referee) ; Zamrazilová, Eva (referee)
Nowadays, the use of credit scoring models in the financial sector is common practice. Credit scoring plays an important role in the profitability and transparency of the lending business. Given the high credit volumes, even a small improvement in the discriminatory and predictive power of a credit scoring model may provide substantial additional profit. Scoring models are applied to the through-the-door population; however, when creating them or adjusting existing credit rules, it is usual to use only the data on accepted applicants, for whom payment discipline can be observed. This discrepancy can lead to reject bias (or selection bias in general). Methods that try to eliminate or reduce this phenomenon are known as reject inference. In general, these methods try to assess the behavior of rejected applicants or to obtain additional information about them. In this dissertation, I dealt with the enlargement method, which is based on randomly accepting applicants who would otherwise have been rejected. This method is not only time-consuming but also expensive. I therefore looked for ways to reduce the cost of acquiring additional information about rejected applicants. As a result, I proposed a modification that I called the enlargement method with sorting variable. It was validated on a real bank database with two possible sorting variables, and the results were compared with the original version of the method. It was shown that both tested approaches can reduce the cost of the method while retaining the accuracy of the scoring models.
The analysis of dependence of the material deprivation of the households in the Czech Republic on the selected indicators
Cafourková, Magdalena ; Řezanková, Hana (advisor) ; Pecáková, Iva (referee)
The aim of this thesis is to analyse the material deprivation of households with regard to selected indicators, i.e. the costs that the household spends on housing, the region where the household is located, the number of members and dependent children in the household, the age and sex of the head of the household, and the economic activity and education level of the members of the household. The thesis aims not only to prove the dependence on the selected indicators but also to quantify this dependence using the odds ratio. An individual effect was proven for all variables except the number of dependent children. It was also demonstrated that the factors that put households at risk of material deprivation vary across age groups. However, it can be concluded that across all age groups, the material deprivation rate is determined by the sex of the head of the household, the education level of the members of the household, and the costs that the household spends on housing.
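As a brief illustration of how an odds ratio quantifies dependence in a 2x2 table, the sketch below compares the odds of material deprivation between two groups of households. The counts and the grouping are invented for the example and are not taken from the thesis.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table:
                      deprived   not deprived
        group 1           a            b
        group 2           c            d
    Returns (a/b) / (c/d), i.e. (a*d) / (b*c)."""
    return (a * d) / (b * c)


# Hypothetical counts: 30 of 100 households in group 1 are deprived,
# compared with 10 of 100 households in group 2.
ratio = odds_ratio(30, 70, 10, 90)   # equals 27/7, roughly 3.86
```

A value above 1 means the odds of deprivation are higher in the first group; a value of 1 would indicate no association between group and deprivation.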
Methods of analysing multivariate contingency tables
Šulc, Zdeněk ; Pecáková, Iva (advisor) ; Coufalová, Petra (referee)
This thesis deals with the relationship between two important methods of analyzing multivariate contingency tables, namely correspondence analysis and loglinear models. The thesis is divided into three parts. The first is dedicated to basic terms of categorical data analysis, mainly contingency tables and their distributions, with an emphasis on their multidimensional form. The second part presents the tools and techniques of both methods to the extent needed for their practical use and for the interpretation of their results. The third part contains a practical application of both methods to market research data. This part describes the settings for both analyses in the statistical software SPSS and the subsequent interpretation of their outputs. The conclusion compares the analyzed methods in terms of their use.
Statistical Analysis of Temperature and Precipitation Time Series in the Czech Republic in Period 1961-2008
Helman, Karel ; Pecáková, Iva (advisor) ; Čermák, Václav (referee) ; Žák, Michal (referee)
The present dissertation analyses monthly time series of average temperatures and precipitation sums recorded at 44 sites in the Czech Republic over the period 1961-2008. The main research purpose is to acquire deeper knowledge of regularities in the development of climatic time series, using an appropriate set of statistical methods. A secondary objective is to find correlations between the research outcomes and the basic geographic coordinates (altitude, longitude and latitude) of the individual measurement stations, and to compare the results achieved for the selected climatic elements. There are two major contributions of this work. First, it presents new knowledge in the field of climatic time series, particularly concerning the strength and development of their seasonal component and, for instance, the relation between the distribution of the residual component and the geographic coordinates of the measurement stations. The second contribution lies in an extensive application of statistical methods of climatic time series analysis. Several types of methods were used, including both widely and rarely applied statistical tools (linear trend analysis and the Box-Jenkins methodology, respectively) as well as tools used for the very first time (moving-seasonal time series).
Visualization of Multivariate Statistical Data
Maroušek, Vít ; Pecáková, Iva (advisor) ; Černý, Jindřich (referee)
The thesis deals with the possibilities of visualizing multivariate statistical data. Since this is a very broad area, the thesis is divided into four sections, two of which are theoretical and two practical. The first section is devoted to theoretical aspects of data visualization. It contains information about the building blocks of graphs and about how the brain processes graphs in the various stages of perception. The second section surveys the chart types that can be used to display data. Selected types of graphs for continuous and discontinuous multidimensional data are described in detail. The third section focuses on available software tools for creating graphs. It describes several programs, with a focus on STATISTICA, R and MS Excel. The knowledge gained in the previous sections provided a sufficient basis for the last section, which performs a graphical analysis of multidimensional continuous and discrete data using advanced analytical methods. This analysis is performed separately on a data file with continuous variables and on a data file with discontinuous (categorical) variables.
