National Repository of Grey Literature: 37 records found (records 28 - 37)
Robust Regularized Discriminant Analysis Based on Implicit Weighting
Kalina, Jan ; Hlinka, Jaroslav
In bioinformatics, regularized linear discriminant analysis is commonly used as a tool for supervised classification problems, tailor-made for high-dimensional data with the number of variables exceeding the number of observations. However, its various available versions are too vulnerable to the presence of outlying measurements in the data. In this paper, we exploit principles of robust statistics to propose new versions of regularized linear discriminant analysis suitable for high-dimensional data contaminated by (more or less) severe outliers. The work exploits a regularized version of the minimum weighted covariance determinant estimator, which is one of the highly robust estimators of multivariate location and scatter. The performance of the novel classification methods is illustrated on real data sets, with a detailed analysis of data from brain activity research.
Diagnostics for Robust Regression: Linear Versus Nonlinear Model
Kalina, Jan
Robust statistical methods represent important tools for estimating parameters in linear as well as nonlinear econometric models. In contrast to least squares, they are not vulnerable to the presence of outlying measurements in the data. Nevertheless, they need to be accompanied by diagnostic tools for verifying their assumptions. In this paper, we propose the asymptotic Goldfeld-Quandt test for the regression median. It allows us to formulate a natural procedure for models with heteroscedastic disturbances, which is again based on the regression median. Further, we pay attention to the nonlinear regression model. We focus on the nonlinear least weighted squares estimator, which is one of the recently proposed robust estimators of parameters in nonlinear regression. We study residuals of the estimator and use a numerical simulation to reveal that they can be severely heteroscedastic even for data generated from a model with homoscedastic disturbances. Thus, we give a warning that standard residuals of the robust nonlinear estimator may produce misleading results if used with the standard diagnostic tools.
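For orientation, the classical least-squares form of the Goldfeld-Quandt statistic can be sketched as follows. This is not the asymptotic regression-median version proposed in the paper; the function names, the single-regressor setup and the choice to drop the middle fifth of the ordered observations are assumptions made only for this illustration.

```python
def ols_rss(x, y):
    """Residual sum of squares of a simple least-squares line y = a + b*x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))


def goldfeld_quandt(x, y, drop_fraction=0.2):
    """Classical Goldfeld-Quandt statistic for a single regressor.

    Observations are ordered by x, a central block is dropped, and the
    residual variances of the two outer blocks are compared.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    n_drop = int(n * drop_fraction)   # size of the omitted central block
    n_block = (n - n_drop) // 2       # size of each outer block
    low = pairs[:n_block]
    high = pairs[n - n_block:]
    k = 2  # estimated parameters per block: intercept and slope
    rss_low = ols_rss([p[0] for p in low], [p[1] for p in low])
    rss_high = ols_rss([p[0] for p in high], [p[1] for p in high])
    # Ratio of the two residual variance estimates; values far above 1
    # suggest that the error variance grows with x.
    return (rss_high / (n_block - k)) / (rss_low / (n_block - k))
```

Under homoscedastic normal errors the classical statistic is compared with an F distribution with (n_block - k, n_block - k) degrees of freedom.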
On Exact Heteroscedasticity Testing for Robust Regression
Kalina, Jan ; Peštová, Barbora
The paper is devoted to the least weighted squares estimator, which is one of the highly robust estimators for the linear regression model. Novel permutation tests of heteroscedasticity are proposed. The asymptotic behavior of the permutation test statistics of the Goldfeld-Quandt and Breusch-Pagan tests is also investigated. A numerical experiment on real economic data is presented, which also shows how to build a robust prediction model under heteroscedasticity.
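As a rough illustration of the least weighted squares idea, the sketch below evaluates the loss that the estimator minimizes: the squared residuals are sorted, and the smallest ones receive the largest weights. The function names and the linearly decreasing example weights are assumptions made for this sketch; the permutation tests of the paper are not implemented here.

```python
def lws_objective(residuals, weights):
    """Least weighted squares loss: sum of w_i * (i-th smallest squared residual).

    `weights` should be non-increasing, so that large residuals
    (potential outliers) are down-weighted.
    """
    if len(residuals) != len(weights):
        raise ValueError("residuals and weights must have the same length")
    ordered_sq = sorted(r * r for r in residuals)
    return sum(w * r2 for w, r2 in zip(weights, ordered_sq))


def linear_weights(n):
    """Hypothetical linearly decreasing weights 1, (n-1)/n, ..., 1/n."""
    return [(n - i) / n for i in range(n)]
```

With all weights equal to one, the loss reduces to the ordinary least squares criterion; with weights of one on the smallest squared residuals and zero on the rest, it reduces to a trimmed least squares criterion.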
Detection of Unusual Events in Temporal Data
Černík, Tomáš ; Bartík, Vladimír (referee) ; Zendulka, Jaroslav (advisor)
This bachelor thesis deals with the detection of unusual events (anomalies) in available temporal data. The theoretical part describes existing techniques and algorithms used to detect outliers. It also introduces the meteorological data that are subsequently used for experimental verification of the implemented detection algorithms. The second, practical part describes the design and implementation of the application and the algorithms. The algorithms are also tested in the search for point, contextual and collective anomalies.
Data Mining Module of a Data Mining System on NetBeans Platform
Výtvar, Jaromír ; Křivka, Zbyněk (referee) ; Zendulka, Jaroslav (advisor)
The aim of this work is to gain a basic overview of the process of obtaining knowledge from databases (data mining) and to analyze the data mining system developed at FIT BUT on the NetBeans platform in order to create a new mining module. We decided to implement a module for mining outliers and to extend the existing regression module with multiple linear regression using generalized linear models. The new methods build on existing methods of Oracle Data Mining.
The Introduction and Application of General Regression Model
Hrabec, Pavel ; Štarha, Pavel (referee) ; Bednář, Josef (advisor)
This thesis summarizes in detail the general linear regression model, including test statistics for coefficients, submodels and predictions, and mainly tests for outliers and high-leverage points. It describes how to include categorical variables in a regression model. The model was applied to describe the saturation of photographs of bread, where the input variables were the type of flour, the type of addition and the concentration of flour. After identification of outliers it was possible to create a mathematical model with a high coefficient of determination, which will be useful for experts in the food industry for preliminary identification of the possible composition of bread.
Some Robust Estimation Tools for Multivariate Models
Kalina, Jan
Standard procedures of multivariate statistics and data mining for the analysis of multivariate data are known to be vulnerable to the presence of outlying and/or highly influential observations. This paper aims to propose and investigate specific approaches for two situations. First, we consider clustering of categorical data. While attention has been paid to the sensitivity of standard statistical and data mining methods for categorical data only recently, we aim at modifying standard distance measures between clusters of such data. This allows us to propose a hierarchical agglomerative cluster analysis for two-way contingency tables with a large number of categories, based on a regularized measure of distance between two contingency tables. This proposal improves the robustness to the presence of measurement errors for categorical data. As a second problem, we investigate the nonlinear version of the least weighted squares regression for data with a continuous response. Our aim is to propose an efficient algorithm for the least weighted squares estimator, which is formulated in a general way applicable to both linear and nonlinear regression. Our numerical study reveals the computational aspects of the algorithm and brings arguments in favor of its credibility.
Robustness Aspects of Knowledge Discovery
Kalina, Jan
The sensitivity of common knowledge discovery methods to the presence of outlying measurements in the observed data is discussed as their major drawback. Our work is devoted to robust methods for information extraction from data. First, we discuss neural networks for function approximation and their sensitivity to the presence of noise and outlying measurements in the data. We propose to fit neural networks in a robust way by means of a robust nonlinear regression. Secondly, we consider information extraction from categorical data, which commonly suffers from measurement errors. To improve its robustness properties, we propose a regularized version of the common test statistics, which may find applications e.g. in pattern discovery from categorical data.
Cluster analysis of large data sets: new procedures based on the method k-means
Žambochová, Marta ; Řezanková, Hana (advisor) ; Húsek, Dušan (referee) ; Antoch, Jaromír (referee)
Cluster analysis has become one of the main tools used in extracting knowledge from data, which is known as data mining. In this area of data analysis, large data sets are often processed, large both in the number of objects and in the number of variables that characterize the objects. Many methods for data clustering have been developed. One of the most widely used is the k-means method, which is suitable for clustering data sets containing a large number of objects. It is based on finding the best clustering in relation to the initial distribution of objects into clusters and the subsequent step-by-step redistribution of objects among the clusters according to the optimization function. The aim of this Ph.D. thesis was a comparison of selected variants of existing k-means methods, a detailed characterization of their positive and negative characteristics, the proposal of new alternatives of this method, and experimental comparisons with existing approaches. These objectives were met. In my work I focused on modifications of the k-means method for clustering a large number of objects, specifically on the algorithms BIRCH k-means, filtering, k-means++ and two-phases. I examined the time complexity of the algorithms, the effect of the initial distribution and of outliers, and the validity of the resulting clusters. Two real data files and some generated data sets were used. The common and different features of the methods under investigation are summarized at the end of the work. The main aim and benefit of the work is to devise my own modifications, which address the bottlenecks of the basic procedure and of the existing variants, and to program and verify them. Some modifications accelerated the processing. Applying the main ideas of the k-means++ algorithm to other variants of the k-means method led to better clustering results. The most significant of the proposed changes is a modification of the filtering algorithm, which gives the algorithm an entirely new feature: the detection of outliers. An accompanying CD is enclosed. It includes the source code of programs written in the MATLAB development environment. The programs were created specifically for the purpose of this work and are intended for experimental use. The CD also contains the data files used for the various experiments.
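The k-means++ idea mentioned in the abstract concerns the choice of initial centres: each new centre is drawn with probability proportional to its squared distance from the nearest centre already chosen. The sketch below illustrates that seeding step only, in plain Python; it is not the MATLAB code of the thesis, and the function name and the fallback for coinciding points are assumptions.

```python
import random


def kmeans_pp_seeds(points, k, rng=None):
    """Pick k initial centres from `points` using k-means++ seeding."""
    rng = rng or random.Random()

    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # The first centre is chosen uniformly at random.
    centres = [rng.choice(points)]
    # Squared distance from each point to its nearest chosen centre.
    d2 = [sq_dist(p, centres[0]) for p in points]
    while len(centres) < k:
        if sum(d2) == 0:
            # Every point coincides with a centre; fall back to a uniform draw.
            new = rng.choice(points)
        else:
            # Draw one point with probability proportional to d2.
            new = rng.choices(points, weights=d2, k=1)[0]
        centres.append(new)
        # Update the nearest-centre distances with the new centre.
        d2 = [min(d, sq_dist(p, new)) for p, d in zip(points, d2)]
    return centres
```

The centres returned here would then serve as the initial distribution for the usual iterative redistribution of objects among clusters.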
Methods for identifying the number of clusters and outliers implemented in professional statistical software systems
Řezanková, H. ; Húsek, Dušan
The paper deals with ways to determine the optimal number of groups of objects and to find outlying objects when the objects are clustered by different methods implemented in commercial statistical software packages. In the example, the aim is to find groups of similar binary variables. Methods such as cluster analysis (hierarchical, k-medoids, fuzzy, two-step), multidimensional scaling, factor analysis and Boolean factor analysis are used.
