National Repository of Grey Literature : 108 records found (records 89 - 98)
Some Robust Estimation Tools for Multivariate Models
Kalina, Jan
Standard procedures of multivariate statistics and data mining are known to be vulnerable to the presence of outlying and/or highly influential observations. This paper aims to propose and investigate specific approaches for two situations. First, we consider clustering of categorical data. While the sensitivity of standard statistical and data mining methods for categorical data has received attention only recently, we aim at modifying standard distance measures between clusters of such data. This allows us to propose a hierarchical agglomerative cluster analysis for two-way contingency tables with a large number of categories, based on a regularized measure of distance between two contingency tables. This proposal improves robustness to the presence of measurement errors in categorical data. As a second problem, we investigate the nonlinear version of the least weighted squares regression for data with a continuous response. Our aim is to propose an efficient algorithm for the least weighted squares estimator, formulated in a general way applicable to both linear and nonlinear regression. Our numerical study reveals the computational aspects of the algorithm and brings arguments in favor of its credibility.
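The least weighted squares idea can be sketched for the linear case as follows: weights are assigned to observations according to the ranks of their squared residuals, so that outlying observations receive small weights, and the fit is iterated. This is an illustrative scheme with a linearly decreasing weight function, not the authors' algorithm.

```python
import numpy as np

def least_weighted_squares(x, y, n_iter=20):
    """Illustrative least weighted squares (LWS) fit for simple linear
    regression: observations with large squared residuals get small
    weights via their rank; the linearly decreasing weight function is
    an arbitrary choice for the sketch."""
    n = len(y)
    w_rank = np.linspace(1.0, 0.0, n)      # weight for residual rank 0..n-1
    Xc = np.column_stack([np.ones(n), x])  # design matrix with intercept
    beta = np.linalg.lstsq(Xc, y, rcond=None)[0]  # ordinary LS start
    for _ in range(n_iter):
        r2 = (y - Xc @ beta) ** 2
        ranks = np.argsort(np.argsort(r2))  # rank of each squared residual
        w = w_rank[ranks]                   # small weight for large residual
        W = np.diag(w)
        beta = np.linalg.solve(Xc.T @ W @ Xc, Xc.T @ W @ y)
    return beta
```

Because the worst-ranked observations receive (near-)zero weight, a gross outlier has essentially no influence on the final estimate, unlike in ordinary least squares.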
Robustness of High-Dimensional Data Mining
Kalina, Jan ; Duintjer Tebbens, Jurjen ; Schlenker, Anna
Standard data mining procedures are sensitive to the presence of outlying measurements in the data. This work aims to propose robust versions of some existing data mining procedures, i.e. methods resistant to outliers. In the area of classification analysis, we propose a new robust method based on a regularized version of the minimum weighted covariance determinant estimator. The method is suitable for data with the number of variables exceeding the number of observations and is based on implicit weights assigned to individual observations. Our approach is a unique attempt to combine regularization with high robustness, allowing us to downweight outlying high-dimensional observations. The classification performance of the new methods, together with some ideas concerning classification analysis of high-dimensional data, is illustrated on real raw data as well as on data contaminated by severe outliers.
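The combination of regularization and implicit weighting can be sketched as follows: the covariance matrix is shrunk toward the identity (so it is invertible even with more variables than observations), and observations with large regularized Mahalanobis distances are iteratively downweighted. The weight scheme and the fixed shrinkage parameter are illustrative choices, not the paper's estimator.

```python
import numpy as np

def weighted_regularized_cov(X, lam=0.5, n_iter=5):
    """Illustrative location/scatter estimate in the spirit of combining
    regularization with implicit weights: shrink the weighted covariance
    toward the identity (usable when p > n), then downweight observations
    with large regularized Mahalanobis distances and iterate."""
    n, p = X.shape
    w = np.ones(n) / n                      # start with equal weights
    for _ in range(n_iter):
        mu = w @ X                          # weighted mean
        Xc = X - mu
        S = (Xc * w[:, None]).T @ Xc        # weighted covariance
        S_reg = (1 - lam) * S + lam * np.eye(p)  # shrinkage toward identity
        d2 = np.einsum('ij,jk,ik->i', Xc, np.linalg.inv(S_reg), Xc)
        ranks = np.argsort(np.argsort(d2))  # rank distances, largest last
        w = np.linspace(1.0, 0.1, n)[ranks] # small weight for large distance
        w = w / w.sum()
    return mu, S_reg, w
```

In a classifier built on such estimates, each class would get its own location and regularized scatter, and a new observation would be assigned to the class with the smaller regularized Mahalanobis distance.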
Highly Robust Estimation of the Autocorrelation Coefficient
Kalina, Jan ; Vlčková, Katarína
The classical autocorrelation coefficient estimator in the time series context is very sensitive to the presence of outlying measurements in the data. This paper proposes several new robust estimators of the autocorrelation coefficient. First, we consider an observed autoregressive process of the first order, AR(1). Robust estimators of the autocorrelation coefficient are proposed in a straightforward way based on robust regression. Further, we consider the task of robust estimation of the autocorrelation coefficient of residuals of linear regression. The task is connected to verifying the assumption of independence of residuals, and robust estimators of the autocorrelation coefficient are defined based on the Durbin-Watson test statistic for robust regression. The main result is obtained for the implicitly weighted autocorrelation coefficient with small weights assigned to outlying measurements. This estimator is based on the least weighted squares regression, and we exploit its asymptotic properties to derive an asymptotic test that the autocorrelation coefficient equals zero. Finally, we illustrate the different estimators on real economic data, which reveal the advantage of the approach based on the least weighted squares regression; this estimator turns out to be resistant to the presence of outlying measurements.
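One straightforward robust estimator of this kind regresses x_t on x_{t-1} while downweighting pairs with large residuals, in the spirit of the implicitly weighted approach; the rank-based weight function below is an illustrative choice, not the paper's estimator.

```python
import numpy as np

def robust_ar1_coef(x, n_iter=20):
    """Illustrative robust estimate of the AR(1) coefficient: a
    regression of x_t on x_{t-1} through the origin, with implicit
    weights that are small for pairs with large residuals."""
    y, z = x[1:], x[:-1]                 # lagged pairs (x_{t-1}, x_t)
    n = len(y)
    w_rank = np.linspace(1.0, 0.0, n)    # weight for residual rank 0..n-1
    rho = np.corrcoef(z, y)[0, 1]        # classical estimate as a start
    for _ in range(n_iter):
        r2 = (y - rho * z) ** 2
        w = w_rank[np.argsort(np.argsort(r2))]
        rho = np.sum(w * z * y) / np.sum(w * z * z)
    return rho
```

A spike contaminates two lagged pairs (as predecessor and as successor); both receive large residuals and hence near-zero weight, while the classical sample autocorrelation is pulled toward zero by the inflated variance.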
Robust Regularized Cluster Analysis for High-Dimensional Data
Kalina, Jan ; Vlčková, Katarína
This paper presents new approaches to the hierarchical agglomerative cluster analysis for high-dimensional data. First, we propose a regularized version of the hierarchical cluster analysis for categorical data with a large number of categories. It exploits a regularized version of various test statistics of homogeneity in contingency tables as the measure of distance between two clusters. Further, our aim is cluster analysis of continuous data with a large number of variables. Various regularization techniques tailor-made for high-dimensional data have been proposed, which have, however, turned out to suffer from high sensitivity to the presence of outlying measurements in the data. As a robust solution, we recommend combining two newly proposed methods, namely a regularized version of robust principal component analysis and a regularized Mahalanobis distance based on an asymptotically optimal regularization of the covariance matrix. We bring arguments in favor of the newly proposed methods.
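The regularized Mahalanobis distance can be sketched as follows: the covariance matrix is shrunk toward a scalar multiple of the identity so that the distance is defined even when the number of variables exceeds the number of observations. The shrinkage target and the fixed shrinkage parameter here are illustrative; the paper derives an asymptotically optimal regularization.

```python
import numpy as np

def regularized_mahalanobis(x, mu, S, lam=0.5):
    """Illustrative regularized squared Mahalanobis distance: shrink S
    toward (tr S / p) * I so the matrix is invertible even when the
    sample covariance S is singular (p > n)."""
    p = len(mu)
    target = np.trace(S) / p * np.eye(p)   # scaled-identity shrinkage target
    S_reg = (1 - lam) * S + lam * target
    d = x - mu
    return float(d @ np.linalg.solve(S_reg, d))
```

Such a distance can then be plugged into hierarchical agglomerative clustering in place of the ordinary (undefined or unstable) Mahalanobis distance between high-dimensional clusters.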
Autocorrelated residuals of robust regression
Kalina, Jan
The work is devoted to the Durbin-Watson test for robust linear regression methods. First, we explain the consequences of autocorrelated residuals for estimating regression parameters. We propose an asymptotic version of the Durbin-Watson test for regression quantiles and trimmed least squares, and derive an asymptotic approximation to the exact null distribution of the test statistic, exploiting the asymptotic representation of both regression estimators. Further, we consider the least weighted squares estimator, a highly robust estimator based on the idea of down-weighting less reliable observations. We compare various versions of the Durbin-Watson test for the least weighted squares estimator; the asymptotic test is derived using two versions of the asymptotic representation. Finally, we investigate a weighted Durbin-Watson test using the weights determined by the least weighted squares estimator. The exact test is described, and an asymptotic approximation to the distribution of the weighted statistic under the null hypothesis is also obtained.
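The statistic itself is simple to compute from a vector of residuals; values near 2 indicate no first-order autocorrelation. The weighted variant shown below, which plugs observation weights (e.g. those determined by a robust fit) into both sums, is an illustrative formulation, not necessarily the paper's exact definition.

```python
import numpy as np

def durbin_watson(residuals, weights=None):
    """Classical Durbin-Watson statistic
        DW = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2,
    with an optional weighted variant where each term is multiplied by
    an observation weight (illustrative formulation)."""
    e = np.asarray(residuals, dtype=float)
    w = np.ones_like(e) if weights is None else np.asarray(weights, float)
    num = np.sum(w[1:] * np.diff(e) ** 2)  # successive-difference sum
    den = np.sum(w * e ** 2)               # (weighted) residual sum of squares
    return num / den
```

For independent residuals DW is close to 2, while strong positive autocorrelation pushes it toward 0 (roughly DW ≈ 2(1 − ρ)).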
Robustness Aspects of Knowledge Discovery
Kalina, Jan
The sensitivity of common knowledge discovery methods to the presence of outlying measurements in the observed data is discussed as their major drawback. Our work is devoted to robust methods for information extraction from data. First, we discuss neural networks for function approximation and their sensitivity to the presence of noise and outlying measurements in the data. We propose to fit neural networks in a robust way by means of robust nonlinear regression. Second, we consider information extraction from categorical data, which commonly suffers from measurement errors. To improve its robustness properties, we propose a regularized version of common test statistics, which may find applications e.g. in pattern discovery from categorical data.
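One common way to make the fitting of a function-approximation network robust is to replace the least squares loss by a robust loss; the sketch below trains a one-hidden-layer network with the Huber loss, whose gradient clips the influence of outlying observations. The architecture, learning rate, and Huber parameter are illustrative choices, not the paper's specification.

```python
import numpy as np

def fit_robust_net(x, y, hidden=10, lr=0.1, epochs=5000, delta=1.0, seed=0):
    """Illustrative robust fit of a one-hidden-layer tanh network for
    1-D function approximation: full-batch gradient descent on the
    Huber loss, so residuals larger than delta contribute a bounded
    gradient and outliers cannot dominate the fit."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 1, (hidden, 1)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 1, hidden) / hidden; b2 = 0.0
    X = x.reshape(-1, 1)
    n = len(y)
    for _ in range(epochs):
        H = np.tanh(X @ W1.T + b1)            # hidden layer activations
        pred = H @ W2 + b2
        r = pred - y
        g = np.clip(r, -delta, delta)         # Huber loss derivative
        gW2 = H.T @ g / n; gb2 = g.mean()
        dH = np.outer(g, W2) * (1 - H ** 2)   # backprop through tanh
        gW1 = dH.T @ X / n; gb1 = dH.mean(axis=0)
        W2 -= lr * gW2; b2 -= lr * gb2
        W1 -= lr * gW1; b1 -= lr * gb1
    def predict(xq):
        Hq = np.tanh(xq.reshape(-1, 1) @ W1.T + b1)
        return Hq @ W2 + b2
    return predict
```

With plain least squares, a few gross outliers drag the fitted surface toward them; with the clipped (Huber) gradient, their influence is bounded and the fit stays close to the clean underlying function.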

See also: similar author names
1 Kalina, J.
2 Kalina, Jakub
75 Kalina, Jan
2 Kalina, Jaroslav
4 Kalina, Jiří
4 Kalina, Josef