
Implicitly weighted robust estimation of quantiles in linear regression
Kalina, Jan ; Vidnerová, Petra
Estimation of quantiles is an important task in econometric regression modeling, and the standard regression quantiles machinery is well developed and popular, with a large number of econometric applications. Although regression quantiles are commonly regarded as robust tools, they are vulnerable to leverage points in the data. We propose a novel approach for linear regression based on a specific version of the least weighted squares estimator, together with an additional estimator based only on observations lying between two different novel quantiles. The new methods are conceptually simple and comprehensible. We do not attempt to derive theoretical properties of the novel methods; numerical computations show that they perform comparably to standard regression quantiles when the data are not contaminated by outliers. Moreover, the new methods appear to be much more robust on a simulated dataset with severe leverage points.
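
As a rough illustration of the implicit weighting behind the least weighted squares estimator mentioned above, the sketch below evaluates the objective usually associated with it: a weighted sum of the ordered squared residuals, where weights attach to ranks rather than to fixed observations. It is not the quantile-type estimator proposed in the paper. The function names and the linearly decreasing weights are assumptions made for this example only.

```python
def lws_objective(beta, X, y, weights):
    """Weighted sum of ordered squared residuals for a candidate beta.

    X is a list of rows (include a leading 1 for an intercept),
    weights[k] is applied to the (k+1)-th smallest squared residual.
    """
    # Residual of each observation under the candidate coefficients.
    residuals = [yi - sum(b * xij for b, xij in zip(beta, xi))
                 for xi, yi in zip(X, y)]
    # Sort squared residuals increasingly: weights follow ranks,
    # so the largest residuals automatically get the smallest weights.
    squared = sorted(r * r for r in residuals)
    return sum(w * s for w, s in zip(weights, squared))


def linear_weights(n):
    """Hypothetical linearly decreasing weights 1, 1 - 1/n, ..., 1/n."""
    return [1.0 - i / n for i in range(n)]


# Toy data: three points on the line y = x and one gross outlier.
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
y = [0, 1, 2, 30]
beta = [0, 1]

print(lws_objective(beta, X, y, [1, 1, 1, 1]))       # plain sum of squares
print(lws_objective(beta, X, y, linear_weights(4)))  # outlier downweighted
```

With all weights equal to one the objective reduces to the ordinary sum of squared residuals; with decreasing weights the outlying observation contributes far less, which is why a minimizer of this objective is less affected by it.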


A Nonparametric Bootstrap Comparison of Variances of Robust Regression Estimators
Kalina, Jan ; Tobišková, N. ; Tichavský, J.
While various robust regression estimators are available for the standard linear regression model, performance comparisons of individual robust estimators over real or simulated datasets still seem to be lacking. In general, a reliable robust estimator of regression parameters should be consistent and at the same time have relatively small variability, i.e. the variances of the estimates of individual regression parameters should be small. The aim of this paper is to compare the variability of S-estimators, MM-estimators, least trimmed squares, and least weighted squares estimators. While they are all consistent under general assumptions, the asymptotic covariance matrix of the least weighted squares remains infeasible, because the only available formula for its computation depends on the unknown random errors. Thus, we resort to a nonparametric bootstrap comparison of the variability of different robust regression estimators. It turns out that the best results are obtained either with MM-estimators or with the least weighted squares with suitable weights; the latter estimator is especially recommended for small sample sizes.
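
The nonparametric bootstrap idea used for such a comparison can be sketched as follows: observations (pairs) are resampled with replacement, the estimator is recomputed on each resample, and the empirical variance of the recomputed estimates serves as the variance estimate. The sketch is only an assumption-laden illustration; it plugs in the ordinary least squares slope as a stand-in, whereas the paper works with robust estimators, and all function names are hypothetical.

```python
import random
import statistics


def ols_slope(x, y):
    """Least squares slope in simple regression (stand-in estimator)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def bootstrap_variance(x, y, estimator, n_boot=500, seed=0):
    """Nonparametric (pairs) bootstrap estimate of an estimator's variance."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        # Draw n observation indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        xb = [x[i] for i in idx]
        yb = [y[i] for i in idx]
        estimates.append(estimator(xb, yb))
    # Sample variance of the estimates over the bootstrap replications.
    return statistics.variance(estimates)


# Simulated data, assumed for the example: y = 2x + Gaussian noise.
noise = random.Random(1)
x = [float(i) for i in range(30)]
y = [2.0 * xi + noise.gauss(0.0, 1.0) for xi in x]

print(bootstrap_variance(x, y, ols_slope))
```

Any other estimator with the same call signature, for instance a robust one, could be passed in place of `ols_slope` to compare variabilities on the same resamples.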


Nonparametric Bootstrap Techniques for Implicitly Weighted Robust Estimators
Kalina, Jan
The paper is devoted to highly robust statistical estimators based on implicit weighting, which have the potential to find econometric applications. Two particular methods are a robust correlation coefficient based on the least weighted squares regression and the minimum weighted covariance determinant estimator, where the latter allows estimating the mean and covariance matrix of multivariate data. New tools are proposed for testing hypotheses about these robust estimators and for estimating their variance. The techniques considered in the paper include resampling approaches with or without replacement, i.e. permutation tests, bootstrap variance estimation, and bootstrap confidence intervals. The performance of the newly described tools is illustrated on numerical examples. They reveal the suitability of the robust procedures also for non-contaminated data, as their confidence intervals are not much wider than those for standard maximum likelihood estimators. While resampling without replacement turns out to be more suitable for hypothesis testing, bootstrapping with replacement yields reliable confidence intervals but not the corresponding hypothesis tests.
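
Resampling without replacement for hypothesis testing can be illustrated by a permutation test of zero correlation: one variable is shuffled repeatedly, and the observed statistic is compared with its values over the shuffles. The sketch below is a minimal example under stated assumptions; it uses the ordinary Pearson coefficient as a stand-in for the robust correlation coefficient of the paper, and the function names are made up for the example.

```python
import random


def pearson(x, y):
    """Ordinary Pearson correlation (stand-in for a robust coefficient)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_pvalue(x, y, statistic=pearson, n_perm=999, seed=0):
    """Two-sided permutation p-value for the hypothesis of no association."""
    rng = random.Random(seed)
    observed = abs(statistic(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        # Shuffling y breaks any association with x (no replacement).
        rng.shuffle(y_perm)
        if abs(statistic(x, y_perm)) >= observed:
            hits += 1
    # The observed arrangement counts as one of the permutations.
    return (hits + 1) / (n_perm + 1)


# Strongly associated toy data, assumed for the example.
x = list(range(20))
y = [2 * v + ((-1) ** v) * 0.5 for v in x]

print(permutation_pvalue(x, y))
```

The `statistic` argument is left open so that a robust correlation coefficient could be substituted without changing the resampling logic.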


Robust Metalearning: Comparing Robust Regression Using A Robust Prediction Error
Peštová, Barbora ; Kalina, Jan
The aim of this paper is to construct a classification rule for predicting the best regression estimator for a new data set based on a database of 20 training data sets. The estimators considered here include some popular methods of robust statistics. The methodology used for constructing the classification rule can be described as metalearning. Nevertheless, standard approaches to metalearning should be robustified when working with data sets contaminated by outlying measurements (outliers). Therefore, our contribution can also be described as a robustification of the metalearning process by means of a robust prediction error. In addition to performing the metalearning study with both standard and robust approaches, we search for a detailed interpretation in two particular situations. The results of the detailed investigation show that the knowledge obtained by a metalearning approach built on standard principles is prone to great variability and instability, which makes it hard to believe that the results are not merely a consequence of chance. This aspect of metalearning does not seem to have been analyzed previously in the literature.


How to downweight observations in robust regression: A metalearning study
Kalina, Jan ; Pitra, Zbyněk
Metalearning is becoming an increasingly important methodology for transferring knowledge from a database of available training data sets to a new (independent) data set. The concept of metalearning is becoming popular in statistical learning, and there is an increasing number of metalearning applications also in the analysis of economic data sets. Still, not much attention has been paid to its limitations and disadvantages. To investigate them, we use various linear regression estimators (including highly robust ones) over a set of 30 data sets with an economic background and perform a metalearning study over them as well as over the same data sets after an artificial contamination. We focus on comparing the prediction performance of the least weighted squares estimator with various weighting schemes. A broader spectrum of classification methods is applied, and a support vector machine turns out to yield the best results. Because the results of leave-one-out cross-validation are very different from the results of autovalidation, we conclude that metalearning is highly unstable and its results should be interpreted with care. We also discuss the possible limitations of the metalearning methodology in general.


Giant landslides on volcanic islands on the example of the Hawaii archipelago
Kalina, Jan ; Blahůt, Jan (advisor) ; Novotný, Jan (referee)
The bachelor thesis deals with the largest slope movements on Earth, namely landslides on volcanic islands, with a focus on the Hawaii archipelago. It summarizes the knowledge of their classification and evaluates their possible causes with respect to the specifics of volcanic islands. After the introduction, it focuses on the specific area of interest, the Hawaiian Islands, and describes landslides that have occurred during the geological history of these islands. A database of the 19 largest known landslides is also compiled, recording their location, classification, age and morphometric data such as volume, perimeter, area, length, width and height. This database will become a part of the database of giant landslides on volcanic islands on Earth, which is being created at the Institute of Rock Structure and Mechanics of the Czech Academy of Sciences. The database is further explored in the statistical chapter, where the mathematical procedure for calculating the relative runout of the slope movement and the potential energy of the landslide is explained. Additionally, box plots comparing the selected morphometric parameters are created. For illustration, a map of the giant landslides is also included.
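
The abstract does not state the formulas used in the thesis, so the following is only a sketch under common conventions, not the thesis procedure: relative runout is taken here as the ratio of drop height H to horizontal runout length L, and potential energy as density times volume times gravitational acceleration times drop height. The function names, the rock density of 2500 kg/m^3 and the input values are assumptions for the example, not data from the thesis database.

```python
G = 9.81  # gravitational acceleration in m/s^2


def relative_runout(height_m, length_m):
    """Assumed definition: drop height divided by horizontal runout (H/L)."""
    return height_m / length_m


def potential_energy(volume_m3, drop_height_m, density_kg_m3=2500.0):
    """Assumed definition: E = rho * V * g * h, in joules."""
    mass_kg = density_kg_m3 * volume_m3
    return mass_kg * G * drop_height_m


# Made-up landslide: 1 km^3 of rock, 5 km drop, 100 km runout.
print(relative_runout(5000.0, 100000.0))   # dimensionless ratio
print(potential_energy(1.0e9, 5000.0))     # joules
```

A smaller H/L value indicates a more mobile landslide under this convention; other definitions of both quantities exist and may be the ones used in the thesis.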


Various Approaches to Szroeter’s Test for Regression Quantiles
Kalina, Jan ; Peštová, Barbora
Regression quantiles are an important tool for regression analysis that is popular in econometric applications, for example for the task of detecting heteroscedasticity in the data. Nevertheless, they need to be accompanied by diagnostic tools for verifying their assumptions. The paper is devoted to heteroscedasticity testing for regression quantiles, whose most important special case is commonly called the regression median. Szroeter’s test, which is one of the available heteroscedasticity tests for least squares, is modified here for the regression median in three different ways: (1) an asymptotic test based on the asymptotic representation for regression quantiles, (2) a permutation test based on residuals, and (3) an exact approximate test, which has a permutation character and represents an approximation to an exact test. All three approaches can be computed in a straightforward way, and their principles can also be extended to other heteroscedasticity tests. The theoretical results are expected to be extended to other regression quantiles and mainly to multivariate quantiles.
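
A permutation test based on residuals, in the spirit of approach (2), might look like the sketch below. It assumes one common form of Szroeter's statistic for residuals ordered by suspected increasing variance, h = sum(i * e_i^2) / sum(e_i^2), and compares the observed h with its values over random reorderings of the residuals. The exact statistic and residuals used in the paper for the regression median may differ, and the function names are hypothetical.

```python
import random


def szroeter_h(residuals):
    """Assumed statistic: rank-weighted share of squared residuals."""
    num = sum(i * e * e for i, e in enumerate(residuals, start=1))
    den = sum(e * e for e in residuals)
    return num / den


def szroeter_permutation_pvalue(residuals, n_perm=999, seed=0):
    """One-sided permutation p-value against increasing variance."""
    rng = random.Random(seed)
    observed = szroeter_h(residuals)
    shuffled = list(residuals)
    hits = 0
    for _ in range(n_perm):
        # Under homoscedasticity the ordering of residuals is uninformative.
        rng.shuffle(shuffled)
        if szroeter_h(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Simulated residuals whose spread grows with the index (assumed data).
rng = random.Random(1)
residuals = [(i / 200) * rng.gauss(0.0, 1.0) for i in range(1, 201)]

print(szroeter_h(residuals))
print(szroeter_permutation_pvalue(residuals))
```

For residuals of constant magnitude h equals (n + 1) / 2, so values well above that midpoint point to variance increasing along the chosen ordering.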
