
A Robustified Metalearning Procedure for Regression Estimators
Kalina, Jan ; Neoral, A.
Metalearning represents a useful methodology for selecting and recommending a suitable algorithm or method for a new dataset by exploiting a database of training datasets. While metalearning is potentially beneficial for the analysis of economic data, we must be aware of its instability and sensitivity to outlying measurements (outliers) as well as measurement errors. The aim of this paper is to robustify the metalearning process. First, we prepare some useful theoretical tools exploiting the idea of implicit weighting, inspired by the least weighted squares estimator. These include a robust coefficient of determination, a robust version of the mean square error, and a simple rule for outlier detection in linear regression. We perform a metalearning study for recommending the best linear regression estimator for a new dataset (not included in the training database). The prediction of the optimal estimator is learned over a set of 20 real datasets with economic motivation, while the least squares estimator is compared with several (highly) robust estimators. We investigate the effect of variable selection on the metalearning results. If both the training and validation data are considered after a proper robust variable selection, the metalearning performance improves remarkably, especially if a robust prediction error is used.
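The robust version of the mean square error mentioned in the abstract rests on implicit weighting: squared residuals are ranked and the largest ones are downweighted. The following is a minimal illustrative sketch only; the linearly decreasing weight function and the trimming parameter `tau` are assumptions for illustration, not the paper's exact scheme.

```python
import numpy as np

def lws_weighted_mse(residuals, tau=0.75):
    """Robust MSE via implicit weighting: squared residuals are sorted
    and weights assigned by rank, so the largest residuals (potential
    outliers) are downweighted or ignored. The linearly decreasing
    weight function below is an illustrative choice."""
    r2 = np.sort(np.asarray(residuals, dtype=float) ** 2)  # rank by magnitude
    n = len(r2)
    ranks = np.arange(1, n + 1)
    # weights decrease linearly and vanish for roughly the worst (1 - tau) share
    w = np.clip(1.0 - ranks / (tau * n), 0.0, None)
    w /= w.sum()
    return float(np.sum(w * r2))
```

With one gross outlier among small residuals, this weighted error stays close to the error of the clean observations, whereas the plain mean of squared residuals is dominated by the outlier.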


Implicitly weighted robust estimation of quantiles in linear regression
Kalina, Jan ; Vidnerová, Petra
Estimation of quantiles represents a very important task in econometric regression modeling, and the standard regression quantiles machinery is well developed and popular, with a large number of econometric applications. Although regression quantiles are commonly known as robust tools, they are vulnerable to the presence of leverage points in the data. We propose here a novel approach for linear regression based on a specific version of the least weighted squares estimator, together with an additional estimator based only on observations between two different novel quantiles. The new methods are conceptually simple and comprehensible. Without the ambition to derive theoretical properties of the novel methods, numerical computations reveal that they perform comparably to standard regression quantiles if the data are not contaminated by outliers. Moreover, the new methods seem much more robust on a simulated dataset with severe leverage points.
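One way to read the "observations between two quantiles" idea is as a residual-based trimming step: fit, keep only observations whose residuals lie between two empirical residual quantiles, and refit. This is an illustrative sketch only; the initial ordinary least squares fit and the quantile pair are assumptions, not the authors' estimator.

```python
import numpy as np

def interquantile_refit(X, y, lo=0.10, hi=0.90):
    """Illustrative sketch: fit OLS, retain only observations whose
    residuals fall between the lo and hi empirical residual quantiles,
    then refit OLS on the retained subset."""
    y = np.asarray(y, dtype=float)
    Z = np.column_stack([np.ones(len(y)), X])  # add intercept column
    beta0 = np.linalg.lstsq(Z, y, rcond=None)[0]
    res = y - Z @ beta0
    ql, qh = np.quantile(res, [lo, hi])
    keep = (res >= ql) & (res <= qh)           # drop extreme residuals
    return np.linalg.lstsq(Z[keep], y[keep], rcond=None)[0]
```

On clean linear data the refit simply recovers the same coefficients; its purpose is to limit the influence of observations with extreme residuals.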


A Nonparametric Bootstrap Comparison of Variances of Robust Regression Estimators
Kalina, Jan ; Tobišková, Nicole ; Tichavský, Jan
While various robust regression estimators are available for the standard linear regression model, performance comparisons of individual robust estimators over real or simulated datasets seem to be still lacking. In general, a reliable robust estimator of regression parameters should be consistent and at the same time should have a relatively small variability, i.e. the variances of individual regression parameters should be small. The aim of this paper is to compare the variability of S-estimators, MM-estimators, least trimmed squares, and least weighted squares estimators. While they are all consistent under general assumptions, the asymptotic covariance matrix of the least weighted squares remains infeasible, because the only available formula for its computation depends on the unknown random errors. Thus, we resort to a nonparametric bootstrap comparison of the variability of different robust regression estimators. It turns out that the best results are obtained either with MM-estimators or with the least weighted squares with suitable weights. The latter estimator is especially recommendable for small sample sizes.
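The nonparametric bootstrap comparison of estimator variability can be sketched as follows: resample (x_i, y_i) pairs with replacement, refit, and take the empirical variance of the replicated coefficients. Ordinary least squares serves here only as a stand-in for the robust estimators compared in the paper; the helper names are illustrative.

```python
import numpy as np

def bootstrap_coef_variance(X, y, fit, B=200, rng=None):
    """Nonparametric (pairs) bootstrap estimate of the variance of
    regression coefficients: resample observation pairs with
    replacement, refit with the supplied estimator, and return the
    empirical variance of the replicated coefficient vectors."""
    rng = np.random.default_rng(rng)
    n = len(y)
    reps = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)   # resample with replacement
        reps.append(fit(X[idx], y[idx]))
    return np.var(np.array(reps), axis=0, ddof=1)

def ols(X, y):
    """Ordinary least squares with intercept; a stand-in for any of
    the robust estimators whose variability is being compared."""
    Z = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(Z, y, rcond=None)[0]
```

Running the same resampling loop with different `fit` functions (e.g. a robust estimator in place of `ols`) gives directly comparable variance estimates, which is the comparison the paper performs.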


Nonparametric Bootstrap Techniques for Implicitly Weighted Robust Estimators
Kalina, Jan
The paper is devoted to highly robust statistical estimators based on implicit weighting, which have the potential to find econometric applications. Two particular methods include a robust correlation coefficient based on the least weighted squares regression and the minimum weighted covariance determinant estimator, where the latter allows estimating the mean and covariance matrix of multivariate data. New tools are proposed for testing hypotheses about these robust estimators and for estimating their variance. The techniques considered in the paper include resampling approaches with or without replacement, i.e. permutation tests, bootstrap variance estimation, and bootstrap confidence intervals. The performance of the newly described tools is illustrated on numerical examples. They reveal the suitability of the robust procedures also for non-contaminated data, as their confidence intervals are not much wider than those for standard maximum likelihood estimators. While resampling without replacement turns out to be more suitable for hypothesis testing, bootstrapping with replacement yields reliable confidence intervals but no corresponding hypothesis tests.
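Resampling without replacement for hypothesis testing, as described above, amounts to a permutation test. The following is an illustrative sketch with the Pearson coefficient standing in for the robust correlation coefficient; the function names and the choice of test statistic are assumptions for illustration.

```python
import numpy as np

def permutation_corr_test(x, y, corr, B=999, rng=None):
    """Permutation test of H0: no association between x and y.
    Resampling without replacement: permute y, recompute the
    correlation, and report the share of permuted |corr| values at
    least as large as the observed one (observed statistic included)."""
    rng = np.random.default_rng(rng)
    obs = abs(corr(x, y))
    count = sum(abs(corr(x, rng.permutation(y))) >= obs for _ in range(B))
    return (count + 1) / (B + 1)   # permutation p-value

# Pearson correlation as a stand-in for the robust coefficient
pearson = lambda a, b: np.corrcoef(a, b)[0, 1]
```

Passing a robust correlation function in place of `pearson` yields the robust version of the test without changing the resampling logic.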


Robust Metalearning: Comparing Robust Regression Using A Robust Prediction Error
Peštová, Barbora ; Kalina, Jan
The aim of this paper is to construct a classification rule for predicting the best regression estimator for a new data set based on a database of 20 training data sets. The various estimators considered here include some popular methods of robust statistics. The methodology used for constructing the classification rule can be described as metalearning. Nevertheless, standard approaches to metalearning should be robustified when working with data sets contaminated by outlying measurements (outliers). Therefore, our contribution can also be described as a robustification of the metalearning process by using a robust prediction error. In addition to performing the metalearning study by means of both standard and robust approaches, we search for a detailed interpretation in two particular situations. The results of this detailed investigation show that the knowledge obtained by a metalearning approach based on standard principles is prone to great variability and instability, which makes it hard to believe that the results are not merely a consequence of chance. This aspect of metalearning seems not to have been previously analyzed in the literature.


How to downweight observations in robust regression: A metalearning study
Kalina, Jan ; Pitra, Zbyněk
Metalearning is becoming an increasingly important methodology for extracting knowledge from a database of available training data sets and transferring it to a new (independent) data set. The concept of metalearning is becoming popular in statistical learning, and there is an increasing number of metalearning applications also in the analysis of economic data sets. Still, not much attention has been paid to its limitations and disadvantages. For this purpose, we use various linear regression estimators (including highly robust ones) over a set of 30 data sets with an economic background and perform a metalearning study over them as well as over the same data sets after an artificial contamination. We focus on comparing the prediction performance of the least weighted squares estimator with various weighting schemes. A broader spectrum of classification methods is applied, and a support vector machine turns out to yield the best results. Because the results of a leave-one-out cross-validation are very different from the results of autovalidation, we conclude that metalearning is highly unstable and its results should be interpreted with care. We also discuss the possible limitations of the metalearning methodology in general.


Giant landslides on volcanic islands on the example of the Hawaii archipelago
Kalina, Jan ; Blahůt, Jan (advisor) ; Novotný, Jan (referee)
The bachelor thesis deals with the largest slope movements on Earth, namely landslides on volcanic islands, with a focus on the Hawaii archipelago. It summarizes the knowledge of their classification and evaluates their possible causes with respect to the specifics of volcanic islands. After the introduction, it focuses on the specific area of interest, the Hawaiian Islands, and describes landslides that have occurred during the geological history of these islands. A database of the 19 largest known landslides is also compiled, recording their location, classification, age, and morphometric data such as volume, perimeter, area, length, width, and height. This database will become part of the database of giant landslides on volcanic islands on Earth, which is being created at the Institute of Rock Structure and Mechanics of the Czech Academy of Sciences. The database is further explored in the statistical chapter, where the mathematical procedure for calculating the relative runout of the slope movement and the potential energy of the landslide is explained. Additionally, box plots comparing the selected morphometric parameters are created. For illustration, a map of the giant landslides is also included.
