National Repository of Grey Literature: 37 records found (records 18-27)
How to down-weight observations in robust regression: A metalearning study
Kalina, Jan ; Pitra, Z.
Metalearning is becoming an increasingly important methodology for transferring knowledge extracted from a database of available training data sets to a new (independent) data set. The concept of metalearning is becoming popular in statistical learning, and the number of metalearning applications in the analysis of economic data sets is also growing. Still, little attention has been paid to its limitations and disadvantages. To investigate them, we apply various linear regression estimators (including highly robust ones) to a set of 30 data sets with an economic background and perform a metalearning study over them, as well as over the same data sets after artificial contamination.
Robust Metalearning: Comparing Robust Regression Using A Robust Prediction Error
Peštová, Barbora ; Kalina, Jan
The aim of this paper is to construct a classification rule for predicting the best regression estimator for a new data set, based on a database of 20 training data sets. The estimators considered include several popular methods of robust statistics. The methodology used to construct the classification rule can be described as metalearning. However, standard metalearning approaches should be robustified when working with data sets contaminated by outlying measurements (outliers). Our contribution can therefore also be described as a robustification of the metalearning process by means of a robust prediction error. In addition to performing the metalearning study with both standard and robust approaches, we seek a detailed interpretation in two particular situations. This detailed investigation shows that the knowledge obtained by a metalearning approach based on standard principles is prone to great variability and instability, which makes it difficult to rule out that the results are a mere consequence of chance. This aspect of metalearning does not appear to have been analyzed previously in the literature.
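The abstract does not define the robust prediction error. As a minimal sketch, assuming it is a trimmed mean of squared prediction errors (only the smallest fraction `keep` of the squared errors is averaged; the function names and the value keep = 0.75 are hypothetical), the contrast with the classical mean squared error can be shown in Python:

```python
from statistics import mean


def mse(y_true, y_pred):
    """Classical mean squared prediction error (sensitive to outliers)."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def trimmed_mse(y_true, y_pred, keep=0.75):
    """Mean of the smallest `keep` fraction of squared prediction errors.

    Hypothetical robust prediction error: the largest squared errors,
    which are typically caused by outliers, are discarded before averaging.
    """
    if not 0 < keep <= 1:
        raise ValueError("keep must lie in (0, 1]")
    squared = sorted((t - p) ** 2 for t, p in zip(y_true, y_pred))
    h = max(1, int(keep * len(squared)))  # number of errors retained
    return mean(squared[:h])


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.0, 14.0]  # the last observation behaves like an outlier
print(mse(y_true, y_pred))          # dominated by the single large error
print(trimmed_mse(y_true, y_pred))  # close to zero
```

Because the single large error is discarded, the trimmed version ranks estimators by how well they predict the bulk of the data rather than the outliers.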
How to down-weight observations in robust regression: A metalearning study
Kalina, Jan ; Pitra, Zbyněk
Metalearning is becoming an increasingly important methodology for transferring knowledge extracted from a database of available training data sets to a new (independent) data set. The concept of metalearning is becoming popular in statistical learning, and the number of metalearning applications in the analysis of economic data sets is also growing. Still, little attention has been paid to its limitations and disadvantages. To investigate them, we apply various linear regression estimators (including highly robust ones) to a set of 30 data sets with an economic background and perform a metalearning study over them, as well as over the same data sets after artificial contamination. We focus on comparing the prediction performance of the least weighted squares estimator under various weighting schemes. A broad spectrum of classification methods is applied, and a support vector machine turns out to yield the best results. Because the results of leave-one-out cross-validation differ considerably from those of autovalidation, we conclude that metalearning is highly unstable and its results should be interpreted with care. We also discuss the limitations of the metalearning methodology in general.
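The least weighted squares (LWS) estimator minimizes a weighted sum of ordered squared residuals, w_1 r_(1)^2 + ... + w_n r_(n)^2 with r_(1)^2 <= ... <= r_(n)^2 and non-increasing weights w_1 >= ... >= w_n >= 0, so that observations with large residuals are down-weighted. The sketch below approximates it for a simple regression line by alternately ranking the residuals and refitting by weighted least squares. The iteration scheme, the least squares starting point and the two weighting schemes (trimmed 0/1 weights, which reduce LWS to least trimmed squares, and linearly decreasing weights) are illustrative assumptions, not necessarily the ones used in the study:

```python
def wls_line(x, y, w):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


# Illustrative weighting schemes (assumptions, not the study's own choices).
def trimmed_weights(n, h):
    """Weight 1 for the h smallest residuals, 0 otherwise (LWS becomes LTS)."""
    return [1.0 if r < h else 0.0 for r in range(n)]


def linear_weights(n):
    """Linearly decreasing weights 1, 1 - 1/n, ..., 1/n."""
    return [1.0 - r / n for r in range(n)]


def lws_line(x, y, weights, max_iter=50):
    """Approximate LWS fit by iterative reweighting (a heuristic sketch).

    weights[r] is the weight given to the observation with the (r+1)-th
    smallest squared residual; it must be non-increasing in r.
    """
    n = len(x)
    a, b = wls_line(x, y, [1.0] * n)  # ordinary least squares as the start
    prev_w = None
    for _ in range(max_iter):
        sq = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]
        order = sorted(range(n), key=lambda i: sq[i])  # smallest residual first
        w = [0.0] * n
        for rank, i in enumerate(order):
            w[i] = weights[rank]
        if w == prev_w:  # weight assignment unchanged: stop
            break
        a, b = wls_line(x, y, w)
        prev_w = w
    return a, b


x = list(range(10))
y = [1 + 2 * xi for xi in x]
y[4], y[5] = 40, 45  # two artificially contaminated responses
print(wls_line(x, y, [1.0] * 10))              # OLS, pulled away from (1, 2)
print(lws_line(x, y, trimmed_weights(10, 8)))  # (1.0, 2.0)
print(lws_line(x, y, linear_weights(10)))      # outliers still get small weights
```

No refit can increase the LWS objective, because giving the largest weights to the smallest squared residuals minimizes the weighted sum for a fixed line; the procedure can still stop at a local optimum, which is why practical algorithms usually try several starting points.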
Algorithms for anomaly detection in data from clinical trials and health registries
Bondarenko, Maxim ; Blaha, Milan (referee) ; Schwarz, Daniel (advisor)
This master's thesis deals with the problem of anomaly detection in data from clinical trials and medical registries. The purpose of the work is to review the literature on data quality in clinical trials and to design an original algorithm, based on machine learning methods, for detecting anomalous records in real clinical data from ongoing or completed clinical trials or medical registries. The practical part describes the implemented detection algorithm, which consists of several parts: import of data from the information system; preprocessing and transformation of the imported records, whose variables have different data types, into numerical vectors; outlier detection using well-known statistical methods; and evaluation of the quality and accuracy of the algorithm. The output of the algorithm is a vector of parameters containing anomalies, intended to make the data manager's work easier. The algorithm is designed to extend the functions of the CLADE-IS information system with automatic data-quality monitoring by detecting anomalous records.
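The abstract does not specify the thesis's own algorithm. As a minimal sketch of the encode-then-detect pipeline it outlines, assuming one-hot encoding for categorical variables and the median/MAD-based modified z-score with the conventional cut-off 3.5 as the "well-known statistical method" (all field names and records below are hypothetical):

```python
from statistics import median

THRESHOLD = 3.5  # conventional cut-off for the modified z-score (an assumption)


def encode(records, numeric, categorical):
    """Turn mixed-type records into numeric vectors (one-hot for categories)."""
    levels = {c: sorted({r[c] for r in records}) for c in categorical}
    vectors = []
    for r in records:
        vec = [float(r[c]) for c in numeric]
        for c in categorical:
            vec.extend(1.0 if r[c] == lvl else 0.0 for lvl in levels[c])
        vectors.append(vec)
    return vectors


def modified_z(values):
    """Robust z-scores based on the median and the median absolute deviation."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:  # more than half of the values identical: no usable scale
        return [0.0] * len(values)
    return [0.6745 * (v - med) / mad for v in values]


def flag_anomalies(vectors, threshold=THRESHOLD):
    """Indices of records with at least one |modified z-score| > threshold."""
    flagged = set()
    for col in zip(*vectors):  # iterate over columns (variables)
        for i, z in enumerate(modified_z(list(col))):
            if abs(z) > threshold:
                flagged.add(i)
    return sorted(flagged)


# Hypothetical records; field names are invented for illustration.
records = [
    {"age": 34, "weight_kg": 70, "sex": "F"},
    {"age": 45, "weight_kg": 82, "sex": "M"},
    {"age": 29, "weight_kg": 65, "sex": "F"},
    {"age": 51, "weight_kg": 90, "sex": "M"},
    {"age": 38, "weight_kg": 720, "sex": "F"},  # data-entry error: 720 instead of 72
    {"age": 42, "weight_kg": 77, "sex": "M"},
]
print(flag_anomalies(encode(records, ["age", "weight_kg"], ["sex"])))  # [4]
```

The median and MAD are used instead of the mean and standard deviation so that the anomalous record cannot inflate the scale estimate and thereby hide itself.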
Robust regression - outlier detection
Hradilová, Lenka ; Blatná, Dagmar (advisor) ; Černý, Jindřich (referee)
This master's thesis focuses on methods of outlier detection. Its aim is to assess the suitability of robust methods for real data from EKO-KOM, a.s. The first part of the thesis provides an overview and a theoretical treatment of classical and robust methods of outlier detection. In the practical part, these methods are applied to the data file obtained from EKO-KOM, a.s. The thesis concludes with recommendations on the suitability of the methods, based on a comparison of the classical and robust approaches.
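The abstract does not state which robust estimators were applied. To illustrate why a robust fit matters for outlier detection, the sketch below, assuming a simple regression line, compares standardized residuals from ordinary least squares with residuals from the Theil-Sen estimator (a robust method swapped in here for brevity) scaled by the median absolute deviation; the cut-off 2.5 and the toy data are assumptions:

```python
from itertools import combinations
from math import sqrt
from statistics import median

CUTOFF = 2.5  # conventional cut-off for standardized residuals (an assumption)


def ols_line(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def theil_sen_line(x, y):
    """Slope = median of pairwise slopes; intercept = median of y - b*x."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2) if x[j] != x[i]]
    b = median(slopes)
    return median(yi - b * xi for xi, yi in zip(x, y)), b


def flag_classical(x, y):
    a, b = ols_line(x, y)
    res = [yi - a - b * xi for xi, yi in zip(x, y)]
    s = sqrt(sum(r * r for r in res) / (len(x) - 2))  # residual standard error
    return [i for i, r in enumerate(res) if abs(r) / s > CUTOFF]


def flag_robust(x, y):
    a, b = theil_sen_line(x, y)
    res = [yi - a - b * xi for xi, yi in zip(x, y)]
    med = median(res)
    scale = 1.4826 * median(abs(r - med) for r in res)  # MAD-based scale
    return [i for i, r in enumerate(res) if abs(r - med) / scale > CUTOFF]


x = list(range(10))
e = [0.5, -0.5] * 4  # small deterministic noise for the 8 clean points
y = [1 + 2 * xi + ei for xi, ei in zip(x[:8], e)] + [0.0, 0.0]  # two bad leverage points
print(flag_classical(x, y))  # []: the outliers mask themselves
print(flag_robust(x, y))     # [8, 9]
```

The two bad leverage points pull the least squares line towards themselves, so their residuals look unremarkable (masking); the robust fit follows the majority of the data and exposes them.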
Stable distributions and their applications
Volchenkova, Irina ; Klebanov, Lev (advisor) ; Beneš, Viktor (referee)
The aim of this thesis is to show that the use of heavy-tailed distributions in finance is theoretically unfounded and may cause significant misunderstandings and fallacies in model interpretation. The main reason seems to be a misunderstanding of the concept of a distributional tail. Moreover, in models based on real data it seems more reasonable to concentrate on the central part of the distribution rather than on its tails.
Robustification of statistical and econometrical regression methods
Jurczyk, Tomáš ; Víšek, Jan Ámos (advisor) ; Hlávka, Zdeněk (referee) ; Malý, Marek (referee)
Author: Mgr. Tomáš Jurczyk; Department: Department of Probability and Mathematical Statistics; Supervisor: prof. RNDr. Jan Ámos Víšek, CSc., IES FSV UK Praha. Abstract: Multicollinearity and the presence of outliers are two data problems that can occur in regression analysis. This thesis is mainly concerned with situations in which the combined outlier-multicollinearity problem is present. We first show the behavior of classical methods developed to overcome one of these problems. We also investigate the functionality of methods proposed as robust multicollinearity detectors. We prove that the proposed two-step procedures (one step of which is typically based on robust regression methods) fail in outlier detection, and therefore also in multicollinearity detection, if strong multicollinearity is present in the majority of the data. We propose a new one-step method as a candidate for a robust detector of multicollinearity as well as a robust ridge regression estimate. We derive its properties and behavior and propose diagnostic tools derived from the method. Keywords: multicollinearity, outliers, robust detector of multicollinearity, robust ridge regression
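The robust one-step method proposed in the thesis is not reproduced here. For reference, a sketch of the two classical, non-robust quantities it relates to, under the assumption of exactly two predictors: the variance inflation factor VIF = 1/(1 - r^2) as a multicollinearity diagnostic, and the ridge estimate (X'X + λI)^{-1} X'y computed on centred data. The data are hypothetical:

```python
from math import sqrt


def center(v):
    m = sum(v) / len(v)
    return [vi - m for vi in v]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def vif_two(x1, x2):
    """Variance inflation factor for a model with two predictors: 1 / (1 - r^2)."""
    c1, c2 = center(x1), center(x2)
    r = dot(c1, c2) / sqrt(dot(c1, c1) * dot(c2, c2))  # sample correlation
    return 1.0 / (1.0 - r * r)


def ridge_two(x1, x2, y, lam):
    """Classical ridge estimate (X'X + lam*I)^{-1} X'y on centred data."""
    c1, c2, cy = center(x1), center(x2), center(y)
    a11, a12, a22 = dot(c1, c1) + lam, dot(c1, c2), dot(c2, c2) + lam
    g1, g2 = dot(c1, cy), dot(c2, cy)
    det = a11 * a22 - a12 * a12
    # Explicit inverse of the symmetric 2x2 matrix applied to X'y.
    return ((a22 * g1 - a12 * g2) / det, (a11 * g2 - a12 * g1) / det)


# Hypothetical data with two nearly collinear predictors.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 1.9, 3.0, 4.1, 4.9]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
print(vif_two(x1, x2))  # far above the common rule-of-thumb threshold of 10
for lam in (0.0, 1.0, 10.0):
    print(lam, ridge_two(x1, x2, y, lam))  # lam = 0 gives least squares
```

Increasing λ shrinks the coefficient vector and stabilizes it when X'X is nearly singular; like least squares, however, this classical ridge estimate remains sensitive to outliers.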
Outliers
Kudrnáč, Vojtěch ; Zvára, Karel (advisor) ; Anděl, Jiří (referee)
This work concerns methods of identifying outliers in an otherwise normally distributed data set. Several significance tests and criteria designed for this purpose are described: Peirce's criterion, Chauvenet's criterion, Grubbs' test, Dixon's test and Cochran's test. The derivation of the tests and criteria is outlined, and finally the results of applying them to simulated normally distributed data with an inserted outlier are examined. Code in the R programming language implementing these tests and criteria using existing functions is included.
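The R code included in the work is not reproduced here. As a minimal Python sketch of one of the listed criteria: Chauvenet's criterion rejects an observation when the expected number of observations at least as extreme, n·P(|Z| >= |z|) under a normal model, falls below 0.5. Using the sample standard deviation and a single pass over the data are assumptions of this sketch:

```python
from statistics import NormalDist, mean, stdev


def chauvenet(data):
    """Indices rejected by Chauvenet's criterion (single pass).

    A value is rejected when the expected number of observations at least as
    far from the mean, n * P(|Z| >= |z|), is below 0.5.
    """
    n = len(data)
    m, s = mean(data), stdev(data)  # sample mean and sample standard deviation
    std_normal = NormalDist()
    rejected = []
    for i, x in enumerate(data):
        z = abs(x - m) / s
        p_two_sided = 2.0 * (1.0 - std_normal.cdf(z))
        if n * p_two_sided < 0.5:
            rejected.append(i)
    return rejected


sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 13.5]  # hypothetical measurements
print(chauvenet(sample))  # [6]
```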
Robust Regression Estimators: A Comparison of Prediction Performance
Kalina, Jan ; Peštová, Barbora
Regression is an important methodology for solving numerous tasks of applied econometrics. This paper is devoted to robust estimators of the parameters of a linear regression model, which are preferable whenever the data contain, or are believed to contain, outlying measurements (outliers). While various robust regression estimators are nowadays available in standard statistical packages, the question remains how to choose the most suitable regression method for a particular data set. This paper aims to compare various regression methods on various data sets. First, the prediction performance of common robust regression estimators is compared on a set of 24 real data sets from public repositories. The results are then used as input for a metalearning study over 9 selected features of the individual data sets. On the whole, the least trimmed squares estimator turns out to be superior to least squares and M-estimators on the majority of the data sets, while the metalearning process does not succeed in reliably predicting the most suitable estimator for a given data set.
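A metalearning step of this kind can be sketched as follows, with a 1-nearest-neighbour classifier swapped in for brevity (not necessarily one of the classifiers used in the paper) and a tiny hypothetical meta-data set with two invented meta-features. The sketch also shows why autovalidation, which tests on the training data sets themselves, is overly optimistic compared with leave-one-out cross-validation:

```python
from math import dist


def nearest_label(query, examples):
    """1-nearest-neighbour prediction; examples are (features, label) pairs."""
    return min(examples, key=lambda ex: dist(query, ex[0]))[1]


def autovalidation_accuracy(meta):
    """Accuracy when every training data set is also used as its own test case."""
    hits = sum(nearest_label(f, meta) == lab for f, lab in meta)
    return hits / len(meta)


def loo_accuracy(meta):
    """Leave-one-out accuracy: each data set is predicted from the others."""
    hits = 0
    for k, (f, lab) in enumerate(meta):
        rest = meta[:k] + meta[k + 1:]
        hits += nearest_label(f, rest) == lab
    return hits / len(meta)


# Hypothetical meta-data set: (meta-features, best estimator) per data set.
# Both meta-features are invented for illustration (e.g. an estimated
# contamination fraction and a skewness-like index).
meta = [
    ((0.00, 0.1), "LS"),
    ((0.02, 0.3), "LS"),
    ((0.05, 0.2), "LS"),
    ((0.20, 0.4), "LTS"),
    ((0.25, 0.1), "LTS"),
    ((0.04, 0.9), "LTS"),
]
print(autovalidation_accuracy(meta))  # 1.0: each data set is its own neighbour
print(loo_accuracy(meta))             # about 0.667: much less optimistic
```

With only six training data sets, each misclassification shifts the leave-one-out estimate by 1/6, so the estimate is coarse and can change markedly when a single data set is added or removed.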
