Title:
Some Robust Estimation Tools for Multivariate Models
Authors:
Kalina, Jan
Document type:
Conference papers
Conference/Event:
International Days of Statistics and Economics /9./, Prague (CZ), 2015-09-10 / 2015-09-12
Year:
2015
Language:
eng
Abstract:
Standard procedures of multivariate statistics and data mining for the analysis of multivariate data are known to be vulnerable to the presence of outlying and/or highly influential observations. This paper aims to propose and investigate specific approaches for two situations. First, we consider clustering of categorical data. While the sensitivity of standard statistical and data mining methods for categorical data has received attention only recently, we aim to modify standard distance measures between clusters of such data. This allows us to propose a hierarchical agglomerative cluster analysis for two-way contingency tables with a large number of categories, based on a regularized measure of distance between two contingency tables. Such a proposal improves the robustness to measurement errors in categorical data. As a second problem, we investigate the nonlinear version of the least weighted squares regression for data with a continuous response. Our aim is to propose an efficient algorithm for the least weighted squares estimator, which is formulated in a general way applicable to both linear and nonlinear regression. Our numerical study reveals the computational aspects of the algorithm and provides arguments in favor of its credibility.
Keywords:
cluster analysis; high-dimensional data; outliers; robust data mining
Project number:
GA13-01930S (CEP), GA13-17187S (CEP)
Project provider:
GA ČR, GA ČR
Source document:
The 9th International Days of Statistics and Economics Conference Proceedings, ISBN 978-80-87990-06-3