Title:
Robustness of High-Dimensional Data Mining
Authors:
Kalina, Jan ; Duintjer Tebbens, Jurjen ; Schlenker, Anna
Document type: Conference papers
Conference/Event: ITAT 2014. European Conference on Information Technologies - Applications and Theory /14./, Demänovská dolina (SK), 2014-09-25 / 2014-09-29
Year:
2014
Language:
eng
Abstract: Standard data mining procedures are sensitive to the presence of outlying measurements in the data. This work aims to propose robust versions of some existing data mining procedures, i.e. methods resistant to outliers. In the area of classification analysis, we propose a new robust method based on a regularized version of the minimum weighted covariance determinant estimator. The method is suitable for data in which the number of variables exceeds the number of observations. It is based on implicit weights assigned to individual observations. Our approach is a unique attempt to combine regularization and high robustness, which allows outlying high-dimensional observations to be downweighted. The classification performance of the new methods and some ideas concerning classification analysis of high-dimensional data are illustrated on real raw data as well as on data contaminated by severe outliers.
Keywords:
classification analysis; high-dimensional data; robust estimation
Project number: GA13-17187S (CEP), GA13-06684S (CEP), 264513, 494/2013
Project funder: GA ČR, GA ČR, GA UK, CESNET Development Fund
Source document: ITAT 2014. Information Technologies - Applications and Theory. Part II, ISBN 978-80-87136-19-5
Institution: Institute of Computer Science, Czech Academy of Sciences (Ústav informatiky AV ČR)
Document availability information:
The document is available in the repository of the Academy of Sciences. Original record: http://hdl.handle.net/11104/0236770