Original title: Robustness of High-Dimensional Data Mining
Authors: Kalina, Jan ; Duintjer Tebbens, Jurjen ; Schlenker, Anna
Document type: Papers
Conference/Event: ITAT 2014. European Conference on Information Technologies - Applications and Theory /14./, Demänovská dolina (SK), 2014-09-25 / 2014-09-29
Year: 2014
Language: eng
Abstract: Standard data mining procedures are sensitive to outlying measurements in the data. This work aims to propose robust versions of some existing data mining procedures, i.e. methods resistant to outliers. In the area of classification analysis, we propose a new robust method based on a regularized version of the minimum weighted covariance determinant estimator. The method is suitable for data in which the number of variables exceeds the number of observations, and it is based on implicit weights assigned to individual observations. Our approach is a unique attempt to combine regularization with high robustness, allowing outlying high-dimensional observations to be downweighted. The classification performance of the new methods and some ideas concerning classification analysis of high-dimensional data are illustrated on real raw data as well as on data contaminated by severe outliers.
Keywords: classification analysis; high-dimensional data; robust estimation
Project no.: GA13-17187S (CEP), GA13-06684S (CEP), 264513, 494/2013
Funding provider: GA ČR, GA ČR, GA UK, CESNET Development Fund
Host item entry: ITAT 2014. Information Technologies - Applications and Theory. Part II, ISBN 978-80-87136-19-5

Institution: Institute of Computer Science AS ČR
Document availability information: Fulltext is available in the digital repository of the Academy of Sciences.
Original record: http://hdl.handle.net/11104/0236770

Permalink: http://www.nusl.cz/ntk/nusl-175460


 Record created 2014-10-09, last modified 2023-12-06

