Original title:
Some Robust Distances for Multivariate Data
Authors:
Kalina, Jan ; Peštová, Barbora
Document type:
Papers
Conference/Event:
MME 2016. International Conference Mathematical Methods in Economics /34./, Liberec (CZ), 2016-09-06 / 2016-09-09
Year:
2016
Language:
eng
Abstract:
Numerous methods of multivariate statistics and data mining suffer from the presence of outlying measurements in the data. This paper presents new distance measures suitable for continuous data. First, we consider a Mahalanobis distance suitable for high-dimensional data in which the number of variables exceeds, possibly by a large margin, the number of observations. We propose its doubly regularized version, which combines a regularization of the covariance matrix with replacing the means of multivariate data by their regularized counterparts. We formulate explicit expressions for some versions of the regularization of the means, which can be interpreted as a denoising (i.e. a robust version) of standard means. Further, we propose a robust cosine similarity measure, which is based on implicit weighting of individual observations. We derive properties of the newly proposed robust cosine similarity, including a proof of its high robustness in terms of the breakdown point.
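The idea of a doubly regularized Mahalanobis distance can be illustrated with a minimal sketch. It is not the authors' estimator: the record does not state their formulas, so the shrinkage targets (identity matrix for the covariance, zero vector for the mean) and the tuning constants `lam` and `delta` are assumptions chosen only for illustration.

```python
import numpy as np

def regularized_mahalanobis(x, X, lam=0.5, delta=0.5):
    """Distance of x from a shrunken mean of X (n rows, p columns).

    lam and delta are hypothetical tuning constants in [0, 1]:
    lam shrinks the covariance toward the identity (assumed target),
    delta shrinks the mean toward zero (assumed target).
    """
    n, p = X.shape
    mean = (1.0 - delta) * X.mean(axis=0)        # regularized mean
    S = np.cov(X, rowvar=False)                  # singular when p > n
    S_reg = (1.0 - lam) * S + lam * np.eye(p)    # positive definite for lam > 0
    d = x - mean
    # Solve S_reg z = d instead of forming the inverse explicitly.
    return float(np.sqrt(d @ np.linalg.solve(S_reg, d)))

# Usage: 5 observations of 20 variables, so the sample covariance is singular.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 20))
dist = regularized_mahalanobis(X[0], X)
```

Without the covariance shrinkage (`lam=0`) the linear system is singular whenever the number of variables exceeds the number of observations, which is the setting the abstract addresses.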
Keywords:
distance measures; high dimension; multivariate data; regularization; robustness
Project no.:
GA13-01930S (CEP)
Funding provider:
GA ČR
Host item entry:
Proceedings of the 34th International Conference Mathematical Methods in Economics MME 2016, ISBN 978-80-7494-296-9
Note:
Related web page: http://mme2016.tul.cz/conferenceproceedings/mme2016_conference_proceedings.pdf#page=377
Institution:
Institute of Computer Science AS ČR
Document availability information:
Fulltext is available on demand via the digital repository of the Academy of Sciences.
Original record:
http://hdl.handle.net/11104/0262275