Original title:
Explaining Anomalies with Sapling Random Forests
Authors:
Pevný, T.; Kopp, Martin
Document type:
Papers
Conference/Event:
ITAT 2014. European Conference on Information Technologies - Applications and Theory /14./, Demänovská dolina (SK), 2014-09-25 / 2014-09-29
Year:
2014
Language:
eng
Abstract:
The main objective of anomaly detection algorithms is to find samples that deviate from the majority. Although a vast number of such algorithms already exist, almost none of them explain why a particular sample was labelled as an anomaly. To address this issue, we propose an algorithm called Explainer, which explains why a sample differs from the rest as a formula in disjunctive normal form (DNF) that humans can easily understand. Because Explainer treats anomaly detection algorithms as black boxes, it can be applied in many domains to simplify the investigation of anomalies. The core of Explainer is a set of specifically trained trees, which we call sapling random forests. Since their training is fast and memory efficient, the whole algorithm is lightweight and applicable to large databases, data streams, and real-time problems. The correctness of Explainer is demonstrated on a wide range of synthetic and real-world datasets.
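The abstract does not say how a sapling is trained, so the following is only a minimal Python sketch under our own assumptions. It is not the authors' Explainer. Each hypothetical "sapling" sees the anomaly, a small random subsample of normal points, and a random subset of features. It greedily grows only the branch that contains the anomaly, splitting until no sampled normal point is left on the anomaly's side. The root-to-anomaly path is a conjunction of threshold tests, and the disjunction of these conjunctions across saplings forms a DNF explanation. The names `grow_sapling`, `explain`, `holds`, and `to_text`, and all of their parameters, are illustrative inventions.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical literal format: (feature_index, op, threshold), op is ">" or "<=".
Literal = Tuple[int, str, float]


def grow_sapling(anomaly: Sequence[float], normals: List[Sequence[float]],
                 features: List[int]) -> List[Literal]:
    """Greedily grow only the anomaly's branch; return its path (a conjunction)."""
    remaining = list(normals)
    path: List[Literal] = []
    while remaining:
        best = None  # (normals left on the anomaly's side, literal, those normals)
        for f in features:
            a = anomaly[f]
            below = [v[f] for v in remaining if v[f] < a]
            above = [v[f] for v in remaining if v[f] > a]
            if below:  # threshold between the anomaly and the nearest lower normal
                t = (max(below) + a) / 2
                kept = [v for v in remaining if v[f] > t]
                if best is None or len(kept) < best[0]:
                    best = (len(kept), (f, ">", t), kept)
            if above:  # threshold between the anomaly and the nearest higher normal
                t = (a + min(above)) / 2
                kept = [v for v in remaining if v[f] <= t]
                if best is None or len(kept) < best[0]:
                    best = (len(kept), (f, "<=", t), kept)
        if best is None:
            break  # remaining normals equal the anomaly on every candidate feature
        _, literal, remaining = best
        path.append(literal)
    return path


def explain(anomaly, normals, n_trees=5, sample_size=8, n_features=None, seed=0):
    """Assumed 'sapling forest': the OR of the paths of several small saplings."""
    rng = random.Random(seed)
    d = len(anomaly)
    k = n_features or d
    dnf: List[List[Literal]] = []
    for _ in range(n_trees):
        sample = rng.sample(normals, min(sample_size, len(normals)))
        feats = sorted(rng.sample(range(d), k))
        clause = grow_sapling(anomaly, sample, feats)
        if clause and clause not in dnf:
            dnf.append(clause)
    return dnf


def holds(clause: List[Literal], x: Sequence[float]) -> bool:
    """True if point x satisfies every literal of the conjunction."""
    return all(x[f] > t if op == ">" else x[f] <= t for f, op, t in clause)


def to_text(dnf: List[List[Literal]]) -> str:
    return " OR ".join(
        "(" + " AND ".join(f"x{f} {op} {t:.3g}" for f, op, t in clause) + ")"
        for clause in dnf)


if __name__ == "__main__":
    normals = [(0.1 * i, 0.1 * j) for i in range(-2, 3) for j in range(-2, 3)]
    print(to_text(explain((5.0, 0.0), normals, seed=1)))
```

Each sapling looks at only a few points and grows a single branch, so training stays cheap. That is in the spirit of the lightweight behaviour the abstract claims, but the split rule here is our own simplification, not a rule taken from the paper.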
Keywords:
anomaly explanation; decision trees; feature selection; random forest
Project no.:
GA13-17187S (CEP), GPP103/12/P514 (CEP)
Funding provider:
GA ČR, GA ČR
Host item entry:
ITAT 2014. Information Technologies - Applications and Theory. Part II, ISBN 978-80-87136-19-5
Institution:
Institute of Computer Science AS ČR
Document availability information:
The full text is available in the digital repository of the Academy of Sciences.
Original record:
http://hdl.handle.net/11104/0236783