Title: Interpreting and Clustering Outliers with Sapling Random Forests
Authors: Kopp, Martin ; Pevný, T. ; Holeňa, Martin
Document type: Conference papers
Conference/Event: ITAT 2014. European Conference on Information Technologies - Applications and Theory /14./, Demänovská dolina (SK), 2014-09-25 / 2014-09-29
Year: 2014
Language: eng
Abstract: The main objective of outlier detection is to find samples that deviate considerably from the majority. Such outliers, often referred to as anomalies, are increasingly important because they help to uncover interesting events within data. Consequently, a considerable number of statistical and data mining techniques for identifying anomalies have been proposed in the last few years, but only a few works even mention why a sample was labelled as an anomaly. Therefore, we propose a method based on specifically trained decision trees, called sapling random forest. Our method is able to interpret the output of an arbitrary anomaly detector. The explanation is given as a subset of features in which the sample deviates most, or as conjunctions of atomic conditions, which can be viewed as antecedents of logical rules easily understandable by humans. To simplify the investigation of suspicious samples even further, we propose two methods of clustering anomalies into groups. Such clusters can be investigated at once, saving time and human effort. The feasibility of our approach is demonstrated on several synthetic datasets and one real-world dataset.
Keywords: anomaly detection; anomaly interpretation; clustering; decision trees; feature selection; random forest
Project number: GA13-17187S (CEP), GPP103/12/P514 (CEP)
Project provider: GA ČR, GA ČR
Source document: ITAT 2014. Information Technologies - Applications and Theory. Part II, ISBN 978-80-87136-19-5

Institution: Ústav informatiky AV ČR
Document availability: The document is available in the repository of the Academy of Sciences.
Original record: http://hdl.handle.net/11104/0236773

NUŠL permanent link: http://www.nusl.cz/ntk/nusl-175461


This record is included in the following collections:
Science and research > AV ČR > Ústav informatiky
Conference materials > Conference papers
Record created on 2014-10-09, last modified on 2023-12-06.


No document attached