MARK SCHWABACHER

Member since: Sep 24, 2010, NASA ARC

About

MARK's Resources (9)

MARK's Projects (1)

Mining Distance-Based Outliers in Near Linear Time

Shared by MARK SCHWABACHER, updated on Sep 22, 2010

Summary

Author(s) :: Stephen Bay, Mark Schwabacher

Abstract

Full title: Mining Distance-Based Outliers in Near Linear Time with Randomization and a Simple Pruning Rule

Abstract: Defining outliers by their distance to neighboring examples is a popular approach to finding unusual examples in a data set. Recently, much work has been conducted with the goal of finding fast algorithms for this task. We show that a simple nested loop algorithm that in the worst case is quadratic can give near linear time performance when the data is in random order and a simple pruning rule is used. We test our algorithm on real high-dimensional data sets with millions of examples and show that the near linear scaling holds over several orders of magnitude. Our average case analysis suggests that much of the efficiency is because the time to process non-outliers, which are the majority of examples, does not depend on the size of the data set.

show more info

Publication Name: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Publication Location: N/A
Year Published: 2003

Files

BaySchwabacherKDD2003.pdf: Mining Distance-Based Outliers in Near Linear Time with Randomization and a Simple Pruning Rule

198.0 KB

372 downloads

Discussions

Add New Comment

MARK's Projects (1)

Ares I-X Ground Diagnostic ...
8 members

1 resources

Need help?

Visit our help center