Analytics Magazine, May/June 2014 | Page 52

the density associated with a point by counting the number of points in a region of a specified radius around that point. Points with a density above a threshold are classified as core points, while noise points are defined as non-core points that have no core point within the specified radius. Noise points are discarded, and clusters are formed around the core points. This density-based identification of clusters makes it possible to find clusters of various shapes. CURE (Clustering Using Representatives) [2] also does well at capturing clusters of various shapes and sizes, since only the representative points of a cluster are used to compute its distance from other clusters. The clustering algorithm starts with each input point as a separate cluster, and at each successive st