Figure 2. Clustering results of each approach on the Compound data set.

Figure 3. Clustering results of each approach on the R15 data set.
Figure 3 shows the clustering results of each approach on the 'R15' data set. R15 has 15 clusters in total, contains no noise, and each cluster has a regular shape, so with reasonable parameters all 15 clusters can be identified by the four algorithms. DBNNN (a), K-Means (c), and Birch (d) perform well with high accuracy, but DBSCAN (b) treats many boundary points as noise points; here we set eps = 0.5 and minpts = 15.
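For reference, the following is a minimal scikit-learn sketch of how the three baselines could be configured on R15; it is our illustration rather than the authors' code, and the data file name is a placeholder.

```python
# A minimal sketch (not the authors' code) of how the three baseline
# clusterers could be configured for R15 with scikit-learn.
import numpy as np
from sklearn.cluster import DBSCAN, KMeans, Birch

X = np.loadtxt("R15.txt")  # hypothetical file: one 2-D point per row

# DBSCAN with the parameters quoted above; label -1 marks noise,
# which is where the misjudged boundary points end up.
db_labels = DBSCAN(eps=0.5, min_samples=15).fit_predict(X)

# K-Means and Birch are given the true number of clusters (15).
km_labels = KMeans(n_clusters=15, n_init=10).fit_predict(X)
bi_labels = Birch(n_clusters=15).fit_predict(X)

print("DBSCAN points labeled as noise:", int(np.sum(db_labels == -1)))
```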
Figure 4. Clustering results of each approach on the D31 data set.

Figure 4 shows the clustering results of each approach on the 'D31' data set. The parameters of DBSCAN are very difficult to set on this data set: we tried many parameter combinations and could not find all clusters. With eps = 0.5 and minpts = 4, it can be seen that many clusters are not found by DBSCAN (b), and many boundary points are identified as noise points. As for K-Means (c) and Birch (d), we told them the correct number of clusters; these two algorithms can recognize all clusters on such a regularly shaped, clearly bounded data set, but one cluster is still misclassified in each of their results. With DBNNN (a), not only are all clusters identified, but the accuracy is also high; only a few outlying boundary points are identified as noise points.
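The parameter search described above can be sketched in the same spirit; again this is a hedged illustration rather than the authors' procedure, and the file name is a placeholder.

```python
# Sweep a few (eps, minpts) pairs and count how many clusters DBSCAN
# recovers on D31; the paper reports that no setting it tried
# recovered all 31 clusters.
import numpy as np
from sklearn.cluster import DBSCAN

X = np.loadtxt("D31.txt")  # hypothetical file: one 2-D point per row

for eps in (0.3, 0.5, 0.7, 1.0):
    for minpts in (4, 8, 15):
        labels = DBSCAN(eps=eps, min_samples=minpts).fit_predict(X)
        n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
        n_noise = int(np.sum(labels == -1))
        print(f"eps={eps}, minpts={minpts}: "
              f"{n_clusters} clusters, {n_noise} noise points")
```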
IV. CONCLUSIONS
A parameter-free clustering algorithm based on the natural nearest neighbor (3N) is proposed. First, the DBNNN algorithm adaptively generates a natural eigenvalue NE_k from the natural nearest neighbor structure of the data. With NE_k, the density of each observation point is determined by comparing its number of reverse nearest neighbors with NE_k. The algorithm then generates each cluster by expanding dense points through density expansion. To solve the problem of unreasonable eigenvalues, we propose the concept of similarity between clusters, which is used to merge misdivided clusters. Finally, we determine the attribution of boundary points by the similarity between points. Experiments show that the algorithm performs well on many data sets without any parameters being provided artificially, and that it outperforms some classical algorithms on some of these data sets.
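To make the adaptive step concrete, below is a minimal sketch of one common formulation of the 3N search, under the assumption that NE_k is the neighborhood size at which every point has gained at least one reverse nearest neighbor; the function and variable names are ours, not the paper's.

```python
# A minimal sketch, assuming one common formulation of the natural
# nearest neighbor (3N) search; natural_eigenvalue and reverse_count
# are our names, not the paper's.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def natural_eigenvalue(X):
    """Grow k until every point is the k-nearest neighbor of at least
    one other point; the terminating k is the natural eigenvalue NE_k."""
    n = len(X)
    # neighbor indices sorted by distance; column 0 is the point itself
    _, idx = NearestNeighbors(n_neighbors=n).fit(X).kneighbors(X)
    reverse_count = np.zeros(n, dtype=int)
    k = 0
    while np.any(reverse_count == 0) and k < n - 1:
        k += 1
        # each point reaches out to its k-th nearest neighbor, which
        # thereby gains one reverse nearest neighbor
        for i in range(n):
            reverse_count[idx[i, k]] += 1
    return k, reverse_count

# Density rule described above: points whose reverse-neighbor count
# reaches NE_k are treated as dense and seed the density expansion.
# ne_k, rnn = natural_eigenvalue(X)
# dense = rnn >= ne_k
```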
REFERENCES
[1] MacQueen J. Some Methods for Classification and Analysis of Multivariate Observations[C]// Proc. of the Berkeley Symposium on Mathematical Statistics and Probability, 1965.
[2] Vinod H D. Integer Programming and the Theory of Grouping[J]. Journal of the American Statistical Association, 1969, 64(326): 506-519.
[3] Pelleg D, Moore A W. X-means: Extending K-means with Efficient Estimation of the Number of Clusters[C]// Proc. of the Seventeenth International Conference on Machine Learning, 2000.
[4] Guha S, Rastogi R, Shim K. CURE: An Efficient Clustering Algorithm for Large Databases[J]. Information Systems, 1998, 26(1): 35-58.
[5] Zhang T, Ramakrishnan R, Livny M. BIRCH: An Efficient Data Clustering Method for Very Large Databases[J]. ACM SIGMOD Record, 1996, 25(2): 103-114.
[6] Karypis G, Han E H, Kumar V. Chameleon: Hierarchical Clustering Using Dynamic Modeling[J]. IEEE Computer, 1999, 32(8): 68-75.
[7] Balcan M F, Liang Y, Gupta P. Robust Hierarchical Clustering[J]. arXiv preprint, 2014.
[8] Ester M, Kriegel H P, Sander J, et al. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise[C]// Proc. of the International Conference on Knowledge Discovery and Data Mining, 1996.
[9] Duan L, Xu L, Feng G, et al. A Local-Density Based Spatial Clustering Algorithm with Noise[J]. Information Systems, 2007, 32(7): 978-986.