
Figure 2. Clustering results of each approach on the Compound set.

Figure 3 shows the clustering results of each approach on the 'R15' data set. R15 contains 15 clusters in total, the set is noise-free, and every cluster has a regular shape, so with reasonable parameters all four algorithms can identify the 15 clusters. DBNNN (a), K-Means (c), and Birch (d) perform well and recognize noise with high accuracy, while DBSCAN (b) treats many boundary points as noise points; here we set eps = 0.5, minpts = 15.

Figure 3. Clustering results of each approach on the R15 set.

Figure 4 shows the clustering results of each approach on the 'D31' data set. The parameters of DBSCAN are very difficult to set on this data set: we tried many parameter combinations, and none of them recovered all of the clusters. With eps = 0.5 and minpts = 4, it can be seen that DBSCAN (b) misses many clusters and identifies many boundary points as noise points. For K-Means (c) and Birch (d) we supply the correct number of clusters; both algorithms can recognize all clusters on such a regular-shaped, clearly bounded data set, but each still misclassifies one cluster in its result. With DBNNN (a), not only are all clusters identified, but the accuracy is also high; only a few outlying boundary points are labeled as noise. A sketch of how these baseline comparisons could be reproduced is given after the conclusions.

Figure 4. Clustering results of each approach on the D31 set.

IV. CONCLUSIONS

A non-parametric clustering algorithm based on the natural nearest neighbor (3N) is proposed. First, the DBNNN algorithm adaptively generates a natural eigenvalue NE_k according to the concept of the natural nearest neighbor. With the natural eigenvalue NE_k, the density of each observation point is determined by comparing its number of reverse nearest neighbors with NE_k. Then the algorithm generates each cluster by way of density expansion. To solve the problem of unreasonable eigenvalues, we propose the concept of similarity between clusters, which is used to merge mis-divided clusters. Finally, we determine the attribution of boundary points by the similarity between points. Experiments show that the algorithm performs well on many data sets without parameters being supplied manually, and that it outperforms some classical algorithms on several data sets.
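The pipeline summarized above (natural-neighbor search, reverse-nearest-neighbor density, density expansion) can be illustrated with a short sketch. This is a minimal reconstruction under stated assumptions, not the authors' implementation: we assume the 3N search stops at the first neighborhood size k for which every point has at least one reverse nearest neighbor, that this k is taken as NE_k, and that a point counts as dense when its reverse-neighbor count reaches NE_k. The function names natural_eigenvalue and core_mask are illustrative.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def natural_eigenvalue(X):
    """Grow k until every point is some other point's k-nearest neighbor.

    Returns the natural eigenvalue NE_k and the reverse-nearest-neighbor
    counts at that k. The stopping rule is our reading of the natural
    nearest neighbor (3N) search; the paper's exact criterion may differ.
    """
    n = len(X)
    nn = NearestNeighbors().fit(X)
    rnn_counts = np.zeros(n, dtype=int)
    for k in range(1, n):
        # k nearest neighbors of every point; column 0 is the point itself
        _, idx = nn.kneighbors(X, n_neighbors=k + 1)
        rnn_counts = np.bincount(idx[:, 1:].ravel(), minlength=n)
        if np.all(rnn_counts > 0):   # every point is now "naturally" reached
            return k, rnn_counts
    return n - 1, rnn_counts

def core_mask(X):
    """Mark a point as dense (core) when its number of reverse nearest
    neighbors is at least NE_k, per the comparison described above."""
    ne_k, rnn_counts = natural_eigenvalue(X)
    return ne_k, rnn_counts >= ne_k
```

Density expansion would then grow clusters from these core points in DBSCAN-like fashion, with the cluster-similarity merge and boundary-point assignment applied afterwards, as described above.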
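The baseline side of Figures 3 and 4 uses standard algorithms that ship with scikit-learn; DBNNN itself is the paper's contribution and has no library implementation. The following sketch shows how such a comparison could be set up for R15, using the parameter values quoted in the text; the file name and column layout are assumptions about how the benchmark is stored.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans, Birch

data = np.loadtxt("R15.txt")   # hypothetical path; benchmark files often ship with a label column
X = data[:, :2]                # keep only the 2-D coordinates

labels = {
    # eps / minpts follow the values quoted in the text for R15
    "DBSCAN": DBSCAN(eps=0.5, min_samples=15).fit_predict(X),
    # K-Means and Birch are told the true number of clusters (15 for R15)
    "K-Means": KMeans(n_clusters=15, n_init=10).fit_predict(X),
    "Birch": Birch(n_clusters=15).fit_predict(X),
}
for name, y in labels.items():
    n_clusters = len(set(y)) - (1 if -1 in y else 0)  # DBSCAN marks noise as -1
    print(f"{name}: {n_clusters} clusters, {int(np.sum(y == -1))} noise points")
```

For the D31 comparison the same code applies with n_clusters = 31 and, per the text, eps = 0.5 and min_samples = 4 for DBSCAN.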
REFERENCES
[1] MacQueen J. Some Methods for Classification and Analysis of Multivariate Observations[C]// Proc. of the Berkeley Symposium on Mathematical Statistics and Probability, 1965.
[2] Vinod H D. Integer Programming and the Theory of Grouping[J]. Journal of the American Statistical Association, 1969, 64(326).
[3] Pelleg D, Moore A W. X-means: Extending K-means with Efficient Estimation of the Number of Clusters[C]// Proc. of the Seventeenth International Conference on Machine Learning, 2000.
[4] Guha S, Rastogi R, Shim K. CURE: An Efficient Clustering Algorithm for Large Databases[J]. Information Systems, 1998, 26(1):35-58.
[5] Zhang T, Ramakrishnan R, Livny M. BIRCH: An Efficient Data Clustering Method for Very Large Databases[J]. ACM SIGMOD Record, 1996, 25(2):103-114.
[6] Karypis G, Han E H, Kumar V. Chameleon: Hierarchical Clustering Using Dynamic Modeling[J]. IEEE Computer, 1999, 32(8):68-75.
[7] Balcan M F, Liang Y, Gupta P. Robust Hierarchical Clustering[J]. arXiv preprint, 2014.
[8] Ester M, Kriegel H P, Sander J, Xu X. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise[C]// Proc. of the International Conference on Knowledge Discovery and Data Mining, 1996.
[9] Duan L, Xu L, Guo F, et al. A Local-Density Based Spatial Clustering Algorithm with Noise[J]. Information Systems, 2007, 32(7):978-986.