
number of reverse nearest neighbors is larger than the natural eigenvalue as the core points. We also propose a method of similarity between clusters to correct the misclassification of clusters caused by an inappropriate eigenvalue. Finally, outlier points are judged to be boundary points or noise points according to the similarity of points. The rest of this paper is organized as follows: Section 2 introduces how DBNNN works; experimental results and performance analysis are presented in Section 3; finally, conclusions are given in Section 4.

II. ALGORITHM

A. Natural Nearest Neighbor

Natural nearest neighbor, proposed by Dr. Zou Xianlin et al., is a branch of nearest neighbor algorithms. Among nearest-neighbor-based algorithms, the K-nearest neighbor algorithm is widely used in machine learning and data mining. It requires a parameter K to be given manually, chosen from experience, which selects for each data point the k data points with the closest distance or the highest similarity. In the natural nearest neighbor algorithm there is a comparable value of K, called the natural eigenvalue and written NE_k, but it is self-adaptive and does not need to be given artificially. The specific process is as follows: first we take k = 1 and traverse the data set, obtaining the first nearest neighbor of each data point and the current reverse nearest neighbors of each point; then k = k + 1, the k-nearest neighbors and reverse nearest neighbors of each point are supplemented, and these steps are repeated until every point has at least one reverse nearest neighbor. When traversing the i-th data point, we write the NE_k-nearest neighbors of data[i] as NE_k NN(i), and RNN(i) denotes the reverse neighbors of data[i]. Finally NE_k settles at one value. At this time, we can see that the denser the area, the more reverse neighbors each point has, and conversely, the fewer; the most outlying point will have only one point as a reverse neighbor. The algorithm is briefly described as follows:

Algorithm 2.1 natural nearest neighbor
Input: DATASET
1) k = 1
2) Flag = 1
3) RNN = ∅
4) While (Flag):
5)   For i in data:
6)     ik = k_nearest(i, k)
7)     RNN(ik) = RNN(ik) + i
8)     NE_k NN(i) = NE_k NN(i) + ik
9)   If (RNN(i) ≠ ∅ for all i in data[1] ~ data[N]):
10)    Flag = 0
11)    NE_k = k
12)  Else:
13)    Flag = 1
14)    k = k + 1
15) Return NE_k, NE_k NN, RNN

In Algorithm 2.1, ik represents the No. k nearest neighbor of point i, and RNN(i) is a subset of the data set containing all points that regard point i as one of their neighbors, commonly known as the reverse neighbors of point i. Flag = 1 by default; when every observation point in the data set has at least one reverse neighbor, Flag is set to 0 and the algorithm terminates. Algorithm 2.1 finally returns NE_k, NE_k NN and RNN.

B. Expand Cluster

After the eigenvalue has been generated automatically, the algorithm uses a density-based method to distinguish different clusters. The number of reverse neighbors of an observation point is used as its density: the more reverse neighbors, the greater the density. We regard observation points whose number of reverse nearest neighbors is larger than the eigenvalue as core observation points, and those whose number is smaller than the eigenvalue as noise points for the time being.

Random traversal starts from a data point x. If x is a core point, that is, the number of reverse neighbors of x is greater than the natural eigenvalue NE_k, it is allocated to one of the clusters, the core points among its neighbors are placed in a queue, and at the same time they are allocated to the same cluster as x. Until the queue becomes empty, we remove the first point of the queue in turn; if it is also a core point, we assign its NE_k-nearest neighbors to the same cluster as well. A brief description of the expand-cluster algorithm follows:

Algorithm 2.2 expand cluster
Input: D, and NE_k, NE_k NN and RNN from Algorithm 2.1
1) assign[x ∈ D] = 0
2) cluster = 1
3) For x in D:
4)   If count(RNN(x)) < NE_k:
5)     assign[x] = 0
6)   Else:
7)     assign[x] = cluster
8)     queue = NE_k NN(x)
9)     assign[queue] = cluster
10)    While (queue ≠ ∅):
11)      y = queue.popleft()
12)      If count(RNN(y)) ≥ NE_k:
13)        For i in NE_k NN(y):
14)          If (assign[i] == 0):
15)            assign[i] = cluster
16)    cluster = cluster + 1
17) Return assign, cluster
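To make the two procedures concrete, the following is a minimal Python sketch of Algorithm 2.1 and Algorithm 2.2. It is an illustration rather than the authors' implementation: the function names are ours, points are assumed to be rows of a NumPy array compared with plain Euclidean distance, already-assigned points are skipped when a new cluster is started, and newly assigned neighbors of core points are put back on the queue so that expansion continues transitively (a reading of the description above rather than an explicit step of the pseudocode).

from collections import deque

import numpy as np


def natural_neighbor_search(data):
    # Algorithm 2.1: enlarge k until every point has at least one reverse neighbor.
    # data is an (N, d) NumPy array; Euclidean distance is an assumption of this sketch.
    n = len(data)
    dists = np.linalg.norm(data[:, None, :] - data[None, :, :], axis=2)
    order = np.argsort(dists, axis=1)          # order[i, 0] is point i itself
    knn = [[] for _ in range(n)]               # NE_k NN(i), accumulated round by round
    rnn = [set() for _ in range(n)]            # RNN(i), the reverse neighbors of i
    k = 0
    while True:
        k += 1                                 # add the No. k nearest neighbor of every point
        for i in range(n):
            ik = order[i, k]
            knn[i].append(ik)
            rnn[ik].add(i)
        if all(rnn[i] for i in range(n)):      # every point has a reverse neighbor: stop
            return k, knn, rnn                 # k is the natural eigenvalue NE_k


def expand_clusters(ne_k, knn, rnn):
    # Algorithm 2.2: points with at least NE_k reverse neighbors are core points,
    # and clusters are grown from them through their NE_k-nearest neighbors.
    n = len(knn)
    assign = [0] * n                           # 0 means noise / not yet assigned
    cluster = 0
    for x in range(n):
        if assign[x] != 0 or len(rnn[x]) < ne_k:
            continue                           # skip assigned points and tentative noise
        cluster += 1
        assign[x] = cluster
        queue = deque(knn[x])                  # neighbors of x seed the expansion
        for y in knn[x]:
            if assign[y] == 0:
                assign[y] = cluster
        while queue:
            y = queue.popleft()
            if len(rnn[y]) >= ne_k:            # y is also a core point: keep expanding
                for i in knn[y]:
                    if assign[i] == 0:
                        assign[i] = cluster
                        queue.append(i)        # assumption: expansion continues from i
    return assign, cluster

On a small data set X, the pipeline up to this point would be ne_k, knn, rnn = natural_neighbor_search(X) followed by labels, n_clusters = expand_clusters(ne_k, knn, rnn); the similarity-based correction of clusters and the treatment of boundary points described below are not part of this sketch.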
C. Similarity of clusters

The generation of the eigenvalue depends heavily on how regular the data set is: if the data set contains many outliers, the generated eigenvalue will be particularly large; if the data are very neat and the density very uniform, the eigenvalue will be very small. In density-based clustering, if the eigenvalue is too large, many different clusters may be allocated together; if it is too small, one cluster will be divided into several clusters. In this chapter,