We regard the points whose number of reverse nearest neighbors is larger than the natural eigenvalue as the core points. We also propose a method of measuring the similarity between clusters, in order to correct the misclassification of clusters caused by an inappropriate eigenvalue. Finally, the outlier points are judged to be boundary points or noise points according to the similarity of points.
The rest of this paper is organized as follows: In Section 2, we introduce how DBNNN works. Experimental results and performance analysis are presented in Section 3. Finally, some conclusions are given in Section 4.
II. ALGORITHM
A. Natural Nearest Neighbor
Natural nearest neighbor, proposed by Dr. Zou Xianlin et al., is a branch of nearest neighbor algorithms. Among nearest-neighbor-based algorithms, the K-nearest neighbor algorithm is widely used in machine learning and data mining. It requires a parameter K, given artificially and selected according to experience, which specifies for each data point the k data points with the closest distance or the highest similarity. In the natural nearest neighbor algorithm there is a comparable K value, named the natural eigenvalue and denoted NE_k, but it is self-adaptive and does not need to be given artificially. The specific process is as follows: first, we take NE_k = 1 and traverse the data set, obtaining the first nearest neighbor of each data point and the current reverse nearest neighbors of each point; then we let NE_k = NE_k + 1, supplement the k-nearest neighbors and reverse nearest neighbors of each point, and repeat the above steps until every point has at least one reverse nearest neighbor. For the i-th data point, we denote the k-nearest neighbors of data[i] by NE_kNN(i) and the reverse neighbors of data[i] by RNN(i). Finally, NE_k settles at one value. At this point we can observe that the denser the area, the more reverse neighbors each point has, and conversely the fewer; the most outlying point will have only one point as its reverse neighbor.
The algorithm can be briefly described as follows:
Algorithm 2.1 natural nearest neighbor
Input: DATASET
1)  k = 1
2)  Flag = 1
3)  RNN = ∅
4)  While(Flag):
5)    For i in data:
6)      ik = k_nearest(i, k)
7)      RNN(ik) = RNN(ik) + i
8)      NE_kNN(i) = NE_kNN(i) + ik
9)    If(RNN(i) != ∅ for all i in data[1] ~ data[N]):
10)     Flag = 0
11)     NE_k = k
12)   Else:
13)     Flag = 1
14)     k = k + 1
15) Return NE_k, NE_kNN, RNN
In Algorithm 2.1, ik represents the k-th nearest neighbor of point i, and RNN(i) is the subset of the data set containing all points that regard point i as one of their neighbors, commonly known as the reverse neighbors of point i. Flag = 1 by default; when every observation point in the data set has at least one reverse neighbor, Flag is set to 0 and the algorithm terminates. Algorithm 2.1 finally returns NE_k, NE_kNN and RNN.
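As a concrete illustration, the following is a minimal Python sketch of the natural nearest neighbor search in Algorithm 2.1. The function name natural_nearest_neighbor and the use of scikit-learn's NearestNeighbors to obtain the k-th neighbor of each point are our own illustrative choices, not part of the original algorithm description.

# Minimal sketch of Algorithm 2.1 (natural nearest neighbor search).
# The helper names and the use of scikit-learn are illustrative assumptions.
from sklearn.neighbors import NearestNeighbors

def natural_nearest_neighbor(data):
    n = len(data)
    # Precompute, for every point, all points ordered by distance (index 0 is the point itself).
    order = NearestNeighbors(n_neighbors=n).fit(data).kneighbors(data)[1]
    ne_k_nn = [[] for _ in range(n)]   # NE_kNN(i): accumulated k-nearest neighbors of i
    rnn = [set() for _ in range(n)]    # RNN(i): reverse nearest neighbors of i
    k = 0
    while True:
        k += 1
        for i in range(n):
            ik = order[i][k]           # the k-th nearest neighbor of point i
            ne_k_nn[i].append(ik)
            rnn[ik].add(i)
        # Terminate once every point has at least one reverse neighbor.
        if all(len(r) > 0 for r in rnn):
            return k, ne_k_nn, rnn     # k is the natural eigenvalue NE_k

On a data set X, natural_nearest_neighbor(X) returns the adaptive eigenvalue together with the neighbor and reverse-neighbor structures consumed by the expand cluster step below.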
B. Expand Cluster
After automatically generating the eigenvalue, the algorithm uses a density-based method to distinguish different clusters. The number of reverse neighbors of an observation point is used as its density: the more reverse neighbors, the greater the density of the observation point. We regard observation points whose number of reverse nearest neighbors is larger than the eigenvalue as core observation points, and those whose number is smaller than the eigenvalue as noise points for the time being.
A random traversal starts from a data point x. If x is a core point, that is, its number of reverse neighbors is greater than the natural eigenvalue NE_k, it is allocated to a cluster, and the core points among its neighbors are placed in a queue; at the same time, they are allocated to the same cluster as x. Until the queue becomes empty, we remove the first point of the queue in turn; if it is also a core point, we assign its NE_k nearest neighbors to the same cluster. Here is a brief description of the expand cluster algorithm:
Algorithm 2.2 expand cluster
Input: D, NE_k, NE_kNN and RNN from Algorithm 2.1
1)  assign[x ∈ D] = 0
2)  cluster = 1
3)  For x in D:
4)    If count(RNN(x)) < NE_k:
5)      assign[x] = 0
6)    Else:
7)      assign[x] = cluster
8)      queue = NE_kNN(x)
9)      assign[queue] = cluster
10)     While(queue != ∅):
11)       y = queue.popleft()
12)       If count(RNN(y)) >= NE_k:
13)         For i in NE_kNN(y):
14)           If assign[i] == 0:
15)             assign[i] = cluster
16)     cluster = cluster + 1
17) Return assign, cluster
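For illustration, here is a minimal Python sketch of the cluster expansion described above, chained to the natural_nearest_neighbor sketch given earlier. The function name expand_cluster and the re-queueing of newly assigned neighbors reflect our breadth-first reading of the expansion step and are illustrative assumptions rather than a literal transcription of the pseudocode.

# Minimal sketch of Algorithm 2.2 (expand cluster); names and the
# breadth-first re-queueing are illustrative assumptions.
from collections import deque

def expand_cluster(n, ne_k, ne_k_nn, rnn):
    assign = [0] * n                      # 0: unassigned (tentative noise)
    cluster = 0
    for x in range(n):
        if assign[x] != 0 or len(rnn[x]) < ne_k:
            continue                      # skip assigned points and non-core points
        cluster += 1                      # start a new cluster from the core point x
        assign[x] = cluster
        queue = deque(ne_k_nn[x])
        for i in queue:
            if assign[i] == 0:
                assign[i] = cluster       # x's neighbors join the same cluster
        while queue:
            y = queue.popleft()
            if len(rnn[y]) >= ne_k:       # y is also a core point
                for i in ne_k_nn[y]:
                    if assign[i] == 0:
                        assign[i] = cluster
                        queue.append(i)   # keep expanding from newly assigned points
    return assign, cluster

A typical call chain on a data set X would be ne_k, ne_k_nn, rnn = natural_nearest_neighbor(X) followed by labels, n_clusters = expand_cluster(len(X), ne_k, ne_k_nn, rnn); points whose label stays 0 are the outliers later judged as boundary or noise points by the similarity of points.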
C. Similarity of Clusters
The generation of the eigenvalue depends heavily on how regular the data set is: if the data set contains many outliers, the generated eigenvalue will be particularly large, whereas if the data is very neat and the density very uniform, the eigenvalue will be very small. In density-based clustering, if the eigenvalue is too large, many different clusters may be merged together; if it is too small, one cluster will be divided into several clusters. In this chapter,