The location of the Darknet-53 network can be seen in the overall structure diagram, and the structure of the Darknet-53 network itself is shown in Figure 7.
The characteristic of this network is that it uses successive 3*3 and 1*1 convolution layers instead of pooling layers and fully connected layers. During computation, the network performs the tensor size transformation by changing the stride of the convolution kernel. In Figure 7 there are five resn layers, each of which contains a convolution layer with a stride of (2,2). This means that every time the data passes through such a convolution layer, the edge length of the image is halved, that is, the area is reduced to one quarter of the original. After five such layers, the feature map is reduced to 1/32 of the original input size: an input of 416*416 gives an output of 13*13. Taking a 416*416 input as an example, after the data flows through the whole Darknet-53 network, the output $y_1$ is 13*13*n, $y_2$ is 26*26*n, and $y_3$ is 52*52*n. In fact, the number of Darknet layers could be reduced to improve the detection speed of the network, but Darknet-53 can already run detection at about 30 frames per second, so Darknet-53 is used as the feature extraction network.
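To make the downsampling arithmetic concrete, here is a minimal sketch (assuming PyTorch; the channel widths and the omission of the residual connections are simplifications, not the actual Darknet-53 configuration):

```python
import torch
import torch.nn as nn

# Five 3x3 convolutions with stride (2, 2): each halves the edge length
# of the feature map (quartering its area), so 416 -> 208 -> 104 -> 52 -> 26 -> 13.
downsample = nn.Sequential(*[
    nn.Conv2d(3 if i == 0 else 32, 32, kernel_size=3, stride=2, padding=1)
    for i in range(5)
])

x = torch.randn(1, 3, 416, 416)   # one 416*416 RGB input
print(downsample(x).shape)        # torch.Size([1, 32, 13, 13]) -- 416/32 = 13
```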
III. EXPERIMENTAL RESULTS AND ANALYSIS

In this paper, bilinear interpolation and a Gaussian low-pass filter are used as degradation functions. Images of different quality are obtained by applying different degrees of degradation: the training data has a resolution of 256*256 pixels, which is reduced to 181*181, 128*128 and 86*86, respectively, so that the algorithm can detect ships accurately in a variety of situations. Transfer learning is used to retrain the weights of the original model. The computer used in this paper has an Intel Core i7 7700 CPU (4 cores, 8 threads, clocked above 4 GHz), dual-channel 16 GB DDR4-2400 memory, and an NVIDIA GTX 1070 Ti graphics card. Because only the dimension of the last layer has changed, the model is trained by transfer learning: all layers before the last layer are frozen, and only the weights of the last layer are updated. Because of memory limitations, the batch size is set to 8. Adam is used as the optimizer, with an initial learning rate of 1e-4, a first-order moment exponential decay rate of 0.9, and a second-order moment exponential decay rate of 0.999.
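A minimal sketch of this training setup follows (assuming PyTorch and Pillow; the blur radius and the `last_layer` attribute name are illustrative assumptions, not details taken from the paper):

```python
import torch
from torch.utils.data import DataLoader
from PIL import Image, ImageFilter

def degrade(img: Image.Image, size: int) -> Image.Image:
    """Degrade an image with a Gaussian low-pass filter followed by
    bilinear resampling, e.g. 256*256 -> 181*181, 128*128 or 86*86.
    The blur radius is an assumption, not a value from the paper."""
    blurred = img.filter(ImageFilter.GaussianBlur(radius=1.5))
    return blurred.resize((size, size), resample=Image.BILINEAR)

# `model` and `dataset` are assumed to be defined elsewhere;
# `last_layer` is a hypothetical name for the single retrained layer.
for param in model.parameters():
    param.requires_grad = False            # freeze everything ...
for param in model.last_layer.parameters():
    param.requires_grad = True             # ... except the last layer

# Adam with the stated hyperparameters: lr 1e-4, betas (0.9, 0.999);
# batch size 8 because of the GPU memory limit.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-4, betas=(0.9, 0.999),
)
loader = DataLoader(dataset, batch_size=8, shuffle=True)
```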
The YOLO V3 model outputs three feature maps of different scales, namely $y_1$, $y_2$ and $y_3$ in Figure 6. In the original algorithm, each grid cell predicts three boxes, and each box has five basic parameters: the location information $(x, y)$, the width and height of the selected box $(w, h)$, and the confidence of recognition, followed by the class probabilities. Because this paper only deals with ship images, the number of classes $C$ in this algorithm is 0. The depth of the output can be obtained from the following formula:
$$\text{depth} = B \times (x + y + w + h + \text{confidence} + C) = 3 \times (5 + 0) = 15$$
So the depth of $y_1$, $y_2$ and $y_3$ in Figure 6 is 15.
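Written out as a quick arithmetic check (the variable names are ours, not the paper's):

```python
B = 3   # candidate boxes predicted per grid cell
C = 0   # number of classes: ships are only located, not classified
depth = B * (4 + 1 + C)   # (x, y, w, h) + confidence, plus class scores
assert depth == 15
```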
In the loss function of the YOLO V3 model, six pieces of information need attention, namely the $(x, y, w, h, \text{confidence})$ mentioned above and the category of the object, $C$. In the super-resolution task for naval vessels, only one kind of target is identified, so in order to simplify the algorithm and improve its running speed, this algorithm uses only the first five terms of the loss function, which gives the following formula:
$$\begin{aligned} \text{loss} = {} & \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} 1_{ij}^{obj} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\ & + \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} 1_{ij}^{obj} \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\ & + \lambda_{obj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} 1_{ij}^{obj} \left(C_i - \hat{C}_i\right)^2 + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} 1_{ij}^{noobj} \left(C_i - \hat{C}_i\right)^2 \end{aligned} \tag{1}$$

Here $\lambda_{coord}$ is the weight of the position error, $\lambda_{noobj}$ is the confidence weight of a candidate box without an object, and $\lambda_{obj}$ is the confidence weight of a candidate box with an object; $1_{ij}^{obj}$ judges whether the $j$-th candidate box in the $i$-th grid contains an object, and $1_i^{obj}$ judges whether the center of an object falls in grid $i$; $(x_i, y_i)$ are the actual coordinates and $(\hat{x}_i, \hat{y}_i)$ are the predicted coordinates; $(w_i, h_i)$ are the width and height of the actual candidate box and $(\hat{w}_i, \hat{h}_i)$ are the width and height of the predicted candidate box; $C_i$ is the actual confidence and $\hat{C}_i$ is the predicted confidence. The loss curve during training is as follows:
Figure 8 Loss Value Curve
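For concreteness, a hedged sketch of this five-term loss follows (assuming PyTorch; the tensor layout, the mask convention, and the default lambda weights are assumptions, not values taken from this paper):

```python
import torch

def yolo_ship_loss(pred, target, obj_mask, noobj_mask,
                   lambda_coord=5.0, lambda_obj=1.0, lambda_noobj=0.5):
    """Five-term YOLO loss with the classification term dropped.
    pred / target: (..., 5) tensors holding (x, y, w, h, confidence);
    obj_mask / noobj_mask: boolean masks selecting candidate boxes
    with / without objects. The default lambda weights are conventional
    YOLO values, not values reported by this paper."""
    x, y, w, h, c = pred.unbind(-1)
    tx, ty, tw, th, tc = target.unbind(-1)

    coord = ((x - tx) ** 2 + (y - ty) ** 2)[obj_mask].sum()
    size = ((w.sqrt() - tw.sqrt()) ** 2 + (h.sqrt() - th.sqrt()) ** 2)[obj_mask].sum()
    conf_obj = ((c - tc) ** 2)[obj_mask].sum()
    conf_noobj = ((c - tc) ** 2)[noobj_mask].sum()

    return (lambda_coord * (coord + size)
            + lambda_obj * conf_obj
            + lambda_noobj * conf_noobj)
```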
In order to test the effectiveness of training, 200 ship images with a resolution of 256*256 pixels are used as test data to check whether the original YOLO and the retrained YOLO can detect the ships. Because the algorithm only marks the position of the ship, this paper uses the True Positive Rate (TPR) as the detection criterion for ship images in different scenarios. The TPR formula is as follows:

$$\text{TPR} = \frac{TP}{TP + FN} \tag{2}$$

where $TP$ is the number of ships correctly detected and $FN$ is the number of ships missed. The results are as follows:
TABLE 1 TEST RESULTS OF YOLO IN DIFFERENT DEGRADED IMAGES

Algorithm          Resolution   Detection rate
Original YOLO      256*256      100%
Original YOLO      181*181      95.5%
Original YOLO      128*128      92.5%
Original YOLO      86*86        85.5%
Retraining YOLO    256*256      99%
Retraining YOLO    181*181      97%
Retraining YOLO    128*128      97.5%
Retraining YOLO    86*86        97%
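As an illustration of the test protocol behind Table 1, a minimal sketch follows (`detect_ship` is a hypothetical stand-in for running the detector on one image and checking the marked position):

```python
def true_positive_rate(images, detect_ship):
    """Detection rate over a set of ship images: every image contains a
    ship, so a miss is a false negative and TPR = TP / (TP + FN)."""
    tp = sum(1 for img in images if detect_ship(img))
    fn = len(images) - tp
    return tp / (tp + fn)

# e.g. 200 test images per degraded resolution:
# rate = true_positive_rate(test_images_128, detect_ship)
```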