International Core Journal of Engineering 2020-26 | Page 89
feature map to reduce its dimension to 1024. The reduced
feature map is then fed into the RPN, which generates N
candidate regions. At the same time, the reduced feature map
passes through a dedicated convolution that produces a
position-sensitive score map with (C+1)k^2 channels, where C
is the number of road marking categories (the +1 accounts for
the background class) and k^2 is the number of bins (a k×k
grid) in the score map obtained for each category. Finally,
position-sensitive ROI pooling combines the candidate regions
generated by the RPN with the score maps to obtain a
k^2-dimensional vector for each category. Here, we use average
pooling to vote over this vector and obtain the confidence of
each category. The marking type with the highest confidence
is taken as the detection result.
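The pooling-and-voting step can be illustrated with a minimal NumPy sketch. It assumes a single RoI given in feature-map coordinates, a channel layout in which channel (i·k + j)·(C+1) + c holds the score of class c for bin (i, j), and simple floor/ceil bin boundaries. The softmax over the averaged votes follows the original R-FCN design. The function name `ps_roi_vote` and the channel ordering are illustrative assumptions, not this paper's implementation.

```python
import numpy as np


def ps_roi_vote(score_maps, roi, k, num_classes):
    """Position-sensitive RoI pooling followed by average voting (sketch).

    score_maps: array of shape ((C+1)*k*k, H, W); assumed channel layout is
                (bin_row * k + bin_col) * (C+1) + class.
    roi:        (x0, y0, x1, y1) in feature-map coordinates, inside the map.
    Returns softmax confidences of length C+1 (index 0 = background).
    """
    n_cls = num_classes + 1                      # C categories + background
    if score_maps.shape[0] != k * k * n_cls:
        raise ValueError("expected (C+1)*k^2 channels")
    x0, y0, x1, y1 = roi
    bin_w = (x1 - x0) / k
    bin_h = (y1 - y0) / k
    votes = np.zeros((n_cls, k, k))
    for i in range(k):                           # bin row
        ys = int(np.floor(y0 + i * bin_h))
        ye = max(int(np.ceil(y0 + (i + 1) * bin_h)), ys + 1)
        for j in range(k):                       # bin column
            xs = int(np.floor(x0 + j * bin_w))
            xe = max(int(np.ceil(x0 + (j + 1) * bin_w)), xs + 1)
            for c in range(n_cls):
                # each (bin, class) pair reads only its own dedicated channel
                ch = (i * k + j) * n_cls + c
                votes[c, i, j] = score_maps[ch, ys:ye, xs:xe].mean()
    scores = votes.mean(axis=(1, 2))             # average vote over the k*k bins
    exp = np.exp(scores - scores.max())          # numerically stable softmax
    return exp / exp.sum()


# Usage: k = 3, C = 2 marking categories, random 12x12 score maps.
rng = np.random.default_rng(0)
maps = rng.random((3 * 3 * (2 + 1), 12, 12))
probs = ps_roi_vote(maps, (1.0, 2.0, 10.0, 11.0), k=3, num_classes=2)
label = int(np.argmax(probs))                    # highest-confidence type
```

Because every bin reads only its own channel, the pooled vector encodes where in the RoI each part of a marking was detected. The averaging vote then collapses this vector into one score per category.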
at the stage of image processing and shallow neural networks,
and their accuracy and performance are low. Anti-interference
capability, real-time performance, universality and robustness
are the criteria for evaluating detection algorithms. It is
therefore necessary to conduct in-depth research on road
marking detection algorithms based on deep learning.
At present, deep-learning-based object detection algorithms
are mainly divided into single-stage and two-stage detection
algorithms. In Section III, we describe in detail how
two-stage and one-stage object detection methods are
implemented to detect road markings. Experimental results are
compared and analyzed in Section IV, and conclusions are drawn
in Section V.
III. EXPERIMENTAL FRAMEWORK
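During training, the loss value, including its cross-entropy
term, must be calculated when the parameters are updated by
backpropagation. As in Faster RCNN, object detection is
divided into two parts: localization and classification, and
the R-FCN loss function is likewise divided into a
classification term and a localization term. Here
$L_{cls}(s_{c^*}) = -\log s_{c^*}$ is the cross-entropy loss of
the classification, where $c^*$ is the ground-truth label of
the RoI ($c^* = 0$ denotes background), and $L_{reg}(t, t^*)$
is the regression loss of the target location between the
predicted box $t$ and the ground-truth box $t^*$. The weight
$\lambda$ balances the two terms, and the indicator
$[c^* > 0]$ applies the regression loss only to foreground
RoIs. We define the loss as: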
A. Two-stage road marking detection
In 2014, Ross Girshick et al. proposed the RCNN model [2],
which applied convolutional neural networks to object
detection for the first time and achieved strong results. Its
structure also became a classic model for object detection.
Fast RCNN, Faster RCNN and R-FCN followed, all of which are
two-stage detection algorithms. Because R-FCN replaces the
fully connected network after the ROI pooling layer with
position-sensitive score maps produced by a fully
convolutional network, computation is shared across the whole
network, which effectively resolves the conflict between
translation invariance in object classification and
translation variance in object detection. The earlier Fast
RCNN and Faster RCNN are time-consuming because they evaluate
a fully connected subnetwork separately for every ROI.
Therefore, R-FCN is used as one of the models for road
marking detection. Since ResNet18 performs similarly to
ResNet50 while running faster, the 18-layer residual network
is used as the backbone for training.
$$L(s, t_{x,y,w,h}) = L_{cls}(s_{c^*}) + \lambda [c^* > 0] \, L_{reg}(t, t^*) \qquad (1)$$
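Eq. (1) can be evaluated for a single RoI as in the following sketch. It assumes that $L_{reg}$ is the smooth L1 loss summed over the four box offsets, as in Fast RCNN (on which R-FCN builds), and that $\lambda = 1$ by default. The function names are hypothetical, and the inputs are plain Python lists for clarity.

```python
import math


def smooth_l1(x):
    """Smooth L1: 0.5*x^2 when |x| < 1, otherwise |x| - 0.5."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def rfcn_roi_loss(probs, gt_label, t, t_star, lam=1.0):
    """Loss of Eq. (1) for one RoI (sketch).

    probs:     class probabilities s, index 0 = background
    gt_label:  c*, the ground-truth label (0 means background)
    t, t_star: predicted and target offsets (tx, ty, tw, th)
    lam:       balance weight lambda (assumed default 1.0)
    """
    l_cls = -math.log(probs[gt_label])           # cross-entropy L_cls(s_{c*})
    if gt_label > 0:                             # indicator [c* > 0]
        l_reg = sum(smooth_l1(a - b) for a, b in zip(t, t_star))
    else:
        l_reg = 0.0                              # background RoIs: no box loss
    return l_cls + lam * l_reg


# A foreground RoI of class 1 with a slightly misplaced box.
loss = rfcn_roi_loss([0.1, 0.7, 0.2], 1, (0.1, 0.2, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0))
```

For a background RoI, the indicator zeroes the regression term, so only the classification cross-entropy contributes, however far the predicted box is from any target.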
In the R-FCN model training process, the RPN and R-FCN
networks are trained alternately so that they share features.
The two models are alternately trained twice to obtain the
final model, giving a four-stage training scheme: RPN
training, R-FCN training, RPN training, R-FCN training. Online
hard example mining (OHEM) is also applied in the forward pass
during training: the obtained ROIs are sorted in descending
order of their cross-entropy loss, and only the B ROIs with
the largest loss values are returned for backpropagation. The
visualization results are shown in Fig. 1.
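The OHEM selection step reduces to ranking the per-RoI losses from the forward pass and keeping the top B. The sketch below assumes the losses are already computed as a list of floats and that the selected losses are averaged for the backward pass. The function names are illustrative only.

```python
def ohem_select(roi_losses, b):
    """Indices of the b RoIs with the largest loss, in descending loss order."""
    ranked = sorted(range(len(roi_losses)), key=lambda i: roi_losses[i], reverse=True)
    return ranked[:b]


def ohem_batch_loss(roi_losses, b):
    """Mean loss over the hard examples that would be backpropagated (sketch)."""
    keep = ohem_select(roi_losses, b)
    if not keep:                                 # b == 0 or no RoIs at all
        return 0.0, keep
    return sum(roi_losses[i] for i in keep) / len(keep), keep


# Five RoIs from one forward pass, keep the B = 2 hardest.
mean_loss, keep = ohem_batch_loss([0.2, 1.5, 0.05, 0.9, 0.3], 2)
```

Since easy RoIs are simply dropped from the backward pass, the gradient concentrates on the markings the model currently misclassifies or mislocalizes.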
In model training, features are first extracted from the
input image by the ResNet backbone network, yielding a feature
map with a dimension of 208. A 1024-dimensional convolution
layer is added to the
Fig. 1. Visual test result.
method of multi-level convolutional feature extraction in the
SSD algorithm extracts features of small targets poorly, so
its detection performance on them is unsatisfactory. In actual
road marking detection scenes, markings that appear in the
distance are small targets. To address SSD's insufficient
feature extraction, Zuoxin Li et al. proposed the improved
FSSD model in 2018 [17], which improves the accuracy of SSD
while maintaining high speed, so this paper also uses FSSD as
one of the models for road marking detection. Single-stage
detection often suffers from class imbalance among samples.
In 2017, Tsung-Yi Lin et al. presented Focal Loss [16], which
can improve model accuracy in this setting. Therefore, this
paper also uses an SSD modified with Focal Loss as one of the
models.
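Focal Loss [16] multiplies the cross-entropy by a modulating factor, giving $FL(p_t) = -\alpha_t (1 - p_t)^\gamma \log p_t$, so that well-classified examples, which dominate when classes are imbalanced, contribute little. The sketch below applies this to the probability of the true class. It uses a single scalar $\alpha$ and the values $\gamma = 2$, $\alpha = 0.25$ reported by Lin et al. as assumed defaults. How this paper integrates the loss into SSD is not described here, so the functions are hypothetical.

```python
import math


def cross_entropy(probs, target):
    """Standard cross-entropy for the true class."""
    return -math.log(probs[target])


def focal_loss(probs, target, gamma=2.0, alpha=0.25):
    """Focal loss for one sample (sketch, scalar alpha assumed).

    probs:  predicted class probabilities
    target: index of the true class
    """
    p_t = probs[target]
    # (1 - p_t)^gamma shrinks the loss of confident (easy) predictions
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


easy = focal_loss([0.1, 0.9], 1)   # confident, correct prediction
hard = focal_loss([0.9, 0.1], 1)   # badly misclassified sample
```

With $\gamma = 0$ and $\alpha = 1$ the expression reduces to ordinary cross-entropy. With $\gamma = 2$, the easy sample's loss drops to 0.25% of its cross-entropy, while the hard sample keeps about 20% of its cross-entropy.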
B. One-stage road marking detection
Two-stage detection networks of the RCNN family, such as
Faster RCNN, rely on an RPN structure. This improves detection
accuracy, but the algorithms are slow and cannot meet the
real-time requirements of some scenarios. In response to this
problem, researchers have proposed regression-based
single-stage detection algorithms, such as YOLO and SSD, which
aim to ensure both accuracy and speed. In 2015, Joseph Redmon
et al. proposed the end-to-end YOLO algorithm [7], which runs
at 45 frames per second but has errors in recall and
localization accuracy. In 2016, Wei Liu et al. proposed the
SSD algorithm [6], which addresses the localization accuracy
problem of YOLO by combining the regression idea of YOLO with
the anchor mechanism of Faster RCNN, improving accuracy while
retaining YOLO's speed. Therefore, this paper uses the SSD
algorithm as one of the models for road marking detection. The
1) SSD-based road marking detection
SSD uses VGG-16 as its backbone network. Specifically,
building on VGG-16, the last two fully connected layers