International Core Journal of Engineering 2020-26 | Page 90
value of p is very small, so the modulation factor is close
to 1 and the loss is essentially unchanged from the original
loss. For samples that are easy to classify and classified
correctly, p is close to 1 and the modulation factor is close
to zero, so they contribute very little to the total loss. When
γ is 0, focal loss reduces to the traditional cross entropy; as
γ increases, the effect of the modulation coefficient increases
accordingly. Focal loss therefore makes it easy to control the
relative influence of hard-to-classify and easily classified
samples on the total loss.
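The behaviour described above can be checked numerically. The following is a minimal sketch, assuming the focal loss has the form FL(p) = -(1 - p)^γ log(p) with p the predicted probability of the true class; the value γ = 2 and the function names are illustrative assumptions, not settings reported in this paper.

```python
import math

def cross_entropy(p):
    """Cross entropy for a sample whose true class has predicted probability p."""
    return -math.log(p)

def focal_loss(p, gamma=2.0):
    """Focal loss -(1 - p)^gamma * log(p); gamma=2.0 is an assumed default."""
    return -((1.0 - p) ** gamma) * math.log(p)

# Compare a hard sample (small p) with easy samples (p close to 1).
for p in (0.1, 0.5, 0.9, 0.99):
    factor = (1.0 - p) ** 2.0
    print(f"p={p:.2f}  CE={cross_entropy(p):.4f}  "
          f"FL={focal_loss(p):.4f}  factor={factor:.4f}")
```

For the hard sample the modulation factor stays near 1, so the loss is almost the plain cross entropy, while for the easy samples it shrinks the loss by orders of magnitude.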
become convolutional layers and four convolutional layers are
added. In this paper, the weight parameters of the VGG-16
are pre-trained on the ILSVRC database, and the SSD-300
model is used, so the input image size is 300*300. The SSD
model takes the outputs of several convolutional layers of the
VGG architecture to obtain feature maps at different scales.
Here, the six feature maps of conv4_3, conv_7, conv6_2,
conv7_2, conv8_2, and conv9_2 are selected. Their sizes are
(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), and (1, 1)
respectively, and together they form a feature pyramid. On each
of these layers, a 3*3 convolution kernel is used for
prediction. Each feature map generates a series of default
boxes. For each default box, confidences for 7 categories are
predicted for classification, and the coordinate values
(x, y, w, h) are predicted by regression. The total number of
default boxes generated over all layers of the feature maps is
38*38*4 + 19*19*6 + 10*10*6 + 5*5*6 + 3*3*4 + 1*1*4 = 8732.
Finally, the predictions from all layers are combined and
passed to the loss calculation.
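The default box count can be reproduced with a few lines of arithmetic. This sketch uses only the feature map sizes and boxes-per-cell values quoted in the text; the variable and function names are illustrative.

```python
# (feature map side, default boxes per cell) for the six SSD-300 layers,
# taken from the sizes and the sum quoted in the text.
FEATURE_MAPS = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]

NUM_CLASSES = 7   # category confidences per default box (from the text)
NUM_COORDS = 4    # regression targets (x, y, w, h) per default box

def count_default_boxes(feature_maps):
    """Sum side * side * boxes_per_cell over all feature maps."""
    return sum(side * side * boxes for side, boxes in feature_maps)

per_layer = [side * side * boxes for side, boxes in FEATURE_MAPS]
total = count_default_boxes(FEATURE_MAPS)
outputs = total * (NUM_CLASSES + NUM_COORDS)

print(per_layer)  # boxes contributed by each layer
print(total)      # total number of default boxes
print(outputs)    # total predicted values per image
```

The first layer alone contributes well over half of the boxes, which is why the small-object layer dominates the number of candidates.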
3) FSSD-based road marking detection
The FSSD model is initialized with the weight parameters of
the VGG-16 pre-trained on the ILSVRC database and trained
as the FSSD-300 model. Unlike the SSD model, the FSSD
extracts features by feature fusion: it fuses the features of
the conv4_3 and fc_7 layers of the VGG model with those of the
newly added conv6_2 and conv7_2 layers. Because the feature
maps of these layers differ in size, the 38*38 feature map is
taken as the standard size. A layer with a larger feature map
is downsampled to the standard size by max pooling, and a layer
with a smaller feature map is upsampled to the standard size by
bilinear interpolation. Finally, a 1*1 convolution kernel is
used to fuse the features of the layers, and after
normalization the feature pyramid is formed for target
detection.
The loss value is calculated as shown in Equation 2, where
L_conf (x, c) represents the loss value of the target
classification, L_loc (x, l, g) represents the loss value of
the target location, N is the number of matched default boxes,
and α weights the location loss. When calculating the loss, the
overlap between each default box and the ground truth is
measured by IoU. A default box whose IoU is larger than 0.5 is
taken as a positive sample, and one whose IoU is smaller than
0.5 is taken as a negative sample. Most of the generated
default boxes are negative samples. To keep the model training
balanced, we sort the negative samples by confidence loss from
largest to smallest and keep the top ones, so that the ratio of
positive to negative samples is 1:3.
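The matching and negative selection steps can be sketched as follows. This is a minimal illustration, not the paper's implementation: boxes are assumed to be (xmin, ymin, xmax, ymax) tuples, negatives are assumed to be ranked by their confidence loss as in the original SSD, and all function names and example values are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def split_samples(default_boxes, gt_boxes, threshold=0.5):
    """Split default box indices into positives (best IoU > threshold) and negatives."""
    positives, negatives = [], []
    for i, box in enumerate(default_boxes):
        best = max((iou(box, gt) for gt in gt_boxes), default=0.0)
        (positives if best > threshold else negatives).append(i)
    return positives, negatives

def mine_hard_negatives(negatives, conf_loss, num_positives, ratio=3):
    """Keep the negatives with the largest confidence loss, at most ratio * num_positives."""
    ranked = sorted(negatives, key=lambda i: conf_loss[i], reverse=True)
    return ranked[: ratio * num_positives]

# Toy example: one ground truth box and six default boxes.
gt_boxes = [(0, 0, 10, 10)]
default_boxes = [(0, 0, 10, 10), (5, 5, 15, 15), (20, 20, 30, 30),
                 (0, 0, 4, 4), (8, 8, 12, 12), (30, 30, 40, 40)]
conf_loss = [0.1, 2.0, 0.3, 1.5, 0.9, 0.05]  # made-up per-box confidence losses

positives, negatives = split_samples(default_boxes, gt_boxes)
hard_negatives = mine_hard_negatives(negatives, conf_loss, len(positives))
print(positives, hard_negatives)
```

With one positive box, only the three negatives with the largest loss are kept, which gives the 1:3 ratio described above.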
$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right) \tag{2}$$
IV. EXPERIMENT
A. Database description
1) Existing database description
At present, the public databases of road markings mainly
include the KITTI Dataset [22], the Roadmarking database [23],
and the Baidu Apollo database [24]. The KITTI dataset treats
road markings as a single broad category, with no subdivision
into straight-through, left-turn and right-turn markings.
The Roadmarking database was collected abroad, where the
markings differ slightly from domestic ones, and its road
conditions are relatively simple. The Baidu Apollo database is
larger in scale and its road conditions are more complicated,
but its ground truth data for road markings is not yet public.
On the whole, there is no large-scale road marking database
covering complicated road conditions in China. Therefore, it is
necessary to establish a database suited to China's road
markings.
2) Improved SSD model based on Focal Loss
Two-stage detection algorithms such as R-FCN have higher
accuracy but slower speed, while the accuracy of the SSD model
is not as good. Tsung-Yi Lin attributes this to the imbalance
of sample categories. The number of negative samples in the
above SSD model training is too large and dominates the total
loss, which drives the optimization of the model in an
undesired direction. R-FCN uses the OHEM algorithm, but this
algorithm works by increasing the weight of misclassified
samples, that is, it ignores the samples that are easy to
classify. A new loss obtained by modifying the standard cross
entropy can instead reduce the weight of easily classified
samples and make the model focus more on samples that are
difficult to classify. The traditional cross entropy loss is
shown in Equation 3, where p is the predicted probability of
the class y = 1.
$$CE(p, y) = \begin{cases} -\log(p) & \text{if } y = 1 \\ -\log(1 - p) & \text{otherwise} \end{cases} \tag{3}$$
2) Self-built road marking dataset
The data set of this paper comes from three sources: Tencent
Street View, the Baidu Apollo Database, and real-time
collection of Beijing and Huashan road information. Five
data expansion methods are adopted. Rotation is used to
simulate detection from different angles in the actual scene.
Brightness change is used to simulate illumination changes in
the actual scene. Scale transformation is used to simulate
markings at different distances. Contrast adjustment is used
to simulate road markings that are worn and unclear, and
horizontal flipping is used to simulate different scenes.
After the above five methods are freely combined, the
expansion ratio is 1:18, and the expanded database contains a
total of 36,811 images. The types and quantities of image
markers before and after expansion are shown in Table I.
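Three of the data expansion methods described for the self-built dataset (brightness change, contrast adjustment, horizontal flip) can be sketched in pure Python on a small grayscale image stored as a list of rows. This is an illustrative sketch only: the function names, parameter values, and the toy image are assumptions, and rotation and scale transformation are left out because they need interpolation, for which an image library would normally be used.

```python
def clamp(value):
    """Clip a pixel value to the 8-bit range [0, 255]."""
    return max(0, min(255, int(round(value))))

def horizontal_flip(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]

def adjust_brightness(image, delta):
    """Add a constant offset to every pixel (simulates illumination change)."""
    return [[clamp(px + delta) for px in row] for row in image]

def adjust_contrast(image, factor):
    """Scale each pixel's deviation from the mean intensity by factor."""
    pixels = [px for row in image for px in row]
    mean = sum(pixels) / len(pixels)
    return [[clamp(mean + factor * (px - mean)) for px in row] for row in image]

# Toy 2x3 grayscale image.
image = [[10, 50, 90], [130, 170, 210]]

flipped = horizontal_flip(image)
brighter = adjust_brightness(image, 60)
low_contrast = adjust_contrast(image, 0.5)  # factor < 1 mimics worn markings

print(flipped)
print(brighter)
print(low_contrast)
```

Because each function returns a new image, the operations can be chained in any order, which is one way to realize the free combination of methods mentioned above.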
The formula of focal loss is shown in Equation 4, where p
denotes the predicted probability of the true class, γ is the
focusing parameter, γ ≥ 0, and (1 − p)^γ is called the
modulation coefficient.
$$FL(p) = -(1 - p)^{\gamma} \log(p) \tag{4}$$
When the sample is misclassified during training, the