International Core Journal of Engineering 2020-26 | Page 90

value of p is very small, so the modulation factor is close to 1 and the loss is essentially unchanged from the original cross-entropy loss. For samples that are easy to classify and are classified correctly, p is close to 1 and the modulation factor is close to 0, so these samples contribute very little to the total loss. When γ = 0, focal loss reduces to the traditional cross entropy; as γ increases, the modulation factor down-weights easy samples more strongly. Focal loss therefore makes it easy to control how much hard and easily classified samples each contribute to the total loss.

In SSD, the fully connected layers of VGG-16 become convolutional layers, and four convolutional layers are added. In this paper, the weights of VGG-16 are pre-trained on the ILSVRC dataset, and the SSD-300 model is used, so the input image size is 300×300. SSD takes feature maps from several convolutional layers of the VGG architecture to obtain feature maps at different scales. Here, six feature maps, conv4_3, conv_7, conv6_2, conv7_2, conv8_2 and conv9_2, are selected; their sizes are (38, 38), (19, 19), (10, 10), (5, 5), (3, 3) and (1, 1), respectively, and together they form a feature pyramid. A 3×3 convolution kernel is applied to each of these feature maps. Each feature map generates a set of default boxes: for every default box, the network predicts confidence scores for 7 categories (classification) and the coordinate values (x, y, w, h) (regression). The total number of default boxes over all feature maps is 38×38×4 + 19×19×6 + 10×10×6 + 5×5×6 + 3×3×4 + 1×1×4 = 8732. Finally, the predictions from all layers are combined and passed to the loss function.

3) FSSD-based road marking detection

The FSSD model is initialized with the weights of VGG-16 pre-trained on the ILSVRC dataset and is trained as the FSSD-300 model.
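Two numerical claims in the preceding paragraphs can be checked with a short Python sketch: how the modulation factor (1 − p)^γ rescales the cross entropy of a single sample, and the 8732 default-box total for SSD-300. The sketch is a minimal illustration under stated assumptions: it handles only a positive sample (y = 1), omits any class-balancing weight, uses γ = 2 purely as an assumed example value (no γ is stated here), and the function names are hypothetical.

```python
import math


def cross_entropy(p: float) -> float:
    """Cross entropy for a positive sample (y = 1): -log(p)."""
    return -math.log(p)


def focal_loss(p: float, gamma: float = 2.0) -> float:
    """Focal loss for a positive sample: -(1 - p)**gamma * log(p).

    gamma = 2.0 is an assumed illustrative default, not a value from the paper.
    With gamma = 0 this is exactly the cross entropy above.
    """
    return -((1.0 - p) ** gamma) * math.log(p)


def count_default_boxes(layers):
    """Sum of side * side * boxes_per_cell over all feature-pyramid layers."""
    return sum(side * side * boxes for side, boxes in layers)


# (feature-map side, default boxes per location) for SSD-300, as listed in the text.
SSD300_LAYERS = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]

if __name__ == "__main__":
    # A hard sample (small p) versus an easy, correctly classified one (p near 1).
    for p in (0.1, 0.9):
        ce = cross_entropy(p)
        fl = focal_loss(p, gamma=2.0)
        print(f"p={p}: CE={ce:.4f}, FL={fl:.4f}, FL/CE={fl / ce:.2f}")
    print("default boxes:", count_default_boxes(SSD300_LAYERS))
```

The ratio FL/CE is just (1 − p)^γ: with γ = 2, a hard sample at p = 0.1 keeps 81% of its cross-entropy loss, while an easy sample at p = 0.9 keeps only about 1%, which is the down-weighting of easy samples described above.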
Unlike SSD, FSSD extracts features by feature fusion: it fuses the features of the conv4_3 and fc_7 layers of the VGG model with those of the newly added conv6_2 and conv7_2 layers. Because these feature maps differ in size, FSSD takes 38×38 as the standard size: larger feature maps are downsampled to the standard size by max pooling, and smaller feature maps are upsampled to it by bilinear interpolation. Finally, 1×1 convolution kernels are used to fuse the features of each layer, and after normalization a feature pyramid is formed for target detection. The loss is calculated as shown in Equation 2, where L_conf(x, c) is the classification loss and L_loc(x, l, g) is the localization loss:

$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right) \tag{2}$$

When computing the loss, the overlap between each prior box and the ground truth is measured by IoU: candidate boxes with IoU greater than 0.5 are taken as positive samples, and those with IoU below 0.5 as negative samples. Most of the generated default boxes are negative samples. To keep training effective, the negative samples are sorted by confidence in descending order and those with the highest confidence are selected, so that the ratio of positive to negative samples is 1:3.

IV. EXPERIMENT

A. Database description

1) Existing database description

At present, the public road-marking databases mainly include the KITTI Dataset [22], the Roadmarking database [23] and the Baidu Apollo database [24]. The KITTI dataset labels road markings as a single large category, without subdividing them into straight, left-turn and right-turn markings. The Roadmarking database was collected abroad: its markings differ slightly from domestic traffic markings, and its road conditions are relatively simple.
The Baidu Apollo database is larger in scale and its road conditions are more complicated, but its road-marking ground-truth data has not yet been made public. Overall, there is no large-scale road-marking database covering the complicated road conditions found in China. It is therefore necessary to build a database suited to China's road markings.

2) Improved SSD model based on Focal Loss

Two-stage detection algorithms such as R-FCN achieve higher accuracy but run more slowly, and the SSD model is less accurate than they are. Tsung-Yi Lin attributes this gap to class imbalance among the samples: in SSD training, negative samples are far too numerous and dominate the total loss, which drives the optimization of the model in an undesired direction. R-FCN uses the OHEM algorithm, but OHEM works by increasing the weight of misclassified samples, that is, it ignores samples that are easy to classify. A new loss obtained by modifying the standard cross entropy reduces the weight of easily classified samples and makes the model focus more on samples that are difficult to classify. The traditional cross-entropy loss is shown in Equation 3, where p is the predicted probability of the class y = 1:

$$CE(p, y) = \begin{cases} -\log(p), & \text{if } y = 1 \\ -\log(1 - p), & \text{otherwise} \end{cases} \tag{3}$$

2) Self-built road marking dataset

The dataset in this paper comes mainly from three sources: Tencent Street View, the Baidu Apollo database, and road information collected in real time in Beijing and Huashan. Five data augmentation methods, such as rotation, are adopted. Rotation simulates detection from different angles in real scenes. Brightness changes simulate illumination changes in real scenes. Scale transformation simulates detecting markings at different distances.
Contrast adjustment simulates worn, unclear road-surface markings, and horizontal flipping simulates different scenes. After the above five methods are freely combined, the expansion ratio is 1:18, and the expanded database contains a total of 36,811 images. The types and numbers of marking images before and after expansion are shown in Table I.

The focal loss is given in Equation 4, where γ ≥ 0 is the focusing parameter and (1 − p)^γ is called the modulation factor:

$$FL(p) = -(1 - p)^{\gamma}\log(p) \tag{4}$$

When the sample is misclassified during training, the