International Core Journal of Engineering 2020-26 | Page 84

B. Background Image Selection

In a real scene, an object rarely exists alone. It tends to be closely linked to the environment and the other objects around it, which is commonly referred to as context [18]. Various types of context have been theoretically proven to play a very important role in computer vision and image processing, where they can improve the accuracy and speed of detection and recognition algorithms [19, 20]. The background image carries the global context of the synthetic logo image. In selecting it, [12] and [16] consider only the diversity of the context and simply use the 6000 non-logo images in FlickrLogos-32 as background images. Such simple processing inevitably introduces a lot of unrealistic context information into the synthetic logo images, which affects the generalization ability of the trained deep model in real scenes. As shown in Fig. 4, these logos appear very abruptly in unrelated scenes. Although this does not prevent humans from recognizing them, synthetic images with completely inconsistent context information are likely to act as noisy data during actual training.

Fig. 4. Examples of unrealistic synthetic logo images.

Therefore, in terms of background image selection, this paper works to alleviate the impact on algorithm performance of the inconsistent context information caused by the weak semantic correlation between the logo exemplar and the background image. Specifically, this paper first crawls 300 scene images related to each type of logo from Google Image Search and then classifies them in batches with the popular CNN-based scene classification model Places365-VGG [21]. The five Top-1 scenes that occur most frequently for each type of logo are then taken as the background source of the synthetic images. Note that Places365-VGG is an open-source CNN scene classification model for subsets of the large-scale scene image database Places2. Its network structure is VGG-16, which currently has the highest Top-1 classification accuracy on the validation and test sets of Places365. Fig. 5 shows the algorithm flow of background image selection with Starbucks as an example.

Fig. 5. Algorithm flow for background image selection.

C. Logo Exemplar Transformation

Traditional image data augmentation methods have been proven to effectively enrich the training set and improve the robustness and generalization ability of detection and recognition models [22]. Logos vary greatly in scale in actual natural scenes [9], and different shooting angles may cause rotation, distortion, deformation, and partial occlusion of the logo [23]. In addition, imaging devices differ in resolution and illumination conditions. Therefore, in order to fit the actual scene as closely as possible and enrich the diversity of the logo in the synthetic image, this paper tries a series of augmentation transformations on the logo exemplar, such as affine transformation, random cropping, color transformation, and Gaussian blur. It should be noted that each transformation is independent and random. The mathematical description is given below, taking the affine transformation as an example. Since a convolutional neural network is itself translation invariant, this paper does not apply a translation transformation to the logo exemplar, so the dimension of the affine transformation is reduced from 3D to 2D. The specific mathematical form of the affine transformation of the logo exemplar $I$ on the 2D plane is as follows:

$$
I^{*} = R_{\theta} P I, \qquad
R_{\theta} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}, \qquad
P = \begin{bmatrix} a & c \\ d & b \end{bmatrix}
\tag{1}
$$

where the matrix $R_{\theta}$ defines a rotation transformation with the angle $\theta$ randomly chosen from the range [−180°, 180°], while the probability of applying this random transformation is kept below 0.1, because logos usually rotate little in actual scenes [12]. The matrix $P$ combines the scaling and shearing transformations. For the scaling transformation, this paper computes the overall object size distribution of the FlickrLogos-32 dataset (Fig. 6) and sets the long edge of the logo exemplar to a random length between 40 px and 250 px, with the short edge scaled proportionally. The parameter of the shearing transformation is chosen as a random number in [0, 0.2].

Fig. 6. FlickrLogos-32 object size distribution.

D. Logo Image Synthesis

In view of the planarity of the logo object, this paper overlays the randomly transformed logo exemplar on the