International Core Journal of Engineering 2020-26 | Page 82

2019 International Conference on Artificial Intelligence and Advanced Manufacturing (AIAM)
978-1-7281-4691-1/19/$31.00 ©2019 IEEE. DOI 10.1109/AIAM48774.2019.00019

Context-based Synthetic Data for Logo Recognition

Yuchao Jiang, Chao Gao, Lixin Ji, Yancheng Wu
National Digital Switching System Engineering & Technological R&D Center, Zhengzhou, China

Abstract: To address the scarcity of training samples in the logo recognition task, a multi-type context-based logo data synthesis algorithm is proposed. The algorithm uses both the local and the full context of the logo object and the scene image to guide the synthesis of logo images. Experimental results on FlickrLogos-32 show that the proposed algorithm greatly improves the performance of the logo recognition algorithm without relying on additional manual annotation. The results verify the validity of the synthesis algorithm and further show that multi-type context can improve the performance of object recognition algorithms.

Keywords: Logo recognition, context, data synthesis, deep learning

I. INTRODUCTION

Logo recognition is a challenging task in computer vision. It has a wide range of applications, such as sensitive video recognition [1], trademark identification and property protection [2], and intelligent traffic [3]. For the recognition of general objects, deep learning methods have achieved great success [4-6]. In general, building a deep neural network for object recognition requires a large amount of manually labeled training data. However, the publicly available datasets for the logo recognition task are very small. The existing logo datasets are listed in Table I. Such a small amount of training data is far from enough to learn a deep model with millions of parameters. Expanding a dataset through additional manual annotation is a straightforward solution, but the labeling cost and the time required are often prohibitive. Moreover, compared with general objects, real scene images containing logos are in many cases difficult to obtain.

TABLE I. EXISTING LOGO DETECTION DATASETS. PA: PUBLIC AVAILABILITY.

Dataset               Logo #   Object #   Image #   PA
BelgaLogos [7]        37       2695       1951      Yes
FlickrLogos-27 [8]    27       4671       1080      Yes
FlickrLogos-32 [9]    32       3404       2240      Yes
LOGO-NET [10]         160      130608     73414     No
Logos-32plus [11]     32       12302      7830      Yes
TopLogo10 [12]        10       863        700       Yes

Synthetic data generation refers to methods that automatically generate data approximating real data without relying on manual labeling. When there is not enough training data available for training large deep networks, it is an effective alternative to massive manually labeled data. For example, Gupta et al. [13] and Jaderberg et al. [14] trained text recognition models on artificially synthesized natural scene text data. Georgakis et al. [15] guided synthesis by segmenting possible support planes, such as tables and counters, at the semantic level; the resulting placement of objects in the image assists the detection of objects in indoor scenes. Eggert et al. [16] used synthetic data to train SVM classifiers for company logo detection. Building on [16], Su et al. [12] made the first attempt to train deep models on large-scale synthetic logo images by considering the diversity of contexts, which greatly improved the robustness of the detection model to complex backgrounds.

However, the method of synthesizing logo images used in [12, 16] is rather simple: it produces a synthetic logo image merely by placing the transformed logo template at a random location in the scene image, so the synthetic image fits real scene images poorly. As a result, the synthetic logo images lack contextual authenticity, which causes the model to learn too much detail of the synthetic images and to generalize poorly to real scene images. Fig. 1 shows examples of synthetic images from [12, 16].

Fig. 1. Examples of synthetic images in [12, 16].

In order to solve the problem of the insufficiency of annotation data in the logo recognition task under the deep