International Core Journal of Engineering 2020-26 | Page 82
2019 International Conference on Artificial Intelligence and Advanced Manufacturing (AIAM)
Context-based Synthetic Data for Logo Recognition
Yuchao Jiang Chao Gao Lixin Ji Yancheng Wu
National Digital Switching System Engineering & Technological R&D Center
Zhengzhou, China
Abstract—To address the scarcity of training samples in logo recognition, a multi-type context-based logo data synthesis algorithm is proposed. The algorithm jointly exploits the local and global context of the logo object and of the scene image to guide the synthesis of logo images. Experimental results on FlickrLogos-32 show that the proposed algorithm substantially improves the performance of the logo recognition model without relying on additional manual annotation. These results verify the validity of the synthesis algorithm and further show that multi-type context can improve the performance of object recognition algorithms.

Keywords—Logo recognition, context, data synthesis, deep learning

I. INTRODUCTION

Logo recognition is a challenging task in computer vision. It has a wide range of applications, such as sensitive video recognition [1], trademark identification and property protection [2], and intelligent transportation [3]. Deep learning methods have achieved great success in general object recognition [4-6]. Training a deep neural network for object recognition generally requires a large amount of manually labeled data. However, the publicly available logo recognition datasets are very small; the existing logo datasets are listed in Table I. Such a small amount of training data is clearly far from enough to learn a deep model with millions of parameters. Expanding the dataset with additional manual annotation is a straightforward solution, but the labeling cost and time are often prohibitive, and, compared with general objects, real scene images containing logos are in many cases hard to obtain.

TABLE I. EXISTING LOGO DETECTION DATASETS. PA: PUBLIC AVAILABILITY.

Dataset             Logo classes   Objects   Images   PA
BelgaLogos [7]                37      2695     1951   Yes
FlickrLogos-27 [8]            27      4671     1080   Yes
FlickrLogos-32 [9]            32      3404     2240   Yes
LOGO-NET [10]                160    130608    73414   No
Logos-32plus [11]             32     12302     7830   Yes
TopLogo10 [12]                10       863      700   Yes

Synthetic data generation refers to automatically generating data that approximates real data without relying on manual labeling. When insufficient training data is available for large deep networks, it is an effective alternative to massive manually labeled data. For example, Gupta et al. [13] and Jaderberg et al. [14] trained text recognition models on artificially synthesized natural scene text data. Georgakis et al. [15] guided synthesis by semantically segmenting plausible support surfaces such as tables and counters, so that objects are placed at realistic positions in the image, which improves object detection in indoor scenes. Eggert et al. [16] used synthetic data to train SVM classifiers for company logo detection. Building on [16], Su et al. [12] made the first attempt to train deep models on large-scale synthetic logo images, taking context diversity into account, which greatly improved the robustness of the detection model to complex backgrounds. However, the logo synthesis method used in [12, 16] is rather simple: a logo image is synthesized merely by placing a transformed logo template at a random location in a scene image, so the synthetic images fit real scene images poorly. As a result, the synthetic logo images lack contextual authenticity, the model learns too many details specific to synthetic images, and it does not generalize well to real scene images. Fig. 1 shows examples of synthetic images from [12, 16].
978-1-7281-4691-1/19/$31.00 ©2019 IEEE
DOI 10.1109/AIAM48774.2019.00019
Fig. 1. Examples of synthetic images in [12, 16].
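The context-free synthesis used in [12, 16], where a transformed logo template is pasted at an arbitrary location in a scene image, can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions, not the implementation of [12], [16], or of the method proposed here: the "transformation" is reduced to a random nearest-neighbour rescale, the logo is assumed to be an RGB array with a matching alpha mask, and the names `resize_nearest`, `paste_logo`, and the optional `valid_mask` argument are hypothetical. `valid_mask` only marks where a context model (for instance, support surfaces as in [15]) could restrict placement; it does not implement one.

```python
import numpy as np


def resize_nearest(img, scale):
    """Nearest-neighbour rescale of an H x W or H x W x C array."""
    h, w = img.shape[:2]
    new_h = max(1, int(round(h * scale)))
    new_w = max(1, int(round(w * scale)))
    rows = (np.arange(new_h) * h / new_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    return img[rows][:, cols]


def paste_logo(scene, logo, alpha, rng, scale_range=(0.5, 1.5), valid_mask=None):
    """Paste a randomly rescaled RGB logo into an RGB scene (hypothetical helper).

    scene: H x W x 3 uint8, logo: h x w x 3 uint8, alpha: h x w floats in [0, 1].
    valid_mask: optional H x W bool array; if given, the logo's top-left corner
    is sampled only where it is True (a placeholder for context-guided
    placement). If None, any position is allowed (context-free placement).
    Returns the synthetic image and its box (x0, y0, x1, y1).
    """
    scale = rng.uniform(*scale_range)
    logo_s = resize_nearest(logo, scale)
    alpha_s = resize_nearest(alpha, scale)
    lh, lw = logo_s.shape[:2]
    sh, sw = scene.shape[:2]
    if lh > sh or lw > sw:
        raise ValueError("rescaled logo does not fit inside the scene")

    # All top-left corners that keep the whole logo inside the scene.
    allowed = np.ones((sh - lh + 1, sw - lw + 1), dtype=bool)
    if valid_mask is not None:
        allowed &= valid_mask[: sh - lh + 1, : sw - lw + 1]
    ys, xs = np.nonzero(allowed)
    if ys.size == 0:
        raise ValueError("no valid placement for this logo")
    k = rng.integers(ys.size)
    y, x = int(ys[k]), int(xs[k])

    # Alpha-blend the logo over the scene patch.
    out = scene.astype(np.float64)
    a = alpha_s[..., None]  # h x w x 1, broadcasts over the colour channels
    patch = out[y:y + lh, x:x + lw]
    out[y:y + lh, x:x + lw] = a * logo_s + (1.0 - a) * patch
    # The box is known by construction, so no manual annotation is needed.
    return out.astype(scene.dtype), (x, y, x + lw, y + lh)


# Usage: one synthetic training sample from a blank scene and a white logo.
rng = np.random.default_rng(0)
scene = np.zeros((120, 160, 3), dtype=np.uint8)
logo = np.full((20, 30, 3), 255, dtype=np.uint8)
alpha = np.ones((20, 30))
image, box = paste_logo(scene, logo, alpha, rng)
print(image.shape, box)
```

Because the bounding box is produced at paste time, every synthetic image is labeled automatically. With `valid_mask=None` the placement ignores the scene entirely, which is exactly the weakness discussed above: nothing ties the logo's position, scale, or appearance to the surrounding context.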
To address the shortage of annotated data for logo recognition under the deep