International Core Journal of Engineering 2020-26 | Page 84
transformation, random cropping, color transformation, and
Gaussian blur on the logo exemplar. Note that each
transformation is applied independently and at random. The
mathematical description below takes the affine
transformation as an example. Since a convolutional neural
network is itself translation invariant, this paper does not
apply a translation to the logo exemplar, so the dimension of
the affine transformation is reduced from 3D to 2D. The
affine transformation of the logo exemplar I on the 2D plane
takes the following form:
B. Background Image Selection
In a real scene, an object rarely exists alone. It tends to
be closely linked to its environment and to the other objects
around it. This is commonly referred to as context [18].
Various types of context have been shown to play a very
important role in computer vision and image processing, where
they can improve the accuracy and speed of detection and
recognition algorithms [19, 20]. The background image carries
the global context of the synthetic logo image, yet [12] and
[16] consider only the diversity of the context when selecting
it and simply use the 6000 non-logo images in FlickrLogos-32
as background images. Such simple processing inevitably
introduces a lot of unrealistic context information into the
synthetic logo images, which affects the generalization
ability of the learned deep model in real scenes. As shown in
Fig. 4, these logos appear abruptly in unrelated scenes.
Although this does not prevent humans from recognizing them,
synthetic images with completely inconsistent context
information are likely to act as noisy data during actual
training.
Fig. 5. Algorithm flow for background image selection.
Fig. 4. Examples of unreal synthetic logo image.
Therefore, in background image selection, this paper aims to
alleviate the impact on algorithm performance of the context
inconsistency caused by the weak semantic correlation between
the logo exemplar and the background image. Specifically, this
paper first crawls 300 scene images related to each type of
logo from Google Image Search and then classifies them in
batches with the popular CNN-based scene classification model
Places365-VGG [21]. The five Top-1 scenes that occur most
frequently for each type of logo are then taken as the
background source of the synthetic image. Note that
Places365-VGG is an open-source CNN scene classification model
for subsets of the large-scale scene image database Places2.
Its network structure is VGG-16, which currently has the
highest Top-1 classification accuracy on the validation and
test sets of Places365. Fig. 5 shows the algorithm flow of
background image selection with Starbucks as an example.
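The counting step of this selection can be sketched as follows. This is a minimal illustration rather than the implementation used in this paper: `select_background_scenes` and `classify_top1` are hypothetical names, and `classify_top1` stands in for any callable that returns the Top-1 scene label of an image (for example, a wrapper around Places365-VGG). The file names and scene labels in the usage lines are made up.

```python
from collections import Counter


def select_background_scenes(image_paths, classify_top1, k=5):
    """Return the k scene labels that occur most often as Top-1 prediction.

    classify_top1 is a hypothetical callable mapping an image path to its
    Top-1 scene label; the scene classifier itself is not shown here.
    """
    counts = Counter(classify_top1(path) for path in image_paths)
    return [label for label, _ in counts.most_common(k)]


# Toy usage: a lookup table replaces the real classifier (made-up labels).
toy_predictions = {
    "img_001.jpg": "coffee_shop",
    "img_002.jpg": "coffee_shop",
    "img_003.jpg": "coffee_shop",
    "img_004.jpg": "street",
    "img_005.jpg": "street",
    "img_006.jpg": "food_court",
}
scenes = select_background_scenes(list(toy_predictions), toy_predictions.get, k=2)
```

In the setting described above, `image_paths` would hold the 300 crawled images of one logo class and `k` would be 5.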
$$
I^{*} = R_{\theta} P I, \qquad
R_{\theta} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}, \qquad
P = \begin{bmatrix} a & c \\ d & b \end{bmatrix}
\tag{1}
$$
where the matrix $R_{\theta}$ defines a rotation transformation
with the angle $\theta$ randomly chosen from the range
$[-180^{\circ}, 180^{\circ}]$, while the probability of applying
this random transformation is kept below 0.1, because logos
usually rotate little in actual scenes [12]. The matrix $P$
combines the scaling and shearing transformations. For the
scaling transformation, this paper computes the overall object
size distribution of the FlickrLogos-32 dataset (Fig. 6) and
sets the long edge of the logo exemplar to a random value
between 40px and 250px, scaling the short edge proportionally.
The parameter of the shearing transformation is a random
number in [0, 0.2].
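The parameter sampling described for Eq. (1) can be sketched as follows. This is a hedged illustration under stated assumptions, not the code used in this paper: the function names are hypothetical, the rotation probability of 0.05 is only an illustrative value below the stated bound of 0.1, and the way scaling and shearing are packed into P (a single horizontal shear, with d = 0) is an assumption, since the text does not specify how P combines them.

```python
import math
import random


def matmul2(x, y):
    """Multiply two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def sample_affine(width, height, rng, p_rotate=0.05):
    """Sample the 2x2 matrix R_theta * P for a logo exemplar of the given size."""
    # Rotation R_theta: applied with probability p_rotate (illustrative value
    # below 0.1), with the angle drawn uniformly from [-180, 180] degrees.
    theta = 0.0
    if rng.random() < p_rotate:
        theta = math.radians(rng.uniform(-180.0, 180.0))
    rotation = ((math.cos(theta), -math.sin(theta)),
                (math.sin(theta), math.cos(theta)))
    # Scaling: the long edge becomes a random length in [40, 250] px and the
    # short edge follows proportionally, so one factor serves both axes.
    scale = rng.uniform(40.0, 250.0) / max(width, height)
    # Shearing: ASSUMPTION, a single horizontal shear factor in [0, 0.2].
    shear = rng.uniform(0.0, 0.2)
    # ASSUMED layout of P = [[a, c], [d, b]]:
    # a = b = scale, c = scale * shear, d = 0.
    p = ((scale, scale * shear), (0.0, scale))
    return matmul2(rotation, p)


rng = random.Random(0)
matrix = sample_affine(width=300, height=120, rng=rng)
```

Because the rotation and the assumed shear both have unit determinant, the determinant of the sampled matrix equals the squared scale factor, which gives a quick sanity check on the sampled range.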
C. Logo Exemplar Transformation
Traditional image data augmentation methods have been
proven to effectively enrich the training set and improve the
robustness and generalization ability of detection and
recognition models [22]. Logos vary widely in scale in actual
natural scenes [9], and different shooting angles may cause
rotation, distortion, deformation, and partial occlusion of
the logo [23]. In addition, imaging devices differ in
resolution and illumination conditions. Therefore, to fit the
actual scene as closely as possible and enrich the diversity
of the logo in the synthetic image, this paper applies a
series of augmentation transformations such as affine
Fig. 6. FlickrLogos-32 object size distribution.
D. Logo Image Synthesis
In view of the planarity of the logo object, this paper
overlays the randomly transformed logo exemplar on the