2019 International Conference on Artificial Intelligence and Advanced Manufacturing (AIAM)
Research on ROI algorithm of ship image based on
improved YOLO
Li Tianwei, Zhang Kun, Li Wei, Huang Qian *
Dalian Naval Academy
Navigation Department
Dalian, China
[email protected]
Corresponding author: Huang Qian
Abstract—In this paper, a region of interest (ROI) extraction algorithm based on the YOLO algorithm is proposed. The algorithm optimizes the output tensor dimensions of the YOLO model, generates naval vessel images of different quality using an image degradation function, and retrains the network by means of transfer learning, which enhances the accuracy and detection rate of the algorithm. Compared with the original algorithm, the detection rate is improved by 4.25% on average, which proves the effectiveness of the proposed algorithm.
Keywords—YOLO; transfer learning; ROI; patchGAN
I. INTRODUCTION
In machine vision and image processing, a region of interest (ROI) is a region to be processed, delineated from the image in the form of a box, circle, ellipse, or irregular polygon. Machine vision software such as Halcon, OpenCV, and MATLAB provides operators and functions to obtain the ROI and to carry out the next processing step on it. For ship images against a sea-and-sky background, the complexity and variability of the images make it difficult to select an appropriate region of interest[1].
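For a rectangular region, the ROI extraction step mentioned above reduces to a few lines of OpenCV. The following is a minimal sketch, not the authors' code; the image path and box coordinates are placeholders chosen only for illustration.

```python
import cv2

# Minimal sketch of rectangular ROI extraction with OpenCV.
# "ship.jpg" and the box coordinates are illustrative placeholders.
image = cv2.imread("ship.jpg")

# A rectangular ROI is a slice of the image array: rows first, then columns.
x, y, w, h = 120, 80, 200, 150        # top-left corner, width, height
roi = image[y:y + h, x:x + w]

# The extracted ROI can then be handed to any subsequent processing step.
cv2.imwrite("ship_roi.jpg", roi)
```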
As shown in Figure 1, images of naval vessels are characterized by a small proportion of effective pixels, strong image interference, and random target location. Under sea-and-sky conditions, because there is no shelter on the sea surface, the observation distance is usually more than 20 km, and the image quality of optoelectronic equipment decreases under the influence of fog and background light. Under these conditions, it is difficult to find the ship accurately and in time, and to display a clear ship image to the user quickly.

Figure 1 Typical images of ships at sea

As shown in Figure 2, a ship imaged at long distance occupies only a small proportion of the pixels of the whole image. Most of the pixels lie in areas such as sea waves and sky that users do not care about; in the three examples shown, the proportions of effective pixels are 0.878%, 1.56% and 14.06%, respectively.

Figure 2 Schematic diagram of effective pixels in marine ship images
ROI extraction has been a hot and difficult research topic in recent years[2]. In fact, the concept of ROI was first proposed for JPEG2000 compression. When people observe an image, they are usually interested only in the content of a specific area; they hope that this area is rendered at higher resolution and pay less attention to the rest, as long as it meets certain visual requirements. Image ROI extraction is usually approached from two directions: one uses image segmentation techniques to extract the ROI; the other starts from the visual characteristics of the human eye, simulating these characteristics to locate visually sensitive areas and ranking those areas as ROIs[3-4].
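As an illustration of the second, vision-inspired direction, the spectral-residual saliency detector shipped with OpenCV's contrib modules can be used to locate and rank visually sensitive areas. The sketch below is illustrative only; it assumes opencv-contrib-python is installed and uses a placeholder image path.

```python
import cv2

# Vision-inspired direction: rank salient areas as ROI candidates.
# Requires opencv-contrib-python; "ship.jpg" is a placeholder path.
image = cv2.imread("ship.jpg")

# Spectral-residual saliency approximates where the human eye is drawn.
saliency = cv2.saliency.StaticSaliencySpectralResidual_create()
ok, saliency_map = saliency.computeSaliency(image)

# Threshold the saliency map to isolate the visually sensitive areas.
saliency_map = (saliency_map * 255).astype("uint8")
_, mask = cv2.threshold(saliency_map, 0, 255,
                        cv2.THRESH_BINARY | cv2.THRESH_OTSU)
```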
Different from traditional ROI algorithms, our algorithm uses deep learning technology[5] and the YOLO algorithm to select the ship area against a sea-and-sky background. The algorithm is described below.
II. ALGORITHM DESCRIPTION
YOLO[6-7], short for You Only Look Once, is a "one-stage" target detection algorithm proposed by Redmon et al. in 2016. The algorithm predicts the locations and classes of multiple objects in a single pass, truly achieving end-to-end target detection with a marked speed advantage. With the introduction of the V3 version, its speed and accuracy were further improved. The YOLO V3 model is complex; it can be divided into a Darknet-53 network for feature extraction and a tensor output network for prediction.
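As a rough illustration of this structure (not the authors' own implementation), a pretrained YOLO V3 model in Darknet format can be loaded and run with OpenCV's dnn module; the configuration and weight file names below are the standard Darknet distribution names and are assumed to be present.

```python
import cv2

# Illustrative only: run a Darknet-format YOLO V3 model via OpenCV's dnn module.
# yolov3.cfg / yolov3.weights are the standard distribution files (assumed present).
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")

image = cv2.imread("ship.jpg")                      # placeholder input
blob = cv2.dnn.blobFromImage(image, 1 / 255.0,      # scale pixels to [0, 1]
                             (416, 416), swapRB=True, crop=False)
net.setInput(blob)

# YOLO V3 emits output tensors at three scales; collect all of them.
out_names = net.getUnconnectedOutLayersNames()
outputs = net.forward(out_names)
for out in outputs:
    # Each row holds a decoded box (center, size), objectness, and class scores.
    print(out.shape)
```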
The candidate boxes of the YOLO V3 model are based on the anchor mechanism, whose schematic diagram is shown in Figure 3. The actual prediction values of the network are $t_x, t_y, t_w, t_h$. According to the four formulas in Figure 3,

$b_x = \sigma(t_x) + c_x$
$b_y = \sigma(t_y) + c_y$
$b_w = p_w e^{t_w}$
$b_h = p_h e^{t_h}$

the center coordinates $(b_x, b_y)$ and the width and height $(b_w, b_h)$ of the prediction box are calculated, where $(c_x, c_y)$ is the offset of the grid cell from the top-left corner of the feature map and $(p_w, p_h)$ are the width and height of the anchor box.
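These four formulas translate directly into code. The following minimal NumPy sketch decodes one predicted box; the t-values, grid offset, and anchor size are illustrative numbers, not values from the paper.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Raw network predictions for one box (illustrative values).
t_x, t_y, t_w, t_h = 0.2, -0.1, 0.5, 0.3

c_x, c_y = 6, 4        # grid-cell offset from the top-left corner
p_w, p_h = 116, 90     # anchor-box width and height (one YOLO V3 prior)

# The four decoding formulas shown in Figure 3.
b_x = sigmoid(t_x) + c_x
b_y = sigmoid(t_y) + c_y
b_w = p_w * np.exp(t_w)
b_h = p_h * np.exp(t_h)
print(b_x, b_y, b_w, b_h)
```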