2019 International Conference on Artificial Intelligence and Advanced Manufacturing (AIAM)

Research on ROI Algorithm of Ship Image Based on Improved YOLO

Li Tianwei, Zhang Kun, Li Wei, Huang Qian*
Navigation Department, Dalian Naval Academy, Dalian, China
[email protected]
*Corresponding author: Huang Qian

Abstract—In this paper, a region of interest (ROI) extraction algorithm based on the YOLO algorithm is proposed. The algorithm optimizes the output tensor dimensions of the YOLO model, generates ship images of varying quality with an image degradation function, and retrains the network by transfer learning, which improves both the accuracy and the detection rate of the algorithm. Compared with the original algorithm, the detection rate is improved by 4.25% on average, which demonstrates the effectiveness of the proposed algorithm.

Keywords—YOLO; transfer learning; ROI; patchGAN

I. INTRODUCTION

In machine vision and image processing, a region of interest (ROI) is a region, outlined on the image as a box, circle, ellipse, or irregular polygon, that is selected for further processing. Operators and functions in machine vision software such as Halcon, OpenCV, and MATLAB are commonly used to obtain the ROI and to carry out the subsequent processing steps. For ship images against a sea and sky background, it is difficult to select an appropriate region of interest because of the complexity and variability of the scene [1].

As shown in Figure 1, images of naval vessels are characterized by a small proportion of effective pixels, strong image interference, and random target locations.

Figure 1 Typical images of ships at sea

Under sea and air conditions, because there is no shelter on the sea surface, the observation distance is usually more than 20 km, and the image quality of optoelectronic equipment is degraded by fog and background light. Under these conditions, it is difficult to find a ship promptly and accurately and to display a clear ship image to the user quickly.

As shown in Figure 2, a ship imaged at long range occupies only a small fraction of the whole image; most of the pixels fall in areas such as waves and sky that users do not care about. In the three examples, the proportion of effective pixels is 0.878%, 1.56%, and 14.06%, respectively.

Figure 2 Schematic diagram of effective pixels in marine ship images

ROI extraction has been a popular and difficult research topic in recent years [2]. The concept of ROI was in fact first proposed for JPEG2000 compression: when people observe an image, they are usually interested only in the content of one or a few specific areas, hope that those areas are rendered at a higher resolution, and pay less attention to the rest as long as it meets basic visual requirements. Image ROI extraction is usually approached in two ways: one uses image segmentation to extract the ROI; the other starts from the visual characteristics of the human eye, simulating them to find visually sensitive areas and ranking those areas as ROIs [3-4].

Different from traditional ROI algorithms, our algorithm uses deep learning [5], specifically the YOLO algorithm, to select the ship area against the sea and sky background. The algorithm is described below.

II. ALGORITHM DESCRIPTION

YOLO (You Only Look Once) [6-7] is a "one-stage" object detection algorithm proposed by Redmon et al. in 2016. It predicts the locations and classes of multiple objects in a single pass, achieves truly end-to-end detection, and is notably fast. With the introduction of the V3 version, both speed and accuracy were further improved. The YOLO V3 model can be divided into a Darknet-53 network for feature extraction and an output network that produces the prediction tensors.
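The abstract refers to optimizing the output tensor dimensions of the YOLO model. The paper does not give the exact dimensions at this point, but in standard YOLO V3 each of the three output scales predicts, for every grid cell, three anchor boxes with four box offsets, one objectness score, and one probability per class. The following is a minimal sketch of those shapes, assuming the single-class (ship-only) setting that the task implies; the function name and the 416-pixel input size are illustrative, not from the paper.

```python
def yolo_v3_output_shapes(input_size=416, num_classes=1, anchors_per_scale=3):
    """Shape (grid, grid, channels) of each of the three YOLO V3 output
    scales for a square input image of side input_size."""
    # Per anchor: 4 box offsets (t_x, t_y, t_w, t_h) + 1 objectness + classes.
    channels = anchors_per_scale * (5 + num_classes)
    # The three detection scales of YOLO V3 have strides 32, 16 and 8.
    return [(input_size // s, input_size // s, channels) for s in (32, 16, 8)]

if __name__ == "__main__":
    print(yolo_v3_output_shapes())  # [(13, 13, 18), (26, 26, 18), (52, 52, 18)]
```

Reducing num_classes from the 80 COCO classes to the single class "ship" shrinks the channel dimension from 255 to 18 per scale, which is one plausible reading of the "optimized output tensor dimension" mentioned in the abstract.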
The candidate boxes of the YOLO V3 model are generated by an anchor mechanism, whose schematic diagram is shown in Figure 3. The values actually predicted by the network are the offsets $t_x$, $t_y$, $t_w$, $t_h$. According to the four formulas in Figure 3, with $(c_x, c_y)$ the offset of the grid cell from the top-left corner of the image and $(p_w, p_h)$ the width and height of the anchor prior, the center coordinates and the width and height $(b_x, b_y, b_w, b_h)$ of the prediction box are calculated as

$b_x = \sigma(t_x) + c_x$
$b_y = \sigma(t_y) + c_y$
$b_w = p_w e^{t_w}$
$b_h = p_h e^{t_h}$

where $\sigma(\cdot)$ denotes the sigmoid function.
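The decoding step described above is mechanical enough to state directly in code. Below is a minimal NumPy sketch of the four anchor-decoding formulas; the function name, the example grid cell, and the anchor size (the standard 116x90 COCO anchor divided by the stride) are illustrative assumptions, not values from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_box(t, cell, prior, stride=32):
    """Decode raw outputs (t_x, t_y, t_w, t_h) for one anchor into a box.

    t      -- raw network predictions (t_x, t_y, t_w, t_h)
    cell   -- grid-cell offset (c_x, c_y), the cell's top-left index
    prior  -- anchor width and height (p_w, p_h), in grid units
    stride -- pixels per grid cell, to map back to image coordinates
    """
    t_x, t_y, t_w, t_h = t
    c_x, c_y = cell
    p_w, p_h = prior
    b_x = sigmoid(t_x) + c_x   # box center x, in grid units
    b_y = sigmoid(t_y) + c_y   # box center y, in grid units
    b_w = p_w * np.exp(t_w)    # box width, in grid units
    b_h = p_h * np.exp(t_h)    # box height, in grid units
    return np.array([b_x, b_y, b_w, b_h]) * stride  # image pixels

# Example: one anchor in cell (6, 4) of the 13x13 scale.
print(decode_box(t=(0.2, -0.1, 0.3, 0.5), cell=(6, 4), prior=(3.625, 2.8125)))
```

The sigmoid keeps the predicted box center inside its grid cell, while the exponential scales the anchor prior, so the network only has to learn small corrections to reasonable default boxes.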