International Core Journal of Engineering 2020-26 | Page 196

clone code, developers can prioritize these clone-prone code fragments for better management.

Figure 4. Clone code visual result

D. Comparative experiment and analysis

To verify the effectiveness of ICCV, it was compared with NiCad on the same software under test and in the same experimental environment. NiCad was selected because it performs very well in evaluations and comparative studies of current clone code detection methods [7][38]. Few clone detection methods currently support Python, whereas NiCad can detect clones in Python and is widely recognized in the clone code field. NiCad can detect Type-1 and Type-2 clones and some Type-3 clones.

Using the experimental evaluation data set and the mutation insertion information constructed above, ICCV and NiCad were each used for clone detection. The resulting recall and precision are shown in Table IV.

TABLE IV. RECALL AND PRECISION FOR THE DIFFERENT METHODS

Project        NiCad recall   NiCad precision   ICCV recall   ICCV precision
pandas         0.97           1.00              0.94          1.00
scipy          0.88           1.00              0.95          1.00
django         0.86           1.00              0.85          1.00
pytorch        0.76           0.98              0.81          0.98
scikit-learn   0.97           1.00              0.97          1.00
keras          1.00           1.00              1.00          1.00
Average        0.82           0.96              0.87          0.96

The experimental results show that both clone code detection methods perform well on the six software projects. On all six projects, both NiCad and ICCV detected clone code with high precision, almost 100%. According to the average values, the recall of ICCV is about 5 percentage points higher than that of NiCad (0.87 versus 0.82). According to the clone information confirmed by human inspection, the precision of ICCV is comparable to that of NiCad. Although ICCV misses some real clone fragments, it can detect some that NiCad cannot. The inspection and analysis were carried out by the author herself, five clone research experts from the school, and six enterprise software developers.

E. Experimental validity

The code preprocessing in this paper is mainly implemented in Python. Code is converted to images using the text processing tool Highlight 3.4.8 and the Python package imgkit 1.0.1. Image processing, clone code detection based on image similarity, and visualization use Python together with the image processing library PIL, the scientific computing library numpy 1.15.2, and the network analysis software Gephi. The rest of the work is implemented in Python.

V. CONCLUSION

This paper proposes an image-based clone code detection and visualization method that detects clone code based on the semantic similarity of images. It is a new source code representation method and provides a new perspective for clone code detection. Visualizing the detected clone code highlights the key information in the data, improves the value of the data, and helps the relevant personnel analyze the clone data. Using image representations of source code fragments for clone detection lays a foundation for further research based on deep learning.

The research work in this paper still has some shortcomings: some clone code may be missed because of the limited number of lines of code selected, the detection algorithm can be further optimized, and the method can only be tested on Python projects. Future work will continue to study and solve these problems, and will combine deep learning techniques for more in-depth research on clone code detection to achieve better results.

ACKNOWLEDGMENTS

WANG Yafang, born in 1994, M. S. candidate. Her research interests include software analysis, code analysis, etc.

LIU Dongsheng, born in 1956, Ph. D., professor. His research interests include software analysis, code analysis, computer aided instruction, etc.

REFERENCES

[1] Chen W K, Li B G, Gupta R. Code compaction of matching single-entry multiple-exit regions[C]// International Static Analysis Symposium. Springer, 2003: 401-417.
[2] Kim M, Sazawal V, Notkin D, et al. An empirical study of code clone genealogies[C]// 13th ACM SIGSOFT international symposium on Foundations of software engineering. ACM, 2005: 187-196.
[3] Patenaude J F, Merlo E, Dagenais M, et al. Extending software quality assessment techniques to java systems[C]// Proceedings of the 7th International Workshop on Program Comprehension. IEEE, 1999: 49-56.
[4] Kamiya T, Kusumoto S, Inoue K. CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code[J]. IEEE Transactions on Software Engineering, 2002, 28(7): 654-670.
[5] Rieger M, Ducasse S, Lanza M. Insights into System-Wide Code Duplication[C]// Conference on Reverse Engineering. IEEE, 2005: 100-109.
[6] Roy C K, Cordy J R. A survey on software clone detection research. Queen's School of Computing TR, 2007, 541(115): 64-68.
[7] Roy C K, Cordy J R, Koschke R. Comparison and evaluation of code clone detection techniques and tools: A qualitative approach[J]. Science of Computer Programming, 2009, 74(7): 470-495.
[8] Bellon S, Koschke R, Antoniol G, et al. Comparison and evaluation of clone detection tools[J]. IEEE Transactions on Software Engineering, 2007, 33(9): 577-591.
[9] Roy C K, Cordy J R. NICAD: Accurate Detection of Near-Miss Intentional Clones Using Flexible Pretty-Printing and Code Normalization[C]// Program Comprehension, 2008. ICPC 2008. The 16th IEEE International Conference on. IEEE, 2008: 172-181.
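
The recall and precision values reported in Table IV are assumed here to follow the usual definitions over sets of clone pairs. The following is a minimal Python sketch of that calculation, not the evaluation script used in the paper. The function name `recall_precision` and the fragment identifiers are hypothetical, and a single reference set stands in for both the mutation-inserted clones (used for recall) and the manually confirmed clones (used for precision).

```python
def recall_precision(detected, reference):
    """Return (recall, precision) for detected clone pairs.

    Both arguments are iterables of 2-tuples of fragment identifiers.
    A pair is treated as unordered, so (a, b) and (b, a) are the same clone.
    """
    detected = {frozenset(pair) for pair in detected}
    reference = {frozenset(pair) for pair in reference}
    true_positives = len(detected & reference)
    # Recall: share of reference clones that the detector found.
    recall = true_positives / len(reference) if reference else 0.0
    # Precision: share of reported clones that are real clones.
    precision = true_positives / len(detected) if detected else 0.0
    return recall, precision


# Hypothetical fragment identifiers for illustration only, not data from the paper.
reference_pairs = [("a.py:10", "b.py:42"), ("a.py:80", "c.py:5"),
                   ("d.py:1", "e.py:30"), ("f.py:7", "g.py:9")]
detected_pairs = [("b.py:42", "a.py:10"), ("a.py:80", "c.py:5"),
                  ("d.py:1", "e.py:30")]

recall, precision = recall_precision(detected_pairs, reference_pairs)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

With the illustrative data above, three of the four reference pairs are found and all three reported pairs are correct, so the script prints recall=0.75 precision=1.00.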