International Core Journal of Engineering 2020-26 | Page 196
clone code, developers can prioritize these clone-prone code
fragments for better management.
using the text processing tool Highlight 3.4.8 and the Python
imgkit 1.0.1 package. Image processing, clone code detection
based on image similarity, and visualization are implemented
in Python with the image processing library PIL, the
scientific computing package numpy 1.15.2, and the network
analysis software Gephi. The rest of the work is implemented
in Python.
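
As a rough illustration of what comparing code fragments by image similarity can look like, the sketch below uses a toy stand-in for a rendered image: the set of character cells that would contain ink. It does not use Highlight, imgkit, or PIL, it ignores syntax colouring, and the Jaccard measure and the function names `ink_pixels` and `image_similarity` are assumptions made for this example, not the similarity measure used by ICCV.

```python
def ink_pixels(code, width=80):
    """Toy stand-in for a rendered image: the set of (row, column)
    positions where a non-whitespace character would be drawn."""
    pixels = set()
    for row, line in enumerate(code.expandtabs(4).splitlines()):
        for col, ch in enumerate(line[:width]):
            if not ch.isspace():
                pixels.add((row, col))
    return pixels


def image_similarity(code_a, code_b):
    """Jaccard similarity of the two toy bitmaps, in the range [0, 1]."""
    a, b = ink_pixels(code_a), ink_pixels(code_b)
    if not a and not b:
        return 1.0  # two empty fragments are treated as identical
    return len(a & b) / len(a | b)


original = "def area(w, h):\n    return w * h\n"
renamed = "def size(a, b):\n    return a * b\n"
different = "for item in items:\n    print(item)\n"

print(image_similarity(original, renamed))    # same layout, different names
print(image_similarity(original, different))  # different layout
```

Because the toy bitmap only records where characters sit, a fragment whose identifiers are renamed to names of the same length is indistinguishable from the original, while a fragment with a different layout scores lower.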
V. CONCLUSION
This paper proposes an image-based clone code detection and
visualization method. Detecting clone code based on the
semantic similarity of images is a new way of representing
source code and provides a new perspective for clone code
detection. Visualizing the detected clone code highlights
the key information in the data, improves the value of the
data, and makes it easier for the relevant personnel to
analyze the clone data. Representing source code fragments
as images for clone detection also lays a foundation for
further research based on deep learning.
Figure 4. Clone code visual result
D. Comparative experiment and analysis
To verify the effectiveness of ICCV, it was compared with
NiCad on the same subject software and in the same
experimental environment. NiCad was selected because it
performs very well in evaluations and comparative studies of
current clone code detection methods [7][38]. Few clone
detection tools currently support Python, whereas NiCad can
detect clones in Python code and is widely recognized in the
clone detection field. NiCad can detect type-1 and type-2
clones and some type-3 clones.
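
To make the clone types concrete, the sketch below follows the usual taxonomy: type-1 clones differ only in layout and comments, type-2 clones additionally differ in identifiers and literals, and type-3 clones also have statements added, removed, or changed. It compares token streams from Python's standard `tokenize` module. This is not the algorithm used by NiCad or ICCV, and the helper names `normalize` and `classify` are hypothetical.

```python
import io
import keyword
import tokenize

SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.ENDMARKER}
LAYOUT = {tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT}


def normalize(source, abstract_names=False):
    """Return the token sequence of a fragment, ignoring comments and spacing.
    With abstract_names=True, identifiers and literals are replaced by
    placeholders so that renamed fragments compare equal."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIP:
            continue
        if tok.type in LAYOUT:
            out.append(tokenize.tok_name[tok.type])
        elif (abstract_names and tok.type == tokenize.NAME
              and not keyword.iskeyword(tok.string)):
            out.append("ID")
        elif abstract_names and tok.type in (tokenize.NUMBER, tokenize.STRING):
            out.append("LIT")
        else:
            out.append(tok.string)
    return out


def classify(frag_a, frag_b):
    """Label a pair as type-1, type-2, or neither (exact token match only)."""
    if normalize(frag_a) == normalize(frag_b):
        return "type-1"
    if normalize(frag_a, True) == normalize(frag_b, True):
        return "type-2"
    return "neither"


a = "def add(a, b):\n    return a + b\n"
b = "def add(a, b):   # sum\n    return a+b\n"
c = "def plus(x, y):\n    return x + y\n"
d = "def add(a, b):\n    total = a + b\n    return total\n"

print(classify(a, b), classify(a, c), classify(a, d))
```

A pair such as `a` and `d`, where a statement has been added, is what a type-3 detector would need to catch; doing so requires a similarity threshold rather than the exact match used here.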
The work in this paper still has some shortcomings: some
clone code may be missed because of the limited number of
lines of code selected, the detection algorithm can be
further optimized, and the method has only been tested on
Python projects. Future work will address these problems
and will combine deep learning techniques with clone code
detection to achieve better results.
ACKNOWLEDGMENTS
Using the experimental evaluation data set and the mutation
insertion information constructed above, ICCV and NiCad
were each used for clone detection. The resulting recall and
precision are shown in Table IV.
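
For reference, recall and precision over clone pairs can be computed as in the sketch below. It assumes that recall is measured against the known (injected) clone pairs and that precision is measured against the reported pairs confirmed by inspection. The fragment names and the helper `pair` are invented for illustration and are not data from the experiment.

```python
def pair(x, y):
    """An unordered clone pair: (x, y) and (y, x) are the same pair."""
    return frozenset((x, y))


def recall(detected, injected):
    """Fraction of known (injected) clone pairs that the tool reported."""
    return len(detected & injected) / len(injected)


def precision(detected, confirmed):
    """Fraction of reported clone pairs that were confirmed as real clones."""
    return len(detected & confirmed) / len(detected)


# Hypothetical data: four injected pairs, four reported pairs.
injected = {pair("f1", "f2"), pair("f3", "f4"),
            pair("f5", "f6"), pair("f7", "f8")}
detected = {pair("f2", "f1"), pair("f3", "f4"),
            pair("f5", "f6"), pair("f9", "f10")}
# Pairs among the reported ones that inspection judged to be real clones.
confirmed = {pair("f1", "f2"), pair("f3", "f4"), pair("f5", "f6")}

print(recall(detected, injected))     # 3 of 4 injected pairs found
print(precision(detected, confirmed)) # 3 of 4 reported pairs are real
```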
WANG Yafang, born in 1994, M. S. candidate. Her
research interests include software analysis, code analysis,
etc. LIU Dongsheng, born in 1956, Ph. D., professor. His
research interests include software analysis, code analysis,
computer aided instruction, etc.
TABLE IV. RECALL AND PRECISION FOR DIFFERENT METHODS.

Project        NiCad                 ICCV
               recall   precision    recall   precision
pandas         0.97     1.00         0.94     1.00
scipy          0.88     1.00         0.95     1.00
django         0.86     1.00         0.85     1.00
pytorch        0.76     0.98         0.81     0.98
scikit-learn   0.97     1.00         0.97     1.00
keras          1.00     1.00         1.00     1.00
Average        0.82     0.96         0.87     0.96

REFERENCES
The experimental results show that both clone code detection
methods perform well on the six software projects. On all
six, both NiCad and ICCV detect clone code with high
precision, close to 100%. Comparing the average values, the
recall of ICCV is about 5 percentage points higher than that
of NiCad. Based on the clone information confirmed by human
inspection, the precision of ICCV is comparable to that of
NiCad. Although ICCV misses some real clone fragments, it
can detect clones that NiCad cannot. The inspection and
analysis were carried out by the author herself, 5 clone
research experts from the school, and 6 enterprise software
developers.
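
The comparison above can be checked directly from the Table IV figures. The sketch below assumes the table values pair up as (recall, precision) per project in the listed order, and uses the averages as reported in the table rather than recomputing them. The function name `recall_gap_points` is introduced here for illustration.

```python
# (recall, precision) per project, as listed in Table IV.
nicad = {"pandas": (0.97, 1.00), "scipy": (0.88, 1.00),
         "django": (0.86, 1.00), "pytorch": (0.76, 0.98),
         "scikit-learn": (0.97, 1.00), "keras": (1.00, 1.00)}
iccv = {"pandas": (0.94, 1.00), "scipy": (0.95, 1.00),
        "django": (0.85, 1.00), "pytorch": (0.81, 0.98),
        "scikit-learn": (0.97, 1.00), "keras": (1.00, 1.00)}
reported_average = {"NiCad": (0.82, 0.96), "ICCV": (0.87, 0.96)}


def recall_gap_points(iccv_recall, nicad_recall):
    """ICCV recall minus NiCad recall, in whole percentage points."""
    return round((iccv_recall - nicad_recall) * 100)


# Per-project recall gap: positive where ICCV finds more clones.
gaps = {p: recall_gap_points(iccv[p][0], nicad[p][0]) for p in nicad}
average_gap = recall_gap_points(reported_average["ICCV"][0],
                                reported_average["NiCad"][0])

for project, gap in gaps.items():
    print(f"{project:13s} {gap:+d}")
print(f"{'Average':13s} {average_gap:+d}")
```

The per-project gaps show both effects described in the text: ICCV has higher recall on scipy and pytorch, and slightly lower recall on pandas and django.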
E. Experimental validity
The code preprocessing in this paper is mainly implemented
in Python. Code is converted to images
[1] Chen W K, Li B G, Gupta R. Code compaction of matching
    single-entry multiple-exit regions[C]// International Static
    Analysis Symposium. Springer, 2003: 401-417.
[2] Kim M, Sazawal V, Notkin D, et al. An empirical study of code
    clone genealogies[C]// 13th ACM SIGSOFT International
    Symposium on Foundations of Software Engineering. ACM, 2005:
    187-196.
[3] Patenaude J F, Merlo E, Dagenais M, et al. Extending software
    quality assessment techniques to Java systems[C]// Proceedings
    of the 7th International Workshop on Program Comprehension.
    IEEE, 1999: 49-56.
[4] Kamiya T, Kusumoto S, Inoue K. CCFinder: A Multilinguistic
    Token-Based Code Clone Detection System for Large Scale Source
    Code[J]. IEEE Transactions on Software Engineering, 2002,
    28(7): 654-670.
[5] Rieger M, Ducasse S, Lanza M. Insights into System-Wide
    Code Duplication[C]// Conference on Reverse Engineering. IEEE,
    2005: 100-109.
[6] Roy C K, Cordy J R. A survey on software clone detection
    research. Queen's School of Computing TR, 2007, 541(115): 64-68.
[7] Roy C K, Cordy J R, Koschke R. Comparison and evaluation of
    code clone detection techniques and tools: A qualitative
    approach[J]. Science of Computer Programming, 2009, 74(7):
    470-495.
[8] Bellon S, Koschke R, Antoniol G, et al. Comparison and
    evaluation of clone detection tools[J]. IEEE Transactions on
    Software Engineering, 2007, 33(9): 577-591.
[9] Roy C K, Cordy J R. NICAD: Accurate Detection of Near-Miss
    Intentional Clones Using Flexible Pretty-Printing and Code
    Normalization[C]// Program Comprehension, 2008. ICPC 2008. The
    16th IEEE International Conference on. IEEE, 2008: 172-181.