Information Processing for Similar Source Code Using LSH Algorithm

Authors

Mrs. Vani Dave
M.Tech Research Scholar, Department of Computer Science and Engineering, Kanpur Institute of Technology, Kanpur, India.

Mr. Sanjeev Kumar Shukla
Assistant Professor and Head of Department, Department of Computer Applications, Kanpur Institute of Technology, Kanpur, India.

Abstract

In this study, we propose a method for quickly finding source files that are similar to a given source file, as a means of examining the origin of reused code. Because the method returns similar files as well as identical ones, it can handle source files that were changed during reuse. Locality-sensitive hashing (LSH) is used so that the search stays fast over a large number of source files. This makes it possible to identify the origin of the reused code. We conducted a case study on a reused library written in C. Some of the changes were specific to the reusing project, and some reused copies were no longer consistent with the original source files. The method detected the reused source files among 200 projects with 92% accuracy. In addition, when we measured search time using 4 query files, each search completed within 1 second.
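To make the idea concrete, the following is a minimal sketch of similarity search over source files with locality-sensitive hashing. It is an illustration under stated assumptions, not the implementation evaluated in this study. The abstract does not name the hash family, so MinHash over token 3-grams with banding is assumed here. The names `shingles`, `minhash`, and `LSHIndex`, and the parameters `NUM_HASHES`, `BANDS`, and `threshold`, are hypothetical.

```python
import hashlib
import re
from collections import defaultdict

NUM_HASHES = 64                # signature length (assumed value)
BANDS = 16                     # number of LSH bands (assumed value)
ROWS = NUM_HASHES // BANDS     # signature rows per band


def shingles(source, n=3):
    """Return the set of token n-grams found in a source string."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
    return {" ".join(tokens[i:i + n])
            for i in range(max(1, len(tokens) - n + 1))}


def _hash(seed, text):
    """Stable 64-bit hash of text; the seed selects one hash function."""
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=8,
                             salt=seed.to_bytes(8, "little")).digest()
    return int.from_bytes(digest, "little")


def minhash(shingle_set):
    """MinHash signature: the minimum hash value under each seeded function."""
    return [min(_hash(seed, s) for s in shingle_set)
            for seed in range(NUM_HASHES)]


def _band_keys(signature):
    """Yield one bucket key per band of the signature."""
    for band in range(BANDS):
        yield band, tuple(signature[band * ROWS:(band + 1) * ROWS])


class LSHIndex:
    """Hypothetical index mapping band keys to the files that share them."""

    def __init__(self):
        self.buckets = defaultdict(set)
        self.signatures = {}

    def add(self, name, source):
        signature = minhash(shingles(source))
        self.signatures[name] = signature
        for key in _band_keys(signature):
            self.buckets[key].add(name)

    def query(self, source, threshold=0.5):
        """Return (name, estimated Jaccard similarity) pairs, best first."""
        signature = minhash(shingles(source))
        candidates = set()
        for key in _band_keys(signature):
            candidates |= self.buckets.get(key, set())
        results = []
        for name in candidates:
            other = self.signatures[name]
            estimate = sum(a == b for a, b in zip(signature, other)) / NUM_HASHES
            if estimate >= threshold:
                results.append((name, estimate))
        return sorted(results, key=lambda pair: -pair[1])


index = LSHIndex()
index.add("a.c", "int add(int a, int b) { return a + b; }")
index.add("b.c", "void log_msg(const char *m) { puts(m); }")
matches = index.query("int add(int a, int b) { return a + b; }")
```

In this sketch a query only compares signatures with files that share at least one band bucket, which is what keeps lookup cheap as the number of indexed files grows. Whether a modified file is still found depends on how much of its token content survives the modification and on the chosen band and row counts.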