A Dynamic Improvement of a Training Dataset for Source Code Classification Using a Deep Learning Approach

Authors

Ms. Anshika Shukla
M.Tech Research Scholar, Department of Computer Science and Engineering, Kanpur Institute of Technology, Kanpur, India.

Mr. Sanjeev Kumar Shukla
Assistant Professor and Head of Department, Department of Computer Applications, Kanpur Institute of Technology, Kanpur, India.

Abstract

In recent years, various deep learning methods for source code classification have been proposed. The classification accuracy of such methods depends strongly on the training dataset, so a more accurate model can be obtained by improving how that dataset is constructed. In this study, we propose a method for dynamically improving the training dataset used for deep learning-based source code classification. In the proposed method, we first train a source code classification model on the training dataset and validate it. Next, we reconstruct the training dataset based on the validation results. By repeating this cycle of training and reconstruction, the training dataset is progressively improved and a high-accuracy model is obtained. In the evaluation experiment, a source code classification model was trained using the proposed method, and its classification accuracy was compared with that of three baseline methods. The model trained with the proposed method achieved the highest classification accuracy among the compared methods. We also confirmed that the proposed method improves the model's classification accuracy from 0.64 to 0.96.
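The repeated train, validate, and reconstruct cycle described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how the training dataset is reconstructed, so the rule used here (moving misclassified validation samples into the training set) is a hypothetical choice. The fixed `rounds` limit and the early stop are also assumed, and a toy 1-nearest-neighbour classifier on one-dimensional features stands in for the deep learning model. All names (`improve_training_set`, `fit_1nn`, `predict_1nn`, `Sample`) are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Model = TypeVar("Model")
# Hypothetical stand-in for a source code sample: (feature, class label).
Sample = Tuple[float, str]


def accuracy(model: Model, data: Sequence[Sample],
             predict: Callable[[Model, float], str]) -> float:
    """Fraction of samples in `data` that the model labels correctly."""
    return sum(predict(model, x) == y for x, y in data) / len(data)


def improve_training_set(
    train: Sequence[Sample],
    val: Sequence[Sample],
    test: Sequence[Sample],
    fit: Callable[[Sequence[Sample]], Model],
    predict: Callable[[Model, float], str],
    rounds: int = 5,
) -> Tuple[Model, List[float]]:
    """Repeat: train, validate, rebuild the training set.

    ASSUMPTION: the rebuild rule (move misclassified validation samples
    into the training set) and the fixed `rounds` limit are illustrative;
    the abstract does not specify them.
    """
    train, val = list(train), list(val)  # copy so the caller's lists are untouched
    model = fit(train)
    history = [accuracy(model, test, predict)]  # held-out accuracy per round
    for _ in range(rounds):
        hits = [predict(model, x) == y for x, y in val]  # validation step
        if all(hits):
            break  # validation set gives no new information; stop early
        # Reconstruction step: misclassified samples join the training set.
        train += [s for s, ok in zip(val, hits) if not ok]
        val = [s for s, ok in zip(val, hits) if ok]
        model = fit(train)  # retrain on the rebuilt training set
        history.append(accuracy(model, test, predict))
    return model, history


# Toy 1-nearest-neighbour classifier standing in for a deep learning model.
def fit_1nn(data: Sequence[Sample]) -> List[Sample]:
    return list(data)  # a 1-NN "model" simply memorises its training set


def predict_1nn(model: List[Sample], x: float) -> str:
    # Label of the closest training sample.
    return min(model, key=lambda s: abs(s[0] - x))[1]


# Toy data (hypothetical): the true rule is x < 5 -> "A", otherwise "B".
train_set = [(1.0, "A"), (2.0, "A"), (6.0, "B")]
val_set = [(3.0, "A"), (4.5, "A"), (7.0, "B")]
test_set = [(4.2, "A"), (4.8, "A"), (5.5, "B"), (8.0, "B")]

model, history = improve_training_set(train_set, val_set, test_set,
                                      fit_1nn, predict_1nn)
print(history)  # [0.5, 1.0]
```

Passing `fit` and `predict` as parameters keeps the loop independent of the model, so a real source code classifier could replace the toy 1-NN without changing the loop. The printed accuracies come only from the toy data and are unrelated to the 0.64 to 0.96 improvement reported in the abstract.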