A software code plagiarism detection scheme based on ensemble learning is designed to address the low accuracy of traditional abstract syntax tree (AST) based methods for detecting software code infringement. We integrate the domain partitioning used in IR with the AST structure of the code, and use a weighted, simplified abstract syntax tree to design the feature extraction and similarity calculation methods. This enables partial detection of semantic plagiarism and computes the similarity between text and source code. The feature set of a training set with known classes is then fed into a random-forest-based ensemble classifier for training, and a relationship between each decision tree's error rate and its classification performance within the random forest is proposed in order to match feature nodes with features in the code base. Experimental results show that our scheme achieves higher accuracy than traditional AST-based detection methods. It can not only detect code similarity but also identify the type of plagiarism, which gives it better overall identification performance.
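The two core ideas in the abstract, a weighted simplified AST used as a feature vector and a random forest whose trees are weighted by their error rates, can be illustrated with a minimal sketch. Everything concrete here is an assumption, not the paper's method: Python's standard `ast` module stands in for the parser, the `NODE_WEIGHTS` table is hypothetical, cosine similarity is used as the similarity measure, and each tree's vote is weighted by `1 - error_rate` as one simple way to tie error rate to classification influence. The tree predictions and error rates are passed in directly rather than produced by a trained forest.

```python
import ast
import math
from collections import Counter

# Hypothetical node-type weights (assumption): definition and control-flow
# nodes are treated as more telling of program structure than expressions.
NODE_WEIGHTS = {
    "FunctionDef": 3.0, "For": 2.0, "While": 2.0, "If": 2.0,
    "Call": 1.5, "Return": 1.5, "Assign": 1.0, "BinOp": 1.0, "Compare": 1.0,
}


def weighted_ast_vector(source: str) -> Counter:
    """Simplified AST features: identifiers and literals are ignored, and
    each occurrence of a weighted node type adds that type's weight."""
    vec = Counter()
    for node in ast.walk(ast.parse(source)):
        name = type(node).__name__
        if name in NODE_WEIGHTS:
            vec[name] += NODE_WEIGHTS[name]
    return vec


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse feature vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def weighted_vote(tree_labels, tree_error_rates):
    """Combine per-tree predictions, weighting each tree by (1 - error rate)
    so that more accurate trees have more say in the final label."""
    scores = Counter()
    for label, err in zip(tree_labels, tree_error_rates):
        scores[label] += 1.0 - err
    return scores.most_common(1)[0][0]


if __name__ == "__main__":
    original = "def total(xs):\n    s = 0\n    for x in xs:\n        s = s + x\n    return s\n"
    renamed = "def add_all(vals):\n    acc = 0\n    for v in vals:\n        acc = acc + v\n    return acc\n"
    sim = cosine_similarity(weighted_ast_vector(original), weighted_ast_vector(renamed))
    print(sim)  # 1.0: renaming identifiers leaves the simplified AST unchanged
    # One low-error tree outvotes two weaker trees (0.95 vs. 0.45 + 0.45).
    print(weighted_vote(["identifier_renaming", "none", "none"], [0.05, 0.55, 0.55]))
```

Because identifiers and literals are discarded, pure renaming yields identical feature vectors. Under a plain majority vote the example would be labeled `none`, whereas the error-rate weighting lets the single more accurate tree decide. The plagiarism-type labels are illustrative placeholders.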
This website uses cookies
We use cookies to provide you with the best possible experience. They also allow us to analyze user behavior in order to constantly improve the website for you. Info about the privacy policy of IOS Press.