Computers and Computing

Semantic-Based Integrated Plagiarism Detection Approach for English Documents

Manpreet Kaur, Vishal Gupta & Ravreet Kaur

Abstract

This work proposes a novel plagiarism detection system that uses semantic features to uncover cases of plagiarism. The system constructs a dynamic relation matrix for each suspicious and source sentence pair to measure their degree of similarity using semantic features. Two procedures, Weighted Inverse Distance and GlossDice, exploit several text properties (synonyms, shortest path, etc.) to overcome the limitations of existing features, and a new similarity metric for plagiarism detection is presented. Moreover, this research investigates the independent performance of various features in detecting plagiarized cases and combines the best features, assigning each a different weight contribution, to further enhance system performance. Weighted Inverse Distance integrated with SynJaccard boosts system performance and shows promising results. All experiments were first performed on the PAN-PC-11 dataset, and the PAN-14 text alignment dataset was then used to validate the results of the proposed system. The effectiveness of the proposed system was measured using standard performance measures: precision, recall, F-measure, granularity, and plagdet score. The proposed system outperformed the baseline systems with precision (0.9459), recall (0.8861), F-measure (0.8917), and plagdet (0.8857) on the PAN-PC-11 dataset. On the PAN-14 text alignment dataset, the system achieves precision (0.9257), recall (0.9055), F-measure (0.8931), and plagdet (0.8806).
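For readers unfamiliar with the plagdet score named above, the following is a minimal sketch of how it is commonly defined in the PAN evaluation framework: the F-measure divided by the base-2 logarithm of one plus the granularity. The function names and the input values are illustrative assumptions and are not taken from the paper, whose exact evaluation code is not shown here.

```python
import math


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def plagdet(precision: float, recall: float, granularity: float) -> float:
    """Plagdet as commonly defined for PAN: F-measure / log2(1 + granularity).

    Granularity is at least 1; a value of 1 means every plagiarized passage
    was reported as a single detection, so plagdet equals the F-measure.
    """
    if granularity < 1:
        raise ValueError("granularity must be >= 1")
    return f_measure(precision, recall) / math.log2(1 + granularity)


if __name__ == "__main__":
    # Hypothetical inputs chosen only to illustrate the effect of granularity.
    print(plagdet(0.9, 0.8, 1.0))  # no fragmentation: equals the F-measure
    print(plagdet(0.9, 0.8, 1.5))  # fragmented detections lower the score
```

With a granularity of 1 the score reduces to the F-measure, while any fragmentation of detections (granularity above 1) is penalized.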

Additional information

Notes on contributors

Manpreet Kaur

Manpreet Kaur received her master's degree in computer science and engineering from the University Institute of Engineering and Technology, Panjab University, Chandigarh, in 2021. Her research interests include natural language processing, image processing and computer vision.

Vishal Gupta

Vishal Gupta is currently working as an associate professor in the Department of Computer Science & Engineering, University Institute of Engineering & Technology, Panjab University, Chandigarh. His main research interests include natural language processing, deep learning, machine learning, information retrieval, and text mining. He has won the Young Scientist Award and the Faculty Research Award. He was included in the 2019 ranking list of the world's top 2% of scientists in computer science, released by Stanford University. Corresponding author. Email: [email protected]

Ravreet Kaur

Ravreet Kaur is currently working as an assistant professor in the Department of Computer Science & Engineering, University Institute of Engineering & Technology, Panjab University, Chandigarh. Her main research interests include parallel and distributed computing, future network architecture, deep learning, and the architecture and key technologies of the new-generation Internet of Things. Email: [email protected]
