Computers and Computing

Semantic-Based Integrated Plagiarism Detection Approach for English Documents


