Graph-based phishing detection: URLGBM model driven by machine learning

Pages 481-495 | Received 08 Jul 2023, Accepted 09 Apr 2024, Published online: 18 Apr 2024
 

Abstract

Phishing is a form of social engineering in which deceptive communications imitate a trustworthy source to trick users into revealing confidential information. Anti-phishing systems have advanced significantly in recent years, helping internet users protect confidential and private information against such attacks. In this study, we propose a URL graph representation that weights URL tokens with a random walk algorithm, specifically PageRank. To create the graph, an imaginary walker visits the URL tokens one at a time and assigns each token a value based on the probability of encountering the target URL during the walk. We studied different random walk (RW) variations and their effects on the URL string. The BM25 algorithm was then applied to the resulting token scores to produce a sparse matrix for the classification task. Experiments with logistic regression showed that the proposed model achieved an accuracy of 98.98%, a false alarm rate of 1.72%, and a missing alarm rate of 0.302%. The model also attained 97.17% accuracy on a benchmark dataset.
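
The token-weighting idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how URLs are tokenized or how the graph edges are defined, so the tokenizer (splitting on non-alphanumeric characters), the edge rule (linking adjacent tokens, undirected), the damping factor of 0.85, and all function names below are assumptions made for illustration. The BM25 and logistic regression stages are omitted.

```python
import re
from collections import defaultdict


def tokenize_url(url):
    """Split a URL into lowercase alphanumeric tokens (assumed scheme)."""
    return [t for t in re.split(r"[^A-Za-z0-9]+", url.lower()) if t]


def build_token_graph(tokens):
    """Build an undirected graph linking tokens that are adjacent in the URL.

    This edge rule is an assumption; the source does not define the graph.
    """
    graph = defaultdict(set)
    for tok in tokens:
        graph[tok]  # register every token as a node, even if isolated
    for left, right in zip(tokens, tokens[1:]):
        if left != right:
            graph[left].add(right)
            graph[right].add(left)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Score nodes by power iteration of the PageRank random walk."""
    nodes = list(graph)
    n = len(nodes)
    if n == 0:
        return {}
    scores = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Probability mass on nodes without neighbours is spread uniformly.
        dangling = sum(scores[v] for v in nodes if not graph[v])
        new_scores = {}
        for node in nodes:
            # Each neighbour passes on an equal share of its own score.
            incoming = sum(scores[nb] / len(graph[nb]) for nb in graph[node])
            new_scores[node] = (1 - damping) / n + damping * (incoming + dangling / n)
        scores = new_scores
    return scores


def score_url_tokens(url):
    """Return a {token: PageRank score} mapping for one URL."""
    return pagerank(build_token_graph(tokenize_url(url)))


if __name__ == "__main__":
    # Hypothetical URL, used only to exercise the sketch.
    example = "http://paypal.com.secure-login.example.net/paypal/login"
    for token, score in sorted(score_url_tokens(example).items(), key=lambda kv: -kv[1]):
        print(f"{token:10s} {score:.4f}")
```

In this sketch, tokens that recur in several positions of a URL gain more neighbours and therefore receive a higher score. Per-token scores of this kind are what a BM25-style weighting step could then turn into a sparse feature matrix for a classifier.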

Disclosure statement

No potential conflict of interest was reported by the author(s).

Data availability statement

The datasets used in our study on phishing detection, designated D1 and D2, are available at the following links: https://github.com/ebubekirbbr/pdd/tree/master/input, https://sites.google.com/view/url-phishing-detection/main

Additional information

Notes on contributors

Abdelali Elkouay

Elkouay Abdelali is currently a PhD student at Chouaib Doukkali University, El Jadida, Morocco. His research interests include cybersecurity, data mining, machine learning and natural language processing (NLP).

Najem Moussa

Moussa Najem is a full Professor in the Department of Computer Science, Faculty of Science, Mohammed V University, Morocco. He obtained his PhD in Statistical Physics from Mohammed V University in 1998. His areas of interest include machine learning, intelligent transportation systems, wireless and sensor networks, and epidemic and worm propagation.

Abdallah Madani

Abdellah Madani is currently a Professor and PhD Tutor in the Department of Computer Science, Faculty of Sciences, Chouaib Doukkali University, El Jadida, Morocco. His main research interests include optimization algorithms, text mining, traffic flow and modeling platforms. He is the author of many research papers published in conference proceedings and international journals.
