Phishing Detection System Through Hybrid Machine Learning Based on URL

Karim, Abdul and Shahroz, Mobeen and Mustofa, Khabib and Belhaouari, Samir Brahim and Joga, S. Ramana Kumar (2023) Phishing Detection System Through Hybrid Machine Learning Based on URL. IEEE Access, 11. pp. 36805-36822. ISSN 2169-3536

Abstract

Numerous types of cybercrime are now carried out over the internet; this study focuses on phishing attacks. Although phishing was first used in 1996, it has become the most severe and dangerous cybercrime on the internet. Phishing relies on deceptive email as its underlying mechanism, followed by spoofed websites, to obtain the desired data from the targeted people. Various studies have addressed the prevention, identification, and awareness of phishing attacks; however, there is currently no complete and proper solution for thwarting them. Machine learning therefore plays a vital role in defending against cybercrimes involving phishing attacks. The proposed study is based on a phishing URL dataset extracted from a well-known dataset repository, which consists of phishing and legitimate URL attributes collected from more than 11,000 websites in vector form. After preprocessing, several machine learning algorithms were applied to detect phishing URLs and protect the user. This study uses machine learning models such as decision tree (DT), logistic regression (LR), random forest (RF), naive Bayes (NB), gradient boosting classifier (GBM), K-neighbors classifier (KNN), support vector classifier (SVC), and the proposed hybrid LSD model, which is a combination of logistic regression, support vector machine, and decision tree (LR+SVC+DT) with soft and hard voting, to defend against phishing attacks with high accuracy and efficiency. The canopy feature selection technique with cross-fold validation and grid search hyperparameter optimization are used with the proposed LSD model. Furthermore, to evaluate the proposed approach, different evaluation parameters were adopted, such as precision, accuracy, recall, F1-score, and specificity, to illustrate the effects and efficiency of the models. The results of the comparative analyses demonstrate that the proposed approach outperforms the other models and achieves the best results.
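
The abstract mentions combining three base classifiers with hard and soft voting and scoring the result with precision, accuracy, recall, F1-score, and specificity. The following is a minimal sketch of those ideas in plain Python, not the authors' implementation. The function names (hard_vote, soft_vote, binary_metrics), the label convention (1 = phishing, 0 = legitimate), and the example probabilities are all assumptions made for illustration.

```python
from collections import Counter


def hard_vote(labels):
    """Hard voting: return the class label predicted by most base models."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probas):
    """Soft voting: average per-class probabilities, return the top class index."""
    n_classes = len(probas[0])
    avg = [sum(p[c] for p in probas) / len(probas) for c in range(n_classes)]
    return max(range(n_classes), key=avg.__getitem__)


def binary_metrics(y_true, y_pred, positive=1):
    """Compute the evaluation measures named in the abstract for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        # Guard against empty denominators (e.g. no positive predictions).
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical outputs of three base models (standing in for LR, SVC, DT)
# for a single URL, as [P(legitimate), P(phishing)].
example_probas = [[0.1, 0.9], [0.6, 0.4], [0.6, 0.4]]
# Each model's own label is its most probable class.
example_labels = [max(range(2), key=p.__getitem__) for p in example_probas]

hard_result = hard_vote(example_labels)  # two of three models say legitimate
soft_result = soft_vote(example_probas)  # one confident model shifts the average
```

In this made-up case the two schemes disagree: hard voting follows the two-model majority, while soft voting lets one highly confident model outweigh two weakly confident ones. That difference is one reason an ensemble might be evaluated under both schemes.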

Item Type: Article
Uncontrolled Keywords: voting classifier; cyber security; ensemble classifier; logistic regression, support vector machine, and decision tree (LSD); machine learning; protocol; social networks; uniform resource locator (URL)
Subjects: Q Science > QA Mathematics
Depositing User: Rita Yulianti Yulianti
Date Deposited: 05 Apr 2024 07:34
Last Modified: 05 Apr 2024 07:34
URI: https://ir.lib.ugm.ac.id/id/eprint/503
