Comparative Analysis of Machine Learning Models for Phishing Website Detection Using URL and Web Features

EasyChair Preprint 15682
9 pages • Date: January 7, 2025

Abstract

Phishing website attacks are a significant global threat. They target people who rely on websites and shared links for study, work, or daily activities, with the aim of stealing personal information. Traditional detection models are often insufficient because phishing attacks keep growing in sophistication. This paper evaluates three machine learning models for detecting phishing URLs and websites: Random Forest, Support Vector Machines (SVM), and XGBoost. The evaluation uses both URL structure and web-based features from a publicly available dataset with 11,430 samples and 87 attributes. The models analyze phishing indicators across three categories (URL structure, webpage content, and external services) so that each kind of indicator is represented. The findings show that the Random Forest model is the most effective, achieving an accuracy of approximately 97%, followed closely by SVM, while the XGBoost model achieves an accuracy of 95%. The results indicate that URL and web features are effective for identifying phishing websites and that machine learning can improve anti-phishing solutions. These outcomes provide a basis for further work on real-time detection methods and on additional feature sets to strengthen anti-phishing efforts.

Keyphrases: Cybersecurity, Phishing Website Detection, Random Forests, URL features, XGBoost, machine learning
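
To make the "URL structure" feature category concrete, the following is a minimal sketch of how a few lexical features might be computed from a URL string. The feature names and the function `url_structure_features` are hypothetical and chosen for illustration; they are not taken from the paper's 87-attribute dataset, and the abstract does not specify how its features were extracted.

```python
import re
from urllib.parse import urlparse

# Matches a dotted-quad IPv4 host such as 192.168.0.1 (a rough illustrative check).
_IPV4 = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def url_structure_features(url: str) -> dict:
    """Return a few hypothetical lexical features computed from the URL alone.

    These names are assumptions for illustration, not the dataset's attributes.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    host_is_ip = bool(_IPV4.match(host))
    return {
        "url_length": len(url),
        "hostname_length": len(host),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "has_at_symbol": int("@" in url),
        "host_is_ip": int(host_is_ip),
        "uses_https": int(parsed.scheme == "https"),
        # Labels before the registered domain, approximated as (dots in host - 1).
        "num_subdomains": 0 if host_is_ip else max(host.count(".") - 1, 0),
    }
```

A feature dictionary like this could be computed per URL and stacked into a table before training a classifier such as Random Forest, SVM, or XGBoost. Webpage-content and external-service features would need additional data sources that this sketch does not cover.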