Machine Learning for Phishing Detection through URL Analysis

Ivana Hartmann Tolić 1, Mirta Vujnovac 2

Affiliations

1. Faculty of Electrical Engineering, Computer Science and Information Technology Osijek, Kneza Trpimira 2b, 31000 Osijek, Croatia
2. III. Gymnasium Osijek, Kamila Firingera 14, 31000 Osijek, Croatia

Abstract: In recent years, phishing attacks and online identity theft have posed a significant threat to cybersecurity. Phishing is a form of social engineering in which attackers use fraudulent websites to present false information and deceive victims into disclosing personal data, either to acquire further information or to achieve financial gain. Given the rapid evolution of technology and phishing tactics, coupled with the increasingly frequent exchange of information online, effective methods for detecting fraudulent URLs are essential. The objective of this study was to evaluate the effectiveness of various machine learning and deep learning models in classifying malicious and legitimate web addresses without analyzing page content. Experimental results demonstrate that convolutional neural networks (CNNs) can achieve an accuracy of up to 98.7%, while ensemble models such as Random Forest and XGBoost also exhibit high accuracy, exceeding 96%, thereby significantly outperforming traditional approaches like logistic regression. As phishing strategies continue to evolve, adaptive models such as ensemble learning techniques and deep learning architectures will be pivotal for safeguarding online security and for understanding how to mitigate emerging cyber threats effectively.

Keywords: social engineering, ensemble models, cyber attacks, URL classification, SMOTE
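
To make "classifying web addresses without analyzing page content" concrete, the following is a minimal sketch of how lexical features might be derived from the URL string alone. The feature set and the name `url_features` are illustrative assumptions, not the features used in the study; the resulting dictionary could be fed to any of the classifiers named in the abstract.

```python
from urllib.parse import urlsplit
import ipaddress


def url_features(url: str) -> dict:
    """Compute a few illustrative lexical features from a URL string.

    Hypothetical feature set for demonstration only; no page content
    is fetched or inspected.
    """
    # Prepend a scheme when missing so urlsplit can locate the host part.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""

    # A raw IP address in place of a domain name is a common phishing signal.
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False

    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        "host_is_ip": host_is_ip,
        "uses_https": parts.scheme == "https",
    }
```

For example, `url_features("http://192.168.0.1/secure-update@bank")` flags both `host_is_ip` and `has_at_symbol`, whereas an ordinary HTTPS domain URL flags neither.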

Abstract

Summary: Phishing attacks have posed a significant threat to cybersecurity in recent years. Phishing is a form of social engineering in which attackers provide misleading information via fake websites in order to trick the victim into disclosing private information, either to obtain further information or to gain a financial advantage. With the rapid development of technology and phishing tactics, wider access to information, and the frequent exchange of information online, effective methods for detecting fake URLs are needed. This study aimed to evaluate the effectiveness of various machine learning and deep learning models in classifying malicious and legitimate web addresses without analyzing page content. Experimental results show that convolutional neural networks (CNNs) can achieve accuracy rates of up to 98.7%, while ensemble models such as Random Forest and XGBoost also demonstrate high accuracy, exceeding 96%, significantly outperforming traditional approaches like logistic regression. As phishing strategies continue to evolve, adaptive models such as ensemble learning techniques and deep learning architectures will be fundamental to ensuring online security and crucial to understanding how to counter emerging cybersecurity threats effectively.

This work is licensed under Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International.

Received: 10.06.2025.

Accepted: 23.07.2025.

Festung, 1/2025, Published
