ABSTRACT

Web phishing poses a significant security challenge for web users owing to three primary factors. First, it is easy to implement and does not require deep technical expertise in programming or networking. Second, it can be executed across various channels, including the web, SMS, and social media. Finally, this type of attack relies on social engineering, so its success depends on how users respond to the content presented to them. Over the past few decades, methods and services designed for phishing detection have proliferated. In this study, we introduce a novel approach to web phishing detection based on a hybrid weighted machine learning framework. Our method combines four distinct machine learning algorithms: one unsupervised approach (K-means) and three supervised techniques. The outputs of these algorithms are weighted to produce a final decision. To train and evaluate the proposed algorithm, we employed a large dataset of non-content web features comprising 111 distinct attributes. The correlation between each feature and the classification outcome was used to reduce the feature set, and several correlation thresholds were explored. Our findings from the training and validation phases underscore the importance of the correlation of the chosen features in determining the accuracy of the algorithm. The accuracy of the various algorithms ranged from 0.6561 to 0.8833 across different correlation thresholds when considering all features. In summary, our research introduces an innovative approach to combating web phishing, showcasing the potential of hybrid machine learning techniques and the critical role of correlation-based feature selection in enhancing detection accuracy.

Keywords: Web Phishing, Cybersecurity, Machine Learning, Detection Methods, Cyber Threats, Security Challenges.
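
The two mechanisms named in the abstract, correlation-based feature filtering and a weighted combination of model outputs, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes Pearson correlation against a binary label, a threshold applied to the absolute correlation, and a simple weighted majority over 0/1 predictions (with any K-means cluster output already mapped to 0/1). The function names, the toy data, and the weights are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def select_features(rows, labels, threshold):
    """Return indices of feature columns whose |correlation| with the label
    is at least `threshold` (assumed selection rule, not taken from the paper)."""
    keep = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        if abs(pearson(column, labels)) >= threshold:
            keep.append(j)
    return keep


def weighted_vote(predictions, weights):
    """Combine 0/1 predictions from several models into one decision.
    Returns 1 (phishing) when the weighted 'phishing' votes exceed half
    of the total weight, otherwise 0. The weights are hypothetical inputs."""
    score = sum(w * p for p, w in zip(predictions, weights))
    return 1 if score > sum(weights) / 2 else 0


# Toy data: 4 samples, 3 features, binary labels (illustrative values only).
rows = [[1, 5, 0], [2, 3, 1], [3, 4, 0], [4, 6, 1]]
labels = [0, 0, 1, 1]

selected = select_features(rows, labels, threshold=0.5)
decision = weighted_vote(predictions=[1, 0, 1, 1], weights=[1, 3, 4, 2])
```

Raising the threshold keeps fewer features, which is the trade-off the abstract refers to when it reports accuracy across several correlation values.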