Performance Analysis of Machine Learning Algorithms Used for Web Based Phishing Detection

Authors

Shailendra Baliram Torane
D. J. Sanghvi College of Engineering, Mumbai, India.

Dr. Narendra Shekokar
Department of Computer Engineering, D. J. Sanghvi College of Engineering, Mumbai, India.

Abstract

Phishing is a cybercrime technique in which the attacker creates a copy of a genuine website, using the same color pattern, layout, font, and logo, and a domain name that closely resembles the real one. The attacker then distributes this fake website through online channels such as email and social media, using lucrative offers or discounts to lure people into clicking the phishing link. Once users click the link, they are directed to the duplicate website that the attacker has created. Believing it to be the real website, they enter their login details and other confidential data. This data is stored on the attacker's server, giving the attacker full access to the victim's data. Phishing attacks mainly aim to collect the victim's confidential data, such as usernames, passwords, bank details, and credit card numbers. Machine learning algorithms are widely used to detect phishing websites. This paper presents a performance analysis of three machine learning algorithms used for URL phishing detection: Extreme Learning Machine, Support Vector Machine, and Naïve Bayes. The algorithms are compared on accuracy, precision, recall, F1 score, and the confusion matrix. The dataset, taken from the UC Irvine Machine Learning Repository, contains 11,000 entries and 30 features. The literature survey shows that existing work gives importance to only one parameter, accuracy, when analyzing the performance of URL phishing detection algorithms. This paper concludes that accuracy alone does not give the full picture of the overall performance of these algorithms, and that precision and recall are essential to understanding how they behave.
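To illustrate why accuracy alone can mislead, the following minimal sketch derives accuracy, precision, recall, and F1 score from the four cells of a confusion matrix. The function name `classification_metrics` and the counts used are hypothetical, chosen only for illustration. They are not results from this paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard metrics from confusion-matrix counts.

    tp: phishing URLs correctly flagged as phishing
    fp: legitimate URLs wrongly flagged as phishing
    fn: phishing URLs wrongly passed as legitimate
    tn: legitimate URLs correctly passed as legitimate
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts (assumption, not from the paper): 1,000 URLs,
# of which 100 are phishing. The classifier catches only 40 of them
# and wrongly flags 10 legitimate URLs.
metrics = classification_metrics(tp=40, fp=10, fn=60, tn=890)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts, accuracy is 0.93 while recall is only 0.40: the classifier looks strong on accuracy yet misses 60 of the 100 phishing URLs. This is the kind of gap that precision and recall expose and accuracy hides.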