NAIVE BAYES CLASSIFIER WITH MODIFIED SMOOTHING TECHNIQUES FOR BETTER SPAM CLASSIFICATION

Journal: International Journal of Computer Science and Mobile Computing - IJCSMC (Vol.3, No. 10)

Publication Date: 2014-10-30

Authors : Gurneet Kaur; Er. Neelam Oberai;

Page : 869-878

Keywords : Naïve Bayes Classifier; Text Classification; Smoothing Methods; Spam Classification;

Abstract

Text mining has become an important research area because of the proliferation of electronic documents available on the web. Spam (junk e-mail) identification is one of its important application areas. Naïve Bayes is very popular in commercial and open-source anti-spam e-mail filters. There are, however, several forms of Naïve Bayes, something the anti-spam literature does not always acknowledge. A good spam filter is judged not just by its accuracy in identifying spam but by its overall performance. That performance has been found to depend largely on the smoothing method, which adjusts the probability of unseen events based on seen events to compensate for data sparseness. The aim of this work is to enhance the performance of the Naïve Bayes classifier in classifying spam mail by proposing a modification to the Jelinek-Mercer and Dirichlet smoothing methods and comparing them against the Laplace smoothing of the traditional Naïve Bayes classifier. To address data sparseness, the Naïve Bayes classifier is implemented with modified smoothing techniques for calculating the collection probability of the model: the modified smoothing methods compute the collection probability from a uniform distribution. The improved method shows high performance on large data sets with a precise number of keywords and with variations in the smoothing factor. It also shows high performance across varying data set sizes, varying numbers of keywords, and variations in the smoothing factor, depending on the data set used.
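The three smoothing methods named in the abstract can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes that "uniform distribution probability" means a collection probability of 1/|V| for every word in a vocabulary of size |V|, and it uses the textbook forms of Laplace, Jelinek-Mercer, and Dirichlet smoothing. The function names, the toy token list, and the default values of `lam` and `mu` are hypothetical.

```python
from collections import Counter


def laplace(count, class_total, vocab_size):
    """Add-one (Laplace) estimate of P(word | class)."""
    return (count + 1) / (class_total + vocab_size)


def jelinek_mercer(count, class_total, vocab_size, lam=0.5):
    """Jelinek-Mercer estimate, interpolating with a collection probability.

    Assumption: the collection probability is uniform, 1 / vocab_size.
    """
    collection_prob = 1.0 / vocab_size
    return (1 - lam) * count / class_total + lam * collection_prob


def dirichlet(count, class_total, vocab_size, mu=10.0):
    """Dirichlet-prior estimate with a uniform collection probability.

    Assumption: the collection probability is uniform, 1 / vocab_size.
    """
    collection_prob = 1.0 / vocab_size
    return (count + mu * collection_prob) / (class_total + mu)


# Hypothetical toy data: tokens seen in the spam class, plus two words
# ("meeting", "report") that occur only in the non-spam class.
spam_tokens = "win cash now win prize".split()
spam_counts = Counter(spam_tokens)
vocab = sorted(set(spam_tokens) | {"meeting", "report"})
spam_total = len(spam_tokens)
vocab_size = len(vocab)

for word in ("win", "meeting"):
    c = spam_counts[word]  # Counter returns 0 for unseen words
    print(
        word,
        round(laplace(c, spam_total, vocab_size), 4),
        round(jelinek_mercer(c, spam_total, vocab_size), 4),
        round(dirichlet(c, spam_total, vocab_size), 4),
    )
```

Under these assumptions every method gives an unseen word such as "meeting" a non-zero probability, and each estimate still sums to 1 over the vocabulary. With a uniform collection probability, Dirichlet smoothing with `mu` equal to the vocabulary size reduces to Laplace smoothing, so the smoothing factor is what distinguishes the variants in practice.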
