Spam text classification using LSTM Recurrent Neural Network
Journal: International Journal of Emerging Trends in Engineering Research (IJETER) (Vol. 9, No. 9)
Publication Date: 2021-09-11
Authors: Yeshwanth Zagabathuni
Pages: 1271-1275
Abstract
Sequence classification is one of the in-demand research topics in the field of Natural Language Processing (NLP). Classifying a set of images or texts into an appropriate category or class is a complex task that many Machine Learning (ML) models fail to accomplish accurately, ending up under-fitting the given dataset. ML algorithms used in text classification include KNN, Naïve Bayes, Support Vector Machines, Convolutional Neural Networks (CNNs), Recursive CNNs, Recurrent Neural Networks (RNNs), and Long Short-Term Memory (LSTM) networks. For this experimental study, an LSTM and a few other algorithms were chosen for comparison. The dataset is the SMS Spam Collection Dataset from Kaggle, extended with 150 additional entries from other sources. Each data point carries one of two class labels, spam or ham. Each entry consists of the class label and a few sentences of text, followed by a few uninformative features that are dropped. After the text is converted to the required format, the models are trained and then evaluated using several metrics. In the experiments, the LSTM gives much better classification accuracy than the other machine learning models, achieving F1-scores in the high nineties. The other models showed very low F1-scores and cosine similarities, indicating that they underperformed on the dataset. Another interesting observation is that the LSTM produced fewer false positives and false negatives than any other model.
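The abstract does not spell out the "required format" for the text or how the metrics were computed. As a minimal sketch only, and not the author's implementation, the Python below assumes whitespace tokenization, integer word ids with zero-padding to a fixed length (a common input format for an LSTM), and spam as the positive class when counting false positives, false negatives, and the F1-score. The names build_vocab, encode, and spam_metrics are hypothetical.

```python
from collections import Counter


def build_vocab(texts, max_words=1000):
    """Map the most frequent tokens to integer ids.

    Assumption: id 0 is reserved for padding and id 1 for unknown tokens.
    """
    counts = Counter(tok for text in texts for tok in text.lower().split())
    return {tok: i + 2 for i, (tok, _) in enumerate(counts.most_common(max_words))}


def encode(text, vocab, max_len=10):
    """Turn one message into a fixed-length id list (truncate or right-pad with 0)."""
    ids = [vocab.get(tok, 1) for tok in text.lower().split()][:max_len]
    return ids + [0] * (max_len - len(ids))


def spam_metrics(y_true, y_pred, positive="spam"):
    """Count errors and compute precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)  # ham flagged as spam
    fn = sum(t == positive and p != positive for t, p in pairs)  # spam missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"fp": fp, "fn": fn, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up messages and predictions, for illustration only.
    messages = ["free prize claim now", "see you at lunch"]
    vocab = build_vocab(messages)
    print(encode("claim your free prize", vocab, max_len=6))
    print(spam_metrics(["spam", "ham", "spam"], ["spam", "spam", "spam"]))
```

The padded id lists would typically feed an embedding layer ahead of the LSTM, while spam_metrics can be applied unchanged to the predictions of any of the compared models.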
Last modified: 2021-09-12 19:41:32