TWITTER SENTIMENT ANALYSIS USING DEMPSTER SHAFER ALGORITHM BASED FEATURE SELECTION AND ONE AGAINST ALL MULTICLASS SVM CLASSIFIER
Journal: International Journal of Advanced Research in Engineering and Technology (IJARET) (Vol.11, No. 02)
Publication Date: 2020-02-29
Authors : NAGAMANJULA R; A. PETHALAKSHMI;
Page : 163-185
Keywords : Twitter Sentiment Analysis; Split Attached Word; acronyms expansion; Preprocessing; Feature Selection; Classification;
Abstract
The rapid development of social media and internet technologies has drawn considerable attention to sentiment analysis. Twitter is a social media platform on which numerous users post about subject matter in the form of tweets. Twitter Sentiment Analysis (TSA) is the task of identifying sentiments and opinions in tweets. Achieving high accuracy in TSA remains difficult because of characteristics of Twitter data such as spelling errors, abbreviations and special characters. Our main aim is therefore to attain high accuracy in TSA. To this end, we concentrate on five processes: Data Cleaning, Preprocessing, Feature Extraction, Feature Selection and Classification. In the data cleaning stage, we perform four processes: URL Removal, Username Removal, Punctuation Removal and Spell Correction, which enhance the efficacy of TSA. To increase accuracy further, we adopt a preprocessing stage comprising tokenization, stop word removal, lemmatization and stemming, acronym expansion, slang correction, splitting of attached words, and POS tagging. To improve classification accuracy, we run a feature extraction process in which eight features are extracted. A key bottleneck in TSA is the huge amount of data, which makes training machine learning (ML) models for sentiment classification difficult. To tackle this hurdle, we select the best of the extracted features using the Dempster Shafer algorithm. Sentiments are classified using the One against All-Multiclass Support Vector Machine (OA2-SVM) algorithm, which assigns each tweet to one of five classes: Strongly Positive, Strongly Negative, Positive, Negative and Neutral. We implement these processes using public tweets collected from an open repository. The simulation results are promising in terms of Accuracy, Precision, Recall, F-Measure and Error Rate.
The comparison results show that our method improves Accuracy and Precision by 25%, Recall by 30% and F-Measure by 20%, and reduces the Error Rate by 29%, compared to existing methods including LAN2FIS, GA and HCS.
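To make two of the steps named in the abstract more concrete, the following is a minimal Python sketch, not the authors' implementation. `clean_tweet` illustrates URL, username and punctuation removal with regular expressions chosen here for illustration (the abstract does not give the actual patterns, and spell correction is omitted). `dempster_combine` implements the standard Dempster rule of combination for two mass functions; how the paper derives mass functions from the eight extracted features is not stated in the abstract, so the example masses in the usage note are purely hypothetical.

```python
import re
import string
from itertools import product

# Illustrative patterns (assumptions, not taken from the paper).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
USER_RE = re.compile(r"@\w+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tweet(text):
    """Remove URLs, @usernames and ASCII punctuation, then collapse whitespace."""
    text = URL_RE.sub(" ", text)        # URL removal
    text = USER_RE.sub(" ", text)       # username removal
    text = text.translate(PUNCT_TABLE)  # punctuation removal (also strips '#')
    return " ".join(text.split())


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to a mass in [0, 1],
    and the masses in each function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Mass assigned to incompatible hypotheses counts as conflict.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalise by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}
```

For example, `clean_tweet("@bob check http://t.co/abc this out!!! #wow")` returns `"check this out wow"`. Combining the hypothetical masses `{frozenset({"pos"}): 0.6, frozenset({"pos", "neg"}): 0.4}` and `{frozenset({"neg"}): 0.5, frozenset({"pos", "neg"}): 0.5}` yields a conflict of 0.3 and a renormalised mass of 3/7 on `{"pos"}`.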
Last modified: 2020-05-20 20:01:20