
Improvisation in opinion mining using data preprocessing techniques based on consumer’s review

Journal: International Journal of Advanced Technology and Engineering Exploration (IJATEE) (Vol.10, No. 99)

Publication Date:

Authors:

Pages: 257-277

Keywords: Support vector machine (SVM); Random forest (RF); Decision tree (DT); Logistic regression (LR); Naïve Bayes (NB)


Abstract

In today's digital age, an enormous volume of data is generated daily from various internet sources, including social media sites, emails, and consumer reviews. With competition on the rise, it has become essential for organizations to understand their customers' needs and preferences. Sentiment analysis is an effective method for extracting meaningful insights from human language data, such as reviews, and for understanding consumer perceptions. This research article presents a text preprocessing approach consisting of three stages: data collection, cleaning, and transformation. The approach was applied to three datasets - restaurant, cell phone, and garments - and evaluated using various machine learning classifiers for sentiment prediction. A comparison was made between two sets of techniques: set1 employed data cleaning and transformation with stemming, while set2 used data cleaning and transformation with lemmatization. The results indicated that set2 (data cleaning and transformation with lemmatization) performed better during preprocessing when evaluated with several machine learning classifiers, namely support vector machine (SVM), logistic regression (LR), decision tree (DT), random forest (RF), and Naïve Bayes (NB). Specifically, SVM, LR, RF, and NB performed better on the restaurant dataset, while DT, LR, and RF performed better on the cell phone dataset. On the garments dataset, LR, DT, and RF achieved better results with set2 than with set1, making set2 the preferred preprocessing technique for the subsequent comparison. An additional comparison was then made between two further sets of techniques: set3 combined text cleaning, transformation with lemmatization, and unigram features, while the other set combined text cleaning, transformation with lemmatization, and bigram features. Both were evaluated using the same machine learning classifiers, and the results revealed that set3 performed better with most classifiers.
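To make the pipeline described in the abstract concrete, the sketch below contrasts the two preprocessing variants (set1: cleaning plus stemming; set2: cleaning plus lemmatization) and the unigram/bigram feature choice, evaluated with a few of the classifiers named in the keywords. This is a minimal illustration, not the authors' code: the dataset file names, column names ("review", "sentiment"), TF-IDF features, and hyperparameters are assumptions chosen only to show the workflow.

# Minimal sketch (Python): cleaning + stemming vs. cleaning + lemmatization,
# with unigram or bigram features, scored by SVM, LR, and RF.
# Assumed input: a CSV with 'review' and 'sentiment' columns (hypothetical).
import re
import nltk
import pandas as pd
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

STOP = set(stopwords.words("english"))
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

def clean(text):
    # Data cleaning: lowercase, strip non-letters, drop stop words.
    tokens = re.sub(r"[^a-z\s]", " ", text.lower()).split()
    return [t for t in tokens if t not in STOP]

def transform(text, use_lemma):
    # Transformation stage: lemmatization (set2) or stemming (set1).
    tokens = clean(text)
    if use_lemma:
        tokens = [lemmatizer.lemmatize(t) for t in tokens]
    else:
        tokens = [stemmer.stem(t) for t in tokens]
    return " ".join(tokens)

def evaluate(df, use_lemma, ngram=(1, 1)):
    # Fit each classifier on TF-IDF features and report test accuracy.
    texts = df["review"].map(lambda t: transform(t, use_lemma))
    X_train, X_test, y_train, y_test = train_test_split(
        texts, df["sentiment"], test_size=0.2, random_state=42)
    vec = TfidfVectorizer(ngram_range=ngram)  # (1, 1) = unigrams, (2, 2) = bigrams
    Xtr, Xte = vec.fit_transform(X_train), vec.transform(X_test)
    models = {
        "SVM": LinearSVC(),
        "LR": LogisticRegression(max_iter=1000),
        "RF": RandomForestClassifier(n_estimators=100),
    }
    return {name: accuracy_score(y_test, m.fit(Xtr, y_train).predict(Xte))
            for name, m in models.items()}

# Example usage (file name is illustrative):
# df = pd.read_csv("restaurant_reviews.csv")
# print("set1 (stemming):       ", evaluate(df, use_lemma=False))
# print("set2 (lemmatization):  ", evaluate(df, use_lemma=True))
# print("set3 (lemma + unigram):", evaluate(df, use_lemma=True, ngram=(1, 1)))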

Last modified: 2023-03-07 19:44:27