Improvisation in opinion mining using data preprocessing techniques based on consumer’s review
Journal: International Journal of Advanced Technology and Engineering Exploration (IJATEE) (Vol. 10, No. 99)
Publication Date: 2023-02-28
Authors : Kartika Makkar; Pardeep Kumar; Monika Poriye; Shalini Aggarwal
Page : 257-277
Keywords : Support vector machine (SVM); Random forest (RF); Decision tree (DT); Logistic regression (LR); Naïve Bayes (NB)
Abstract
In today's digital age, an enormous volume of data is generated daily from internet sources such as social media sites, emails, and consumer reviews. As competition rises, organizations need to understand their customers' needs and preferences. Sentiment analysis is an effective method for extracting meaningful insights from human language data, such as reviews, and for understanding consumer perceptions. This research article presents a text preprocessing approach consisting of three stages: data collection, cleaning, and transformation. The approach was applied to three datasets (restaurant, cell phone, and garments) and evaluated using several machine learning classifiers for sentiment prediction: support vector machine (SVM), logistic regression (LR), decision tree (DT), random forest (RF), and Naïve Bayes (NB). Two sets of techniques were compared first: set1 applied data cleaning and transformation with stemming, while set2 applied data cleaning and transformation with lemmatization. The results indicated that set2 was the better preprocessing choice. Specifically, SVM, LR, RF, and NB performed better with set2 on the restaurant dataset; DT, LR, and RF on the cell phone dataset; and LR, DT, and RF on the garments dataset. Set2 was therefore taken as the best preprocessing technique for the subsequent comparison. That second comparison again involved two sets of techniques: set3 combined text cleaning, transformation with lemmatization, and unigram features, while the other set combined text cleaning, transformation with lemmatization, and bigram features. Both sets were evaluated using machine learning classifiers, and set3 performed better with most classifiers.
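The preprocessing variants compared in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not name any library, so the stop-word list, the suffix-stripping `toy_stem`, the lookup-based `toy_lemmatize`, and the sample review are all hypothetical stand-ins chosen only to show how stemming differs from lemmatization and how unigram features differ from bigram features. Classifier training is omitted.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (assumption, not from the paper).
STOP_WORDS = {"the", "a", "an", "is", "was", "were", "and", "but", "it", "this"}

# Toy lookup table standing in for a dictionary-based lemmatizer (assumption).
TOY_LEMMAS = {"dishes": "dish", "batteries": "battery", "died": "die"}


def clean(text):
    """Lowercase, replace non-letter characters with spaces, drop stop words."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def toy_stem(token):
    """Crude suffix stripping: cheap, but it can yield a different word."""
    for suffix in ("ies", "es", "ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def toy_lemmatize(token):
    """Dictionary lookup: returns a real base form when one is known."""
    return TOY_LEMMAS.get(token, token)


def ngrams(tokens, n):
    """Count contiguous n-token sequences (n=1 unigrams, n=2 bigrams)."""
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


review = "The dishes were tasty, but the batteries died!"
tokens = clean(review)

# set1-style transformation (stemming) vs. set2-style (lemmatization).
stemmed = [toy_stem(t) for t in tokens]
lemmatized = [toy_lemmatize(t) for t in tokens]

# set3-style features (unigrams) vs. bigram features, both on lemmatized text.
unigrams = ngrams(lemmatized, 1)
bigrams = ngrams(lemmatized, 2)

print(stemmed)
print(lemmatized)
print(sorted(bigrams))
```

In this toy run the stemmer reduces "batteries" to "batter", while the lookup returns "battery". This is one plausible reason lemmatized text can give classifiers cleaner features, although the abstract itself reports only the comparative results.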