ResearchBib Share Your Research, Maximize Your Social Impacts
Sign for Notice Everyday Sign up >> Login

A Two-Stage Machine Learning Classification Approach to Identify Extremism in Arabic Opinions

Journal: International Journal of Advanced Trends in Computer Science and Engineering (IJATCSE) (Vol.10, No. 2)

Publication Date:

Authors : ;

Page : 736-745

Keywords : Machine Learning; Arabic Sentiment Analysis; Opinion Mining; Extremism.;

Source : Downloadexternal Find it from : Google Scholarexternal

Abstract

The increased usage of the Internet and social networks allowed and enabled people to express their views, which have generated an increasing attention lately. Sentiment Analysis (SA) techniques are used to determine the polarity of information, either positive or negative, toward a given topic, including opinions. In this research, we have introduced a machine learning approach based on Support Vector Machine (SVM), Naïve Bayes (NB) and Random Forest (RF) classifiers, to find and classify extreme opinions in Arabic reviews. To achieve this, a dataset of 1500 Arabic reviews was collected from Google Play Store. In addition, a two-stage Classification process was applied to classify the reviews. In the first stage, we built a binary classifier to sort out positive from negative reviews. In the second stage, however we applied a binary classification mechanism based on a set of proposed rules that distinguishes extreme positive from positive reviews, and extreme negative from negative reviews. Four major experiments were conducted with a total of 10 different sub experiments to fulfill the two-stage process using different X-validation schemas and Term Frequency-Inverse Document Frequency feature selection method. Obtained results have indicated that SVM was the best during the first stage classification with 30% testing data, and NB was the best with 20% testing data. The results of the second stage classification indicated that SVM has scored better results in identifying extreme positive reviews when dealing with the positive dataset with an overall accuracy of 68.7% and NB showed better accuracy results in identifying extreme negative reviews when dealing with the negative dataset, with an overall accuracy of 72.8%.

Last modified: 2021-04-10 17:43:29