Navigating the Dark Web of Hate: Supervised Machine Learning Paradigm and NLP for Detecting Online Hate Speeches

Journal: International Journal of Advanced Engineering Research and Science (Vol.11, No. 03)

Publication Date:

Authors : ;

Page : 037-044

Keywords : Natural Language Processing; Tokenization; Logistic Regression; Hyperparameter;

Abstract

Many participants on online platforms are worried about hate speech, which often triggers cyberbullying and discourages users from engaging with those platforms. This study investigates hate speech on online platforms using Natural Language Processing (NLP) techniques and a supervised machine learning paradigm. It focuses specifically on developing a robust model capable of accurately classifying text as 'hateful' or 'non-hateful'. The approach comprised compiling a large dataset from multiple online textual sources; preprocessing the dataset through normalization, tokenization, stop-word removal, and lemmatization; applying advanced feature extraction techniques such as negation handling, n-gram analysis, and Term Frequency-Inverse Document Frequency (TF-IDF) to capture the intricacies of the textual material; and implementing the model with Logistic Regression, chosen for its efficiency in binary classification problems. The model's performance was evaluated using accuracy, precision, recall, F1-score, and the confusion matrix. With default hyperparameters, the baseline model achieved a test accuracy of 93%. When optimized with hyperparameter tuning and cross-validation to obtain more generalizable performance, the model achieved an accuracy of 95%. The study concluded that NLP and logistic regression can effectively identify hate speech.
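The preprocessing and feature extraction steps named in the abstract can be illustrated with a minimal, standard-library Python sketch. This is not the authors' implementation: the stop-word list, the negation word list, the `NOT_` prefix convention, the function names, and the textbook TF-IDF weighting (term frequency times log(N/df)) are all assumptions made for illustration. Lemmatization and the Logistic Regression classifier are omitted, since they would normally come from an external library.

```python
import math
import re
from collections import Counter

# Tiny illustrative word lists (assumptions; the abstract does not specify them).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "i", "you", "do"}
NEGATIONS = {"not", "no", "never"}


def preprocess(text):
    """Normalize to lowercase, tokenize, handle negation, drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    out, negate = [], False
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True  # mark the next content word as negated
            continue
        if tok in STOP_WORDS:
            continue
        out.append("NOT_" + tok if negate else tok)
        negate = False
    return out


def ngrams(tokens, n_max=2):
    """Return all 1-grams up to n_max-grams as space-joined strings."""
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams


def tfidf(docs):
    """Map each document (a list of terms) to a {term: weight} dictionary."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


# Hypothetical example texts, not taken from the study's dataset.
texts = ["I do not hate you", "They hate everyone", "You are never welcome here"]
features = [ngrams(preprocess(t)) for t in texts]
vectors = tfidf(features)
```

Each dictionary in `vectors` is a sparse feature vector that a binary classifier such as logistic regression could consume. Prefixing negated words keeps "not hate" distinct from "hate" as a feature, which is one common way to realize the negation handling the abstract mentions.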

Last modified: 2024-04-16 16:26:55