Natural Language Processing: Text Categorization And Classifications

Journal: International Journal of Advanced Networking and Applications (Vol.12, No. 02)

Publication Date: 2020-10-30

Authors : Mona Nasr; Andrew Karam; Mina Atef; Kirollos Boles; Kirollos Samir; Mario Raouf

Page : 4542-4548

Keywords : Data Mining; Diabetes; Classification; Prediction; KNN; Naive Bayes; Random Forest; SVM; Accuracy; Precision; F-Measure; Recall

Abstract

In data mining, classification and prediction are two essential forms of data analysis. They are widely used to extract models that describe important data classes. This paper designs classifier models based on five classification algorithms, namely Decision Tree, K-Nearest Neighbors (KNN), Naive Bayes, Random Forest and Support Vector Machines (SVM), to classify and predict patients with diabetes. The classifiers are evaluated with 10-fold cross-validation, and their performance is measured by accuracy, precision, F-score, recall and the ROC measure. The experiments show that the accuracies of the classifier models built with Decision Tree, KNN, Naive Bayes, SVM and Random Forest are 73.82%, 71.65%, 76.30%, 65.10% and 68.74%, respectively. In the same order, their precisions are 0.705, 0.552, 0.759, 0.424 and 0.538, and their recalls are 0.738, 0.763, 0.82, 0.651 and 0.804. The study therefore finds that Naive Bayes gives the best accuracy of the five techniques in predicting diabetes. The data set used is the “Pima Indian Diabetic Dataset” from the University of California, Irvine (UCI) Repository of Machine Learning databases.
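The evaluation protocol named in the abstract (10-fold cross-validation scored by accuracy, precision, recall and F-score) can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the function names `k_fold_indices` and `binary_metrics` are hypothetical, the folds are taken in order without shuffling or stratification, and the metrics are computed for the positive class only, which may differ from the averaging the paper used.

```python
from typing import Dict, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs that partition range(n_samples).

    Assumption: contiguous folds, no shuffling, no stratification.
    """
    # Spread the remainder over the first folds so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F-score, treating label 1 as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_score = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


# Toy labels for illustration only; these are not values from the paper.
toy_true = [1, 0, 1, 1, 0, 0, 1, 0]
toy_pred = [1, 0, 0, 1, 0, 1, 1, 0]
toy_scores = binary_metrics(toy_true, toy_pred)
```

In a full experiment, each classifier would be trained on the `train` indices of every fold and scored on the matching `test` indices, with the per-fold metrics averaged to give figures comparable to those reported above.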
