Arabic Text Classification using K-Nearest Neighbour Algorithm

Journal: The International Arab Journal of Information Technology (Vol.12, No. 2)

Page : 189-194

Keywords : ATC; K-NN; similarity measures; feature selection methods

Abstract

Many algorithms have been applied to the problem of Automatic Text Categorization (ATC). Most of the work in this area has been carried out on English texts, and only a few researchers have addressed Arabic texts. We investigated the K-Nearest Neighbour (K-NN) classifier with the Inew, Cosine, Jaccard and Dice similarity measures in order to improve Arabic ATC. The dataset is represented both as un-stemmed and as stemmed data, using the TREC-2002 stemmer to remove prefixes and suffixes. For statistical text representation, Bag-of-Words (BOW) and character-level 3-grams (3-Gram) were used. To reduce the dimensionality of the feature space, we applied several feature selection methods. Experiments on Arabic text showed that the K-NN classifier with the new similarity measure Inew, which reached 92.6% Macro-F1, performed better than the K-NN classifier with the Cosine, Jaccard and Dice similarities. Chi-square feature selection with the BOW representation gave the best performance among the feature selection methods evaluated with BOW and 3-Gram.
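The abstract names the components of the pipeline without spelling them out. What follows is a minimal Python sketch of the baseline setup, not the authors' implementation. It rests on several assumptions: whitespace tokenization, raw term frequencies as weights, character 3-grams taken within each token, the weighted (extended) forms of Jaccard and Dice, and chi-square scores aggregated by taking the maximum over categories. All function names are hypothetical. The Inew measure is not defined in the abstract, so it is not reproduced, and the TREC-2002 stemming step is omitted.

```python
# Illustrative sketch only: tokenization, weighting, similarity variants and
# chi-square aggregation are assumptions, not the paper's exact definitions.
import math
from collections import Counter
from typing import Callable


def bow(text: str) -> Counter:
    """Bag-of-Words: raw term frequencies of whitespace-separated tokens."""
    return Counter(text.split())


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Character n-gram frequencies (n=3 gives 3-Gram), taken inside each token."""
    grams = Counter()
    for token in text.split():
        grams.update(token[i:i + n] for i in range(len(token) - n + 1))
    return grams


def _dot(a: Counter, b: Counter) -> float:
    return sum(w * b[t] for t, w in a.items() if t in b)


def _sq_norm(a: Counter) -> float:
    return sum(w * w for w in a.values())


def cosine(a: Counter, b: Counter) -> float:
    denom = math.sqrt(_sq_norm(a) * _sq_norm(b))
    return _dot(a, b) / denom if denom else 0.0


def jaccard(a: Counter, b: Counter) -> float:
    """Extended (Tanimoto) Jaccard for weighted vectors: a.b / (|a|^2 + |b|^2 - a.b)."""
    dot = _dot(a, b)
    denom = _sq_norm(a) + _sq_norm(b) - dot
    return dot / denom if denom else 0.0


def dice(a: Counter, b: Counter) -> float:
    """Weighted Dice: 2 a.b / (|a|^2 + |b|^2)."""
    denom = _sq_norm(a) + _sq_norm(b)
    return 2 * _dot(a, b) / denom if denom else 0.0


def knn_classify(query: Counter, training: list[tuple[Counter, str]], k: int,
                 sim: Callable[[Counter, Counter], float]) -> str:
    """Majority label among the k training vectors most similar to the query."""
    ranked = sorted(training, key=lambda pair: sim(query, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    # most_common() keeps first-encountered order on ties, so the label of the
    # nearest tied neighbour wins.
    return votes.most_common(1)[0][0]


def chi_square(docs: list[Counter], labels: list[str], term: str, category: str) -> float:
    """Chi-square between presence of `term` and membership in `category` (document counts)."""
    cells = Counter((term in doc, label == category) for doc, label in zip(docs, labels))
    a, b = cells[True, True], cells[True, False]    # term present: in / not in category
    c, d = cells[False, True], cells[False, False]  # term absent: in / not in category
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return (a + b + c + d) * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_features(docs: list[Counter], labels: list[str], top_k: int) -> set[str]:
    """Keep the top_k terms by maximum chi-square over categories (ties alphabetical)."""
    vocab = set().union(*docs)
    score = {t: max(chi_square(docs, labels, t, c) for c in set(labels)) for t in vocab}
    return set(sorted(vocab, key=lambda t: (-score[t], t))[:top_k])


def restrict(vec: Counter, features: set[str]) -> Counter:
    """Drop every term that was not selected."""
    return Counter({t: w for t, w in vec.items() if t in features})


def macro_f1(gold: list[str], predicted: list[str]) -> float:
    """Unweighted mean of per-class F1 over the classes present in the gold labels."""
    scores = []
    for cls in set(gold):
        tp = sum(g == p == cls for g, p in zip(gold, predicted))
        fp = sum(p == cls != g for g, p in zip(gold, predicted))
        fn = sum(g == cls != p for g, p in zip(gold, predicted))
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(scores) / len(scores)
```

In use, each document is vectorized with `bow` or `char_ngrams`, optionally reduced with `restrict(vec, select_features(...))`, and classified with `knn_classify(query_vec, [(vec, label), ...], k=..., sim=cosine)`; the predictions are then scored with `macro_f1`. Comparing similarity measures only requires swapping the `sim` argument, which mirrors the kind of comparison the abstract reports.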

Last modified: 2019-11-14 22:26:49