Arabic Text Classification using K-Nearest Neighbour Algorithm
Journal: The International Arab Journal of Information Technology (Vol. 12, No. 2)
Publication Date: 2015-03-01
Authors : Roiss Alhutaish; Nazlia Omar
Page : 189-194
Keywords : ATC; K-NN; similarity measures; feature selection methods
Abstract
Many algorithms have been applied to the problem of Automatic Text Categorization (ATC). Most of the work in this area has been carried out on English texts, with only a few researchers addressing Arabic texts. We investigated the use of the K-Nearest Neighbour (K-NN) classifier with a new similarity measure (Inew) and with the cosine, Jaccard and Dice similarities, in order to enhance Arabic ATC. We represented the dataset as both un-stemmed and stemmed data, using TREC-2002 to remove prefixes and suffixes. For statistical text representation, Bag-Of-Words (BOW) and character-level 3-grams (3-Gram) were used. To reduce the dimensionality of the feature space, we used several feature selection methods. Experiments conducted on Arabic text showed that the K-NN classifier with the new similarity measure, Inew, achieved 92.6% Macro-F1 and outperformed the K-NN classifier with the cosine, Jaccard and Dice similarities. Chi-square feature selection with BOW representation gave the best performance among the feature selection methods tested with BOW and 3-Gram.
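As an illustration of the setup the abstract describes, the following is a minimal sketch of a K-NN text classifier with interchangeable cosine, Jaccard and Dice similarities over BOW or character 3-gram features. It is not the authors' implementation. The function names, the majority-vote rule, the default k and the set-based forms of Jaccard and Dice are all assumptions made for illustration. The Inew measure is not defined in the abstract, so it is not included, and stemming and feature selection are omitted.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Whitespace tokenization (assumed; the abstract does not specify a tokenizer)."""
    return text.split()


def char_ngrams(text, n=3):
    """Character-level n-grams; n=3 matches the 3-Gram representation."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Set-based Jaccard: shared terms over all distinct terms (assumed variant)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def dice(a, b):
    """Set-based Dice: twice the shared terms over the sum of set sizes (assumed variant)."""
    sa, sb = set(a), set(b)
    total = len(sa) + len(sb)
    return 2 * len(sa & sb) / total if total else 0.0


def knn_predict(query, train, k=3, similarity=cosine, tokenize=bag_of_words):
    """Label a query by majority vote among its k most similar training documents.

    `train` is a list of (document_text, label) pairs.
    """
    q = Counter(tokenize(query))
    scored = sorted(
        ((similarity(q, Counter(tokenize(doc))), label) for doc, label in train),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]
```

Passing `similarity=jaccard` or `tokenize=char_ngrams` to `knn_predict` switches the similarity measure or the text representation without changing the classifier itself, which mirrors the kind of comparison the abstract reports.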