AN ENSEMBLE OF FEATURE SELECTION WITH DEEP LEARNING BASED AUTOMATED TAMIL DOCUMENT CLASSIFICATION MODELS

Journal: International Journal of Electrical Engineering and Technology (IJEET) (Vol.11, No. 9)

Page : 70-91

Keywords : Tamil Document classification; stemming; TF-IDF; Extra-Tree; Deep Learning;

Abstract

In recent times, the exponential growth of the Internet has resulted in an enormous number of electronic documents in regional languages other than English. Because numerous Tamil-language documents are being generated from news, blogs, eBooks, and entertainment sources, automated classification of Tamil documents is needed. Since automated Tamil document classification has not yet been explored thoroughly, this study focuses on the development of deep learning (DL) models for the task. This paper introduces an ensemble of feature selection with DL-based classification models for Tamil documents. The presented model first applies preprocessing to remove unwanted data and improve the data quality to a certain extent. Next, the term frequency–inverse document frequency (TF-IDF) approach is used to extract features from the Tamil documents. Two feature selection (FS) techniques, namely Chi-Squared (CS) and the Extra Tree (ET) classifier, are then employed. The proposed method uses deep neural network (DNN) and convolutional neural network (CNN) models for classification. A detailed experimental analysis is carried out on a Tamil document dataset collected by the authors. The experimental results show that the ETFS-CNN model achieves an effective classification outcome, with a maximum accuracy of 90%, precision of 90.57%, recall of 90%, and F-score of 89.89%.
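
The abstract does not give implementation details, so the following is only a minimal sketch of two of the stages it names: TF-IDF feature extraction followed by chi-squared feature selection. Everything here is an assumption made for illustration: the function names (`tfidf_matrix`, `chi2_scores`, `select_top_k`) are hypothetical, the toy tokens are English placeholders standing in for preprocessed Tamil tokens, the TF-IDF variant is the plain unsmoothed form, and the chi-squared score uses the classic 2x2 term-presence contingency table, which may differ from the formulation used in the paper. The Extra Tree selector and the DNN/CNN classifiers are not shown.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocab, rows) where rows[i][j] is the TF-IDF weight of vocab[j] in docs[i].

    Uses relative term frequency and unsmoothed idf = log(N / df).
    Assumes every document contains at least one token.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log(n_docs / doc_freq[tok]) for tok in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[tok] / len(doc) * idf[tok] for tok in vocab])
    return vocab, rows


def chi2_scores(docs, labels):
    """Score each term by its largest one-vs-rest chi-squared statistic.

    For a term t and class c, with a 2x2 table of document counts:
      a = has t, in c      b = has t, not in c
      c0 = lacks t, in c   d = lacks t, not in c
    chi2 = N * (a*d - b*c0)^2 / ((a+b) * (c0+d) * (a+c0) * (b+d))
    """
    n = len(docs)
    doc_sets = [set(doc) for doc in docs]
    vocab = sorted(set().union(*doc_sets))
    scores = {}
    for term in vocab:
        best = 0.0
        for cls in sorted(set(labels)):
            a = sum(1 for s, y in zip(doc_sets, labels) if term in s and y == cls)
            b = sum(1 for s, y in zip(doc_sets, labels) if term in s and y != cls)
            c0 = sum(1 for s, y in zip(doc_sets, labels) if term not in s and y == cls)
            d = n - a - b - c0
            denom = (a + b) * (c0 + d) * (a + c0) * (b + d)
            if denom:  # skip degenerate tables
                best = max(best, n * (a * d - b * c0) ** 2 / denom)
        scores[term] = best
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy corpus: placeholder tokens, two invented classes.
docs = [
    ["team", "match", "today"],
    ["match", "team", "win"],
    ["film", "actor", "today"],
    ["actor", "film", "song"],
]
labels = ["sports", "sports", "cinema", "cinema"]

vocab, X = tfidf_matrix(docs)
scores = chi2_scores(docs, labels)
selected = select_top_k(scores, k=4)

# Keep only the selected columns; this reduced matrix is what a classifier would consume.
keep = [vocab.index(term) for term in selected]
X_reduced = [[row[j] for j in keep] for row in X]

print(selected)  # ['actor', 'film', 'match', 'team']
```

In this toy corpus, "today" appears equally often in both classes and receives a chi-squared score of zero, so it is dropped, while the four class-specific terms are retained.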

Last modified: 2021-03-04 18:43:12