
An Improved Document Image Classification using Deep Transfer Learning and Feature Reduction

Journal: International Journal of Advanced Trends in Computer Science and Engineering (IJATCSE) (Vol.10, No. 2)

Publication Date:

Authors : ;

Page : 549-557

Keywords : ERP; Document image classification; Deep feature selection; Transfer learning


Abstract

Electronic Document Management is an essential workflow in every successful ERP implementation. Integrating documents into their respective processing pipelines (e.g. OCR, data extraction) inside the ERP system usually requires a prior classification step to improve the success rate. Unfortunately, because business documents (e.g. invoices, checks, delivery forms) vary in type, size, and layout, classifying them is a challenging task and may require additional data for model training. This paper investigates the transfer learning paradigm, using different pre-trained deep models to extract useful features from scanned document images. Machine learning classifiers such as Logistic Regression (LR), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Gaussian Naive Bayes (GNB) then process the extracted features for classification. We compared the performance of the resulting models on various metrics. To address over-fitting and dataset imbalance, we ran a cross-validation procedure with different numbers of folds (4, 6, and 8) to assess the models' generalization ability. We also examined the effect of dimensionality reduction techniques such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) on overall performance and execution time. The best classification rate, 97.83%, was achieved by combining LR, LDA, and the DenseNet121 deep model. Despite the small dataset used (546 images), this performance encourages integrating the approach into an ERP system as a separate document preprocessing module.
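
The evaluation stage described in the abstract (dimensionality reduction, then a classical classifier, scored under cross-validation with 4, 6, and 8 folds) might look roughly like the scikit-learn sketch below. It is an illustration under stated assumptions, not the authors' code: the features are synthetic stand-ins generated with `make_classification` rather than real DenseNet121 outputs, the feature width (256) and class count (3) are invented for the example, the use of stratified, shuffled folds is an assumption, and the helper name `evaluate` is hypothetical. Only the sample count (546), the LDA + LR combination, and the fold counts come from the abstract.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for deep features. 546 samples matches the abstract;
# 256 features and 3 classes are assumptions made only for this sketch.
X, y = make_classification(
    n_samples=546, n_features=256, n_informative=32,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)


def evaluate(X, y, fold_counts=(4, 6, 8)):
    """Return {k: mean accuracy} for an LDA + LR pipeline under k-fold CV."""
    results = {}
    for k in fold_counts:
        # A fresh pipeline per run; LDA is re-fitted inside each training
        # fold, so the reduction never sees the held-out samples.
        model = make_pipeline(
            LinearDiscriminantAnalysis(),       # projects to at most n_classes - 1 dims
            LogisticRegression(max_iter=1000),  # classifier on the reduced features
        )
        # Stratified, shuffled folds are an assumption, not from the abstract.
        cv = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
        scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
        results[k] = float(scores.mean())
    return results


scores_by_k = evaluate(X, y)
for k, acc in scores_by_k.items():
    print(f"{k}-fold mean accuracy: {acc:.3f}")
```

Placing the reduction step inside the pipeline, rather than fitting it once on the full dataset, keeps information from the validation fold out of the projection. With real data, `X` would be replaced by the feature matrix extracted from the pre-trained network. The accuracies printed here describe only the synthetic data and say nothing about the reported 97.83%.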

Last modified: 2021-04-10 14:57:39