An Improved Document Image Classification using Deep Transfer Learning and Feature Reduction
Journal: International Journal of Advanced Trends in Computer Science and Engineering (IJATCSE) (Vol. 10, No. 2)
Publication Date: 2021-04-09
Authors : Aissam JADLI; Mustapha HAIN; Anouar HASBAOUI
Page : 549-557
Keywords : ERP; Document image classification; Deep feature selection; Transfer learning
Abstract
Electronic Document Management is an essential workflow within every successful ERP implementation. Integrating documents into their respective processing pipelines inside the ERP system (e.g. OCR, data extraction) usually requires a prior classification step to improve the success rate. Unfortunately, because business documents (e.g. invoices, checks, delivery forms) vary in type, size, and layout, classifying them automatically is challenging and may require additional data for model training. This paper investigates the Transfer Learning paradigm, using different pre-trained deep models to extract useful features from scanned document images. Machine learning classifiers, namely Logistic Regression (LR), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Gaussian Naive Bayes (GNB), then process the extracted features for classification. We compared the performance of the resulting models on various metrics. To mitigate over-fitting and dataset imbalance, we ran a cross-validation procedure with different numbers of folds (4, 6, and 8) to assess the models' generalization ability. We also examined the effect of dimensionality reduction techniques, namely Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), on overall performance and execution time. The best classification rate, 97.83%, was achieved by combining LR, LDA, and the DenseNet121 deep model. Although the dataset is small (546 images), this performance encourages integrating the approach into an ERP system as a separate document preprocessing module.
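The evaluation stage described in the abstract (feature reduction with LDA, classification with LR, cross-validation at 4, 6, and 8 folds) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation. Several things here are assumptions: the features are synthetic stand-ins generated with `make_classification` rather than real DenseNet121 outputs, the 1024-dimensional width mirrors DenseNet121's pooled feature size, and the number of classes (3), the use of `StandardScaler`, stratified folds, and the helper name `evaluate_pipeline` are hypothetical choices made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def evaluate_pipeline(features, labels, fold_sizes=(4, 6, 8), seed=0):
    """Return mean cross-validated accuracy of an LDA + LR pipeline per fold count.

    `evaluate_pipeline` is a hypothetical helper written for this sketch.
    """
    # Scale, reduce with LDA (at most n_classes - 1 components), then classify with LR.
    pipeline = make_pipeline(
        StandardScaler(),
        LinearDiscriminantAnalysis(),
        LogisticRegression(max_iter=1000),
    )
    results = {}
    for k in fold_sizes:
        # Stratified folds keep class proportions similar in every split,
        # which is one common way to handle class imbalance (an assumption here).
        cv = StratifiedKFold(n_splits=k, shuffle=True, random_state=seed)
        scores = cross_val_score(pipeline, features, labels, cv=cv, scoring="accuracy")
        results[k] = float(scores.mean())
    return results


# Synthetic stand-in for deep features: 546 samples (the dataset size in the
# abstract) by 1024 dimensions. Real features would come from a pre-trained
# network applied to the scanned document images.
features, labels = make_classification(
    n_samples=546,
    n_features=1024,
    n_informative=20,
    n_classes=3,  # assumed; the abstract does not state the class count
    random_state=0,
)

results = evaluate_pipeline(features, labels)
for k, accuracy in results.items():
    print(f"{k}-fold mean accuracy: {accuracy:.4f}")
```

Because the data are synthetic, the printed accuracies say nothing about the 97.83% reported in the abstract; the sketch only shows how the reduction, classification, and cross-validation steps fit together. Swapping `LinearDiscriminantAnalysis()` for `sklearn.decomposition.PCA` would give the PCA variant of the same comparison.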
Last modified: 2021-04-10 14:57:39