A Word & Character N-Gram based Arabic OCR Error Simulation model
Journal: INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY (Vol. 12, No. 8)
Publication Date: 2014-03-04
Authors : Mostafa Ezzat; Tarek ElGhazaly; Mervat Gheith
Page : 3758-3767
Keywords : Arabic OCR Degraded Text Retrieval; Arabic OCR-Degrade Text; Orthographic Query Expansion; Synthesize OCR-Degraded Text
Abstract
This paper presents a new model that aims to improve retrieval effectiveness for Arabic OCR-degraded text. The proposed model simulates Arabic OCR recognition errors using both word-based and character n-gram approaches, and then expands the user's search query with the expected OCR errors. When searching Arabic OCR-degraded text, the expanded query yields higher precision and recall than the original query. The proposed model shows a significant increase in degraded-text retrieval effectiveness over previous models: its retrieval effectiveness is 93%, while the best published effectiveness is 84% for the word-based approach and 56% for the character-based approach.
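The query expansion idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' model: it covers only character n-gram substitution, and the confusion table `CONFUSIONS`, its probabilities, and the names `expand_word` and `expand_query` are all hypothetical. A real table would presumably be learned by aligning clean text with its OCR output.

```python
from typing import Dict, List, Tuple

# Hypothetical confusion table: character n-gram -> [(OCR output, probability)].
# The entries and probabilities are illustrative assumptions, not values
# taken from the paper.
CONFUSIONS: Dict[str, List[Tuple[str, float]]] = {
    "ب": [("ت", 0.10), ("ن", 0.05)],
    "ة": [("ه", 0.20)],
    "لا": [("لأ", 0.08)],
}


def expand_word(
    word: str,
    confusions: Dict[str, List[Tuple[str, float]]] = CONFUSIONS,
    max_n: int = 2,
    min_prob: float = 0.05,
) -> Dict[str, float]:
    """Return the word plus variants produced by one n-gram substitution."""
    variants: Dict[str, float] = {word: 1.0}
    for i in range(len(word)):
        for n in range(1, max_n + 1):
            gram = word[i:i + n]
            if len(gram) < n:
                continue  # n-gram would run past the end of the word
            for replacement, prob in confusions.get(gram, []):
                if prob >= min_prob:
                    candidate = word[:i] + replacement + word[i + n:]
                    # Keep the highest probability seen for each variant.
                    variants[candidate] = max(variants.get(candidate, 0.0), prob)
    return variants


def expand_query(
    query: str,
    confusions: Dict[str, List[Tuple[str, float]]] = CONFUSIONS,
) -> List[List[str]]:
    """Expand each query term into a list of variants, most likely first."""
    expanded = []
    for term in query.split():
        scored = expand_word(term, confusions)
        ranked = sorted(scored, key=lambda v: -scored[v])
        expanded.append(ranked)
    return expanded
```

Each inner list returned by `expand_query` could be OR-ed together in a search engine query, so that a document containing a misrecognized form of a term still matches. The sketch applies a single substitution per variant to keep the expansion small.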
Last modified: 2016-06-29 18:08:15