UCOM Offline Dataset-An Urdu Handwritten Dataset Generation
Journal: The International Arab Journal of Information Technology (Vol. 14, No. 2)
Publication Date: 2017-03-01
Authors : Saad Bin Ahmed; Saeeda Naz; Salahuddin Swati; Imran Razzak; Arif Iqbal Umar; Akbar Ali Khan
Page : 239-245
Keywords : Recurrent neural networks; optical character recognition; cursive; offline handwriting
Abstract
A benchmark database is essential for the efficient and robust development of character recognition systems. Unfortunately, there is no comprehensive handwritten dataset for the Urdu language that can be used to compare state-of-the-art techniques in the field of optical character recognition. In this paper, we present a new, publicly available dataset comprising 600 pages of handwritten Urdu text written in Nasta'liq style, together with detailed ground truth for the evaluation of handwritten Urdu character recognition. The dataset contains text lines written in Nasta'liq style by a limited number of individuals on A4-size paper. The handwritten pages were scanned and the text lines were segmented. The UCOM database covers all Urdu characters and ligatures with different variations, in addition to Urdu numeric data. In this dataset, a ligature is considered to consist of up to five characters. The UCOM dataset can be used for handwritten character recognition as well as writer identification. We propose and evaluate the strength of Recurrent Neural Networks (RNNs) on sample text lines from the UCOM offline database.