Segmentation of Unstructured Newspaper Documents
Journal: International Journal of Advanced Engineering Research and Science (Vol. 4, No. 5)
Publication Date: 2017-05-08
Authors : Santosh Naik; R. Dinesh; Prabhanjan S.
Page : 79-83
Keywords : Document Layout Analysis; data extraction; document page segmentation; unstructured document
Abstract
Document layout analysis is an important step in automated document recognition systems. It retrieves meaningful information from document images by identifying, categorizing, and labeling the semantics of their text blocks. In this paper, we present a simple top-down approach for document page segmentation. We tested the proposed method on unstructured documents such as newspapers, which have complex layouts with no fixed structure, multiple titles, and multiple columns. The proposed method identifies the white gaps that separate titles, columns of text, lines of text, and words within lines, and uses them to divide the document into segments. The algorithm has been successfully implemented and applied to a large number of Indian newspapers, and the results were evaluated by the number of blocks detected, taking their correct reading order into account.
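
The abstract does not give the algorithm's details, so the following is only a minimal sketch of one well-known top-down technique that splits a page on white gaps: a recursive XY-cut over projection profiles. It is not the authors' implementation. The function names (`split_on_gaps`, `xy_cut`, `segment_page`), the binary-image representation (1 for ink, 0 for white), and the gap thresholds `min_gap_y` and `min_gap_x` are all assumptions made for illustration.

```python
def split_on_gaps(profile, min_gap):
    """Return half-open (start, end) content segments of a projection profile.

    Segments are separated by runs of at least `min_gap` zeros; shorter
    zero runs are absorbed into the surrounding segment.
    """
    segments = []
    start = None     # start index of the current content segment
    last_ink = None  # index of the most recent non-empty position
    for i, value in enumerate(profile):
        if value == 0:
            continue
        if start is None:
            start = i
        elif i - last_ink - 1 >= min_gap:
            # The white gap is wide enough: close the segment, open a new one.
            segments.append((start, last_ink + 1))
            start = i
        last_ink = i
    if start is not None:
        segments.append((start, last_ink + 1))
    return segments


def xy_cut(img, top, bottom, left, right, min_gap_y, min_gap_x):
    """Recursively split a region on horizontal, then vertical, white gaps.

    Returns blocks as (top, bottom, left, right) with half-open bounds,
    ordered top-to-bottom and, within a band, left-to-right.
    """
    # Horizontal projection: amount of ink in each row of the region.
    rows = [sum(img[y][left:right]) for y in range(top, bottom)]
    row_segs = split_on_gaps(rows, min_gap_y)
    if not row_segs:
        return []  # the region is entirely white
    if len(row_segs) > 1:
        blocks = []
        for s, e in row_segs:
            blocks += xy_cut(img, top + s, top + e, left, right,
                             min_gap_y, min_gap_x)
        return blocks

    # Only one horizontal band: tighten it, then look for column gaps.
    top, bottom = top + row_segs[0][0], top + row_segs[0][1]
    cols = [sum(img[y][x] for y in range(top, bottom))
            for x in range(left, right)]
    col_segs = split_on_gaps(cols, min_gap_x)
    if len(col_segs) > 1:
        blocks = []
        for s, e in col_segs:
            blocks += xy_cut(img, top, bottom, left + s, left + e,
                             min_gap_y, min_gap_x)
        return blocks

    # No further split is possible: emit the tightened bounding box.
    left, right = left + col_segs[0][0], left + col_segs[0][1]
    return [(top, bottom, left, right)]


def segment_page(img, min_gap_y, min_gap_x):
    """Segment a whole binary page image (list of rows of 0/1 values)."""
    if not img:
        return []
    return xy_cut(img, 0, len(img), 0, len(img[0]), min_gap_y, min_gap_x)


# Toy page: a title spanning the full width above two text columns.
page = [
    "..........",
    ".########.",
    "..........",
    "..........",
    ".###..###.",
    ".###..###.",
    "..........",
]
img = [[1 if ch == "#" else 0 for ch in row] for row in page]
blocks = segment_page(img, min_gap_y=2, min_gap_x=2)
```

On this toy page, `blocks` holds the title first, then the left column, then the right column, which is one simple way to recover a reading order. On real scans the thresholds would likely need to be tuned per level (titles, columns, lines, words), and noise or skew would have to be handled before projection.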