
Segmentation of Unstructured Newspaper Documents

Journal: International Journal of Advanced Engineering Research and Science (Vol.4, No. 5)

Publication Date:

Authors : ; ; ;

Page : 79-83

Keywords : document layout analysis; data extraction; document page segmentation; unstructured document


Abstract

Document layout analysis is an important step in automated document recognition systems. It retrieves meaningful information from document images by identifying, categorizing, and labeling the semantics of their text blocks. In this paper, we present a simple top-down approach to document page segmentation. We tested the proposed method on unstructured documents such as newspapers, which have complex layouts with no fixed structure, multiple titles, and multiple columns. The method identifies the white gaps that separate titles, columns of text, lines of text, and words within lines, and uses them to divide the document into segments. The proposed algorithm was implemented and applied to a large number of Indian newspapers, and the results were evaluated by the number of blocks detected, taking their correct ordering into account.
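
The abstract gives no implementation details, so the following is only a minimal sketch of how white-gap-based top-down segmentation might look. It assumes a binarized page image (nonzero = ink) and uses a recursive projection-profile split (often called an XY-cut), which may differ from the authors' actual procedure. The function names `content_runs` and `segment` and the gap thresholds are hypothetical and do not come from the paper.

```python
import numpy as np


def content_runs(profile, min_gap):
    """Return (start, end) pairs, end exclusive, of runs that contain ink.

    `profile` holds the ink-pixel count per row or per column. Two runs are
    split only when at least `min_gap` consecutive empty positions separate
    them; shorter gaps are absorbed into the surrounding run.
    """
    runs = []
    start = None  # start index of the run currently being built
    gap = 0       # length of the empty stretch seen since the last ink
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                runs.append((start, i - gap + 1))
                start = None
                gap = 0
    if start is not None:
        # Close the final run, trimming any short trailing gap.
        runs.append((start, len(profile) - gap))
    return runs


def segment(page, min_row_gap=2, min_col_gap=2, top=0, left=0):
    """Recursively split `page` (2-D array, nonzero = ink) at white gaps.

    Returns (top, left, bottom, right) boxes, bottom/right exclusive, in
    page coordinates. Blocks are emitted top to bottom, and left to right
    within each horizontal band, which gives a simple reading order.
    """
    rows = content_runs(page.sum(axis=1), min_row_gap)
    blocks = []
    for r0, r1 in rows:
        band = page[r0:r1]
        cols = content_runs(band.sum(axis=0), min_col_gap)
        if len(rows) == 1 and len(cols) == 1:
            # No further horizontal or vertical gap: this is a leaf block.
            c0, c1 = cols[0]
            blocks.append((top + r0, left + c0, top + r1, left + c1))
            continue
        for c0, c1 in cols:
            blocks.extend(
                segment(band[:, c0:c1], min_row_gap, min_col_gap,
                        top + r0, left + c0)
            )
    return blocks
```

In this sketch, larger thresholds would isolate coarse units such as titles and columns, while smaller ones would split down to lines or words; suitable values depend on scan resolution and would need tuning on real newspaper pages.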

Last modified: 2017-05-21 01:04:53