Template Extraction from Heterogeneous Web Pages

Journal: International Journal of Advanced Computer Research (IJACR) (Vol.2, No. 6)

Publication Date:

Authors : ;

Page : 197-201

Keywords : Template extraction; Clustering; Data mining; Information search and retrieval

Abstract

The World Wide Web (WWW) is attracting growing attention as it becomes a huge repository of information. Web pages are typically deployed on a website through a web template system, and any individual or organization can use such templates to set up a site. Templates also give readers easy access to content through consistent structure. As web templates become more important, template detection techniques are emerging. Earlier systems assume that all documents conform to a common template and extract the template under that assumption, which is not feasible in real applications. Our focus is on extracting templates from heterogeneous web pages. Because web documents vary widely, an unknown number of templates must be managed. This can be achieved by clustering the web documents with a well-chosen partitioning method. The correctness of the extracted templates depends on the quality of the clustering.
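
The abstract does not specify the partition method, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes each page can be summarized by its set of root-to-element tag paths, that pages sharing a template have similar path sets (measured here with Jaccard similarity), and that a cluster's template can be approximated by the paths common to all of its members. The names `tag_paths`, `cluster_pages`, and the `threshold` value of 0.6 are hypothetical choices made for illustration.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class PathCollector(HTMLParser):
    """Collects the set of root-to-element tag paths found in one page."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def tag_paths(html):
    """Return the structural signature (set of tag paths) of one page."""
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    return frozenset(collector.paths)


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_pages(pages, threshold=0.6):
    """Greedy clustering with an unknown number of clusters.

    Each page joins the first cluster whose current template is similar
    enough; otherwise it starts a new cluster. The template of a cluster
    is the set of paths shared by every member seen so far.
    """
    clusters = []
    for index, html in enumerate(pages):
        paths = tag_paths(html)
        for cluster in clusters:
            if jaccard(paths, cluster["template"]) >= threshold:
                cluster["members"].append(index)
                cluster["template"] &= paths  # keep only the shared paths
                break
        else:
            clusters.append({"template": set(paths), "members": [index]})
    return clusters
```

Calling `cluster_pages` on a list of HTML strings returns one dictionary per discovered cluster, with the member page indices and the shared path set standing in for the template. Because the greedy pass depends on page order and on the threshold, the extracted templates are only as good as the clustering, which mirrors the dependence noted in the abstract.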

Last modified: 2013-01-26 19:33:38