Template Extraction from Heterogeneous Web Pages
Journal: International Journal of Advanced Computer Research (IJACR) (Vol. 2, No. 6)
Publication Date: 2012-12-16
Authors: Trupti B. Mane, Girish P. Potdar
Pages: 197-201
Keywords: Template extraction; Clustering; Data mining; Information search and retrieval
Abstract
The World Wide Web (WWW) attracts growing attention as it becomes a huge repository of information. A web page is deployed on a website through a web template system, and individuals and organizations can use such templates to set up their websites. Templates also give readers easy access to content through consistent structure. As web templates become more important, template detection techniques are emerging. Earlier systems assume that all documents conform to a common template, and they extract the template under that assumption. This assumption does not hold in real applications. Our focus is on extracting templates from heterogeneous web pages. Because web documents vary widely, an unknown number of templates must be managed. This can be achieved by clustering the web documents with a well-chosen partitioning method. The correctness of the extracted templates depends on the quality of the clustering.
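The idea of clustering pages before extracting a template can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify the page representation, the similarity measure, or the partitioning algorithm, so everything below is an assumption made for illustration. Each page is represented by its set of root-to-element tag paths, pages are grouped greedily by Jaccard similarity, and the "template" of a cluster is taken to be the paths shared by all of its members. The names `PathCollector`, `tag_paths`, `jaccard`, `cluster_pages`, and the `threshold` value of 0.6 are hypothetical.

```python
from html.parser import HTMLParser


class PathCollector(HTMLParser):
    """Collects the set of root-to-element tag paths found in an HTML page."""

    # Elements that never have a closing tag, so they are not pushed on the stack.
    VOID = {"br", "hr", "img", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in self.VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass


def tag_paths(html):
    """Return the structural signature of a page as a frozenset of tag paths."""
    collector = PathCollector()
    collector.feed(html)
    return frozenset(collector.paths)


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_pages(pages, threshold=0.6):
    """Greedy clustering (an assumed stand-in for the paper's partition method).

    A page joins the first cluster whose current template is similar enough;
    otherwise it starts a new cluster, so the number of templates need not be
    known in advance. The template is the intersection of member path sets.
    """
    clusters = []
    for index, html in enumerate(pages):
        paths = tag_paths(html)
        for cluster in clusters:
            if jaccard(paths, cluster["template"]) >= threshold:
                cluster["members"].append(index)
                cluster["template"] &= paths
                break
        else:
            clusters.append({"template": set(paths), "members": [index]})
    return clusters
```

Calling `cluster_pages` on a list of HTML strings returns one dictionary per discovered cluster, with the indices of its member pages and the shared tag paths. The greedy pass depends on page order and on the chosen threshold, which reflects the abstract's point that template quality follows from clustering quality.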