
Web Data Extraction and Alignment Tools: A Survey

Journal: International Journal of Scientific Engineering and Technology (IJSET) (Vol. 2, No. 6)

Publication Date:

Authors : ;

Page : 573-578

Keywords : Data extraction; Wrapper induction; DOM tree; Web crawler; Data alignment;


Abstract

A search engine generates a dynamic result page when a user submits a query. The result page contains data relevant to the query along with auxiliary information such as advertisements and navigation panels. Deciding which part of the page holds the main content is easy for a human but difficult for a computer program. To make use of this data, it is therefore necessary to remove the irrelevant parts and automatically extract the data from the result pages. The extracted data can then be aligned in a structured format, such as a table, for comparison. This paper surveys automatic web data extraction and data alignment techniques. Web data extraction techniques are mainly classified as wrapper programming languages, wrapper induction, and automatic extraction. For data alignment, some techniques rely only on the structure of HTML tags, while others rely on both tags and data values.
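
As a rough illustration of the extract-then-align idea, the Python sketch below pulls repeated records out of a small result page and aligns them into table rows using only the tag structure. It is not a technique from the surveyed paper: the class RecordExtractor, the function align, the sample page, and the assumption that every record is a flat <li> element are all hypothetical, chosen only to keep the example short.

```python
from html.parser import HTMLParser


class RecordExtractor(HTMLParser):
    """Hypothetical hand-written wrapper: collects the text found inside
    each record element, keyed by the tag that directly encloses it."""

    def __init__(self, record_tag="li"):
        super().__init__()
        self.record_tag = record_tag  # assumption: one record per <li>
        self.records = []             # one dict per record: tag -> text
        self._current = None          # record being filled, None outside
        self._stack = []              # open tags inside the current record

    def handle_starttag(self, tag, attrs):
        if tag == self.record_tag and self._current is None:
            self._current = {}
            self._stack = []
        elif self._current is not None:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._current is None:
            return  # navigation, ads, and other content outside records
        if tag == self.record_tag and not self._stack:
            self.records.append(self._current)
            self._current = None
        elif self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if self._current is not None and self._stack and text:
            # Keep the first text seen under each tag.
            self._current.setdefault(self._stack[-1], text)


def align(records):
    """Align records into rows; columns are the union of tags, in order of
    first appearance. Missing fields become empty strings."""
    columns = []
    for rec in records:
        for tag in rec:
            if tag not in columns:
                columns.append(tag)
    rows = [[rec.get(col, "") for col in columns] for rec in records]
    return columns, rows


# Made-up result page: two records surrounded by navigation and an ad.
PAGE = """
<div id="nav"><a>Home</a></div>
<ul>
  <li><b>Book A</b><i>Author X</i><span>$10</span></li>
  <li><b>Book B</b><span>$12</span></li>
</ul>
<div id="ads"><a>Buy now</a></div>
"""

parser = RecordExtractor(record_tag="li")
parser.feed(PAGE)
columns, rows = align(parser.records)
for row in rows:
    print(row)
```

Here the second record has no <i> element, so its row gets an empty cell in that column. Aligning by tag name alone is the simplest possible choice; a technique that also uses data values could, for example, distinguish two fields that share the same tag.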

Last modified: 2013-06-08 22:02:54