Web Data Extraction and Alignment Tools: A Survey
Journal: International Journal of Scientific Engineering and Technology (IJSET) (Vol. 2, No. 6)
Publication Date: 2013-06-01
Authors: Shridevi A. Swami; Pujashree Vidap
Pages: 573-578
Keywords: Data extraction; Wrapper induction; DOM tree; Web crawler; Data alignment
Abstract
A search engine generates a dynamic result page when a user submits a query. The result page consists of query-relevant data along with auxiliary information such as advertisements and navigation panels. Deciding which part of the page holds the main content is easy for a human but difficult for a computer program. To make use of this data, it is therefore necessary to remove the irrelevant parts and automatically extract the data from the result pages. The extracted data can then be aligned in a structured format, such as a table, for comparison. This paper surveys various automatic web data extraction and data alignment techniques. Web data extraction techniques are broadly classified into wrapper programming languages, wrapper induction, and automatic extraction. For data alignment, some techniques rely only on the structure of HTML tags, while others use both tags and data values.
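As a rough illustration of the extract-then-align pipeline the abstract describes, the following minimal Python sketch parses a small result page, skips auxiliary blocks, and aligns the extracted records into a table. It is a hand-written wrapper and not a method taken from the paper: the sample page, the class names (`result`, `nav`, `ad`, `title`, `price`, `author`), and the helpers `RecordExtractor`, `align`, and `extract_and_align` are all hypothetical. Alignment here relies only on tag attributes, and the sketch assumes records contain no nested `<div>` elements.

```python
from html.parser import HTMLParser

# Hypothetical result page: two data records plus auxiliary blocks.
PAGE = """
<html><body>
<div class="nav">Home | About</div>
<div class="result"><span class="title">Book A</span><span class="price">$10</span></div>
<div class="ad">Buy now!</div>
<div class="result"><span class="title">Book B</span><span class="author">J. Doe</span><span class="price">$12</span></div>
</body></html>
"""


class RecordExtractor(HTMLParser):
    """Collect {field: text} for each <div class="result">, ignoring the rest."""

    def __init__(self):
        super().__init__()
        self.records = []
        self.current = None  # record being filled, or None outside a record
        self.field = None    # field name of the currently open <span>, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "div" and cls == "result":
            self.current = {}
        elif tag == "span" and self.current is not None:
            self.field = cls

    def handle_endtag(self, tag):
        if tag == "span":
            self.field = None
        elif tag == "div" and self.current is not None:
            # Assumes records hold no nested <div>, so this closes the record.
            self.records.append(self.current)
            self.current = None

    def handle_data(self, data):
        # Text outside a record (navigation, advertisements) is dropped.
        if self.current is not None and self.field:
            self.current[self.field] = data.strip()


def align(records):
    """Align records into a table: one column per field, "" where missing."""
    columns = []
    for rec in records:
        for name in rec:
            if name not in columns:
                columns.append(name)
    rows = [[rec.get(col, "") for col in columns] for rec in records]
    return columns, rows


def extract_and_align(html):
    parser = RecordExtractor()
    parser.feed(html)
    return align(parser.records)


if __name__ == "__main__":
    columns, rows = extract_and_align(PAGE)
    print(columns)
    for row in rows:
        print(row)
```

In this sketch the second record has an `author` field that the first lacks, so alignment leaves an empty cell in the first row. Wrapper induction and automatic extraction techniques aim to learn or infer such extraction rules instead of hard-coding them as done here.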
Last modified: 2013-06-08 22:02:54