
REDUCING HUMAN EFFORT: WEB DATA MINING, LEARNING A NEW CHARACTERISTICS FROM BIG DATA

Journal: GRD Journal for Engineering (Vol.1, No. 1)

Publication Date:

Authors : ; ;

Page : 13-19

Keywords : Big Data; DOM; Extraction Pattern; Wrapper Learning & Adaption;


Abstract

This paper presents an approach to reducing the human effort needed to extract precise information from previously unseen Web sites. The approach automatically adapts the information extraction knowledge previously learned from a source Web site to a new, unseen site, and at the same time discovers previously unseen attributes. Two kinds of text-related evidence from the source Web site are considered. The first kind is obtained from the extraction pattern contained in the previously learned wrapper. The second kind is derived from the previously extracted or collected items. A generative model is designed to handle the uncertainty involved. It models how a Web page generates both the site-independent content information and the site-dependent layout format of the text fragments related to attribute values. We have conducted extensive experiments on more than 50 real-world Web sites in more than five different domains to demonstrate the effectiveness of our approach.
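The abstract does not give the model's equations, so the following is only a minimal sketch under stated assumptions, not the authors' generative model. It illustrates the general idea of combining site-independent content evidence with site-dependent layout evidence, using a naive-Bayes-style log-likelihood ratio and one round of self-training. Content evidence comes from a Laplace-smoothed unigram model of previously extracted items (the second kind of evidence described above). The first kind, the wrapper's extraction pattern, is not modeled here. The layout model for the new site is estimated only from fragments that already look like attribute values by content alone. The `Fragment` class, its `layout` string (for example an enclosing tag and class such as `div.title`), the `adapt` function, and the job-title data are all hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Fragment:
    text: str    # text of a candidate DOM text node on the new site
    layout: str  # hypothetical layout feature, e.g. enclosing tag and class


def tokens(text):
    """Lower-cased whitespace tokenization."""
    return text.lower().split()


def unigram(token_lists):
    """Counts and total for a unigram model over several token lists."""
    counts = Counter(t for toks in token_lists for t in toks)
    return counts, sum(counts.values())


def prob(model, item, vocab_size):
    """Add-one (Laplace) smoothed probability of `item` under `model`."""
    counts, total = model
    return (counts[item] + 1) / (total + vocab_size)


def log_ratio(item, attr_model, bg_model, vocab_size):
    """log P(item | attribute) - log P(item | page background)."""
    return math.log(prob(attr_model, item, vocab_size) / prob(bg_model, item, vocab_size))


def adapt(previous_items, fragments):
    """Hypothetical adaptation step: score each fragment on a new site with
    (1) site-independent content evidence from previously extracted items and
    (2) a site-dependent layout model estimated on the new site itself."""
    page_toks = [tokens(f.text) for f in fragments]
    item_toks = [tokens(i) for i in previous_items]
    vocab_size = len({t for toks in page_toks + item_toks for t in toks})
    attr_tok, bg_tok = unigram(item_toks), unigram(page_toks)

    # Content score: mean per-token log-likelihood ratio (attribute vs. page).
    content = [
        sum(log_ratio(t, attr_tok, bg_tok, vocab_size) for t in toks) / len(toks)
        if toks else float("-inf")
        for toks in page_toks
    ]

    # Seeds: fragments that look like attribute values by content alone.
    seeds = [f for f, s in zip(fragments, content) if s > 0]

    # Layout model of the new site: seed layouts vs. all layouts on the page.
    n_layouts = len({f.layout for f in fragments})
    attr_lay = unigram([[f.layout for f in seeds]])
    bg_lay = unigram([[f.layout for f in fragments]])

    results = []
    for f, s in zip(fragments, content):
        score = s + log_ratio(f.layout, attr_lay, bg_lay, n_layouts)
        if score > 0:
            results.append((f.text, score))
    return sorted(results, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    # Toy data: job titles previously extracted from a source site.
    previous = ["senior software engineer", "data engineer", "software developer"]
    page = [
        Fragment("Backend Software Engineer", "div.title"),
        Fragment("Senior Data Engineer", "div.title"),
        Fragment("Site Reliability Engineer", "div.title"),
        Fragment("Apply now to join our team", "p.body"),
        Fragment("Remote work available", "p.body"),
        Fragment("Posted 3 days ago", "span.date"),
        Fragment("Posted 1 week ago", "span.date"),
    ]
    for text, score in adapt(previous, page):
        print(f"{score:+.3f}  {text}")
```

On this toy page, "Site Reliability Engineer" has a negative content score on its own, because only `engineer` overlaps the earlier items. It shares the `div.title` layout with the two content-matched titles, so the layout term lifts it above zero. This is the sense in which layout learned on the new site can surface values that content evidence alone misses. A fuller treatment would iterate the layout estimate and use richer DOM features. Those choices are left open here.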

Last modified: 2016-03-08 12:37:16