REVIEW PAPER ON THE DEEP WEB DATA EXTRACTION
Journal: International Journal of Engineering Sciences & Research Technology (IJESRT) (Vol. 7, No. 4)
Publication Date: 2018-04-30
Authors : V. S. Patil; Sneha Sitafale; Priyanka Kale; Poonam Bhujbal; Mohini Dandge
Page : 39-44
Keywords : Web data extraction; Visual features of deep web pages; Wrapper generation; Feature extracting; Webpage;
Abstract
Deep web data extraction is the process of extracting a set of data records, and the items they contain, from a query result page. Such structured data can later be integrated with results from other data sources and presented to the user in a single, cohesive view. Domain identification is used to identify, among the forms obtained in the search process, the query interfaces that belong to a given domain. The surface web contains a large amount of unfiltered information, whereas the deep web includes high-quality, managed and subject-specific information. The deep web grows faster than the surface web because the surface web is limited to what search engines can easily find. The deep web covers domains such as education, sports and the economy. Deep web contents are accessed by queries submitted to web databases, and the returned data records are wrapped in dynamically generated web pages (called deep web pages in this paper). Extracting structured data from deep web pages is a challenging problem because of the intricate underlying structure of such pages. Experiments on a large set of web databases show that the proposed vision-based approach is highly effective for deep web data extraction.
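To make the notion of data records and the items they contain concrete, the following minimal Python sketch pulls records out of a small, invented query result page. It is an illustration only: it relies on HTML tags and class names rather than the visual features the paper proposes, and the sample page, the class names `record`, `title` and `price`, and the names `RecordExtractor` and `extract_records` are all assumptions made for this example, not part of the reviewed work.

```python
from html.parser import HTMLParser

# Hypothetical query result page. The class names "record", "title" and
# "price" are assumptions made for this illustration only.
SAMPLE_PAGE = """
<html><body>
  <div class="banner">Results for "deep web"</div>
  <div class="record"><span class="title">Book A</span><span class="price">10.00</span></div>
  <div class="record"><span class="title">Book B</span><span class="price">12.50</span></div>
  <div class="footer">Page 1 of 1</div>
</body></html>
"""


class RecordExtractor(HTMLParser):
    """Collect one dict per record region, keyed by each item's class name."""

    def __init__(self, record_class="record"):
        super().__init__()
        self.record_class = record_class
        self.records = []
        self._current = None  # record being filled, or None outside a record
        self._depth = 0       # nesting depth of <div> inside the current record
        self._field = None    # class name of the <span> currently being read

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "div":
            if self._current is not None:
                self._depth += 1          # nested div inside a record
            elif cls == self.record_class:
                self._current = {}        # a new data record starts here
                self._depth = 1
        elif tag == "span" and self._current is not None and cls:
            self._field = cls             # a data item inside the record

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "div" and self._current is not None:
            self._depth -= 1
            if self._depth == 0:          # the record's own div has closed
                self.records.append(self._current)
                self._current = None

    def handle_data(self, data):
        # Keep text only while inside a data item of a data record.
        if self._current is not None and self._field:
            previous = self._current.get(self._field, "")
            self._current[self._field] = previous + data.strip()


def extract_records(page):
    """Return the data records found in an HTML page as a list of dicts."""
    parser = RecordExtractor()
    parser.feed(page)
    parser.close()
    return parser.records


records = extract_records(SAMPLE_PAGE)
for record in records:
    print(record)
```

The banner and footer regions are skipped because they do not match the assumed record class; real deep web pages rarely expose such clean markers, which is the difficulty the abstract points to.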
Last modified: 2018-04-10 20:58:26