
SmartDigger: A Two-stage Crawler for Efficiently Harvesting Deep-Web

Journal: International Journal for Scientific Research and Development | IJSRD (Vol.3, No. 11)


Page : 146-148

Keywords : Harvesting Deep-Web; SmartDigger;



As the deep web grows at a very fast pace, there has been increased interest in techniques that help locate deep-web interfaces efficiently. However, due to the large volume of web resources and the dynamic nature of the deep web, achieving both wide coverage and high efficiency is a challenging problem. We propose a two-stage framework, namely SmartCrawler, for efficiently harvesting deep-web interfaces. In the first stage, SmartCrawler performs site-based searching for center pages with the help of search engines, which avoids visiting a large number of pages. To achieve more accurate results for a focused crawl, SmartCrawler ranks websites so that highly relevant ones for a given topic are prioritized. In the second stage, SmartCrawler achieves fast in-site searching by excavating the most relevant links with adaptive link ranking. To eliminate the bias toward visiting only some highly relevant links in hidden web directories, we design a link tree data structure that achieves wider coverage of a website. Our experimental results on a set of representative domains show the agility and accuracy of the proposed crawler framework, which efficiently retrieves deep-web interfaces from large-scale sites and achieves higher harvest rates than other crawlers.
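
The two stages described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`rank_sites`, `build_link_tree`, `select_balanced`), the precomputed relevance scores, the grouping of links by their first path segment, and the round-robin selection are all assumptions made for illustration. The sketch only shows the general idea of prioritizing sites by relevance and then spreading in-site link visits across directories so that one directory does not dominate the crawl.

```python
import heapq
from collections import defaultdict
from urllib.parse import urlparse


def rank_sites(site_scores):
    """Stage one (sketch): order sites from most to least relevant.

    `site_scores` maps a site URL to a relevance score that is assumed
    to have been computed elsewhere (hypothetical input).
    """
    heap = [(-score, site) for site, score in site_scores.items()]
    heapq.heapify(heap)
    ordered = []
    while heap:
        _, site = heapq.heappop(heap)
        ordered.append(site)
    return ordered


def build_link_tree(links):
    """Stage two (sketch): group in-site links by top-level directory.

    This one-level grouping is a simplified stand-in for a link tree.
    """
    tree = defaultdict(list)
    for link in links:
        segments = [s for s in urlparse(link).path.split("/") if s]
        # Links that sit directly under the site root share the "" bucket.
        directory = segments[0] if len(segments) > 1 else ""
        tree[directory].append(link)
    return tree


def select_balanced(tree, budget):
    """Pick up to `budget` links, taking turns across directories."""
    buckets = [list(links) for _, links in sorted(tree.items())]
    chosen = []
    index = 0
    while len(chosen) < budget and any(buckets):
        bucket = buckets[index % len(buckets)]
        if bucket:
            chosen.append(bucket.pop(0))
        index += 1
    return chosen
```

With four links, three under `/books/` and one under `/search/`, a budget of two would select one link from each directory rather than two from `/books/`, which is the coverage effect the abstract attributes to the link tree.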

Last modified: 2016-02-12 19:13:19