
FoCUS - Forum Crawler Under Supervision

Journal: International Journal of Computer Science and Mobile Computing - IJCSMC (Vol.3, No. 8)

Publication Date:

Authors : ;

Page : 79-84

Keywords : Page classification; URL pattern learning; Sentiment analysis;


Abstract

Forum Crawler Under Supervision (FoCUS) is a supervised, web-scale forum crawler. The web holds a vast amount of data spread across innumerable websites, which are traversed by programs known as crawlers. The goal of FoCUS is to crawl relevant forum content from the web with minimal overhead. Although forums have different layouts or styles and are powered by different forum software packages, they share similar implicit navigation paths, connected by specific URL types, that lead users from entry pages to thread pages. Based on this observation, FoCUS reduces the web forum crawling problem to a URL-type recognition problem. The paper also shows how to learn accurate and effective regular expression patterns of implicit navigation paths from automatically created training sets, using aggregated results from weak page-type classifiers. These classifiers can be trained and then applied to a large set of unseen forums. The authors report that the approach is effective, addresses the scalability issue, and incorporates sentiment analysis.
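
To make the idea of URL-type recognition concrete, the following is a minimal sketch, not the authors' implementation. It assumes three URL types (index, thread, and page-flipping) and uses hand-written, hypothetical regular expressions in the style of a vBulletin-like forum; in FoCUS such patterns would be learned from automatically created training sets rather than written by hand. The names `URL_PATTERNS`, `classify_url`, and `select_links` are illustrative only.

```python
import re

# Hypothetical, hand-written patterns for illustration only.
# FoCUS would learn patterns like these from training sets.
URL_PATTERNS = {
    "index": re.compile(r"^/forumdisplay\.php\?f=\d+$"),
    "thread": re.compile(r"^/showthread\.php\?t=\d+$"),
    "page_flipping": re.compile(
        r"^/(?:forumdisplay|showthread)\.php\?[ft]=\d+&page=\d+$"
    ),
}


def classify_url(path):
    """Return the URL type whose pattern matches the path, or None."""
    for url_type, pattern in URL_PATTERNS.items():
        if pattern.match(path):
            return url_type
    return None


def select_links(paths):
    """Keep only links on a navigation path (index, thread, page-flipping)."""
    return [p for p in paths if classify_url(p) is not None]


if __name__ == "__main__":
    links = [
        "/forumdisplay.php?f=3",
        "/showthread.php?t=42",
        "/showthread.php?t=42&page=2",
        "/member.php?u=7",
    ]
    for link in select_links(links):
        print(classify_url(link), link)
```

Under these assumptions, a crawler follows only the links that `select_links` keeps, so pages such as user profiles are never fetched. That filtering is where the reduced crawling overhead would come from.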

Last modified: 2014-08-09 01:26:56