
A Supervised Method for Multi-keyword Web Crawling on Web Forums

Journal: International Journal of Computer Science and Mobile Computing - IJCSMC (Vol.3, No. 2)

Publication Date:

Authors : ;

Page : 374-381

Keywords : Web crawler; page classification; forum crawler; URL based learning;


Abstract

Web forums are used by a large number of users to post comments and share them with other users of various websites. A forum consists of boards, each listing many topics and holding a long list of threads. Users can create threads and share their views in posts. This paper proposes a supervised multi-keyword web forum crawler that crawls relevant content from forum pages while reducing crawl delay. All forums on the web have navigation paths that lead to the forum threads, and these paths are connected by specific types of URLs. The proposed method therefore needs to recognize these URL types by using regular expression patterns within the forum. Accurate page classifiers trained on other forums can be used to classify the regular expression patterns and detect the URLs. The results show that the proposed method is more reliable and accurate than other existing methods.
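
The idea of recognizing URL types with regular expression patterns can be illustrated with a minimal Python sketch. This is not the method of the paper: the URL layout, the three pattern names, and the functions `classify_url` and `select_links` are hypothetical, and the patterns are hard-coded here, whereas the abstract describes learning them with trained page classifiers.

```python
import re

# Hypothetical URL patterns for an imaginary forum. These are assumptions for
# illustration; a supervised crawler would learn such patterns from training
# data rather than hard-code them.
URL_PATTERNS = {
    "board": re.compile(r"^/forum/\d+/?$"),
    "thread": re.compile(r"^/forum/\d+/thread/\d+/?$"),
    "pagination": re.compile(r"^/forum/\d+/thread/\d+\?page=\d+$"),
}


def classify_url(path):
    """Return the first URL type whose pattern matches the path, else 'other'."""
    for url_type, pattern in URL_PATTERNS.items():
        if pattern.match(path):
            return url_type
    return "other"


def select_links(paths):
    """Keep only the links that lie on a navigation path toward threads."""
    return [p for p in paths if classify_url(p) != "other"]
```

With patterns like these, a crawler could follow only board, thread, and pagination links and skip unrelated pages such as login or profile URLs, which is one plausible way to reduce wasted requests.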

Last modified: 2014-02-20 13:22:24