Discovering Informative Blocks from Web Pages for Efficient Information Extraction using DOM tree
Journal: International Journal of Computational and Electronic Aspects in Engineering (Vol.1, No. 2)
Publication Date: 2015-03-30
Authors : Rakesh M. Kohale; Shreyash G. Balbudhe;
Page : 23-25
Keywords : DOM tree; Site Style Tree; Tokens; Parsing; Informative blocks; Non-informative blocks;
Abstract
A web page generally contains its main data along with navigation panels, advertisements, and copyright and privacy notices. Apart from the main data, these elements carry no important information and can be called non-informative blocks. Because they are non-informative, they can distort the results of web data mining. To avoid this, it is important to separate the main data, i.e. the informative blocks, from the non-informative blocks of a web page. Within a website, non-informative blocks generally appear on many different pages with the same format and the same content. Informative blocks, by contrast, differ from page to page in both content and format. We therefore need a site-level structure that captures the shared formats of blocks and the data they contain. The DOM tree is a page-level structure, and many tools are available to construct the DOM tree of a web page, but it is not useful at site level. So we need to construct a Site Style Tree (SST) for the website. By analyzing the SST, we can identify which parts of it are informative and which are non-informative. No tool is available to construct a style tree for a given website. This work aims at constructing a style tree for a given website and separating its informative and non-informative blocks.
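The abstract does not spell out the algorithm, so the following is only a minimal sketch of the underlying idea, not the authors' implementation. It parses each page into a simplified DOM tree with Python's standard `html.parser`, merges the pages into a site-level index keyed by element position, and labels a block non-informative when it occurs on every page with identical text. The names `Node`, `DomBuilder`, `build_site_index`, `classify_blocks`, the `depth` parameter, and the sample pages are all hypothetical, and the all-or-nothing rule is a deliberate simplification of whatever importance measure a real Site Style Tree would use.

```python
from collections import defaultdict
from html.parser import HTMLParser


class Node:
    """A minimal DOM element: tag name, child elements, and its own text."""

    def __init__(self, tag):
        self.tag = tag
        self.children = []
        self.text = []


class DomBuilder(HTMLParser):
    """Builds a simplified DOM tree (assumes properly nested HTML)."""

    VOID = {"br", "hr", "img", "input", "link", "meta"}

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in self.VOID:  # void elements never get a closing tag
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop only when the closing tag matches the currently open element.
        if len(self.stack) > 1 and self.stack[-1].tag == tag:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].text.append(data.strip())


def parse(html):
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def full_text(node):
    """The node's own text followed by the text of its descendants."""
    parts = list(node.text)
    parts.extend(full_text(child) for child in node.children)
    return " ".join(p for p in parts if p)


def collect(node, path, seen):
    """Record the text of every element, keyed by its position in the tree."""
    for index, child in enumerate(node.children):
        child_path = path + ((child.tag, index),)
        seen[child_path] = full_text(child)
        collect(child, child_path, seen)


def build_site_index(pages):
    """Map each element path to the texts observed for it across all pages."""
    site_index = defaultdict(list)
    for html in pages:
        seen = {}
        collect(parse(html), (), seen)
        for path, text in seen.items():
            site_index[path].append(text)
    return site_index


def classify_blocks(pages, depth=3):
    """Label the blocks at one tree depth (assumed rule: same text on all pages)."""
    labels = {}
    for path, texts in build_site_index(pages).items():
        if len(path) != depth:
            continue
        shared = len(texts) == len(pages) and len(set(texts)) == 1
        name = "/".join(f"{tag}[{i}]" for tag, i in path)
        labels[name] = "non-informative" if shared else "informative"
    return labels


# Two made-up pages sharing a navigation bar and a copyright footer.
PAGES = [
    "<html><body><div>Home | About</div><div>Article one text</div>"
    "<div>Copyright 2015</div></body></html>",
    "<html><body><div>Home | About</div><div>Second article body</div>"
    "<div>Copyright 2015</div></body></html>",
]

for block, label in classify_blocks(PAGES, depth=3).items():
    print(block, label)
```

On the two sample pages, the first and third `div` (navigation and copyright) have the same position and text on both pages and are labeled non-informative, while the middle `div` varies and is labeled informative. Keying blocks by tag and sibling index is a simple choice that breaks down when pages insert or reorder elements, which is one reason a dedicated site-level structure is needed.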