Semantic Similarity based Web Document Classification Using Support Vector Machine
Journal: The International Arab Journal of Information Technology (Vol. 14, No. 3)
Publication Date: 2017-05-01
Authors: Kavitha Chinniyan; Sudha Gangadharan; Kiruthika Sabanaikam
Page: 285-292
Keywords: Document classification; text mining; SVM; latent semantic indexing
Abstract
With the rapid growth of information on the World Wide Web (WWW), classification of web documents has become important for efficient information retrieval. The relevance of retrieved information can also be improved by considering semantic relatedness between words, a basic research topic in natural language processing, intelligent retrieval, document clustering and classification, word sense disambiguation, and related fields. Semantic relationships derived through a web search engine from a huge web corpus can improve the classification of documents. This paper proposes an approach for web document classification that exploits search engine information, including both page counts and snippets. To identify the semantic relations between the query words, a lexical pattern extraction algorithm is applied to the snippets. A sequential pattern clustering algorithm is used to form clusters of different patterns. The page count based measures are combined with the clustered patterns to define the features extracted from the word pairs. These features are used to train a Support Vector Machine (SVM) that classifies the web documents. Experimental results demonstrate improvements in classifier performance of 5% and 9% in F1 measure on the Reuters 21578 and 20 Newsgroup datasets, respectively.
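The abstract does not say which page count based measures are used, so the following is only a minimal sketch of what such features for a word pair could look like, assuming three widely used co-occurrence measures (Jaccard, Dice, and pointwise mutual information). The function names, the `n` parameter (an assumed total number of indexed pages), and the sample counts are all hypothetical and are not taken from the paper.

```python
import math


def web_jaccard(p, q, pq):
    """Jaccard coefficient from page counts: |P and Q| / |P or Q|."""
    denom = p + q - pq
    return pq / denom if denom > 0 else 0.0


def web_dice(p, q, pq):
    """Dice coefficient from page counts: 2 * |P and Q| / (|P| + |Q|)."""
    total = p + q
    return 2 * pq / total if total > 0 else 0.0


def web_pmi(p, q, pq, n):
    """Pointwise mutual information (base 2), given an assumed index size n."""
    if min(p, q, pq) <= 0:
        return 0.0  # undefined for zero counts; fall back to 0
    return math.log2((pq / n) / ((p / n) * (q / n)))


def page_count_features(p, q, pq, n):
    """Feature vector for one word pair, built only from page counts.

    p, q : hit counts for each word queried alone (hypothetical inputs)
    pq   : hit count for the conjunctive query of both words
    n    : assumed number of pages indexed by the search engine
    """
    return [web_jaccard(p, q, pq), web_dice(p, q, pq), web_pmi(p, q, pq, n)]


if __name__ == "__main__":
    # Made-up counts purely for illustration.
    print(page_count_features(p=100, q=50, pq=20, n=1000))
```

In the approach the abstract describes, a vector like this would be combined with features derived from the clustered lexical patterns before being passed to the SVM; that combination step is not shown here because the abstract gives no detail on it.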
Last modified: 2019-05-08 18:07:53