
AUTOMATIC DOCUMENT CLUSTERING

Journal: International Journal of Computer Engineering and Technology (IJCET) (Vol.6, No. 5)

Publication Date:

Authors : ; ; ; ;

Page : 8-12

Keywords : Document Clustering; Stemmer; Stop words removal; TF*IDF; Tokenization


Abstract

Automatic document clustering has played an important role in the field of information retrieval. The aim of the developed system is to store documents in clusters and thereby make their retrieval more efficient. Clustering is a technique for grouping a set of objects into clusters; document clustering is the task of grouping a set of documents so that similar documents are stored in the same cluster. We applied a non-overlapping method, so each document is stored in exactly one cluster. In this project, we wrote an algorithm that calculates the similarity of a document's keywords to the existing clusters and, according to the resulting similarity score, either places the document into an existing cluster or creates a new cluster and stores the document there. To extract keywords from a document, several techniques are used: tokenization, stop-word removal, stemming, and TF*IDF calculation.
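The abstract does not specify the stop-word list, the stemmer, the similarity measure, or the threshold used to decide between an existing and a new cluster, so the following is only a minimal sketch of such a pipeline under stated assumptions, not the authors' implementation. It assumes a tiny hypothetical stop-word list, a crude suffix stripper standing in for a real stemmer (such as Porter's), TF*IDF computed as (term count / document length) * ln(N / df), the top-k TF*IDF terms as each document's keywords, cosine similarity against a summed keyword vector per cluster, and a hypothetical threshold of 0.2. All function names are illustrative.

```python
import math
import re
from collections import Counter

# ASSUMPTION: hypothetical, abbreviated stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
              "in", "is", "it", "of", "on", "or", "that", "the", "this", "to", "with"}

# ASSUMPTION: crude suffix stripper standing in for a real stemmer (e.g. Porter).
SUFFIXES = ("ing", "ed", "es", "s")

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def stem(word):
    """Strip the first matching suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Tokenize, remove stop words, then stem the remaining tokens."""
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]

def tfidf_vectors(docs):
    """Return one {term: TF*IDF weight} dict per preprocessed document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1
        vectors.append({t: (c / total) * math.log(n / df[t]) for t, c in counts.items()})
    return vectors

def top_keywords(vector, k=5):
    """Keep the k highest-weighted terms (with positive weight) as keywords."""
    top = sorted(vector.items(), key=lambda item: item[1], reverse=True)[:k]
    return {t: w for t, w in top if w > 0}

def cosine(u, v):
    """Cosine similarity of two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def cluster_documents(texts, threshold=0.2, k=5):
    """Single-pass, non-overlapping clustering: each document joins the most
    similar existing cluster if the score reaches the (assumed) threshold,
    otherwise it starts a new cluster."""
    vectors = [top_keywords(v, k) for v in tfidf_vectors([preprocess(t) for t in texts])]
    clusters = []   # document indices per cluster
    centroids = []  # summed keyword vectors, one per cluster
    for i, vec in enumerate(vectors):
        scores = [cosine(vec, c) for c in centroids]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            clusters[best].append(i)
            for t, w in vec.items():
                centroids[best][t] = centroids[best].get(t, 0.0) + w
        else:
            clusters.append([i])
            centroids.append(dict(vec))
    return clusters

if __name__ == "__main__":
    texts = [
        "Clustering documents improves document retrieval.",
        "Document clustering groups similar documents.",
        "The stock market fell as investors sold shares.",
        "Investors sold stock shares when the market fell.",
    ]
    print(cluster_documents(texts))  # [[0, 1], [2, 3]]
```

Because each document is assigned to exactly one cluster, the result is non-overlapping, matching the abstract's description. A single-pass scheme like this one depends on the order in which documents arrive and on the chosen threshold; a higher threshold produces more, smaller clusters.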

Last modified: 2015-06-17 17:02:42