AUTOMATIC DOCUMENT CLUSTERING

Journal: International Journal of Computer Engineering and Technology (IJCET) (Vol. 6, No. 5)

Publication Date: 2015-06-17

Authors: Mona Pardeshi; Neha Puranik; Aishwarya Tiwari; P. Y. Pawar

Pages: 8-12

Keywords: Document Clustering; Stemmer; Stop words removal; TF*IDF; Tokenization


Abstract

Automatic document clustering plays an important role in information retrieval. The aim of this system is to store documents in clusters and thereby make their retrieval more efficient. Clustering is a technique for grouping a set of objects into clusters; document clustering groups a set of documents so that similar documents are stored in the same cluster. We use a non-overlapping method, so each document is stored in exactly one cluster. In this project, we wrote an algorithm that calculates the similarity of a document's keywords and, depending on the resulting similarity score, either places the document in an existing cluster or creates a new cluster to store it. To extract keywords from a document, several techniques are used: tokenization, stop-word removal, stemming, and TF*IDF calculation.
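The abstract does not specify the exact stemmer, weighting formula, similarity measure, or threshold, so the following is only a minimal sketch of the kind of pipeline it describes, not the authors' implementation. Assumptions: a tiny hypothetical stop-word list, a crude suffix-stripping stemmer (much simpler than, e.g., the Porter stemmer), TF*IDF weights computed as raw term count times log(N / df), cosine similarity against each cluster's summed vector, and a hypothetical threshold of 0.2. All function and variable names are illustrative.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "for", "from", "in", "is", "of", "on", "the", "to", "with"}

# Hypothetical suffix list for a crude stemmer (NOT the Porter algorithm).
SUFFIXES = ("ing", "ed", "es", "s")

# Hypothetical sample corpus used in the usage example below.
SAMPLE_DOCS = [
    "Clustering groups similar documents.",
    "Document clustering groups documents into clusters.",
    "The stemmer strips suffixes from words.",
    "A stemmer strips word suffixes.",
]


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(word):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Tokenize, remove stop words, then stem the remaining tokens."""
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]


def tf_idf_vectors(docs):
    """Return one sparse {term: weight} vector per document, weight = tf * log(N / df)."""
    term_lists = [preprocess(d) for d in docs]
    n = len(term_lists)
    df = Counter(term for terms in term_lists for term in set(terms))
    vectors = []
    for terms in term_lists:
        tf = Counter(terms)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster(docs, threshold=0.2):
    """Single-pass, non-overlapping clustering: each document joins the most
    similar existing cluster if that similarity reaches `threshold`,
    otherwise it starts a new cluster. Returns lists of document indices."""
    vectors = tf_idf_vectors(docs)
    clusters = []   # list of lists of document indices
    centroids = []  # summed TF*IDF vector per cluster (cosine ignores scale)
    for i, vec in enumerate(vectors):
        best, best_sim = None, 0.0
        for k, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = k, sim
        if best is not None and best_sim >= threshold:
            clusters[best].append(i)
            for t, w in vec.items():
                centroids[best][t] = centroids[best].get(t, 0.0) + w
        else:
            clusters.append([i])
            centroids.append(dict(vec))
    return clusters


if __name__ == "__main__":
    print(cluster(SAMPLE_DOCS))  # prints [[0, 1], [2, 3]]
```

Because each document is assigned to at most one cluster, the result is non-overlapping, as the abstract requires. Summing vectors instead of averaging them is a convenience: cosine similarity depends only on direction, so the sum and the mean give the same score.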
