Comparison of Different Distance Measure Methods in Text Document Clustering
Journal: International Journal of Research and Engineering (Vol. 5, No. 7)
Publication Date: 2018-08-08
Authors: Yin Min Tun
Pages: 445-449
Keywords: Text mining; text document clustering; distance measure; k-means clustering
Abstract
Text document clustering is an unsupervised learning method for finding common groups of documents. Clustering is a particular challenge in text mining when the training documents are unlabeled. Fortunately, many features and methods have been proposed to address this problem. The framework of text document clustering consists of four stages: input text documents, preprocessing, feature extraction, and clustering. Common clustering methods include the self-organizing map, k-means, and mixture of Gaussians. The coherence of the resulting clusters depends on the distance measure selected. The main focus of this paper is to present different existing distance measure methods used with k-means clustering for text document clustering. The experiment performed k-means clustering on the Newsgroups dataset and measured clustering entropy to evaluate the different distance measure methods.
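The evaluation idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the two distance functions, the `kmeans` helper with its first-k initialization, the toy vectors, and the size-weighted entropy formula in `clustering_entropy` are all assumptions chosen for illustration. Note also that updating centroids by the arithmetic mean is only strictly justified for Euclidean distance; with other measures it is a common heuristic.

```python
import math
from collections import Counter

def euclidean_distance(a, b):
    # Straight-line distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    # 1 - cosine similarity; returns 1.0 if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

def kmeans(vectors, k, distance, iterations=20):
    # Hypothetical helper: the first k vectors serve as initial centroids
    # (an assumption that keeps the sketch deterministic).
    centroids = [list(v) for v in vectors[:k]]
    assignments = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments

def clustering_entropy(assignments, labels):
    # Size-weighted average of the label entropy inside each cluster;
    # 0.0 means every cluster holds documents of a single class.
    n = len(labels)
    total = 0.0
    for cluster in set(assignments):
        member_labels = [l for a, l in zip(assignments, labels) if a == cluster]
        size = len(member_labels)
        counts = Counter(member_labels)
        h = -sum((c / size) * math.log2(c / size) for c in counts.values())
        total += (size / n) * h
    return total

# Toy term-weight vectors standing in for extracted document features.
vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
labels = ["a", "b", "a", "b"]
for name, dist in [("euclidean", euclidean_distance), ("cosine", cosine_distance)]:
    clusters = kmeans(vectors, 2, dist)
    print(name, clusters, clustering_entropy(clusters, labels))
```

Lower entropy indicates purer clusters, so under these assumptions the distance measure yielding the lowest entropy on a labeled benchmark would be judged the better fit.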
Last modified: 2018-08-09 07:16:55