Performance Evaluation of Cluster Based Algorithm used for Text Document Classification
Journal: International Journal of Science and Research (IJSR), Vol. 5, No. 5
Publication Date: 2016-05-05
Authors: Rohit S. Patil; Manish Bhardwaj
Pages: 751-754
Keywords: clustering; classification; text mining; dimensionality reduction; Gaussian mixture
Abstract
In this paper, we develop a complete methodology for document classification and clustering. We start by investigating how the choice of document features influences the performance of a document classifier, and then use our findings to develop a clustering method suitable for document collections. From our study of the effects of frequency transformation, term weighting, and dimensionality reduction through principal component analysis on the performance of a simple k-nearest-neighbor (KNN) classifier, we conclude that (a) applying a logarithm or square-root transformation to the term frequencies reduces error rates, (b) weighting the transformed frequencies does not appear to help much, and (c) increasing the feature space dimension beyond 50 does not improve performance. We use these findings to construct a Gaussian Mixture Document Clustering (GMDC) algorithm, which models the data as a sample from a Gaussian mixture. The model is used to build clusters based on the likelihood of the data and to classify documents according to Bayes' rule. Finally, we will build our own classifier, which can automatically select the number of clusters present in the document collection and classify documents more efficiently than the two classifiers above.
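To make the abstract's pipeline concrete, here is a minimal pure-Python sketch, not the authors' implementation. It assumes documents are already represented as raw term-count vectors over a shared vocabulary, and it shows two of the feature steps (log and square-root damping of term frequencies) feeding two classifiers: a plain KNN classifier and a Bayes-rule classifier. All function names, the `var_floor` parameter, and the toy data are hypothetical. The Bayes-rule part is a deliberate simplification of GMDC: it fits one diagonal-covariance Gaussian per labeled class instead of a mixture estimated from the data, and it omits term weighting, PCA reduction, and automatic selection of the number of clusters.

```python
import math
from collections import Counter


def transform(tf, kind="log"):
    """Damp raw term frequencies with log(1 + tf) or sqrt(tf)."""
    if kind == "log":
        return [math.log1p(x) for x in tf]
    if kind == "sqrt":
        return [math.sqrt(x) for x in tf]
    raise ValueError(f"unknown transform: {kind}")


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Label the query by majority vote among its k nearest training vectors."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: euclidean(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def fit_gaussians(X, y, var_floor=1e-3):
    """Fit one diagonal Gaussian per class, plus the class prior (its relative frequency).

    Variances are floored at var_floor (a hypothetical choice) so that a term that
    is constant within a class does not produce a zero variance.
    """
    model = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        n = len(rows)
        cols = list(zip(*rows))
        mean = [sum(col) / n for col in cols]
        var = [max(sum((v - m) ** 2 for v in col) / n, var_floor)
               for col, m in zip(cols, mean)]
        model[label] = (n / len(X), mean, var)
    return model


def bayes_predict(model, x):
    """Bayes' rule: choose the class maximizing log prior + Gaussian log-likelihood."""
    def log_posterior(params):
        prior, mean, var = params
        log_lik = sum(-0.5 * (math.log(2 * math.pi * s) + (v - m) ** 2 / s)
                      for v, m, s in zip(x, mean, var))
        return math.log(prior) + log_lik
    return max(model, key=lambda label: log_posterior(model[label]))


# Hypothetical toy corpus: raw term counts over a 4-term vocabulary.
raw = [[5, 0, 1, 0], [4, 1, 0, 0], [0, 6, 0, 2], [1, 5, 0, 3]]
labels = ["sports", "sports", "politics", "politics"]
X = [transform(doc, "sqrt") for doc in raw]
query = transform([3, 0, 1, 0], "sqrt")

print(knn_predict(X, labels, query, k=3))                 # sports
print(bayes_predict(fit_gaussians(X, labels), query))     # sports
```

The transform is applied to every document before any distance or likelihood is computed, which mirrors finding (a) in the abstract: damping large raw counts keeps a few very frequent terms from dominating the Euclidean distances and the per-term Gaussian likelihoods.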