Bootstrapping in Text Mining Applications
Journal: International Journal of Science and Research (IJSR) (Vol. 5, No. 1)
Publication Date: 2016-01-05
Authors: C. K. Chandrasekhar; M. R. Srinivasan; B. Ramesh Babu
Pages: 337-344
Keywords: k-Fold Rotation Estimation; Clustering; k-Means; Principal Component Analysis; Dimensionality Reduction; Precision; Recall; F-Score; Scree Plot
Abstract
Text mining involves analyzing large corpora of documents that contain thousands of distinct words and a high level of noise. Dimensionality reduction, noise mitigation, and the formation of accurate, stable clusters are the principal challenges of upstream analytics. This paper proposes a methodology that reduces both dimensionality and noise. Principal component analysis (PCA), a projective transform, combined with visual scree plot techniques [8, 6, 12], selects a reduced set of dimensions (words), and a k-fold rotation sampling (estimation) technique [1] eliminates noise and yields stable clusters. The resulting noise-reduced data set is the input to the clustering algorithms. Experiments on benchmark data sets from the Brown corpus [5] and on real-life feedback data from a service provider show that the approach delivers better results, as measured by the well-known performance measures precision, recall, and F-measure [14]. Experimental results with corpora of different sizes demonstrate that the approach achieves higher clustering accuracy than the standard k-means clustering algorithm [2].
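The abstract does not spell out the algorithm, so the following is only a minimal Python sketch (NumPy assumed) of how the described pieces could fit together; it is not the authors' method. Everything here is an assumption for illustration: documents arrive as a numeric document-term matrix; a cumulative explained-variance cutoff (`var_threshold`) replaces visual reading of the scree plot; "k-fold rotation" is interpreted as fitting k-means on each combination of `n_folds - 1` folds and measuring how consistently each document is co-assigned with its cluster mates across rotations, with low-consistency documents treated as noise; and precision and recall are combined with the common class-weighted best-match clustering F-measure, which may differ from the definition in [14]. The function names (`pca_reduce`, `rotation_stability`, `clustering_f_measure`, etc.) and the toy data are hypothetical.

```python
import numpy as np


def pca_reduce(X, var_threshold=0.9):
    """Project rows of X onto the leading principal components.

    A cumulative explained-variance cutoff stands in for reading the
    elbow off a scree plot (an assumption of this sketch).
    """
    Xc = X - X.mean(axis=0)
    # Squared singular values of the centred matrix are proportional to the
    # variance each component explains, i.e. the values a scree plot shows.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    k = min(int(np.searchsorted(np.cumsum(explained), var_threshold)) + 1, len(s))
    return Xc @ Vt[:k].T, explained


def nearest(X, centers):
    """Index of the closest centroid for every row of X."""
    return np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means with random initial centroids."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = nearest(X, centers)
        # Recompute centroids; keep the old one if a cluster becomes empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return nearest(X, centers), centers


def rotation_stability(X, k, n_folds=5, seed=0):
    """Per-document stability under a hypothetical k-fold rotation scheme.

    Each rotation fits k-means on n_folds - 1 folds and assigns every
    document to its nearest centroid; C[i, j] is the fraction of rotations
    in which documents i and j share a cluster.
    """
    n = len(X)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    C = np.zeros((n, n))
    for f, held_out in enumerate(folds):
        train = np.setdiff1d(np.arange(n), held_out)
        _, centers = kmeans(X[train], k, seed=seed + f)
        lab = nearest(X, centers)
        C += lab[:, None] == lab[None, :]
    C /= n_folds
    # Stability of document i: mean co-assignment with the other members of
    # its cluster in a reference k-means run on the full data.
    ref_labels, _ = kmeans(X, k, seed=seed)
    same = ref_labels[:, None] == ref_labels[None, :]
    np.fill_diagonal(same, False)
    return (C * same).sum(axis=1) / np.maximum(same.sum(axis=1), 1), ref_labels


def clustering_f_measure(classes, clusters):
    """Class-weighted best-match F-measure (one common definition)."""
    classes, clusters = np.asarray(classes), np.asarray(clusters)
    total = 0.0
    for c in np.unique(classes):
        in_class = classes == c
        best = 0.0
        for g in np.unique(clusters):
            in_cluster = clusters == g
            n_cg = np.sum(in_class & in_cluster)
            if n_cg == 0:
                continue
            precision = n_cg / in_cluster.sum()
            recall = n_cg / in_class.sum()
            best = max(best, 2 * precision * recall / (precision + recall))
        # Weight each class's best F-score by its share of the documents.
        total += in_class.sum() / len(classes) * best
    return total


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Hypothetical toy data: 3 topics x 30 documents over 50 terms.
    topics = rng.normal(0, 5, size=(3, 50))
    X = np.vstack([t + rng.normal(0, 1, size=(30, 50)) for t in topics])
    y = np.repeat([0, 1, 2], 30)
    Z, explained = pca_reduce(X)
    stability, labels = rotation_stability(Z, k=3)
    keep = stability >= 0.8  # hypothetical noise threshold
    print(Z.shape[1], int(keep.sum()), clustering_f_measure(y[keep], labels[keep]))
```

Treating unstable documents as noise and scoring only the retained ones is a design choice of this sketch; the 0.8 threshold is arbitrary, and the toy data say nothing about the results reported for the Brown corpus or the service-provider feedback data.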