KERNEL PCA BASED DIMENSIONALITY REDUCTION TECHNIQUES FOR PREPROCESSING OF TELUGU TEXT DOCUMENTS FOR CLUSTER ANALYSIS
Journal: International Journal of Advanced Research in Engineering and Technology (IJARET) (Vol.11, No. 11)
Publication Date: 2020-11-30
Authors : Srinivas Mekala; B. Padmaja Rani
Page : 1337-1352
Keywords : Dimensionality reduction; Clustering; K-means clustering algorithm; Principal Component Analysis (PCA); Kernel Principal Component Analysis (Kernel PCA)
Abstract
This paper investigates the effect of dimensionality reduction on text document clustering. Clustering is the process of finding groups of objects such that the objects in a group are similar to one another and different from the objects in other groups. Dimensionality reduction is the transformation of high-dimensional data into a meaningful representation of reduced dimensionality. Indian languages are highly inflectional, so the feature vectors built from their documents are very large, which results in poor performance when the K-means clustering algorithm is applied. To improve clustering efficiency, we investigate Principal Component Analysis (PCA) and Kernel Principal Component Analysis (KPCA) as dimensionality reduction techniques for Indic script documents and then apply the K-means clustering algorithm to the reduced data set. Telugu text documents are chosen as a case study to provide a baseline. Various kernel functions are applied with the aim of improving efficiency, and the results are compared with those of the basic PCA technique.
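The pipeline described in the abstract (reduce the document vectors with PCA or Kernel PCA, then cluster with K-means) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy term-count matrix `X` stands in for real Telugu document vectors, the kernel parameters `gamma` and `degree` are arbitrary, and the helper names `kernel_matrix`, `kernel_pca`, and `kmeans` are hypothetical. The K-means helper uses a simple deterministic farthest-point initialization rather than any particular published variant.

```python
import numpy as np

def kernel_matrix(X, kernel="rbf", gamma=1.0, degree=2):
    """Gram matrix for a few common kernels (parameter values are illustrative)."""
    G = X @ X.T
    if kernel == "linear":
        return G
    if kernel == "poly":
        return (gamma * G + 1.0) ** degree
    if kernel == "rbf":
        sq = np.sum(X ** 2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * G   # squared Euclidean distances
        return np.exp(-gamma * np.maximum(d2, 0.0))
    raise ValueError(f"unknown kernel: {kernel}")

def kernel_pca(X, n_components=2, **kernel_args):
    """Project the rows of X onto the leading kernel principal components."""
    K = kernel_matrix(X, **kernel_args)
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    # Center the kernel matrix, i.e. center the data in the implicit feature space.
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    eigvals, eigvecs = np.linalg.eigh(Kc)           # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:n_components]  # keep the largest ones
    eigvals, eigvecs = eigvals[idx], eigvecs[:, idx]
    # Coordinates of the training points: eigenvector scaled by sqrt(eigenvalue).
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))

def kmeans(Z, k, n_iter=50):
    """Plain K-means (Lloyd's iterations) with farthest-point initialization."""
    centers = [Z[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(Z - c, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Hypothetical term-count matrix: 6 documents x 6 terms, two loose topics.
X = np.array([
    [3, 2, 0, 0, 1, 0],
    [2, 3, 1, 0, 0, 0],
    [3, 3, 0, 1, 0, 0],
    [0, 0, 3, 2, 0, 1],
    [0, 1, 2, 3, 0, 0],
    [1, 0, 3, 3, 0, 0],
], dtype=float)
X = X / np.linalg.norm(X, axis=1, keepdims=True)   # unit-length document vectors

Z_lin = kernel_pca(X, n_components=2, kernel="linear")          # equivalent to PCA
Z_rbf = kernel_pca(X, n_components=2, kernel="rbf", gamma=1.0)  # nonlinear variant

labels_lin = kmeans(Z_lin, k=2)
labels_rbf = kmeans(Z_rbf, k=2)
print("PCA + K-means labels:       ", labels_lin)
print("Kernel PCA + K-means labels:", labels_rbf)
```

With the linear kernel, the centered Gram matrix has the same leading components as the covariance matrix, so this case reproduces ordinary PCA up to the sign of each component. Swapping the `kernel` argument is then enough to compare kernel functions against that baseline.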
Last modified: 2021-02-22 20:10:35