KERNEL PCA BASED DIMENSIONALITY REDUCTION TECHNIQUES FOR PREPROCESSING OF TELUGU TEXT DOCUMENTS FOR CLUSTER ANALYSIS

Journal: International Journal of Advanced Research in Engineering and Technology (IJARET) (Vol. 11, No. 11)

Publication Date:

Authors : ;

Page : 1337-1352

Keywords : Dimensionality reduction; Clustering; K-means clustering algorithm; Principal Component Analysis (PCA); Kernel Principal Component Analysis (Kernel PCA)

Abstract

In this paper we investigate the effect of dimensionality reduction on text document clustering. Clustering is the process of finding groups of objects such that the objects in a group are similar to one another and different from the objects in other groups. Dimensionality reduction is the transformation of high-dimensional data into a meaningful representation of reduced dimensionality. Indian languages are highly inflectional, so the dimension of the feature vector is very large, which results in poor performance when the K-means clustering algorithm is applied. To improve clustering efficiency, we investigate Principal Component Analysis (PCA) and Kernel Principal Component Analysis (KPCA) as feature reduction techniques on Indic script documents, and then apply the K-means clustering algorithm to the reduced data set. Telugu text documents are chosen as the case study for a baseline. We also apply various kernel functions with the aim of improving efficiency, and compare the results with the basic PCA technique.
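
The pipeline described in the abstract (kernel PCA followed by K-means) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the RBF kernel, the value of gamma, the power-iteration eigensolver, the single retained component, and the toy term-count vectors are all assumptions chosen to keep the example short and dependency-free. A real experiment would more likely use a numerical library and retain several components.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel between two equal-length vectors (gamma is assumed)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def centered_kernel_matrix(rows, kernel):
    """Build the kernel matrix and center it in feature space."""
    n = len(rows)
    k = [[kernel(rows[i], rows[j]) for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in k]
    total_mean = sum(row_mean) / n
    # K_c[i][j] = K[i][j] - mean_i - mean_j + overall mean
    return [[k[i][j] - row_mean[i] - row_mean[j] + total_mean
             for j in range(n)] for i in range(n)]

def top_component(kc, iters=500):
    """Leading eigenpair of a symmetric matrix by power iteration."""
    n = len(kc)
    v = [float(i + 1) for i in range(n)]  # deterministic start vector
    for _ in range(iters):
        w = [sum(kc[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v gives the eigenvalue estimate
    eigval = sum(v[i] * sum(kc[i][j] * v[j] for j in range(n)) for i in range(n))
    return eigval, v

def kpca_1d(rows, kernel=rbf_kernel):
    """Project each row onto the first kernel principal component."""
    eigval, v = top_component(centered_kernel_matrix(rows, kernel))
    return [math.sqrt(eigval) * x for x in v]

def kmeans_1d(values, iters=20):
    """Two-cluster K-means on scalars, seeded with the min and max values."""
    centers = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
                  for x in values]
        for c in (0, 1):
            members = [x for x, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels

# Toy term-count vectors (hypothetical data, not taken from the paper):
# the first three documents share vocabulary, as do the last three.
docs = [
    [3, 2, 0, 0],
    [2, 3, 1, 0],
    [3, 3, 0, 1],
    [0, 0, 3, 2],
    [0, 1, 2, 3],
    [1, 0, 3, 3],
]
projection = kpca_1d(docs)      # 4-D vectors reduced to one coordinate each
labels = kmeans_1d(projection)  # cluster in the reduced space
print(labels)
```

Swapping `rbf_kernel` for another kernel function (for example a polynomial kernel) only requires passing a different callable to `kpca_1d`, which is one way the comparison of kernel functions mentioned in the abstract could be set up.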

Last modified: 2021-02-22 20:10:35