Efficient High Dimensional Data Clustering Using Hubness Phenomenon
Journal: International Journal of Computer Science and Mobile Computing - IJCSMC (Vol.3, No. 5)
Publication Date: 2014-05-30
Authors : Sayali P. Barde; Vikrant Chole; L. H. Patil;
Page : 1033-1040
Keywords : Clustering; curse of dimensionality; hubs; k-nearest neighbour;
High-dimensional data occur naturally in many domains and present great challenges for traditional data mining techniques. Traditional clustering algorithms become computationally expensive when the data set to be clustered is large. The curse of dimensionality refers to various phenomena that arise when analyzing high-dimensional data and that do not occur in low-dimensional data. The common theme of these problems is that, as the dimensionality increases, the volume of the space grows so fast that the available data become sparse. As a result, all objects in high-dimensional data appear dissimilar in many ways, which makes them difficult to cluster. Hubness is a newer aspect of the curse of dimensionality that affects the distribution of k-occurrences. In this paper, we propose and implement hubness-based clustering for high-dimensional data. More specifically, hubness is the tendency of high-dimensional data to contain points (hubs) that frequently occur in the k-nearest-neighbor lists of other points; these hubs can be used effectively as cluster prototypes.
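The idea of using hubs as cluster prototypes can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `k_occurrences` and `hub_prototypes`, the use of Euclidean distance, and the rule of taking the points with the highest k-occurrence counts as prototypes are all assumptions made for illustration.

```python
import numpy as np

def k_occurrences(X, k):
    """Count how often each point appears in the k-NN lists of the other points."""
    # Pairwise squared Euclidean distances (assumed metric).
    diffs = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diffs, diffs)
    np.fill_diagonal(d2, np.inf)          # a point is not its own neighbor
    knn = np.argsort(d2, axis=1)[:, :k]   # indices of the k nearest neighbors
    return np.bincount(knn.ravel(), minlength=X.shape[0])

def hub_prototypes(X, k, n_clusters):
    """Pick the n_clusters points with the highest k-occurrence as prototypes
    (a simplifying assumption) and assign every point to its nearest prototype."""
    counts = k_occurrences(X, k)
    hubs = np.argsort(-counts)[:n_clusters]
    dist_to_hubs = np.linalg.norm(X[:, None, :] - X[hubs][None, :, :], axis=2)
    labels = np.argmin(dist_to_hubs, axis=1)
    return hubs, labels, counts

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))        # synthetic high-dimensional data
    hubs, labels, counts = hub_prototypes(X, k=10, n_clusters=3)
    print("hub indices:", hubs)
    print("largest k-occurrence counts:", counts[hubs])
```

Choosing the top-scoring hubs directly can place several prototypes in the same dense region, so a practical method would likely refine this selection, for example by iterating the assignment and prototype-update steps.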
Last modified: 2014-05-30 02:55:23