Implementing K-Means Clustering Algorithm Using MapReduce Paradigm
Journal: International Journal of Science and Research (IJSR) (Vol. 5, No. 7)
Publication Date: 2016-07-05
Authors : Botcha Chandrasekhara Rao; Medara Rambabu;
Page : 1240-1244
Keywords : Vector space model; map reduce; text clustering; map reduce k-means; Hadoop;
Abstract
Clustering is a useful data mining technique that groups data points so that points within a single group have similar characteristics, while points in different groups are dissimilar. Partitioning methods such as the k-means algorithm are among the most widely used clustering algorithms. As a growing number of applications must deal with vast amounts of data, clustering such big data is a challenging problem. Recently, running partitioning clustering algorithms on a large cluster of commodity machines using the MapReduce framework has received a lot of attention. The traditional way of clustering text documents is the vector space model, in which tf-idf weights are used by the k-means algorithm together with a suitable similarity measure. This project presents an approach to clustering text documents; results obtained by executing the MapReduce k-means algorithm on a single-node cluster show that the performance of the algorithm increases as the text corpus grows.
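To make the approach in the abstract concrete, the following is a minimal in-memory sketch of k-means over tf-idf vectors, organised as a map step and a reduce step. It is not the authors' Hadoop implementation. The function names (`tf_idf`, `cosine`, `assign`, `map_phase`, `reduce_phase`, `kmeans`), the smoothed idf formula, and the choice of cosine similarity are all assumptions made for illustration, and the map and reduce steps run as plain Python functions rather than as Hadoop jobs.

```python
import math
from collections import Counter, defaultdict


def tf_idf(docs):
    """Turn tokenised documents into sparse tf-idf vectors (dict: term -> weight).

    Assumption: smoothed idf, log((1 + n) / (1 + df)) + 1, so no weight is zero.
    """
    n = len(docs)
    df = Counter(term for tokens in docs for term in set(tokens))
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign(vec, centroids):
    """Index of the centroid most similar to vec (first index wins ties)."""
    return max(range(len(centroids)), key=lambda i: cosine(vec, centroids[i]))


def map_phase(vectors, centroids):
    """Map: emit (nearest centroid index, document vector) for each document."""
    for vec in vectors:
        yield assign(vec, centroids), vec


def reduce_phase(pairs):
    """Reduce: average the vectors emitted under each centroid index."""
    groups = defaultdict(list)
    for key, vec in pairs:
        groups[key].append(vec)
    new_centroids = {}
    for key, vecs in groups.items():
        total = defaultdict(float)
        for vec in vecs:
            for term, weight in vec.items():
                total[term] += weight
        new_centroids[key] = {t: w / len(vecs) for t, w in total.items()}
    return new_centroids


def kmeans(vectors, centroids, max_iter=10):
    """Repeat map and reduce until the centroids stop changing."""
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(vectors, centroids))
        # A centroid that attracted no documents keeps its previous value.
        new_centroids = [updated.get(i, c) for i, c in enumerate(centroids)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


docs = [
    ["cat", "dog"],
    ["cat", "dog", "pet"],
    ["stock", "market"],
    ["stock", "market", "trade"],
]
vectors = tf_idf(docs)
centroids = kmeans(vectors, [vectors[0], vectors[2]])
labels = [assign(v, centroids) for v in vectors]
```

In a real MapReduce job, each iteration would typically be a separate job: the current centroids are distributed to every mapper, and the reducer output becomes the centroid set for the next iteration.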