Big Data Clustering Using Genetic Algorithm On Hadoop Mapreduce
Journal: International Journal of Scientific & Technology Research (Vol. 4, No. 4)
Publication Date: 2015-04-15
Authors : Nivranshu Hans; Sana Mahajan; SN Omkar
Page : 58-62
Keywords : Big Data; Clustering; Davies-Bouldin Index; Distributed processing; Hadoop MapReduce; Heuristics; Parallel Genetic Algorithm
Abstract
Cluster analysis classifies similar objects into the same group and is one of the most important data mining methods. However, it performs poorly on big data because of its high time complexity. In such scenarios, parallelization is a better approach. MapReduce is a popular programming model that enables parallel processing in a distributed environment. However, most clustering algorithms are not naturally parallelizable; Genetic Algorithms (GAs), for instance, are sequential in nature. This paper introduces a technique to parallelize GA-based clustering by extending Hadoop MapReduce. An analysis of the proposed approach, evaluating its performance gains with respect to a sequential algorithm, is presented. The analysis is based on a large real-life data set.
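As a rough illustration of how the fitness evaluation in GA-based clustering could be split into map and reduce steps, the sketch below scores one candidate solution with the Davies-Bouldin index named in the keywords. It is not the paper's implementation: the abstract does not give the algorithm, so the chromosome encoding (a list of centroids), the function names, and the choice to measure scatter around the chromosome's own centroids are all assumptions made here for illustration. Plain Python stands in for Hadoop; each call to `map_partition` corresponds to what one mapper might do on its data split.

```python
import math
from collections import defaultdict

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def map_partition(points, centroids):
    """Map step (hypothetical): per-cluster [count, summed distance] for one split."""
    partial = defaultdict(lambda: [0, 0.0])
    for p in points:
        i = nearest(p, centroids)
        partial[i][0] += 1
        partial[i][1] += math.dist(p, centroids[i])
    return dict(partial)

def reduce_partials(partials):
    """Reduce step (hypothetical): merge split statistics into mean scatter per cluster."""
    total = defaultdict(lambda: [0, 0.0])
    for partial in partials:
        for i, (n, s) in partial.items():
            total[i][0] += n
            total[i][1] += s
    return {i: s / n for i, (n, s) in total.items()}

def davies_bouldin(centroids, scatter):
    """Davies-Bouldin index over non-empty clusters; lower means better separation."""
    ids = sorted(scatter)
    if len(ids) < 2:
        return math.inf  # the index is undefined for fewer than two clusters
    worst = []
    for i in ids:
        ratios = []
        for j in ids:
            if j == i:
                continue
            d = math.dist(centroids[i], centroids[j])
            # Coincident centroids are treated as the worst possible case.
            ratios.append(math.inf if d == 0 else (scatter[i] + scatter[j]) / d)
        worst.append(max(ratios))
    return sum(worst) / len(ids)

def fitness(splits, centroids):
    """Score one chromosome (a list of centroids) against data held in splits."""
    # On a cluster the map calls would run in parallel, one per split.
    partials = [map_partition(split, centroids) for split in splits]
    return davies_bouldin(centroids, reduce_partials(partials))
```

Only the small per-cluster totals leave each split, so the full data set never has to be gathered in one place to score a chromosome; the selection, crossover, and mutation steps of the GA are left out of this sketch.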
Last modified: 2015-06-28 04:09:59