Big Data Clustering Using Genetic Algorithm On Hadoop Mapreduce

Journal: International Journal of Scientific & Technology Research (Vol.4, No. 4)

Publication Date:

Authors : ; ; ;

Page : 58-62

Keywords : Big Data; Clustering; Davies-Bouldin Index; Distributed processing; Hadoop MapReduce; Heuristics; Parallel Genetic Algorithm


Abstract Cluster analysis groups similar objects together and is one of the most important data mining methods. However, it performs poorly on big data because of its high time complexity. In such scenarios, parallelization is a better approach. MapReduce is a popular programming model that enables parallel processing in a distributed environment. But most clustering algorithms, for instance those based on Genetic Algorithms (GAs), are not naturally parallelizable, owing to the sequential nature of GAs. This paper introduces a technique to parallelize GA-based clustering by extending Hadoop MapReduce. An analysis of the proposed approach, evaluating its performance gains with respect to a sequential algorithm, is presented. The analysis is based on a real-life large data set.
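
The abstract does not spell out how the GA is mapped onto MapReduce, so the following is only a minimal sketch of one plausible arrangement, not the paper's method. It assumes a chromosome encodes a list of cluster centers, that the Davies-Bouldin index (listed in the keywords) serves as the fitness, and that the encoded centers stand in for the cluster centroids. The names `map_split`, `reduce_scatter`, and `fitness` are hypothetical, and the map step is simulated in plain Python rather than run through Hadoop.

```python
import math
from collections import defaultdict


def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda k: math.dist(point, centers[k]))


def map_split(split, centers):
    """Map step (hypothetical): for one data split, emit
    (cluster_id, (point count, summed distance to the assigned center))."""
    partial = defaultdict(lambda: [0, 0.0])
    for p in split:
        k = nearest(p, centers)
        partial[k][0] += 1
        partial[k][1] += math.dist(p, centers[k])
    return [(k, (n, s)) for k, (n, s) in partial.items()]


def reduce_scatter(mapped, n_clusters):
    """Reduce step (hypothetical): merge the per-split partials into the
    mean distance-to-center (scatter) of each cluster."""
    count = [0] * n_clusters
    total = [0.0] * n_clusters
    for pairs in mapped:
        for k, (n, s) in pairs:
            count[k] += n
            total[k] += s
    return [total[k] / count[k] if count[k] else 0.0 for k in range(n_clusters)]


def davies_bouldin(centers, scatter):
    """Davies-Bouldin index; lower values mean compact, well-separated clusters.
    Assumes at least two centers and that no two centers coincide."""
    k = len(centers)
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
        for i in range(k)
    ]
    return sum(worst) / k


def fitness(splits, centers):
    """Fitness of one chromosome. The list comprehension stands in for map
    tasks that a cluster would run in parallel, one per data split."""
    mapped = [map_split(split, centers) for split in splits]
    return davies_bouldin(centers, reduce_scatter(mapped, len(centers)))
```

Under these assumptions, only the fitness evaluation touches the full data set, so it is the part worth distributing; selection, crossover, and mutation operate on the small population of chromosomes and could stay in the driver. A GA loop would call `fitness(splits, centers)` for each chromosome and prefer lower values.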

Last modified: 2015-06-28 04:09:59