Data Balancing Scheme for Multi-node Heterogeneous Hadoop Cluster
Journal: International Journal of Science and Research (IJSR) (Vol. 6, No. 3)
Publication Date: 2017-03-05
Authors : Indresh B. Rajwade; Er. Prateek Singh;
Page : 2088-2094
Keywords : Big Data; Hadoop; MapReduce; HDFS; Grep; WordCount; Heterogeneous cluster;
Abstract
Big data encompasses huge amounts of information from multiple internal and external sources such as transactions, social media, enterprise content, sensors and mobile devices. It is characterized by volume, velocity, variety and veracity. MapReduce is a parallel computing framework that meets the demands of large-scale data processing. Owing to its simplicity, robustness and scalability, MapReduce is widely used by companies such as Amazon, Facebook and Yahoo! to process large volumes of data on a daily basis. The framework hides the complexity of running distributed data processing functions across multiple nodes in a cluster: it automatically gathers the results from those nodes and returns a single result or a result set. Hadoop is an open source implementation of MapReduce which balances the load in a cluster by distributing data to multiple nodes based on disk space availability and processing efficiency. In this dissertation, the data placement mechanism in a heterogeneous Hadoop cluster is evaluated using the Grep tool and the WordCount program, two MapReduce applications that run on Hadoop clusters. Grep and WordCount were compared on a three-node Hadoop cluster running Ubuntu 14.04 LTS, and it is observed that the computing ratios of a Hadoop cluster are application dependent and size independent. This means that if the configuration of a cluster is updated, computing ratios must be determined again.
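To illustrate the MapReduce model that the abstract describes, the following is a minimal single-process Python sketch of WordCount. It does not use the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input splits are hypothetical and chosen only for illustration. Each input string stands in for a block of data held on one node.

```python
from collections import defaultdict


def map_phase(split):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs):
    """Group the emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(splits):
    """Run map over every split, then shuffle and reduce into one result."""
    pairs = []
    for split in splits:  # each split stands in for data stored on one node
        pairs.extend(map_phase(split))
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the paper's experiments.
    splits = ["big data on Hadoop", "Hadoop runs MapReduce", "big clusters"]
    print(word_count(splits))
```

In a real cluster the map calls run in parallel on the nodes that hold each split, which is why the amount of data placed on each node affects the overall running time in a heterogeneous cluster.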