
HDFS Erasure Coding Based Hadoop Distributed File System

Journal: International Journal of Scientific & Technology Research (Vol.2, No. 9)

Publication Date:

Authors :

Page : 190-197

Keywords : Erasure coding; Hadoop; HDFS; IO performance; node failure; replication; space efficiency.

Source : Download | Find it from : Google Scholar

Abstract

A simple replication-based mechanism has been used to achieve high data reliability in the Hadoop Distributed File System (HDFS). However, replication-based mechanisms require a high degree of disk storage because they copy each full block regardless of its size. Studies have shown that an erasure-coding mechanism can provide better space efficiency when used as an alternative to replication, and it can also increase write throughput compared to a replication mechanism. To improve both the space efficiency and the I/O performance of HDFS while preserving the same level of data reliability, we propose HDFS, an erasure-coding-based Hadoop Distributed File System. The proposed scheme writes a full block on the primary DataNode and then performs erasure coding with the Vandermonde-based Reed-Solomon algorithm, which divides the data into m fragments and encodes them into n fragments (n > m) that are saved on n distinct DataNodes, such that the original object can be reconstructed from any m fragments. The experimental results show that our scheme can save up to 33% of storage space while outperforming the original scheme in write performance by 1.4 times. Our scheme provides the same read performance as the original scheme as long as data can be read from the primary DataNode, even under single-node or double-node failure; otherwise, the read performance of the HDFS decreases to some extent. However, we show that as the number of fragments increases, this performance degradation becomes negligible.
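To make the arithmetic behind the scheme concrete, the sketch below (not from the paper) illustrates the two properties the abstract relies on: a Vandermonde-based encoding from which the original data can be reconstructed from any m of the n fragments, and the storage accounting of one full block on the primary DataNode plus n fragments of 1/m block each. Real Reed-Solomon coding operates over GF(2^8); the plain floating-point arithmetic and the parameters (m, n) = (4, 6) here are simplifying assumptions used only for illustration.

    import numpy as np

    m, n = 4, 6                        # assumed parameters for illustration
    rng = np.random.default_rng(0)
    data = rng.integers(0, 256, size=m).astype(float)

    # n x m Vandermonde matrix: row i is [1, x_i, x_i^2, ..., x_i^(m-1)].
    # With distinct evaluation points, any m rows form an invertible matrix,
    # which is why any m fragments suffice for reconstruction.
    V = np.vander(np.arange(1, n + 1, dtype=float), m, increasing=True)
    fragments = V @ data               # n encoded fragments, one per DataNode

    # Simulate a double-node failure: reconstruct from an arbitrary m survivors.
    survivors = [0, 2, 4, 5]
    recovered = np.linalg.solve(V[survivors], fragments[survivors])
    assert np.allclose(recovered, data)

    # Storage per logical block: a full copy on the primary DataNode plus
    # n fragments of size 1/m each, versus 3 copies under triple replication.
    ec_factor = 1.0 + n / m            # 2.5x here; tends toward 2.0x as m grows
    saving = (3.0 - ec_factor) / 3.0   # approaches 1/3 for large m
    print(f"storage {ec_factor:.2f}x vs 3.00x replication ({saving:.0%} saving)")

With two extra fragments beyond m (here n = m + 2, enough to survive a double-node failure among the fragment holders), the total overhead 1 + (m + 2)/m tends toward 2.0x as m grows, i.e. a saving approaching one third over triple replication, which is consistent with the abstract's "up to 33%" figure.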

Last modified: 2014-03-17 17:38:10