
Avoiding Duplication Data in HDFS Based on Supervised Learning

Journal: International Journal of Engineering and Techniques (Vol.4, No. 2)

Publication Date:

Authors :

Pages : 747-753

Keywords : Big Data; Hadoop Distributed File System; Dynamic data replication;

Source : Download | Find it from : Google Scholar

Abstract

The Hadoop Distributed File System (HDFS), part of Apache Hadoop, provides distributed storage of big data on a cluster of commodity hardware. HDFS ensures data availability by replicating data across multiple nodes. However, the replication policy of HDFS does not take the popularity of data into account. The popularity of files tends to change over time, so maintaining a fixed replication factor reduces the storage efficiency of HDFS. In this paper we propose an efficient dynamic data replication management system that considers the popularity of files stored in HDFS before replication. The approach dynamically classifies files as hot data or cold data based on their popularity, increases the number of replicas of hot data, and applies erasure coding to cold data. The experimental results show that the proposed method effectively reduces storage utilization by up to 40% without affecting availability and fault tolerance in HDFS.
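The abstract only outlines the mechanism, and the full method is in the paper behind the download link. As a rough illustration of how such a hot/cold scheme maps onto public HDFS APIs, here is a minimal Java sketch, assuming Hadoop 3.x (which provides erasure coding), a per-file access-count map gathered externally (e.g., from audit logs), and a simple threshold standing in for the paper's supervised classifier; the directory layout, threshold, and replica count are hypothetical, not taken from the paper.

```java
import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

// Sketch: popularity-driven hot/cold tiering on HDFS.
public class PopularityTieringSketch {

    // Hypothetical tuning knobs; the paper derives hotness from a learned model.
    private static final long HOT_THRESHOLD = 100;   // accesses per observation window
    private static final short HOT_REPLICAS = 5;     // replication factor for hot files

    public static void rebalance(DistributedFileSystem dfs,
                                 Path dataDir, Path coldDir,
                                 Map<Path, Long> accessCounts) throws IOException {
        // One-time setup: files written under coldDir are stored erasure-coded.
        // RS-6-3-1024k is a built-in Hadoop 3.x Reed-Solomon policy
        // (6 data + 3 parity blocks, ~1.5x storage overhead vs. 3x for replication).
        dfs.setErasureCodingPolicy(coldDir, "RS-6-3-1024k");

        for (FileStatus status : dfs.listStatus(dataDir)) {
            if (!status.isFile()) {
                continue;
            }
            Path file = status.getPath();
            long hits = accessCounts.getOrDefault(file, 0L);
            if (hits >= HOT_THRESHOLD) {
                // Hot data: keep it fully replicated and raise the replica count.
                dfs.setReplication(file, HOT_REPLICAS);
            } else {
                // Cold data: rewrite the file under the EC directory. A plain
                // rename would NOT re-encode existing replicated blocks, so the
                // sketch copies the file and deletes the source.
                FileUtil.copy(dfs, file, dfs, new Path(coldDir, file.getName()),
                        true /* delete source */, dfs.getConf());
            }
        }
    }
}
```

The trade-off illustrated here is consistent with the abstract's claim: converting cold data from 3x replication to RS-6-3 erasure coding cuts its footprint by half, at the cost of extra encode/decode work, so an overall reduction of up to 40% is plausible when most files are cold.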
