Avoiding Duplication Data in HDFS Based on Supervised Learning

Journal: International Journal of Engineering and Techniques (Vol.4, No. 2)

Publication Date: 2018-04-25

Authors : Alapati Janardhana Rao Koppolu Sree Venkata Gopi Naga Raju;

Page : 747-753

Keywords : Big Data; Hadoop Distributed File System; Dynamic data replication;


Abstract

The Hadoop Distributed File System (HDFS), part of Apache Hadoop, provides distributed storage of big data on a cluster of commodity hardware. HDFS ensures data availability by replicating data to multiple nodes. However, the replication policy of HDFS does not consider the popularity of data. The popularity of files tends to change over time, so maintaining a fixed replication factor reduces the storage efficiency of HDFS. In this paper we propose an efficient dynamic data replication management system that considers the popularity of files stored in HDFS before replication. The approach dynamically classifies files as hot data or cold data based on their popularity, increasing the replica count of hot data and applying erasure coding to cold data. The experimental results show that the proposed method effectively reduces storage utilization by up to 40% without affecting availability and fault tolerance in HDFS.
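The hot/cold policy described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how popularity is measured or classified, so a simple access-count threshold stands in for the classifier here. The names FileStats, HOT_THRESHOLD and HOT_REPLICATION are hypothetical, and the Reed-Solomon (6, 3) layout for cold files is an assumption. Only the baseline replication factor of 3 reflects the HDFS default.

```python
from dataclasses import dataclass

DEFAULT_REPLICATION = 3   # HDFS default replication factor (baseline)
HOT_REPLICATION = 4       # hypothetical: one extra replica for hot files
EC_DATA_UNITS = 6         # assumed Reed-Solomon (6, 3) layout for cold files
EC_PARITY_UNITS = 3
HOT_THRESHOLD = 100       # hypothetical: accesses per observation window


@dataclass
class FileStats:
    """Per-file statistics; a hypothetical stand-in for real access logs."""
    path: str
    size_bytes: int
    accesses: int


def classify(f: FileStats, threshold: int = HOT_THRESHOLD) -> str:
    """Label a file 'hot' or 'cold' using a plain access-count threshold."""
    return "hot" if f.accesses >= threshold else "cold"


def stored_bytes(f: FileStats, threshold: int = HOT_THRESHOLD) -> float:
    """Raw bytes consumed on disk under the hot/cold policy."""
    if classify(f, threshold) == "hot":
        return f.size_bytes * HOT_REPLICATION
    # Erasure-coded data costs (data + parity) / data times the file size.
    return f.size_bytes * (EC_DATA_UNITS + EC_PARITY_UNITS) / EC_DATA_UNITS


def storage_saving(files: list[FileStats], threshold: int = HOT_THRESHOLD) -> float:
    """Fraction of raw storage saved relative to uniform 3x replication."""
    baseline = sum(f.size_bytes * DEFAULT_REPLICATION for f in files)
    proposed = sum(stored_bytes(f, threshold) for f in files)
    return 1 - proposed / baseline
```

Under these assumptions a cold file costs 1.5 times its size instead of 3 times, so the saving grows with the share of cold data. The figures this sketch produces depend entirely on the chosen parameters and are not the paper's measurements.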
