Avoiding Duplication Data in HDFS Based on Supervised Learning
Journal: International Journal of Engineering and Techniques (Vol. 4, No. 2)
Publication Date: 2018-04-25
Authors: Alapati Janardhana Rao, Koppolu Sree Venkata Gopi Naga Raju
Pages: 747-753
Keywords: Big Data; Hadoop Distributed File System; Dynamic data replication
The Hadoop Distributed File System (HDFS), part of Apache Hadoop, provides distributed storage of big data on a cluster of commodity hardware. HDFS ensures data availability by replicating data across multiple nodes. However, the default HDFS replication policy does not consider the popularity of data, and the popularity of files tends to change over time, so maintaining a fixed replication factor reduces the storage efficiency of HDFS. In this paper we propose an efficient dynamic data replication management system that considers the popularity of files stored in HDFS before replicating them. The approach dynamically classifies files as hot data or cold data based on their popularity, increases the replica count of hot data, and applies erasure coding to cold data. Experimental results show that the proposed method reduces storage utilization by up to 40% without affecting availability or fault tolerance in HDFS.
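The hot/cold policy described in the abstract might be prototyped as follows. This is a minimal illustrative sketch, not the paper's implementation: the access-count threshold, the popularity metric, and the Reed-Solomon 6+3 erasure-coding scheme are all assumptions chosen for the example.

```python
# Hypothetical sketch of popularity-based storage management for HDFS files.
# Hot files keep full replication; cold files are marked for erasure coding,
# which tolerates node failures at far lower storage overhead than 3x copies.
from dataclasses import dataclass

@dataclass
class FileStats:
    path: str
    accesses_last_window: int  # access count in the most recent time window

HOT_THRESHOLD = 100    # assumed cutoff separating hot from cold files
HOT_REPLICATION = 3    # standard HDFS replication factor for hot data
COLD_EC_SCHEME = "RS-6-3"  # assumed erasure-coding policy for cold data

def plan_storage(files):
    """Classify each file as hot or cold and choose its storage policy."""
    plan = {}
    for f in files:
        if f.accesses_last_window >= HOT_THRESHOLD:
            plan[f.path] = ("replicate", HOT_REPLICATION)
        else:
            plan[f.path] = ("erasure_code", COLD_EC_SCHEME)
    return plan

files = [FileStats("/logs/app.log", 250), FileStats("/archive/2016.tar", 4)]
print(plan_storage(files))
```

In practice such a planner would run periodically, re-evaluating popularity each window so files can migrate between replication and erasure coding as their access patterns change.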