Avoiding Data Duplication in HDFS Based on Supervised Learning
Journal: International Journal of Engineering and Techniques (Vol. 4, No. 2)
Publication Date: 2018-04-25
Authors: Alapati Janardhana Rao; Koppolu Sree Venkata Gopi Naga Raju
Pages: 747-753
Keywords: Big Data; Hadoop Distributed File System; Dynamic data replication
Abstract
The Hadoop Distributed File System (HDFS), part of Apache Hadoop, provides distributed storage of big data on a cluster of commodity hardware. HDFS guarantees data availability by replicating data across multiple nodes. However, the replication strategy of HDFS does not take the popularity of data into account, and the popularity of files tends to change over time. Hence, maintaining a fixed replication factor degrades the storage efficiency of HDFS. In this paper we propose an efficient dynamic data replication management system that considers the popularity of files stored in HDFS before replication. The approach dynamically classifies files as hot data or cold data based on their popularity, increases the number of replicas of hot data, and applies erasure coding to cold data. The experimental results show that the proposed method effectively reduces storage utilization by up to 40% without affecting availability or fault tolerance in HDFS.
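As a rough illustration of the hot/cold scheme described above, the sketch below shows how such replica management could be expressed against the Hadoop 3 Java API. It is not the authors' implementation: the access-count feed, the HOT_ACCESS_THRESHOLD value, the replica counts, and the /cold directory are all assumptions. Note that HDFS does not convert a replicated file to erasure coding in place; the file has to be rewritten into a directory with an erasure-coding policy.

```java
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

/**
 * Minimal sketch of popularity-driven replica management, assuming a
 * hypothetical per-file access count (e.g., aggregated from HDFS audit logs).
 */
public class PopularityReplicationManager {

    // Hypothetical popularity threshold separating hot from cold files.
    private static final long HOT_ACCESS_THRESHOLD = 100;

    private final DistributedFileSystem dfs;

    public PopularityReplicationManager(DistributedFileSystem dfs) {
        this.dfs = dfs;
    }

    /** Classify a file by its recent access count and adjust its storage. */
    public void manage(Path file, long recentAccessCount) throws IOException {
        if (recentAccessCount >= HOT_ACCESS_THRESHOLD) {
            // Hot data: raise the replication factor above the default of 3.
            dfs.setReplication(file, (short) 5);
        } else {
            // Cold data: store it under an erasure-coded directory so the
            // rewritten copy keeps parity blocks instead of full replicas.
            Path coldDir = new Path("/cold");
            dfs.mkdirs(coldDir);
            dfs.setErasureCodingPolicy(coldDir, "RS-6-3-1024k");
            // HDFS applies EC only to newly written data, so the file must
            // be rewritten (copied) into the EC directory to take effect.
            Path target = new Path(coldDir, file.getName());
            FileUtil.copy(dfs, file, dfs, target,
                    true /* delete source */, dfs.getConf());
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        DistributedFileSystem dfs = (DistributedFileSystem)
                DistributedFileSystem.get(URI.create("hdfs://namenode:8020"), conf);
        new PopularityReplicationManager(dfs)
                .manage(new Path("/data/logs/part-0001"), 250);
    }
}
```

With the built-in RS-6-3-1024k policy, six data blocks carry three parity blocks (1.5x raw overhead) versus 3x for triple replication, which is the kind of saving behind the reported 40% reduction in storage utilization.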