A NOVEL TECHNIQUE FOR DATA DEDUPLICATION WITH SHA-1 IN HADOOP FRAMEWORK
Journal: International Journal of Engineering Sciences & Management Research (Vol. 4, No. 4)
Publication Date: 2017-04-30
Authors: Sonam Bhardwaj; Preeti Malik
Pages: 78-87
Keywords: Big Data; Deduplication; SHA-1; Hadoop; HDFS
Abstract
Big Data is an emerging research area; analyzing and processing it is a challenge for current systems, leading to high processing costs and degraded performance and quality. Centralized architectures cannot cope with massive volumes of data, resulting in storage-space shortages and long processing times. The proposed technique addresses this problem by applying deduplication to datasets of unstructured data: the SHA-1 algorithm computes a fixed-size digest for each file, and only unique values are stored. The work is implemented on Hadoop, whose distributed MapReduce framework provides Mapper and Reducer programs for processing and reducing the data, respectively. Applying the proposed technique saves storage space, reduces the time consumed, increases the deduplication ratio, and detects duplicate files efficiently.
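The digest-based pipeline described in the abstract can be sketched in plain Python as a single-process stand-in for the Hadoop Mapper/Reducer jobs. This is a minimal illustration, not the paper's implementation: the `mapper`/`reducer` function names and the in-memory `files` mapping are assumptions for the sake of a self-contained example.

```python
import hashlib
from collections import defaultdict

def mapper(files):
    """Mapper stage: emit a (SHA-1 digest, file name) pair per file.

    SHA-1 produces a fixed-size 160-bit digest, so identical contents
    always map to the same key regardless of file name.
    """
    for name, content in files.items():
        yield hashlib.sha1(content).hexdigest(), name

def reducer(pairs):
    """Reducer stage: group pairs by digest, store one representative
    per unique digest, and count the duplicates that were dropped."""
    groups = defaultdict(list)
    for digest, name in pairs:
        groups[digest].append(name)
    unique = {digest: names[0] for digest, names in groups.items()}
    duplicates = sum(len(names) - 1 for names in groups.values())
    return unique, duplicates

# Illustrative run on a toy dataset with one duplicate file.
files = {"a.txt": b"hello", "b.txt": b"hello", "c.txt": b"world"}
unique, duplicates = reducer(mapper(files))
```

In this toy run, `b.txt` duplicates `a.txt`, so only two unique entries are stored and one duplicate is detected; the deduplication ratio would be the original byte count divided by the bytes actually stored.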