A NOVEL TECHNIQUE FOR DATA DEDUPLICATION WITH SHA-1 IN HADOOP FRAMEWORK
Journal: International Journal OF Engineering Sciences & Management Research (Vol.4, No. 4)
Publication Date: 2017-04-30
Authors : Sonam Bhardwaj; Preeti Malik;
Page : 78-87
Keywords : Big Data; Deduplication; Sha-1; Hadoop; HDFS;
Big Data is an emerging research topic. Analyzing and processing it is a challenge for current systems, leading to high processing costs and degraded performance and quality. A centralized architecture cannot cope with massive data, which results in storage space issues and processing time conflicts. The proposed technique addresses this problem by applying deduplication to various datasets containing unstructured data, using the SHA-1 algorithm to compute fixed-size digests and storing only the unique values. The work is built on Hadoop, whose distributed MapReduce framework provides Mapper and Reducer programs for processing and reducing the data, respectively. Applying the proposed technique yields a gain in space saved, a reduction in time consumed, and an increased deduplication ratio, and duplicate files are detected efficiently.
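The idea described in the abstract (hash each file with SHA-1, group by digest, keep one copy per digest) can be illustrated with a minimal sketch. This is not the authors' implementation: it runs in a single Python process rather than on Hadoop, the function names `map_phase`, `reduce_phase`, and `deduplicate` are hypothetical, and the deduplication ratio is assumed here to mean bytes before deduplication divided by bytes stored afterwards.

```python
import hashlib
from collections import defaultdict


def map_phase(files):
    """Mapper-style step: emit one (SHA-1 digest, file name) pair per file."""
    for name, content in files.items():
        # SHA-1 always yields a fixed-size 160-bit digest (40 hex characters).
        yield hashlib.sha1(content).hexdigest(), name


def reduce_phase(pairs):
    """Reducer-style step: group file names that share the same digest."""
    groups = defaultdict(list)
    for digest, name in pairs:
        groups[digest].append(name)
    return dict(groups)


def deduplicate(files):
    """Return (unique files by digest, duplicate file names, dedup ratio).

    Assumption: ratio = total bytes before / bytes stored after.
    """
    groups = reduce_phase(map_phase(files))
    # Keep the first file seen for each digest; the rest are duplicates.
    unique = {digest: names[0] for digest, names in groups.items()}
    duplicates = [n for names in groups.values() for n in names[1:]]
    total_bytes = sum(len(content) for content in files.values())
    stored_bytes = sum(len(files[name]) for name in unique.values())
    ratio = total_bytes / stored_bytes if stored_bytes else 0.0
    return unique, duplicates, ratio


# Hypothetical sample data: two identical files and one distinct file.
sample = {
    "a.txt": b"hello world",
    "b.txt": b"hello world",
    "c.txt": b"different",
}

if __name__ == "__main__":
    unique, duplicates, ratio = deduplicate(sample)
    print("unique:", sorted(unique.values()))
    print("duplicates:", duplicates)
    print("deduplication ratio:", ratio)
```

In a real MapReduce job the grouping by digest would be done by the framework's shuffle between the Mapper and Reducer, so the reducer would only need to emit one file per key.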
Last modified: 2017-04-26 21:56:30