
A NOVEL TECHNIQUE FOR DATA DEDUPLICATION WITH SHA-1 IN HADOOP FRAMEWORK

Journal: International Journal OF Engineering Sciences & Management Research (Vol.4, No. 4)

Publication Date:

Authors : ; ;

Page : 78-87

Keywords : Big Data; Deduplication; SHA-1; Hadoop; HDFS;


Abstract

Big Data is an emerging research topic. Analyzing and processing it is a challenge for current systems, leading to high processing costs and degraded performance and quality. Centralized architectures cannot cope with massive data volumes, which results in storage space shortages and processing time conflicts. The proposed technique addresses these problems by applying deduplication to various datasets containing unstructured data: the SHA-1 algorithm computes fixed-size digests, and only the unique values are stored. The work is built on Hadoop, whose distributed MapReduce framework provides Mapper and Reducer programs for processing and reducing the data, respectively. Applying the proposed technique increases the space saved, reduces the time consumed, raises the deduplication ratio, and detects duplicate files efficiently.
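The general idea of digest-based deduplication in a map/reduce style can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not run on Hadoop; it only mimics the two phases in a single process. The function names (`map_phase`, `reduce_phase`, `dedup_ratio`), the sample file names, and the definition of the deduplication ratio as bytes before deduplication divided by bytes stored afterwards are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def sha1_digest(data: bytes) -> str:
    """Return the fixed-size (160-bit) SHA-1 digest of data as hex."""
    return hashlib.sha1(data).hexdigest()


def map_phase(files):
    """Mapper stand-in (hypothetical): emit (digest, file name) pairs."""
    for name, content in files.items():
        yield sha1_digest(content), name


def reduce_phase(pairs):
    """Reducer stand-in (hypothetical): group names by digest.

    The first file seen for each digest is kept as the unique copy;
    every later file with the same digest is reported as a duplicate.
    """
    groups = defaultdict(list)
    for digest, name in pairs:
        groups[digest].append(name)
    unique = {d: names[0] for d, names in groups.items()}
    duplicates = {d: names[1:] for d, names in groups.items() if len(names) > 1}
    return unique, duplicates


def dedup_ratio(files, unique):
    """Assumed definition: total bytes before / bytes stored after."""
    total = sum(len(content) for content in files.values())
    stored = sum(len(files[name]) for name in unique.values())
    return total / stored if stored else 0.0


# Hypothetical sample data: two files share identical content.
files = {"a.txt": b"hello", "b.txt": b"hello", "c.txt": b"world"}
unique, duplicates = reduce_phase(map_phase(files))
ratio = dedup_ratio(files, unique)
```

In this sketch, `unique` maps each digest to the one file that would be stored, and `duplicates` lists the files that could be replaced by a reference to it. In an actual Hadoop job the grouping by digest would be done by the framework's shuffle step between the Mapper and the Reducer rather than by an in-memory dictionary.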

Last modified: 2017-04-26 21:56:30