PROCESSING IMAGE FILES USING SEQUENCE FILE IN HADOOP
Journal: International Journal of Engineering Sciences & Research Technology (IJESRT), Vol. 5, No. 10
Publication Date: 2016-10-30
Authors: E. Laxmi Lydia; A. Krishna Mohan; M. Ben Swarup
Pages: 521-528
Keywords: MapReduce; distributed data processing; Hadoop; sequence file
Abstract
This paper presents MapReduce as a distributed data processing model, using the open source Hadoop framework to work with huge volumes of data. The expansive volume of data in the digital world, especially multimedia data, creates new requirements for processing and storage. As an open source distributed computational framework, Hadoop allows large numbers of images to be processed on an unbounded set of computing nodes by providing the fundamental infrastructure. We have a very large number of small image files and need to remove duplicate files from the available data. Most binary formats, particularly those that are compressed or encrypted, cannot be split and must be read as a single linear stream of data. Using such files as input to a MapReduce job means that a single mapper is used to process the entire file, causing a potentially large performance hit. The paper proposes a splittable format, the SequenceFile, and uses the MD5 algorithm to improve the performance of image processing.
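The following is a minimal sketch, not the authors' exact code, of the approach the abstract describes: packing many small image files into a single splittable SequenceFile, keyed by the MD5 digest of each image so that duplicate content can be detected and skipped. The HDFS output path and the local input directory are hypothetical placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

import java.io.File;
import java.nio.file.Files;
import java.security.MessageDigest;
import java.util.HashSet;
import java.util.Set;

public class ImagesToSequenceFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical output location on HDFS for the packed images.
        Path output = new Path("hdfs:///user/demo/images.seq");

        Set<String> seenDigests = new HashSet<>();
        MessageDigest md5 = MessageDigest.getInstance("MD5");

        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(output),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {

            // Hypothetical local directory holding the small image files.
            for (File img : new File("/data/images").listFiles()) {
                byte[] bytes = Files.readAllBytes(img.toPath());

                // MD5 digest of the image content identifies duplicates.
                StringBuilder hex = new StringBuilder();
                for (byte b : md5.digest(bytes)) {
                    hex.append(String.format("%02x", b));
                }
                String digest = hex.toString();

                // Skip images whose content has already been written.
                if (!seenDigests.add(digest)) {
                    continue;
                }

                // Key: content digest; value: raw image bytes.
                writer.append(new Text(digest), new BytesWritable(bytes));
            }
        }
    }
}

Because the SequenceFile stores many images as key/value records in one container, a MapReduce job can split it across mappers instead of dedicating one mapper to each unsplittable binary file.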