
Implementation of Parallelization Contract Mechanism Extension of Map Reduce Framework for the Efficient Execution Time over Geo-Distributed Dataset

Journal: International Journal of Engineering Research (IJER) (Vol.3, No. 12)

Publication Date:

Authors : ; ;

Page : 745-748

Keywords : MapReduce; PACT; big data; geo-distributed data sets; Hadoop;


Abstract

The world is surrounded by technology and the Internet, both of which change rapidly day by day. As a result, quintillions of bytes of data are being created. Data at this scale, measured in petabytes and zettabytes, is known as big data. Examples include climate information, trajectory information, transaction records, and website usage data. Because this data is so abundant, it is difficult to process and takes a long time to analyze. Hadoop is built to scale: it can reliably store and process petabytes of data. Hadoop plays an important role in processing and handling big data. Its ecosystem includes MapReduce (an offline computing engine), HDFS (the Hadoop Distributed File System), and HBase (online data access). MapReduce works by dividing input files into chunks and processing them in a series of parallelizable steps; mapping and reducing are the essential phases of a MapReduce job. The framework thus provides a distributed environment for processing data across many nodes. However, moving all input data to a single data center before processing it is expensive. Hence, we focus on geo-distributed data sets and on the sequential execution of MapReduce jobs over them in order to optimize execution time. However, various results show that map and reduce functions alone are not sufficient for all types of data processing. The fixed execution strategy of a MapReduce program is not optimal for many tasks because the framework knows nothing about the behavior of the user functions. To overcome these issues, we extend our proposed work with parallelization contracts (PACTs). These contracts capture a reasonable amount of semantics about each task, so that any type of task can be executed with reduced execution time.
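The split, map, group and reduce flow described above can be illustrated with a minimal, single-machine Python sketch of a word count. It is not Hadoop code and not the authors' implementation: the function names (`split_into_chunks`, `map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the chunk size are illustrative assumptions, and a thread pool stands in for the parallel map tasks a cluster would run.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def split_into_chunks(lines, chunk_size):
    """Divide the input into fixed-size chunks, one per (hypothetical) map task."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_phase(chunk):
    """User map function: emit (word, 1) for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """User reduce function: sum the counts collected for one key."""
    return key, sum(values)


def run_job(lines, chunk_size=2):
    chunks = split_into_chunks(lines, chunk_size)
    # Map tasks are independent of each other, so they can run in parallel.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_phase, chunks))
    groups = shuffle(chain.from_iterable(mapped))
    # Each key is reduced independently; keys are sorted only for readable output.
    return dict(reduce_phase(k, v) for k, v in sorted(groups.items()))


if __name__ == "__main__":
    data = ["big data is big", "hadoop stores big data", "mapreduce processes data"]
    print(run_job(data))
    # {'big': 3, 'data': 3, 'hadoop': 1, 'is': 1, 'mapreduce': 1, 'processes': 1, 'stores': 1}
```

The user supplies only `map_phase` and `reduce_phase`; chunking, grouping and parallel scheduling are fixed by the framework, which is the rigid execution strategy the abstract criticizes.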
The parallelization contracts consist of an input contract and an output contract, which specify the constraints and functions of data execution. The main aim of this paper is to discuss various known MapReduce techniques available for geo-distributed data sets. The paper also describes the implementation of these techniques and compares the results of this method with existing systems. Future work includes the use of query optimization techniques to improve query results and reduce computation cost. To achieve this, we add an indexing mechanism to the cache system to preserve query search results.
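In the PACT programming model from the Stratosphere project, an input contract is a second-order function (such as Map, Reduce, Cross, Match or CoGroup) that tells the system how to group records before calling the user function, while an output contract declares a property of the function's output that an optimizer may exploit. The Python sketch below shows one input contract, Match, and one output-contract property, key preservation (similar in spirit to PACT's Same-Key contract). It is an illustration under these assumptions, not the paper's implementation; `match`, `OutputContract`, `same_key` and `needs_repartition` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

Record = Tuple[str, object]


@dataclass
class OutputContract:
    """Hypothetical annotation: does the user function keep the input key?"""
    same_key: bool = False


def match(left: Iterable[Record], right: Iterable[Record],
          udf: Callable[[str, object, object], List[Record]]) -> List[Record]:
    """Match input contract: call `udf` once for every pair of records from
    the two inputs that share a key (an equi-join). The system, not the user,
    decides how to find those pairs; this sketch builds a hash table on `left`."""
    table = defaultdict(list)
    for key, value in left:
        table[key].append(value)
    out: List[Record] = []
    for key, rvalue in right:
        for lvalue in table.get(key, []):
            out.extend(udf(key, lvalue, rvalue))
    return out


def needs_repartition(contract: OutputContract) -> bool:
    """If the user function keeps the input key, data already partitioned by
    that key stays partitioned, so an optimizer could skip a repartition step."""
    return not contract.same_key


if __name__ == "__main__":
    # Hypothetical records keyed by data-center name, from two inputs.
    sales = [("dc1", 120), ("dc2", 80), ("dc1", 30)]
    capacity = [("dc1", 1000), ("dc2", 500)]
    joined = match(sales, capacity, lambda k, s, c: [(k, (s, c))])
    print(sorted(joined))
    print(needs_repartition(OutputContract(same_key=True)))
```

Because the contract states the grouping semantics (pairs with equal keys) rather than a fixed map-then-reduce pipeline, the system is free to pick the join strategy, and the output annotation lets it avoid moving data that is already correctly partitioned.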

Last modified: 2014-12-17 19:17:31