Implementation of Parallelization Contract Mechanism Extension of Map Reduce Framework for the Efficient Execution Time over Geo-Distributed Dataset
Journal: International Journal of Engineering Research (IJER) (Vol.3, No. 12)
Publication Date: 2014-12-01
Authors : Kirtimalini N.; Prof.T.A.Chavan;
Page : 745-748
Keywords : MapReduce; PACT; big data; Geo-distributed data sets; Hadoop;
Abstract
The world is surrounded by technology and the Internet, both of which change rapidly from day to day. As a result, quintillions of bytes of data are created. This data, measured in petabytes and zettabytes, is known as big data. Examples include climate information, trajectory information, transaction records, and web site usage data. Because the data is so abundant, it is not easy to process and takes a long time to execute over. Hadoop is scalable, that is, it can reliably store and process petabytes of data, and it plays an important role in processing and handling big data. It includes MapReduce (an offline computing engine), HDFS (the Hadoop Distributed File System), and HBase (online data access). MapReduce divides input files into chunks and processes them in a series of parallelizable steps; mapping and reducing are the essential phases of a MapReduce job. The framework offers a solution for large numbers of data nodes by providing a distributed environment. Moving all input data to a single datacenter before processing it is expensive. Hence we concentrate on the geographical distribution of the data and on executing a sequence of MapReduce jobs over geo-distributed data so as to optimize execution time. However, various results show that the map and reduce functions are not sufficient for all types of data processing. The fixed execution strategy of a MapReduce program is not optimal for many tasks, because the framework does not know how the user functions behave. To overcome these issues, we enhance our proposed work with parallelization contracts. These contracts help to capture a reasonable amount of semantics for executing any type of task with reduced time consumption.
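The chunking, mapping, and reducing steps described above can be illustrated with a minimal single-process sketch. It is not Hadoop code and not the authors' implementation; the word-count task and every function name (`split_into_chunks`, `map_phase`, `shuffle`, `reduce_phase`, `run_job`) are assumptions chosen only for illustration.

```python
from collections import defaultdict


def split_into_chunks(records, chunk_size):
    """Divide the input into fixed-size chunks, one per (hypothetical) map task."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in the chunk."""
    for line in chunk:
        for word in line.split():
            yield word.lower(), 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values that share one key."""
    return key, sum(values)


def run_job(records, chunk_size=2):
    """Run the phases in sequence; in a real cluster each chunk is mapped in parallel."""
    chunks = split_into_chunks(records, chunk_size)
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["a b", "b c", "a a"]))
```

In this sketch the map calls are independent of one another, which is what makes the step parallelizable; only the shuffle needs to see the output of every chunk.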
The parallelization contracts comprise input and output contracts, which specify the constraints and functions of data execution. The main aim of this paper is to discuss the known MapReduce techniques available for geo-distributed data sets. The paper also presents the implementation of these techniques and results comparing the method with existing systems. Future work includes query optimization techniques to improve query results and to reduce the cost of computation. To achieve this, we apply an indexing mechanism to the cache system to preserve query search results.
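As a rough illustration of what an input contract can express beyond map and reduce, the sketch below assumes contracts in the style of the PACT programming model, where a Match contract hands the user function every pair of records from two inputs that share a key, and a Cross contract hands it every pair from the Cartesian product. The abstract does not say which contracts the authors implement, so the function names and the sample data here are hypothetical.

```python
from collections import defaultdict
from itertools import product


def match_contract(left, right, user_fn):
    """Match-style input contract (assumed): call user_fn once for every
    pair of records from the two inputs that have the same key."""
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    results = []
    for key, left_value in left:
        # .get avoids creating empty entries for keys missing on the right
        for right_value in index.get(key, []):
            results.append(user_fn(key, left_value, right_value))
    return results


def cross_contract(left, right, user_fn):
    """Cross-style input contract (assumed): call user_fn on every
    combination of one record from each input."""
    return [user_fn(l, r) for l, r in product(left, right)]


if __name__ == "__main__":
    # Hypothetical data: station metadata and temperature readings.
    stations = [("pune", "IN"), ("oslo", "NO")]
    readings = [("pune", 31), ("pune", 29), ("berlin", 12)]
    print(match_contract(stations, readings, lambda k, c, t: (k, c, t)))
```

Because the contract states how records must be paired, a runtime is free to choose how to ship and partition the two inputs, which a fixed map-then-reduce plan cannot do.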
Last modified: 2014-12-17 19:17:31