Performance Evaluation of Data Placement Structures for Big Data Warehouses

Proceeding: Third International Conference on Data Mining, Internet Computing, and Big Data (BigData2016)

Publication Date:

Authors : ;

Page : 15-21

Keywords : Big Data; RCFile; ORCFile; Hive; Performance

Abstract

The rapid growth of data requires systems that provide a scalable infrastructure for the efficient distributed storage and processing of vast amounts of data. Hive is a MapReduce-based data warehousing system for data summarization and query analysis. It can organize millions of rows of data into tables, and the data placement structures it uses to store those tables play a significant role in the warehouse's performance. Hive also provides an SQL-like language, HiveQL, whose queries are compiled into MapReduce jobs that run on Hadoop. In this paper, we investigate the performance of two of Hive's data placement structures, RCFile and ORCFile. The experimental results show that RCFile and ORCFile are effective data placement structures in a MapReduce system.
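To make "data placement structure" concrete, here is a minimal in-memory sketch of the hybrid layout these formats build on. Rows are first split horizontally into row groups, and inside each group every column is stored contiguously. A query can then read only the columns it needs. The per-group (min, max) statistics model the lightweight column statistics ORC keeps so that readers can skip data; RCFile's original layout does not record them. This is an illustrative toy under those assumptions, not the real on-disk RCFile or ORC format, and it omits compression, encoding, and Hive's actual APIs. The names `RowGroup`, `write_row_groups`, and `scan` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class RowGroup:
    # One list per column: the values of a column are stored contiguously.
    columns: dict[str, list[Any]]
    # Hypothetical per-column (min, max) recorded at write time (ORC-style statistics).
    stats: dict[str, tuple[Any, Any]]
    num_rows: int


def write_row_groups(rows: list[dict[str, Any]], schema: list[str],
                     group_size: int) -> list[RowGroup]:
    """Split rows horizontally into groups, then store each group column by column."""
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        cols = {c: [r[c] for r in chunk] for c in schema}
        stats = {c: (min(v), max(v)) for c, v in cols.items()}
        groups.append(RowGroup(cols, stats, len(chunk)))
    return groups


def scan(groups: list[RowGroup], wanted: list[str], filter_col: str,
         threshold: Any) -> tuple[list[tuple[Any, ...]], int]:
    """Return the `wanted` columns for rows where filter_col > threshold.

    Only the `wanted` columns and `filter_col` are accessed (column projection),
    and a group whose recorded max cannot satisfy the predicate is skipped.
    """
    out = []
    skipped = 0
    for g in groups:
        if g.stats[filter_col][1] <= threshold:
            skipped += 1  # the whole group is pruned without touching its values
            continue
        key = g.columns[filter_col]
        for i in range(g.num_rows):
            if key[i] > threshold:
                out.append(tuple(g.columns[c][i] for c in wanted))
    return out, skipped


if __name__ == "__main__":
    rows = [{"id": i, "price": i * 10, "name": f"item{i}"} for i in range(8)]
    groups = write_row_groups(rows, ["id", "price", "name"], group_size=3)
    result, skipped = scan(groups, ["name"], "price", 45)
    print(result, skipped)  # [('item5',), ('item6',), ('item7',)] 1
```

With eight rows and a group size of 3, the first group (prices 0 to 20) is skipped entirely, and the `id` column is never read. These two savings explain why columnar-within-row-group layouts can outperform plain row-oriented text files for analytical queries.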

Last modified: 2016-07-21 23:50:04