Performance Evaluation of Data Placement Structures for Big Data Warehouses
Proceedings: Third International Conference on Data Mining, Internet Computing, and Big Data (BigData2016)
Publication Date: 2016-7-21
Authors: Mohammad Rakibul Hasan; S. Kami Makki
Pages: 15-21
Keywords: Big Data; RCFile; ORCFile; Hive; Performance
Abstract
The rapid growth of data requires systems that provide a scalable infrastructure for storing and processing vast amounts of data efficiently in a distributed manner. Hive is a MapReduce-based data warehousing system for data summarization and query analysis. It organizes millions of rows of data into tables, and the data placement structures used to store those tables play a significant role in the warehouse's performance. Hive also provides an SQL-like language, HiveQL, whose queries are compiled into MapReduce jobs that run on Hadoop. In this paper, we investigate the performance of Hive's data placement structures, RCFile and ORCFile. The experimental results show that RCFile and ORCFile are effective data placement structures for MapReduce-based systems.
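A data placement structure decides how a table's rows are physically laid out. The following is a toy sketch and does not reproduce Hive's actual on-disk format. Plain Python lists stand in for file bytes, and the function names `row_store`, `row_columnar_store`, and `scan_column` are hypothetical. The sketch contrasts row-oriented placement with the hybrid layout associated with RCFile. In that layout, rows are first partitioned horizontally into row groups, and each row group is then stored column by column. As a simplification, the group size here counts rows.

```python
from typing import Any, List, Sequence

Row = Sequence[Any]


def row_store(rows: List[Row]) -> List[Any]:
    """Row-oriented placement: every field of row 0, then row 1, and so on."""
    return [value for row in rows for value in row]


def row_columnar_store(rows: List[Row], group_size: int) -> List[List[List[Any]]]:
    """Hybrid placement in the spirit of RCFile (simplified, hypothetical).

    Rows are split into row groups (horizontal partitioning). Inside each
    group the data is transposed so each column's values sit together
    (vertical partitioning). All rows are assumed to have the same width.
    """
    if group_size < 1:
        raise ValueError("group_size must be positive")
    groups = []
    for start in range(0, len(rows), group_size):
        group = rows[start:start + group_size]
        # zip(*group) transposes the group: one tuple per column.
        groups.append([list(column) for column in zip(*group)])
    return groups


def scan_column(groups: List[List[List[Any]]], col: int) -> List[Any]:
    """Read a single column, touching only that column's list in each group."""
    return [value for group in groups for value in group[col]]


if __name__ == "__main__":
    rows = [(1, "a", 10.0), (2, "b", 20.0), (3, "c", 30.0)]
    groups = row_columnar_store(rows, group_size=2)
    print(groups)  # [[[1, 2], ['a', 'b'], [10.0, 20.0]], [[3], ['c'], [30.0]]]
    print(scan_column(groups, 2))  # [10.0, 20.0, 30.0]
```

With this layout, a query that needs only one column can skip the other columns inside every row group. Meanwhile, all fields of any given row remain within a single group. ORCFile builds on the same idea with larger stripes, per-column indexes, and compression, none of which this sketch models.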
Last modified: 2016-07-21 23:50:04