UNSTRUCTURED DATA ANALYSIS AND PROCESSING USING BIG DATA TOOL - HIVE AND MACHINE LEARNING ALGORITHM - LINEAR REGRESSION
Journal: International Journal of Computer Engineering and Technology (IJCET) (Vol. 9, No. 2)
Publication Date: 2018-04-19
Authors : Neha Mangla; Priya Rathod;
Page : 61-73
Keywords : Big Data; HDFS; Hadoop; Hive; MapReduce; linear regression;
Abstract
Big data refers to information assets characterized by such high volume, velocity and variety that they require specific technology and analytical methods to be transformed into value. Storing such large volumes of data is difficult because, at the scale of terabytes and petabytes, traditional data warehousing solutions are insufficient and exorbitantly expensive [1][2]. It is viable to store and process these large amounts of data [13][14][15][16][17][18][19][20][21] on Hadoop, which is a low-cost, reliable, scalable and fault-tolerant Java-based programming framework that supports the processing of large data sets in a distributed computing environment. Hadoop implements the MapReduce programming model for storing and processing large data sets with a parallel, distributed algorithm on commodity hardware. Nevertheless, the programming model expects developers to write bespoke programs that are inflexible, time-consuming to write, and hard to code, maintain and reuse. This challenging task of writing complex MapReduce code is simplified by using HiveQL. Hive is the platform required to run HiveQL; it is built on top of Hadoop to query Big Data. Internally, Hive queries are converted into the corresponding MapReduce tasks [3][4]. In this paper, a movie rating prediction system is built on the MovieLens dataset using a machine learning algorithm, linear regression.
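As a rough illustration of the linear regression technique named in the abstract, the following is a minimal sketch of a one-feature ordinary least squares fit in plain Python. It is not the authors' implementation: the function names, the choice of feature (a user's average rating) and the toy numbers are all assumptions made for this example, and no MovieLens data is used.

```python
def fit_simple_linear_regression(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predict a rating for feature value x from the fitted line."""
    return slope * x + intercept


# Hypothetical toy data (assumption): a user's average rating as the
# feature, and the rating that user gave one particular movie as the target.
avg_user_rating = [1.0, 2.0, 3.0, 4.0, 5.0]
movie_rating = [1.5, 2.0, 2.5, 3.0, 3.5]

slope, intercept = fit_simple_linear_regression(avg_user_rating, movie_rating)
predicted = predict(slope, intercept, 4.5)
```

In a pipeline like the one the abstract describes, the feature and target columns would presumably be produced by HiveQL queries over the ratings table before a fit of this kind is applied; that division of work is an assumption here, not something the abstract states.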
Last modified: 2018-05-04 20:28:59