BigData: A Case Study of Spark Mllib and Hive
Journal: International Journal of Science and Research (IJSR) (Vol. 7, No. 9)
Publication Date: 2018-09-05
Authors : Shubhajoy Das;
Page : 865-868
Keywords : BigData; SparkMllib; Collaborative Filtering; Hadoop; Spark; Apache; Hive; Amazon aws; HDFS;
Abstract
The rate at which data is generated has increased tremendously in the past decade because of social networks, sensor networks, geographic information systems, financial institutions, and supply chains. The storage capacity of computers has increased to stay competitive, but disk access speeds have not improved to the same extent as disk capacity. Big Data comes to the rescue with a framework to analyse massive amounts of data in a distributed environment that is both horizontally and vertically scalable. Data sets with trillions of rows can be analysed very fast to provide valuable insights. Cloud service providers such as Amazon and Alibaba Cloud have made robust infrastructure for Big Data available. We study Apache Hive and Spark MLlib in profiling a Stack Overflow dataset, and the collaborative filtering algorithm in Spark MLlib for movie recommendations.
Last modified: 2021-06-28 19:56:54