A MapReduce based framework to perform full model selection in very large datasets
Journal: IADIS INTERNATIONAL JOURNAL ON COMPUTER SCIENCE AND INFORMATION SYSTEMS (Vol. 13, No. 1)
Publication Date: 2018-08-21
Authors : Angel Díaz Pacheco; Jesús A. Gonzalez-Bernal; Carlos A. Reyes-Garcia;
Page : 1-13
Keywords : Model Selection; MapReduce; Big Data;
Abstract
The analysis of large amounts of data has become an important task in science and business, and it has led to the emergence of the Big Data paradigm. This paradigm owes its name to data objects too large to be processed by standard hardware and algorithms. Many data analysis tasks involve the use of machine learning techniques. The goal of a predictive model is to achieve the highest possible accuracy on new samples, and for this reason there is high interest in selecting the most suitable algorithm for a specific dataset. Selecting the most suitable algorithm together with feature selection and data preparation techniques constitutes the Full Model Selection paradigm, which has been widely studied on datasets of common size but poorly explored in the Big Data context. As an effort to explore this direction, this work proposes a framework that can be adapted to any population-based meta-heuristic method in order to perform model selection under the MapReduce paradigm.
Last modified: 2019-12-13 21:10:51