PARALLEL PROCESSING ALGORITHM FOR FASTER LEARNING OF VPRS-MODEL BASED DECISION TREE CLASSIFIER ON BIG DATA PLATFORMS USING APACHE SPARK FRAMEWORK

Journal: International Journal of Advanced Research in Engineering and Technology (IJARET) (Vol.11, No. 08)

Publication Date:

Authors : ;

Page : 122-138

Keywords : Variable Precision Rough Set model; Decision Tree; Parallel processing; Apache Spark; Uncertain data; Big Data Platforms;

Abstract

The decision tree is the most widely used white-box approach for data classification. The level of uncertainty in the training data is one of the core factors that influence the complexity of a decision tree. Consequently, inducing a single decision tree directly from an entire massive dataset with a high degree of uncertainty may demand extensive system resources and can sometimes increase the learning time of the model exponentially. This paper addresses the problem by proposing a Parallel Processing Algorithm for Faster Learning (PPAFL) of a Variable Precision Rough Set (VPRS) model-based Decision Tree (DT) on Big Data platforms. The main goal of the proposed PPAFL algorithm is to reduce the time complexity of DT learning with optimal resource utilization. The PPAFL algorithm selects an efficient model from the set of models induced locally on distributed class data. Based on tree size, significance of the root node, and testing accuracy, the proposed algorithm selects a model that best represents the entire dataset without sacrificing classification accuracy. Three well-known large datasets from the UCI Machine Learning Repository are used to evaluate the PPAFL algorithm. The proposed algorithm is implemented on a fully distributed Apache Hadoop cluster under the Apache Spark framework. Empirical experiments were conducted in different computational environments by varying both the computing power and the number of parallel computations. Compared against the model derived through the MapReduce approach, the proposed parallel processing algorithm shows remarkable savings in training time on massive uncertain datasets with minimal resource utilization.
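The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract names the three criteria (tree size, significance of the root node, testing accuracy) but not how they are combined, so the rule below is an assumption. It keeps the models whose accuracy lies within a tolerance of the best one, then prefers the smallest tree, breaking ties by higher root significance. The names `LocalModel`, `select_model`, and `accuracy_tolerance`, and all numeric values, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LocalModel:
    """Summary of one decision tree induced on a single data partition (hypothetical type)."""
    partition_id: int
    tree_size: int            # number of nodes in the induced tree
    root_significance: float  # significance score of the root attribute
    test_accuracy: float      # accuracy on held-out test data, in [0, 1]


def select_model(models: List[LocalModel], accuracy_tolerance: float = 0.01) -> LocalModel:
    """Pick one local model to represent the whole dataset.

    Assumed rule (not taken from the paper): discard models whose accuracy is
    more than `accuracy_tolerance` below the best, then choose the smallest
    tree, preferring higher root significance and accuracy on ties.
    """
    if not models:
        raise ValueError("at least one local model is required")
    best_accuracy = max(m.test_accuracy for m in models)
    candidates = [m for m in models
                  if m.test_accuracy >= best_accuracy - accuracy_tolerance]
    return min(candidates,
               key=lambda m: (m.tree_size, -m.root_significance, -m.test_accuracy))


# Made-up summaries of four locally induced trees, for illustration only.
models = [
    LocalModel(0, tree_size=41, root_significance=0.62, test_accuracy=0.912),
    LocalModel(1, tree_size=27, root_significance=0.58, test_accuracy=0.905),
    LocalModel(2, tree_size=19, root_significance=0.40, test_accuracy=0.871),
    LocalModel(3, tree_size=27, root_significance=0.66, test_accuracy=0.909),
]
chosen = select_model(models)
print(chosen.partition_id)
```

In a Spark deployment the list of summaries would presumably be collected from the workers after each partition's tree has been induced and tested; only these small records, rather than the data itself, would need to reach the driver for the comparison.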

Last modified: 2021-02-20 14:05:15