PARALLEL PROCESSING ALGORITHM FOR FASTER LEARNING OF VPRS-MODEL BASED DECISION TREE CLASSIFIER ON BIG DATA PLATFORMS USING APACHE SPARK FRAMEWORK
Journal: International Journal of Advanced Research in Engineering and Technology (IJARET) (Vol.11, No. 08)
Publication Date: 2020-08-31
Authors : Surekha Samsani; G. Jaya Suma;
Page : 122-138
Keywords : Variable Precision Rough Set model; Decision Tree; Parallel processing; Apache Spark; Uncertain data; Big Data Platforms;
Abstract
The decision tree is the most widely used white-box approach for data classification. The level of uncertainty in the training data is one of the core factors that influence the complexity of a decision tree. Inducing a single decision tree directly from an entire massive dataset with a high degree of uncertainty may therefore demand extensive system resources and can sometimes increase the model's learning time exponentially. This paper addresses the problem by proposing a Parallel Processing Algorithm for Faster Learning (PPAFL) of a Variable Precision Rough Set (VPRS) model-based Decision Tree (DT) on Big Data platforms. The main goal of the PPAFL algorithm is to reduce the time complexity of DT learning with optimal resource utilization. The algorithm selects an efficient model from the set of models induced locally on distributed class data. Based on tree size, significance of the root node, and testing accuracy, it selects the model that best represents the entire dataset without sacrificing classification accuracy. Three well-known large datasets from the UCI ML repository are used to evaluate the PPAFL algorithm. The algorithm is implemented in a fully distributed Apache Hadoop cluster deployment under the Apache Spark framework. Empirical experiments were conducted in different computational environments by varying the computing power as well as the number of parallel computations. Compared with the model derived through the MapReduce approach, the proposed parallel processing algorithm showed remarkable savings in training time on massive uncertain datasets, with minimal resource utilization.
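The model-selection step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. The abstract names the three criteria (tree size, significance of the root node, testing accuracy) but does not say how they are combined, so the ordering used here (accuracy first, then root significance, then smaller tree) is an assumption. The names LocalModel and select_model and all numeric values are hypothetical, and plain Python stands in for the Spark job that would produce the per-partition models.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LocalModel:
    """Summary of one tree induced on one data partition (hypothetical type)."""
    partition_id: int
    tree_size: int            # number of nodes in the induced tree
    root_significance: float  # significance score of the root attribute
    test_accuracy: float      # accuracy measured on held-out test data


def select_model(models: List[LocalModel]) -> LocalModel:
    """Pick one representative model from the locally induced candidates.

    Assumed ranking (not stated in the abstract): highest test accuracy,
    then highest root significance, then smallest tree.
    """
    if not models:
        raise ValueError("no candidate models")
    return max(
        models,
        key=lambda m: (m.test_accuracy, m.root_significance, -m.tree_size),
    )


# Made-up candidates, purely for illustration.
candidates = [
    LocalModel(partition_id=0, tree_size=41, root_significance=0.62, test_accuracy=0.91),
    LocalModel(partition_id=1, tree_size=27, root_significance=0.62, test_accuracy=0.91),
    LocalModel(partition_id=2, tree_size=19, root_significance=0.70, test_accuracy=0.88),
]
best = select_model(candidates)
print(best.partition_id)
```

With these made-up candidates, partitions 0 and 1 tie on accuracy and root significance, so the smaller tree from partition 1 is chosen. A different weighting of the three criteria would be equally consistent with the abstract.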
Last modified: 2021-02-20 14:05:15