Estimation of Missing Values in the Data Mining and Comparison of Imputation Methods
Journal: Mathematical Journal of Interdisciplinary Sciences (Vol. 1, No. 2)
Publication Date: 2013-03-04
Authors : Shamsher Singh; Jagdish Prasad;
Page : 75-90
Keywords : Missing values; imputation methods; non parametric; data mining;
Abstract
Many existing, industrial, and research data sets contain missing values (MVs). There are various reasons for their existence, such as manual data entry procedures, equipment errors, and incorrect measurements. The presence of such imperfections usually requires a preprocessing stage in which the data are prepared and cleaned so that they are useful to, and sufficiently clear for, the knowledge extraction process. MVs make data analysis difficult and can pose serious problems for researchers. In fact, inappropriate handling of MVs in the analysis may introduce bias, lead to misleading conclusions being drawn from a research study, and limit the generalizability of the research findings. The problems usually associated with MVs in data mining are (1) loss of efficiency; (2) complications in handling and analyzing the data; and (3) bias resulting from differences between missing and complete data. We focus our attention on the use of imputation methods. A fundamental advantage of this approach is that the MV treatment is independent of the learning algorithm used. For this reason, the user can select the most appropriate method for each situation encountered. In this paper, different methods for estimating missing values are discussed. A comparison of the different imputation methods is given using non-parametric methods.
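To illustrate why imputation is independent of the learning algorithm, the following minimal Python sketch fills in missing entries before any learner sees the data. The abstract does not name the specific imputation methods compared in the paper; column-mean imputation is used here purely as an illustrative assumption, and the function name `impute_mean` and the toy data are hypothetical.

```python
from statistics import mean


def impute_mean(rows):
    """Return a copy of rows with each None replaced by its column's observed mean.

    Illustrative only: mean imputation is an assumed example method,
    not necessarily one of the methods compared in the paper.
    """
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # Assumption: every column has at least one observed value.
        col_means.append(mean(observed))
    # Build new rows so the original data set is left unchanged.
    return [
        [col_means[j] if row[j] is None else row[j] for j in range(n_cols)]
        for row in rows
    ]


# Toy data set: None marks a missing value (MV).
data = [
    [1.0, 10.0],
    [None, 20.0],
    [3.0, None],
]

completed = impute_mean(data)
print(completed)
```

Because `completed` is an ordinary complete data set, it could be passed to any learning algorithm without that algorithm needing its own missing-value handling.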