Estimation of Missing Values in the Data Mining and Comparison of Imputation Methods

Journal: Mathematical Journal of Interdisciplinary Sciences (Vol.1, No. 2)

Publication Date: 2013-03-04

Authors : Shamsher Singh; Jagdish Prasad;

Page : 75-90

Keywords : Missing values; imputation methods; non parametric; data mining;

Abstract

Many existing, industrial, and research data sets contain missing values (MVs). They arise for various reasons, such as manual data entry procedures, equipment errors, and incorrect measurements. The presence of such imperfections usually requires a preprocessing stage in which the data are prepared and cleaned so that they are useful to, and sufficiently clear for, the knowledge extraction process. MVs make data analysis difficult and can pose serious problems for researchers. In fact, inappropriate handling of MVs in the analysis may introduce bias, lead to misleading conclusions being drawn from a research study, and limit the generalizability of the research findings. The problems usually associated with MVs in data mining are (1) loss of efficiency; (2) complications in handling and analyzing the data; and (3) bias resulting from differences between missing and complete data. We focus our attention on the use of imputation methods. A fundamental advantage of this approach is that the MV treatment is independent of the learning algorithm used. For this reason, the user can select the most appropriate method for each situation. In this paper, different methods of estimating missing values are discussed, and the imputation methods are compared using nonparametric methods.
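As an illustration of imputation as a preprocessing step that is separate from any learning algorithm, the following minimal Python sketch fills missing entries with the mean or the median of the observed values. The function name `impute`, the use of `None` as the missing-value marker, and the sample column are assumptions made for this example; the abstract does not state which imputation methods the paper compares.

```python
from statistics import mean, median


def impute(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed entries.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # Observed values are kept unchanged; only the gaps are filled.
    return [fill if v is None else v for v in values]


# Hypothetical column with two missing values and one large observation.
column = [4.0, None, 6.0, 100.0, None, 5.0]
mean_filled = impute(column, "mean")
median_filled = impute(column, "median")
```

Because the filled column is an ordinary complete list, it can be passed to any downstream learner unchanged. In this made-up column the two strategies give quite different fill values, since the mean is pulled toward the large observation while the median is not, which is the kind of difference a comparison of imputation methods would need to assess.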
