An Analysis on Removal of Duplicate Records using Different Types of Data Mining Techniques: A Survey

Journal: International Journal of Computer Science and Mobile Computing - IJCSMC (Vol.6, No. 11)

Publication Date:

Authors : ;

Page : 38-42

Keywords : Deduplication; Record; Mining; Replica; Repository;

Abstract

Rapid advances in information technology have created a need for large volumes of storage to hold datasets. Most data warehouses draw their data from several different data marts, so there is a high likelihood of duplicate records. Many systems are affected by duplicates in the database, which lead to problems such as slow performance, degraded data quality, wasted storage, and high operating costs. In addition, the presence of duplicates produces misleading reports: the system fails to retrieve the correct data for complex queries, and the time complexity is high. These issues can be resolved by record deduplication, which is one of the necessary tasks in data preprocessing. This process results in cleaned data and replica-free repositories, from which higher-quality information can be retrieved. Record deduplication is the process of identifying and removing records in data storage that refer to the same entity across different data sources. It is necessary when linking entity-based datasets that may or may not share a common identifier. This paper gives a detailed introduction to data deduplication. It also presents a comprehensive study of existing techniques for removing replicated data using deduplication.
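As a rough illustration of the definition above, the following minimal sketch removes records that refer to the same entity after simple normalization. It is not one of the techniques surveyed in the paper; the function names (`normalize`, `deduplicate`), the field names, and the sample records are all hypothetical, and exact matching on normalized keys is assumed in place of the more elaborate similarity measures a real system would likely need.

```python
import re


def normalize(value: str) -> str:
    """Lowercase, trim, and collapse internal whitespace (assumed normalization)."""
    return re.sub(r"\s+", " ", value.strip().lower())


def deduplicate(records, key_fields):
    """Keep the first record seen for each normalized key; drop later duplicates."""
    seen = set()
    unique = []
    for record in records:
        # Build a comparison key from the chosen fields (hypothetical field names).
        key = tuple(normalize(str(record.get(field, ""))) for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical records arriving from two different data marts.
records = [
    {"name": "Ada Lovelace", "email": "ada@example.org", "source": "mart_a"},
    {"name": "  ada  LOVELACE ", "email": "ADA@example.org", "source": "mart_b"},
    {"name": "Alan Turing", "email": "alan@example.org", "source": "mart_a"},
]

unique = deduplicate(records, key_fields=("name", "email"))
```

Here the second record differs from the first only in letter case and spacing, so it is treated as a duplicate and dropped. Datasets that share no common identifier would need approximate matching instead of an exact key.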

Last modified: 2017-11-27 21:37:01