
A Survey on Different Duplicate Detection Methods

Journal: International Journal of Science and Research (IJSR) (Vol. 5, No. 12)

Page : 1222-1224

Keywords : PPSNM; Duplicate Detection; MapReduce; Parallel Progressive Sorted Neighborhood Method


Abstract

Duplicate records are a common problem in data about real-world entities. They enter a database through multiple entries for the same data, incomplete data entries, and errors during transactions. Today's data sets are very complex, which makes removing duplicates a difficult task. Duplicate detection methods find the cases where multiple entries refer to the same real-world entity. In most cases, duplicate entries cause transactional errors, which lead to flawed operational and strategic decision making in an organization and, in turn, to monetary losses and damage to its brand image. One example is multiple Aadhaar cards (government identification cards in India) created for the same person at different locations, with the data then used for identification purposes by different systems across industries and locations. This paper compares the traditional duplicate detection methods Incremental Sorted Neighborhood Method (ISNM), Duplicate Count Strategy (DCS++), Progressive Sorted Neighborhood Method (PSNM), and Parallel Progressive Sorted Neighborhood Method (PPSNM).
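
The methods named in the abstract all build on the basic sorted neighborhood idea: sort the records by a key, then compare only records that fall within a small sliding window. The following is a minimal sketch of that basic idea only, not of ISNM, DCS++, PSNM, or PPSNM as described in the paper. The function name `sorted_neighborhood`, the sample records, the window size, and the similarity threshold are all illustrative assumptions.

```python
from difflib import SequenceMatcher


def sorted_neighborhood(records, key, window=3, threshold=0.85):
    """Return index pairs of likely duplicates using a fixed-size sliding window.

    Illustrative sketch of the basic sorted neighborhood idea; the window size
    and threshold are assumed values, not taken from the paper.
    """
    # Sort record indices by the sorting key so similar records end up adjacent.
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    pairs = set()
    for pos, i in enumerate(order):
        # Compare each record only with the next (window - 1) records in sort order.
        for j in order[pos + 1 : pos + window]:
            similarity = SequenceMatcher(None, key(records[i]), key(records[j])).ratio()
            if similarity >= threshold:
                pairs.add((min(i, j), max(i, j)))
    return pairs


# Hypothetical sample data: records 0 and 2 describe the same person.
records = [
    {"name": "Ravi Kumar", "city": "Pune"},
    {"name": "Anita Desai", "city": "Mumbai"},
    {"name": "Ravi Kumarr", "city": "Pune"},
    {"name": "Sunil Mehta", "city": "Delhi"},
]
duplicates = sorted_neighborhood(records, key=lambda r: r["name"].lower())
print(duplicates)
```

Limiting comparisons to a window keeps the number of pair comparisons roughly proportional to the number of records times the window size, at the cost of missing duplicates whose keys sort far apart.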

Last modified: 2021-07-01 14:48:53