A Survey on Different Duplicate Detection Methods
Journal: International Journal of Science and Research (IJSR) (Vol. 5, No. 12)
Publication Date: 2016-12-05
Authors : Tanvee Meshram; Nivedita Kadam
Page : 1222-1224
Keywords : PPSNM; Duplicate Detection; Map Reduce; Parallel Progressive Sorted Neighborhood Method
Abstract
Duplicate records are a common phenomenon in data describing real-world entities. They arise in databases because of multiple entries for the same data, incomplete data entries, and errors during transactions. Today's data sets are very complex, which makes removing duplicates a difficult task. Duplicate detection methods help find cases where multiple entries refer to the same real-world entity. In most cases, duplicate entries cause transactional errors that affect operational and strategic decision making in an organization, resulting in monetary losses and damage to the organization's brand image. One example is multiple Aadhaar cards (government identification cards in India) created for the same person at different locations, with the data then used in different systems for identification purposes across industries and locations. This paper compares traditional duplicate detection methods: the Incremental Sorted Neighborhood Method (ISNM), Duplicate Count Strategy (DCS++), the Progressive Sorted Neighborhood Method (PSNM), and the Parallel Progressive Sorted Neighborhood Method (PPSNM).