Removing De-duplication Using Pattern Search Suffix Arrays
Journal: International Journal of Science and Research (IJSR) (Vol. 4, No. 11)
Publication Date: 2015-11-05
Authors: Pratiksha Dhande; Supriya Kumari; Sushmita Tupe; Laukik Shah
Pages: 1217-1219
Keywords: String search; pattern matching; suffix array; suffix tree
Abstract
With the growth of duplicate records in data sets such as voter card or PAN card registries, removing duplicates has become a major challenge. Record linkage is the process of matching records from several databases that refer to the same entities. When applied to a single database, this process is known as de-duplication. This paper investigates how to remove duplicate records with the help of suffix arrays, a well-organized data structure for pattern searching. It covers similarity metrics that are commonly used to spot similar field entries, and presents a broad set of duplicate detection algorithms that can identify near-duplicate records in a database. It also covers multiple techniques for improving the effectiveness and scalability of approximate duplicate detection algorithms. Finally, based on these algorithms, the paper presents how to remove duplicate records from a data set.
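The abstract does not give the authors' algorithm, so the following is only a minimal Python sketch of the general idea: build a suffix array over the concatenated records, then use binary search on it to find repeated records. The function names (`build_suffix_array`, `find_occurrences`, `deduplicate`), the newline separator, and the whitespace and case normalization are all assumptions made for illustration. The sketch detects only records that are identical after normalization, not the approximate matches the paper also discusses, and it uses a naive suffix array construction rather than an efficient one.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return suffix start positions sorted by the suffix they begin.

    Naive construction (sorting full suffixes); fine for a small demo,
    too slow for large data sets.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return all start positions of `pattern` in `text`, in ascending order.

    Suffixes sharing a prefix are contiguous in the suffix array, so two
    binary searches locate the matching range.
    """
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


def deduplicate(records: list[str]) -> list[str]:
    """Keep the first copy of each record (hypothetical helper).

    Records are compared after lowercasing and collapsing whitespace,
    which is an assumed normalization, not one taken from the paper.
    """
    sep = "\n"  # cannot appear inside a normalized record
    normalized = [" ".join(r.lower().split()) for r in records]
    text = sep + sep.join(normalized) + sep
    sa = build_suffix_array(text)

    kept = []
    offset = 0  # position of the separator that precedes the current record
    for original, norm in zip(records, normalized):
        hits = find_occurrences(text, sa, sep + norm + sep)
        # Keep the record only if this is its earliest occurrence.
        if hits[0] == offset:
            kept.append(original)
        offset += len(norm) + 1
    return kept


if __name__ == "__main__":
    sample = ["Asha Rao 1990", "asha  rao 1990", "Ravi Kumar 1985"]
    print(deduplicate(sample))
```

Wrapping each record in separators means the search matches whole records only, so a record that happens to be a substring of another is not mistaken for a duplicate.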
Last modified: 2021-07-01 14:26:37