Appraisal of Efficient Techniques for Online Record Linkage and Deduplication using Q-Gram Based Indexing
Journal: International Journal of Computer Science and Mobile Computing - IJCSMC (Vol.3, No. 5)
Publication Date: 2014-05-30
Authors : M.V Shiva Prasad; Ch.Krishna Prasad; B.Rambabu;
Page : 404-414
Keywords : q-samples; substrings; BKV; q-grams; Record identifiers;
Abstract
We present new indexing techniques for approximate string matching. The index collects text q-samples, that is, disjoint text substrings of length q taken at fixed intervals, and stores their positions. At search time, part of the text is filtered out by observing that any occurrence of the pattern must be reflected in the presence of some text q-samples that match approximately inside the pattern. The aim of this technique is to index the database so that records with similar, not just identical, blocking key values (BKVs) are inserted into the same block. Assuming the BKVs are strings, the basic idea is to create variations of each BKV using q-grams (substrings of length q) and to insert record identifiers into more than one block.
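The q-gram blocking idea can be illustrated with a minimal Python sketch. It is not taken from the paper. It assumes that BKV variations are formed by dropping q-grams from the BKV's q-gram list down to a minimum length controlled by a `threshold` parameter, a common formulation of q-gram indexing that the abstract itself does not spell out. The function names, the `threshold` parameter, and the sample records are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def qgrams(s, q=2):
    """Return the overlapping substrings of length q in s, in order."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def bkv_variations(bkv, q=2, threshold=0.8):
    """Return index keys for one BKV.

    Assumption (not from the abstract): each key is the concatenation of an
    order-preserving sub-list of the BKV's q-grams, and sub-lists are no
    shorter than max(1, int(k * threshold)), where k is the q-gram count.
    """
    grams = qgrams(bkv, q)
    if not grams:
        # BKV shorter than q: fall back to the BKV itself as the only key.
        return {bkv}
    k = len(grams)
    min_len = max(1, int(k * threshold))
    keys = set()
    for n in range(min_len, k + 1):
        for combo in combinations(grams, n):
            keys.add("".join(combo))
    return keys


def build_index(records, q=2, threshold=0.8):
    """Map each variation key to the set of record identifiers in that block."""
    index = defaultdict(set)
    for rec_id, bkv in records.items():
        for key in bkv_variations(bkv, q, threshold):
            index[key].add(rec_id)
    return index


def candidate_pairs(index):
    """Return all pairs of record identifiers that share at least one block."""
    pairs = set()
    for ids in index.values():
        for a, b in combinations(sorted(ids), 2):
            pairs.add((a, b))
    return pairs


# Hypothetical records: identifier -> blocking key value.
records = {"r1": "smith", "r2": "smyth", "r3": "jones"}
index = build_index(records, q=2, threshold=0.5)
print(candidate_pairs(index))
```

In this sketch a lower `threshold` produces more variations per BKV, so more dissimilar records end up sharing a block, at the cost of a larger index. With `threshold=1.0` only the full q-gram list is used as a key, which reduces to exact-match blocking.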
Last modified: 2014-05-21 20:59:03