Record Matching in Web Databases Using Unsupervised Approach
Journal: International Journal of Scientific Engineering and Technology (IJSET) (Vol. 2, No. 9)
Publication Date: 2013-09-01
Authors : Fouzia Sultana; Manjusha Kalekuri
Page : 895-899
Keywords : Record Matching; Unsupervised; UDD; Query Results
Abstract
Record matching arises when information from multiple heterogeneous databases is combined. One step of data integration is relating the records that appear in the different databases, specifically, determining which sets of records refer to the same real-world entities. Record matching thus solves the duplicate detection problem, which creates the need to identify a suitable record matching technique. Most record matching methods are supervised and require the user to provide training data. Such methods are not applicable to the Web database scenario, where the records to match are query results generated dynamically. To overcome this problem, a new record matching method named Unsupervised Duplicate Detection (UDD) is proposed which, for a given query, can effectively identify and eliminate duplicates among the query result records of multiple Web databases. The idea of this paper is to adjust the weights of record fields when calculating the similarity between records. Two classifiers, namely a weighted component similarity summing classifier and a support vector machine classifier, are employed iteratively within UDD to identify duplicates in the query results from multiple Web databases.
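The weighted summing of field similarities mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the use of `difflib.SequenceMatcher` as the per-field similarity measure, the fixed weights, and the 0.85 threshold are all assumptions chosen for illustration, and the iterative weight adjustment and the support vector machine step are omitted.

```python
from difflib import SequenceMatcher


def field_similarity(a: str, b: str) -> float:
    """Similarity of two field values in [0, 1] (assumed measure: difflib ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_similarity(r1: dict, r2: dict, weights: dict) -> float:
    """Weighted sum of per-field similarities, normalised by the total weight."""
    total = sum(weights.values())
    score = sum(w * field_similarity(r1[f], r2[f]) for f, w in weights.items())
    return score / total


def find_duplicates(records_a, records_b, weights, threshold=0.85):
    """Return index pairs (i, j) whose weighted similarity reaches the threshold."""
    return [
        (i, j)
        for i, ra in enumerate(records_a)
        for j, rb in enumerate(records_b)
        if record_similarity(ra, rb, weights) >= threshold
    ]


# Hypothetical query results from two Web databases.
db1 = [{"title": "Data Mining", "author": "J. Han"}]
db2 = [
    {"title": "Data Mining", "author": "J. Han"},
    {"title": "xyz", "author": "qqq"},
]
weights = {"title": 0.7, "author": 0.3}  # hypothetical field weights
duplicates = find_duplicates(db1, db2, weights)
```

In this sketch the weights are supplied by hand; in an unsupervised setting such as the one the abstract describes, they would instead be adjusted from the data rather than from user-provided training examples.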