Record Matching in Web Databases Using Unsupervised Approach

Journal: International Journal of Scientific Engineering and Technology (IJSET) (Vol. 2, No. 9)

Pages: 895-899

Keywords: Record Matching; Unsupervised; UDD; Query Results

Abstract

Combining information from multiple heterogeneous databases (data integration) requires record matching: relating the records that appear in different databases, specifically, determining which sets of records refer to the same real-world entity. Record matching therefore addresses the duplicate detection problem, which makes it important to identify a suitable record matching technique. Most record matching methods are supervised and require the user to provide training data. These methods are not applicable to the Web database scenario, where the records to be matched are dynamically generated query results. To overcome this problem, a new record matching method named Unsupervised Duplicate Detection (UDD) is proposed which, for a given query, effectively identifies duplicates among the query result records of multiple Web databases and eliminates them from the dynamic query results. The key idea is to adjust the weights of record fields when calculating similarities between records. Two classifiers, a weighted component similarity summing classifier and a support vector machine (SVM) classifier, are employed iteratively within UDD to identify duplicates in the query results from multiple Web databases.
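The abstract describes UDD only at a high level, so the following Python sketch illustrates the general shape of the iterative loop under explicit assumptions; it is not the paper's implementation. Assumed here: records are dicts of string fields; per-field similarity is token Jaccard; records returned by the same source are treated as non-duplicates; the hypothetical `field_weights` formula, its `alpha` parameter, and the 0.8 threshold are illustrative choices; and a small dependency-free linear SVM (hinge loss with L2 regularization, trained by full-batch subgradient descent) stands in for a full SVM library.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity of two field values (an assumed field similarity)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def sim_vector(r1: dict, r2: dict, fields: list) -> list:
    """One similarity score per field for a pair of records."""
    return [jaccard(r1[f], r2[f]) for f in fields]


def dot(u: list, v: list) -> float:
    return sum(a * b for a, b in zip(u, v))


def normalize(xs: list) -> list:
    total = sum(xs)
    return [x / total for x in xs] if total > 0 else [1.0 / len(xs)] * len(xs)


def field_weights(dups: list, non_dups: list, n: int, alpha: float = 0.5) -> list:
    """Hypothetical weighting: a field gains weight if it agrees on known
    duplicates and disagrees on known non-duplicates. Weights sum to 1."""
    w_n = normalize([sum(1 - v[i] for v in non_dups) for i in range(n)])
    if not dups:
        return w_n
    w_d = normalize([sum(v[i] for v in dups) for i in range(n)])
    return [alpha * d + (1 - alpha) * m for d, m in zip(w_d, w_n)]


def train_linear_svm(pos: list, neg: list, lam: float = 0.01, steps: int = 500):
    """Linear SVM: minimize lam/2*||w||^2 + mean hinge loss with full-batch
    subgradient descent (step 1/(lam*t)). A constant 1.0 feature acts as the bias."""
    data = [(v + [1.0], 1) for v in pos] + [(v + [1.0], -1) for v in neg]
    w = [0.0] * len(data[0][0])
    for t in range(1, steps + 1):
        grad = [lam * wi for wi in w]
        for x, y in data:
            if y * dot(w, x) < 1:  # hinge loss is active for this example
                for i, xi in enumerate(x):
                    grad[i] -= y * xi / len(data)
        eta = 1.0 / (lam * t)
        w = [wi - eta * g for wi, g in zip(w, grad)]
    return lambda v: dot(w, v + [1.0]) > 0


def udd(sources: list, fields: list, threshold: float = 0.8, max_iter: int = 10) -> list:
    """Return (source_a, index_a, source_b, index_b) keys of pairs judged duplicates."""
    n = len(fields)
    # Assumption: records returned by the same source are not duplicates of each other.
    non_dups = [sim_vector(a, b, fields)
                for recs in sources for a, b in combinations(recs, 2)]
    # Cross-source pairs are the potential duplicates.
    candidates = {(s1, i, s2, j): sim_vector(a, b, fields)
                  for s1, s2 in combinations(range(len(sources)), 2)
                  for i, a in enumerate(sources[s1])
                  for j, b in enumerate(sources[s2])}
    dups = {}
    for _ in range(max_iter):
        # Step 1: weighted component similarity summing with the current weights.
        weights = field_weights(list(dups.values()), non_dups, n)
        new = {k: v for k, v in candidates.items()
               if k not in dups and dot(weights, v) >= threshold}
        dups.update(new)
        # Step 2: an SVM trained on duplicates vs. non-duplicates labels the rest.
        if dups and non_dups:
            is_dup = train_linear_svm(list(dups.values()), non_dups)
            found = {k: v for k, v in candidates.items() if k not in dups and is_dup(v)}
            dups.update(found)
            new.update(found)
        if not new:  # stop once an iteration finds no new duplicates
            break
    return sorted(dups)


source_a = [{"title": "Data Integration on the Web", "author": "Alice Smith"},
            {"title": "Support Vector Machines", "author": "Bob Jones"}]
source_b = [{"title": "data integration on the web", "author": "alice smith"},
            {"title": "Graph Databases Explained", "author": "Carol White"}]
print(udd([source_a, source_b], ["title", "author"]))  # [(0, 0, 1, 0)]
```

In this toy input only record 0 of the first source and record 0 of the second are flagged, because case-insensitive Jaccard makes them identical on both fields. The design choice worth noting is the feedback loop: each pass recomputes the field weights from the duplicates found so far, and the SVM step is intended to catch pairs whose weighted similarity falls below the fixed threshold.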

Last modified: 2013-09-03 20:30:59