To Investigate the Problem of Similarity Search on Dimension Incomplete Data

Journal: International Journal of Science and Research (IJSR) (Vol. 4, No. 3)

Publication Date:

Authors : ; ;

Page : 214-216

Keywords : Missing Dimensions; Similarity search; Whole sequence query; Probability triangle inequality; Temporal data;

Abstract

Similarity queries in multidimensional databases are a fundamental research problem with numerous applications in databases, data mining, and information retrieval. Existing work on querying incomplete data addresses the case where the data values on certain dimensions are unknown. We investigate similarity search on dimension incomplete data, where the dimension information itself is missing. Missing dimension information poses a great computational challenge, since all possible combinations of missing dimensions need to be examined when evaluating the similarity between the query and the data objects. We develop lower and upper bounds on the probability that a data object is similar to the query. These bounds enable efficient filtering of irrelevant data objects without explicitly examining all missing dimension combinations. A probability triangle inequality is also employed to further prune the search space and speed up query processing. The proposed probabilistic framework and techniques can be applied to both whole-sequence and subsequence queries. Extensive experimental results on real-life data sets demonstrate the effectiveness and efficiency of our approach.
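
To make the computational challenge concrete, the following is a minimal sketch of the naive enumeration that the bounds are meant to avoid. It is not the paper's method. It assumes Euclidean distance, an observed sequence whose values keep their original order, and a uniform prior over which query dimensions the observed values map to. The function name `alignment_upper_bound` and the parameter `radius` are hypothetical. Because the distance over the observed dimensions can only grow once the missing values are filled in, the fraction of alignments whose partial distance is within `radius` is an upper bound on the probability that the object is similar to the query.

```python
from itertools import combinations
from math import comb, sqrt


def alignment_upper_bound(query, observed, radius):
    """Naive upper bound on P(distance <= radius) for one data object.

    Assumptions (illustrative, not taken from the paper): Euclidean
    distance, order-preserving alignments, and every alignment equally
    likely.
    """
    m, n = len(query), len(observed)
    feasible = 0
    # Enumerate every way the n observed values could occupy n of the
    # m query dimensions while keeping their relative order.
    for dims in combinations(range(m), n):
        # Distance over observed dimensions only: a lower bound on the
        # full distance for this alignment, whatever the missing values.
        partial = sqrt(sum((query[d] - v) ** 2 for d, v in zip(dims, observed)))
        if partial <= radius:
            feasible += 1
    return feasible / comb(m, n)


if __name__ == "__main__":
    q = [1.0, 2.0, 3.0]
    x = [1.0, 3.0]  # one dimension lost, position unknown
    print(alignment_upper_bound(q, x, radius=0.5))
    # Number of alignments grows combinatorially with dimensionality.
    print(comb(20, 10), comb(40, 20))
```

An object whose upper bound already falls below the required confidence can be discarded. The cost of this sketch is one pass over all C(m, n) alignments per object, which is exactly the enumeration that cheaper bounds and triangle-inequality pruning aim to skip.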

Last modified: 2021-06-30 21:34:49