Semantic Similarity Measure Using Information Content Approach With Depth For Similarity Calculation
Journal: International Journal of Scientific & Technology Research (Vol.3, No. 2)
Publication Date: 2014-02-15
Authors : Atul Gupta; Dharamveer kr. Yadav;
Page : 165-169
Abstract : Similarity is a criterion for measuring the nearness or proximity between two concepts. Several algorithmic approaches for computing similarity have been proposed. The majority of existing similarity measures use WordNet as the underlying ontology for calculating semantic similarity. WordNet is a lexical database for the English language that was created and is maintained by the Cognitive Science Laboratory at Princeton University under the supervision of Professor George A. Miller. It is organized as a network consisting of concepts or terms called synsets (lists of synonymous terms) and the relationships between them. Different types of relationships exist in WordNet, such as is-a, part-of, synonym and antonym. It has three databases: one for nouns, one for verbs, and one for adverbs and adjectives. This project work proposes a metric for calculating semantic relatedness between a pair of concepts. The metric uses Tversky's feature-based approach, which takes into account the common and distinct features of the two terms or concepts. If the commonality outweighs the differences, the similarity between the concepts is high; otherwise it is low. Tversky's theory is quantified by the information content of the two concepts and the information content of their most specific common ancestor. Moving down the WordNet hierarchy leads to more specific and more informative concepts, whereas moving up leads to more generalized and less informative ones. The depth of a concept in the WordNet hierarchy is therefore a critical factor in similarity calculation. We take into consideration the depth of each concept in the WordNet hierarchy, which is the deciding factor for determining the relevance of the distinct features specific to that concept. Introducing depth reduces the impact of less relevant dissimilarities on the similarity calculation, thereby increasing precision.
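The abstract does not state the proposed formula, so the following is only a minimal sketch of how a Tversky-style ratio could be quantified with information content (IC). The function name, the parameters, and in particular the depth weighting (depth divided by a maximum hierarchy depth) are assumptions made for illustration and are not taken from the paper. The IC and depth values would normally come from WordNet and a corpus; here they are plain numbers supplied by the caller.

```python
def tversky_ic_similarity(ic_c1, ic_c2, ic_lcs, depth_c1, depth_c2, max_depth):
    """Tversky-style ratio similarity with IC-quantified features.

    ic_c1, ic_c2 : information content of the two concepts
    ic_lcs       : information content of their most specific common ancestor
    depth_c1/2   : depth of each concept in the hierarchy
    max_depth    : maximum depth of the hierarchy (must be > 0)

    The depth weighting below is a hypothetical choice, not the paper's formula.
    """
    if max_depth <= 0:
        raise ValueError("max_depth must be positive")

    # Common features: information shared through the common ancestor.
    common = ic_lcs

    # Distinct features: information each concept carries beyond the ancestor.
    distinct_c1 = max(ic_c1 - ic_lcs, 0.0)
    distinct_c2 = max(ic_c2 - ic_lcs, 0.0)

    # Assumed weighting: distinct features of deeper (more specific) concepts
    # count more; those of shallow concepts are damped.
    w1 = depth_c1 / max_depth
    w2 = depth_c2 / max_depth

    denominator = common + w1 * distinct_c1 + w2 * distinct_c2
    if denominator == 0:
        # No shared and no weighted distinct information to compare.
        return 0.0
    return common / denominator
```

Under these assumptions, two identical concepts score 1, and two concepts whose only common ancestor carries no information score 0. When both weights equal 0.5, the ratio reduces to Lin's measure, 2·IC(lcs) / (IC(c1) + IC(c2)).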
We carried out our experiment on the 28 word pairs common to the Rubenstein-Goodenough and Miller-Charles sets. These word pairs range from low similarity through intermediate similarity to high similarity. Evaluation is done by comparing the similarity values calculated using the proposed measure with the human ratings. We use the Pure Java Wordnet Similarity Library to implement our proposed metric. Experimental results show that the proposed metric is on par with the existing similarity measures and superior to some of the traditional ones.
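The abstract does not say which statistic is used to compare computed values with human ratings. A common choice for this kind of benchmark is a correlation coefficient, so the sketch below assumes Pearson correlation. The function name and inputs are illustrative only, and no values from the paper are reproduced.

```python
import math


def pearson_correlation(computed, human):
    """Pearson correlation between computed similarities and human ratings.

    Both arguments are equal-length sequences of numbers, one entry per
    word pair, in the same order.
    """
    if len(computed) != len(human):
        raise ValueError("sequences must have the same length")
    n = len(computed)
    if n < 2:
        raise ValueError("at least two word pairs are required")

    mean_c = sum(computed) / n
    mean_h = sum(human) / n

    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((c - mean_c) * (h - mean_h) for c, h in zip(computed, human))
    var_c = sum((c - mean_c) ** 2 for c in computed)
    var_h = sum((h - mean_h) ** 2 for h in human)

    if var_c == 0 or var_h == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_c * var_h)
```

A value close to 1 would indicate that the measure ranks and spaces the word pairs much as the human judges did.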
Last modified: 2015-06-28 03:51:43