An Different Similarity Measures with N-Grams For Text Documents Comparison

Journal: International Journal of Computer Science and Mobile Computing - IJCSMC (Vol.7, No. 11)

Publication Date:

Authors : ; ; ; ;

Page : 195-203

Keywords : Document Similarity; N-Grams set; Cosine Similarity; Jaccard Similarity; Euclidean distance;

Abstract

Data analysis is an emerging field in both research and business. Huge numbers of documents are available as unstructured, semi-structured, and structured data. Estimating the similarity between texts is a critical task for several applications. Many similarity algorithms have been proposed in the text processing field for calculating text similarity based on the distance between documents. This increased attention has led to many techniques for measuring semantic document similarity. With a document similarity application, teachers or other users can easily search for documents containing specific terminology. This paper proposes document similarity calculations based on cosine similarity, Jaccard similarity, and Euclidean distance, each combined with n-grams. The similarity metric between documents can be defined in several ways, depending on how the documents are represented. The experimental results compare these three similarity measures and evaluate which one performs best.
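
The three measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say whether the n-grams are word-level or character-level, how the text is tokenized, or how n-grams are weighted, so the word-level bigrams, lowercase whitespace tokenization, and raw counts used here are assumptions, and all function names are hypothetical.

```python
import math
from collections import Counter


def word_ngrams(text, n=2):
    """Return the word-level n-grams of `text` as a list of tuples.

    Assumption: lowercase, whitespace-split tokens; the abstract does not
    specify the tokenization or the value of n.
    """
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def jaccard_similarity(grams_a, grams_b):
    """Size of the intersection over size of the union of the n-gram sets."""
    set_a, set_b = set(grams_a), set(grams_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def cosine_similarity(grams_a, grams_b):
    """Cosine of the angle between the two n-gram count vectors."""
    counts_a, counts_b = Counter(grams_a), Counter(grams_b)
    dot = sum(counts_a[g] * counts_b[g] for g in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def euclidean_distance(grams_a, grams_b):
    """Euclidean distance between the two n-gram count vectors.

    Unlike the two similarities above, a smaller value means more similar.
    """
    counts_a, counts_b = Counter(grams_a), Counter(grams_b)
    all_grams = counts_a.keys() | counts_b.keys()
    return math.sqrt(sum((counts_a[g] - counts_b[g]) ** 2 for g in all_grams))


# Hypothetical example documents, not taken from the paper.
grams_a = word_ngrams("the cat sat on the mat")
grams_b = word_ngrams("the cat sat on the sofa")

print(jaccard_similarity(grams_a, grams_b))
print(cosine_similarity(grams_a, grams_b))
print(euclidean_distance(grams_a, grams_b))
```

For these two example sentences, four of the six distinct bigrams are shared, so the Jaccard similarity is about 0.67, the cosine similarity about 0.80, and the Euclidean distance about 1.41. Note that Jaccard works on sets of n-grams, while the other two work on count vectors, which is one way the choice of document representation changes the metric.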

Last modified: 2018-12-19 15:34:30