An Different Similarity Measures with N-Grams For Text Documents Comparison
Journal: International Journal of Computer Science and Mobile Computing - IJCSMC (Vol. 7, No. 11)
Publication Date: 2018-11-30
Authors: R.Anushya; A.Finny Belwin; A.Linda Sherin; Antony Selvadoss Thanamani
Pages: 195-203
Keywords: Document Similarity; N-Grams set; Cosine Similarity; Jaccard Similarity; Euclidean distance
Abstract
Data analysis is an emerging field in both research and business. Huge numbers of documents are available as unstructured, semi-structured, and structured data. Estimating the similarity between texts is a critical task for several applications. Many algorithms have been proposed in the text processing field that calculate text similarity from the distance between documents. This increased attention has led to many techniques for measuring semantic document similarity. With a document similarity application, teachers or other users can easily search for documents containing specific terminology. This paper proposes document similarity calculations based on cosine similarity, Jaccard similarity, and Euclidean distance, each combined with n-grams. The similarity metric between documents can be defined in several ways, depending on how the documents are represented. The experiments compare the three similarity measures and evaluate which performs best.
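The three measures named in the abstract can be illustrated with a minimal Python sketch. The abstract does not say whether the n-grams are word-level or character-level, nor how the documents are weighted, so the word bigrams and raw counts used here are assumptions, and the function names (`word_ngrams`, `jaccard_similarity`, `cosine_similarity`, `euclidean_distance`) are hypothetical rather than taken from the paper.

```python
import math
from collections import Counter


def word_ngrams(text, n=2):
    """Split text into lowercase word n-grams (assumed representation)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def jaccard_similarity(grams_a, grams_b):
    """Size of the shared n-gram set divided by the size of the combined set."""
    set_a, set_b = set(grams_a), set(grams_b)
    union = set_a | set_b
    if not union:
        return 0.0  # convention chosen here for two empty documents
    return len(set_a & set_b) / len(union)


def cosine_similarity(grams_a, grams_b):
    """Cosine of the angle between the two n-gram count vectors."""
    counts_a, counts_b = Counter(grams_a), Counter(grams_b)
    dot = sum(counts_a[g] * counts_b[g] for g in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_distance(grams_a, grams_b):
    """Straight-line distance between the two n-gram count vectors."""
    counts_a, counts_b = Counter(grams_a), Counter(grams_b)
    all_grams = counts_a.keys() | counts_b.keys()
    # A Counter returns 0 for n-grams that a document does not contain.
    return math.sqrt(sum((counts_a[g] - counts_b[g]) ** 2 for g in all_grams))


doc_a = word_ngrams("the cat sat on the mat")
doc_b = word_ngrams("the cat sat on the rug")
print(jaccard_similarity(doc_a, doc_b))
print(cosine_similarity(doc_a, doc_b))
print(euclidean_distance(doc_a, doc_b))
```

For these two sample sentences, four of the six distinct bigrams are shared, so the Jaccard similarity is 4/6, the cosine similarity is about 0.8, and the Euclidean distance is about 1.414. Note that the first two are similarities (higher means more alike) while the third is a distance (lower means more alike), so the scores are not directly comparable without a conversion.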
Last modified: 2018-12-19 15:34:30