Analysis of methods of determining the term weight at textual documents
Journal: Scientific review, Науковий огляд, Научное обозрение (Vol. 3, No. 46)
Publication Date: 2018-07-03
Authors : Oliynik Yuri; Katiushchenko Daria;
Page : 112-123
Keywords : data mining; classification of textual information; content analysis; machine learning; classification algorithms;
Abstract
This work is devoted to the development of methods for determining term weights in a document during automatic classification of textual information. The influence of reducing the dimensionality of a document's term set on the performance of a vector classifier is considered. The methods considered are TF-IDF, TF-SLF, pointwise mutual information, and conditional random fields. The purpose of this work is to improve the quality of text classification by selecting an appropriate method for determining term weights and combining it with the method used to train the classifier. A comparative analysis of the methods was performed on precision, recall, and F-measure. The considered methods form part of the solution to problems such as determining the thematic category of texts, identifying the author of a document, determining the emotional tone of a document, and spam filtering.
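As an illustration of two notions named in the abstract, the following minimal Python sketch computes TF-IDF term weights and the precision, recall, and F-measure of a classifier. It is not the authors' implementation: the abstract does not state which TF-IDF variant was used, so the sketch assumes the common textbook form (relative term frequency multiplied by the natural logarithm of the inverse document frequency). The function names `tf_idf` and `precision_recall_f1` are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length,
    idf = ln(number of documents / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def precision_recall_f1(true_labels, predicted_labels, positive):
    """Precision, recall, and F-measure (F1) for one class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy corpus, made up purely for illustration.
docs = [
    ["spam", "offer", "offer"],
    ["meeting", "offer"],
    ["meeting", "agenda"],
]
weights = tf_idf(docs)
```

Under this variant, a term that occurs in every document receives weight zero, and a term confined to few documents is weighted more heavily than a more frequent term that is spread across the corpus.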