Comparison of Text Classification Models and Methods
Journal: RUDN Journal of Engineering Researches (Vol. 26, No. 3)
Publication Date: 2025-11-12
Authors: Angelina Zakharova; Alina Vishnyakova; Alexander Detkov
Pages: 298-309
Keywords: natural language processing; NLP; text preprocessing; text representation; machine learning; neural networks
Abstract
The study considers the process of automatic text classification and its components. The topic is relevant because of the rapid growth of textual data and the continued development of machine learning technologies. The purpose of the study is to determine the best methods and models for automatic text classification. Scientific articles published over the past four years and most relevant to the topic were selected as the material for analysis. The analysis showed that effective preprocessing of text data should consist of normalization, tokenization, stop-word removal, and stemming or lemmatization. The BERT model is recommended for text representation, although the conditions of a specific task may make alternative approaches preferable. The most effective methods for the classification step itself are logistic regression, convolutional neural networks, and RoBERTa. The choice of a particular model is determined by the intended application and the technological capabilities available.
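As a hedged illustration only, the sketch below strings together the preprocessing steps named in the abstract (normalization, tokenization, stop-word removal, stemming) in front of one of the recommended classifiers, logistic regression. It is not the authors' implementation: the sample texts, labels, and the use of TF-IDF features in place of BERT embeddings are assumptions made to keep the example self-contained.

```python
# Minimal sketch of the pipeline described in the abstract, not the study's code.
# Preprocessing: normalization, tokenization, stop-word removal, stemming.
# Classification: TF-IDF features + logistic regression (one of the methods named).
import re

from nltk.stem import PorterStemmer  # rule-based stemmer, no corpus download needed
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

stemmer = PorterStemmer()

def preprocess(text: str) -> str:
    """Normalize case, tokenize, drop stop words, and stem each token."""
    tokens = re.findall(r"[a-z]+", text.lower())                  # normalization + tokenization
    tokens = [t for t in tokens if t not in ENGLISH_STOP_WORDS]   # stop-word removal
    return " ".join(stemmer.stem(t) for t in tokens)              # stemming

# Placeholder training data: any labeled text corpus could be substituted here.
texts = ["The striker scored a late goal.", "The new chip doubles inference speed."]
labels = ["sports", "technology"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit([preprocess(t) for t in texts], labels)

print(model.predict([preprocess("The team won the championship game.")]))
```

In the setup the study recommends, the TF-IDF step would be replaced by BERT (or RoBERTa) text representations, with the classifier trained on those embeddings instead.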