Comparison of Text Classification Models and Methods
Journal: RUDN Journal of Engineering Researches (Vol. 26, No. 3)
Publication Date: 2025-11-12
Authors: Angelina Zakharova; Alina Vishnyakova; Alexander Detkov
Pages: 298-309
Keywords: natural language processing; NLP; text preprocessing; text representation; machine learning; neural networks
Abstract
The study considers the process of automatic text classification and its components. The topic is relevant because of the rapid growth of textual data and the continued development of machine learning technologies. The purpose of the study is to determine the best methods and models for automatic text classification. Scientific articles published over the past four years and most relevant to the topic were selected as the material for analysis. The analysis showed that effective preprocessing of text data should consist of normalization, tokenization, stop-word removal, and stemming or lemmatization. The BERT model is recommended for text representation, although the conditions of a specific task may make alternative approaches preferable. The most effective methods for the classification step itself are logistic regression, convolutional neural networks, and RoBERTa. The choice of a particular model is determined by the intended application and the technological capabilities available.
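As a hedged illustration only, the sketch below strings together the preprocessing steps named in the abstract (normalization, tokenization, stop-word removal, stemming) in front of one of the recommended classifiers, logistic regression. It is not the authors' implementation: the sample texts, labels, and the use of TF-IDF features in place of BERT embeddings are assumptions made to keep the example self-contained.

```python
# Minimal sketch of the pipeline described in the abstract, not the study's code.
# Preprocessing: normalization, tokenization, stop-word removal, stemming.
# Classification: TF-IDF features + logistic regression (one of the methods named).
import re

from nltk.stem import PorterStemmer  # rule-based stemmer, no corpus download needed
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

stemmer = PorterStemmer()

def preprocess(text: str) -> str:
    """Normalize case, tokenize, drop stop words, and stem each token."""
    tokens = re.findall(r"[a-z]+", text.lower())                  # normalization + tokenization
    tokens = [t for t in tokens if t not in ENGLISH_STOP_WORDS]   # stop-word removal
    return " ".join(stemmer.stem(t) for t in tokens)              # stemming

# Placeholder training data: any labeled text corpus could be substituted here.
texts = ["The striker scored a late goal.", "The new chip doubles inference speed."]
labels = ["sports", "technology"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit([preprocess(t) for t in texts], labels)

print(model.predict([preprocess("The team won the championship game.")]))
```

In the setup the study recommends, the TF-IDF step would be replaced by BERT (or RoBERTa) text representations, with the classifier trained on those embeddings instead.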