T5 language models for text simplification
Journal: Software & Systems (Vol. 36, No. 2)
Publication Date: 2023-06-16
Authors : Vasiliev D.D.; Pyataeva A.V.;
Page : 228-236
Keywords : T5 model; deep learning; text simplification; natural language processing;
Abstract
The readability of Russian-language text matters for people with cognitive impairments and for people with limited language skills, such as labor migrants or children. Texts such as instructions, directions, and recommendations surround us in everyday life. Automated text simplification can make these texts more accessible to these groups of citizens. This study uses deep neural networks with the transformer architecture as the automated simplification algorithm. The following language models were applied: ruT5-base-absum, ruT5-base-paraphraser, ruT5_base_sum_gazeta, and ruT5-base. The experiments used two datasets: one from the Institute of Philology and Language Communication and one from an open GitHub repository. The models were evaluated with the following metrics: BLEU, the Flesch readability index, the Automated Readability Index, and Sentence Length Difference. Statistical indicators were then computed from these metrics on a test dataset and served as the basis for comparing algorithms trained with different parameters. The authors carried out several experiments with these models, varying the learning rate for each dataset and the batch size, and excluding the additional dataset from training. Despite the differing metric values, the models' outputs did not differ much from one another on manual comparison. The experimental results show the need to enlarge the training dataset, as well as to change the training parameters or to use other algorithms. This study is a first step toward a decision support system for automatic text simplification and requires further development.
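As an illustration of two of the metrics named in the abstract, the following minimal Python sketch computes an Automated Readability Index and a sentence length difference. It is not the authors' implementation. The coefficients are those of the standard English-language formula, and the abstract does not say whether a Russian-adapted variant was used. The definition of sentence length difference as a word-count difference is also an assumption, and the function names and the naive regex tokenization are hypothetical.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def words(text: str) -> list[str]:
    # \w+ matches Unicode letters and digits, so Cyrillic words are handled too.
    return re.findall(r"\w+", text)


def automated_readability_index(text: str) -> float:
    # Standard English-language ARI coefficients (assumption: the paper may
    # use a variant adapted to Russian).
    w = words(text)
    s = split_sentences(text)
    if not w or not s:
        raise ValueError("text must contain at least one word")
    chars = sum(len(token) for token in w)
    return 4.71 * chars / len(w) + 0.5 * len(w) / len(s) - 21.43


def sentence_length_difference(source: str, simplified: str) -> int:
    # Assumed definition: word count of the source minus that of the simplification.
    return len(words(source)) - len(words(simplified))
```

A positive sentence length difference means the simplified text is shorter than the source. A lower readability index after simplification suggests shorter words and sentences, though neither metric checks whether the meaning was preserved.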