Text augmentation preserving persona speech style and vocabulary
Journal: Scientific and Technical Journal of Information Technologies, Mechanics and Optics (Vol. 23, No. 4)
Publication Date: 2023-08-21
Authors : Matveeva A.A.; Makhnytkina O.V.;
Page : 743-749
Keywords : text data augmentation; emotion recognition; statement valence evaluation;
Abstract
Many natural language processing tasks require large data sets. However, collecting large data sets is often tedious and expensive, and requires the involvement of experts. The amount of data can be increased with data augmentation methods, but classical approaches can add phrases to the corpus that differ from the speech style and vocabulary of the target person. This can change the target class and can produce utterances with unnatural vocabulary use and a lack of meaning. In this article, a new method for expanding text data that preserves a person's individual language features and vocabulary is proposed. The core of the method is to create individual templates for each person based on the analysis of the syntactic trees of sentences, and then to create new utterances according to the generated templates. The method was tested on the task of assessing the user's emotional state in a dialogue. The study was carried out on data sets in English and Russian. The proposed method improved the quality of solving this task for both languages: an increase of up to 2 % in accuracy and weighted F1 was observed for various models. The results can be applied to improve the accuracy and weighted F1 of models designed to solve various tasks for the English and Russian languages.
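The template idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract derives templates from syntactic trees, whereas this sketch substitutes a flat sequence of part-of-speech tags that is hard-coded rather than produced by a parser. The names `SAMPLES`, `SLOT_TAGS`, `build_templates` and `augment` are hypothetical, and grouping by (speaker, emotion label) is an assumption made here so that generated text keeps its class.

```python
import random
from collections import defaultdict

# Assumption: only content words become replaceable slots.
SLOT_TAGS = {"NOUN", "VERB", "ADJ"}

# Hypothetical toy corpus: (speaker, emotion label, POS-tagged tokens).
# A real pipeline would obtain the tags from a syntactic parser.
SAMPLES = [
    ("alice", "joy", [("I", "PRON"), ("love", "VERB"), ("this", "DET"), ("movie", "NOUN")]),
    ("alice", "joy", [("I", "PRON"), ("adore", "VERB"), ("that", "DET"), ("song", "NOUN")]),
    ("bob", "anger", [("what", "DET"), ("a", "DET"), ("stupid", "ADJ"), ("idea", "NOUN")]),
]


def build_templates(samples):
    """Collect templates and slot vocabularies per (speaker, label) group."""
    templates = defaultdict(list)
    vocab = defaultdict(lambda: defaultdict(set))
    for speaker, label, tokens in samples:
        key = (speaker, label)
        template = []
        for word, tag in tokens:
            if tag in SLOT_TAGS:
                template.append((None, tag))   # open slot, filled later
                vocab[key][tag].add(word)      # remember the speaker's own word
            else:
                template.append((word, tag))   # function words stay fixed
        templates[key].append(template)
    return templates, vocab


def augment(samples, n_per_group, seed=0):
    """Generate n_per_group new utterances for every (speaker, label) group."""
    rng = random.Random(seed)
    templates, vocab = build_templates(samples)
    generated = []
    for key in sorted(templates):
        speaker, label = key
        for _ in range(n_per_group):
            template = rng.choice(templates[key])
            words = [
                word if word is not None else rng.choice(sorted(vocab[key][tag]))
                for word, tag in template
            ]
            generated.append((speaker, label, " ".join(words)))
    return generated


if __name__ == "__main__":
    for row in augment(SAMPLES, n_per_group=3):
        print(row)
```

Because slots are filled only from words the same speaker used under the same label, every generated utterance stays inside that speaker's vocabulary. A tree-based template, as in the abstract, would additionally constrain which words can combine.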
Last modified: 2023-12-20 18:46:40