RuLegalNER: a new dataset for Russian legal named entities recognition
Journal: Scientific and Technical Journal of Information Technologies, Mechanics and Optics (Vol. 23, No. 4)
Publication Date: 2023-08-21
Authors: Shaheen Z.; Mouromtsev D.I.; Postny I.
Pages: 854-857
Keywords: legal named entity recognition; natural language processing; information extraction; low-resource languages; transfer learning; transformers
Abstract
We address the scarcity of datasets tailored to legal named entity recognition (NER) in Russian and investigate how well models generalize to unseen named entities. A rule-based program developed by legal experts at Tag-Consulting Company was used to annotate legal texts automatically and create the RuLegalNER dataset. Some named entities occur only in the development and test splits and are absent from the training set. RuBERT served as the base architecture for the experimental evaluation, with two architectural extensions explored: RuBERT with a conditional random field (CRF) layer and RuBERT with adapters. Both architectures were used to train and evaluate NER models on RuLegalNER. The dataset can be used to train and evaluate legal NER models, improving performance in the legal domain and supporting the study of generalization to unseen entities. We present a published version of RuLegalNER with detailed statistics and demonstrate its usefulness by evaluating modern architectures.
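The notion of entities that appear only in the development and test splits can be illustrated with a small sketch. It assumes the data is available as BIO-tagged token sequences, which the abstract does not state; the helper names `extract_entities` and `unseen_entities`, the `ORG` label, and the toy sentences are all hypothetical and are not taken from RuLegalNER.

```python
from typing import List, Optional, Set, Tuple

# Assumed representation: a sentence is a list of (token, BIO tag) pairs.
Sentence = List[Tuple[str, str]]
Entity = Tuple[str, str]  # (surface form, entity type)


def extract_entities(sentence: Sentence) -> List[Entity]:
    """Collect (surface form, entity type) pairs from one BIO-tagged sentence."""
    entities: List[Entity] = []
    tokens: List[str] = []
    etype: Optional[str] = None
    for token, tag in sentence:
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != etype
        )
        if starts_new:
            # Close the entity in progress, then open a new one.
            if tokens:
                entities.append((" ".join(tokens), etype))
            tokens, etype = [token], tag[2:]
        elif tag.startswith("I-"):
            # Continuation of the current entity.
            tokens.append(token)
        else:
            # An "O" tag closes any entity in progress.
            if tokens:
                entities.append((" ".join(tokens), etype))
            tokens, etype = [], None
    if tokens:
        entities.append((" ".join(tokens), etype))
    return entities


def unseen_entities(train: List[Sentence], evaluation: List[Sentence]) -> Set[Entity]:
    """Return entities found in the evaluation split but never in training."""
    seen = {e for s in train for e in extract_entities(s)}
    found = {e for s in evaluation for e in extract_entities(s)}
    return found - seen


# Toy data, invented for illustration only.
train_split = [[("Acme", "B-ORG"), ("LLC", "I-ORG"), ("filed", "O"), ("suit", "O")]]
test_split = [[("Acme", "B-ORG"), ("LLC", "I-ORG"), ("and", "O"),
               ("Vector", "B-ORG"), ("JSC", "I-ORG")]]
unseen = unseen_entities(train_split, test_split)
```

Comparing exact surface forms is the simplest possible definition of "unseen"; the paper may use a different criterion, such as unseen entity types.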
Last modified: 2023-12-20 19:05:01