Aspect extraction from scientific paper texts
Journal: Software & Systems (Vol. 35, No. 4)
Publication Date: 2022-12-16
Authors: Marshalova A.E.; Bruches E.P.; Batura T.V.
Pages: 698-706
Keywords: natural language processing; analysis of text information; information extraction from text; data processing; machine learning; neural network
Abstract
The paper focuses on the problem of automatic aspect extraction from the texts of Russian scientific papers. The problem is relevant because the number of scientific publications keeps growing, and with it the need to automatically extract and structure the key information they contain. The study involved creating a corpus of 291 abstracts of Russian scientific papers annotated with the following aspects: task, goal, contribution, method, tool, use, advantage, example, and conclusion. The paper provides a description and examples for each aspect. Annotating the corpus yielded 1494 aspects, 44% of which were of the contribution type. In addition, the paper proposes an algorithm for automatic aspect extraction that treats the problem as a sequence-labeling task; the algorithm is implemented with the BERT neural network. The authors conducted a number of experiments with vectors obtained from various language models and with freezing the model weights. The best result was shown by a multilingual model fine-tuned on the authors' data, that is, trained without freezing the weights. To further improve extraction quality, the authors developed several heuristics, which are listed in the paper, and additionally trained the model on new data obtained by automatic labeling followed by manual editing. The developed system can be useful to other researchers, as it simplifies selecting publications on a particular topic, reviewing methods for solving a particular problem, and analyzing results obtained in other works.
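Because the abstract frames aspect extraction as sequence labeling, it may help to see what the per-token labels could look like. The sketch below assumes a BIO tagging scheme over word tokens; the abstract does not name the scheme, and the span format, the lowercase aspect names, and the functions `spans_to_bio` and `bio_to_spans` are hypothetical illustrations, not the authors' code. It turns annotated aspect spans into one tag per token (the kind of target a BERT token classifier is trained to predict) and decodes a predicted tag sequence back into spans.

```python
from typing import List, Tuple

# The nine aspect types listed in the abstract (names lowercased here by assumption).
ASPECTS = {"task", "goal", "contribution", "method", "tool",
           "use", "advantage", "example", "conclusion"}

# Hypothetical span format: (start token index, end token index exclusive, aspect).
Span = Tuple[int, int, str]


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Convert aspect spans over a tokenized text into one BIO tag per token.

    Overlapping spans are not supported: a later span overwrites earlier tags.
    """
    tags = ["O"] * n_tokens
    for start, end, aspect in spans:
        if aspect not in ASPECTS:
            raise ValueError(f"unknown aspect: {aspect}")
        if not (0 <= start < end <= n_tokens):
            raise ValueError(f"bad span: {(start, end)}")
        tags[start] = f"B-{aspect}"          # first token of the aspect
        for i in range(start + 1, end):
            tags[i] = f"I-{aspect}"          # continuation tokens
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Recover aspect spans from a (possibly predicted) BIO tag sequence."""
    spans: List[Span] = []
    start, current = None, None
    for i, tag in enumerate(tags + ["O"]):   # sentinel "O" closes a trailing span
        if tag.startswith("I-") and current == tag[2:]:
            continue                         # the current span continues
        if current is not None:
            spans.append((start, i, current))
            start, current = None, None
        if tag.startswith("B-") or tag.startswith("I-"):
            # A stray I- tag with no matching open span is treated as a span start.
            start, current = i, tag[2:]
    return spans


if __name__ == "__main__":
    # Hypothetical example: "We use BERT to extract aspects" (6 tokens).
    tags = spans_to_bio(6, [(2, 3, "tool"), (3, 6, "goal")])
    print(tags)                 # ['O', 'O', 'B-tool', 'B-goal', 'I-goal', 'I-goal']
    print(bio_to_spans(tags))   # [(2, 3, 'tool'), (3, 6, 'goal')]
```

Alignment of word-level tags to BERT's subword tokens is not shown; one common convention is to give each word's tag to its first subword and ignore the rest when computing the loss.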
Last modified: 2023-08-03 19:07:20