ALGORITHM OF AUTOMATIC SEARCH FOR NON-STANDARD VOCABULARY UNITS WHEN CREATING A COMPREHENSIVE DICTIONARY
Journal: Current Issues in Philology and Pedagogical Linguistics (Vol.-, No. 2)
Publication Date: 2022-06-25
Authors : Gorobets E.A.; Mamontova A.V.
Page : 131-142
Keywords : morphological analyzer; pymorphy2; computer lexicography; natural language processing; comprehensive dictionary; linguagraphy; word form; dictionary entry; heading unit;
Abstract
The article discusses the experience of developing and using an automatic tool for optimizing linguagraphic work on the creation of comprehensive dictionaries. Despite the high level of automatic processing of linguistic information in modern lexicography, a number of issues remain unresolved. The main problem in creating comprehensive lexicographic sources is combining different dictionaries: heading units may appear in them in different forms while referring to the same lexeme, so lexicographers spend a lot of time on the matching procedure and have to process this material manually. The aim of the study was to solve the problem of identifying non-standard words by using a morphological analyzer. The program developed by the authors automatically selects non-standard words from a list of heading units, which can significantly reduce the chance of errors and the time spent on creating a comprehensive dictionary, and minimize the need to process and interpret units manually. The development was carried out in Python 3.8.2 using version 0.9.1 of the pymorphy2 morphological analyzer library. The algorithm and program can be applied to any list of words from which non-initial word forms need to be selected automatically. The program was tested on a list of 22738 words from the Comprehensive etymological dictionary; 979 non-standard units were identified among them. The average processing time for this number of words was 1.5 seconds, which demonstrates the effectiveness of the algorithm and the expediency of its further use in lexicographic practice.
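The abstract does not give the authors' code, so the following is only a minimal sketch of how non-initial word forms might be selected with pymorphy2. It assumes one simple criterion that the article does not confirm: a heading unit is flagged when none of the analyzer's candidate normal forms equals the word itself. The names `Lemmatizer`, `make_pymorphy2_lemmatizer` and `find_non_initial_forms` are hypothetical and introduced here for illustration; only `pymorphy2.MorphAnalyzer`, its `parse` method and the `normal_form` attribute come from the library.

```python
from typing import Callable, Iterable, List, Set

# Hypothetical type: maps a word to the set of its candidate initial forms.
Lemmatizer = Callable[[str], Set[str]]


def make_pymorphy2_lemmatizer() -> Lemmatizer:
    """Build a lemmatizer backed by pymorphy2 (imported lazily)."""
    import pymorphy2  # third-party library: pip install pymorphy2

    morph = pymorphy2.MorphAnalyzer()

    def lemmas(word: str) -> Set[str]:
        # parse() returns all candidate analyses; each one has a normal_form.
        return {p.normal_form for p in morph.parse(word)}

    return lemmas


def find_non_initial_forms(headwords: Iterable[str],
                           lemmas: Lemmatizer) -> List[str]:
    """Return heading units that match none of their candidate normal forms.

    Assumption (not taken from the article): a word counts as an initial
    form if at least one analysis yields the word itself as normal form.
    """
    flagged = []
    for word in headwords:
        key = word.strip().lower()  # pymorphy2 normal forms are lowercase
        if key and key not in lemmas(key):
            flagged.append(word)
    return flagged
```

With pymorphy2 installed, a list of heading units could be checked with `find_non_initial_forms(words, make_pymorphy2_lemmatizer())`. Passing the lemmatizer as a parameter keeps the selection logic independent of the analyzer, so a different morphological tool or a stricter criterion can be substituted without changing the loop.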
Last modified: 2022-06-27 18:26:40