
ALGORITHM OF AUTOMATIC SEARCH FOR NON-STANDARD VOCABULARY UNITS WHEN CREATING A COMPREHENSIVE DICTIONARY

Journal: Current Issues in Philology and Pedagogical Linguistics (Vol.-, No. 2)

Publication Date:

Authors : ;

Page : 131-142

Keywords : morphological analyzer; pymorphy2; computer lexicography; natural language processing; comprehensive dictionary; linguagraphy; word form; dictionary entry; heading unit;

Source : Download | Find it from : Google Scholar

Abstract

The article discusses the experience of developing and using an automatic tool for optimizing linguagraphic work on the creation of comprehensive dictionaries. Despite the high level of automatic processing of linguistic information in modern lexicography, a number of issues remain unresolved. The main problem in creating comprehensive lexicographic sources is the combination of different dictionaries, since their heading units may appear in different forms while referring to the same lexeme; lexicographers spend a great deal of time on the matching procedure, and this material has to be processed manually. The aim of the study was to solve the problem of identifying non-standard words by using a morphological analyzer. The program developed by the authors is designed to automatically select non-standard words from a list of heading units, which significantly reduces the chance of errors and the time spent on creating a comprehensive dictionary, and also minimizes the need to process and interpret units manually. The development was carried out in Python 3.8.2 using version 0.9.1 of the pymorphy2 morphological analyzer library. The algorithm and program developed by the authors can be applied to any list of words from which non-initial word forms need to be selected automatically. The program was tested on a list of 22,738 words from the Comprehensive Etymological Dictionary; 979 non-standard units were identified among them. The average processing time for this number of words was 1.5 seconds, which demonstrates the effectiveness of the created algorithm and the expediency of its further use in lexicographic practice.
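
The abstract does not reproduce the authors' source code, but the described approach can be illustrated with a minimal Python sketch: pymorphy2 lemmatizes each heading unit, and any word whose most probable normal form differs from the word itself is treated as a non-initial (non-standard) form. The function name find_non_standard and the sample word list below are illustrative assumptions, not the authors' actual implementation.

# Minimal sketch of the approach described in the abstract; names are illustrative.
import pymorphy2  # morphological analyzer library, version 0.9.1 per the abstract

morph = pymorphy2.MorphAnalyzer()

def find_non_standard(words):
    """Return heading units that are not in their initial (dictionary) form."""
    non_standard = []
    for word in words:
        parse = morph.parse(word)[0]           # most probable morphological analysis
        if parse.normal_form != word.lower():  # word form differs from its lemma
            non_standard.append(word)
    return non_standard

# Example: the inflected form "столами" is flagged, the lemma "стол" is not.
print(find_non_standard(["стол", "столами", "дом"]))

In a real workflow, the word list would be read from the dictionary's list of heading units, and the flagged forms would then be reviewed by lexicographers instead of checking the entire list manually.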

Last modified: 2022-06-27 18:26:40