Identification and extraction of multiword expressions from Hindi & Urdu language in natural language processing
Journal: International Journal of Advanced Technology and Engineering Exploration (IJATEE) (Vol. 9, No. 91)
Publication Date: 2022-06-30
Authors : Vaishali Gupta; Nisheeth Joshi
Page : 807-826
Keywords : Bigrams; Tags; Multiword expression (MWE); Conditional random field (CRF); Confusion matrix
Abstract
Statistical machine translation can translate text from one language to another, but gaps remain in the translations because language resources are scarce. Building a linguistic corpus requires the extraction of multiword expressions (MWEs). An MWE is a group of words that together behave as an idiomatic expression. Because the meaning of an MWE is non-compositional, that is, it cannot be derived from the meanings of its individual words, identifying and extracting MWEs by hand is time-consuming. An automated system has therefore been developed to extract MWEs from Hindi and Urdu sources. The process consists of tagging, pattern matching, an identification algorithm, and the extraction of MWEs from the data. Each word is first tagged with a part-of-speech tag, and the tagged text serves as input to the pattern-matching algorithm. Pattern matching selects tag sequences that follow specific patterns, and the algorithm for automatic MWE detection is built on top of these patterns. The conditional random field (CRF++) model was used to extract the MWEs from the data automatically. A confusion matrix was used to evaluate the proposed system automatically. The overall accuracy is 96.82% for Hindi and 96.62% for Urdu.
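The tagging and pattern-matching steps described in the abstract, and the confusion-matrix accuracy used for evaluation, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the tag names, the pattern set `CANDIDATE_PATTERNS`, the romanized example tokens, and the function names are all assumptions made for illustration, since the abstract does not list the actual patterns or tagset. The CRF++ training step is not shown.

```python
from typing import List, Tuple

TaggedWord = Tuple[str, str]  # (word, part-of-speech tag)

# Hypothetical tag-bigram patterns that mark MWE candidates.
# The abstract does not give the real pattern set; these are placeholders.
CANDIDATE_PATTERNS = {("NN", "NN"), ("JJ", "NN"), ("NN", "VM")}


def candidate_bigrams(tagged: List[TaggedWord]) -> List[Tuple[str, str]]:
    """Return adjacent word pairs whose tag pair matches a candidate pattern."""
    candidates = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if (t1, t2) in CANDIDATE_PATTERNS:
            candidates.append((w1, w2))
    return candidates


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Overall accuracy from the four cells of a binary confusion matrix."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


# Romanized, hand-tagged example sentence (illustrative only).
sample = [("raam", "NNP"), ("ne", "PSP"), ("kaam", "NN"), ("kiya", "VM")]
mwe_candidates = candidate_bigrams(sample)
```

In a pipeline of this shape, the candidate bigrams would be labeled and passed to a sequence model such as CRF++ for training, and the model's predictions would then be compared against gold labels to fill the confusion matrix.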
Last modified: 2022-08-08 17:44:41