Rule Based Approach for Word Normalization in Transliterated Search Queries
Journal: International Journal of Linguistics and Computational Applications (Vol.7, No. 2)
Publication Date: 2020-06-30
Authors : Varsha M. Pathak; Manish R. Joshi
Page : 5-10
Keywords : Information Retrieval; SMS Based Information System; Vector Space Model; Minimum Edit Distance; Noisy Query; Transliterated Search;
Abstract
SMS-based information systems are a need of the age. Most present SMS-based information systems send one-way informative text messages generated from their respective knowledge systems. By applying information retrieval methodology with models such as the Vector Space Model, such systems can let users send queries according to their information needs, which makes the system more useful from the user's point of view. This paper describes one such initiative for accessing literature such as poems, phrases, rhymes, stories, abhang and more. The mobile-based quick library access system MQuickLib allows users to access such literature by formulating transliterated queries. The Vector Space Model is used to create the system's knowledge base by processing the document terms, which are then matched with the query terms while allowing for spelling variation caused by users' transliteration styles. The matching score is assigned by a set of rules that determine the distance between two terms: d_k, the term from the document, and q_j, the query term. The original Levenshtein minimum edit distance algorithm is modified by applying this rule-based approach. The rules were identified by collecting SMS queries from users for a given set of known queries in Marathi (Devanagari). Experiments were carried out on a collection of Marathi and Hindi literature that mainly includes songs, gazals, powadas, bharud and other types. These documents are available in a standard transliteration form such as ITRANS (an Indic transliteration system). This paper elaborates the rule-based approach and analyses the results to select an appropriate rule-based model, which is then applied in the development of the MQuickLib system.
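The abstract does not list the rules themselves, so the following is only a minimal sketch of how a rule table might modify the Levenshtein minimum edit distance between a document term and a query term. The rule table `CHEAP_SUBSTITUTIONS`, the constant `DOUBLED_CHAR_COST`, the cost values and the function names are all hypothetical and chosen for illustration; they are not the rules derived in the paper.

```python
from typing import Dict, FrozenSet

# Hypothetical rule table (an assumption, not the paper's rules): character
# pairs treated as near-equivalent transliteration variants.
CHEAP_SUBSTITUTIONS: Dict[FrozenSet[str], float] = {
    frozenset("ei"): 0.5,
    frozenset("ou"): 0.5,
    frozenset("vw"): 0.5,
}

# Hypothetical rule: inserting or deleting a repeated character is cheap,
# e.g. a long vowel typed as "aa" by one user and "a" by another.
DOUBLED_CHAR_COST = 0.25


def substitution_cost(a: str, b: str) -> float:
    """Return 0 for identical characters, a reduced cost for rule pairs, else 1."""
    if a == b:
        return 0.0
    return CHEAP_SUBSTITUTIONS.get(frozenset((a, b)), 1.0)


def indel_cost(word: str, i: int) -> float:
    """Cost of inserting or deleting word[i]; cheaper if it repeats word[i - 1]."""
    if i > 0 and word[i] == word[i - 1]:
        return DOUBLED_CHAR_COST
    return 1.0


def rule_based_distance(d: str, q: str) -> float:
    """Weighted edit distance between document term d and query term q."""
    m, n = len(d), len(q)
    dist = [[0.0] * (n + 1) for _ in range(m + 1)]
    # First column: delete every character of d.
    for i in range(1, m + 1):
        dist[i][0] = dist[i - 1][0] + indel_cost(d, i - 1)
    # First row: insert every character of q.
    for j in range(1, n + 1):
        dist[0][j] = dist[0][j - 1] + indel_cost(q, j - 1)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dist[i][j] = min(
                dist[i - 1][j] + indel_cost(d, i - 1),      # delete d[i - 1]
                dist[i][j - 1] + indel_cost(q, j - 1),      # insert q[j - 1]
                dist[i - 1][j - 1] + substitution_cost(d[i - 1], q[j - 1]),
            )
    return dist[m][n]


print(rule_based_distance("raam", "ram"))   # 0.25
print(rule_based_distance("devi", "dewi"))  # 0.5
print(rule_based_distance("cat", "cut"))    # 1.0
```

With unit costs everywhere this reduces to the standard Levenshtein distance; the rule table only lowers the cost of edits assumed to reflect transliteration style rather than a different word. A lower distance would then translate into a higher matching score between d_k and q_j.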
Last modified: 2021-06-06 02:20:49