A Comprehensive Analysis of Web-based frequency in Multiword Expression Detection
Journal: Intelligent Systems and Applications in Engineering (IJISAE) (Vol. 5, No. 3)
Publication Date: 2017-09-30
Authors: Hande Aka Uymaz; Senem Kumova Metin
Pages: 145-153
Keywords: multiword expressions; occurrence frequency; web-based frequency; feature selection; supervised learning
Abstract
Multiword expressions (MWEs) are syntactic and/or semantic units in language in which the meaning of the whole is only loosely connected to the meanings of its constituent words. The most prominent property that distinguishes MWEs from random word combinations is recurrence, which is commonly measured by the occurrence frequencies of the MWE and of its constituent words. Although occurrence frequency measures are known to perform best at distinguishing MWEs from random combinations, their performance depends mainly on the quality and size of the data source from which the frequencies are obtained. The main goal of this study is to provide a detailed analysis of how the performance of frequency-based measures changes when the traditional frequency source, a corpus, is replaced by a massive and dynamic data source, the World Wide Web. To use the web as a frequency source, the constituent words and word combinations are queried in a popular search engine, and the number of results returned for each query is taken as the web-based frequency of the corresponding word or word combination. The web-based frequencies are then employed in three groups of MWE detection experiments on a Turkish data set. In the first group, the individual performance of 20 well-known frequency metrics in ranking MWE candidates by their tendency to be an MWE is examined. In the second, the most successful frequency metrics are identified with a filter-based feature selection method. In the third, MWE detection is treated as a classification problem, and eight supervised methods are applied to show the combined performance of the frequency metrics when the frequencies are obtained from the web. In all experiments, the performance of web-based frequencies in identifying MWEs is compared with that of traditional corpus-based frequencies.
The experimental results show that web-based frequencies are a promising source for identifying MWEs.
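The abstract does not name the 20 frequency metrics, so the following is only a minimal sketch of the first experiment group under stated assumptions: three widely used association measures (PMI, Dice, t-score), chosen for illustration and not necessarily among those used in the study, are computed for two-word candidates from search-engine hit counts, and the candidates are ranked by score. `web_hits`, `HYPOTHETICAL_HITS`, and `ASSUMED_N` are hypothetical: the hit counts are invented, no real search API is called, and the total size N of the web must be assumed because it is unknown.

```python
import math
from typing import Callable, Dict, List, Tuple

# HYPOTHETICAL hit counts standing in for search-engine result counts.
# These numbers are invented for illustration; they are not data from the study.
HYPOTHETICAL_HITS: Dict[str, int] = {
    "göz": 50_000_000, "atmak": 5_000_000, "göz atmak": 2_000_000,
    "kırmızı": 20_000_000, "araba": 40_000_000, "kırmızı araba": 300_000,
    "yol": 80_000_000, "açmak": 10_000_000, "yol açmak": 1_500_000,
}

# ASSUMED total "corpus size" of the web; the true value is unknown,
# so any fixed constant here is only an approximation.
ASSUMED_N = 10**10


def web_hits(query: str) -> int:
    """Hypothetical stand-in for querying a search engine for its result count."""
    return HYPOTHETICAL_HITS[query]


def pmi(f12: int, f1: int, f2: int, n: int) -> float:
    """Pointwise mutual information: log2 of observed over expected co-occurrence."""
    return math.log2(f12 * n / (f1 * f2))


def dice(f12: int, f1: int, f2: int, n: int) -> float:
    """Dice coefficient 2*f12 / (f1 + f2); n is unused but kept for a uniform signature."""
    return 2 * f12 / (f1 + f2)


def t_score(f12: int, f1: int, f2: int, n: int) -> float:
    """t-score: (observed - expected) / sqrt(observed), expected = f1*f2/n."""
    expected = f1 * f2 / n
    return (f12 - expected) / math.sqrt(f12)


Metric = Callable[[int, int, int, int], float]


def rank_candidates(candidates: List[Tuple[str, str]], metric: Metric,
                    n: int = ASSUMED_N) -> List[Tuple[str, float]]:
    """Score each two-word candidate from web-based frequencies; highest score first."""
    scored = []
    for w1, w2 in candidates:
        f1, f2 = web_hits(w1), web_hits(w2)
        f12 = web_hits(f"{w1} {w2}")  # the word combination is queried as one string
        scored.append((f"{w1} {w2}", metric(f12, f1, f2, n)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    cands = [("kırmızı", "araba"), ("göz", "atmak"), ("yol", "açmak")]
    for name, metric in [("PMI", pmi), ("Dice", dice), ("t-score", t_score)]:
        ranking = rank_candidates(cands, metric)
        print(name, [(c, round(s, 3)) for c, s in ranking])
```

With these invented counts, all three measures place the idiomatic candidates "göz atmak" and "yol açmak" above the compositional "kırmızı araba". Replacing the `web_hits` lookup with counts taken from a corpus gives the corpus-based baseline that the study compares against.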
Last modified: 2017-10-09 15:49:36