Automatic Generation of Association Thesaurus Based on Domain-Specific Text Collection

Proceeding: 10th International Academic Conference (IAC)

Publication Date:

Authors : ; ; ;

Page : 529-538

Keywords : LSA; thesaurus; chi-square test; graph;

Abstract

This work examines a distributional approach to the automatic generation of associative thesauri for a specific domain. The distributional approach assumes that an associative link between domain terms is determined by the statistics of their co-occurrence in thematically related discourses. Its advantage is that it works from raw material (for example, a collection of domain documents) and requires no additional knowledge about the domain. The approach relies only on statistical methods and takes neither lexical nor semantic information into account, which allows it to cover an extensive lexical space of terms. However, this also leads to its main shortcoming: it produces an excessive number of “unnecessary” links among words, which are of little practical value. To address this problem, the work proposes combining methods of distributional statistics, latent semantic analysis, and graph theory.
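
As an illustration of the statistical filtering idea mentioned in the abstract, the following is a minimal sketch, not the authors' method. It assumes each document is represented as a set of terms, scores every term pair with a Pearson chi-square statistic on a 2x2 document co-occurrence table, and keeps only positively associated pairs above the 5% critical value for one degree of freedom (3.841) as edges of an association graph. The function names, the document representation, and the threshold are assumptions made for this example. The latent semantic analysis step is not covered here.

```python
from itertools import combinations


def contingency(docs, a, b):
    """Return (n11, n10, n01, n00): document counts for presence/absence of a and b."""
    n11 = sum(1 for d in docs if a in d and b in d)
    n10 = sum(1 for d in docs if a in d and b not in d)
    n01 = sum(1 for d in docs if a not in d and b in d)
    n00 = len(docs) - n11 - n10 - n01
    return n11, n10, n01, n00


def chi_square(docs, a, b):
    """Pearson chi-square statistic for the 2x2 co-occurrence table of a and b."""
    n11, n10, n01, n00 = contingency(docs, a, b)
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        # One of the terms occurs in all documents or in none: no evidence either way.
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def association_graph(docs, threshold=3.841):
    """Build an undirected graph (dict of sets) linking significantly co-occurring terms.

    threshold defaults to the chi-square critical value for df=1 at p=0.05
    (an assumption for this sketch, not a value taken from the paper).
    """
    terms = sorted(set().union(*docs))
    graph = {t: set() for t in terms}
    for a, b in combinations(terms, 2):
        n11, n10, n01, n00 = contingency(docs, a, b)
        positive = n11 * n00 > n10 * n01  # terms co-occur more often than chance
        if positive and chi_square(docs, a, b) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph
```

Raising the threshold in this sketch prunes weaker links, which is one simple way to reduce the number of uninformative associations that a purely statistical approach produces.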

Last modified: 2015-03-07 19:44:21