
Effect of Near-orthogonality on Random Indexing Based Extractive Text Summarization

Journal: International Journal of Innovation and Applied Studies (Vol.3, No. 3)


Authors : ; ;

Pages : 701-713

Keywords : Word Space; Random Indexing; Index vector; Context vector; Near-orthogonal; PageRank


Abstract

Application of Random Indexing (RI) to extractive text summarization has already been proposed in the literature. RI is an approximation technique for dealing with the high-dimensionality problem of Word Space Models (WSMs). What distinguishes RI from other WSMs (e.g. Latent Semantic Analysis (LSA)) is the near-orthogonality of its word vectors (index vectors). This near-orthogonality property of the index vectors helps reduce the dimension of the underlying Word Space. The present work studies in detail the near-orthogonality property of random index vectors and its effect on extractive text summarization. A probabilistic definition of near-orthogonality of an RI-based Word Space is presented and discussed thoroughly. Our experiments on DUC 2002 data show that, while the quality of summaries produced by RI with the Euclidean distance measure is almost invariant to the near-orthogonality of the underlying Word Space, the quality of summaries produced by RI with the cosine dissimilarity measure is strongly affected by near-orthogonality. It is also found that RI with the Euclidean distance measure performs much better than many LSA-based summarization techniques. This improvement of the RI-based summarizer over LSA-based summarizers is significant because RI is computationally inexpensive compared with LSA, which uses Singular Value Decomposition (SVD), a computationally complex algebraic technique, for dimension reduction of the underlying Word Space.
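The abstract does not spell out the construction, so the following is only a minimal sketch of a common RI setup, not the authors' implementation: each word gets a sparse ternary index vector (a few random positions set to +1 or -1), a word's context vector accumulates the index vectors of its neighbours within a sliding window, and a sentence vector is the sum of its words' context vectors. The function names, the dimension, number of non-zeros, window size, and the threshold-based `near_orthogonal_fraction` are hypothetical choices for illustration; in particular, that fraction is a simple empirical proxy, not the probabilistic definition proposed in the paper. The last two functions compute the two measures the abstract compares (cosine dissimilarity and Euclidean distance); how they feed a PageRank-style sentence ranking is not reproduced here.

```python
import math
import random
from itertools import combinations


def make_index_vector(dim, nonzeros, rng):
    """Sparse ternary index vector: `nonzeros` random positions, half +1 and half -1, rest 0."""
    vec = [0] * dim
    for i, pos in enumerate(rng.sample(range(dim), nonzeros)):
        vec[pos] = 1 if i < nonzeros // 2 else -1
    return vec


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cosine_dissimilarity(u, v):
    return 1.0 - cosine(u, v)


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def near_orthogonal_fraction(vectors, eps):
    """Hypothetical proxy: share of vector pairs whose |cosine| <= eps."""
    pairs = list(combinations(vectors, 2))
    return sum(abs(cosine(u, v)) <= eps for u, v in pairs) / len(pairs)


def context_vectors(sentences, index, window=2):
    """Context vector of a word = sum of index vectors of words within `window` positions of it."""
    dim = len(next(iter(index.values())))
    ctx = {w: [0] * dim for w in index}
    for sent in sentences:
        for i, word in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    ctx[word] = [a + b for a, b in zip(ctx[word], index[sent[j]])]
    return ctx


def sentence_vector(sentence, ctx):
    """Sentence vector = sum of the context vectors of its words."""
    dim = len(next(iter(ctx.values())))
    vec = [0] * dim
    for word in sentence:
        vec = [a + b for a, b in zip(vec, ctx[word])]
    return vec


if __name__ == "__main__":
    rng = random.Random(0)
    # Same number of non-zeros in a small vs. a large space: overlaps (and hence
    # non-zero cosines) between index vectors become rare as the dimension grows.
    for dim in (20, 2000):
        vecs = [make_index_vector(dim, 10, rng) for _ in range(50)]
        print(dim, near_orthogonal_fraction(vecs, eps=0.25))

    sentences = [["random", "indexing", "builds", "word", "space"],
                 ["word", "space", "models", "need", "dimension", "reduction"],
                 ["random", "indexing", "avoids", "svd"]]
    vocab = sorted({w for s in sentences for w in s})
    index = {w: make_index_vector(500, 8, rng) for w in vocab}
    ctx = context_vectors(sentences, index)
    s0, s1 = sentence_vector(sentences[0], ctx), sentence_vector(sentences[1], ctx)
    print(cosine_dissimilarity(s0, s1), euclidean(s0, s1))
```

Note that nothing here requires a matrix factorization: index vectors are drawn once and context vectors are built incrementally, which is the cost advantage over SVD-based LSA that the abstract points to.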

Last modified: 2013-08-21 22:28:24