DOCUMENT SUMMARIZATION USING SENTENCE BASED TOPIC MODELING AND CLUSTERING

Journal: International Journal of Advanced Research (Vol.6, No. 5)

Publication Date:

Authors : ; ;

Page : 285-291

Keywords : Term Frequency; Natural Language Processing; Text Summarization; Structural Topic Modeling (STM); Pre-processing; Tokenization; Topic Modeling


Abstract

In recent years, automatic document summarization has seen growing practical use, and numerous papers have been published on the topic. There are many approaches to identifying the significant portions of a document. Topic representation and modelling provide an intermediate representation of the text that captures the topics discussed in the input and aids automatic summarization. The significance of each sentence is decided based on the representation of topics in the input document. This article attempts to produce a comprehensive summary through a process that includes sentence extraction and tokenization of the extracted sentences. Sentence-based Structural Topic Modeling (STM) is used to determine the important content for each domain in the integrated document, and sentences are grouped under each topic using k-means clustering. The sentences under each topic are then summarized using the term frequency of each sentence. Finally, the sentences are arranged in the summarized text according to their Lexical Ranking scores.
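
The sentence extraction, tokenization and term-frequency scoring steps mentioned in the abstract could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation. The function names, the regular expressions and the small stop-word list are all assumptions made for the example. The STM and k-means stages are omitted, and the selected sentences are returned in document order rather than by Lexical Ranking score.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only (not from the paper).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "are", "on", "for"}


def split_sentences(text):
    """Sentence extraction: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Tokenization: lowercase, keep alphabetic runs, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", sentence.lower())
            if t not in STOP_WORDS]


def score_sentences(sentences):
    """Score each sentence by the mean document-level frequency of its tokens."""
    tokens_per_sentence = [tokenize(s) for s in sentences]
    # Term frequency counted over the whole input (assumed scoring rule).
    tf = Counter(t for toks in tokens_per_sentence for t in toks)
    scores = []
    for toks in tokens_per_sentence:
        scores.append(sum(tf[t] for t in toks) / len(toks) if toks else 0.0)
    return scores


def summarize(text, n=2):
    """Return the n highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    scores = score_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(top)]
```

In a fuller pipeline, `summarize` would presumably be applied separately to the sentences of each topic cluster, with the final ordering decided by a ranking step instead of document order.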

Last modified: 2018-06-22 17:06:01