HIGH PERFORMANCE SEQUENCE MINING OF BIG DATA USING HADOOP MAPREDUCE IN CLOUD

Journal: International Journal of Emerging Trends & Technology in Computer Science (IJETTCS) (Vol.5, No. 4)

Publication Date:

Authors : ; ;

Page : 113-120

Keywords : Big data; MapReduce; SVD; LSI

Abstract

Text mining can handle unstructured data. In the proposed work, text extracted from a PDF document is converted to plain text format, and the document is then tokenized and serialized. Document clustering and categorization are done by finding similarities between documents stored in the cloud. Similar documents are identified using the Singular Value Decomposition (SVD) method in Latent Semantic Indexing (LSI) and are then grouped together as a cluster. A comparative study is done between the Local File System (LFS) and the Hadoop Distributed File System (HDFS) with respect to speed and dimensionality. The system has been evaluated on real-world documents and the results are tabulated.
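
The tokenize, SVD and grouping steps named in the abstract can be illustrated with a minimal single-machine sketch. This is not the authors' implementation: it uses NumPy in place of Hadoop MapReduce, and the toy documents, the function names, the raw term-count weighting, the latent dimension k=2, the similarity threshold of 0.8 and the greedy grouping rule are all assumptions chosen for illustration.

```python
import re
import numpy as np

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def term_document_matrix(docs):
    """Build a raw term-count matrix: one row per term, one column per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, toks in enumerate(tokenized):
        for t in toks:
            A[index[t], j] += 1.0
    return A, vocab

def lsi_document_vectors(A, k):
    """Project documents into a k-dimensional latent space via truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep the k largest singular values; each returned row is one document.
    return (s[:k, None] * Vt[:k, :]).T

def cosine_similarity_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty documents
    Xn = X / norms
    return Xn @ Xn.T

def group_similar(sim, threshold):
    """Greedy grouping (an assumed rule): each unassigned document starts a
    cluster and pulls in every later unassigned document above the threshold."""
    n = sim.shape[0]
    assigned = [False] * n
    clusters = []
    for i in range(n):
        if assigned[i]:
            continue
        members = [j for j in range(i, n)
                   if not assigned[j] and sim[i, j] >= threshold]
        for j in members:
            assigned[j] = True
        clusters.append(members)
    return clusters

# Hypothetical toy corpus standing in for text extracted from PDF files.
docs = [
    "hadoop mapreduce processes big data in the cloud",
    "mapreduce jobs on hadoop scale big data processing",
    "singular value decomposition factorizes a matrix",
    "matrix factorization with singular value decomposition",
]
A, vocab = term_document_matrix(docs)
doc_vectors = lsi_document_vectors(A, k=2)
sim = cosine_similarity_matrix(doc_vectors)
clusters = group_similar(sim, threshold=0.8)
print(clusters)
```

On this toy corpus the two Hadoop sentences and the two SVD sentences share no terms across groups, so the sketch places them in separate clusters. A real corpus would normally need stop-word removal and a weighting such as TF-IDF before the SVD step.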

Last modified: 2016-09-08 19:30:33