HIGH PERFORMANCE SEQUENCE MINING OF BIG DATA USING HADOOP MAPREDUCE IN CLOUD
Journal: International Journal of Emerging Trends & Technology in Computer Science (IJETTCS) (Vol.5, No. 4)
Publication Date: 2016-09-08
Authors : Dr.B.LAVANYA; G.LALITHA;
Page : 113-120
Keywords : Big data; MAPREDUCE; SVD; LSI
Abstract
Text mining can handle unstructured data. In the proposed work, text is extracted from a PDF document and converted to plain text format; the document is then tokenized and serialized. Document clustering and categorization are done by finding similarities between documents stored in the cloud. Similar documents are identified using the Singular Value Decomposition (SVD) method in Latent Semantic Indexing (LSI) and are then grouped together as a cluster. A comparative study is done between the LFS (Local File System) and HDFS (Hadoop Distributed File System) with respect to speed and dimensionality. The system has been evaluated on real-world documents and the results are tabulated.
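The abstract does not give implementation details, so the following is only a minimal single-machine sketch of the LSI step it describes: tokenize documents, build a term-document matrix, project documents into a low-rank latent space, and group documents whose latent vectors are similar. Everything specific here is an assumption made for illustration, not the authors' method: the raw-count weighting, the rank k = 2, the cosine threshold of 0.8, the greedy grouping rule, and all function names. To stay dependency-free, the sketch obtains the SVD document coordinates from the eigenvectors of A^T A by power iteration, where a real system would more likely call a library SVD routine.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(docs):
    """Rows are vocabulary terms, columns are documents, entries are raw counts."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    return [[c[t] for c in counts] for t in vocab]


def gram_matrix(a):
    """Return A^T A, the document-by-document matrix of dot products."""
    n = len(a[0])
    return [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]


def top_eigenpairs(g, k, iters=500):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration with deflation."""
    n = len(g)
    g = [[float(x) for x in row] for row in g]  # work on a copy
    pairs = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(n)]  # deterministic start vector
        for _ in range(iters):
            w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * g[i][j] * v[j] for i in range(n) for j in range(n))
        pairs.append((lam, v))
        # Deflate: subtract the component just found so the next pass finds the next one.
        for i in range(n):
            for j in range(n):
                g[i][j] -= lam * v[i] * v[j]
    return pairs


def lsi_document_vectors(docs, k):
    """Document coordinates in the rank-k latent space (rows of V_k * S_k).

    For A = U S V^T, the eigenvectors of A^T A are the columns of V and the
    eigenvalues are the squared singular values.
    """
    pairs = top_eigenpairs(gram_matrix(term_document_matrix(docs)), k)
    return [[math.sqrt(max(lam, 0.0)) * v[j] for lam, v in pairs]
            for j in range(len(docs))]


def cosine(u, v):
    """Cosine similarity; defined as 0.0 if either vector is (near) zero."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu < 1e-12 or nv < 1e-12:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def lsi_clusters(docs, k=2, threshold=0.8):
    """Greedy grouping: a document joins the first cluster whose seed it resembles."""
    vecs = lsi_document_vectors(docs, k)
    clusters = []
    for i in range(len(docs)):
        for cluster in clusters:
            if cosine(vecs[i], vecs[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


if __name__ == "__main__":
    sample = [
        "hadoop mapreduce cluster hadoop",
        "mapreduce hadoop job cluster",
        "svd matrix decomposition",
        "matrix svd singular decomposition",
    ]
    print(lsi_clusters(sample))
```

Calling `lsi_clusters` on a list of plain-text strings returns lists of document indices, one list per cluster. In a Hadoop setting the tokenizing and counting steps would be the natural candidates for map and reduce tasks, but how the paper distributes the work is not stated in the abstract.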
Last modified: 2016-09-08 19:30:33