Design and Implementation of K-Means and Hierarchical Document Clustering on Hadoop
Journal: International Journal of Science and Research (IJSR), Vol. 3, No. 10
Publication Date: 2014-10-05
Authors: Y. K. Patil; V. S. Nandedkar
Pages: 1566-1570
Keywords: Hadoop; Tf-Idf; Cosine Similarity; K-means and Hierarchical clustering
Abstract
Document clustering is one of the important areas of data mining. Hadoop is used by companies such as Yahoo, Google, Facebook and Twitter to implement real-time applications. Emails, social media blogs, movie review comments and books are typical sources of documents for clustering. This paper focuses on document clustering using Hadoop, a framework for the parallel processing of large document collections. The computing time for document clustering on Hadoop is lower than that of Java-based implementations that do not use Hadoop. In this paper, the authors propose the design and implementation of the Tf-Idf, K-means and hierarchical clustering algorithms on Hadoop.
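The pipeline named in the abstract (Tf-Idf weighting, cosine similarity, then K-means or hierarchical clustering) can be sketched in plain Python as below. This is a minimal, single-machine illustration under stated assumptions, not the authors' Hadoop code: it does not use any Hadoop API, and the map, shuffle and reduce phases of one K-means iteration are only simulated in memory. The function names (`tf_idf`, `cosine`, `kmeans_map`, `kmeans_reduce`, `kmeans`, `agglomerative`) are hypothetical. The abstract does not specify the Tf-Idf variant, the seeding strategy or the linkage criterion, so `tf * ln(N / df)`, farthest-first seeding and average linkage are assumptions.

```python
import math
from collections import Counter, defaultdict

def tf_idf(docs):
    """One sparse {term: weight} vector per document, weight = tf * ln(N / df) (assumed variant)."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({t: (c / len(toks)) * math.log(n / df[t]) for t, c in counts.items()})
    return vectors

def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def kmeans_map(doc_id, vec, centroids):
    """Hypothetical map step: key each document by the index of its most similar centroid."""
    best = max(range(len(centroids)), key=lambda c: cosine(vec, centroids[c]))
    return best, (doc_id, vec)

def kmeans_reduce(members):
    """Hypothetical reduce step: the new centroid is the mean of the member vectors."""
    total = defaultdict(float)
    for _, vec in members:
        for t, w in vec.items():
            total[t] += w
    return {t: w / len(members) for t, w in total.items()}

def kmeans(vectors, k, iterations=10):
    # Farthest-first seeding (an assumption): start from document 0, then add the
    # document least similar to its closest existing seed.
    centroids = [dict(vectors[0])]
    while len(centroids) < k:
        nxt = min(range(len(vectors)),
                  key=lambda i: max(cosine(vectors[i], c) for c in centroids))
        centroids.append(dict(vectors[nxt]))
    groups = defaultdict(list)
    for _ in range(iterations):  # on a cluster, each iteration would be one job
        groups = defaultdict(list)  # in-memory stand-in for the shuffle phase
        for doc_id, vec in enumerate(vectors):
            key, value = kmeans_map(doc_id, vec, centroids)
            groups[key].append(value)
        new = [kmeans_reduce(groups[c]) if groups[c] else centroids[c] for c in range(k)]
        if new == centroids:  # converged: no centroid moved
            break
        centroids = new
    return {c: sorted(doc_id for doc_id, _ in groups[c]) for c in range(k)}

def agglomerative(vectors, k):
    """Average-link agglomerative clustering: merge the most similar pair until k clusters remain."""
    clusters = [[i] for i in range(len(vectors))]

    def avg_sim(a, b):
        return sum(cosine(vectors[i], vectors[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = max(pairs, key=lambda p: avg_sim(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
        clusters[i] = merged
    return sorted(sorted(c) for c in clusters)

if __name__ == "__main__":
    docs = ["hadoop mapreduce cluster nodes", "hadoop hdfs cluster nodes",
            "movie review great acting", "movie review poor plot"]
    vecs = tf_idf(docs)
    print(kmeans(vecs, 2))         # {0: [0, 1], 1: [2, 3]}
    print(agglomerative(vecs, 2))  # [[0, 1], [2, 3]]
```

On a real Hadoop cluster, each K-means iteration would typically run as a separate MapReduce job, with a driver program comparing old and new centroids between jobs to decide when to stop. Hierarchical clustering is harder to parallelize this way, because every merge depends on the result of the previous one.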