Implementation of Hadoop Based Framework for Parallel Processing of Biological Data
Journal: International Journal of Science and Research (IJSR) (Vol. 4, No. 4)
Publication Date: 2015-04-05
Authors : Praveen Kumar B; Nirmala Bariker;
Page : 1087-1091
Keywords : Hadoop; Hadoop Distributed File System; Map; Reduce; Small Files; Iris plants data; Decision Tree; Classification rule;
Abstract
Bioinformatics is challenged by the fact that traditional analysis tools have difficulty processing large-scale data from high-throughput sequencing. Hadoop is designed to process large data sets (petabytes), but it becomes a bottleneck when handling massive numbers of small files, because the name node uses more memory to store the file metadata and the data nodes consume more CPU time to process the files. This paper presents Optimized Hadoop, built on the open source Apache Hadoop project. It consists of a Merge Model that merges massive numbers of small files into a single large file, introduces an efficient indexing mechanism, and adopts the MapReduce framework, using decision tree classification rules for the analysis and diagnosis of Iris plants data through a distributed file system, to achieve scalable, efficient, and reliable computing performance on Linux clusters of low-cost commodity machines. Our experimental results show that Optimized Hadoop improves the performance of processing small files by up to 90.83% and effectively reduces the memory the name node uses to store file metadata.
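The merge-and-index idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `merge_small_files` and `read_file`, the in-memory dictionary input, and the (offset, length) index layout are all assumptions made for illustration. It only shows why one merged file plus a compact index lets individual small files be recovered without keeping one metadata entry per file on the name node.

```python
import io


def merge_small_files(files):
    """Concatenate small files into one blob and build an index.

    `files` maps a file name to its bytes (hypothetical input format).
    Returns (merged_bytes, index), where index[name] = (offset, length).
    """
    merged = io.BytesIO()
    index = {}
    for name, data in files.items():
        offset = merged.tell()              # where this file starts in the blob
        merged.write(data)
        index[name] = (offset, len(data))   # enough to locate the file later
    return merged.getvalue(), index


def read_file(merged, index, name):
    """Recover one original small file from the merged blob via the index."""
    offset, length = index[name]
    return merged[offset:offset + length]


# Example with two tiny records in the style of the Iris data set.
files = {
    "a.txt": b"5.1,3.5,1.4,0.2\n",
    "b.txt": b"7.0,3.2,4.7,1.4\n",
}
merged, index = merge_small_files(files)
```

In a real cluster the merged blob would be stored as a single large HDFS file and the index would be persisted alongside it; both of those steps are outside the scope of this sketch.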
Last modified: 2021-06-30 21:44:39