
Efficient Way for Handling Small Files using Extended HDFS

Journal: International Journal of Computer Science and Mobile Computing - IJCSMC (Vol.3, No. 6)

Publication Date:

Authors : ; ;

Page : 785-789

Keywords : Hadoop; NameNode; Datanode; HDFS; EHDFS; ConstituentFileMap;



The Hadoop Distributed File System (HDFS) is a widely used, highly fault-tolerant file system designed to store and manage very large amounts of data. HDFS uses a master-slave architecture in which nothing is shared between the master and the slaves. The master is a single server, the NameNode, which keeps the file system metadata in its own memory. As a consequence, HDFS becomes problematic when it holds a large number of small files: storing and managing many small files places a heavy burden on the NameNode's memory and makes it harder for the NameNode to manage all the DataNodes. HDFS also provides no prefetching and no way to combine files to improve I/O performance. The Extended Hadoop Distributed File System (EHDFS) improves the efficiency of storing and accessing small files. In this approach, many small files are stored together in a single file, known as a combined file, on the DataNode or client. The ConstituentFileMap is the key structure EHDFS uses to manage these files efficiently, and an indexing mechanism is used to access individual files within a combined file. Index prefetching is used to reduce the load on the NameNode and to improve I/O performance. Because the metadata footprint on the NameNode is reduced, the file system becomes more efficient, and EHDFS reduces the time required for processing.
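To make the combined-file idea more concrete, here is a minimal in-memory Python sketch. It is not the EHDFS implementation and does not use any Hadoop API: `combine`, `IndexEntry`, `ToyNameNode`, and `Client` are hypothetical names, the combined file is a plain byte string, and the index is a dictionary standing in for the ConstituentFileMap. It illustrates three ideas from the abstract: packing many small files into one combined file, an index of (offset, length) entries used to read individual files back out, and index prefetching, where the client fetches the whole index of a combined file on first access so that reads of sibling files need no further NameNode lookups.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """Hypothetical location of one small file inside a combined file."""
    offset: int
    length: int


def combine(files: dict[str, bytes]) -> tuple[bytes, dict[str, IndexEntry]]:
    """Pack small files into one combined blob and build its index.

    The returned dict stands in for a ConstituentFileMap:
    constituent file name -> (offset, length) inside the combined file.
    """
    blob = bytearray()
    index: dict[str, IndexEntry] = {}
    for name, data in files.items():
        index[name] = IndexEntry(offset=len(blob), length=len(data))
        blob.extend(data)
    return bytes(blob), index


class ToyNameNode:
    """Hypothetical NameNode stand-in: one index per combined file."""

    def __init__(self) -> None:
        self.indexes: dict[str, dict[str, IndexEntry]] = {}
        self.lookups = 0  # number of index requests served

    def register(self, combined_name: str, index: dict[str, IndexEntry]) -> None:
        self.indexes[combined_name] = index

    def get_index(self, combined_name: str) -> dict[str, IndexEntry]:
        self.lookups += 1
        return self.indexes[combined_name]


class Client:
    """Reads constituent files, prefetching a combined file's whole index."""

    def __init__(self, namenode: ToyNameNode, storage: dict[str, bytes]) -> None:
        self.namenode = namenode
        self.storage = storage  # combined file name -> bytes (stands in for DataNodes)
        self.cache: dict[str, dict[str, IndexEntry]] = {}

    def read(self, combined_name: str, file_name: str) -> bytes:
        if combined_name not in self.cache:
            # Index prefetching: fetch every entry for this combined file at once,
            # so later reads of sibling files skip the NameNode round trip.
            self.cache[combined_name] = dict(self.namenode.get_index(combined_name))
        entry = self.cache[combined_name][file_name]
        blob = self.storage[combined_name]
        return blob[entry.offset : entry.offset + entry.length]


# Usage: three small files packed into one hypothetical combined file "part-0".
blob, index = combine({"a.txt": b"alpha", "b.txt": b"bravo!", "c.txt": b"c"})
nn = ToyNameNode()
nn.register("part-0", index)
client = Client(nn, {"part-0": blob})
print(client.read("part-0", "b.txt"))  # b'bravo!'
print(client.read("part-0", "c.txt"))  # b'c'
print(nn.lookups)  # 1
```

In this toy model the NameNode holds one index per combined file instead of one metadata entry per small file, and the second read is served entirely from the client's prefetched index, which is the kind of NameNode load reduction the abstract describes.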

Last modified: 2014-07-01 19:09:29