A Survey of Various Fault Tolerance Checkpointing Algorithms in Distributed System

Journal: International Journal of Advanced Networking and Applications (Vol.7, No. 02)

Publication Date:

Authors : ; ;

Page : 2682-2689

Keywords : Checkpointing; Distributed systems; Fault tolerance; Mobile computing system; Rollback recovery;

Abstract

A distributed system is a collection of independent entities that cooperate to solve a problem that cannot be solved individually. Checkpointing is a fault-tolerance technique: a checkpoint is the saved state of a process taken during failure-free execution, which enables the process to restart from that state after a failure and so reduces the amount of lost work, instead of repeating the computation from the beginning. The process of restoring from a previously checkpointed state is known as rollback recovery. A checkpoint can be saved on either stable storage or volatile storage, depending on the failure scenarios to be tolerated. Checkpointing is a major challenge in mobile ad hoc networks. A mobile ad hoc network consists of a set of self-configuring mobile hosts (MHs) that can communicate with each other without the assistance of base stations, with some of the processes running on mobile hosts. The main issues in this environment are insufficient power and limited storage capacity. This paper surveys the checkpointing algorithms reported in the literature for distributed systems as well as mobile distributed systems.
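
The checkpoint and rollback-recovery idea described in the abstract can be illustrated with a minimal single-process sketch. It is not any of the surveyed algorithms. The names `CheckpointedProcess`, `run`, `checkpoint_every`, and `fail_at_step` are hypothetical, and the in-memory copy only stands in for volatile storage; a real system would also have to coordinate checkpoints across processes and handle in-flight messages.

```python
import copy


class CheckpointedProcess:
    """Toy process whose latest checkpoint is held in memory (assumption:
    this stands in for volatile storage; stable storage would use disk)."""

    def __init__(self):
        self.state = {"step": 0, "total": 0}
        # The initial state serves as the first checkpoint.
        self._checkpoint = copy.deepcopy(self.state)

    def do_work(self, value):
        self.state["step"] += 1
        self.state["total"] += value

    def take_checkpoint(self):
        # Save an independent copy of the current state.
        self._checkpoint = copy.deepcopy(self.state)

    def rollback(self):
        # Rollback recovery: restore the last checkpointed state.
        self.state = copy.deepcopy(self._checkpoint)


def run(values, checkpoint_every, fail_at_step):
    """Sum `values`, simulating one failure just before `fail_at_step`.

    Returns (final total, number of steps that had to be re-executed).
    """
    proc = CheckpointedProcess()
    redone = 0
    failed = False
    while proc.state["step"] < len(values):
        next_step = proc.state["step"] + 1
        if next_step == fail_at_step and not failed:
            # Simulated failure: lose everything since the last checkpoint.
            failed = True
            completed = proc.state["step"]
            proc.rollback()
            redone = completed - proc.state["step"]
            continue
        proc.do_work(values[proc.state["step"]])
        if proc.state["step"] % checkpoint_every == 0:
            proc.take_checkpoint()
    return proc.state["total"], redone
```

In this sketch, a smaller `checkpoint_every` means fewer steps are repeated after the simulated failure, at the cost of more frequent state copies. That overhead is what matters on mobile hosts with limited power and storage.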

Last modified: 2015-11-30 18:54:15