
Failures and Fault Tolerance in Distributed Systems

Journal: INTERNATIONAL JOURNAL OF ELECTRONICS & DATA COMMUNICATION (Vol.3, No. 1)

Pages: 21-26

Keywords: Crash failure; Asynchrony; Two-phase commit; Log file; Data replication

Abstract

When setting up a network, one can choose between two types of systems: a centralized system or a distributed system. In a centralized system, if a program fails for any reason, the simple solution is to abort its transactions and then restart them; moreover, the chance that a single machine fails is low. Things are quite different in a distributed system with thousands of computers. There, failure becomes a potentially frequent event, caused by program bugs, human errors, hardware or network problems, and so on. In other words, distributed systems are hard to program and manage. This paper discusses basic concepts related to failures, faults and errors, and focuses on failure management, fault tolerance and failure recovery in distributed systems.
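The claim that failure becomes frequent as the number of machines grows can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that every machine fails independently with the same probability p during some fixed period. Under that assumption, the probability that at least one of N machines fails is 1 - (1 - p)^N, and the expected number of failed machines is p * N. The function names and the value p = 0.001 are hypothetical and are not taken from the paper.

```python
def prob_any_failure(p: float, n: int) -> float:
    """Probability that at least one of n machines fails in a given period,
    assuming (hypothetically) that each fails independently with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    # Complement of "all n machines survive", i.e. 1 - (1 - p)^n.
    return 1.0 - (1.0 - p) ** n


def expected_failures(p: float, n: int) -> float:
    """Expected number of failed machines under the same independence assumption."""
    return p * n


if __name__ == "__main__":
    p = 0.001  # hypothetical per-machine failure probability for one period
    for n in (1, 100, 1000, 10000):
        print(
            f"N={n:>6}: P(at least one failure) = {prob_any_failure(p, n):.4f}, "
            f"expected failures = {expected_failures(p, n):.1f}"
        )
```

With p = 0.001, the probability of seeing at least one failure is 0.1% for a single machine but about 63% for 1,000 machines. Real failures are often correlated (shared power, network partitions), so the independence assumption is a simplification; it only illustrates why failure handling cannot be treated as a rare case once a system spans many machines.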

Last modified: 2016-07-04 17:27:05