Failures and Fault Tolerance in Distributed Systems
Journal: INTERNATIONAL JOURNAL OF ELECTRONICS & DATA COMMUNICATION (Vol.3, No. 1)
Publication Date: 2013-02-15
Authors : Vishal Sood;
Page : 21-26
Keywords : Crash Failure; Asynchrony; Two-phase commit; Log file; Data Replication;
Abstract
People setting up a network can choose between two types of systems: distributed or centralized. In a centralized system, if a program fails for any reason, the simple solution is to abort and then restart its transactions. Moreover, the chance of a single machine failing is low. Things are quite different in a distributed system with thousands of computers: failure becomes a potentially frequent event, caused by program bugs, human errors, hardware or network problems, and so on. As a result, distributed systems are hard to program and manage. This paper discusses basic concepts related to failures, faults, and errors, with an emphasis on failure management, fault tolerance, and failure recovery in distributed systems.
Last modified: 2016-07-04 17:27:05