Wednesday, July 16, 2014

Set of possible system failure modes in distributed database

Possible System Failure Modes in a Distributed Database / Types of System Failures in a Distributed Database / How does the 2PC protocol handle failures in a distributed database?

Well-known failures such as software errors, hardware failures, hard disk failures, and power failures are common in both centralized and distributed database systems. Apart from these common failures, a distributed database system may suffer from the failures listed below:
Failure of a site - A distributed database consists of two or more servers, which are also called sites. Any of these sites might fail. Although the underlying cause is an ordinary hardware or software failure, in a distributed system it must be treated differently, because the other sites may depend on the failed one.
Loss of messages - Messages exchanged between sites might be lost. Transmission protocols such as TCP/IP are responsible for handling these losses, typically by retransmitting the lost messages.
Failure of a communication link - A communication link between sites might fail. In that case, the distributed database system may try to find an alternate route for its messages.
Network partition - A distributed database system is said to be partitioned when it is split into two or more subsystems that cannot communicate with each other. A subsystem is a set of one or more sites that has a single connection to the other subsystems. For example, consider a distributed database system that manages sites at three different college campuses. Each campus may have several sites internally, but it is connected to the other campuses through a single connection. If this connection fails, the distributed database system cannot diagnose the actual problem: from the point of view of a site, the failure looks the same as a failure of a site, a loss of messages, or a communication link failure.
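The diagnosis problem described for network partitions can be illustrated with a small reachability check. This is only a sketch under assumed names: the site names (a1, a2, b1, b2, c1), the link layout, and the `reachable` helper are hypothetical and are not part of any real database system. It shows that, from site a1, a failed inter-campus link and a crashed remote site leave exactly the same set of reachable sites.

```python
from collections import deque

def reachable(start, sites, links):
    """Return the set of live sites reachable from start over working links."""
    if start not in sites:
        return set()
    seen, queue = {start}, deque([start])
    while queue:
        here = queue.popleft()
        for a, b in links:
            if here in (a, b):
                there = b if here == a else a
                # Only live sites that have not been visited yet are followed.
                if there in sites and there not in seen:
                    seen.add(there)
                    queue.append(there)
    return seen

# Hypothetical topology: three campuses (A, B, C), each joined to the
# others through a single inter-campus connection.
sites = {"a1", "a2", "b1", "b2", "c1"}
links = {("a1", "a2"), ("b1", "b2"), ("a1", "b1"), ("b1", "c1")}

# Case 1: the single link between campus A and campus B fails (partition).
after_link_failure = reachable("a1", sites, links - {("a1", "b1")})

# Case 2: site b1 itself crashes; its links become useless.
after_site_failure = reachable("a1", sites - {"b1"}, links)

# Site a1 observes the same thing in both cases.
same_view = after_link_failure == after_site_failure
```

In both cases a1 can still reach only its own campus, so reachability alone does not tell it which failure occurred.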

Among the failures discussed above, failure of a site and network partition need extra care when handling failures in a distributed database.

Recall the components of the Transaction System Structure from the post Distributed Transactions. As shown in the figure below, every site has its own Transaction Coordinator (TC) and Transaction Manager (TM). In distributed database systems, resources are shared among many sites. Hence, the site that initiates a transaction T is treated as the coordinating site (its Transaction Coordinator is responsible). The other sites that take part in completing the transaction T are called participating sites (the Transaction Managers of those sites are responsible). So the failure of a site can be understood in two different senses: failure of a participating site and failure of a coordinator.
Figure 1 - Transaction System Structure in a Distributed Database
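The two senses of site failure can be sketched in code. The following is a minimal illustration, assuming a hypothetical `Transaction` record and `classify_site_failure` helper (neither name comes from the original post or from a real system): the initiating site acts as coordinator, the other involved sites are participants, and a failed site is classified by its role in the transaction.

```python
from dataclasses import dataclass, field
from typing import Set

@dataclass
class Transaction:
    name: str
    coordinator: str                      # site whose TC initiated the transaction
    participants: Set[str] = field(default_factory=set)  # sites whose TMs take part

def classify_site_failure(txn: Transaction, failed_site: str) -> str:
    """Classify a failed site by the role it plays in transaction txn."""
    if failed_site == txn.coordinator:
        return "failure of a coordinator"
    if failed_site in txn.participants:
        return "failure of a participating site"
    return "site not involved in this transaction"

# Hypothetical example: transaction T starts at site S1 and touches S2 and S3.
t = Transaction(name="T", coordinator="S1", participants={"S2", "S3"})
coordinator_case = classify_site_failure(t, "S1")
participant_case = classify_site_failure(t, "S2")
```

The same physical site can be a coordinator for one transaction and a participant for another, which is why the classification is made per transaction rather than per site.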

The links given below lead to the posts about handling failures in a distributed database.

         Failure of a coordinator

         Network partition

