Handling the Failure of a Participating Site in the 2 Phase Commit (2PC) Protocol in a Distributed Database
Recall the messages used by the
TC (Transaction Coordinator) to carry out a transaction in a Distributed
Database:
<Prepare T> - This message is sent by the TC to the TMs (Transaction
Managers) of all the participating sites. It instructs the participating sites
to get ready to commit the transaction T.
<Ready T> - This message is sent by the TM of a participating site
to the TC if that site is ready to commit the transaction T.
<Abort T> - This message indicates that the system
which sends it is not ready to commit, or cannot commit, T. It
can be sent by a participating site (by its TM) or by the
initiating site (by the TC).
<Commit T> - Instructs the system to commit, i.e.,
to permanently store the changes in the database.
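The four messages above can be written down as a small data structure. The following is only an illustrative sketch: the names MsgType, ALLOWED_SENDERS and Message are made up for this example and do not come from any particular database system. It simply records which role (TC or TM) may send each message, as described above.

```python
from dataclasses import dataclass
from enum import Enum


class MsgType(Enum):
    PREPARE = "prepare"  # TC -> TMs: get ready to commit T
    READY = "ready"      # TM -> TC: this site is ready to commit T
    ABORT = "abort"      # TC or TM: T is not, or cannot be, committed
    COMMIT = "commit"    # TC -> TMs: make the changes of T permanent


# Which role may send each message (hypothetical encoding of the rules above).
ALLOWED_SENDERS = {
    MsgType.PREPARE: {"TC"},
    MsgType.READY: {"TM"},
    MsgType.ABORT: {"TC", "TM"},
    MsgType.COMMIT: {"TC"},
}


@dataclass(frozen=True)
class Message:
    kind: MsgType
    txn: str
    sender_role: str  # "TC" or "TM"

    def __post_init__(self):
        # Reject a message sent by a role that is not allowed to send it.
        if self.sender_role not in ALLOWED_SENDERS[self.kind]:
            raise ValueError(
                f"{self.sender_role} may not send <{self.kind.value} {self.txn}>"
            )

    def __str__(self):
        return f"<{self.kind.value} {self.txn}>"
```

For example, Message(MsgType.ABORT, "T", "TM") is accepted, while Message(MsgType.COMMIT, "T", "TM") raises a ValueError, because only the TC may send <Commit T>.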
Handling a Failure of a Participating Site
Let us assume that the failed
site is Si and the Transaction Coordinator is TC. There are two
things we need to look into to handle such a failure:
1. The response of the Transaction Coordinator of transaction T.
If the failed site has not sent any <ready T> message, the TC cannot decide
to commit the transaction. [Remember, in a distributed
database all the participating sites must be ready to commit. If even one site
is not ready, the whole transaction has to be aborted by the TC.]
Hence, the transaction T is aborted and the other participating sites are
informed.
If the failed site has sent a <ready T> message, the TC can treat the
failed site as ready to commit and carry on with the protocol as usual. If all the other sites have also sent <ready T>, the transaction is committed by the
TC and the other sites are informed to commit. In this case, the site which recovers from failure has to complete its part of the
2PC protocol (see point 2) to bring its local database up to date.
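The coordinator's side of the rule can be sketched in a few lines. This is a minimal illustration, not a real implementation: the function name coordinator_decision and the site names are hypothetical, and it assumes the TC has already stopped waiting (for example, after a timeout) and knows which sites sent <ready T>.

```python
def coordinator_decision(participants, ready_received):
    """Return "commit" only if every participant sent <ready T>, else "abort".

    participants   -- names of all sites taking part in transaction T
    ready_received -- names of the sites whose <ready T> reached the TC
    """
    missing = set(participants) - set(ready_received)
    # One missing <ready T> is enough to abort the whole transaction.
    return "abort" if missing else "commit"


sites = ["S1", "S2", "S3"]

# S2 failed before sending <ready T>: T must be aborted.
print(coordinator_decision(sites, {"S1", "S3"}))

# S2 failed after sending <ready T>: the TC ignores the failure and commits.
print(coordinator_decision(sites, {"S1", "S2", "S3"}))
```

Note that the decision depends only on which <ready T> messages arrived, not on whether a site is still running when the decision is made.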
2. The response of the failed site when it recovers.
When it recovers from failure, the
recovering site Si must determine the fate of the transactions which
were in progress when Si failed. This can be done by
examining the log file entries of site Si.
The following are the possible
cases and the relevant actions:
[a] If the log contains a <commit T>
entry - It means that all the sites, including Si, responded to the TC with a <ready T> message, and the TC
must have sent <commit T> to
all the participants, because a participating site is not allowed to
insert a <commit T> record in
its log file without the coordinator's decision. Hence, the recovered site Si
can perform redo(T). That is, the updates of T recorded in the log are
applied once again locally by Si.
[b] If the log contains an <abort T>
entry - A site has an <abort T> entry in its log if the decision taken
by the coordinator TC was to
abort the transaction T. Hence, site Si executes undo(T).
[c] If the log contains a <ready T>
entry - This means that the site Si failed after sending
its own <ready T> status on transaction T, but before recording the final decision. Now, it has to contact the TC or the other sites to determine the fate of the transaction T.
The first
choice is to contact the TC of transaction T. If the TC has a <commit T> entry, then, as
discussed above, Si has to perform redo(T). If the TC has an <abort T> entry, then Si performs undo(T).
The
second choice is to contact the other sites which participated in
transaction T (this choice is taken only if the TC is not available). The
decision is then taken based on the other sites' log entries.
[d] If the log contains no control messages, i.e., no <abort T>, <commit T>,
or <ready T> entry - It clearly shows that the site Si
failed before responding to the <prepare
T> message. Since the TC never received <ready T> from Si, it must have aborted the transaction. So, Si
needs to execute undo(T).
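Cases [a] to [d] can be summarised as one decision function. The sketch below is only an illustration under stated assumptions: the log is modelled as a list of (record, transaction) pairs, and ask_coordinator and ask_peers are hypothetical callbacks that return "commit", "abort", or None when the answer cannot be obtained. The "wait" result is also an assumption of this sketch; it covers the situation in which neither the TC nor any other site can tell Si the outcome, so Si can neither redo nor undo T yet.

```python
def recovery_action(log, txn, ask_coordinator=None, ask_peers=None):
    """Decide what a recovering site does for txn, based on its own log."""
    records = {kind for kind, t in log if t == txn}

    if "commit" in records:      # case [a]
        return "redo"
    if "abort" in records:       # case [b]
        return "undo"
    if "ready" in records:       # case [c]: the outcome is not in the local log
        # First choice: the coordinator. Second choice: the other sites.
        for ask in (ask_coordinator, ask_peers):
            if ask is None:
                continue
            outcome = ask(txn)
            if outcome == "commit":
                return "redo"
            if outcome == "abort":
                return "undo"
        return "wait"            # nobody reachable knows the decision yet
    return "undo"                # case [d]: no control record at all


# Si logged <ready T> and then failed; the TC is down, a peer saw <abort T>.
log_si = [("ready", "T")]
print(recovery_action(log_si, "T",
                      ask_coordinator=lambda t: None,
                      ask_peers=lambda t: "abort"))
```

The order of the checks matters: a <commit T> or <abort T> record in the local log is final, so the site only asks the TC or the other sites when <ready T> is the last thing it knows about T.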
This is how 2PC handles the
failure of a participating site.