Showing posts with label Distributed Database. Show all posts

Tuesday, July 5, 2016

Distributed Transactions

Transactions in distributed database / Transaction manager / Transaction coordinator / Commit protocols / 2 Phase commit protocol

Transactions in Distributed Database Management System


In a centralized database system, every transaction (i.e., every access to data items) must satisfy the ACID (Atomicity, Consistency, Isolation, and Durability) properties. Preserving the ACID properties is equally mandatory in a distributed database, that is, for distributed transactions. Distributed transactions are of two types, depending on where the accessed data is located. Local transactions read, write, or update data in only one local database, whereas global transactions read, write, or update data in several local databases.
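To make the local/global distinction concrete, here is a minimal Python sketch. It assumes a hypothetical catalog, `DATA_LOCATION`, that records which site stores each data item (a real system would keep this information in its catalog). A transaction is classified purely by how many sites hold the data it touches.

```python
# Hypothetical catalog: the site that stores each data item (illustrative only).
DATA_LOCATION = {
    "A": "Site1",
    "B": "Site1",
    "C": "Site2",
    "D": "Site3",
}


def sites_accessed(items):
    """Return the set of sites holding the data items a transaction touches."""
    return {DATA_LOCATION[item] for item in items}


def classify(items):
    """Local if all accessed data lives at (at most) one site, global otherwise."""
    return "local" if len(sites_accessed(items)) <= 1 else "global"


print(classify(["A", "B"]))       # local: both items are stored at Site1
print(classify(["A", "C", "D"]))  # global: items span Site1, Site2 and Site3
```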

Distributed database – Transaction system

The transaction system consists of two important components, both present at every site: 1. the Transaction Manager (TM), which is similar to the transaction manager of a centralized database, except that a distributed database has one at every site. Its main job is to ensure the ACID properties of the transactions that execute at that site. 2. the Transaction Coordinator (TC), which manages and coordinates the transactions (both local and global) initiated at that site.

Transaction System Structure for Distributed Database

Each Transaction Manager is responsible for:
·         Maintaining a log for recovery purposes,
·         Participating in an appropriate concurrency-control scheme (more on this later) to coordinate the concurrent execution of the transactions executing at that site.

Every Transaction Coordinator is responsible for:
·         Starting the execution of every transaction initiated at that site,
·         Breaking the transaction into a number of sub-transactions and distributing these sub-transactions to the appropriate sites for execution (in the diagram, the links from TC1 to TM2, TM3, TM4, ..., TMn represent the distribution of sub-transactions to the concerned transaction managers),
·         Coordinating the termination of the transaction.
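The coordinator's second duty, splitting a transaction into per-site sub-transactions, can be sketched as below. This is a simplified illustration, not a real DDBMS API: it assumes a transaction is just a list of `(action, item)` pairs and uses a hypothetical catalog, `DATA_LOCATION`, to find the site that stores each item. Each resulting group is what TC1 would ship to the corresponding TM.

```python
from collections import defaultdict

# Hypothetical catalog mapping each data item to the site that stores it.
DATA_LOCATION = {"A": "Site1", "B": "Site2", "C": "Site2", "D": "Site3"}


def split_into_subtransactions(operations):
    """Group a transaction's operations by the site holding each data item.

    `operations` is a list of (action, item) pairs such as ("read", "A").
    The result maps each site to the operations its TM must execute,
    keeping the original order of operations within each site.
    """
    subtransactions = defaultdict(list)
    for action, item in operations:
        subtransactions[DATA_LOCATION[item]].append((action, item))
    return dict(subtransactions)


t = [("read", "A"), ("write", "B"), ("read", "C"), ("write", "D")]
for site, ops in split_into_subtransactions(t).items():
    print(site, ops)
# Site1 [('read', 'A')]
# Site2 [('write', 'B'), ('read', 'C')]
# Site3 [('write', 'D')]
```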

Commit Protocols

As noted earlier, satisfying the ACID properties is essential. First of all, we need to ensure that the transaction is atomic (i.e., it is either completed as a whole or not at all). In a distributed database, a transaction T that executes at multiple sites must be committed at all of those sites before T can be considered successfully completed. Otherwise, T must be aborted at all the sites. Implementing this requires a commit protocol. The simplest commit protocol, and a widely used one, is the Two-Phase Commit (2PC) protocol.


Two Phase Commit Protocol


Two Phase Commit protocol in Distributed Database

Two Phase Commit (2PC) Protocol
Consider a transaction T initiated at site Sitei, where the transaction coordinator is TCi. When T starts, TCi distributes its sub-transactions to the sites that hold the data those sub-transactions need. When T has completed its execution at all the sites where it executed, the transaction managers (TMs) of those sites inform TCi of the completion. TCi then starts the 2PC protocol, which works as follows.

The messages used for communication in the 2PC protocol are:

<prepare T>
Sent by the coordinator to all the participating sites, asking them to prepare to commit T. The coordinator sends it once T has completed its execution at all those sites.
<ready T>
Sent by the transaction manager of a participating site in reply to a <prepare T> message, if that site is ready to commit the transaction.
<abort T>
Sent by the transaction manager of a participating site that is not ready to commit, and later by the coordinator to all the participating sites if one or more of them are not ready to commit.
<no T>
Not a message but a log record: the transaction manager of a participating site that is not ready to commit writes it to its local log file (and sends <abort T> to the coordinator).
<commit T>
Sent by the coordinator to all the participating sites if all of them are ready to commit.

Phase 1: Transaction coordinator TCi adds a <prepare T> record to its log file and forces the log to stable storage (for example, a hard disk) for recovery purposes. It then sends a <prepare T> message to all the sites where T executed. On receiving this message, the TM of each participating site decides, based on its status, whether it can commit its part of T. If the TM decides not to commit for some reason (failure of the transaction, message failure, locking, etc.), it writes <no T> to its log and sends an <abort T> message to TCi. If the TM is ready to commit, it sends a <ready T> message to TCi. In both cases, the decision record (<no T> or <ready T>) is first written to the stable storage of the site where the decision is made, and only then is the reply sent back to the coordinator.
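The participant's side of Phase 1 can be sketched in a few lines of Python. This is a simplified model under stated assumptions: `StableLog` is a hypothetical in-memory stand-in for a log on stable storage (its `force()` is a placeholder where a real TM would flush to disk), and `can_commit` stands for whatever local checks the TM performs. The point it illustrates is the ordering: the decision is logged and forced before the reply is returned.

```python
class StableLog:
    """In-memory stand-in for a log kept on stable storage (hypothetical)."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)

    def force(self):
        # Placeholder: a real TM would flush the log to disk here (e.g. os.fsync).
        pass


def on_prepare(tid, log, can_commit):
    """Participant TM's Phase 1 reply to <prepare T>.

    The decision record is appended and forced to the log *before* the
    reply is returned, mirroring "write to stable storage, then reply".
    """
    if can_commit:
        log.append(f"<ready {tid}>")
        log.force()
        return f"<ready {tid}>"
    log.append(f"<no {tid}>")
    log.force()
    return f"<abort {tid}>"


site2_log = StableLog()
print(on_prepare("T1", site2_log, can_commit=True))   # <ready T1>
site3_log = StableLog()
print(on_prepare("T1", site3_log, can_commit=False))  # <abort T1>
print(site3_log.records)                              # ['<no T1>']
```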

Phase 2: When TCi has received the replies to its <prepare T> message, or when a pre-specified time interval has elapsed, TCi can decide the fate of the transaction. T can be committed only if TCi has received a <ready T> message from every participating site. In that case, TCi writes a <commit T> record to its stable storage and sends <commit T> to all the participating sites so that they commit the transaction. If any reply is <abort T>, or some site does not reply within the specified time interval, the transaction must be aborted. In this case, an <abort T> record is written to stable storage and <abort T> is sent to all the participating sites so that they abort as well.
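The coordinator's decision rule can be sketched as follows. This is a deliberately simplified, single-process model, not a production protocol: each participant is a hypothetical function that receives the transaction id in place of a network <prepare T> message, a return value of `None` models a site that did not reply before the time-out, and the coordinator's log is a plain list (a real TCi would force each record to stable storage).

```python
def two_phase_commit(tid, participants, log):
    """Coordinator TCi running 2PC for transaction `tid` (simplified sketch).

    `participants` maps each site name to a function that plays the role of
    that site's TM receiving <prepare T>; it returns "<ready T>",
    "<abort T>", or None (no reply before the time-out).
    Returns the decision and the message sent to each participating site.
    """
    # Phase 1: record <prepare T>, then send it to every participating site
    # and collect the replies.
    log.append(f"<prepare {tid}>")
    replies = {site: prepare(tid) for site, prepare in participants.items()}

    # Phase 2: commit only if every site replied <ready T>; any <abort T>
    # or missing reply (time-out) forces an abort.
    if all(reply == f"<ready {tid}>" for reply in replies.values()):
        decision = f"<commit {tid}>"
    else:
        decision = f"<abort {tid}>"
    log.append(decision)  # the decision is logged before it is sent out
    outbox = {site: decision for site in participants}
    return decision, outbox


def ready_site(tid):
    return f"<ready {tid}>"


def silent_site(tid):
    return None  # never answers, so the coordinator times out


coordinator_log = []
print(two_phase_commit("T1", {"Site2": ready_site, "Site3": ready_site}, coordinator_log)[0])
# <commit T1>
print(two_phase_commit("T2", {"Site2": ready_site, "Site3": silent_site}, coordinator_log)[0])
# <abort T2>
print(coordinator_log)
# ['<prepare T1>', '<commit T1>', '<prepare T2>', '<abort T2>']
```

Treating a missing reply exactly like an explicit <abort T> is what keeps the protocol safe: the coordinator never commits unless it has positive confirmation from every site.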



Wednesday, February 24, 2016

Fragmentation in Distributed Database - Quiz

Database System Architectures - Quiz

1. What are the advantages of replication of data in a distributed database?
    Availability, Parallelism, Increased data transfer
    Availability, Parallelism, Reduced data transfer
    Availability, Increased parallelism, Cost of updates
    All of the above

2. A fragmentation technique wherein every tuple of a table is assigned to one or more fragments as a result of fragmentation is called ________________ .
    Vertical Fragmentation
    Horizontal Fragmentation
    Hybrid Fragmentation
    None of the above

Assume a relation EMP as given below:
                EMP(EmpNo, EName, Job, Sal, Department)
Furthermore, assume that two applications access this table. One application typically retrieves information about employees who earn more than Rs. 5,000; the other typically manages information about 'clerks' (Job). Also, assume that EMP stores employees with other designations and different salaries. With this information, answer questions 3 to 6.

3. Which of the following are the simple predicates which can be directly extracted from the given applications?
    {Job = clerk, Salary>5000}
    {Job = clerk, Salary<5000}
    {Job = Manager, Salary>5000}
    {Job = Manager, Salary<5000}

4. How many valid minterm predicates can we derive for the above problem?

5. Assume that the departments are 'Finance', 'Production', and 'Design'. If one more application frequently accesses the information of the 'Finance' department, what would be the number of valid minterm fragments?

6. If, by mistake, I miss one of the valid minterm fragments, what would be the effect on the fragmentation?
    Causes skew
    Reconstruction of EMP will be unsuccessful
    Slows down the database access

7. Which of the following failures are unique to distributed database systems?
    Failure of a site
    Loss of messages
    Network Partition
    All the above

8. For the given set of simple predicates Pr, how many minterm predicates can we derive (including invalid ones)? Pr = {Branch = “Vellore”, Branch = “Chennai”, Salary <= 20000, Salary > 20000}. Assume that there are five different branches.

9. In a distributed database application, if there are many more read-only queries than update queries, then the ______________ allocation technique is advantageous.
    Hybrid Fragmentation
    Horizontal Fragmentation
    Vertical Fragmentation

10. Which of the following would be an advantage of database fragmentation?
    Most of the operations are local to any sites
    Reduced Network Traffic
    Parallel processing
    All the above
