Distributed Database Question Bank with Answers
1. In the Two-Phase Commit Protocol (2PC), why can blocking never be completely eliminated, even when the participants elect a new coordinator?
The greatest disadvantage of the two-phase commit protocol is that it is a blocking protocol. Once a participant has sent its agreement ("yes") vote to the coordinator, it must block until a commit or rollback decision is received; if the coordinator fails permanently, such participants can never resolve their transactions. Electing a new coordinator does not remove this risk: after the election, the new coordinator may crash as well. In that case the remaining participants may again be unable to reach a final decision, because the decision still depends on the newly elected coordinator, just as before.
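The blocking behaviour described above can be sketched in a few lines. This is a minimal in-memory simulation, not a real protocol implementation; the class and function names are invented for illustration.

```python
# Sketch of the 2PC voting phase. A participant that has voted "yes"
# enters the uncertain READY state and can only leave it when the
# coordinator delivers a decision.

class Participant:
    def __init__(self, name):
        self.name = name
        self.state = "INIT"

    def on_prepare(self):
        # Phase 1: vote yes and enter the uncertain (blocking) state.
        self.state = "READY"
        return "yes"

    def on_decision(self, decision):
        # Phase 2: only the coordinator's decision releases the block.
        self.state = decision  # "COMMIT" or "ABORT"

def coordinator_run(participants, crash_after_prepare=False):
    votes = [p.on_prepare() for p in participants]
    if crash_after_prepare:
        return None  # coordinator crashes: everyone stays READY forever
    decision = "COMMIT" if all(v == "yes" for v in votes) else "ABORT"
    for p in participants:
        p.on_decision(decision)
    return decision

parts = [Participant("A"), Participant("B")]
coordinator_run(parts, crash_after_prepare=True)
print([p.state for p in parts])  # both stuck in READY: blocked
```

If the crash happens after the prepare round, both participants remain in READY with no safe unilateral way out, which is exactly the blocking window the question asks about.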
2. How should a small/large relation be partitioned?
It is better to assign a small relation to a single disk (a single partition): spreading it across several disks adds communication and startup overhead without improving performance. A large relation, by contrast, should be partitioned across all available disks to increase the degree of parallelism; for large relations, partitioning helps a great deal.
3. Which among the three architectures (share disk, shared memory, shared nothing) supports the failure of one processor?
The shared-disk architecture is fault tolerant in this respect: if a processor fails, the other processors can take over its tasks, since every processor can access the database resident on the shared disks.
4. Which form of parallelism (inter-query, intra-query, inter-operation, intra-operation) is most likely to advance the following goals and why?
i) Increasing
the throughput and response time of the system when there are lots of smaller
queries.
Inter-query parallelism means executing multiple queries simultaneously. When there are many small queries, this form of parallelism increases throughput: the number of queries executed per unit time rises. The other types of parallelism help little here, since each small query offers too little work to split up.
ii) Increasing
the throughput and response time of the system, when there are a few large
queries and the systems consist of several disks and processors.
Intra-query parallelism is the most effective way to reduce the response time of a few large queries. Within it, a large number of processors is best exploited by intra-operation parallelism: such queries typically contain only a few operations (when there are few queries, the number of simultaneously executable operations may even be smaller than the number of processors), but each operation usually touches a large number of tuples.
An example is parallel sort: each processor executes the same operation, in this case sorting, on its own partition of the data, simultaneously with and largely independently of the other processors.
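The parallel-sort example can be sketched as a range-partitioned sort. In this sketch the per-processor sorts are simulated sequentially for clarity (in a real system each partition would be sorted by its own processor); the function names and boundary values are illustrative.

```python
# Range-partitioned parallel sort (intra-operation parallelism):
# partition the input by value ranges, sort each range independently,
# then simply concatenate -- no merge step is needed, because the
# ranges are already ordered relative to each other.

def range_partition(values, boundaries):
    # boundaries like [25, 50, 75] split values into 4 ordered ranges
    parts = [[] for _ in range(len(boundaries) + 1)]
    for v in values:
        i = sum(v >= b for b in boundaries)  # which range v falls in
        parts[i].append(v)
    return parts

def parallel_sort(values, boundaries):
    parts = range_partition(values, boundaries)
    # In a real system, each partition is sorted by its own processor.
    sorted_parts = [sorted(p) for p in parts]
    result = []
    for p in sorted_parts:  # concatenation in range order
        result.extend(p)
    return result

import random
data = random.sample(range(100), 20)
assert parallel_sort(data, [25, 50, 75]) == sorted(data)
```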
5. Suppose a company which processes transactions is growing rapidly and needs to choose a new, parallel computer. What would be more important, speedup or transaction scale-up? Why?
As the scale of operations grows, the number of transactions submitted per unit time can be expected to increase, while individual transactions need not take any longer. Transaction scale-up is therefore the more important criterion when choosing the new parallel computer.