Wednesday, April 23, 2014

Hash Join technique in DBMS

Hash Join Technique in DBMS / Simple Hash Join Technique / Sequential Hash Join in Database / Hash join in database example


Hash Join in database

Hash join is one type of joining techniques that are used to process a join query. Hash join is proposed for performing joins that are Natural joins or Equi-joins. There are several variants of hash joins, like Simple Hash Join, Partitioned Hash Join, and Hybrid Hash Join.

Simple Hash Join:

Input: The list of tables to be joined, say r and s and the joining attributes r.A and s.B of tables r and s.
Output: Joined relation.


Steps:

1. Identify a hash function such that the result of the function should point to one of the identified range of values (or buckets, say bucket 0, bucket 1, …, bucket 9 for example).

2. Identify the smaller relation among the relations that are to be joined, say r and s.

3. Partition the smaller relation, say r, into the identified buckets using the join attribute of r. That is, r is partitioned into available buckets as r0, r1, r2, …, r9. This step is called Building Phase (sometimes Partitioning Phase).

4. Partition the relation s into the identified buckets using the join attribute of s. That is, s is partitioned into available buckets as s0, s1, s2, …, s9. This step is called Probing Phase as the records of s probe for the matching records in the appropriate buckets.

5. Finally the result can be collected from different buckets and submitted as result.

How does simple Hash join work?

Example:

Consider the following two table instances for tables Employee(EmpID, EName, Dept) and Department(DeptNo, DName, DLocation);


EmpID
EName
DeptNo
E101
Kumar
2
E103
Wess
1
E107
Terry
1
E102
Gowri
2
E110
Morgan
3
E111
Sachin
3
Table 1 - Employee
DeptNo
DName
DLocation
1
Design
Chennai
2
Production
Chennai
3
Administration
Bangalore
Table 2 – Department

Step 1: Hash function looks like the one given below;
h(Hash_key) = Hash_key_value mod n
where n is the number of partitions (or buckets). Let us choose the DeptNo attribute as hash key and partition the relation into 3 partitions such as 0, 1, and 2. Then, our hash function will look like as follows;
h(DeptNo) = DeptNo mod 3

Step 2: The Department table is the smaller table.

Step 3: Partition Department using the hash function. The first record of Department will go into Bucket 1, second record into Bucket 2, and third record into Bucket 0.

DeptNo
DName
DLocation
1
Design
Chennai
2
Production
Chennai
3
Administration
Bangalore
Figure 1 - Hashing of Department table
 
Step 4: Now partition the larger relation Employee using the same hash function. That is use the following hash function;
h(DeptNo) = DeptNo mod 3
The figure given below shows the partitioning of Employee table into 3 buckets.

EmpID
EName
DeptNo
E101
Kumar
2
E103
Wess
1
E107
Terry
1
E102
Gowri
2
E110
Morgan
3
E111
Sachin
3
Figure 2 - Hashing of Employee table

After successful partitioning, our hash buckets will look the figure given below; in this figure, the first table shows 0th partition of Department table, and the second one shows the 0th partition of Employee table.


Figure 3 - Bucket status after hashing of all records of the tables to be joined

Carefully observe the data stored in every bucket. Bucket 0 stores records of zeroth partitions of both tables where the attribute DeptNo (joining attribute) is having the same value, i.e, 3. It is true for other partitions also. Hence, at this stage the join is trivial. This is why this phase of this joining technique is named Probing phase, where the Probe relation searches for the joining attribute value of other relation and gets joined. At this stage, the joining is trivial.

The major advantage is that, if we join with conventional join technique, then the comparison requires,
6 records X 3 records = 18 comparisons
When we use hash join, we need
(1 record X 2 records in Bucket 0)+ (1 X 2 in Bucket 1)+ (1 X 2 in Bucket 2) = 6 comparisons

Points to note:

1. Only Equi-join or Natural Join can be performed using Hash join
2. Simple hash join assumed one of the relations as small relation, i.e, the whole relation can fit into memory.
3. Smaller relation is chosen in the building phase.
4. Only a single pass through the tables is required. That is one time scanning of relation is required.
5. Hash function should be chosen such that it should not cause skewness.


Friday, April 18, 2014

Database Normal Forms - Quiz 1

Normalization and Normal Forms in Database Design - Quiz 1


1. For a relation R, we are given that the domains of all attributes of R are indivisible. With this information, we can say the relation R is in
    1 NF
    2 NF
    3 NF
    BCNF

2. In a table with the schema STU(STUNO, STUNAME, ADDRESS, COURSENO, COURSENAME) and with the functional dependencies, STUNO --> STUNAME ADDRESS, COURSENO --> COURSENAME, STUNO COURSENO --> STUNAME ADDRESS COURSENAME, which of the following anomalies would present?
    Insetion Anomaly
    Deletion Anomaly
    Update Anomaly
    All of the above

3. A relation which has only atomic attributes and every non-key attribute fully functionally dependent on candidate keys is said to be in
    1 NF
    2 NF
    3 NF
    BCNF

4. Consider the schema given in question 2. Is the relation STU in 2 NF?
    YES
    NO

5. What does Atomic attribute mean?
    It is one type of attribute used in DBMS
    An attribute whose values can be divided into meaningful subparts
    An attribute whose values cannot be divided into meaningful subparts
    None of the above

6. If you have removed repeated groups of values from a relation and also removed the partial key dependencies, then we would say that the given relation is in
    1 NF
    2 NF
    3 NF
    BCNF

7. Assume a table with the schema STU(STUNO, STUNAME, ADDRESS, PINCODE) and with the functional dependencies, STUNO --> STUNAME ADDRESS PINCODE, PINCODE --> ADDRESS. Is this relation is in 2 NF?
    YES
    NO

8. A relation is said to be in 3 NF if which of the following is/are true?
    No partial key dependencies
    All attributes are atomic
    No presence of transitive dependencies
    All of the above

9. Assume a table with the schema STU(STUNO, STUNAME, ADDRESS, PINCODE) and with the functional dependencies, STUNO --> STUNAME ADDRESS PINCODE, PINCODE --> ADDRESS. Is this relation is in 3 NF?
    YES
    NO

10. Assume a table with the schema STU(STUNO, STUNAME, ADDRESS, COURSENO, COURSENAME) and with the functional dependencies, STUNO --> STUNAME ADDRESS, COURSENO --> COURSENAME, STUNO COURSENO --> STUNAME ADDRESS COURSENAME with no transitive dependency. The relation STU is in _______ NF?
    1 NF
    2 NF
    3 NF
    BCNF

Score =

Correct answers:

* You can also try other DBMS quizzes here

Monday, April 14, 2014

What is deadlock in database?

What is Deadlock? / Deadlock - Explained with examples / How to handle deadlocks?


What is Deadlock?

In Database Management System, Deadlock is part of discussion in Transaction Processing Component. Deadlock is a situation where two or more transactions waiting for locks on some data items which are locked by other transactions in an incompatible mode.

Here, incompatible mode would mean one of the following;
A read lock request for a data item which is locked in write mode
A write lock request for a data item which is locked in read mode
A write lock request for a data item which is locked in write mode.


Example:

Assume that two transactions T1 and T2 are needed data items A and B to be locked. In the lock acquiring process, let us suppose T1 locked A successfully and T2 locked B successfully. For successfully completing the transactions, T1 needs B also to be locked and T2 needs A. The problem is T1 cannot release lock on A and T2 cannot release lock on B. This situation is called deadlock. If you carefully observe this you would understand that we have formed a cycle which leads to deadlock.
Deadlock example in DBMS
1 (a)
  
deadlock example in dbms
1 (b)
Figure 1 - (a) Deadlock occurrence with two transactions, (b) deadlock occurrence with three transactions


Real time example of Deadlock situation:



Let us assume two bank transactions, namely T1 and T2 as follows;

T1 – Transaction which transfers money, say Rs. 5000 from account A to account B. T1 needs to lock both A and B in Write mode (Exclusive Lock). T1 is said to be completed if and only if it successfully updates the old balance of A and B with a new balance and commits the transaction.

T1 would involve two update queries;
UPDATE ACCOUNT SET balance = balance -5000 WHERE account = ‘A’;
UPDATE ACCOUNT SET balance = balance +5000 WHERE account = ‘B’;

T2 – Transaction which updates all the accounts with yearly interest, say 5%. T2 need to lock all the accounts in Write mode (Exclusive lock). T2 is said to be completed if and only if it successfully updates the old balances of all the accounts with the new balances and commits the transaction.

T2 would involve one update query;
UPDATE ACCOUNT SET balance = balance + (balance*0.05);

Assume that the transactions are executed as follows;

T1 has started and acquired Write lock on account A. T1 can now debit the amount to be transferred from account A and save the new value of A. (cannot commit the transaction T1 at this stage)

T2 also has started at the same time and acquired Write lock on B along with other accounts. T2 can now update all the accounts with their interest amounts except account A. (because account A is held by T1. Hence, T2 cannot commit as well at this stage).

At this stage, both T1 and T2 are waiting for each other which leads to deadlock. This is shown in Figure 2.

deadlock situation example
Figure 2 - Deadlock situation example


What is the solution to handle deadlocks if occured?

The only solution is to pick up one of the Transactions and roll back the same.


definition of deadlock in rdbms
deadlock during database transactions explained 
real time example of deadlock detection
solution to handle deadlock

Featured Content

Multiple choice questions in Natural Language Processing Home

MCQ in Natural Language Processing, Quiz questions with answers in NLP, Top interview questions in NLP with answers Multiple Choice Que...

All time most popular contents