## Partitioned Parallel Hash Join in Parallel Database / Parallel Hash Join Technique

## Partitioned Parallel Hash Join

The
Sequential Hash Join technique discussed in the post Hash Join technique in DBMS
can be parallelized. .

### The Technique

Please
go through the post Hash Join technique in DBMS for better understanding of
Parallel Hash-Join.

#### Assume that we have,

·
n
processors, P0, P1, …, Pn-1

·
two
tables, r and s (r and s are already partitioned into disks of n processors),

·
s
is the smaller table

### Parallel Hash-Join Algorithm:

Step
1: Choose a hash function

*h1*.*h1*should be able to take the**of each tuple of relations r and s, and maps them to one of the n processors.***join attribute value*
Step
2: Smaller relations s is taken as the Build relation.

Step
3: Each processor Pi reads the tuples of smaller relation s in it’s disk Di, and
sends them to different processors based on the hash function h1.

Step
4: On receiving the records of s

_{i}at every destination processor Pi, the Pi further partitions the tuples using another hash function*h2*. This hash function is used for joining the tuples locally at every processor. This step is independent for any processor.
Step
5: On completion of the distribution of all the records of smaller relation s,
it is now the turn for the larger relation r. Step 3 is done for all the
records of relation r.

Step
6: On receiving the records of r

_{i}at every destination processor Pi, the Pi further partitions the tuples using another hash function*h2*. This phase is called the Probe Phase.
Step
7: At last, all the records which can be joined will be in same processors’
same partitions. Each processor Pi executes the build and probe phases of the
hash–join algorithm on the local partitions r

_{i}and s_{i}of r and s to produce a partition of the final result of the hash–join.#### Points to remember:

Hash
Join at each processor is independent of the other.

It
works for equi-join and natural join conditions