Partitioned Join
The
relational tables that are to be joined gets partitioned on the joining
attributes of both tables using same partitioning function to perform Join
operation in parallel.
How does Partitioned Join work?
Assume that relational
tables r and s need to be joined using attributes r.A and s.B. The system
partitions both r and s using same partitioning technique into n partitions. In
this process A and B attributes (joining attributes) to be used as partitioning
attributes as well for r and s respectively. r is partitioned into r0, r1, r2,
…, rn-1 and s is partitioned into s0, s1, s2,
…, sn-1. Then, the system sends partitions ri and si
into processor Pi, where
the join is performed locally.
What type of joins can we perform in parallel using Partitioned Join?
Only joins
such as Equi-Joins and Natural Joins can be performed using Partitioned join
technique.
Why Equi-Jion and Natural Jion?
Equi-Join or
Natural Join is done between two tables using an equality condition such as r.A
= s.B. The tuples which are satisfying this condition, i.e, same value for both
A and B, are joined together. Others are discarded. Hence, if we partition the
relations r and s on applying certain partitioning technique on both A and B,
then the tuples having same value for A and B will end up in the same
partition. Let us analyze this using simple example;
RegNo
|
SName
|
Gen
|
Phone
|
1
|
Sundar
|
M
|
9898786756
|
3
|
Karthik
|
M
|
8798987867
|
4
|
John
|
M
|
7898886756
|
2
|
Ram
|
M
|
9897786776
|
Table 1 - STUDENT
RegNo
|
Courses
|
4
|
Database
|
2
|
Database
|
3
|
Data Structures
|
1
|
Multimedia
|
Table 2 – COURSES_REGD
Let us assume
the following;
The RegNo attributes of tables STUDENT and COURSES_REGD are used for
joining.
Observe the order of tuples in both
tables. They are not in particular order. They are stored in random order on
RegNo.
Partition the tables on RegNo attribute
using Hash Partition. We have 2 disks and we need to partition the relational
tables into two partitions (possibly equal). Hence, n is 2.
The hash
function is, h(RegNo) = (RegNo mod n) = (RegNo mod 2). And, if we apply the
hash function we shall get the tables STUDENT and COURSES_REGD partitioned into
Disk0 and Disk1 as stated below.
Partition 0
|
Partition 1
|
||||||||||||||||||||||||
STUDENT_0
|
STUDENT_1
|
||||||||||||||||||||||||
COURSES_REGD_0
|
COURSES_REGD_1
|
From the
above table, it is very clear that the same RegNo values of both tables STUDENT
and COURSES_REGD are sent to same partitions. Now, join can be performed
locally at every processor in parallel.
One more
interesting fact about this join is, only 4 (2 Student records X 2 Courses_regd
records) comparisons need to be done in every partition for our example. Hence,
we need total of 8 comparisons in partitioned join against 16 (4 X 4) in conventional
join.
The above discussed process is shown in Figure 1.
Figure 1 - Partitioned Join |
Points to
note:
1. There are
only two ways of partitioning the relations,
Range partitioning on the join
attributes or
Hash partitioning on the join
attributes.
2. Only
equi-joins and natural joins can be performed in parallel using Partitioned
Join technique.
3. Non-equi-joins cannot be performed with this method.
3. Non-equi-joins cannot be performed with this method.
4. After successful
partitioning, the records at every processor can be joined locally using any of
the joining techniques hash join, merge join, or nested loop join.
5. If Range
partitioning technique is used to partition the relations into n processors, Skew may present a special problem. That
is, for some partitions, we may get fewer records (tuples) for one relation for
a given range and many records for other relation for the same range.
6. With Hash
partitioning, if there are many tuples with same value in one relation then the
difference both relations is possible in one partition. Otherwise, skew has
minimal effect.
7. The number of comparisons between relations are well reduced in partitioned join parallel technique.