Parallel Join Technique - Partitioned Join



Partitioned Join

The relational tables that are to be joined gets partitioned on the joining attributes of both tables using same partitioning function to perform Join operation in parallel.

How does Partitioned Join work?


Assume that relational tables r and s need to be joined using attributes r.A and s.B. The system partitions both r and s using same partitioning technique into n partitions. In this process A and B attributes (joining attributes) to be used as partitioning attributes as well for r and s respectively. r is partitioned into r0, r1, r2, …, rn-1 and s is partitioned into s0, s1, s2, …, sn-1. Then, the system sends partitions ri and si into processor Pi, where the join is performed locally.

What type of joins can we perform in parallel using Partitioned Join?


Only joins such as Equi-Joins and Natural Joins can be performed using Partitioned join technique.

Why Equi-Jion and Natural Jion?


Equi-Join or Natural Join is done between two tables using an equality condition such as r.A = s.B. The tuples which are satisfying this condition, i.e, same value for both A and B, are joined together. Others are discarded. Hence, if we partition the relations r and s on applying certain partitioning technique on both A and B, then the tuples having same value for A and B will end up in the same partition. Let us analyze this using simple example;

RegNo
SName
Gen
Phone
1
Sundar
M
9898786756
3
Karthik
M
8798987867
4
John
M
7898886756
2
Ram
M
9897786776
Table 1 - STUDENT
RegNo
Courses
4
Database
2
Database
3
Data Structures
1
Multimedia
Table 2 – COURSES_REGD

Let us assume the following;
The RegNo attributes of tables STUDENT and COURSES_REGD are used for joining.
Observe the order of tuples in both tables. They are not in particular order. They are stored in random order on RegNo.
Partition the tables on RegNo attribute using Hash Partition. We have 2 disks and we need to partition the relational tables into two partitions (possibly equal). Hence, n is 2.
The hash function is, h(RegNo) = (RegNo mod n) = (RegNo mod 2). And, if we apply the hash function we shall get the tables STUDENT and COURSES_REGD partitioned into Disk0 and Disk1 as stated below.

Partition 0
Partition 1

RegNo
SName
Gen
Phone
4
John
M
7898886756
2
Ram
M
9897786776
STUDENT_0


RegNo
SName
Gen
Phone
1
Sundar
M
9898786756
3
Karthik
M
8798987867
STUDENT_1

RegNo
Courses
4
Database
2
Database
COURSES_REGD_0


RegNo
Courses
3
Data Structures
1
Multimedia
COURSES_REGD_1

From the above table, it is very clear that the same RegNo values of both tables STUDENT and COURSES_REGD are sent to same partitions. Now, join can be performed locally at every processor in parallel.
One more interesting fact about this join is, only 4 (2 Student records X 2 Courses_regd records) comparisons need to be done in every partition for our example. Hence, we need total of 8 comparisons in partitioned join against 16 (4 X 4) in conventional join.

The above discussed process is shown in Figure 1.
Figure 1 - Partitioned Join

  
 Points to note:


1. There are only two ways of partitioning the relations,
                Range partitioning on the join attributes or
                Hash partitioning on the join attributes.
2. Only equi-joins and natural joins can be performed in parallel using Partitioned Join technique.
3. Non-equi-joins cannot be performed with this method.
4. After successful partitioning, the records at every processor can be joined locally using any of the joining techniques hash join, merge join, or nested loop join.
5. If Range partitioning technique is used to partition the relations into n processors, Skew may present a special problem. That is, for some partitions, we may get fewer records (tuples) for one relation for a given range and many records for other relation for the same range.

6. With Hash partitioning, if there are many tuples with same value in one relation then the difference both relations is possible in one partition. Otherwise, skew has minimal effect.
7. The number of comparisons between relations are well reduced in partitioned join parallel technique.




"The more I help others to succeed, the more I succeed."



2 comments:

Popular Posts