Computer Science and Engineering - Tutorials, Notes, MCQs, Questions and Answers

Sunday, March 23, 2014

Parallel Join Technique - Partitioned Join

Partitioned Join

The relational tables that are to be joined gets partitioned on the joining attributes of both tables using same partitioning function to perform Join operation in parallel.

How does Partitioned Join work?

Assume that relational tables r and s need to be joined using attributes r.A and s.B. The system partitions both r and s using same partitioning technique into n partitions. In this process A and B attributes (joining attributes) to be used as partitioning attributes as well for r and s respectively. r is partitioned into r₀, r₁, r₂, …, r_n-1and s is partitioned into s₀, s₁, s₂, …, s_n-1. Then, the system sends partitions r_i and s_i into processor P_i, where the join is performed locally.

What type of joins can we perform in parallel using Partitioned Join?

Only joins such as Equi-Joins and Natural Joins can be performed using Partitioned join technique.

Why Equi-Jion and Natural Jion?

Equi-Join or Natural Join is done between two tables using an equality condition such as r.A = s.B. The tuples which are satisfying this condition, i.e, same value for both A and B, are joined together. Others are discarded. Hence, if we partition the relations r and s on applying certain partitioning technique on both A and B, then the tuples having same value for A and B will end up in the same partition. Let us analyze this using simple example;

*RegNo*	*SName*	*Gen*	*Phone*
1	Sundar	M	9898786756
3	Karthik	M	8798987867
4	John	M	7898886756
2	Ram	M	9897786776

Table 1 - STUDENT

*RegNo*	*Courses*
4	Database
2	Database
3	Data Structures
1	Multimedia

Table 2 – COURSES_REGD

Let us assume the following;

The RegNo attributes of tables STUDENT and COURSES_REGD are used for joining.

Observe the order of tuples in both tables. They are not in particular order. They are stored in random order on RegNo.

Partition the tables on RegNo attribute using Hash Partition. We have 2 disks and we need to partition the relational tables into two partitions (possibly equal). Hence, n is 2.

The hash function is, h(RegNo) = (RegNo mod n) = (RegNo mod 2). And, if we apply the hash function we shall get the tables STUDENT and COURSES_REGD partitioned into Disk₀ and Disk₁ as stated below.

Partition 0

Partition 1

*RegNo*	*SName*	*Gen*	*Phone*
4	John	M	7898886756
2	Ram	M	9897786776

STUDENT_0

*RegNo*	*SName*	*Gen*	*Phone*
1	Sundar	M	9898786756
3	Karthik	M	8798987867

STUDENT_1

*RegNo*	*Courses*
4	Database
2	Database

COURSES_REGD_0

*RegNo*	*Courses*
3	Data Structures
1	Multimedia

COURSES_REGD_1

From the above table, it is very clear that the same RegNo values of both tables STUDENT and COURSES_REGD are sent to same partitions. Now, join can be performed locally at every processor in parallel.

One more interesting fact about this join is, only 4 (2 Student records X 2 Courses_regd records) comparisons need to be done in every partition for our example. Hence, we need total of 8 comparisons in partitioned join against 16 (4 X 4) in conventional join.

The above discussed process is shown in Figure 1.

Figure 1 - Partitioned Join

Points to note:

1. There are only two ways of partitioning the relations,

Range partitioning on the join attributes or

Hash partitioning on the join attributes.

2. Only equi-joins and natural joins can be performed in parallel using Partitioned Join technique.
3. Non-equi-joins cannot be performed with this method.

4. After successful partitioning, the records at every processor can be joined locally using any of the joining techniques hash join, merge join, or nested loop join.

5. If Range partitioning technique is used to partition the relations into n processors, Skew may present a special problem. That is, for some partitions, we may get fewer records (tuples) for one relation for a given range and many records for other relation for the same range.

6. With Hash partitioning, if there are many tuples with same value in one relation then the difference both relations is possible in one partition. Otherwise, skew has minimal effect.

7. The number of comparisons between relations are well reduced in partitioned join parallel technique.

"The more I help others to succeed, the more I succeed."

- Ray Kroc

TOPICS (Click to Navigate)

Sunday, March 23, 2014

Parallel Join Technique - Partitioned Join

Partitioned Join

How does Partitioned Join work?

What type of joins can we perform in parallel using Partitioned Join?

Why Equi-Jion and Natural Jion?

Points to note:

Featured Content

Multiple choice questions in Natural Language Processing Home

All time most popular contents

Report Abuse