Pipelined Parallelism and Independent Parallelism

Friday, August 15, 2014



Interoperation Parallelism

Interoperation parallelism is about executing different operations of a query in parallel. A single query may involve multiple operations, and we may exploit parallelism among them to achieve better performance. Consider the example query given below:

SELECT AVG(Salary) FROM Employee GROUP BY Dept_Id;

It involves two operations: grouping and aggregation. To execute this query:

First, group all the employee records on the attribute Dept_Id.

Then, apply the AVG aggregate function to every group to get the final result.
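The two steps above can be sketched in Python as follows. This is a plain sequential sketch, assuming each employee record is a dict with the keys Dept_Id and Salary (a hypothetical representation); it only shows the grouping step feeding the aggregation step, without any parallelism.

```python
from collections import defaultdict


def avg_salary_per_dept(employees):
    """Sketch of SELECT AVG(Salary) FROM Employee GROUP BY Dept_Id.

    Assumes (hypothetically) that each record is a dict with
    "Dept_Id" and "Salary" keys.
    """
    # Step 1: group the employee records on Dept_Id.
    groups = defaultdict(list)
    for emp in employees:
        groups[emp["Dept_Id"]].append(emp["Salary"])
    # Step 2: apply AVG to every group.
    return {dept: sum(salaries) / len(salaries) for dept, salaries in groups.items()}
```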

We can use interoperation parallelism to execute these two operations in parallel.

[Note: Intra-operation parallelism, by contrast, is about executing a single operation of a query using multiple processors in parallel.]

Interoperation parallelism can be achieved in the following two ways:

1. Pipelined Parallelism

2. Independent Parallelism

1. Pipelined Parallelism

In pipelined parallelism, the result produced by one operation is consumed by the next operation in the pipeline as soon as it is produced. For example, consider the following operation:

r1 ⋈ r2 ⋈ r3 ⋈ r4

The above expression is a natural join of four tables (relations). It can be pipelined as follows:

Perform temp1 ← r1 ⋈ r2 at processor P1 and send the result temp1 to processor P2, which performs temp2 ← temp1 ⋈ r3 and sends the result temp2 to processor P3, which performs result ← temp2 ⋈ r4. The advantage is that we do not need to store the intermediate results; instead, the tuples produced at one processor are consumed directly by the next. Hence, we start receiving result tuples well before P1 completes the join assigned to it.
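A minimal Python sketch of this three-stage pipeline is given below. It assumes hypothetical schemas r1(a, b), r2(b, c), r3(c, d) and r4(d, e), so that each natural join is on a single attribute. Threads stand in for processors P1, P2 and P3, and queues carry tuples between them. Each stage builds a hash table on its stored relation and then forwards matches as soon as an input tuple arrives; the scan, join_stage and pipelined_join names are illustrative only. Note that CPython threads do not execute Python code truly in parallel, so this shows the dataflow of pipelining rather than its speedup.

```python
import queue
import threading

DONE = object()  # end-of-stream marker passed between stages


def scan(rel, out_q):
    """Stream the tuples of a base relation into the pipeline."""
    for t in rel:
        out_q.put(t)
    out_q.put(DONE)


def join_stage(in_q, stored_rel, on, out_q):
    """One 'processor': pipelined hash join of a tuple stream with stored_rel.

    The hash table on stored_rel is built first; after that every incoming
    tuple is probed and its matches are forwarded at once, so the next stage
    can start working before this stage has seen all of its input.
    """
    table = {}
    for s in stored_rel:
        table.setdefault(s[on], []).append(s)
    while True:
        t = in_q.get()
        if t is DONE:
            out_q.put(DONE)  # propagate end-of-stream downstream
            return
        for s in table.get(t[on], []):
            out_q.put({**t, **s})  # natural join: merge the attributes


def pipelined_join(r1, r2, r3, r4):
    """result <- ((r1 JOIN r2) JOIN r3) JOIN r4, as a three-stage pipeline."""
    q0, temp1, temp2, out = (queue.Queue() for _ in range(4))
    workers = [
        threading.Thread(target=scan, args=(r1, q0)),
        threading.Thread(target=join_stage, args=(q0, r2, "b", temp1)),     # P1
        threading.Thread(target=join_stage, args=(temp1, r3, "c", temp2)),  # P2
        threading.Thread(target=join_stage, args=(temp2, r4, "d", out)),    # P3
    ]
    for w in workers:
        w.start()
    result = []
    while (t := out.get()) is not DONE:
        result.append(t)  # result tuples arrive while earlier stages still run
    for w in workers:
        w.join()
    return result
```

The intermediate relations temp1 and temp2 exist here only as queues of in-flight tuples; they are never stored as complete tables.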

Disadvantages:

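1. Pipelined parallelism is not a good choice if a high degree of parallelism is required, because pipeline chains are rarely long enough to keep many processors busy.

2. It is useful only with a small number of processors.

3. Not all operations can be pipelined. For example, in the query from the first section, the employee records of at least one department must be completely grouped before any output can be passed to the aggregate operation at the next processor.

4. Full speedup cannot be expected, because the cost of one operation is often much higher than that of the others, and the pipeline can run no faster than its slowest stage.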
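Point 3 can be illustrated with a small Python sketch of a streaming GROUP BY/AVG operator. It assumes the input arrives already sorted on Dept_Id (a hypothetical precondition) and uses itertools.groupby; the traced helper is a hypothetical wrapper that logs when each row is read. With unsorted input, or with hash-based grouping, the operator would have to read its entire input before emitting anything.

```python
from itertools import groupby
from operator import itemgetter


def streaming_avg_by_dept(rows):
    """AVG(Salary) ... GROUP BY Dept_Id over input already sorted on Dept_Id.

    A department's average is yielded only after all of its rows have been
    read (which itertools.groupby detects when the next Dept_Id appears or
    the input ends), so this operator cannot emit output tuple by tuple.
    """
    for dept, group in groupby(rows, key=itemgetter("Dept_Id")):
        salaries = [r["Salary"] for r in group]
        yield dept, sum(salaries) / len(salaries)


def traced(rows, log):
    """Wrap an input stream and record when each row is actually read."""
    for r in rows:
        log.append(("read", r["Dept_Id"]))
        yield r
```

For input rows from departments D1, D1, D2 (in that order), the log shows D1's average emitted only after the first D2 row has been read, i.e. the aggregate blocks for a whole department at a time.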

2. Independent Parallelism

Operations that do not depend on each other can be executed in parallel at different processors. This is called independent parallelism.

For example, in the expression r1 ⋈ r2 ⋈ r3 ⋈ r4, r1 ⋈ r2 can be computed on one processor while r3 ⋈ r4 is computed on another. Both results can then be pipelined into a third processor, which computes the final join.
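A minimal Python sketch of this plan follows, assuming hypothetical schemas r1(a, b), r2(b, c), r3(c, d) and r4(d, e). Two worker threads from concurrent.futures compute r1 ⋈ r2 and r3 ⋈ r4 independently, and the final join on c plays the role of the third processor. As a simplification, the two partial results are fully collected before the final join instead of being pipelined into it; hash_join and independent_join are illustrative names.

```python
from concurrent.futures import ThreadPoolExecutor


def hash_join(left, right, on):
    """Natural join of two lists of dicts on the single attribute `on`."""
    table = {}
    for s in right:
        table.setdefault(s[on], []).append(s)
    return [{**t, **s} for t in left for s in table.get(t[on], [])]


def independent_join(r1, r2, r3, r4):
    """(r1 JOIN r2) and (r3 JOIN r4) run independently, then are joined."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        f12 = pool.submit(hash_join, r1, r2, "b")  # first processor
        f34 = pool.submit(hash_join, r3, r4, "d")  # second processor
        temp1, temp2 = f12.result(), f34.result()
    return hash_join(temp1, temp2, "c")  # third processor: final join on c
```

Neither of the first two joins needs the other's output, which is exactly what allows them to run at the same time.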

Disadvantages:

It does not work well when a high degree of parallelism is needed, since a query usually contains only a few operations that are independent of each other.
