Computer Science and Engineering - Tutorials, Notes, MCQs, Questions and Answers: Intraoperation Parallelism

Range Partitioning Sort

Assumptions:

Assume n processors, P₀, P₁, …, P_n-1 and n disks D₀, D₁, …, D_n-1.

Disk D_i is associated with Processor P_i.

Relation R is partitioned into R₀, R₁, …, R_n-1 using Round-robin technique or Hash Partitioning technique or Range Partitioning technique (if range partitioned on some other attribute other than sorting attribute)

Objective:

Our objective is to sort a relation (table) R_i that resides on n disks on an attribute A in parallel.

Steps:

Step 1: Partition the relations R_ion the sorting attribute A at every processor using a range vector v. Send the partitioned records which fall in the i^th range to Processor P_i where they are temporarily stored in D_i.

Step 2: Sort each partition locally at each processor P_i. And, send the sorted results for merging with all the other sorted results which is trivial process.

Point to note:

Range partition must be done using a good range-partitioning vector. Otherwise, skew might be the problem.

Example:

Let us explain the above said process with simple example.Consider the following relation schema Employee;

Employee (Emp_ID, EName, Salary)

Assume that relation Employee is permanently partitioned using Round-robin technique into 3 disks D₀, D₁, and D₂ which are associated with processors P₀, P₁, and P₂. At processors P₀, P₁, and P₂, the relations are named Employee0, Employee1 and Employee2 respectively. This initial state is given in Figure 1.

Figure 1

Assume that the following sorting query is initiated.

SELECT * FROM Employee ORDER BY Salary;

As already said, the table Employee is not partitioned on the sorting attribute Salary. Then, the Range-Partitioning technique works as follows;

Step 1:

At first we have to identify a range vector v on the Salary attribute. The range vector is of the form v[v0, v1, …, vn-2]. For our example, let us assume the following range vector;

v[14000, 24000]

This range vector represents 3 ranges, range 0 (14000 and less), range 1 (14001 to 24000) and range 2 (24001 and more).

Redistribute the relations Employee0, Employee1 and Employee2 using these range vectors into 3 disks temporarily. After this distribution disk 0 will have range 0 records (i.e, records with salary value less than or equal to 14000), disk 1 will have range 1 records (i.e, records with salary value greater than 14000 and less than or equal to 24000), and disk 2 will have range 2 records (i.e, records with salary value greater than 24000).

This redistribution according to range vector v is represented in Figure 2 as links to all the disks from all the relations. Temp_Employee0, Temp_Employee1, and Temp_Employee2, are the relations after successful redistribution. These tables are stored temporarily in disks D₀, D₁, and D₂. (They can also be stored in Main memories (M₀, M₁, M₂) if they fit into RAM).

Figure 2

Step 2:

Now, we got temporary relations at all the disks after redistribution.

At this point, all the processors sort the data assigned to them in ascending order of Salary individually. The process of performing the same operation in parallel on different sets of data is called Data Parallelism.

Final Result:

After the processors completed the sorting, we can simply collect the data from different processors and merge them. This merge process is straightforward as we have data already sorted for every range. Hence, collecting sorted records from partition 0, partition 1 and partition 2 and merging them will give us final sorted output.

Inter-query Parallelism

It is a form of parallelism where many different Queries or Transactions are executed in parallel with one another on many processors.

Advantages

It increases Transaction Throughput. That is, number of transactions executed in a given time can be increased.

It scales up the Transaction processing system. Hence, best suited for On-Line Transaction Processing (OLTP) systems.

Supported Parallel Database Architectures

It is easy to implement in Shared Memory Parallel System. Lock tables and Log information are maintained in the same memory. Hence, it is easy to handle those transactions which shares locks with other transactions. Locking and logging can be done efficiently.

In other parallel architectures like Shared Disk and Shared Nothing, the locking and logging must be done through message passing between processors, which is considered as costly operation when compared Shared Memory Parallel architecture. Cache coherency problem would occur.

Example Database systems which support Inter-query Parallelism

Oracle 8 and Oracle Rdb

Intra-Query Parallelism

It is the form of parallelism where Single Query is executed in parallel on many processors.

Advantages

To speed up a single complex long running queries.

Best suited for complex scientific calculations (queries).

Supported Parallel Database Architectures

SharedMemory, Shared Disk and Shared Nothing parallel architectures are supported. We need not worry about locking and logging as because it involves parallelizing single query.

Types

Intra-operation parallelism – the process of speeding up a query through parallelizing the execution of individual operations. The operations which can be parallelized are Sort, Join, Projection, Selection and so on.

Inter-operation parallelism – the process of speeding up a query through parallelizing various operations which are part of the query. For example, a query which involves join of 4 tables can be executed in parallel in two processors in such a way that each processor shall join two relations locally and the result1 and result2 can be joined further to produce the final result.

Example Database systems which support Intra-query Parallelism

Informix, Terradata

“Live as if you were to die tomorrow. Learn as if you were to live forever.”
Mahatma Gandhi

TOPICS (Click to Navigate)

Tuesday, March 11, 2014

Parallel Sort - Range Partitioning Sort

Range Partitioning Sort

Assumptions:

Objective:

Steps:

Point to note:

Example:

Friday, March 7, 2014

Interquery and Intraquery Parallelism in Parallel Database

Inter-query Parallelism

Advantages

Supported Parallel Database Architectures

Example Database systems which support Inter-query Parallelism

Intra-Query Parallelism

Advantages

Supported Parallel Database Architectures

Types

Example Database systems which support Intra-query Parallelism

Featured Content

Multiple choice questions in Natural Language Processing Home

All time most popular contents

Report Abuse