Parallel Database - Intraoperation Parallelism

Wednesday, March 12, 2014


Intra-operation Parallelism

Intra-operation parallelism speeds up a query by executing a single relational operation of that query in parallel. Consider the query,

SELECT * FROM Email ORDER BY Start_Date;

In the above query, the relational operation is sorting. Since a table may hold a large number of records, the sort can be performed on different subsets of the table by multiple processors at the same time, and the sorted subsets can then be combined. This reduces the time required to sort.
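The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular DBMS implements parallel sorting: the `parallel_sort` function, the sample `emails` rows, and the round-robin split are hypothetical, and threads stand in for the separate processors (a real parallel database would use separate processes or nodes).

```python
from concurrent.futures import ThreadPoolExecutor
import heapq


def parallel_sort(rows, key, workers=4):
    """Sort rows by sorting subsets in parallel, then merging the sorted subsets."""
    if not rows:
        return []
    # Round-robin partitioning: row i goes to subset i % workers (an assumption;
    # other partitioning schemes would work as well).
    chunks = [rows[i::workers] for i in range(workers)]
    # Each worker sorts its own subset independently of the others.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_chunks = list(pool.map(lambda chunk: sorted(chunk, key=key), chunks))
    # Merge the sorted subsets into a single sorted result.
    return list(heapq.merge(*sorted_chunks, key=key))


# Hypothetical sample rows standing in for the Email table.
emails = [
    {"id": 1, "Start_Date": "2014-03-12"},
    {"id": 2, "Start_Date": "2014-01-05"},
    {"id": 3, "Start_Date": "2014-02-20"},
    {"id": 4, "Start_Date": "2013-12-31"},
    {"id": 5, "Start_Date": "2014-03-01"},
]

result = parallel_sort(emails, key=lambda r: r["Start_Date"], workers=2)
print([r["id"] for r in result])
```

The final merge is a sequential step, so the gain comes from sorting the subsets concurrently.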

Consider another query,

SELECT * FROM Student, CourseRegd WHERE Student.Regno = CourseRegd.Regno;

In this query, the relational operation is join. The query joins two tables, Student and CourseRegd, on the common attribute Regno. Parallelism helps here when the tables are very large. The tuples of a table are generally stored in no particular order. Hence, without an index or a sort order on Regno, a simple join has to match every record of one table against every record of the other table. For example, if Student has 10,000 records and CourseRegd has 60,000 records, then the join requires 10,000 × 60,000 = 600,000,000 comparisons. If we divide this work among several processors, we can achieve better performance.
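One possible way to split the join work is sketched below in Python. It assumes a partitioned join: both tables are hash-partitioned on Regno, so matching records always fall into the same partition, and each partition pair is joined independently with a simple nested loop. The function names, the sample rows, and the extra columns `Name` and `Course` are hypothetical, and threads stand in for processors.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_by_key(rows, key, n):
    """Hash-partition rows into n lists so that equal key values share a partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts


def local_join(pair):
    """Nested-loop join of one Student partition with the matching CourseRegd partition."""
    students, regs = pair
    return [{**s, **r} for s in students for r in regs if s["Regno"] == r["Regno"]]


def parallel_join(students, course_regd, workers=4):
    s_parts = partition_by_key(students, "Regno", workers)
    c_parts = partition_by_key(course_regd, "Regno", workers)
    # Partition i of Student only needs to be compared with partition i of CourseRegd.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(local_join, zip(s_parts, c_parts)))
    # The union of the local results is the result of the whole join.
    return [row for part in results for row in part]


# Hypothetical sample rows standing in for the Student and CourseRegd tables.
students = [
    {"Regno": 1, "Name": "Asha"},
    {"Regno": 2, "Name": "Bala"},
    {"Regno": 3, "Name": "Chitra"},
]
course_regd = [
    {"Regno": 1, "Course": "DB"},
    {"Regno": 1, "Course": "OS"},
    {"Regno": 3, "Course": "DB"},
    {"Regno": 4, "Course": "AI"},
]

joined = parallel_join(students, course_regd, workers=2)
print(sorted((r["Regno"], r["Course"]) for r in joined))
```

With n partitions of roughly equal size, each worker performs about 1/n² of the original comparisons, so the total number of comparisons also drops by roughly a factor of n, in addition to the work being done concurrently.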

There are many such relational operations that can be executed in parallel, with several processors each working on a subset of the table or tables mentioned in the query. The following list includes the relational operations and the various techniques used to implement those operations in parallel.