It is about parallelizing a single relational operation given in a query. Consider the query,
SELECT * FROM Email ORDER BY Start_Date;
In the above query, the relational operation is Sorting. Since a table may have a large number of records, the sort can be performed on different subsets of the table on multiple processors at the same time, which reduces the time required to sort.
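As a rough illustration of sorting subsets of a table in parallel, here is a minimal Python sketch. It assumes range partitioning on the sort key, which is only one possible technique. The function names, the sample rows, and the partition boundaries are hypothetical. Threads stand in for separate processors only to keep the example self-contained; in a real parallel database each partition would be sorted on its own processor or node.

```python
from bisect import bisect_right
from concurrent.futures import ThreadPoolExecutor


def range_partition(rows, key, boundaries):
    """Place each row in the partition whose key range contains its sort key.

    `boundaries` must be sorted; partition i holds keys in
    [boundaries[i-1], boundaries[i]).
    """
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        partitions[bisect_right(boundaries, key(row))].append(row)
    return partitions


def parallel_sort(rows, key, boundaries, workers=4):
    """Sort each range partition in its own worker, then concatenate."""
    partitions = range_partition(rows, key, boundaries)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_parts = list(
            pool.map(lambda part: sorted(part, key=key), partitions)
        )
    # Partitions cover disjoint, increasing key ranges, so simply
    # concatenating them yields a globally sorted result.
    return [row for part in sorted_parts for row in part]


# Hypothetical sample rows for the Email table (ISO dates sort as strings).
emails = [
    {"id": 1, "Start_Date": "2021-07-15"},
    {"id": 2, "Start_Date": "2021-01-20"},
    {"id": 3, "Start_Date": "2021-11-02"},
    {"id": 4, "Start_Date": "2021-03-09"},
    {"id": 5, "Start_Date": "2021-08-01"},
    {"id": 6, "Start_Date": "2021-05-30"},
]

result = parallel_sort(
    emails,
    key=lambda r: r["Start_Date"],
    boundaries=["2021-04-01", "2021-08-01"],  # assumed split points
)
```

How well this works depends on the boundaries: if most keys fall into one range, that one worker does most of the sorting and little time is saved.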
Consider another query,
SELECT * FROM Student, CourseRegd WHERE Student.Regno = CourseRegd.Regno;
In this query, the relational operation is Join. The query joins the two tables Student and CourseRegd on the common attribute Regno. Parallelism helps here when the tables are very large. Usually, the order of tuples does not matter in a DBMS. Hence, when the tables are stored in arbitrary order, a simple join must match every record of one table against every record of the other table. For example, if Student has 10000 records and CourseRegd has 60000 records, then the join requires 10000 × 60000 = 600,000,000 comparisons. If we exploit parallelism here, these comparisons can be divided among several processors, and we can achieve better performance.
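The idea of splitting the join work can be sketched in Python as follows. This is a minimal sketch that assumes hash partitioning on Regno, so that records with equal Regno values always land in the same partition. This is one possible technique, not necessarily the one a given DBMS uses. The function names and the Name and Course columns are hypothetical, and threads again stand in for separate processors.

```python
from concurrent.futures import ThreadPoolExecutor


def hash_partition(rows, attr, n):
    """Split rows into n partitions by hashing the join attribute."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[attr]) % n].append(row)
    return parts


def local_join(students, regs, attr):
    """Compare every record with every record, but only within one partition."""
    return [{**s, **r} for s in students for r in regs if s[attr] == r[attr]]


def parallel_join(student, course_regd, attr="Regno", workers=4):
    """Join matching partitions of the two tables in separate workers."""
    s_parts = hash_partition(student, attr, workers)
    c_parts = hash_partition(course_regd, attr, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(local_join, s_parts, c_parts, [attr] * workers))
    # Equal Regno values hash to the same partition, so the union of the
    # per-partition results is the complete join.
    return [row for part in results for row in part]


# Hypothetical sample data; only Regno comes from the query in the text.
student = [
    {"Regno": 1, "Name": "Asha"},
    {"Regno": 2, "Name": "Ravi"},
    {"Regno": 3, "Name": "Mei"},
]
course_regd = [
    {"Regno": 1, "Course": "DBMS"},
    {"Regno": 1, "Course": "OS"},
    {"Regno": 3, "Course": "DBMS"},
    {"Regno": 4, "Course": "AI"},
]

joined = parallel_join(student, course_regd)
```

Assuming the records spread evenly over 4 partitions, each worker would compare about 2500 × 15000 = 37,500,000 pairs for the table sizes above. That gives 150,000,000 comparisons in total instead of 600,000,000, and they run concurrently.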
There are many such relational operations that can be executed in parallel using many processors on subsets of the table or tables mentioned in the query. The following list includes the relational operations and the various techniques used to implement those operations in parallel.