Showing posts with label Parallel Database. Show all posts
Showing posts with label Parallel Database. Show all posts

Saturday, 19 November 2016

Parallel Query Processing - Examples

Parallel query execution - example SQL queries / Inter-query, Intra-query parallelism examples / Inter-operation and Intra-operation parallelism examples / Independent and Pipelined parallelism examples

Parallel Query Processing

Go back to PARALLEL DATABASES home page

Saturday, 1 November 2014

List of operations that can be parallelized in Oracle

Oracle operations that can be parallelized / List of operations that can be parallelized in Oracle / Overview of operations that can be parallelized

Operations that can be parallelized in Oracle

1. Access methods – few access methods are full table scans, full index scans, partitioned index range scans.
Full table scans – the following SQL query scans the table for identifying the records which satisfy the given condition. Full table needs to be scanned in the case of select queries with conditions on non-key attributes, aggregate operations.
SELECT * FROM Emp WHERE Ename = “Suresh”;
Full index scans - all columns in the SELECT and WHERE clauses must exist in the index. In such case, full index scan is used. For example, the following SQL statement does not need to access the table rows, and needs to analyze the index alone, if we have an index on COLOR attribute.
Partitioned index range scans - An index range scan is a common operation for accessing selective data. An index is a table where data are sorted.

2. Join methods – few join methods are nested loop joins, hash joins, and sort merge joins. These operations can be parallelized to increase the performance.
Nested loop joins - Nested loop joins are useful when small subsets of data are being joined and if the join condition is an efficient way of accessing the second table.
Hash joins - Hash joins are used for joining large data sets. The optimizer uses the smaller of two tables or data sources to build a hash table on the join key in memory.
Sort merge joins - Sort merge joins can be used to join rows from two independent sources. This join type can be preferred over hash join if the rows are already sorted.

We can normally use parallel DDL where we use regular DDL. There are, however, some additional details to consider when designing your database. One important restriction is that parallel DDL cannot be used on tables with object or LOB columns.
Parallel operations require accurate statistics to perform optimally.

4. DML statements – few are INSERT AS SELECT, updates, deletes, and MERGE operations
Parallel DML (parallel insert, update, merge, and delete) uses parallel execution mechanisms to speed up or scale up large DML operations against large database tables and indexes. You can also use INSERT ... SELECT statements to insert rows into multiple tables as part of a single DML statement. You can normally use parallel DML where you use regular DML.
Example DML statement for inserting data:
INSERT INTO tbl_temp2 (fld_id) SELECT tbl_temp1.fld_order_id FROM tbl_temp1 WHERE tbl_temp1.fld_order_id > 100;
[Note: The term parallel DML refers only to inserts, updates, upserts and deletes done in parallel.]

5. Parallel query - You can parallelize queries and sub-queries in SELECT statements, as well as the query portions of DDL statements and DML statements (INSERT, UPDATE, DELETE, and MERGE).

6. Other SQL operations that can be parallelized - GROUP BY, NOT IN, SELECT DISTINCT, UNION, UNION ALL, CUBE, and ROLLUP, as well as aggregate and table functions.

Source: Oracle Documentation

List of RDBMSs that support parallel operations

List the RDBMSs that are supporting parallel operations / Parallel processing enabled RDBMSs / List of RDBMSs that are providing parallelism

List of RDBMSs that support parallel operations

·        It is a relational model database server product.
·        Commonly known as Sybase DB.
·        Available for Unix and Windows.
·        Since version 11.5, ASE has supported intra-query parallelism.
Clustrix, Inc.
·        They are the developer of NewSQL database.
·        Clustrix offers a scale-out SQL database that lets you simply add more nodes to your cluster as demand grows.
·        ClustrixDB employs massively parallel processing (MPP) across its distributed cluster to parallelize and distribute SQL queries, and uses all available resources of the cluster to accelerate the queries.

·        IBM employed MPP to parallelize queries.
·        Shared Nothing Architecture is used.
·        DB2 can use two different methods for achieving query parallelism: I/O or CPU.

·        EXASolution is an in-memory, column-oriented RDBMS.
·        It is a parallelized RDBMS runs on cluster of standard hardware servers.
·        Shared nothing architecture is used.
·        Massively Parallel Processing (MPP)
·        Highly scalable.
Available under BSD license.
·        Written in Java
·        Fully mutithreaded
Informix Dynamic Server
·        Uses Parallel Database Query (PDQ) feature to run queries in parallel.
·        PDQ along with table fragmentation feature works well.
·        Uses combination of shared memory and shared nothing architecture
·        By default, SQL Server will use all available CPUs during query execution
·        Massively Parallel Processing
·        Shared nothing architecture
MonetDB Developer team
·        Open source
·        Column-oriented DBMS
·        Designed for multi-core parallel execution on desktops
Oracle Corporation
·        Relies on Shared everything architecture
·        Oracle can flexibly parallelize almost all operations in various ways and degrees, independent of the underlying data layout without overloading the system.
·        Massively Parallel Processing
PostgreSQL Global Development Group
·        Postgres currently supports full parallelism in client-side code
·        Implements intraquery parallelism with threads.
·        Still in process to a complete parallelism support like Oracle or DB2.
SAP (Systems, Applications & Products in Data Processing)
*world’s 3rd largest software company
·        It supports inter-query and intra-query parallelism
·        It can assign multiple threads to a single request, thus achieving intra-query parallelism.
·        Each request can run on a single thread and execute on a single processor, thus achieving inter-query parallelism
Public type company
·        Teradata RDBMS is designed for parallelism
·        The virtual processor (VPROC) is the basic unit of parallelism
·        Teradata PT (Teradata Parallel Transporter) provides parallelism support for data warehouse.
·        Teradata PT supports Pipeline, Data, and Multiple parallelism

Simple introduction to Naive Bayes classifier

Simple introduction to Naive Bayes classifier What is Naive Bayes Classifier? A Naive Bayes classifier is a probabilistic classifier ...