TOPICS (Click to Navigate)
- Advanced Database Concepts
- Data structures, Operating Systems
- Natural Language Processing
- Quiz Questions and Answers
- DBMS, ADBMS Question Bank
- SQL
- RDBMS Exam and Interview Questions
- Parallel Databases
- ADBMS Quizzes
- Advanced DBMS Concepts
- Distributed Databases
- Modern Databases - Special Purpose Databases
- Object Based Database Systems
- Machine Learning MCQ
Showing posts with label Parallel Database. Show all posts
Showing posts with label Parallel Database. Show all posts
Saturday, November 19, 2016
Saturday, November 1, 2014
List of operations that can be parallelized in Oracle
Oracle operations that can be parallelized / List of operations that can be parallelized in Oracle / Overview of operations that can be parallelized
Operations that can be parallelized in Oracle
1. Access methods – few access methods
are full table scans, full index scans, partitioned index range scans.
Examples
Full table scans
– the following SQL query scans the table for identifying the records which
satisfy the given condition. Full table needs to be scanned in the case of
select queries with conditions on non-key attributes, aggregate operations.
SELECT * FROM Emp WHERE Ename
= “Suresh”;
Full index scans
- all columns in the SELECT and WHERE clauses must exist in the index. In such
case, full index scan is used. For example, the following SQL statement does
not need to access the table rows, and needs to analyze the index alone, if
we have an index on COLOR attribute.
SELECT DISTINCT color, COUNT(*)
FROM Product GROUP BY color;
Partitioned index range scans
- An index range scan is a common operation for accessing selective data. An
index is a table where data are sorted.
|
2. Join methods – few join methods
are nested loop joins, hash joins, and sort merge joins. These operations can be
parallelized to increase the performance.
Nested loop joins - Nested
loop joins are useful when small subsets of data are being joined and if the
join condition is an efficient way of accessing the second table.
Hash joins - Hash
joins are used for joining large data sets. The optimizer uses the smaller of
two tables or data sources to build a hash table on the join key in memory.
Sort merge joins - Sort
merge joins can be used to join rows from two independent sources. This join
type can be preferred over hash join if the rows are already sorted.
|
3. DDL statements – few DDL statements
are CREATE TABLE AS SELECT, CREATE INDEX, REBUILD INDEX, and REBUILD INDEX PARTITION
4. DML statements – few are INSERT AS SELECT,
updates, deletes, and MERGE operations
Parallel
DML (parallel insert, update, merge, and delete) uses parallel execution
mechanisms to speed up or scale up large DML operations against large
database tables and indexes. You can also use INSERT ... SELECT statements to
insert rows into multiple tables as part of a single DML statement. You can
normally use parallel DML where you use regular DML.
Example DML statement for
inserting data:
INSERT INTO tbl_temp2
(fld_id) SELECT tbl_temp1.fld_order_id FROM tbl_temp1 WHERE
tbl_temp1.fld_order_id > 100;
[Note:
The term parallel DML refers only to inserts, updates, upserts and deletes
done in parallel.]
|
5. Parallel query - You can
parallelize queries and sub-queries in SELECT statements, as well as the query
portions of DDL statements and DML statements (INSERT, UPDATE, DELETE, and MERGE).
6. Other SQL operations that can be parallelized
- GROUP BY, NOT IN, SELECT DISTINCT, UNION, UNION ALL, CUBE, and ROLLUP, as
well as aggregate and table functions.
Source: Oracle Documentation
List of RDBMSs that support parallel operations
List the RDBMSs that are supporting parallel operations / Parallel processing enabled RDBMSs / List of RDBMSs that are providing parallelism
List of RDBMSs that support parallel operations
RDBMS
|
Developer
|
Description
|
Sybase
|
·
It is a relational model database server product.
·
Commonly known as Sybase DB.
·
Available for Unix and Windows.
·
Since version 11.5, ASE has supported intra-query
parallelism.
|
|
Clustrix, Inc.
|
·
They are the developer of NewSQL database.
·
Clustrix offers a scale-out SQL database that lets
you simply add more nodes to your cluster as demand grows.
·
ClustrixDB employs massively parallel processing
(MPP) across its distributed cluster to parallelize and distribute SQL
queries, and uses all available resources of the cluster to accelerate the
queries.
|
|
IBM
|
·
IBM employed MPP to parallelize queries.
·
Shared Nothing Architecture is used.
·
DB2 can use two different methods for achieving query
parallelism: I/O or CPU.
|
|
EXASOL
|
·
EXASolution is an in-memory, column-oriented RDBMS.
·
It is a parallelized RDBMS runs on cluster of
standard hardware servers.
·
Shared nothing architecture is used.
·
Massively Parallel Processing (MPP)
·
Highly scalable.
|
|
Available under BSD license.
|
·
Written in Java
·
Fully mutithreaded
|
|
Informix Dynamic Server
|
IBM
|
·
Uses Parallel Database Query (PDQ) feature to run
queries in parallel.
·
PDQ along with table
fragmentation feature works well.
·
Uses combination of shared memory and shared nothing
architecture
|
Microsoft
|
·
By default, SQL Server will use all available CPUs
during query execution
·
Massively Parallel Processing
·
Shared nothing architecture
|
|
MonetDB Developer team
|
·
Open source
·
Column-oriented DBMS
·
Designed for multi-core parallel execution on
desktops
|
|
Oracle Corporation
|
·
Relies on Shared everything architecture
·
Oracle can flexibly parallelize almost all operations
in various ways and degrees, independent of the underlying data layout without
overloading the system.
·
Massively Parallel Processing
|
|
PostgreSQL Global Development Group
|
·
Postgres currently supports full parallelism in client-side code
·
Implements intraquery parallelism with threads.
·
Still in process to a complete parallelism support
like Oracle or DB2.
|
|
SAP (Systems, Applications & Products
in Data Processing)
*world’s 3rd largest software
company
|
·
It supports inter-query and intra-query parallelism
·
It can assign multiple threads to a single request,
thus achieving intra-query parallelism.
·
Each request can run on a single thread and execute
on a single processor, thus achieving inter-query parallelism
|
|
Public type company
|
·
Teradata RDBMS is designed for parallelism
·
The virtual processor (VPROC) is the basic unit of
parallelism
·
Teradata PT (Teradata Parallel Transporter) provides
parallelism support for data warehouse.
·
Teradata PT supports Pipeline, Data, and Multiple
parallelism
|
Subscribe to:
Posts (Atom)
Featured Content
Multiple choice questions in Natural Language Processing Home
MCQ in Natural Language Processing, Quiz questions with answers in NLP, Top interview questions in NLP with answers Multiple Choice Que...
All time most popular contents
-
Relational algebra in database management systems solved exercise Relational algebra – solved exercise Question: Consider the fo...
-
Draw an ER diagram for the given scenario, Construct an ER diagram to model a database for the given information, How to draw Entity Relat...
-
Bigram Trigram and NGram in NLP, How to calculate the unigram, bigram, trigram, and ngram probabilities of a sentence? Maximum likelihood...
-
Top 5 Machine Learning Quiz Questions with Answers explanation, Interview questions on machine learning, quiz questions for data scienti...
-
Bigram probability estimate of a word sequence, Probability estimation for a sentence using Bigram language model Bigram Model - Probab...