Parallel Database Architectures

Saturday, February 22, 2014

Today almost everyone is interested in storing the information they have. Even small organizations collect data and maintain very large databases. Although these databases consume space, they are helpful in many ways; for example, they support decision making through a decision support system. Handling such voluminous data with a conventional centralized system is difficult: even simple queries can take a long time. The solution is to handle these databases with Parallel Database Systems, in which a table or database is distributed among multiple processors, as evenly as possible, so that queries can be performed in parallel. A system that shares resources to handle massive data in order to increase overall performance is called a Parallel Database System.

We need suitable architectures to do this, that is, architectures that handle data through data distribution and parallel query execution, and thereby deliver good query or transaction throughput. Figures 1, 2 and 3 show the different architectures that have been proposed and successfully implemented in the area of Parallel Database Systems. In the figures, P represents processors, M represents memory, and D represents disks or disk setups.
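The idea of data distribution plus parallel query execution can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `sales` table, the `amount` column, the round-robin partitioning scheme, and the use of Python threads to stand in for processors (CPython threads show the structure of the computation, not real CPU-level speedup).

```python
from concurrent.futures import ThreadPoolExecutor


def partition_round_robin(rows, n_workers):
    """Spread rows over n_workers partitions as evenly as possible."""
    return [rows[i::n_workers] for i in range(n_workers)]


def local_sum(partition, predicate):
    """Work done by one worker: scan only its own partition."""
    return sum(row["amount"] for row in partition if predicate(row))


def parallel_sum(rows, n_workers, predicate):
    """Run the same query on every partition, then combine the partial results."""
    partitions = partition_round_robin(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda p: local_sum(p, predicate), partitions))
    return sum(partials)


# Hypothetical sample table: 100 rows with amounts 10, 20, ..., 1000.
sales = [{"id": i, "amount": i * 10} for i in range(1, 101)]
partition_sizes = [len(p) for p in partition_round_robin(sales, 4)]
total = parallel_sum(sales, 4, lambda r: r["amount"] > 500)
```

Each worker scans only a quarter of the table, and the final answer is obtained by combining the partial sums. The three architectures below differ in which resources those workers share.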

1. Shared Memory Architecture

Figure 1 - Shared Memory Architecture

In Shared Memory architecture, a single memory is shared among many processors, as shown in Figure 1. Several processors are connected to the main memory and the disk setup through an interconnection network. The interconnection network is usually a high-speed network (for example a bus, mesh, or hypercube) that makes it easy to share (transport) data among the components (processors, memory, and disks).

Advantages:

Simple implementation.
Processors communicate effectively through a single memory address space.
As a result, communication overhead is low.

Disadvantages:

A high degree of parallelism (a large number of concurrent operations on different processors) cannot be achieved, because all the processors share the same interconnection network to reach memory. This makes the interconnection network a bottleneck (interference), especially when it is a bus.

Adding a processor can slow down the existing processors, since they all compete for the same interconnection network.

Cache coherency must be maintained. That is, if a processor reads data that has been used or modified by other processors, we need to ensure that it sees the latest version.

The degree of parallelism is therefore limited. Running more parallel processes may degrade performance.
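A loose software analogy, not a model of the hardware: threads inside one process share a single address space, so they can communicate simply by reading and writing the same object, but every update has to be coordinated. In the sketch below the names `shared_totals` and `worker` are hypothetical, and the lock stands in for the single point through which all workers must pass.

```python
import threading

# One structure in one address space, visible to every worker.
shared_totals = {"count": 0}
lock = threading.Lock()


def worker(n_rows):
    """Count rows into the shared structure, one coordinated update per row."""
    for _ in range(n_rows):
        with lock:  # only one worker may update the shared data at a time
            shared_totals["count"] += 1


threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

final_count = shared_totals["count"]
```

Communication is trivial here because no data has to be copied between workers. The cost is that every additional worker queues on the same lock, which is the same shape of problem as processors competing for one interconnection network.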

2. Shared Disk Architecture

Figure 2 - Shared Disk Architecture

In Shared Disk architecture, a single disk or disk setup is shared among all the available processors, and each processor has its own private memory, as shown in Figure 2.

Advantages:

The failure of any one processor does not stop the entire system (fault tolerance).
The interconnection to memory is not a bottleneck (as it was in Shared Memory architecture).
It supports a larger number of processors than Shared Memory architecture.

Disadvantages:

The interconnection to the disk is a bottleneck, as all processors share a common disk setup.

Inter-processor communication is slow. Because every processor has its own memory, communication between processors requires reading data from another processor's memory, which needs additional software support.
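The trade-off above can be sketched in a few lines. This is an illustrative model under stated assumptions, not an implementation of any real system: `SharedDisk`, `Processor`, and the block contents are hypothetical names, and the disk simply counts how many reads it serves.

```python
class SharedDisk:
    """The single disk setup that every processor reads from."""

    def __init__(self, blocks):
        self._blocks = dict(blocks)
        self.reads = 0  # number of requests that reached the disk

    def read(self, block_id):
        self.reads += 1
        return self._blocks[block_id]


class Processor:
    """A processor with private memory that other processors cannot see."""

    def __init__(self, name, disk):
        self.name = name
        self.disk = disk
        self.private_memory = {}

    def get(self, block_id):
        # Go to the shared disk only when the block is not in private memory.
        if block_id not in self.private_memory:
            self.private_memory[block_id] = self.disk.read(block_id)
        return self.private_memory[block_id]


disk = SharedDisk({1: "row-A", 2: "row-B"})
processors = [Processor(f"P{i}", disk) for i in range(3)]
for p in processors:
    p.get(1)  # first access: fetched from the shared disk
    p.get(1)  # second access: served from private memory

disk_reads = disk.reads
```

Each of the three processors has to fetch the block from the disk once, because a copy in one processor's private memory is of no use to the others. All of those fetches pass through the same disk, which is where the bottleneck appears as processors are added.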

Example Real-World Shared Disk Implementation

DEC clusters (VMScluster) running Rdb

3. Shared Nothing Architecture

Figure 3 - Shared Nothing Architecture

In Shared Nothing architecture, every processor has its own memory and disk setup. This setup may be considered a set of individual computers connected through a high-speed interconnection network, using, for example, regular network protocols and switches to share data between computers. (This architecture is also used in Distributed Database Systems.) In a Shared Nothing parallel database system, we insist on the use of similar nodes, that is, homogeneous systems. (In a distributed database system we may use heterogeneous nodes.)

Advantages:

The number of processors is scalable. That is, the design makes it easy to add more computers.
Unlike in the other two architectures, only data requests that cannot be answered by the local processor need to be forwarded through the interconnection network.

Disadvantages:

Non-local disk accesses are costly. If a server receives a request and the required data is not available locally, the request must be routed to the server where the data is available, which adds complexity.
There is a communication cost involved in transporting data among computers.
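The routing of non-local requests can be sketched as follows. This is a minimal illustration under assumptions of my own: the `Node` and `Cluster` classes are hypothetical, rows are placed by integer key modulo the number of nodes, and a counter stands in for traffic on the interconnection network.

```python
class Node:
    """One computer with its own memory and disk (here, a local dict)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.local_rows = {}


class Cluster:
    """Homogeneous nodes; each row lives on exactly one of them."""

    def __init__(self, n_nodes):
        self.nodes = [Node(i) for i in range(n_nodes)]
        self.forwarded = 0  # requests that had to cross the network

    def owner(self, key):
        # Assumed placement rule: integer key modulo the number of nodes.
        return self.nodes[key % len(self.nodes)]

    def put(self, key, value):
        self.owner(key).local_rows[key] = value

    def get(self, key, receiving_node_id):
        owner = self.owner(key)
        if owner.node_id != receiving_node_id:
            self.forwarded += 1  # non-local access: route to the owning node
        return owner.local_rows[key]


cluster = Cluster(3)
for k in range(9):
    cluster.put(k, f"row-{k}")

local_value = cluster.get(3, receiving_node_id=0)   # key 3 lives on node 0
remote_value = cluster.get(4, receiving_node_id=0)  # key 4 lives on node 1
forwarded_requests = cluster.forwarded
```

Only the second request uses the network. A request that happens to arrive at the node owning the data is answered entirely from that node's own disk, which is why this design scales as nodes are added.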

Example Real-World Shared Nothing Implementations

Teradata
Tandem
Oracle nCUBE
