Tuesday, October 7, 2014

Two mark questions in Distributed Database

Two mark questions with answers in Distributed database / Important two mark questions and answers in distributed database


Two mark questions with answers

  • List the fragmentation types.
Horizontal fragmentation
Vertical fragmentation
Hybrid fragmentation
  • Give the different strategies used for placement of data in a distributed database.
Centralized, fragmented (partitioned), complete replication, and selective replication.
  • What are the advantages of allocation of fragments?
Locality of reference – data are stored close to where they are required.
Improved reliability and availability – data are highly available.
Balanced storage capacities and costs – inexpensive disks are sufficient; this is cost effective.
Minimal communication cost – communication cost is reduced.
  • What are the correctness rules of fragmentation?
Completeness
Reconstruction
Disjointness
  • What are the failures that may occur in a distributed system?
Loss of message
Communication link failure
Site failure.
  • Who is the transaction coordinator?

In a distributed database, the site at which a transaction is initiated is responsible for completing that transaction. This is done by the Transaction Manager of that particular site, and that site's Transaction Manager is called the transaction coordinator.
  • Define replication.

Creating and maintaining a copy of a table or a fragment at more than one site is called replication. It increases availability at the cost of redundancy.
  • List down some advantages of DDBMSs.

Reflects organizational structure
Improved ability to share and local autonomy
Improved availability
Improved reliability
Improved performance
Economy in deploying software and hardware resources
Modular growth of the system
Integration
  • Name some disadvantages of fragmentation.

Performance degradation for larger applications that need data from several sites.
Integrity control may be more difficult because the data are stored at multiple sites.

  • Define primary horizontal fragmentation.

Primary horizontal fragmentation is a fragmentation technique which partitions a single relation (table) into fragments using a set of min-term predicates.
  • List down the steps involved in Primary horizontal fragmentation. (A small sketch of these steps is given after the list.)

Identify the set of simple predicates.
Construct the set of min-term predicates from the simple predicates.
Eliminate redundant or unnecessary min-term predicates.
Use the remaining valid min-term predicates to create the fragments.
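To make these steps concrete, here is a minimal Python sketch of the idea. The employee relation, its attributes and the two simple predicates are made-up examples, and, for simplicity, an "unnecessary" min-term is taken to be one that selects no rows (a real fragmentation algorithm also removes contradictory and implied min-terms).

```python
from itertools import product

# Sample relation: a list of rows (hypothetical data).
employees = [
    {"name": "Anita",  "dept": "Sales", "salary": 30000},
    {"name": "Babu",   "dept": "IT",    "salary": 55000},
    {"name": "Chitra", "dept": "Sales", "salary": 60000},
    {"name": "David",  "dept": "IT",    "salary": 25000},
]

# Step 1: identify a set of simple predicates (each tests one attribute).
simple_predicates = [
    ("dept = 'Sales'",  lambda r: r["dept"] == "Sales"),
    ("salary <= 40000", lambda r: r["salary"] <= 40000),
]

# Step 2: construct the min-term predicates - every conjunction of each
# simple predicate or its negation.
def minterms(preds):
    for signs in product([True, False], repeat=len(preds)):
        label = " AND ".join(
            name if keep else f"NOT({name})"
            for (name, _), keep in zip(preds, signs)
        )
        test = lambda row, signs=signs: all(
            p(row) == keep for (_, p), keep in zip(preds, signs)
        )
        yield label, test

# Steps 3 and 4: discard min-terms that select no rows (treated here as
# "unnecessary") and use the remaining ones to form the fragments.
fragments = {}
for label, test in minterms(simple_predicates):
    rows = [r for r in employees if test(r)]
    if rows:
        fragments[label] = rows

for label, rows in fragments.items():
    print(label, "->", [r["name"] for r in rows])
```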
  • Define derived horizontal fragmentation.

Derived horizontal fragmentation is a horizontal fragmentation process in which a table is fragmented based on the fragments created by the horizontal fragmentation of another table (the parent table – think of a primary key / foreign key relationship).
  • Define fragmentation transparency.

The end users of a distributed database system need not know about how the database is fragmented. Hiding the information about how a table is fragmented from the end user is called fragmentation transparency.
  • Who is the transaction coordinator in distributed transactions?

The module of a distributed site which coordinates the transactions initiated at that site is called the transaction coordinator. For example, assume a distributed database system with sites A, B, C, and D holding the same or different data. If a transaction is initiated at site A and needs data that is available at sites B and C, the transaction coordinator of site A is responsible for getting that data processed at those sites.
  • What are the roles of transaction manager?

Maintaining a log for recovery purposes
Coordinating the concurrent transactions that are executed at that site.
  • List down the roles of transaction coordinator.

Starting the execution of a transaction initiated at that site.
Breaking the transaction into a set of sub-transactions if needed.
Distributing the sub-transactions to different sites.
Coordinating the completion of the transaction.
  • What are the messages used by 2 Phase Commit protocol?

<prepare T>
<no T>
<ready T>
<abort T>
<commit T>
  • What are the two phases of 2PC protocol?

Phase 1: Obtaining the decision – whether to commit or abort a transaction T.
Phase 2: Recording the decision – implement the decision taken in phase 1.
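The message exchange above can be summarised as a small piece of code. Below is a minimal, illustrative Python sketch of the coordinator's decision logic; the Participant class and its vote/commit/abort methods are hypothetical stand-ins for real sites, and networking, log records and failure handling are omitted.

```python
class Participant:
    """A toy participant (site) that votes on whether it can commit T."""
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit

    def vote(self, t):
        # Phase 1: on receiving <prepare T>, reply <ready T> or <no T>.
        return "<ready T>" if self.can_commit else "<no T>"

    def commit(self, t):
        print(f"{self.name}: <commit {t}> applied")

    def abort(self, t):
        print(f"{self.name}: <abort {t}> applied")


def two_phase_commit(t, participants):
    # Phase 1 (obtaining the decision): send <prepare T> and collect votes.
    votes = [p.vote(t) for p in participants]

    # Phase 2 (recording the decision): commit only if every site is ready;
    # otherwise abort everywhere.
    if all(v == "<ready T>" for v in votes):
        for p in participants:
            p.commit(t)
        return "<commit T>"
    for p in participants:
        p.abort(t)
    return "<abort T>"


sites = [Participant("Site B"), Participant("Site C", can_commit=False)]
print("Coordinator decision:", two_phase_commit("T", sites))
```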

Monday, October 6, 2014

Techniques to optimize disk block accesses

Various techniques to optimize disk-block accesses / What are the techniques used for optimizing disk block accesses? / Overview of optimization of disk-block access


Techniques to optimize disk-block accesses

A block is a contiguous sequence of sectors in a single track of one platter of a hard disk. Data on a hard disk are stored in disk blocks; depending on its size, a data item may occupy one or several blocks. Reading or writing data by specifying an exact physical address is difficult, so operating systems perform reads and writes in units of disk blocks. Because blocks have a fixed size for a given operating system, transferring whole blocks to and from the disk is easier than specifying exact locations.
Access to data on disk is several orders of magnitude slower than access to data stored in RAM. Although transferring data from disk is slow, several techniques can be used to make it somewhat faster.
The techniques are,
Buffering of blocks in memory (memory means RAM, main memory hereafter) – a major goal of a DBMS is to minimize the number of disk-block transfers between main memory and the disk. Buffering allocates space in main memory to hold copies of some blocks so that frequently needed blocks can be reused later without rereading them from disk. We want to do this for as many blocks as possible, and there are several policies for choosing which blocks to retain. This is handled by the Buffer Manager. (A small buffer-manager sketch is given after this list of techniques.)
Scheduling – issuing block transfers in an order that matches their physical layout on disk, so as to reduce disk-arm movement. For example, suppose several block requests are pending while the read-write head happens to be over the innermost track of a platter. The idea is to sweep the arm towards the outermost track and service the requests on the way, instead of moving back and forth. Assume tracks 1 to 100, where track 1 is the innermost and track 100 the outermost, and that requests are made for data on tracks 7, 25, 50 and 75, in the order 7, 50, 75, 25. If the head is at the innermost track, it can move outward and service track 7 followed by 25, 50 and 75. [This is not the order in which the requests arrived – the requests for tracks 50 and 75 were issued before 25 – but all of them were pending while the arm was at the innermost track.] This ordering reduces the disk-arm movement. (A small sketch of this "elevator" ordering is also given after the list.)
File organization – data files can be stored contiguously so that they can be accessed more easily. For example, a hard disk has several platters arranged one over the other. If a file can be stored in the same track (cylinder) of adjacent platters, the disk arm may need to be positioned only once to read the whole file. Many systems organize files this way; for example, the Disk Defragmentation utility in Windows moves the blocks of a file closer together so that they form a contiguous sequence.
Nonvolatile write buffers – main memory is volatile, i.e., it loses its contents on power failure. To guard against this, a non-volatile random-access memory (NV-RAM), backed by battery power, can be used. Buffered writes that are still pending when the system crashes or the power fails can then be completed from the NV-RAM.
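The buffering idea can be sketched roughly as follows, assuming a made-up block-numbering scheme and a simple least-recently-used (LRU) replacement policy; real buffer managers use more elaborate policies, pinning and write handling.

```python
# A minimal LRU buffer-manager sketch (hypothetical block numbers and
# buffer size; strings stand in for the contents of disk blocks).
from collections import OrderedDict

class BufferManager:
    def __init__(self, capacity=3):
        self.capacity = capacity
        self.pool = OrderedDict()        # block number -> block contents
        self.disk_reads = 0

    def read_block(self, block_no):
        if block_no in self.pool:        # buffer hit: no disk transfer
            self.pool.move_to_end(block_no)
            return self.pool[block_no]
        self.disk_reads += 1             # buffer miss: fetch from disk
        data = f"<contents of block {block_no}>"
        if len(self.pool) >= self.capacity:
            self.pool.popitem(last=False)   # evict least recently used block
        self.pool[block_no] = data
        return data

bm = BufferManager(capacity=3)
for b in [1, 2, 3, 1, 2, 4, 1]:
    bm.read_block(b)
print("Disk reads:", bm.disk_reads)      # 4, instead of 7 without buffering
```

The "elevator" ordering used in the scheduling example can be sketched in the same spirit; the track numbers and head position below are the hypothetical values from the example, and a real disk scheduler works on block addresses and on requests that keep arriving.

```python
# Order pending track requests by sweeping outward from the head position,
# then servicing any remaining inward requests on the way back.
def elevator_order(requests, head=1):
    outward = sorted(t for t in requests if t >= head)
    inward = sorted((t for t in requests if t < head), reverse=True)
    return outward + inward

# Requests arrived in the order 7, 50, 75, 25 while the head was at track 1.
pending = [7, 50, 75, 25]
print(elevator_order(pending, head=1))   # [7, 25, 50, 75]
```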




Database Query Processing


Database Query Processing / Overview / Techniques / Evaluation of different operations

Database Query Processing

Different measures for calculating query cost

Measures of Query Cost / Different measures used in calculating the query cost / What are the different measures to be considered while calculating the query cost? / Query cost evaluation measures

Measures of Query Cost


In a DBMS, the cost involved in executing a query can be measured by considering the different resources listed below;
  • The number of disk accesses / the number of disk block transfers / the size of the table
  • Time taken by CPU for executing the query

In most systems, the time taken by the CPU is negligible compared with the cost of disk accesses.

If we take the number of block transfers as the main component in calculating the cost of a query, it in turn includes several sub-components. Those are;
Rotational latency – the time taken to spin the required data under the read-write head of the disk.
Seek time – the time taken to position the read-write head over the required track or cylinder.
Sequential I/O – reading data stored in contiguous blocks of the disk.
Random I/O – reading data stored in blocks that are not contiguous; that is, the blocks might lie on different tracks, different cylinders, etc.
Read or write – a read takes less time; a write takes more.

From these sub-components, the components of a more accurate measure can be listed as follows;
  • The number of seek operations performed
  • The number of blocks read
  • The number of blocks written
To get the final result, each of these numbers is multiplied by the average time required for that operation. Hence, the cost can be written as follows;

Query cost = (number of seek operations × average seek time)
           + (number of blocks read × average time to read a block)
           + (number of blocks written × average time to write a block)

Note: here, the CPU cost and a few other costs, such as the cost of writing the final result, are omitted.
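As a small worked example of this formula, the following Python snippet plugs in made-up values (a 4 ms average seek time, a 0.1 ms average block-transfer time, and hypothetical counts of seeks and blocks); the numbers are for illustration only, not measurements of any real system.

```python
# Hypothetical timing constants and counts for illustrating the formula.
avg_seek_time  = 0.004    # seconds per seek         (assumed)
avg_read_time  = 0.0001   # seconds per block read   (assumed)
avg_write_time = 0.0001   # seconds per block write  (assumed)

num_seeks          = 10
num_blocks_read    = 200
num_blocks_written = 50

query_cost = (num_seeks          * avg_seek_time
              + num_blocks_read    * avg_read_time
              + num_blocks_written * avg_write_time)

print(f"Estimated query cost: {query_cost:.4f} seconds")   # 0.0650 seconds
```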



