Computer Science and Engineering - Tutorials, Notes, MCQs, Questions and Answers

Sunday, May 4, 2014

CS2255 Database Management Systems question paper

CS2255 Database Management Systems question paper - Nov/Dec 2010 / Anna University Previous Year Exam Questions / Anna University Previous Year Computer Science and Information Technology Question Papers

B.E./B.Tech. DEGREE EXAMINATION, NOVEMBER/DECEMBER 2010

Fourth Semester

Computer Science and Engineering

CS 2255 — DATABASE MANAGEMENT SYSTEMS

(Common to Information Technology)

(Regulation 2008)

Time : Three hours Maximum : 100 Marks

Answer ALL questions

PART A — (10 × 2 = 20 Marks)

1. Define the two levels of data independence.

2. Write down any two major responsibilities of a database administrator.
3. List out the various relational algebra operators.
4. What are the four broad categories of constraints?
5. Define irreducible sets of dependencies.
6. Define the third normal form.
7. What are ACID properties?
8. What are the three kinds of intent locks?
9. Which are the factors to be considered for the evaluation of indexing and
hashing techniques?
10. What is the drawback of flash memory?

PART B — (5 × 16 = 80 Marks)

11. (a) Explain the three different groups of data models with examples. (16)

OR

(b) Describe the components of entity-relationship diagram with suitable examples. (16)
12. (a) Describe the features of Embedded SQL and Dynamic SQL. Give suitable examples. (16)

OR

(b) Write short notes on the following :
(i) Mandatory access control. (9)
(ii) Missing information. (7)
13. (a) Explain non loss decomposition and functional dependencies with
suitable example. (16)

OR

(b) Discuss Join Dependencies and Fifth Normal Form, and explain why
5NF? (16)
14. (a) (i) State the Two-Phase Commit protocol. Discuss the implications of a
failure of the coordinator and some participants. (10)
(ii) Briefly explain transaction recovery with primitive operations. (6)

OR

(b) (i) State and explain the three concurrency problems. (9)
(ii) What is meant by isolation level and define the five different
isolation levels. (7)
15. (a) (i) Discuss the improvement of reliability and performance of RAID (8)
(ii) Explain the structure of a B+-tree. (8)

OR

(b) Explain the complex selection predicates with example. (16)

_____________________________

Process of recovery from deadlock in database

Recovery from Deadlock in Database:

How can we recover from deadlock? / Rollback in deadlock / Guidelines to choose deadlock victim

Refer here for Deadlock detection.

After detecting the deadlock, the next important step a system has to take is to recover the system from the deadlock state. This can be achieved through rolling back one or more transactions. But the difficult part is to choose one or more transactions as victims.

Recovery from deadlock can be done in three steps;

1. Selection of victim: Given a set of deadlocked transactions (transactions that formed cycles), we have to identify the transaction or transactions that are to be rolled-back for successful relief from deadlock state. This is done by identifying the transaction or transactions that are cost minimum. This would mean many things. The following guidelines would help us.

Guidelines to choose victim:

The length of the transaction – We need to choose the transaction which is younger.

The data items used by the transaction – The transactions that are used less number of data items.

The data items that are to be locked – The transaction that needs to lock many more data items compared to that are already locked.

How many transactions to be rolled back? – We need to choose transaction or transactions that would cause less number of other transactions to be rolled back (cascading rollback)

2. Rollback: Once we have identified the transaction or transactions that are to be rolled back, then rollback them. This can be done in two ways;

Full rollback – Simplest of two. Rollback the transaction up to its starting point. That is, abort the transaction and restart. It will also abort other transactions that have used the data item changed by the rolled back transaction.

Partial rollback – it might be the effective one, if the system maintains additional information like the order of lock requests, save points etc.

3. Starvation: We must be careful enough to avoid a transaction or transactions to be chosen repeatedly as victims.

Wednesday, April 30, 2014

Parallel execution of Duplicate elimination, Projection, and Aggregation operations

How do we perform Duplicate data elimination, Projection, and Aggregation in parallel in parallel database system?

Assume that the table is partitioned and stored in disks D0, D1, …, Dn-1 with processors P0, P1, …, Pn-1. For example, consider the following figure where the table Employee is partitioned used Round-robin partitioning technique.

Figure 1 - Partitioning Employee table

Duplication elimination:

Duplicate elimination is about removing duplicate values or neglecting repeated values that are stored in an attribute for various reasons; For example, in one or all of the following situations we need duplicate elimination;

When we would like to count the number of unique values present in a table under a particular attribute.

When we need to retrieve the values and present only the unique values of an attribute.

When we need to join two tables, etc.

The main approach used for duplicate elimination is by sorting the data on the attribute where in the duplicate values to be removed.

Duplicate elimination can be achieved in the following two ways in parallel database;

1. During parallel sort, if we find any repeated values while partitioning, those can be discarded immediately. (This method is for tables that are not partitioned). For example, in Figure 1, while you partition the data, you send tuples to different partitions based on the partition attribute and conditions. Suppose that we are partitioning on Salary attribute. During the process, if you find any of the repeated values which are already send into a partition, you can discard those values if they repeat. In our example, salary value 5000 occurs in two records. Hence, one can be accepted and the other can be discarded.

2. We can partition the table into many partitions (using range or hash partitioning), and instruct all the processors to sort the data locally and remove the duplicates. (This works only for the data that are hash or range partitioned on the duplicate elimination attribute)

Projection:

Projection means selection of one or more attributes from a table with the records stored in them. This operation can be parallelized as follows;

1. Projection without duplicate elimination: while you read data into various disks during partitioning, you can project the required columns.

2. Projection with duplicate elimination: any of the techniques suggested in Duplicate elimination section above can be used.

Aggregation:

Aggregation operation involves finding the count of records, sum of values stored in an attribute, minimum or maximum value of all the values stored in an attribute, and average value of all the attribute values. This operation basically needs grouping. That is, for example, we can find the sum of salary for all the records of Employee table, or we can find sum of salary with some filter conditions. In the first case, all the records of Employee table come under one group. In the later case, we choose the group based on the conditions included.

Hash Partitioning and Range Partitioning in Oracle

Hash Partitioning in Oracle:

How does Oracle manage Hash partitioning / How does Oracle manage Range Partitioning / Range and Hash Partition Oracle example

CREATE TABLE employees (

id INT NOT NULL,

fname VARCHAR(30),

lname VARCHAR(30),

job_code INT,

store_id INT

)

PARTITION BY HASH(store_id)

PARTITIONS 4;

The above CREATE TABLE statement will create the table employees by partitioning the records based on the Store_id attribute. Also, PARTITIONS clause defines the number of partitions, in the above example it is 4. If you don’t include PARTITIONS clause, then it takes the default value which is 1.

Range Partitioning in Oracle

CREATE TABLE sales_range

(salesman_id  NUMBER(5),

salesman_name VARCHAR2(30),

sales_amount  NUMBER(10),

sales_date    DATE)

PARTITION BY RANGE(sales_date)

PARTITION sales_jan2000 VALUES LESS THAN(TO_DATE('02/01/2000','DD/MM/YYYY')),

PARTITION sales_feb2000 VALUES LESS THAN(TO_DATE('03/01/2000','DD/MM/YYYY')),

PARTITION sales_mar2000 VALUES LESS THAN(TO_DATE('04/01/2000','DD/MM/YYYY')),

PARTITION sales_apr2000 VALUES LESS THAN(TO_DATE('05/01/2000','DD/MM/YYYY'))

);

The above statement uses the sales_date attribute as the partitioning attribute to partition the table into monthly partitions. Using the PARTITION partitionname VALUES LESS THAN() clause, it mentions the number of partitions along with unique partition names.

CREATE TABLE employees (

    id INT NOT NULL,

    fname VARCHAR(30),

    lname VARCHAR(30),

    job_code INT NOT NULL,

    store_id INT NOT NULL

PARTITION BY RANGE (store_id) (

    PARTITION p0 VALUES LESS THAN (6),

    PARTITION p1 VALUES LESS THAN (11),

    PARTITION p2 VALUES LESS THAN (16),

    PARTITION p3 VALUES LESS THAN (21)

);

In this example, the table is partitioned into 4 partitions on store_id attribute. Here, the first partition P0 stores the records with store_id value less that 6, i.e, 1 to 5, p1 in the range 6 to 10, and so on.

These examples taken from Oracle online documentation.
You can refer to the theory and discussion on various data partitioning techniques here.

Tuesday, April 29, 2014

Parallelizing the SELECTION operation

How would we parallelize the SELECTION operation in an SQL query?

Assume that the table is partitioned and stored in disks D0, D1, …, Dn-1 with processors P0, P1, …, Pn-1. For example, consider the following figure where the table Employee is partitioned using Round-robin partitioning technique.

Figure 1 - Partitioned table Employee

Recall the types of queries based on data access. Basically we have three types;

Point queries – In point queries, the WHERE conditions are specified like “attribute_name = value” format.

Range queries – In range queries, the WHERE conditions are specified like “attribute_name >= value AND attribute_name <= value”, or “attribute_name BETWEEN value1 and value 2”.

Scanning the entire relation – In queries where the WHERE condition involves non-key attributes, then the system has to look for all the data in the specified table. This is called the relation scan.

With this information let us consider the following general syntax of any SQL select query;

SELECT list_of_attributes FROM table_name WHERE condition;

This query can be parallelized based on the condition given in the WHERE clause.

CASE 1: If the condition is of the form “a = value”, then;

If the given table is partitioned on a, then we need to execute the selection operation in only single processor where in the given attribute is partitioned.

CASE 2: If the condition is of the form “value1 <= a <= value2” (i.e, a falls in a range), then;

If the given relation is range partitioned on a, then we need to perform selection at all the processors wherever the range overlaps with the given range value1 to value2.

CASE 3: In all the other cases, the selection performed in parallel in all the processors. That is, for all the other type of queries, the selection can be done in parallel in all the processors, because, the data are partitioned over several processors.

As an example for CASE 3, let us assume a query which requests information from a table with a non-key attribute in the WHERE clause condition. Refer the Employee table in Figure 1.

SELECT * FROM emp WHERE ename = ‘Kumar’;

The query involves a condition on name attribute and we don’t create an index or key on name attribute as there are more such names. Hence, this query needs to examine all records of Employee table to give the final result. So, we can distribute the query to all the processors wherever Employee table's partitions are stored, and execute the query in parallel on all the processors. According to the example in Figure 1, Processors P0, P1, and P2 have to execute the query locally, even though the data available at D0 alone. The final result is generated from the local results of these processors. This way the parallelism for SELECTION operation can be achieved.

The same (CASE 3) is applicable for other type of actions like,

If the table is partitioned based on an attribute other than the WHERE condition attribute,

If the table is range partitioned on an attribute other than the range attribute specified in the query,

If the table is partitioned on one attribute and the query involves one more attribute with and condition, etc.

Point to note:

According to CASE 3, it means, we can achieve parallelism by exploiting all the processors.

According to CASE 1, only one processor has to execute and it is known initially. That means, the other processors can effectively be used for executing other queries at that time.

According to CASE 2, only few processors where the range of values stored need to execute. The other processors can be utilized for executing other queries.

TOPICS (Click to Navigate)