2PC Protocol (13) ACID (5) Advanced Database Questions (12) Anna University DBMS Questions (34) Anna University Exam Questions (64) Codd's 12 Rules (13) Concurrency Control (5) CS2255 DBMS Questions (11) Database Quizzes (27) Date's 12 Rules (11) Dependency Preservation (5) Distributed Database (44) Distributed Lock Manager (4) Distributed Transaction (2) ER Model (23) Fragmentation (3) Indexing (16) Keywords and Definitions (3) Normal Forms (15) Normalization (91) Object Databases (11) Parallel Database (16) Previous Year Questions (63) Query Processing (4) Real Time Database (22) Solved Exercises (15) Transaction Management (9)
Database Query Processing / Overview / Techniques / Evaluation of different operations
Database Query Processing
Measures of Query Cost / Different measures used in calculating the query cost / What are the different measures that are to be considered which calculating the query cost? / Query cost evaluation measures
Measures of Query Cost
In DBMS, the cost involved in executing a query can be measured by considering the number of different resources that are listed below;
- The number of disk accesses / the number of disk block transfers / the size of the table
- Time taken by CPU for executing the query
The time taken by CPU is negligible in most systems when compared with the number of disk accesses.
If we consider the number of block transfers as the main component in calculating the cost of a query, it would include more sub-components. Those are;
Rotational latency – time taken to bring and spin the required data under the read-write head of the disk.
Seek time – time taken to position the read-write head over the required track or cylinder.
Sequential I/O – reading data that are stored in contiguous blocks of the disk
Random I/O – reading data that are stored in different blocks that are not contiguous.
That is, the blocks might be stored in different tracks, or different cylinders, etc.
Whether read/write? – read takes less time, write takes more.
From these sub-components, we would list the components of a more accurate measure as follows;
- The number of seek operations performed
- The number of block read
- The number of blocks written
To get the final result, these numbers to be multiplied by the average time required to complete the task. Hence, it can be written as follows;
Query cost = (number of seek operations X average seek time) +
(number of blocks read X average transfer time for reading a block) +
(number of blocks written X average transfer time for writing a block)
Note: here, CPU cost and few other costs like cost of writing the final result are omitted.
Query Processing in DBMS / Steps involved in Query Processing in DBMS / How is a query gets processed in a Database Management System? / Query Processing overview / Database Query Processing
Query Processing would mean the entire process or activity which involves query translation into low level instructions, query optimization to save resources, cost estimation or evaluation of query, and extraction of data from the database.
Goal: To find an efficient Query Execution Plan for a given SQL query which would minimize the cost considerably, especially time.
Cost Factors: Disk accesses [which typically consumes time], read/write operations [which typically needs resources such as memory/RAM].
The major steps involved in query processing are depicted in the figure below;
|Figure 1 - Steps in Database Query Processing|
Let us discuss the whole process with an example. Let us consider the following two relations as the example tables for our discussion;
Employee(Eno, Ename, Phone)
Proj_Assigned(Eno, Proj_No, Role, DOP)
Eno is Employee number,
Ename is Employee name,
Proj_No is Project Number in which an employee is assigned,
Role is the role of an employee in a project,
DOP is duration of the project in months.
With this information, let us write a query to find the list of all employees who are working in a project which is more than 10 months old.
FROM Employee, Proj_Assigned
WHERE Employee.Eno = Proj_Assigned.Eno AND DOP > 10;
A query written in SQL is given as input to the query processor. For our case, let us consider the SQL query written above.
Step 1: Parsing
In this step, the parser of the query processor module checks the syntax of the query, the user’s privileges to execute the query, the table names and attribute names, etc. The correct table names, attribute names and the privilege of the users can be taken from the system catalog (data dictionary).
Step 2: Translation
If we have written a valid query, then it is converted from high level language SQL to low level instruction in Relational Algebra.
For example, our SQL query can be converted into a Relational Algebra equivalent as follows;
πEname(σDOP>10 Λ Employee.Eno=Proj_Assigned.Eno(Employee X Prof_Assigned))
Step 3: Optimizer
Optimizer uses the statistical data stored as part of data dictionary. The statistical data are information about the size of the table, the length of records, the indexes created on the table, etc. Optimizer also checks for the conditions and conditional attributes which are parts of the query.
Step 4: Execution Plan
A query can be expressed in many ways. The query processor module, at this stage, using the information collected in step 3 to find different relational algebra expressions that are equivalent and return the result of the one which we have written already.
For our example, the query written in Relational algebra can also be written as the one given below;
πEname(Employee ⋈Eno (σDOP>10 (Prof_Assigned)))
So far, we have got two execution plans. Only condition is that both plans should give the same result.
Step 5: Evaluation
Though we got many execution plans constructed through statistical data, though they return same result (obvious), they differ in terms of Time consumption to execute the query, or the Space required executing the query. Hence, it is mandatory choose one plan which obviously consumes less cost.
At this stage, we choose one execution plan of the several we have developed. This Execution plan accesses data from the database to give the final result.
In our example, the second plan may be good. In the first plan, we join two relations (costly operation) then apply the condition (conditions are considered as filters) on the joined relation. This consumes more time as well as space.
In the second plan, we filter one of the tables (Proj_Assigned) and the result is joined with the Employee table. This join may need to compare less number of records. Hence, the second plan is the best (with the information known, not always).
The final result is shown to the user.
The overall information discussed above are depicted in Figure 2 below;
|Figure 2 - Query Processing [Note: in Step 4, NJ means Natural Join]|