TOPICS (Click to Navigate)
- Advanced Database Concepts
- Data structures, Operating Systems
- Natural Language Processing
- Quiz Questions and Answers
- DBMS, ADBMS Question Bank
- SQL
- RDBMS Exam and Interview Questions
- Parallel Databases
- ADBMS Quizzes
- Advanced DBMS Concepts
- Distributed Databases
- Modern Databases - Special Purpose Databases
- Object Based Database Systems
Showing posts with label Query Processing. Show all posts
Showing posts with label Query Processing. Show all posts
Saturday, November 19, 2016
Monday, October 6, 2014
Database Query Processing
Database Query Processing / Overview / Techniques / Evaluation of different operations
Database Query Processing
Different measures for calculating query cost
Measures of Query Cost / Different measures used in calculating the query cost / What are the different measures that are to be considered which calculating the query cost? / Query cost evaluation measures
Measures of Query Cost
In
DBMS, the cost involved in executing a query can be measured by considering the
number of different resources that are listed below;
- The number of disk accesses / the number of disk block transfers / the size of the table
- Time taken by CPU for executing the query
The
time taken by CPU
is negligible in most systems when compared with the number of disk
accesses.
If
we consider the number of block transfers as the main component in calculating
the cost of a query, it would include more sub-components. Those are;
Rotational
latency – time taken to bring and spin the required
data under the read-write head of the disk.
Seek
time – time taken to position the read-write head over
the required track or cylinder.
Sequential
I/O – reading data that are stored in contiguous
blocks of the disk
Random
I/O – reading data that are stored in different
blocks that are not contiguous.
That
is, the blocks might be stored in different tracks, or different cylinders,
etc.
Whether
read/write? – read takes less time, write takes more.
From
these sub-components, we would list the components of a more accurate measure
as follows;
- The number of seek operations performed
- The number of block read
- The number of blocks written
To
get the final result, these numbers to be multiplied by the average time
required to complete the task. Hence, it can be written as follows;
Query
cost = (number of seek operations X average seek time) +
(number
of blocks read X average transfer time for reading a block) +
(number
of blocks written X average transfer time for writing a block)
Note:
here, CPU cost and few other costs like cost of writing the final result are
omitted.
Thursday, September 25, 2014
Query Processing in DBMS
Query Processing in DBMS / Steps involved in Query Processing in DBMS / How is a query gets processed in a Database Management System? / Query Processing overview / Database Query Processing
Query Processing
Query Processing would mean the
entire process or activity which involves query translation into low level
instructions, query optimization to save resources, cost estimation or
evaluation of query, and extraction of data from the database.
Goal: To find an efficient Query
Execution Plan for a given SQL query which would minimize the cost
considerably, especially time.
Cost Factors: Disk accesses
[which typically consumes time], read/write operations [which typically needs
resources such as memory/RAM].
The major steps involved in query
processing are depicted in the figure below;
Figure 1 - Steps in Database Query Processing |
Let us discuss the whole process
with an example. Let us consider the following two relations as the example
tables for our discussion;
Employee(Eno,
Ename, Phone)
Proj_Assigned(Eno,
Proj_No, Role, DOP)
where,
Eno is
Employee number,
Ename is
Employee name,
Proj_No is
Project Number in which an employee is assigned,
Role is the
role of an employee in a project,
DOP is
duration of the project in months.
With this information, let us
write a query to find the list of all employees who are working in a project
which is more than 10 months old.
SELECT
Ename
FROM
Employee, Proj_Assigned
WHERE
Employee.Eno = Proj_Assigned.Eno AND DOP > 10;
Input:
A query written in SQL is given
as input to the query processor. For our case, let us consider the SQL query
written above.
Step 1: Parsing
In this step, the parser of the
query processor module checks the syntax of the query, the user’s privileges to
execute the query, the table names and attribute names, etc. The correct table
names, attribute names and the privilege of the users can be taken from the
system catalog (data dictionary).
Step 2: Translation
If we have written a valid query,
then it is converted from high level language SQL to low level instruction in
Relational Algebra.
For example, our SQL query can be
converted into a Relational Algebra equivalent as follows;
πEname(σDOP>10
Λ Employee.Eno=Proj_Assigned.Eno(Employee X Prof_Assigned))
Step 3: Optimizer
Optimizer uses the statistical
data stored as part of data dictionary. The statistical data are information
about the size of the table, the length of records, the indexes created on the
table, etc. Optimizer also checks for the conditions and conditional attributes
which are parts of the query.
Step 4: Execution Plan
A query can be expressed in many
ways. The query processor module, at this stage, using the information
collected in step 3 to find different relational algebra expressions that are equivalent
and return the result of the one which we have written already.
For our example, the query
written in Relational algebra can also be written as the one given below;
πEname(Employee
⋈Eno (σDOP>10 (Prof_Assigned)))
So far, we have got two execution
plans. Only condition is that both plans should give the same result.
Step 5: Evaluation
Though we got many execution
plans constructed through statistical data, though they return same result
(obvious), they differ in terms of Time consumption to execute the query, or
the Space required executing the query. Hence, it is mandatory choose one plan
which obviously consumes less cost.
At this stage, we choose one
execution plan of the several we have developed. This Execution plan accesses
data from the database to give the final result.
In our example, the second plan
may be good. In the first plan, we join two relations (costly operation) then
apply the condition (conditions are considered as filters) on the joined
relation. This consumes more time as well as space.
In the second plan, we filter one
of the tables (Proj_Assigned) and the result is joined with the Employee table.
This join may need to compare less number of records. Hence, the second plan is
the best (with the information known, not always).
Output:
The final result is shown to the
user.
The overall information discussed above are depicted in Figure 2 below;
Figure 2 - Query Processing [Note: in Step 4, NJ means Natural Join] |
Subscribe to:
Posts (Atom)
Featured Content
Multiple choice questions in Natural Language Processing Home
MCQ in Natural Language Processing, Quiz questions with answers in NLP, Top interview questions in NLP with answers Multiple Choice Que...
All time most popular contents
-
Relational algebra in database management systems solved exercise Relational algebra – solved exercise Question: Consider the fo...
-
Bigram Trigram and NGram in NLP, How to calculate the unigram, bigram, trigram, and ngram probabilities of a sentence? Maximum likelihood...
-
Top 5 Machine Learning Quiz Questions with Answers explanation, Interview questions on machine learning, quiz questions for data scienti...
-
Solved Exercises - Reducing Entity Relationship Diagram into Tables, Convert ER diagram to tables, relational schemas, ER model to relat...
-
Bigram probability estimate of a word sequence, Probability estimation for a sentence using Bigram language model Bigram Model - Probab...