Showing posts with label Query Processing. Show all posts

Wednesday, June 19, 2024

Understanding SQL queries - Multiple Choice Questions

Understanding SQL queries / what is the result of this sql query / difference between distinct query, normal query and subquery / check the following statements and justify whether true or false

Question:

Consider the relation schemas of a database X given below. Primary keys of all the tables are underlined.

COUNTRY (cid, cname)

LOCATION (locid, countryid, lname, population)

DEPARTMENT (did, dname, noOfEmp, location, country, manager)

Consider the following queries;

A. SELECT sum(population) FROM Department D, Location L WHERE D.location=L.locid AND D.country=L.countryid;
B. SELECT sum(DISTINCT population) FROM Department D, Location L WHERE D.location=L.locid AND D.country=L.countryid;
C. SELECT sum(population) FROM Location WHERE (locid,countryid) IN (SELECT location, country FROM Department);

Use the queries A, B, and C and the database X given above, answer the following questions;

Check for each of the following statements if it is true or false. Justify your answers.

1. A and B always return the same result.
2. B and C always return the same result.
3. One of A, B, and C has a correlated sub-query.
4. One of the queries computes the total population in all countries.

Answer:

1. False. Query A counts every occurrence of the population value for each department, whereas query B eliminates duplicates and counts unique values alone (because of DISTINCT).

2. False. They may differ if two locations have the same population. In such cases, query B counts the population only once whereas query C counts each population value.

3. False. The only query that has a sub-query is query C which is not correlated. Correlated sub-queries are sub-queries that are dependent on the outer query and are executed for each row of the main query.

Query C is considered as regular sub-query (non-correlated) that is independent of the outer query and is evaluated only once before the main query runs.

5. False. None of them counts the population in locations with NO departments.

That is, for example, let us suppose we have location X. X has a location id, country id and population. If this location id and country id are not part of department table (ie., there is no department in X), then X’s population is not counted. This is TRUE for all queries.

***************

Keywords:

Find the difference between two SQL queries
Why two similar queries behave differently
What difference a sub-query and a distinct keyword make in execution of an SQL query?

Saturday, November 19, 2016

Parallel Query Processing - Examples

Parallel query execution - example SQL queries / Inter-query, Intra-query parallelism examples / Inter-operation and Intra-operation parallelism examples / Independent and Pipelined parallelism examples

Parallel Query Processing

Go back to PARALLEL DATABASES home page

Monday, October 6, 2014

Database Query Processing

Database Query Processing / Overview / Techniques / Evaluation of different operations

Database Query Processing

Overview of Query Processing / The steps involved in database query processing

Measures used in calculation of query cost

Different measures for calculating query cost

Measures of Query Cost / Different measures used in calculating the query cost / What are the different measures that are to be considered which calculating the query cost? / Query cost evaluation measures

Measures of Query Cost

In DBMS, the cost involved in executing a query can be measured by considering the number of different resources that are listed below;

The number of disk accesses / the number of disk block transfers / the size of the table

Time taken by CPU for executing the query

The time taken by CPU is negligible in most systems when compared with the number of disk accesses.

If we consider the number of block transfers as the main component in calculating the cost of a query, it would include more sub-components. Those are;

Rotational latency – time taken to bring and spin the required data under the read-write head of the disk.

Seek time – time taken to position the read-write head over the required track or cylinder.

Sequential I/O – reading data that are stored in contiguous blocks of the disk

Random I/O – reading data that are stored in different blocks that are not contiguous.

That is, the blocks might be stored in different tracks, or different cylinders, etc.

Whether read/write? – read takes less time, write takes more.

From these sub-components, we would list the components of a more accurate measure as follows;

The number of seek operations performed

The number of block read

The number of blocks written

To get the final result, these numbers to be multiplied by the average time required to complete the task. Hence, it can be written as follows;

Query cost = (number of seek operations X average seek time) +

(number of blocks read X average transfer time for reading a block) +

(number of blocks written X average transfer time for writing a block)

Note: here, CPU cost and few other costs like cost of writing the final result are omitted.

Thursday, September 25, 2014

Query Processing in DBMS

Query Processing in DBMS / Steps involved in Query Processing in DBMS / How is a query gets processed in a Database Management System? / Query Processing overview / Database Query Processing

Query Processing

Query Processing would mean the entire process or activity which involves query translation into low level instructions, query optimization to save resources, cost estimation or evaluation of query, and extraction of data from the database.

Goal: To find an efficient Query Execution Plan for a given SQL query which would minimize the cost considerably, especially time.

Cost Factors: Disk accesses [which typically consumes time], read/write operations [which typically needs resources such as memory/RAM].

The major steps involved in query processing are depicted in the figure below;

Figure 1 - Steps in Database Query Processing

Let us discuss the whole process with an example. Let us consider the following two relations as the example tables for our discussion;

Employee(Eno, Ename, Phone)

Proj_Assigned(Eno, Proj_No, Role, DOP)

where,

Eno is Employee number,

Ename is Employee name,

Proj_No is Project Number in which an employee is assigned,

Role is the role of an employee in a project,

DOP is duration of the project in months.

With this information, let us write a query to find the list of all employees who are working in a project which is more than 10 months old.

SELECT Ename

FROM Employee, Proj_Assigned

WHERE Employee.Eno = Proj_Assigned.Eno AND DOP > 10;

Input:

A query written in SQL is given as input to the query processor. For our case, let us consider the SQL query written above.

Step 1: Parsing

In this step, the parser of the query processor module checks the syntax of the query, the user’s privileges to execute the query, the table names and attribute names, etc. The correct table names, attribute names and the privilege of the users can be taken from the system catalog (data dictionary).

Step 2: Translation

If we have written a valid query, then it is converted from high level language SQL to low level instruction in Relational Algebra.

For example, our SQL query can be converted into a Relational Algebra equivalent as follows;

π_Ename(σ_{DOP>10
Λ Employee.Eno=Proj_Assigned.Eno}(Employee X Prof_Assigned))

Step 3: Optimizer

Optimizer uses the statistical data stored as part of data dictionary. The statistical data are information about the size of the table, the length of records, the indexes created on the table, etc. Optimizer also checks for the conditions and conditional attributes which are parts of the query.

Step 4: Execution Plan

A query can be expressed in many ways. The query processor module, at this stage, using the information collected in step 3 to find different relational algebra expressions that are equivalent and return the result of the one which we have written already.

For our example, the query written in Relational algebra can also be written as the one given below;

π_Ename(Employee ⋈_Eno (σ_DOP>10(Prof_Assigned)))

So far, we have got two execution plans. Only condition is that both plans should give the same result.

Step 5: Evaluation

Though we got many execution plans constructed through statistical data, though they return same result (obvious), they differ in terms of Time consumption to execute the query, or the Space required executing the query. Hence, it is mandatory choose one plan which obviously consumes less cost.

At this stage, we choose one execution plan of the several we have developed. This Execution plan accesses data from the database to give the final result.

In our example, the second plan may be good. In the first plan, we join two relations (costly operation) then apply the condition (conditions are considered as filters) on the joined relation. This consumes more time as well as space.

In the second plan, we filter one of the tables (Proj_Assigned) and the result is joined with the Employee table. This join may need to compare less number of records. Hence, the second plan is the best (with the information known, not always).

Output:

The final result is shown to the user.

The overall information discussed above are depicted in Figure 2 below;

Figure 2 - Query Processing [Note: in Step 4, NJ means Natural Join]

TOPICS (Click to Navigate)

Wednesday, June 19, 2024

Understanding SQL queries - Multiple Choice Questions

Understanding SQL queries / what is the result of this sql query / difference between distinct query, normal query and subquery / check the following statements and justify whether true or false

Question: