Indexing and Tuning - solved exercises - Indexing examples - Tuning examples - How to choose an appropriate index?
Question:
Consider a relation Customer of a
large super market chain with the relational schema Customer (CID, CName, City,
PointsCollected). Here CID is the customer identification number that stores unique values.
The following query is the most
frequent one;
SELECT * FROM Customer
WHERE CID = v1;
Assume that the relation Customer does
not have any indexes. We need to create necessary indexes. Which of the following indexes is
best suited for executing the above query efficiently? Why?
(a) A sparse primary index on CID (b) A dense primary index on CID
Solution:
Definition: Primary index can be sparse (only few
values of indexing column of data file is stored sparsely in index table along
with pointer in sorted order) or dense (all the values of indexing column of
the data file is stored in index table along with pointer in sorted order).
The query given above is called point
queries. Point query is searching for records that satisfies a condition of the
form “attribute_name = value”.
To explain further, let us use the
given query with a value as follows;
SELECT * FROM Customer
WHERE CID = 123;
This query, according to the
information provided in the question, will give exactly one record.
Among sparse index and dense index, the dense index is most suitable index for
CID attribute. Let us discuss this with the above query.
The following will happen,
If we use sparse index –
the system will look for the matching CID value in the index file. [As index
file is always sorted, we can skip further scan once we reach the required
record]. If we find the matching record,
1. we can read the pointer (address)
stored adjacent and
2. locate the actual record (or the block
where the record is stored) in the data file directly using the address.
If the matching value is not found in
the index file (because it is sparse),
1. we have to find the largest CID
value that is smaller than 123,
2. use the pointer of that largest CID
value, and
3. locate the record (or the block
where the record is stored) of largest CID in the data file directly using the
address, and
4. perform linear scan until you reach
the required value/record.
If we use dense index -
the system will look for the matching CID value in the index file. [As index
file is always sorted, we can skip further scan once we reach the required
record]. If we find the matching record,
1. we can read the pointer (address)
stored adjacent and
2. locate the actual record (or the block
where the record is stored) in the data file directly using the address.
The difference between these two is,
dense index stores all the values in index file so linear scan in index file is
enough to find any record, whereas sparse index stores few records than dense index
in the index file so linear scan is required both in index file and in data
file (in many cases).
Hence, dense primary index is the best suited
for executing the above query.
Go back to the Indexing and Tuning –Solved exercises page
c