File organization in DBMS - Hash file organization

Friday, September 11, 2015


Hash File Organization / Advantages and Disadvantages

Hash File Organization

Hash file organization is a technique in which a hash function is used to compute the address of a record. The function takes the value of an attribute (or set of attributes) as input and returns the location (page/block/bucket) where the record is to be stored.

For example, let us consider the following table, Student:

*RegNo*	*SName*	*Gen*	*Phone*
1	Sundar	M	9898786756
3	Karthik	M	8798987867
4	John	M	7898886756
2	Ram	M	9897786772
5	Martin	M	9765430231
6	Rashmi	F	8976543990

A hash function maps a large set of values onto a smaller set of locations (files/buckets/values). Let us organize the above table using the value of the Phone attribute as the input to the hash function.

h(phone) = phone mod 10

In the above hash function, phone is the value of the Phone attribute of each record, and 10 is the number of buckets/pages in which we want to store our table. [10 buckets means bucket0, bucket1, …, bucket9].

For our example,

For the 1st record, h(9898786756) = 9898786756 mod 10 = 6, i.e., the first record has to be stored in bucket 6.

For the 2nd record, h(8798987867) = 8798987867 mod 10 = 7, i.e., the second record has to be stored in bucket 7.

…

For the record with RegNo 4 (John), h(7898886756) = 7898886756 mod 10 = 6, i.e., this record has to be stored in bucket 6 [like the 1st record].

For the 5th record, h(9765430231) = 9765430231 mod 10 = 1, i.e., the 5th record has to be stored in bucket 1.

For the last record, h(8976543990) = 8976543990 mod 10 = 0, i.e., the last record has to be stored in bucket 0.
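The bucket assignments above can be reproduced with a short Python sketch. The names NUM_BUCKETS, h, and buckets are illustrative only, and the buckets here are in-memory lists standing in for disk pages; this is not how any particular DBMS implements it.

```python
# Minimal sketch of hashing the Student table on Phone (illustrative names).
NUM_BUCKETS = 10  # bucket 0 .. bucket 9

# (RegNo, SName, Gen, Phone), in the same order as the table above
students = [
    (1, "Sundar", "M", 9898786756),
    (3, "Karthik", "M", 8798987867),
    (4, "John", "M", 7898886756),
    (2, "Ram", "M", 9897786772),
    (5, "Martin", "M", 9765430231),
    (6, "Rashmi", "F", 8976543990),
]

def h(phone: int) -> int:
    """Hash function from the example: phone mod 10."""
    return phone % NUM_BUCKETS

# Place every record in the bucket chosen by the hash function.
buckets = {b: [] for b in range(NUM_BUCKETS)}
for record in students:
    buckets[h(record[3])].append(record)

# Show the non-empty buckets and the names stored in each.
for b, records in buckets.items():
    if records:
        print(b, [name for _, name, _, _ in records])
```

Sundar and John end up together in bucket 6, and the record for Ram (skipped in the walkthrough above) lands in bucket 2.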

Important points for consideration:

If a bucket is full, overflow buckets can be used to store additional records.
The hash function has to be chosen with care to avoid uneven distribution. That is, a bad hash function may assign many records to a few buckets and very few to the others.
The attribute(s) most frequently used for data manipulation can be chosen as the input for the hash function.
The same hash function that was used to store the records has to be used for deletion, modification, or selection of records.
There are two types of hashing: static and dynamic.
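The first two points can be illustrated with a small sketch of overflow chaining. BUCKET_CAPACITY is an assumed page size (in records), set deliberately low so that the example overflows; the Bucket class and its method names are hypothetical, not taken from any DBMS.

```python
# Sketch of overflow buckets for static hashing (illustrative names).
NUM_BUCKETS = 10
BUCKET_CAPACITY = 1  # assumed: records per page, kept tiny to force an overflow

class Bucket:
    def __init__(self):
        self.records = []     # records held in this page
        self.overflow = None  # next overflow bucket in the chain, if any

    def insert(self, record):
        if len(self.records) < BUCKET_CAPACITY:
            self.records.append(record)
        else:
            # Page is full: push the record down the overflow chain.
            if self.overflow is None:
                self.overflow = Bucket()
            self.overflow.insert(record)

    def chain_length(self):
        """Pages (primary + overflow) a lookup in this bucket may have to read."""
        return 1 + (self.overflow.chain_length() if self.overflow else 0)

def h(phone):
    return phone % NUM_BUCKETS

buckets = [Bucket() for _ in range(NUM_BUCKETS)]
phones = [9898786756, 8798987867, 7898886756, 9897786772, 9765430231, 8976543990]
for phone in phones:
    buckets[h(phone)].insert(phone)

print(buckets[6].chain_length())  # 2: two phone numbers hash to bucket 6
print(buckets[7].chain_length())  # 1: no overflow needed
```

The longer a chain gets, the more pages a lookup has to read, which is why a hash function that crowds many records into a few buckets hurts performance.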

How would we locate a bucket for inserting/deleting/updating/reading a record?

Let us assume that the following query is executed.

SELECT * FROM Student WHERE phone = 8976543990;

To search for the record, we have to use the same hash function that we used for storing the records. Hence, h(8976543990) = 8976543990 mod 10 = 0, and the result points to bucket 0. This gives us quick access to the required record, because only that bucket has to be read.
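As a rough sketch of this lookup, the following Python code stores the Student records with the hash function and then answers the equality query by reading a single bucket. The helper name find_by_phone is hypothetical and the buckets are plain in-memory lists.

```python
# Sketch of an equality lookup on the hashed attribute (illustrative names).
NUM_BUCKETS = 10

def h(phone):
    return phone % NUM_BUCKETS

students = [
    {"RegNo": 1, "SName": "Sundar", "Gen": "M", "Phone": 9898786756},
    {"RegNo": 3, "SName": "Karthik", "Gen": "M", "Phone": 8798987867},
    {"RegNo": 4, "SName": "John", "Gen": "M", "Phone": 7898886756},
    {"RegNo": 2, "SName": "Ram", "Gen": "M", "Phone": 9897786772},
    {"RegNo": 5, "SName": "Martin", "Gen": "M", "Phone": 9765430231},
    {"RegNo": 6, "SName": "Rashmi", "Gen": "F", "Phone": 8976543990},
]

buckets = {b: [] for b in range(NUM_BUCKETS)}
for rec in students:
    buckets[h(rec["Phone"])].append(rec)

def find_by_phone(phone):
    """Read only the bucket the hash function points to, then filter inside it."""
    return [rec for rec in buckets[h(phone)] if rec["Phone"] == phone]

# Equivalent of: SELECT * FROM Student WHERE phone = 8976543990;
print(find_by_phone(8976543990))
```

The filter inside the bucket is still needed, since several different phone numbers can hash to the same bucket.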

Advantages

Quick access to records for selection queries. [If queried on the attribute that was used for hashing]

Easy to insert, delete, or update a record.

Disadvantages

Records are stored in scattered locations. A lot of space may be wasted in the case of small files.

For queries that involve ranges, hash file organization is not efficient, because neighbouring values of the hashed attribute need not share a bucket. [e.g., SELECT * FROM Emp WHERE Salary BETWEEN 10000 AND 25000]

If the querying attribute is not the hashed attribute, the entire table may need to be scanned for retrieval.

Frequent updates to the hashed column result in movement of data between buckets, which affects system performance.
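The range-query and non-hashed-attribute limitations can be seen in a small sketch that counts how many buckets each kind of query reads. It reuses the Student data hashed on Phone; the function names and the buckets_read counter are illustrative assumptions, not part of any DBMS.

```python
# Sketch comparing bucket reads for different query types (illustrative names).
NUM_BUCKETS = 10

def h(phone):
    return phone % NUM_BUCKETS

# (RegNo, SName, Phone)
rows = [
    (1, "Sundar", 9898786756),
    (3, "Karthik", 8798987867),
    (4, "John", 7898886756),
    (2, "Ram", 9897786772),
    (5, "Martin", 9765430231),
    (6, "Rashmi", 8976543990),
]

buckets = [[] for _ in range(NUM_BUCKETS)]
for row in rows:
    buckets[h(row[2])].append(row)

def point_lookup(phone):
    """Equality on the hashed attribute: exactly one bucket is read."""
    return [r for r in buckets[h(phone)] if r[2] == phone], 1

def range_scan(low, high):
    """Range on the hashed attribute: the hash gives no ordering, so read all buckets."""
    matches, buckets_read = [], 0
    for bucket in buckets:
        buckets_read += 1
        matches.extend(r for r in bucket if low <= r[2] <= high)
    return matches, buckets_read

def scan_by_name(name):
    """Equality on a non-hashed attribute: again every bucket must be read."""
    matches, buckets_read = [], 0
    for bucket in buckets:
        buckets_read += 1
        matches.extend(r for r in bucket if r[1] == name)
    return matches, buckets_read

print(point_lookup(8976543990)[1])              # buckets read for an equality query
print(range_scan(8000000000, 8999999999)[1])    # buckets read for a range query
print(scan_by_name("Ram")[1])                   # buckets read for a non-hashed attribute
```

With this data the point lookup reads one bucket, while the range query and the query on SName each read all ten.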
