Please visit, subscribe and share 10 Minutes Lectures in Computer Science
Showing posts with label File organization. Show all posts
Showing posts with label File organization. Show all posts

## Hash File Organization / Advantages and Diadvantages

### Hash File Organization

It is a file organization technique where a hash function is used to compute the address of a record. It uses the value of an attribute or set of attributes as input and gives the location (page/block/bucket) where the record can be stored.

For example, let us consider the following table Student;

 RegNo SName Gen Phone 1 Sundar M 9898786756 3 Karthik M 8798987867 4 John M 7898886756 2 Ram M 9897786772 5 Martin M 9765430231 6 Rashmi F 8976543990

A hash function is a function which maps the large set of values into smaller set of files/locations/values. Let us organize the above table using the phone attribute value as input for the hash function.
h(phone mod 10)
In the above hash function, phone is the phone attribute’s value of each record. 10 is the number of buckets/pages where we want to store our table. [10 buckets means bucket0, bucket1, …, bucket9].
For our example,
For 1st record, h(9898786756 mod 10)  = 6 ie., the first record has to be stored in 6th bucket.
For 2nd record, h(8798987867 mod 10) = 7 ie., the second record has be stored in 7th bucket.
For 4th record, h(7898886756 mod 10) = 6 ie., the fourth record has be stored in 6th bucket [like 1st]
For 5th record, h(9765430231 mod 10) = 1 ie., the 5th record has to be stored in 1st bucket.
For last record, h(8976543990 mod 10) = 0 ie., the last record has to be stored in 0th bucket.

### Important points for consideration:

• If bucket(s) is/are full, then overflow buckets can be used to store more records.
• Hash function has to be chosen with extra care to avoid uneven distribution. That is, a bad hash function may assign more records to few buckets and less to others.
• The attribute(s) that is frequently used for data manipulation can be chosen as the input for the hash function.
• Same hash function that was used to store the records has to be used for deletion, modification or selection of records.
• Two types of hashing : static and dynamic

### How would we locate a bucket for inserting/deleting/updating/reading a record?

Let us assume that the following query is executed.

SEELCT * FROM Student WHERE phone = 8976543990;

For searching the record, we has to use the same hash function that we used for storing the records. Hence, h(8976543990 mod 10) = 0. And the result points to the 0th bucket. It actually gives us the quick access to the required record.

• Quick access to records in terms of selection. [If queried on the attribute that was used for hashing]
• Easy to insert, delete, or update a record.

• Records are randomly stored in scattered locations. May waste a lot of space in case of small files.
• For queries that involve ranges, hash file organization is not efficient. [eg. SELECT * FROM Emp WHERE Salary BETWEEN 10000 AND 25000]
• If querying attribute is not the hashed attribute, you may need to scan the entire table for retrieval.
• Frequent update to the hashed column results in movement of data between buckets which actually affects the system performance.

## Sequential File Organization (Sorted file organization)

In this file organization, records are sorted on an attribute(s) values and stored physically in the disk in that sorted order. This kind of file organization will speed up the retrieval of data especially when queried on the sorting attributes.
Ex. Assume a relation Emp with schema Emp(Eno, Ename, Salary, Dept). The following query can be answered quickly (possibly) if we have arranged the Emp file using sorted file organization on Eno attribute.
SELECT * FROM Emp WHERE Eno = ‘110’;
In this file organization, records of a table are chained together using pointers in the search key order and stored in that order physically.

### How do we insert a record?

• Insert the record at the end of file.
• Find the previous record on the sorting key value and reset the pointer of that record to point to the new record, and insert the previous record’s pointer in the pointer field of new record.
• All the other records’ pointers should be altered as well.
• Reorganize all the records periodically to arrange the records in the sorting order of the ordering attribute.
Refer the example given below.

### How to delete a record?

• Locate the record that is to be deleted.
• Replace the previous record’s pointer value with the pointer of the record to be deleted.
• Update all the other records' pointers (if needed).

• Retrieval of records become efficient if the query uses the sorting attribute as the search key.
• Sorting of records on the ordering field is fast. [No sorting is required externally]

• Insertion and deletion of records are expensive.
• Updating the sorting attribute values of the records is also expensive.
• Retrieval of records on the non-ordering attributes is not easy.

## Heap file organization / Unordered file organization

### Heap File Organization

• Simplest type of file organization.
• Also called as, unordered file organization.
• To make it simple, new records are normally inserted at the end of the file. If the last page is full, then the new record can go into the next block.
• No order is required in storing the records.
• Typically single file is maintained for every database table.

### Operations that are supported:

• Insert records – new records can be inserted at the last page of the file. For efficient insertion, we may keep track of free space in the file pages, and insert new records in the available space as well.
• Delete records – in a database table, all the records are given unique row IDs. We can find the block in which the record that is to be deleted is stored, and then we can delete the record. Frequent deletion of records would create much free space in a file, and this need periodical reorganization of the file.
• Scan records/file – the row IDs would be used to locate and read a record from a file. Linear search is important to search for a record [also for locating a record that is to be deleted].

• Simple file organization.
• Insertion becomes easy.
• Best method for bulk loading data into a relation.

• As a heap file uses no particular ordering, we are able to locate the required record using linear search only. It would take more time and make retrieval slow.
• Frequent deletions of records would need periodical file reorganization.
• Inefficient for larger database tables.
• To boost the performance, proper memory management is required. This is the extra overhead in handling records.

## What is file organization / Different file organization techniques / Heap file organization / Sequential file organization / Hash file organization

### File organization

Tables in databases stored as files in the disk. Though the tables are treated as files, it actually consists of set of records. The issue is about how we organize the records in a file so as to handle them in the future easily.

 File organization –      physical arrangement of data in a file into records and pages;                                     arrangement of records in a file when the file is stored in a disk;

There are different ways to organize records in a file. They are;

### Heap file organization (unordered files): (Click to navigate)

• Records are stored in random order (no particular order on any attribute values).
• Records can be stored wherever the space available in a file, that is, a record can be appended at the end of the file or can be inserted in the middle is space available.
• Simplest file organization.

### Sequential file organization (ordered files / sorted files): (Click to navigate)

• Records are ordered and stored according to the value of an attribute(s) in a file.
• Records are arranged such that the values of the ordering field are in ascending order.
• It is slightly complex file organization as we need to arrange the data in  sorted order for every insert and update operation.

### Hash file organization (hashed files)  (Click to navigate)

• Records are stored in a file according to a hash function.
• A hash function, say h(value), accepts the value of the attribute(s) and assigns the records to several blocks.
• Very useful file organization if you need a quick access.

## Featured Content

### Multiple choice questions in Natural Language Processing Home

MCQ in Natural Language Processing, Quiz questions with answers in NLP, Top interview questions in NLP with answers Multiple Choice Que...

data recovery