Please visit, subscribe and share 10 Minutes Lectures in Computer Science

## Memory required for external sorting in DBMS

Question:
Let us suppose that you have a file with 150000 pages and 45 available buffer pages. If you use external sorting to sort the above said file, answer the following questions;
a) How many runs will you produce in the first pass?
b) To sort the entire file, how many passes do you need?
c) How much is involved in terms of I/O cost to sort the file?
d) If you like to sort the file in just two passes, then what is the number of buffer pages you need?

Solution:
a) If Nf is the number of file pages and Nb is the number of available buffer pages then the first pass requires Nf/Nb [ceiling(Nf/Nb)] sorted runs.
For our case,
Number of file pages, Nf = 150000
Number of available buffer pages, Nb = 45
The number of sorted runs in the first pass = ⌈150000/45⌉ = 3334 runs

b) The number of passes required to sort the entire file can be calculated using the equation ⌈logNb-1Nf/Nb⌉⌉+1.
Number of passes to sort complete file = ⌈log2⌈150000/45⌉⌉+1
= ⌈log23334⌉+1 = 13 passes

c) Since each page is read and written once per pass, the total number of page I/Os for sorting the file is 2 * Nf * (#passes):
Total I/O cost for sorting the file = 2 * 150000 * 13 = 39,00,000

d) In Pass 0, Nf/Nbruns are produced. In Pass 1, we must be able to merge this many runs; i.e., Nb − 1 ≥ Nf/Nb. This implies that Nb must at least be large enough to satisfy Nb * (Nb − 1) ≥ Nf; this can be used to guess at Nb, and the guess must be validated by checking the first inequality. Thus;
With 150000 pages in the file, Nb = 388 satisfies both inequalities, but Nb = 387 does not. So we need 388 buffer pages to sort the entire file in just two passes.
[Note: to find the value of Nb, factorize the equation Nb * (Nb-1) = Nf which can be written as a quadratic equation Nb2-Nb-Nf = 0. The equation to be solved is Nb2 - Nb - 8500 = 0]

*********

buffer pages required for external sorting
number of passes required to complete sorting of a file using external sorting
external sorting technique in database
number of runs required for each pass in external sorting
I/O cost involved in sorting the entire file