Sunday, February 2, 2014

First Normal Form (1NF)

First normal form, how to normalize an unnormalized table into a 1NF table, the problem with non-atomic attributes, converting a UNF relation into a 1NF relation


Prev - Normalization                                                                     Next - Functional Dependency (FD) 

First Normal Form (1NF) with example

Normal Forms

The idea is to organize the attributes of a table so that redundancy and dependency are reduced. In the process of normalization we have the normal forms: First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF), and Boyce-Codd Normal Form (BCNF). 1NF, 2NF, and 3NF were defined by Edgar F. Codd, and BCNF was defined by Raymond F. Boyce along with Codd. There are other normal forms such as 4NF, 5NF, DKNF, and 6NF. Here, we discuss the first four normal forms.

In the normalization process, we check whether a table satisfies the conditions of each normal form. If it does, we say that the table is in that normal form. That is, if a table satisfies the conditions of 1NF, then we say the table is in 1NF, and so on.

First Normal Form (1NF)


In a Relational Database Management System (RDBMS), 1NF is one of the properties a relation (table) must satisfy. 1NF requires the following two conditions to be satisfied.

  • Every attribute (column) of a table must be designed to accept only atomic values.
  • Every value stored in a column must be a single value (no repeating groups).


Let us take the following relation (table) STUDENT1 to explain the 1NF property. In STUDENT1, PR is the Program Registered, and CName_Regd is the name of the course(s) registered by the student.


RegNo | SName  | Gen | PR    | CName_Regd
R1    | Sundar | M   | BTech | Database, Data Structures
R2    | Ram    | M   | MS    | Database
R3    | Kathik | M   | MCA   | Data Structures, Multimedia
R4    | John   | M   | BSc   | Multimedia

Table 1 – STUDENT1


Rule 1 says that the columns can have atomic domains only. That is, the permitted/accepted values for a column should be indivisible. Here, indivisible means that if we try to divide a value stored in the column in any possible way, it should not give a meaningful string. Let us take the table STUDENT1. In this table, the CName_Regd attribute accepts more than one course if a single student registered for more than one, and the value is stored using a comma as the separator between the individual courses. Student “Sundar” registered for two courses, namely “Database” and “Data Structures”, and they are stored as a single value (string) “Database, Data Structures”. If we divide the value at the comma, we get two meaningful names, “Database” and “Data Structures”; that is, both are valid course names. Hence, CName_Regd is divisible.

The domain of Gen (Gender) is indivisible, i.e., atomic. We cannot divide the value stored in the column further. RegNo values cannot be divided further; PR values cannot be divided further, and so on.

Rule 2 insists that a column must not hold multiple values. In our example, the CName_Regd column accepts a set of courses (i.e., a group of one or more values) registered by a student for every record. If more than one course is registered, the course names are stored as a single value separated by commas (or any separator decided by the designer). As said above, such a value is divisible.
Our table STUDENT1 satisfies neither of the conditions; hence, it is not in 1NF. (Even if only one of the rules were violated, the table would still not be in 1NF.) To further normalize the table into 2NF, 3NF, and so on, we first need to convert this design into 1NF.

Solution to convert into 1NF:


To convert a non-1NF design into 1NF, “unnest” (i.e., flatten) the given relation.

In STUDENT1, CName_Regd is the only attribute which violates 1NF, so it can be unnested. That is, for every individual course of a student, repeat the values of all the other columns for that student, as given in the table STUDENT_1NF below.


RegNo | SName  | Gen | PR    | CName_Regd
R1    | Sundar | M   | BTech | Database
R1    | Sundar | M   | BTech | Data Structures
R2    | Ram    | M   | MS    | Database
R3    | Kathik | M   | MCA   | Data Structures
R3    | Kathik | M   | MCA   | Multimedia
R4    | John   | M   | BSc   | Multimedia

Table 2 – STUDENT_1NF


Note that the course name “Data Structures” consists of two substrings. If we try to divide “Data Structures” further, into “Data” and “Structures”, the division gives meaningful strings, but “Data” and “Structures” are not courses offered. Hence, this kind of division does not make the attribute ‘divisible’.
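
For illustration, a minimal SQL sketch of how the 1NF design could be declared is given below. The column types and sizes are assumptions, and the composite key (RegNo, CName_Regd) is one plausible choice, since a student now appears once per registered course.

CREATE TABLE STUDENT_1NF (
    RegNo      VARCHAR(10) NOT NULL,  -- student register number
    SName      VARCHAR(50),           -- student name
    Gen        CHAR(1),               -- gender
    PR         VARCHAR(10),           -- program registered
    CName_Regd VARCHAR(50) NOT NULL,  -- exactly one course name per row (atomic)
    PRIMARY KEY (RegNo, CName_Regd)   -- one row per (student, course) pair
);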


What would happen if any attribute is divisible?

Let us take Table 1 to explain this. Queries which require the list of courses for a given register number are straightforward; that is, we get the list of course names just by using the register number in the WHERE clause of the query. But when the requirement is “Find the register numbers of all students who registered for the course Database”, answering it requires a complex scan of the attribute, because it involves parsing every value present in the CName_Regd column, and that is not simple.

When we normalize the table, both the requirements can be handled easily without any string analysis.
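
To make the difference concrete, here is a hedged SQL sketch. The table and column names follow the examples above; the string-matching patterns for the non-1NF table are only one possible workaround, and the exact syntax may vary across database systems.

-- Non-1NF table (STUDENT1): finding who registered for 'Database'
-- forces parsing of the comma-separated CName_Regd string.
SELECT RegNo
FROM   STUDENT1
WHERE  CName_Regd = 'Database'
   OR  CName_Regd LIKE 'Database,%'
   OR  CName_Regd LIKE '%, Database'
   OR  CName_Regd LIKE '%, Database,%';

-- 1NF table (STUDENT_1NF): the same requirement is a simple comparison.
SELECT RegNo
FROM   STUDENT_1NF
WHERE  CName_Regd = 'Database';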

For another simple definition of First Normal Form (1NF), click here.

Prev - Normalization                                                                     Next - Functional Dependency (FD)


*********


Normalization

INDEX                                                                                   NEXT - First Normal Form (1NF)

What is Normalization? / Why do we need to normalize a table? / What are modification anomalies?

Why do we need to Normalize a table?


Introduction

A database designed using a model (for example, the ER model) is good to some extent. This is true as far as the design is concerned; that is, it depends on the chosen attributes. But there might be some problems hidden in the design. One has to identify and remove those problems; only then can we say that the design is good. The actual problems are due to the data stored, the dependency of one attribute on another, and so on. If we eliminate those problems, then we could say that the final design is sound. Let us analyze the possible problems with the following table (designed using the ER model), which is populated with sample data.

The following table (STUDENT) shows information about the students of an engineering college. Student information such as Registration Number (RegNo), Name (SName), Gender (Gen), Program Joined (PR), Course Number (CNo), Course Name (CName), Professor Name (PN), and Professor Office Address (POA) is stored in the table. Also assume the following:
1. Every course is offered by only one professor.
2. A student can register for many courses (say a maximum of 6).
3. POA gives the office address of the professor, which is unique for every professor.
4. The information of all students who joined any program is included in this table.

For every table, it is good to have a key to uniquely identify the records. The key for the table STUDENT is (RegNo, CNo), a composite primary key, because no single attribute can identify the records uniquely.

RegNo | SName  | Gen | PR    | CNo  | CName           | PN      | POA
R1    | Sundar | M   | BTech | C101 | Database        | Kumar   | CA101
R2    | Ram    | M   | MS    | C101 | Database        | Kumar   | CA101
R3    | Malini | F   | MS    | C101 | Database        | Kumar   | CA101
R1    | Sundar | M   | BTech | C105 | Data Structures | Praveen | CA107
R4    | Kathik | M   | MCA   | C105 | Data Structures | Praveen | CA107
R5    | John   | M   | BSc   | C104 | Multimedia      | Kesavan | CA103

Table 1 - STUDENT
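
For reference, a hedged SQL sketch of how STUDENT might be declared with the composite primary key (RegNo, CNo) is given below; the column types and sizes are assumptions.

CREATE TABLE STUDENT (
    RegNo VARCHAR(10) NOT NULL,  -- registration number
    SName VARCHAR(50),           -- student name
    Gen   CHAR(1),               -- gender
    PR    VARCHAR(10),           -- program joined
    CNo   VARCHAR(10) NOT NULL,  -- course number
    CName VARCHAR(50),           -- course name
    PN    VARCHAR(50),           -- professor name
    POA   VARCHAR(10),           -- professor office address
    PRIMARY KEY (RegNo, CNo)     -- no single attribute identifies a record uniquely
);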

This table looks error free according to its design, but it has many hidden problems related to the data stored. To identify those problems, let us discuss modification anomalies.

Modification Anomalies:


Modification anomalies are a set of problems caused by data manipulation in a table. Let us discuss them one by one using the sample table STUDENT above.

1. Insertion Anomaly

An insertion anomaly is one that occurs while inserting records into a table. In our sample table STUDENT, to insert information about any student, he must have registered for at least one course, and the course must be offered by a professor. That is, if student information is inserted without course information, the record will not be accepted because of an integrity constraint violation (part of the primary key, CNo, is NULL). The same applies to inserting course data without a professor. This problem is called an insertion anomaly.
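
A hedged SQL sketch of the problem (the new student's values are purely illustrative): a student who has not yet registered for any course cannot be stored, because CNo, part of the primary key, would have to be NULL.

-- Rejected: CNo is part of the primary key and cannot be NULL.
INSERT INTO STUDENT (RegNo, SName, Gen, PR, CNo, CName, PN, POA)
VALUES ('R6', 'Priya', 'F', 'BTech', NULL, NULL, NULL, NULL);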

2. Deletion Anomaly

Deletion of some attribute values in a record may lead to loss of other important information too. In our table STUDENT, if for some reason the course ‘Multimedia’ is cancelled, then we lose the information about the student ‘John’ as well as the information about professor ‘Kesavan’. Also, withdrawal of the only course registered by student ‘John’ leads to loss of information. Such an inconvenience caused by deletion is a deletion anomaly.
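
A hedged SQL sketch: cancelling the course ‘Multimedia’ deletes its only row, and with it the only information stored about student ‘John’ and professor ‘Kesavan’.

-- Removes the only row mentioning student 'John' (R5) and professor 'Kesavan'.
DELETE FROM STUDENT
WHERE  CName = 'Multimedia';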

3. Updation Anomaly

An important statement regarding databases and data is “data redundancy leads to inconsistency”. It means that if data is repeated in one or more tables, updating that information may take the database to an inconsistent state. In our table, student ‘Sundar’ has registered for two courses, which leads to duplication of his RegNo, SName, Gen, and PR values. Course ‘C101’ is registered by 3 students; if for any reason the course name ‘Database’ must be changed to ‘Database Systems’, it must be changed in all 3 records. Failing to do so leads to an inconsistent state: if one of the records still holds the value ‘Database’, then we have two different names for ‘C101’. Likewise, if Sundar’s data is changed, it must be changed in every record where ‘R1’ appears.
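
A hedged SQL sketch: the first statement renames the course consistently; the second shows how a partial update (here limited to one student, purely for illustration) leaves two different names stored for the same course C101.

-- Correct: change every redundant copy in one statement.
UPDATE STUDENT
SET    CName = 'Database Systems'
WHERE  CNo = 'C101';

-- Inconsistent: only some of the redundant copies are changed.
UPDATE STUDENT
SET    CName = 'Database Systems'
WHERE  CNo = 'C101' AND RegNo = 'R1';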

Our table STUDENT has all of the above anomalies. To free the table from them, we need to normalize it using normalization techniques. The basic and essential normal forms are discussed in the post Normal Forms.

INDEX                                                                                   NEXT - First Normal Form (1NF)

Friday, January 31, 2014

Distributed databases - Concurrency Control


Concurrency control in distributed databases / Various locking protocols in distributed databases / Single lock manager and distributed lock manager approaches to handling concurrent transactions



Concurrency Control in Distributed Database

Concurrency control schemes deal with handling data accessed by concurrent transactions. Various locking protocols are used for handling concurrent transactions in centralized database systems, and there are no major differences between the schemes used in centralized and distributed databases. The only major difference is the way the lock manager should deal with replicated data. The following topics discuss several possible schemes that are applicable to an environment where data can be replicated in several sites. We shall assume the existence of the shared and exclusive lock modes.

Locking protocols
                                a) Primary Copy protocol
                                b) Majority protocol
                                c) Biased protocol
                                d) Quorum Consensus protocol

Some assumptions:
Our distributed database system consists of n sites (servers/computers in different locations)
Data are replicated in two or more sites
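
As a quick reminder of the two lock modes assumed here, the sketch below uses PostgreSQL-style row-locking syntax on a hypothetical table T; other systems expose shared and exclusive locks differently.

-- Shared lock: other transactions may also read the row, but not modify it.
SELECT * FROM T WHERE id = 1 FOR SHARE;

-- Exclusive lock: no other transaction may lock or modify the row
-- until this transaction commits or rolls back.
SELECT * FROM T WHERE id = 1 FOR UPDATE;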
