Tuesday, February 11, 2014

Super Key, Candidate Key, and Primary Key

Define Super key, Candidate key, Primary key and Foreign key / Type of keys in database / Purpose of Super key, Candidate key, Primary key and Foreign key in database design / Super key, Candidate key, Primary key and Foreign key Examples / How does a Foreign key ensure consistency?

Super key, Candidate key, and Primary key - Definitions and Discussions


Emp_ID
Emp_Name
DOB
Gender
Dept_No
E101
Ramkumar
15-JUL-1986
M
2
E103
Ramesh
04-MAY-1989
M
1
E104
Stephen
29-OCT-1987
M
1
E102
Nirmal
23-JAN-1980
M
3
E105
Laxmi
20-MAY-1988
F
4
E107
Rani
23-JAN-1980
F
4
E106
Ramesh
12-MAR-1979
M
2


Table 1 - Employee

Keys

 

The ability to locate data uniquely is one of the main properties which must be included while designing any databases. That is, using any single or set of attribute values one must be able to locate data in the database uniquely. The attribute(s) is called as key for a relation (table).
To retrieve unique(single) record (entity) from table Employee, we can write the following SQL query;
SELECT * FROM employee WHERE emp_id = ‘E102’;
This query can get you information about employee ‘Nirmal’, because he is the only person with the Emp_ID value ‘E102’. And, if you look at the sample data stored, it is very evident that no two employees have same Emp_ID. Hence, Emp_ID attribute (column) is one of the keys for the table Employee.

Types of Keys

1. Superkey


Set of one or more attributes together can uniquely identify rest of the attributes of a relation uniquely is called Superkey.

In our example, Emp_ID can uniquely identify any records in Employee. Likewise, any of the combinations, (Emp_ID, Emp_Name), (Emp_ID, DOB), (Emp_ID, Gender), (Emp_ID, Dept_No), (Emp_Name, DOB), (Emp_Name, DOB, Gender), and so on [you form the other combinations which can uniquely identify records] can also identify the records uniquely.

For any table, we have at least one super key which is the combination of all the attributes of the table. This is trivial superkey. In RDBMS, no two records are allowed to be the same. For example, at least the complete single record (E101, Ramkumar,         15-JUL-1986, M, 2) is different from (E105, Laxmi, 20-MAY-1988, F, 4). Hence, we have at least one super key.

Note: Redundant attributes are permitted in Super Keys. In (Emp_ID, Emp_Name), Emp_Name is unnecessary and redundant.

2. Candidate Key

Minimal Super key is a key which is a super key without any redundant attribute (unnecessary combination). In other words, if you remove any attribute from a key combination, it should not be able to uniquely identify data. That is we need a minimal combination of attributes. For example, (Emp_ID, Emp_Name) is a super key with an unnecessary attribute Emp_Name. Here, without Emp_Name attribute, only Emp_ID attribute can uniquely identify records. Hence, Emp_ID is itself a minimal super key.

Candidate key is such a minimal super key with another condition as follows;
Minimal super key with no component in its proper subset is a super key.
[Proper subset – if A is subset of B and A is not equal to B then A is the proper subset of B]

For example, {(Emp_ID), (Emp_Name)} is proper subset of set (Emp_ID, Emp_Name). In this, (Emp_ID) is itself a super key. Hence, (Emp_ID, Emp_Name) is not a candidate key but a super key.
As another example, (Emp_Name, DOB) is a super key and it is minimal. That is, the proper subset {(Emp_Name), (DOB)} does not contain any super keys. So, (Emp_Name, DOB) is a candidate key.

For our table Employee’s instance (information stored at a particular moment in a table is called instance), we have the following candidate keys (please check them);

(Emp_ID)
(Emp_Name, DOB)
(Emp_Name, Dept_No)
(DOB, Dept_No)
(DOB, Gender)

All the listed candidate keys are minimal. If you divide them further, they lose the property of a key. That is, for example, Dept_No alone cannot uniquely identify.



Why do we need to identify set of candidate keys?


They are helping in normalizing a table. That is, they are helping in eliminating unwanted anomalies. More on Boyce Codd Normal Form (BCNF).

3. Primary Key


Any relation (table) in Relational Database Management Systems (RDBMS) must have a primary key, which is one of the candidate keys which are minimal. That is, we choose one key among the list of candidate keys as primary key for a table. While choosing the primary key we are looking for simple candidate key among all the candidate keys. For example, in the above list of candidate keys, we choose (Emp_ID) as primary key because it is simple and single attribute key.

What is the purpose of Primary Key?


As I said earlier, it helps us in identifying data uniquely in the database. Also, it helps in normalizing your table and helps in performance boosting especially when you access your data using primary key.



4. Foreign Key


Foreign key is an attribute (or set of attributes) of one table (relation) which refers its value from another table’s Primary key (preferably) to establish a connection between them. In other words, it can be stated as “It is a constraint which helps in maintaining consistency on an attribute (or set of attributes) of two tables”.

Example of Foreign Key


Consider the following two relational schemas (Primary keys underlined).

STUDENT (Regno, SName, Program, DOJ, DOB)
STU_PHONE (Regno, Phone)

Here, Regno is the key for STUDENT and (Regno, Phone) is the key for STU_PHONE. Construction of separate table for storing Phone numbers is due to the intent to store multiple phone numbers for single student. That is, this design enables us to store zero or more phone numbers. It is clear that, if we would like to store any phone numbers, the condition is that the phone number must belong to some student. In other words, if you like to enter any phone number, then there must be a student registered for any program. Based on this condition, we designate Regno of STU_PHONE as the Foreign Key which refers the value of any existing STUDENT records’ Regno.

How does a Foreign Key ensure consistency?


Foreign key shows Parent-Child relationship. That is, there must be a parent (tuples in STUDENT) record for every child record (tuples in STU_PHONE). Hence, it acts as a condition.

And, it ensures consistency through cross verifying the values entered in the child table against the values stored in the parent table, thereby gives security from mistyping some values.

In its implementation of Foreign Key, if anybody entering a record in STU_PHONE, the Regno column value will be verified with the Regno column of STUDENT table for the existence of such values.

For example, if you like to enter a record (‘12MKV001’, 9897909411) in STU_PHONE, then you must have a record in STUDENT with Regno value ‘12MKV001’

Conclusion:


A table can have at most one Primary key.

Every Primary key is one of the candidate keys.
Every Primary key is a super key. Not all the super keys are primary key.
All the candidate keys except Primary key are called Alternate keys.
Primary key which consists of more than one attribute is called Composite key.
In any table, only one Primary key is permitted.


Foreign keys need not always refer to primary keys. it needs some attributes which are UNIQUE.

Sunday, February 9, 2014

Relational Database Design with ER diagram - An Example



Relational Database Design with ER diagram - An Example

ABC General Store:

ABC General Store is a medium sized store selling grocery and other general items everyday through only one shop. Now, they need to establish computer based billing system to maintain the flow of goods in their store. Their basic requirement is to store information about the products to maintain an inventory, to store information about their workers, information about their wholesale vendors, etc.

Requirement analysis:

The requirement states that the customer need to do the following;

  • To perform inventory management on items they sell
  • To maintain the details regarding the vendors (wholesale dealers, agents, or direct selling vendors, etc)
  • To handle the workers’ details.

Based on the above minimum requirements, the following Entity Sets are identified. They are defined as follows;

1. Items

Item No. – Unique identification for every item sold.
Item Name – Name of the items.
Item Price – Price of the item per piece. Based on the product, the meaning may vary. For example, tooth brush price is for one piece, rice price is for one kilogram, and so on.
Quantity – Available quantity of any item. Quantity need to be updated whenever an item is sold out or purchased for stock.
Threshold – The reorder level for any item. This is used for deciding when to put a purchase order for particular product. For example, if the threshold value for ‘Tooth Paste’ is 50, it means when our stock level (quantity) reaches 50, we have to immediately place a purchase order to the appropriate vendors.
Remarks – Any specific description about any item.
Expiry Date – Expiry date of any product (if applicable)

2. Vendor

Vendor Id – Unique Identification Number to represent any vendor.
Vendor Name – Name of the agent, or dealers, or direct seller name
Vendor Address – this detail the customer likes to store, door number, street name, city name along with pin code for future usage like placing orders, and customer specified complaints.
Phone – Phone number of the vendors. Here, we can collect any number of phone numbers of the vendor.
Remarks – To save information about specific products we could purchase from one vendor and any other information.

3. Employee

 Employee ID – Identification number of any employee
Name – Name of the employee
Join Date – joining date of the employee
DOB – Date of Birth of the employee.
WAddress – This stores the address of the employee. This address can be stored in a single column. It need not be separated as street name, city name etc.
Section – Particular section (like Grocery, Toys, and Cosmetics etc.) in the shop in which the employee got his job assigned.
Salary – salary of the worker.
Designation – Various designations of the employees. In our case they are Sales Man, Floor Manager, Store Manager, and Purchase Officer.

4. Section

Section Name – It is used to identify any sections uniquely. Various sections of the store are Grocery, Cosmetics, Toys, Sports, Gifts, Foot wears, Home appliances, and Vegetables. The proposed system must be able to include more such sections in the future.
Section Location – There are five floors in the shop. Each floor is divided into four sections like, Floor I Billing Section, Floor I Toys Section I, Floor I Toys Section II, and so on.
Description – To store information regarding what shall be sold in any particular section, and any other specific details regarding the section.

Cardinality Ratios(Type of Relationship):


  1. The relationship between Items and Vendor is Many-to-Many. Because, one vendor may supply many products, and more than one vendor may supply one product.
  2. The relationship between Items and Section is Many-to-One from Items to Section. Because, every section have many employees assigned, and any employee is assigned to exactly one section.
  3. The relationship between Section and Employee is One-To-Many from Section to Employee. In our store, an employee is allotted to exactly one section.


Entity Relationship Diagram

Using the details collected above, we draw an ER diagram which could depict the design in a more clear form as follows;

 Figure 1 - ABC General Store ER diagram


Figure 1 shows the ER diagram for ABC General Store. Here, I have given a simple design, purely for the purpose of explaining the process of ER diagram based design to the new learners.
There are some rules defined, which would help us to convert the ER diagram into set of schemas (structure of tables in the basic level) very easily.

1. Strong Entity sets

We know that, a Strong Entity set is one which has a primary key. In the ER diagram, primary keys are represented with a solid underlined attribute names. In our design, all the entity sets are having primary keys. Hence all are strong entity sets.
Rule says “We can convert strong entity sets into relational design as it is”. That is, all the attributes listed under an entity set can be used as attributes in the target table. As a result, we will get the following relations (tables).

Items (Item_No, Item_Name, Price, Qty, Threshold, Remarks)

Vendors (VID, VName, Address, Phone, Remarks)

Section (SectionName, Section_Loc, Description)

Employee (EID, EName, Desig, JDate, DOB, WAddress, Work_Section, Salary)

At the end of converting the Strong entity sets, we have the above 4 tables as result.

2. Attribute type identification

There are some conversions required based on the type of attributes used in any entity sets. The attributes type based conversion help us to eliminate some amount of duplication (redundancy) at the basic level of design.
In our design, we have following type of attributes;
Simple attributes – all the entity sets have simple attributes.
Composite attributes – entity set Vendors has one composite attribute ‘Address’ (represented as set of attributes linked with one attribute).
Multi-valued attributes- entity set Vendors has one multi-values attribute ‘Phone’ (represented using double ellipse)
While converting ER diagram into tables, we need special attention on attribute types other than simple attributes. For composite attributes, we need to replace the main attribute name with its sub-attributes. In our case, in entity set Vendors, ‘Address’ need to be replaced with door no, street, city and pincode. For multi-valued attributes, the attribute with multiple values might cause complexity in accessing data, or it may end up in redundancy of other related data. Hence, it is suggested to have different table designation for multi-valued attributes. As a result, we have the following relations (tables).

Items (Item_No, Item_Name, Price, Qty, Threshold, Remarks)

Vendors (VID, VName, Door_No, Street, City, Pincode, Remarks)

Vendor_Phone (VID, Phone) – (Note - Phone alone cannot go into a separate table. It must be identified as whose phone. So, the primary key of the base table is included to represent whose phone)

Section (SectionName, Section_Loc, Description)

Employee (EID, EName, Desig, JDate, DOB, WAddress, Work_Section, Salary)

3. Type of relationship sets.

In our design, we have three relationship sets namely, Item_Vendor, It_Sec, and Sec_Wor of type Many-To-Many, Many-To-One, and One-To-Many respectively.
Rule says, “Include the one side entity set’s primary key as many side entity set’s foreign key” to handle One-To-Many or Many-To-One relationships.
Rule says, “Create separate table for Many-To-Many relationship”. Because, handling this type of relationships like One-To-Many or Many-To-One will cause complete redundancy in one of the tables. (This can be understood if you test with Items and Vendors with some sample data). The target table will contain the primary keys of participating entity sets. In our example, the participating entity sets are Items and Vendors on Many-To-Many relationship set.
As a result, we have the following table included in the database as new table.
Item_Vendor (Item_no, VID)

At the end, we have the following tables;

Items (Item_No, Item_Name, Price, Qty, Threshold, Remarks, SectionName) – [due to the One-To-Many relationship from Section to Item, the attribute ‘SectiosName’ is included as foreign key in Items table]
Vendors (VID, VName, Door_No, Street, City, Pincode, Remarks)

Vendor_Phone (VID, Phone)

Item_Vendor (Item_no, VID)

Section (SectionName, Section_Loc, Description)

Employee (EID, EName, Desig, JDate, DOB, WAddress, Work_Section, Salary, SectionName) – [due to the Many-To-One relationship from Employee to Section, the attribute ‘SectionName’ is included as foreign key in Employee table].

Finally, our database contains 6 tables namely, Items, Vendors, Vendor_Phone, Item_Vendor, Section, and Employee.

You may visit ER Diagram Tutorials to know about various components.

Featured Content

Multiple choice questions in Natural Language Processing Home

MCQ in Natural Language Processing, Quiz questions with answers in NLP, Top interview questions in NLP with answers Multiple Choice Que...

All time most popular contents