Sunday, March 23, 2014

Parallel Join Technique - Partitioned Join



Partitioned Join

The relational tables that are to be joined gets partitioned on the joining attributes of both tables using same partitioning function to perform Join operation in parallel.

How does Partitioned Join work?


Assume that relational tables r and s need to be joined using attributes r.A and s.B. The system partitions both r and s using same partitioning technique into n partitions. In this process A and B attributes (joining attributes) to be used as partitioning attributes as well for r and s respectively. r is partitioned into r0, r1, r2, …, rn-1 and s is partitioned into s0, s1, s2, …, sn-1. Then, the system sends partitions ri and si into processor Pi, where the join is performed locally.

What type of joins can we perform in parallel using Partitioned Join?


Only joins such as Equi-Joins and Natural Joins can be performed using Partitioned join technique.

Why Equi-Jion and Natural Jion?


Equi-Join or Natural Join is done between two tables using an equality condition such as r.A = s.B. The tuples which are satisfying this condition, i.e, same value for both A and B, are joined together. Others are discarded. Hence, if we partition the relations r and s on applying certain partitioning technique on both A and B, then the tuples having same value for A and B will end up in the same partition. Let us analyze this using simple example;

RegNo
SName
Gen
Phone
1
Sundar
M
9898786756
3
Karthik
M
8798987867
4
John
M
7898886756
2
Ram
M
9897786776
Table 1 - STUDENT
RegNo
Courses
4
Database
2
Database
3
Data Structures
1
Multimedia
Table 2 – COURSES_REGD

Let us assume the following;
The RegNo attributes of tables STUDENT and COURSES_REGD are used for joining.
Observe the order of tuples in both tables. They are not in particular order. They are stored in random order on RegNo.
Partition the tables on RegNo attribute using Hash Partition. We have 2 disks and we need to partition the relational tables into two partitions (possibly equal). Hence, n is 2.
The hash function is, h(RegNo) = (RegNo mod n) = (RegNo mod 2). And, if we apply the hash function we shall get the tables STUDENT and COURSES_REGD partitioned into Disk0 and Disk1 as stated below.

Partition 0
Partition 1

RegNo
SName
Gen
Phone
4
John
M
7898886756
2
Ram
M
9897786776
STUDENT_0


RegNo
SName
Gen
Phone
1
Sundar
M
9898786756
3
Karthik
M
8798987867
STUDENT_1

RegNo
Courses
4
Database
2
Database
COURSES_REGD_0


RegNo
Courses
3
Data Structures
1
Multimedia
COURSES_REGD_1

From the above table, it is very clear that the same RegNo values of both tables STUDENT and COURSES_REGD are sent to same partitions. Now, join can be performed locally at every processor in parallel.
One more interesting fact about this join is, only 4 (2 Student records X 2 Courses_regd records) comparisons need to be done in every partition for our example. Hence, we need total of 8 comparisons in partitioned join against 16 (4 X 4) in conventional join.

The above discussed process is shown in Figure 1.
Figure 1 - Partitioned Join

  
 Points to note:


1. There are only two ways of partitioning the relations,
                Range partitioning on the join attributes or
                Hash partitioning on the join attributes.
2. Only equi-joins and natural joins can be performed in parallel using Partitioned Join technique.
3. Non-equi-joins cannot be performed with this method.
4. After successful partitioning, the records at every processor can be joined locally using any of the joining techniques hash join, merge join, or nested loop join.
5. If Range partitioning technique is used to partition the relations into n processors, Skew may present a special problem. That is, for some partitions, we may get fewer records (tuples) for one relation for a given range and many records for other relation for the same range.

6. With Hash partitioning, if there are many tuples with same value in one relation then the difference both relations is possible in one partition. Otherwise, skew has minimal effect.
7. The number of comparisons between relations are well reduced in partitioned join parallel technique.




"The more I help others to succeed, the more I succeed."



Natural Join



Natural Join

  • It is a join technique which compares the common attributes of the tables to join the tables.
  • We have to check for the common attributes in the tables well before executing the join query.
  • NATURAL JOIN is the keyword used in the query to perform join.
  • Natural join cause problems when new columns are added or columns are renamed or more than one set of common attributes are available. Hence, Natural join is not recommended.

Example:

SELECT * FROM Employee NATURAL JOIN Course_Registration;
The result generated by the above query is slightly different in structure from that of Equi-join. See the post on Equi-join for table instances STUDENT and COURSE_REGISTRATION and to compare the result with Natural join. The result of the above SQL query is given in Table 1;

RegNo
SName
Gen
Phone
Courses
R1
Sundar
M
9898786756
Database
R2
Ram
M
9897786776
Database
R3
Karthik
M
8798987867
Data Structures
R4
John
M
7898886756
Multimedia
 Table 1

*********

Go to Equi-join in SQL page

Go to Multiple choice questions in SQL/Relational algebra page


Functional Dependency Quiz 1

Relational Database Design - Functional Dependency Quiz


1. __________ refers to a attribute or group of attributes mentioned in the left hand side of the arrow in a FD.
    Discriminator
    Determinant
    Multivalued attribute
    All of the above

2. In a functional dependency X --> Y, if Y is functionally dependent on X, but not on X's proper subsets, then we would call the functional dependency as
    Full Functional Dependency
    Partial Functional Dependency
    Multivalued Functional Dependency
    None of the above

3. Which of the following is the result of bad database design?
    Repetition of Information
    Inability to represent some information
    Inconsistent database state due to some transaction
    All of the above

4. If X is {E, G, H, M} and Y is {G, M} then X --> Y is
    Augmentation Rule
    Reflexivity Rule
    Union Rule
    Pseudotransitivity Rule

5. If X --> YZ then X --> Y and X --> Z is
    Composition Rule
    Reflexivity Rule
    Union Rule
    Decomposition Rule

6. Consider F1 and F2 as two sets of functional dependencies. If every functional dependency in F2 can be inferred from the functional dependencies of F1 using inference rules, then F1 is _________ of F2
    Cover Set
    Closure Set
    Minimal Set
    None of the above

7. If X --> Y is a functional dependency and X and Y are sets of attributes, what is the relationship between X and Y?
    One-to-Many
    Many-to-One
    One-to-One
    Many-to-Many

8. For a functional dependency X --> Y, it is said to be _________ if Y is the subset or equal to X.
    Total
    Trivial
    Non-trivial
    Partial

9. To check whether X (a set of one or more attributes) is a candidate key of relation R, we need to find ______ of X.
    Canonical Cover
    Closure
    Minimal Cover
    None of the above

10. (a) If every functional dependency in F1 can be inferred from F2 on application of inference rules, and
      (b) removal of any one attribute from any functional dependency of F2 violates (a).   
Then F2 is called ______ for F1.
    Minimal Cover
    Canonical Cover
    Both Minimal and Canonical Cover
    None of the above

Score =

Correct answers:

Friday, March 21, 2014

ER Model Quiz 3

DBMS Basics and Entity-Relationship Model - Quiz 3


1. Degree of relational table is
    Number of records in the table
    Number of attributes in the table
    Number of candidate keys in the table
    Number of distinct records in the table

2. Cardinality of the relational table is
    Number of records in the table
    Number of attributes in the table
    Number of candidate keys in the table
    Number of distinct records in the table

3. Which of the following would be the appropriate term for a key which consists of more than one attribute?
    Composite Key
    Candidate Key
    Primary Key
    None of the above

4. Degree of the relationship set is
    Number of records in the relational table
    Number of attributes in the relationship set
    Number of participating entity sets in the relationship set
    Number of distinct records in the relationship set

5. Cardinality of an attribute in a relational table is
    Number of unique values in the attribute
    Number of duplicated values in the attribute
    Number of values in the attribute
    None of the above

6. A Multivalued attribute in an ER diagram
    Must be directly mapped into the base table to which it belongs
    Must be mapped into a different table
    Must not be mapped into any tables
    Must be mapped into the relationship table

7. The set of candidate keys that are not selected as the Primary key is called
    Primary Keys
    Composite Keys
    Alternate Keys
    All of the above

8. "CREATE TABLE Emp (Emp_No CHAR(5), EName VARCHAR(25), SALARY NUMBER(8))". This DDL statment will
    Create a table named Emp
    Create a table named Emp and Update data dictionary
    Show error
    Not update data dictionary as the table does not have any primary key

9. An attribute or set of attributes of one relation that matches an attribute or set of attributes of other relation is
    Primary Key
    Candidate Key
    Super Key
    Foreign Key

10. Which of the following is an Integrity constraint?
    INT datatype
    NOT NULL
    DATE datatype
    All of the above

Score =

Correct answers:

Featured Content

Multiple choice questions in Natural Language Processing Home

MCQ in Natural Language Processing, Quiz questions with answers in NLP, Top interview questions in NLP with answers Multiple Choice Que...

All time most popular contents