Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: THEend8_
CS3402
Normal Forms
Problems caused by redundancy
Redundant storage
Potential inconsistency
Insertion anomaly: E.g., to insert a new tuple for an employee
who works in department number 5, we must enter all the
attribute values of department 5 correctly
CS3402 3
Problems caused by redundancy
Deletion anomaly: E.g., If we delete an employee tuple that
happens to represent the last employee working for a particular
department, the information concerning that department is lost
inadvertently from the database.
CS3402 4
Problems caused by redundancy
Update anomaly: E.g., If we change the value of one of the
attributes of a particular department—say, the manager of
department 5—we must update the tuples of all employees who
work in that department; otherwise, the database will become
inconsistent.
CS3402 5
Problems caused by redundancy
Redundancy must be eliminated in a well guided fashion
How?
Normalization based on functionality dependency
CS3402 6
Normalization
Normalization
Proposed by Codd (1972)
take a relation schema through a series of tests based on
functional dependency and primary keys to certify whether it
satisfies a certain normal form
Decompose a relation into several related relation to achieve:
Minimizing redundancy
Minimizing the insertion, deletion and update anomalies
CS3402 7
Relational Database Design
Normalization theory
- based on functional dependencies
* universe of relations
* 1st Normal Form (1NF)
* 2NF
* 3NF
* BCNF
* 4NF
* 5NF
1NF
2NF
3NF
BCNF
4NF
5NF
CS3402 8
Relational Database Design
Issue: Which attributes should or should not be put in the same
table?
Criteria: Avoid redundancies that may cause problems that we
discussed.
If a relation is in a certain normal form, it is known that certain kinds
of problems can be avoided / minimized.
CS3402 9
Normalization
Normalization requires two properties:
Non-additive or lossless join
Decomposition is reversible and no information is lost
No spurious tuples (tuples that should not exist) should be
generated by doing a natural-join of any relations
(Extremely important)
Preservation of the functional dependencies
Ensure each functional dependency is represented in some
individual relation or could be inferred from other
dependencies after decomposition.
CS3402 10
First Normal Form with Primary Key
First normal form (1NF)
Disallow multi-values attributes, composite attributes and their
combination
The domain of an attribute must be atomic (simple and
indivisible) values
Q. Is this relational schema in 1NF?
CS3402 11
CS3402 12
Nested relations
First normal form also disallows multivalued attributes that are
themselves composite. These are called nested relations because
each tuple can have a relation within it.
E.g., Ssn is the primary key of the EMP_PROJ relation in Figure (a)
Pnumber is the partial key of the nested relation; that is, within each
tuple, the nested relation must have unique values of Pnumber.
(a) Schema of the EMP_PROJ
relation with a nested relation
attribute PROJS.
(b) Sample extension of the
EMP_PROJ relation showing
nested relations within each
tuple.
CS3402 13
Nested relations
To normalize this into 1NF, we:
1. Remove the nested relation attributes into a new relation
2. Propagate the primary key into it;
the primary key of the new relation will combine the partial key with the
primary key of the original relation.
(c) Decomposition of EMP_PROJ into relations EMP_PROJ1 and
EMP_PROJ2 by propagating the primary key
CS3402 14
Problems with 1NF
Problems with 1NF
Insertion Anomalies
Deletion Anomalies
Update Anomalies
Q: Is it in 1NF?
What are the problems?
CS3402 15
Functional Dependency (Recap)
Functional dependency is a constraint between two sets of
attributes from the database
[Definition] A functional dependency denotes by X → Y, between
two sets of attributes X and Y that are subsets of a relation schema
R specifies a constraint on the possible tuples that can form a
relation state r of R
The constraint is that, for any two tuples t1 and t2 in r that have t1[X] = t2[X],
they must also have t1[Y] = t2[Y]
i.e., The values of the X component of a tuple uniquely (or functionally)
determine the values of the Y component.
E.g., In DEPARTMENT, Dnumber and Dname
If you know the department number, you know the department
name, so we have Dnumber → Dname
CS3402 16
Functional Dependency: Full vs Partial
A FD X→ Y is a full functional dependency if removal of any
attribute A from X means that the dependency does not hold
anymore
E.g., {Ssn, Pnumber} → Hours is a full dependency
A FD X→ Y is a partial functional dependency if some attributes A
belonging to X can be removed from X and the dependency still
holds
E.g., {Ssn, Pnumber} → Ename is a partial dependency
CS3402 17
Attributes: Prime vs Nonprime
An attribute of relation schema R is called a prime attribute of R if it
is a member of some candidate key of R.
Otherwise, it is nonprime.
Example: Consider the CAR relation schema:
CAR(State, Reg#, SerialNo, Make, Model, Year)
Suppose CAR has two candidate keys:
Key1 = {State, Reg#}
Key2 = {SerialNo}
Q. Which attributes are nonprime attributes?
CS3402 18
2NF with Primary Key
[Definition] A relation R is in 2NF if every nonprime attributes A in
R is fully functional dependent on the primary key of R.
Thus, if the primary key contains a single attribute, the test
need not be applied at all.
Q.Is the EMP_PROJ relation in 2NF?
CS3402 19
2NF Normalization
If a relation schema is not in 2NF, it can be 2NF normalized into a
number of 2NF relations in which nonprime attributes are
associated only with the part of the primary key on which they are
fully functionally dependent.
CS3402 20
Problems with 2NF
Problems with 2NF
Insertion Anomalies
Deletion Anomalies
Update Anomalies
Q: Is it in 2NF?
What are the problems?
CS3402 21
Transitive Dependency
3NF is based on the concept of transitive dependency
[Definition] A functional dependency X → Y in a relation R is
transitive dependency if
there exists a set of attributes Z in R that is neither a candidate
key nor a subset of any key of R AND
both X → Z and Z → Y hold.
E.g., The dependency Ssn→ Dmgr_ssn is transitive through
Dnumber
CS3402 22
3NF with Primary Key
[Codd’s original definition] A relation schema R is in 3NF if it
satisfies 2NF and no non-prime attribute of R is transitively
dependent on the primary key.
Q. Is EMP_DEPT in 3NF?
CS3402 23
3NF Normalization
To normalize relation schema R into 3NF, R should be
decomposed to break the transitive dependency.
CS3402 24
Normal Forms Defined Informally
1st normal form
All attributes are simple (no composite or multivalued attributes)
2nd normal form
All non-prime attributes depend on the whole key (no partial
dependency)
3rd normal form
All non-prime attributes only depend on the key (no transitive
dependency)
In general, 3NF is desirable and powerful enough
But, still there are cases where a stronger normal form is needed
CS3402 25
Test and Remedy for Normalization
CS3402 26
General Definitions of 2NF and 3NF
The normalization procedure described is useful for analysis in
practical situations where primary keys have already been defined.
These definitions, however, do not take other candidate keys of a relation, if any,
into account.
Here, we give the more general definitions of 2NF and 3NF that
take all candidate keys of a relation into account.
Notice that this does not affect the definition of 1NF since it is
independent of keys and functional dependencies.
Partial and full functional dependencies and transitive
dependencies will now be considered with respect to all candidate
keys of a relation.
CS3402 27
General Definition of 2NF
Alternative definition: A relation schema R is in 2NF is every
nonprime attribute A in R is fully dependent on every key of R.
CS3402 28
Example
LOTS describes parcels of land for sale in various counties of a
state
Two candidate keys: Property_id# and {County_name, Lot#}
Property_id# is chosen as the primary key
Q. Is LOTS in 2NF?
A. LOTS violates 2NF due to FD3 as Tax_rate is partially dependent
on the candidate key {County_name, Lot#}
County_name → Tax-rate (partial dependent)
Q&A
CS3402 29
Example (cont’d)
Decomposing into the 2NF relations LOTS1 and LOTS2.
CS3402 30
General Definition of 3NF
This general definition can be applied directly to test whether a
relation schema is in 3NF; it does not have to go through 2NF first.
In other words, if a relation passes the general 3NF test, then it
automatically passes the 2NF test.
CS3402 31
General Definition of 3NF
Two candidate keys: Property_id# and {County_name, Lot#}
Q. Are LOTS1 and LOTS2 in 3NF?
LOTS2 is in 3NF
LOTS1 violates 3NF:
FD4 in LOTS1 violates 3NF because Area is not a superkey
and Price is not a prime attribute in LOTS1
Q&A
CS3402 32
Example (cont’d)
Decomposing LOTS1 into the 3NF relations LOTS1A and LOTS1B.
CS3402 33
Example (cont’d)
Progressive normalization of LOTS into a 3NF design..
CS3402 34
Boyce-Codd Normal Form
BCNF was proposed as a simpler form of 3NF, but it was found to
be stricter than 3NF
Every relation in BCNF is also in 3NF
BUT relation in 3NF is not necessarily in BCNF
Difference with 3NF:
Condition which allows A to be prime is absent from BCNF
CS3402 35
Boyce-Codd Normal Form
Suppose we have thousands of lots in the relation but the lots are
from only two counties: A and B
Lot sizes in A are only 0.5, 0.6, 0.7, 0.8, 0.9 acres whereas lots in B
are 1.1, 1.2, …, 1.9
Thus, we further have Area → County_name (FD 5) in LOTS1A
CS3402 36
Boyce-Codd Normal Form
Recall that there are two candidate keys: Property_id# and
{County_name, Lot#}
Q. Does LOTS1A satisfy 3NF? How about BCNF?
FD5 satisfies 3NF in LOTS1A because County_name is a prime
attribute
FD5 violates BCNF in LOTS1A because Area is not a superkey of
LOTS1A
Q&A
CS3402 37
BCNF Normalization
This decomposition loses the functional dependency FD2 because
its attributes no longer coexist in the same relation after
decomposition.
CS3402 38
Summary of 3NF and BCNF
In practice, most relation schemas that are in 3NF are also in
BCNF.
Only if there exists some f.d. X → A that holds in a relation schema
R with X not being a superkey and A being a prime attribute will R
be in 3NF but not in BCNF.
Ideally, relational database design should strive to achieve BCNF or
3NF for every relation schema.
Achieving the normalization status of just 1NF or 2NF is not
considered adequate, since both were developed historically to be
intermediate normal forms as stepping stones to 3NF and BCNF.
CS3402 39
Boyce-Codd Normal Form
Some notes on BCNF:
BCNF is stronger than 3NF
BCNF is useful when a relation has:
multiple candidate keys, and
these candidate keys are composite ones, and
they overlap on some common attributes
BCNF reduces to 3NF if the above conditions do not apply