Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: THEend8_
CS3402
Normal Forms Problems caused by redundancy Redundant storage Potential inconsistency Insertion anomaly: E.g., to insert a new tuple for an employee who works in department number 5, we must enter all the attribute values of department 5 correctly CS3402 3 Problems caused by redundancy Deletion anomaly: E.g., If we delete an employee tuple that happens to represent the last employee working for a particular department, the information concerning that department is lost inadvertently from the database. CS3402 4 Problems caused by redundancy Update anomaly: E.g., If we change the value of one of the attributes of a particular department—say, the manager of department 5—we must update the tuples of all employees who work in that department; otherwise, the database will become inconsistent. CS3402 5 Problems caused by redundancy Redundancy must be eliminated in a well guided fashion How? Normalization based on functionality dependency CS3402 6 Normalization Normalization Proposed by Codd (1972) take a relation schema through a series of tests based on functional dependency and primary keys to certify whether it satisfies a certain normal form Decompose a relation into several related relation to achieve: Minimizing redundancy Minimizing the insertion, deletion and update anomalies CS3402 7 Relational Database Design Normalization theory - based on functional dependencies * universe of relations * 1st Normal Form (1NF) * 2NF * 3NF * BCNF * 4NF * 5NF 1NF 2NF 3NF BCNF 4NF 5NF CS3402 8 Relational Database Design Issue: Which attributes should or should not be put in the same table? Criteria: Avoid redundancies that may cause problems that we discussed. If a relation is in a certain normal form, it is known that certain kinds of problems can be avoided / minimized. CS3402 9 Normalization Normalization requires two properties: Non-additive or lossless join Decomposition is reversible and no information is lost No spurious tuples (tuples that should not exist) should be generated by doing a natural-join of any relations (Extremely important) Preservation of the functional dependencies Ensure each functional dependency is represented in some individual relation or could be inferred from other dependencies after decomposition. CS3402 10 First Normal Form with Primary Key First normal form (1NF) Disallow multi-values attributes, composite attributes and their combination The domain of an attribute must be atomic (simple and indivisible) values Q. Is this relational schema in 1NF? CS3402 11 CS3402 12 Nested relations First normal form also disallows multivalued attributes that are themselves composite. These are called nested relations because each tuple can have a relation within it. E.g., Ssn is the primary key of the EMP_PROJ relation in Figure (a) Pnumber is the partial key of the nested relation; that is, within each tuple, the nested relation must have unique values of Pnumber. (a) Schema of the EMP_PROJ relation with a nested relation attribute PROJS. (b) Sample extension of the EMP_PROJ relation showing nested relations within each tuple. CS3402 13 Nested relations To normalize this into 1NF, we: 1. Remove the nested relation attributes into a new relation 2. Propagate the primary key into it; the primary key of the new relation will combine the partial key with the primary key of the original relation. (c) Decomposition of EMP_PROJ into relations EMP_PROJ1 and EMP_PROJ2 by propagating the primary key CS3402 14 Problems with 1NF Problems with 1NF Insertion Anomalies Deletion Anomalies Update Anomalies Q: Is it in 1NF? What are the problems? CS3402 15 Functional Dependency (Recap) Functional dependency is a constraint between two sets of attributes from the database [Definition] A functional dependency denotes by X → Y, between two sets of attributes X and Y that are subsets of a relation schema R specifies a constraint on the possible tuples that can form a relation state r of R The constraint is that, for any two tuples t1 and t2 in r that have t1[X] = t2[X], they must also have t1[Y] = t2[Y] i.e., The values of the X component of a tuple uniquely (or functionally) determine the values of the Y component. E.g., In DEPARTMENT, Dnumber and Dname If you know the department number, you know the department name, so we have Dnumber → Dname CS3402 16 Functional Dependency: Full vs Partial A FD X→ Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold anymore E.g., {Ssn, Pnumber} → Hours is a full dependency A FD X→ Y is a partial functional dependency if some attributes A belonging to X can be removed from X and the dependency still holds E.g., {Ssn, Pnumber} → Ename is a partial dependency CS3402 17 Attributes: Prime vs Nonprime An attribute of relation schema R is called a prime attribute of R if it is a member of some candidate key of R. Otherwise, it is nonprime. Example: Consider the CAR relation schema: CAR(State, Reg#, SerialNo, Make, Model, Year) Suppose CAR has two candidate keys: Key1 = {State, Reg#} Key2 = {SerialNo} Q. Which attributes are nonprime attributes? CS3402 18 2NF with Primary Key [Definition] A relation R is in 2NF if every nonprime attributes A in R is fully functional dependent on the primary key of R. Thus, if the primary key contains a single attribute, the test need not be applied at all. Q.Is the EMP_PROJ relation in 2NF? CS3402 19 2NF Normalization If a relation schema is not in 2NF, it can be 2NF normalized into a number of 2NF relations in which nonprime attributes are associated only with the part of the primary key on which they are fully functionally dependent. CS3402 20 Problems with 2NF Problems with 2NF Insertion Anomalies Deletion Anomalies Update Anomalies Q: Is it in 2NF? What are the problems? CS3402 21 Transitive Dependency 3NF is based on the concept of transitive dependency [Definition] A functional dependency X → Y in a relation R is transitive dependency if there exists a set of attributes Z in R that is neither a candidate key nor a subset of any key of R AND both X → Z and Z → Y hold. E.g., The dependency Ssn→ Dmgr_ssn is transitive through Dnumber CS3402 22 3NF with Primary Key [Codd’s original definition] A relation schema R is in 3NF if it satisfies 2NF and no non-prime attribute of R is transitively dependent on the primary key. Q. Is EMP_DEPT in 3NF? CS3402 23 3NF Normalization To normalize relation schema R into 3NF, R should be decomposed to break the transitive dependency. CS3402 24 Normal Forms Defined Informally 1st normal form All attributes are simple (no composite or multivalued attributes) 2nd normal form All non-prime attributes depend on the whole key (no partial dependency) 3rd normal form All non-prime attributes only depend on the key (no transitive dependency) In general, 3NF is desirable and powerful enough But, still there are cases where a stronger normal form is needed CS3402 25 Test and Remedy for Normalization CS3402 26 General Definitions of 2NF and 3NF The normalization procedure described is useful for analysis in practical situations where primary keys have already been defined. These definitions, however, do not take other candidate keys of a relation, if any, into account. Here, we give the more general definitions of 2NF and 3NF that take all candidate keys of a relation into account. Notice that this does not affect the definition of 1NF since it is independent of keys and functional dependencies. Partial and full functional dependencies and transitive dependencies will now be considered with respect to all candidate keys of a relation. CS3402 27 General Definition of 2NF Alternative definition: A relation schema R is in 2NF is every nonprime attribute A in R is fully dependent on every key of R. CS3402 28 Example LOTS describes parcels of land for sale in various counties of a state Two candidate keys: Property_id# and {County_name, Lot#} Property_id# is chosen as the primary key Q. Is LOTS in 2NF? A. LOTS violates 2NF due to FD3 as Tax_rate is partially dependent on the candidate key {County_name, Lot#} County_name → Tax-rate (partial dependent) Q&A CS3402 29 Example (cont’d) Decomposing into the 2NF relations LOTS1 and LOTS2. CS3402 30 General Definition of 3NF This general definition can be applied directly to test whether a relation schema is in 3NF; it does not have to go through 2NF first. In other words, if a relation passes the general 3NF test, then it automatically passes the 2NF test. CS3402 31 General Definition of 3NF Two candidate keys: Property_id# and {County_name, Lot#} Q. Are LOTS1 and LOTS2 in 3NF? LOTS2 is in 3NF LOTS1 violates 3NF: FD4 in LOTS1 violates 3NF because Area is not a superkey and Price is not a prime attribute in LOTS1 Q&A CS3402 32 Example (cont’d) Decomposing LOTS1 into the 3NF relations LOTS1A and LOTS1B. CS3402 33 Example (cont’d) Progressive normalization of LOTS into a 3NF design.. CS3402 34 Boyce-Codd Normal Form BCNF was proposed as a simpler form of 3NF, but it was found to be stricter than 3NF Every relation in BCNF is also in 3NF BUT relation in 3NF is not necessarily in BCNF Difference with 3NF: Condition which allows A to be prime is absent from BCNF CS3402 35 Boyce-Codd Normal Form Suppose we have thousands of lots in the relation but the lots are from only two counties: A and B Lot sizes in A are only 0.5, 0.6, 0.7, 0.8, 0.9 acres whereas lots in B are 1.1, 1.2, …, 1.9 Thus, we further have Area → County_name (FD 5) in LOTS1A CS3402 36 Boyce-Codd Normal Form Recall that there are two candidate keys: Property_id# and {County_name, Lot#} Q. Does LOTS1A satisfy 3NF? How about BCNF? FD5 satisfies 3NF in LOTS1A because County_name is a prime attribute FD5 violates BCNF in LOTS1A because Area is not a superkey of LOTS1A Q&A CS3402 37 BCNF Normalization This decomposition loses the functional dependency FD2 because its attributes no longer coexist in the same relation after decomposition. CS3402 38 Summary of 3NF and BCNF In practice, most relation schemas that are in 3NF are also in BCNF. Only if there exists some f.d. X → A that holds in a relation schema R with X not being a superkey and A being a prime attribute will R be in 3NF but not in BCNF. Ideally, relational database design should strive to achieve BCNF or 3NF for every relation schema. Achieving the normalization status of just 1NF or 2NF is not considered adequate, since both were developed historically to be intermediate normal forms as stepping stones to 3NF and BCNF. CS3402 39 Boyce-Codd Normal Form Some notes on BCNF: BCNF is stronger than 3NF BCNF is useful when a relation has: multiple candidate keys, and these candidate keys are composite ones, and they overlap on some common attributes BCNF reduces to 3NF if the above conditions do not apply