Click here to Login

DataBase Management Systems(Module 4)

                               



MODULE 4


  4.1        Introduction

 Measure Of Quality
   We can discuss the goodness of relation schema in two levels.
1.Logical Level
 All you know, logical level represent the middle level in three level architecture of DBMS. The logical level describes how users interpret the relation schema and the meaning of their attributes. Having good relation schema at this level enables users to understand clearly the meaning of data in relations and hence to formulate the queries correctly.
2  Implementation Level
It is the lowermost level in the DBMS architecture, which describes how the tuples in base relations are stored and updated. This level applies only to the storage level in the database. But the former logical level applies to both view level and logical level. And the database becomes effective as much as with the effective storage.

                          Database Design Techniques

Generally we can design the database in two different approaches.
1.      Top-Down Design (Analysis) methodology
            It starts with the major entities of their interest, their attributes and their relationships. And then we add other entities and may split these entities into a number of specialized entities and add the relationships between these entities.
2.      Bottom-Up Design
            It starts with a set of attributes. And group these attributes into entities. Then find out the relationship between these entities. Identify the higher-level entities, generalized these entities and locate relationships at this higher level.

Problems with bad schema

1.      Redundant storage of data:
2.      Wastage of disc space
3.      More running time

Informal Design Guidelines for relation schema

There are four informal measures of quality for relation schema design.

1.      Semantics of the relation attribute
            For all attributes belonging to a relation will have certain real world meaning and a proper interpretation associated with them. The semantics specifies how to interpret the attribute values stored in a tuple of the relation.
Guideline:-Design relation schema so that it is easy to explain its meaning. Do not combine the attributes from multiple entity types and relationship types into a single relation. Otherwise if the relation contains multiple entities, the semantics will be unambiguous and the relation cannot be easily explained.

2.      Reducing the redundant values in tuples and update anomalies
            One goal of schema design is to minimize the storage space that the space relations occupy. The anomalies that may be present in the database relation can be classified into three categories.
1.      Insertion anomalies
2.      Deletion anomalies
3.      Modification anomalies
Guideline: - Design relation schema so that there is no insertion, deletion and modification anomalies are present in relations. If any anomalies are present note them clearly and make sure that the programs that update the database will operate correctly.

3.      Null values in tuples
            The null value in tuples is wastage of storage space in storage level and may also lead to problems with understanding the meaning of attributes and with specifying join operations at the logical level. Another problem is how to account the aggregate functions in null valued attributes.
Guideline: - As far possible, avoid placing attributes in base relation whose values frequently be null. If nulls are unavoidable, make sure that they are applicable for exceptional cases only.

4.3 Constraints
            Constraints on database can generally divided into four main categories.
1.      Inherent model based: Constraints that are inherent to the data model called inherent model based constraint like, a relation cannot have duplicate tuples.
2.      Schema Based: Constraints that can be directly expressed in the schemas of the data model, typically specifying in DDL are called Schema based constraint.
3.      Application Based:  Schemas expressed by the application programs are called application-based constraints.
4.      Data Dependency: It is the constraint that is related to the dependency between the values in the relation.
          Now we can go through the details of schema-based constraints. These schema based constraints are expressed in relational model. It includes five basic constraints.
1.      Domain constraint
2.      Key constraint
3.      Entity integrity constraint
4.      Referential integrity constraint
                        5. Constraint on nulls
4.4 Domain constraint
      A domain D represents a set of atomic values. The data type describing the type of values that can appear in each column is represented by this domain. i.e each value in the domain indivisible as far as the relational model is concerned. Specifying the data type from which the data values specifying the domain are drawn specifies the domain. A domain is given a name, data type and format.
4.5 Entity-Integrity constraint
Entity integrity constraint states that no primary key value can be null. Primary key value is used to identify tuples. Having null value in primary key implies we cannot identify tuples. The key constraint and integrity constraint are specified on individual relations.
4.6 Referential integrity constraint
The Referential integrity constraint is specified between two relations and is used to maintain the consistency among tuples in two relations. Referential integrity constraint states that a tuple in one relation that refer to another relation must refer to an existing tuple in that relation. To define Referential integrity constraint first we have to define the concept of foreign key (FK).
A set of attributes FK in relation schema R1, is a foreign key of R1 that references the R2 relation if it satisfies the following two rules.
1. The attributes in FK have the same domain as the PK attributes of R2. The attributes FK are said to reference or refer to the relation R2.
2. A value of FK in tuple t1 of the current state r1(R) either occurs as a value of PK for some tuple t2 in the current state r2(R) or is null.
If    t1[FK]=t2[PK]  then we can say that the tuple t1 refers to the tuple t2.
       R1 is referencing relation
       R2 is referenced relation.
Key constraint
           A relation is defined as a set of tuples. By definition of a relation all the tuples in a relation are distinct. i.e no two tuples can have the same value for all their attributes. There are some subsets of relation schema R with the property that no two tuples in any relation state ‘r ‘ of  ‘R’ should have the same value for these attributes.
A key K of a relation schema R is a super key  of R with the additional property that removing any attribute A from  K leaves a set of attributes K’ that is not a super key of R any more. Hence key satisfies the following two constraints.
1. The two distinct tuples in any state of the relation cannot have identical values for all the attributes in the key.
2. It is a minimal super key. i.e from a super key we cannot remove any attribute and still have the uniqueness constraint to hold the first condition.      
 Null Constraint
It specifies whether the null values are permitted to an attribute in a database.


4.7 Functional Dependency (FD)
Functional dependency is a constraint between two sets of attributes from the database.
Definition: A FD denoted by X→ Y between two sets of attributes X and Y that  are subsets of R specifies a constraint on the possible tuples that can form a relation state r of R. The constraint is that, for any two tuples t1 and t2 in r that have t1[X] =t2[X], they must also have t1[Y] =t2[Y]. i. e. the values of the Y component of a tuple in r depends on the values of X component or the X component determines the value of Y component.
i.e
1. The constraint on R states that there cannot be more than one tuple with a given X value in any relation state r(R) ╡ X→ Y for any subset of attributes Y of R.
2. If X→ Y in R doesn’t say whether or not Y→ X in R.
A functional dependency FD: is called trivial if Y is a subset of X.
Definition: A functional dependency, denoted by X→Y, between two sets of attributes X and Y that are subsets of the attributes of relation R, specifies that the values in a tuple corresponding to the attributes in Y are uniquely determined by the values corresponding to the attributes in X.
For example, the social security number uniquely determines a name;                    SSN→ Name
Functional dependencies are determined by the semantics of the relation, in general, they cannot be determined by inspection of an instance of the relation. That is, a functional dependency is a constraint and not a property derived from a relation.
 Inference rules
Armstrong's axioms - sound and complete i.e, enable the computation of any functional dependency.
Functional dependencies are
1. Reflexivity - if the B's are a subset of the A's then A → B
2. Augmentation - If A → B, then A, C → B, C.
3. Transitivity - If A → B and B → C then A → C.
Additional inference rules
4. Decomposition - If A → B, C then A → B
5. Union - If A → B and A → C then A → B, C
6. Pseudo transitive - If A → B and C, B → D then C, A → D
Equivalence of sets of functional dependencies
Two functional dependencies S & T are equivalent iff S→ T and T → S.
The dependency {A_1, ..., A_n} → {B_1, ..., B_m}
   is trivial if the B's are a subset of the A's
    is nontrivial if at least one of the B's is not among the A's
    is completely nontrivial if none of the B's is also one of the A's

Closure (F+)
All dependencies that include F and that can be inferred from F using the above rules are called closure of F denoted by F+.


Algorithm to compute closure
We have to find out whether F╞ X → Y. This is the case when X → Y Є F+ .For the better method is to generate X+, closure of X under Fand test F╞ X → Y using the first two axioms augmentation and reflexive rules.
Algorithm:
Input: A set of FD ‘s F and asset of attributes X.
Output: The closure X+ of X under the FD’s F+.      
X+:=X;
Change=true;
While change do
            Begin
Change: = False;
For each FD W → Z in F do
                        Begin
             If W Í Z then do
                       Begin
                      X+: = X+UZ;
                      Change: =True;
             End;
 End;
End;

4.8 Normalization
In relational database theory, normalization is the process of restructuring the logical data model of a database to eliminate redundancy, organize data efficiently, reduce repeating data and to reduce the potential for anomalies during data operations. Data normalization also may improve data consistency and simplify future extension of the logical data model. The formal classifications used for describing a relational database's level of normalization are called normal forms (NF).
A non-normalized database can suffer from data anomalies:
A non-normalized database may store data representing a particular referent in multiple locations. An update to such data in some but not all of those locations results in an update anomaly, yielding inconsistent data. A normalized database prevents such an anomaly by storing such data (i.e. data other than primary keys) in only one location.
A non-normalized database may have inappropriate dependencies, i.e. relationships between data with no functional dependencies. Adding data to such a database may require first adding the unrelated dependency. A normalized database prevents such insertion anomalies by ensuring that database relations mirror functional dependencies.
Similarly, such dependencies in non-normalized databases can hinder deletion. That is, deleting data from such databases may require deleting data from the inappropriate dependency. A normalized database prevents such deletion anomalies by ensuring that all records are uniquely identifiable and contain no extraneous information.
4.9 Normal forms
Edgar F. Codd originally defined the first three normal
The first normal form requires that tables be made up of a primary key and a number of atomic fields, and the second and third deal with the relationship of non-key fields to the primary key. These have been summarized as requiring that all non-key fields be dependent on "the key, the whole key and nothing but the key". In practice, most applications in 3NF are fully normalized. However, research has identified potential update anomalies in 3NF databases. BCNF is a further refinement of 3NF that attempts to eliminate such anomalies.
The fourth and fifth normal forms (4NF and 5NF) deal specifically with the representation of many-many and one-many relationships. Sixth normal form (6NF) only applies to temporal databases.

4.10. First normal form (1NF)
      First normal form (1NF) lays the groundwork for an organized database design:
       Ensure that each table has a primary key: minimal set of attributes which can uniquely identify a record.  It states that the domain of an attribute must include only atomic values and the value of any attribute in a tuple must be single value from the domain of that attribute. It doesn’t allow nested relation. Data that is redundantly duplicated across multiple rows of a table is moved out to a separate table.
Atomicity: Each attribute must contain a single value, not a set of values.
Eg: Consider a Relation Person. The person will have the attributes SSN, Name, Age, Address and College_Degree.
Person
                        

                              Table 4.1
Now we can analyze this relation. Now check what are the possible values of each attributes. Here SSN and Age will have only one value for a person. But The college_Degree will have more than one value. And the address and Name of person can be divided into more than one attributes. Hence this relation is not in 1NF. So let us change this relation schema into 1NF by dividing this relation into two relations.
 Name→ FName, MInit, LName
Address→ ApartmentNo, City
           Person_Residence

 



                              Table 4.2

College_Degree
 



                Table 4.3

 4.11Second normal form (2NF)
           First, the table must be in 1NF, plus, we want to make sure that every Non-Primary-Key attribute (field) is fully functionally dependent upon the ENTIRE Primary-Key for its existence.  This rule ONLY applies when you have a multi-part (concatenated) Primary Key (PK).
It requires that data stored in a table with a composite primary key must not be dependent on only part of the table's primary key. And the database must meet all the requirements of the first normal form.
Take each non-key field, and ask this question: If I knew part of the PK, could I tell what the non-key field would be.
Inventory

                         
                                      Table4.4
      In this inventory table, Description combined with Supplier is our PK.  This is because we have two of the same product that come from different suppliers.
There are two non-key fields.  So, we can ask the questions:
     If we know just Description, can we find out Cost?  No, because we have more than one supplier for the same product.
    If we know just Supplier, and we find out Cost?  No, because we need to know what the Item is as well.
Therefore, Cost is fully, functionally dependent upon the ENTIRE PK (Description-Supplier) for its existence.
        If we know just Description, can we find out Supplier Address?  No, because we have more than one supplier for the same product.
       If we know just Supplier, Can we find out Supplier Address?  Yes.  The Address does not depend upon the Description of the item.
Therefore, Supplier Address is NOT functionally dependent upon the ENTIRE PK (Description-Supplier) for its existence.
We must get rid of Supplier Address from this table.

Inventory


                            Table 4.5
Supplier

               
                         Table 4.6
     At this point, since it is the "Supplier" table, we can rename the "Supplier" filed to "Name."  Name is the PK for this new table.
General Definition:
            A relation schema R is in second normal form (2NF) if every nonprime attribute A in R is not partially dependent on any key of R.
4.12 Third normal form (3NF)
                    For 3NF, first, the table must be in 2NF, plus, we want to make sure that the non-key fields are dependent upon ONLY the PK, and not on any other field in the table.  This is very similar to 2NF, except that now you are comparing the non-key fields to OTHER non-key fields. 
For database to be in third normal form
1. The database must meet all the requirements of the second normal form.
2. Any field which is dependent not only on the primary key but also on another field is moved out to a separate table.
Book


                          Table 4.7
Again, just ask the questions:
If I know # of Pages, can I find out Author's Name?  No.  Can I find out Author's affiliation No?  No.
If I know Author's Name, can I find out # of Pages?  No.  Can I find out Author's affiliation No?  YES.
Therefore, Author's affiliation No is functionally dependent upon Author's Name, not the PK for its existence.

    Name              Auth_Name       #Pages     
 
Book

                                   Table 4.8

Author

                                  Table 4.9

General Definition:
            A relation schema R is in 3NF if, whenever a nontrivial functional dependency X→A holds in R,
           Either  a) X is a Superkey
            Or        b) Y is a prime attribute of R.
i.e. A relation schema R is in 3NF if every nonprime attribute of R meets both of the following terms:
            1. It is fully functionally dependent on every key of R.
            2. It is nontransitively dependent on every key of R.
4.13 Boyce-Codd normal form (BCNF)
  A row is in BCNF if and only if every determinant is a candidate key.
The second and third normal forms assume that all attributes not part of the candidate keys depend on the candidate keys but does not deal with dependencies within the keys. BCNF deals with such dependencies.
A relation R is said to be in BCNF if whenever X -> A holds in R, and A is not in X, then X is a candidate key for R.
BCNF covers very specific situations where 3NF misses interdependencies between non key attributes. It should be noted that most relations that are in 3NF are also in BCNF. Infrequently, a 3NF relation is not in BCNF and this happens only if
(a) the candidate keys in the relation are composite keys (that is, they are not single attributes),
(b) there is more than one candidate key in the relation, and
(c) the keys are not disjoint, that is, some attributes in the keys are common.
The BCNF differs from the 3NF only when there are more than one candidate keys and the keys are composite and overlapping. Consider for example, the relationship
enrol (sno, sname, cno, cname, date-enrolled)
Let us assume that the relation has the following candidate keys:
(sno, cno)
        (sno, cname)
       (sname, cno)
        (sname, cname)
(we have assumed sname and cname are unique identifiers). The relation is in 3NF but not in BCNF because there are dependencies
sno -> sname
        cno -> cname
where attributes that are part of a candidate key are dependent on part of another candidate key. Such dependencies indicate that although the relation is about some entity or association that is identified by the candidate keys
e.g. (sno, cno), there are attributes that are not about the whole thing that the keys identify. For example, the above relation is about an association (enrolment) between students and subjects and therefore the relation needs to include only one identifier to identify students and one identifier to identify subjects. Providing two identifiers about students (sno, sname) and two keys about subjects (cno, cname) means that some information about students and subjects that is not needed is being provided. This provision of information will result in repetition of information and the anomalies. If we wish to include further information about students and courses in the database, it should not be done by putting the information in the present relation but by creating new relations that represent information about entities student and subject.
These difficulties may be overcome by decomposing the above relation in the following three relations:
(sno, sname)
       (cno, cname)
       (sno, cno, date-of-enrolment)
We now have a relation that only has information about students, another only about subjects and the third only about enrolments. All the anomalies and repetition of information have been removed.

4.14 Multivalued Dependency and Fourth normal form
In a relational model, if all of the information about an entity is to be represented in one relation, it will be necessary to repeat all the information other than the multivalue attribute value to represent all the information that we wish to represent. This results in many tuples about the same instance of the entity in the relation and the relation having a composite key (the entity id and the mutlivalued attribute). Of course the other option suggested was to represent this multivalue information in a separate relation. The situation of course becomes much worse if an entity has more than one multivalued attributes and these values are represented in one relation by a number of tuples for each entity instance. The multivalued dependency relates to this problem when more than one multivalued attributes exist. Consider the following relation that represents an entity employee that has one mutlivalued attribute proj:
emp (e#, dept, salary, proj)
We have so far considered normalization based on functional dependencies; dependencies that apply only to single-valued facts. For example, e# -> dept implies only one dept value for each value of e#. Not all information in a database is single-valued, for example, proj in an employee relation may be the list of all projects that the employee is currently working on. Although e# determines the list of all projects that an employee is working on, e# -> proj is not a functional dependency.
We can more clear the multivalued dependency by the following example.
programmer (emp_name, qualifications, languages)
This relation includes two multivalued attributes of entity programmer; qualifications and languages. There are no functional dependencies.
The attributes qualifications and languages are assumed independent of each other. If we were to consider qualifications and languages separate entities, we would have two relationships (one between employees and qualifications and the other between employees and programming languages). Both the above relationships are many-to-many i.e. one programmer could have several qualifications and may know several programming languages. Also one qualification may be obtained by several programmers and one programming language may be known to many programmers.
Functional dependency A -> B relates one value of A to one value of B while multivalued dependency A ->> B defines a relationship in which a set of values of attribute B are determined by a single value of A.
Now, more formally, X ->> Y is said to hold for R(X, Y, Z) if t1 and t2 are two tuples in R that have the same values for attributes X and therefore with t1[x] = t2[x] then R also contains tuples t3 and t4 (not necessarily distinct) such that
t1[x] = t2[x] = t3[x] = t4[x]
t3[Y] = t1[Y]
and t3[Z] = t2[Z]
t4[Y] = t2[Y]
and t4[Z] = t1[Z]
In other words if t1 and t2 are given by
t1 = [X, Y1, Z1], and
t2 = [X, Y2, Z2]
then there must be tuples t3 and t4 such that
t3 = [X, Y1, Z2], and
t4 = [X, Y2, Z1]
We are therefore insisting that every value of Y appears with every value of Z to keep the relation instances consistent. In other words, the above conditions insist that X alone determines Y and Z and there is no relationship between Y and Z since Y and Z appear in every possible pair and hence these pairings present no information and are of no significance.

Fourth normal form
Fourth normal form (or 4NF) requires that there be no non-trivial multivalued dependencies of attribute sets on something other than a superset of a candidate key. A table is said to be in 4NF if and only if it is in the BCNF and multivalued dependencies are functional dependencies. The 4NF removes unwanted data structures: multivalued dependencies.
Definition: A relation schema R is in 4NF with respect to a set of dependencies if, for every non trivial multivalued dependency X ->>Y in F+, X is a superkey for R.
Properties Of Relational Decompositions
Decomposition Property: A relation must satisfy the following two properties during decomposition.
i. Lossless:- A lossless-join dependency is a property of decomposition, which ensures that spurious rows are generated when relations are united through a natural join operation. i.e. The information in an instance r of R must be preserved in the instances r1, r2, r3, …..rk    where  ri = ∏Ri (r)
Decomposition is lossless with respect to a set of functional dependencies F if, for every relation instance r on R satisfying F,
R=∏R1 (r) * ∏R2 (r) * . . . . . . . ∏Rn (r)

ii. Dependency Preserving Property: - If a set of functional dependencies hold on R it should be possible to enforce F by enforcing appropriate dependencies on each r1 .
 Decomposition D= (R1, R2, R3, ………, Rk) of schema R preserves a set of dependencies F if,
(∏R1 (F) U ∏R2 (F) U . . . . . . . . . . . ∏Rn (F)) +=F+
Ri(F) is the projection of F onto Ri.
i.e Any FD that logically follows from F must also logically follows from the union of projection of F onto Ri ‘S . Then D is called dependency preserving.
4.15 Join Dependency and Fifth Normal Form
            Join dependency is the term used to indicate the property of a relation schema that cannot be decomposed losslesly into two relation schema, but can be decomposed losslesly into three or more simpler relation schema. It means that a table, after it has been decomposed into three or more smaller tables must be capable of being joined again on common keys to form the original table.
Fifth normal form
Fifth normal form (5NF and also PJ/NF) requires that there are no non-trivial join dependencies that do not follow from the key constraints. A table is said to be in the 5NF if and only if it is in 4NF and the candidate keys imply every join dependency in it.
4.16 Pitfalls in Relational Database Design.
A bad design may have several properties, including:
  • Repetition of information.
  • Inability to represent certain information.
  • Loss of information.






0 comments:

Post a Comment