Click here to Login

DataBase Management System(Module 1)

                


 MODULE I



1.1 Basic Concepts
 DBMS is a collection of interrelated data and a set of programs to access this data in convenient and efficient way. It controls the organization, storage, retrieval, security and integrity of data in a database. In other words, it enables users to create and maintain a database. It accepts requests from the application and instructs the operating system to transfer the appropriate data. It facilitates the processes of defining, constructing, manipulating and sharing of database among various users and applications.
-Defining a database means specifying the different type of data elements to be stored in the database. i.e. data types, structures and constraints. For a bank database, specifies the fields like Name (string of alphabets), Acct Number (integer with range) and also the characteristics of each field.
-Constructing the database is the process of storing the data itself on some storage medium that is controlled by the dbms.
-Manipulating a database is the processing of database. It includes updating and retrieving of database.

1.2  Purpose of database systems

File system Verses Database approach

                        One way to store information in a computer system is to store it as in traditional file system. In this method each data is stored in different files. And there is an application programs for each of the application.
  • Data redundancy and inconsistency
In traditional file systems the data may be duplicated. For eg: Consider a bank having two accounts savings bank account and credit check account. In this case, the address of customer is stored in two files: one with SB account and other with checking record. Thus this duplication will result in need of high storage space. And this will also leads to the inconsistency. That is, if the address of a customer changes, then the change may be reflected only in one account. This is the inconsistency of data.
  • Difficulty in accessing information
Suppose the bank needs a list of customers with an account higher than Rs. 10,000. But, we do not have an application at hand to list out this request. Thus, to access this information we have two choices. First one is that list out the SB account customers and then extracts the needed list manually.  In the second option, we have to develop a new program to satisfy the new request. The two are difficult.
  • Data Isolation
Data are scattered in different files and files may be in various formats. So it is difficult extract the appropriate data.
  • Integrity problems
The constraint of data is enforced through the programs by appropriate code. So if we need to add a new constraint, we have to change the code. Then, it is very difficult to add or change the constraints. The problem will be compounded when constraints involves several constraints from different files.
  • Atomicity problems
Suppose a failure occurs during execution of the program. Then the execution stops in the middle of the program resulting in an inconsistency. But the execution of a program should end to a consistency state. For a traditional file system the failure mostly result to an inconsistency state.
     1.3 Features (characteristics) of DBMS

The basic difference difference between traditional file processing and database approach is that in traditional file processing, each user defines and implements the files needed for a specific application as part of programming the application. But in case of database approach, a single repository of data is maintained that is defined once and then is accessed by various users.
      For eg. Consider a student record, in traditional file processing the office should have a record for each student to keep his or her fees and payments. And in department have another record for students to keep their marks and progress. Even though both office and Department interested in data about students, each user maintains separate files, because each user requires some data that is not available from other user.
      Now what are the features of database approach?
Database system is
1.      Self describing:
i.e. The database system contains not only the database itself but also a complete definition or description and structure of database . This structure is stored in a catalog with type, storage format and constraints as I mentioned earlier. The information stored in database is called meta-data.
2.      Data security
The DBMS can prevent unauthorized users from viewing or updating the database. Using passwords, users are allowed access to the entire database or a subset of it known as a "subschema." For example, in a student database, some users may be able to view payment details while others may view only mark list of students.
3.      Data Integrity
The DBMS can ensure that no more than one user can update the same record at the same time. It can keep duplicate records out of the database; for example, no two customers with the same customer number can be entered.
4.      Interactive Query
Most DBMSs provide query languages and report writers that let users interactively interrogate the database and analyze its data. This important feature gives users access to all management information as needed. i.e. we will get easily all details of each student at any time.
5.      Interactive Data Entry and Updating
Many DBMSs provide a way to interactively enter and edit data, allowing you to manage your own files and databases. However, interactive operation does not leave an audit trail and does not provide the controls necessary in a large organization. These controls must be programmed into the data entry and update programs of the application.
6.      Data Independence
With DBMSs, the details of the data structure are not stated in each application program. The program asks the DBMS for data by field name; for example, a coded equivalent of "give me customer name and balance due" would be sent to the DBMS. Without a DBMS, the programmer must reserve space for the full structure of the record in the program. Any change in data structure requires changing all application programs.

1.4 DBMS Components
·         Data:
Data stored in a database include numerical data which may be integers (whole numbers only) or floating point numbers (decimal), and non-numerical data such as characters (alphabetic and numeric characters), date or logical (true or false). More advanced systems may include more complicated data entities such as pictures and images as data types.
·         Standard operations:
Standard operations are provided by most DBMS. These operations provide the user basic capabilities for data manipulation. Examples of these standard operations are sorting, deleting and selecting records.
·         Data definition language (DDL):
 DDL is the language used to describe the contents of the database. It is used to describe, for example, attribute names (field names), data types, location in the database, etc.
·         Data manipulation and query language:
Normally a query language is supported by a DBMS to form commands for input, edit, analysis, output, reformatting, etc. Some degree of standardization has been achieved with SQL (Structured Query Language).
·         Programming tools:
 Besides commands and queries, the database should be accessible directly from application programs through function calls (subroutine calls) in conventional programming languages.
·         File structures:
 Every DBMS has its own internal structures used to organize the data although some common data models are used by most DBMS.
Abstraction
               We all know that each application program have some data relevant to a particular task. And an application program needs to use a portion of data, which is used by some other programs. In early days of computerization, each application programmer designs the file structure, metadata of the file and the access method each record. That is, each application program use its own data, details concerning the structure of data as well as the access and to interpret each data. The application programs are implemented independently and by hence itself, any change in storage media requires changes to these structures and access methods. Because the files were structured for one application, it was difficult to use the data in these files to new applications requiring data from several files belonging to different existing applications.

Eg: Consider two application programs that require the data on an entity set EMPLOYEE. The first application program involves the public relation department sending each employee a news letter and related material. This application program interested in the record type EMPLOYEE, that containing the values for the attributes of EMPL_Name and EMPL_Address.
1.5 Architecture of DBMS
          The generalized architecture of DBMS is called ANSI/SPARC model. The architecture is divided into three levels: External level, Conceptual level and Internal level.
   The view at each of these levels is described by a schema. Schema describes the records and its relationships in the view.
a.   External view or User view
   It is the highest level of data abstraction. This includes only those portions of database of concern to a user or Application program. Each user has a different external view and it is described by means of a scheme called external schema. The schema contains the definition of the logical records and relationships in external view. It also contains the method of deriving the objects in the external view from the objects in the conceptual view.
b.      Conceptual view
At this level of database abstraction, all the database entities and the relationship among them are included. One conceptual view represents the entire database called conceptual schema. It describes the method of deriving the objects in the conceptual view from the objects in the internal view. And also specify the checks to retain the data consistency and integrating.
c. Internal view
It is the lowest level of abstraction, closest to the physical storage method. It describes how the data is stored, what is the structure of data storage and the method of accessing these data. It is represented by internal schema.


                                       View Level …Defined by User
        ……………………………………………………………………………………………..                                                                             
………………………………………………………………………………………………..
                                                                                       Defined by 
DBA                                                                                                                                 


                                                                                Defined by DBA for           
                                                                                optimization

                                          Fig 1.1

1.6 Data independence
Data independence of DBMS is the capacity to change the schema at one level of database system without having to change the next high levels.  The three schema architecture can be used to achieve this data independence. We can define data independence into two types:
1.      Logical data independence
It is the capacity to change the conceptual schema without having to change the external schema. Sometimes, we may need to change the conceptual schema to expand the database, to change the constraints, or to reduce the database. Only the view definitions and mappings need to be changed in DBMS that supporting logical data independence. Application programmer cannot feel any change in the schema construct of DBMS.
2. Physical data independence
      Physical data independence is the capacity to change the internal schema without having to change the conceptual schema and external schema. The internal schema may change to improve the performance of retrieval or update. Then the conceptual schema need not change if the data remains same. For e.g.: We need not change the Query to retrieve a student progress report even though the DBMS take a new method to store the student record. 
    Advantages
1. Controlling Redundancy
           
In traditional file processing, every user group maintains its own files. Each group independently keeps files on their db e.g., students. Therefore, much of the data is stored twice or more. And the redundancy leads to several problems:
a. duplication of effort
                  i.e. storage space wasted when the same data is stored repeatedly
b. files that represent the same data may become inconsistent (since the updates are applied independently by each users group).
We can use controlled redundancy.
2. Restricting Unauthorized Access
      A DBMS should provide a security and authorization subsystem.
Some db users will not be authorized to access all information in the db(e.g., financial data).
Some users are allowed only to retrieve data. Some users are allowed both to retrieve and to update database.
    3. Providing Persistent Storage for Program Objects and Data Structures
        
Data structure provided by DBMS must be compatible with the programming language’s data structures. E.g., Object oriented DBMS are compatible with programming languages such as C++, SMALLTALK, and the DBMS software automatically performs conversions between programming data structure and file formats.
 4. Permitting Inference and Actions Using Deduction Rules
          Deductive database systems provide capabilities for defining deduction rules to inference new information from the stored database facts.

      5. Providing Multiple User Interfaces
(e.g., query languages, programming languages interfaces, forms, menu-driven interfaces, etc.)  
6. Representing Complex Relationships Between Data
     The complex relationship between data is easily represented.
7. Enforce Integrity Constraints

     The integrity constraint for information is reasonably enforced by the database management system.
1.7 DBMS Disadvantages
          A database system generally provides on-line access to the database for many users. In contrast, a conventional system is often designed to meet a specific need and therefore generally provides access to only a small number of users. Because of the larger number of users accessing the data when a database is used, the enterprise may involve additional risks as compared to a conventional data processing system in the following areas.
1. Confidentiality, Privacy and Security
                              When information is centralized and is made available to users from remote locations, the possibilities of abuse are often more than in a conventional system. To reduce the chances of unauthorized users accessing sensitive information, it is necessary to take technical, administrative and, possibly, legal measures. Most databases store valuable information that must be protected from deliberate attack and destruction.
2. Data Quality
                              Since the database is accessible to users remotely, adequate controls are needed to control users updating data and to control data quality. With increased number of users accessing data directly, there are enormous opportunities for users to damage the data. Unless there are suitable controls, the data quality may be compromised.
3. Data Integrity
                 Since a large number of users could be using a database concurrently, we should have to ensure that data remain correct during operation. The main threat to data integrity comes from several different users attempting to update the same data at the same time. The database therefore needs to be protected against accidental changes by the users.
4. Enterprise Vulnerability
                   Centralizing all data of an enterprise in one database may mean that the database becomes critical resource. The survival of the enterprise may depend on reliable information being available from its database. The enterprise therefore becomes vulnerable to the destruction of the database or to unauthorized modification of the database.
5. The Cost of using a DBMS
                  Conventional data processing systems are typically designed to run a number of well-defined, preplanned processes. Such systems are often "tuned" to run efficiently for the processes that they were designed for. Although the conventional systems are usually fairly inflexible in that new applications may be difficult to implement and/or expensive to run, they are usually very efficient for the applications they are designed for.
                              The database approach on the other hand provides a flexible alternative where new applications can be developed relatively inexpensively. The flexible approach is not without its costs and one of these costs is the additional cost of running applications that
the conventional system was designed for. Using standardized software is almost always less machine efficient than specialized software.

            Data model
Entities and Attributes
         Entities are distinguishable objects of concern and are modeled using their characteristics or attributes. A database usually contains large number of similar entities. For eg: A company database consists of a large number of employees may want to store similar information for each employee. Then each of the employees can be termed as an entity.  An entity can be an object with physical existence.  For eg: a car, a person or an employee. But each entity will have its own value.  Each entity has properties that describe the entity called attribute of that entity. Collection of entities with same attributes termed as an entity type.
For eg: Employee (Employee_id, Address, Designation, Salary)
Here Employee is an entity and Employee_id, Address, Designation, Salary represents the attribute of entity Employee.
            There can be several types of attributes such as Simple versus composite, single-valued verses multi-valued and stored verses derived.
1. Composite versus Simple
               Composite attributes are those attributes that can be divided into smaller sub parts with independent meaning. Consider the above e.g.: in which the attribute Address can be
divided into small sub parts like City, State and Street_address. The attributes that are not divisible are called simple or atomic attributes. The value of a composite attribute is the concatenation of the value of its constituent simple attributes.
2. Single-valued verses multi-valued
                Most of the attributes will have only single value for a particular entity. Such attributes are called single valued. In some cases there may be having more than one value for an attribute of a particular entity. These attributes are called multi-valued. The attribute age of an entity person will have only one value, while the college degree of that person will have more than one degree. So the attribute age can be consider as single-valued and college degree as multi-valued.
3.Stored verses derived
                  In some case the attribute values can be related so that one can be derived from the other. Consider a person as an entity. The attributes age and DateOfBirth of person is
related. i.e. the age of a person can be derived from  the current date and his  DateOfBirth. The age attribute hence is called Derived attribute and the DateOfBirth is called stored attribute from where age of person calculated.
     Entity set
             An entity set is a set of entities of the same type that share the same properties, or attributes. It is represented by a set of attributes. An attribute, as used in the E-R model can be characterized by the following attributes.
·         Simple and composite attributes
·         Single and multi-valued attributes
·         Null attributes
·         Derived attributes
       A relationship is an association among several entities. And a relationship set is a set of   relationships of the same type.
Keys
        Before designing a database we should be able to specify how entities within a given entity set and relationships within a given relationship set are distinguished. Conceptually the individual entities and relationships are distinct; but from a database perspective, the difference must be expressed by their attributes. The concept of key is used to make such distinctions.
       Super key is a set of attributes that, taken collectively, to identify uniquely an entity in the entity set.  For eg: the social_security_no attribute of the entity set employee is sufficient to distinguish one employee entity from another. Thus social_security_no is a superkey for the entity set employee. Superkeys with minimal subset is known as the candidate key. For eg: it is possible to combine the attributes, employ_id & employ_name form a superkey. But the social_security_no is sufficient to distinguish the two employees.  Thus social_security_no is a candidate key. Usually primary key is used to denote the candidate key that is chosen by the database designer to identify an entity from an entity set. A key (super, candidate and primary) is a property of the entity set rather than the individual entities.

Entity- Relationship (E-R) Diagram
  The overall logical structure of a database can be expressed graphically by an E-R diagram.  The diagram consists of the following major components.
·         Rectangles: represent entity set.
·         Ellipses: represent attributes.
·         Diamonds: represents relationship sets.
·         Lines: links attribute set to entity set and entity set to relationship set.
·         Double ellipses: represent multi-valued attributes.
·         Dashed ellipses: denote derived attributes.


For eg: Consider an E-R diagram, which consists of two entity sets customer and loan.











                                           

             A data model is a plan for building a database. The model represents data conceptually, the way the user sees it, rather than how computers store it. Data models focus on required data elements and associations; most often they are expressed graphically using
Entity-relationship diagrams. On a more abstract level, the term is also used in describing a database's overall structure.  Mostly used data modeling techniques are
1. Entity- Relationship model
2. Hierarchical model
3. Network model
4. Object-oriented model

1.9 Hierarchical Model
                  The hierarchical data model organizes data in a tree structure. There is a hierarchy of parent and child data segments. This structure implies that a record can have repeating information, generally in the child data segments. Data in a series of records have a set of field values attached to it. It collects all the instances of a specific record together as a record type. These record types are the equivalent of tables in the relational model, and with the individual records being the equivalent of rows. To create links between these record types, the hierarchical model uses Parent Child Relationships

                         
Hierarchical databases link records like an organization chart. A record type can be owned by only one owner. In the following example, orders are owned by only one customer. Hierarchical structures were widely used with early mainframe systems; however, they are often restrictive in linking real-world structures.





 






                                 Fig 1.3

Adv
antages:
• Hierarchical Model is simple to construct and operate on
• Corresponds to a number of natural hierarchically organized domains - e.g., assemblies in manufacturing, personnel organization in companies
• Language is simple; uses constructs like GET, GET UNIQUE, GET NEXT, GET NEXT WITHIN PARENT etc.
Disadvantages:
• Navigational and procedural nature of processing
• Database is visualized as a linear arrangement of records

• Little scope for "query optimization"

1.10 Network Model

In 1971, the Conference on Data Systems Languages (CODASYL) formally defined the network model. The basic data modeling construct in the network model is the set construct. A set consists of an owner record type, a set name, and a member record type. A member record type can have that role in more than one set, hence the multiparent concept is supported. An owner record type can also be a member or owner in another set. In network databases, a record type can have multiple owners. In the example below, orders are owned by both customers and products, reflecting their natural relationship in business.

 






                                           

  Advantages:
• Network Model is able to model complex relationships and represents semantics of add/delete on the relationships.
• Can handle most situations for modeling using record types and relationship types.
• Language is navigational; uses constructs like FIND, FIND member, FIND owner, FIND NEXT within set, GET etc. Programmers can do optimal navigation through the database.
 Disadvantages:
• Navigational and procedural nature of processing
• Database contains a complex array of pointers that thread through a set of records.
• Little scope for automated "query optimization"


            1.11Object-Oriented Model
Object DBMSs add database functionality to object programming languages. They bring much more than persistent storage of programming language objects. Object DBMSs
extend the semantics of the C++, Smalltalk and Java object programming languages to provide full-featured database programming capability, while retaining native language compatibility. A major benefit of this approach is the unification of the application and database development into a seamless data model and language environment. As a result, applications require less code, use more natural data modeling, and code bases are easier to maintain. Object developers can write complete database applications with a modest   
 








                                       


1.12 Entity relational model (RDBMS - relational database management system)
A database based on the relational model developed by E.F. Codd. A relational database allows the definition of data structures, storage and retrieval operations and integrity constraints. In such a database the data and relations between them are organized in tables. A table is a collection of records and each record in a table contains the same fields.
                              It permits the database designer to create a consistent, logical representation of information. Consistency is achieved by including declared constraints in the database design, which is usually referred to as the logical schema. The theory includes a process of database normalization whereby a design with certain desirable properties can be selected from a set of logically equivalent alternatives. The access plans and other implementation and operation details are handled by the DBMS engine, and are not reflected in the logical model. This contrasts with common practice for SQL DBMSs in which performance tuning often requires changes to the logical model.
                              The basic relational building block is the domain or data type, usually abbreviated nowadays to type. A tuple is an unordered set of attribute values. An attribute is an ordered pair of attribute name and type name. An attribute value is a specific valid value for the type of the attribute. This can be either a scalar value or a more complex type. Relational databases do not link records together physically, but the design of the records must provide a common field, such as account number, to allow for matching. Often, the fields used for matching are indexed in order to speed up the process.
                              In the following example, customers, orders and products are linked by comparing data fields and/or indexes when information from more than one record type is needed. This method is more flexible for ad hoc inquiries. Many hierarchical and network DBMSs also provide this capability.

                                        













0 comments:

Post a Comment