MODULE I
1.1 Basic Concepts
DBMS is a collection of interrelated data and
a set of programs to access this data in convenient and efficient way. It
controls the organization, storage, retrieval, security and integrity of data
in a database. In other words, it enables users to create and maintain a
database. It accepts requests from the application and instructs the operating
system to transfer the appropriate data. It facilitates the processes of
defining, constructing, manipulating and sharing of database among various
users and applications.
-Defining a database means
specifying the different type of data elements to be stored in the database.
i.e. data types, structures and constraints. For a bank database, specifies the
fields like Name (string of alphabets), Acct Number (integer with range) and
also the characteristics of each field.
-Constructing the database
is the process of storing the data itself on some storage medium that is
controlled by the dbms.
-Manipulating a database is
the processing of database. It includes updating and retrieving of database.
1.2 Purpose of database systems
File system Verses Database approach
One way to store information in a
computer system is to store it as in traditional file system. In this method
each data is stored in different files. And there is an application programs
for each of the application.
- Data redundancy and inconsistency
In traditional file systems the data may be duplicated.
For eg: Consider a bank having two accounts savings bank account and credit
check account. In this case, the address of customer is stored in two files:
one with SB account and other with checking record. Thus this duplication will
result in need of high storage space. And this will also leads to the
inconsistency. That is, if the address of a customer changes, then the change
may be reflected only in one account. This is the inconsistency of data.
- Difficulty in accessing information
Suppose the bank needs a list of customers with an
account higher than Rs. 10,000. But, we do not have an application at hand to
list out this request. Thus, to access this information we have two choices.
First one is that list out the SB account customers and then extracts the
needed list manually. In the second option,
we have to develop a new program to satisfy the new request. The two are
difficult.
- Data Isolation
Data are scattered in different files and files may be
in various formats. So it is difficult extract the appropriate data.
- Integrity problems
The constraint of data is enforced through the programs
by appropriate code. So if we need to add a new constraint, we have to change
the code. Then, it is very difficult to add or change the constraints. The
problem will be compounded when constraints involves several constraints from
different files.
- Atomicity problems
Suppose a failure occurs during execution of the
program. Then the execution stops in the middle of the program resulting in an
inconsistency. But the execution of a program should end to a consistency
state. For a traditional file system the failure mostly result to an
inconsistency state.
1.3
Features (characteristics) of DBMS
The basic difference difference between
traditional file processing and database approach is that in traditional file
processing, each user defines and implements the files needed for a specific
application as part of programming the application. But in case of database
approach, a single repository of data is maintained that is defined once and
then is accessed by various users.
For eg. Consider a student record, in traditional file processing the office should have a record for each student to keep his or her fees and payments. And in department have another record for students to keep their marks and progress. Even though both office and Department interested in data about students, each user maintains separate files, because each user requires some data that is not available from other user.
For eg. Consider a student record, in traditional file processing the office should have a record for each student to keep his or her fees and payments. And in department have another record for students to keep their marks and progress. Even though both office and Department interested in data about students, each user maintains separate files, because each user requires some data that is not available from other user.
Now what are the features of database
approach?
Database system is
1. Self
describing:
i.e. The database system contains not only
the database itself but also a complete definition or description and structure
of database . This structure is stored in a catalog with type, storage format
and constraints as I mentioned earlier. The information stored in database is
called meta-data.
2. Data security
The DBMS can prevent unauthorized users from viewing or updating the database. Using passwords, users are allowed access to the entire database or a subset of it known as a "subschema." For example, in a student database, some users may be able to view payment details while others may view only mark list of students.
The DBMS can prevent unauthorized users from viewing or updating the database. Using passwords, users are allowed access to the entire database or a subset of it known as a "subschema." For example, in a student database, some users may be able to view payment details while others may view only mark list of students.
3. Data Integrity
The DBMS can ensure that no more than one user can update the same record at the same time. It can keep duplicate records out of the database; for example, no two customers with the same customer number can be entered.
The DBMS can ensure that no more than one user can update the same record at the same time. It can keep duplicate records out of the database; for example, no two customers with the same customer number can be entered.
4. Interactive Query
Most DBMSs provide query languages and report writers that let users interactively interrogate the database and analyze its data. This important feature gives users access to all management information as needed. i.e. we will get easily all details of each student at any time.
Most DBMSs provide query languages and report writers that let users interactively interrogate the database and analyze its data. This important feature gives users access to all management information as needed. i.e. we will get easily all details of each student at any time.
5. Interactive Data Entry and
Updating
Many DBMSs provide a way to interactively enter and edit data, allowing you to manage your own files and databases. However, interactive operation does not leave an audit trail and does not provide the controls necessary in a large organization. These controls must be programmed into the data entry and update programs of the application.
Many DBMSs provide a way to interactively enter and edit data, allowing you to manage your own files and databases. However, interactive operation does not leave an audit trail and does not provide the controls necessary in a large organization. These controls must be programmed into the data entry and update programs of the application.
6. Data Independence
With DBMSs, the details of the data structure are not stated in each application program. The program asks the DBMS for data by field name; for example, a coded equivalent of "give me customer name and balance due" would be sent to the DBMS. Without a DBMS, the programmer must reserve space for the full structure of the record in the program. Any change in data structure requires changing all application programs.
With DBMSs, the details of the data structure are not stated in each application program. The program asks the DBMS for data by field name; for example, a coded equivalent of "give me customer name and balance due" would be sent to the DBMS. Without a DBMS, the programmer must reserve space for the full structure of the record in the program. Any change in data structure requires changing all application programs.
1.4 DBMS Components
·
Data:
Data stored in a database
include numerical data which may be integers (whole numbers only) or floating
point numbers (decimal), and non-numerical data such as characters (alphabetic
and numeric characters), date or logical (true or false). More advanced systems
may include more complicated data entities such as pictures and images as data
types.
·
Standard operations:
Standard
operations are provided by most DBMS. These operations provide the user basic
capabilities for data manipulation. Examples of these standard operations are
sorting, deleting and selecting records.
·
Data definition language (DDL):
DDL is the language used to describe the
contents of the database. It is used to describe, for example, attribute names
(field names), data types, location in the database, etc.
·
Data manipulation and query language:
Normally a query language is
supported by a DBMS to form commands for input, edit, analysis, output,
reformatting, etc. Some degree of standardization has been achieved with SQL
(Structured Query Language).
·
Programming tools:
Besides commands and queries, the database
should be accessible directly from application programs through function calls
(subroutine calls) in conventional programming languages.
·
File structures:
Every DBMS has its own internal structures
used to organize the data although some common data models are used by most
DBMS.
Abstraction
We all know that each application program have some
data relevant to a particular task. And an application program needs to use a
portion of data, which is used by some other programs. In early days of
computerization, each application programmer designs the file structure,
metadata of the file and the access method each record. That is, each
application program use its own data, details concerning the structure of data
as well as the access and to interpret each data. The application programs are
implemented independently and by hence itself, any change in storage media
requires changes to these structures and access methods. Because the files were
structured for one application, it was difficult to use the data in these files
to new applications requiring data from several files belonging to different
existing applications.
Eg: Consider two application
programs that require the data on an entity set EMPLOYEE. The first application
program involves the public relation department sending each employee a news
letter and related material. This application program interested in the record
type EMPLOYEE, that containing the values for the attributes of EMPL_Name and
EMPL_Address.
1.5 Architecture of DBMS
The generalized architecture of DBMS is called
ANSI/SPARC model. The architecture is divided into three levels: External
level, Conceptual level and Internal level.
The view at each of these levels is described by a schema. Schema
describes the records and its relationships in the view.
a. External view or User view
It is the highest level of data abstraction. This includes only
those portions of database of concern to a user or Application program. Each
user has a different external view and it is described by means of a scheme
called external schema. The schema contains the definition of the logical
records and relationships in external view. It also contains the method of
deriving the objects in the external view from the objects in the conceptual
view.
b.
Conceptual view
At this level of database
abstraction, all the database entities and the relationship among them are
included. One conceptual view represents the entire database called conceptual
schema. It describes the method of deriving the objects in the conceptual view
from the objects in the internal view. And also specify the checks to retain
the data consistency and integrating.
c. Internal
view
It is the lowest level of
abstraction, closest to the physical storage method. It describes how the data
is stored, what is the structure of data storage and the method of accessing
these data. It is represented by internal schema.
View Level …Defined by User
……………………………………………………………………………………………..
………………………………………………………………………………………………..
Defined by
DBA
Defined by DBA for
optimization
Fig 1.1
1.6 Data independence
Data independence of DBMS is
the capacity to change the schema at one level of database system without
having to change the next high levels.
The three schema architecture can be used to achieve this data
independence. We can define data independence into two types:
1. Logical data independence
It is the capacity to change
the conceptual schema without having to change the external schema. Sometimes,
we may need to change the conceptual schema to expand the database, to change
the constraints, or to reduce the database. Only the view definitions and
mappings need to be changed in DBMS that supporting logical data independence.
Application programmer cannot feel any change in the schema construct of DBMS.
2. Physical
data independence
Physical data independence is the capacity to change the
internal schema without having to change the conceptual schema and external
schema. The internal schema may change to improve the performance of retrieval
or update. Then the conceptual schema need not change if the data remains same.
For e.g.: We need not change the Query to retrieve a student progress report
even though the DBMS take a new method to store the student record.
Advantages
1. Controlling Redundancy
In traditional file processing, every user group maintains its own files. Each group independently keeps files on their db e.g., students. Therefore, much of the data is stored twice or more. And the redundancy leads to several problems:
a. duplication of effort
i.e. storage space wasted when the same data is stored repeatedly
b. files that represent the same data may become inconsistent (since the updates are applied independently by each users group).
We can use controlled redundancy.
In traditional file processing, every user group maintains its own files. Each group independently keeps files on their db e.g., students. Therefore, much of the data is stored twice or more. And the redundancy leads to several problems:
a. duplication of effort
i.e. storage space wasted when the same data is stored repeatedly
b. files that represent the same data may become inconsistent (since the updates are applied independently by each users group).
We can use controlled redundancy.
2. Restricting
Unauthorized Access
A DBMS should provide a security and
authorization subsystem.
Some db users will not be authorized to access all
information in the db(e.g., financial data).
Some users are allowed only to retrieve data. Some users are allowed both to retrieve and to update database.
Some users are allowed only to retrieve data. Some users are allowed both to retrieve and to update database.
3. Providing
Persistent Storage for Program Objects and Data Structures
Data structure provided by DBMS must be compatible with the programming language’s data structures. E.g., Object oriented DBMS are compatible with programming languages such as C++, SMALLTALK, and the DBMS software automatically performs conversions between programming data structure and file formats.
4. Permitting Inference and Actions Using Deduction Rules
Deductive database systems provide capabilities for defining deduction rules to inference new information from the stored database facts.
Data structure provided by DBMS must be compatible with the programming language’s data structures. E.g., Object oriented DBMS are compatible with programming languages such as C++, SMALLTALK, and the DBMS software automatically performs conversions between programming data structure and file formats.
4. Permitting Inference and Actions Using Deduction Rules
Deductive database systems provide capabilities for defining deduction rules to inference new information from the stored database facts.
5. Providing
Multiple User Interfaces
(e.g., query languages, programming languages interfaces, forms, menu-driven interfaces, etc.)
6. Representing Complex Relationships Between Data
The complex relationship between data is easily represented.
7. Enforce Integrity Constraints
The integrity constraint for information is reasonably enforced by the database management system.
(e.g., query languages, programming languages interfaces, forms, menu-driven interfaces, etc.)
6. Representing Complex Relationships Between Data
The complex relationship between data is easily represented.
7. Enforce Integrity Constraints
The integrity constraint for information is reasonably enforced by the database management system.
1.7 DBMS
Disadvantages
A database system generally provides on-line access to the database for many users. In contrast, a conventional system is often designed to meet a specific need and therefore generally provides access to only a small number of users. Because of the larger number of users accessing the data when a database is used, the enterprise may involve additional risks as compared to a conventional data processing system in the following areas.
A database system generally provides on-line access to the database for many users. In contrast, a conventional system is often designed to meet a specific need and therefore generally provides access to only a small number of users. Because of the larger number of users accessing the data when a database is used, the enterprise may involve additional risks as compared to a conventional data processing system in the following areas.
1.
Confidentiality, Privacy and Security
When
information is centralized and is made available to users from remote
locations, the possibilities of abuse are often more than in a conventional
system. To reduce the chances of unauthorized users accessing sensitive
information, it is necessary to take technical, administrative and, possibly,
legal measures. Most databases store valuable information that must be
protected from deliberate attack and destruction.
2. Data Quality
Since
the database is accessible to users remotely, adequate controls are needed to
control users updating data and to control data quality. With increased number
of users accessing data directly, there are enormous opportunities for users to
damage the data. Unless there are suitable controls, the data quality may be
compromised.
3. Data Integrity
Since a large number of users could be using a database concurrently, we
should have to ensure that data remain correct during operation. The main
threat to data integrity comes from several different users attempting to
update the same data at the same time. The database therefore needs to be
protected against accidental changes by the users.
4. Enterprise Vulnerability
Centralizing all data of an enterprise in one database may mean that
the database becomes critical resource. The survival of the enterprise may
depend on reliable information being available from its database. The
enterprise therefore becomes vulnerable to the destruction of the database or
to unauthorized modification of the database.
5. The Cost of using a DBMS
Conventional data
processing systems are typically designed to run a number of well-defined,
preplanned processes. Such systems are often "tuned" to run
efficiently for the processes that they were designed for. Although the
conventional systems are usually fairly inflexible in that new applications may
be difficult to implement and/or expensive to run, they are usually very
efficient for the applications they are designed for.
The
database approach on the other hand provides a flexible alternative where new
applications can be developed relatively inexpensively. The flexible approach
is not without its costs and one of these costs is the additional cost of
running applications that
the conventional system was designed for. Using
standardized software is almost always less machine efficient than specialized
software.
Data model
Entities and Attributes
Entities are distinguishable objects
of concern and are modeled using their characteristics or attributes. A
database usually contains large number of similar entities. For eg: A company
database consists of a large number of employees may want to store similar
information for each employee. Then each of the employees can be termed as an entity.
An entity can be an object with physical existence. For eg: a car, a person or an employee. But
each entity will have its own value. Each
entity has properties that describe the entity called attribute of that entity. Collection of entities with same
attributes termed as an entity type.
For eg: Employee (Employee_id,
Address, Designation, Salary)
Here Employee is an entity and
Employee_id, Address, Designation, Salary represents the attribute of entity
Employee.
There can be several types of
attributes such as Simple versus composite, single-valued verses multi-valued
and stored verses derived.
1. Composite versus Simple
Composite attributes are those
attributes that can be divided into smaller sub parts with independent meaning.
Consider the above e.g.: in which the attribute Address can be
divided into small sub parts like
City, State and Street_address. The attributes that are not divisible are
called simple or atomic attributes. The value of a composite attribute is the
concatenation of the value of its constituent simple attributes.
2. Single-valued verses
multi-valued
Most of the attributes will
have only single value for a particular entity. Such attributes are called
single valued. In some cases there may be having more than one value for an
attribute of a particular entity. These attributes are called multi-valued. The
attribute age of an entity person will have only one value, while the college
degree of that person will have more than one degree. So the attribute age can
be consider as single-valued and college degree as multi-valued.
3.Stored verses derived
In
some case the attribute values can be related so that one can be derived from
the other. Consider a person as an entity. The attributes age and DateOfBirth
of person is
related. i.e. the age of a person can be
derived from the current date and
his DateOfBirth. The age attribute hence
is called Derived attribute and the DateOfBirth is called stored attribute from
where age of person calculated.
Entity set
An
entity set is a set of entities of the same type that share the same
properties, or attributes. It is represented by a set of attributes. An
attribute, as used in the E-R model can be characterized by the following attributes.
·
Simple and composite attributes
·
Single and multi-valued attributes
·
Null attributes
·
Derived attributes
A relationship is an association among several
entities. And a relationship set is a set of
relationships of the same type.
Keys
Before designing a database we should
be able to specify how entities within a given entity set and relationships
within a given relationship set are distinguished. Conceptually the individual
entities and relationships are distinct; but from a database perspective, the
difference must be expressed by their attributes. The concept of key is used to
make such distinctions.
Super key is a set of attributes that,
taken collectively, to identify uniquely an entity in the entity set. For eg: the social_security_no attribute of the entity set employee is
sufficient to distinguish one employee entity from another. Thus social_security_no is a superkey for the
entity set employee. Superkeys with minimal subset is known as the candidate
key. For eg: it is possible to combine the attributes, employ_id & employ_name form a superkey. But the social_security_no is sufficient to
distinguish the two employees. Thus social_security_no is a candidate key.
Usually primary key is used to denote the candidate key that is chosen by the
database designer to identify an entity from an entity set. A key (super,
candidate and primary) is a property of the entity set rather than the
individual entities.
Entity- Relationship (E-R) Diagram
The overall logical structure of a database can be expressed graphically
by an E-R diagram. The diagram consists
of the following major components.
·
Rectangles: represent entity set.
·
Ellipses: represent attributes.
·
Diamonds: represents relationship sets.
·
Lines: links attribute set to entity set and
entity set to relationship set.
·
Double ellipses: represent multi-valued
attributes.
·
Dashed ellipses: denote derived attributes.
For eg: Consider an E-R diagram, which
consists of two entity sets customer and loan.
A data model is
a plan for building a database. The model represents data conceptually, the way
the user sees it, rather than how computers store it. Data models focus on
required data elements and associations; most often they are expressed
graphically using
Entity-relationship diagrams. On a
more abstract level, the term is also used in describing a database's overall
structure. Mostly used data modeling
techniques are
1. Entity- Relationship model
2. Hierarchical model
3. Network model
4. Object-oriented model
1.9 Hierarchical Model
The
hierarchical data model organizes data in a tree structure. There is a
hierarchy of parent and child data segments. This structure implies that a
record can have repeating information, generally in the child data segments.
Data in a series of records have a set of field values attached to it. It
collects all the instances of a specific record together as a record type.
These record types are the equivalent of tables in the relational model, and
with the individual records being the equivalent of rows. To create links
between these record types, the hierarchical model uses Parent Child
Relationships
Hierarchical databases link records like an organization chart. A record type can be owned by only one owner. In the following example, orders are owned by only one customer. Hierarchical structures were widely used with early mainframe systems; however, they are often restrictive in linking real-world structures.
Fig 1.3
Advantages:
• Hierarchical Model is simple to
construct and operate on
• Corresponds to a number of natural
hierarchically organized domains - e.g., assemblies in manufacturing, personnel
organization in companies
• Language is simple; uses constructs
like GET, GET UNIQUE, GET NEXT, GET NEXT WITHIN PARENT etc.
Disadvantages:
• Navigational and procedural nature
of processing
• Database is visualized as a linear
arrangement of records
• Little scope for "query
optimization"
1.10 Network Model
In 1971, the Conference on Data
Systems Languages (CODASYL) formally defined the network model. The basic data
modeling construct in the network model is the set construct. A set consists of
an owner record type, a set name, and a member record type. A member record
type can have that role in more than one set, hence the multiparent concept is
supported. An owner record type can also be a member or owner in another set. In network databases, a record type can have
multiple owners. In the example below, orders are owned by both customers and
products, reflecting their natural relationship in business.
Advantages:
• Network Model is able to model
complex relationships and represents semantics of add/delete on the
relationships.
• Can handle most situations for
modeling using record types and relationship types.
• Language is navigational; uses
constructs like FIND, FIND member, FIND owner, FIND NEXT within set, GET etc.
Programmers can do optimal navigation through the database.
Disadvantages:
• Navigational and procedural nature
of processing
• Database contains a complex array of
pointers that thread through a set of records.
• Little scope for automated
"query optimization"
1.11Object-Oriented Model
Object DBMSs add database functionality
to object programming languages. They bring much more than persistent storage
of programming language objects. Object DBMSs
extend the semantics of the C++, Smalltalk
and Java object programming languages to provide full-featured database
programming capability, while retaining native language compatibility. A major
benefit of this approach is the unification of the application and database
development into a seamless data model and language environment. As a result,
applications require less code, use more natural data modeling, and code bases
are easier to maintain. Object developers can write complete database
applications with a modest
1.12 Entity relational
model (RDBMS - relational database management system)
A database based on the relational
model developed by E.F. Codd. A relational database allows the definition of
data structures, storage and retrieval operations and integrity constraints. In
such a database the data and relations between them are organized in tables. A
table is a collection of records and each record in a table contains the same
fields.
It
permits the database designer to create a consistent, logical representation of
information. Consistency is achieved by including declared constraints in the
database design, which is usually referred to as the logical schema. The theory
includes a process of database normalization whereby a design with certain
desirable properties can be selected from a set of logically equivalent
alternatives. The access plans and other implementation and operation details
are handled by the DBMS engine, and are not reflected in the logical model.
This contrasts with common practice for SQL DBMSs in which performance tuning
often requires changes to the logical model.
The
basic relational building block is the domain or data type, usually abbreviated
nowadays to type. A tuple is an unordered set of attribute values. An attribute
is an ordered pair of attribute name and type name. An attribute value is a
specific valid value for the type of the attribute. This can be either a scalar
value or a more complex type. Relational databases do not link records together physically, but the
design of the records must provide a common field, such as account number, to
allow for matching. Often, the fields used for matching are indexed in order to
speed up the process.
In
the following example, customers, orders and products are linked by comparing
data fields and/or indexes when information from more than one record type is
needed. This method is more flexible for ad hoc inquiries. Many hierarchical
and network DBMSs also provide this capability.
0 comments:
Post a Comment