home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
The World of Computer Software
/
World_Of_Computer_Software-02-387-Vol-3of3.iso
/
c
/
coddrule.zip
/
CODDRULE.TXT
next >
Wrap
Text File
|
1989-12-18
|
30KB
|
687 lines
THE TWELVE RULES
for determining how relational a DBMS product is
by
E. F. Codd
President
The Relational Institute
May 16, 1986
Abstract
In a database management system (DBMS) omission of a few features
of the relational model can have seriously adverse effects on the
usability, the economic advantage, and the integrity advantage of
that system. Frequently, DBMS vendors have been amazingly
short-sighted in the design of their new DBMS products. However,
they all want to claim their products are "relational". The rules
described in this paper are intended to reflect the compelling
advantages to be gained by a DBMS that is fully supportive of
each and every feature of the relational model. Furthermore,
these rules may help users to evaluate DBMS products (which are
claimed to be relational) more rapidly than if they examined the
support for each and every feature of the relational model. The
rules may also help vendors realize how far short of the mark
their DBMS products are.
NOTE: Authorization to upload to a BBS was obtained from The
Relational Institute by both Hugo Blasdel (who scanned
corrected and uploaded the document) and Fabian Pascal
(who originally obtained it from TRI). Errors in text
are preserved (bottom of page 5) and formating (middle
of page 5) are preserved, although this note and fixed
phone number below are added.
Copyright (c) 1986 E. F. Codd / The Relational Institute
All rights reserved
This material is planned for inclusion in a book
by E. F. Codd to be published soon by Addison-Wesley
The Relational Institute Telephone: 408-268-8821
Suite 106
6489 Camden Avenue TRI Technical Report
San Jose, CA 95120 EFC-6 / 05-16-86
CONTENTS PAGE
1 Introduction to the rules . . . . . . . . . . . . . 1
2 Non-claims . . . . . . . . . . . . . . . . . . . . . 2
R1 Rule 1 The information rule . . . . . . . . 2
R2 Rule 2 Guaranteed access . . . . . . . . . . 3
R3 Rule 3 Systematic treatment of
missing information . . . . . . . 3
R4 Rule 4 Dynamic on-line catalog . . . . . . . 4
R5 Rule 5 Comprehensive data sublanguage . . . 4
R6 Rule 6 View updating . . . . . . . . . . . . 5
R7 Rule 7 High-level insert, update, delete . . 5
R8 Rule 8 Physical data independence . . . . . 6
R9 Rule 9 Logical data independence . . . . . . 6
R10 Rule 10 Integrity independence . . . . . . . 6
R11 Rule 11 Distribution independence . . . . . . 7
R12 Rule 12 Non-subversion . . . . . . . . . . . 8
3 Concluding remarks . . . . . . . . . . . . . . . . 9
4 References . . . . . . . . . . . . . . . . . . . . 9
Copyright (c) 1986 E. F. Codd / The Relational Institute
THE TWELVE RULES
1. Introduction
The twelve rules defined below were first published in
Computerworld U.S.A. October 14 & 21, 1985. The versions
presented here are slightly revised to make their definitions
more precise (not to change their meaning). A database
management system (DBMS) is fully relational if it fully
supports each and every one of the twelve rules and all 30
features of the relational model.
Like the relational model itself, these rules are based on the
practical requirements of users, database administrators (DBAs),
application programmers, security officers and their managers.
For example, in rules 8 through 11, I specify, and require,
four different types of independence aimed at protecting
customers' investment in application programs, terminal
activities, and training. Rules 8 and 9 -- physical and logical
data independence -- have been heavily discussed for many years.
Rules 10 and 11 -- integrity independence and distribution
independence -- are aspects of the relational approach which have
received inadequate attention to-date, but are likely to become
as important as rules 8 and 9.
These rules are collectively based on a single foundation rule,
which I call Rule Zero:.
For any system that is advertised as, or claimed to be, a
RELATIONAL DATABASE MANAGEMENT SYSTEM, that system must be
able to manage databases entirely through its relational
capabilities as specified in the relational model.
This must hold whether or not the system supports any non-
relational capabilities of managing data. Any DBMS that does not
satisfy this Rule Zero is not worth rating as a RELATIONAL DBMS.
One consequence of this rule: any system claimed to be a
relational DBMS must support database insert, update, and delete
at the RELATIONAL level (multiple-record-at-a-time) -- see rule
R7 below. Another consequence is the necessity of supporting
the information rule and the guaranteed access rule -- see rules
Rl and R2 below. "Multiple-record-at-a-time" includes as special
cases those situations in which zero or one record is retrieved,
inserted, updated, or deleted. In other words, a relation
(R-table) may have either zero tuples (rows) or one tuple (row)
and still be a valid relation (R-table). Note that although these
are special cases, they do not receive special treatment.
What is the danger to buyers and users of a system:
1 that is claimed to be a RELATIONAL DBMS
and 2 that fails on Rule Zero ?
1
Buyers and users will expect all the advantages of a truly
relational DBMS, and they will fail to get these advantages.
2. Non-Claims
It is important for the reader to realize that the author is NOT
making any of the following claims about the 12 rules:
1) that the rules are independent of one another;
2) that the 12 rules are complete in any sense;
3) that all a user needs to consider in evaluating any DBMS
is its fidelity to the relational model or to the rules
(for more complete evaluations, see the forthcoming Product
Reports from The Relational Institute).
As time goes on, it may be deemed advisable to augment these
rules to make them "more complete", but in any event they will
continue to be based principally on the relational model. This
model is not expected to change in any radical way. It has been
very stable for the last ten years at least. Minor additions
are expected every five years at the beginning and midpoint of
each decade. For an example of such an addition, see the
references [EFC1,2] cited in connection with Rule 3: it contains
a candidate extension re missing information, one that will be
included by 1990, if there are no insuperable technical or
semantic problems uncovered.
Now for the 12 rules based on Rule Zero.
Rl The Information Rule
Rule 1: ALL INFORMATION in a relational database is
represented explicitly at the logical level and in
exactly one way -- by values in R-tables.
Even R-table names, column names, and domain names are represented
as character strings in some R-tables. R-tables containing such
names are normally part of the built-in system catalog. The
catalog is accordingly a relational database itself -- one that
is dynamic and active and represents the meta-data (data
describing the rest of the data in the system).
The information rule is enforced not only for user productivity,
but also to make it a reasonably simple job for software vendors
to define additional software packages (like application
development aids, expert systems, etc.) which interface with
relational DBMS and, by definition, are well integrated with the
DBMS -- that is, these packages retrieve information already
existing in the catalog and, as needed, put new information in
the catalog by the very act of using the DBMS.
An additional reason is to make the database administrator's
task of maintaining the database in a state of overall integrity
both simpler and more effective. There is nothing more
embarrassing to a DBA than being asked if his database contains
2
certain specific information, and then replying after a week's
examination of the database that he does not know.
R2 Guaranteed Access Rule
Rule 2: Each and every datum (atomic value) in a relational
database is guaranteed to be logically accessible by
resorting to a combination of R-table name, primary key
value, and column name.
Clearly, each datum in a relational database can be accessed in a
rich variety (possibly thousands) of logically distinct ways.
However, it is important to have at least one way (independent of
the specific relational database) that is guaranteed -- because
most computer-oriented concepts (such as scanning successive
addresses) have been deliberately omitted from the relational
model.
Note that the guaranteed access rule represents an associative
addressing scheme that is unique to the relational model. It
does not depend at all on the usual computer-oriented addressing.
However, the primary key concept is an essential part of it.
Structural feature S8 (see section 3.1) requires each base
relation to have a primary key.
R3 Systematic treatment of missing information
Rule 3: Indicators (distinct from the empty character string or
a string of blank characters, and distinct from zero or
any other number) are supported in fully relational DBMS
for representing at the logical level the fact that
information is missing (applicable and inapplicable
information) in a systematic way -- independent of data
type. Besides the logical representation, the DBMS must
support manipulative functions for these indicators
[EFC1,2] and these must also be independent of the data
type of the missing information.
To support database integrity, it must be possible to specify
"indicators not allowed" for each primary key column and for any
other columns where the DBA considers it an appropriate integrity
constraint (for example, certain foreign key columns).
Past techniques entailed defining a special value (peculiar to
each column or field) to represent missing information. This
would be most unsystematic in a relational database, because
users would have to employ different techniques for each column
or for each domain -- a difficult task due to the high level of
language in use (and a task which, I believe, would lower the
productivity of users). Moreover, the use of a special value
selected by a user from the domain pertinent to the column is
tantamount to equating the semantics of the FACT that a value is
missing to the semantics of any value from that domain -- I call
this error the "value misrepresentation error".
3
R4 Dynamic on-line catalog based on the relational model
Rule 4: The database description is represented at the logical
level just like ordinary data, so that authorized
users can apply the same relational language to its
interrogation as they apply to the regular data.
One consequence of this is that each user (whether an application
programmer or end user) needs to learn only one data model -- an
advantage which non-relational systems usually do not offer (IMS
together with its dictionary requires the user to learn two
distinct data models). Another consequence is that authorized
users can easily extend the catalog to become a full-fledged,
active, relational data dictionary whenever the vendor fails to
do so.
R5 Comprehensive Data Sublanguage Rule
Rule 5: A relational DBMS (no matter how many languages and
what modes of terminal use it supports -- for example,
the fill-in-the-blanks mode) MUST support at least one
one language (1) whose statements are expressible per
some well-defined syntax as character strings; and
(2) which is comprehensive in supporting ALL of the
following items:
1 data definition
2 view definition
3 data manipulation (interactive & by program)
4 integrity constraints
5 authorization
6 transaction boundaries
(begin, commit, and rollback)
The relational approach is intentionally highly dynamic -- that
is, it should rarely be necessary to bring the database activity
to a halt (in contrast to non-relational DBMS). Therefore it does
not make sense to separate the services listed above into
distinct languages.
Note: in the mid-seventies ANSI-SPARC generated a document
advocating 42 distinct interfaces and (potentially) 42 distinct
languages for database management systems! Fortunately, that
idea has apparently been abandoned.
4
R6 View Updating Rule
Rule 6: The DBMS includes an algorithm at least as powerful as
VU-1 [see EFC3] for determining (at view definition
time) whether that view is.tuple-insertible and tuple
deletable, and whether each of its columns is
updatable. It records the result of this investigation
in the catalog.
Note that a view is theoretically updatable if there exists a
time-independent algorithm for unambiguously determining a single
series of changes to the base relations which will have as their
effect precisely the requested changes in the view. In this
regard, "update" is intended to include insertion and deletion,
as well as modification.
Suppose the definition of a particular view V 1) happens to be
expressed directly in terms of base relations; or 2) its definition
is expanded until it is so expressed. This view is simple if
its definition in terms of base relations involves no more than:
1) four occurrences of operators of the class: union, outer
union, set difference, intersection;
2) four occurrences of operators of the class: join,
relational division;
3) twice as many occurrences of the project operator as
there are relations cited in the query or update;
and 4) twice as many occurrences of algebraic selects as there
are relations cited in the query or update.
The reason that rule # 6 refers to simple views only is that the
general question as to whether or not a view is updatable is NOT
LOGICALLY DECIDABLE [HWB]. By limiting the class of views in
rule #6 to those defined above, the updatability of these views
becomes decidable -- and it should be decided by the DBMS, not by
a user!
R7 High-level Insert, Update, and Delete
Rule 7: The capability of handling a base relation or a derived
relation as a single operand applies not only to the
retrieval of data but also to the insertion, update,
and deletion of data.
This requirement gives the system much more scope in optimizing
the efficiency of its execution-time actions. It allows the
system to determine which access paths to exploit to obtain the
most efficient code. It can also be extremely important in
obtaining efficient handling of transactions across a distributed
database In this case, the total communication costs can easily
become enormous. Hence, many companies wish to
costs are saved by avoiding the necessity of transmitting a
separate request for each record obtained from remote sites.
5
R8 Physical Data Independence
Rule 8: Application programs and terminal activities remain
logically unimpaired whenever any changes are made in
either storage representations or access methods.
To handle this, the DBMS must support a clear, sharp boundary
between the logical and semantic aspects on the one hand and the
physical and performance aspects of the base tables on the other
hand; application programs must deal with the logical aspects
only. Non-relational DBMS rarely provide complete support for
this rule -- in fact, I know of none that do.
R9 Logical Data Independence
Rule 9: Application programs and terminal activities remain
logically unimpaired when information-preserving
changes of any kind that theoretically permit
unimpairment are made to the base tables.
Two examples: (1) splitting a base R-table into two tables either
by rows using row content or by columns using column names, if
primary keys are preserved in each result; (2) combining two base
R-tables into one by means of a non-loss join (Stanford
University and MIT authors now call these joins "lossless").
To provide this service whenever possible, the DBMS must be
capable of handling inserts, updates, and deletes at least when
algorithm VU-1 determines that such updatability is feasible (see
rule R6). Rule R9 frequently permits logical database design to
be dynamically changed if it appears to be necessary or desirable
-- for example, if such a change would improve performance.
The physical and logical data independence rules permit database
designers for relational DBMS to make mistakes in their designs
without the heavy penalties levied by non-relational DBMS. This
in turn means that it is much easier to get started with a
relational DBMS, because not nearly as much performance-oriented
planning is needed prior to "blast-off".
R10 Integrity Independence
Rule 10: Integrity constraints specific to a particular
relational database must be definable in the
relational data sublanguage and storable in the
catalog (not in the application programs).
Information about inadequately identified objects is NEVER
recorded in a relational database. To be more specific, the
following two integrity rules apply to every relational database:
1) Entity integrity
No component of a primary key is allowed to have a missing
value;
6
2) Referential integrity
For each distinct non-missing foreign key value in a
relational database, there must exist a matching primary key
value from the same domain.
In addition to the two integrity rules (entity integrity and
referential integrity), which apply to the base relations of each
and every RELATIONAL DATABASE, there is a clear need to be able
to specify additional integrity constraints reflecting either
business policies or government regulations. Assume the
relational model is faithfully reflected. Then, these additional
integrity constraints are defined in terms of the high level data
sublanguage, and the definitions are stored in the catalog (not
in the application programs).
If, as sometimes happens, either business policies or government
regulations change, it will probably become necessary to change
the integrity constraints. Normally, this can be accomplished in
a fully relational DBMS by changing one or more integrity
statements stored in the catalog. In many cases, neither the
application programs nor the terminal activities are logically
impaired. Non-relational DBMS rarely support this rule as part
of the DBMS engine (where it belongs) -- instead they depend on
a dictionary package (which may or may not be present and can
readily be bypassed).
Rll Distribution Independence
Rule 11: A relational DBMS has distribution independence.
By distribution independence, I mean that the DBMS has a data
sublanguage which enables application programs and terminal
activities to remain logically unimpaired:
1) when data distribution is first introduced (if the DBMS
originally installed manages non-distributed data only);
and 2) when data is re-distributed (if the DBMS manages
distributed data).
Note that this definition is carefully worded so that even a non-
distributed DBMS can fully support rule 11. SQL/DS, DB2, Oracle,
INGRES, and SUPRA fully support this rule, although each one is
either non-distributed or supports distributed databases only
partially in its present release. This has been demonstrated as
follows: (1) SQL programs written to operate on non-distributed
data (using System R) run correctly on distributed versions of
that data using System R* -- the IBM San Jose Research prototype;
(2) the distributed INGRES project has shown the same capability
for the QUEL language of INGRES.
It is important to distinguish distributed processing from
distributed data. In the former case, work (i.e., programs) is
transmitted to the data; in the latter case, data is transmitted
to the work. Many non-relational DBMS support distributed
7
processing, but not distributed data. The only systems that
support the concept of making all the distributed data aopear to
be local are relational DBMS (these are prototypes right now).
In the case of a distributed relational DBMS, a single trans-
action may straddle several remote sites. Such straddling is
managed entirely under the covers -- the system may have to
execute recovery at multiple sites. Each program or terminal
activity treats the totality of data as if it were all local to
the site where the application program or terminal activity is
being executed.
A fully relational DBMS that does not support distributed
databases has the capability of being extended to provide that
support, while leaving application programs and terminal
activities logically unimpaired -- both at the time of initial
distribution and whenever later re-distribution is made. There
are 4 important reasons why relational DBMS enjoy this advantage:
decomposition flexibility in deciding how to deploy the data,
recomposition power of the relational operators when combining
the results of sub-transactions executed at different sites,
economy of transmission resulting from the fact that there need
not be a request message sent for each record to be retrieved
from any remote site, analyzability of intent (due to the very
high level of relational languages) for vastly improved
optimization of execution.
R12 Non-Subversion Rule
Rule 12: If a relational system has a low level (single-record-
at-a-time) language, that low level cannot be used to
subvert or bypass the integrity rules and constraints
expressed in the higher level relational language
(multiple-records-at-a-time).
In the relational approach, preservation of integrity is made
independent of logical data structure to achieve integrity
independence (see rule R10).
Remark: Rulc R12 is extremely difficult for a "relational
pretender" like Cullinet's IDMS/R and ADR's Datacom/DB to obey,
because such a system already supports an interface below the
relational integrity constraint interface. Even if the DBMS had
an authorization feature which would permit only certain
specified programs to use the low level language, that would not
be enough (and neither DBMS goes even this far, at this time).
In fact, in talking with vendors of "relational pretenders", it
is quite obvious that they had not given this matter a thought !
8
3. Conclusion
As stressed in section 2 (Non-Claims), there is no contention in
this article that evaluating a candidate DBMS with regard to its
degree of fidelity to the relational model is all that needs to
be done, when trying to determine which is the best DBMS for a
particular company or portion thereof. However, I do claim that
this fidelity evaluation is today an essential, and possibly
crucial, part of the choosing process. There are many clear (and
obvious) relationships between user needs in general and the
twelve rules cited above. This applies whether we are discussing
productivity of application programmers, direct access to the
databases by end users, ease of database design and installation,
vastly improved control of database integrity, or protection of
the firm's investment in application programs and in the training
of their employees.
4. References
[EFCl] E. F. Codd, "Missing Information (Applicable and
Inapplicable) in Relational Databases", Report EFC-4,
The Relational Institute, San Jose, CA, Feb 21, 1986
[EFC2] E. F. Codd, "More Commentary on Missing Information in
Relational Databases", Report EFC-14, The Relational
Institute, San Jose, CA, January 12, 1987
[EFC3] E. F. Codd, "View Updatability: Algorithm VU-1",
Report EFC-15, The Relational Institute, San Jose, CA
January 22, 1987
[HWB] H. W. Buff, "The View Update Problem is Undecidable",
private communication from the Swiss Reinsurance Co.,
Zurich, Switzerland August 4, 1986
(revised June 2, 1987)
9