THE TWELVE RULES
        for determining how relational a DBMS product is

        by

        E. F. Codd
        President
        The Relational Institute


        May 16, 1986








        Abstract

        In a database management system (DBMS) omission of a few features
        of the relational model can have seriously adverse effects on the
        usability, the economic advantage, and the integrity advantage of
        that  system.   Frequently,  DBMS  vendors  have  been  amazingly
        short-sighted in the design of their new DBMS products.  However,
        they all want to claim their products are "relational". The rules
        described  in this paper  are intended to reflect  the compelling
        advantages  to  be gained by a DBMS that is  fully supportive  of
        each  and every feature of the  relational  model.   Furthermore,
        these rules  may help users  to evaluate DBMS products (which are
        claimed to be relational) more rapidly than if they examined  the
        support for each and every feature of the relational model.   The
        rules may also help  vendors  realize  how far short of  the mark
        their DBMS products are.

        NOTE: Authorization to upload to a BBS was obtained from The
              Relational Institute by both Hugo Blasdel (who scanned
              corrected and uploaded the document) and Fabian Pascal
              (who originally obtained it from TRI).  Errors in text
              are preserved (bottom of page 5) and formating (middle
              of page 5) are preserved, although this note and fixed
              phone number below are added.

            Copyright (c) 1986  E. F. Codd / The Relational Institute
                              All rights reserved

                 This material is planned for inclusion in a book
               by E. F. Codd to be published soon by Addison-Wesley

        The Relational Institute                 Telephone:  408-268-8821
        Suite 106
        6489 Camden Avenue                           TRI Technical Report
        San Jose, CA  95120                              EFC-6 / 05-16-86
















                           CONTENTS                                PAGE

            1      Introduction to the rules  . . . . . . . . . . . . . 1
            2      Non-claims . . . . . . . . . . . . . . . . . . . . . 2

              R1     Rule 1  The information rule  . . . . . . . . 2
              R2     Rule 2  Guaranteed access . . . . . . . . . . 3
              R3     Rule 3  Systematic treatment of
                                 missing information . . . . . . . 3
              R4     Rule 4  Dynamic on-line catalog . . . . . . . 4
              R5     Rule 5  Comprehensive data sublanguage  . . . 4
              R6     Rule 6  View updating . . . . . . . . . . . . 5
              R7     Rule 7  High-level insert, update, delete . . 5
              R8     Rule 8  Physical data independence  . . . . . 6
              R9     Rule 9  Logical data independence . . . . . . 6
              R10    Rule 10 Integrity independence  . . . . . . . 6
              R11    Rule 11 Distribution independence . . . . . . 7
              R12    Rule 12 Non-subversion  . . . . . . . . . . . 8

            3      Concluding remarks . . . . . . . . . . . . . . . .   9
            4      References . . . . . . . . . . . . . . . . . . . .   9





















              Copyright (c) 1986  E. F. Codd / The Relational Institute





                               THE TWELVE RULES

         1.  Introduction

         The   twelve   rules  defined  below  were  first  published   in
         Computerworld  U.S.A.   October  14  &  21,  1985.  The  versions
         presented here  are slightly revised  to make  their  definitions
         more  precise   (not  to  change  their  meaning).    A  database
         management  system  (DBMS)  is  fully  relational   if  it  fully
         supports   each  and  every one of the twelve rules  and  all  30
         features of the relational model.

         Like  the relational model itself,  these rules  are based on the
         practical requirements of users,  database administrators (DBAs),
         application  programmers,  security officers  and their managers.
         For example,  in rules 8 through 11,   I specify,  and   require,
         four   different  types  of  independence  aimed  at   protecting
         customers'   investment   in   application   programs,   terminal
         activities, and training.  Rules 8 and 9 -- physical and  logical
         data  independence -- have been heavily discussed for many years.
         Rules  10  and  11  -- integrity  independence  and  distribution
         independence -- are aspects of the relational approach which have
         received inadequate attention to-date,  but are likely to  become
         as important as rules 8 and 9.

         These  rules are collectively based on a single foundation  rule,
         which I call Rule Zero:.

              For any system that is advertised as, or claimed to be, a
              RELATIONAL  DATABASE MANAGEMENT SYSTEM,  that system must be
              able  to  manage databases  entirely through  its relational
              capabilities as specified in the relational model.

         This  must  hold  whether  or not the system  supports  any  non-
         relational capabilities of managing data.  Any DBMS that does not
         satisfy this Rule Zero is not worth rating as a RELATIONAL DBMS.

         One  consequence  of  this  rule:  any system  claimed  to  be  a
         relational DBMS must support database insert,  update, and delete
         at  the RELATIONAL level (multiple-record-at-a-time) -- see  rule
         R7   below.   Another consequence is the necessity of  supporting
         the information rule and the guaranteed access rule -- see  rules
         Rl and R2 below.  "Multiple-record-at-a-time" includes as special
         cases  those situations in which zero or one record is retrieved,
         inserted,  updated,  or  deleted.   In other  words,  a  relation
         (R-table)  may have either zero tuples (rows)  or one tuple (row)
         and still be a valid relation (R-table). Note that although these
         are special cases, they do not receive special treatment.

         What is the danger to buyers and users of a system:

              1  that is claimed to be a RELATIONAL DBMS
         and  2  that fails on Rule Zero ?




                                    1





         Buyers  and  users  will  expect all the advantages  of  a  truly
         relational DBMS, and they will fail to get these advantages.

         2.  Non-Claims

         It  is important for the reader to realize that the author is NOT
         making any of the following claims about the 12 rules:

            1) that the rules are independent of one another;
            2) that the 12 rules are complete in any sense;
            3) that all a user needs to consider in evaluating any DBMS
               is its fidelity to the relational model or to the rules
               (for more complete evaluations, see the forthcoming Product
               Reports from The Relational Institute).

         As  time  goes on,  it may be deemed advisable to  augment  these
         rules  to make them "more complete",  but in any event they  will
         continue  to be based principally on the relational model.   This
         model is not expected to change in any radical way.  It has  been
         very  stable  for the last ten years at least.   Minor  additions
         are  expected every five years at the beginning and  midpoint  of
         each  decade.    For  an  example of such an  addition,  see  the
         references [EFC1,2] cited in connection with Rule 3:  it contains
         a  candidate extension re missing information,  one that will  be
         included  by  1990,  if  there are no  insuperable  technical  or
         semantic problems uncovered.

         Now for the 12 rules based on Rule Zero.
         Rl  The Information Rule

          Rule 1:     ALL   INFORMATION   in  a  relational  database  is
                      represented  explicitly at the logical level and in
                      exactly one way -- by values in R-tables.

          Even R-table names, column names, and domain names are represented
          as  character strings in some R-tables.  R-tables containing such
          names  are  normally part of the built-in  system  catalog.   The
          catalog  is accordingly a relational database itself -- one  that
          is  dynamic  and  active  and  represents  the  meta-data   (data
          describing the rest of the data in the system).

          The  information rule is enforced not only for user productivity,
          but also to make it a reasonably simple job for software  vendors
          to   define   additional  software  packages  (like   application
          development  aids,  expert systems,  etc.) which  interface  with
          relational DBMS  and, by definition, are well integrated with the
          DBMS  -- that  is,  these packages retrieve  information  already
          existing  in the catalog and,  as needed,  put new information in
          the catalog by the very act of using the DBMS.

          An additional  reason is to make  the  database   administrator's
          task  of maintaining the database in a state of overall integrity
          both  simpler  and  more  effective.    There  is  nothing   more
          embarrassing to a DBA than being asked if his  database  contains


                                     2





         certain  specific information,  and then replying after a  week's
         examination of the database that he does not know.

         R2  Guaranteed Access Rule

         Rule  2:  Each  and every datum  (atomic value)  in  a relational
                   database  is guaranteed  to be logically accessible  by
                   resorting to a combination of R-table name, primary key
                   value, and column name.

         Clearly, each datum in a relational database can be accessed in a
         rich  variety  (possibly thousands) of logically  distinct  ways.
         However, it is important to have at least one way (independent of
         the  specific relational database) that is guaranteed  -- because
         most  computer-oriented  concepts (such  as  scanning  successive
         addresses)  have  been deliberately omitted from  the  relational
         model.

         Note  that the guaranteed access rule represents  an  associative
         addressing  scheme  that is unique to the relational  model.   It
         does not depend at all on the usual computer-oriented addressing.
         However,  the  primary  key concept is an essential part  of  it.
         Structural  feature  S8  (see  section 3.1)  requires  each  base
         relation to have a primary key.

         R3  Systematic treatment of missing information

         Rule 3:  Indicators  (distinct from the empty character string or
                  a string of blank characters,  and distinct from zero or
                  any other number) are supported in fully relational DBMS
                  for  representing  at the logical level  the  fact  that
                  information  is  missing  (applicable  and  inapplicable
                  information) in a systematic way  -- independent of data
                  type. Besides the logical representation, the DBMS  must
                  support  manipulative  functions  for  these  indicators
                  [EFC1,2]  and these must also be independent of the data
                  type of the missing information.

         To  support database integrity,  it must be possible  to  specify
         "indicators not allowed"  for each primary key column and for any
         other columns where the DBA considers it an appropriate integrity
         constraint (for example, certain foreign key columns).

         Past  techniques  entailed defining a special value  (peculiar to
         each  column or field) to represent  missing  information.   This
         would  be  most unsystematic in a  relational  database,  because
         users  would have to employ different techniques for each  column
         or  for each domain -- a difficult task  due to the high level of
         language  in use  (and a task which,  I believe,  would lower the
         productivity  of users).   Moreover,  the use of  a special value
         selected  by  a user  from the domain  pertinent to the column is
         tantamount  to equating the semantics of the FACT that a value is
         missing  to the semantics of any value from that domain -- I call
         this error the "value misrepresentation error".



                                    3






         R4  Dynamic on-line catalog based on the relational model

         Rule  4:   The database description is represented at the logical
                    level  just  like ordinary data,  so  that  authorized
                    users  can apply the same relational language  to  its
                    interrogation as they apply to the regular data.

         One consequence of this is that each user (whether an application
         programmer  or end user) needs to learn only one data model -- an
         advantage  which non-relational systems usually do not offer (IMS
         together  with  its  dictionary requires the user  to  learn  two
         distinct  data models).   Another consequence is that  authorized
         users  can  easily extend the catalog to become  a  full-fledged,
         active,  relational data dictionary whenever the vendor fails  to
         do so.

         R5  Comprehensive Data Sublanguage Rule

         Rule 5:   A relational DBMS  (no matter  how many languages  and
                   what modes of terminal use it supports -- for example,
                   the fill-in-the-blanks mode) MUST support at least one
                   one language  (1) whose statements are expressible per
                   some well-defined syntax as character strings;     and
                   (2)  which  is comprehensive in supporting ALL  of the
                   following items:

                    1   data definition
                    2   view definition
                    3   data manipulation (interactive & by program)
                    4   integrity constraints
                    5   authorization
                    6   transaction boundaries
                        (begin, commit, and rollback)

         The relational approach  is intentionally highly dynamic  -- that
         is,  it should rarely be necessary to bring the database activity
         to a halt (in contrast to non-relational DBMS). Therefore it does
         not  make  sense  to  separate the  services  listed  above  into
         distinct languages.

         Note:   in  the  mid-seventies  ANSI-SPARC generated  a  document
         advocating  42 distinct interfaces and (potentially) 42  distinct
         languages  for database management  systems!   Fortunately,  that
         idea has apparently been abandoned.












                                    4





         R6  View Updating Rule

         Rule  6:  The DBMS includes an algorithm  at least as powerful as
                   VU-1  [see  EFC3] for determining (at  view  definition
                   time)  whether that view is.tuple-insertible and  tuple
                   deletable,   and  whether  each  of  its   columns   is
                   updatable.  It records the result of this investigation
                   in the catalog.

         Note  that  a view is theoretically updatable if there  exists  a
         time-independent algorithm for unambiguously determining a single
         series of changes to the base relations which will have as  their
         effect  precisely  the requested changes in the  view.   In  this
         regard,  "update" is  intended to include insertion and deletion,
         as well as modification.

         Suppose the definition  of a particular view  V  1) happens to be
         expressed directly in terms of base relations; or 2) its definition
         is  expanded  until it is so expressed.   This view is simple  if
         its definition in terms of base relations involves no more than:

             1)  four occurrences of operators of the class:  union, outer
                 union, set difference, intersection;
             2)  four occurrences of operators of the class:   join,
                 relational division;
             3)  twice as many occurrences of the project operator as
                 there are relations cited in the query or update;
         and 4)  twice as many occurrences of algebraic selects as there
                 are relations cited in the query or update.

         The reason that rule # 6 refers to simple views only  is that the
         general question as to whether or not a view is updatable  is NOT
         LOGICALLY  DECIDABLE  [HWB].  By limiting the class of  views  in
         rule #6 to those defined above,  the updatability  of these views
         becomes decidable -- and it should be decided by the DBMS, not by
         a user!

         R7  High-level Insert, Update, and Delete

         Rule 7:   The capability of handling a base relation or a derived
                   relation  as  a single operand applies  not only to the
                   retrieval of data  but also to the  insertion,  update,
                   and deletion of data.

         This  requirement gives the system much more scope in  optimizing
         the  efficiency  of its execution-time actions.   It  allows  the
         system  to determine which access paths to exploit to obtain  the
         most efficient  code.   It  can  also be extremely  important  in
         obtaining efficient handling of transactions across a distributed
         database  In this case, the total communication costs can  easily
         become enormous.  Hence, many companies wish to
         costs  are  saved  by avoiding the necessity  of  transmitting  a
         separate request for each record obtained from remote sites.




                                    5





        R8  Physical Data Independence

        Rule 8:    Application  programs and terminal  activities  remain
                   logically  unimpaired whenever any changes are made in
                   either storage representations or access methods.

        To  handle this,  the DBMS must support a clear,  sharp  boundary
        between the logical and semantic aspects on the one hand and  the
        physical  and performance aspects of the base tables on the other
        hand;  application  programs  must deal with the logical  aspects
        only.   Non-relational DBMS  rarely provide complete support  for
        this rule -- in fact, I know of none that do.

        R9  Logical Data Independence

        Rule 9:    Application  programs and terminal  activities  remain
                   logically   unimpaired   when   information-preserving
                   changes   of   any  kind  that  theoretically   permit
                   unimpairment are made to the base tables.

        Two examples: (1) splitting a base R-table into two tables either
        by rows using row content  or by columns  using column names,  if
        primary keys are preserved in each result; (2) combining two base
        R-tables  into  one  by  means  of  a  non-loss  join   (Stanford
        University and MIT authors now call these joins "lossless").

        To  provide  this service whenever possible,  the  DBMS  must  be
        capable  of handling inserts, updates, and deletes at least  when
        algorithm VU-1 determines that such updatability is feasible (see
        rule R6).  Rule R9 frequently permits  logical database design to
        be dynamically changed if it appears to be necessary or desirable
        -- for example, if such a change would improve performance.

        The physical and logical data independence rules permit  database
        designers  for relational DBMS  to make mistakes in their designs
        without the heavy penalties levied by non-relational DBMS.   This
        in  turn  means  that  it is much easier to get  started  with  a
        relational DBMS,  because not nearly as much performance-oriented
        planning is needed prior to "blast-off".

        R10  Integrity Independence

        Rule 10:    Integrity constraints  specific   to  a  particular
                    relational  database  must  be  definable  in   the
                    relational  data  sublanguage and storable  in  the
                    catalog (not in the application programs).

        Information  about  inadequately  identified   objects  is  NEVER
        recorded  in a relational database.   To be  more  specific,  the
        following two integrity rules apply to every relational database:

        1) Entity integrity
           No component of a primary key is allowed to have a missing
           value;



                                   6





         2) Referential integrity
            For each distinct non-missing foreign key value in a
            relational  database,  there must exist a matching primary key
            value from the same domain.

         In  addition  to the two integrity rules  (entity  integrity  and
         referential integrity), which apply to the base relations of each
         and  every RELATIONAL DATABASE, there is a clear need to be  able
         to  specify  additional integrity constraints  reflecting  either
         business   policies  or  government  regulations.   Assume    the
         relational model is faithfully reflected.  Then, these additional
         integrity constraints are defined in terms of the high level data
         sublanguage,  and the definitions are stored in the catalog  (not
         in the application programs).

         If,  as sometimes happens, either business policies or government
         regulations  change,  it will probably become necessary to change
         the integrity constraints.  Normally, this can be accomplished in
         a  fully  relational  DBMS  by changing  one  or  more  integrity
         statements  stored in the catalog.   In many cases,  neither  the
         application  programs nor the terminal activities  are  logically
         impaired.   Non-relational  DBMS rarely support this rule as part
         of  the DBMS engine (where it belongs) -- instead they depend  on
         a dictionary  package  (which may or may not be present  and  can
         readily be bypassed).

         Rll  Distribution Independence

         Rule 11:    A relational DBMS has distribution independence.

         By  distribution independence,  I mean that the DBMS has  a  data
         sublanguage  which  enables  application  programs  and  terminal
         activities to remain logically unimpaired:

              1)  when data distribution is first introduced (if the  DBMS
                  originally installed manages non-distributed data only);

         and  2)  when  data  is  re-distributed   (if  the  DBMS  manages
                  distributed data).

         Note that this definition is carefully worded so that even a non-
         distributed DBMS can fully support rule 11.  SQL/DS, DB2, Oracle,
         INGRES,  and SUPRA fully support this rule,  although each one is
         either  non-distributed  or supports distributed  databases  only
         partially in its present release.  This has been demonstrated  as
         follows: (1)  SQL programs written to operate on  non-distributed
         data  (using System R) run correctly  on distributed versions  of
         that data using System R* -- the IBM San Jose Research prototype;
         (2) the distributed INGRES project has shown  the same capability
         for the QUEL language of INGRES.

         It  is  important  to  distinguish  distributed  processing  from
         distributed data.   In the former case,  work (i.e., programs) is
         transmitted to the data;  in the latter case, data is transmitted
         to  the  work.   Many  non-relational  DBMS  support  distributed


                                    7





         processing,  but  not distributed data.   The only  systems  that
         support  the concept of making all the distributed data aopear to
         be local are relational DBMS  (these are prototypes right now).

         In  the  case of a distributed relational DBMS,  a single  trans-
         action  may  straddle several remote sites.   Such  straddling is
         managed  entirely  under  the covers -- the system  may  have  to
         execute  recovery at multiple sites.   Each program  or  terminal
         activity  treats the totality of data as if it were all local  to
         the  site where the application program or terminal  activity  is
         being executed.

         A  fully  relational  DBMS  that  does  not  support  distributed
         databases  has  the capability of being extended to provide  that
         support,  while   leaving  application  programs   and   terminal
         activities  logically unimpaired -- both at the time  of  initial
         distribution and whenever later re-distribution is  made.   There
         are 4 important reasons why relational DBMS enjoy this advantage:
         decomposition  flexibility  in deciding how to deploy  the  data,
         recomposition  power  of the relational operators when  combining
         the results  of sub-transactions  executed  at  different  sites,
         economy  of transmission resulting from the fact that there  need
         not  be a request message  sent  for each record  to be retrieved
         from  any remote site,  analyzability of intent  (due to the very
         high level of relational languages) for vastly improved
         optimization of execution.

         R12  Non-Subversion Rule

         Rule 12:   If a relational system has a low level (single-record-
                    at-a-time) language,  that low level cannot be used to
                    subvert or bypass the integrity rules and  constraints
                    expressed  in  the higher  level  relational  language
                    (multiple-records-at-a-time).

         In the relational approach,   preservation of integrity  is  made
         independent  of  logical  data  structure  to  achieve  integrity
         independence (see rule R10).

         Remark:   Rulc  R12   is  extremely difficult  for a  "relational
         pretender"  like Cullinet's IDMS/R and ADR's Datacom/DB  to obey,
         because  such  a system already supports an interface  below  the
         relational integrity constraint interface.   Even if the DBMS had
         an  authorization  feature   which   would  permit  only  certain
         specified programs to use the low level language,  that would not
         be enough  (and neither DBMS goes even this far, at this time).

         In fact,  in talking with vendors of "relational pretenders",  it
         is quite obvious that they had not given this matter a thought !








                                    8





         3.  Conclusion

         As stressed in section 2 (Non-Claims),  there is no contention in
         this article  that evaluating a candidate DBMS with regard to its
         degree of fidelity to the relational model  is all  that needs to
         be done,  when trying  to determine  which is the best DBMS for a
         particular company or portion thereof.  However,  I do claim that
         this fidelity evaluation  is  today  an essential,  and  possibly
         crucial, part of the choosing process.  There are many clear (and
         obvious)  relationships  between user needs in  general  and  the
         twelve rules cited above.  This applies whether we are discussing
         productivity  of  application  programmers,  direct access to the
         databases by end users, ease of database design and installation,
         vastly  improved control of database integrity,  or protection of
         the firm's investment in application programs and in the training
         of their employees.


         4.  References


         [EFCl]   E. F. Codd,   "Missing Information  (Applicable and
                  Inapplicable) in Relational Databases", Report EFC-4,
                  The Relational Institute, San Jose, CA,  Feb 21, 1986

         [EFC2]   E. F. Codd, "More Commentary on Missing Information in
                  Relational Databases",  Report EFC-14,  The Relational
                  Institute, San Jose, CA,  January 12, 1987

         [EFC3]   E. F. Codd, "View Updatability: Algorithm VU-1",
                  Report EFC-15, The Relational Institute, San Jose, CA
                  January 22, 1987

         [HWB]    H. W. Buff, "The View Update Problem is Undecidable",
                  private communication from the Swiss Reinsurance Co.,
                  Zurich, Switzerland  August 4, 1986

















                                                   (revised June 2, 1987)



                                    9

