Text File | 1992-08-27 | 62.1 KB | 1,479 lines |
- .nr tp 12
- .nr sp 12
- .nr pp 11
- .nr fp 10
- .sz \n(pp
- .fo ''%''
- .(l C
- .sz \n(sp
- .b
- FUTURE TRENDS IN DATA BASE SYSTEMS
- .sz \n(pp
- .sp 3
- .i
- Michael Stonebraker
- .sp
- Department of Electrical Engineering
- and Computer Sciences
- University of California
- Berkeley, CA 94720
- .sp
- .r
- .)l
- .sp 2
- .ce
- .uh Abstract
- .pp
- This paper discusses the likely evolution of commercial
- data managers
- over the next several years.
- Topics to be covered include:
- .(l
- Why SQL has become an intergalactic standard.
- Who will benefit from SQL standardization.
- Why the current SQL standard has no chance of lasting.
- Why all data base systems will be distributed soon.
- What new technologies are likely to be commercialized.
- Why vendor independence may be achievable.
- .)l
- The objective of this paper is to present the author's vision of
- the future. As with all papers of this sort, this vision is
- likely to be controversial. Moreover, the reader will detect
- many of the author's biases and is advised to react
- with the appropriate discounting.
- .sh 1 "INTRODUCTION"
- .pp
- This paper is written from the perspective of a researcher who
- has had some opportunities to observe the commercial marketplace
- over the last several years. From this exposure
- I would like to comment on some of the current trends in this
- marketplace. In addition, I would also like to
- speculate on some of the likely trends
- in the marketplace over the next several years.
- .pp
- Due to the position of IBM, the importance of SQL in this
- evolution cannot be discounted.
- Others have pointed out the
- numerous serious flaws in SQL [DATE85]. Consequently, this paper
- will not discuss the technical problems in the language; rather,
- it will focus on the impact of SQL standardization. It will
- briefly discuss why the standard came about. However, more
- importantly, it will make a case that very few organizations
- will benefit directly from the standardization effort.
- Consequently, although considerable
- effort is being spent to construct a standard, organizations
- may not reap the benefits which they anticipate.
- .pp
- Then, the paper will turn to the current collection of prototype data base
- systems that are being constructed in research labs around the
- world. In particular, the characteristics
- that make these systems noticeably better than current
- commercial systems are identified. Unless some dramatic slowdown
- in technology transfer takes place, these ideas will quickly
- move from prototypes into commercial systems. The paper
- then argues that this movement of new features will
- spell the doom of a standardized version
- of SQL.
- .pp
- The paper then considers
- important technological trends. The most significant one
- appears to be distributed data bases, and
- the paper turns to this phenomenon and explains why
- all commercial systems are likely to become distributed data managers.
- It also comments on what important problems remain to be solved
- to facilitate ``industrial strength'' distributed data base systems.
- .pp
- In addition, I will comment on other technological and research
- trends which are likely to be significant in future data base
- managers. These comments are in the areas of data base machines, high
- transaction rate systems, main memory data base systems, and
- new storage devices.
- .pp
- Lastly, the paper will address a very serious problem with which
- most users of data base systems struggle: coping with
- ``the sins of the past,''
- i.e. a large amount of application code
- written in COBOL or other third generation languages
- which accesses previous generation
- data managers (such as IMS and other ``tired technology'' systems
- as well as ``home-brew'' data managers).
- This accumulated baggage from the past is usually an impediment to
- taking advantage of future hardware and software possibilities.
- Consequently, the paper closes with a step-by-step procedure
- by which any user can migrate over a period of years into
- an environment where he is not constrained to the iron
- of any particular
- hardware vendor.
- At the end of this paper, I will revisit the issue of standardization
- in light of the proposed migration path and indicate what sort
- of standardization activity might assist this process.
- .sh 1 "WHY SQL"
- .pp
- About 1984 the tom-toms started beating very loudly for SQL.
- The message was conveyed first by hardware vendors (iron mongers) in search of
- a data manager. In brief the message said ``IBM will make a big deal
- of DB 2 and SQL. I want to be compatible with IBM.'' A similar
- message was conveyed by so-called value-added resellers (VARS) who
- said ``I want
- application code that I write to run both on your data manager
- and on DB 2''.
- Discussions with VARS or iron mongers concerning
- .b exactly
- what they meant by
- SQL and exactly what they wanted in terms of compatibility usually
- evoked
- an answer of ``I don't know''. Hence the early tom-toms were being
- beaten by people who were not exactly sure of what they wanted.
- .pp
- Later, the tom-tom pounding was picked up by representatives of
- large users of data base services. Usually, the message they
- delivered was:
-
- .ll -4
- .in +4
- ``I need to run my applications on IBM iron and
- on the iron of vendors X, Y, and Z. I plan to move to DB 2
- as my IBM system and I want to ensure that the DB 2 applications
- I write can be moved to the iron of these other vendors. SQL
- is the mechanism that will allow me to achieve this objective.''
- .in -4
- .ll +4
-
- The vendors of commercial data managers are not stupid. They listen
- to the tom-toms and react appropriately. Consequently, all
- vendors of data base systems have put in place
- plans to support SQL.
- Moreover, all other query languages (e.g. QUEL, Datatrieve, etc.),
- regardless of
- their intellectual appeal, will become ``sunset''
- interfaces, i.e. they are likely to slowly fade away and
- become a thing of the
- past. I wish to make two other points in a bit more detail.
- .pp
- First, there is less interest in standardized SQL outside
- the USA. In fact, offshore DBMS users seem
- .b much
- more inclined to use fourth generation languages and thereby are
- less sensitive to the SQL issue. This point is further discussed in
- the next section.
- .pp
- A second point is that data base system vendors were
- immediately divided into two camps: those
- that already had SQL and those that had to spend a large number
- of man-years to retrofit SQL into their systems. Clearly, this
- presented a significant advantage to vendors in the first category, and
- helped reshape the competitive positions of various DBMS suppliers.
- In addition, one interesting measure of vendor responsiveness is the
- date of SQL introduction by vendors in category 2. Responsive
- vendors had SQL in 1986; others followed later.
- .sh 1 "WHO WILL BENEFIT FROM STANDARD SQL"
- .pp
- We turn first to a definition of three possible levels of SQL
- standardization that might make sense and indicate the level
- at which ANSI activity has taken place. Then, we consider
- the classes of users who might benefit from the current ANSI
- standardization.
- .sh 2 "Levels of Standardization"
- .pp
- There are three possible ways of interpreting SQL:
- .(l
- 1) SQL, the data definition language
- 2) SQL, the query language
- 3) SQL, the embedding in a host language
- .)l
- Using the first interpretation, one would standardize
- CREATE, DROP, ALTER and any other commands
- that involve storage management and schema creation or modification.
- This portion of SQL is used by data base administrators (DBAs)
- and standardization of SQL in this area might benefit this class
- of persons.
- .pp
- Using the second interpretation, one would standardize SQL, the query
- language. This would entail adding SELECT, UPDATE, INSERT, and DELETE
- to the list of standard commands. In this way an end user of standard SQL
- could expect his SQL commands to run on any DBMS supporting
- the standard.
- .pp
- The third interpretation would standardize SQL as it is executed from a
- host language. This interface includes the DECLARE CURSOR, OPEN CURSOR,
- FETCH, UPDATE, and CLOSE CURSOR commands. In this way, a programmer
- could expect his host language program to work across multiple
- DBMSs that adhered to the standard.
- .pp
- Loosely speaking, we can call these three levels:
- .(l
- level 1: the DBA level
- level 2: the end user level
- level 3: the programmer level
- .)l
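The level 3 cursor paradigm cannot be run directly here, since it lives inside a host-language precompiler, but its declare-open-fetch-close shape can be sketched with Python's sqlite3 module. The emp table, its columns, and its contents are invented purely for illustration.

```python
import sqlite3

# A sketch of the level 3 "cursor paradigm" using Python's sqlite3 module.
# The emp table and its contents are hypothetical illustrations.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [("smith", 1000), ("jones", 2000), ("adams", 1500)])

# DECLARE CURSOR + OPEN CURSOR: prepare and execute the query
cur = conn.execute("SELECT name, salary FROM emp WHERE salary > ?", (1200,))

# FETCH: retrieve qualifying rows one at a time into host variables
rows = []
while True:
    row = cur.fetchone()
    if row is None:
        break
    rows.append(row)

# CLOSE CURSOR
cur.close()
print(rows)  # the two employees earning more than 1200
```

The point of the sketch is only the control flow: the host program opens a cursor over a query result and pulls rows one at a time, which is exactly the interface the level 3 standard fixes.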
- .pp
- It should be clearly noted that the ongoing ANSI standardization
- effort is at level 3. However,
- vendors often mean something else by standard SQL.
- For example, Sybase has chosen only to implement level 2 SQL, while
- INGRES, Oracle and DB 2 all implement level 3.
- Consequently, the purchaser of a data base system
- should carefully inquire as to what ``SQL support''
- really means when he is contemplating an SQL-based data manager.
- .pp
- The last point to note is that level 3 ANSI SQL, DB 2 and SQL/DS
- are
- .b all
- slightly different versions of SQL.
- Hence, the concept ``standard SQL'' must be carefully
- tempered to reflect the fact that
- all level 3 SQL systems are different in at least minor ways.
- This corresponds closely to the current UNIX marketplace where the
- UNIXes offered by various vendors also differ
- in minor ways.
- .sh 2 "Who Will Benefit From SQL Standardization"
- .sh 3 "Introduction"
- .pp
- As mentioned earlier, ANSI has standardized a level 3 SQL interface.
- Such
- standardization might be of benefit to:
- .(l
- data base administrators
- end users
- application programmers
- vendors of 4th generation languages
- vendors of distributed data base systems
- .)l
- In the next several subsections we indicate which of these groups
- are likely to benefit from SQL standardization.
- .sh 3 "Data Base Administrators"
- .pp
- Clearly, a level 3 standard includes a level 1 standard as a subset.
- Consequently, a DBA who garners experience with schema definition
- on one data manager will be able to leverage this experience
- when designing data bases for a second DBMS. Hence, a DBA should
- benefit from the ANSI standardization effort. However, there are
- several caveats that must be noted.
- .pp
- First, most relational DBMSs have nearly the same collection of level
- 1 capabilities. Hence, except for minor syntactic variations, level
- 1 was already effectively standardized
- and there was no need for
- ANSI machinery in this area.
- .pp
- Second, differences exist in the storage of data
- dictionary information (the system catalogs).
- A DBA
- usually wishes to query the system
- catalogs to retrieve schema information.
- The current ANSI standard does not
- address this aspect, and each standard SQL system will have a
- different representation for the dictionary.
- Lastly, differences exist in the exact form of indexes
- and the view support facilities, which may influence the details
- of data base design.
- The current SQL standard does not address these differences, and they
- limit the leverage a DBA can expect.
- .pp
- In summary, all current relational systems are standard in that they
- allow a user to construct and index relations consisting of
- named columns, usually with
- nearly the same syntax.
- Consequently, data base design methodologies
- appropriate to one system are nearly
- guaranteed to be appropriate for other systems.
- At this level, current relational systems are
- already standard, and nothing additional
- need be done.
- .pp
- There are also differences
- between the various systems in the areas of types of indices, storage
- of system catalogs, and view support.
- These aspects
- are not yet addressed by the ANSI standardization effort.
- As a result, I don't perceive that DBAs will benefit greatly from the
- ANSI effort, relative to what they will automatically gain
- just by using relational systems.
- .sh 3 "End Users"
- .pp
- Since the ANSI standardization effort includes a level 2 SQL
- facility as a subset, one could claim that end users will benefit
- because they can learn the SQL for one data base system and then be
- able to transfer that knowledge to other standard data base systems. However,
- this claim is seriously flawed.
- .pp
- First, end users are not going to use SQL. Human factors studies
- and early usage of relational systems have shown clearly that end users
- will use customized interfaces appropriate to their application,
- usually of the ``fill in the form'' variety [ROWE85]. Such customized
- interfaces will be written by programmers. Consequently, end users
- will not benefit from SQL standardization because they won't use
- the language.
- .pp
- Even if end users
- .b did
- use SQL, they are still subject to widely
- divergent presentation services. For example, EASE/SQL from
- ORACLE is very different from IBM's QMF, yet both allow
- a human to interactively construct and execute SQL commands.
- These differences will limit the leverage obtainable.
- .sh 3 "Programmers"
- .pp
- One could argue that programmers will benefit from standardization of
- the level 3 SQL interface because the programs that they write for
- one SQL system will run on another standard DBMS. Moreover,
- once they learn the define-open-fetch cursor paradigm for one
- system, they will immediately be able to write programs for another
- DBMS. This argument is very seriously flawed.
- .pp
- First, this argument only applies to vendors who have chosen to
- support the level 3 standard. It clearly does not apply to Sybase,
- and any other vendor who has chosen to implement
- the standard only at level 2.
- .pp
- Second, and perhaps of supreme importance, programmers are not
- going to use the level 3 interface. Most DBMS vendors
- offer so-called fourth generation languages (4GLs). Such products
- include Natural, Ramis, Adds-online, Ideal, INGRES/ABF, and SQL-forms.
- In general these products allow a programmer to:
- .(l
- define screens
- define operations to be executed as a result of user input into screens
- interactively call subsystems such as the report writer
- .)l
- Application programmers familiar both with 4GL products and with
- the level 3 style application programming interface report that there
- is a factor of 3-10 in leverage from using a 4GL.
- Consequently, a client of DBMS technology is generally well
- advised to use a 4GL wherever possible
- and to forsake the level 3 programming
- interface.
- This
- advice is nearly universally true in business data processing
- applications. In engineering applications, on the other hand, 4GLs
- may be less advantageous.
- .pp
- In summary, application programmers are going to use
- 4GLs because of their software development leverage, and
- not the level 3 SQL interface. Moreover,
- every 4GL is totally unique, and there is no standardization
- in sight. The only company who could drive a 4GL standardization
- activity would be IBM. However, most data base professionals
- do not believe that IBM has a 4GL (notwithstanding
- IBM's marketing of CSP as a 4GL). Consequently, it will be several
- years before there is any possible standardization in this
- area.
- .pp
- On the other hand, suppose a user decides
- .b not
- to use a 4GL because he is concerned about portability or
- alleged poor performance
- in older products. His applications
- .b still
- require screen definition facilities, report specifications, and
- graph specifications. These facilities are unique to each
- vendor and not addressed in any way in the SQL standard. To move
- from one standard DBMS to another, one must relearn the facilities in
- each of these areas. As a result, only perhaps 10-20 percent of
- the total specification system is covered by ANSI SQL, and the
- remainder must be relearned for each system. To avoid this
- retraining, a user must either write and port his own facilities in
- these areas, an obviously distasteful strategy, or
- he must depend on some specific vendor to provide a standard
- collection of facilities on all platforms important to him.
- SQL is clearly of no assistance in this dimension.
- .sh 3 "4GL Vendors"
- .pp
- One could argue that vendors of 4GL products will benefit
- from standardization because they will be able to easily
- move their products onto a variety of different data
- managers. Although this argument has merits,
- it is also somewhat flawed.
- .pp
- First, as noted before there is no standard for information in the
- system catalogs. All 4GLs must read and write
- information in the dictionary, and this will be code unique
- to each target DBMS.
- Second,
- I have asked a variety of 4GL users which target DBMSs
- are of greatest interest
- to them. They typically
- respond
- with the following three priority requests:
- .(l
- 1) IMS
- 2) DB 2
- 3) some ``home-brew'' data manager
- .)l
- To satisfy these requests, a 4GL vendor
- must develop complete custom
- interfaces for systems 1 and 3. Only the interface for
- system 2
- would be assisted by standardization.
- Third, most DBMS vendors have (or are developing) capabilities which
- superset the ANSI standard. The reasons for this are discussed at
- length in the next section. A 4GL vendor who wishes to interface
- to such an extended DBMS has two choices. First, he can restrict his
- use to the subset which is standard. Consequently, there will
- be underlying DBMS capabilities which he does not exploit, and he will
- be at a
- relative disadvantage compared to 4GLs (such as the one from the
- DBMS vendor in question) which take full advantage
- of underlying facilities.
- The second choice is to do non-standard extensions for each target
- DBMS.
- Both choices are unattractive.
- Lastly, any 4GL that is marketed by a hardware vendor is unlikely
- to take advantage of any opportunity for portability provided by SQL because
- such a vendor will likely resist providing a migration path for
- applications off of his particular hardware.
- .pp
- Hence, standardization on SQL clearly helps a 4GL vendor who
- wishes to make his code portable. However, not all of them will wish to,
- and substantial
- effort in the areas of system catalogs, non-standard
- extensions, and coupling to non-SQL data bases
- is required to make this portability occur.
- .sh 3 "Vendors of Heterogeneous Distributed DBMSs"
- .pp
- One could argue that distributed data base systems should
- have so-called ``open architectures'' and be able to
- manage data that is stored in local data managers
- written by various vendors. Hence, vendors of open architecture
- products might benefit from SQL standardization, since foreign
- local data managers will be easier to interface to.
- .pp
- Basically, a distributed DBMS vendor sees the world in exactly the same way
- as a vendor of a 4GL. Hence, the above section applies equally to this
- class of vendor.
- .sh 3 "Summary"
- .pp
- We can summarize the possible groups who might benefit from
- standardization of SQL as follows:
- .(l
- DBAs This group will benefit from the fact that all relational
- systems use essentially the same data definition
- language, regardless of the query
- language supported.
-
- end users This group will not use SQL and will be unaffected by
- standardization.
-
- programmers This group will primarily use 4GLs and consequently will be
- unaffected by standardization.
-
- 4GL vendors This group may benefit from standardization if they choose to
- try to interface to a variety of data managers. However, they
- still have a lot of work to do, and some of them will resist
- exploiting this portability.
-
- Distributed They are in the same position as 4GL vendors.
- DBMS vendors
- .)l
- .pp
- One draws the unmistakable conclusion that the large amount of effort
- that is being poured into SQL standardization may
- .b not
- pay
- handsome dividends.
- A user of DBMS technology will only benefit if he chooses 4GL and
- distributed DBMS products from vendors committed to open architectures.
- He will then benefit indirectly from the efforts of these vendors to make their
- products run on a variety of SQL engines.
- .pp
- However, the situation is much worse than
- has been portrayed so far because standard SQL, as currently
- defined, stands
- .b no
- .b chance
- of lasting more than a few
- years. The next section shows why SQL will not ``stick''.
- .sh 1 "WHY STANDARD SQL IS DOOMED"
- .sh 2 "Introduction"
- .pp
- All relational DBMSs were designed to solve the needs of
- business data processing applications. Specifically, they were
- designed to rectify the disadvantages of earlier hierarchical and
- network data base systems. Most DBMS professionals
- agree that they have succeeded at this task admirably.
- However, equally
- well understood are the needs of other users
- of DBMS technology in the areas of spatial data, CAD
- data, documents, etc. There is a renaissance of research
- activity building ``next generation prototypes''
- which attempt to rectify the drawbacks of current relational systems.
- Consequently, one could say
- that there are three generations of systems:
- .(l
- generation 1: Hierarchical and Network Systems
- generation 2: Relational Systems
- generation 3: Post-relational Systems
- .)l
- .pp
- The following research prototypes are all examples of prototype
- post-relational systems:
- .(l
- EXODUS [CARE86]
- GEM [TSUR84]
- IRIS [FISH87]
- NF2 [DADA86]
- ORION [BANE87]
- POSTGRES [STON86a]
- STARBURST [LIND87]
- .)l
- Although they are exploiting various ideas,
- one can make the following observation:
-
- .in +4
- .ll -4
- Essentially all ideas that are being exploited by the above prototype
- systems can be
- added to current commercial relational data base systems by
- extending or reworking their capabilities.
- .in -4
- .ll +4
-
- Hence, it is obvious that aggressive vendors will quickly extend
- their current SQL engines with relational versions of the successful
- capabilities of these prototypes. In this way, vendors will create
- systems that are
- substantial supersets of SQL.
- Since each vendor will do unique extensions,
- they will all be incompatible.
- Moreover, IBM will be the slowest
- to provide extensions to DB 2.
- .pp
- These extensions will solve
- problems that are so important to large classes of users that they will
- gladly use the extended capabilities. In this way, any application that a user
- writes for vendor A's system will not run without substantial maintenance
- on vendor B's system and vice versa. This will ensure that application
- portability will not be achieved through SQL.
- .pp
- The rest of this section indicates two areas in which seductive
- next generation capabilities are expected.
- .sh 2 "Management of Knowledge Bases"
- .pp
- I wish to discuss
- knowledge bases first with regard to expert systems and then with regard
- to conventional business data processing. I conclude this
- subsection with a discussion of why it is essential that knowledge
- management become a data base service.
- .pp
- Expert systems typically use
- .b rules
- to embody the knowledge of an expert; I will use the terms
- knowledge base and rule base interchangeably.
- One important application area of expert systems is in
- surveillance systems. The object to be monitored
- could be a physical object, such as a manufacturing line, an oil
- refinery, or a stock market.
- It might also be an area of real estate, such as
- a battlefield. In either case, an expert system is desired which watches
- the state of the object and alerts a human
- if ``abnormal'' events occur.
- Such surveillance applications fundamentally
- involve the data base for the monitored object.
- Moreover, abnormal events are typically defined by a rule base,
- developed by consultation with human experts. Hence, such applications
- require a large data base (the monitored object) and a large set
- of rules (the events to watch for).
- .pp
- In conventional business data processing applications there is also
- substantial use for a rule base. For example, consider the processing
- of purchase orders. The following rules might well apply in a
- typical company:
- .(l
- All POs over $100 must be signed by a manager
- All POs over $1000 must be signed by the president
- All POs for computer equipment must be signed by the MIS director
- All POs for consultants must have an analysis of need attached
- .)l
- Similar rule systems
- control allocation of office furniture (e.g., only vice presidents
- can have wood desks), commission plans for salespersons (e.g., commission
- is paid only on non-discounted POs), vacation accrual, etc.
- .pp
- The possible techniques available to support such composite
- rule and data base applications are:
- .(l
- 1) Put the rules in an application program and the data in a data base.
- 2) Put the rules and the data in an expert system shell.
- 3) Put the rules in an expert system shell and the data in a data base.
- 4) Put both the rules and the data in a composite data/rule base.
- .)l
- I now argue that only option 4 makes any long term technical sense.
- Option 1 is widely used by business data processing applications
- to implement rules systems such as our purchase order example. The
- disadvantage of this approach is that the rules are buried in
- the application program and are thereby difficult to understand and
- tedious to change as business conditions evolve. Moreover, if a
- new program is written to interact with the data base, it must be coded
- to enforce the rules in a fashion consistent with the
- previously written application programs. The possibility for
- error is consequently high. In summary, when rules
- are embedded in an application
- program, they are hard to code, hard to change, and hard to enforce
- in a consistent fashion.
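The contrast can be sketched concretely: the purchase-order rules above, held as shared data in a table and consulted by one generic checker, rather than buried in each application program. The schema, column names, and function below are illustrations using Python's sqlite3, not any vendor's actual facility.

```python
import sqlite3

# Sketch: purchase-order rules stored as data rather than buried in
# application code. Table and column names are hypothetical illustrations.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE po_rules (min_amount INTEGER, category TEXT, approver TEXT)")
conn.executemany("INSERT INTO po_rules VALUES (?, ?, ?)", [
    (100,  None,       "manager"),       # POs over $100 need a manager
    (1000, None,       "president"),     # POs over $1000 need the president
    (0,    "computer", "MIS director"),  # computer equipment needs MIS sign-off
])

def approvers_for(amount, category):
    """Generic checker: every program consults the one shared rule base."""
    cur = conn.execute(
        "SELECT approver FROM po_rules "
        "WHERE min_amount < ? AND (category IS NULL OR category = ?)",
        (amount, category))
    return [r[0] for r in cur]

print(approvers_for(1500, "computer"))
```

Because every program calls the same checker against the same table, adding or changing a rule is a one-row update, and all programs enforce the rules consistently.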
- .pp
- The second alternative is to put both the data and the rules in an
- expert system environment such as Prolog, OPS5, KEE, ART,
- or S1. The problem with this approach is that these systems, without exception,
- assume that facts available to their rule engines are resident in main memory.
- It is simply not practical to put a large data base into virtual memory.
- Even if this were possible, such a data base would have no transaction
- support and would not be sharable by multiple users. In short, current expert
- system shells do not include data base support, and option 2 is simply
- infeasible.
- .pp
- Option 3 is advocated by the vendors of
- expert system shells and is termed
- .b loose
- .b coupling.
- In this approach rules are stored in main memory in the shell
- environment which contains an inference engine. Whenever
- necessary, this program will run queries against a data base
- to gather any needed extra information. Hence, rules are
- stored in a rule manager and data in a separate data manager. A layer
- of ``glue'' is then used to couple these two subsystems together.
- An example of this architecture is KEE/Connection from Intellicorp.
- .pp
- Unfortunately loose coupling will fail miserably on a wide variety
- of problems, and a simple example will illustrate the
- situation. Suppose one wanted to
- monitor a single data item in a data base, i.e.,
- whenever the data item changes in the data base, it should change
- on the screen of a monitoring human.
- Many investment banking and
- brokerage houses are building automated trading systems that are
- much more sophisticated versions of this simplistic example.
- .pp
- The expert system can run a query to
- fetch the data item in question. However, it will quickly become
- out of date and must be fetched anew. This repeated querying of the
- data base will needlessly consume resources and will always result
- in the screen being some amount of time out of date. Loose
- coupling will fail badly in environments where the expert system
- cannot fetch a small, static portion of the data base on which to
- operate. Most problems I can think of fail this
- ``litmus test''.
- .pp
- The fourth alternative is to have a single data/rule system to manage
- both rules and data, i.e. to
- implement
- .b tight
- .b coupling.
- Such a system must be
- .b active
- in that it must perform asynchronous operations
- to enforce the rules. This is in contrast to current
- commercial DBMS
- which are
- .b passive
- in that they respond to user's requests but have no concept
- of independent action.
- .pp
- An active system can tag the data item being watched
- by our simplistic application and send a message to an
- application program whenever the data item changes. This will
- be an efficient solution to our monitoring example.
- Such a data manager will automatically support sharing
- of rules, the ability to add and drop rules on the fly, and
- the ability to query the rule set.
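A limited form of this active behavior did reach commercial SQL engines as the trigger; a minimal sketch using SQLite from Python (the tables are invented for illustration, and a trigger is far weaker than the general rule systems discussed here) shows the rule attached to the data itself, firing on each change with no polling.

```python
import sqlite3

# Sketch of tight coupling: an "active" rule attached to the data.
# The trigger fires whenever the watched item changes, so no polling
# is needed. Table names are hypothetical illustrations.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE quotes (symbol TEXT PRIMARY KEY, price REAL);
CREATE TABLE alerts (symbol TEXT, old_price REAL, new_price REAL);
-- the rule: watch one data item and record every change to it
CREATE TRIGGER watch_ibm AFTER UPDATE ON quotes
WHEN NEW.symbol = 'IBM'
BEGIN
    INSERT INTO alerts VALUES (NEW.symbol, OLD.price, NEW.price);
END;
""")
conn.execute("INSERT INTO quotes VALUES ('IBM', 120.0)")
conn.execute("UPDATE quotes SET price = 121.5 WHERE symbol = 'IBM'")
alerts = conn.execute("SELECT * FROM alerts").fetchall()
print(alerts)  # [('IBM', 120.0, 121.5)]
```

The update itself produces the alert record; the monitoring application never has to re-run the query against a possibly stale data base.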
- .pp
- Tight coupling can be achieved in a variety of ways. Extensions to
- the view definition facility can be utilized as well as
- extensions to the SQL language directly [STON87].
- In the case that the resulting queries are recursive,
- processing algorithms have been investigated in
- [ULLM85, IOAN87, ROSE86, BANC86].
- .pp
- Without a doubt many of these ideas will lead to commercial
- implementations, and I expect that many will be successful.
- The bottom line is that rules and inference will almost
- certainly move into data base systems
- over the next few years. It appears feasible to
- support this feature by supersetting the query language, and this will
- certainly be the method of choice for SQL vendors.
- .sh 2 "Object Management"
- .pp
- If I hear the phrase ``everything is an object'' once more, I think
- I will scream. Peter Buneman expressed this frustration
- most concisely in [BUNE86]: ``Object-oriented is a semantically overloaded
- term''.
- Moreover, in a panel discussion on Object-Oriented Data Bases
- (OODBs) at VLDB/87, six panelists
- managed to disagree completely on exactly what an
- OODB might be.
- .pp
- In any case, there is a class of applications which must manage
- data that does not fit the standard business data processing world
- where objects are character strings, integers, floating
- point numbers and maybe date, time, money and packed decimal.
- Non-business environments
- must manage data consisting of documents, three dimensional spatial
- objects, bitmaps corresponding to pictures, icons for graphical
- objects, vectors of observations, arrays of scientific data, complex
- numbers, etc.
- .pp
- In general these applications are badly served by current data
- base systems, regardless of what data model is
- supported. This point is discussed in detail in [STON83], and
- we present here only a very simple example. Suppose a user
- wishes to store the layout of Manhattan, i.e. a data set consisting
- of two-dimensional rectangular boxes. Obviously, a box can be
- represented by the coordinates of its two corner
- points (X1, Y1) and (X2, Y2). Consequently, a reasonable schema for
- this data is to construct a BOX relation as follows:
- .(l
- BOX (id, X1, Y1, X2, Y2)
- .)l
- .pp
- The simplest possible query in this environment is to place a template
- over this spatial data base and ask for all boxes that are visible in
- the viewing region. If this region corresponds to the unit square, i.e.
- the box from (0,0) to (1,1), then the most efficient representation of
- the above query in SQL is:
- .(l
- .bp
- select *
- from BOX
- where X1 <= 1 and
- X2 >= 0 and
- Y1 <= 1 and
- Y2 >= 0
- .)l
- Moreover, it generally takes a few tries before a skilled SQL user
- reaches this representation.
- Consequently, even
- trivial queries are hard to program. In addition, no matter what
- collection of B-tree or hash indexes are constructed on any key or
- collections of keys, this query will require the run-time execution
- engine to examine, on the average, half of the index records in
- some index. If there are 1,000,000 boxes,
- 500,000 index records
- will be inspected by an average query. This will ensure
- bad performance even on a very large machine.
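The index-inspection claim can be checked with a small simulation (the data distribution, sizes, and query workload below are all made up for illustration). A B-tree on X1 can only bound the one-sided predicate X1 <= qx + 1, where qx is the template's left edge; every index record satisfying it must be examined, whatever the other three predicates say, and for templates at random positions that is roughly half the index:

```python
import random

# Illustrative simulation (all sizes and distributions are assumptions):
# boxes with X1 uniform in [0, 100] and a unit-square viewing template
# placed at a random position qx.
random.seed(1)
N = 100_000
x1 = [random.uniform(0, 100) for _ in range(N)]  # the indexed attribute

def fraction_examined(qx):
    # A B-tree on X1 can only exploit the one-sided range X1 <= qx + 1.
    return sum(1 for v in x1 if v <= qx + 1) / N

queries = [random.uniform(0, 99) for _ in range(200)]
avg = sum(fraction_examined(q) for q in queries) / len(queries)
print(f"average fraction of index records examined: {avg:.2f}")
```

Scaled to the paper's 1,000,000 boxes, an average fraction near one half is the 500,000 index records quoted above.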
- .pp
- In summary, the box application is poorly served by
- existing relational DBMSs because simple queries are difficult
- to construct in SQL and they execute with bad
- performance.
- To support the box environment, a relational DBMS must:
-
- .ll -4
- .in +4
- 1) Support ``box'' as a data type. In this way, the BOX relation can
- have two fields as follows:
- .(l
- BOX (id, description)
- .)l
-
- 2) Support && as an SQL operator meaning ``overlaps''. In this way, the
- query can be expressed as:
- .(l
- select *
- from BOX
- where description && ``(0,0), (1,1)''
- .)l
-
- 3) Support a spatial access method such as R-trees [GUTM84] or
- K-D-B trees [ROBI81]. This will ensure that the above extended SQL
- command can be efficiently processed.
- .ll +4
- .in -4
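The semantics intended for the hypothetical && operator are easy to pin down: two axis-aligned boxes overlap exactly when their projections overlap on both axes. A minimal sketch (the Box type and function names are invented for illustration):

```python
from collections import namedtuple

# Hypothetical "box" data type: (x1, y1) and (x2, y2) are opposite
# corners, with x1 <= x2 and y1 <= y2.
Box = namedtuple("Box", "x1 y1 x2 y2")

def overlaps(a, b):
    # These four comparisons are exactly the WHERE clause of the earlier
    # SQL query, with the template's corners substituted in.
    return a.x1 <= b.x2 and a.x2 >= b.x1 and a.y1 <= b.y2 and a.y2 >= b.y1

viewport = Box(0, 0, 1, 1)
print(overlaps(Box(0.5, 0.5, 2, 2), viewport))  # shares area: True
print(overlaps(Box(3, 3, 4, 4), viewport))      # disjoint: False
```

The point of item 3 is that a spatial access method evaluates this predicate against index pages rather than against every stored box.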
-
- .pp
- In addition, examples can be easily constructed which emphasize the
- need for multiple inheritance of data and operators, efficient storage
- of very large objects, objects which are composed of other objects, etc.
- Proposals addressing various
- of these ideas are contained in [STON86b, CARE86, BANE87, FISH87, LIND87],
- and these should move into commercial systems in the near future.
- The aggressive
- vendors will include such capabilities as extensions to SQL.
- .sh 2 "Summary"
- .pp
- ANSI is currently preparing a draft of SQL 2, its proposed future
- extension to the current SQL standard. However, it contains
- .b no
- capabilities in the areas of
- knowledge management and object management. Since these
- capabilities are perceived to be
- .b extremely
- useful in a wide variety of situations,
- aggressive vendors will move ahead in these areas with
- vendor-specific capabilities.
- As a result SQL 2 will contain only a subset of available
- commercial functions. In a time of rapid technological change,
- the standard will substantially lag the industry leaders and will
- be doomed to instantaneous
- technological
- obsolescence.
- .pp
- To clearly see the reason for this dismal state of affairs
- one need only look at the philosophy of standardization
- that is being pursued. There are two successful
- models, the
- .b resolution
- model and the
- .b beacon
- model. In the beacon model
- one assembles the vendors of existing similar but not quite
- compatible products in a committee with interested users
- and charges them with
- resolving their differences by a political negotiation. This model
- of political resolution by a large committee works well when:
- .(l
- 1) there are many implementations of the object being standardized
- 2) dramatic changes are not happening in the object being standardized
- 3) resolution of differences is a political problem
- .)l
- The ongoing standardization efforts in Cobol and Fortran clearly
- fit into the resolution model.
- .pp
- On the other hand, Ada is a good example of the beacon model. Here
- a new standard was invented with no commercial implementations
- preceding it. In this case, DOD wisely generated the standard by
- charging several
- small teams with designing languages and then picked the best one. In
- this case, design of the standard was accomplished by a small team of
- very gifted language designers decoupled from any political
- process.
- The beacon model works very well when:
- .(l
- 1) there are no implementations of the object being standardized
- 2) dramatic changes are contemplated
- 3) a small team of technical experts does the actual design
- .)l
- .pp
- We can now examine SQL standardization in this light. Clearly,
- the previous activity (which we call SQL-1) is an example of the
- resolution model. Moreover, the process has converged a collection
- of slightly incompatible versions of SQL to a political resolution.
- However, SQL-2 is a proposal to extend the language onto virgin
- turf, i.e. to include capabilities which no vendors currently have
- implementations for. Moreover, relational DBMSs are an area
- where dramatic technical change is happening. Hence, new capabilities
- would be best worked on by a small team of experts, and not by a large
- group of politicians.
- .pp
- In summary, there are two defensible choices open to ANSI at the current
- time. First, they could follow the resolution model. In this case, they
- have accomplished their initial objective of coalescing the initial
- versions of SQL. They should now adjourn the committee for a couple
- of years while the aggressive vendors do substantial supersets. Later
- they should reconvene the committee to resolve the differences by
- political negotiation. On the other hand, if they choose the beacon
- model, they should subcontract two or more small teams of experts to
- do proposals and then pick the best one. The problem with ANSI SQL
- is that they did SQL-1 according to the resolution model. Now with
- no change in structure, they are trying to switch to the beacon model. As
- a result, I
- feel they are guaranteed to fail.
- .sh 1 "DISTRIBUTED DATA BASES"
- .sh 2 "Why Distributed DBMSs"
- .pp
- There is considerable confusion in the marketplace concerning
- the definition of a distributed DBMS. At the very least it must
- provide a
- ``seamless'' interface to data that is stored on multiple
- computer systems. For example, if EMP is stored on a machine in London
- and DEPT is placed on a machine in Hong Kong, then it must be possible
- to join these relations without explicitly logging on to both
- sites and assembling needed data manually at some processing location.
- Instead one would want the notion of ``location transparency'' whereby
- one could simply state the following SQL:
- .(l
- select name
- from EMP
- where dept in
-      (select dname
-       from DEPT
-       where floor = 1)
- .)l
- There are several vendors who are
- marketing software systems as distributed data bases which
- cannot run the above query but instead
- provide only remote access to data at a single site or
- provide only a micro-to-mainframe connection.
- Such systems are
- .b NOT
- distributed DBMSs
- and the marketing hype on the part of such vendors should
- be immediately discounted.
- .pp
- Moreover, location transparency can
- be provided either
- by:
- .(l
- a network file system (NFS) or
- a distributed DBMS
- .)l
- A user should
- .b very
- .b carefully
- check which technique is being used by any vendor
- who claims to sell a distributed data
- base system. Consider a user in San Francisco who is interacting
- with the above EMP and DEPT relations in London and Hong Kong. To
- find the names of employees on the first floor using an NFS solution,
- both relations will be paged over
- the network and the join accomplished in San Francisco. Using
- a distributed DBMS, a heuristic optimizer will choose an intelligent
- accessing strategy and probably choose to move the
- first-floor departments to London, perform the join there, and
- then move the
- end result to San Francisco.
- This strategy will generally be orders of magnitude faster than
- an NFS strategy. As Bill Joy once said:
- .(l
- think remote procedures not remote data
- .)l
- Put differently, one should send the queries to the data and not
- bring the data to the query.
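A back-of-the-envelope comparison makes the gap concrete. All of the sizes and the selectivity below are invented for illustration; the conclusion is insensitive to the exact numbers as long as the answer is much smaller than the relations:

```python
# Assumed sizes for the EMP/DEPT example (purely illustrative).
EMP_BYTES = 50_000_000        # EMP relation, stored in London
DEPT_BYTES = 2_000_000        # DEPT relation, stored in Hong Kong
FIRST_FLOOR_FRACTION = 0.05   # assumed selectivity of "floor = 1"
RESULT_BYTES = 100_000        # assumed size of the final name list

# NFS-style: page both relations to San Francisco and join there.
nfs_shipped = EMP_BYTES + DEPT_BYTES

# Distributed DBMS: ship the qualifying DEPT tuples to London, join
# there, and ship only the answer back to San Francisco.
dbms_shipped = DEPT_BYTES * FIRST_FLOOR_FRACTION + RESULT_BYTES

print(f"NFS strategy ships  {nfs_shipped:>12,} bytes")
print(f"DBMS strategy ships {dbms_shipped:>12,.0f} bytes")
print(f"ratio: {nfs_shipped / dbms_shipped:.0f}x")
```

Under these assumptions the NFS strategy moves several hundred times more data over the WAN, which is the "orders of magnitude" claimed above.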
- .pp
- A lazy vendor can quickly implement an NFS-based distributed data
- manager that will offer bad performance.
- Distributed DBMSs with heuristic optimizers are considerably more work,
- but offer much better performance. A client of distributed data
- managers must develop the sophistication to be able to distinguish the
- lazy vendors from the serious ones.
- .pp
- Distributed data base systems will find universal acceptance because they
- address all of the following situations.
- First, most large organizations are geographically decentralized and have
- multiple computer systems at multiple locations. It is usually
- impractical to have a single ``intergalactic'' DBA to control the world-wide
- data resources of a company. Rather one wants to have a DBA at each site,
- and then construct a distributed data base to allow users to access
- the company resource.
- .pp
- Second, in high transaction rate environments one must assemble
- a large computing resource. While it is
- certainly acceptable to buy a large mainframe computer (e.g. an IBM Sierra
- class machine), it will be
- nearly 2 orders of magnitude cheaper to assemble a network of smaller machines
- and run a distributed data base system. Tandem has shown that
- transaction processing on this architecture expands linearly with
- the number of processors. In most environments, a very efficient
- transaction processing engine can be assembled by networking small
- machines and running a distributed DBMS. The ultimate version of
- this configuration is
- a network of personal computers.
- .pp
- Third, suppose one wants to offload data base cycles from a
- large mainframe onto a back-end machine, as typically advised by data
- base machine companies including
- Britton-Lee and Teradata.
- If so, it will make sense to support the possibility of more
- than one back-end CPU, and a distributed DBMS is required. In
- fact, Teradata includes one on their machine already.
- .pp
- Fourth, as will be discussed presently, I expect more and more users
- to have workstations on their desks, replacing standard terminals.
- I also expect most workstations will have attached
- disks to ensure good I/O performance. In such an environment, one will have
- a large number of data bases on workstations that may be of a personal
- nature (such as appointment calendars, phone directories, mail lists, etc.)
- Even such personal data bases require a distributed DBMS, because
- such tasks as electronically scheduling a meeting require one.
- .pp
- Lastly, virtually all users must live with the ``sins of the past'',
- i.e. data currently implemented in a multitude of previous generation
- systems. It is impossible to
- rewrite all applications at once, and a distributed DBMS which supports
- foreign local data managers allows a graceful transition
- into a future architecture by allowing old applications for obsolete
- data bases to coexist with new applications written for a
- current generation DBMS.
- This point is further elaborated in Section 7.
- .pp
- I expect everybody to want a distributed data base system for one or
- more of these five reasons. Hence, I believe that all
- DBMS vendors will implement distributed DBMSs and
- it will be hard to find
- vendors who offer only a single site DBMS in a few years.
- .sh 2 "Research Issues in Distributed DBMSs"
- .pp
- There has been a mountain of research on algorithms to support
- distributed data bases in the areas of query processing [SELI80],
- concurrency control [BERN81], crash recovery [SKEE82] and update
- of multiple copies [DAVI85]. In this section, I indicate two
- important problems which require further investigation.
- .pp
- First, users are contemplating
- .b very
- .b large
- distributed data base systems consisting of hundreds or even thousands
- of nodes. In a large network, it becomes unreasonable to assume that each
- relation has a unique name. Moreover, having the query optimizer inspect
- all possible processing sites as candidate locations to perform a distributed
- join will result in unreasonably long optimizer running times.
- In short, the problems of ``scale'' in distributed data bases
- merit investigation by the research
- community.
- .pp
- Second, current techniques for updating multiple copies of
- objects require additional investigation. Consider the simple
- case of a second copy of a person's checking account
- at a remote location. When that person cashes a check, both copies
- must be updated to ensure consistency in case of failure. Hence,
- at least two round trip messages must be paid to the remote location
- to perform this reliably. If the remote account is in Hong Kong, one can expect
- to wait an unreasonable amount of time for this message traffic to
- occur. Hence, there will be no sub-second response times to updates
- of a replicated object. To a user of DBMS services, this delay
- is unreasonable, and algorithms that address this issue efficiently
- must be developed. Either a lesser guarantee than consistency must
- be considered, or alternatively algorithms that work only on special
- case updates (e.g., ones guaranteed to be commutative) must be
- investigated. The work reported in [KUMA88] is a step in this direction.
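Propagation delay alone puts a floor under the cost of the synchronous scheme. The distances and the number of round trips below are rough assumptions, and they ignore queueing and the store-and-forward hops of real networks, which add far more:

```python
# Lower bound on commit delay for synchronously updating a replica in
# Hong Kong from San Francisco (all numbers are rough assumptions).
DISTANCE_KM = 11_000          # approximate great-circle distance
SIGNAL_KM_PER_S = 200_000     # ~ speed of light in fiber
ROUND_TRIPS = 2               # e.g. the two phases of a commit protocol

one_way_s = DISTANCE_KM / SIGNAL_KM_PER_S
delay_s = ROUND_TRIPS * 2 * one_way_s
print(f"propagation delay alone: {delay_s * 1000:.0f} ms per update")
```

Even this physical lower bound of a few hundred milliseconds per update rules out a high rate of transactions against the replicated object, and actual message delays of the day were much larger.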
- .sh 1 "OTHER TECHNOLOGIES"
- .pp
- In this section I discuss a collection of other interesting trends
- that may be significant in the future.
- .sh 2 "Data Base Machines"
- .pp
- It appears that the conventional iron mongers are advancing
- the performance of single chip CPUs at nearly a factor of two
- per year, and that this improvement will continue for at
- least the next couple of years. Bill Joy quotes single chip
- CPU performance as:
- .(l
- MIPS = 2 ** (year - 1984)
- .)l
- Therefore, in 1990 we can expect 64 MIPS on a chip. Not
- only is this prognosis
- likely to happen, but also, machines built from the resulting
- chips are guaranteed to be extremely cheap, probably on the order of
- $1K - $4K per MIP. Moreover, nothing stops aggressive
- system integrators from coupling such CPUs into shared memory multiprocessors
- to achieve very powerful multiprocessor machines. Earlier examples of
- this approach include the DEC 62xx series of machines and the Sequent
- Symmetry.
- In light of these advances in general purpose
- machines, it seems unlikely that a hardware data base machine
- vendor can develop cost effective CPUs. Because such a vendor makes
- machines by the 10s, he is at a significant disadvantage
- against a conventional iron monger who makes machines by the 10,000s.
- It is generally agreed that a factor of 3, at a bare minimum,
- is required in the
- custom architecture before a custom machine is feasible. Personally,
- I don't see where to get such a number. As a result, I see hardware
- data base machines as a difficult business in the coming years.
- .sh 2 "High Transaction Rate Systems"
- .pp
- It is clear that relational data base systems will be used for
- production applications which generally consist of
- repetitive transactions, each of which is a collection
- of single-record SQL commands.
- The goal is to do 100, 500, even 1000 such transactions per
- second. Most relational systems are getting increasingly nimble
- and should continue to do so over the next couple of years. Moreover,
- all commercial systems have essentially the same architecture,
- so that any tactic
- used by one vendor to increase performance can be quickly
- copied by other vendors. Hence, the
- ``performance wars'' tend to be a ``leapfrogging'' sort of affair, and the
- current winner is usually the vendor who
- came out with a new system most recently. Moreover,
- all systems are expected to converge to essentially the
- same ultimate performance.
- .pp
- The bottom line is that
- .b all
- vendors are addressing high transaction rate environments
- because that is where a significant number of customer applications reside.
- All will offer similar performance in this marketplace. The ability
- of any specific vendor to claim this arena as
- his ``turf'' is guaranteed to fail.
- .sh 2 "Main Memory Data Bases"
- .pp
- Not only are CPU prices per MIP plummeting, but also main memory
- prices are in ``free fall''. Prices are currently under $500
- per megabyte in most environments where competition exists, and are
- continuing to drop. Moreover, the maximum amount of main memory
- that can be put on a machine is skyrocketing in a commensurate manner.
- This increasingly allows a client of data base
- services to contemplate a data base entirely (or mostly) resident in
- main memory.
- Current DBMSs have been typically designed under the assumption that
- all (or most) data is on disk. As a result, they must
- be changed to efficiently
- handle very large buffer pools,
- to implement hash-join processing strategies [SHAP86],
- and to deal efficiently with log processing (which may be the only I/O
- which remains in this environment).
- .pp
- The opportunity of
- using
- .b persistent
- main memory is also enticing. One idea would be for the memory system
- to automatically keep the before and after image of any changed bits
- as well as the transaction identifier of the transaction
- making the change. If the transaction gets aborted, the memory
- system can automatically roll backwards. Upon commit, the before
- image can either be discarded or spooled to a safe place to provide
- an additional measure of security. With error correcting codes and
- alternate power
- used in the memory system, this will provide
- a highly reliable main memory transaction system. My speculation
- is that it is neither difficult nor expensive to design such a system.
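The mechanism can be sketched in a few lines. This is a toy model of the idea only; the class and its interface are invented here, and a real memory system would do this per word in hardware:

```python
# Toy sketch of the persistent-memory idea: the "memory system" records
# a before-image for each word a transaction changes, so an abort can
# be rolled backwards without any disk-based log.
class PersistentMemory:
    def __init__(self, size):
        self.words = [0] * size
        self.before = {}          # txn id -> {address: before image}

    def write(self, txn, addr, value):
        # Save the before-image only on the first change to each word.
        self.before.setdefault(txn, {}).setdefault(addr, self.words[addr])
        self.words[addr] = value

    def commit(self, txn):
        # Discard the before-images (or spool them to a safe place).
        self.before.pop(txn, None)

    def abort(self, txn):
        # Roll the memory backwards using the saved before-images.
        for addr, old in self.before.pop(txn, {}).items():
            self.words[addr] = old

mem = PersistentMemory(8)
mem.write("t1", 3, 42)
mem.abort("t1")
print(mem.words[3])   # restored to its before-image, 0
```

With error correction and battery backup underneath, commit amounts to discarding a map and abort to replaying it, which is why the scheme seems neither difficult nor expensive.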
- .pp
- Such techniques will hopefully become part of commercial iron
- in the not too distant future.
- .sh 2 "New Storage Architectures"
- .pp
- Besides persistent main memory, there are some other ideas that may
- prove appealing. First, one could construct a high speed, write-only
- device with arbitrary capacity. Such an ``ideal logger'' could be
- constructed out of persistent main memory, an auxiliary processor and
- a tape drive or optical disk device. Additionally, the log
- can be substantially compressed
- during spooling. The CPU cycles for such activity seem well worth the
- benefit that appears possible.
- .pp
- Optical disk drives have received considerable attention, and they may well
- play an important part in future memory systems for data managers.
- Lastly, the most intriguing idea concerns the availability of
- very cheap 5 1/4'' and 3 1/2'' drives. Rather than
- using a smaller number of 14'' disks (such as the 3380), it seems plausible
- to construct a large capacity disk system out of a larger number of
- small drives. It appears that such a disk system could offer the
- possibility of a
- .b large
- number of arms at a modest (if any) increase in cost per bit compared
- to 3380 style technology. A step in this direction is
- the work reported in [PATT88].
- Moreover, how to construct file systems for such devices
- is an interesting area of research. For instance, should one stripe
- blocks from a single file across all the disks? Alternatively, should one
- retain the sequential organization of most current file systems, whereby
- a single file is stored in large extents on a single drive?
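The striping alternative amounts to a simple address mapping. A minimal sketch (purely illustrative; real arrays must also place parity, as [PATT88] discusses):

```python
# Striping: logical block i of a file lives on disk (i mod ndisks) at
# block offset (i div ndisks), so a long sequential read keeps every
# arm busy at once instead of queueing on a single drive.
def stripe(block, ndisks):
    return block % ndisks, block // ndisks

for i in range(8):
    disk, offset = stripe(i, 4)
    print(f"logical block {i} -> disk {disk}, block {offset}")
```

The sequential-extent alternative would instead map an entire file to one drive, trading parallelism within a file for parallelism across files.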
- .sh 1 "HOW TO ACHIEVE VENDOR INDEPENDENCE"
- .pp
- The current software and technological environment
- may allow an astute client of data base services
- to achieve vendor independence. What follows
- is a step by step algorithm by which any user can achieve freedom
- from his current hardware vendor. Since the most common vendor
- to which clients are firmly wedded is IBM, we use an IBM customer as an
- example
- and show in this section how that client can become vendor independent.
- We assume that the hypothetical client begins with his data in an
- IMS data base and his application programs running within CICS.
- .sh 2 "Step 1: Get to a Relational Environment"
- .pp
- The first step is for the client to
- replace his data manager with a relational DBMS.
- Many companies are already considering exactly this sort of
- migration, and there are several strategies available to
- accomplish this step.
- In this subsection we discuss one possible approach.
- Consider the purchase of a distributed data base
- system that allows data in local data bases to be managed by a variety of
- local data managers. Such ``open architecture''
- distributed data managers are available at least from
- Relational Technology and Oracle and without doubt,
- will soon be available from others.
- Consequently, the example client should consider purchasing
- a distributed DBMS that
- manages local data within both IMS and the target relational data
- manager. With this software, he can recode his old application programs
- one by one from IMS to his target relational DBMS. At any point in
- time, he has some old and some new programs. The old ones
- can be run directly against IMS, while the new ones can be run
- through the distributed DBMS. After the entire application has
- been recoded to make SQL calls, he can discard the distributed
- DBMS,
- move
- the data from IMS to the target DBMS and
- then run his programs directly against the target DBMS.
- .pp
- Hence, a client can obtain a distributed DBMS and then slowly migrate his
- application and data bases from IMS
- to the target environment.
- This code and data conversion can be done at the client's leisure
- over a number
- of years (or even decades). At some point he will finish this step and
- have all his data in a modern DBMS.
- .sh 2 "Step 2: Buy Workstations"
- .pp
- It is inevitable that all ``glass teletype'' terminals will be replaced by
- workstations in the near future. Hence, 3270-style terminals
- are guaranteed to become antiques and will be replaced by new
- devices which will be Vaxstation 3000, Sun 3, PC/RT,
- Apollo, Macintosh, or PS 2 style machines.
- Clients will replace their glass teletypes with workstations for
- two reasons:
- .(l
- 1) to get a better human interface
- 2) cost
- .)l
- It is obvious to everybody that bitmap-mouse-window environments
- are much easier to use than 3270 style systems. For example, a user
- can have multiple windows on the screen and
- his application can take as many interrupts as needed
- since a local CPU is being used. There is no need for the
- cumbersome ``type to the bottom of the screen and then hit enter'' interfaces
- that are popular with 3270s. Already, knowledge workers (e.g.,
- stock traders, engineers, computer programmers) are being given
- workstations. Later, data workers (e.g., clerks, secretaries, etc.)
- will also get workstations. The basic tradeoff is that a workstation
- translates into some quantifiable improvement in
- employee productivity. The cost, of course, is the purchase and
- maintenance of the
- workstation. This tradeoff will be made in favor of workstations for
- high priced employees and not for lower paid ones. Over time,
- as workstations continue to fall in price,
- it will be cost effective
- to give one to virtually everybody.
- .pp
- The second reason to give employees a workstation is that it enables one to
- move an application program from a mainframe (a 370 in our example)
- which costs more than $100,000 per MIP
- to a workstation which costs perhaps $1000 per MIP. The overall
- cost savings can be staggering.
- Hence, over the next decade I expect workstations to essentially
- replace glass teletypes completely.
- .pp
- Whether one chooses to move to workstations for human interface reasons
- or cost considerations does not matter. To take advantage of
- either, one must move application programs from a 370
- to a workstation. Moreover, the only sensible way to
- do this is to rewrite them completely to change from a
- ``type to the bottom of the screen'' to a ``menu-mouse-bitmap-window''
- style interface. During this rewrite, one must also move the
- program from CICS to some other programming environment (e.g.
- Unix, OS 2). This leads to Step 3.
- .sh 2 "Step 3: Rewrite Application Programs"
- .pp
- Whatever the reason chosen, clients must migrate application
- programs from CICS to the workstation. Of course, a client
- can run a window on a workstation that is simply a 3270
- simulator connected to CICS. In this way, a client can slowly migrate
- his applications to the new environment while the old ones continue
- to run in CICS through workstation simulation of a glass
- teletype interface. At some point, all CICS applications will have been
- rewritten, and only a relational DBMS remains running on the
- 370 machine. Of course, this migration may take years (or even
- decades). However a persistent client can move at a rate
- appropriate to his resources. This will lead ultimately to step 4.
- .sh 2 "Step 4: Move to a Server Philosophy"
- .pp
- At this point the example client has
- application programs running on a workstation
- and a relational data base system running on a shared host. These
- machines communicate over some sort of networking system. Moreover,
- the applications send SQL commands over this network to the shared
- host and receive answers or status back. In this environment, one
- should move to
- the following thinking:
- .(l
- workstations are application servers
- shared hosts are SQL servers
- .)l
- Moreover, SQL servers should be thought of as a commodity product.
- To the extent that a client remains within the standard SQL defined
- by ANSI, it should be possible to replace an SQL
- server built by one vendor (in this case IBM) with
- an SQL server bought from another vendor (say DEC) or even
- by a collection of servers running a distributed data base
- system (say a network of Suns). Vendor independence
- has
- been facilitated since it is now fairly easy to buy SQL cycles
- from the vendor who offers the best package of price/performance/
- reliability. If the vendor of choice fails to remain on the
- performance curve compared to his competitors, there is
- little difficulty in
- unhooking that vendor's machine and replacing it with one built by one
- of his
- competitors which offers superior cost effectiveness.
- .pp
- Similarly, one should think of workstations as application servers.
- If one is careful, one can write applications which run on a
- variety of workstations.
- If the current vendor ceases to offer
- price competitive iron, the client can simply replace his workstations
- by those built by one of his competitors. In this way
- .b iron
- .b independence
- is achieved.
- .sh 2 "Summary"
- .pp
- During these four steps, a client will choose at least the
- following:
- .(l
- a relational DBMS
- a workstation Operating System
- a window manager
- networking hardware and software
- an application programming language
- .)l
- An IBM customer will, of course, be guided by his IBM salesman
- to choose the following:
- .(l
- relational DBMS: DB 2 plus the DBMS in the extended edition of OS 2
- workstation OS: OS 2
- window manager: IBM Presentation Manager
- networking software: SNA
- application programming language: COBOL (?)
- .)l
- In addition, he will be sold on the virtues of SAA as part of
- his solution. If the client moves in this direction,
- he will achieve iron independence to at least some degree. He
- can buy workstations from any of the clone manufacturers and
- can use SQL services that run on the various instantiations
- of IBM iron (e.g., PS 2, AS 400, 370, etc.).
- .pp
- However, the client
- can also make an alternate collection of choices:
- .(l
- relational DBMS: one from an independent vendor
- workstation OS: Unix
- window manager: X Window System
- networking software: TCP/IP
- application programming language: 4GL from an independent vendor
- .)l
- With these choices he can be assured of buying application
- servers and data servers from at
- least the following companies: DEC, DG, IBM, HP, Sun, Apollo,
- ICL, Bull, Siemens, Sequent, Pyramid, Gould, and the clone
- makers.
- .pp
- This section has pointed out a path by which one may
- obtain iron independence. Along this path, a collection of
- options must be chosen. These can be the ones suggested
- by the salesperson of a particular hardware vendor
- or the set that will maximize iron independence.
- This choice can be made by
- each client.
- .sh 2 "Standards Revisited"
- .pp
- We close this paper with some comments on what can be done to
- assist a user in achieving vendor independence. Clearly, a
- user can buy an open architecture distributed
- data base system. In this scenario the client will have available
- the extended SQL implemented by that vendor. Statements
- in extended SQL will run on a local data base that is managed by the
- local data manager provided by the vendor. Standard SQL will
- be executable on foreign local data managers. Such distributed
- data base software will provide a seamless interface that hides
- data location and allows data to be moved at will as business conditions
- change without impacting application programs.
- .pp
- A second possibility is that a user will remain within standard SQL
- and build location information into his application programs. In this
- way, he will expect to send SQL commands onto a network for
- remote processing by some server.
- The server must accept the remote request and send back a reply. To
- facilitate
- being able to replace one server by a different one, it is
- .b crucial
- that a standard format for communication of SQL commands and
- the resulting responses over a network
- be developed. Standardization of remote data base access (RDA) is being
- pursued by ISO but appears not to be an important ANSI activity.
- In my opinion, remote data base access will be
- more important
- than local data base access from an application program.
- I would encourage standards organizations to budget their resources
- accordingly.
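To make the point concrete, here is a toy sketch of the kind of agreement an RDA standard must pin down: both ends must share even something as mundane as how an SQL string is framed on the wire. The format below is invented for illustration; a real standard must also cover data types, result encoding, and error handling:

```python
import struct

def frame(sql):
    # Invented wire format: a 4-byte big-endian length, then the SQL text.
    body = sql.encode("ascii")
    return struct.pack(">I", len(body)) + body

def unframe(msg):
    # The receiving server reverses the framing to recover the command.
    (n,) = struct.unpack(">I", msg[:4])
    return msg[4:4 + n].decode("ascii")

wire = frame("select name from EMP where dept = 'shoe'")
print(unframe(wire))
```

Without such an agreed format, a client coded against one vendor's server cannot be pointed at another's, which is precisely the vendor lock-in the standard should prevent.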
-
- .ce
- \fBREFERENCES\fP
- .nr ii 10m
- .ip [BANE87]
- Banerjee, J. et. al., ``Semantics and Implementation of Schema Evolution
- in Object-oriented Databases,'' Proc. 1987 ACM-SIGMOD Conference on
- Management of Data, San Francisco, Ca., May 1987.
- .ip [BANC86]
- Bancilhon, F. and Ramakrishnan, R., ``An Amateur's Introduction
- to Recursive Query Processing Strategies,'' Proc. 1986 ACM-SIGMOD Conference
- on Management of Data, Washington, D.C., May 1986.
- .ip [BERN81]
- Bernstein, P. and Goodman, N., ``Concurrency Control in Database Systems,''
- Computing Surveys, June 1981.
- .ip [BUNE86]
- Buneman, P. and Atkinson, M., ``Inheritance and Persistence in Database
- Programming Languages,'' Proc. 1986 ACM-SIGMOD Conference on Management
- of Data, Washington, D.C., May 1986.
- .ip [CARE86]
- Carey, M., et. al., ``The Architecture
- of the EXODUS Extensible DBMS,'' Proc. International Workshop on
- Object-Oriented Database Systems, Pacific Grove, Ca., September 1986.
- .ip [DADA86]
- Dadam, P. et al., ``A DBMS Prototype to Support NF2 Relations,'' Proc.
- 1986 ACM-SIGMOD Conference on Management of Data, Washington, D.C., May 1986.
- .ip [DATE85]
- Date, C., ``A Critique of SQL,'' SIGMOD Record, January, 1985.
- .ip [DAVI85]
- Davidson, S. et. al., ``Consistency in Partitioned Networks,'' Computing
- Surveys, Sept. 1985.
- .ip [FISH87]
- Fishman, D. et. al., ``Iris: An Object-Oriented Database Management System,''
- ACM-TOOIS, January, 1987.
- .ip [GUTM84]
- Guttman, A., ``R-trees: A Dynamic Index Structure for Spatial Searching,''
- Proc. 1984 ACM-SIGMOD Conference on Management of Data, Boston, Mass.,
- June 1984.
- .ip [IOAN87]
- Ioannidis, Y. and Wong, E., ``Query Optimization Through Simulated
- Annealing,'' Proc. 1987 ACM-SIGMOD Conference on Management of Data,
- San Francisco, Ca., May 1987.
- .ip [KUMA88]
- Kumar, A. and Stonebraker, M., ``Semantics Based Transaction Management
- Techniques for Replicated Data,'' Proc. 1988 ACM-SIGMOD Conference
- on Management of Data, Chicago, Il., June 1988.
- .ip [LIND87]
- Lindsay, B., ``A Data Management Extension Architecture,'' Proc. 1987
- ACM-SIGMOD Conference on Management of Data, San Francisco, Ca., May 1987.
- .ip [PATT88]
- Patterson, D. et. al., ``A Case for Redundant
- Arrays of Inexpensive Disks (RAID),'' Proc. 1988 ACM-SIGMOD Conference
- on Management of Data, Chicago, Il., June 1988.
- .ip [ROBI81]
- Robinson, J., ``The K-D-B Tree: A Search Structure for Large
- Multidimensional Indexes,'' Proc. 1981 ACM-SIGMOD Conference on
- Management of Data, Ann Arbor, Mich., May 1981.
- .ip [ROSE86]
- Rosenthal, A. et. al., ``Traversal Recursion: A Practical
- Approach to Supporting Recursive Applications,'' Proc. 1986 ACM-SIGMOD
- Conference on Management of Data, Washington, D.C., May 1986.
- .ip [ROWE85]
- Rowe, L., ``Fill-in-the-Form Programming,''
- Proc. 1985 Very Large Data Base Conference, Stockholm, Sweden,
- August 1985.
- .ip [SELI80]
- Selinger, P. and Adiba, M., ``Access Path Selection in a Distributed
- Database Management System,'' PROC ICOD, Aberdeen, Scotland, July 1980.
- .ip [SHAP86]
- Shapiro, L., ``Join Processing in
- Database Systems with Large Main Memories,''
- ACM-TODS, Sept. 1986.
- .ip [SKEE82]
- Skeen, D., ``Non-blocking Commit Protocols,'' Proc. 1982 ACM-SIGMOD
- Conference on Management of Data,
- Ann Arbor, Mich., May 1982.
- .ip [STON83]
- Stonebraker, M., et. al., ``Application of Abstract Data Types and Abstract
- Indexes to CAD Data,'' Proc. Engineering Applications Stream of 1983
- Data Base Week, San Jose, Ca., May 1983.
- .ip [STON86a]
- Stonebraker, M. and Rowe, L., ``The Design of POSTGRES,'' Proc.
- 1986 ACM-SIGMOD
- Conference on Management of Data, Washington, D.C., May 1986.
- .ip [STON86b]
- Stonebraker, M., ``Inclusion of New Types in Relational Data Base Systems,''
- Proc. Second International Conference on Data Engineering, Los Angeles,
- Ca., Feb. 1986.
- .ip [STON87]
- Stonebraker M. et. al., ``The Design of the POSTGRES Rules System,'' Proc.
- 1987 IEEE Data Engineering Conference, Los Angeles, Ca., Feb. 1987.
- .ip [TSUR84]
- Tsur, S. and Zaniolo, C., ``An Implementation of GEM -- Supporting a Semantic
- Data Model on a Relational Back-end,'' Proc. 1984 ACM-SIGMOD Conference
- on Management of Data, Boston, Mass., June 1984.
- .ip [ULLM85]
- Ullman, J., ``Implementation of Logical Query Languages for Data Bases,''
- Proceedings of the 1985 ACM-SIGMOD International Conference
- on Management of Data,
- Austin, TX, May 1985.
-