There's a Better Way to Define the Correct Realization Notion

Joan Krone

Department of Mathematics and Computer Science
Denison University
Granville, Ohio 43023
Tel: (614) 587-6484
Fax: (614) 587-6417
Email: krone@denison.edu

William F. Ogden

Department of Computer and Information Science
The Ohio State University
Columbus, OH 43210
Tel: (614) 292-6004
Fax: (614) 292-2911
Email: ogden@cis.ohio-state.edu

Abstract

Specifying software interfaces that give developers maximal implementational latitude but that, at the same time, provide clients with maximal functional utility and conceptual understandability often leads to situations where the state space of a proposed implementation bears no resemblance to the interface model. Adequate generality requires relational specifications, and when relationally specified components are composed, they may lead to non-deterministic implementations, a situation that existing formal treatments handle only in special cases.

For the general case we propose a natural, theoretical definition of what it means for an implementation to satisfy an abstract specification. This definition, both global and behavioral, sheds light on the more traditional pragmatic definitions of refinements, correspondences, and other mappings, which involve local static relationships between the realization and conceptualization views of objects, as opposed to relationships between views of behaviors. In particular, we claim that the behavioral definition of correct implementation always gives rise to a correspondence relation between abstract objects and their realizations. This means that a complete modularly verifiable system for specifying reusable software can be developed by including a mechanism for defining correspondence relations.

Keywords: data abstraction, semantics, verification, concept, realization

Workshop Goals: discussing technical issues related to reuse

Groups: reuse and formal methods, reusable component certification, design guidelines for reuse, education

1 Background

For the past several years the Reusable Software Research Group at Ohio State University has examined the issue of reuse from many points of view, emphasizing a disciplined approach to software design that permits composition of components and modular reasoning. Specifications are called concepts and implementations are referred to as realizations. We believe that these two components should be separate entities and that a client of any component should see only the concept without implementation details. We believe that software should be certified as correct before it is used or reused and that this certification process should be based on mathematical reasoning. It is this concern for correctness that motivates our investigation of the formal semantics for data abstraction, i.e., the correct realization notion.

2 Position

The reuse community has developed a multiplicity of guidelines to follow when designing software with reuse in mind [7]. Two of those guidelines stand out as critically important, both tied to the notion of correct realizations:

* Software should be specified at an appropriate level of abstraction.

* Software should not be reused unless it has been certified as correct.

The choice of an appropriate level of abstraction is important because it allows one to reason about what the software is doing without needing to know implementation details. We call this abstract specification a concept. A well-designed concept is broad enough to encompass all possible implementations for that concept, including those that require temporal flexibility. For example, one might describe a sorting machine in a concept so that the implementer is free to use heap sort, quick sort, or any other sorting algorithm, thereby reusing the concept itself.

Moreover, if one wants to include lazy initialization or an eager beaver approach to carrying out any part of a specification, it should be entirely possible to do so and still stay within the conceptual requirements. With regard to abstract data types, well designed concepts specifying an ADT should export not just a single type, but rather a family of types, i.e., should have generic capabilities. For example, a stack concept should describe stacks of any size and of any type, thereby promoting multiple reuse.

All of these general characteristics of reusable software are ones that good design requires, and our semantics for data abstraction must be capable of dealing with all of them. However, there is never a guarantee that in any given system all of the components will be designed according to accepted guidelines. There is no way to screen out inferior design or poorly chosen implementations for either good or bad specifications. Our definition of a correct realization must be able to include all of these.

For example, suppose one needs to implement a specification calling for Z6 with two operations, one that increments any given object of Z6 and another that reports whether an object has an even or an odd value. Each object is initialized to zero. One might choose to implement this concept using Z2: because the only operation that lets a client see a result is the one that reports parity, alternating zeros and ones remains consistent with incrementing in Z6. Similarly, one could choose to implement this concept using Z4, since parity also alternates under incrementing there. This is definitely not something the software engineering community would encourage, but it is nevertheless a possibility that our definition must accommodate.
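The Z6 example can be made concrete with a small sketch. The names ModCounter, observations, and indistinguishable, and the encoding of a scenario as a string of 'i' (increment) and 'e' (even test) are our own illustrative assumptions, not part of any existing notation. The check is bounded, so it gives evidence rather than a proof.

```python
from itertools import product


class ModCounter:
    """Counter over Z_n, initialized to zero, with the two operations from the text."""

    def __init__(self, modulus):
        self.modulus = modulus
        self.value = 0

    def increment(self):
        self.value = (self.value + 1) % self.modulus

    def is_even(self):
        return self.value % 2 == 0


def observations(modulus, scenario):
    """Run a scenario ('i' = increment, 'e' = is_even) and collect the parity reports."""
    counter = ModCounter(modulus)
    seen = []
    for op in scenario:
        if op == "i":
            counter.increment()
        else:
            seen.append(counter.is_even())
    return seen


def indistinguishable(concept_modulus, realization_modulus, max_length=8):
    """Compare the two counters on every scenario up to max_length operations."""
    for length in range(max_length + 1):
        for scenario in product("ie", repeat=length):
            if observations(concept_modulus, scenario) != observations(
                realization_modulus, scenario
            ):
                return False
    return True
```

Under these assumptions, indistinguishable(6, 2) and indistinguishable(6, 4) both hold, whereas a Z3 counter is exposed after three increments, because parity does not alternate when the value wraps from 2 back to 0.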

Indeed, our definition of a correct realization must be able to explain a program that is its own specification at one extreme, while being robust enough to explain implementations that include hidden capabilities at the opposite extreme. For example, one might specify a search concept for which a variety of possible realizations might be chosen, among them being linear search, tree search, and use of a hash function for storing and locating values. A good design of the search concept should be at a level of abstraction that allows reasoning about a search without pinning down the specific details of how the search is to be carried out. The presence of a hash function should not show up in the specification, yet our semantics must be able to accept such an implementation as correct.
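As a hedged illustration of the search example, the sketch below gives two realizations that a client cannot tell apart through the operations add and contains. The class names, the operation names, and the fixed bucket count are hypothetical choices made only for this sketch.

```python
class LinearSearchStore:
    """Realization that keeps keys in a list and scans it on every lookup."""

    def __init__(self):
        self._items = []

    def add(self, key):
        self._items.append(key)

    def contains(self, key):
        for item in self._items:
            if item == key:
                return True
        return False


class HashedStore:
    """Realization that hides a hash function behind the same two operations."""

    def __init__(self, bucket_count=8):
        self._buckets = [[] for _ in range(bucket_count)]

    def _bucket(self, key):
        # The hash function is an internal detail; nothing in the interface mentions it.
        return self._buckets[hash(key) % len(self._buckets)]

    def add(self, key):
        self._bucket(key).append(key)

    def contains(self, key):
        return key in self._bucket(key)
```

A conceptual description of this store would mention only the collection of keys added so far. The buckets appear nowhere in it, yet the hashed realization answers every contains query exactly as the linear one does.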

Still more challenging are implementations for non-deterministic specifications. For example, one might write a concept in which one calls for the choice of a number within a given range. Among intuitively acceptable realizations are those that use some random generator for the choice, those that always pick a single value, and those that follow any particular pattern between the single choice and the random choice. In this case, it may be possible for a client using such a program component to guess that a particular pattern is being used and then to depend on that pattern in writing other client components. Most would agree that a program depending on a hidden pattern in order to carry out some particular job is in fact acceptable. This means that our semantics must be able to explain this situation and accept such a program as correct.
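The following sketch shows three realizations of such a choice operation, each of which satisfies the same relational requirement. The function and class names are hypothetical, and we assume the range is given by inclusive integer bounds with low no greater than high.

```python
import random


def meets_choose_spec(low, high, result):
    """Relational requirement: any value in the inclusive range is acceptable."""
    return low <= result <= high


def choose_random(low, high):
    """Realization that uses a random generator."""
    return random.randint(low, high)


def choose_lowest(low, high):
    """Realization that always picks a single value."""
    return low


class CyclingChooser:
    """Realization that follows a fixed pattern, stepping through the range in order."""

    def __init__(self):
        self.calls = 0

    def __call__(self, low, high):
        result = low + self.calls % (high - low + 1)
        self.calls += 1
        return result
```

A client could observe that CyclingChooser yields 1, 2, 3, 1 on the range 1 to 3 and come to rely on that order, even though the specification promises only that each result lies within the range.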

While most approaches to correctness depend on mappings between conceptual and implementation spaces, we believe that the notion of a realization meeting a given specification should be expressible in terms independent of such mappings. Such a definition will provide semantics not only for data abstraction but also for program correctness in general.

3 Comparison

In order to certify a given piece of software as correct, one needs a way to connect realization objects to the conceptual objects. In the literature this connection has been introduced as an "abstraction function" by some, as a "correspondence" by others, and as a "simulation" or a "refinement function" in the concurrency community [1], [2], [3], [4], [5], [6]. In fact, it has been shown that this relationship need not always be a function, but may sometimes need to be a relation [8].

One might categorize implementations according to these mappings.

* Those that have a one-to-one match with the concept. For example, suppose a concept specifies a stack as a string, and an implementation for it uses an array and top index. Here, the correspondence is functional, matching the conceptual string with the concatenation of the elements in the array from the first to the top.

* Those for which a given implementation may require a relational correspondence with the concept, because for each object in the realization space, there may be multiple matching objects in the concept. For example, suppose one writes a general specification for a minimal spanning forest. One might choose a Kruskal or a Prim implementation or some less popular algorithm. In any case it is likely that as a spanning forest is built, there will be choices to make, i.e., there may be several edges of equal weight, only one of which must be chosen to add to an existing partial spanning forest on the way to completing the construction. Here there is no longer a one-to-one correspondence, but instead a one-to-many relation.

* Those for which the realization may get away with a less complicated structure than the concept describes. For example, given a specification for a two-state machine with operations change_state and read_state, in which the read_state operation has a pre-condition that allows reading only when the machine is in the initial state, one might use an implementation with only one state. Here a correspondence can be given only by introducing an adjunct variable.
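The first category above, the stack realized as an array with a top index, can be sketched as follows. The names ArrayStack and abstraction are hypothetical, a Python tuple stands in for the conceptual string, and we assume that clients never push beyond the capacity or pop an empty stack.

```python
class ArrayStack:
    """Realization: a fixed-size array plus a top index."""

    def __init__(self, capacity):
        self.contents = [None] * capacity
        self.top = 0

    def push(self, item):
        # Assumes top < capacity (the client respects the pre-condition).
        self.contents[self.top] = item
        self.top += 1

    def pop(self):
        # Assumes top > 0; the popped slot is left untouched in the array.
        self.top -= 1
        return self.contents[self.top]


def abstraction(stack):
    """Correspondence: the conceptual string is the array from the first entry to the top."""
    return tuple(stack.contents[: stack.top])
```

After pushing 1, 2, 3 and popping once, abstraction yields (1, 2) even though the value 3 still sits in the array. The correspondence is a function of the realization state, and entries above the top index do not affect it.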

The one-to-one matching is easy to deal with formally, but the other cases present technical difficulties when trying to provide a proof system that is both sound and complete. Among the most challenging situations are realizations that are in any way non-deterministic and realizations in which there are time delays. For example, a concept may call for the toss of a die, but the realization may do nothing until the value of the die is actually needed for some later action. In this case, if one tries to match conceptual traces with implementation traces, the problem is that at the conceptual level one assumes the die has been tossed, but at the realization it may not have been.
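The delayed die toss can be sketched as below. The class names, the read operation, the initial face of 1, and the injected random generator are all assumptions introduced for illustration.

```python
import random


class EagerDie:
    """Realization that tosses the die at the moment toss is called."""

    def __init__(self, rng):
        self.rng = rng
        self.face = 1  # assumed initial value

    def toss(self):
        self.face = self.rng.randint(1, 6)

    def read(self):
        return self.face


class LazyDie:
    """Realization that only records a pending toss and rolls when the value is needed."""

    def __init__(self, rng):
        self.rng = rng
        self.face = 1  # assumed initial value
        self.toss_pending = False

    def toss(self):
        self.toss_pending = True

    def read(self):
        if self.toss_pending:
            self.face = self.rng.randint(1, 6)
            self.toss_pending = False
        return self.face
```

Between toss and read, the internal state of LazyDie still holds the old face, so a state-by-state comparison with the conceptual trace fails at that point, even though every value a client can actually read is one the concept permits.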

An even simpler case concerns operators with more than one argument. While simple state to state traces may be sufficient to explain behavior of monadic operators, such traces are inadequate for operators with more than one argument.

We believe that it is important to define the "correct realization" notion in terms that are independent of mappings and in a manner that can accommodate all of the cases listed above. In fact, it is necessary that this definition be capable of encompassing all kinds of realizations that should intuitively be considered correct.

Rather than attempting to make this definition in terms of mappings or even in terms of conceptual and realization traces, we propose that such a definition can be given in terms of "scenarios," each a kind of history written as a single sequence of operator invocations (the operator names being the same in the concept and in the realization). Each scenario can be interpreted both in the conceptual world and in the implementation world, so one can discuss what is happening in both spaces at any single point. Scenarios look like sequences of procedure calls, and a "proper scenario" is one in which, at every point between two calls, the confirm clause of the most recently called operator holds and the concept's assume clause for the next call is met.

Finally, having a "proper scenario" should be equivalent to the existence of a correspondence between the given realization and the concept it is implementing.
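One possible reading of the dual interpretation of a scenario is sketched below, using the two-state machine described earlier in this section. This is our own simplified encoding, not a formal definition: the table CONCEPT, the function run_scenario, and the three outcome labels are hypothetical, and each operator's confirm clause is represented as the set of (next state, result) pairs it permits.

```python
# Concept of the two-state machine: operator -> (assume clause, permitted effects).
CONCEPT = {
    "change_state": (lambda s: True, lambda s: {(1 - s, None)}),
    "read_state": (lambda s: s == 0, lambda s: {(s, 0)}),
}
INITIAL_STATE = 0


class OneStateMachine:
    """Realization with a single state: it never records any change."""

    def change_state(self):
        return None

    def read_state(self):
        return 0


def run_scenario(scenario, realization):
    """Interpret one sequence of operator names in both worlds at once."""
    # Conceptual states consistent with the results observed so far.
    possible = {INITIAL_STATE}
    for op in scenario:
        assume, effects = CONCEPT[op]
        if not all(assume(state) for state in possible):
            return "improper"  # the assume clause is not met, so nothing is required
        result = getattr(realization, op)()
        possible = {
            next_state
            for state in possible
            for (next_state, permitted) in effects(state)
            if permitted == result
        }
        if not possible:
            return "violation"  # no conceptual behavior explains the observed result
    return "ok"
```

With this encoding, the scenario change_state, change_state, read_state is reported as "ok" for the one-state realization, while change_state, read_state is "improper" because the pre-condition of read_state fails, so the realization is never asked to distinguish the two states.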

References

[1] M. Abadi, L. Lamport, "The Existence of Refinement Mappings," Theoretical Computer Science, vol. 82, pp. 253-284, 1991.

[2] P. Gardiner, C. Morgan, "A Single Complete Rule for Data Refinement," Formal Aspects of Computing, vol. 5, pp. 367-382, 1993.

[3] C.A.R. Hoare, J. He, J. Sanders, "Prespecification in Data Refinement," Information Processing Letters, vol. 25, pp. 71-76, 1987.

[4] C. Jones, Systematic Software Development Using VDM, Prentice-Hall, 1990.

[5] B. Liskov, J. Guttag, Abstraction and Specification in Program Development, MIT Press and McGraw Hill, 1986.

[6] N. Lynch, F. Vaandrager, "Forward and Backward Simulations," MIT/LCS/TM-486, MIT, Cambridge, MA.

[7] W. Ogden, M. Sitaraman, B. Weide, S. Zweben, "The RESOLVE Framework and Discipline -- A Research Synopsis," Software Engineering Notes, vol. 19, pp. 23-28, 1994.

[8] M. Sitaraman, B. Weide, W. Ogden, "Using Abstraction Relations to Verify Abstract Data Type Representations," IEEE TSE, 1997, to appear.

Biography

Joan Krone has participated in the Reusable Software Research Group since 1984. She currently chairs the Department of Mathematics and Computer Science at Denison University. Her area of research is formal specification and verification of software.

Bill Ogden is Associate Professor of Computer Science at Ohio State University, and he is also a member of the Reusable Software Research Group. His research is focused on the problem of providing a conceptually robust framework for software engineering.