Steven Atkinson
Software Verification Research Centre
Department of Computer Science
University of Queensland
Brisbane, AUSTRALIA, 4072
Tel: +61 (0)7 3365-1003
Fax: +61 (0)7 3365-1533
Email: atkis@cs.uq.edu.au
The challenge of software reuse is to record software development knowledge in such a way that it is readily stored digitally and easily retrieved, understood and used by other software engineers. Hence at the highest level, the process of retrieval from a repository is to take a description of a problem, and search for stored knowledge which may help to solve the described problem.
The purpose of component retrieval is therefore to find component behaviours which solve a problem described by a query. Most proposed component retrieval schemes simulate this search for desired behaviour using text-based, vocabulary classification, structural matching or specification matching techniques. Of course, most retrieval schemes also consider contextual aspects, such as resource usage, efficiency and internal representation.
A theoretical definition of behavioural retrieval is required to describe the ideal (modulo contextual aspects) retrieval scheme, despite its probable inefficiency in practice. In this paper a theoretical definition of behavioural retrieval is outlined, showing that useful, exploitable structure does exist in behaviour.
Keywords: reuse, component behaviour models, retrieval
Workshop Goals: extending behavioural theory to reuse-in-the-large, networking
Working Groups: ``Rigorous Behavioural Specification as an Aid to Reuse'', ``Component Certification Tools, Frameworks and Processes''
The development of useful software component libraries is a major challenge in the field of software reuse. The central operation of a software library system is that of retrieval: how to find a collection of components which are most suitable for a particular application. A library is of no value if software engineers cannot exploit its stored knowledge. Along with the other operations of insertion and evolution, approaches to the library operation of retrieval can benefit from formalization. The goal of this research is to formalize software library operations to form an integrated specification of a software library system, defining a theory upon which practical libraries can be built.
In this position paper, a theoretical retrieval strategy utilizing component behaviour is outlined. Most proposed retrieval methods approximate behavioural retrieval by using surrogate representations (e.g. terms inside facets, signatures, predicates) to describe behavioural properties of components. It is therefore pertinent to develop a theory of behavioural retrieval that, while probably impractical to implement, avoids approximations and uses complete information.
Behavioural retrieval works by exploiting the executability of software components. Programs are executed using components, and the responses of components are recorded. Retrieval is achieved by selecting those components whose responses (with respect to the program) are closest to a pre-determined set of desired responses. This idea was originally called ``behavioural sampling'' by Podgurski and Pierce[1].
The behavioural retrieval scheme is outlined as follows: we first take an abstract view of components, in terms of their execution responses to programs. It will be shown in Section 3.2 that these collections of responses can be meaningfully partially ordered. Section 3.3 shows how the lattice induced by the partial order can be used to find best behavioural approximations (if an exact match cannot be found) to the component desired.
In order to develop a theory of behavioural retrieval, an execution model is required to explain how components execute programs and generate output responses.
In this execution model, a component is represented as a relation between programs and responses. This is because, in general, a program execution can yield several responses (due to non-determinism) and a response may be evoked by more than one program. Formally, a component $c$ can be declared as:
$$c : \mathit{Program} \leftrightarrow \mathit{Response}$$
A program is modelled as a sequence of calls on the component's interface. A response is a sequence of values in correspondence with a program. A value at position i in a response sequence represents the effect that the ith call in the program sequence had on the component and the environment.
It is important to note that this theoretical treatment of behavioural retrieval does not dictate a representation for effects; the representation of effects may or may not include descriptions of how the component state changed as a result of the call, or what information was shared with the environment.
Execution proceeds as follows: when sent a program a component will respond by executing each call in the program and producing a corresponding output sequence (a response). This continues until all calls in the program have been executed or a call is rejected, in which case the execution ceases at that stage.
In practice there are various reasons why a call may be rejected. The method name called may not be the name of an operation in the component's interface, or the supplied input may not satisfy the operation's preconditions. The precise reason will not be a concern at this level of abstraction.
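The execution model described above can be sketched in Python. This is only an illustration under stated assumptions: the `BoundedStack` component, its `push` and `pop` operations, the `(name, args)` encoding of calls, and the choice of recording the stack contents as the effect of a call are all hypothetical, since the theory deliberately leaves the representation of effects open.

```python
from typing import List, Tuple

Call = Tuple[str, tuple]  # (operation name, arguments); hypothetical encoding


class BoundedStack:
    """Hypothetical deterministic component, used only for illustration."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: List[int] = []

    def step(self, call: Call) -> tuple:
        """Execute one call and return its effect; raise ValueError to reject."""
        name, args = call
        if name == "push" and len(args) == 1 and len(self.items) < self.capacity:
            self.items.append(args[0])
            return tuple(self.items)  # assumed effect: the new stack contents
        if name == "pop" and not args and self.items:
            self.items.pop()
            return tuple(self.items)
        # Unknown operation name or unsatisfied precondition.
        raise ValueError(f"rejected: {call}")


def execute(component: BoundedStack, program: List[Call]) -> tuple:
    """Run the calls in order; execution ceases at the first rejected call."""
    response = []
    for call in program:
        try:
            response.append(component.step(call))
        except ValueError:
            break
    return tuple(response)


program = [("push", (1,)), ("push", (2,)), ("push", (3,)), ("pop", ())]
short_response = execute(BoundedStack(2), program)  # third call is rejected
full_response = execute(BoundedStack(3), program)   # all four calls succeed
```

With a capacity of 2 the third `push` violates the (assumed) precondition, so the response has only two entries; with a capacity of 3 the response has one entry per call.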
In theory it is possible to collect all possible component responses to a given program. The result of executing a program $p$ on a component $c$ is therefore a collection of response sequences, which can be formally denoted as the image of $p$ under the relation $c$, i.e. $c(\!|\{p\}|\!)$.
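As a minimal sketch of this relational view, a component can be represented in Python as a finite set of (program, response) pairs, with the collection of responses to a program obtained as a relational image. The type aliases, the `image` helper and the `coin` component are assumptions introduced for the example; calls and effects are reduced to plain strings for brevity.

```python
from typing import FrozenSet, Set, Tuple

Program = Tuple[str, ...]    # sequence of calls (names only, for brevity)
Response = Tuple[str, ...]   # sequence of effects
Component = FrozenSet[Tuple[Program, Response]]  # relation: programs <-> responses


def image(c: Component, p: Program) -> Set[Response]:
    """All responses that the relation c associates with the program p."""
    return {response for (program, response) in c if program == p}


# Hypothetical non-deterministic component: a "toss" may yield heads or tails.
coin: Component = frozenset({
    (("toss",), ("heads",)),
    (("toss",), ("tails",)),
    (("toss", "toss"), ("heads", "tails")),
})

single_toss = image(coin, ("toss",))
```

Here `single_toss` contains two responses, reflecting the non-determinism of the component, while a program the component does not relate to any response has an empty image.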
Other possible execution models can be adopted. One example of a different model is where a component provides all responses for each ``run'' of calls in the program where consecutive calls are not rejected. This execution model may help locate components which by themselves only partially usefully respond to a program but when composed could yield the response desired.
Behavioural retrieval of components critically depends upon having a sound basis for behavioural comparison, otherwise library retrieval operations would not select the most appropriate components. In this section, a notion of component behaviour is derived from the collection of responses, and a meaningful partial order over behaviours is defined.
In effect, each program determines a context in which the behaviour of a component is exhibited. The behaviour of a component c is derived from the set of response sequences by removing those responses which are proper extensions of other responses. Thus, the behaviour of a component c is the set of guaranteed responses to a program p.
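In order to obtain a behaviour from a collection of responses $R$, we apply the following filter function, where $s \prec r$ denotes that $s$ is a proper prefix of $r$:
$$\mathit{filter}(R) = \{\, r \in R \mid \neg \exists\, s \in R \cdot s \prec r \,\}$$
The behaviour of a component $c$ with respect to a program $p$ can now be denoted as $\mathit{filter}(c(\!|\{p\}|\!))$, the filter applied to the image of $p$ under $c$. It is important to note that the filter function is not one-to-one; there are components whose effects in response to a program are different, but whose filtered behaviour is the same. Components whose behaviours with respect to a given program are identical are said to be behaviourally equivalent.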
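The filtering step can be sketched in Python as follows. The sketch assumes that responses are tuples of effects and that filtering means discarding every response that properly extends another one, as described above; the names `is_proper_prefix` and `filter_responses` are hypothetical.

```python
from typing import Set, Tuple

Response = Tuple[str, ...]


def is_proper_prefix(r: Response, s: Response) -> bool:
    """True when r is strictly shorter than s and s starts with r."""
    return len(r) < len(s) and s[:len(r)] == r


def filter_responses(responses: Set[Response]) -> Set[Response]:
    """Keep only the responses that do not properly extend another response."""
    return {
        s for s in responses
        if not any(is_proper_prefix(r, s) for r in responses)
    }


# Two different collections of responses with the same filtered behaviour,
# illustrating that the filter is not one-to-one.
behaviour_x = filter_responses({("a",), ("a", "b"), ("c", "d")})
behaviour_y = filter_responses({("a",), ("a", "e"), ("c", "d")})
```

Both collections filter to the same behaviour, so two components producing them would be behaviourally equivalent with respect to the program concerned.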
Essentially, a component behaviour is a set of sequences of effects, where the representation of effects is deliberately left undecided. Consider two such sets $b_1$ and $b_2$ constructed by executing a program $p$ on two different components. A partial order $\sqsubseteq$ over behaviours $b_1$ and $b_2$ is defined as:
$$b_1 \sqsubseteq b_2 \iff \forall s \in b_2 \cdot \exists r \in b_1 \cdot r \preceq s$$
where $r \preceq s$ denotes that $r$ is a prefix of $s$ (possibly $s$ itself). That is, behaviour $b_1$ is related to behaviour $b_2$ under $\sqsubseteq$ if, for every response sequence in $b_2$, either the sequence itself or a prefix of it exists in the set of responses $b_1$. There are two reasons why this particular partial order over behaviours has been chosen. Firstly, this definition allows for removal of non-determinism, since it is possible that some responses in $b_1$ are not in $b_2$. Secondly, the definition allows for removal of rejected calls, by allowing sequences in $b_2$ to be extensions of sequences in $b_1$.
This measure of behavioural closeness is based upon common effects (sequence prefixes), the amount of determinism, and rejection points (blocking).
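One reading of this ordering, sketched in Python: a behaviour `b1` lies below a behaviour `b2` when every response sequence in `b2` has a prefix (possibly the sequence itself) in `b1`. The direction of the comparison and the names `is_prefix` and `below` are assumptions made for the example.

```python
from typing import Set, Tuple

Response = Tuple[str, ...]
Behaviour = Set[Response]


def is_prefix(r: Response, s: Response) -> bool:
    """True when s starts with r (r may equal s)."""
    return s[:len(r)] == r


def below(b1: Behaviour, b2: Behaviour) -> bool:
    """Assumed ordering: each sequence in b2 extends some sequence in b1."""
    return all(any(is_prefix(r, s) for r in b1) for s in b2)


# b2 is more deterministic (one response instead of two) and blocks later
# (its response extends ("a",) by one further effect).
b1 = {("a",), ("c",)}
b2 = {("a", "b")}
forward = below(b1, b2)
backward = below(b2, b1)
```

In this example `forward` holds and `backward` does not, matching the two stated motivations: moving up the order may remove non-deterministic alternatives and may extend sequences past a rejection point.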
The partial order defines a lattice structure over behaviours. The meet of two behaviours $b_1$ and $b_2$ can be interpreted as the behaviour which can respond to a program as either $b_1$ or $b_2$ would respond, up to certain calls in the program. When these calls are executed, the meet behaviour can reject the call where either $b_1$ or $b_2$ does not, so that $b_1$ and $b_2$ are both extensions of the meet behaviour under the partial order.
The join of two behaviours $b_1$ and $b_2$ is defined using a function that removes all prefix sequences from a set of responses. The join of two behaviours is not always defined, because two components may respond to a program $p$ with completely unrelated responses. The join of two behaviours can be interpreted as the behaviour which can respond to a program $p$ as either of behaviours $b_1$ or $b_2$, but may have fewer response sequences (more deterministic) and extended response sequences (blocks less often) as compared to behaviours $b_1$ and $b_2$.
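The following Python sketch shows one way meet and join could be computed. It is not taken from the paper's formal definitions; it is derived from an assumed reading of the ordering, in which a behaviour lies below another when every sequence of the latter extends some sequence of the former, and it assumes that both arguments are already filtered behaviours. The function names are hypothetical.

```python
from typing import Optional, Set, Tuple

Response = Tuple[str, ...]
Behaviour = Set[Response]


def is_prefix(r: Response, s: Response) -> bool:
    """True when s starts with r (r may equal s)."""
    return s[:len(r)] == r


def meet(b1: Behaviour, b2: Behaviour) -> Behaviour:
    """Shortest sequences of the union: drop every proper extension."""
    union = b1 | b2
    return {
        s for s in union
        if not any(r != s and is_prefix(r, s) for r in union)
    }


def join(b1: Behaviour, b2: Behaviour) -> Optional[Behaviour]:
    """Longer sequence of each prefix-comparable pair; None if no pair exists."""
    result: Behaviour = set()
    for r1 in b1:
        for r2 in b2:
            if is_prefix(r1, r2):
                result.add(r2)
            elif is_prefix(r2, r1):
                result.add(r1)
    return result or None


b1 = {("a",), ("c",)}
b2 = {("a", "b"), ("d",)}
lower = meet(b1, b2)
upper = join(b1, b2)
unrelated = join({("a",)}, {("b",)})
```

Here `lower` keeps `("a",)` rather than `("a", "b")`, so the meet blocks at least as early as either behaviour, while `upper` keeps only the extended sequence shared by both. Two behaviours with completely unrelated responses have no join, which the sketch signals with `None`.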
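The retrieval process is now defined using the meet and join operations. A behavioural query consists of two parts: a program $p$, and a desired behaviour $b$, i.e. the set of responses to $p$ that the sought component should exhibit.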
Given a query (p,b), behavioural retrieval proceeds as shown in Figure 1:
Figure 1: The Process of Behavioural Retrieval
The retrieval process is sound in the sense that if a set of behaviourally equivalent components C with a behaviour exactly matching the desired behaviour exists in the component library, then all members of C and no other components are returned as a result of the retrieval process (i.e. perfect recall and precision). The proof of this property is given in [2].
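A much simplified sketch of retrieval for a query (p, b) is given below in Python. It does not reproduce the process of Figure 1. It only illustrates the soundness property just stated (exact matches, when present, are the whole result) together with an assumed fallback that returns components whose behaviour is comparable with the desired one under the ordering. The library representation (a dictionary from hypothetical component names to functions yielding a filtered behaviour for a program) is likewise an assumption.

```python
from typing import Callable, Dict, List, Set, Tuple

Program = Tuple[str, ...]
Response = Tuple[str, ...]
Behaviour = Set[Response]
Library = Dict[str, Callable[[Program], Behaviour]]


def is_prefix(r: Response, s: Response) -> bool:
    return s[:len(r)] == r


def below(b1: Behaviour, b2: Behaviour) -> bool:
    """Assumed ordering: each sequence in b2 extends some sequence in b1."""
    return all(any(is_prefix(r, s) for r in b1) for s in b2)


def retrieve(library: Library, p: Program, b: Behaviour) -> List[str]:
    """Return exact matches if any exist, otherwise comparable components."""
    behaviours = {name: run(p) for name, run in library.items()}
    exact = sorted(name for name, bc in behaviours.items() if bc == b)
    if exact:
        return exact
    # Fallback (assumption): behaviours that refine b or that b refines.
    return sorted(
        name for name, bc in behaviours.items()
        if below(b, bc) or below(bc, b)
    )


# Hypothetical library with fixed behaviours for the program of interest.
library: Library = {
    "full": lambda p: {("ok", "ok")},
    "partial": lambda p: {("ok",)},
    "other": lambda p: {("err",)},
}
p = ("push", "pop")
exact_hits = retrieve(library, p, {("ok", "ok")})
approximate_hits = retrieve(library, p, {("ok", "ok"), ("ok", "fail")})
```

The first query has an exact match and returns only it. The second has none, so the sketch returns `full`, which refines the desired behaviour, and `partial`, which the desired behaviour refines, but not the unrelated component.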
The principle of behavioural retrieval was first suggested by Podgurski and Pierce[1]. Their ``behavioural sampling'' technique did not necessarily collect all the possible execution responses but rather ``sampled'' the responses over a number of executions, and exercised the most commonly used operations based on a probability distribution.
The idea of using a lattice structure as a vehicle for retrieval has been suggested by Mili et al.[3]. In their retrieval system the nodes of the lattice were relations, serving as surrogates for specifications. The partial ordering over relations, more-defined, has strong similarities in purpose with the ordering presented above, although the process of Figure 1 returns not only refinements of the desired behaviour but also components whose behaviour may be refined to that desired.
The theory for behavioural retrieval, outlined here and given in detail in [2], combines and extends the behavioural sampling and lattice-based retrieval work in three directions. Firstly, an abstract model of execution and behaviour is defined, independent of any particular language formalism. Secondly, a notion of rejection is included in the behavioural model. Thirdly, a sound prescription for behavioural retrieval, including a precise notion of approximate behaviour, is provided.
It is intended that current research into composition in object-oriented systems[4] will be combined with this theory of behavioural retrieval to directly address the problem of compositional retrieval (i.e. retrieval of composed components to satisfy a behavioural requirement). The compositional retrieval problem was noted by Hall[5] and repeated in Mili et al.[6].
In conclusion, the theory of behavioural retrieval has revealed an exploitable lattice structure by directly using perfect knowledge of component behaviour, rather than approximations to it. Despite its impracticality, the behavioural retrieval scheme is an important backdrop for other retrieval schemes. It is sound in the sense that it has perfect recall and precision when exact matches are available, and complete in that it provides a meaningful notion of behavioural approximation.
Steven Atkinson is currently writing up his dissertation, titled ``Formal Reuse from Software Libraries''. He has been studying as a doctoral candidate at the Software Verification Research Centre for the last three years. During that time, he has also participated as a programmer on a knowledge-based software re-engineering project[7]. His research interests include formal, knowledge-based approaches to the technical problems of software reuse, re-engineering of legacy code and the ongoing development of the formal object-oriented specification language Object-Z[8].