Examining Behavioural Retrieval

Steven Atkinson

Software Verification Research Centre
Department of Computer Science
University of Queensland
Brisbane, AUSTRALIA, 4072
Tel: +61 (0)7 3365-1003
Fax: +61 (0)7 3365-1533
Email: atkis@cs.uq.edu.au

Abstract:

The challenge of software reuse is to record software development knowledge in such a way that it is readily stored digitally and easily retrieved, understood and used by other software engineers. Hence at the highest level, the process of retrieval from a repository is to take a description of a problem, and search for stored knowledge which may help to solve the described problem.

The purpose of component retrieval is therefore to find component behaviours which solve a problem described by a query. Most proposed component retrieval schemes simulate this search for desired behaviour by using text-based, vocabulary classification, structural matching or specification matching techniques. Of course, most retrieval schemes also consider contextual aspects, such as resource usage, efficiency and internal representation.

A theoretical definition of behavioural retrieval is required to describe the ideal (modulo contextual aspects) retrieval scheme, despite its probable inefficiency in practice. In this paper a theoretical definition of behavioural retrieval is outlined, showing that useful, exploitable structure does exist in behaviour.

Keywords: reuse, component behaviour models, retrieval

Workshop Goals: extending behavioural theory to reuse-in-the-large, networking

Working Groups: ``Rigorous Behavioural Specification as an Aid to Reuse''
``Component Certification Tools, Frameworks and Processes''

Background

The development of useful software component libraries is a major challenge in the field of software reuse. The central operation of a software library system is that of retrieval: how to find a collection of components which are most suitable for a particular application. A library is of no value if software engineers cannot exploit its stored knowledge. Along with the other operations of insertion and evolution, approaches to the library operation of retrieval can benefit from formalization. The goal of this research is to formalize software library operations to form an integrated specification of a software library system, defining a theory upon which practical libraries can be built.

Position

In this position paper, a theoretical retrieval strategy utilizing component behaviour is outlined. Most proposed retrieval methods approximate behavioural retrieval by using surrogate representations (e.g. terms inside facets, signatures, predicates) to describe behavioural properties of components. It is therefore pertinent to develop a theory of behavioural retrieval that, while probably impractical to implement, avoids approximations and uses complete information.

Behavioural retrieval works by exploiting the executability of software components. Programs are executed using components, and the responses of components are recorded. Retrieval is achieved by selecting those components whose responses (with respect to the program) are closest to a pre-determined set of desired responses. This idea was originally called ``behavioural sampling'' by Podgurski and Pierce[1].

Behavioural Retrieval

The behavioural retrieval scheme is outlined as follows: we first take an abstract view of components, in terms of their execution responses to programs. It will be shown in Section 3.2 that these collections of responses can be meaningfully partially ordered. Section 3.3 shows how the lattice induced by the partial order can be used to find best behavioural approximations (if an exact match cannot be found) to the component desired.

An Execution Model

In order to develop a theory of behavioural retrieval, an execution model is required to explain how components execute programs and generate output responses.

In this execution model, a component is represented as a relation between programs and responses. This is because in general, a program execution can yield several responses (due to non-determinism) and a response may be evoked by more than one program. Formally, a component c can be declared as:

$$c : Program \leftrightarrow Response$$

A program $p$ is modelled as a sequence of calls on the component's interface. A response is a sequence of values in correspondence with a program. A value at position i in a response sequence represents the effect that the ith call in the program sequence had on the component and the environment.

It is important to note that this theoretical treatment of behavioural retrieval does not dictate a representation for effects; the representation of effects may or may not include descriptions of how the component state changed as a result of the call, or what information was shared with the environment.

Execution proceeds as follows: when sent a program, a component will respond by executing each call in the program and producing a corresponding output sequence (a response). This continues until all calls in the program have been executed or a call is rejected, in which case the execution ceases at that stage.

In practice there are various reasons why a call may be rejected. The method name called may not be the name of an operation in the component's interface, or the supplied input may not satisfy the operation's preconditions. The precise reason will not be a concern at this level of abstraction.

In theory it is possible to collect all possible component responses to a given program. Therefore the result of executing a program p on a component c yields a collection of response sequences $R$. This can be formally denoted as the image of p under the relation c: i.e., $R = c(\!|\,\{p\}\,|\!)$.
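As an illustration only, the execution model can be sketched in Python. The encoding is an assumption of this sketch and not part of the theory: a component is given by a hypothetical `step` function which returns every possible (effect, next state) pair for a call, an empty result stands for a rejected call, and the names `responses` and `stack_step` are invented here.

```python
def responses(step, initial_state, program):
    """Collect every response sequence a component can give to `program`.

    `step(state, call)` is a hypothetical interface: it returns all possible
    (effect, next_state) pairs, or nothing if the call is rejected.
    """
    frontier = {((), initial_state)}   # (response so far, current state)
    finished = set()
    for call in program:
        next_frontier = set()
        for response, state in frontier:
            outcomes = list(step(state, call))
            if not outcomes:
                finished.add(response)  # call rejected: execution ceases here
            for effect, next_state in outcomes:
                next_frontier.add((response + (effect,), next_state))
        frontier = next_frontier
    # Executions that survived every call contribute their full response.
    finished.update(response for response, _ in frontier)
    return finished


def stack_step(state, call):
    """An assumed example component: an unbounded stack held in a tuple."""
    op, *args = call
    if op == "push" and len(args) == 1:
        return [("ok", state + (args[0],))]
    if op == "pop" and not args and state:
        return [(state[-1], state[:-1])]
    return []  # unknown operation, or precondition not satisfied


program = [("push", 1), ("pop",), ("pop",), ("push", 2)]
print(responses(stack_step, (), program))  # {('ok', 1)}
```

The second `pop` is rejected on the empty stack, so execution ceases there and the final `push` is never executed. A `step` function that returns more than one outcome models non-determinism and yields several response sequences.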

Other execution models can be adopted. One example is a model in which a component provides all responses for each ``run'' of calls in the program where consecutive calls are not rejected. This execution model may help locate components which by themselves respond usefully to only part of a program, but which when composed could yield the response desired.

Ordering Responses

 

Behavioural retrieval of components critically depends upon having a sound basis for behavioural comparison; otherwise library retrieval operations would not select the most appropriate components. In this section, a notion of component behaviour is derived from the collection of responses, and a meaningful partial order over behaviours is defined.

Deriving Behaviour from Collected Responses

In effect, each program determines a context in which the behaviour of a component is exhibited. The behaviour of a component c is derived from the set of response sequences $R$ by removing those responses which are proper extensions of other responses. Thus, the behaviour of a component c is the set of guaranteed responses to a program p.

In order to obtain a behaviour $b$ from a collection of responses $R$, we apply the following filter function:

$$
\begin{array}{l}
filter : \mathbb{P}\,Response \rightarrow \mathbb{P}\,Response \\
\forall R : \mathbb{P}\,Response \bullet filter(R) = \{\, r : R \mid \neg\,\exists\, r' : R \bullet r' \neq r \land r' \mathrel{\mathrm{prefix}} r \,\}
\end{array}
$$

The behaviour $b$ of a component can now be denoted as: $b = filter(c(\!|\,\{p\}\,|\!))$. It is important to note that the filter function is not one-to-one; there are components whose effects in response to a program are different, but whose filtered behaviour is the same. Components whose behaviours with respect to a given program are identical are said to be behaviourally equivalent.
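The filtering step can be sketched in Python as follows. This is a minimal sketch which assumes that response sequences are encoded as tuples; the names `is_prefix` and `filter_responses` are hypothetical.

```python
def is_prefix(s, t):
    """True if sequence s is a prefix of sequence t (s == t counts)."""
    return t[:len(s)] == s


def filter_responses(responses):
    """Remove every response which is a proper extension of another response."""
    rs = set(responses)
    return {r for r in rs
            if not any(s != r and is_prefix(s, r) for s in rs)}


# Two different collections of responses with the same filtered behaviour,
# illustrating that the filter is not one-to-one.
blocks_sometimes = {("a",), ("a", "b")}
blocks_always = {("a",)}
print(filter_responses(blocks_sometimes) == filter_responses(blocks_always))  # True
```

Only `("a",)` is guaranteed in both cases, so under this encoding the two collections are behaviourally equivalent with respect to the program that produced them.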

The Partial Order and its Interpretation

Essentially, a component behaviour is a set of sequences of effects, where the representation of effects is deliberately left undecided. Consider two such sets $b_1$ and $b_2$ constructed by executing a program p on two different components. A partial order over behaviours $b_1$ and $b_2$ is defined as:

$$b_1 \sqsubseteq b_2 \iff \forall s_2 : b_2 \bullet \exists s_1 : b_1 \bullet s_1 \mathrel{\mathrm{prefix}} s_2$$

That is, behaviour $b_1$ is related to behaviour $b_2$ under $\sqsubseteq$ if, for every response sequence in $b_2$, either the sequence itself or a prefix of it exists in the set of responses $b_1$. There are two reasons why this particular partial order over behaviours has been chosen. Firstly, this definition allows for removal of non-determinism, since it is possible that some responses in $b_1$ are not in $b_2$. Secondly, the definition allows for removal of rejected calls, by allowing sequences in $b_2$ to be extensions of sequences in $b_1$.

This measure of behavioural closeness is based upon common effects (sequence prefixes), the amount of determinism and the points at which calls are rejected (blocking).
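The ordering can be checked mechanically. The following Python sketch assumes that behaviours are sets of tuples; `is_prefix` and `below` are hypothetical names, with `below(b1, b2)` standing for "b1 is related to b2 under the partial order".

```python
def is_prefix(s, t):
    """True if sequence s is a prefix of sequence t (s == t counts)."""
    return t[:len(s)] == s


def below(b1, b2):
    """Every response in b2 is a response in b1 or an extension of one."""
    return all(any(is_prefix(s1, s2) for s1 in b1) for s2 in b2)


nondeterministic = {("a",), ("b",)}   # may answer either way, then blocks
refined = {("a", "c")}                # one answer only, and it blocks later

print(below(nondeterministic, refined))  # True
print(below(refined, nondeterministic))  # False
```

Here `refined` has lost the response `("b",)` (less non-determinism) and extends `("a",)` to `("a", "c")` (a rejected call removed), which are the two kinds of change the ordering is intended to permit.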

Retrieving using Behaviours

 

The partial order defines a lattice structure, where the meet of two behaviours is defined as $b_1 \sqcap b_2 = filter(b_1 \cup b_2)$. The meet of two behaviours can be interpreted as the behaviour which can respond to a program as either $b_1$ or $b_2$ would respond, up to certain calls in the program. When these calls are executed, the meet behaviour can reject the call while $b_1$ or $b_2$ does not, so that $b_1$ and $b_2$ are extensions under $\sqsubseteq$ of the meet behaviour.

The join of two behaviours, $b_1 \sqcup b_2$, is defined using a function which removes all prefix sequences from a set of responses. The join of two behaviours is not always defined, because two components may respond to a program p with completely unrelated responses. The join of two behaviours can be interpreted as the behaviour which can respond to a program p as either of behaviours $b_1$ or $b_2$, but may have fewer response sequences (more deterministic) and extended response sequences (blocks less often) as compared to behaviours $b_1$ and $b_2$.
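A small Python sketch may help to make the two operators concrete. It rests on assumptions that should be read as such: behaviours are sets of tuples, the meet is taken to be the prefix-minimal responses of the union, and the join is computed directly as a least upper bound derived from the ordering, which may differ in form from the definition used in [2]. All function names are hypothetical.

```python
def is_prefix(s, t):
    """True if sequence s is a prefix of sequence t (s == t counts)."""
    return t[:len(s)] == s


def below(b1, b2):
    """The ordering: every response in b2 has a prefix in b1."""
    return all(any(is_prefix(s1, s2) for s1 in b1) for s2 in b2)


def minimal(responses):
    """Keep only responses which are not proper extensions of others."""
    return {r for r in responses
            if not any(s != r and is_prefix(s, r) for s in responses)}


def meet(b1, b2):
    """Greatest behaviour lying below both b1 and b2 (assumed form)."""
    return minimal(b1 | b2)


def join(b1, b2):
    """Least behaviour lying above both b1 and b2, or None if undefined.

    Assumption of this sketch: a response belongs to the join when it is in
    b1 or b2 and has a prefix in each of them.
    """
    candidates = {s for s in b1 | b2
                  if any(is_prefix(x, s) for x in b1)
                  and any(is_prefix(y, s) for y in b2)}
    return candidates or None  # unrelated responses: no join


b1 = {("a",), ("b",)}
b2 = {("a", "c")}
print(meet(b1, b2) == {("a",), ("b",)})  # True
print(join(b1, b2) == {("a", "c")})      # True
print(join({("a",)}, {("b",)}))          # None
```

In the example the join drops `("b",)` and keeps the extended sequence `("a", "c")`, so it is more deterministic and blocks less often than `b1`.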

The retrieval process is now defined using the meet and join operations. A behavioural query consists of two parts: a program p and a desired behaviour b.

Given a query (p,b), behavioural retrieval proceeds as shown in Figure 1:

Figure 1: The Process of Behavioural Retrieval

Phase 1: the program is executed on each component in the library, resulting in a set of behaviours SB.

Phase 2: the set of behaviours SB is by definition embedded in the lattice of $\sqsubseteq$. The position of the desired behaviour b in the lattice is found.

Phase 3: the closest ancestor and descendant behaviours of b in SB (if any) are selected, using the lattice operators meet and join.

Phase 4: the components which responded to program p with those closest behaviours (if any) are returned as the result of the retrieval.
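The four phases can be sketched in Python as follows. This is an illustrative sketch under several assumptions: each library entry is a hypothetical callable which returns all responses of a component to a program, behaviours are sets of tuples, and the closest ancestors and descendants are found by comparing behaviours directly under the ordering rather than through the meet and join operators.

```python
def is_prefix(s, t):
    """True if sequence s is a prefix of sequence t (s == t counts)."""
    return t[:len(s)] == s


def behaviour(responses):
    """Filter out responses which are proper extensions of others."""
    rs = set(responses)
    return frozenset(r for r in rs
                     if not any(s != r and is_prefix(s, r) for s in rs))


def below(b1, b2):
    """The ordering: every response in b2 has a prefix in b1."""
    return all(any(is_prefix(s1, s2) for s1 in b1) for s2 in b2)


def retrieve(library, program, desired):
    desired = behaviour(desired)
    # Phase 1: execute the program on every component in the library.
    observed = {name: behaviour(run(program)) for name, run in library.items()}
    distinct = set(observed.values())
    # Phase 2: locate the desired behaviour relative to the observed ones.
    lower = {b for b in distinct if below(b, desired)}
    upper = {b for b in distinct if below(desired, b)}
    # Phase 3: keep the closest behaviours on either side of the desired one.
    closest = {b for b in lower
               if not any(x != b and below(b, x) for x in lower)}
    closest |= {b for b in upper
                if not any(x != b and below(x, b) for x in upper)}
    # Phase 4: return the components which exhibited those behaviours.
    return {name for name, b in observed.items() if b in closest}


library = {
    "blocks_early": lambda program: {("a",), ("a", "c")},
    "nondeterministic": lambda program: {("a",), ("b",)},
    "extends": lambda program: {("a", "c", "d")},
    "unrelated": lambda program: {("z",)},
}
print(sorted(retrieve(library, ["call1", "call2"], {("a", "c")})))
# ['blocks_early', 'extends']
```

If a component whose behaviour is exactly `{("a", "c")}` is added to the library, the same call returns only that component, in line with the soundness property stated for exact matches.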

The retrieval process is sound in the sense that if a set of behaviourally equivalent components C with a behaviour exactly matching the desired behaviour exists in the component library, then all members of C and no other components are returned as a result of the retrieval process (i.e. perfect recall and precision). The proof of this property is given in [2].

Comparison

The principle of behavioural retrieval was first suggested by Podgurski and Pierce[1]. Their ``behavioural sampling'' technique did not necessarily collect all the possible execution responses but rather ``sampled'' the responses over a number of executions, and exercised the most commonly used operations based on a probability distribution.

The idea of using a lattice structure as a vehicle for retrieval has been suggested by Mili et al.[3]. In their retrieval system the nodes of the lattice were relations, serving as surrogates for specifications. The partial ordering over relations, more-defined, has strong similarities in purpose with the ordering presented above, although the process of Figure 1 returns not only refinements of the desired behaviour but also components whose behaviour may be refined to that desired.

The theory for behavioural retrieval, outlined here and given in detail in [2], combines and extends the behavioural sampling and lattice-based retrieval work in three directions. Firstly, an abstract model of execution and behaviour was defined, independent of any particular language formalism. Secondly, a notion of rejection is included in the behavioural model. Thirdly, a sound prescription for behavioural retrieval including a precise notion of approximate behaviour is provided.

It is intended that current research into composition in object-oriented systems[4] will be combined with this theory of behavioural retrieval to directly address the problem of compositional retrieval (i.e. retrieval of composed components to satisfy a behavioural requirement). The compositional retrieval problem was noted by Hall[5] and repeated in Mili et al.[6].

In conclusion, the theory of behavioural retrieval has revealed an exploitable lattice structure by directly using perfect knowledge of component behaviour, rather than approximations to it. Despite its impracticality, the behavioural retrieval scheme is an important backdrop for other retrieval schemes. It is sound in the sense that it has perfect recall and precision when exact matches are available, and complete in that it provides a meaningful notion of behavioural approximation.

References

1
A. Podgurski and L. Pierce, ``Behaviour sampling: A technique for automated retrieval of reusable components,'' in Proc. 14th ICSE, pp. 349-360, 1992.

2
S. Atkinson and R. Duke, ``Behavioural retrieval from class libraries,'' Australian Computer Science Communications, vol. 17, pp. 13-20, Jan. 1995. An extended version appears as Software Verification Research Centre Technical Report 94-28, 1994, available at URL ftp://ftp.cs.uq.edu.au/pub/SVRC/techreports/tr94-28.ps.Z.

3
A. Mili, R. Mili, and R. Mittermeir, ``Storing and retrieving software components: A refinement based system,'' in Proc. 16th ICSE, pp. 91-100, IEEE Computer Society Press, May 1994.

4
S. Butler and R. Duke, ``Defining Composition Operators for Object Interaction,'' Tech. Rep. 96-12, Software Verification Research Centre, Dept. of Computer Science, Univ. of Queensland, 1996. Accepted for publication in Journal of Object-oriented Systems (OOS).

5
R. J. Hall, ``Generalized behavior-based retrieval,'' in Proc. 15th ICSE, pp. 371-380, May 1993.

6
H. Mili, F. Mili, and A. Mili, ``Reusing Software: Issues and Research Directions,'' IEEE Transactions on Software Engineering, vol. 21, pp. 528-561, June 1995.

7
J. Harrison, P. Bailes, A. Berglas, and I. Peake, ``Re-engineering 4GL-based information system applications,'' in Asia-Pacific Software Engineering Conference, pp. 448-457, IEEE Computer Society Press, Los Alamitos, CA, 1995.

8
R. Duke, P. King, G. Rose, and G. Smith, ``The Object-Z specification language,'' in Technology of Object-Oriented Languages and Systems: TOOLS 5 (T. Korson, V. Vaishnavi, and B. Meyer, eds.), pp. 465-483, Prentice-Hall, 1991.

Biography

Steven Atkinson is currently writing up his dissertation, titled ``Formal Reuse from Software Libraries''. He has been studying as a doctoral candidate at the Software Verification Research Centre for the last three years. During that time, he has also participated as a programmer on a knowledge-based software re-engineering project[7]. His research interests include formal, knowledge-based approaches to the technical problems of software reuse, re-engineering of legacy code and the ongoing development of the formal object-oriented specification language Object-Z[8].