Substantiating the Position

In this section we give very summary details about the implementation of our system. First, we discuss its overall architecture.

In any database system, the key to efficient retrieval is the availability of an ordering between the database entries. Typically, database entries are totally ordered, i.e. for any two distinct entries, one is necessarily smaller than the other (for the given ordering). For software components, we could not identify such a total ordering relation; hence, as a substitute, we defined a partial ordering relation, namely the refinement ordering. This gives our repository a lattice-like structure, where nodes represent specifications; and programs are attached to specifications in such a way that each program is attached to the highest specification with respect to which it is correct. This integrity constraint must be maintained by all database operations. As an example of a database structure, see in figure [*] the database obtained by storing a number of Pascal compilers, whose definition is given in [#!structured92!#]. Each node in this graph represents a specification; to each node we attach programs that are correct with respect to the specification that this node represents (these are not shown in the figure).

Figure: A Repository of Pascal Compilers
\begin{figure}
\setlength{\unitlength}{0.02in}
\begin{center}
\begin{picture}(1...
...e(0,-1){20}}
\put(120,80){\line(0,1){20}}
\end{picture}\end{center}
\end{figure}

Storage and retrieval operations consist of navigating through a graph structure, matching the (storage or retrieval) key against nodes of the graph. Each node of the graph is represented by a Unix file which contains an Otter formula that defines the specification at hand, as well as information on neighboring nodes in the graph, and information on programs that are attached to this node.

The navigation through the graph is carried out by traditional graph traversal algorithms —which we have implemented in C under Unix. The matching of the (storage or retrieval) key against the current node is carried out in two steps: first we generate an Otter theorem that states that the key is a refinement of the current node; then we submit this theorem to the Otter theorem prover and await its result. The theorem prover and our C program interact to determine the result and act accordingly.

Whenever exact retrieval fails, the user has the option of invoking approximate retrieval. Approximate retrieval is defined in terms of a measure of functional proximity between specifications, which measures how close two specifications are by assessing how much information they have in common. We have determined that a lattice operator, namely the meet (greatest lower bound) captures precisely the amount of information that two specifications have in common. The algorithm of approximate retrieval for a given retrieval key k proceeds by computing the meet of k with nodes in the graph, and identifying those noodes that maximize the meet with k (i.e. that have most common information with k). In the compiler example, the graph that we obtain by computing the meet of all nodes with k, for some retrieval k which is given in [#!structured92!#], is shown in figure [*]. Note that this is obtained from the original graph by collapsing its two layers: whenever two nodes are collapsed, we can conclude that they differ by functional properties that are irrelevant as far as k is concerned; i.e. as far as k is concerned, one node is as good as another. In practice, this algorithm has given results that are quite satisfactory to the intuition.

Figure: Ordering Nodes by their Proximity to k.
\begin{figure}
\setlength{\unitlength}{0.02in}
\begin{center}
\begin{picture}(1...
...0,40){o}
\put(120,80){o}
\put(160,120){o}
\end{picture}\end{center}
\end{figure}