Predator

The Predator system is based on the idea that data structures should be mechanically generated from libraries of primitive components, where each component implements precisely one feature. Users specify the set of features that they want, and Predator synthesizes the target data structure. This approach eliminates the implementation and maintenance problems of feature combinatorics, and is scalable by just adding new primitives to the Predator library.

In program generation systems like Predator, careful design and implementation of components is critical. The interfaces of components should reflect the basic abstractions of the domain. Such interfaces might be identified using domain modeling techniques. In Predator, component interfaces actually were deliberately designed to ensure that they would be suitable for building complex data structures. Good component designs result from interfaces that possess the following three properties [#!batory92a!#,#!batory92b!#]:

High-level abstractions. It is well-known that using high-level abstractions makes programs easier to write and debug. It is essential for component interfaces to hide the complex details of their encapsulated data structures; not doing so would make components difficult to use and virtually impossible to combine.
Standardized interfaces. A key feature of software component/software generator technologies is the ability to interchange different data structure implementations to address application performance requirements. Note that plug-compatible interfaces are already present to some extent in existing component libraries (such as Booch Components and libg++). In fact, all basic data structures (lists, trees, arrays, etc.) could even be viewed as different implementations of the same container abstraction (that is, all of these data structures serve as containers for collections of objects).
Layered designs. Experience has shown that many software systems have hierarchical designs. The layering of abstractions (and their implementations) provides a powerful way to design, build, and understand complex software. Layering is an important technique for managing complexity. By partitioning a system into layers, system design can be greatly simplified.

High-level abstractions, standardized interfaces, and layered designs characterize our implementation of Predator. Each data structure feature is encapsulated in a separate component. This collection of components—which is inherently open-ended—defines the Predator library. Target data structures—those that would be requested by Predator users—are specified as hierarchical compositions of library components.

Predator provides language extensions to support the specification and composition of primitive components, and compilers to convert them into efficient executable code. The Predator compilers use advanced optimizations such as inlining and partial evaluation. Currently, there are two compilers (P1 [#!sirkin93!#] and P2 [#!batory93!#]), both providing language extensions to ANSI C.¹

P1 demonstrated that our approach was promising. It was used to generate the data structures for the OPS5/LEAPS system, a compiler for OPS5 rules [#!brant93!#]. OPS5/LEAPS was chosen because it demands high-performance and complex data structures. Preliminary experiments have shown that using P1 to generate OPS5/LEAPS code results in improved programmer productivity and executable code performance relative to hand written code. In the largest example attempted so far, P1 generated almost 7000 lines of C code whose performance was 20-30% better than that of OPS5/LEAPS. Reports on these experiments are forthcoming.

P2 is a follow-on project to P1. It supports domain-specific extensions to ANSI C and provides a more modular and maintainable architecture than P1. P2 is a system that we plan to distribute.

In order to further verify the productivity and executable code performance advantages of our approach, we used a simple benchmark²to test programs using the Booch Components and libg++ container classes against P1 and P2 generated container code.

**Figure:** Code size (in words) of dictionary benchmark programs.
$\begin{figure}\begin{center} \begin{tabular}{\vert l\vert r\vert r\vert r\vert r... ...\\ P2 & 308 & 310 & 316 & 310 \\ \hline \end{tabular}\end{center} \end{figure}$

**Figure:** Execution time (in seconds) of dictionary benchmark programs.
$\begin{figure}\begin{center} \begin{tabular}{\vert l\vert r\vert r\vert r\vert r... ... P2 & 39.9 & 33.1 & 5.9 & 2.9 \\ \hline \end{tabular}\end{center} \end{figure}$

Three observations regarding productivity were immediately apparent:

The size of the P1 and P2 programs were the same or smaller than corresponding implementations for the Booch C++ Components and libg++ (see Figure $[*]$ ). The reason is that both P1 and P2 offer high-level container abstractions that make programs compact and quicker to write.
It was trivial to alter container implementations in P1 and P2 programs. In general, only a few lines of declarations needed to be changed; no executable lines were modified.
Programs that used the Booch C++ Component and libg++ libraries required more extensive modifications when container implementations were altered. Different data structures had either different interfaces or different names for semantically equivalent functions.

Also observe that the performance of P1 and P2 code is comparable to the performance of the other programs (see Figure $[*]$ ).