Creation of Reusable Components Based on Formal Methods

Youwen Ouyang and Doris L. Carver

Dept. of Computer Science
Louisiana State University
Baton Rouge, LA 70803
Email: youwen@bit.csc.lsu.edu

Abstract:

Challenges for repository-based reuse include populating the repository with artifacts that are both useful and usable, organizing the artifacts in a way that promotes easy retrieval and comparison, and providing guidelines to assist with necessary modifications. We emphasize the aspect of reuse in which a component is constructed by modifying another component. Clustering analysis is applied to the formal specification of an application to group components that are similar in function into clusters. Generic requirements are extracted for the resulting clusters. The repository is populated with frameworks that are generated to accomplish these requirements. The individual components are then implemented by modifying the framework of their cluster.

Keywords: Formal specification, clustering analysis, generic requirements, reusable frameworks, repository-based reuse

Workshop Goals: Learning and networking, exchange of ideas

Working Groups: Object Technology, Architectures, and Domain Analysis: What's the Connection? Is there a connection?

Background

Designing and constructing reusable components requires effort that is not required to build non-reusable software components. Evidence has shown that reusable components must be properly defined to be exploitable for designing new applications [1]. However, developers are focused on their particular applications and are constrained in terms of time, costs, and resources. Project teams typically do not want to spend additional time developing software that does not directly affect their own application [2]. The reality remains that existing components are typically not interoperable (because they are based on conflicting assumptions), not composable (because they have ad hoc interfaces), and not easily transformed [3].

Position

The creation and maintenance of a repository of reusable components is an up-front cost that most project teams are reluctant to bear. The use of formal methods to represent the components offers advantages over either textual descriptions or structural descriptions, because formal representations provide semantic descriptions. By using semantic information, we can focus on what the components do rather than how they do it. By applying clustering analysis to the formal specification of an application, we identify generic requirements. The frameworks that accomplish the generic requirements serve as candidates for reuse in the implementation of individual transactions. This approach treats software reuse as an integral part of software development and results in direct benefit to the project teams.

In the following section, we present a systematic way of creating a repository of frameworks that are guaranteed to be useful and usable by the software system that is under development. For more information, please refer to [4]. The assumption is that a formal specification of the application is available as the result of the specification analysis phase of the software development.

Creation of the repository

Our approach is not restricted to any particular formal language or application domain. We have chosen the formal specification language Z [5] due to its wide acceptance. The relational database domain was chosen because of its popularity in industry and the lack of effort from the reuse community [6]. Transactions, according to [7], are the logical units of action upon a relational database. In Z, schemas are used to specify the functionality requested by the transactions.

The first step is to identify groups of transactions that share more functionality within the groups than across the groups. Clustering analysis can be employed to identify families of schemas. The development of a large-scale software system involves many people, and each group may have different expertise. The activity of the design phase can be viewed as allocating portions of the specification to appropriate groups and to appropriate programmers within each group. When similar schemas are allocated to the same group of programmers for implementation, the programmers can concentrate on one type of problem. Therefore, less time is wasted in changing from one unrelated activity to the next, even if no explicit reuse is conducted. In addition, design trade-offs between requirements and what can be delivered are typically made at requirement analysis and specification time [8]. Identifying closely related schemas as groups at the specification stage allows specifiers to negotiate with clients and makes it possible to design a standardized, reusable framework for one cluster of schemas. The implementation of individual schemas in the cluster can reuse the framework and avoid unnecessary duplication of effort.

Classification is a statistical technique concerned with separating distinct sets of objects and allocating new objects to previously defined groups. It pertains to a known number of groups. The emphasis is on deriving a rule that can be used to optimally assign a new object to one of the labeled classes. Clustering analyses, on the other hand, make no assumptions concerning the number of groups or the group structure. A traditional hierarchical agglomerative clustering (HAC) analysis consists of three phases: represent the objects of interest in a common measurement space, define a similarity or distance measurement between objects and intermediate clusters, and iteratively merge objects into clusters. The analysis results in a cluster hierarchy.

We define the measurement space to reflect the functionality requested by each transaction. Empirical studies of programming knowledge suggest that, in a limited domain, a relatively small number of expressions are frequently used to describe certain information [9]. Therefore, predicate patterns that are used to specify basic operations of relational databases can serve as guidelines for specifiers to produce formal specifications. In turn, using predicate patterns allows automatic extraction of the functionality that is requested by the schemas. As a result, each transaction is represented by a set of <descriptor, value> pairs. The descriptor identifies one type of basic operation requested by the transaction and the value reflects the level of effort associated with the basic operation.
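To make the representation concrete, the following is a minimal sketch of how a set of <descriptor, value> pairs might be held in code. The descriptor names (`insert`, `uniqueness_check`, `foreign_key_check`, `select`, `join`), the transaction names, and the effort values are all hypothetical illustrations; the descriptor vocabulary is not fixed in this paper.

```python
from typing import Dict, Set

# A profile maps each descriptor (one type of basic operation) to a value
# reflecting the level of effort associated with that operation.
Profile = Dict[str, int]

# Hypothetical transactions; every name and number here is an assumption
# made for illustration only.
profiles: Dict[str, Profile] = {
    "AddCustomer": {"insert": 1, "uniqueness_check": 1},
    "AddOrder": {"insert": 1, "uniqueness_check": 1, "foreign_key_check": 1},
    "ListOverdue": {"select": 2, "join": 1},
}


def descriptors(profile: Profile) -> Set[str]:
    """Return the basic operations requested by a transaction."""
    return set(profile)


def total_effort(profile: Profile) -> int:
    """Return the summed effort over all requested basic operations."""
    return sum(profile.values())
```

Keeping every transaction in the same descriptor space is what allows transactions to be compared directly in the later clustering steps.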

The similarity of any two transactions is defined as the percentage of common functionality between the two transactions. This definition avoids claiming that two transactions are similar when they share a great deal of common functionality but also have a significant amount of distinct functionality, because we want to minimize the effort of modifying the implementation of one transaction for the other. The HAC starts with singleton clusters, each containing only one object, and iteratively merges the two most similar clusters into one. The complete linkage method defines the similarity between two given clusters as the minimum similarity measurement over all cross-cluster pairs of transactions. This rule avoids grouping two clusters together when there exists some pair of very similar transactions, one from each cluster, while the rest of the transactions are not similar to each other at all. Since we want to reuse only if the effort of reuse is less than the effort of designing from scratch, a threshold value determines the minimum similarity that the final clusters must preserve. The iteration of merging then stops when the two most similar remaining clusters have a similarity measure that is lower than the threshold value.
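The merging procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in this work: the "percentage of common functionality" is interpreted here as shared effort divided by combined effort (a weighted Jaccard ratio), and the transaction profiles and the threshold of 0.5 are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

Profile = Dict[str, int]


def similarity(a: Profile, b: Profile) -> float:
    """Shared effort over combined effort (assumed reading of the measure)."""
    keys = set(a) | set(b)
    shared = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    combined = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    return shared / combined if combined else 1.0


def cluster_similarity(c1: FrozenSet[str], c2: FrozenSet[str],
                       profiles: Dict[str, Profile]) -> float:
    """Minimum similarity over all cross-cluster pairs of transactions."""
    return min(similarity(profiles[x], profiles[y]) for x in c1 for y in c2)


def hac(profiles: Dict[str, Profile], threshold: float) -> List[FrozenSet[str]]:
    """Merge the two most similar clusters until similarity drops below threshold."""
    clusters = [frozenset([name]) for name in profiles]
    while len(clusters) > 1:
        best = max(combinations(clusters, 2),
                   key=lambda pair: cluster_similarity(pair[0], pair[1], profiles))
        if cluster_similarity(best[0], best[1], profiles) < threshold:
            break  # the remaining clusters are too dissimilar to share a framework
        clusters = [c for c in clusters if c not in best]
        clusters.append(best[0] | best[1])
    return clusters


# Hypothetical profiles, for illustration only.
example = {
    "AddCustomer": {"insert": 1, "uniqueness_check": 1},
    "AddOrder": {"insert": 1, "uniqueness_check": 1, "foreign_key_check": 1},
    "ListOverdue": {"select": 2, "join": 1},
}
clusters = hac(example, threshold=0.5)
```

Because the denominator counts the distinct functionality of both transactions, a pair that shares much but also differs much receives a low score, which matches the intent of the similarity definition. Raising the threshold yields more, smaller clusters whose members need less modification of a shared framework.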

By analyzing the transactions within a cluster, generic requirements of functionality are extracted. The purpose of extracting generic requirements is to enable the creation of frameworks so that individual transactions of the same cluster can be implemented by modifying the frameworks. There are several benefits of designing frameworks to accomplish generic requirements. First, these requirements take the same representation scheme as that used for individual transactions. Therefore, it is easy for the designer to compare the desired functionality of individual transactions with the functionality provided by the frameworks and to make modifications accordingly. Second, the implementation of different transactions from the same framework can lead to a desirable uniformity of the system. Therefore, the system will be easier to understand and hence to maintain. Third, the fact that the framework will be a template for several transactions forces the developers of the framework to strive for a high quality product, which in turn can lead to applications of higher quality. In many cases, designing frameworks can be allotted to a small team of highly skilled engineers. Moreover, by designing a single framework to serve a number of distinct transactions, the cost of developing the framework can be amortized over those transactions.
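Because generic requirements share the representation of individual transactions, comparing a transaction with its framework reduces to comparing two sets of <descriptor, value> pairs. The sketch below illustrates one possible extraction rule, which is an assumption and not necessarily the rule used in this work: keep the descriptors requested by every transaction in the cluster, each at the smallest requested value. The profiles and function names are hypothetical.

```python
from typing import Dict, Iterable

Profile = Dict[str, int]


def generic_requirements(cluster: Iterable[Profile]) -> Profile:
    """Functionality common to all members (assumed rule: intersection, minimum value)."""
    members = list(cluster)
    if not members:
        return {}
    common = set(members[0]).intersection(*(set(m) for m in members[1:]))
    return {d: min(m[d] for m in members) for d in sorted(common)}


def required_additions(transaction: Profile, framework: Profile) -> Profile:
    """Functionality the designer must add to the framework for this transaction."""
    additions = {}
    for descriptor, value in transaction.items():
        gap = value - framework.get(descriptor, 0)
        if gap > 0:
            additions[descriptor] = gap
    return additions


# Hypothetical cluster, for illustration only.
add_customer = {"insert": 1, "uniqueness_check": 1}
add_order = {"insert": 1, "uniqueness_check": 1, "foreign_key_check": 1}

framework = generic_requirements([add_customer, add_order])
order_gap = required_additions(add_order, framework)
customer_gap = required_additions(add_customer, framework)
```

Under this rule the framework never offers more than any member requests, so implementing a member only ever requires additions to the framework, never removals.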

Comparison

Most efforts in repository-based reuse have been invested in the organization of the repositories, that is, how to represent, store, and retrieve components. However, there has been little objective basis for deciding whether a component should be included in such repositories. Esteva describes the use of inductive learning techniques based on software metrics to identify reusable components from existing code [10]. The conjecture is that components that are easier to comprehend will tend to be reused more often. Castano and De Antonellis propose a methodology for populating a repository of reusable components with application-independent conceptual models [11]. A collection of available conceptual schemas belonging to one or more applications in one or several domains is classified and arranged into categories. However, the similarity measurement between given schemas is based on matching keywords in the schemas. In contrast, we identify generic requirements based on the formal specification and provide a systematic and objective basis for deciding what should be included in the repository.

Another interesting observation of our research is that the reuse community has paid little attention to reusing the reuse effort. If two components with similar functionality, A and B, are to be developed in a new system, current reuse schemes require two trips to the repository: one trip to find a candidate component C to be reused by A, and another trip to find C again to be reused by B. By grouping A and B before retrieval, as we propose, only one trip to the repository is necessary.

References

1: M. Sitaraman, D. Fleming, J. Hopkins, and S. Sreerama, "Why (Not) Reuse (Typical) Code Components?," in Proceedings of the 7th WISR, 1995.
2: E. Mambella, R. Ferrari, F. D. Carli, and A. L. Surdo, "An Integrated Approach to Software Reuse Practice," in Proceedings of the Symposium on Software Reusability, pp. 63-71, 1995.
3: D. Garlan, "Research Directions in Software Architecture," ACM Computing Surveys, vol. 27, pp. 257-261, June 1995.
4: Y. Ouyang and D. Carver, "A Model to Facilitate the Reuse of Specifications," in Proceedings of the 3nd World Conference on Integrated Design and Process Technology, vol. 1, pp. 392-399, 1996.
5: J. M. Spivey, The Z Notation: A Reference Manual. Prentice Hall International (UK) Ltd., 1992.
6: R. E. Johnson, "Why Doesn't the Reuse Community Talk About Reusable Software," in Proceedings of the 7th WISR, 1995.
7: C. J. Date, An Introduction to Database Systems. Addison-Wesley Publishing Company Inc., 1990.
8: R. Prieto-Diaz, "Reuse as a New Paradigm for Software Development," in System Reuse: Issues in Initiating and Improving a Reuse Program (M. Sarshar, ed.), London: Springer-Verlag, 1996.
9: E. Soloway and K. Ehrlich, "Empirical Studies of Programming Knowledge," IEEE Transactions on Software Engineering, vol. 10, pp. 595-609, September 1984.
10: J. C. Esteva, "Automatic Identification of Reusable Components," in Proceedings of the Seventh International Workshop on Computer-Aided Software Engineering, pp. 80-87, 1993.
11: S. Castano and V. De Antonellis, "A Constructive Approach to Reuse of Conceptual Components," in Proceedings of the Second International Workshop on Software Reusability, pp. 19-28, 1993.

Biography

Ouyang received her Ph.D. from Louisiana State University in December 1996, with software reuse as her topic area. She also holds a master's degree in Applied Statistics and is interested in applying statistical methods to the process of software development.

Dr. Carver is a professor of Computer Science at Louisiana State University. Her research areas are reusability, requirements specification, object-oriented software development, and reverse engineering.