A Phrase Says More than 1000 Symbols
 
 

Roland T. Mittermeir

Institut für Informatik-Systeme

Universität Klagenfurt
Universitätsstrasse 65 - 67
A-9020 Klagenfurt
AUSTRIA/Europe

Tel: ++43 (463) 2700-575, Fax: ++43 (463) 2700-505
Email: mittermeir@ifi.uni-klu.ac.at

URL: http://www.ifi.uni-klu.ac.at/cgi-bin/staff_home?roland
 
 
 

Abstract

Realizing that software is neither more nor less than an executable description of itself offers new perspectives on software description and lets us question some "old wisdom." In a stratified retrieval description, the richness and imprecision of natural language might well outperform more elaborate description schemes, even (or especially) with large repositories.
 
 
 
 

Keywords: Component issues, software retrieval, natural language, fuzziness.

Workshop Goals: Software description, software retrieval, reuse process, evaluation.

Working Groups: Component issues, evaluation, testing and verification.
 

1 Background

Besides various organizational and psychological issues, adequately describing software remains critical for many aspects of software engineering, reuse not least among them. Approaches to describing software for reuse purposes range from simple keyword-based schemes via faceted descriptions to approaches based on formal specifications. In several situations simple approaches are quite adequate; however, they are inadequate for dealing with large repositories, and they are equally inadequate for fine-grained distinction among similar components.

The current position results from experience with more acclaimed approaches to describing software components as well as from considerations developed in the realm of software comprehension.

2 Position

For a reuse concept to work, the effort needed to check out reuse options should be kept as small as possible. Either developers have to know "their" repository almost "by heart", or they should be in a position to express their need in a nutshell and obtain a reasonably accurate answer as to whether it makes sense to probe deeper.

We propose two strata of software description. Full-text natural language descriptions serve to identify a set of "highly promising" candidate components. Among those, further discrimination is obtained by investigating "characteristic tuples" [MP98] of sampled behavior attached to each reusable asset.

3 Approach

3.1 Skim off the cream

Natural language is imprecise. Hence, it is usually discredited among software engineers and even more so among software researchers. This critique seems justified when one aims at developing software from scratch. For identifying ready-made components, however, it seems less justified. In fact, empirical results from related fields [GR96] show that plain natural language descriptions outperform more structured descriptions in their retrieval characteristics.

The arguments put forth in [GR96] can easily be transferred to software description. Once the repository has reached a certain size, simple keywords will no longer be adequate. Facets, on the other hand, might be too inflexible to express nuances, especially when gradual adherence or multiple adherence to categories is important. Full-fledged formal specifications might simply be too complicated to be used by many general practitioners.

Natural language, on the other hand, need not be taught to people. All they need to learn is to express themselves in a well-focused manner. A further advantage is that we can describe in natural language everything we can reason about; hence, natural language descriptions are neutral with respect to the kind of asset being described. Finally, natural language allows nuances to be expressed, not only in terms of specific functional qualities but also with respect to non-functional qualities, the interrelationships among them, and their relation to particular application situations.

For the reuse practitioner -- in spite of the dominance of declarativism in specifications -- it might be of interest to know not only what is done by a component, but to some extent also how it is done. A natural language description can take care of both. The imprecision inherent in natural language specification can be exploited by appropriately loosening or tightening linguistic quantification. In any case, we have to assume that natural language retrieval will return not a single component but a set of components.
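The idea of loosening or tightening a query so that it returns a larger or smaller candidate set can be illustrated with a minimal sketch. It is not the retrieval technique proposed here or in [GR96]; it swaps in plain term-frequency cosine similarity as a stand-in, and the repository contents, the stop-word list, the function names and the thresholds are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "by", "in", "into", "of", "on", "the", "their", "to"}


def term_vector(text):
    """Lower-case the text, split it into words and count the non-stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine(a, b):
    """Cosine similarity of two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, descriptions, threshold):
    """Return the SET of component names whose description is similar enough."""
    q = term_vector(query)
    return {
        name
        for name, text in descriptions.items()
        if cosine(q, term_vector(text)) >= threshold
    }


# Hypothetical repository: component name -> description written for retrieval.
descriptions = {
    "stable_sort": "sorts a list of records by key and keeps equal keys "
                   "in their original order",
    "quick_sort": "sorts a list of numbers in place, fast on average",
    "csv_reader": "reads comma separated text files into a list of records",
}

query = "sorts a list of records by key"
loose = retrieve(query, descriptions, threshold=0.3)  # loosened query
tight = retrieve(query, descriptions, threshold=0.5)  # tightened query
print(sorted(loose))
print(sorted(tight))
```

With the hypothetical data above, the loose threshold returns all three components and the tight one returns only stable_sort. Either way the result is a set of candidates, which is then handed to the second stratum for disambiguation.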

To avoid confusion, we stress that by natural language description we are referring to descriptions written specifically for reuse and retrieval purposes. We do not assume that documentation written for other purposes (be it analysis documents or in-line comments) can serve this purpose.

3.2 Disambiguate what's left

Here, we have to distinguish between kinds of assets. We consider three categories: upstream products, code, and test suites.

Upstream products are meant to be interpreted by a human reader. Hence, we cannot offer more than submitting the candidate set to the human requestor. Hopefully, documents meant for humans are structured in such a way that the reader can determine after a reasonably quick perusal whether they suit her/his purpose.

Code is quite often the primary focus in reuse-centered discussions. Obviously, one wants to obtain not merely somewhat adequate code but precisely adequate code. To determine this the hard way, one would have to test and to study. Much of this effort can be saved by focusing on the executable properties already when the asset is included in the repository and attaching characteristic tuples (test sets) to the asset. They can also be used to further automate fine-grained search [MP98].
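How characteristic tuples might discriminate among the candidates left by the natural language stage can be sketched as follows. This is only an illustration under stated assumptions, not the classification scheme of [MP98]: the representation of tuples as a mapping from input tuples to outputs, the function name `discriminate`, and the three example components are all hypothetical.

```python
def discriminate(candidates, wanted):
    """Keep candidates whose recorded tuples agree with every wanted tuple.

    candidates: name -> {input_tuple: output}, recorded when the asset was stored
    wanted:     {input_tuple: output}, sampled from the behavior the reuser needs

    Assumption: a candidate that has no recorded tuple for a wanted input is
    dropped; executing the component on that input would be an alternative.
    """
    kept = []
    for name, recorded in candidates.items():
        if all(inp in recorded and recorded[inp] == out
               for inp, out in wanted.items()):
            kept.append(name)
    return kept


# Hypothetical candidate set returned by the natural language stage,
# each asset carrying its attached characteristic tuples.
candidates = {
    "abs_value": {(-3,): 3, (0,): 0, (2,): 2},
    "negate":    {(-3,): 3, (0,): 0, (2,): -2},
    "identity":  {(-3,): -3, (0,): 0, (2,): 2},
}

# The reuser states a few input/output pairs of the behavior sought.
wanted = {(-3,): 3, (2,): 2}

print(discriminate(candidates, wanted))
```

In this example only abs_value agrees with both wanted pairs; the input (0,) would not have helped, since all three candidates behave identically on it. Tuples are therefore most useful when they sample behavior on which similar components differ.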

Finally, test suites themselves might be subject to reuse. Assuming that the test suites in the candidate set are signature-compatible, they basically speak for themselves. Hence, by focusing on their respective symmetric difference, the reuser can identify without too much effort which test suite suits her/his purpose.
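A minimal sketch of comparing two signature-compatible test suites by their symmetric difference is given below. The representation of a test case as an (inputs, expected output) pair, the function name `compare_suites`, and the two example suites are assumptions made for illustration only.

```python
def compare_suites(suite_a, suite_b):
    """Split two test suites into shared cases and cases unique to each.

    Each suite is a set of (inputs, expected_output) pairs with the same
    signature, so the cases can be compared directly.
    """
    return {
        "shared": suite_a & suite_b,
        "only_a": suite_a - suite_b,
        "only_b": suite_b - suite_a,
        "symmetric_difference": suite_a ^ suite_b,
    }


# Two hypothetical test suites for an integer addition component.
suite_a = {((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)}
suite_b = {((1, 2), 3), ((0, 0), 0), ((2**31 - 1, 1), 2**31)}

result = compare_suites(suite_a, suite_b)
for case in sorted(result["symmetric_difference"]):
    print(case)
```

Here the two suites share two cases, so the reuser only needs to inspect the two cases in the symmetric difference: one suite additionally checks a negative operand, the other a value beyond the 32-bit range.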

4 Comparison

The ideas expressed here are based on the dual abstraction problem pointed out in [MPMM98]. For the natural language part of our approach, the most important references can be found in the classical information retrieval literature. A very comprehensive treatment is given in [FG90,FBY92]. The fine-grained operational approach is related to the work of [Hal93,PP93]. The main difference, though, is that we do not propose to generate test data randomly, but to use well-chosen characteristic tuples that might be taken from a well-disciplined quality assurance and testing process.

References

[FBY92] W. B. Frakes and R. Baeza-Yates, editors. Information Retrieval: Data Structures and Algorithms. Prentice Hall, Upper Saddle River, N.J., 1992.

[FG90] W. B. Frakes and P. B. Gandel. Representing Reusable Software. Information and Software Technology, 32(10):653--663, December 1990.

[GR96] E. J. Guglielmo and N. C. Rowe. Natural-language retrieval of images based on descriptive captions. ACM Transactions on Information Systems, 14(3):237--267, July 1996.

[Hal93] R. J. Hall. Generalized Behaviour-based Retrieval. In Int. Conf. on Software Engineering -- ICSE93, pages 371--380, Baltimore, MD, May 1993. IEEE Computer Society Press.

[MP98] R. T. Mittermeir and H. Pozewaunig. Classifying Components by Behavioral Abstraction. In 4th Intl. Conf. on Computer Science and Informatics -- JCIS'98, volume III, pages 547--550. Assoc. f. Intelligent Machinery, AIM, October 1998.

[MPMM98] R. T. Mittermeir, H. Pozewaunig, A. Mili, and R. Mili. Uncertainty Aspects in Component Retrieval. In Int. Conf. on Information Processing and Management of Uncertainty in Knowledge-Based Systems -- IPMU98, pages 564--571, July 1998.

[PP93] A. Podgurski and L. Pierce. Retrieving Reusable Software by Sampling Behavior. ACM Transactions on Software Engineering and Methodology, 2(3):286--303, July 1993.  

Biography

Roland T. Mittermeir (mittermeir@ifi.uni-klu.ac.at) http://www.ifi.uni-klu.ac.at/cgi-bin/staff_home?roland

Roland T. Mittermeir is professor of informatics and chairperson of the Institut für Informatik-Systeme of the Universität Klagenfurt.

His primary research interests are in software reuse, software reverse engineering and requirements analysis. He helped in the development of several repositories of reusable components and in the conception of reuse schemes.