NetNews Usenet Archive 1992 #16

home *** CD-ROM | disk | FTP | other *** search

/ NetNews Usenet Archive 1992 #16 / NN_1992_16.iso / spool / comp / object / 2976 < prev next >

Wrap

Internet Message Format | 1992-07-22 | 9.3 KB

Xref: sparky comp.object:2976 comp.lang.eiffel:979 Newsgroups: comp.object,comp.lang.eiffel Path: sparky!uunet!psinntp!merlin.hgc.edu!jcm From: jcm@hgc.edu (James McKim) Subject: Re: How to design a data structure library. Message-ID: <1992Jul22.142916.22115@merlin.hgc.edu> Sender: usenet@merlin.hgc.edu (Action News Central) Organization: The Hartford Graduate Center References: <1820@snap> <1992Jul21.124256.17256@bony1.bony.com> Date: Wed, 22 Jul 1992 14:29:16 GMT Lines: 260 In article <1992Jul21.124256.17256@bony1.bony.com> richieb@bony1.bony.com (Richard Bielak) writes: >In article <1820@snap> paj@uk.co.gec-mrc (Paul Johnson) writes: > >[...] > >>Cursors >>------- >> [Examples of LIST with built in cursor vs LIST with outside iterators deleted] >> >>The ISE system has the disadvantage that multiple iterations over a >>list are difficult to do. In particular, an interleaved iteration >>involving two cursors at different points in the list can be very >>tricky. > >I think that cursors should be separate from the data structure they >are used to traverse. The main reason is to allow multiple iterations >over the same list. The scheme present in the ISE library (with "mark" >and "return") works OK within a single program. It fails, when the >data structured is shared - either through a database or concurrent >process. I agree. They also tend to add complexity to the class. And since the LIST class plays a role in a variety of other ISE library classes (e.g. TREE, WINDOW) it's complexity "contaminates" the others. It'll be interesting to see what changes ISE has made in these classes in the next release. [...] >> >>RISC vs CISC >>------------ >> >>The ISE data structure classes have large numbers of features. Many >>of these provide efficient short cuts. For instance there is a list >>merge facility which removes items from one list and inserts them into >>another. The ISE linked list implementation allows this to be done >>with a little internal pointer shuffling. Other features are rarely >>used and do not offer efficiency short-cuts. For instance, the >>`remove_all' procedure scans the list and removes all items which are >>equal to the argument. This could be done just as quicly by an >>explicit loop. >> >>Inheriting from the ISE list classes can be a trial. In order to >>provide a true subtype, one has to implement a large number of >>features, many of which are of doubtful utility and others for which >>there are no clear semantics or implementation. >> >>The SIG design has none of this. The data structure features provide >>a bare minimum of functionality. This makes the documentation simpler >>to use, but some tasks are less efficient than they might be. >> > >I would prefer RISC. I think it was Jim McKim, who said that the >features of a class should form a "minimum spanning" set - that is >just enough to be complete, but not more. Yup. A minimum spanning set should be "sufficient" in the sense of Booch and, ideally, specifiable via preconditions and postconditions. There are other criteria we like to see a minimum spanning set satisfy, but these are the main ones. Paul's ideas on "fine-grained" inheritance are not unrelated here. To be fair to ISE, the further up the inheritance hierarchy you go the more RISC-like the classes get. These classes tend to be deferred, so we users tend to use the classes at the low end of the hierarchy which are more MISC-like. > >I've implemented a persistent list class, which did _not_ inherit from >LIST. The main reason was that LIST has too many things that I did not >want to implement for persistent lists (eg. "mark", "return", >"index_of", "duplicate). I also often use a simpler LIST class of my own devising. > > >> >>Functional vs Procedural >>------------------------ > >[...] >> >>Things become worse when considering large objects. The ISE SET class >>provides `intersect', `merge' and `subtract' procedures. Hence the >>only way to create a new set `c' which is the union of two existing >>sets `a' and `b' is: >> >> c.Clone (a); >> c.merge (b); >> >>With some kinds of objects, this would be rather less efficient than >> >> c := a + b; >> >>On the other hand providing operators `*' and `+' for intersection and >>union would result in something like: >> >> a := a + b; >> >>This creates a new (and fairly large) object before throwing away the >>old value of `a'. If this is implemented as a linked list then you >>could have a huge amount of garbage being generated. >> >>Does it make sense to implement both kinds of operations in a class, >>so that the programmer can pick the appropriate one? >> > > >I suppose one should implement only the more fundamental set of >operations, and then let the descendant create the "+" or "*" >operators if desired. This goes back to the "minimal spanning set" >idea. Our version of the minimal spanning set for a SET class includes only Create, Insert an element, Remove an element as commands and Cardinality, Has a given element as queries. We would leave it to descendants to provide either version of merge, intersect, and subtract. But this avoids Paul's question. It seems to me that Union (I think ISE used 'merge' as the verb form of union), Intersection and Difference (ditto for 'subtract' here) are fundamentally binary operations. That is, they're single valued functions of two variables, and so should be designed as such. Thus I favor the c := a + b or c := union( a, b ) form. I would add union, intersection, and difference as a first increment, through inheritance, to the minimal SET class. Then as a second increment, I would add Merge, Intersect, and Subtract. Then document that the typical use for Merge is a.merge( b ) in place of a := a + b because the former is more efficient than the latter. Ditto for Intersect and Subtract. > > > >>Documentation >>------------- >> >>What should go into the documentation of a cluster? > >Good question. There should be a general description of what the >classes are doing. For example, the description of data structure >cluster should talks about cursors. Very good question. I would also want to see some examples of how the classes interact with each other, but I think we've only scratched the surface here. > >>What should be in the documentation of each individual class? > >At least "flat class | short" :-) And of course this does you little good unless you've done a thorough job with preconditions, postconditions, and invariants, at least as comments. Somewhere we need an example of the typical use of the class. Maybe there are three levels. Cluster level (discussed above), class level (typical use, how features interact, invariants), and feature level (preconditions and postconditions). > >>Do we need discussion of time and space complexity? At what level of >>detail? > >I think so. Sometimes a simple guideline on how to use a class >efficiently would suffice. Yup. > >>Do we also need an outline of the implementation method? At what >>level of detail? > >This is probably not needed, as long as the source code is available. And as long as the information above is available. _Ideally_, if you're a client of the class, you shouldn't need the source code unless you suspect a bug. If you're inheriting from a class then more detailed documentation is certainly necessary. In particular, you're much more apt to need the source code. > > >>Is the short-flat version of a class useful? (Sig appear to think >>not). > >It's essential. Absolutely! If there's no easy way to obtain the public interface for a class, then much of the benefit of information hiding and encapsulation are lost. I don't know what SIG was thinking here. Richie and I have kicked around the idea of refining short. The default version would extract the public interface of features exported to all classes (the current version). A second version would be "short for class X" which would extract the public interface of features visible to X. Since Eiffel supports selective export, this could be a proper superset of what the default version gives. If X is a descendant of the class in question then all it's features are visible and we have the "short-for-descendants" mentioned below. > >>Do we need a separate section: "Guide to Writing Descendants"? > >I think so. I'm in process of writing docs for a database cluster, and >large part of the doc will be on that topic. > >What would be useful is a "short-for-descendants". An extract of a >class that contains the signatures of all the features, without the >code. This kind of documentation and more is going to be needed. Especially if vendors begin supplying libraries without source code. > >>What sort of examples are needed? > >At least an example of typical use. Perhaps some examples that >illustrate the difference between efficient and inefficient use of the >library. Yup, and see above. > > >>Are inheritance graphs useful? >> > >I don't find these particularly useful, although they are pretty to >look at. > I like 'em. I think we at least need what ISE provides in their library documentation. They show the complete inheritance hierarchy above the current class. > > >...richie > >-- [...] -- Jim -- *------------------------------------------------------------------------------* Jim McKim (203)-548-2458 In exactly 3.2 seconds it will be a few Internet: jcm@hgc.edu minutes to 5:00.