home *** CD-ROM | disk | FTP | other *** search
- Path: sparky!uunet!sun-barr!cs.utexas.edu!usc!zaphod.mps.ohio-state.edu!pacific.mps.ohio-state.edu!linac!att!ucbeh.san.uc.edu!uceng.uc.edu!babbage.ece.uc.edu!dain
- Newsgroups: comp.lang.c++
- Subject: Re: Garbage Collection for C++
- Message-ID: <1992Aug12.161219.11877@babbage.ece.uc.edu>
- From: dain@holmes.ece.uc.edu (Dain Samples)
- Date: Wed, 12 Aug 1992 16:12:19 GMT
- Reply-To: Dain.Samples@uc.edu
- Sender: root@babbage.ece.uc.edu (Operator)
- References: <1992Aug6.014619.2111@ucc.su.OZ.AU> <3907@starlab.UUCP>
- Organization: University of Cincinnati
- Originator: dain@holmes.ece.uc.edu
- Nntp-Posting-Host: holmes.ece.uc.edu
- Lines: 195
-
-
- In article <3907@starlab.UUCP>, mi@starlab.UUCP (Michael Hoennig) writes:
- |> In article <1992Aug6.014619.2111@ucc.su.OZ.AU> maxtal@extro.ucc.su.OZ.AU (John MAX Skaller) writes:
- |>
- |> >GC classes cannot be allocated on the stack or statically,
- |> >so when one is, it is silently converted to a GC pointer
- |> >to an object.
- |>
- |> >Pointers to GC class objects will be GC pointers.
- |> >GC pointers are like classes, they have built in destructors and
- |> >constructors so they can't just disappear.
- |>
- |> I think this is a neverending loop. If pointers to GC class objects will be
- |> GC pointers too, they will be silently converted to a GC pointer (which is
- |> a pointer to a GC class) and must be silently converted to a GC ...
- |>
- |> => GC class objects must be allocatable on the stack - but the compiler
- |> must recocnize this.
-
-
- Before I saw this thread in this newsgroup (I'v just started
- re-reading after a long hiatus), I posted the following to
- comp.compilers. I apologize to those of you that read both groups for
- the duplication.
-
-
- I have been doing some work on implementing garbage collection for
- C++, but some of these comments will apply to GC for C.
- I will summarize some of the contents of a paper I will be presenting
- at the Int'l Workshop on Memory Management in Sept. A copy of the
- paper can be ftp'd from the e-address below. I would be interested in
- any commentary.
-
- babbage.ece.uc.edu:~ftp/pub/biblio/papers/ads-iwmm92.ps.Z
-
- I have reached the conclusion that only conservative garbage
- collection allows no changes to the existing language. Even so, as
- Hans Boehm has pointed out, there are (relatively easily solved)
- problems with code produced by highly intelligent optimizers (i.e.,
- make them less intelligent, or even more intelligent, depending on
- your point of view). Any other GC-scheme that assumes no changes to C
- (or C++) requires prodigious amounts of compiler-generated information
- to describe the structure of runtime objects based on the most general
- of possibilities (e.g., see the '92 SIGPLAN paper by Diwan, Moss, and
- Hudson), or constrains the programming idioms that can be used by
- programmers. (Much of the discussion about using C as an IL has
- focused on the observation that a translator emitting C can in fact
- easily constrain itself.)
-
- Since such constraints border on language redefinition anyway, I
- addressed the question: what is a minimum number of language
- enhancements that could be added to C++ to allow reasonable garbage
- collection to be implemented as part of the runtime system? I haven't
- tried to back-pedal the language changes to C since I take advantage
- of the class/object structure in C++ to simplify the declaration of
- *which* thingies are going to be collected. I also have not totally
- removed the need for `prodigious amounts of information', but I do
- think it is more manageable and reasonably generated.
-
- I added three additional keywords: collected, heap, and embedded. The
- latter keyword is used to declare pointers that can be used to point
- into objects that are collected, and the other two are used to declare
- collectible objects and allow the programmer to allocate normally gc'd
- objects on the heap.
-
- The long-range goal is to allow garbage collection to be treated as
- malloc/free is treated in C programming systems: not `really' part of
- the language, but assumed to exist, assumed to work, assumed to have a
- certain (minimal) functionality, and replaceable by the user. (An
- aside: does anyone really want to argue that C without malloc [or
- other dynamic memory allocator] is still C?) The paper describes the
- language enhancements to C++, and describes how the compiler can
- generate data structures that support many flavors of GC, including
- copying, conservative, mark-and-sweep, etc.
-
- I'm currently working on a prototype that implements a basic,
- non-copying mark-and-sweep collector with conservative collection of
- the runtime stack. I hope to have it completed by Sept.
-
- If you have read this far and still aren't sure whether the paper
- would interest you or not, I include a section of it that describes
- some of the sub-goals of the design:
-
- To streamline what follows, we define some terms. C++ classes that
- have been declared collectible are referred to as {\em gc-classes\/}
- or {\em gc-types}, their instantiations as {\em gc-objects}, and the
- space in which they are collected as {\em gc-space}. Objects that are
- pointed to by an application from outside gc-space are said to be {\em
- rooted}, and the pointers are called {\em roots}. Gc-objects to which
- there exists a path of pointers from a rooted object are said to be
- {\em grounded}; rooted objects are trivially grounded. Pointers to or
- into gc-objects are called {\em gc-pointers}. Gc-pointers {\em
- into\/} gc-objects are also called {\em embedded\/} pointers (also
- called {\em interior\/} pointers by some). Variables that contain
- bit-patterns that could be interpreted as gc-pointers but might
- actually be something that just happens to look like a gc-pointer
- (e.g., an integer, a sequence of characters, etc.) are called {\em
- ambiguous roots}. The act of informing the GC system that a pointer
- may point to a gc-object is called {\em registration}.
-
- As mentioned, the primary goal of this design is to retro-fit C++ with
- garbage collection facilities while not changing the spirit of the
- language. That spirit, based on the spirit of C, can be summarized
- provocatively as: (1) Programmers should be able to do anything they
- want, subject to their willingness to deal with the consequences
- (side-effects and interactions of features whose invariants are
- violated). (2) No one pays for any feature of the language that they
- do not use.
-
- Based on this spirit, we consequently adopt the following goals for
- the system.
-
- Adding GC to C++ should not force overhead on users that do not use
- it. When GC is used, it should not interfere with software modules
- that do not use it. GC must not interfere with the current and/or
- standard definitions of the language. That is, a GC facility should
- not cause re-definition or loss of language features.
-
- The dual of this requirement is that programmers that use GC must take
- the responsibility either for not violating the invariants of the
- design, or for knowing how to deal with the consequences. Almost any
- restriction imposed by a GC-cooperative C++ compiler can be bypassed
- by a sufficiently determined programmer. The compiler therefore can
- take responsibility only for maintaining its GC invariants, but cannot
- be expected to find any and all ways in which users may have violated
- those invariants.
-
- The language should be changed as little as possible. A strict goal
- of no change was seen rather quickly to be unrealistic. Either all
- runtime objects would have to be garbage collected---a {\em major\/}
- change to the design of the language and a violation of the spirit of
- the language---or, at a minimum, programmers would need language
- support to be able to specify which (classes of) objects are to be
- garbage collected. A design consequence of this goal is that a
- program can have two dynamic memory spaces: the space dedicated to
- objects that are to be garbage collected, and the usual `heap' of
- dynamically allocatable memory.
-
- Use of a garbage collector must not impose a complete dichotomy on
- runtime objects: collected objects must be able to refer to, contain,
- and be contained in non-collected objects, and {\em vice versa}.
- Gc-objects are not restricted to exist only in gc-space: gc-objects
- should be instantiable as global, automatic, or heap objects, if the
- programmer so desires. Obviously, only those that are in gc-space can
- and will be automatically reclaimed. While it may be necessary to add
- data to an object to support GC, it must be the case that objects of a
- type have exactly one memory representation, no matter where they
- reside (global space, stack, heap, or gc-space). Furthermore, it
- should be possible for the user to declare that a specific object is
- to be automatically reclaimed. For instance, the programmer should
- not have to declare a special class or data type to have arrays of
- characters (strings) automatically reclaimed.
-
- Arrays of gc-objects must be supported.
-
- The design should support pointers into objects and pointer-to-member.
- For instance, if a gc-object has an array of characters as an element,
- the programmer should still be able to traverse that array with a
- pointer-to-char. Compiler and runtime mechanisms must exist that
- allow correct reclamation of gc-objects that have such `embedded'
- pointers into them. (`Embedded' here refers to where the pointer is
- pointing, and not to the location of the pointer itself.)
-
- C unions and Pascal tagless variant records are a way of hiding
- information from the compiler and/or exposing representation
- information to the user. Unions should be supported wherever
- possible. (However, unions can be supported only with non- or
- mostly-copying GC strategies; cf.~\S\ref{unions}).
-
- It would be nice if the design did not preclude supporting multiple
- and discontiguous gc-spaces, where each gc-space may have a different
- GC strategy. The prototype being constructed does not yet support
- multiple gc-spaces, and so this paper does not pursue this goal
- directly. The problem of cross-registration of root pointers between
- GC strategies is complex enough that more data is needed on the
- desirability of this feature to justify a design to support it at this
- time.
-
-
-
- ========================================================================
- A. Dain Samples, Dain.Samples@uc.edu, wk:(513)556-4783, hm:(513)771-5492
- Dept of ECE, ML#30 University of Cincinnati, Cin'ti OH 45221-0030
- ------------------------------------------------------------------------
-
- --
- ========================================================================
- A. Dain Samples, Dain.Samples@uc.edu, wk:(513)556-4783, hm:(513)771-5492
- ------------------------------------------------------------------------
- It is so difficult to find the beginning. Or, better, it is difficult
- to begin at the beginning. And not try to go further back.
- -Wittgentstein (from "On Certainty")
-
- May the background be with you.
- -Heidegger (from "Being and Time", paraphrased)
-