home *** CD-ROM | disk | FTP | other *** search
- Path: sparky!uunet!synaptx!synaptics.com!daveg
- From: daveg@synaptics.com (Dave Gillespie)
- Newsgroups: comp.lang.c++
- Subject: Re: Garbage Collection for C++
- Message-ID: <DAVEG.92Aug13025629@synaptx.synaptics.com>
- Date: 13 Aug 92 09:56:29 GMT
- References: <1992Aug6.014619.2111@ucc.su.OZ.AU>
- Sender: daveg@synaptx.Synaptics.Com
- Organization: Synaptics, Inc., San Jose, CA
- Lines: 177
- In-reply-to: mi@starlab.UUCP's message of 10 Aug 92 07:33:22 GMT
-
- > Any pointer to a GC class is a GC pointer, not an ordinary pointer.
- > GC objects declared on the stack or in other GC classes or static
- > are silently converted to GC references (GC pointers with
- > reference semantics).
-
- I doubt this will be practical. Partly because it would be too much
- of a nuisance to use C++ if you had to keep track of GC vs. non-GC
- pointers yourself, and partly because it would be impractical to
- convert the libraries (as many people have said).
-
- Remember, we're not just talking about the standard C libraries, we're
- talking about all the X libraries, all the Unix library functions, all
- libraries that any software developer ships to its customers, etc., etc.
- It would be a nightmare.
-
- There is the argument that "we already did it for `const'", which
- is sort-of true, but not really. Const-ness is not such a pressing
- issue: You can get around a missing `const' in a library function
- with a cast (which is "safe" in the practical sense, assuming a
- careful and knowledgeable programmer, while casting away `GC' would
- not be safe in any sense). Also, GC pointers would presumably have
- some additional cost associated with them (or else why do they
- exist?), and users would not enjoy paying this cost for every library
- function just *in case* that function was called with a GC pointer.
-
- Couldn't we get away with having a garbage collector that didn't
- need pointers to be declared `GC'? The only thing that would need
- special `GC' treatment would be "new": There would be a special
- "GCnew" or something which would allocate an object with the
- property that it went away when it was no longer referenced. Note
- that it would be safe for an implementation to allocate *all*
- objects as GC-able, but it would be a good idea at least in the
- transition period to provide both traditional "new" and "GCnew".
-
- Some Lisp systems, such as Kyoto Common Lisp, work this way. Its
- garbage collector scans the whole stack as "raw memory," looking
- for 32-bit words that could be pointers to GC-able objects. It
- allocates GC-able objects in such a way that it is relatively
- easy to tell if a given memory address points to a GC-able object
- or not. For reference, here's KCL's solution (roughly): Memory
- is divided into pages. All objects stored on a given page will
- have the same size. There is a "page table" which contains, for
- each page of memory, the size of the objects stored in that page
- (or zero for pages which do not contain GC-able things). Given
- any 32-bit quantity which "might" be a pointer, first take the
- high bits to find the page, check if this is a valid GC-able page,
- and if so, round the page offset down to a multiple of the page's
- object size to get a pointer to an object on the page. (KCL
- actually checks to see if the offset is a multiple of the size,
- but since C++ pointers can point into the middle of an object
- you'd have to accept any offset as a possibly valid pointer.)
-
- KCL scans the stack and other memory where pointers may live; any
- word which looks like a pointer to a GC-able object is assumed to
- be one. This means you wind up occasionally keeping around objects
- which are garbage, if their addresses happen to be sitting in some
- integer variable, say. But this doesn't hurt anything, and you
- never err in the other (dangerous) direction.
-
- This may sound slow, but actually I've found KCL's garbage
- collector to be quite fast. KCL's GC is *not* truly generational,
- incremental, ephemeral, or what have you---I don't know if their
- scheme can survive the transition to these sorts of high-tech
- collectors.
-
- The vital question is, can a scheme like this be made to work on
- all architectures on which C++ could reasonably be expected to be
- used?
-
- Also, when KCL collects a garbage object it simply puts it on a
- free list, it doesn't compact memory. The actual KCL allocator
- does some fancier stuff wherein large, variable-sized things like
- strings and arrays go in a separate compactable memory area which
- is handled differently, but I don't think you'd be able to get away
- with this for C++. I think you'd have to live with no compaction
- at all: Whereas it is fairly benign to mark an object because
- an integer happened to hold its address, you wouldn't want to adjust
- that integer when you moved the object! You'd need special GC
- pointers if you wanted compaction, and those would probably not
- be practical.
-
- On the other hand, C++'s allocator doesn't do any compaction right
- now so we wouldn't really be giving anything up.
-
-
- If you instead require *classes* to be declared GC, then there is no
- point in declaring pointers to be GC as well since a pointer is GC
- if, and only if, the pointed-to type is a GC class. The problem with
- this is, suppose you have a GC class which contains a string (i.e.,
- a pointer to an old-fashioned non-GC'd array of chars). Suppose you
- no longer have any pointers a certain object of this class, but you
- still do have a pointer to its component string. You lose, if the
- destructor for the class deletes the string (which it pretty much has
- to do). So a GC class could only safely contain pointers to other
- GC classes, which rules out strings unless you also have GC arrays
- of predefined types, which sounds too awkward to work. Oh, well.
-
-
- > GC pointers can be deleted, and the delete is done if and only
- > if the pointer is the last one. The pointer has to get nulled out
- > by the delete command (so it won't dangle). Alternatively,
- > the delete command is just ignored, the delete is applied
- > when the pointer goes out of scope, or, when the GC collector
- > discovers there are no references to the object.
-
- Your first suggestion can only be work if reference counting is
- used, or if every delete does most of the work of a global GC!
-
- Deleting when the pointer goes out of scope doesn't make much sense.
-
- The "when the GC discovers there are no references" option is what
- always happens when you have GC, so that's just another way of saying
- that "delete" is ignored for GC objects (*not* GC pointers---note
- that a GC pointer may point to a non-GC object, no matter what
- scheme is being used; otherwise, libraries would not be able to
- work both ways).
-
- What sounds much more useful is:
-
- GC objects can be deleted, in which case the programmer promises
- that the argument to "delete" was the last existing pointer to the
- deleted object. The effect is undefined if this is not the case.
-
- This allows people to delete things without waiting for a GC to
- happen, if they *know* the object is no longer used.
-
- I expect the last-existing-pointer rule would be impractically
- strict, so how about this:
-
- GC objects can be deleted, in which case any remaining pointers
- to the object become invalid and must not be used to dereference
- an object.
-
- In other words, just the same as for deleting objects right now!
- You can get dangling pointers from this, but you're not allowed to
- use them. The garbage collector has to be bulletproof enough to
- cope with these dangling pointers. That shouldn't be too hard:
- Suppose you do a GC before the deleted memory is reused. The
- deleted memory would be marked, but since it is noted down as
- deleted, the GC skips it. Now suppose you reuse the memory and
- then do a GC, with dangling pointers to the "old" use of the memory
- still around. No problem, it's another case of phantom references:
- the new object might not get collected as soon as it should, but
- nothing truly unsafe happens.
-
-
- Here are some more issues that you would have to worry about with
- a GC-ing C++:
-
- What if a signal handler fires during a GC? (Maybe you just have
- no choice but to have the system mask or disable signals during GC.)
-
- What if a GC occurs during a signal handler, and the interrupted
- program was in the middle of, say, constructing or destructing an
- object? (I don't think you get bitten either way, but you'd want
- to check this *really carefully*!)
-
- What if a pointer to a GC-able object is passed to foreign (non-C++)
- code, which stashes it somewhere the C++ collector doesn't know to
- look? (All garbage-collecting languages have this problem, and as
- far as I know all they can do is shrug and warn programmers not to
- let go of the last C++ reference to an object if they know there
- might be non-C++ references still lurking around.)
-
- What if you have virtual base classes, and there are only references
- to the base part of an object? The derived part of the object has
- a pointer to the base part but not the other way around, so the
- derived part might be collected in this case. Can this ever be a
- problem?
-
-
- Some food for thought,
- -- Dave
- --
- Dave Gillespie
- daveg@synaptics.com, uunet!synaptx!daveg
- or: daveg@csvax.cs.caltech.edu
-