home *** CD-ROM | disk | FTP | other *** search
- Path: sparky!uunet!synaptx!synaptics.com!daveg
- From: daveg@synaptics.com (Dave Gillespie)
- Newsgroups: comp.lang.c++
- Subject: Re: Garbage Collection for C++
- Message-ID: <DAVEG.92Aug14194411@synaptx.synaptics.com>
- Date: 15 Aug 92 02:44:11 GMT
- References: <1992Aug6.014619.2111@ucc.su.OZ.AU>
- <DAVEG.92Aug13025629@synaptx.synaptics.com>
- <1992Aug14.021547.15215@news.mentorg.com>
- Sender: daveg@synaptx.Synaptics.Com
- Organization: Synaptics, Inc., San Jose, CA
- Lines: 277
- In-reply-to: bcannard@hppcb36.mentorg.com's message of 14 Aug 92 02:15:47 GMT
-
- In article <1992Aug14.021547.15215@news.mentorg.com> bcannard@hppcb36.mentorg.com (Bob Cannard @ PCB x5565) writes:
- > In article <DAVEG.92Aug13025629@synaptx.synaptics.com>, daveg@synaptics.com (Dave Gillespie) writes:
- > |> Couldn't we get away with having a garbage collector that didn't
- > |> need pointers to be declared `GC'? The only thing that would need
- > |> special `GC' treatment would be "new" [...]
- >
- > I can't see that this is viable given the basic constraints on C++: among
- > other things, those who don't want GC shouldn't have to suffer the hits.
- > Marking classes or pointers (as appropriate, depending on the GC
- > implementation) means the compiler knows about any extra treatment it has
- > to give to those pointers/objects, and can leave it out for "ordinary"
- > code.
-
- The "GCnew" approach means you're marking objects (rather than classes
- or pointers) as GC-able. It's still true that non-GC-allocated objects
- would work just like they always did.
-
- > In particular, in my own application - which will eventually be quite a big
- > program - I only want GC for certain classes, where the problem of determining
- > when an object must be deleted has no sensible solution other than GC. Yet
- > performance (both run time and memory use) carries a high premium, so I can't
- > afford to have GC everywhere.
-
- So you use "GCnew" to allocate objects of those classes, and "new"
- elsewhere.
-
- The "extra treatment" that GC classes would need basically involves
- adding some kind of tag information. If you declare an entire class
- to be GC, it makes sense to put such a tag in every instance of the
- class. If you instead work by allocating regular classes in a
- special GC-able way, then it makes more sense to put the tag only
- in instances of the class that are in the GC heap.
-
- I still think declaring pointers to be GC would be too much of a
- hassle for the programmer, so we're left having to assume all
- pointers may point to GC-allocated objects. Without adding lots of
- tags in places that make C compatiblity awkward, we're left with a
- completely "conservative", i.e., brute-force GC algorithm. We scan
- every word on the stack, every word in static storage, and every
- word on the non-GC heap for things that look like GC pointers.
- (Since this is slow, a good optimization might be to separate the
- static area and possibly the non-GC heap into things-which-contain-
- pointers and things-which-don't, like massive arrays of doubles
- which would be a waste of time to scan.)
-
- The decision of which static area or which non-GC heap a thing
- should go in can be made at compile-time.
-
- In the GC heap, we'd only scan the objects that have been marked
- (just like classic mark-and-sweep GC's do). If the allocator has
- recorded the size of an object (which most C allocators do), then
- it's easy to tell the range of words that need to be scanned. An
- optimization here would be to add a tag next to the object which
- identifies which words inside it are pointers.
-
- If you have "GCnew"d an array of objects, the whole array gets a
- single tag. The array must be marked and swept as a whole anyway,
- because of pointer arithmetic.
-
- To the user this scheme would feel exactly like C++ does now, with
- no loopholes as far as I can tell. The only change is that you
- are allowed to say "GCnew" instead of "new", on any kind of data,
- in which case that data gets deleted automatically as soon as there
- are no pointers of any kind to it left. Very clean, very simple
- (at least where it counts---to the C++ user).
-
- > |> Some Lisp systems, such as Kyoto Common Lisp, work this way. Its
- > |> garbage collector scans the whole stack as "raw memory," looking
- > |> for 32-bit words that could be pointers to GC-able objects. It
- > |> allocates GC-able objects in such a way that it is relatively
- > |> easy to tell if a given memory address points to a GC-able object
- > |> or not.
-
- > Exactly what I mean. If the GC has to scan the whole stack (and all the
- > statics and globals while we're at it), it's doing too much work.
-
- This is exactly what KCL does now. Stacks tend not to be all that
- large in practice, especially if you have C++ vector classes rather
- than actual local arrays.
-
- > The reference-count solution is more efficient than that, because the need
- > for GC is restricted to a small but important part of the program. The
- > idea of languages like C++ is to give the compiler the information it
- > needs to get the best performance out of the resulting code.
-
- I wasn't thinking about reference counting solutions at all. I agree
- that for reference counting you'd need a special kind of class
- declaration.
-
- I think people have compared the performance and found that GC, even
- simple, stupid GC, comes out ahead of reference counting. The overhead
- of all those increments and decrements is enormous (relatively speaking).
- Linked list traversals that used to involve one memory read per step now
- involve several reads and writes, for example.
-
- Pointers to reference-counted objects would be messy because of
- pointer arithmetic. An array as a whole ought to have a reference
- count (not the individual things inside it), but a pointer can
- point anywhere in the middle of it.
-
- Also, reference counting has the flaw that it can't deal with circular
- structures. If A points to B, and B points to A, they both have a
- reference count of 1 and so will get collected, even if nothing
- else in the program points to either of them. (Remember, even if
- your programs don't use circular structures, many programs do, and
- a GC solution that's part of the language will be used by those
- people too.)
-
- > |> This may sound slow, but actually I've found KCL's garbage
- > |> collector to be quite fast.
-
- > Quite fast compared to what? In previous projects I've been involved with
- > (using C rather than C++) "malloc" and "free" were found to be too slow,
- > and were replaced by custom ones. I want this GC to clip along as fast as
- > possible.
-
- You're sort of comparing apples to oranges here. On the one hand,
- there's the speed of a "new" operation; on the other hand, there's
- the speed of the GC. Especially if your language allows explicit
- "delete" for the common case that you know an object is garbage, GC
- will happen much more rarely than "new".
-
- I don't think my "GCnew" has to be any slower than "new".
-
- I was comparing KCL's GC to other garbage collectors I've seen, such
- as GNU Emacs, Lucid Lisp, and Allegro Lisp. I haven't done a careful
- race of all four, but KCL "felt" nice and fast, and certainly was not
- noticeably slower than the others in casual use.
-
- > |> The vital question is, can a scheme like this be made to work on
- > |> all architectures on which C++ could reasonably be expected to be
- > |> used?
-
- > If it can be made to work on one common architecture, it can probably be
- > tweaked for all.
-
- I'm not so sure. Consider a related example: making tagged pointers.
- (Note: this is *not* a part of the GC system I proposed above!)
- Most Lisps put some minimal tag information in the bottom two
- bits of a 32-bit pointer. Since word accesses to non-word-aligned
- addresses are slow or illegal on most architectures, these Lisps
- allocate everything on a multiple-of-four boundary so that the bottom
- two bits of a pointers are sitting around doing nothing. Why not put
- tag bits there! That's just what they do, and it works fine for them,
- but that's not a suitable thing to write into a language standard.
- The Lisp standard can be implemented in other ways just as well.
-
- Languages like C++ and Lisp get ported to all kinds of gonzo architectures
- that you and I have never heard of. Having the low two bits available
- to play with may not be a safe assumption for those machines.
-
- We have to be sure we don't write anything into a proposed GC system
- for C++ that can only be implemented by a feature like low-two-bits
- tags, which could make it difficult to port C++ to unusual machines.
-
- Another example: Machine architectures have changed a lot over the
- lifetime of C. Back then, it was fashionable to allow word accesses
- to odd addresses (e.g., VAX, 68020, 8086). Suppose the ANSI C
- committee had put something into the language that *depended* on
- fast odd-word accesses? We'd be hosed now, because current fashion
- (ahem, scientific design) has the fastest machines treating odd
- word accesses as an error.
-
- > The vital question is, would it be useable?
-
- That's the question that is so vital, it goes without saying. :-)
- Except that experience shows it's probably a good idea to say it
- anyway, and on a regular basis. :-(
-
- > |> I think you'd have to live with no compaction
- > |> at all: Whereas it is fairly benign to mark an object because
- > |> an integer happened to hold its address, you wouldn't want to adjust
- > |> that integer when you moved the object! You'd need special GC
- > |> pointers if you wanted compaction, and those would probably not
- > |> be practical.
-
- > This is why special GC pointers have been proposed in the first place.
- > If GC is in use, the programmer must accept the "no clever tricks"
- > constraint, which includes not passing pointers around disguised as
- > integers.
-
- And, "no passing pointers around to Xlib, because Xlib was written
- before GC pointers were invented."
-
- > And once that constraint is accepted, there may be no good
- > reason for not having compaction.
-
- I kind of like the way KCL does it, which is to compact only specific
- kinds of things (i.e., array data), and to make pointers to *compactible*
- things be special. That's not such a nuisance; a C++ vector class, for
- example, would not be compactible but the data vector it allocates and
- keeps inside itself could be. The accesses to that compactible thing
- would all be nicely localized inside the class, so it's okay if the
- pointers to it are a bit more linguistically awkward.
-
- It would be a shame not to be able to compact strings, though.
- There are lots of existing libraries with non-compactible arguments
- and local variables that point to strings. What if you pass a pointer
- to a compactible string into "printf", and a GC occurs during the call
- which moves the string? "printf" hasn't declared its argument as a GC
- pointer or compactible pointer, so it loses.
-
- You might be able to squeak out of it by noting that "printf" is not
- going to invoke a GC since only "GCnew" calls can do that, and "printf",
- being non-GC-aware, will not call "GCnew". The only two remaining
- tricky points then are existing functions with callbacks to code that
- can do a GC (like X toolkits, say), and signal handlers that invoke
- a GC (which might have to be forbidden for all sorts of other reasons).
-
- > |> [...] suppose you have a GC class which contains a string (i.e.,
- > |> a pointer to an old-fashioned non-GC'd array of chars). Suppose you
- > |> no longer have any pointers a certain object of this class, but you
- > |> still do have a pointer to its component string. You lose, if the
- > |> destructor for the class deletes the string (which it pretty much has
- > |> to do).
-
- > What you have described is a programming error. If a collectable object
- > contains such an old-style pointer, it is the programmer's responsibility
- > to ensure that the object will not be deleted as long as it is needed.
-
- That makes sense. As long as all types, including, say, char arrays,
- can be GC-able, then it's perfectly reasonable to expect this.
-
- > This in no way prevents us from having old style pointers in GC objects:
- > it's no different from the existing situation where a class contains a
- > pointer to a string.
-
- This is the old when-are-temporaries-deleted issue. At least there it
- can be resolved in some definitive way, by specifying in the language
- when temporaries are deleted (say, only on statement boundaries or
- whatever). With GC you don't have this luxury because GC's happen so
- unpredictably. (Especially if signal handlers can invoke GC!)
-
- > |> GC objects can be deleted, in which case any remaining pointers
- > |> to the object become invalid and must not be used to dereference
- > |> an object.
-
- > This would accord with existing C++ philosophy, but I don't see how
- > you could make a GC that's immune to dangling pointers. Anyone out
- > there know otherwise?
-
- A "conservative" GC pretty much *has* to be immune to dangling pointers.
-
- > |> What if a pointer to a GC-able object is passed to foreign (non-C++)
- > |> code, which stashes it somewhere the C++ collector doesn't know to
- > |> look? (All garbage-collecting languages have this problem, and as
- > |> far as I know all they can do is shrug and warn programmers not to
- > |> let go of the last C++ reference to an object if they know there
- > |> might be non-C++ references still lurking around.)
-
- > Hence the proposal for tagging GC-able objects, and having something
- > comparable to "const" which declares "I will not do any clever tricks
- > with this pointer, including converting it to an integer, performing
- > pointer arithmetic on it, storing it, etc". This only needs a change
- > to the C++ declarations of library functions, not to the functions
- > themselves. IMO this makes it feasible to mix GC and libraries.
-
- I don't see how either tagging or declaring things helps here. I
- didn't mean "what if some novice incorrectly stashes away a pointer
- to a GC object," I meant "what if someone *needs* to pass a pointer
- to a GC object to a function which will stash it away where the GC
- can't find it."
-
- This is an issue even for the brute-force conservative GC I outlined
- above: Suppose someone maps a segment of shared memory at some
- address range which the GC wasn't expecting to exist, and puts the
- only pointer to an object there?
-
- At least if you aren't compacting anything, this won't hurt as long
- as that person also keeps at least one pointer to the object in
- regular GC-scanned memory.
-
- -- Dave
- --
- Dave Gillespie
- daveg@synaptics.com, uunet!synaptx!daveg
- or: daveg@csvax.cs.caltech.edu
-