home *** CD-ROM | disk | FTP | other *** search
- Path: sparky!uunet!synaptx!synaptics.com!daveg
- From: daveg@synaptics.com (Dave Gillespie)
- Newsgroups: comp.lang.c++
- Subject: Re: Garbage Collection for C++
- Message-ID: <DAVEG.92Aug18002213@synaptx.synaptics.com>
- Date: 18 Aug 92 07:22:13 GMT
- References: <1992Aug6.014619.2111@ucc.su.OZ.AU>
- <DAVEG.92Aug14194411@synaptx.synaptics.com>
- <1992Aug15.074911.7965@news.mentorg.com>
- Sender: daveg@synaptx.Synaptics.Com
- Organization: Synaptics, Inc., San Jose, CA
- Lines: 249
- In-reply-to: bcannard@hppcb36.mentorg.com's message of 15 Aug 92 07:49:11 GMT
-
- In article <1992Aug15.074911.7965@news.mentorg.com> bcannard@hppcb36.mentorg.com (Bob Cannard @ PCB x5565) writes:
-
- >|> So you use "GCnew" to allocate objects of those classes, and "new"
- >|> elsewhere.
-
- > I have two problems with this.
-
- > The first is that the GC is forced to check every pointer to see if it might
- > point to a GC object, and the compiler is required to provide the necessary
- > information that identifies all these pointers. The GC is forced to check many
- > more pointers than is necessary (I estimate that in my application, roughly
- > 90% of the objects would be non-GC). The programmer knows damn well that these
- > pointers are irrelevant to the GC, and I get very annoyed when I can't pass
- > that information on to the compiler.
-
- Okay, maybe there should be a declaration that a pointer does *not*
- point to a GC-able object. The rules would have to be a bit different
- from "const" and "volatile", since the unsafe cast is to add, not
- remove, the property. But it would resolve the problem of existing
- libraries which don't declare their pointer arguments "GC".
-
- Most of the gain from this declaration is to be had for pointers which
- are members of structs and class objects (as opposed to variables,
- arguments, etc.) You could make it a declaration on the member rather
- than on the pointer type itself.
-
- > The second is that having to test for GCness at run time represents an
- > additional overhead for the GC which I'd rather avoid.
-
- The GC still has to figure out what to do about uninitialized pointers.
- I guess if you must explicitly declare pointers to be GC-able then you
- could have the compiler initialize them to 0 by default.
-
- The overhead for testing if the pointer points in approximately the
- right direction is quite cheap. If you have GC and non-GC heaps in
- disjoint memory spaces, it's a single comparison. Even if they are
- interleaved, it can be a single lookup in a table indexed by pages.
-
- > In fact, I'm forced to wonder: if the GC has to figure out the difference
- > at run time, would it not be more efficient to simply make everything GCable?
- > It looks to me as if there is no point in having GC and nonGC coexist, unless
- > the two can be distinguished at compile time.
-
- Good point. There is backward compatibility to worry about.
-
- >|> The "extra treatment" that GC classes would need basically involves
- >|> adding some kind of tag information. If you declare an entire class
- >|> to be GC, it makes sense to put such a tag in every instance of the
- >|> class. If you instead work by allocating regular classes in a
- >|> special GC-able way, then it makes more sense to put the tag only
- >|> in instances of the class that are in the GC heap.
-
- > That looks wrong to me. Suppose I have a pointer to some object. The GC
- > comes along and finds this pointer. It must first determine whether or not
- > the object is a GC object or a non-GC object. It must then find the tag. If
- > the object is part of an array, the tag could be anywhere. How the hell does
- > the GC figure out where the tag is? How does it even tell that the object is
- > inside an array? The object could contain an index to show which element of the
- > array it is, but it would be cheaper to tag each element individually.
-
- This is going to be an issue for any C++ garbage collector. If there
- is a pointer to any element of an array, then the whole array is alive
- (because of pointer arithmetic). Indeed, if I have a "char *", I sure
- hope the C++ runtime system doesn't decide all the characters after the
- one I'm pointing to are garbage!
-
- The KCL-like system I outlined before is able to handle this. A given
- page of memory has objects all the same size. A quick page-table check
- tells whether or not this is a GC-able page, and, if so, how big its
- objects are. Note that an "object" is a whole array in this case; after
- all, the whole array was allocated with a single "new" call. Just round
- the page offset down to the next lower multiple of the object size to
- get a pointer to the beginning of the object (and, thus, the tag).
-
- Really big arrays can be treated differently, and somewhat less
- efficiently; being big, they're going to be rare. Say, a binary tree
- sorted by memory address, which can be searched in O(lg N) time to
- find the object into which a pointer points.
-
- >|> I still think declaring pointers to be GC would be too much of a
- >|> hassle for the programmer
-
- > ???? You've got to be kidding! You can cope with const (or haven't you got
- > cfront 3.0 yet - heh heh - evil chortle) but not with GC? If you can't cope
- > with hassle, you're in the wrong language! :-)
-
- First, you can cast away "const" if you're willing to check the safety
- of things by hand. This is a "relief valve" that is not practical with
- a "GC" property.
-
- Second, libraries can typically be made "const-safe" by changing the
- public declarations of the functions only, not their internals. (This
- is essentially casting away const in an underhanded way.)
-
- Third, even adding "const" caused so much heartburn on the part of
- the C++ programming world as to induce evil chortles in those who
- discuss it, but you are still willing to go through it all over
- again with "GC"?
-
- Note that the "non-GC" declaration I mentioned before doesn't have
- this problem; existing code can be left alone with zero effort.
-
- >|> , so we're left having to assume all
- >|> pointers may point to GC-allocated objects. Without adding lots of
- >|> tags in places that make C compatiblity awkward, we're left with a
- >|> completely "conservative", i.e., brute-force GC algorithm. We scan
- >|> every word on the stack, every word in static storage, and every
- >|> word on the non-GC heap for things that look like GC pointers.
- >|> (Since this is slow, a good optimization might be to separate the
- >|> static area and possibly the non-GC heap into things-which-contain-
- >|> pointers and things-which-don't, like massive arrays of doubles
- >|> which would be a waste of time to scan.)
-
- > Probably worse than the v*rd*mmt reference counting that I'm already using,
- > especially if the GC part of the data structure is relatively small. Blech!
- > You really want all that inescapable overhead in order to save a bit of hassle?
-
- The separation into areas can be done by the compiler. I think if you
- added up the number of words on the stack and the number of pointers in
- static memory, you'd find it's suprisingly small for typical programs.
- And if your program has a bulky stack, it probably has lots of
- "char buf[256]"s in it, in which case why aren't those "string buf"s?
- (With the "string" object storing the actual data in the heap, of
- course.)
-
- >|> To the user this scheme would feel exactly like C++ does now, with
- >|> no loopholes as far as I can tell. The only change is that you
- >|> are allowed to say "GCnew" instead of "new", on any kind of data,
- >|> in which case that data gets deleted automatically as soon as there
- >|> are no pointers of any kind to it left. Very clean, very simple
- >|> (at least where it counts---to the C++ user).
-
- > What counts is the *end user*, the customer, the person who buys the
- > program. They don't care if it's clean and simple, they care that it does
- > the job, does it fast, and doesn't blow up. They notice when programs get
- > slower, and they complain. If necessary, by buying from a competitor.
-
- Well, the best way to make a program do the job, do it fast, and not
- blow up is to keep it clean and simple. :-)
-
- >|> And, "no passing pointers around to Xlib, because Xlib was written
- >|> before GC pointers were invented."
-
- > Ahem. This is where the "promise not to do anything clever" stuff comes in.
- > Just like string.h was invented before const, all it needed was to update the
- > _declarations_: the _definitions_ could still be in C, Mongolian Cobol
- > or whatever, and would not have to change. If a function can't make that
- > promise, you can't give it a GC pointer. Them's the breaks.
-
- I predict that them breaks would completely hobble a program trying
- to interface to the outside world. Eventually, your information has
- to go to some external interface, which is generally a library routine
- written by somebody else, perhaps in another language as you said;
- anything that goes to that library can't be GC'd, and that restriction
- will trickle back through the program and interfere with everything.
- Just my guess, that is.
-
- >|> You might be able to squeak out of it by noting that "printf" is not
- >|> going to invoke a GC since only "GCnew" calls can do that, and "printf",
- >|> being non-GC-aware, will not call "GCnew".
-
- > Printf can make the promise for exactly the reasons you give.
-
- Okay, I guess I had that one coming. :-)
-
- > The callback problem might be another side of the promise. The signals
- > I'm not sure about, but how do they differ from, say, doing a "new" or
- > "delete" in a signal handle, which might be invoked while the program
- > is in the middle of a "new" or "delete"? Lots to think about here.
-
- At least an implementation can protect itself by disabling signals
- around its "new" and "delete" routines, if that's what it takes, but
- the implementation can't very well disable signals around "printf"
- and every other existing library routine.
-
- > [comments about GC objects referring to ordinary ones]
-
- > I think it only becomes a serious problem if a GC object refers to a non-GC
- > object which refers to a GC object which... The non-GC object must not be
- > deleted before the GC object gets collected, otherwise the GC is going to get
- > soundly screwed even if it is conservative; yet the programmer has no way
- > of ensuring that the GC object has already gone. Alternatively, the GC object's
- > destructor deletes the non-GC object, and the programmer holds on to a pointer
- > to the non-GC object until it's safe to get rid of it. Hmmm...
-
- You're right, it's awkward no matter what you do, unless GC objects
- refer only to other GC objects. (That's why I don't like the idea
- of declaring classes GC-able; you want to be able to allocate *any*
- type, even "double *", to be GC-able.)
-
- It's only safe for a GC-able object to maintain a non-GC-able object
- if it never lets a pointer to that object outside of itself. (For
- example, the typical "string" class can allocate the underlying data
- vector that way as long as "operator[]" doesn't return a reference.)
-
- I don't think this lack of safety is enough to veto the idea, though.
- It's pretty tame compared to some other booby-traps in C and C++...
- As long as you use GC in a consistent way, you'll be fine.
-
- > [discussion about foreign functions stashing pointers where GC can't go]
-
- >|> I don't see how either tagging or declaring things helps here. I
- >|> didn't mean "what if some novice incorrectly stashes away a pointer
- >|> to a GC object," I meant "what if someone *needs* to pass a pointer
- >|> to a GC object to a function which will stash it away where the GC
- >|> can't find it."
-
- > I wasn't talking about novices either, I was talking about detecting situations
- > where things have gone wrong, and informing the programmer so that corrective
- > action can be taken.
-
- I guess so, although this situation would be awfully rare.
-
- > To be honest, with the availability of static,
- > automatic, and non-GC allocated data structures, I'm not sure that this
- > will ever be an insurmountable barrier, because of the ability to make a
- > permanent copy, _provided_ the compiler can give warning when it happens.
-
- I would tend to agree, possibly even without the warning. The problem
- really is with us right now: Suppose I pass a pointer to a function,
- which saves it away (for caching purposes, say), and then I delete the
- pointer not realizing the function still needs it. The answer is to
- read the documentation and program carefully. If you add a declaration
- to allow this to be caught by the compiler, you have to ask whether we
- should add declarations for a zillion other obscure pitfalls as well.
-
- > If there *are* situations where it is unavoidable, we are going to need a
- > locking mechanism. [...]
-
- Only if we insist on C++ being completely, totally safe and protected.
- Even languages that try to do that generally have a little bit of fine
- print at the bottom saying, "oh, yes, and if you use the foreign
- function interface, all bets are off." :-)
-
- > Thanks, Dave, interesting stuff.
-
- Thanks for keeping things on an interesting level. I'm frankly surprised
- this thread hasn't devolved into a flame war yet. :-)
-
- > Here's hoping that all this discussion will
- > eventually get hammered into concrete proposals for a viable system!
-
- Agreed!
-
- -- Dave
- --
- Dave Gillespie
- daveg@synaptics.com, uunet!synaptx!daveg
- or: daveg@csvax.cs.caltech.edu
-