NetNews Usenet Archive 1992 #18

home *** CD-ROM | disk | FTP | other *** search

/ NetNews Usenet Archive 1992 #18 / NN_1992_18.iso / spool / comp / lang / cplus / 12489 < prev next >

Wrap

Internet Message Format | 1992-08-17 | 12.9 KB

Path: sparky!uunet!synaptx!synaptics.com!daveg From: daveg@synaptics.com (Dave Gillespie) Newsgroups: comp.lang.c++ Subject: Re: Garbage Collection for C++ Message-ID: <DAVEG.92Aug18002213@synaptx.synaptics.com> Date: 18 Aug 92 07:22:13 GMT References: <1992Aug6.014619.2111@ucc.su.OZ.AU> <DAVEG.92Aug14194411@synaptx.synaptics.com> <1992Aug15.074911.7965@news.mentorg.com> Sender: daveg@synaptx.Synaptics.Com Organization: Synaptics, Inc., San Jose, CA Lines: 249 In-reply-to: bcannard@hppcb36.mentorg.com's message of 15 Aug 92 07:49:11 GMT In article <1992Aug15.074911.7965@news.mentorg.com> bcannard@hppcb36.mentorg.com (Bob Cannard @ PCB x5565) writes: >|> So you use "GCnew" to allocate objects of those classes, and "new" >|> elsewhere. > I have two problems with this. > The first is that the GC is forced to check every pointer to see if it might > point to a GC object, and the compiler is required to provide the necessary > information that identifies all these pointers. The GC is forced to check many > more pointers than is necessary (I estimate that in my application, roughly > 90% of the objects would be non-GC). The programmer knows damn well that these > pointers are irrelevant to the GC, and I get very annoyed when I can't pass > that information on to the compiler. Okay, maybe there should be a declaration that a pointer does *not* point to a GC-able object. The rules would have to be a bit different from "const" and "volatile", since the unsafe cast is to add, not remove, the property. But it would resolve the problem of existing libraries which don't declare their pointer arguments "GC". Most of the gain from this declaration is to be had for pointers which are members of structs and class objects (as opposed to variables, arguments, etc.) You could make it a declaration on the member rather than on the pointer type itself. > The second is that having to test for GCness at run time represents an > additional overhead for the GC which I'd rather avoid. The GC still has to figure out what to do about uninitialized pointers. I guess if you must explicitly declare pointers to be GC-able then you could have the compiler initialize them to 0 by default. The overhead for testing if the pointer points in approximately the right direction is quite cheap. If you have GC and non-GC heaps in disjoint memory spaces, it's a single comparison. Even if they are interleaved, it can be a single lookup in a table indexed by pages. > In fact, I'm forced to wonder: if the GC has to figure out the difference > at run time, would it not be more efficient to simply make everything GCable? > It looks to me as if there is no point in having GC and nonGC coexist, unless > the two can be distinguished at compile time. Good point. There is backward compatibility to worry about. >|> The "extra treatment" that GC classes would need basically involves >|> adding some kind of tag information. If you declare an entire class >|> to be GC, it makes sense to put such a tag in every instance of the >|> class. If you instead work by allocating regular classes in a >|> special GC-able way, then it makes more sense to put the tag only >|> in instances of the class that are in the GC heap. > That looks wrong to me. Suppose I have a pointer to some object. The GC > comes along and finds this pointer. It must first determine whether or not > the object is a GC object or a non-GC object. It must then find the tag. If > the object is part of an array, the tag could be anywhere. How the hell does > the GC figure out where the tag is? How does it even tell that the object is > inside an array? The object could contain an index to show which element of the > array it is, but it would be cheaper to tag each element individually. This is going to be an issue for any C++ garbage collector. If there is a pointer to any element of an array, then the whole array is alive (because of pointer arithmetic). Indeed, if I have a "char *", I sure hope the C++ runtime system doesn't decide all the characters after the one I'm pointing to are garbage! The KCL-like system I outlined before is able to handle this. A given page of memory has objects all the same size. A quick page-table check tells whether or not this is a GC-able page, and, if so, how big its objects are. Note that an "object" is a whole array in this case; after all, the whole array was allocated with a single "new" call. Just round the page offset down to the next lower multiple of the object size to get a pointer to the beginning of the object (and, thus, the tag). Really big arrays can be treated differently, and somewhat less efficiently; being big, they're going to be rare. Say, a binary tree sorted by memory address, which can be searched in O(lg N) time to find the object into which a pointer points. >|> I still think declaring pointers to be GC would be too much of a >|> hassle for the programmer > ???? You've got to be kidding! You can cope with const (or haven't you got > cfront 3.0 yet - heh heh - evil chortle) but not with GC? If you can't cope > with hassle, you're in the wrong language! :-) First, you can cast away "const" if you're willing to check the safety of things by hand. This is a "relief valve" that is not practical with a "GC" property. Second, libraries can typically be made "const-safe" by changing the public declarations of the functions only, not their internals. (This is essentially casting away const in an underhanded way.) Third, even adding "const" caused so much heartburn on the part of the C++ programming world as to induce evil chortles in those who discuss it, but you are still willing to go through it all over again with "GC"? Note that the "non-GC" declaration I mentioned before doesn't have this problem; existing code can be left alone with zero effort. >|> , so we're left having to assume all >|> pointers may point to GC-allocated objects. Without adding lots of >|> tags in places that make C compatiblity awkward, we're left with a >|> completely "conservative", i.e., brute-force GC algorithm. We scan >|> every word on the stack, every word in static storage, and every >|> word on the non-GC heap for things that look like GC pointers. >|> (Since this is slow, a good optimization might be to separate the >|> static area and possibly the non-GC heap into things-which-contain- >|> pointers and things-which-don't, like massive arrays of doubles >|> which would be a waste of time to scan.) > Probably worse than the v*rd*mmt reference counting that I'm already using, > especially if the GC part of the data structure is relatively small. Blech! > You really want all that inescapable overhead in order to save a bit of hassle? The separation into areas can be done by the compiler. I think if you added up the number of words on the stack and the number of pointers in static memory, you'd find it's suprisingly small for typical programs. And if your program has a bulky stack, it probably has lots of "char buf[256]"s in it, in which case why aren't those "string buf"s? (With the "string" object storing the actual data in the heap, of course.) >|> To the user this scheme would feel exactly like C++ does now, with >|> no loopholes as far as I can tell. The only change is that you >|> are allowed to say "GCnew" instead of "new", on any kind of data, >|> in which case that data gets deleted automatically as soon as there >|> are no pointers of any kind to it left. Very clean, very simple >|> (at least where it counts---to the C++ user). > What counts is the *end user*, the customer, the person who buys the > program. They don't care if it's clean and simple, they care that it does > the job, does it fast, and doesn't blow up. They notice when programs get > slower, and they complain. If necessary, by buying from a competitor. Well, the best way to make a program do the job, do it fast, and not blow up is to keep it clean and simple. :-) >|> And, "no passing pointers around to Xlib, because Xlib was written >|> before GC pointers were invented." > Ahem. This is where the "promise not to do anything clever" stuff comes in. > Just like string.h was invented before const, all it needed was to update the > _declarations_: the _definitions_ could still be in C, Mongolian Cobol > or whatever, and would not have to change. If a function can't make that > promise, you can't give it a GC pointer. Them's the breaks. I predict that them breaks would completely hobble a program trying to interface to the outside world. Eventually, your information has to go to some external interface, which is generally a library routine written by somebody else, perhaps in another language as you said; anything that goes to that library can't be GC'd, and that restriction will trickle back through the program and interfere with everything. Just my guess, that is. >|> You might be able to squeak out of it by noting that "printf" is not >|> going to invoke a GC since only "GCnew" calls can do that, and "printf", >|> being non-GC-aware, will not call "GCnew". > Printf can make the promise for exactly the reasons you give. Okay, I guess I had that one coming. :-) > The callback problem might be another side of the promise. The signals > I'm not sure about, but how do they differ from, say, doing a "new" or > "delete" in a signal handle, which might be invoked while the program > is in the middle of a "new" or "delete"? Lots to think about here. At least an implementation can protect itself by disabling signals around its "new" and "delete" routines, if that's what it takes, but the implementation can't very well disable signals around "printf" and every other existing library routine. > [comments about GC objects referring to ordinary ones] > I think it only becomes a serious problem if a GC object refers to a non-GC > object which refers to a GC object which... The non-GC object must not be > deleted before the GC object gets collected, otherwise the GC is going to get > soundly screwed even if it is conservative; yet the programmer has no way > of ensuring that the GC object has already gone. Alternatively, the GC object's > destructor deletes the non-GC object, and the programmer holds on to a pointer > to the non-GC object until it's safe to get rid of it. Hmmm... You're right, it's awkward no matter what you do, unless GC objects refer only to other GC objects. (That's why I don't like the idea of declaring classes GC-able; you want to be able to allocate *any* type, even "double *", to be GC-able.) It's only safe for a GC-able object to maintain a non-GC-able object if it never lets a pointer to that object outside of itself. (For example, the typical "string" class can allocate the underlying data vector that way as long as "operator[]" doesn't return a reference.) I don't think this lack of safety is enough to veto the idea, though. It's pretty tame compared to some other booby-traps in C and C++... As long as you use GC in a consistent way, you'll be fine. > [discussion about foreign functions stashing pointers where GC can't go] >|> I don't see how either tagging or declaring things helps here. I >|> didn't mean "what if some novice incorrectly stashes away a pointer >|> to a GC object," I meant "what if someone *needs* to pass a pointer >|> to a GC object to a function which will stash it away where the GC >|> can't find it." > I wasn't talking about novices either, I was talking about detecting situations > where things have gone wrong, and informing the programmer so that corrective > action can be taken. I guess so, although this situation would be awfully rare. > To be honest, with the availability of static, > automatic, and non-GC allocated data structures, I'm not sure that this > will ever be an insurmountable barrier, because of the ability to make a > permanent copy, _provided_ the compiler can give warning when it happens. I would tend to agree, possibly even without the warning. The problem really is with us right now: Suppose I pass a pointer to a function, which saves it away (for caching purposes, say), and then I delete the pointer not realizing the function still needs it. The answer is to read the documentation and program carefully. If you add a declaration to allow this to be caught by the compiler, you have to ask whether we should add declarations for a zillion other obscure pitfalls as well. > If there *are* situations where it is unavoidable, we are going to need a > locking mechanism. [...] Only if we insist on C++ being completely, totally safe and protected. Even languages that try to do that generally have a little bit of fine print at the bottom saying, "oh, yes, and if you use the foreign function interface, all bets are off." :-) > Thanks, Dave, interesting stuff. Thanks for keeping things on an interesting level. I'm frankly surprised this thread hasn't devolved into a flame war yet. :-) > Here's hoping that all this discussion will > eventually get hammered into concrete proposals for a viable system! Agreed! -- Dave -- Dave Gillespie daveg@synaptics.com, uunet!synaptx!daveg or: daveg@csvax.cs.caltech.edu