NetNews Usenet Archive 1992 #18

home *** CD-ROM | disk | FTP | other *** search

/ NetNews Usenet Archive 1992 #18 / NN_1992_18.iso / spool / comp / lang / cplus / 12367 < prev next >

Wrap

Internet Message Format | 1992-08-14 | 14.1 KB

Path: sparky!uunet!synaptx!synaptics.com!daveg From: daveg@synaptics.com (Dave Gillespie) Newsgroups: comp.lang.c++ Subject: Re: Garbage Collection for C++ Message-ID: <DAVEG.92Aug14194411@synaptx.synaptics.com> Date: 15 Aug 92 02:44:11 GMT References: <1992Aug6.014619.2111@ucc.su.OZ.AU> <DAVEG.92Aug13025629@synaptx.synaptics.com> <1992Aug14.021547.15215@news.mentorg.com> Sender: daveg@synaptx.Synaptics.Com Organization: Synaptics, Inc., San Jose, CA Lines: 277 In-reply-to: bcannard@hppcb36.mentorg.com's message of 14 Aug 92 02:15:47 GMT In article <1992Aug14.021547.15215@news.mentorg.com> bcannard@hppcb36.mentorg.com (Bob Cannard @ PCB x5565) writes: > In article <DAVEG.92Aug13025629@synaptx.synaptics.com>, daveg@synaptics.com (Dave Gillespie) writes: > |> Couldn't we get away with having a garbage collector that didn't > |> need pointers to be declared `GC'? The only thing that would need > |> special `GC' treatment would be "new" [...] > > I can't see that this is viable given the basic constraints on C++: among > other things, those who don't want GC shouldn't have to suffer the hits. > Marking classes or pointers (as appropriate, depending on the GC > implementation) means the compiler knows about any extra treatment it has > to give to those pointers/objects, and can leave it out for "ordinary" > code. The "GCnew" approach means you're marking objects (rather than classes or pointers) as GC-able. It's still true that non-GC-allocated objects would work just like they always did. > In particular, in my own application - which will eventually be quite a big > program - I only want GC for certain classes, where the problem of determining > when an object must be deleted has no sensible solution other than GC. Yet > performance (both run time and memory use) carries a high premium, so I can't > afford to have GC everywhere. So you use "GCnew" to allocate objects of those classes, and "new" elsewhere. The "extra treatment" that GC classes would need basically involves adding some kind of tag information. If you declare an entire class to be GC, it makes sense to put such a tag in every instance of the class. If you instead work by allocating regular classes in a special GC-able way, then it makes more sense to put the tag only in instances of the class that are in the GC heap. I still think declaring pointers to be GC would be too much of a hassle for the programmer, so we're left having to assume all pointers may point to GC-allocated objects. Without adding lots of tags in places that make C compatiblity awkward, we're left with a completely "conservative", i.e., brute-force GC algorithm. We scan every word on the stack, every word in static storage, and every word on the non-GC heap for things that look like GC pointers. (Since this is slow, a good optimization might be to separate the static area and possibly the non-GC heap into things-which-contain- pointers and things-which-don't, like massive arrays of doubles which would be a waste of time to scan.) The decision of which static area or which non-GC heap a thing should go in can be made at compile-time. In the GC heap, we'd only scan the objects that have been marked (just like classic mark-and-sweep GC's do). If the allocator has recorded the size of an object (which most C allocators do), then it's easy to tell the range of words that need to be scanned. An optimization here would be to add a tag next to the object which identifies which words inside it are pointers. If you have "GCnew"d an array of objects, the whole array gets a single tag. The array must be marked and swept as a whole anyway, because of pointer arithmetic. To the user this scheme would feel exactly like C++ does now, with no loopholes as far as I can tell. The only change is that you are allowed to say "GCnew" instead of "new", on any kind of data, in which case that data gets deleted automatically as soon as there are no pointers of any kind to it left. Very clean, very simple (at least where it counts---to the C++ user). > |> Some Lisp systems, such as Kyoto Common Lisp, work this way. Its > |> garbage collector scans the whole stack as "raw memory," looking > |> for 32-bit words that could be pointers to GC-able objects. It > |> allocates GC-able objects in such a way that it is relatively > |> easy to tell if a given memory address points to a GC-able object > |> or not. > Exactly what I mean. If the GC has to scan the whole stack (and all the > statics and globals while we're at it), it's doing too much work. This is exactly what KCL does now. Stacks tend not to be all that large in practice, especially if you have C++ vector classes rather than actual local arrays. > The reference-count solution is more efficient than that, because the need > for GC is restricted to a small but important part of the program. The > idea of languages like C++ is to give the compiler the information it > needs to get the best performance out of the resulting code. I wasn't thinking about reference counting solutions at all. I agree that for reference counting you'd need a special kind of class declaration. I think people have compared the performance and found that GC, even simple, stupid GC, comes out ahead of reference counting. The overhead of all those increments and decrements is enormous (relatively speaking). Linked list traversals that used to involve one memory read per step now involve several reads and writes, for example. Pointers to reference-counted objects would be messy because of pointer arithmetic. An array as a whole ought to have a reference count (not the individual things inside it), but a pointer can point anywhere in the middle of it. Also, reference counting has the flaw that it can't deal with circular structures. If A points to B, and B points to A, they both have a reference count of 1 and so will get collected, even if nothing else in the program points to either of them. (Remember, even if your programs don't use circular structures, many programs do, and a GC solution that's part of the language will be used by those people too.) > |> This may sound slow, but actually I've found KCL's garbage > |> collector to be quite fast. > Quite fast compared to what? In previous projects I've been involved with > (using C rather than C++) "malloc" and "free" were found to be too slow, > and were replaced by custom ones. I want this GC to clip along as fast as > possible. You're sort of comparing apples to oranges here. On the one hand, there's the speed of a "new" operation; on the other hand, there's the speed of the GC. Especially if your language allows explicit "delete" for the common case that you know an object is garbage, GC will happen much more rarely than "new". I don't think my "GCnew" has to be any slower than "new". I was comparing KCL's GC to other garbage collectors I've seen, such as GNU Emacs, Lucid Lisp, and Allegro Lisp. I haven't done a careful race of all four, but KCL "felt" nice and fast, and certainly was not noticeably slower than the others in casual use. > |> The vital question is, can a scheme like this be made to work on > |> all architectures on which C++ could reasonably be expected to be > |> used? > If it can be made to work on one common architecture, it can probably be > tweaked for all. I'm not so sure. Consider a related example: making tagged pointers. (Note: this is *not* a part of the GC system I proposed above!) Most Lisps put some minimal tag information in the bottom two bits of a 32-bit pointer. Since word accesses to non-word-aligned addresses are slow or illegal on most architectures, these Lisps allocate everything on a multiple-of-four boundary so that the bottom two bits of a pointers are sitting around doing nothing. Why not put tag bits there! That's just what they do, and it works fine for them, but that's not a suitable thing to write into a language standard. The Lisp standard can be implemented in other ways just as well. Languages like C++ and Lisp get ported to all kinds of gonzo architectures that you and I have never heard of. Having the low two bits available to play with may not be a safe assumption for those machines. We have to be sure we don't write anything into a proposed GC system for C++ that can only be implemented by a feature like low-two-bits tags, which could make it difficult to port C++ to unusual machines. Another example: Machine architectures have changed a lot over the lifetime of C. Back then, it was fashionable to allow word accesses to odd addresses (e.g., VAX, 68020, 8086). Suppose the ANSI C committee had put something into the language that *depended* on fast odd-word accesses? We'd be hosed now, because current fashion (ahem, scientific design) has the fastest machines treating odd word accesses as an error. > The vital question is, would it be useable? That's the question that is so vital, it goes without saying. :-) Except that experience shows it's probably a good idea to say it anyway, and on a regular basis. :-( > |> I think you'd have to live with no compaction > |> at all: Whereas it is fairly benign to mark an object because > |> an integer happened to hold its address, you wouldn't want to adjust > |> that integer when you moved the object! You'd need special GC > |> pointers if you wanted compaction, and those would probably not > |> be practical. > This is why special GC pointers have been proposed in the first place. > If GC is in use, the programmer must accept the "no clever tricks" > constraint, which includes not passing pointers around disguised as > integers. And, "no passing pointers around to Xlib, because Xlib was written before GC pointers were invented." > And once that constraint is accepted, there may be no good > reason for not having compaction. I kind of like the way KCL does it, which is to compact only specific kinds of things (i.e., array data), and to make pointers to *compactible* things be special. That's not such a nuisance; a C++ vector class, for example, would not be compactible but the data vector it allocates and keeps inside itself could be. The accesses to that compactible thing would all be nicely localized inside the class, so it's okay if the pointers to it are a bit more linguistically awkward. It would be a shame not to be able to compact strings, though. There are lots of existing libraries with non-compactible arguments and local variables that point to strings. What if you pass a pointer to a compactible string into "printf", and a GC occurs during the call which moves the string? "printf" hasn't declared its argument as a GC pointer or compactible pointer, so it loses. You might be able to squeak out of it by noting that "printf" is not going to invoke a GC since only "GCnew" calls can do that, and "printf", being non-GC-aware, will not call "GCnew". The only two remaining tricky points then are existing functions with callbacks to code that can do a GC (like X toolkits, say), and signal handlers that invoke a GC (which might have to be forbidden for all sorts of other reasons). > |> [...] suppose you have a GC class which contains a string (i.e., > |> a pointer to an old-fashioned non-GC'd array of chars). Suppose you > |> no longer have any pointers a certain object of this class, but you > |> still do have a pointer to its component string. You lose, if the > |> destructor for the class deletes the string (which it pretty much has > |> to do). > What you have described is a programming error. If a collectable object > contains such an old-style pointer, it is the programmer's responsibility > to ensure that the object will not be deleted as long as it is needed. That makes sense. As long as all types, including, say, char arrays, can be GC-able, then it's perfectly reasonable to expect this. > This in no way prevents us from having old style pointers in GC objects: > it's no different from the existing situation where a class contains a > pointer to a string. This is the old when-are-temporaries-deleted issue. At least there it can be resolved in some definitive way, by specifying in the language when temporaries are deleted (say, only on statement boundaries or whatever). With GC you don't have this luxury because GC's happen so unpredictably. (Especially if signal handlers can invoke GC!) > |> GC objects can be deleted, in which case any remaining pointers > |> to the object become invalid and must not be used to dereference > |> an object. > This would accord with existing C++ philosophy, but I don't see how > you could make a GC that's immune to dangling pointers. Anyone out > there know otherwise? A "conservative" GC pretty much *has* to be immune to dangling pointers. > |> What if a pointer to a GC-able object is passed to foreign (non-C++) > |> code, which stashes it somewhere the C++ collector doesn't know to > |> look? (All garbage-collecting languages have this problem, and as > |> far as I know all they can do is shrug and warn programmers not to > |> let go of the last C++ reference to an object if they know there > |> might be non-C++ references still lurking around.) > Hence the proposal for tagging GC-able objects, and having something > comparable to "const" which declares "I will not do any clever tricks > with this pointer, including converting it to an integer, performing > pointer arithmetic on it, storing it, etc". This only needs a change > to the C++ declarations of library functions, not to the functions > themselves. IMO this makes it feasible to mix GC and libraries. I don't see how either tagging or declaring things helps here. I didn't mean "what if some novice incorrectly stashes away a pointer to a GC object," I meant "what if someone *needs* to pass a pointer to a GC object to a function which will stash it away where the GC can't find it." This is an issue even for the brute-force conservative GC I outlined above: Suppose someone maps a segment of shared memory at some address range which the GC wasn't expecting to exist, and puts the only pointer to an object there? At least if you aren't compacting anything, this won't hurt as long as that person also keeps at least one pointer to the object in regular GC-scanned memory. -- Dave -- Dave Gillespie daveg@synaptics.com, uunet!synaptx!daveg or: daveg@csvax.cs.caltech.edu