home *** CD-ROM | disk | FTP | other *** search
- Path: sparky!uunet!stanford.edu!rutgers!faatcrl!iecc!compilers-sender
- From: pardo@cs.washington.edu (David Keppel)
- Newsgroups: comp.compilers
- Subject: Large-space GC
- Keywords: GC, storage, design
- Message-ID: <92-08-117@comp.compilers>
- Date: 20 Aug 92 02:23:17 GMT
- Sender: compilers-sender@iecc.cambridge.ma.us
- Reply-To: pardo@cs.washington.edu (David Keppel)
- Organization: Computer Science & Engineering, U. of Washington, Seattle
- Lines: 67
- Approved: compilers@iecc.cambridge.ma.us
-
- As long as we're discussing GC: has anybody studied garbage collection in
- wide-address/persistant store machines? Consider, for example, that you
- have a machine with a 64-bit address space and a distributed single-level
- store. In such a system it is likely that pointers in the active process
- will point to memory on disk, across the network, etc. Today's GC's
- already can take a long time because the GC space may be hundreds of
- megabytes and because parts of that space are out of main memory. In
- ``large-space'' machines it is likely a space will be many gigabytes of
- fragmented memory, distributed across even devices slower than disks.
-
- Note that `conventional' distributed GC algorithms don't really solve the
- problem because they still are designed for spaces on the order of
- hundreds of megabytes. In short, they assume the the entire pointer space
- can be traced to do one round of GC.
-
- I am imagining situations where a process can map in large databases that
- are used only sparsely, or where a process contains a pointer to a data
- structure that contains pointers in to other processes, each happily
- generating (and reclaiming) garbage. Even a simple dictionary could cover
- gigabytes of data structures describing text, tables, static pictures,
- animations, and so on. The time to trace the dictionary is probably on
- the order of days.
-
- The general problem is that in a large address space there isn't enough
- time to trace *all* pointers to live objects. The GC really needs to
- restrict the search to reasonable-size subspaces.
-
- I imagine several possible solutions, but I'm sure there must be
- others besides:
-
- - Reference counting. Tends to have a large compute overhead and
- fails when there are cycles, but has excellent locality and can
- reallocate freed storage immediately.
-
- - Mark segments of the memory space as non-GC'd. Variations are
- possible: the collector gets to avoid performing a collection in the
- segment, it gets to assume that the segment is never GC'd, thus
- never moves, contains no roots, ...
-
- - Mark segments as externally GC'd.
-
- - Keep summary info for a segment, where the summary info says what
- other segments have pointers in to the segment and what other
- segments are pointed to by the segment. The summary info might be
- similar to inter-generation info in generational collectors, or to
- distributed collector references.
-
- - Make all references in to a segment equivalent. The segment is
- deallocated all at once, and only when the last pointer in to the
- segement disappears.
-
- - Qualify objects and types as reference-contributing or not,
- exportable (to another segment) or not, ...
-
- There are additional problems to be solved. For example, for security and
- protection reasons, my user-level process really shouldn't be allowed to
- set mark bits in something owned by another ``administrative entity.''
- Yet halting tracing at read-only boundaries doesn't work, because the
- read-only object may have a reference back in to the local (writable)
- segment that the collector is trying to GC.
-
- I will summarize e-mail.
-
- ;-D oN ( GC for the 90's: Reduce, Reuse, Recycle! ) Pardo
- --
- Send compilers articles to compilers@iecc.cambridge.ma.us or
- {ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.
-