home *** CD-ROM | disk | FTP | other *** search
- Path: sparky!uunet!snorkelwacker.mit.edu!ai-lab!life.ai.mit.edu!tmb
- From: tmb@arolla.idiap.ch (Thomas M. Breuel)
- Newsgroups: comp.lang.c++
- Subject: garbage collection (some actual DATA)...
- Message-ID: <TMB.92Aug20145357@arolla.idiap.ch>
- Date: 20 Aug 92 18:53:57 GMT
- Sender: news@ai.mit.edu
- Reply-To: tmb@idiap.ch
- Distribution: comp
- Organization: IDIAP (Institut Dalle Molle d'Intelligence Artificielle
- Perceptive)
- Lines: 217
-
- OK, lots of people have been claiming that GC is "obviously" a drag
- and "obviously" costs you an arm and a leg. Well, here are some simple
- benchmarks that suggest that this isn't so obvious.
-
- The main point is that there is no significant difference between
- running the same code with or without GC if you explicitly free the
- storage that you have allocated (the small difference that there is is
- probably due to the fact that the storage returned by gc_malloc is
- already initialized; this is not strictly speaking necessary).
-
- In this particular case, it turns out that letting the garbage
- collector do the freeing is actually _faster_ than manual freeing of
- memory. I don't want to claim that this is generally true, but it
- shows that even conservative collectors _can_ do better than manual
- storage management.
-
- Reference counting in C++ is, not surprisingly, the slowest method of
- memory management. You can reduce the cost significantly by putting
- the reference count into the data itself, but it then gets much
- messier if you try to "package up" memory management into a template
- class.
-
- For comparison, I have also included a "real" system with GC, namely
- NJ/SML. In fact, compared to all the C/C++ programs, it seems to have
- the lowest overhead for memory management. The reason why it isn't
- much faster than the C version is probably that the compiler is still
- under development (some inefficient calling sequences, failure to
- inline), and that there are a lot of (useful!) runtime checks for
- arithmetic overflow, out-of-bounds indexes, etc. built in.
- Incidentally, the NJ/SML collector managed to clean up this large
- volume of garbage (about 48M) without a single major collection,
- meaning, that in an interactive system there would have been no
- noticeable delay.
-
- Thomas.
-
- ================================================================
-
- int *a[10];
-
- main() {
- int i;
- for(i=0;i<1000000;i++) {
- int j = i%10;
- int k,*p;
- if(a[j]) free(a[j]);
- p=(int*)malloc(10*sizeof (int));
- a[j]=p;
- for(k=0;k<10;k++) *p++ = 0;
- }
- }
-
- compiled with gcc-2.2 -g -O4
-
- arolla$ time a.out
- 15.7 real 15.6 user 0.0 sys
- arolla$ int *a[10];
-
- Here is a profile for the same program compiled with "-p":
-
- %time cumsecs #call ms/call name
- 40.4 10.69 1 10690.00 _main
- 18.1 15.48 mcount
- 10.5 18.271000000 0.00 .rem
- 10.5 21.05 _strcpy
- 9.5 23.571000001 0.00 _malloc
- 6.3 25.25 999991 0.00 _free
- 4.6 26.46 _realloc
- 0.0 26.46 1 0.00 .udiv
- ...
-
- Now, the time spent in _malloc, _free, and (presumably) _strcpy is
- storage allocation related. This is a total of 8.2 seconds, or 50% of
- the running time of the program.
-
- ================================================================
-
- int *a[10];
-
- void *gc_malloc();
-
- main() {
- int i;
- gc_init();
- for(i=0;i<1000000;i++) {
- int j = i%10;
- int k,*p;
- if(a[j]) gc_free(a[j]);
- p=(int*)gc_malloc(10*sizeof (int));
- a[j]=p;
- for(k=0;k<10;k++) *p++ = 0;
- }
- }
-
- compiled with gcc-2.2 -g -O4 memory-gc-auto.c ~/mit/src/lib/cgc/gc.a
- (Boehm-Dehmers collector version 1.6)
-
- arolla$ time a.out
- 16.8 real 16.6 user 0.0 sys
- arolla$
-
- ================================================================
-
- int *a[10];
-
- void *gc_malloc();
-
- main() {
- int i;
- gc_init();
- for(i=0;i<1000000;i++) {
- int j = i%10;
- int k,*p;
- p=(int*)gc_malloc(10*sizeof (int));
- a[j]=p;
- for(k=0;k<10;k++) *p++ = 0;
- }
- gcollect();
- }
-
- compiled with gcc-2.2 -g -O4 memory-gc-auto.c ~/mit/src/lib/cgc/gc.a
- (Boehm-Dehmers collector version 1.6)
-
- arolla$ time a.out
- 12.3 real 12.1 user 0.2 sys
- arolla$
-
- Note that this is _faster_ than the version that calls gc_free
- explicitly. (Of course, GC has an unusually low overhead in this case,
- since there are very few roots.)
-
- ================================================================
-
- template <class T>
- struct Counted {
- private:
- int *count;
- T *object;
- void drop() {
- if(!count) return;
- (*count)--;
- if(*count<=0) {
- delete count;
- delete object;
- count=0;
- object=0;
- }
- }
- void acquire() {
- (*count)++;
- }
- public:
- Counted() {
- count=0;
- object=0;
- }
- Counted(T *object):object(object) {
- count = new int(1);
- }
- ~Counted() {
- drop();
- }
- Counted(Counted &other) {
- count=other.count;
- object=other.object;
- acquire();
- }
- Counted &operator=(Counted &other) {
- drop();
- count=other.count;
- object=other.object;
- acquire();
- return *this;
- }
-
- T &operator*() {return *object;}
- T *operator->() {return object;}
- operator T() {return *object;}
- };
-
- struct IArt {
- int a[10];
- };
-
- typedef Counted<IArt> CIArt;
-
- CIArt a[10];
-
- main() {
- for(int i=0;i<1000000;i++) {
- a[i%10] = CIArt(new IArt);
- }
- }
-
- compiled with gcc-2.2 -g -O4:
-
- arolla$ time a.out
- 33.0 real 32.8 user 0.0 sys
- arolla$
-
- ================================================================
-
- val a = Array.array(10,Array.array(10,0));
-
- fun doit(0) = ()
- | doit(i) =
- (Array.update(a,i mod Array.length(a),Array.array(10,0));
- doit(i-1))
-
- compiled with nj-sml-0.85
-
- - timeit(fn () => doit(1000000));
- user: 16.130000 gc: 1.720000 system: 0.030000 real: 17.902393
- val it = () : unit
- -
-
- ================================================================
-