NetNews Usenet Archive 1992 #18

home *** CD-ROM | disk | FTP | other *** search

/ NetNews Usenet Archive 1992 #18 / NN_1992_18.iso / spool / comp / lang / cplus / 12637 < prev next >

Wrap

Text File | 1992-08-20 | 5.6 KB | 231 lines

Path: sparky!uunet!snorkelwacker.mit.edu!ai-lab!life.ai.mit.edu!tmb From: tmb@arolla.idiap.ch (Thomas M. Breuel) Newsgroups: comp.lang.c++ Subject: garbage collection (some actual DATA)... Message-ID: <TMB.92Aug20145357@arolla.idiap.ch> Date: 20 Aug 92 18:53:57 GMT Sender: news@ai.mit.edu Reply-To: tmb@idiap.ch Distribution: comp Organization: IDIAP (Institut Dalle Molle d'Intelligence Artificielle Perceptive) Lines: 217 OK, lots of people have been claiming that GC is "obviously" a drag and "obviously" costs you an arm and a leg. Well, here are some simple benchmarks that suggest that this isn't so obvious. The main point is that there is no significant difference between running the same code with or without GC if you explicitly free the storage that you have allocated (the small difference that there is is probably due to the fact that the storage returned by gc_malloc is already initialized; this is not strictly speaking necessary). In this particular case, it turns out that letting the garbage collector do the freeing is actually _faster_ than manual freeing of memory. I don't want to claim that this is generally true, but it shows that even conservative collectors _can_ do better than manual storage management. Reference counting in C++ is, not surprisingly, the slowest method of memory management. You can reduce the cost significantly by putting the reference count into the data itself, but it then gets much messier if you try to "package up" memory management into a template class. For comparison, I have also included a "real" system with GC, namely NJ/SML. In fact, compared to all the C/C++ programs, it seems to have the lowest overhead for memory management. The reason why it isn't much faster than the C version is probably that the compiler is still under development (some inefficient calling sequences, failure to inline), and that there are a lot of (useful!) runtime checks for arithmetic overflow, out-of-bounds indexes, etc. built in. Incidentally, the NJ/SML collector managed to clean up this large volume of garbage (about 48M) without a single major collection, meaning, that in an interactive system there would have been no noticeable delay. Thomas. ================================================================ int *a[10]; main() { int i; for(i=0;i<1000000;i++) { int j = i%10; int k,*p; if(a[j]) free(a[j]); p=(int*)malloc(10*sizeof (int)); a[j]=p; for(k=0;k<10;k++) *p++ = 0; } } compiled with gcc-2.2 -g -O4 arolla$ time a.out 15.7 real 15.6 user 0.0 sys arolla$ int *a[10]; Here is a profile for the same program compiled with "-p": %time cumsecs #call ms/call name 40.4 10.69 1 10690.00 _main 18.1 15.48 mcount 10.5 18.271000000 0.00 .rem 10.5 21.05 _strcpy 9.5 23.571000001 0.00 _malloc 6.3 25.25 999991 0.00 _free 4.6 26.46 _realloc 0.0 26.46 1 0.00 .udiv ... Now, the time spent in _malloc, _free, and (presumably) _strcpy is storage allocation related. This is a total of 8.2 seconds, or 50% of the running time of the program. ================================================================ int *a[10]; void *gc_malloc(); main() { int i; gc_init(); for(i=0;i<1000000;i++) { int j = i%10; int k,*p; if(a[j]) gc_free(a[j]); p=(int*)gc_malloc(10*sizeof (int)); a[j]=p; for(k=0;k<10;k++) *p++ = 0; } } compiled with gcc-2.2 -g -O4 memory-gc-auto.c ~/mit/src/lib/cgc/gc.a (Boehm-Dehmers collector version 1.6) arolla$ time a.out 16.8 real 16.6 user 0.0 sys arolla$ ================================================================ int *a[10]; void *gc_malloc(); main() { int i; gc_init(); for(i=0;i<1000000;i++) { int j = i%10; int k,*p; p=(int*)gc_malloc(10*sizeof (int)); a[j]=p; for(k=0;k<10;k++) *p++ = 0; } gcollect(); } compiled with gcc-2.2 -g -O4 memory-gc-auto.c ~/mit/src/lib/cgc/gc.a (Boehm-Dehmers collector version 1.6) arolla$ time a.out 12.3 real 12.1 user 0.2 sys arolla$ Note that this is _faster_ than the version that calls gc_free explicitly. (Of course, GC has an unusually low overhead in this case, since there are very few roots.) ================================================================ template <class T> struct Counted { private: int *count; T *object; void drop() { if(!count) return; (*count)--; if(*count<=0) { delete count; delete object; count=0; object=0; } } void acquire() { (*count)++; } public: Counted() { count=0; object=0; } Counted(T *object):object(object) { count = new int(1); } ~Counted() { drop(); } Counted(Counted &other) { count=other.count; object=other.object; acquire(); } Counted &operator=(Counted &other) { drop(); count=other.count; object=other.object; acquire(); return *this; } T &operator*() {return *object;} T *operator->() {return object;} operator T() {return *object;} }; struct IArt { int a[10]; }; typedef Counted<IArt> CIArt; CIArt a[10]; main() { for(int i=0;i<1000000;i++) { a[i%10] = CIArt(new IArt); } } compiled with gcc-2.2 -g -O4: arolla$ time a.out 33.0 real 32.8 user 0.0 sys arolla$ ================================================================ val a = Array.array(10,Array.array(10,0)); fun doit(0) = () | doit(i) = (Array.update(a,i mod Array.length(a),Array.array(10,0)); doit(i-1)) compiled with nj-sml-0.85 - timeit(fn () => doit(1000000)); user: 16.130000 gc: 1.720000 system: 0.030000 real: 17.902393 val it = () : unit - ================================================================