home *** CD-ROM | disk | FTP | other *** search
- Newsgroups: comp.arch
- Path: sparky!uunet!mcsun!Germany.EU.net!Urmel.Informatik.RWTH-Aachen.DE!martin
- From: martin@math.rwth-aachen.de ( Martin Schoenert)
- Subject: Re: Threaded Code
- Message-ID: <martin.713808550@bert>
- Sender: news@Urmel.Informatik.RWTH-Aachen.DE (Newsfiles Owner)
- Nntp-Posting-Host: bert.math.rwth-aachen.de
- Organization: Rechnerbetrieb Informatik / RWTH Aachen
- References: <55535@mentor.cc.purdue.edu> <13910@goanna.cs.rmit.oz.au> <GLEW.92Aug7185506@pdx007.intel.com> <1992Aug10.141731.10007@email.tuwien.ac.at>
- Date: 14 Aug 92 16:09:10 GMT
- Lines: 396
-
- I tried to experiment a little bit with the functions written by Anton
- Ertl. However, I had lots of problems, because they are so small that
- they are subject to all kinds of strange optimizations.
-
- So I wrote my own tests. First I defined a virtual (stack) machine with
- 28 instructions (not just 2 as in Anton's tests). For this virtual
- machine I wrote a small test program, which computes the number of primes
- less than 40000 in a very straightforward way. Then I wrote and tested
- the following 6 implementations of this virtual machine.
-
- Bytecode with switch statement (swi.c):
- The instructions of the virutal machine are encoded by the integers
- [0..27]. This bytecode is interpreted by a 'switch' statement, where
- each case implements one instruction. After the code of a case has
- done its work it performs a 'goto' to a label immediately before the
- 'switch'.
-
- Bytecode with switch statement, handoptimized (swo):
- This is the same code as above, execpt that the assembler file
- produced by the compiler was edited to remove the unnecessary range
- check of the 'switch'.
-
- Bytecode with label pointers (inb.c):
- The instructions are again encoded by the integers [0..27]. There is
- an array that contains for each instruction the label pointer of the
- code for this instruction. To execute the next instruction the code
- fetches the bytecode, looks it up in the array, and jumps to the
- address found there. Of course this works only with gcc.
-
- Subroutine threading (sub.c):
- For each instruction there is a (C) function which implements this
- instruction. The instructions are encoded with the addresses of
- those functions. Each function returns after it has done its work,
- and there is a central dispatcher in 'main', which looks as follows
- 'while (1) (**pc++)();'
-
- Direct threading (dir.c):
- Again the instructions are encoded by addresses of code that
- implements them. However, this time all code that is in 'main' and
- the addresses are addresses of labels in 'main'. To execute the next
- instruction the code fetches the address and jumps there. So this is
- similar to ``Bytecode with label pointers'', except no indirection is
- needed. Again, this works only with gcc.
-
- Indirect threading (ind.c):
- This lies between ``Direct threading'' and ``Bytecode with label
- pointers''. Each instruction is encoded by the address of the place
- in an array where the address of the code implementing this
- instruction is stored. So to go to the next instruction this
- executes 'goto ***pc++;' instead of 'goto *instr[*pc++];', which
- saves one shift and one add. I included this because it corresponds
- to Anton's indirect method.
-
- On a DECstation 5000/120 this gave the following results:
-
- Compiler swi swo inb sub dir ind
- GCC 2.2.2 11.2 9.0 8.2 8.0 6.2 7.3
- MIPS CC 10.9 8.8 - 10.3 - -
-
- Note the rather large performance gain one gets by removing the range
- check from the switch code. With both compilers it gives approx. 20%.
- It would be really nice if a comiler could do this on his own. This
- shouldn't be very hard. (Maybe somebody who knows gcc better than I do
- wants to try this? ;-)
-
- The remaining difference between 'swo' and 'inb' is due to the fact that
- the code for 'swo' jumps back to the dispatch code at the top of the
- 'switch' statement. Yes, one jump accounts for 7%! This is because the
- code that implements the individual instructions is so simple (between 3
- and 12 machine instructions).
-
- The time for 'sub' compiled with gcc is much better than the time for
- 'sub' compiled with cc because gcc allows to declare the two global
- variables 'pc' and 'sp' (the virtual machines program counter and stack
- pointer) as register variables. Note that gcc still writes the value of
- those register variables from time to time to the memory. If one removes
- one of those stores by hand (in the assembler output) the time drops to
- 7.4 seconds.
-
- The remaining difference between 'sub' and 'dir' is due to the jumps to
- and returns from the subroutines, which are not present in 'dir'. Thus,
- if a C compiler could be relied upon to eliminate tail calls, then 'sub'
- could be made as fast as 'dir'.
-
- (Side note, *Eliminating tail calls* means that in the following
- situation:
-
- f ( ... ) { ...; g(); }
-
- the compiler generates the following code
-
- deallocate stack frame
- *jump* to 'g'
- ; we will never reach this point because 'g' returns to 'f's caller
-
- instead of
-
- *call* 'g'
- deallocate stack frame
- return
-
- A special case of tail call elimination is the elimination of *tail
- recursive* function calls. A function is tail recursive if its last
- statement is a call to itself. If this call of a tail recursive
- function is eliminated, such a function only uses a constant amount
- of stack space for all its invocations. I understand that gcc does
- tail recursive call elimination.)
-
- You may of course draw your own conclusions, however mine are as follows.
-
- Label pointers are an interesting feature that allow the efficient
- implementation of bytecode and direct threaded interpreters.
-
- However, rather than relying on nonstandard extensions of C to implement
- bytecode and direct threaded interpreters, I would like to see the
- following optimizations in C compiler.
-
- 1. Detecting that the expression in a switch statement can only have a
- small number of values (e.g., if the expression is an unsigned char
- variable), and compiling the switch using a jump table but no range
- check.
-
- 2. Allowing (e.g., with an option or a pragma) that a goto to a
- dispatcher is inlined. I am not certain how one would define what a
- dispatcher is, but one possible optimization that would handle this
- is to replace each unconditional branch by the code it branches to,
- up to (and including) the first (conditional or unconditional)
- branch.
-
- 3. Perform (again under the control of an option or pragma) tail call
- elimination.
-
- If a compiler performed those optimizations the implementations of
- bytecode and direct threaded interpreters such as 'swi' and 'sub' would
- be just as efficient as the implementations using gcc's label pointers.
- Such optimizations might also be worthwile in other situations where
- label pointers would not help at all.
-
- Martin.
-
- --
- Martin Sch"onert, Martin.Schoenert@Math.RWTH-Aachen.DE, +49 241 804551
- Lehrstuhl D f"ur Mathematik, Templergraben 64, RWTH, D 51 Aachen, Germany
-
- Here is the code, so you can make your own experiments.
-
- /** prog *******************************************************************/
- INST LD0, /* n = 0; */
- INST LD2, /* i = 2; */
- INST DUP, /* while ( i <= 40000 ) { */
- INST LDL, CONST 200,
- INST LDL, CONST 200,
- INST MUL,
- INST BGT, CONST 27,
- INST LD2, /* j = 2; */
- INST DUP, /* while ( j*j <= i ) { */
- INST DUP,
- INST MUL,
- INST DUP2,
- INST BGT, CONST 11,
- INST DUP1, /* if ( i % j == 0 ) goto next_i */
- INST DUP1,
- INST MOD,
- INST LD0,
- INST BEQ, CONST 9,
- INST LD1, /* j++; */
- INST ADD,
- INST BRA, CONST -15, /* } */
- INST DUP2, /* n++; */
- INST LD1,
- INST ADD,
- INST PUD2,
- INST DRP, /* next_i: (drop j) */
- INST LD1, /* i++; */
- INST ADD,
- INST BRA, CONST -33, /* } */
- INST DRP, /* print n; */
- INST PRINT,
- INST STOP /* exit */
-
- /** swi.c ******************************************************************/
- #include <stdio.h>
-
- int main ()
- {
- enum {
- LD0, LD1, LD2, LD3, LD4, LD5, LD6, LDL,
- DUP, DUP1, DUP2, DUP3, PUD, PUD1, PUD2, PUD3, DRP,
- ADD, SUB, MUL, QUO, MOD, BRA, BEQ, BNE, BGT,
- PRINT, STOP };
-
- static unsigned char prog [] = {
- #define INST
- #define CONST
- #include "prog"
- };
- unsigned char * pc = prog;
-
- static long stack [64];
- long * sp = stack;
-
- dispatch:
- switch ( *pc++ ) {
- case LD0: sp++; *sp = 0; goto dispatch;
- case LD1: sp++; *sp = 1; goto dispatch;
- case LD2: sp++; *sp = 2; goto dispatch;
- case LD3: sp++; *sp = 3; goto dispatch;
- case LD4: sp++; *sp = 4; goto dispatch;
- case LD5: sp++; *sp = 5; goto dispatch;
- case LD6: sp++; *sp = 6; goto dispatch;
- case LDL: sp++; *sp = *pc++; goto dispatch;
- case DUP: sp++; *sp = sp[-1]; goto dispatch;
- case DUP1: sp++; *sp = sp[-2]; goto dispatch;
- case DUP2: sp++; *sp = sp[-3]; goto dispatch;
- case DUP3: sp++; *sp = sp[-4]; goto dispatch;
- case PUD: sp[-1] = *sp; sp--; goto dispatch;
- case PUD1: sp[-2] = *sp; sp--; goto dispatch;
- case PUD2: sp[-3] = *sp; sp--; goto dispatch;
- case PUD3: sp[-4] = *sp; sp--; goto dispatch;
- case DRP: sp--; goto dispatch;
- case ADD: sp--; *sp += sp[1]; goto dispatch;
- case SUB: sp--; *sp -= sp[1]; goto dispatch;
- case MUL: sp--; *sp *= sp[1]; goto dispatch;
- case QUO: sp--; *sp /= sp[1]; goto dispatch;
- case MOD: sp--; *sp %= sp[1]; goto dispatch;
- case BRA: pc += *(signed char*)pc; goto dispatch;
- case BEQ: sp-=2; pc+=(sp[1]==sp[2]?*(signed char*)pc:1); goto dispatch;
- case BNE: sp-=2; pc+=(sp[1]!=sp[2]?*(signed char*)pc:1); goto dispatch;
- case BGT: sp-=2; pc+=(sp[1] >sp[2]?*(signed char*)pc:1); goto dispatch;
- case PRINT: printf("%d\n",*sp--); goto dispatch;
- case STOP: return;
- }
-
- }
-
- /** ind.c ******************************************************************/
- #include <stdio.h>
-
- int main ()
- {
- enum {
- LD0, LD1, LD2, LD3, LD4, LD5, LD6, LDL,
- DUP, DUP1, DUP2, DUP3, PUD, PUD1, PUD2, PUD3, DRP,
- ADD, SUB, MUL, QUO, MOD, BRA, BEQ, BNE, BGT,
- PRINT, STOP };
-
- static void * instr [] = {
- &&ld0, &&ld1, &&ld2, &&ld3, &&ld4, &&ld5, &&ld6, &&ldl,
- &&dup, &&dup1, &&dup2, &&dup3, &&pud, &&pud1, &&pud2, &&pud3, &&drp,
- &&add, &&sub, &&mul, &&quo, &&mod, &&bra, &&beq, &&bne, &&bgt,
- &&print, &&stop };
-
- static unsigned char prog [] = {
- #define INST
- #define CONST
- #include "prog"
- };
- unsigned char * pc = prog;
-
- static long stack [64];
- long * sp = stack;
-
- goto *instr[*pc++];
-
- ld0: sp++; *sp = 0; goto *instr[*pc++];
- ld1: sp++; *sp = 1; goto *instr[*pc++];
- ld2: sp++; *sp = 2; goto *instr[*pc++];
- ld3: sp++; *sp = 3; goto *instr[*pc++];
- ld4: sp++; *sp = 4; goto *instr[*pc++];
- ld5: sp++; *sp = 5; goto *instr[*pc++];
- ld6: sp++; *sp = 6; goto *instr[*pc++];
- ldl: sp++; *sp = *pc++; goto *instr[*pc++];
- dup: sp++; *sp = sp[-1]; goto *instr[*pc++];
- dup1: sp++; *sp = sp[-2]; goto *instr[*pc++];
- dup2: sp++; *sp = sp[-3]; goto *instr[*pc++];
- dup3: sp++; *sp = sp[-4]; goto *instr[*pc++];
- pud: sp[-1] = *sp; sp--; goto *instr[*pc++];
- pud1: sp[-2] = *sp; sp--; goto *instr[*pc++];
- pud2: sp[-3] = *sp; sp--; goto *instr[*pc++];
- pud3: sp[-4] = *sp; sp--; goto *instr[*pc++];
- drp: sp--; goto *instr[*pc++];
- add: sp--; *sp += sp[1]; goto *instr[*pc++];
- sub: sp--; *sp -= sp[1]; goto *instr[*pc++];
- mul: sp--; *sp *= sp[1]; goto *instr[*pc++];
- quo: sp--; *sp /= sp[1]; goto *instr[*pc++];
- mod: sp--; *sp %= sp[1]; goto *instr[*pc++];
- bra: pc += *(signed char*)pc; goto *instr[*pc++];
- beq: sp-=2; pc+=(sp[1]==sp[2]?*(signed char*)pc:1); goto *instr[*pc++];
- bne: sp-=2; pc+=(sp[1]!=sp[2]?*(signed char*)pc:1); goto *instr[*pc++];
- bgt: sp-=2; pc+=(sp[1] >sp[2]?*(signed char*)pc:1); goto *instr[*pc++];
- print: printf("%d\n",*sp--); goto *instr[*pc++];
- stop: return;
-
- }
-
- /** sub.c ******************************************************************/
- #include <stdio.h>
-
- #ifdef __GNUC__
- register void (* * pc)() asm("$22");
- register long * sp asm("$23");
- #else
- void (* * pc)();
- long * sp;
- #endif
-
- void LD0 () { sp++; *sp = 0; }
- void LD1 () { sp++; *sp = 1; }
- void LD2 () { sp++; *sp = 2; }
- void LD3 () { sp++; *sp = 3; }
- void LD4 () { sp++; *sp = 4; }
- void LD5 () { sp++; *sp = 5; }
- void LD6 () { sp++; *sp = 6; }
- void LDL () { sp++; *sp = *(int*)pc++; }
- void DUP () { sp++; *sp = sp[-1]; }
- void DUP1 () { sp++; *sp = sp[-2]; }
- void DUP2 () { sp++; *sp = sp[-3]; }
- void DUP3 () { sp++; *sp = sp[-4]; }
- void PUD () { sp[-1] = *sp; sp--; }
- void PUD1 () { sp[-2] = *sp; sp--; }
- void PUD2 () { sp[-3] = *sp; sp--; }
- void PUD3 () { sp[-4] = *sp; sp--; }
- void DRP () { sp--; }
- void ADD () { sp--; *sp += sp[1]; }
- void SUB () { sp--; *sp -= sp[1]; }
- void MUL () { sp--; *sp *= sp[1]; }
- void QUO () { sp--; *sp /= sp[1]; }
- void MOD () { sp--; *sp %= sp[1]; }
- void BRA () { pc += *(int*)pc; }
- void BEQ () { sp-=2; pc+=(sp[1]==sp[2]?*(int*)pc:1); }
- void BNE () { sp-=2; pc+=(sp[1]!=sp[2]?*(int*)pc:1); }
- void BGT () { sp-=2; pc+=(sp[1] >sp[2]?*(int*)pc:1); }
- void PRINT () { printf("%d\n",*sp--); }
- void STOP () { exit(0); }
-
- int main ()
- {
- static void (* prog [])() = {
- #define INST
- #define CONST (void *)
- #include "prog"
- };
- static long stack [64];
-
- pc = prog;
- sp = stack;
- while ( 1 ) (**pc++)();
- }
-
- /** dir.c ******************************************************************/
- #include <stdio.h>
-
- int main ()
- {
- static void * prog [] = {
- #define INST &&
- #define CONST (void *)
- #include "prog"
- };
- void * * pc = prog;
-
- static long stack [64];
- long * sp = stack;
-
- goto **pc++;
-
- LD0: sp++; *sp = 0; goto **pc++;
- LD1: sp++; *sp = 1; goto **pc++;
- LD2: sp++; *sp = 2; goto **pc++;
- LD3: sp++; *sp = 3; goto **pc++;
- LD4: sp++; *sp = 4; goto **pc++;
- LD5: sp++; *sp = 5; goto **pc++;
- LD6: sp++; *sp = 6; goto **pc++;
- LDL: sp++; *sp = *(int*)pc++; goto **pc++;
- DUP: sp++; *sp = sp[-1]; goto **pc++;
- DUP1: sp++; *sp = sp[-2]; goto **pc++;
- DUP2: sp++; *sp = sp[-3]; goto **pc++;
- DUP3: sp++; *sp = sp[-4]; goto **pc++;
- PUD: sp[-1] = *sp; sp--; goto **pc++;
- PUD1: sp[-2] = *sp; sp--; goto **pc++;
- PUD2: sp[-3] = *sp; sp--; goto **pc++;
- PUD3: sp[-4] = *sp; sp--; goto **pc++;
- DRP: sp--; goto **pc++;
- ADD: sp--; *sp += sp[1]; goto **pc++;
- SUB: sp--; *sp -= sp[1]; goto **pc++;
- MUL: sp--; *sp *= sp[1]; goto **pc++;
- QUO: sp--; *sp /= sp[1]; goto **pc++;
- MOD: sp--; *sp %= sp[1]; goto **pc++;
- BRA: pc += *(int*)pc; goto **pc++;
- BEQ: sp-=2; pc+=(sp[1]==sp[2]?*(int*)pc:1); goto **pc++;
- BNE: sp-=2; pc+=(sp[1]!=sp[2]?*(int*)pc:1); goto **pc++;
- BGT: sp-=2; pc+=(sp[1] >sp[2]?*(int*)pc:1); goto **pc++;
- PRINT: printf("%d\n",*sp--); goto **pc++;
- STOP: return;
-
- }
-