OXCC.TXT Version: 1.433 -- preliminary 5 Nov 1995 by Norman D. Culver

OXCC is a multipass interpreting C compiler with numerous language
extensions (see c.grm).

OXCC generates output in an Architecture Neutral Format (ANF). Sample
backends are provided to guide the programmer in dealing with ANF and in
writing additional backends for a specific purpose.

The builtin interpreter provides a great deal of flexibility, but it does
make the compiler a memory hog. The entire file under compilation is stored
in memory as an Abstract Syntax Tree (AST). The AST can be printed to stdout
with the -a option.

Language extensions have been inspired by GCC, MSC, and Watcom C.

OXCC is designed to produce 16 bit, 32 bit, 64 bit, segmented and flat model
code for any target architecture and operating system.

OXCC can regenerate C source code after interpreting some or all of its
input. Source regeneration properly handles `malloced' data containing
pointers. The regenerated source code can be `shrouded'. Regenerated source
files have the suffix .cr .

The builtin interpreter can be run in fast or slow mode. Slow mode maintains
elaborate pointer and initialization information, which is often the only
way to catch a subtle runtime bug.

The compiler and include files are located in the file `oxcc.cff'. It is run
from the command line by the skeleton program `oxcc.exe' (see skel.doc).
If no switches are enabled, OXCC merely checks the input file(s) for errors.

OXCC is reentrant; a set of C calls and a set of class based calls are
provided. Multiple instances of OXCC can be run simultaneously. A program
being compiled by OXCC can run OXCC as a subroutine. The program under
compilation can gain access to the AST and symbol tables which describe it
by using the calls __builtin_iv() and __builtin_root(). See: toxcc.c

Usage: oxcc [-adoqrstuwABDEFGHILMOPRSTWY(] file...
    -a == print ast
    -s == print symbol table
    -t == print runtimes
    -u == print memory usage
    -r == run the code
    -f == if run code, go fast
    -L == produce listing
    -E == preprocess only
    -S == shrouded source output
    -T == trace lines while interpreting
    -P == Parse only
    "-(args for interpreted main"
    -dn == enable debug output, 1=parser 2=lexer 3=both
    -q == suppress printing gratuitous info
    -o outfile == name of output file, default is first infile
    -A == ansi_mode (suppress extensions, not fully implemented)
    -W == if run code and not fast mode, warn about address problems
    -w == suppress compiler warnings
    -R func == if run code, start at function `func'
    -Ipath == include path for the C preprocessor
    -Ddef == define something for the C preprocessor
    -Gx == generate output in format `x' (abdnmrs)
    -Ox == generate output for operating system `x' (dDwWCNoOUL)
    -Hx == generate output for hardware type `x' (iIPDHmM)
    -Bx == generate output for debugger `x' (vwbgd)
    -Fx == generate object file format `x' (oOPWBace)
    -Yx == generate assembler format `x' (ugmt)
    -Mx == use memory model `x' (tsmlchx)

OUTPUT OPTIONS

    -Gs     regenerate source (output file has .cr suffix)
    -SGs    regenerate shrouded source (output file has .cr suffix)
    -Gb     generate bytecodes (calls oxccb, output file has .byt suffix)
    -LGb    generate bytecode listing (calls oxccb, output file has .lst suffix)
    -Ga     generate assembler output (calls oxccaH, where H is hardware type)
    -Gd     generate readable ANF code (output file has .dbg suffix)
    -Gm     generate machine code (calls oxccmH, where H is hardware type)
    -Gn     generate ANF code (output file has .anf suffix)
    -Gr     generate RIP code (calls oxccr, output file has .rip suffix)

    (see oxanf.doc, oxanf.h)
    -Ox placed in header slot `target_os'         default `D' DOS32PLAT
    -Hx placed in header slot `target_hardware'   default `I' INTEL8632
    -Bx placed in header slot `target_debugger'   default 0   NONE
    -Fx placed in header slot `obj_format'        default `a' AOUTFORMAT
    -Yx placed in header slot `target_assembler'  default `g' GAS
    -Mx placed in header slot `memory_model'      default `x' MODFLAT

INCOMPATIBILITIES

FUNCTION DECLARATIONS
OXCC, being a multipass compiler, always chooses the `best' declaration for
a function. The old style practice of hiding function declarations with a
declaration containing an unknown number of args (commonly used by some
programmers) just will not work. At the very least you will get a warning if
a subroutine is called with arguments incommensurate with the `best'
declaration.
OXCC will not assume that the declaration of an undeclared function `func'
is `int func()'; you must explicitly declare all functions.

LANGUAGE EXTENSIONS

RUNTIME INTERPRETATION
The -r switch will cause OXCC to interpret the AST if it can find a function
named `main' or, failing that, a function with the base name of the input
file (e.g. test32.c with a function named `test32'). The user can specify a
unique starting function with the -R switch.
Arguments can be passed to the starting function by using the -( switch,
provided the starting function adheres to the argc, argv convention.
e.g.:
    oxcc -r test32.c "-(23 hello 14 -W"

Another way to cause runtime interpretation is to call the starting function
from the right hand side of the last initialized outer variable. The only
restriction is that the starting function must return a value.

INTERPRETING OUTER DECLARATIONS INCLUDING INNER STATIC VARIABLES
OXCC evaluates (interprets) non-constant expressions in outer declarations.
Anything that can appear in a normal C program can contribute to the value
that is stored in an initialized variable. Uninitialized variables can
become initialized as a side effect of a function call. Two reserved words
`_ival' and `_ifunc' can be prepended to variables and functions
respectively in order to prevent them from appearing in the output.
e.g.:
    double q = sin(2.0) / cos(4.3);
    void **ptr = malloc(200);   // interpreted malloc acts like calloc
    static int x,y;

    _ifunc int initfunc()       // function `initfunc' will not be output
    {
    int i;
        x = 50;                 // static variable x is initialized to 50.
        y = 25;                 // static variable y is initialized to 25.
        for(i = 0; i < x; ++i)
            ptr[i] = malloc(y); // initialize the array of pointers
        return 0;
    }
    int startfunc(int z)        // function `startfunc' will appear in output
    {
        x += z;                 // static variable x is modified before output
        ...
        return 0;
    }
    _ival int z = initfunc();   // variable `z' will not be output
    char *ary[20] = {[2]=ptr[3], [3]=malloc(x), [18]=malloc(y)};
    _ival int dummy = startfunc(25); // variable `dummy' will not be output

AUTOMATIC VARIABLES (INNER DECLARATIONS)
Automatic variables can be initialized with non-constant expressions.
Static variables declared inside functions can have non-constant
initializers and will be initialized at outer declaration time. `alloca' is
not a suitable initializer for a static variable inside a function; use
`malloc'.

DEFAULT ARGUMENTS FOR FUNCTIONS
Functions can be declared with default args: just use an `=' and fill in
the right hand side.
e.g.:
    int func(int a = 3, struct _a b = {2.3,4,1}, char *cp = "hello")
    {
        ....
    }

Functions with default args can be called with 0 or more actual args.
They can also be called normally.
e.g.:
    func(cp: "goodby");     // a and b will take the default values
    func(3,B,ptr);          // a, b, and cp are fully specified
    func(3);                // b and cp will take the default values
    func();                 // a, b, and cp take the default values

LABELED IDENTIFIERS FOR INITIALIZING ARRAYS AND STRUCTURES
This extension is inspired by GCC 2.6.x.
e.g.:
    int array[200] = {[5]= 2, [123]= 45};
    int array[20][50] = {[3][12]= 6, [18][23]= 8};
    struct {
        int x;
        int y;
        double q;
        struct {
            int a;
            int b;
        } bb;
        struct {
            int a;
            int b;
        } cc;
    } aa = {.q=4.6, .cc={.b = 12}};   // everything else set to 0

COMPOUND EXPRESSIONS RETURN A VALUE
Place parentheses around braces to create a compound expression which
consists of multiple statements and returns a value. This extension is
inspired by GCC.
e.g.:
    int y = ({int i;
              for(i=0; i<200;++i)
                  if(i>x) break;
              i+x;          // mention variable to be returned
            });             // y = i+x

NESTED FUNCTIONS
Nested functions are functions placed inside functions at the location of
automatic variables, i.e. before the statements following a left brace. All
of the automatic variables of the enclosing function are within the scope of
the nested function and do not have to be passed as arguments when the
nested function is called.

OXCC implements flavor #1 of nested functions, in which the stack of the
nested function coincides with the stack of the enclosing function. This is
the most efficient way to deal with nested functions but precludes a nested
function from being called recursively. The address of a nested function can
be taken and passed to a synchronous callback. Asynchronous callbacks (such
as might occur in an operating system like WINDOWS) will not work. Nested
functions can call other nested functions which are within scope.

Flavor #2 of nested functions requires that the nested function be extracted
from its surroundings and given a stack of its own. A pointer to the stack
frame of the enclosing function is passed invisibly whenever the nested
function is called. Callbacks are implemented with thunks. This method
produces a much slower nested function facility but is usually necessary
when generating machine language. Asynchronous callbacks will not work.

TYPEOF, ALIGNOF
The type of an expression can be derived and applied wherever a normal type
would be used.
e.g.:
    typeof(x) y;
    typeof(*x) y;

The alignment of an expression can be obtained.
e.g.:
    int x = __alignof__(y);

COMPUTED TYPEDEF
A typedef can be computed.
e.g.:
    typedef XTYPE = x;
    XTYPE q;

STRUCTURE ALIGNMENT AND PACKING
Structures can be designated as packed with the `_Packed' keyword.
OXCC also supports the awful __attribute__ constructions of GCC.
OXCC also supports various forms of the #pragma pack(n) directives but it is
strongly suggested that these not be used because source regeneration does
not handle pragma regeneration.

LOCAL LABELS
Each block is a scope in which local labels can be declared. The value of
the label goes out of scope with the block. This is handy for macros.
GCC inspired.
e.g.:
    {
    __label__ l1:   // declares l1 to be a local label
        ...
        goto l1;
        ...
    l1:
    }

CASE RANGES
Case values may be expressed in the form:
    case 2 ... 4:

ARITHMETIC ON VOID POINTERS
Pointers typed as void* are assumed to have the same size as char* for the
purpose of pointer arithmetic.

MACROS WITH VARIABLE NUMBERS OF ARGUMENTS
GCC inspired extension to the C preprocessor.
e.g.:
    #define myprintf(format, args...) \
            fprintf(stderr, format, ## args)

ZERO LENGTH ARRAYS
Arrays of zero length are allowed within structures.

CONDITIONALS WITH OMITTED OPERANDS
The construction x ? : y is equivalent to x ? x : y except that x is not
evaluated a second time.

DOUBLE WORD INTEGERS
The long long type is supported.

LONG DOUBLE
The long double type is supported.

FUNCTION TYPES
Various keywords from the segmented DOS world are understood by OXCC.
Currently OXCC does not do anything other than label the function for later
processing. (see c.grm)

SEGMENT INFORMATION
The keywords `__segdef__' and `__seguse__' are used to specify segment info.
The arguments to __segdef__ must be constant expressions. This info is
passed along to back end code generators.
e.g.:
    __segdef__ DATA16 arg1, arg2, arg3;   // 0 to 3 args
    __segdef__ DATA32 arg1, arg2, arg3;
    __segdef__ TEXT16 arg1, arg2, arg3;

    __seguse__ DATA32;
    int x,y,z;
    __seguse__ TEXT16;
    int func()
    {
    }
    __seguse__ TEXT32;
    int func1()
    {
    }

BASED POINTERS
Microsoft C defines based pointers and segment variables. OXCC currently
parses and stores the information for later processing by back ends, but it
does not yet know how to interpret this stuff. It can correctly regenerate
source.

NEAR FAR HUGE POINTERS
Ditto, as per BASED POINTERS. (see c.grm)

ASSEMBLER INSTRUCTIONS
Various flavors of assembler code can be absorbed and regenerated by OXCC.
Interpretation is out of the question, and assembler instructions are not
passed to back end code generators on the theory that portability can never
be achieved. The OXCC solution is to provide an extensible facility for
direct generation of ANF code. (see c.grm)

ANF INSTRUCTION BLOCKS
OXCC generates ANF code (see anf.doc, oxanf.h) from C instructions.
The programmer can generate ANF code by enclosing it in a block.
e.g.:
    __anf__ {
        mov x,y/2;      // divide y by 2 and store in x
        lsh y,z,3;      // shift z left by 3 and store in y
        ...
    }
ANF blocks can be placed inside or outside of functions. The basic set of
ANF instructions can be extended by programmers to achieve meaningful
(I hope) methods of expressing concepts which normally require assembler
code. Essentially, ANF instructions consist of an opcode followed by up to
3 arguments. The opcode set can be extended as follows:
    1. add new strings to oxanf.h
    2. compile oxanf.h
    3. insert in oxlib.cff with `cfar.exe' (see oxcc.mak)
ANF arguments can be any valid C expression and are evaluated by OXCC, with
code generation where appropriate.

NO-NAME STRUCTURES/UNIONS
Inspired by Visual C++ 2.0
In order to compile 32 bit Windows programs it is necessary to deal with
un-named structures and unions which are members of named struct/unions.
This feature permits the programmer to reference the members of the un-named
struct/unions as if they were members of the enclosing named container.
Just make sure that all of the member names are unique. Very nice idea.

GLOBAL SUBROUTINES IN OXCC -- also callable by code being interpreted
(see oxcc.h and toxcc.c)

void *__builtin_iv(void);
void *__builtin_root(void);
void *oxcc_get_pg(void *iv);
void oxcc_enable_trace(void *iv);
void oxcc_disable_trace(void *iv);
void oxcc_debug(void *iv, int bits);
void oxcc_proc_ptr_info(void *iv, void (*func)());
        func(void*,void*,void*,long);
void oxcc_proc_syms(void *iv, unsigned space, void (*func)());
        func(AstP node, long symb, void *container);
void oxcc_proc_swtable(void*iv, void *swnode, void (*func)());
        func(long swval, AstP root);
void oxcc_proc_mallocs(void *iv, void *func());
        func(void *loc, int size, Item *ip);
void *oxcc_open_instance(void);
void oxcc_set_options(void *iv, char *opts);
int oxcc_preproc_file(void *iv, void *is, void *os, void *es,
                      int argc, char **argv);
int oxcc_parse_file(void *iv, void *is, void *es, char *filename);
void oxcc_print_parse_errors(void *iv, void *es);
int oxcc_check_ast_tree(void *iv, void *es, char *filename);
int oxcc_init_outers(void *iv, void *es);
int oxcc_run_tree(void *iv, void *es, char *fnam, char *arg, char *startf);
int oxcc_gen_code(void *iv, void *es, char *filename, void *os);
void oxcc_cleanup_parse(void *iv);
void oxcc_close_codefile(void *iv);
void oxcc_close_instance(void *iv);
void oxcc_print_ast(void *iv, void *os, int flag);
void *oxcc_get_ast_root(void *iv);
int oxcc_eval_expr(void *iv, void *buf, double *result, void *es);

void gSetup(void *self, void *str);
int gPreProc(void *self, void *is, void *os, void *es,int argc,char **argv);
int gParse(void *self, void *is, void *es, char *filename);
void gPerror(void *self, void *es);
int gCheckTree(void *self, void *es, char *filename);
int gInitOuters(void *self, void *es);
int gRunCode(void *self, void *es, char *filename, char *args);
int gGenCode(void *self, void *es, void *os, char *filename);
void gCleanup(void *self);
void gCloseCode(void *self);
void gPrtAst(void *self, void *es, int flag);
void *gGetRoot(void *self);
int gEval(void *self, void *buf, double *result, void *es);

TODO
    1. Improved optimization
    2. Inline functions
    3. Flavor #2 for nested functions
    4. Modify oxcc to be callable as a reentrant subroutine or class
       [DONE 25May]
    5. True interpretation of segmented code and 16 bit code.
    6. Interpret ANF instruction blocks.
    7. Support long double and complex data types. [long double DONE 5Nov]
    8. Write more back ends
    9. Better documentation
    10. Better test program [test.bat for starters]
    11. Add a new type `enumstring' to avoid parallel tables
    12. Built in inheritance engine with COM, SOM, DCE compliance.
        [Coming up]
    13. Generate Java,html,RIP code (need to juice up the grammar a bit)
    14. Suggestions ??
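
APPENDIX SKETCH: THREE GCC-INSPIRED EXTENSIONS TOGETHER
Three of the extensions listed under LANGUAGE EXTENSIONS (labeled
initializers, case ranges, and conditionals with omitted operands) are also
accepted by GCC and Clang, so they can be tried outside OXCC. The sketch
below is a minimal illustration under that assumption. It has not been run
through OXCC, and every name in it (sparse, classify, name_or, sparse_sum,
point_is_zero_filled, check_extensions) is hypothetical and not part of the
OXCC distribution.

```c
#include <assert.h>
#include <stddef.h>

/* Labeled initializers: elements and members not named are set to 0. */
static const int sparse[8] = { [2] = 20, [5] = 50 };

struct point { int x; int y; double q; };
static const struct point pt = { .q = 4.6 };

/* Case range (GNU extension): 1 ... 9 covers every value from 1 to 9. */
int classify(int n)
{
    switch (n) {
    case 0:       return 0;   /* zero */
    case 1 ... 9: return 1;   /* single digit */
    default:      return 2;   /* anything else */
    }
}

/* Omitted middle operand (GNU extension): yields p when p is non-null,
   otherwise fallback; p is evaluated only once. */
const char *name_or(const char *p, const char *fallback)
{
    return p ? : fallback;
}

/* Adds up the table; only the two labeled elements contribute. */
int sparse_sum(void)
{
    int total = 0;
    size_t i;
    for (i = 0; i < sizeof sparse / sizeof sparse[0]; ++i)
        total += sparse[i];
    return total;
}

/* Nonzero when the members left out of the initializer are 0. */
int point_is_zero_filled(void)
{
    return pt.x == 0 && pt.y == 0;
}

/* Self-check that exercises each extension once. */
void check_extensions(void)
{
    assert(sparse_sum() == 70);
    assert(point_is_zero_filled());
    assert(classify(4) == 1);
    assert(name_or(NULL, "none")[0] == 'n');
}
```

Calling check_extensions() from a test driver is enough to exercise all
three constructs. The case range and the omitted operand are compiler
extensions rather than standard C, so a strictly conforming compiler may
reject them.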