- OXCC.TXT Version: 1.433 -- preliminary 5 Nov 1995 by Norman D. Culver
-
- OXCC is a multipass interpreting C compiler with numerous language
- extensions (see c.grm).
-
- OXCC generates output in an Architecture Neutral Format (ANF).
- Sample backends are provided to guide the programmer in dealing with
- ANF and in writing additional backends for a specific purpose.
-
- The builtin interpreter provides a great deal of flexibility, but it
- makes the compiler a memory hog: the entire file under compilation
- is stored in memory as an Abstract Syntax Tree (AST).
- The AST can be printed to stdout with the -a option.
-
- Language extensions have been inspired by GCC, MSC, and Watcom C.
- OXCC is designed to produce 16 bit, 32 bit, 64 bit, segmented and
- flat model code for any target architecture and operating system.
-
- OXCC can regenerate C source code after interpreting some or all of
- its input. Source regeneration properly handles `malloced' data
- containing pointers. The regenerated source code can be `shrouded'.
- Regenerated source files have the suffix .cr .
-
- The builtin interpreter can be run in fast or slow mode. Slow mode
- maintains elaborate pointer and initialization information, which
- is often the only way to catch a subtle runtime bug.
-
- The compiler and include files are located in the file `oxcc.cff'. It
- is run from the command line by the skeleton program `oxcc.exe'
- (see skel.doc). If no switches are enabled, OXCC merely checks the
- input file(s) for errors.
-
- OXCC is reentrant; a set of C calls and a set of class based calls
- are provided. Multiple instances of OXCC can be run simultaneously.
- A program being compiled by OXCC can run OXCC as a subroutine.
- The program under compilation can gain access to the AST and
- symbol tables which describe it by using the calls __builtin_iv() and
- __builtin_root(). See: toxcc.c
-
- Usage: oxcc [-adoqrstuwABDEFGHILMOPRSTWY(] file...
- -a == print ast -s == print symbol table
- -t == print runtimes -u == print memory usage
- -r == run the code -f == if run code, go fast
- -L == produce listing -E == preprocess only
- -S == shrouded source output -T == trace lines while interpreting
- -P == Parse only
- "-(args for interpreted main"
- -dn == enable debug output, 1=parser 2=lexer 3=both
- -q == suppress printing gratuitous info
- -o outfile == name of output file, default is first infile
- -A == ansi_mode (suppress extensions, not fully implemented)
- -W == if run code and not fast mode, warn about address problems
- -w == suppress compiler warnings
- -R func == if run code, start at function `func'
- -Ipath == include path for the C preprocessor
- -Ddef == define something for the C preprocessor
- -Gx == generate output in format `x' (abdnmrs)
- -Ox == generate output for operating system `x' (dDwWCNoOUL)
- -Hx == generate output for hardware type `x' (iIPDHmM)
- -Bx == generate output for debugger `x' (vwbgd)
- -Fx == generate object file format `x' (oOPWBace)
- -Yx == generate assembler format `x' (ugmt)
- -Mx == use memory model `x' (tsmlchx)
-
- OUTPUT OPTIONS
- -Gs regenerate source (output file has .cr suffix)
- -SGs regenerate shrouded source (output file has .cr suffix)
- -Gb generate bytecodes (calls oxccb, output file has .byt suffix)
- -LGb generate bytecode listing (calls oxccb, output file has .lst suffix)
- -Ga generate assembler output (calls oxccaH, where H is hardware type)
- -Gd generate readable ANF code (output file has .dbg suffix)
- -Gm generate machine code (calls oxccmH, where H is hardware type)
- -Gn generate ANF code (output file has .anf suffix)
- -Gr generate RIP code (calls oxccr, output file has .rip suffix)
-
- (see oxanf.doc, oxanf.h)
- -Ox placed in header slot `target_os' default `D' DOS32PLAT
- -Hx placed in header slot `target_hardware' default `I' INTEL8632
- -Bx placed in header slot `target_debugger' default 0 NONE
- -Fx placed in header slot `obj_format' default `a' AOUTFORMAT
- -Yx placed in header slot `target_assembler' default `g' GAS
- -Mx placed in header slot `memory_model' default `x' MODFLAT
-
-
- INCOMPATIBILITIES
-
- FUNCTION DECLARATIONS
- OXCC, being a multipass compiler, always chooses the `best' declaration
- for a function. The old-style practice of hiding function declarations
- behind a declaration with an unspecified number of arguments will not
- work. At the very least you will get a warning if a function is called
- with arguments that do not match the `best' declaration. OXCC does not
- assume that an undeclared function `func' is `int func()'; you must
- explicitly declare all functions.
-
-
- LANGUAGE EXTENSIONS
-
- RUNTIME INTERPRETATION
- The -r switch will cause OXCC to interpret the AST if it can find
- a function named `main' or failing that a function with the base name
- of the input file e.g. test32.c with a function named `test32'.
- The user can specify a different starting function with the -R switch.
- Arguments can be passed to the starting function with the -( switch,
- provided that the starting function follows the argc, argv convention.
- e.g.:
- oxcc -r test32.c "-(23 hello 14 -W"
- Another way to cause runtime interpretation is to call the starting
- function from the right hand side of the last initialized outer variable.
- The only restriction is that the starting function must return a value.
-
-
- INTERPRETING OUTER DECLARATIONS INCLUDING INNER STATIC VARIABLES
- OXCC evaluates (interprets) non-constant expressions in outer declarations.
- Anything that can appear in a normal C program can contribute to the value
- that is stored in an initialized variable. Uninitialized variables can
- become initialized as a side effect of a function call. The reserved words
- `_ival' and `_ifunc' can be prefixed to variable and function declarations,
- respectively, to prevent them from appearing in the output.
- e.g.:
- double q = sin(2.0) / cos(4.3);
- char **ptr = malloc(50 * sizeof(char *)); // interpreted malloc acts like calloc
- static int x,y;
- _ifunc int initfunc() // function `initfunc' will not be output
- {
- int i;
- x = 50; // static variable x is initialized to 50.
- y = 25; // static variable y is initialized to 25.
- for(i = 0; i < x; ++i)
- ptr[i] = malloc(y); // initialize the array of pointers
- return 0;
- }
- int startfunc(int z) // function `startfunc' will appear in output
- {
- x += z; // static variable x is modified before output
- ...
- return 0;
- }
- _ival int z = initfunc(); // variable `z' will not be output
- char *ary[20] = {[2]=ptr[3], [3]=malloc(x), [18]=malloc(y)};
- _ival int dummy = startfunc(25); // variable `dummy' will not be output
-
-
- AUTOMATIC VARIABLES (INNER DECLARATIONS)
- Automatic variables can be initialized with non-constant expressions.
- Static variables declared inside functions can have non-constant
- initializers; they are initialized at outer declaration time.
- `alloca' is not a suitable initializer for a static variable inside
- a function; use `malloc' instead.
-
-
- DEFAULT ARGUMENTS FOR FUNCTIONS
- Functions can be declared with default args, just use an `=' and fill
- in the right hand side.
- e.g.:
- int func(int a = 3, struct _a b = {2.3,4,1}, char *cp = "hello")
- {
- ....
- }
- Functions with default args can be called with 0 or more actual args.
- They can also be called normally, and an argument can be passed by name
- by writing the parameter name followed by a colon.
- e.g.:
- func(cp: "goodbye"); // a and b will take the default values
- func(3,B,ptr); // a, b, and cp are fully specified
- func(3); // b and cp will take the default values
- func(); // a, b, and cp take the default values
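Standard C has no default arguments, but a common idiom gets close: collect the parameters in a struct and let a macro supply the defaults as designated initializers, which a caller's own designators override (a later initializer for the same member wins). The sketch below is an assumption-laden emulation, not how OXCC works internally; `struct ab`, `func_args`, `func_impl` and the body of `func_impl` are hypothetical stand-ins for the `struct _a` example above. Only the by-name form (`func(.cp = "goodbye")`) is covered; positional calls such as `func(3,B,ptr)` are not.

```c
#include <string.h>

/* Hypothetical stand-in for the `struct _a' of the example above. */
struct ab { double d; int i; int j; };

/* All parameters of the emulated function, gathered in one struct. */
struct func_args {
    int a;
    struct ab b;
    const char *cp;
};

/* Toy body: combine the arguments so the effect of defaults is visible. */
static int func_impl(struct func_args args)
{
    return args.a + args.b.i + (int)strlen(args.cp);
}

/* Defaults come first; designators written by the caller override them.
   With no caller arguments the list just ends in a trailing comma. */
#define func(...)                                           \
    func_impl((struct func_args){ .a = 3, .b = {2.3, 4, 1}, \
                                  .cp = "hello", __VA_ARGS__ })
```

With this sketch `func()` sees a=3, b.i=4, cp="hello", while `func(.cp = "goodbye")` changes only `cp`. GCC may warn about the overridden initializers under -Wextra (-Woverride-init); the override itself is well defined.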
-
-
- LABELED IDENTIFIERS FOR INITIALIZING ARRAYS AND STRUCTURES
- This extension is inspired by GCC 2.6.x .
- e.g.:
- int array[200] = {[5]= 2, [123]= 45};
- int array[20][50] = {[3][12]= 6, [18][23]= 8};
- struct {
- int x;
- int y;
- double q;
- struct {
- int a;
- int b;
- } bb;
- struct {
- int a;
- int b;
- } cc;
- } aa = {.q=4.6, .cc={.b = 12}}; // everything else set to 0
-
-
- COMPOUND EXPRESSIONS RETURN A VALUE
- Place parentheses around a brace-enclosed block to create a compound
- expression, which consists of multiple statements and returns a value.
- This extension is inspired by GCC.
- e.g.:
- int y = ({int i;
- for(i=0; i<200;++i)
- if(i>x)
- break;
- i+x; // mention variable to be returned
- }); // y = i+x
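A compiler that accepts the GCC statement-expression extension (GCC and Clang both do; it is not ISO C) can run the example above directly. In this sketch `x` is a hypothetical outer variable set to 42, so the loop stops at i = 43 and the block yields 43 + 42.

```c
/* Hypothetical outer variable read by the compound expression. */
static int x = 42;

static int compute_y(void)
{
    int y = ({
        int i;
        for (i = 0; i < 200; ++i)
            if (i > x)
                break;
        i + x;  /* the last expression statement is the block's value */
    });
    return y;
}
```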
-
-
- NESTED FUNCTIONS
- Nested functions are functions defined inside other functions, at
- the place where automatic variables are declared, i.e. before the
- statements following a left brace. All of the automatic variables of
- the enclosing function are within the scope of the nested function and
- do not have to be passed as arguments when the nested function is called.
-
- OXCC implements flavor #1 of nested functions, in which the stack
- of the nested function coincides with the stack of the enclosing
- function. This is the most efficient way to deal with nested functions
- but precludes a nested function from being called recursively. The
- address of a nested function can be taken and passed to a synchronous
- callback. Asynchronous callbacks (such as might occur in an operating
- system like WINDOWS) will not work. Nested functions can call other
- nested functions which are within scope.
-
- Flavor #2 of nested functions requires that the nested function be
- extracted from its surroundings and given a stack of its own. A
- pointer to the stack frame of the enclosing function is passed invisibly
- whenever the nested function is called. Callbacks are implemented with
- thunks. This method produces a much slower nested function facility
- but is usually necessary when generating machine language. Asynchronous
- callbacks will not work.
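The flavor #2 idea can be written out by hand in standard C, which may make the hidden frame pointer easier to picture. In this hypothetical sketch the enclosing function's locals live in a `struct sum_frame`, and the extracted "nested" function receives a pointer to it as an explicit first argument. The callback helper here simply accepts that context pointer, which sidesteps the thunks a compiler would need for callbacks that do not carry one. All names are illustrative, not part of OXCC.

```c
#include <stddef.h>

/* Automatic variables of the enclosing function, kept in one frame. */
struct sum_frame {
    int total;
    int scale;
};

/* The extracted nested function: own stack, but it reaches the
   enclosing locals through the frame pointer. */
static void add_scaled(struct sum_frame *frame, int v)
{
    frame->total += v * frame->scale;
}

/* A synchronous callback user that passes the context through. */
static void for_each(const int *vals, size_t n,
                     void (*cb)(struct sum_frame *, int),
                     struct sum_frame *frame)
{
    for (size_t i = 0; i < n; ++i)
        cb(frame, vals[i]);
}

/* The enclosing function: owns the frame and hands out its address. */
static int sum_scaled(const int *vals, size_t n, int scale)
{
    struct sum_frame frame = { 0, scale };
    for_each(vals, n, add_scaled, &frame);
    return frame.total;
}
```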
-
-
- TYPEOF, ALIGNOF
- The type of an expression can be derived and applied wherever a normal
- type would be used.
- e.g.:
- typeof(x) y;
- typeof(*x) y;
- The alignment of an expression can be obtained.
- e.g.:
- int x = __alignof__(y);
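A typical use of `typeof` is a type-generic macro. The sketch below uses the `__typeof__` and `__alignof__` spellings accepted by GCC and Clang (C23 standardizes `typeof`; C11 has `_Alignof`, which takes a type name rather than an expression). `SWAP` and `double_alignment` are hypothetical names for illustration.

```c
#include <stddef.h>

/* Swap two lvalues of the same type without naming the type. */
#define SWAP(a, b) do {                 \
        __typeof__(a) swap_tmp_ = (a);  \
        (a) = (b);                      \
        (b) = swap_tmp_;                \
    } while (0)

/* Alignment of an object, queried through an expression. */
static size_t double_alignment(void)
{
    double y = 0.0;
    return __alignof__(y);
}
```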
-
-
- COMPUTED TYPEDEF
- A typedef can be computed.
- e.g.:
- typedef XTYPE = x;
- XTYPE q;
-
-
- STRUCTURE ALIGNMENT AND PACKING
- Structures can be designated as packed with the `_Packed' keyword.
- OXCC also supports the awful __attribute__ constructions of GCC and
- various forms of the #pragma pack(n) directive, but the pragmas are
- strongly discouraged because source regeneration does not regenerate
- pragmas.
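The effect of packing can be checked with `sizeof` and `offsetof`. This sketch uses the GCC/Clang `__attribute__((packed))` form rather than OXCC's `_Packed` keyword, and the struct names are hypothetical. Without packing, `value` is typically padded out to its natural alignment.

```c
#include <stddef.h>

/* Packed: no padding between tag and value. */
struct packed_rec {
    char tag;
    int  value;
} __attribute__((packed));

/* Same members, normal layout (usually padded after tag). */
struct plain_rec {
    char tag;
    int  value;
};
```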
-
-
- LOCAL LABELS
- Each block is a scope in which local labels can be declared. A local
- label goes out of scope at the end of its block. This is handy
- for macros. GCC inspired.
- e.g.:
- {
- __label__ l1; // declares l1 to be a local label
- ...
- goto l1;
- ...
- l1: ;
- }
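Local labels matter most in macros that contain a `goto`: an ordinary label would be defined twice if the macro were expanded twice in one function. This sketch combines `__label__` with a statement expression, both GCC extensions also accepted by Clang; `FIRST_NEGATIVE` and `negatives_found` are hypothetical names.

```c
/* Index of the first negative element of arr[0..n-1], or -1.
   `found' is local to each expansion, so the macro can appear
   several times in the same function. */
#define FIRST_NEGATIVE(arr, n) ({            \
        __label__ found;                     \
        int idx_;                            \
        for (idx_ = 0; idx_ < (n); ++idx_)   \
            if ((arr)[idx_] < 0)             \
                goto found;                  \
        idx_ = -1;                           \
    found: ;                                 \
        idx_;                                \
    })

/* Two expansions in one function: no duplicate-label error. */
static int negatives_found(const int *a, int na, const int *b, int nb)
{
    return (FIRST_NEGATIVE(a, na) >= 0) + (FIRST_NEGATIVE(b, nb) >= 0);
}
```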
-
-
- CASE RANGES
- Case values may be expressed in the form:
- case 2 ... 4:
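A small classifier shows case ranges in a complete `switch`. GCC and Clang accept the same syntax; the enum names are hypothetical. Spaces around `...` matter: `2...4` would be read by the preprocessor as a single (invalid) number token.

```c
enum size_class { SC_ZERO, SC_SMALL, SC_MEDIUM, SC_OTHER };

static enum size_class classify(int n)
{
    switch (n) {
    case 0:
        return SC_ZERO;
    case 2 ... 4:      /* matches 2, 3 and 4 */
        return SC_SMALL;
    case 5 ... 9:
        return SC_MEDIUM;
    default:
        return SC_OTHER;
    }
}
```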
-
-
- ARITHMETIC ON VOID POINTERS
- Pointers typed as void* are treated like char* for the purpose of
- pointer arithmetic, i.e. the object pointed to is assumed to be 1 byte.
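ISO C does not allow arithmetic on `void *`, so code meant for other compilers spells the same operation with a cast. A minimal sketch, where `byte_offset` is a hypothetical helper:

```c
#include <stddef.h>

/* Portable form of `base + nbytes' on a void pointer: count in bytes
   by going through unsigned char *. */
static void *byte_offset(void *base, size_t nbytes)
{
    return (unsigned char *)base + nbytes;
}
```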
-
-
- MACROS WITH VARIABLE NUMBERS OF ARGUMENTS
- GCC inspired extension to the C preprocessor.
- e.g.:
- #define myprintf(format, args...) \
- fprintf(stderr, format, ## args)
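C99 later standardized variadic macros with `...` and `__VA_ARGS__`. Unlike GCC's `## args` form, which swallows the preceding comma when no arguments follow, this standard form needs at least one argument after the format string (before C23). In this sketch the output goes into a buffer instead of `stderr` so that it is easy to inspect; `mylog` and `logbuf` are hypothetical names.

```c
#include <stdio.h>

static char logbuf[128];

/* Forward the format and any extra arguments to snprintf. */
#define mylog(format, ...) \
    snprintf(logbuf, sizeof logbuf, format, __VA_ARGS__)
```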
-
-
- ZERO LENGTH ARRAYS
- Arrays of zero length are allowed within structures.
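The usual purpose of a zero-length array is a header followed by a variable amount of data in a single allocation. C99 spells the trailing member as a flexible array member (`char data[];`), which this sketch uses; OXCC and GCC also accept `char data[0];`. `struct packet` and `packet_new` are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

struct packet {
    size_t len;
    char data[];   /* storage follows the header in the same block */
};

/* Allocate header plus len bytes and copy the payload in. */
static struct packet *packet_new(const char *src, size_t len)
{
    struct packet *p = malloc(sizeof *p + len);
    if (p == NULL)
        return NULL;
    p->len = len;
    memcpy(p->data, src, len);
    return p;
}
```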
-
-
- CONDITIONALS WITH OMITTED OPERANDS
- The construction x ? : y
- is equivalent to x ? x : y
- except that x is not evaluated a second time.
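A counter makes the single evaluation visible. The `x ? : y` form is a GCC extension that Clang also accepts; `next_value` and the two wrappers are hypothetical.

```c
static int calls;  /* how many times next_value() ran */

static int next_value(void)
{
    ++calls;
    return 7;
}

/* Omitted middle operand: next_value() is evaluated once. */
static int with_omitted_operand(int fallback)
{
    return next_value() ? : fallback;
}

/* Spelled out: next_value() is evaluated twice when it is nonzero. */
static int spelled_out(int fallback)
{
    return next_value() ? next_value() : fallback;
}
```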
-
-
- DOUBLE WORD INTEGERS
- The long long type is supported.
-
- LONG DOUBLE
- The long double type is supported.
-
- FUNCTION TYPES
- Various keywords from the segmented DOS world are understood by OXCC.
- Currently OXCC does not do anything other than label the function
- for later processing. (see c.grm)
-
-
- SEGMENT INFORMATION
- The keywords `__segdef__' and `__seguse__' are used to specify
- segment info. The arguments to __segdef__ must be constant expressions.
- This info is passed along to back end code generators.
- e.g.:
- __segdef__ DATA16 arg1, arg2, arg3; // 0 to 3 args
- __segdef__ DATA32 arg1, arg2, arg3;
- __segdef__ TEXT16 arg1, arg2, arg3;
-
- __seguse__ DATA32;
- int x,y,z;
-
- __seguse__ TEXT16;
- int func()
- {
- }
- __seguse__ TEXT32;
- int func1()
- {
- }
-
-
- BASED POINTERS
- Microsoft C defines based pointers and segment variables. OXCC currently
- parses and stores this information for later processing by back ends,
- but it does not yet know how to interpret it. It can correctly
- regenerate source.
-
-
- NEAR FAR HUGE POINTERS
- Same as for based pointers (see c.grm).
-
-
- ASSEMBLER INSTRUCTIONS
- Various flavors of assembler code can be absorbed and regenerated by
- OXCC. Interpretation is out of the question and assembler instructions
- are not passed to back end code generators on the theory that portability
- can never be achieved. The OXCC solution is to provide an extensible
- facility for direct generation of ANF code. (see c.grm)
-
-
- ANF INSTRUCTION BLOCKS
- OXCC generates ANF code (see anf.doc, oxanf.h) from C instructions. The
- programmer can generate ANF code by enclosing it in a block.
- e.g.:
- __anf__ {
- mov x,y/2; // divide y by 2 and store in x
- lsh y,z,3; // shift z left by 3 and store in y
- ...
- }
- ANF blocks can be placed inside or outside of functions.
- The basic set of ANF instructions can be extended by programmers to
- achieve meaningful (I hope) methods of expressing concepts which
- normally require assembler code. Essentially, an ANF instruction
- consists of an opcode followed by up to 3 arguments. The opcode set
- can be extended as follows:
- 1. add new strings to oxanf.h
- 2. compile oxanf.h
- 3. insert it into oxlib.cff with `cfar.exe' (see oxcc.mak)
- ANF arguments can be any valid C expression and are evaluated by OXCC with
- code generation where appropriate.
-
-
- NO-NAME STRUCTURES/UNIONS
- Inspired by Visual C++ 2.0
- In order to compile 32 bit Windows programs it is necessary to deal with
- un-named structures and unions which are members of named struct/unions.
- This feature permits the programmer to reference the members of the
- un-named struct/unions as if they were members of the enclosing named
- container. Just make sure that all of the member names are unique.
- Very nice idea.
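C11 later standardized unnamed struct and union members. In this sketch the members of the unnamed union are accessed directly through the enclosing struct, which is why their names must not clash with the other members. `struct value` and its tag values are hypothetical.

```c
/* A tagged value whose payload is an unnamed union. */
struct value {
    int kind;            /* 0 = integer, 1 = real (illustrative tags) */
    union {
        long   ival;
        double rval;
    };                   /* no member name: use v.ival, v.rval */
};
```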
-
-
- GLOBAL SUBROUTINES IN OXCC -- also callable by code being interpreted
-
- (see oxcc.h and toxcc.c)
- void *__builtin_iv(void);
- void *__builtin_root(void);
- void *oxcc_get_pg(void *iv);
- void oxcc_enable_trace(void *iv);
- void oxcc_disable_trace(void *iv);
- void oxcc_debug(void *iv, int bits);
-
- void oxcc_proc_ptr_info(void *iv, void (*func)());
- func(void*,void*,void*,long);
- void oxcc_proc_syms(void *iv, unsigned space, void (*func)());
- func(AstP node, long symb, void *container);
- void oxcc_proc_swtable(void *iv, void *swnode, void (*func)());
- func(long swval, AstP root);
- void oxcc_proc_mallocs(void *iv, void *func());
- func(void *loc, int size, Item *ip);
-
- void *oxcc_open_instance(void);
- void oxcc_set_options(void *iv, char *opts);
- int oxcc_preproc_file(void *iv, void *is, void *os, void *es,
- int argc, char **argv);
- int oxcc_parse_file(void *iv, void *is, void *es, char *filename);
- void oxcc_print_parse_errors(void *iv, void *es);
- int oxcc_check_ast_tree(void *iv, void *es, char *filename);
- int oxcc_init_outers(void *iv, void *es);
- int oxcc_run_tree(void *iv, void *es, char *fnam, char *arg, char *startf);
- int oxcc_gen_code(void *iv, void *es, char *filename, void *os);
- void oxcc_cleanup_parse(void *iv);
- void oxcc_close_codefile(void *iv);
- void oxcc_close_instance(void *iv);
-
- void oxcc_print_ast(void *iv, void *os, int flag);
- void *oxcc_get_ast_root(void *iv);
- int oxcc_eval_expr(void *iv, void *buf, double *result, void *es);
-
- void gSetup(void *self, void *str);
- int gPreProc(void *self, void *is, void *os, void *es,int argc,char **argv);
- int gParse(void *self, void *is, void *es, char *filename);
- void gPerror(void *self, void *es);
- int gCheckTree(void *self, void *es, char *filename);
- int gInitOuters(void *self, void *es);
- int gRunCode(void *self, void *es, char *filename, char *args);
- int gGenCode(void *self, void *es, void *os, char *filename);
- void gCleanup(void *self);
- void gCloseCode(void *self);
- void gPrtAst(void *self, void *es, int flag);
- void *gGetRoot(void *self);
- int gEval(void *self, void *buf, double *result, void *es);
-
-
- TODO
- 1. Improved optimization
- 2. Inline functions
- 3. Flavor #2 for nested functions
- 4. Modify oxcc to be callable as a reentrant subroutine or class [DONE 25May]
- 5. True interpretation of segmented code and 16 bit code.
- 6. Interpret ANF instruction blocks.
- 7. Support long double and complex data types. [long double DONE 5Nov]
- 8. Write more back ends
- 9. Better documentation
- 10. Better test program [test.bat for starters]
- 11. Add a new type `enumstring' to avoid parallel tables
- 12. Built in inheritance engine with COM, SOM, DCE compliance. [Coming up]
- 13. Generate Java, RIP code (need to juice up the grammar a bit)
- 14. Make ANF more general and text readable
- 15. Suggestions ??
-