OS/2 Shareware BBS: 10 Tools

home *** CD-ROM | disk | FTP | other *** search

/ OS/2 Shareware BBS: 10 Tools / 10-Tools.zip / pccts1.zip / ADVTUT / ADVTUT.MS next >

Wrap

Text File | 1993-04-01 | 79KB | 2,692 lines

.de iH \\$1 \\$2 \\$3 \\$4 \\$5 \\$6 \\$7 \\$8 \\.XS .if \\n(NS-4 .if \\n(NS-3 .if \\n(NS-2 .if \\n(NS-1 \\*(SN \\$1 \\$2 \\$3 \\$4 \\$5 \\$6 \\$7 \\$8 \\.XE .. .de 1s .nr PS 11 .nr VS 12 .br .LP .. .de 2s .nr PS 11 .nr VS 16 .br .LP .. .de cB .nr PS 9 .nr VS 11 .br .LP .KS .LD .ft CW .. .de cE .DE .KE .2s .ft R .. .EH 'PCCTS Advanced Tutorial 1.0x' .OH 'PCCTS Advanced Tutorial 1.0x' .EF '''Page %' .OF '''Page %' .TL \s+4PCCTS Version 1.0x Advanced Tutorial\s-4 .AU Terence Parr, Hank Dietz, Will Cohen .AI School of Electrical Engineering Purdue University West Lafayette, IN 47907 \fISpring 1992\fP \f(CWparrt@ecn.purdue.edu\fP \f(CWhankd@ecn.purdue.edu\fP \f(CWcohenw@ecn.purdue.edu\fP .2s .LP .br .PP Constructing a translator can be viewed as an iterative refinement process moving from language recognition to intermediate-form transformation. This document presents one possible sequence of refinements. We shall use as many features of PCCTS as is reasonable without regards to optimality. .PP We will develop a compiler for a simple string manipulation language called \fIsc\fP. The compiler will generate code for a simple stack machine. .PP This document assumes familiarity with the concept of a parser generator and other language recognition tools \(em in particular, a passing familiarity with YACC is useful. .PP The development will proceed as follows: .IP \fI(i)\fP Define a grammar to recognize functions, statements, variable definitions and expressions in \fIsc\fP. .IP \fI(ii)\fP Add symbol table management to the grammar developed in \fI(i)\fP .IP \fI(iii)\fP Add actions to translate \fIsc\fP into stack code for a simple stack machine. .IP \fI(iv)\fP Construct trees as an intermediate-form. .IP \fI(v)\fP Build a code-generator to translate the trees into executable C code. .bp .NH .iH \fIsc\fP Language definition .PP \fIsc\fP is a simple, C-like language with only one data object type\(em string. It has global variables, functions, arguments and local variables. There are no arrays or structures. .NH 2 .iH \fIsc\fP Expressions .PP Expression atoms are limited to variables, string constants (\f(CW"\fIcharacter(s)\f(CW"\fR) and function calls (\f(CWfunction(\fIexpr\f(CW)\fR). The string constants \f(CW"false"\fP or \f(CW"0"\fP represent false and \f(CW"true"\fR or \f(CW"1"\fP represent true. A limited set of operators are available (from highest to lowest precedence) .IP \f(CW-\fP 8n Unary operator negate. Numeric operator. .IP "\f(CW*, /\fP" Binary operators multiply and divide. Groups left-to-right. Numeric operator. .IP "\f(CW+, -\fP" Binary operators add and subtract. Groups left-to-right. Numeric operator except for \f(CW+\fP which is \*Qconcatenate\*U if one of operands is non-numeric. .IP "\f(CW==, !=\fP" Binary operators equal, not equal. Groups left-to-right. Numeric or non-numeric. .IP \f(CW=\fP Binary assignment operator. Groups right-to-left. For example, \f(CWa\ =\ b\ =\ c\fP means to assign \f(CWc\fP to \f(CWb\fP and then to \f(CWa\fP. .LP .NH 2 .iH \fIsc\fP Functions and statements .PP Functions may have multiple local variables, but at most one parameter. All functions implicitly return a string to the calling function, although the value need not be used. If a \f(CWreturn\fP-statement is not executed within a function, the return value for that function is undefined. Functions are defined exactly as in C: .cB \fIfunction\fP(\fIarg\fP) { \fIstatement(s)\fP } .cE .PP A function called \f(CWmain()\fP must exist so that our interpreter and compiled code knows where to begin execution. Also, \f(CWmain()\fP always has a implicitly passed parameter which contains any command-line argument. .PP \fIsc\fP has six statements. .IP \f(CWif\fP Conditionals have the form: .RS .cB if ( \fIexpr\fP ) \fIstatement\fP .cE .RE .IP \f(CWwhile\fP Loops have the form: .RS .cB while ( \fIexpr\fP ) \fIstatement\fP .cE .RE .IP "\f(CW{\fP \fIstatement(s)\fP \f(CW}\fP" Block statements treat more than one statement as a single statement as in C. However, one cannot define variables local to a block statement. .IP "\fIexpr\fP\f(CW;\fP" An \fIexpr\fP can include any valid \fIsc\fP expression, but only expressions involving assignments and function calls are useful. .IP "\f(CWreturn\fP \fIexpr\fP \f(CW;\fP" This statement behaves exactly like the C return statement except that only strings can be returned. .IP "\f(CWprint\fP \fIexpr\fP\f(CW;\fP" Print the string described by \fIexpr\fP to \f(CWstdout\fP. .NH 2 .iH Example .PP The following code is a simple \fIsc\fP program that computes \fIn\fP factorial. .cB main(n) { print fact(n); print "\en"; } fact(n) { if ( n == "0" ) return "1"; return n * fact(n - "1"); } .cE .NH .iH Language recognition .PP We begin our project by constructing a parser\(ema program that will recognize phrases in \fIsc\fP. The first two subsections describe the \fIsc\fP lexicon while the later subsections present the \fIsc\fP grammar with and without symbol table management. .NH 2 .iH Lexical analysis .PP Our language has only strings of upper- and lower-case letters, called \f(CWWORD\fP's, operators, string constants and a few grammatical grouping symbols; all of which except \f(CWWORD\fP will be defined implicitly by mentioning them in the grammar. \f(CWWORD\fP will be placed last in the grammar\(emi.e. after all keywords have been defined. Since keywords are really just strings of characters as well, they must be treated specially. When two (or more) regular expressions can be matched for the current input text, DLG scanners resolve the ambiguity by matching the input to the expression mentioned first in the description. Hence, we place .cB #token WORD "[a-zA-Z]+" .cE .LP at the bottom of the file. .PP Tabs, blanks and carriage-returns are all considered white space and are to be ignored (although we wish to track line numbers). The following ANTLR code will instruct the parser to skip over white space. .cB #token "[\et\e ]+" << zzskip(); >> /* Ignore White */ #token "\en" << zzline++; zzskip(); >> .cE .PP Strings are strings of characters enclosed in double-quotes. For simplicity we will allow any character not in the ASCII range 0..0x1F (hex) plus any "escaped" version of the same character set. .cB #token STRING "\e"(~[\e0-\e0x1f\e"\e\e]|(\e\e~[\e0-\e0x1f]))*\e"" <<;>> .cE .NH 2 .iH Attributes .PP The DLG scanner matches keywords, \f(CWWORD\fP's, etc... on the input stream. The way ANTLR parsers access that text is though the attribute mechanism. Attributes are run-time objects associated will all grammar elements (items to the right of the \f(CW:\fP in a rule definition). Although attributes are associated with subrules and rule references, only those attributes linked to grammar tokens are of interest to us. Attributes are denoted \f(CW$\fP\fIi\fP where \fIi\fP is the grammar element \fIi\fP positions from the start of the alternative. The user must define an attribute type called \f(CWAttrib\fP. This type is most often some sort of string whether constant or dynamically allocated. For simplicity, we employ a standard attribute type available with the PCCTS system. Attributes are structures that contain a constant width (30 character) string. By placing the character array inside of a structure, the entire string can be copied like any simple C variable. The ANTLR pseudo-op \f(CW#header\fP is used to describe attributes and other information that must be present in all generated C files. In our case, we simply need to include the standard text attribute definition. This pseudo-op must occur first in the grammar file if it exists at all. .cB #header <<#include "charbuf.h">> .cE .NH 2 .iH Grammatical analysis .PP This subsection describes the grammatical or syntactic aspects of \fIsc\fP. No symbol table management is used and therefore functions and variables are considered simple \f(CWWORD\fP's. Later versions of the grammar can use the tokens \f(CWVAR\fP and \f(CWFUNC\fP. .NH 3 .iH Definitions, variables .PP An \fIsc\fP program is a sequence of definitions\(emvariables or functions: .cB p : ( func | "var" def ";" )* ; .cE .LP The \f(CW( )*\fP subrule means that there are zero or more definitions. The \f(CW|\fP operator starts a new alternative. .PP The grammar for a variable definitions is broken up between the rule \f(CWp\fP and \f(CWdef\fP so that \f(CWdef\fP can be reused for parameter definitions (it also makes more sense when code generation has been added). .cB def : WORD ; .cE .NH 3 .iH Functions .PP \fIsc\fP functions follow C's format except that the default return type is a string instead of \f(CWint\fP. .cB func : WORD "\e(" { WORD } "\e)" "\e{" ( def )* ( statement )* "\e}" ; .cE .LP Note that the parentheses and the curly-braces must be escaped with \f(CW\e\fP because they are special regular expression symbols. .NH 3 .iH Statements .PP The statements outlined in the \fIsc\fP language definition can be described with .cB statement : expr ";" | "\e{" ( statement )* "\e}" | "if" "\e(" expr "\e)" statement {"else" statement} | "while" "\e(" expr "\e)" statement | "return" expr ";" | "print" expr ";" ; .cE .LP where \f(CW{ }\fP means optional. .NH 3 .iH Expressions .PP The operators and expression atoms described in the language definition can be recognized by the following grammar fragment. .cB expr : WORD "=" expr | expr0 ; expr0 : expr1 ( ("=="|"!=") expr1 )* ; expr1 : expr2 ( ("\e+"|"\e-") expr2 )* ; expr2 : expr3 ( ("\e*"|"/") expr3 )* ; expr3 : {"\e-"} expr4 ; expr4 : STRING | WORD { "\e(" { expr } "\e)" } | "\e(" expr "\e)" ; .cE .LP Rule \f(CWexpr\fP is ambiguous if we only look one token into the future since \f(CWWORD\fP can also be an \f(CWexpr0\fP. However, if we were to tell ANTLR to use two tokens, it could see that the \f(CW"="\fP assignment operator uniquely identifies which alternative to match when \f(CWWORD\fP is the first token of lookahead. Rule \f(CWexpr\fP makes this grammar LL(2); but, we only use the extra token of lookahead in \f(CWexpr\fP (leaving all others LL(1)). Please note the use of ANTLR's command-line option \*Q\f(CW-k 2\fP\*U in the makefiles presented below. .PP ANTLR does not have a method of explicitly outlining operator precedence. Instead precedence is implicitly defined by the rule invocation sequence or abstractly by the parse-tree. Rule \f(CWexpr\fP calls \f(CWexpr0\fP which calls \f(CWexpr1\fP etc... nesting more and more deeply. The precedence rule-of-thumb in ANTLR (and any LL-type parser) is: the deeper the nesting level, the higher the precedence (the more tightly the operator binds). Operators in the expression starting rule have the lowest precedence whereas operators in the last rule in the expression recursion have the highest precedence. This becomes obvious when a parse-tree for some input text is examined. .PP Once again, note that the operators for \fIsc\fP must be escaped as they are reserved regular expression operators as well. .bp .NH 2 .iH Complete ANTLR description (w/o symbol table management) .PP The following code is the complete ANTLR description to recognize \fIsc\fP programs. Nothing is added to the symbol table and undefined variables/functions are not flagged. .cB #header <<#include "charbuf.h">> #token "[\et\e ]+" << zzskip(); >> /* Ignore White */ #token "\en" << zzline++; zzskip(); >> #token STRING "\e"(~[\e0-\e0x1f\e"\e\e]|(\e\e~[\e0-\e0x1f]))*\e"" <<;>> << main() { ANTLR(p(), stdin); } >> p : ( func | "var" def ";" )* "@" ; def : WORD ; func : WORD "\e(" { def } "\e)" "\e{" ( "var" def ";" )* ( statement )* "\e}" ; statement : expr ";" | "\e{" ( statement )* "\e}" | "if" "\e(" expr "\e)" statement {"else" statement} | "while" "\e(" expr "\e)" statement | "return" expr ";" | "print" expr ";" ; expr : WORD "=" expr | expr0 ; expr0 : expr1 ( ("==" | "!=") expr1 )* ; expr1 : expr2 ( ("\e+" | "\e-") expr2 )* ; expr2 : expr3 ( ( "\e*" | "/" ) expr3 )* ; expr3 : {"\e-" } expr4 ; expr4 : STRING | WORD { "\e(" { expr } "\e)" } | "\e(" expr "\e)" ; #token WORD "[a-zA-Z]+" .cE .NH 2 .iH Makefile .PP The following makefile can be used to make the above language description into an executable file. We assume that the ANTLR includes and standard attribute packages are located in a directory accessible to us as \f(CW../h\fP. The grammar listed above must be in file \f(CWtut1.g\fP. .cB # # Makefile for 1.00 tutorial (no symbol table stuff) # ANTLR creates parser.dlg, err.c, tut1.c, tokens.h # DLG creates scan.c, mode.h # CFLAGS= -I../h GRM=tut1 SRC=scan.c $(GRM).c err.c OBJ=scan.o $(GRM).o err.o tutorial: $(OBJ) $(SRC) cc -o $(GRM) $(OBJ) # build a parser and lexical description from a language description $(GRM).c parser.dlg : $(GRM).g antlr -k 2 $(GRM).g # build the scanner from a lexical description scan.c : parser.dlg dlg -C2 parser.dlg scan.c .cE .LP Remember that \f(CWmake\fP wants a tab, not spaces, in front of the action statements (e.g. \f(CWcc\fP, \f(CWantlr\fP, ...). .NH 2 .iH Testing .PP After successful completion of \f(CWmake\fP, the executable \f(CWtut\fP will exist in the current directory. \f(CWtut\fP takes input from \f(CWstdin\fP. Therefore, one parses a file via .cB tut < \fIfile\fP.sc .cE .LP The prompt will return without a message if no syntax errors were discovered. However, if \fIfile\fP contains lexical or syntactic errors, error messages will appear. We have given no instructions related to translation, so nothing happens if all is well. .NH 2 .iH Symbol table management .PP The grammar presented thus far can only recognize \fIsc\fP programs. No translation is possible because it does not deal with the semantics of the input program only its structure. To begin semantic interpretation, one must add new symbols to a symbol table. The symbols required to \*Qunderstand\*U an \fIsc\fP program are .IP \f(CWVAR\fP 10n Symbol is a local (denoted with the \f(CW#define\fP constant \f(CWLOCAL\fP), a parameter (\f(CWPARAMETER\fP) or global (\f(CWGLOBAL\fP) variable. .IP \f(CWFUNC\fP 10n Symbol is a function whose level is always \f(CWGLOBAL\fP. .PP A symbol table record requires two fields: \f(CWtoken\fP which indicates either \f(CWVAR\fP or \f(CWFUNC\fP and \f(CWlevel\fP which is either \f(CWLOCAL\fP, \f(CWPARAMETER\fP or \f(CWGLOBAL\fP. .PP The PCCTS system comes with a simple symbol table manager. The source is documented well and its functions will be referenced without explanation here. To use the functions, our grammar must include a file called \f(CWsym.h\fP and define a symbol table structure. We shall put the \f(CW#include\fP in the \f(CW#header\fP directive. .cB #header <<#include "sym.h" #include "charbuf.h">> .cE .LP The file \f(CWsym.c\fP contains the actual functions and must be linked into your executable. Our makefile will handle this automatically. The \f(CWsym.c\fP can be found in the \f(CWsupport/sym\fR subdirectory of the standard PCCTS installation; this should be copied into the tutorial directory. .PP The symbol table manager requires a number of fields within the symbol table entry. A template has been provided that the user can copy into their work directory and modify. The template in \f(CWsym.h\fP will be modified in our case to include the two fields mentioned above\(em\f(CWtoken\fP and \f(CWlevel\fP. .cB typedef struct symrec { char *symbol; struct symrec *next, *prev, **head, *scope; int token; /* either FUNC or VAR */ int level; /* either LOCAL, GLOBAL, PARAMETER */ int offset; /* offset from sp on the stack */ /* locals are - offset, param is 0 */ /* used only in tut4; reserved */ } Sym, *SymPtr; .cE .LP We add the following definitions to the front of our grammar. .cB << #define HashTableSize 999 #define StringTableSize 5000 #define GLOBAL 0 #define PARAMETER 1 #define LOCAL 2 static Sym *globals = NULL; /* global scope for symbols */ >> .cE where \f(CWglobals\fP is used to track all global symbols (functions and variables). Also, to print out symbol scopes, we define a function called \f(CWpScope(Sym\ *p)\fP that dumps a scope to \f(CWstdout\fP. It's implementation is unimportant and given with the full grammar description listed below. To initialize the symbol table, we add a function call to the symbol table manager library yielding a new \f(CWmain()\fP. .cB main() { zzs_init(HashTableSize, StringTableSize); ANTLR(p(), stdin); } .cE .PP When a variable, parameter or function is defined, we want to add that symbol to the symbol table. We shall treat parameters like variables grammatically and use field \f(CWlevel\fP to differentiate between them. When a \f(CWWORD\fP is found in rule \f(CWdef\fP, we will add it to the symbol table using the scope and level passed into \f(CWdef\fP. The situation is complicated slightly by the fact that a local variable may have the same name as a global variable. Scopes are linked lists that weave through the symbol table grouping all entries within the same scope. Our grammar for rule \f(CWdef\fP becomes: .cB def[Sym **scope, int level] : <<Sym *var;>> (WORD | VAR) ; .cE .LP where the init-action defines a local variable called \f(CWvar\fP. .PP To handle the definition of previously unknown symbols, we add an action after the \f(CWWORD\fP reference. .cB ( WORD <<zzs_scope($scope); /* set current scope to scope passed in */ var = zzs_newadd($1.text); /* create entry, add text of WORD to table */ var->level = $level; /* set the level to level passed in */ var->token = VAR; /* symbol is a variable */ >> | VAR ) .cE .LP To deal with a symbol defined in another scope we add the following action to the \f(CWVAR\fP reference. .cB ( WORD <<...>> | VAR <<var = zzs_get($1.text); /* get previous definition */ if ( level != var->level ) /* make sure we have a diff scope */ { zzs_scope($scope); /* same here as above for unknown */ var = zzs_newadd($1.text); var->level = $level; var->token = VAR; } else printf("redefined variable ignored: %s\en", $1.text); >> ) .cE .LP Note that this implies that the lexical analyzer will modify the token number according to its definition in the symbol table (if any). This is generally not a good idea, but can be quite helpful when trying to remove nasty ambiguities in your grammar. Typically more than one token of lookahead makes it unnecessary to use \*Qderived\*U tokens. We do so here to illustrate the interaction of lexical analyzer and parser. .PP Rule \f(CWp\fP must be modified to pass a scope and level to rule \f(CWdef\fP. In addition, we will remove the global scope at the end of parsing and print out the symbols. .cB p : <<Sym *p;>> ( func | "var" def[&globals, GLOBAL] ";" )* <> "@" ; .cE .PP Rule \f(CWfunc\fP now must create a \f(CWFUNC\fP symbol table entry and define a \f(CWVAR\fP entry for its parameter if one exists. Rule \f(CWfunc\fP checks for duplicate \f(CWFUNC\fP entries by only matching unknown \f(CWWORD\fP's. If a function had been previously defined, its token would be \f(CWFUNC\fP. .cB func : <<Sym *locals=NULL, *var, *p;>> WORD <<zzs_scope(&globals); var = zzs_newadd($1.text); var->level = GLOBAL; var->token = FUNC; >> "\e(" { def[&locals, PARAMETER] } "\e)" "\e{" ( "var" def[&locals, LOCAL] ";" )* ( statement )* "\e}" <> ; .cE .LP At the end of the function, we remove the local scope of variables (and parameter if it exists) and print the symbols out to \f(CWstdout\fP. .PP When a \f(CWWORD\fP is encountered on the input stream, we need to look it up in the symbol table to find out whether it is a variable (parameter) or a function. The token number needs to be changed accordingly before the parser sees it so that it will not try to match a \f(CWWORD\fP. Any \f(CWWORD\fP references in an expression that are not defined in the symbol table are undefined variables. We accomplish this token \*Qderivation\*U strategy by attaching an action to the regular expression for \f(CWWORD\fP. .cB #token WORD "[a-zA-Z]+" <<{ Sym *p = zzs_get(LATEXT(1)); if ( p != NULL ) NLA = p->token; }>> .cE .LP The macro \f(CWLATEXT(1)\fP is always set to the text matched on the input stream for the current token. \f(CWNLA\fP is the next token of look-ahead. We need to change this from \f(CWWORD\fP to whatever is found in the symbol table. .PP Rules for statements and expressions do not change when adding symbol table management because they simply apply a structure to grammar symbols and do not introduce new ones. .NH 2 .iH Complete ANTLR description (with symbol table management) .PP The following code is the complete ANTLR description to recognize \fIsc\fP programs. Functions, variables and parameters are added to the symbol table and are printed to \f(CWstdout\fP after function definitions and at the end of the \fIsc\fP program. .cB #header << #include "sym.h" #include "charbuf.h" >> #token "[\et\e ]+" << zzskip(); >> /* Ignore White */ #token "\en" << zzline++; zzskip(); >> #token STRING "\e"(~[\e0-\e0x1f\e"\e\e]|(\e\e~[\e0-\e0x1f]))*\e"" <<;>> << #define HashTableSize 999 #define StringTableSize 5000 #define GLOBAL 0 #define PARAMETER 1 #define LOCAL 2 static Sym *globals = NULL; /* global scope for symbols */ .cE .cB main() { zzs_init(HashTableSize, StringTableSize); ANTLR(p(), stdin); } pScope(p) Sym *p; { for (; p!=NULL; p=p->scope) { printf("\etlevel %d | %-12s | %-15s\en", p->level, zztokens[p->token], p->symbol); } } >> .cE .cB p : <<Sym *p;>> ( func | "var" def[&globals, GLOBAL] ";" )* <> "@" ; def[Sym **scope, int level] : <<Sym *var;>> ( WORD <<zzs_scope($scope); var = zzs_newadd($1.text); var->level = $level; var->token = VAR; >> | VAR <<var = zzs_get($1.text); if ( $level != var->level ) { zzs_scope($scope); var = zzs_newadd($1.text); var->level = $level; var->token = VAR; } else printf("redefined variable ignored: %s\en", $1.text); >> ) ; .cE .cB func : <<Sym *locals=NULL, *var, *p;>> WORD <<zzs_scope(&globals); var = zzs_newadd($1.text); var->level = GLOBAL; var->token = FUNC; >> "\e(" { def[&locals, PARAMETER] } "\e)" "\e{" ( "var" def[&locals, LOCAL] ";" )* ( statement )* "\e}" <> ; statement : expr ";" | "\e{" ( statement )* "\e}" | "if" "\e(" expr "\e)" statement {"else" statement} | "while" "\e(" expr "\e)" statement | "return" expr ";" | "print" expr ";" ; .cE .cB expr : VAR "=" expr | expr0 ; expr0 : expr1 ( ( "==" | "!=" ) expr1 )* ; expr1 : expr2 ( ( "\e+" | "\e-" ) expr2 )* ; expr2 : expr3 ( ( "\e*" | "/" ) expr3 )* ; expr3 : {"\e-" } expr4 ; .cE .cB expr4 : STRING | VAR | ( FUNC | WORD ) "\e(" { expr } "\e)" | "\e(" expr "\e)" ; #token WORD "[a-zA-Z]+" <<{ Sym *p = zzs_get(LATEXT(1)); if ( p != NULL ) NLA = p->token; }>> .cE .NH 2 .iH File \f(CWsym.h\fP .PP The following is a modification of the \f(CWsym.h\fP template provided with PCCTS. The fields of the symbol table entry structure have augmented for our purposes as outlined above. .cB /* define some hash function */ #ifndef HASH #define HASH(p, h) while ( *p != '\e0' ) h = (h<<1) + *p++; #endif typedef struct symrec { char * symbol; struct symrec *next, *prev, **head, *scope; unsigned hash; int token; /* either FUNC or VAR */ int level; /* either LOCAL, GLOBAL, PARAMETER */ int offset; /* offset from sp on the stack */ /* locals are - offset, param is 0 */ /* used only tut4; reserved */ } Sym, *SymPtr; void zzs_init(); void zzs_done(); void zzs_add(); Sym *zzs_get(); void zzs_del(); void zzs_keydel(); Sym **zzs_scope(); Sym *zzs_rmscope(); void zzs_stat(); Sym *zzs_new(); Sym *zzs_newadd(); char *zzs_strdup(); .cE .NH 2 .iH Makefile (for use with symbol table management) .PP The following makefile can be used to make the above language description into an executable file. We assume that the ANTLR includes and standard attribute packages are located in a directory accessible to us as \f(CW../h\fP. Also, \f(CWsym.c\fP, \f(CWsym.h\fP are located in the current working directory. The grammar listed above must be in file \f(CWtut2.g\fP. .cB # # Makefile for 1.00 tutorial # ANTLR creates parser.dlg, err.c, tut1.c # DLG creates scan.c # CFLAGS= -I../h GRM=tut2 SRC=scan.c $(GRM).c err.c sym.c OBJ=scan.o $(GRM).o err.o sym.o tutorial: $(OBJ) $(SRC) cc -o $(GRM) $(OBJ) $(GRM).c parser.dlg : $(GRM).g antlr -k 2 $(GRM).g scan.c : parser.dlg dlg -C2 parser.dlg scan.c .cE .NH 2 .iH Sample input/output .PP The current state of the program accepts input like the following sample. .cB var i; var j; f(k) { var local; var j; if ( "true" ) local = "zippo"; } g() { var note; note = "1"; while ( note ) { i = "456"; } } .cE .LP The output of our executable, \f(CWtut\fP, would be .cB Locals for f: level 2 | VAR | j level 2 | VAR | local level 1 | VAR | k Locals for g: level 2 | VAR | note Globals: level 0 | FUNC | g level 0 | FUNC | f level 0 | VAR | j level 0 | VAR | i .cE .LP Note that the parameter \f(CWk\fP is level 1 for \f(CWPARAMETER\fP and the local variable \f(CWlocal\fP is level 2 for \f(CWLOCAL\fP. .NH .iH Translate \fIsc\fP to stack code .PP Generating code for a stack machine is simple and can be done by simply adding \f(CWprintf()\fP actions to the grammar in the appropriate places. .PP We begin with a discussion of the stack machine and how to generate code for it. Next we augment our grammar with actions to dump stack code to \f(CWstdout\fP. .NH 2 .iH A simple stack machine for \fIsc\fP .PP Our stack machine consists of a single CPU, a finite stack of strings, a finite memory, a stack pointer (\f(CWsp\fP) and a frame pointer (\f(CWfp\fP). All data items used by stack programs are strings (currently set to a maximum length of 100). Our string stack grows downwards towards 0 from the maximum stack height. .PP To make implementation simple, our stack code will actually be a sequence of macro invocations in a C program. This way C will take care of control-flow and allocating space etc... The minimum stack code program defines a \f(CW_main\fP and includes \f(CWsc.h\fP which contains all of the stack code macros: .cB #include "sc.h" _main() { BEGIN; END; } .cE .LP The \f(CWBEGIN\fP and \f(CWEND\fP macros are explained below. .NH 3 .iH Functions .PP Every \fIsc\fP program requires a \f(CWmain\fP which will we translate to \f(CW_main\fP (a C function \f(CWmain\fP will call \f(CW_main\fP). Other functions are translated verbatim. The parameter to your main program will be the first command line argument when you execute your \fIsc\fP program (after translation to C). If no command line argument is specified, a \f(CW""\fP is pushed. .PP Parameters are pushed on the string stack before a function is called, so no argument is needed to the resulting C function. Return values are implicitly strings but are also returned on the string stack\ \(em avoiding the need to define a return type of the C function. An \*Qinstruction\*U (macro) called \f(CWBEGIN\fP is executed before any other in a function. This saves the current frame pointer and then makes it point to the argument passed into the function. \f(CWEND\fP is used to restore the frame pointer to its previous value and make the stack pointer point to the return value. After a function call, the top of stack is always the return value. .PP Arguments and local variables are at offsets to the frame pointer. The optional argument is at offset 0. The first local variable is at offset 1, the second at offset 2 and so on. Graphically, .cB | . | | . | | arg | fp + 0 | 1st local | fp - 1 | 2nd local | fp - 2 | ... | | . | | . | .cE .LP A dummy argument is pushed by the calling function if no argument is specified. .PP To translate a function, .cB \fIf\fP(\fIarg\fP) { } .cE .LP we simply dump the following .cB \fIf\fP() { BEGIN; END; } .cE .LP Note that the argument disappears because we pass arguments on the string stack not the C hardware stack. .NH 3 .iH Variables .PP Global variables of the form .cB var \fIname\fP; .cE .LP are translated to .cB SCVAR \fIname\fP; .cE .LP where SCVAR is a C type defined in \f(CWsc.h\fP that makes space for a \fIsc\fP variable. .PP Local variables are allocated on the string stack and are created at run time via the instruction \f(CWLOCAL\fP. Each execution of \f(CWLOCAL\fP allocates space for one more local variable on the stack. \f(CWLOCAL\fP's must be executed after the \f(CWBEGIN\fP but before any other stack code instruction. The \f(CWEND\fP macro resets the stack pointer so that these locals disappear after each function invocation. For example, .cB main() { var i; var j; } .cE .LP is translated to: .cB #include "sc.h" _main() { BEGIN; LOCAL; LOCAL; END; } .cE .NH 3 .iH Expressions .PP Operator precedence is defined implicitly in top-down (LL) grammars. The deeper the level of recursion, the higher the precedence. To generate code for operators, one prints out the correct macro for that operator after the two operators have been seen (because we are generating code for a stack machine). If the operator is a unary operator like negate, we wait until the operand is seen and then print the operator. For instance, \f(CWexpr1\fP, .cB expr1 : expr2 ( ("\e+"|"\e-") expr2)* ; .cE .LP would be translated as: .cB expr1 : <<char *op;>> expr2 ( ( "\e+" <<op="ADD";>> | "\e-" <<op="SUB";>> ) expr2 <<printf("\et%s;\en", op);>> )* ; .cE .LP where \f(CWADD\fP and \f(CWSUB\fP are \fIsc\fP stack machine macros defined below. .PP The assignment operator groups right to left and must generate code that duplicates itself after each assignment so that the value of the expression is on the stack for any chained assignement. For example, .cB main() { var a; var b; a = b = "test"; } .cE .LP would be translated to: .cB #include "sc.h" _main() /* main() */ { BEGIN; LOCAL; /* var a; */ LOCAL; /* var b; */ SPUSH("test"); /* a = b = "test"; */ DUP; LSTORE(2); /* store into b */ DUP; LSTORE(1); /* store into a */ POP; END; } .cE .PP Since an expression is a statement, it is possible that a value could be left on the stack. For example, .cB main() { "expression"; } .cE .LP We must pop this extraneous value off the stack: .cB #include "sc.h" _main() { BEGIN; SPUSH("expression"); POP; END; } .cE .NH 3 .iH Instructions .PP All instructions take operands from the stack and return results on the stack. Some have no side effects on the stack but alter the C program counter. .IP \f(CWPUSH(\fIv\f(CW)\fR 12n Push the value of variable \fIv\fP onto the stack. .IP \f(CWSPUSH(\fIs\f(CW)\fR 12n Push the value of the string constant \fIs\fP onto the stack. .IP \f(CWLPUSH(\fIn\f(CW)\fR 12n Push the value of the local variable or function argument at offset \fIn\fP onto the stack. .IP \f(CWSTORE(\fIv\f(CW)\fR 12n Pop the top of stack and store it into variable \fIv\fP. .IP \f(CWLSTORE(\fIn\f(CW)\fR 12n Pop the top of stack and store it into local variable or function argument at offset \fIn\fP. .IP \f(CWPOP\fR 12n Pop the top of stack; stack is one string smaller. No side effects with data memory. .IP \f(CWLOCAL\fR 12n Create space on the stack for a local variable. Must appear after \f(CWBEGIN\fP but before any other instruction in a function. .IP \f(CWBRF(\fIa\f(CW)\fR 12n Branch, if top of stack is \f(CW"false"\fP or \f(CW"0"\fP, to \*Qaddress\*U \fIa\fP (C label); top of stack is popped off. .IP \f(CWBRT(\fIa\f(CW)\fR 12n Branch, if top of stack is \f(CW"true"\fP or \f(CW"1"\fP, to \*Qaddress\*U \fIa\fP (C label); top of stack is popped off. .IP \f(CWBR(\fIa\f(CW)\fR 12n Branch to \*Qaddress\*U \fIa\fP (C label). Stack is not touched. .IP \f(CWCALL(\fIf\f(CW)\fR 12n Call the function \fIf\fP. Returns with result on stack top. .IP \f(CWPRINT\fR 12n Pop the top of stack and print the string to \f(CWstdout\fP. .IP \f(CWRETURN\fR 12n Set the return value to the top of stack. Return from the function. .IP \f(CWEQ\fR 12n Perform \f(CWstack[sp] = (stack[sp+1] == stack[sp])\fP. .IP \f(CWNEQ\fR 12n Perform \f(CWstack[sp] = (stack[sp+1] != stack[sp])\fP. .IP \f(CWADD\fR 12n Perform \f(CWstack[sp] = (stack[sp+1] + stack[sp])\fP. .IP \f(CWSUB\fR 12n Perform \f(CWstack[sp] = (stack[sp+1] - stack[sp])\fP. .IP \f(CWMUL\fR 12n Perform \f(CWstack[sp] = (stack[sp+1] * stack[sp])\fP. .IP \f(CWDIV\fR 12n Perform \f(CWstack[sp] = (stack[sp+1] / stack[sp])\fP. .IP \f(CWNEG\fR 12n Perform \f(CWstack[sp] = -stack[sp]\fP. .IP \f(CWDUP\fR 12n Perform \f(CWPUSH(stack[sp])\fP. .IP \f(CWBEGIN\fR 12n Function preamble. .IP \f(CWEND\fR 12n Function cleanup. .PP Boolean operations yield string constants \f(CW"false"\fP for false and \f(CWtrue\fR for true. All other operations yield string constants representing numerical values (\f(CW"3"+"4"\fP is \f(CW"7"\fP). .NH 2 .iH Examples .PP The sample factorial program from above, .cB main(n) { print fact(n); print "\en"; } fact(n) { if ( n == "0" ) return "1"; return n * fact(n - "1"); } .cE .LP would be translated to: .cB #include "sc.h" _main() /* main(n) */ { /* { */ BEGIN; LPUSH(0); /* print fact(n); */ CALL(fact); PRINT; SPUSH("\en"); /* print "\en"; */ PRINT; END; /* } */ } .cE .cB fact() /* fact(n) */ { BEGIN; LPUSH(0); /* if ( n == "0" ) */ SPUSH("0"); EQ; BRF(iflabel0); SPUSH("1"); /* return "1"; */ RETURN; iflabel0: ; LPUSH(0); /* return n * fact(n - "1"); */ LPUSH(0); SPUSH("1"); SUB; CALL(fact); MUL; RETURN; END; /* } */ } .cE .LP .NH 2 .iH Augmenting grammar to dump stack code .PP In order to generate code, we must track the offset of local variables. To do so, we add a field, \f(CWoffset\fP, to our symbol table record: .cB typedef struct symrec { char * symbol; struct symrec *next, *prev, **head, *scope; unsigned hash; int token; /* either FUNC or VAR */ int level; /* either LOCAL, GLOBAL, PARAMETER */ int offset; /* offset from sp on the stack */ /* locals are negative offsets, param is 0 */ } Sym, *SymPtr; .cE .PP We begin modifying our grammar by making a global variable that tracks the offset of local variables and defining a variable to track label numbers: .cB static int current_local_var_offset, LabelNum=0; .cE .PP Because the translation of all programs must include the \fIsc\fP stack machine definitions, we add a \f(CWprintf\fP to rule \f(CWp\fP: .cB p : <<Sym *p; printf("#include \e"sc.h\e"\en"); >> ( func | "var" def[&globals, GLOBAL] ";" )* <> "@" ; .cE .PP In order to generate the variable definition macros and update the symbol table, we modify rule \f(CWdef\fP as follows: .cB def[Sym **scope, int level] : <<Sym *var;>> ( WORD <<zzs_scope($scope); var = zzs_newadd($1.text); var->level = $level; var->token = VAR; var->offset = current_local_var_offset++; switch(var->level) { case GLOBAL: printf("SCVAR %s;\en", $1.text); break; case LOCAL : printf("\etLOCAL;\en"); break; } >> | VAR <<var = zzs_get($1.text); if ( $level != var->level ) { zzs_scope($scope); var = zzs_newadd($1.text); var->level = $level; var->token = VAR; var->offset = current_local_var_offset++; switch(var-> level) { case GLOBAL: printf("\etSCVAR %s;\en",$1.text);break; case LOCAL : printf("\etLOCAL;\en"); break; } } else printf("redefined variable ignored: %s\en",$1.text); >> ) ; .cE .PP The function definition rule must now dump a function template to \f(CWstdout\fP and generate the \f(CWBEGIN\fP and \f(CWEND\fP macros. \f(CWfunc\fP must also update \f(CWcurrent_local_var_offset\fP. Note that the code to dump symbols is gone. .cB func : <<Sym *locals=NULL, *var, *p; current_local_var_offset =0;>> WORD <<zzs_scope(&globals); var = zzs_newadd($1.text); var->level = GLOBAL; var->token = FUNC; if (strcmp("main",$1.text)) { printf("%s()\en",$1.text); } else printf("_main()\en"); >> "\e(" ( def[&locals, PARAMETER] | <<current_local_var_offset = 1;>> ) "\e)" "\e{" << printf("{\en\etBEGIN;\en"); >> ( "var" def[&locals, LOCAL] ";" )* ( statement )* "\e}" << printf("\etEND;\en}\en"); >> <> ; .cE .PP Statements are easy to handle since \f(CWexpr\fP generates most of the code. .cB statement : <<int n;>> expr ";" <<printf("\etPOP;\en");>> | "\e{" ( statement )* "\e}" | "if" "\e(" expr "\e)" << n = LabelNum++; printf("\etBRF(iflabel%d);\en",n); >> statement << printf("iflabel%d: ;\en",n); >> {"else" statement} | "while" << n = LabelNum++; printf("wbegin%d: ;\en", n); >> "\e(" expr "\e)" << printf("\etBRF(wend%d);\en",n); >> statement << printf("\etBR(wbegin%d);\en", n); >> << printf("wend%d: ;\en",n); >> << n++; >> | "return" expr ";" << printf("\etRETURN;\en"); >> | "print" expr ";" << printf("\etPRINT;\en"); >> ; .cE .NH 2 .iH Full translator .PP The following grammar accepts programs in \fIsc\fP and translates them into stack code. .cB #header <<#include "sym.h" #include "charbuf.h" >> #token "[\et\e ]+" << zzskip(); >> /* Ignore White */ #token "\en" << zzline++; zzskip(); >> #token STRING "\e"(~[\e0-\e0x1f\e"\e\e]|(\e\e~[\e0-\e0x1f]))*\e"" <<;>> << #define HashTableSize 999 #define StringTableSize 5000 #define GLOBAL 0 #define PARAMETER 1 #define LOCAL 2 static Sym *globals = NULL; /* global scope for symbols */ static int current_local_var_offset, LabelNum=0; .cE .cB main() { zzs_init(HashTableSize, StringTableSize); ANTLR(p(), stdin); } pScope(p) Sym *p; { for (; p!=NULL; p=p->scope) { printf("\etlevel %d | %-12s | %-15s\en", p->level, zztokens[p->token], p->symbol); } } >> .cE .cB p : <<Sym *p; printf("#include \e"sc.h\e"\en"); >> ( func | "var" def[&globals, GLOBAL] ";" )* <> "@" ; def[Sym **scope, int level] : <<Sym *var;>> ( WORD <<zzs_scope($scope); var = zzs_newadd($1.text); var->level = $level; var->token = VAR; var->offset = current_local_var_offset++; switch(var->level) { case GLOBAL: printf("SCVAR %s;\en", $1.text); break; case LOCAL : printf("\etLOCAL;\en"); break; } >> | VAR <<var = zzs_get($1.text); if ( $level != var->level ) { zzs_scope($scope); var = zzs_newadd($1.text); var->level = $level; var->token = VAR; var->offset = current_local_var_offset++; switch(var-> level) { case GLOBAL: printf("\etSCVAR %s;\en", $1.text); break; case LOCAL : printf("\etLOCAL;\en"); break; } } else printf("redefined variable ignored: %s\en", $1.text); >> ) ; .cE .cB func : <<Sym *locals=NULL, *var, *p; current_local_var_offset = 0;>> WORD <<zzs_scope(&globals); var = zzs_newadd($1.text); var->level = GLOBAL; var->token = FUNC; if (strcmp("main",$1.text)) { printf("%s()\en",$1.text); } else printf("_main()\en"); >> "\e(" ( def[&locals, PARAMETER] | <<current_local_var_offset = 1;>> ) "\e)" "\e{" << printf("{\en\etBEGIN;\en"); >> ( "var" def[&locals, LOCAL] ";" )* ( statement )* "\e}" << printf("\etEND;\en}\en"); >> <> ; statement : <<int n;>> expr ";" <<printf("\etPOP;\en");>> | "\e{" ( statement )* "\e}" | "if" "\e(" expr "\e)" << n = LabelNum++; printf("\etBRF(iflabel%d);\en",n); >> statement << printf("iflabel%d: ;\en",n); >> {"else" statement} | "while" << n = LabelNum++; printf("wbegin%d: ;\en", n); >> "\e(" expr "\e)" << printf("\etBRF(wend%d);\en",n); >> statement << printf("\etBR(wbegin%d);\en", n); >> << printf("wend%d: ;\en",n); >> << n++; >> | "return" expr ";" << printf("\etRETURN;\en"); >> | "print" expr ";" << printf("\etPRINT;\en"); >> ; .cE .cB expr : <<Sym *s;>> VAR "=" expr << printf("\etDUP;\en"); >> <<s = zzs_get($1.text); if ( s->level != GLOBAL ) printf("\etLSTORE(%d);\en", s->offset); else printf("\etSTORE(%s);\en", s->symbol); >> | expr0 ; expr0 : <<char *op;>> expr1 ( ( "==" <<op="EQ";>> | "!=" <<op="NEQ";>> ) expr1 <<printf("\et%s;\en", op);>> )* ; expr1 : <<char *op;>> expr2 ( ( "\e+" <<op="ADD";>> | "\e-" <<op="SUB";>> ) expr2 <<printf("\et%s;\en", op);>> )* ; expr2 : <<char *op;>> expr3 ( ( "\e*" <<op="MUL";>> | "/" <<op="DIV";>> ) expr3 <<printf("\et%s;\en", op);>> )* ; .cE .cB expr3 : <<char *op=NULL;>> {"\e-" <<op="NEG";>>} expr4 <<if ( op!=NULL ) printf("\et%s;\en", op);>> ; expr4 : <<Sym *s; int arg;>> STRING << printf("\etSPUSH(%s);\en", $1.text); >> | VAR << s = zzs_get($1.text); if ( s->level != GLOBAL ) printf("\etLPUSH(%d);\en", s->offset); else printf("\etPUSH(%s);\en", s->symbol); >> | ( FUNC <<$0=$1;>> | WORD <<$0=$1;>> ) "\e(" { <<arg=0;>> expr <<arg=1;>> } "\e)" <<if ( !arg ) printf("\etSPUSH(\e"\e");\en"); printf("\etCALL(%s);\en", $1.text);>> | "\e(" expr "\e)" ; #token WORD "[a-zA-Z]+" <<{ Sym *p = zzs_get(zzlextext); if ( p != NULL ) NLA = p->token; }>> .cE .NH 2 .iH Makefile for Full Translator .PP The makefile is no different from the one used for the symbol table management version of the tutorial except that we need to change the \f(CWGRM\fP make variable as follows: .cB GRM=tut3 .cE .LP which implies that \f(CWtut3.g\fP is the file containing the third revision of our \fIsc\fP translator. .NH 2 .iH Use of translator .PP Once the translator has been made using the makefile, it is ready to translate \fIsc\fP programs. To translate and execute the factorial example from above, we execute the following: .cB tut3 < fact.c > temp.c .cE where \f(CWfact.c\fP is the file containing the factorial code. \f(CWtemp.c\fP will contain the translated program. It is valid C code because it is nothing more than a bunch of macro invocations (the macros are defined in \f(CWsc.h\fP). We can compile \f(CWtemp.c\fP like any other C program: .cB cc temp.c .cE .LP To execute the program, we type: .cB a.out \fIn\fP .cE .LP where \fIn\fP is the number you want the factorial of. Typing: .cB a.out 8 .cE .LP yields: .cB 40320.000000 .cE .NH 2 .iH Stack code macros\ \(em \f(CWsc.h\fP .PP The following file is included by any C program using the stack code instructions outlined above. .cB /* sc.h -- string C stack machine macros * * For use with PCCTS advanced tutorial version 1.0x */ #include <stdio.h> #include <ctype.h> #include <math.h> /* * The function invocation stack looks like: * * | . | * | . | * | arg | fp + 0 arg is "" if none specified * | 1st local | fp - 1 * | 2nd local | fp - 2 * | ... | * | . | * | . | */ .cE .cB #define STR_SIZE 100 #define STK_SIZE 200 /* define stack */ typedef struct { char text[STR_SIZE]; } SCVAR; static SCVAR stack[STK_SIZE]; static int sp = STK_SIZE, fp; /* number begins with number or '.' followed by number. All numbers * are converted to floats before comparison. */ #define SCnumber(a) (isdigit(a[0]) || (a[0]=='.' && isdigit(a[1]))) .cE .cB #define TOS stack[sp].text #define NTOS stack[sp+1].text #define TOSTRUE ((strcmp(TOS, "true")==0)||(strcmp(TOS, "1")==0) \e ||(SCnumber(TOS)&&atof(TOS)==1.0) ) #define TOSFALSE ((strcmp(TOS, "false")==0)||(strcmp(TOS, "0")==0) \e ||(SCnumber(TOS)&&atof(TOS)==0.0) ) #define PUSH(a) {if ( sp==0 ) {fprintf(stderr, "stk ovf!\en"); exit(-1);} \e strcpy(stack[--sp].text, (a).text);} #define SPUSH(a) {if ( sp==0 ) {fprintf(stderr, "stk ovf!\en"); exit(-1);} \e strcpy(stack[--sp].text, a);} #define LPUSH(a) {if ( sp==0 ) {fprintf(stderr, "stk ovf!\en"); exit(-1);} \e stack[--sp] = stack[fp-a];} .cE .cB #define CALL(f) {f();} #define POP stack[sp++] #define LOCAL {if ( sp==0 ) {fprintf(stderr, "stk ovf!\en"); exit(-1);} \e stack[--sp].text[0] = '\e0';} .cE .cB #define BRF(lab) if ( TOSFALSE ) {POP; goto lab;} else POP; #define BRT(lab) if ( TOSTRUE ) {POP; goto lab;} else POP; #define BR(lab) goto lab #define PRINT printf("%s", POP.text) #define RETURN {strcpy(stack[fp].text, TOS); END; return;} #define STORE(v) {v = POP;} #define LSTORE(off) {stack[fp-off] = POP;} .cE .cB /* operators */ #define EQ {char *a,*b; float c,d; \e a = POP.text; b = POP.text; \e if ( SCnumber(a) && SCnumber(b) ) { \e c=atof(a); d=atof(b); \e if ( c == d ) {SPUSH("true");} \e else SPUSH("false"); \e } \e else if ( strcmp(a, b)==0 ) {SPUSH("true");} \e else SPUSH("false");} #define NEQ {SCVAR a,b; float c,d; \e a = POP.text; b = POP.text; \e if ( SCnumber(a) && SCnumber(b) ) { \e c=atof(a); d=atof(b); \e if ( c != d ) {SPUSH("true");} \e else SPUSH("false"); \e } \e else if ( strcmp(a, b)!=0 ) {SPUSH("true");} \e else SPUSH("false");} #define ADD {SCVAR c; float a,b; \e if ( !SCnumber(NTOS) || !SCnumber(TOS) ) { \e strncat(NTOS, TOS, STR_SIZE - strlen(NTOS)); \e sp++; \e } else { \e a=atof(POP.text); b=atof(POP.text); \e sprintf(c.text, "%f", a+b); PUSH(c); \e }} #define SUB {SCVAR c; float a,b; a=atof(POP.text); b=atof(POP.text); \e sprintf(c.text, "%f", b-a); PUSH(c);} #define MUL {SCVAR c; float a,b; a=atof(POP.text); b=atof(POP.text); \e sprintf(c.text, "%f", a*b); PUSH(c);} #define DIV {SCVAR c; float a,b; a=atof(POP.text); b=atof(POP.text); \e sprintf(c.text, "%f", b/a); PUSH(c);} #define NEG {SCVAR c; float a; a=atof(POP.text); \e sprintf(c.text, "%f", -a); PUSH(c);} #define DUP {if ( sp==0 ) {fprintf(stderr, "stk ovf!\en"); exit(-1);} \e stack[sp-1] = stack[sp]; --sp;} .cE .cB #define BEGIN int save_fp = fp; fp = sp #define END sp = fp; fp = save_fp; main(argc, argv) int argc; char *argv[]; { if ( argc == 2 ) {SPUSH(argv[1]);} else SPUSH(""); CALL(_main); POP; } .cE .NH .iH Intermediate form construction .PP This section describes how trees can be used as an intermediate form of the \fIsc\fP source program and how our \f(CWtut2.g\fP symbol table management example can be modified to construct these trees. Example tree constructions are given as well as the complete source. Note that this section and the code generation section essentially duplicate the translation achieved in the previous section on stack code. We arrive at the solution from a different angle, however. This method is much more general and would be used for \*Qreal\*U grammars. .NH 2 .iH Tree Construction .PP To construct an intermediate form (IF) representation of an \fIsc\fP program, we will construct trees using the abstract syntax tree (AST) mechanism provided with PCCTS. PCCTS supports an automatic and an explicit tree construction mechanism; we will use both. We will augment the grammar that has the symbol table management actions\ \(em \f(CWtut2.g\fP. .PP In order to use the AST mechanism within PCCTS, we must do the following: .IP \(bu Define the AST fields, \f(CWAST_FIELDS\fP, needed by the user. .IP \(bu Define the default AST node constructor, \f(CWzzcr_ast()\fP, that converts from a lexeme to an AST node. .IP \(bu Define an explicit AST node constructor\ \(em \f(CWzzmk_ast()\fP (because we use the explicit tree constructor mechanism also). .PP Our tree nodes will contain a token number to identify the node. This token number can also be a value for which there is no corresponding lexical item (e.g. a function call may have a node called \f(CWDefineVar\fP). If the token indicates that the node represents a variable or function, we must have information as to its level, offset from the frame pointer (if a local or parameter) and the string representing the name of the item. This information can all be held via: .cB #define AST_FIELDS int token, level, offset; char str[D_TextSize]; .cE .LP which will be added to the \f(CW#header\fP PCCTS pseudo-op action. Note that \f(CWD_TextSize\fP is defined in \f(CWcharbuf\fP.h. .PP To tell PCCTS how it should build AST nodes for use with the automatic tree construction mechanism, we define the following macro in the \f(CW#header\fP action: .cB #define zzcr_ast(ast,attr,tok,text) create_ast(ast,attr,tok,text) .cE .LP which merely calls a function we define as follows: .cB create_ast(ast,attr,tok,text) AST *ast; Attrib *attr; int tok; char *text; { Sym *s; ast->token = tok; if ( tok == VAR ) { s = zzs_get(text); ast->level = s->level; ast->offset = s->offset; } if ( tok == STRING || tok == VAR || tok == FUNC ) strcpy(ast->str, text); } .cE .LP Because of the symbol table lookup etc... we make the node constructor a function rather than have the code replicated at each invocation of the \f(CWzzcr_ast\fP macro. .PP When we explicitly construct trees, the automatic node constructor is generally disabled and we must explicitly call a function to create an AST node: .cB AST * zzmk_ast(node, token, str) AST *node; int token; char *str; { Sym *s; node->token = token; if ( token == VAR ) { s = zzs_get(str); node->level = s->level; node->offset = s->offset; } if ( tok == STRING || tok == VAR || tok == FUNC ) strcpy(node->str, str); return node; } .cE This function will be invoked when we have a grammar action with a reference of the form \f(CW#[\fItoken, string\fP]\fR. .PP In order to print out the trees that we have defined, let us define a simple function to do a preorder, lisp-like walk of our trees: .cB lisp(tree) AST *tree; { while ( tree!= NULL ) { if ( tree->down != NULL ) printf(" ("); if ( tree->token == STRING || tree->token == VAR || tree->token == FUNC ) printf(" %s", tree->str); else printf(" %s", zztokens[tree->token]); lisp(tree->down); if ( tree->down != NULL ) printf(" )"); tree = tree->right; } } .cE .PP PCCTS automatically assumes that all rules will produce trees and that all terminals within rules will result in nodes added to the current tree. In the case of rules \f(CWp\fP, \f(CWexpr4\fP, \f(CWstatement\fP and \f(CWfunc\fP, this is not true. We will explicitly handle the trees within these rules. Therefore, we append a \f(CW!\fP after the definition of these rules. We need to decide how each \fIsc\fP construct will be translated to a subtree and when we will pass these trees off to the code generator. We will collect all subtrees within a function and then, just before we remove the symbols for that function, we will pass this tree off to \f(CWlisp\fP to be printed; we will pass it off to the code generator when we define it in the next section. After it has been printed, the function trees will be destroyed. For global variable definitions, we will pass each one individually off to \f(CWlisp\fP and then destroy the tree. .PP Before modifying the rules of the grammar, we must define labels for a number of regular expressions so that the token value of that expression can be referenced from within an action. We also define labels for tokens which do not exist physically in a program, but are needed for code generation. .cB /* Not actual terminals, just node identifiers */ #token DefineFunc #token SLIST .cE .cB /* Define tokens for code generation */ #token DefineVar "var" #token Mul "\*" #token Div "/" #token Add "\+" #token Sub "\-" #token Equal "==" #token NotEqual "!=" #token If "if" #token While "while" #token Return "return" #token Print "print" #token Assign "=" .cE .PP All PCCTS grammars that use the AST mechanism need to pass the address of a \f(CWNULL\fP pointer to the starting rule. .cB main() { AST *root=NULL; zzs_init(HashTableSize, StringTableSize); ANTLR(p(&root), stdin); } .cE .PP Rule \f(CWp\fP is modified as follows: .cB p! : <<Sym *p; AST *v;>> ( func | "var" def[&globals, GLOBAL] ";" <<v = #(#[DefineVar], #2); lisp(v); printf("\en"); zzfree_ast(v); >> )* <> "@" ; .cE .LP The \f(CW#[DefineVar]\fP reference calls our \f(CWzzmk_ast\fP macro to create an AST node whose token is set to \f(CWDefineVar\fP. The reference to \f(CW#(#[DefineVar], #2)\fP is a tree constructor which makes the \f(CWDefineVar\fP node the parent of the tree returned by rule \f(CWdef\fP\ \(em \f(CW#2\fP. In general, the tree constructor has the form: .cB #( \fIroot\fP, \fIsibling1\fP, \fIsibling2\fP, ..., \fIsiblingN\fP ) .cE .LP where any \f(CWNULL\fP pointer terminates the list except for the root pointer. \f(CW#2\fP refers to the second AST node or tree within an alternative\ \(em \f(CWdef\fP in our case. Rule \f(CWdef\fP itself does not change because PCCTS automatically creates a node for the \f(CWWORD\fP or \f(CWVAR\fP and passes it back to the invoking rule. .PP Trees for functions are constructed in the following form: .cB DefineFunc | v FUNC -> DefineVar -> DefineVar -> SLIST | | | v | v optional_arg | stat1 -> ... > statn v local1 -> ... -> localn .cE .LP where \f(CWDefineFunc\fP and \f(CWDefineVar\fP are dummy tokens used only in trees (as opposed to the grammar). Rule \f(CWfunc\fP is modified to build these trees: .cB func! : <<Sym *locals=NULL, *var, *p; AST *f,*parm=NULL,*v=NULL,*s=NULL; current_local_var_offset = 0;>> WORD <<zzs_scope(&globals); var = zzs_newadd($1.text); var->level = GLOBAL; var->token = FUNC; >> "\e(" ( def[&locals, PARAMETER] <<parm=#1;>> | <<current_local_var_offset = 1;>> ) "\e)" "\e{" ( "var" def[&locals, LOCAL] <<if ( v==NULL ) v = #2; else v = #(NULL,v,#2);>> ";" )* ( statement <<if ( s==NULL ) s = #1; else s = #(NULL,s,#1);>> )* "\e}" <<s = #(#[SLIST], s); v = #(#[DefineVar], v); parm = #(#[DefineVar], parm); f = #(#[DefineFunc], #[FUNC,$1.text], parm, v, s); lisp(f); printf("\een"); zzfree_ast(f); p = zzs_rmscope(&locals);>> ; .cE .LP Variable \f(CWparm\fP is set to the AST node created by \f(CWdef\fP if a parameter is found else it is \f(CWNULL\fP. Variable \f(CWv\fP is used to maintain a list of local variables. The tree constructor .cB v = #(NULL,v,#2); .cE .LP says to make a new tree with no root and with siblings \f(CWv\fP and \f(CW#2\fP\ \(em which effectively links \f(CW#2\fP to the end of the current list of variables. Variable \f(CWs\fP behaves in a similar fashion. After the list of variables and statements have been accumulated, we add a dummy node (\f(CWDefineVar\fP, \f(CWSLIST\fP) as a root to each list so that we get the structure given above. To examine the trees we have constructed, we make a call to \f(CWlisp\fP and then destroy the tree. .PP Even though it is not necessary to do so, we will build trees for \f(CWstatement\fP explicitly (in general, the automatic mechanism is best for expressions with simple operator/operand relationships). .cB statement!: <<AST *s=NULL, *el=NULL;>> expr ";" <<#0 = #1;>> | "\e{" ( statement <<if ( s==NULL ) s = #1; else s = #(NULL,s,#1);>> )* "\e}" <<#0 = #(#[SLIST], s);>> | "if" "\e(" expr "\e)" statement {"else" statement <<el=#2;>>} <<#0 = #(#[If], #3, #5,el);>> | "while" "\e(" expr "\e)" statement <<#0 = #(#[While], #3,#5);>> | "return" expr ";" <<#0 = #(#[Return], #2);>> | "print" expr ";" <<#0 = #(#[Print], #2);>> ; .cE .LP Many of the tokens in \f(CWstatement\fP are not included in the tree because they are a notational convenience for humans or are used for grouping purposes. As before, we track lists of statements using the tree constructor: \f(CW#(NULL,s,#1)\fP. .PP The last unusual rule is \f(CWexpr\fP: .cB expr4! : <<AST *f=NULL, *arg=NULL;>> STRING <<#0 = #[STRING, $1.text];>> | VAR <<#0 = #[STRING, $1.text];>> | ( FUNC <<f = #[FUNC, $1.text];>> | WORD <<f = #[FUNC, $1.text];>> ) "\e(" { expr <<arg=#1;>> } "\e)" <<#0 = #(f, arg);>> | "\e(" expr "\e)" <<#0 = #2;>> ; .cE .LP Assigning a tree to \f(CW#0\fP makes that tree the return tree. However, it should really only be used in rules that do not use the automatic tree construction mechanism; i.e. there is a \f(CW! after\fP the rule definition. For a function call, we make a tree with a \f(CWFUNC\fP node at the root and the optional argument as the child. The third alternative merely returns what is computed by \f(CWexpr\fP; the parenthesis are mechanism for syntactic precedence and add nothing to the tree. The tree structure itself embodies the precedence. .PP The only other required modifications to the grammar are to indicate which tokens are operators (subtree roots) and which are operands in the expression rules. To indicate that a token is to be made a subtree root, we append a \f(CW^\fP. All other tokens are considered operands (children). Appending a token with \f(CW!\fP indicates that it is not to be included in the tree. For example, to make trees like: .cB operator | v opnd1 -> opnd2 .cE .LP we modify the operators in \f(CWexpr\fP through \f(CWexpr3\fP in a fashion similar to: .cB expr1 : expr2 ( ( "\e+"^ | "\e-"^ ) expr2 )* ; .cE .LP and .cB expr3 : { "\e-"^ } expr4 ; .cE .NH 2 .iH Full Grammar to Construct Trees .PP The following grammar constructs AST's as an intermediate form (IF) of an \fIsc\fP program and prints out the IF in lisp-like preorder. .cB #header <<#include "sym.h" #include "charbuf.h" #define AST_FIELDS int token, level, offset; char str[D_TextSize]; #define zzcr_ast(ast,attr,tok,text) create_ast(ast,attr,tok,text) #define DONT_CARE 0 #define SIDE_EFFECTS 1 #define VALUE 2 >> #token "[\et\e ]+" << zzskip(); >> /* Ignore White */ #token "\en" << zzline++; zzskip(); >> #token STRING "\e"(~[\e0-\e0x1f\e"\e\e]|(\e\e~[\e0-\e0x1f]))*\e"" <<;>> /* Not actual terminals, just node identifiers */ #token DefineFunc #token SLIST .cE .cB /* Define tokens for code generation */ #token DefineVar "var" #token Mul "\e*" #token Div "/" #token Add "\e+" #token Sub "\e-" #token Equal "==" #token NotEqual "!=" #token If "if" #token While "while" #token Return "return" #token Print "print" #token Assign "=" <<;>> << #define HashTableSize 999 #define StringTableSize 5000 #define GLOBAL 0 #define PARAMETER 1 #define LOCAL 2 .cE .cB static Sym *globals = NULL; /* global scope for symbols */ static int current_local_var_offset=0; create_ast(ast,attr,tok,text) AST *ast; Attrib *attr; int tok; char *text; { Sym *s; ast->token = tok; if ( tok == VAR ) { s = zzs_get(text); ast->level = s->level; ast->offset = s->offset; } if ( tok == STRING || tok == VAR || tok == FUNC ) strcpy(ast->str, text); } .cE .cB AST * zzmk_ast(node, token, str) AST *node; int token; char *str; { Sym *s; node->token = token; if ( token == VAR ) { s = zzs_get(str); node->level = s->level; node->offset = s->offset; } if ( token == STRING || token == VAR || token == FUNC ) strcpy(node->str, str); return node; } .cE .cB lisp(tree) AST *tree; { while ( tree!= NULL ) { if ( tree->down != NULL ) printf(" ("); if ( tree->token == STRING || tree->token == VAR || tree->token == FUNC ) printf(" %s", tree->str); else printf(" %s", zztokens[tree->token]); lisp(tree->down); if ( tree->down != NULL ) printf(" )"); tree = tree->right; } } main() { AST *root=NULL; zzs_init(HashTableSize, StringTableSize); ANTLR(p(&root), stdin); } .cE .cB pScope(p) Sym *p; { for (; p!=NULL; p=p->scope) { printf("\etlevel %d | %-12s | %-15s\en", p->level, zztokens[p->token], p->symbol); } } >> p! : <<Sym *p; AST *v;>> ( func | "var" def[&globals, GLOBAL] ";" <<v = #(#[DefineVar], #2); gen(v,DONT_CARE); printf("\en"); zzfree_ast(v); >> )* <> "@" ; .cE .cB def[Sym **scope, int level] : <<Sym *var;>> ( WORD <<zzs_scope($scope); var = zzs_newadd($1.text); var->level = $level; var->token = VAR; var->offset = current_local_var_offset++; >> | VAR <<var = zzs_get($1.text); if ( $level != var->level ) { zzs_scope($scope); var = zzs_newadd($1.text); var->level = $level; var->token = VAR; var->offset = current_local_var_offset++; } else printf("redefined variable ignored: %s\en", $1.text); >> ) ; func! : <<Sym *locals=NULL, *var, *p; AST *f,*parm=NULL,*v=NULL,*s=NULL; current_local_var_offset = 0;>> WORD <<zzs_scope(&globals); var = zzs_newadd($1.text); var->level = GLOBAL; var->token = FUNC; >> "\e(" ( def[&locals, PARAMETER] <<parm=#1;>> | <<current_local_var_offset = 1;>> ) "\e)" "\e{" ( "var" def[&locals, LOCAL] <<if ( v==NULL ) v = #2; else v = #(NULL,v,#2);>> ";" )* ( statement <<if ( s==NULL ) s = #1; else s = #(NULL,s,#1);>> )* "\e}" <<s = #(#[SLIST], s); v = #(#[DefineVar], v); parm = #(#[DefineVar], parm); f = #(#[DefineFunc], #[FUNC,$1.text], parm, v, s); gen(f,DONT_CARE); printf("\en"); zzfree_ast(f); p = zzs_rmscope(&locals);>> ; .cE .cB statement!: <<AST *s=NULL, *el=NULL;>> expr ";" <<#0 = #1;>> | "\e{" ( statement <<if ( s==NULL ) s = #1; else s = #(NULL,s,#1);>> )* "\e}" <<#0 = #(#[SLIST], s);>> | "if" "\e(" expr "\e)" statement {"else" statement <<el=#2;>>} <<#0 = #(#[If], #3, #5, el);>> | "while" "\e(" expr "\e)" statement <<#0 = #(#[While], #3, #5);>> | "return" expr ";" <<#0 = #(#[Return], #2);>> | "print" expr ";" <<#0 = #(#[Print], #2);>> ; expr : VAR "="^ expr | expr0 ; expr0 : expr1 ( ( "=="^ | "!="^ ) expr1 )* ; expr1 : expr2 ( ( "\e+"^ | "\e-"^ ) expr2 )* ; expr2 : expr3 ( ( "\e*"^ | "/"^ ) expr3 )* ; .cE .cB expr3 : { "\e-"^ } expr4 ; expr4! : <<AST *f=NULL, *arg=NULL;>> STRING <<#0 = #[STRING, $1.text];>> | VAR <<#0 = #[VAR, $1.text];>> | ( FUNC <<f = #[FUNC, $1.text];>> | WORD <<f = #[FUNC, $1.text];>> ) "\e(" { expr <<arg=#1;>> } "\e)" <<#0 = #(f, arg);>> | "\e(" expr "\e)" <<#0 = #2;>> ; #token WORD "[a-zA-Z]+" <<{ Sym *p = zzs_get(LATEXT(1)); if ( p != NULL ) NLA = p->token; }>> .cE .NH 2 .iH Example translations .PP To illustrate how functions get translated, consider the following \fIsc\fP program: .cB var a; main(n) { var b; } .cE .LP Our translator would create trees represented by: .cB ( DefineVar WORD ) ( DefineFunc FUNC ( DefineVar WORD ) ( DefineVar WORD ) SLIST ) .cE .LP where \f(CWWORD\fP represents the variables found on the input stream. \f(CWSLIST\fP has no child because there were no statements in the function. .PP Some example statement transformations follow. .LP \f(CWif\fP: .cB if ( b == "hello" ) print n; .cE .LP translates to .cB ( If ( Equal b "hello" ) ( Print n ) ) .cE .LP \f(CWwhile\fP: .cB while ( i != "10" ) { print i; i = i+"1"; } .cE .LP translates to .cB ( While ( NotEqual i "10" ) ( SLIST ( Print i ) ( = i ( Add i "1" ) ) ) ) .cE .PP Expressions have the operator at the root and the operands as children. For example, .cB i = a + "3" * "9" + f("jim"); .cE .LP would be translated to: .cB ( = i ( Add ( Add a ( Mul "3" "9" ) ) ( f "jim" ) ) ) .cE .LP which looks like: .cB = | v i -> Add | v Add --------------> f | | v v a -> Mul "jim" | v "3" -> "9" .cE .NH .iH Code generation from intermediate form .PP To translate the intermediate form (IF) to the stack code described above, we present the following simple little code generator. We use no devious C programming to make this fast or nifty because this document concentrates on how PCCTS grammars are developed not on how to write spectacular C code. .NH 2 .iH Modification of tree construction grammar .PP To convert from the translator above that printed out trees in lisp format, we add the following definitions needed as arguments to \f(CWgen()\fP in the \f(CW#header\fP action. .cB #define DONT_CARE 0 #define SIDE_EFFECTS 1 #define VALUE 2 .cE .PP Also, we replace all calls to \f(CWlisp(\fItree\fP)\fR with \f(CWgen(\fItree\fP, DONT_CARE)\fP. .NH 2 .iH Code generator .PP Function \f(CWgen()\fP presented below accepts an AST as an argument and walks the tree generating output when necessary. You will notice all the header information needed by this file, \f(CWcode.c\fP. This illustrates one of the minor inconveniences of PCCTS. Any C file that needs to access PCCTS variables, rules, etc... must include all of the things that PCCTS generates at the head of all PCCTS generated files. .PP The code generator accepts an evaluation mode argument so that it can remove some unnecessary code. For example, .cB main() { "3" + f(); } f() {} .cE .LP would result in .cB #include "sc.h" _main() { BEGIN; CALL(f); POP; END; } f() { BEGIN; END; } .cE .LP Code to push and add \f(CW"3"\fP was eliminated because it had no side effects. Only function calls have side effects in \fIsc\fP. .cB /* * Simple code generator for PCCTS 1.0x tutorial * * Terence Parr * Purdue University * Electrical Engineering * March 19, 1992 */ #include <stdio.h> #include "sym.h" #include "charbuf.h" #define AST_FIELDS int token, level, offset; char str[D_TextSize]; #define zzcr_ast(ast,attr,tok,text) create_ast(ast,attr,tok,text) #include "antlr.h" #include "ast.h" #include "tokens.h" #include "dlgdef.h" #include "mode.h" #define GLOBAL 0 #define PARAMETER 1 #define LOCAL 2 .cE .cB static int LabelNum=0, GenHeader=0; #define DONT_CARE 0 #define SIDE_EFFECTS 1 #define VALUE 2 .cE .cB gen(t,emode) AST *t; int emode; /* evaluation mode member { SIDE_EFFECTS, VALUE, DONT_CARE } */ { AST *func, *arg, *locals, *slist, *a, *opnd1, *opnd2, *lhs, *rhs; int n; .cE .cB if ( t == NULL ) return; if ( !GenHeader ) { printf("#include \e"sc.h\e"\en"); GenHeader=1; } switch ( t->token ) { .cE .cB case DefineFunc : func = zzchild(t); arg = zzchild( zzsibling(func) ); locals = zzchild( zzsibling( zzsibling(func) ) ); slist = zzsibling( zzsibling( zzsibling(func) ) ); if ( strcmp(func->str, "main") == 0 ) printf("_%s()\en{\en\etBEGIN;\en", func->str); else printf("%s()\en{\en\etBEGIN;\en", func->str); for (a=locals; a!=NULL; a = zzsibling(a)) printf("\etLOCAL;\en"); gen( slist, DONT_CARE ); printf("\etEND;\en}\en"); break; .cE .cB case SLIST : for (a=zzchild(t); a!=NULL; a = zzsibling(a)) { gen( a, EvalMode(a->token) ); if ( a->token == Assign ) printf("\etPOP;\en"); } break; .cE .cB case DefineVar : printf("SCVAR %s;\en", zzchild(t)->str); break; .cE .cB case Mul : case Div : case Add : case Sub : case Equal : case NotEqual : opnd1 = zzchild(t); opnd2 = zzsibling( opnd1 ); gen( opnd1, emode ); gen( opnd2, emode ); if ( emode == SIDE_EFFECTS ) break; switch ( t->token ) { case Mul : printf("\etMUL;\en"); break; case Div : printf("\etDIV;\en"); break; case Add : printf("\etADD;\en"); break; case Sub : if ( opnd2==NULL ) printf("\etNEG;\en"); else printf("\etSUB;\en"); break; case Equal : printf("\etEQ;\en"); break; case NotEqual : printf("\etNEQ;\en"); break; } break; .cE .cB case If : a = zzchild(t); gen( a, VALUE ); /* gen code for expr */ n = LabelNum++; printf("\etBRF(iflabel%d);\en", n); a = zzsibling(a); gen( a, EvalMode(a->token) ); /* gen code for statement */ printf("iflabel%d: ;\en", n); break; .cE .cB case While : a = zzchild(t); n = LabelNum++; printf("wbegin%d: ;\en", n); gen( a, VALUE ); /* gen code for expr */ printf("\etBRF(wend%d);\en", n); a = zzsibling(a); gen( a, EvalMode(a->token) ); /* gen code for statement */ printf("\etBR(wbegin%d);\en", n); printf("wend%d: ;\en", n); break; .cE .cB case Return : gen( zzchild(t), VALUE ); printf("\etRETURN;\en"); break; .cE .cB case Print : gen( zzchild(t), VALUE ); printf("\etPRINT;\en"); break; .cE .cB case Assign : lhs = zzchild(t); rhs = zzsibling( lhs ); gen( rhs, emode ); printf("\etDUP;\en"); if ( lhs->level == GLOBAL ) printf("\etSTORE(%s);\en", lhs->str); else printf("\etLSTORE(%d);\en", lhs->offset); break; .cE .cB case VAR : if ( emode == SIDE_EFFECTS ) break; if ( t->level == GLOBAL ) printf("\etPUSH(%s);\en", t->str); else printf("\etLPUSH(%d);\en", t->offset); break; .cE .cB case FUNC : gen( zzchild(t), VALUE ); printf("\etCALL(%s);\en", t->str); break; .cE .cB case STRING : if ( emode == SIDE_EFFECTS ) break; printf("\etSPUSH(%s);\en", t->str); break; .cE .cB default : printf("Don't know how to handle: %s\en", zztokens[t->token]); break; } } .cE .cB EvalMode( tok ) int tok; { if ( tok == Assign || tok == If || tok == While || tok == Print || tok == Return ) return VALUE; else return SIDE_EFFECTS; } .cE .NH 2 .iH Makefile for code generator .PP This makefile is roughly the same as before except that we need to tell PCCTS to generate tree information and we need to link in the code generator. .cB CFLAGS= -I../h GRM=tut4 SRC=scan.c $(GRM).c err.c sym.c code.c OBJ=scan.o $(GRM).o err.o sym.o code.o tutorial: $(OBJ) $(SRC) cc -o $(GRM) $(OBJ) $(GRM).c parser.dlg : $(GRM).g antlr -k 2 -gt $(GRM).g scan.c : parser.dlg dlg -C2 parser.dlg scan.c .cE .pn 0 .bp .PX