- INTERCAL IMPLEMENTOR'S NOTES
- by ESR
-
- The C-INTERCAL compiler has a very conventional implementation using
- YACC and LEX (latterly, bison and flex). It generates C, which
- is then passed to your C compiler.
-
- Lexical issues
-
- Note that the spectacular ugliness of INTERCAL syntax requires that
- the lexical analyzer have two levels. One, embedded in the input()
- function, handles the backquote and bang constructs, and stashes the
- input line away in a buffer for the splat construct's benefit. The
- upper level is generated by lex(1) and does normal tokenizing for YACC.
-
- In order to splatter erroneous statements correctly, the generated code has
- to track both logical and physical lines. That's the reason for the
- `lineno' variable in the generated C; it actually tracks physical lines.
-
- Numeral tokens for input are defined in a symbol table (numerals.c)
- that is directly included in the run-time library module (cesspool.c).
- This avoids having to put the size of the numerals array in an
- extern. To add new numeral tokens, simply put them in the numerals
- initializer.
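-
- A minimal sketch of why direct inclusion helps, assuming a simple
- name/value table. The struct layout, the entries shown, and the
- numeral_value() helper are illustrative guesses, not the contents of
- numerals.c; in the arrangement described above the initializer would be
- pulled in with #include rather than written inline.
-
```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table layout: one entry per spelled-out digit. */
typedef struct {
    const char *name;  /* token as it appears on input */
    int value;         /* digit it stands for */
} numeral;

/* In the real layout this initializer would come from the included file. */
static const numeral numerals[] = {
    {"OH", 0},   {"ZERO", 0}, {"ONE", 1},   {"TWO", 2},
    {"THREE", 3}, {"FOUR", 4}, {"FIVE", 5},  {"SIX", 6},
    {"SEVEN", 7}, {"EIGHT", 8}, {"NINE", 9}, {"NINER", 9},
};

/* The array is defined in this translation unit, so sizeof sees its full
 * type and no extern size variable is needed. */
#define NUMERAL_COUNT (sizeof numerals / sizeof numerals[0])

/* Return the digit for a numeral token, or -1 if it is not in the table. */
int numeral_value(const char *token)
{
    size_t i;
    for (i = 0; i < NUMERAL_COUNT; i++)
        if (strcmp(numerals[i].name, token) == 0)
            return numerals[i].value;
    return -1;
}
```
-
- Adding a token is then a one-line change to the initializer; the count
- follows automatically.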
-
- Compilation
-
- The parser builds an array of tuples, one for each INTERCAL statement. Most
- tuples have node trees attached. Once all tuples have been generated,
- the compile-time checker and optimizer phases can do consistency checks
- and expression-tree rewrites. Finally, the tuples are ground out as C code
- by the emit() function.
-
- Calculations are fully type-checked at compile time; they have to be, because
- (as I read the manual) the 16- and 32-bit versions of the unary ops do
- different things. The only potential problem here is that the typechecker
- has to assume that :m ~ :n has the type of :n (32-bit) even though the
- result might fit in 16 bits. At run-time everything is calculated in 32
- bits. When INTERCAL-72 was designed, 32 bits was expensive; now it's cheap.
- Really, the only reason for retaining a 16-bit type at all is for the
- irritation value of it (yes, C-INTERCAL *does* enforce the 16-bit limit
- on constants).
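-
- To see why the operand width matters, here is a sketch of the unary
- operators written as "combine each bit with its left neighbor, wrapping
- around at the top", which is how the manual defines them. The function
- names and the explicit width parameter are assumptions made for
- illustration; this is not the code in fiddle.o.
-
```c
#include <stdint.h>

/* All-ones mask for a 16- or 32-bit operand. */
uint32_t width_mask(int width)
{
    return (width == 32) ? 0xFFFFFFFFu : 0xFFFFu;
}

/* Rotate x right by one bit within a field of `width` bits. */
uint32_t rotr1(uint32_t x, int width)
{
    x &= width_mask(width);
    return (x >> 1) | ((x & 1u) << (width - 1));
}

/* Unary AND, OR and XOR: each operand bit is combined with its neighbor;
 * the wraparound bit lands in bit 15 or bit 31 depending on the width. */
uint32_t unary_and(uint32_t x, int width)
{
    return (x & width_mask(width)) & rotr1(x, width);
}

uint32_t unary_or(uint32_t x, int width)
{
    return (x & width_mask(width)) | rotr1(x, width);
}

uint32_t unary_xor(uint32_t x, int width)
{
    return (x & width_mask(width)) ^ rotr1(x, width);
}
```
-
- With the value 77, unary_or(77, 16) sets bit 15 while unary_or(77, 32)
- sets bit 31, so the compiler has to know the operand type before it can
- pick the right operation.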
-
- Labels are mapped to tuple indices (logical line numbers) in the code
- checker, just before optimization.
-
- The optimizer does full recursive folding of all constant expressions
- at compile time (evaluating away all the irritating little kluges you
- have to use to generate 32-bit constants). It also checks for known
- INTERCAL idioms for `test for equality', `test for nonzeroness', and
- the C bitwise operators &, |, ^, and ~.
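-
- Recursive folding can be sketched roughly as follows, assuming a small
- expression tree with only constants, variables, mingle and select. The
- node layout and every name here are hypothetical, not the compiler's own
- data structures. demo_fold() builds (#65535$#65535)~(#0$#65535) and folds
- it to a single constant.
-
```c
#include <stddef.h>
#include <stdint.h>

typedef enum { N_CONST, N_VAR, N_MINGLE, N_SELECT } node_kind;

typedef struct node {
    node_kind kind;
    uint32_t value;             /* meaningful only for N_CONST */
    struct node *left, *right;  /* operands of N_MINGLE and N_SELECT */
} node;

/* Mingle: interleave two 16-bit values; each bit of `a` takes the more
 * significant position of its pair. */
uint32_t mingle(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    int i;
    for (i = 15; i >= 0; i--) {
        r = (r << 1) | ((a >> i) & 1u);
        r = (r << 1) | ((b >> i) & 1u);
    }
    return r;
}

/* Select: keep the bits of `a` where `b` has a 1, packed to the right. */
uint32_t select_bits(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    int i, out = 0;
    for (i = 0; i < 32; i++)
        if ((b >> i) & 1u) {
            r |= ((a >> i) & 1u) << out;
            out++;
        }
    return r;
}

/* Fold constant subtrees in place, bottom-up.
 * Returns 1 if `n` is a constant afterwards, 0 otherwise. */
int fold(node *n)
{
    int l, r;
    if (n->kind == N_CONST) return 1;
    if (n->kind == N_VAR) return 0;
    l = fold(n->left);
    r = fold(n->right);
    if (!(l && r)) return 0;
    n->value = (n->kind == N_MINGLE)
        ? mingle(n->left->value, n->right->value)
        : select_bits(n->left->value, n->right->value);
    n->kind = N_CONST;
    n->left = n->right = NULL;
    return 1;
}

/* Build (#65535$#65535)~(#0$#65535) and fold it. */
uint32_t demo_fold(void)
{
    node a = { N_CONST, 65535u, NULL, NULL };
    node b = { N_CONST, 65535u, NULL, NULL };
    node c = { N_CONST, 0u, NULL, NULL };
    node d = { N_CONST, 65535u, NULL, NULL };
    node lhs = { N_MINGLE, 0u, &a, &b };
    node rhs = { N_MINGLE, 0u, &c, &d };
    node top = { N_SELECT, 0u, &lhs, &rhs };
    return fold(&top) ? top.value : 0u;
}
```
-
- A subtree containing a variable is left alone, but its constant children
- are still folded on the way down.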
-
- Code Generation
-
- Each line of INTERCAL is translated into a C if()-then; the guard part
- is used to implement abstentions and RESUMES, and the arm part
- translates the `body' of the corresponding INTERCAL statement.
-
- The generated C code is plugged into the template file ick-wrap.c
- inside main(). It needs to be linked with cesspool.o, fiddle.o and
- lose.o (these are in libick.a, along with arrgghh.o, which supports
- runtime switches). The storage manager is implemented in cesspool.o;
- fiddle.o implements the INTERCAL operators; and lose.o is the
- code that generates INTERCAL's error messages. The module arrgghh.o
- parses the runtime command line arguments.
-
- The abstain[] array in the generated C is used to track line and label
- abstentions; if member i is on, the statement on line i is being
- abstained from. If gerund abstentions/reinstatements are present in
- the code, a second array recording the type of each statement is
- generated into the runtime, and used to ensure that these operations
- are translated into abstention-guard changes on all appropriate line numbers.
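-
- The following sketch shows the general shape of the guards and of a
- gerund abstention, assuming a four-statement program. The stmt_types[]
- table, the enum, and the helper names are invented for the example (only
- abstain[] is named in these notes), and statements are numbered from 0
- here for brevity.
-
```c
#define NSTMTS 4

/* Hypothetical statement categories a gerund can name. */
enum stmt_type { T_ASSIGN, T_NEXT, T_READ_OUT };

/* One flag per statement; nonzero means "currently abstained from". */
int abstain[NSTMTS];

/* Second table, needed only when gerund forms appear in the program. */
const int stmt_types[NSTMTS] = { T_ASSIGN, T_READ_OUT, T_ASSIGN, T_NEXT };

/* ABSTAIN FROM <gerund> (flag = 1) or REINSTATE <gerund> (flag = 0):
 * change the guard of every statement of the given type. */
void set_gerund(int type, int flag)
{
    int i;
    for (i = 0; i < NSTMTS; i++)
        if (stmt_types[i] == type)
            abstain[i] = flag;
}

/* Shape of the guarded statements; returns how many bodies ran. */
int run_once(void)
{
    int executed = 0;
    if (!abstain[0]) { executed++; /* body of statement 0 */ }
    if (!abstain[1]) { executed++; /* body of statement 1 */ }
    if (!abstain[2]) { executed++; /* body of statement 2 */ }
    if (!abstain[3]) { executed++; /* body of statement 3 */ }
    return executed;
}
```
-
- A line or label abstention needs no type table; it simply sets the one
- abstain[] entry it names.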
-
- RESUMES are implemented with a branch to a generated switch statement
- that executes a goto to the appropriate label. If there are no RESUMES,
- no such switch is generated.
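-
- As a rough illustration, here is what the control flow might look like
- for a three-statement program: (1) DO (10) NEXT, (2) DO GIVE UP, and
- (10) DO RESUME #1. The stack, the variable names, and the label names
- are assumptions made for this sketch, not the identifiers the compiler
- emits, and error checks such as stack overflow are left out.
-
```c
#define STACK_MAX 80   /* arbitrary size chosen for the sketch */

/* Order in which statements ran, recorded for inspection. */
int exec_trace[8];

/* Run the toy program; returns the number of statements executed. */
int run_next_resume(void)
{
    int next_stack[STACK_MAX];
    int depth = 0;      /* entries currently on the NEXT stack */
    int resume_to = 0;  /* which NEXT to return past */
    int n = 0;

    /* (1) DO (10) NEXT: remember where we came from, then jump. */
    exec_trace[n++] = 1;
    next_stack[depth++] = 1;
    goto L10;

after_1:
    /* (2) DO GIVE UP */
    exec_trace[n++] = 2;
    return n;

L10:
    /* (10) DO RESUME #1: pop one entry and branch to the switch. */
    exec_trace[n++] = 10;
    depth -= 1;
    resume_to = next_stack[depth];
    goto resume_switch;

resume_switch:
    /* One case per NEXT statement in the program. */
    switch (resume_to) {
    case 1: goto after_1;
    }
    return -1;  /* not reached in this toy program */
}
```
-
- The switch exists because C cannot goto a computed label; each NEXT
- contributes one case.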
-
- The compiler places a simple label at the location of each COME FROM
- in the program, while all of the machinery for checking for abstention
- and conditional execution and actually performing the jump is placed
- immediately after the code for the target statement.
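-
- A sketch of that layout for a four-statement program in which statement
- 2 is DO COME FROM (1) and statement 0 carries label (1). All names are
- hypothetical. The sketch checks only the COME FROM's own abstention flag;
- it ignores conditional (percentage) execution, and it assumes the jump is
- taken whether or not the target's own body ran.
-
```c
/* Abstention flags for the four statements of the toy program. */
int abstain_cf[4];

/* Order in which statement bodies ran, recorded for inspection. */
int cf_trace[8];

/* Run the toy program once; returns how many statement bodies ran. */
int run_come_from(void)
{
    int n = 0;

    /* Statement 0, labelled (1): the COME FROM target. */
    if (!abstain_cf[0]) { cf_trace[n++] = 0; }

    /* Machinery placed right after the target: jump to the COME FROM
     * unless the COME FROM itself (statement 2) is abstained from. */
    if (!abstain_cf[2]) goto CF2;

    /* Statement 1: reached only when the COME FROM is abstained from. */
    if (!abstain_cf[1]) { cf_trace[n++] = 1; }

CF2:
    /* Statement 2, DO COME FROM (1): nothing but the label above. */
    ;

    /* Statement 3: execution continues here after the jump. */
    if (!abstain_cf[3]) { cf_trace[n++] = 3; }
    return n;
}
```
-
- Keeping the checks at the target means the COME FROM site itself costs
- nothing when control reaches it by falling through.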
-
- Credits
-
- I wrote the first version of this compiler over a weekend using a
- pre-ANSI C compiler. It worked, but it wasn't pretty. Louis Howell
- added the array support later; he also torture-tested the COME FROM
- implementation by actually using it for the life2.i program included
- in this distribution, and fixed some bugs. Brian Raiter did a
- much-needed delinting while porting it for ANSI C, flex and Linux. He
- also improved the lexical analyzer's line tracking.
-
-