Text File | 1996-06-20 | 4KB | 93 lines
INTERCAL IMPLEMENTOR'S NOTES
by ESR
The C-INTERCAL compiler has a very conventional implementation using
YACC and LEX (latterly, bison and flex). It generates C, which
is then passed to your C compiler.
Lexical issues
Note that the spectacular ugliness of INTERCAL syntax requires that
the lexical analyzer have two levels. One, embedded in the input()
function, handles the backquote and bang constructs, and stashes the
input line away in a buffer for the splat construct's benefit. The
upper level is generated by lex(1) and does normal tokenizing for YACC.
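A minimal sketch of what such a lower level could look like, not the code in the actual lexer. It assumes the bang (`!`) expands to spark-spot (`'.`), as the INTERCAL manual describes; backquote handling is left out. The names low_open(), low_input(), low_current_line() and LINE_MAX_LEN are hypothetical, and the input comes from a string rather than a file so the fragment stands alone.

```c
#include <stddef.h>

#define LINE_MAX_LEN 256            /* hypothetical buffer size */

static const char *src;             /* program text being scanned */
static int pending = -1;            /* second half of an expanded '!', or -1 */
static char line_buf[LINE_MAX_LEN]; /* raw text of the current physical line */
static size_t line_len;
static int line_done;               /* set once a newline has been stashed */

/* Point the low-level reader at a new source string. */
void low_open(const char *text)
{
    src = text;
    pending = -1;
    line_len = 0;
    line_done = 0;
    line_buf[0] = '\0';
}

/* Return the next character for the tokenizer, or -1 at end of input. */
int low_input(void)
{
    int c;

    if (pending >= 0) {             /* deliver the '.' left over from '!' */
        c = pending;
        pending = -1;
        return c;
    }
    if (src == NULL || *src == '\0')
        return -1;
    c = (unsigned char)*src++;

    /* Stash the raw character so a bad statement can be echoed later. */
    if (line_done) {                /* previous line ended: start afresh */
        line_len = 0;
        line_done = 0;
    }
    if (line_len < LINE_MAX_LEN - 1) {
        line_buf[line_len++] = (char)c;
        line_buf[line_len] = '\0';
    }
    if (c == '\n')
        line_done = 1;

    if (c == '!') {                 /* bang: shorthand for spark-spot */
        pending = '.';
        return '\'';
    }
    return c;
}

/* Raw text of the physical line read so far. */
const char *low_current_line(void)
{
    return line_buf;
}
```

The tokenizer above this layer never sees a `!`; it only sees the expanded pair, while the stashed line keeps the text exactly as it was typed.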
In order to splatter erroneous statements correctly, the generated code has
to track both logical and physical lines. That's the reason for the
`lineno' variable in the generated C; it actually tracks physical lines.
Numeral tokens for input are defined in a symbol table (numerals.c)
that is directly included in the run-time library module (cesspool.c).
This avoids having to put the size of the numerals array in an
extern. To add new numeral tokens, simply put them in the numerals
initializer.
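The point about avoiding an extern can be sketched as follows. The table contents, the struct layout and the name numeral_value() are assumptions for illustration, not the real numerals.c; in the layout described above the initializer would sit in its own file and be pulled in with #include, whereas here it is written inline.

```c
#include <string.h>

/* Hypothetical stand-in for the contents of numerals.c. */
struct numeral {
    const char *name;
    int value;
};

static const struct numeral numerals[] = {
    { "OH", 0 },    { "ZERO", 0 },
    { "ONE", 1 },   { "TWO", 2 },   { "THREE", 3 }, { "FOUR", 4 },
    { "FIVE", 5 },  { "SIX", 6 },   { "SEVEN", 7 }, { "EIGHT", 8 },
    { "NINE", 9 },  { "NINER", 9 },
};

/* The initializer is visible in this translation unit, so sizeof sees
   the whole array and no separately exported count is needed. */
#define NUMERAL_COUNT (sizeof numerals / sizeof numerals[0])

/* Return the digit named by word, or -1 if it is not a known numeral. */
int numeral_value(const char *word)
{
    size_t i;

    for (i = 0; i < NUMERAL_COUNT; i++)
        if (strcmp(numerals[i].name, word) == 0)
            return numerals[i].value;
    return -1;
}
```

Adding a token is then a one-line change to the initializer; NUMERAL_COUNT follows automatically.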
Compilation
The parser builds an array of tuples, one for each INTERCAL statement. Most
tuples have node trees attached. Once all tuples have been generated,
the compile-time checker and optimizer phases can do consistency checks
and expression-tree rewrites. Finally, the tuples are ground out as C code
by the emit() function.
Calculations are fully type-checked at compile time; they have to be because
(as I read the manual) the 16- and 32-bit versions of the unary ops do
different things. The only potential problem here is that the typechecker
has to assume that :m ~ :n has the type of :n (32-bit) even though the
result might fit in 16 bits. At run-time everything is calculated in 32
bits. When INTERCAL-72 was designed, 32 bits was expensive; now it's cheap.
Really, the only reason for retaining a 16-bit type at all is for the
irritation value of it (yes, C-INTERCAL *does* enforce the 16-bit limit
on constants).
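To see why the operand width has to be known, here is a small sketch of the unary operators at both widths. It assumes the usual reading of the manual: each result bit combines a bit with its neighbor, wrapping around at the top, which amounts to combining the value with itself rotated right by one. The function names are hypothetical and are not those used in the run-time library.

```c
#include <stdint.h>

/* Rotate x right by one bit within the given width (16 or 32). */
static uint32_t rotr1(uint32_t x, int width)
{
    uint32_t top = (x & 1u) << (width - 1);

    return (x >> 1) | top;
}

/* 16-bit versions: the operand is first cut down to 16 bits. */
uint32_t unary_and16(uint32_t x) { x &= 0xFFFFu; return x & rotr1(x, 16); }
uint32_t unary_or16(uint32_t x)  { x &= 0xFFFFu; return x | rotr1(x, 16); }
uint32_t unary_xor16(uint32_t x) { x &= 0xFFFFu; return x ^ rotr1(x, 16); }

/* 32-bit versions: the wrapped bit lands in bit 31 instead of bit 15. */
uint32_t unary_and32(uint32_t x) { return x & rotr1(x, 32); }
uint32_t unary_or32(uint32_t x)  { return x | rotr1(x, 32); }
uint32_t unary_xor32(uint32_t x) { return x ^ rotr1(x, 32); }
```

For the value 77, the 16-bit OR gives 32879 while the 32-bit OR gives 2147483759: the same operator on the same number yields different results depending only on the type, so the type must be settled before code is emitted.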
Labels are mapped to tuple indices (logical line numbers) in the code
checker, just before optimization.
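A rough sketch of that label-resolution pass, under assumed data structures. The struct tuple fields and the functions find_label() and resolve() are hypothetical; the real tuple carries much more.

```c
#include <stddef.h>

/* Hypothetical cut-down tuple: one per INTERCAL statement. */
struct tuple {
    unsigned label;         /* label on this statement, 0 if none */
    unsigned target_label;  /* label this statement refers to, 0 if none */
    int target_index;       /* filled in by resolve(): 1-based tuple index */
};

/* Return the 1-based index of the tuple carrying label, or 0 if absent. */
int find_label(const struct tuple *t, size_t n, unsigned label)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (t[i].label == label)
            return (int)(i + 1);
    return 0;
}

/* Replace every label reference with the index of the tuple it names.
   Returns 0 on success, or the first undefined label encountered. */
unsigned resolve(struct tuple *t, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (t[i].target_label == 0)
            continue;               /* statement refers to no label */
        t[i].target_index = find_label(t, n, t[i].target_label);
        if (t[i].target_index == 0)
            return t[i].target_label;
    }
    return 0;
}
```

Doing this before optimization means later phases can work purely with tuple indices and never look at source labels again.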
The optimizer does full recursive folding of all constant expressions
at compile time (evaluating away all the irritating little kluges you
have to use to generate 32-bit constants). It also checks for known
INTERCAL idioms for `test for equality', `test for nonzeroness', and
the C logical operators &, |, ^, and ~.
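Recursive constant folding over a node tree might look roughly like this. The node layout and the names mingle(), select_bits() and fold() are assumptions made for the sketch; only the two binary operators are handled, and idiom recognition is not shown. The operator semantics follow the INTERCAL manual: mingle interleaves two 16-bit values, and select keeps the bits of the first operand picked out by the second, packed to the right.

```c
#include <stddef.h>
#include <stdint.h>

enum kind { CONSTANT, VARIABLE, MINGLE, SELECT };

/* Hypothetical expression node; operator nodes must have both children. */
struct node {
    enum kind kind;
    uint32_t value;             /* meaningful only for CONSTANT */
    struct node *left, *right;
};

/* Interleave two 16-bit values; bits of a land in the odd positions. */
uint32_t mingle(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    int i;

    for (i = 0; i < 16; i++) {
        r |= ((b >> i) & 1u) << (2 * i);
        r |= ((a >> i) & 1u) << (2 * i + 1);
    }
    return r;
}

/* Keep the bits of a chosen by mask, packed toward the low end. */
uint32_t select_bits(uint32_t a, uint32_t mask)
{
    uint32_t r = 0;
    int i, out = 0;

    for (i = 0; i < 32; i++)
        if ((mask >> i) & 1u)
            r |= ((a >> i) & 1u) << out++;
    return r;
}

/* Fold children first; collapse this node if both became constants. */
void fold(struct node *n)
{
    if (n == NULL || n->kind == CONSTANT || n->kind == VARIABLE)
        return;
    fold(n->left);
    fold(n->right);
    if (n->left->kind == CONSTANT && n->right->kind == CONSTANT) {
        n->value = (n->kind == MINGLE)
            ? mingle(n->left->value, n->right->value)
            : select_bits(n->left->value, n->right->value);
        n->kind = CONSTANT;
        n->left = n->right = NULL;
    }
}
```

With this, a source expression such as #65535$#0, the usual way to spell a 32-bit constant, reaches the code generator as the single constant 2863311530.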
Code Generation
Each line of INTERCAL is translated into a C if()-then; the guard part
is used to implement abstentions and RESUMES, and the arm part
translates the `body' of the corresponding INTERCAL statement.
The generated C code is plugged into the template file ick-wrap.c
inside main(). It needs to be linked with cesspool.o, fiddle.o and
lose.o (these are in libick.a, along with arrgghh.o, which supports
the runtime switches). cesspool.o implements the storage manager;
fiddle.o implements the INTERCAL operators; lose.o generates
INTERCAL's error messages; and arrgghh.o parses the runtime
command-line arguments.
The abstain[] array in the generated C is used to track line and label
abstentions; if member i is on, the statement on line i is being
abstained from. If gerund abstentions/reinstatements are present in
the code, a second array recording the type of each statement is
generated into the runtime and used to ensure that these operations
are translated into abstention-guard changes on all appropriate line numbers.
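The guard scheme and the second, per-statement type array can be sketched like this. Only the name abstain[] comes from the text above; the enum, the types[] array, the helper functions and the four-statement program are hypothetical, and the statement bodies are reduced to setting a bit so the effect is visible.

```c
#include <stddef.h>

/* Hypothetical subset of statement types that a gerund can name. */
enum stmt_type { CALCULATING, NEXTING, READING_OUT };

#define NSTMTS 4

/* Second array: the type of each statement, fixed at compile time. */
static const enum stmt_type types[NSTMTS] = {
    CALCULATING, READING_OUT, CALCULATING, NEXTING
};

/* Guard array: nonzero means the statement is being abstained from. */
static int abstain[NSTMTS];

/* ABSTAIN FROM (label) or REINSTATE (label): change a single guard. */
void set_line(size_t i, int flag)
{
    if (i < NSTMTS)
        abstain[i] = flag;
}

/* ABSTAIN FROM gerund or REINSTATE gerund: change every matching guard. */
void set_gerund(enum stmt_type t, int flag)
{
    size_t i;

    for (i = 0; i < NSTMTS; i++)
        if (types[i] == t)
            abstain[i] = flag;
}

/* Shape of the generated code: one guarded arm per statement.
   Each arm here just records, as one bit, that it was executed. */
int run_guards(void)
{
    int ran = 0;

    if (!abstain[0]) { ran |= 1; }
    if (!abstain[1]) { ran |= 2; }
    if (!abstain[2]) { ran |= 4; }
    if (!abstain[3]) { ran |= 8; }
    return ran;
}
```

When a program contains no gerund abstentions, the types[] array and set_gerund() have no users, which is presumably why the second array is only generated when needed.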
RESUMES are implemented with a branch to a generated switch statement
that executes a goto to the appropriate label. If there are no RESUMES,
no such switch is generated.
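A hand-written approximation of that shape, for a three-statement program that NEXTs twice to the same label. The label names, the stack variables and the function name are invented for the sketch, guards are omitted, and the real generated code will differ in detail.

```c
/* Returns how many times the statement labeled (10) was executed. */
int run_next_resume(void)
{
    int next_stack[80];     /* return points pushed by NEXT; size assumed */
    int next_top = 0;
    int resume_to = 0;
    int calls = 0;

    /* statement 1: DO (10) NEXT */
    next_stack[next_top++] = 1;
    goto L10;
N1:
    /* statement 2: DO (10) NEXT */
    next_stack[next_top++] = 2;
    goto L10;
N2:
    /* statement 3: DO GIVE UP */
    return calls;

L10:
    /* statement labeled (10): stands in for a subroutine body */
    calls++;
    /* DO RESUME #1: pop one entry, then branch to the shared switch */
    resume_to = next_stack[--next_top];
    goto resume_switch;

resume_switch:
    /* One case per NEXT statement in the program. */
    switch (resume_to) {
    case 1: goto N1;
    case 2: goto N2;
    }
    return -1;              /* no such return point */
}
```

C has no computed goto in the standard language, so a switch whose cases are plain gotos is the portable way to jump to a return point chosen at run time.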
The compiler places a simple label at the location of each COME FROM
in the program, while all of the machinery for checking for abstention
and conditional execution and actually performing the jump is placed
immediately after the code for the target statement.
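In outline, and with invented names, the arrangement might look like the following. The statement numbering, the trace variable and the function are hypothetical, and this sketch makes the jump regardless of whether the target statement itself was abstained from, a point the text above does not settle.

```c
/* Statement 0 carries label (5); statement 2 is DO COME FROM (5).
   The argument says whether the COME FROM is being abstained from.
   The return value lists, as decimal digits, which bodies were run. */
int run_come_from(int come_from_abstained)
{
    int abstain[3] = { 0, 0, 0 };
    int trace = 0;

    abstain[2] = come_from_abstained;

    /* statement 0, labeled (5) */
    if (!abstain[0]) { trace = trace * 10 + 1; }
    /* Machinery placed right after the target: jump if the
       COME FROM that names this label is currently active. */
    if (!abstain[2])
        goto C2;

    /* statement 1: skipped whenever the COME FROM fires */
    if (!abstain[1]) { trace = trace * 10 + 2; }

    /* statement 2: DO COME FROM (5); only a label lives here */
C2: ;
    /* statement 3 */
    trace = trace * 10 + 3;
    return trace;
}
```

With the COME FROM active the trace is 13, since statement 1 is jumped over; with it abstained the trace is 123.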
Credits
I wrote the first version of this compiler over a weekend using a
pre-ANSI C compiler. It worked, but it wasn't pretty. Louis Howell
added the array support later; he also torture-tested the COME FROM
implementation by actually using it for the life2.i program included
in this distribution, and fixed some bugs. Brian Raiter did a
much-needed delinting while porting it for ANSI C, flex and Linux. He
also improved the lexical analyzer's line tracking.