Emacs: Please use -*- Text -*- mode.  Thank you.
$Id: porting.guide,v 1.26 1993/11/10 22:47:25 gjr Exp $
Copyright (c) 1991-1993 Massachusetts Institute of Technology
			LIAR PORTING GUIDE
			     *DRAFT*
Notes: 
This porting guide applies to Liar version 4.91, but most of the
relevant information has not changed for a while, nor is it likely to
change in major ways any time soon.
This is an early version of this document, and the order of
presentation leaves a lot to be desired.  In particular, the document
does not follow a monotonic progression, but is instead organized in a
dictionary-like manner.  We recommend that you read through the whole
document twice since some important details, apparently omitted, may
have their explanation later in the document.  When reading the
document for the second time, you will have an idea of where this
other information is to be found, if it is present at all.  We have
attempted to insert sufficient forward pointers to make the first
reading bearable, but we may have missed some.
This document implicitly assumes that you are trying to build the
compiler under Unix.  The only compiler sources that depend on Unix
are the sources that contain the pathnames of other files.
The syntax could easily be changed for other file systems.
This document uses Unix pathname syntax and assumes a hierarchical
file system, but it should easy to map these directories to a
different file system.  The DOS runtime library accepts forward
slashes (#\/) as a substitute for backward slashes (#\\), so the
scripts are shared between Unix and DOS.
This document also assumes that you are familiar with MIT Scheme, C,
and the C preprocessor.  It does not describe Liar in detail, and it
does not cover many machine-independent portions at all.  It is
intended to guide a programmer familiar with (MIT) Scheme and C in the
task of porting the compiler to a new architecture, not in modifying
the compiler in any other way.
For questions on Liar not covered by this document, or questions about
this document, contact ``liar-implementors@zurich.ai.mit.edu''.
Text tagged by ==> is intended primarily for the compiler developers.
Good luck!
		Acknowledgments
Liar is the work of many people.  The current version is mostly the
effort of Chris Hanson and Bill Rozas, with significant contributions
from Mark Friedman and Jim Miller.  Arthur Gleckler, Brian LaMacchia,
and Henry Wu have also contributed to the current version of Liar.
Many other people have offered suggestions and criticisms.
The current Liar might never have existed had it not been for the
efforts and help of the now-extinct BBN Butterfly Lisp group.  That
group included Don Allen, Seth Steinberg, Larry Stabile, and Anthony
Courtemanche.  Don Allen, in particular, babysat computers to
painstakingly bootstrap the first version of the then new Liar.
Many of the ideas and algorithms used in Liar, particularly at the RTL
level, are taken from the GNU C compiler, written by Richard Stallman
and many others.
This document was written by Bill Rozas, with modifications and hints
from the people listed above.  The section on the MIT Scheme package
system was written by Arthur Gleckler.
		0. Introduction and a brief walk through Liar.
Liar translates Scode as produced by the procedure SYNTAX, or by the
file syntaxer (SF, for syntax file) into compiled code objects.  The
Scode is translated into a sequences of languages, the last of which
is the binary representation of the compiled code.
The sequence of external languages manipulated is
    Characters --READ--> 
    S-Expressions --SYNTAX--> 
    Scode --COMPILE-SCODE--> 
    compiled code objects.
Liar is a multi-pass compiler, where each major pass has multiple
subpasses.  Many of the subpasses do not manipulate the whole code
graph, but instead follow threads that link the relevant parts of the
graph.
COMPILE-SCODE is the main entry point to Liar, although CBF (for
compile bin file) is the usual entry point.  CBF uses COMPILE-SCODE,
and assumes that the code has been syntaxed by SF producing a .bin
file, and dumps the resulting compiled code into a .com file.  CF (for
compile file) invokes SF and then CBF on a file name argument.
The internal sub-languages used by Liar are:
    Scode --FGGEN--> 
    Flow-graph --RTLGEN--> 
    RTL (Register Transfer Language) --LAPGEN-->
    LAP (Lisp assembly program) --ASSEMBLER--> 
    bits --LINK--> 
    compiled code object.
where FGGEN, etc., are some of the major passes of the compiler.  
The remaining major passes are FGOPT (the flow-graph optimizer), and
RTLOPT (the RTL-level optimizer).  RTL-level register allocation is
performed by RTLOPT, and hardware-level register allocation is
performed by LAPGEN.  ASSEMBLER branch-tensions the output code.
Branch-tensioning is described in a later section.  LINK constructs a
Scheme compiled code object from the bits representing the code and
the fixed data that the compiled code uses at runtime.
compiler/toplev.scm contains the top-level calls of the compiler and
its pass structure.
The ``.com'' files contain compiled code objects, which are linked
further at load time.
	0.1.  Liar's package structure
This section assumes that you are familiar with the MIT Scheme package
system.  If you are not, there is a small description in an appendix
to this document.
The package structure of the compiler reflects the pass structure and
is specified in compiler/machines/port/comp.pkg, where port is the
name of a machine (bobcat, vax, spectrum, mips, i386, alpha, etc.).
The major packages are:
(COMPILER):
	Utilities and data structures shared by most of the compiler.
(COMPILER MACROS):
	Syntax extensions used by the compiler to define language
translation rules.
(COMPILER TOP-LEVEL):
	Top level pass structure of the compiler.
(COMPILER FG-GENERATOR):
	This package contains the flow-graph generator, FGGEN.
(COMPILER FG-OPTIMIZER):
	This package contains the flow-graph analyzer and optimizer,
FGOPT. It has many sub-packages to contain the individual sub-passes.
(COMPILER RTL-GENERATOR):
	This package contains the flow-graph to RTL translator,
RTLGEN. It contains a few sub-packages for the major kinds of
flow-graph operations.
(COMPILER RTL-OPTIMIZER):
	This package contains most of the RTL-level optimizer, RTLOPT.
It has various sub-packages corresponding to some of its sub-passes.
(COMPILER RTL-CSE):
	This package contains the RTL-level common (redundant)
subexpression eliminator pass of the RTL-level optimizer.
(COMPILER LAP-SYNTAXER):
	This package contains most of the machine-dependent parts of
the compiler and the back end utilities.  In particular, it contains
the RTL -> LAP translation rules, and the LAP -> bits translation
rules, i.e. the LAPGEN and ASSEMBLER passes respectively.  It has some
sub-packages for various major utilities (linearizer, map-merger,
etc.).
(COMPILER ASSEMBLER):
	This package contains most of the machine-independent portion of
the assembler.  In particular, it contains the bit-assembler, i.e.
the portion of the assembler that accumulates the bit strings produced
by ASSEMBLER and performs branch-tensioning on the result.
(COMPILER DISASSEMBLER):
	This package contains the disassembler.  It is not needed for
ordinary compiler operation, but is useful for low-level debugging,
and debugging of the compiler and assembler.
	0.2. Liar's sources' directory structure
The directory structure loosely reflects the pass structure of the
compiler.  compiler/machines/port/comp.pkg declares the packages and
the files that constitute them.
compiler/back:
	This directory contains the machine-independent portion of the
back end.  It contains bit-string utilities, symbol table utilities,
label management procedures, the hardware register allocator, and the
top-level assembler calls.
compiler/base:
	This directory contains common utilities used by the whole
compiler, and the top level procedures provided by the compiler.
compiler/etc:
	This directory contains utilities used for cross-compiling,
and checking re-compilations.
compiler/fggen:
	This directory contains the front end of the compiler.  The
code in this directory translates Scode into a flow-graph used by the
analyzer and optimizer.
compiler/fgopt:
	This directory contains the flow-graph analyzer and optimizer.
compiler/rtlbase:
	This directory contains utilities used by the RTL generator and
optimizer.
compiler/rtlgen:
	This directory contains the code that translates the
flow-graph into register transfer language (RTL).
compiler/rtlopt:
	This directory contains the RTL-level optimizer.  It contains
code to perform lifetime analysis, redundant subexpression
elimination, elimination of dead code, etc.
compiler/machines:
	This directory contains a subdirectory for each port of the
compiler.  Each of these subdirectories contains the port (machine)
dependent files of the compiler.
compiler/machines/port:
	This directory contains the definition of machine parameters,
the assembler rules, the disassembler, and RTL to assembly-language
rules for the port.  
All machine-dependent files are in compiler/machines/port and this is
the only directory that needs to be written to port the compiler to a
new architecture.
		1. Liar's runtime model.
Liar does not open code (inline) all operations that the code would
need to execute.  In particular, it leaves error handling and
recovery, interrupt processing, initialization, and invocation of
unknown procedures, to a runtime library written in assembly language.
Although this runtime library need not run in the context of the
CScheme interpreter, currently the only implementation of this library
runs from the interpreter and uses it for many of its operations.
In other words, code generated by Liar does not depend on the
interpreter directly, but indirectly through the runtime library.  It
does depend on the ability to invoke CScheme primitives at runtime,
some of which (eval, etc.)  require the interpreter to be present.  It
should be possible, however, to provide an alternate runtime library
and primitive set that would allow code produced by Liar to run
without the interpreter being present. (Foot) We often toy with this
idea.
On the other hand, since the only instance of the runtime library is
that supplied by the interpreter, Liar currently assumes that the
Scheme object representation is the same as that used by the
interpreter, but this is relatively well abstracted and should not be
hard to change. (Foot) Famous last words.
The runtime library is currently implemented by
microcode/cmpaux-port.m4 and microcode/cmpint.c .  The files
cmpaux.txt and cmpint.txt document these files.  The documentation
files may be found in the microcode or the documentation directories.
microcode/cmpaux-port.m4 is an assembly language machine-dependent file
that allows compiled Scheme to call the C-written library routines and
vice versa.  It is described in cmpaux.txt.
microcode/cmpint.c defines the library in a machine-independent way,
but requires some information about the port and this is provided in
microcode/cmpint2.h, a copy (or link) of the appropriate
microcode/cmpint-port.h file.  The microcode/cmpint-port.h files are
described in cmpint.txt .
cmpint.txt also describes many of the data structures that the
compiled code and runtime library manipulate, and defines some of the
concepts needed to understand the compiler.
The rest of this document assumes that you are using the runtime
library provided by the CScheme interpreter.  If you wish to use Liar
as a compiler for stand-alone programs, a lot of work needs to be
done, and this work is not described here.  Perhaps we will do it in
the future.
If you have not yet read cmpaux.txt and cmpint.txt, please do so
before reading the rest of this document.
You should probably also read [1] and [2] for a discussion of some of
the implementation issues.
		2. Preliminary Observations
	2.1. Constraints on architectures to which Liar can be ported:
- Liar assumes that the target machine is a general-register machine.
That is, operations are based on processor registers, and there is a
set of general-purpose registers that can be used interchangeably.  It
would be hard to port Liar to a pure stack machine, a graph-reduction
engine, a Turing machine, or a 4-counter machine.  However, the
register set required is not huge.  Liar has been ported to the
386/486 architecture which only has eight registers, four of which are
reserved for implementation quantities (e.g. stack pointer and free
pointer) and four of which are left to the register allocator.
- Liar currently assumes that floating-point registers and integer
registers are separate or the same size.  In other words, currently
Liar cannot handle quantities that need multiple registers to hold
them.  For example, on the DEC VAX and the Motorola 88100, there is a
single set of registers, and double floating point values (the only
kind used by Scheme) take two consecutive integer registers.  The
register allocator in Liar does not currently handle this situation,
and thus, floating-point operations are not currently open-coded on
the VAX.
- Liar assumes that the target machine has an address space that is
flat enough that all Scheme objects can be addressed uniformly.  In
other words, segmented address spaces with segments necessarily
smaller than the Scheme runtime heap (i.e. Intel 286) will make Liar
difficult to port.
- Liar assumes that instructions and data can coexist in the same
address space, and that new code objects that contain machine
instructions can be dynamically allocated from and written to the heap
(memory pool) used to allocate all other Scheme objects.  This
assumption in Liar conflicts with some current hardware that has
programmer-visible separate (split) data and instruction caches --
that is, there are two different caches, one used by the processor for
instruction references and the other for data references, and storing
data into memory only updates the data cache, but not the instruction
cache, and perhaps not even memory.  Most of the problems this causes
can be resolved if the user is given enough control over the hardware
caches, i.e. some way to flush or synchronize them.  Furthermore, a
true Harvard architecture, with separate code and data memories, would
be hard to accommodate without relatively major changes.  At some
point in the future we may write a C back end for Liar that handles
this case, since C code space and data space are typically kept
separate by the operating system.  Whatever technique the C back end
may use can probably be emulated by architectures with such a strong
division, although it is likely to be expensive.
	2.2. Some implementation decisions that may make your job
harder or impair the quality of the output code:
- Liar generates code that passes arguments to procedures on a stack.
This decision especially affects the performance on load-store
architectures, common these days.  Liar may be changed in the future
to generate code that passes arguments in registers because most
modern machines have large register sets and memory-based operations
are slower than register-based operations even when the memory
locations have been cached.
- Liar assumes that pushing and popping elements from a stack is
cheap.  Currently Liar does not attempt to bump the stack pointer once
per block of operations, but instead bumps it once per item.  This is
expensive on many modern machines where pre-and-post incrementing are
not supported by the hardware.  This may also change in the
not-too-far future.
- Liar assumes that it is cheap to compute overflow conditions on
integer arithmetic operations.  Generic arithmetic primitives have the
frequent fixnum (small integer) case open coded, and the overflow and
non-fixnum cases coded out of line, but this depends on the ability of
the code to detect and branch on overflow conditions cheaply.  This is
not true of some modern machines, notably MIPS processors.  If your
processor does not make branching on such conditions reasonably cheap,
you may have to use code similar to that used in the MIPS port.  The
MIPS processor has trapping and non-trapping arithmetic instructions.
The trapping arithmetic instructions trap on overflow, but the trap
recovery code is typically so expensive that the generated code
computes the overflow conditions explicitly.
- Liar assumes that extracting, inserting, and comparing bit-fields is
relatively cheap.  The current object representation for Liar
(compatible with the interpreter) consists of using a number of bits
(usually 6) in the most significant bit positions of a machine word as
a type tag, and the rest as the datum, usually an encoded address.
Not only must extracting, comparing, and inserting these tags be
cheap, but decoding the address must be cheap as well.  These
operations are relatively cheap on architectures with bit-field
instructions, but more expensive if they must be emulated with bitwise
boolean operations and shifts, as on the MIPS R3000.  Decoding a datum
into an address may involve inserting segment bits in some of the
positions where the tag is placed, further increasing the dependency
on cheap bit-field manipulation.
- The CScheme interpreter uses a particularly poor representation for
fixnums, forcing Liar's hand.  Fixnums are suitably small integers.
They are immediate objects with a particular tag.  This tag was not
wisely chosen, making fixnum operations more expensive than they need
to be.  This tag may be changed in the future.
- The CScheme interpreter manipulates a stack that grows in a fixed
direction (from higher to lower addresses).  On many modern machines,
there are no special instructions to deal with the stack, so the
decision is arbitrary.  On some machines, however, there are special
instructions to pop and push elements on the stack.  Liar may not be
able to use these instructions if the machine's preferred direction of
stack growth does not match the interpreter's.
	2.3. Emulating an existing port.
The simplest way to port Liar is to find an architecture to which Liar
has already been ported that is sufficiently similar to the target
architecture that most of the code can be written by copying or
trivially translating existing code.  In particular, if the
architectures are really close, there may be no need for
architecture-specific additional tuning.
The compiler is primarily developed on Motorola MC68020 processors, so
this is the best-tuned version, and the other ports are not very well
tuned or not tuned at all.  If you improve an existing port, please
share the improvements by notifying liar-implementors.
- If you have a Vax-like CISC machine, you can try starting from the
Vax, the Motorola MC68020, or the i386 ports.  The Vax and i386 ports
were written by starting from the MC68020 port.  This is probably the
best solution for some architectures like the NS32000, and perhaps
even the IBM 370.
- If you have an ``enlarged'' RISC processor, with some complex
addressing modes, and bit-field instructions, you may want to start by
looking at the Spectrum (HP Precision Architecture) port.  This is
probably a good starting point for the Motorola 88000 and for the IBM
RS/6000.
- If you have a bare-bones RISC processor, similar to a MIPS R3000
processor, you may want to start from this port.  Since the MIPS R3000
is a minimalist architecture, it almost subsumes all other RISCs, and
may well be a good starting point for all of them.  This is probably a
good starting point for the Sparc.  The MIPS port used the Spectrum
port as its model, and the Alpha port used the MIPS port as its model.
- If you have a machine significantly different from those listed
above, you are out of luck and will have to write a port from scratch.
For example, the port to the Intel 386+387/486 uses some of the
concepts and code from ports to other CISCs, but due to the
floating-point stack architecture (instead of register-based), the
floating-point stack management is different (but not very good).
Of course, no architecture is identical to any other, so you may want
to mix and match ideas from many of the ports already done, and it is
probably a good idea for you to compare how the various ports solve
the various problems.
		3. Compiler operation, LAPGEN rules and ASSEMBLER rules.
The front end of the compiler translates Scode into a flow-graph that
is then translated into RTL.  The back end does machine-independent
optimization on the RTL, generates assembly language (in LAP format)
from the RTL, and assembles the resulting bits.
Although RTL is a machine-independent language, the particular RTL
generated for a given program will vary from machine to machine.  The
RTL can vary in the following ways:
- RTL is a language for manipulating the contents of conceptual
registers.  RTL registers are divided into ``pseudo registers'' and
``machine registers''.  Machine registers represent physical hardware
registers, some of which have been reserved and given fixed meanings
by the port (stack pointer, value register, etc.)  while
pseudo-registers represent conceptual locations that contain
quantities that will need physical registers or memory locations to
hold them in the final translation.  An RTL pseudo register can be
mapped to any number of physical registers in the final translation,
and may ``move'' between physical registers.  In order to make the RTL
more homogeneous, the RTL registers are not distinguished
syntactically in the RTL, but are instead distinguished by their value
range.  Machine registers are represented as the N lowest numbered RTL
registers (where N is the number of hardware registers), and all
others are pseudo registers.  Since some RTL instructions explicitly
mention machine registers and these (and their numbers) vary from
architecture to architecture, the register numbers in an RTL program
will vary depending on the back end in use.  Machine registers may be
divided into separate classes (e.g.  address, data, and floating-point
registers) that can contain different types of values.  Pseudo
registers are not distinguished a-priori, but the values stored in
them must be consistent.  For example, if a floating point value is
stored into a particular pseudo register, the register can only be
mapped to floating-point machine registers, and non-floating-point
values cannot be stored in it.
- RTL assumes a load-store architecture, but can accommodate
architectures that allow memory operands and rich addressing modes.
RTL is constructed by generating statements that include relatively
complex expressions.  These expressions may represent multiple memory
indirections or other operations.  An RTL simplifier runs over this
initial RTL, assigning these intermediate quantities to new pseudo
registers and rewriting the original statements to manipulate the
original and new pseudo-registers.  Typically this simplification
results in a sequence of assignments to pseudo-registers with single
operations per assignment and where the only memory operations are
load and store.  However, this simplification pass is controlled by
the port.  The port supplies a set of rewriting rules to the
simplifier that causes the simplifier to leave more complex
expressions untouched, or to be simplified in different ways,
depending on the availability of memory operands or richer addressing
modes.  Since these rules vary from port to port, the final RTL
differs for the different ports.  The simplification process is also
controlled by the availability of various rules in the port, and ports
for richer instruction sets may require less simplification because
hardware instructions and addressing modes that encode more
complicated RTL patterns are directly available.
- The open coding (inlining) of Scheme primitives is
machine-dependent.  On some machines, for example, there is no
instruction to multiply integers, and it may not be advantageous to
open code the multiplication primitive.  The RTL for a particular
program may reflect the set of primitive operations that the back end
for the port can open code.
The resulting RTL program is represented as a control flow-graph where
each of the nodes has an associated list of RTL statements.  The edges
in the graph correspond to conditional and unconditional branches in
the code, and include a low-level predicate used to choose between the
alternatives.  The graph is linearized after the instructions have
been translated to LAP.  There is a debugging RTL linearizer used by
the RTL output routine.
Besides assignments and tests, the RTL has some higher level
statements that correspond to procedure headers, continuation (return
address) headers, etc.  Thus an RTL program is made mostly of register
to register operation statements, a few conditional tests, and a few
higher-level ``glue'' statements.
Once a program has been translated to RTL, the RTL code is optimized
in a machine-independent way by minimizing the number of RTL
pseudo-registers used, removing redundant subexpressions, eliminating
dead code, and by using various other techniques.
The RTL program is then translated into a Lisp-format
assembly-language program (LAP).  Hardware register allocation occurs
during this translation.  The register allocator is
machine-independent and can accommodate different register classes,
but does not currently accommodate register pairs (this is why floating
point operations are not currently open coded on the Vax).
The register allocator works by considering unused machine registers
(those not reserved by the port) to be a cache for the
pseudo-registers.  Thus a particular pseudo-register may map into
multiple machine registers of different types, and these aliases are
invalidated as the pseudo-registers are written or the corresponding
machine registers reused.  Thus the most basic facility that the
register allocator provides is a utility to allocate an alias of a
particular type for a given pseudo-register.
The port defines the types and numbers of machine registers and the
subset that is available for allocation, and the register allocator
manages the associations between the pseudo-registers and their
aliases and the set of free machine registers.  The register allocator
also automatically spills the contents of machine registers to memory
when pressed for machine registers, and reloads the values when
necessary.
Thus the resulting LAP program is the collection of the code issued by
the rules that translate RTL into LAP, the instructions issued behind
the scenes by the register allocator, and the instructions used to
linearize the control flow graph.
The back end provides a set of rules for the translation of RTL to
LAP, and a set of procedures that the register allocator and the
linearizer use to generate such instructions.  The rules are written
using a special syntax that creates entries in a data base used by a
pattern matcher to translate the RTL into LAP.
The linear LAP is then translated into binary form by using the same
pattern matcher with a different set of rules.  These rules define the
translation between assembly language and machine language for the
architecture.  Most of these rules output bit strings to be collected
together, but some output a set of directives to the bit-level
assembler to define labels, or choose between alternative encoding of
the fields depending on the final value of a displacement.  These
alternative encodings are typically used for PC-relative quantities.
The machine-independent bit-assembler collects all the bits together
and keeps track of a virtual program counter used to determine the
distance between instruction fields.  A relaxation process is used to
reduce the size of the resulting encoding (to tension branches, i.e.
to choose the smallest encoding that will do the job when there are
alternatives).
Since most of the LAPGEN rules generate almost fixed assembly language,
where the only difference is the register numbers, most of the LAP to
bits translation can be done when the compiler is compiled.  A
compiler switch, ``COMPILER:ENABLE-EXPANSION-DECLARATIONS?'' allows this
process to take place.  This mechanism has not been used for a while,
however, because the resulting compiler was, although somewhat faster,
considerably bigger, so this switch may not currently work.
Several other compiler parameters and switches control various aspects
of the operation of the back end.  Most parameters and switches are
machine independent, and are defined in compiler/base/switch.scm .
The remaining parameters and switches are defined in
compiler/machines/port/machin.scm.  All compiler parameters and
switches are exported to the Scheme global package for easy
manipulation.
The following switches are of special importance to the back end
writer:
* COMPILER:COMPILE-BY-PROCEDURES? This switch controls whether the
compiler should compile each top-level lambda expression independently
or compile the whole input program (or file) as a block.  It is
usually set to true, but must be set to false for cross-compilation.
The cross-compiler does this automatically.  (Foot) The reason for
this is that the gc offset words in compiled entry points may differ
for the source and target machines, and thus the source machine's
garbage collector may be confused by the target machine's compiled
entry points.  This is circumvented by having the cross compiler
generate a single compiled code block object, manipulated and dumped
as a vector object (instead of as an entry point).  The final entry
points are generated by cross-compile-bin-file-end running interpreted
on the target machine.
* COMPILER:OPEN-CODE-PRIMITIVES? This switch controls whether Liar
will open code (inline) MIT Scheme primitives.  It is usually set to
true and should probably be left that way.  On the other hand, it is
possible to do a lot less work in porting the compiler by not
providing the open coding of primitives and turning this switch off.
 Some of the primitives are open coded by the machine-independent
portion of the compiler, since they depend only on structural
information, and not on the details of the particular architecture.
In other words, CAR, CONS, and many others can be open-coded in a
machine-independent way since their open codings are performed
directly in the RTL.  Turning this switch to false would prevent the
compiler from open coding these primitives as well.
* COMPILER:GENERATE-RTL-FILES? and COMPILER:GENERATE-LAP-FILES? These
are mostly compiler debugging switches.  They control whether the
compiler will issue .rtl and .lap files for every file compiled.  The
.rtl file will contain the RTL for the program, and the .lap file will
contain the input to the assembler.  Their usual value is false.
* COMPILER:INTERSPERSE-RTL-IN-LAP? This is another debugging switch.
If turned on, and COMPILER:GENERATE-LAP-FILES? is also on, the lap
output file includes the RTL statements as comments preceding their
LAP translations.  Its usual value is true.
 ==> RTL predicates are not included, making the control-flow hard to
follow.  This should be fixed.
* COMPILER:OPEN-CODE-FLOATING-POINT-ARITHMETIC? This switch is
defined in compiler/machines/port/machin.scm and determines whether
floating point primitives can and should be open coded by the compiler
or not.  If the port provides open codings for them, it should be set
to true, otherwise to false.
* COMPILER:PRIMITIVES-WITH-NO-OPEN-CODING This parameter is defined in
compiler/machines/port/machin.scm.  It contains a list of primitive
names that the port cannot open code.  Currently there is no simple
list of all the primitives that Liar can open-code.  The list is
implicit in the code contained in rtlgen/opncod.scm.
 ==> The last two parameters should probably be combined and inverted.
COMPILER:PRIMITIVES-WITH-OPEN-CODINGS should replace both of the
above.  This has the advantage that if the RTL level is taught how to
deal with additional primitives, but not all ports have open codings
for them, there is no need to change all the machin.scm files, only
those for which the open coding has been provided.
		4. Description of the machine-specific files
The following is the list of files that usually appears in the port
directory.  The files can be organized differently for each port, but
it is probably easiest if the same pattern is kept.  In particular,
the best way to write most is by editing the corresponding files from
an existing port.  Keeping the structure identical will make writing
decls.scm, comp.pkg, and comp.sf straightforward, and will make future
updates easier to track.
A useful thing to do when writing new port files is to keep
track of the original version from which you started, and
additionally, that on which your original is based.  For example, if
you use machines/mips/assmd.scm as a model for your version, in it you
would find something like
  $ Header: assmd.scm,v 1.1 90/05/07 04:10:19 GMT jinx Exp $
  $MC68020-Header: assmd.scm,v 1.36 89/08/28 18:33:33 GMT cph Exp $
In order to allow an easier merge in the future, it would
be good if you transformed this header into
  $ Header $
  $mips-Header: assmd.scm,v 1.1 90/05/07 04:10:19 GMT jinx Exp $
  $MC68020-Header: assmd.scm,v 1.36 89/08/28 18:33:33 GMT cph Exp $
The new $ Header $ line would be used by RCS to keep track of the
versions of your port and the others could be used to find updates to
the originals that would make updating your port easier.
	4.1. Compiler building files:
* comp.pkg:
	This file describes the Scheme package structure of the
compiler, the files loaded into each package, and what names are
exported and imported from each package.
	To write this file, copy the similar file from an existing
port, change the name of the port (i.e. mips -> sparc), and add or
remove files as appropriate.  You should only need to add or remove
assembler and LAPGEN files.
* comp.cbf:
	This file is a script that can be used to compile the compiler
from scratch.  You can copy this file from another port, and change
the port name.  There is more information in a later section about how
to build the compiler.
* comp.sf:
	This file is a script that is used to pre-process the compiler
sources before they are loaded to be interpreted or compiled.  You
should be able to copy the file from an existing port and replace the
name of the port.  You should also edit the names of the instruction
files in the assembler instruction database section, although this
section is no longer used by default.
The previous three files should be copied or linked to the top-level
compiler directory.  I.E., compiler/comp.pkg should be a link (symbolic
preferably) or copy of compiler/machines/port/comp.pkg .
* comp.con, comp.ldr, comp.bcon, and comp.bldr:
	These files are generated by the CREF subsystem from the
information in the cref.pkg file.  The .bcon and .bldr files are
binary versions of the others, which are scheme sources.  The .con
file contains the ``connectivity code'', that is, the code to create and
link the package objects specified in the .pkg file.  The .ldr file
contains the ``loading code'', that is, the code to load the source
files into the appropriate packages and, in theory, to initialize the
packages.  The CREF subsystem also generates a comp.cref file that
includes cross-reference information.  It is useful to examine this
file to find unbound references (often typos).
* make.scm:
	This file is used to load the compiler on top of a runtime
system that has the file syntaxer (SF) loaded, and defines the version
of the compiler.  The list of files does not appear here because the
comp.pkg already declares them, CREF will build the comp.con and
comp.ldr files that contain this information and will be loaded by
make.scm.
* decls.scm:
	This file defines the pre-processing dependencies between the
various source files.  There are three kinds of pre-processing
dependencies:
  - Syntactic: Different files need to be processed in different syntax
tables that define the macros used by the files.
  - Integrations: Different files import integrable (inline) definitions
from other files, and must be processed in the right sequence in order
to obtain the maximum effect from the integrations (mostly because of
transitive steps).
  - Expansions: Certain procedures can be expanded at compiler
pre-processing time into accumulations of simpler calls.  This is how
the assembly language in the LAPGEN rules can be translated into bits
at compiler pre-processing time.  The files that define the
pre-processing-time expansion functions must be loaded in order to
process those files that use the procedures that can be expanded.
decls.scm builds a database of the dependencies.  This database is
topologically sorted by some of the code in decls.scm itself in order
to determine the processing order.  Since there are circularities in
the integration dependencies, some of the files are processed multiple
times, but the mechanism in decls takes care of doing this correctly.
You should be able to edit the version from another port in the
appropriate way.  Mostly you will need to rename the port (i.e. mips
-> sparc), and add/delete instruction and rule files as needed.
 ==> decls.scm should probably be split into two sections: The
machine-independent dependency management code, and the actual
declaration of the dependencies for each port.  This would allow us to
share more of the code, and make the task of rewriting it less
daunting.
	4.2. Miscellaneous files:
* rgspcm.scm:
	This file declares a set of primitives that can be coded by
invoking runtime library procedures.  This file is no longer machine
dependent, since the portable library has made all the sets identical.
It lives in machines/port for historical reasons, and should probably
move elsewhere.  Obviously, you can just copy it from another port.
 ==> Let's move it or get rid of it!
* rulrew.scm:
	This file defines the simplifier rules that allow more
efficient use of the hardware's addressing modes and other
capabilities.  The rules use the same syntax as the LAPGEN rules, but
belong in the (rule) rewriting database.  Although these rules are
machine-dependent, it should be straightforward to emulate what other
ports have done in order to arrive at a working set.  Moreover, it is
possible to start out with an empty set and only add them as
inefficiencies are discovered in the output assembly language.  These
rules manipulate RTL expressions by using the procedures defined in
compiler/rtlbase/rtlty1.scm and compiler/rtlbase/rtlty2.scm.
* lapopt.scm:
	This file defines a LAP-level peephole optimizer.  Currently
only used in the MIPS port to reduce the number of NOPs in the ``delay
slots'' of load instructions.  The instructions in each LAP-level
basic block are passed to optimize-linear-lap, which outputs the new
sequence of instructions corresponding to the basic block.  Currently
all ports (except the MIPS port) implement this procedure as the
identity procedure.
* machin.scm:
	This file defines architecture and port parameters needed by
various parts of the compiler.  The following is the current list of
the primary parameters.  The definitions of derived parameters not
mentioned here should be copied verbatim from existing ports.  Some of
these parameters are not currently in use, but should all be provided
for completeness.
- USE-PRE/POST-INCREMENT?: Should be true or false depending on
whether the architecture has addressing modes that update the base
address.  It is true on the MC68020, Vax, i386, and HP-PA, and false
on the MIPS and Alpha.
- ENDIANNESS: Should be the symbol LITTLE if an address, when used as
a byte address, refers to the least significant byte of the long-word
addressed by it.  It should be BIG if it refers to the most
significant byte of the long-word.  The compiler has not been ported
to any machines where the quantum of addressability is not an 8-bit
byte, so the notion may not apply to those.
- ADDRESSING-GRANULARITY: How many bits are addressed by the
addressing quantum.  I.e., increasing an address by 1 will bump the
address to point past this number of bits.  Again, the compiler has
not been ported to any machine where this value is not 8.
- SCHEME-OBJECT-WIDTH: How many bits are taken up by a Scheme object.
This should be the number of bits in a C ``unsigned long'', since Scheme
objects are declared as such by the portable runtime library.
- SCHEME-TYPE-WIDTH: How many bits at the most-significant end of a
Scheme object are taken up by the type tag.  The value of
TYPE_CODE_LENGTH in the microcode must match this value.  The value is
currently 6 for systems with a compiler and 8 for systems without one.
- ADDRESS-UNITS-PER-PACKED-CHAR: This parameter defines how much to
increment an address by in order to make it point to the next
character in a string.  The compiler has not been ported to any
configuration where this is not 1, but may be if 16-bit characters are
used in the future.
- FLONUM-SIZE: This is the ceiling of the ratio of the size of a C
``double'' to the size of a C ``unsigned long''.  It reflects how many
Scheme units of memory (measured in Scheme objects) the data in a
Scheme floating point object will take.
- FLOAT-ALIGNMENT: This value defines the bit-alignment constraints
for a C ``double''.  It must be a multiple of scheme-object-width.  If
floating point values can only be stored at even long-word addresses,
for example, this value should be twice scheme-object-width.
- SIGNED-FIXNUM/UPPER-LIMIT: This parameter should be derived from
others, but is specified as a constant due to a shortcoming of the
compiler pre-processing system (EXPT is not constant-folded).  Use the
commented-out expression to derive the value for your port.  All
values that should be derived but are instead specified as constants
are tagged by a comment containing ``***''.
- STACK->MEMORY-OFFSET: This procedure is provided to accommodate
stacks that grow in either direction, but we have not tested any port
in which the stack grows towards larger addresses, because the CScheme
interpreter imposes its own direction of growth.  It should probably
be copied verbatim.
- EXECUTE-CACHE-SIZE: This should match EXECUTE_CACHE_ENTRY_SIZE in
microcode/cmpint-port.h, and is explained in cmpint.txt.
 ==> We should probably rename one or the other to be alike.
The following parameters describe to the front-end the format of
closures containing multiple entry points.  Closures are described in
some detail in cmpint.txt and in section 5.3.3.  Very briefly, a
closure is a procedure object that contains a code pointer and a set
of free variable locations or values.
- CLOSURE-OBJECT-FIRST-OFFSET: This procedure takes a single argument,
the number of entry points in a closure object, and computes the
distance in long-words between the first long-word in the closure
object, and the first long-word containing a free variable.  This is
the number of long-words taken up by the closure object's header, and
the code to represent N closure entry points.
- CLOSURE-FIRST-OFFSET: This procedure takes two arguments, the number
of entry points in a closure object, and the index of one of them, the
first being zero.  It computes the distance between that entry's
environment pointer and the first free variable in the closure object.
The entry's environment pointer will be the address of the entry point
itself if closure entry points are always aligned on long-word
boundaries, or the address of the first entry point if they are not.
- CLOSURE-ENTRY-DISTANCE: This procedure is given the number of entry
points in a closure object, and the indices for two of its entry
points, and computes the number of bytes that separate the two entry
points in the closure object.  This distance should be a multiple of
the parameter COMPILED_CLOSURE_ENTRY_SIZE described in cmpint.txt and
defined in microcode/cmpint-port.h.
- CLOSURE-ENVIRONMENT-ADJUSTMENT: This procedure takes two parameters,
the number of entry points in a closure object, and the index of one
of them.  It computes the number of bytes that must be added to the
entry point's address to result in the entry point's environment
pointer.  If entry points are always aligned on long-word boundaries,
this number should always be zero, otherwise it should be the distance
to the first (lowest addressed) entry point.
The remaining code in machin.scm describes the register set of the
architecture and defines the register conventions imposed by the port.
These conventions must match the expectations of
microcode/cmpaux-port.m4 described in cmpaux.txt.
Machine registers are assigned a contiguous range of non-negative
integers starting from zero.  Typically symbolic names are given to
each of these integers for use in some of the rules, especially those
dealing with the assembly language interface.
- NUMBER-OF-MACHINE-REGISTERS should be the number of machine registers,
i.e. one greater than the number assigned to the last machine register.
- NUMBER-OF-TEMPORARY-REGISTERS is the number of reserved memory
locations used for storing the contents of spilled pseudo-registers.
Liar requires certain fixed locations to hold various implementation
quantities such as the stack pointer, the heap (free memory) pointer,
the pointer to the runtime library and interpreter's ``register''
array, and the dynamic link ``register''.  Typically each of these
locations is a fixed machine register.  In addition, a processor
register is typically reserved for returned values and another for
holding a bit-mask used to clear type tags from objects (the pointer
or datum mask).  All of these registers should be given additional
symbolic names.
 ==> What is MACHINE-REGISTER-KNOWN-VALUE used for?  It would seem that
the datum mask is a known value, but...  Currently all the ports seem
to have the same definition.
The contents of pseudo-registers are divided into various classes to
allow some consistency checking.  Some machine registers always
contain values in a fixed class (e.g. floating point registers and the
register holding the datum mask).
- MACHINE-REGISTER-VALUE-CLASS is a procedure that maps a register to
its inherent value class.  The main value classes are
value-class=object, value-class=address, and value-class=float.
The registers allocated for the special implementation quantities have
fixed value classes.  The remaining registers, managed by the
compiler's register allocator, may be generic (value-class=word) or
allow only certain values to be stored in them (value-class=float,
value-class=address, etc.).
Most of the remainder of compiler/machines/port/machin.scm is a set of
procedures that return and compare the port's chosen locations for
various operations.  Some of these operations are no longer used by
the compiler, and reflect a previous reliance on the interpreter to
accomplish certain environment operations.  These operations are now
handled by invoking the appropriate primitives rather than using
special entry points in the runtime library for them.  Under some
compiler switch settings the older methods for handling these
operations can be re-activated, but this never worked completely, and
may no longer work at all.
- RTL:MACHINE-REGISTER? should return a machine register for those
special RTL registers that have been allocated to fixed registers, and
false otherwise.
- RTL:INTERPRETER-REGISTER? should return the long-word offset in the
runtime library's memory ``register'' array for those special RTL
registers not allocated to fixed registers, and false otherwise.
- RTL:INTERPRETER-REGISTER->OFFSET errors when the special RTL
register has not been allocated to a fixed register, and otherwise
returns the long-word offset into the register  array.
- RTL:CONSTANT-COST is a procedure that computes some metric of how
expensive is to generate a particular constant.  If the constant is
cheaply reconstructed, the register allocator may decide to flush it
(rather than spill it to memory) and re-generate it the next time it
is needed.  The best estimate is the number of cycles that
constructing the constant would take, but the number of bytes of
instructions can be used instead.
- COMPILER:OPEN-CODE-FLOATING-POINT-ARITHMETIC? and
COMPILER:PRIMITIVES-WITH-NO-OPEN-CODING have been described in the
section on compiler switches and parameters.
	4.3. LAPGEN files:
The following files control the RTL -> LAP translation.  They define
the rules used by the pattern matcher to perform the translation, and
procedures used by the register allocator and linearizer to connect
the code that results from each rule.  The rules, and how to write
them, are described further in a later section.
The rule set is partitioned into multiple subsets.  This is not
necessary, but makes re-compiling the compiler faster and reduces the
memory requirements of the compiler.  The partition can be done in a
different way, but is probably best left as uniform as possible
between the different ports to facilitate comparison and updating.
The LAPGEN (RTL->LAP) rules are separated into two different data
bases.  The larger is the statement data base, used to translate whole
RTL instructions.  The smaller is the predicate data base, used to
translate decisions to branch between the RTL basic blocks.
* lapgen.scm:
	This file does not define any rules, but provides a set of
utilities for the back end.  It provides utilities for the rules,
typically procedures for generating code that manipulates the object
representation, additional entry points to the register allocator that
are better suited to the port, and the interface procedures for the
register allocator and the linearizer.
The following definitions constitute the register allocator interface
and must be provided by lapgen.scm:
- AVAILABLE-MACHINE-REGISTERS is a list of the RTL register numbers
corresponding to those registers that the register allocator should
manage.  This should include all machine registers except those
reserved by the port.
- SORT-MACHINE-REGISTERS is a procedure that reorders a list of
registers into the preferred allocation order.
 ==> Is this right?
- REGISTER-TYPE is a procedure that maps RTL register numbers to their
inherent register types (typically GENERAL and FLOAT).
- REGISTER-TYPES-COMPATIBLE? is a boolean procedure that decides
whether two registers can hold the same range of values.
- REGISTER-REFERENCE maps RTL register numbers into register
references, i.e. pieces of assembly language used to refer to those
registers.
- REGISTER->REGISTER-TRANSFER issues code to copy the contents of one
RTL register into another.
- REFERENCE->REGISTER-TRANSFER issues code to copy the contents of a
machine register described by its reference into a given RTL register.
- PSEUDO-REGISTER-HOME maps RTL registers to a fragment of assembly
language used to refer to the memory location into which they will be
spilled if necessary.  This is typically a location (or set of
locations) in the Scheme ``register'' array.
- HOME->REGISTER-TRANSFER generates code that copies the contents of
an RTL register's home (its spill location) into a machine register.
- REGISTER->HOME-TRANSFER generates code that copies the contents of
an RTL register, currently held in a machine register, into its memory
home.
The following definitions constitute the linearizer interface, and
must be provided by lapgen.scm:
- LAP:MAKE-LABEL-STATEMENT generates an assembly language directive
that defines the specified label.
- LAP:MAKE-UNCONDITIONAL-BRANCH generates a fragment of assembly
language used to unconditionally transfer control to the specified
label.
- LAP:MAKE-ENTRY-POINT generates a fragment of assembly language used
to precede the root of the control flow graph.  Its output should use
the assembler directive ENTRY-POINT and generate format and GC words
for the entry point.
The rest of the code in lapgen.scm is a machine-specific set of utilities
for the LAPGEN rules.  Some of the more common procedures are
described in the section that covers the rules.
Of special interest are the utility procedures for manipulating
pc-relative addresses and loads on RISC machines.  RISC machines
typically only have pc-relative branch instructions, but no
pc-relative loads or pc-relative load-effective-address instructions.
On the other hand, they usually have a branch-and-link instruction
that performs a pc-relative branch and stores the return address in
a processor register.  This instruction can be used (by branching to
the next instruction) to obtain its own address, and pc-relative
addresses and loads can use them.  The MIPS back end currently
implements a simple pc-relative address caching scheme that attempts
to reduce the number of such branches by re-using the values produced
by previous branches if they are still available.  This code can be
suitably modified to work on most RISC architectures.
* rules1.scm:
	This file contains RTL statement rules for simple register assignments
and operations.  In particular, it contains the rules for constructing
and destructuring Scheme objects, allocating storage, and memory <->
register transfers.
* rules2.scm:
	This file contains RTL predicate rules for simple equality
predicates (EQ-TEST, TYPE-TEST).
* rules3.scm:
	This file contains RTL statement rules for control-flow
statements like continuation (return address) invocation, several
mechanisms for invoking procedures, stack reformatting prior to
invocation, procedure headers, closure object allocation, expression
headers and declaring the data segment of compiled code blocks for
assembly.  See [1] for some background information on stack
reformatting, and [2] for a discussion of how calls to (the values of)
free variables are handled by Liar.
* rules4.scm:
	This file contains RTL statement rules for runtime library
routines that handle manipulation of variables in first class
environments.  Many of these rules are no longer used by the compiler
unless some switch settings are changed.  See [2] for a discussion of
how Liar handles references to free variables.
* rulfix.scm:
	This file contains statement and predicate rules for
manipulating fixnums (small integers represented in immediate
form).  The rules handle tagging and de-tagging fixnum objects,
arithmetic on them, comparison predicates, and overflow tests.
* rulflo.scm:
	This file contains statement and predicate rules for
manipulating flonums (floating point data in boxed form).  The rules
handle boxing and un-boxing of flonums, arithmetic on them, and
comparison predicates.
	4.4. Assembler files:
* assmd.scm:
	This file defines the following machine-dependent parameters
and utilities for the bit-level assembler:
- MAXIMUM-PADDING-LENGTH: If instructions are not always long-word
aligned, the maximum distance in bits between the end of an
instruction and the next (higher) long-word boundary.
- PADDING-STRING: A bit-string used for padding the instruction block
to a long-word boundary.  If possible, it should encode a HALT or
ILLEGAL instruction.  The length of this bit-string should evenly
divide maximum-padding-length.
- BLOCK-OFFSET-WIDTH: This should be the size in bits of format_word
described in cmpint.txt.  It should be 16 for all byte-addressed
machines where registers hold 32 bits.
- MAXIMUM-BLOCK-OFFSET: The maximum byte offset that can be encoded in
block-offset-width bits.  This depends on the encoding described in
cmpint.txt.  The least significant bit is always used to indicate
whether this block offset points to the start of the object or to
another block offset, so the range may be smaller than the obvious
value.  Furthermore, if instruction alignment constraints are tighter
than byte boundaries, this range may be larger.  For example, if
instructions always start on even long-word boundaries, the bottom two
bits (always zero) are encoded implicitly, and the range is
accordingly larger.
- BLOCK-OFFSET->BIT-STRING: This procedure is given a byte offset and
a boolean flag indicating whether this is the offset to the start of a
compiled code block or to another block-offset, and returns the
encoded value of this offset.
- MAKE-NMV-HEADER: This procedure is given the size in long-words of a
block of instructions, and constructs the non-marked-vector header
that must precede the instructions in memory in order to prevent the
garbage collector from examining the data as Scheme objects.  This
header is just an ``object'' whose type tag is manifest-nm-vector
(TC_MANIFEST_NM_VECTOR in the microcode) and whose datum is the size
in long-words (excluding the header itself).
The following three parameters define how instruction fields are to be
assembled in memory depending on the ``endianness'' (byte ordering) of
the architecture.  You should be able to use the MC68020 (big endian)
or the Vax (little endian) version, or the MIPS version which is
conditionalized for both possibilities since MIPS processors can be
configured either way.
- INSTRUCTION-INSERT! is a procedure, that given a bit-string encoding
instruction fields, a larger bit-string into which the smaller should
be inserted, a position within the larger one, and a continuation,
inserts the smaller bit-string into the larger at the specified
position, and returns the new bit position at which the immediately
following instruction field should be inserted.
- INSTRUCTION-INITIAL-POSITION is a procedure, that given a bit-string
representing a segment of compiled code, returns the bit-string
position at which instruction-insert! should insert the first
instruction.
- INSTRUCTION-APPEND is a procedure, that given the bit-string
encoding successive (fields of) instructions, produces the bit-string
that corresponds to their concatenation in the correct order.
* coerce.scm:
	This file defines a set of coercion procedures.  These
procedures are used to fill fields in instructions.  Each coercion
procedure checks the range of its argument and produces a bit string
of the appropriate length encoding the argument.  Most coercions will
coerce their signed or unsigned argument into a bit string of the
required fixed length.  On some machines (e.g. HP PA), some coercions
may permute the bits appropriately.
* insmac.scm:
	This file defines machine-specific syntax used in the assembler,
and the procedure PARSE-INSTRUCTION, invoked by the syntax expander
for DEFINE-INSTRUCTION to parse the body of each of the instruction
rules.  This code is typically complex and you are encouraged to
emulate one of the existing ports in order to reuse its code.
The following ports use the following syntax for describing
instructions in machine language:
- Spectrum and MIPS:
    (LONG (<width 1> <value 1> <coercion type 1>)
	  (<width 2> <value 2> <coercion type 2>)
	  ...	
	  (<width n> <value n> <coercion type n>))
where all the widths must add up to an even multiple of 32.
- Vax:
Instruction descriptions are made of arbitrary sequences of the
following field descriptors:
    (BYTE (<width 1> <value 1> <coercion type 1>)
	  (<width 2> <value 2> <coercion type 2>)
	  ...	
	  (<width n> <value n> <coercion type n>))
    (OPERAND <size> <value>)
    (DISPLACEMENT (<width> <value>))
The total width of each of these field descriptors must add up to a
multiple of 8.
BYTE is used primarily for instruction opcodes.
OPERAND is used for general addressing modes.
DISPLACEMENT is used for PC-relative branch displacements.
- MC68020:
    (WORD (<width 1> <value 1> <coercion type 1> <size 1>)
	  (<width 2> <value 2> <coercion type 2> <size 2>)
	  ...	
	  (<width n> <value n> <coercion type n> <size 3>))
where all the widths must add up to an even multiple of 16.
Size refers to immediate operands to be encoded in the instruction,
and are omitted when irrelevant.
A missing coercion type means that the ordinary unsigned coercion (for
the corresponding number of bits) should be used.
Additionally, each of these ports provides a syntax for specifying
instructions whose final format must be determined by the
branch-tensioning algorithm in the bit assembler.  The syntax of these
instructions is usually
    (VARIABLE-WIDTH (<name> <expression>)
      ((<low-1> <high-1>)
       <instruction-specifier-1>)
      ((<low-2> <high-2>)
       <instruction-specifier-2>)
      ...
      ((() ())
       <instruction-specifier-n>))
Each instruction specifier is an ordinary (i.e. not VARIABLE-WIDTH)
instruction specifier.  NAME is a variable to be bound to the
bit-assembly-time value of EXPRESSION.  Each of the ranges
<low-1>-<high-1> <low-2>-<high-2>, etc. must be properly nested in the
next, and () specifies no bound.  The final format chosen is that
corresponding to the lowest numbered range containing the value of
<expression>.  Successive instruction specifiers must yield
instructions of non-decreasing lengths for the branch tensioner to
work correctly.  The MC68020 port uses GROWING-WORD instead of
VARIABLE-WIDTH as the keyword for this syntax.
 ==> This should probably be changed.
* inerly.scm:
	This file provides alternative expanders for the
machine-specific syntax.  These alternative expanders are used when
the assembly language that appears in the LAPGEN rules is assembled
(early) at compiler pre-processing time.  That is, the procedures
defined in this file are only used if
COMPILER:ENABLE-EXPANSION-DECLARATIONS? is set to true.  If you reuse
the code in insmac.scm from another port, you should be able to reuse
the inerly.scm file from the same port.  Alternatively, you can write
a dummy version of this code and require
COMPILER:ENABLE-EXPANSION-DECLARATIONS? to be always false.  This
switch defaults to false, currently.  The Spectrum and MIPS versions
currently have dummy versions of this code.
* insutl.scm:
	This file defines machine-specific rule qualifiers and
transformers.  It is often used to define addressing-mode filters and
handling procedures for architectures with general addressing modes.
This file does not exist in the Spectrum port because all the relevant
code has been placed in instr1.scm, and the MIPS port has no
machine-specific qualifiers and transformers.  Qualifiers and
transformers are described further in the chapter on the syntax of
translation rules.
* instr<n>.scm:
	These files define the instruction set of the architecture by
using the syntax defined in insmac.scm and inerly.scm.  There can be
as many of these files or as few as desired by whoever writes the
assembler.  They are usually split according to the size of the files
or along the divisions in the architecture manual.  Not all
instructions in the architecture need to be listed here -- only those
actually used by the back end in the LAPGEN rules and utility procedures.
Privileged/supervisory instructions, BCD (binary coded decimal)
instructions, COBOL-style EDIT instructions, etc., can probably be
safely ignored.
	4.5. Disassembler files:
	The disassembler is almost completely machine dependent.  For
many machines, a reasonable disassembler could be derived from the
description of the instruction set used to assemble programs.  The Vax
disassembler, is essentially constructed this way.  Unfortunately this
has not been generalized, and currently each port has its own
disassembler, often duplicating information contained in the
assembler.
The disassembler is not necessary for the operation of the compiler
proper.  It is, however, a good debugging tool.  You can bring the
compiler up without a disassembler by providing stubs for the
procedures referenced in dassm2.
* dassm1.scm:
	This file contains the top-level of the disassembler.  It is
not machine-dependent, and should probably be moved to another directory.
 ==> Is compiler/back the right place for this?
* dassm2.scm:
	This file contains various utilities for the disassembler.  In
particular, it contains the definitions of
- COMPILED-CODE-BLOCK/BYTES-PER-OBJECT
- COMPILED-CODE-BLOCK/OBJECTS-PER-PROCEDURE-CACHE
- COMPILED-CODE-BLOCK/OBJECTS-PER-VARIABLE-CACHE
  These parameters specify various relative sizes.
 ==> Shouldn't these be in machin.scm?  The first two have
counterparts there, and the last is always 1.
- DISASSEMBLER/READ-VARIABLE-CACHE
- DISASSEMBLER/READ-PROCEDURE-CACHE
  These procedures are used to extract free variable information from
a linked compiled code block.  Variable caches are maintained as
native addresses (i.e. no tag bits), and procedure (execute) caches
contain absolute jump instructions that must be decoded to extract the
address of the called procedure.  Appropriate type bits must be added
to both values before they are returned.
This file also contains a state machine that allows the disassembler
to display data appearing in the instruction stream in an appropriate
format (gc and format words, mainly), and heuristics for displaying
addressing modes and PC-relative offsets in a more legible form.
The output of the disassembler need not be identical to the input of
the assembler.  The disassembler is used almost exclusively for
debugging, and additional syntactic hints make it easier to read.
* dassm3.scm:
	This file contains the code to disassemble one instruction at
a time.  It is completely machine dependent at the time, and any old
way of doing it is fine.
* dinstr<n>.scm:
	In the VAX port, these are copies (or links) to the
instr<n>.scm files.  They are processed with a different syntax table
to construct the disassembler tables instead of the assembler tables.
* dsyn.scm:
	In the VAX port, this file provides the alternative expansion
of DEFINE-INSTRUCTION used to construct the disassembler tables
instead of the assembler rule data base.
		5. All about rules
There are three subsystems in Liar that use rule-based languages.
They are the RTL simplifier, LAPGEN (RTL->LAP translation), and the
assembler.  The assembler need not be rule-based, but given the
availability of the rule language, using the rule mechanism may be the
easiest way to write it.
	5.1. Rule syntax
The assembler rules use a somewhat different syntax from the rest and
will be described later.
The rest of the rules are defined in the following way:
    (DEFINE-RULE <rule-database>
      <rule pattern>
      <qualifier>				; optional
      <rule body>)
* <rule-database> is an expression evaluating to a rule database.  
It should be one of STATEMENT, PREDICATE, or REWRITING.
* <rule pattern> is a list that represents the pattern to match.
Variables in the pattern are written by using the ``?'' syntax.
For example,
- (hello) matches the constant list (hello)
- (? thing) matches anything, and THING is bound in <qualifier> and
<rule body> to whatever was matched.
- (hello (? person)) matches a list of two elements whose first
element is the symbol HELLO, and whose second element can be anything.
The variable PERSON will be bound in <qualifier> and <rule body> and
will have as its value the second element of the list matched.
Thus it would match (hello bill) and PERSON would be the symbol BILL,
(hello (bill rozas)) would match and PERSON would be the list (BILL ROZAS).
- (hello . (? person)) matches a list of one or more elements whose
first element is the symbol HELLO.  PERSON is bound to the rest of the
list.
Thus (hello my dog likes frankfurters) would match and PERSON would be
(MY DOG LIKES FRANKFURTERS).  (hello (my dog)) would match, and PERSON
would be ((MY DOG)).
Variable syntax is further described below.
* <qualifier> is (QUALIFIER <expression>) where <expression> evaluates
to a boolean and further filters matches.  If the qualifier expression
evaluates to false, the rule is not fired.  Otherwise it is.
For example,
    (DEFINE-RULE <some database>
      (multiple (? number) (? divisor))
      (QUALIFIER (and (number? number)
		      (number? divisor)
		      (zero? (remainder number divisor))))
      <rule body>)
will match (MULTIPLE 14 7) and (MULTIPLE 36 4), but will not match
(MULTIPLE FOO 3), (MULTIPLE 37 4), (MULTIPLE 2), (MULTIPLE 14 2 3),
nor (HELLO 14 7).
Rule qualifiers are optional.
* <rule body> is an arbitrary Lisp expression whose value is the
translation determined by the rule.  It will typically use the
variables bound by ``?'' to perform the translation.  The statement
and predicate rules use the LAP macro to generate sequences of
assembly language instructions.
The assembler rules use the following syntax:
    (DEFINE-INSTRUCTION <opcode>
      (<pattern1> <qualifier1> <body1>)
      (<pattern2> <qualifier2> <body2>)
      ...
      )
Where <opcode> is the name of the instruction, and the patterns will
be matched against the cdr of lists whose car is <opcode>.  
The <patterns>, <qualifiers>, and <bodies> are as in the RTL rules,
except that there are typically no qualifiers, and the bodies are
typically written in a special syntax defined in
compiler/machines/port/insmac.scm and described in section 4.4.
For example,
    (DEFINE-INSTRUCTION ADD
      (((R (? target)) (R (? reg1)) (R (? reg2)))
       (WORD (6 #x24)
	     (5 target)
	     (5 reg1)
	     (5 reg2)
	     (11 0)))
      (((R (? target)) (R (? reg)) (& (? constant)))
       (WORD (6 #x23)
	     (5 target)
	     (5 reg)
	     (16 constant SIGNED))))
would match (ADD (R 1) (R 2) (R 3)) and (ADD (R 7) (R 22) (& 257)),
firing the corresponding body.
The bodies are defined in terms of the WORD syntax defined in
insmac.scm, and the ``commas'' used with the pattern variables in the
rule bodies are a consequence of the WORD syntax.  The meaning of the
commas is identical to the meaning of the commas in a ``backquote''
Scheme expression, and is briefly described in section 5.3.1.
	5.2. Rule variable syntax.
Although qualifiers and the simple variable syntax shown are
sufficient, some additional variable syntax is available for common
patterns.  Moreover, the early matcher (used when
COMPILER:ENABLE-EXPANSION-DECLARATIONS? is true) cannot currently
handle qualifiers but can handle the additional variable syntax that
can supplant most qualifiers.  The early matcher is used only on the
assembler rules, so if you want to use it, you only need to use the
restricted language when writing those rules.
The complete variable syntax is as follows:
* (? <name>)  This syntax matches anything in that position of the
potential instance, and binds <name> to the sub-structure matched.
* (? <name> <transform>) This syntax matches anything in that position
of the potential instance as long as <transform> returns non-false on
the sub-structure matched.  <name> is bound to the result returned by
<transform>.  For example,
    (? q (lambda (obj) (and (number? obj) (* 2 obj))))
will match 2, and Q will be bound to 4, but will not match FOO.
* (? <name1> <transform> <name2>)  <name1> and <transform> have the same
meaning as in the previous syntax, and this syntax matches exactly the
same objects, but provides the additional convenience of binding
<name2> to the sub-structure matched, before the transformation.
For example,
    (? q (lambda (obj)
	   (and (pair? obj) 
		(number? (car obj))
		(- (car obj) 23)))
       z)
will match (2 . HELLO), Q will be bound to -21, and Z will be bound to
(2 . HELLO), and will not match 34 or (HELLO . 2).
 ==> The pattern parser seems to understand (?@ <name>) as well, but
this syntax is used nowhere.  The early parser does not understand it.
Should it be flushed?
	5.3. Writing statement rules.
Statement rules provide the translation between RTL instructions and
fragments of assembly language.  Most RTL instructions are
assignments, where an RTL register is written with the contents of a
virtual location or the result of some operation.
	5.3.1. Output of the statement rules
The output of the statement rules is a fragment of assembly language
written in the syntax expected by the LAP assembler.  The fragments,
containing any number of machine instructions, are constructed by
using the LAP macro, built on top of Scheme's QUASIQUOTE (back-quote).
Within a LAP form, you can use UNQUOTE (comma) and UNQUOTE-SPLICING
(comma at-sign) to tag subexpressions that should be evaluated and
appended.  For example,
    (LAP (MOV L ,r1 ,r2)
	 (ADD L ,r3 ,r2))
constructs a fragment with two instructions in it where the values of
r1, r2, and r3 are substituted in the instructions.
The code
    (LAP (MOV L ,r1 ,r2)
	 ,@(generate-test r2))
constructs a fragment whose first instruction is a MOV instruction,
and the rest is the fragment returned by generate-test.
The INST macro is similar to LAP but constructs a single instruction.
It should not be used unless necessary (i.e. in
LAP:MAKE-LABEL-STATEMENT), since you may find yourself later wanting
to change a single instruction into a fragment in a utility procedure,
and having to find every use of the procedure.
 ==> We should change the linearizer to expect
LAP:MAKE-LABEL-STATEMENT to return a fragment, and do away with INST.
An additional macro, INST-EA, is provided to construct a piece of
assembly language representing an addressing mode.  For example,
INST-EA is used by the following procedure in the Vax back-end:
    (define (non-pointer->ea type datum)
      (if (and (zero? type)
	       (<= 0 datum 63))
	  (INST-EA (S ,datum))
	  (INST-EA (&U ,(make-non-pointer-literal type datum)))))
where non-pointer->ea may be used in
    (LAP (MOV L ,(non-pointer->ea <type> <datum>)
		,(any-register-reference target)))
INST-EA is superfluous on machines without general addressing modes
(i.e. load-store architectures).
Each port provides a procedure, named REGISTER-REFERENCE, that maps
between RTL machine registers and the assembly language syntax used to
refer to the registers.  It uses INST-EA to build such references.
The macros LAP, INST, and INST-EA, besides providing the functionality
of QUASIQUOTE, also provide a hook for the compiler pre-processing
time assembly of the code generated by the rules.
	5.3.2. Hardware register allocation
Hardware register allocation occurs during the RTL->LAP translation.
The rules, besides generating assembly language, invoke utilities
provided by the register allocator to reserve and free hardware
registers on which the operations can be performed.
Hardware registers are often divided into different non-overlapping
types that are used in different operations.  For example, modern
hardware typically has a set of integer registers and a set of
floating point registers.  Address operations typically require
operands in integer registers, while floating point operations
typically require floating point registers.  On some machines, notably
the Motorola 68K family, the integer register set is further
subdivided into types with specific operations (address and data).
The register allocator manipulates RTL registers.  RTL registers are
just small integers.  The low end of the valid range of RTL registers
is used to represent the physical registers of the processor (called
machine registers), and the rest of the numbers represent virtual
(pseudo) registers.  The core allocator operations are given an RTL
register number and a register type, and return a suitable machine
register to be used for the operation.
A machine register that holds the value of a pseudo register is called
an ``alias'' for the pseudo register.  A pseudo register may have many
valid aliases simultaneously, usually of different types.  Any
assignment to the pseudo register will invalidate all aliases but one,
namely the machine register actually written, rather than copy the new
value into all the previous aliases.  Thus source references and
destination references have different effects, and are handled by
different procedures in the register allocator.
Pseudo registers have associated homes, memory locations that hold
their values when the machine registers are needed for other purposes.
Most pseudo registers are never written to their homes, since a pseudo
register's value is usually kept in machine register aliases until the
pseudo register is dead, i.e. until its value is no longer needed.  A
pseudo register's aliases can be reused for other purposes if there
are other remaining aliases or this is the last reference to the
pseudo register.  An alias that can be reused is a ``reusable'' alias.
Occasionally, the value of a pseudo register may be transferred to the
register's home and the last alias invalidated, if the register
allocator is running out of registers.  This is called ``spilling'' a
register.
The register allocator maintains a table of associations, called the
``register map'', that associates each pseudo register with its valid
aliases, and each machine register with the pseudo register whose
value it holds (if any).  The register allocator routines modify the
register map after aliases are requested and invalidated, and they
generate assembly language instructions to perform the necessary data
motion for spilling and re-loading at run time.  These instructions
are usually inserted before the code output of the RTL rule in
execution.
If you have chosen your RTL register numbers for machine registers so
that they match the hardware numbers, and your assembly language does
not distinguish between references to a register and other fields, you
can ignore register references and use the RTL register numbers
directly.  This is commonly the case when using integer registers in
load-store architectures.
As a convenience, the register allocator also provides operations that
manipulate register references.  A register reference is a fragment of
assembly language, typically a register addressing mode for general
register machines, that when inserted into a LAP instruction, denotes
the appropriate register.  For example, on the Motorola MC68020,
physical register A3 is represented as RTL register number 11, and a
register reference for it would be ``(A 3)''.  RTL pseudo register 44
may at some point have RTL machine register 11 as its only
address-register alias.  At that time, (REGISTER-ALIAS 44 'ADDRESS)
would return 11.
The interface to the register allocator is defined in
compiler/back/lapgn2.scm.  Not all ports use all of the procedures
defined there.  Often a smaller subset is sufficient depending on
whether there are general addressing modes, etc.  A list of the most
frequently used follows:
* REGISTER-ALIAS expects an RTL register and a register type, and
returns a machine register of the specified type that is a valid alias
for that RTL register if there is one, or false if there is none.
This procedure should only be used for source operand RTL registers.
If the register type is false, then REGISTER-ALIAS will return any
valid alias.
* LOAD-ALIAS-REGISTER! is like REGISTER-ALIAS but always returns a
machine register, allocating one of the specified type if necessary.
This procedure should only be used for source operand RTL registers.
* REFERENCE-ALIAS-REGISTER! performs the same action as
LOAD-ALIAS-REGISTER! but returns a register reference instead of an
RTL register number.
* ALLOCATE-ALIAS-REGISTER! expects an RTL register and a register
type, and returns a machine register of the specified type that is the
only alias for the RTL register and should be written with the new
contents of the RTL register.  ALLOCATE-ALIAS-REGISTER! is used to
generate aliases for target RTL registers.
* REFERENCE-TARGET-ALIAS!  performs the same action as
ALLOCATE-ALIAS-REGISTER! but returns a register reference instead of
an RTL register number.  See CLEAR-REGISTERS! below.
* STANDARD-REGISTER-REFERENCE expects an RTL register, a register
type, and a boolean.  It will return a reference for an alias of the
specified register containing the current value of the RTL register.
This reference will be of the specified type if the boolean is false,
or sometimes of other types if the boolean is true.  In other words,
the boolean argument determines whether other types are acceptable,
although not desirable.  The register type may be false, specifying
that there really is no preference for the type, and any reference is
valid.  STANDARD-REGISTER-REFERENCE should be used only for source
operands (i.e. those that already contain data), and may return a
memory reference for those machines with general addressing modes if
there is no preferred type and alternates are acceptable.
* MOVE-TO-ALIAS-REGISTER! expects a source RTL register, a register
type, and a target RTL register.  It returns a new alias for the
target of the specified type containing a copy of the current contents
of the source.  Often this is accomplished by choosing an alias of the
source that already contains the correct data and making it the only
alias for target.  MOVE-TO-ALIAS-REGISTER! attempts to reuse an alias
for the source register.
* MOVE-TO-TEMPORARY-REGISTER! expects a source RTL register and a
register type and returns an appropriate register containing a copy of
the source.  The register is intended for temporary use, that is, use
only within the code generated by the expansion of the current RTL
instruction, and as such it should not be permanently recorded in the
register map.  The register becomes automatically freed for subsequent
RTL instructions.  MOVE-TO-TEMPORARY-REGISTER! attempts to reuse an
alias for the source register.
* REUSE-PSEUDO-REGISTER-ALIAS! expects an RTL register, a register
type, and two procedures.  It attempts to find a reusable alias for
the RTL register of the specified type, and invokes the first
procedure giving it the alias if it succeeds, or the second procedure
with no arguments if it fails.  MOVE-TO-ALIAS-REGISTER!  and
MOVE-TO-TEMPORARY-REGISTER! use REUSE-PSEUDO-REGISTER-ALIAS! but
occasionally neither meets the requirements.
* NEED-REGISTER! expects an RTL machine register and informs the
register allocator that the rule in use requires that register so it
should not be available for subsequent requests while translating the
current RTL instruction.  The register is available for later RTL
instructions unless the relevant rules invoke NEED-REGISTER! again.
The procedures described above that allocate and assign aliases and
temporary registers call NEED-REGISTER! behind the scenes, but you will
need to invoke it explicitly when calling out-of-line routines.
* LOAD-MACHINE-REGISTER! expects an RTL register and an RTL machine
register and generates code that copies the current value of the RTL
register to the machine register.  It is used to pass arguments in
fixed registers to out-of-line code, typically in the compiled code
runtime library.
* ADD-PSEUDO-REGISTER-ALIAS! expects an RTL pseudo-register and an
available machine register (no longer an alias), and makes the
specified machine register an alias for the pseudo-register.
* CLEAR-REGISTERS! expects any number of RTL registers and clears them
from the register map, preserving their current contents in memory if
needed.  It returns the code that will perform the required motion at
runtime.  It should be used before invoking LOAD-MACHINE-REGISTER! to
ensure that the potentially valid previous contents of the machine
register have been saved.
* CLEAR-MAP! deletes all aliases from the register map, pushing the
data only held in aliases into the memory homes if needed.  This
procedure returns an assembly language code fragment, and is typically
used before invoking out-of-line code.
* DELETE-DEAD-REGISTERS! informs the register allocator that RTL
pseudo registers whose contents will not be needed after the current
RTL instruction can be eliminated from the register map and their
aliases subsequently used for other purposes.
Most of the rules are actually written in terms of machine-specific
procedures that invoke the procedures listed above in fixed ways.
Rule bodies typically match the following code pattern:
    (let* ((rs1 (standard-source source1))
	   (rs2 (standard-source source2))
	   (rt (standard-target target)))
      (LAP ...))
where STANDARD-SOURCE and STANDARD-TARGET are machine-specific
procedures.  The reason for the use of LET* (instead of LET) is given
below.
On a machine with general addressing modes and memory operands, we
might provide their definitions as follows:
    (define (standard-source rtl-reg)
      (standard-register-reference rtl-reg 'GENERAL true))
    (define (standard-target rtl-reg)
      (delete-dead-registers!)
      (reference-target-alias! rtl-reg 'GENERAL))
while on a load-store architecture we might define them as follows:
    (define (standard-source rtl-reg)
      (load-alias-register! rtl-reg 'GENERAL))
    (define (standard-target rtl-reg)
      (delete-dead-registers!)
      (allocate-alias-register! rtl-reg 'GENERAL))
		- VERY IMPORTANT: -
This example brings up the cardinal rule of RTL assignments: 
    Any rule that writes into an RTL pseudo-register MUST invoke
    DELETE-DEAD-REGISTERS! after allocating aliases for the necessary
    sources but before allocating an alias for the target.
If this is not done, the register allocator may decide to spill
no-longer valid data into memory, which will probably make the
compiler get confused in other ways or cause garbage collection
problems later.  If it is done too early, the last valid alias for a
source operand may have been reused in the interim, and the compiler
will assume that the source quantity is contained in memory and will
often generate code that fetches and operates on garbage.
The example above uses LET* instead of LET.  LET would not work in the
above example because Scheme does not specify the order of argument
evaluation, and Liar chooses arbitrary orders, so the
DELETE-DEAD-REGISTERS! implicit in STANDARD-TARGET might be called too
early possibly causing STANDARD-SOURCE to fail.
MOVE-TO-ALIAS-REGISTER! invokes DELETE-DEAD-REGISTERS! because it
simultaneously allocates an alias for a source and for a target.
Thus, if there are other source operands, their aliases must be
allocated before MOVE-TO-ALIAS-REGISTER! is invoked.
	5.3.3. Invocation rules, etc.
The meaning and intent of most statement rules in an existing port is
readily apparent.  The more arcane rules have to do with procedures
and the representation of numbers.  What follows is a description of
some of the more obscure rules related to procedures and some of the
implementation concepts required to understand them.
In the invocation rules, FRAME-SIZE is the number of arguments passed
in the call (often plus one), and there is often more than one rule
with the same keyword, typically to handle the common cases (small
FRAME-SIZE) more efficiently.
Various of the rules specify the number of arguments that the
resulting procedure will accept.  The range is described in terms of
two parameters, MIN and MAX:
 - MIN is always positive and it is one greater than the smallest
number of arguments allowed.
 - MAX may be positive or negative.  If positive, it is one greater
than the largest number of arguments allowed.  If negative, it
indicates that the procedure will accept an unbounded number of
arguments, and the absolute value of MAX, minus (MIN + 1), is the
number of positional optional parameters.  Either way, the absolute
value of MAX is the size of the procedure's call frame counting the
procedure itself.
 These two values are encoded in the format word of the resulting
procedures so that dynamic APPLY can check the number of arguments
passed and reformat the stack frame appropriately.
 Non-positive MINs are used to indicate that the compiled entry point
is not a procedure, but a return address, a compiled expression, or a
pointer to an internal label.
The CONS-CLOSURE rules will dynamically create some instructions in
the runtime heap, and these instructions must be visible to the
processor's instruction fetch unit.  If the instruction and data
caches are not automatically kept consistent by the hardware,
especially for newly addressed memory, the caches must be explicitly
synchronized by the Scheme system.  On machines where the programmer
is given no control over the caches, this will be very hard to do.  On
machines where the control is minimal or flushing is expensive, the
following solution can be used to amortize the cost:
The CONS-CLOSURE rules can generate code to allocate a closure from a
pre-allocated pool and invoke an out-of-line routine to refill the
pool when it is empty.  The routine allocates more space from the
heap, initializes the instructions, and synchronizes the caches.
Since the real entry points are not known until the closure objects
are created, instead of using absolute jumps to the real entry points,
the pre-allocated closures can contain jumps to a fixed routine that
will extract the real entry point from the word pointed at by the
return address and invoke it.  In other words, the code inserted in
closure objects will be
	jsr fixed-routine
	<storage for real-entry-point>
and fixed-routine, written in assembly language, will do something like
	load	0(return-address),rtemp
	jmp	0(rtemp)
The 68040 version of the Motorola 68000 family port uses this trick
because the 68040 cache is typically configured in copyback mode, and
synchronizing the caches involves an expensive supervisor (OS) call.
The Alpha back-end also uses this trick because the caches can be
synchronized only by using the CALL_PAL IMB instruction, which flushes
the complete instruction cache, therefore implying a large re-start
cost.  The Alpha version of this code is currently better than the
68040 version, so you should probably emulate that version.
* (INVOCATION:UUO-LINK (? frame-size) (? continuation) (? name))
  This rule is used to invoke a procedure named by a free variable.
It is the rule used to generate a branch to an execute cache as
described in cmpint.txt.  The rule should allocate a new execute cache
in the compiled code block by using FREE-UUO-LINK-LABEL, and should
then branch to the instruction portion of the execute cache.
FRAME-SIZE is the number of arguments passed in the call, plus one.
* (INVOCATION:GLOBAL-LINK (? frame-size) (? continuation) (? name))
  This rule is identical to the previous one, except that the free
variable must be looked up in the global environment.  It is used to
improve the expansion of some macros that insert explicit references
to the global environment (e.g. The expansion for FLUID-LET inserts
uses (ACCESS DYNAMIC-WIND #f) as the operator of a call).
* (INVOCATION-PREFIX:MOVE-FRAME-UP (? frame-size) (? address))
  This rule is used to shift call frames on the stack to maintain
proper tail recursion.  ADDRESS specifies where to start pushing the
frame.  It should be a pointer into the used portion of the stack,
i.e. point to a higher address.
For example, assume that what follows depicts the stack before
  (INVOCATION-PREFIX:MOVE-FRAME-UP 3 addr)
	|		...		|
	|				|
	+-------------------------------+
	|	    <value n>		|
addr ->	+-------------------------------+
	|				| direction of
	|				| stack growth
	|				|
	|		...		|	|
	|				|	|
	|				|	V
	|				|
	+-------------------------------+
	|	    <value 3>		|
	+-------------------------------+
	|	    <value 2>		|
	+-------------------------------+
	|	    <value 1>		|
spbf ->	+-------------------------------+
Where spbf is the contents of the stack pointer register.
After the invocation prefix, it will look as follows:
	|		...		|
	|				|
	+-------------------------------+
	|	    <value n>		| direction of
addr ->	+-------------------------------+ stack growth
	|	    <value 3>		|
	+-------------------------------+	|
	|	    <value 2>		|	|
	+-------------------------------+	V
	|	    <value 1>		|
spaf ->	+-------------------------------+
The stack pointer register will now contain the value of spaf.
* (INVOCATION-PREFIX:DYNAMIC-LINK (? frame-size) (? address-1) (? address-2))
  This rule is similar to the INVOCATION-PREFIX:MOVE-FRAME-UP rule,
but is used when the destination of the frame is not known at compile
time.  The destination depends on the continuation in effect at the
time of the call, and the section of the stack that contains enclosing
environment frames for the called procedure.  Two addresses are
specified and the one that is closest to the current stack pointer
should be used, that is, the target address is the numerically smaller
of the two addresses since the Liar stack grows towards smaller
addresses.
 ==> This rule need not need not exist in the RTL.  It could be
expanded into a comparison and a use of
INVOCATION-PREFIX:MOVE-FRAME-UP with a computed address.
* (ASSIGN (REGISTER (? target))
	  (CONS-CLOSURE (ENTRY:PROCEDURE (? procedure-label))
			(? min) (? max) (? size)))
  This rule issues the code to create a closure object whose real
entry point is PROCEDURE-LABEL, that will accept a number of arguments
specified by MIN and MAX, and that will have storage for SIZE free
variables.  The free variable storage need not be initialized since it
will be written immediately by subsequent RTL instructions.  The entry
point of the resulting closure object should be written to RTL
register TARGET.  The format of closure objects is described in
cmpint.txt.
* (ASSIGN (REGISTER (? target))
	  (CONS-MULTICLOSURE (? nentries) (? size) (? entries)))
  This rule is similar to the previous rule, but issues code to
allocate a closure object with NENTRIES entry points.  ENTRIES is a
vector of entry-point descriptors, each being a list containing a
label, a min, and a max as in the previous rule.  TARGET receives the
compiled code object corresponding to the first entry.
* (OPEN-PROCEDURE-HEADER (? label-name))
  This rule and its siblings are used to generate the entry code to
procedures and return addresses.  On entry to procedures and
continuations, a gc/interrupt check is performed, and the appropriate
routine in the runtime library is invoked if necessary.  This check is
performed by comparing the memory Free pointer to the compiled code's
version of the MemTop pointer.  The low-level interrupt handlers
change the MemTop pointer to guarantee that such comparisons will fail
in the future.  A standard header generates the following code:
    (LABEL gc-label)
	<code to invoke the runtime library>
	<format and gc words for the entry point>
    (LABEL label-name)
	<branch to gc-label if Free >= MemTop>
Each kind of header invokes a different runtime library utility.  In
addition, procedures that expect dynamic links must guarantee that the
dynamic link is preserved around the execution of the interrupt
handler.  This is accomplished by passing the contents of the dynamic
link register to the appropriate runtime library utility.
* (CLOSURE-HEADER (? label-name) (? nentries) (? num-entry))
  NENTRIES is the number of entry points in the closure object, and
NUM-ENTRY is the zero-based index for this entry point.  Closure
headers are similar to other procedure headers but also have to
complete the Hand-Shake initiated by the instructions stored in the
closure objects so that the closure object appears on top of the
stack.  On architectures where it is necessary, they also have to map
closure objects to their canonical representatives, and back when
backing out because of interrupts or garbage collection.
The file compiler/machines/port/rules3.scm contains most of these
procedure-related rules.  It also contains three procedures that
generate assembly language and are required by the compiler.  These
procedures are used to generate initialization code for compiled code
blocks.
Compiled code blocks have two sections, a code section that contains
the instructions, and a ``constants'' section that contains scheme
objects referenced by the code (e.g. quoted lists and symbols), the
free variable caches for the code, the debugging information
descriptor (more on this later), and the environment where the free
variables in the code must be referenced.  This environment is not
known at compile time, so the compiler allocates a slot in the
constants section for it, but the code itself must store it on first
entry.  In addition, the linker is invoked on first entry to look up
the free variables and fill the variable caches with their correct
contents.  The compiler allocates enough space for each free variable
cache and initializes the space with the information required by the
linker to patch the reference.  This information consists of the name
of the free variable in addition to the number of actual arguments
passed (plus one) for execute references.
If COMPILER:COMPILE-BY-PROCEDURES? is true, the compiler will generate
multiple compiled code blocks, one corresponding to each top-level
lambda expression.  Each of these must be initialized and linked, but
instead of initializing them on first entry, the root compiled code
block links all of them when it is entered.
The linker (a runtime library utility) expects three arguments:
  The address of the first word of the compiled code block, the word
containing the GC vector header for the compiled code block.
  The address of the first linker section in the constants area of the
compiled code block.  The linker sections contain the free variable
caches and are all contiguous.
  The number of linker sections in the compiled code block.
* (GENERATE/QUOTATION-HEADER env-label free-label n-sections)
  This procedure generates the code that initializes the environment
slot at location labeled ENV-LABEL.  The environment is fetched from
the interpreter's environment register.  It also generates code to
invoke the linker on the executing compiled code block.  The first
word of the compiled code block is labeled by the value of
*BLOCK-LABEL*, the first linker section is labeled by FREE-LABEL, and
the number of linker sections is N-SECTIONS.
* (GENERATE/REMOTE-LINK label env-offset free-offset n-sections)
  This procedure is similar to GENERATE/QUOTATION-HEADER but is used
to generate code that initializes and links a different compiled code
block.  It is used to generate the code to insert into the root
compiled code block to link each of the other compiled code blocks
generated when COMPILER:COMPILE-BY-PROCEDURES? is true.
  LABEL is a label in current block's constant section where the
pointer to the code block to be linked is stored,
  ENV-OFFSET is the vector offset in the other code block where the
environment of evaluation should be stored,
  FREE-OFFSET is the vector offset of the first linker section in the
other compiled code block, and
  N-SECTIONS is the number of linker sections in the other block.
* (GENERATE/CONSTANTS-BLOCK consts reads writes execs global-execs statics)
  This procedure generates the assembler pseudo-ops used to generate
the constants and linker section for a compiled code block.  This
section consists of:
  - The constant objects (e.g. quoted lists) referenced by the code.
  - The read variable caches used by the code.
  - The write variable caches used by the code.
  - The execute variable caches used by the code.
  - The global execute variable caches used by the code.
  - The locations for static variables.
  - A slot for the debugging information descriptor generated by the
compiler.
  - A slot for the environment where the code is linked.  
Each word of storage in the constants block is allocated by using a
SCHEME-OBJECT assembler pseudo-op, and the order in which they appear
is the same as the order in which they appear in the final object.
The linker sections (free variable cache sections) must be contiguous,
and each has a one-word header describing the kind of section and its
length.  The environment slot must be the last word in the compiled
code block, immediately preceded by the debugging information
descriptor.  Each SCHEME-OBJECT directive takes a label and the
initial contents of the slot.
This procedure is almost machine-independent, and you should be able
to trivially modify an existing version.  The only machine dependence
is the layout and size of the storage allocated for each execute cache
(uuo link).  This machine-dependence consists entirely of the
definition of the TRANSMOGRIFLY procedure.
TRANSMOGRIFLY takes a list of the following form:
  ((free-variable-1 (frame-size-1-1 . label-1-1)
	            (frame-size-1-2 . label-1-2)
		    ...)
   (free-variable-2 (frame-size-2-1 . label-2-1)
	            (frame-size-2-2 . label-2-2)
		    ...)
   ...)
This list is interpreted as follows: an execute cache for calls to
FREE-VARIABLE-1 with frame size FRAME-SIZE-1-1 (number of arguments
plus one) must be created, and labeled LABEL-1-1, similarly for
 <FREE-VARIABLE-1, FRAME-SIZE-1-2, LABEL-1-2>, 
 <FREE-VARIABLE-2, FRAME-SIZE-2-1, LABEL-2-1>, etc.
Assuming that the initial layout of an execute cache is
  free variable name		; labeled word
  false				; optional storage (e.g. for branch delay slot)
  frame size of call		; arity + 1
TRANSMOGRIFLY will return a list of the following form:
  ((frame-variable-1 label-1-1)
   (#f               dummy-label-1-1)	; optional word(s)
   (frame-size-1-1   dummy-label-1-1)
   (frame-variable-1 label-1-2)
   (#f               dummy-label-1-2)	; optional word(s)
   (frame-size-1-2   dummy-label-1-2)
   ...
   (frame-variable-2 label-2-1)
   (#f               dummy-label-2-1)	; optional word(s)
   (frame-size-2-1   dummy-label-2-1)
   ...)
There may be any number of optional words, but the layout must match
that expected by the macros defined in microcode/cmpint-port.h.  In
particular, the length in longwords must match the definition of
EXECUTE_CACHE_ENTRY_SIZE in microcode/cmpint-port.h, and the definition
of EXECUTE-CACHE-SIZE in compiler/machines/port/machin.scm.
Furthermore, the instructions that the linker will insert should
appear at the word labeled by LABEL-N-M, and should not overwrite the
relevant part of FRAME-SIZE-N-M, since the frame size will be needed
when re-linking after an incremental definition or assignment.
The output format of TRANSMOGRIFLY is the input format for the read
and write execute cache sections.  The procedure DECLARE-CONSTANTS,
local to GENERATE/CONSTANTS-BLOCK, reformats such lists into the final
SCHEME-OBJECT directives and tacks on the appropriate linkage section
headers.
	5.3.4. Fixnum rules.
Scheme's generic arithmetic primitives cannot be open-coded fully for
space reasons.  Most Scheme code that manipulates numbers manipulates
small integers used as counters, vector indices, etc., and using
out-of-line arithmetic procedures to operate on them would make the
code too slow.  The compromise, therefore, is to open-code the common
small integer case, and to handle the rest out of line.  This, of
course, does not perform particularly well for the other common case
of floating point data.
Scheme integers are represented in two formats.  The most common,
fixnum representation, uses the datum field of the objects to directly
encode the values.  The other format, bignum representation, stores
the values in multiple words in memory, and the datum is a pointer to
this storage.  Scheme generic arithmetic procedures will generate
fixnums whenever possible, resorting to bignums when the value exceeds
the range that can be represented in fixnum format.
Since the open-codings provided for the compiler only handle fixnums,
these open-codings must also detect when the result will not fit in a
fixnum in order to invoke the out-of-line utility that will handle
them correctly.  
Most hardware provides facilities for detecting and branching if an
integer operation overflows.  Fixnums cannot use these facilities
directly, because of the tag bits at the high-end of the word.  To be
able to use these facilities (and get the sign bit in the right
place), Scheme fixnums are converted to an internal format before they
are operated on, and converted back to Scheme object format before
storing them in memory or returning them as values.
In this internal format, the value has been shifted left so that the
fixnum sign-bit coincides with the integer sign bit, and a number of
bits in the least-significant end of the word hold zeros.  The shift
amount is the length of the type-tag field.
The rules
    (ASSIGN (REGISTER (? target)) (OBJECT->FIXNUM (REGISTER (? source))))
    (ASSIGN (REGISTER (? target)) (FIXNUM->OBJECT (REGISTER (? source))))
perform this translation.
The open-coding of fixnum arithmetic assumes that the sources and the
result are in this format.  This format is good for value comparisons,
addition, subtraction, and bitwise logical operations, but must be
transformed for multiplication, division, and shifting operations.
In addition to open-coding fixnum operations within generic
arithmetic, fixnum primitives can be invoked directly, and the code
can be open coded as well.  Under these circumstances, the result will
not be checked for overflow, and the code generated can be quite
different.  The RTL instructions that perform fixnum arithmetic have a
boolean flag that specifies whether overflow conditions should be
generated or not.
The compiler does not generally require fixnum arithmetic to be open
coded.  If the names of all the fixnum primitives are listed in
COMPILER:PRIMITIVES-WITH-NO-OPEN-CODING, all of them will be handled
by issuing code to invoke them out of line.
There is one exception to this, however.  The following rules MUST be
provided:
    (ASSIGN (REGISTER (? target))
	    (FIXNUM-2-ARGS MULTIPLY-FIXNUM
			   (OBJECT->FIXNUM (CONSTANT 4))
			   (OBJECT->FIXNUM (REGISTER (? source)))
			   #F))
    (ASSIGN (REGISTER (? target))
	    (FIXNUM-2-ARGS MULTIPLY-FIXNUM
			   (OBJECT->FIXNUM (REGISTER (? source)))
			   (OBJECT->FIXNUM (CONSTANT 4))
			   #F))
The reason is that VECTOR-REF and VECTOR-SET! translate into a
sequence that uses these patterns when the index is not a compile-time
constant.  Of course, you can include VECTOR-REF and VECTOR-SET! in
compiler:PRIMITIVES-WITH-NO-OPEN-CODING to avoid the problem
altogether, but this is probably not advisable.
	5.3.5. Rules used to invoke the runtime library
Some of the rules issue code that invokes the runtime library.  The
runtime library is invoked through a primary entry point,
SCHEME-TO-INTERFACE, typically directly accessible through a dedicated
processor register.  SCHEME-TO-INTERFACE expects at least one and up
to five arguments.  The first argument is the index of the runtime
library service to invoke, and the rest are the parameters to the
service routine.  These arguments are passed in fixed locations,
typically registers.  Runtime library utilities return their values
(if any) in the compiler's value register.  The following is a typical
example of such an invocation where INVOKE-INTERFACE expects the index
of a utility, and generates the code that writes the index into the
appropriate location and jumps to SCHEME-TO-INTERFACE.
    (define-rule statement
      (INVOCATION:APPLY (? frame-size) (? continuation))
      (LAP ,@(clear-map!)
	   ,@(load-rn frame-size 2)
	   (MOV L (@R+ 14) (R 1))
	   ,@(invoke-interface code:compiler-apply)))
The code names are typically defined in
compiler/machines/port/lapgen.scm.
Many of the utilities expect return addresses as their first argument,
and it is convenient to define a procedure, INVOKE-INTERFACE-JSB
(sometimes called LINK-TO-INTERFACE) which receives an index but
leaves the appropriate return address in the first argument's
location.  INVOKE-INTERFACE-JSB can be written by using
INVOKE-INTERFACE (and SCHEME-TO-INTERFACE), but given the frequency of
this type of call, it is often written in terms of an alternate entry
point to the runtime library (e.g.  SCHEME-TO-INTERFACE-JSB).
An example of a more complicated call to the runtime library is
    (define-rule statement
      (INTERPRETER-CALL:CACHE-ASSIGNMENT (? extension) (? value))
      (QUALIFIER (and (interpreter-call-argument? extension)
		      (interpreter-call-argument? value)))
      (let* ((set-extension
	      (interpreter-call-argument->machine-register! extension r2))
	     (set-value 
	      (interpreter-call-argument->machine-register! value r3))
	     (clear-map (clear-map!)))
	(LAP ,@set-extension
	     ,@set-value
	     ,@clear-map
	     ,@(invoke-interface-jsb code:compiler-assignment-trap))))
where INTERPRETER-CALL-ARGUMENT->MACHINE-REGISTER! invokes
CLEAR-REGISTERS! and NEED-REGISTER! besides performing the assignment.
For very frequent calls, the assembly language part of the runtime
library can provide additional entry points.  The calling convention
for these would be machine-dependent, but frequently they take
arguments in the same way that SCHEME-TO-INTERFACE and
SCHEME-TO-INTERFACE-JSB take them, but avoid passing the utility
index, and may do part or all of the work of the utility in assembly
language instead of invoking the portable C version.  Many of the
ports have out-of-line handlers for generic arithmetic, with the
commond fixnum/flonum cases handled there.
The following is a possible specialized version of apply
where the special entry point expects the procedure argument on the
stack rather than in a fixed register:
    (define-rule statement
      (INVOCATION:APPLY (? frame-size) (? continuation))
      (LAP ,@(clear-map!)
	   ,@(load-rn frame-size 2)
	    (JMP ,entry:compiler-apply)))
The procedure object will have been pushed on the stack by earlier
code.
	5.4. Writing predicate rules.
Predicate rules are used to generate code to discriminate between
alternatives at runtime.  The code generated depends on the
conditional branch facilities of the hardware at hand.  There are two
main ways in which architectures provide conditional branching
facilities:
* condition codes.  Arithmetic instructions compute condition codes
that are stored in hardware registers.  These hardware registers may
be targeted explicitly by the programmer or implicitly by the
hardware.  Conditional branch instructions determine whether to branch
or not depending on the contents of the condition registers at the
time the branch instruction is executed.  These condition registers
may be named explicitly by the instructions, or assumed implicitly.
* compare-and-branch instructions.  The instruction set includes
instructions that compare two values (or a value against 0) and branch
depending on the comparison.  The results of the comparison are not
stored in special or explicit registers, since they are used
immediately, by the instruction itself, to branch to the desired
target.
Liar accommodates both models for branching instructions.
Predicate rules generate code that precede the actual branches, and
then invoke the procedure SET-CURRENT-BRANCHES! informing it of the
code to generate to branch to the target.
Depending on the model, the prefix code may be empty, and all the code
may appear in the arguments to SET-CURRENT-BRANCHES!
SET-CURRENT-BRANCHES! expects two procedures as arguments.  Each of
them receives a label as an argument, and is supposed to generate code
that branches to the label if the predicate condition is true (first
argument) or false (second argument).  Both options are provided
because linearization of the control-flow graph occurs after LAP
generation, and it is therefore not known when the predicate rule is
fired which of the two possible linearizations will be chosen.
Thus on an architecture with condition codes, the rule will return the
code that performs the comparison, targeting the appropriate
condition-code registers (if they are not implicit), and the arguments
to SET-CURRENT-BRANCHES! will just generate the conditional-branch
instructions that use the generated condition codes.
On an architecture with compare-and-branch instructions, the code
returned by the rule body will perform any work needed before the
compare-and-branch instructions, and the arguments to
SET-CURRENT-BRANCHES! will generate the compare-and-branch
instructions.
For example, on the DEC Vax, a machine with implicit condition codes,
where compare (and most) instructions set the hidden condition-code
register, a predicate rule could be as follows:
    (define-rule predicate
      (EQ-TEST (REGISTER (? register-1)) (REGISTER (? register-2)))
      (set-current-branches!
       (lambda (label)
	 (LAP (B EQL (@PCR ,label))))
       (lambda (label)
	 (LAP (B NEQ (@PCR ,label)))))
      (LAP (CMP L ,(any-register-reference register-1)
		,(any-register-reference register-2))))
The prefix code performs the comparison.  The arguments to
SET-CURRENT-BRANCHES! branch depending on the result.
On the HP Precision Architecture (Spectrum), a machine with
compare-and-branch instructions, the same rule would be written as
follows:
    (define-rule predicate
      ;; test for two registers EQ?
      (EQ-TEST (REGISTER (? source1)) (REGISTER (? source2)))
      (let* ((r1 (standard-source! source1))
	     (r2 (standard-source! source2)))
	(set-current-branches!
	 (lambda (label)
	   (LAP (COMB (EQ) ,r1 ,r2 (@PCR ,label))
		(NOP ())))			; handle delay slot
	 (lambda (label)
	   (LAP (COMB (LTGT) ,r1 ,r2 (@PCR ,label))
		(NOP ()))))			; handle delay slot
	(LAP)))
There is no prefix code, and the arguments to SET-CURRENT-BRANCHES!
perform the comparison and branch.
The (OVERFLOW-TEST) predicate condition does not fit this model
neatly.  The current compiler issues overflow tests when open-coding
generic arithmetic.  Fixnum overflow implies that bignums should be
used for the result, and this predicate is used to conditionally
invoke out-of-line utilities.
The problem is that the decomposition of the code assumes that the
result of the overflow test is stored implicitly by the code that
generates the arithmetic instructions, and that this condition can be
later used for branching by the code generated for (OVERFLOW-TEST).
The code for the test will be adjacent to the code for the
corresponding arithmetic operation, but the compiler assumes that the
condition can be passed implicitly between these adjacent
instructions.  This decomposition only matches hardware with condition
codes.
Hardware with compare-and-branch instructions can be accommodated by
explicitly computing conditions into a hardware register reserved for
this purpose, and the code generated by the predicate rule can then
branch according to the contents of this register.  On these machines,
the arithmetic operator will not only generate the desired result, but
will set or clear a fixed register according to whether the
computation overflowed or not.  The predicate code will then branch
when the fixed register contains a non-zero value for the first
linearization choice, or zero for the other possibility.
This problem is particularly acute on MIPS processors.  The MIPS
architecture does not detect overflow conditions, so the overflow
condition must be computed by examining the inputs and outputs of the
arithmetic instructions.  There are conditional branches used just to
store the correct overflow condition in a register, and the code
generated for the overflow test will then branch again depending on
the value stored.  This makes the code generated by the open-coding of
generic arithmetic contain multiple branches and quite large.
The Spectrum port solves this problem a little differently.  On the
Spectrum, arithmetic instructions can conditionally cause the
following instruction to be skipped.  Since the code generated by
(OVERFLOW-TEST) is guaranteed to follow the code generated by the
arithmetic operation, the last instruction generated by the arithmetic
operations conditionally skips if there is no overflow.  The
(OVERFLOW-TEST) code generates an unconditional branch for the first
linearization choice, and an unconditional skip and an unconditional
branch for the alternative linearization.
A more efficient solution, currently employed in the MIPS port
(version 4.87 or later) depends on the fact that the RTL instruction
immediately preceding an RTL OVERFLOW-TEST encodes the arithmetic
operation whose overflow condition is being tested.  Given this
assumption (that the arithmetic operation producing the overflow
conditions and the test of such condition are adjacent), the rule for
OVERFLOW-TEST need not generate any code, and the rule for the
arithmetic operation can generate both the prefix code and invoke
SET-CURRENT-BRANCHES! as appropriate.  This is possible because the
RTL encoding of arithmetic operations includes a boolean flag that
specifies whether the overflow condition is desired or not.
		6. Suggested ordering of tasks.
The task of porting the compiler requires a lot of work.  In the past,
it has taken approximately three full weeks for a single person
knowledgeable with MIT Scheme and the compiler, but without
documentation.  This guide was written after the first three ports.
One unfortunate aspect is that a lot of mechanism must be in place
before most of the compiler can be tried out.  In other words, there
is a lot of code that needs to be written before small pieces can be
tested, and the compiler is not properly organized so that parts of it
can be run independently.  
Note also that cmpint-port.h, machin.scm, rules3.scm, and
cmpaux-port.m4 are very intertwined, and you may often have to iterate
while writing them until you converge on a final design.
Keeping all this in mind, here is a suggested ordering of the tasks:
	6.1. Learn the target instruction set well.  
In particular, pay close attention to the branch and jump instructions
and to the facilities available for controlling the processor caches
(if necessary).  You may need to find out the facilities that the
operating system provides if the instructions to control the cache are
privileged instructions.
	6.2. Write microcode/cmpint-port.h:
cmpint.txt documents most of the definitions that this file must
provide.
 6.2.1. Design the trampoline code format.  Trampolines are used to
invoke C utilities indirectly.  In other words, Scheme code treats
trampolines like compiled Scheme entry points, but they immediately
invoke a utility to accomplish their task.  Since
return-to-interpreter is implemented as a trampoline, you will need to
get this working before you can run any compiled code at all.
 6.2.1. Design the closure format and the execute cache format.  This
is needed to get the Scheme part of the compiler up AND to get the
compiled code interface in the microcode working.  Try to keep the
number of instructions low since closures and execute caches are very
common.
 6.2.2. Design the interrupt check instructions that are executed on
entry to every procedure, continuation, and closure.  Again, try to
keep the number of instructions low, and attempt to make the
non-interrupting case fast at the expense of the case when interrupts
must be processed.  Note that when writing the Scheme code to generate
the interrupt sequences, you can use the ADD-END-OF-BLOCK-CODE!
procedure to make sure that the interrupt sequence does not confuse
your hardware's branch prediction strategy.
 6.2.3. Given all this, write cmpint-port.h.  Be especially careful
with the code used to extract and insert absolute addresses into
closures and execute caches.  A bug in this code would typically
manifest itself much later, after a couple of garbage collections.
During this process you will be making decisions about what registers
will be fixed by the port, namely the stack pointer, the free pointer,
the register block pointer, and at least one register holding the
address of a label used to get back to C, typically
scheme_to_interface.
	6.3. Write machin.scm:
Most of the definitions in this file have direct counterparts or are
direct consequences of the code in and microcode/cmpint-port.h, so it
will be mostly a matter of re-coding the definitions in Scheme rather
than C.
In particular, you will have to decide how registers are going to be
used and split between your C compiler and Liar.  If your architecture
has a large register set, you can let C keep those registers to which
it assigns a fixed meaning (stack pointer, frame pointer, global
pointer), and use the rest for Liar.  If your machine has few
registers or you feel more ambitious, you can give all the registers
to Liar, but the code for transferring control between both languages
in cmpaux-port.m4 will become more complex.  Either way, you will need
to choose appropriate registers for the Liar fixed registers (stack
pointer, free pointer, register block pointer, dynamic link register
and optionally, datum mask, return value register, memtop register,
and scheme_to_interface address pointer).
	6.4. Write the assembler:
You can write the assembler any old way you want, but it is easier to
use the branch tensioner and the rest of the facilities if you use the
same conventions that the existing assemblers use.  In particular,
with any luck, you will be able to copy inerly.scm, insmac.scm, and
parts of assmd.scm verbatim from an existing port, and for most
machines, coerce.scm is straightforward to write.
assmd.scm defines procedures that depend only on the endianness of the
architecture.  You may want to start with the MIPS version since this
version accommodates both endianness possibilities as MIPS processors
can be configured either way.  If your processor has fixed endianness,
you can prune the inappropriate code.  The block-offset definitions
must agree with those in microcode/cmpint-port.h, and the padding
definitions are simple constants.
Assuming that you decide to use the same structure as existing
assemblers, you may need to write parsers for addressing modes if your
machine has them.  You can use the versions in the MC68020 (bobcat),
Vax, and i386 (Intel 386) ports for guidance.  Addressing modes are
described by a set of conditions under which they are valid, and some
output code to issue.  The higher-level code that parses instructions
in insmac.scm must decide where the bits for the addressing modes must
appear.  The MC68020 version divides the code into two parts, the part
that is inserted into the opcode word of the instruction (further
subdivided into two parts), and the part that follows the opcode word
as an extension.  The Vax version produces all the bits at once since
addressing modes are not split on that architecture.  You should write
the addressing mode definitions in port/insutl.scm, plus any
additional transformers that the instruction set may require.
Once you have the code for the necessary addressing modes and
transformers (if any), and the parsing code for their declarations in
port/insmac.scm, writing the instr<n>.scm files should not be hard.
Remember to include pseudo-opcodes for inserting constants in the
assembly language, and for declaring external labels so that the
gc-offset to the beginning of the compiled code block will be inserted
correctly.  See for example, the definition of the EXTERNAL-LABEL
pseudo-opcode in machines/mips/instr1.scm, and its use in
machines/mips/rules3.scm.
	6.5. Write the LAPGEN rules:
You will need to write lapgen.scm, rules1.scm, rules2.scm, rules3.scm,
and parts of rules4.scm.  Most of rules4.scm is not used by the
compiler with the ordinary switch settings and the code may no longer
work in any of the ports, and rulfix.scm and rulflo.scm are only
necessary to open code fixnum and flonum arithmetic.  A good way to
reduce the amount of code needed at first is to turn primitive open
coding off, and ignore rulfix.scm and rulflo.scm.
Lapgen.scm need not include the shared code used to deal with fixnums
and flonums, but will require the rest, especially the code used to
invoke utilities in the compiled code interface.
rules1.scm and rules2.scm are relatively straightforward since the
RTL instructions whose translations are provided there typically map
easily into instructions.
rules4.scm need only have the INTERPRETER-CALL:CACHE-??? rules, and
these are simple invocations of runtime library routines which you can
emulate from exisiting ports.
rules3.scm is an entirely different matter.  It is probably the
hardest file to write when porting the compiler.  The most complicated
parts to understand, and write, are the closure code, the invocation
prefix code, and the block assembly code.
 The block assembly code can be taken from another port.  You will
only have to change how the transmogrifly procedure works to take into
account the size and layout of un-linked execute caches.
 The invocation prefix code is used to adjust the stack pointer, and
move a frame in the stack prior to a call to guarantee proper tail
recursion.  The frame moved is the one pointed at by the stack
pointer, and it may be moved a distance known at compile time
(invocation-prefix:move-frame-up rules) or a distance that cannot be
computed statically (invocation-prefix:dynamic-link rules).  The
move-frame-up rules are simple, but you should remember that the
starting and ending locations for the frame may overlap, so you must
ensure that data is not overwritten prematurely.  The dynamic link
rules are similar to the move-frame-up rules (and typically share the
actual moving code) but must first decide the location where the frame
should be placed.  This is done by comparing two possible values for
the location, and choosing the value closest to the current stack
pointer (i.e. numerically lower since the stack grows towards smaller
addresses).  Again, the source and target locations for the frame may
overlap, so the generated code must be careful to move the data in
such a way that no data will be lost.
 The closure code is the most painful to write.  When writing
cmpint-port.h you decided what the actual code in closure entries
would be, and the code for closure headers is a direct consequence of
this.  The combination of the instructions in a closure object, the
helper instructions in assembly language (if any), and the
instructions in the closure header must ultimately push the closure
object (or its canonical representative) on the stack as if it were
the last argument to the procedure, and pending interrupts (and gc)
must be checked on entry to the closure.  The interrupt back-out code
is different from the ordinary procedure interrupt back-out code
because the procedure object (the closure or its representative) is on
top of the stack.
 The cons-closure rules are used to allocate closure objects from the
runtime heap.  Some of this allocation/initialization may be done out
of line, especially if ``assembling'' the appropriate instructions on
the fly would require a lot of code.  In addition, you may have to
call out-of-line routines to synchronize the processor caches or
block-allocate multiple closure entries.
	6.6. Write stubs for remaining port files: 
rgspcm.scm and dassm1.scm can be copied verbatim from any other port.
lapopt.scm only needs to define an identity procedure.
rulfix.scm, rulflo.scm, and rulrew.scm need not define any rules,
since you can initially turn off open coding of primitive operators.
dassm2.scm and dassm3.scm need not be written at first, but they are
useful to debug the assembler (since disassembling some code should
produce code equivalent to the input to the assembler) and compiler
output when you forgot to make it output the LAP.
	6.7. Write the compiler-building files:
make.scm, and comp.cbf should be minorly modified copies of the
corresponding files in another port.
comp.sf and decls.scm can be essentially copied from another port, but
you will need to change the pathnames to refer to your port directory
instead of the one you copied them from, and in addition, you may have
to add or remove instr<n> and other files as appropriate.
	6.8. Write microcode/cmpaux-port.m4:
cmpaux.txt documents the entry points that this file must provide.
You need not use m4, but it is convenient to conditionalize the code
for debugging and different type code size.  If you decide not to use
it, you should call your file cmpaux-port.s
 6.8.1. Determine your C compiler's calling convention.  Find out what
registers have fixed meanings, which are supposed to be saved by
callees if written, and which are supposed to be saved by callers if
they contain useful data.
 6.8.2. Find out how C code returns scalars and small C structures.
If the documentation for the compiler does not describe this, you can
write a C program consisting of two procedures, one of which returns a
two-word (two int) struct to the other, and you can examine the
assembly language produced by the compiler.
 6.8.3. Design how scheme compiled code will invoke the C utilities.
Decide where the parameters (maximum of four) to the utilities will be
passed (preferably wherever C procedures expect arguments), and where
the utility index will be passed (preferably in a C caller-saves
register).
 6.8.4. Given all this, write a minimalist cmpaux-port.m4.  In other
words, write those entry points that are absolutely required
(C_to_interface, interface_to_C, interface_to_scheme, and
scheme_to_interface).  Be especially careful with the code that
switches between calling conventions and register sets.
C_to_interface and interface_to_scheme must switch between C and Liar
conventions, while scheme_to_interface must switch the other way.
interface_to_C must return from the original call to C_to_interface.
Make sure that C code always sees a valid C register set and that code
compiled by Liar always sees a valid Scheme register set.
	6.9. After the preliminary code works:
Once the compiler is up enough to successfully compile moderately
complex test programs, and the compiled code and the interface have
been tested by running the code, you probably will want to go back and
write the files that were skipped over.  In particular, you definitely
should write rulfix.scm and rulrew.scm, and rulflo.scm and the
disassembler if at all possible.
		7. Building and testing the compiler.
Once the port files have been written, you are ready to build and test
the compiler.  The first step is to build an interpreted compiler and
run simple programs.  Most simple bugs will be caught by this.
	7.1. Re-building scheme.
You need to build a version of the microcode with the compiled code
interface (portable runtime library) in it.  Besides writing
cmpint-port.h and cmpaux-port.m4, you will need to do the following:
- Copy (or link) cmpint-port.h to cmpint2.h.
- Modify m.h to use 6-bit-long type tags (rather than the default 8)
if you did not do this when you installed the microcode.  If you do
this, you will not be able to load .bin files created with 8 bit type
tags.  You can overcome this problem by using the original .psb files
again to regenerate the .bin files.  Alternatively, you can use a
version of Bintopsb compiled with 8-bit tags to generate new .psb
files, and a version of Psbtobin compiled with 6-bit tags to generate
the new .bin files.  Anotheroption is to bring the compiler up using 8
bit tags, but you may run out of address space.  The simplest way to
specify 6-bit type tags is to add a definition of C_SWITCH_MACHINE
that includes -DTYPE_CODE_LENGTH=6 .  Be sure to add any m4 switches
that you may need so that the assembly language will agree on the
number of tag bits if it needs it at all.  If your version of m4 does
not support command-line definitions, you can use the s/ultrix.m4
script to overcome this problem.  Look at the m/vax.h and s/ultrix.h
files for m4-related definitions.
 ==> We should just switch the default to 6 bits and be done with it.
- Modify ymakefile to include the processor dependent section that
lists the cmpint-port.h and cmpaux-port.m4 files.  You can emulate the
version for any other compiler port.  It is especially important that
the microcode sources be compiled with HAS_COMPILER_SUPPORT defined.
- Remove (or save elsewhere) all the .o files, scheme.touch, and
scheme, the linked scheme microcode.
- Do ``make xmakefile;make -f xmakefile scheme'' to generate a new
linked microcode.
Once you have a new linked microcode, you need to regenerate the
runtime system image files even if you have not changed the length of
the type tags.  This is done as follows:
- Re-generate a runtime.com (actually runtime.bin) image file by
invoking scheme with the options ``-large -fasl make.bin'' while
connected to the runtime directory, and then typing
  (disk-save "<lib directory pathname>/runtime.com")
at the Scheme prompt.
- You should probably also generate a runtime+sf.com file by typing
  (begin
    (cd "<sf directory pathname>")
    (load "make")
    (disk-save "<lib directory pathname>/runtime+sf.com"))
at the Scheme prompt.
You also need to have a working version of cref.  This can be done by
invoking scheme with the options ``-band runtime+sf.com'', and then
typing
  (begin
     (cd "<cref directory pathname>")
     (load "cref.sf"))
at the Scheme prompt.
If this errors because of the lack of a ``runtim.glob'' file, try it
again after executing
  (begin
     (cd "<runtime directory pathname>")
     (load "runtim.sf"))
	7.2. Building an interpreted compiler.
Once you have a new microcode, compatible runtime system, and ready
cref, you can pre-process the compiler as follows:
- Copy (or link) comp.pkg, comp.sf, and comp.cbf from the
compiler/machines/port directory to the compiler directory.
- For convenience, make a link from compiler/machines/port to
compiler/port.
- Invoke scheme with the ``-band runtime+sf.com'' option, and then
execute
  (begin
     (cd "<compiler directory pathname>")
     (load "comp.sf"))
This will take quite a while, and pre-process some of the files twice.
At the end of this process, you should have a .bin file for each of
the .scm files, a .ext file for some of them, and a bunch of
additional files in the compiler directory (comp.con, comp.ldr,
comp.bcon, comp.bldr, comp.glob, comp.free, comp.cref).
It is a good idea to look at the comp.cref file.  This is a
cross-reference of the compiler and may lead you to find typos or
other small mistakes.  The first section of the cref file (labeled
``Free References:'') lists all variables that are not defined in the
compiler or the runtime system.  The only variables that should be in
this list are SF, and SF/PATHNAME-DEFAULTING.  The ``Undefined
Bindings:'' section lists those variables defined in the runtime
system and referenced freely by the compiler sources.  The remainder
of the cref file lists the compiler packages and the cross reference
of the procedures defined by it.
- Load up the compiler.  Invoke scheme with the options 
``-compiler -band runtime+sf.com'', and then type
  (begin
     (cd "<compiler directory pathname>")
     (load "port/make")
     (disk-save "<lib directory pathname>/compiler.com"))
You should then be able to invoke the compiler by giving scheme the
``-compiler'' option, and use it by invoking CF.
	7.3. Testing the compiler.
There is no comprehensive test suite for the compiler.  There is,
however, a small test suite that is likely to catch gross errors.  The
files for the test suite are in compiler/etc/tests.  Each file
contains a short description of how it can be used.
Make sure, in particular, that you test the closure code thoroughly,
especially if closure allocation hand-shakes with out-of-line code to
accomodate the CPU's caches.
A good order to try the test suite in is
	three.scm
	expr.scm
	pred.scm
	close.scm
	blast.scm
	reverse.scm
	arith.scm
	bitwse.scm
	fib.scm
	vector.scm
	reptd.scm
	lexpr.scm
	klexpr.scm
	close2.scm
	prim.scm
	free.scm
	uvuuo.scm
	link.scm
	uuo.scm
	unv.scm
	tail.scm
	y.scm
	sort/*.scm (see sort/README for a description)
 The programs in the first list test various aspects of code
generation.
 The programs in the second list test the handling of various dynamic
conditions (e.g. error recovery).
 The programs in the third list are somewhat larger, and register
allocation bugs, etc., are more likely to show up in them.
A good idea at the beginning is to turn COMPILER:GENERATE-RTL-FILES?
and COMPILER:GENERATE-LAP-FILES? on and compare them for plausibility.
If you have ported the disassembler as well, you should try
disassembling some files and comparing them to the input LAP.  They
won't be identical, but they should be similar.  The disassembler can
be invoked as follows:
    (compiler:write-lap-file "<pathname of .com file>") ; writes a .lap file.
    (compiler:disassemble <compiled entry point>) ; writes on the screen.
The .lap filename extension is used by COMPILER:WRITE-LAP-FILE and by
the compiler when COMPILER:GENERATE-LAP-FILES? is true, so you may
want to rename the .lap file generated by the compiler to avoid
overwriting it when using COMPILER:WRITE-LAP-FILE.
Various runtime system files also make good tests.  In particular, you
may want to try list.scm, vector.scm, and arith.scm.  You can try them
by loading them, and invoking procedures defined in them, but you must
execute
    (initialize-microcode-dependencies!)
after loading arith.com and before invoking procedures defined there.
	7.4. Compiling the compiler.
The real test of the compiler comes when it is used to compile itself
and the runtime system.  Re-compiling the system is a slow process,
that can take a few hours even with a compiled compiler on a fast
machine.  Compiling the compiler with an interpreted compiler would
probably take days.
There are two ways to speed up the process:
* Cross-compiling:
If you can access some machines on which the compiler already runs,
you can cross-compile the sources using a compiled compiler.  This
method is somewhat involved because you will need binaries for both
machines, since neither can load or dump the other's .bin files.
Imagine that you have a Vax, and you are porting to a Sparc.  You will
need to pre-process and compile the Sparc's compiler on the Vax to use
it as a cross-compiler.  This can be done by following the same
pattern that you used to generate the interpreted compiler on the
Sparc, but running everything on the Vax, and then compiling the
cross-compiler on the Vax by running scheme with the ``-compiler''
option, and typing
    (begin
	(cd "<sparc compiler directory>")
	(load "comp.cbf")
	(disk-restore "runtime+sf.com"))
    ;; After the disk-restore
    (begin
	(load "make")
	(in-package (->environment '(compiler))
	  (set! compiler:cross-compiling? true))
	(disk-save "sparccom.com"))
to produce a cross-compiler band called "sparccom.com".
Once you have the cross-compiler, you can use CROSS-COMPILE-BIN-FILE
to generate .moc files.  The .moc files can be translated to .psb
files on the Vax.  These .psb files can in turn be translated to .moc
files on the Sparc, and you can generate the final .com files by using
CROSS-COMPILE-BIN-FILE-END defined in compiler/base/crsend.
compiler/base/crsend can be loaded on a plain runtime system (i.e.
without SF or a compiler).  You will probably find the following
idioms useful:
    (for-each cross-compile-bin-file (directory-read "<some dir>/*.bin"))
    (for-each cross-compile-bin-file-end (directory-read "<some dir>/*.moc")).
To translate the original .moc files to .psb files, you should use
microcode/Bintopsb on the Vax as follows:
    Bintopsb ci_processor=?? <foo.moc >foo.psb
where the value of ci_processor should be the value of
COMPILER_PROCESSOR_TYPE defined in microcode/cmpint-port.h.
You can then generate the target .moc files by using
microcode/Psbtobin on the Sparc as follows:
    Psbtobin allow_cc <foo.psb >foo.moc
* Distributing the task over several machines:
You can use more than one machine to compile the sources.  If the
machines do not share a file system, you will have to pre-partition
the job and generate a script for each machine.  If the machines share
a (network) file system, you can try to use compiler/etc/xcbfdir.
This file defines two procedures, COMPILE-DIRECTORY, and
CROSS-COMPILE-DIRECTORY, that use a simple-minded protocol based on
creating .tch files to reserve files to compile, and can therefore be
run on many machines simultaneously without uselessly repeating work
or getting in each other's way.
These two methods are not exclusive.  We typically bring up the
compiler on a new machine by distributing the cross-compilation job.
The compiler and the cross-compiler use a lot of memory while running,
and virtual memory is really no substitute for physical memory.  You
may want to increase your physical memory limit on those systems where
this can be controlled (e.g. under BSD use the ``limit'' command).  If
your machines don't have much physical memory, or it is too painful to
increase your limit, i.e. you have to re-compile or re-link the
kernel, you may want to use microcode/bchscheme instead of
microcode/scheme.  Bchscheme uses a disk file for the spare heap,
rather than a region of memory, putting the available memory to use at
all times.
	7.5. Compiler convergence testing.
Once you have a compiled compiler, you should run the same test suite
that you ran with the interpreted compiler.  Once you have some degree
of confidence that the compiled compiler works, you should make sure
that it can correctly compile itself and the runtime system.  This
re-compilation can manifest second-order compiler bugs, that is, bugs
in the compiler that cause it to compile parts of itself incorrectly
without crashing, so that programs compiled by this
incorrectly-compiled compiler fail even though these programs did not
fail when compiled by the original compiler.
Of course, you can never really tell if the compiler has compiled
itself successfully.  You can only tell that it is not obviously wrong
(i.e. it did not crash).  Furthermore, there could be higher-order bugs
that would take many re-compilations to find.  However, if the binaries
produced by two successive re-compilations are identical, further
re-compilations would keep producing identical binaries and no
additional bugs will be found this way.  Moreover, if the compiler
and system survive a couple of re-compilations, the compiler is likely
to be able to compile correctly most programs.
To run this compiler convergence test, you need to re-compile the
compiler.  In order to do this, you need to move the .com files from
the source directories so that COMPILE-DIRECTORY and
RECOMPILE-DIRECTORY will not skip all the files (they avoid compiling
those already compiled).  The simplest way to move all these files is
to type ``make stage1'' at your shell in the source directories
(runtime, sf, cref, and compiler).  This command will create a STAGE1
subdirectory for each of the source directories, and move all the .com
and .binf files there.  You can then use compiler/comp.cbf, or
compiler/etc/xcbfdir and RECOMPILE-DIRECTORY to regenerate the
compiler.
If you generated the stage1 compiled compiler by running the compiler
interpreted, the new .com files should match the stage1 .com files.
If you generated the stage1 compiler by cross-compilation, they will
not.  The cross-compiler turns COMPILER:COMPILE-BY-PROCEDURES? off,
while the default setting is on.  In the latter case, you want to
generate one more stage to check for convergence, i.e. execute ``make
stage2'' in each source directory, and re-compile once more, at each
stage using the compiler produced by the previous stage.
Once you have two stages that you think should have identical
binaries, you can use COMPARE-COM-FILES, defined in
compiler/etc/comcmp, to compare the binaries.  The simplest way to use
it is to also load compiler/etc/comfiles and then use the CHECK-STAGE
procedure.
    (check-stage "STAGE2" '("runtime" "sf" "compiler/base"))
will compare the corresponding .com files from runtime and
runtime/STAGE2, sf and sf/STAGE2, and compiler/base and
compiler/base/STAGE2.
If nothing is printed, the binaries are identical.  Otherwise some
description of the differences is printed.  COMPARE-COM-FILES does not
check for isomorphism of Scode objects, so any sources that reference
Scode constants (e.g. runtime/advice.scm) will show some differences
that can safely be ignored.  Generally, differences in constants can
be ignored, but length and code differences should be understood.  The
code in question can be disassembled to determine whether the
differences are real or not.
While testing the compiler, in addition to checking for the correct
operation of the compiled code, you should also watch out for crashes
and other forms of unexpected failure.  In particular, hardware traps
(e.g.  segmentation violations, bus errors, illegal instructions)
occurring during the re-compilation process are a good clue that there
is a problem somewhere.
		8. Debugging.
The process of porting a compiler, due to its complexity, is unlikely
to proceed perfectly.  Things are likely to break more than once while
running the compiler and testing the compiled code.
Debugging a compiler is not trivial, because often the failures
(especially after a while) will not manifest themselves until days,
weeks, or months after the compiler was released, at which point the
context of debugging the compiler has been swapped out by the
programmer.  Second-order compiler bugs do not make things any easier.
Liar does not have many facilities to aid in debugging.  This section
mentions some of the few, and some techniques to use with
assembly-language debuggers (gdb, dbx, or adb).
The main assumption in this section is that the front end and other
machine-independent parts of the compiler work correctly.  Of course,
this cannot be guaranteed, but in all likelihood virtually all of the
bugs that you will meet when porting the compiler will be in the new
machine-specific code.
If you need to examine some of the front-end data structures, you may
want to use the utilities in base/debug.scm which is loaded in the
compiler by default.  In particular, you will want to use PO (for
print-object) to examine compiler data structures, and
DEBUG/FIND-PROCEDURE to map procedure names to the data structures
that represent the procedures, or more correctly, the lambda
expressions.
	8.1. Preliminary debugging of the compiled code interface.
The first item of business, after the microcode interface
(cmpaux-port.m4 and cmpint-port.h) has been written, is to guarantee
that properly constructed compiled code addresses do not confuse the
garbage collector.  This can be done before writing any of the
remaining files, but you must have rebuilt the microcode and the
runtime.com band.
A simple test to run is the following:
  (define foo
    ((make-primitive-procedure 'COERCE-TO-COMPILED-PROCEDURE)
     (lambda (x y)
       (+ x y))
     2))
	       
  (gc-flip)
  (gc-flip)
  (gc-flip)
If the system does not crash or complain, in all likelihood the
garbage collector can now properly relocate compiled code objects.
This object can also be used to test parts of the compiled code
interface.  FOO is bound to a trampoline that will immediately revert
back to the interpreter when invoked.
The next test is to determine that FOO works properly.  You can follow
the execution of FOO by using a debugger and placing breakpoints at
  cmpint.c:apply_compiled_procedure,
  cmpaux-port.s:C_to_interface, 
  cmpaux-port.s:scheme_to_interface (or trampoline_to_interface if it
is written),
  cmpint.c:comutil_operator_apply_trap, 
  cmpint.c:comutil_apply, and
  cmpaux-port.s:interface_to_C
and then evaluating (FOO 3 4).
When setting the breakpoints, remember that C_to_interface,
scheme_to_interface, and interface_to_scheme are not proper C
procedures, so you should use the instruction-level breakpoint
instructions or formats, not the C procedure breakpoint instructions
or formats.  If you are using adb, this is moot, since adb is purely
an assembly-language debugger.  If you are using gdb, you should use
``break *&C_to_interface'' instead of ``break C_to_interface''.  If
you are using dbx, you will want to use the ``stopi'' command, instead
of the ``stop'' command to set breakpoints in the assembly language
routines.
Make sure that the arguments to comutil_operator_apply_trap look
plausible and that the registers have the appropriate contents when
going into scheme code and back into C.  In particular, you probably
should examine the contents of the registers right before jumping into
the trampoline code, and single step the trampoline code until you get
back to scheme_to_interface.
In order to parse the Scheme objects, you may want to keep a copy of
microcode/type.names handy.  This file contains the names of all the
scheme type tags and their values as they appear in the most
significant byte of the word when type tags are 8 bits long and 6
bits long.  Remember that you may have to insert segment bits into
addresses in order to examine memory locations.
You should also make sure that an error is signalled when FOO is
invoked with the wrong number of arguments, and that the system
correctly recovers from the error (i.e., it gives a meaningful error
message and an error prompt, and resets itself when you type ^G).
This test exercises most of the required assembly language code.  The
only entry point not exercised is interface_to_scheme.
	8.2. Debugging the assembler.
Assuming that the compiler generates correctly formatted compiled code
objects, fasdump should be able to dump them out without a problem.
If you have problems when dumping the first objects, and assuming that
you ran the tests in section 8.1., then in all likelihood the block
offsets are not computed correctly.  You should probably re-examine
the rule for the EXTERNAL-LABEL pseudo operation, and the block-offset
definitions in machines/port/assmd.scm.
Once you can dump compiled code objects, you should test the
assembler.  A simple, but somewhat inconvenient way of doing this is
to use adb as a disassembler as follows:
Scheme binary (.bin and .com) files have a 50 longword header that
contain relocation information.  The longword that follows immediately
is the root of the dumped object.  
If COMPILER:COMPILE-BY-PROCEDURES? is false, the compiler dumps a
compiled entry point directly, so the format word for the first entry
is at longword location 53 (* 4 = 0xd4), and the instructions follow
immediately.
If COMPILER:COMPILE-BY-PROCEDURES? is true, the compiler dumps an
Scode object that contains the first entry point as the ``comment
expression''.  The format word for the first entry point is then at
longword location 55 (* 4 = 0xdc), and the instructions for the
top-level block follow immediately.
Thus, assuming that there are four bytes per Scheme object (unsigned
long in C), and that foo.com was dumped by the compiler with
COMPILER:COMPILE-BY-PROCEDURES? set to false, the following would
disassemble the first 10 instructions of the generated code.
  adb foo.com
  0xd8?10i
If COMPILER:COMPILE-BY-PROCEDURES? was set to true, the following
would accomplish the same task:
  adb foo.com
  0xe0?10i
You can use adb in this way to compare the input assembly language to
its binary output.  Remember that you can obtain the input assembly
language by using the COMPILER:GENERATE-LAP-FILES? switch.
	8.3. Setting breakpoints in Scheme compiled code.
Compiled code is not likely to work correctly at first even after the
compiler stops signalling errors.  In general, when you find that
compiled code executes incorrectly, you should try to narrow it down
as much as possible by trying the individual procedures, etc., in the
code, but ultimately you may need the ability to set instruction-level
breakpoints and single-step instructions in compiled code.
A problem peculiar to systems in which code is relocated on the fly is
that you cannot, in general, obtain a permanent address for a
procedure or entry point.  The code may move at every garbage
collection, and if you set a machine-level breakpoint with a Unix
debugger, and then the code moves, you will probably get spurious
traps when re-running the code.  Unix debuggers typically replace some
instructions at the breakpoint location with instructions that will
cause a specific trap, and then look up the trapping location in some
table when the debugged process signals the trap.
One way around this problem is to ``purify'' all compiled scheme code
that you will be setting breakpoints in.  If you purify the code, it
will move into ``constant space'' and remain at a constant location
across garbage collections.  The PURIFY procedure expects an object
to purify as the first argument, a boolean flag specifying whether the
object should be moved into constant space (if false) or pure space
(if true) as a second argument, and a boolean flag specifying whether
purification should occur immediately (if false) or be delayed until
the next convenient time (if true) as a third argument.  You should
  (purify <object> false false)
when moving compiled code objects to constant space for debugging
purposes.  Alternatively, you can specify that you want the code to be
purified when you load it by passing appropriate arguments to LOAD.
Since load delays the actual purification, you will need to invoke
GC-FLIP twice to flush the purification queue.
At any rate, setting the actual breakpoints is not completely trivial,
since you must find the virtual address of the instructions, and then
use them with your assembly-language debugger.
The simplest way to do this is to get Scheme to print the datum of
the entry points for you, and then type one of Scheme's interrupt
characters to gain the debugger's attention and set the breakpoint.
Continuing from the debugger will allow you to type further
expressions to Scheme.
Imagine, for example, that we have compiled the runtime file list.scm
and some of the procedures in it, namely MEMBER and MEMQ, do not work
properly.  After purifying the code, you can type ``memq'' at the
read-eval-print loop and it will respond with something like
 ;Value 37: #[compiled-procedure 37 ("list" #x5A) #x10 #x10FE880]
This specifies that MEMQ is bound to an ordinary compiled procedure
(not a closure), that it was originally compiled as part of file
``list'' and it was part of compiled code block number 90 (= #x5a) in
that file.  The current datum of the object is #x10FE880 (this is the
address without the segment bits if any), and the offset to the
beginning of the containing compiled code block is #x10.  Thus you
could then gain the attention of the debugger and set a breakpoint at
address 0x10fe880 (remember to add the segment bits if necessary) and
after continuing back into Scheme, use MEMQ normally to trigger the
breakpoint.
The case with MEMBER is similar.  Typing ``member'' at the
read-eval-print loop will cause something like
 ;Value 36: #[compiled-closure 36 ("list" #x56) #x5C #x10FE484 #x1180DF8]
to be printed.
This specifies that MEMBER is bound to a compiled closure, originally
in compiled code block number 86 (= #x56) in file ``list'', that the
entry point to the closure is at datum #x1180DF8, that the entry point
shared by all closures of the same lambda expression (the ``real''
entry point) is at datum #x10FE484, and that this entry point is at
offset #x5C of its containing compiled code block.
Thus if you want to single step the closure code (a good idea when you
try them at first), you would want to set a breakpoint at address
#x1180DF8 (plus appropriate segment bits), and if you want to single
step or examine the real code, then you should use address #x10FE484.
If you purified the code when you loaded it, the real code would be
pure, but the closure itself would not be, since it was not a part of
the file being loaded (closures are created dynamically).  Thus,
before setting any breakpoints in a closure, you should probably
purify it as specified above, and obtain its address again, since it
would have moved in the meantime.
For example, if you are using adb on an HP-PA (where the top two bits
of a data segment address are always 01, and thus the top nibble of a
Scheme's object address is always 4), assuming that the interpreter
printed the above addresses,
0x41180df8:b	would set a breakpoint in the MEMBER closure,
0x410fe484:b	would set a breakpoint at the start of the code shared
		by MEMBER and all closures of the same lambda expression,
0x410fe880:b	would set a breakpoint at the start of MEMQ.
If you are using gdb on a Motorola MC68020 machine, with no segment bits
for the data segment, the equivalent commands would be
break *0x1180df8	for a breakpoint in the MEMBER closure,
break *0x10fe484	for a breakpoint in MEMBER's shared code
break *0x10fe880	for a breakpoint in MEMQ.
If you are using dbx, you will need to use a command like 
 ``stopi at 0x10fe484'' to achieve the same effect.
	8.4. Examining arguments to Scheme procedures.
Commonly, after setting a breakpoint at some interesting procedure,
you will want to examine the arguments.  Currently, Liar passes all
arguments on the stack.  The Scheme stack always grows towards
decreasing addresses, and arguments are pushed from left to right. 
On entry to a procedure, the stack frame must have been reformatted so
that optional arguments have been defaulted, and tail (lexpr)
arguments have been collected into a list (possibly empty).  Thus on
entry to an ordinary procedure's code the stack pointer points to the
rightmost parameter in the lambda list, and the rest of the parameters
follow at increasing longword addresses.  
This is also the case on entry to a closure object's instructions, but
by the time the shared code starts executing the body of the lambda
expression, the closure object itself (or its canonical
representative) will be on top of the stack with the arguments
following in the standard format.  On entry to a closure's shared
code, the stack will contain the arguments in the standard format, but
the closure object will typically be in the process of being
constructed and pushed, depending on the distribution of the task
between the instructions in the closure object, the optional helper
instructions in assembly language, and the instructions in the closure
header.
If you are using adb, you can use the following commands:
 $r		displays the names and contents of the processor
		registers,
 <r23=X		displays the contents of the r23 register in hex,
 0x4040c/4X	displays four longwords in memory at increasing
		addresses starting with 0x4040c.
When using gdb, you can achieve the same effect by using the following
commands:
 info reg	displays the names and contents of the processor
		registers,
 p/x $gr23	displays the contents of processor register gr23,
 x/4wx 0x4040c	displays four longwords in memory starting at
		address 0x4040c.
If you are using dbx, you can use the following commands:
 print $l4	to display the contents of processor register l4,
 0x4040c/4X	to display four longwords in memory starting at
		address 0x4040c.
	8.5. Tracing the call stack.
Procedures compiled by Liar receive additional implicit arguments used
to communicate the lexical environment and the implicit continuation.
Most procedures receive a return address argument that points to the
procedure that is waiting for the one at hand to finish.  Some
procedures receive a static link argument.  Static links are pointers
into other frames in the stack, used to thread together environments
when the distance between lexically nested frames cannot be statically
computed.  Some procedures receive a dynamic link argument.  Dynamic
links are pointers to return addresses on the stack, and are used when
the compiler cannot determine statically the distance on the stack
between the last argument and the return address for a procedure.
Dynamic links are rarely needed, and static links are needed only
somewhat more frequently.  Only one of the programs in the test suite
uses dynamic and static links, and it was carefully constructed to
make the compiler generate code that uses them.  All externally
callable procedures, including all those ``closed'' at top level,
expect return addresses and no other links.
In general, it is impossible to find out what procedure called another
in Scheme, due to tail recursion.  Procedures whose last action is a
call to another procedure, and whose local frames are not part of the
environment of their callees, will have their frames popped off the
stack before the callee is entered, and there will be no record left
of their execution.  The interpreter uses a cache of previous frames
(called the history) to provide additional debugging information.
On the other hand, most calls are not in tail position, and a return
address will be ``passed'' to the callee indicating who the caller
was.  Occasionally the static and dynamic links will have to be traced
to find the return address, but this is rare.
The following assumes that we are dealing with an ordinary procedure
that expects a return address.
The return address is passed on the stack, immediately below the
leftmost argument.  It is a compiled-code entry object whose datum is
the encoded address of the instruction at which the ``caller'' should
be reentered.  If the procedure was called from the interpreter, the
``caller'' will be a special trampoline (return_to_interpreter) that
will give control back to the interpreter by using the compiled code
interface.  There is a little hanky panky in the interpreter to
guarantee that interpreted code ``tail recursing'' into compiled code
and viceversa will not push spurious return_to_interpreter return
addresses and return_to_compiled_code interpreter return codes that
would break proper tail recursion, but you need not concern yourself
with this.
If your debugger allows you to dynamically call C procedures, and it
is not hopelessly confused by the Scheme register set, you can use the
C procedure ``compiled_entry_filename'' to determine the filename that
a return address (or other compiled entry) belongs to.  Its only
argument should be the compiled entry object whose origin you want to
find.  Unfortunately, debuggers are often confused by the register
assignments within Scheme compiled code, precisely when you need them
most.  You can bypass this problem the following way:
On entry to a procedure whose return address you wish to examine,
write down the return address object, change the compiled code's
version of MemTop so that the comparison with free will fail and the
code will take an interrupt, set a breakpoint in the runtime library
routine ``compiler_interrupt_common'', and continue the code.  When
the new breakpoint is hit, you can use ``compiled_entry_filename'' to
examine the return address you had written down.
Here is how to do this the hard way, which you will have to resort to
often:
Compiled code blocks generated by the compiler always encompass two
special locations.  The last location in the compiled code block
contains the environment where the compiled code block was loaded
(after the code has been evaluated).  The immediately preceding
location contains a debugging information descriptor.  This descriptor
is either a string, namely the name of the file where the compiler
dumped the debugging information for the block, or a pair whose car is
the name of the debugging information filename and whose cdr is the
block number (a fixnum) of the compiled code block in the file, or the
debugging information itself if the compiled code block was generated
in core and not dumped.
Given that the word immediately preceding an external entry point in
the compiler always contains a gc-offset from the entry point to the
first word of the compiled code block (i.e. the vector header), or to
another external entry point if the distance is too large, it is a
simple, but involved, matter to find this debugging information.  Note
that all return addresses, full closures, and top-level procedures are
external entry points.
For example, imagine that the return address for a procedure is
0xa08fe2ee.  Furthermore, assume that we are running on a Motorola
MC68020 with four bytes per longword, no segment bits, and for which
cmpint-mc68k.h defines PC_ZERO_BITS to be 1.
Extracting the word at address 0x8fe2ea (four bytes before the entry
point), will yield the format longword, that consists of the format
field in its high halfword, and the encoded gc offset field in the
lower halfword.
The gdb command ``x/2hx 0x4fc564'' will print these two halfwords,
and let's assume that the output is
 0x8fe2ea <fpa_loc+10445546>:	0x8280	0x003e
The adb (and dbx) command ``0x4fc564/2x'' should yield similar output.
This confirms that the object in question is a return address because
the most significant bit of the format word is a 1, and it would be a
0 for a procedure.  The encoded gc offset is 0x3e.  GC offsets and
format words are described in detail in cmpint.txt.
Since the least significant bit of the GC offset is 0, it points
directly to the vector header of a compiled code block.  The real
offset is
 ((0x3e >> 1) << PC_ZERO_BITS) = 0x3e
Thus the compiled code block starts at location
 0x8fe2ee-0x3e = 0x008fe2b0
Examining the top two words at this address (using the gdb command
 ``x/2wx 0x008fe2b0'' or the adb and dbx command ``0x8fe2b0/2X'') we
 0x8fe2b0 <fpa_loc+10445488>:	0x00000028	0x9c00001d
The first word is an ordinary vector header, and the second a
non-marked vector header used to inform the GC that 0x1d longwords of
binary data, the actual instructions, follow.
The last location in the vector, containing the environment, is at
address
 0x8fe2b0+4*0x28 = 0x8fe350
Examining the preceding adjacent location and this one (using gdb's
 ``x/2wx 0x8fe34c'' or a similar command for a different debugger)
will yield
 0x8fe34c <fpa_loc+10445644>:	0x0498ef9c	0x4898e864
The second object is the loading environment, and the first object is
the debugging information, in this case a pair.  This pair can be
examined (using gdb's ``x/2wx 0x98ef9c'' or an analogous command for a
different debugger) to yield
 0x98ef9c <fpa_loc+11038620>:	0x789ac5ec	0x6800001c
The first object is a string, and the second a fixnum, indicating that
the return address at hand belongs to the compiled code block numbered
0x1c in the file whose name is that string.
Scheme strings have two longwords of header, followed by an ordinary C
string that includes a null terminating character, thus the C string
starts at address 0x9ac5ec+4*2=0x9ac5f4, and the gdb command
 ``x/s 0x9ac5f4'' or the adb and dbx command ``0x9ac5f4/s''
will display something like:
 0x9ac5f4 <fpa_loc+11159028>:	
	(char *) 0x9ac5f4 "/usr/local/lib/mit-scheme/SRC/runtime/parse.binf"
Thus the return address we are examining is at offset 0x3e in compiled
code block number 0x1c of the runtime system file ``parse.com''.
If the disassembler is available, you can then use
 (compiler:write-lap-file "parse")
to find out what this return address is, or if you compiled (or
re-compile) parse.scm generating lap files, you can probably guess
what return address is at offset 0x3e (the input lap files do not
contain computed offsets, since these are computed on final assembly).
This interaction would remain very similar for other machines and
compiled entry points, given the same or similar debuggers.  The
variations would be the following:
- Segment bits might have to be added to the object datum components
to produce addresses.  For example, on the HP-PA with segment bits 01
at the most significant end of a word, the C string for Scheme string
object 0x789ac5ec would start at address 0x409ac5ec+8=0x409ac5f4,
instead of at address 0x9ac5ec+8=0x9ac5f4.
- The gc offset might be computed differently, depending on the value
of PC_ZERO_BITS.  For example, on a Vax, where PC_ZERO_BITS has the
value 0, an encoded offset of value 0x3e would imply a real offset of
value (0x3e >> 1)=0x1f.  On a MIPS R3000, where PC_ZERO_BITS is 2, the
same encoded offset would encode the real offset value 
 ((0x3e >> 1) << 2)=0x7c.  In addition, if the low order bit of the
encoded gc offset field were 1, a new gc offset would have to be
extracted, and the process repeated until the beginning of the block
was reached.
- The constant offsets added to various addresses (e.g. that added to
a string object's address to obtain the C string's address) would vary
if the number of bytes per Scheme object (sizeof (unsigned long)) in C
were not 4.
- Not all compiled entry points have debugging information descriptors
accessible the same way.  Trampolines don't have them at all, and
closures have them in the shared code, not in the closure objects.  To
check whether something is a trampoline, you can check the format
field (most trampolines have 0xfffd) or verify that the instructions
immediately call the compiled code interface.  Closure objects have
type code MANIFEST-COMPILED-CLOSURE instead of MANIFEST-VECTOR in the
length word of the compiled code block.  Once you obtain the real
entry point for a closure, you can use the same method to find out the
information about it.
	8.6. Things to watch out for.
The worst bugs to track are interrupt and garbage-collection related.
They will often make the compiled code crash at seemingly random
points, and are very hard to reproduce.  A common source of this kind
of bug is a problem in the rules for procedure headers.  Make sure
that the rules for the various kinds of procedure headers generate the
desired code, and that the desired code operates correctly.  You can
test this explicitly by using an assembly-language debugger to set
breakpoints at the entry points of various kinds of procedures.  When
the breakpoints are reached, you can bump the Free pointer to a value
larger than MemTop, so that the interrupt branch will be taken.  If
the code continues to execute correctly, you are probably safe.  You
should especially check procedures that expect dynamic links for these
must be saved and restored correctly.  Closures should also be tested
carefully, since they need to be reentered correctly, and the closure
object on the stack may have to be de-canonicalized.  Currently
C_to_interface and interface_to_scheme must copy the interpreter's
value register into the compiler's value register, and must extract
the address of this value and store it in the dynamic link register.
Register allocation bugs also manifest themselves in unexpected ways.
If you forget to use NEED-REGISTER! on a register used by a LAPGEN
rule, or if you allocate registers for the sources and target of a
rule in the wrong order (remember the cardinal rule!), you may not
notice for a long time, but some poor program will.  If this happens,
you will be lucky if you can find and disassemble a relatively small
procedure that does not operate properly, but typically the only
notice you will get is when Scheme crashes in an unrelated place.
Fortunately, this type of bug is reproducible.  In order to find the
incorrectly compiled code, you can use binary search on the sources by
mixing interpreted and compiled binaries.  When loading the compiler,
.bin files will be used for those files for which the corresponding
.com file does not exist.  Thus you can move .com files in and out of
the appropriate directories, reload, and test again.  Once you
determine the procedure in which the bug occurs, re-compiling the
module and examining the resulting RTL and LAP programs should lead to
identification of the bug.
		9. Bibliography
1. "Efficient Stack Allocation for Tail-Recursive Languages" by Chris
Hanson, in Proceedings of the 1990 ACM Conference on Lisp and
Functional Programming.
2. "Free Variables and First-Class Environments" by James S. Miller
and Guillermo J. Rozas, in Lisp and Symbolic Computation, 4, 107-141,
1991, Kluwer Academic Publishers.
3. "MIT Scheme User's Manual for Scheme Release 7.1" by Chris Hanson,
distributed with MIT CScheme version 7.1.
4. "MIT Scheme Reference Manual for Scheme Release 7.1" by Chris
Hanson, distributed with MIT CScheme version 7.1.
5. "Taming the Y Operator" by Guillermo J. Rozas, in Proceedings of
the 1992 ACM Conference on Lisp and Functional Programming.
		A.1. MIT Scheme package system
The MIT Scheme package system is used to divide large programs into
separate name spaces which are then ``wired together''.  A large
program, like the runtime system, Edwin, or the compiler, has many
files and variable names all of which must exist at the same time
without conflict.  The package system is a prototype system to
accomplish this separation, but will probably be replaced once a
better module system is developed.
Currently, each package corresponds, at runtime, to a Scheme
environment.  Environments have their usual, tree-shaped structure,
and packages are also structured in a tree, but the trees need not be
isomorphic, although they often are.
Each package is given a name, e.g.:
  (compiler reference-contexts)
whose corresponding environment can be found using the procedure
->ENVIRONMENT:
  (->environment '(compiler reference-contexts))	;; Call this CR
By convention, this package corresponds to an environment below the
  (->environment '(compiler))				;; Call this C
environment, and therefore CR contains all variables defined in C, as
well as those specifically defined in CR.  The package name ``()''
corresponds to the system global environment.
The package structure for the compiler is defined in the file
  <compiler-directory>/machines/<machine-name>/comp.pkg
In that file, each package has a description of the form:
  (define-package <NAME>
    (files <FILES>)
    (parent <PACKAGE-NAME>)
    (export <PACKAGE-NAME> <VARIABLES>)
    (import <PACKAGE-NAME> <VARIABLES>))
where <FILES> are the names of the files that should be loaded in to
package <NAME>.
  (parent <PACKAGE-NAME>)
declares the package whose name is <PACKAGE-NAME> to be the parent
package of <NAME>.  Lexical scoping will make all variables visible in
<PACKAGE-NAME> also visible in <NAME>.
The EXPORT and IMPORT declarations are used to describe cross-package
links.  A package may export any of its variables to any other
package using EXPORT; these variables will appear in both packages
(environments), and any side effect to one of these variables in
either package will be immediately visible in the other package.
Similarly, a package may import any of another package's variables
using IMPORT.  Any number (including zero) of IMPORT and EXPORT
declarations may appear in any package declaration.
Here is an example package declaration, drawn from the compiler:
  (define-package (compiler top-level)
    (files "base/toplev"
	   "base/crstop")
    (parent (compiler))
    (export ()
	    cf
	    compile-bin-file
	    compile-procedure
	    compile-scode
	    compiler:reset!
	    cross-compile-bin-file
	    cross-compile-bin-file-end)
    (export (compiler fg-generator)
	    compile-recursively)
    (export (compiler rtl-generator)
	    *ic-procedure-headers*
	    *rtl-continuations*
	    *rtl-expression*
	    *rtl-graphs*
	    *rtl-procedures*)
    (export (compiler lap-syntaxer)
	    *block-label*
	    *external-labels*
	    label->object)
    (export (compiler debug)
	    *root-expression*
	    *rtl-procedures*
	    *rtl-graphs*)
    (import (runtime compiler-info)
	    make-dbg-info-vector)
    (import (runtime unparser)
	    *unparse-uninterned-symbols-by-name?*))
The read-eval-print loop of Scheme evaluates all expressions in the
same environment.  It is possible to change this environment using the
procedure GE, e.g.:
  (ge (->environment '(compiler top-level)))
To find the package name of the current read-eval-print loop
environment, if there is one, evaluate:
  (pe)
The package system is currently completely static; it is difficult to
create packages and wire them together on the fly.  If you find that
you need to temporarily wire a variable to two different environments
(as you would do with an IMPORT or EXPORT declaration), use the
procedure ENVIRONMENT-LINK-NAME:
  (environment-link-name <TO-ENVIRONMENT>
			 <FROM-ENVIRONMENT>
			 <VARIABLE-SYMBOL-NAME>)
For example, to make WRITE-RESTARTS, originally defined in the
(runtime debugger) package, also visible in the (edwin debugger)
package, evaluate:
  (environment-link-name (->environment '(edwin debugger))
			 (->environment '(runtime debugger))
			 'write-restarts)