-*- Text -*-

$Id: safety.txt,v 1.4 2000/03/21 04:29:53 cph Exp $

		COMPILER SAFETY INFORMATION 
		Liar versions 4.77 and later

This article describes how to control the compilation process in order
to achieve the desired mix of safety, debuggability, and speed.

The task of the native-code compiler is to translate a source (or
Scode) program into the native machine language in order to make the
program run faster than when interpreted.

Although a straightforward translation speeds up the program
significantly, much of the achievable performance comes from
optimizations that the compiler can perform after statically analyzing
the program text.  There is a limit, however, to the information that
can be collected statically.  To achieve higher performance (often
desired, occasionally necessary), the compiler can be directed to
assume additional information that is not apparent from the program
text.

Compilation switches are (global) variables whose values, at the time
the compiler is run, determine how compilation proceeds.  Some of the
switches supply information that cannot be deduced statically,
allowing the compiler to relax some runtime consistency checks and to
reduce the information collected for display when an error is
detected and signalled.  Relaxing the runtime constraints often makes
the generated code smaller and faster, but may cause problems if the
program being compiled has not been fully debugged, or is invoked with
inappropriate arguments at run time.
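The point that switch values matter only at the moment the compiler
runs can be sketched with a small Python model.  This is a
hypothetical illustration, not MIT Scheme code: the SWITCHES
dictionary, fluid_let, compile_vector_ref, and the emitted
pseudo-instructions are all invented here; only the switch names echo
this article.  In MIT Scheme itself one would instead rebind the real
switch variables (for instance with FLUID-LET) around a call to the
compiler.

```python
from contextlib import contextmanager

# Hypothetical global switches, named after the ones in this article.
SWITCHES = {
    "open_code_primitives": True,
    "generate_type_checks": False,
    "generate_range_checks": False,
}

@contextmanager
def fluid_let(**bindings):
    """Temporarily rebind switches, restoring the old values on exit."""
    saved = {name: SWITCHES[name] for name in bindings}
    SWITCHES.update(bindings)
    try:
        yield
    finally:
        SWITCHES.update(saved)

def compile_vector_ref():
    """Return the pseudo-instructions a toy compiler would emit for
    (vector-ref v i), given the switch values at this moment."""
    if SWITCHES["open_code_primitives"] is False:
        return ["call vector-ref out of line"]
    code = []
    if SWITCHES["generate_type_checks"]:
        code.append("check-type vector")
    if SWITCHES["generate_range_checks"]:
        code.append("check-range index")
    code.append("load element")
    return code

with fluid_let(generate_type_checks=True, generate_range_checks=True):
    checked = compile_vector_ref()
unchecked = compile_vector_ref()   # default switch values are back
```

Code "compiled" inside the with block carries the checks; code compiled
afterwards sees the default values again.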

Safety (correctness) is compromised primarily by eliminating checks
that the program should perform at runtime.  These checks fall into a
few categories:

- Heap availability checks.  Programs need to invoke the storage
manager (garbage collector) when they need more memory than is
available.  Each time that storage is needed, its availability should
be checked.  If this is not done, the system may be damaged.

- Stack availability checks.  Storage is divided into a heap used to
allocate objects with indefinite extent, and a stack used for
procedure call frames with dynamic extent
(call-with-current-continuation copies the stack when invoked).
Availability of storage must be checked in the appropriate region.  A
very deep recursion may cause the stack to overflow, and this
condition must be checked in order to avoid overwriting other regions
of memory.

- Type checks.  Scheme is a strongly (albeit dynamically) typed
language.  Operations are only defined on certain types of objects,
and a program is in error if it attempts to operate on the wrong type
of data.

- Range checks.  An argument to a procedure may have the correct type
but still violate further restrictions.  For example, vector and
string indices must be non-negative integers smaller than the length
of the vector or string, and a filename represented as a string must
denote an existing file with the appropriate protection if the file is
to be opened for reading.
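To make the type and range categories concrete, here is a hypothetical
Python model (not MIT Scheme code) in which memory is a flat list of
tagged words and a vector is a header word holding its length followed
by its elements.  The memory layout, tags, and function names are
invented for illustration.  The checked version signals an error on a
wrong type or an out-of-range index; the unchecked version simply
computes an address and reads whatever is there.

```python
# Hypothetical heap: a flat list of tagged words.  A vector occupies a
# header word ("header", length) followed by its elements.
MEMORY = [
    ("header", 2), ("fixnum", 10), ("fixnum", 20),   # vector A at address 0
    ("header", 1), ("fixnum", 99),                   # vector B at address 3
]
VECTOR_A = ("vector", 0)   # tagged pointer to vector A
VECTOR_B = ("vector", 3)

def vector_ref_checked(obj, index):
    # Type check: the operand must be a vector.
    if not (isinstance(obj, tuple) and obj[0] == "vector"):
        raise TypeError(f"vector-ref: not a vector: {obj!r}")
    address = obj[1]
    length = MEMORY[address][1]
    # Range check: the index must be an integer in [0, length).
    if not (isinstance(index, int) and 0 <= index < length):
        raise IndexError(f"vector-ref: index {index!r} out of range")
    return MEMORY[address + 1 + index]

def vector_ref_unchecked(obj, index):
    # No checks at all: compute the address and read it.
    return MEMORY[obj[1] + 1 + index]
```

With the checks removed, an index one past the end of vector A returns
the header word of vector B, which is not a valid object; this is the
kind of read that the ALLOW-READS setting described later tolerates.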

These checks require additional code, compared with the code that
could be generated if no violations were assumed to occur at runtime.
This code takes space and time to execute, and it may degrade
performance further relative to a version compiled on the assumption
that no violations occur: the checks prevent the compiler from making
better register assignments or reusing the results of previous
computations.

For a translation to be safe, i.e., completely correct, all these
checks must be performed at runtime except where the compiler can
prove that violations cannot occur.  Such situations are very rare, so
for most programs most checks would be included in the code generated
by the compiler.
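The rule that a check may be omitted only when the compiler can prove
it will never fail can be illustrated with a hypothetical range-check
filter.  This is not the Liar implementation; needs_range_check is an
invented name, and None stands for a value not known at compile time.

```python
from typing import Optional

def needs_range_check(length: Optional[int], index: Optional[int]) -> bool:
    """Return True unless the index is provably within [0, length)."""
    if length is None or index is None:
        return True                    # nothing can be proved: keep the check
    # Both are compile-time constants.  An in-range index needs no check;
    # an out-of-range one keeps the check so that the error is signalled.
    return not (0 <= index < length)
```

In typical programs vector lengths and indices are computed at run
time, so the first branch dominates, which is why most checks remain in
safely compiled code.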

The MIT Scheme compiler treats each of these consistency checks as
follows:

- Heap availability checks. Heap availability is currently not checked
on every allocation, but instead is checked when allocating large
blocks of storage, and otherwise checked frequently, typically on
entry to procedures and continuations.  The storage manager reserves a
block of storage past the logical end of storage in order to allow
this scheme to work.  This scheme is, however, unsafe.  It is
possible, but unlikely, to write programs that, after being compiled,
will overflow the heap and cause the system to crash at runtime.  The
current heuristic has not been observed to fail, but future versions
of the compiler will improve matters by allowing more careful code
generation, and/or limiting the amount of allocation between checks to
the size of the storage manager's overflow buffer.

- Stack availability checks: Stack availability is currently not
checked at all by compiled code.  A very deep or infinite recursion
will cause the system to crash.  This WILL be fixed in the near
future.

- Type checks and range checks: A Scheme program can be considered to
be a set of calls to primitive operations and some higher-level glue
that pieces them together.  The higher-level glue does not directly
manipulate objects, but instead passes them around to the various
primitives in a controlled fashion.  Thus type and range checks are
not needed in the higher-level glue, but only in the primitives
themselves.  There are various switches that control how primitives
are treated by the compiler, and they provide the main form of user
control of the safety of compiled code.
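The heap-check heuristic described in the first item above can be
modelled in a few lines.  This is a hypothetical Python sketch, not the
MIT Scheme storage manager: HeapModel, its sizes, and the "collection"
that simply resets the free pointer are simplifications invented here.
Allocation itself is unchecked; check() plays the role of the test
made on entry to a procedure or continuation, and the overflow buffer
past the logical end absorbs whatever is allocated between checks.

```python
class HeapModel:
    """Hypothetical model of heap checks with an overflow buffer."""

    def __init__(self, logical_size, buffer_size):
        self.logical_end = logical_size
        self.physical_end = logical_size + buffer_size
        self.free = 0          # free pointer (words allocated so far)
        self.gc_count = 0

    def allocate(self, words):
        # Unchecked allocation: just bump the free pointer.
        self.free += words
        if self.free > self.physical_end:
            # In the real system nothing would notice; the model raises
            # so that the damage is visible.
            raise MemoryError("overran the overflow buffer")

    def check(self):
        # The check made on entry to a procedure or continuation.
        if self.free > self.logical_end:
            self.gc_count += 1
            self.free = 0      # stand-in for a collection that frees everything
```

If the free pointer was within the logical end at the last check, and
no more than buffer_size words are allocated before the next one, the
buffer can never be overrun; that is the guarantee the article says
future versions may provide by bounding allocation between checks.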

	Control of the open coding (in-lining) of primitives

Primitives may be open-coded or called out of line.  The out-of-line
versions are safe, i.e., they perform all pertinent consistency checks.
The compilation switches listed below control how the primitives are
open coded.

Some important considerations:

- Under all possible settings of the switches described below, any
generated code corresponding to a primitive call, whether open coded
or not, will operate correctly on correct inputs.

- If the compiler does not know that the operator of a combination is
a primitive procedure, it will not open code it.  In particular, if
the compiler does not know that a variable is bound to a particular
primitive procedure, no combinations with that variable as the
operator will be open coded.  Usually the compiler is informed of such
constant bindings by making use of declarations like
USUAL-INTEGRATIONS.  See the documentation for sf for additional
information on declarations.

- The compiler will not make an unsafe program safe, i.e., safe
translation does not compensate for unsafe programs.

This article describes whether and when translating the program into
native code reduces its safety (compared with the interpreted
version); translation offers no realistic way to increase it.  A
program may be inherently unsafe if it uses inherently unsafe
primitives inappropriately.

Some primitives of the MIT Scheme system are inherently unsafe.  They
are used for system maintenance and low-level system operation, but,
like everything else in the system, they are available to users.
Their use should be avoided except on rare occasions.  Using them
arbitrarily may cause the system to crash, or worse, damage it in
subtle ways that will produce spurious wrong results or later crashes.
There is nothing the compiler can effectively do to prevent this,
since any other action might change the meaning of the program on
correct inputs.

- The switches listed below are not orthogonal.  Their meaning
sometimes depends on the settings of the other switches.

The following compilation switches affect the open-coding of
primitives:


	COMPILER:OPEN-CODE-PRIMITIVES?

This N-ary switch can take several values, described below.  Two of
the values (true and false) are booleans; the rest are symbols.

Note that if a primitive call is open coded under one setting, it will
also be open coded under every setting that appears later in the list.

The possible values for this switch are:

-- false: No primitive calls are open-coded.  All primitives are
called out-of-line and the code is fully safe.

-- CORRECT: Open code only those primitive calls whose corresponding
code is always correct, and therefore safe.

-- INNOCUOUS: Open code primitive calls whose corresponding code is
correct when given appropriate arguments, and will not crash
immediately when given inappropriate arguments.  Primitive calls may
return values when they should have signalled an error, but the values
returned are relatively innocuous: they are guaranteed to be valid
Scheme objects.  The overall program or the system may still fail,
since these incorrect values may cause the program to take the wrong
branches later and end up in unsafe or unexpected code that it would
never have executed had the errors been signalled.  Damage to the
system is unlikely.

-- ALLOW-READS: Open code even if arbitrary memory locations may be
read with inappropriate arguments.  This may cause a memory trap if
the location is read-protected by the operating system, or if the
resulting address is not valid (e.g., not aligned properly), and may
cause the garbage collector or other parts of the program and system
to crash if the data stored at the location read is not a valid object
but looks like one.  If the extracted data is only used temporarily
and never stored in long living data structures or environments,
damage to the system is unlikely.

-- ALLOW-WRITES: Open code even if arbitrary memory locations may be
written.  This may cause an immediate failure if the location is not
writable, or other problems if the integrity of some data is destroyed
causing (often obscure) errors or crashes later.

-- true: open code all primitive calls (that the compiler is capable of
open-coding) without regard for safety.
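The ordering property stated before the list (a call open coded under
one setting stays open coded under every later setting) can be
captured by ranking the settings.  This is a hypothetical Python model:
each call is described by the least permissive setting at which it
would be open coded, and the example classifications in the comments
are plausible illustrations, not Liar's actual tables.

```python
# Settings of COMPILER:OPEN-CODE-PRIMITIVES?, from safest to least safe.
SETTINGS = [False, "CORRECT", "INNOCUOUS", "ALLOW-READS", "ALLOW-WRITES", True]

def rank(setting):
    return SETTINGS.index(setting)

def open_coded(least_setting, setting):
    """least_setting: the least permissive setting at which this call
    would be open coded, or None if the compiler cannot open code it."""
    if setting is False or least_setting is None:
        return False
    return rank(setting) >= rank(least_setting)

# Hypothetical classifications, for illustration only:
#   an unchecked CAR may read through a non-pointer  -> "ALLOW-READS"
#   an unchecked SET-CAR! may write anywhere         -> "ALLOW-WRITES"
```

Because open_coded compares ranks, raising the setting can only add
open-coded calls, never remove them.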

	COMPILER:GENERATE-TYPE-CHECKS?
	COMPILER:GENERATE-RANGE-CHECKS?

These boolean switches control whether type or range checks are
issued.  The generated code is longer and slower when they are.  Note
that a primitive call that would not fall in the CORRECT setting of
COMPILER:OPEN-CODE-PRIMITIVES? if these checks were not issued might
very well fall in it when they are.  For most intents and purposes,
turning both of these switches on bumps COMPILER:OPEN-CODE-PRIMITIVES?
to (at least) ALLOW-WRITES unless it is false.
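The rule of thumb above can be expressed as an "effective setting".
This is a hypothetical Python approximation: effective_setting is an
invented name, only the case where both switches are on is modelled,
and with a single switch on the sketch applies no bump at all, since
the article does not say what happens then.

```python
SETTINGS = [False, "CORRECT", "INNOCUOUS", "ALLOW-READS", "ALLOW-WRITES", True]

def effective_setting(setting, type_checks, range_checks):
    """Approximate open-coding behaviour once checks are issued."""
    if setting is False or not (type_checks and range_checks):
        return setting             # open coding off, or no bump modelled
    # With both kinds of checks, calls unsafe only for lack of checks
    # become correct, so the setting acts like at least ALLOW-WRITES.
    bumped = max(SETTINGS.index(setting), SETTINGS.index("ALLOW-WRITES"))
    return SETTINGS[bumped]
```

For example, CORRECT with both checks on behaves roughly like
ALLOW-WRITES, while false stays false and true stays true.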


	COMPILER:PRIMITIVE-ERRORS-RESTARTABLE?

This boolean switch controls how errors will be signalled if they are
detected at runtime due to incorrect arguments found by checks in the
open coding of primitive calls.  If set to true, the code will be
longer and slower, but will provide the maximum amount of debugging
information, and in addition, the primitive call may be bypassed and
the computation restarted as if it had completed successfully.  If set
to false, the code may be noticeably smaller and faster, but there may
be less debugging information and some restarting ability may be lost.
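The difference between restartable and non-restartable errors can be
sketched as follows.  This is a hypothetical Python model, not the MIT
Scheme condition system: PrimitiveError, checked_car, program, and the
handler (standing in for a user choosing a restart at the REPL) are
invented for illustration.  In restartable mode the handler receives
the primitive name and operand and may supply a value, and the
computation continues as if the primitive call had returned it.

```python
class PrimitiveError(Exception):
    pass

def checked_car(x, restartable, handler=None):
    """Model of an open-coded CAR with a type check."""
    if isinstance(x, tuple) and len(x) == 2:
        return x[0]
    if restartable and handler is not None:
        # Full information is available, and the value returned by the
        # handler replaces the result of the bypassed primitive call.
        return handler("car", x)
    # Non-restartable: in this model less information is recorded and
    # the call cannot be resumed.
    raise PrimitiveError("wrong-type argument")

def program(x, restartable, handler=None):
    return checked_car(x, restartable, handler) + 1
```

Usage: program("oops", restartable=True, handler=lambda name, arg: 41)
resumes and returns 42, while the same call with restartable=False
simply signals PrimitiveError.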