-*- Text -*-

$Id: safety.txt,v 1.4 2000/03/21 04:29:53 cph Exp $

                COMPILER SAFETY INFORMATION
                Liar versions 4.77 and later

This article describes how to control the compilation process in
order to achieve the desired mix of safety, debuggability, and speed.

The task of the native-code compiler is to translate a source (or
Scode) program into the native machine language in order to make the
program run faster than when interpreted. Although a straightforward
translation speeds the program significantly, much of the achievable
performance comes from optimizations that the compiler can perform
after statically analyzing the program text. There is a limit,
however, to the extent of the information that can be collected
statically, and, in order to achieve higher performance (often
desired, occasionally necessary), the compiler can be directed to
assume additional information that is not apparent after analyzing
the program text.

Compilation switches are (global) variables whose value when the
compiler is run determines how the compilation proceeds. Some of the
switches provide information that cannot be deduced statically and
allow the relaxation of some runtime consistency checking and the
collection of information to be displayed when an error is detected
and signalled. Relaxing the runtime constraints often makes the
generated code smaller and faster, but may cause problems if the
program being compiled has not been fully debugged, or is invoked
with inappropriate arguments at runtime.

Safety (correctness) can primarily be compromised by eliminating
checks that the program should perform at runtime. These checks are
divided into a few categories:

- Heap availability checks. Programs need to invoke the storage
manager (garbage collector) when they need more memory than is
available. Each time that storage is needed, its availability should
be checked. If this is not done, the system may be damaged.

- Stack availability checks.
Storage is divided into a heap used to allocate objects with
indefinite extent, and a stack used for procedure call frames with
dynamic extent (call-with-current-continuation copies the stack when
invoked). Availability of storage must be checked in the appropriate
region. A very deep recursion may cause the stack to overflow, and
this condition must be checked in order to avoid overwriting other
regions of memory.

- Type checks. Scheme is a strongly (albeit dynamically) typed
language. Operations are only defined on certain types of objects,
and a program is in error if it attempts to operate on the wrong type
of data.

- Range checks. The type of some arguments to a procedure may be
correct, but there may be further restrictions on them which may not
be satisfied. For example, vector and string indices must be
non-negative integers smaller than the length of the vector or
string, filenames represented as strings must denote existing files
with the appropriate protection when the files are going to be opened
for reading, etc.

These checks require extra code, compared to the code that could be
generated if no violations were assumed to occur at runtime. This
code takes space and time to execute, and it may also degrade
performance in other ways relative to the version in which
violations are guaranteed not to occur. This additional degradation
arises because the compiler is prevented from making better register
assignments or reusing the results of previous computations.

For a translation to be safe, i.e., completely correct, all these
checks must be performed at runtime except in those situations where
the compiler can prove that violations cannot occur at runtime.
These situations are very rare, so for most programs, most checks
would be included in the code generated by the compiler.

The MIT Scheme compiler treats each of these consistency checks as
follows:

- Heap availability checks.
Heap availability is currently not checked on every allocation, but
instead is checked when allocating large blocks of storage, and
otherwise checked frequently, typically on entry to procedures and
continuations. The storage manager reserves a block of storage past
the logical end of storage in order to allow this scheme to work.
This scheme is, however, unsafe. It is possible, but unlikely, to
write programs that, after being compiled, will overflow the heap and
cause the system to crash at runtime. The current heuristic has not
been observed to fail, but future versions of the compiler will
improve matters by allowing more careful code generation, and/or
limiting the amount of allocation between checks to the size of the
storage manager's overflow buffer.

- Stack availability checks: Stack availability is currently not
checked at all by compiled code. A very deep or infinite recursion
will cause the system to crash. This WILL be fixed in the near
future.

- Type checks and range checks: A Scheme program can be considered to
be a set of calls to primitive operations and some higher-level glue
that pieces them together. The higher-level glue does not directly
manipulate objects, but instead passes them around to the various
primitives in a controlled fashion. Thus type and range checks are
not needed in the higher-level glue, but only in the primitives
themselves. There are various switches that control how primitives
are treated by the compiler, and they provide the main form of user
control of the safety of compiled code.

        Control of the open coding (in-lining) of primitives

Primitives may be open-coded or called out of line. The out-of-line
versions are safe, i.e., they perform all pertinent consistency
checks. The compilation switches listed below control how the
primitives are open coded.
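
As an illustration of what "all pertinent consistency checks" means
for one primitive, the following is a minimal sketch of the type and
range checks that a safe VECTOR-REF has to perform before it touches
memory. CHECKED-VECTOR-REF is a hypothetical name introduced here for
illustration only; it is not part of MIT Scheme and is not how the
system implements the primitive.

```scheme
;; Hypothetical illustration only: CHECKED-VECTOR-REF is not an MIT
;; Scheme procedure.  It spells out the checks a safe VECTOR-REF makes.
(define (checked-vector-ref v k)
  ;; Type checks: V must be a vector and K an exact integer.
  (if (not (vector? v))
      (error "checked-vector-ref: not a vector:" v))
  (if (not (and (integer? k) (exact? k)))
      (error "checked-vector-ref: index is not an exact integer:" k))
  ;; Range check: 0 <= K < (vector-length V).
  (if (not (and (<= 0 k) (< k (vector-length v))))
      (error "checked-vector-ref: index out of range:" k))
  ;; Only after all checks pass is the actual read performed.
  (vector-ref v k))
```

With the checks omitted, only the final read remains. That code is
still correct on correct inputs, but on an out-of-range index it may
read an arbitrary memory location instead of signalling an error.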

Some important considerations:

- Under all possible settings of the switches described below, any
generated code corresponding to a primitive call, whether open coded
or not, will operate correctly on correct inputs.

- If the compiler does not know that the operator of a combination is
a primitive procedure, it will not open code it. In particular, if
the compiler does not know that a variable is bound to a particular
primitive procedure, no combinations with that variable as the
operator will be open coded. Usually the compiler is informed of
such constant bindings by making use of declarations like
USUAL-INTEGRATIONS. See the documentation for sf for additional
information on declarations.

- The compiler will not make an unsafe program safe, i.e., safe
translation does not compensate for unsafe programs. This article
describes whether and when the translation of the program into native
code will reduce the safety of the program (as compared to the
interpreted version), but there is no realistic way to increase its
safety. A program may be inherently unsafe if it uses inherently
unsafe primitives inappropriately. Some primitives of the MIT Scheme
system are inherently unsafe. They are used for system maintenance
and low-level system operation, but, like everything else in the
system, they are available to users. Their use should be avoided
except on rare occasions. Using them arbitrarily may cause the
system to crash, or worse, damage it in subtle ways that will produce
spurious wrong results or later crashes. There is nothing the
compiler can effectively do to prevent this, since any other action
might change the meaning of the program on correct inputs.

- The switches listed below are not orthogonal. Their meaning
sometimes depends on the settings of the other switches.

The following compilation switches affect the open-coding of
primitives:

COMPILER:OPEN-CODE-PRIMITIVES?

This N-ary switch can take several values as described below.
Two of the values (true and false) are booleans, the rest symbols.
Note that if a primitive call is open coded under a given setting, it
is also open coded under every setting that appears later in the
list. The possible values for this switch are:

-- false: No primitive calls are open-coded. All primitives are
called out-of-line and the code is fully safe.

-- CORRECT: Open code only those primitive calls whose corresponding
code is always correct, and therefore safe.

-- INNOCUOUS: Open code primitive calls whose corresponding code is
correct when given appropriate arguments, and will not crash
immediately when given inappropriate arguments. Primitive calls may
return values when they should have signalled an error, but the
values returned are relatively innocuous: they are guaranteed to be
valid Scheme objects. The overall program or the system may still
fail, since these incorrect values may cause the program to take the
wrong branches later and end up in unsafe or unexpected code that it
would never have executed had the errors been signalled. Damage to
the system is unlikely.

-- ALLOW-READS: Open code even if arbitrary memory locations may be
read with inappropriate arguments. This may cause a memory trap if
the location is read-protected by the operating system, or the
resulting address is not valid (e.g. not aligned properly), and may
cause the garbage collector or other parts of the program and system
to crash if the data stored at the location read is not a valid
object but looks like one. If the extracted data is only used
temporarily and never stored in long-lived data structures or
environments, damage to the system is unlikely.

-- ALLOW-WRITES: Open code even if arbitrary memory locations may be
written. This may cause an immediate failure if the location is not
writable, or other problems if the integrity of some data is
destroyed, causing (often obscure) errors or crashes later.

-- true: Open code all primitive calls (that the compiler is capable
of open-coding) without regard for safety.

COMPILER:GENERATE-TYPE-CHECKS?
COMPILER:GENERATE-RANGE-CHECKS?

These boolean switches control whether type or range checks should be
issued. The code generated is longer and slower when they are. Note
that a primitive call that would not fall in the CORRECT setting of
COMPILER:OPEN-CODE-PRIMITIVES? if these checks were not issued might
very well fall in it when they are. For most intents and purposes,
turning both of these switches on effectively raises
COMPILER:OPEN-CODE-PRIMITIVES? to ALLOW-WRITES unless it is false.

COMPILER:PRIMITIVE-ERRORS-RESTARTABLE?

This boolean switch controls how errors will be signalled if they are
detected at runtime due to incorrect arguments found by checks in the
open coding of primitive calls. If set to true, the code will be
longer and slower, but will provide the maximum amount of debugging
information, and in addition, the primitive call may be bypassed and
the computation restarted as if it had completed successfully. If
set to false, the code may be noticeably smaller and faster, but
there may be less debugging information and some restarting ability
may be lost.
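
The switches described above might be set around a compilation as in
the sketch below. The switch names and values are taken from this
article. Everything else is an assumption: that the switches are
visible as variables in the environment where the compiler is run,
that CF is the procedure used to compile a file, and that
"my-program" is a hypothetical file name. Treat this as an
illustration of the trade-off, not as a recommended configuration.

```scheme
;; Sketch only.  Assumes the switches are bound in the current
;; environment and that CF compiles a file; "my-program" is a
;; hypothetical file name.  FLUID-LET restores the previous switch
;; values when compilation finishes.

;; While debugging: fully safe code, maximum debugging information.
(fluid-let ((compiler:open-code-primitives? #f)
            (compiler:primitive-errors-restartable? #t))
  (cf "my-program"))

;; Once debugged: open code only what is always correct, and issue
;; type and range checks so that more primitive calls qualify.
(fluid-let ((compiler:open-code-primitives? 'correct)
            (compiler:generate-type-checks? #t)
            (compiler:generate-range-checks? #t)
            (compiler:primitive-errors-restartable? #f))
  (cf "my-program"))
```

Binding the switches with FLUID-LET rather than assigning them
permanently keeps an aggressive setting from leaking into later,
unrelated compilations.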