Note: the procedures described in this section are only available when the -compiler command-line option is specified.
4.1 Compilation Procedures
4.2 Declarations
4.3 Efficiency Tips
4.1 Compilation Procedures
(cf "foo") |
cf
compiles the file `foo.scm', producing the file
`foo.com' (incidentally it will also produce `foo.bin',
`foo.bci', and possibly `foo.ext'). If you later evaluate
(load "foo") |
`foo.com' will be loaded rather than `foo.scm'.
If destination is given, it says where the output files should go. If this argument is a directory, they go in that directory, e.g.:
(cf "foo" "../bar/") |
will take `foo.scm' and generate the file `../bar/foo.com'. If destination is not a directory, it is the root name of the output:
(cf "foo" "bar") |
takes `foo.scm' and generates `bar.com'.
About the `.bci' files: these files contain the debugging information that Scheme uses when you call debug to examine compiled code. When you load a `.com' file, Scheme remembers where it was loaded from, and when the debugger (or pp) looks at the compiled code from that file, it attempts to find the `.bci' file in the same directory from which the `.com' file was loaded. Thus it is a good idea to leave these files together.
`.bci' files are stored in a compressed format. The debugger has to uncompress the files when it looks at them, and on a slow machine this can take a noticeable time. The system takes steps to reduce the impact of this behavior: debugging information is cached in memory, and uncompressed versions of `.bci' files are kept around. The default behavior is that a temporary file is created and the `.bci' file is uncompressed into it. The temporary file is kept around for a while afterwards, and during that time if the uncompressed `.bci' file is needed the temporary file is used. Each such reference updates an `access time' that is associated with the temporary file. The garbage collector checks the access times of all such temporary files, and deletes any that have not been accessed in five minutes or more. All of the temporaries are deleted automatically when the Scheme process is killed.
Two other behaviors are available. One of them uncompresses the `.bci' file each time it is referenced, and the other uncompresses the `.bci' file and writes it back out as a `.bif' file. The `.bif' file remains after Scheme exits. The time interval and the behavior are controlled by the following variables.
*save-uncompressed-files?*
This variable selects among the three behaviors. Its value is one of: #f, meaning uncompress into a temporary file as described above; automatic, meaning choose between temporary files and `.bif' files depending on the filesystem; or #t, meaning always write `.bif' files. The default is #f.

*uncompressed-file-lifetime*
The minimum time, in milliseconds, that a temporary uncompressed file is retained after it is last used. The default is 300000 (five minutes).
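For example, assuming the variable names given above, the `.bif' behavior can be selected for all files like this:

;; Keep uncompressed debugging info as `.bif' files after Scheme exits.
(set! *save-uncompressed-files?* #t)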
sf is the program that transforms a source-code file into binary SCode form; it is used on machines that do not support native-code compilation. It performs numerous optimizations that can make your programs run considerably faster than unoptimized interpreted code. Also, the binary files that it generates load very quickly compared to source-code files.
The simplest way to use sf is just to say:

(sf filename)

This will cause your file to be transformed, and the resulting binary file to be written out with the same name, but with pathname type "bin". If you do not specify a pathname type on the input file, "scm" is assumed.
Like load, the first argument to sf may be a list of filenames rather than a single filename. sf takes an optional second argument, which is the filename of the output file. If this argument is a directory, then the output file has its normal name but is put in that directory instead.
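For instance (the filenames here are only illustrative):

(sf '("foo" "bar"))     ;; transform foo.scm and bar.scm
(sf "foo" "../lib/")    ;; write the output to ../lib/foo.bin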
4.2 Declarations
Several declarations can be added to your programs to help cf and sf make them more efficient.
4.2.1 Standard Names
4.2.2 In-line Coding
4.2.3 Operator Replacement
4.2.4 Operator Reduction
4.2.1 Standard Names
Normally, all files have a line

(declare (usual-integrations))

near their beginning, which tells the compiler that free variables whose names are defined in system-global-environment will not be shadowed by other definitions when the program is loaded. If you redefine some global name in your code, for example car, cdr, or cons, you should indicate it in the declaration:

(declare (usual-integrations car cdr cons))
You can obtain an alphabetically-sorted list of the names that the usual-integrations declaration affects by evaluating the following expression:

(eval '(sort (append usual-integrations/constant-names
                     usual-integrations/expansion-names)
             (lambda (x y)
               (string<=? (symbol->string x)
                          (symbol->string y))))
      (->environment '(scode-optimizer)))
4.2.2 In-line Coding
Another useful facility is the ability to in-line code procedure definitions. In fact, the compiler will perform full beta conversion, with automatic renaming, if you request it. Here are the relevant declarations:

(declare (integrate name ...))
    The named variables must be defined in the same file; references to them are replaced by the corresponding values.

(declare (integrate-operator name ...))
    Similar to the integrate declaration, except that it only substitutes for references that appear in the operator position of a combination. All other references are ignored.

(declare (integrate-external filename))
    Causes the compiler to use the top-level integrations provided by filename.
If filename is a relative filename (the normal case), it is interpreted as being relative to the file in which the declaration appears. Thus if the declaration appears in file `/usr/cph/foo.scm', then the compiler looks for a file called `/usr/cph/filename.ext'.
Note: When the compiler finds top-level integrations, it collects them and outputs them into an auxiliary file with extension `.ext'. This `.ext' file is what the integrate-external declaration refers to.
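For example, to pick up the integrations that were collected while processing `foo.scm' in the same directory, a file might begin:

(declare (usual-integrations))
(declare (integrate-external "foo"))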
Note that the most common use of this facility, in-line coding of procedure definitions, requires a somewhat complicated use of these declarations. Because this is so common, there is a special form, define-integrable, which is like define but performs the appropriate declarations. For example:
(define-integrable (foo-bar foo bar)
  (vector-ref (vector-ref foo bar) 3))
Here is how you do the same thing without this special form: there should be an integrate-operator declaration for the procedure's name, and (internal to the procedure's definition) an integrate declaration for each of the procedure's parameters, like this:
(declare (integrate-operator foo-bar))

(define foo-bar
  (lambda (foo bar)
    (declare (integrate foo bar))
    (vector-ref (vector-ref foo bar) 3)))
The reason for this complication is as follows: the integrate-operator declaration finds all the references to foo-bar and replaces them with the lambda expression from the definition. Then, the integrate declarations take effect because the combination in which the reference to foo-bar occurred supplies code that is substituted throughout the body of the procedure definition. For example:
(foo-bar (car baz) (cdr baz))
First use the integrate-operator declaration:
((lambda (foo bar)
   (declare (integrate foo bar))
   (vector-ref (vector-ref foo bar) 3))
 (car baz)
 (cdr baz))
Next use the internal integrate declaration:
((lambda (foo bar)
   (vector-ref (vector-ref (car baz) (cdr baz)) 3))
 (car baz)
 (cdr baz))
Next notice that the variables foo and bar are not used, and eliminate them:
((lambda () (vector-ref (vector-ref (car baz) (cdr baz)) 3)))
Finally, remove the ((lambda () ...)) to produce
(vector-ref (vector-ref (car baz) (cdr baz)) 3)
(sf "foo.scm") (pp (fasload "foo.bin")) |
4.2.3 Operator Replacement
The replace-operator declaration is provided to inform the compiler that certain operators may be replaced by other operators depending on the number of arguments. For example:
Declaration:

(declare (replace-operator (map (2 map-2) (3 map-3))))

Replacements:

(map f x y z) ==> (map f x y z)
(map f x y)   ==> (map-3 f x y)
(map f x)     ==> (map-2 f x)
(map f)       ==> (map f)
(map)         ==> (map)
Presumably map-2 and map-3 are efficient versions of map that are written for exactly two and three arguments respectively. All the other cases are not expanded but are handled by the original, general map procedure, which is less efficient because it must handle a variable number of arguments.
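For illustration, map-2 and map-3 might look like this (hypothetical definitions; these names are not standard procedures):

;; Hypothetical fixed-arity versions of map assumed by the declaration.
(define (map-2 f lst)
  (if (pair? lst)
      (cons (f (car lst)) (map-2 f (cdr lst)))
      '()))

(define (map-3 f lst1 lst2)
  (if (and (pair? lst1) (pair? lst2))
      (cons (f (car lst1) (car lst2))
            (map-3 f (cdr lst1) (cdr lst2)))
      '()))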
The syntax of this declaration is

(replace-operator (name (nargs1 value1) (nargs2 value2) ...))
where name is a symbol, each nargsN is a non-negative integer or one of the symbols any, else or otherwise, and each valueN is a simple expression in one of these forms:

'constant
    A constant.
variable
    A variable.
(primitive primitive-name [arity])
    The primitive procedure named primitive-name, optionally with the given arity.
(global var)
    A global variable.

The meanings of these fields are: if the operation is given exactly nargsN arguments, it is replaced with a call to the corresponding valueN. If the number of arguments does not match any nargsN, but one of the nargsN is any, else or otherwise, then the operation is replaced with a call to the corresponding valueN; only one of the nargsN may be of this form. If the number of arguments does not match any nargsN, and none of the nargsN is any, else or otherwise, then the operation is not replaced.
4.2.4 Operator Reduction
The reduce-operator declaration is provided to inform the compiler that certain names are n-ary versions of binary operators. Here are some examples:
Declaration:

(declare (reduce-operator (cons* cons)))

Replacements:

(cons* x y z w) ==> (cons x (cons y (cons z w)))
(cons* x y)     ==> (cons x y)
(cons* x)       ==> x
(cons*)         error--> too few arguments
Declaration:

(declare (reduce-operator (list cons (null-value '() any))))

Replacements:

(list x y z w) ==> (cons x (cons y (cons z (cons w '()))))
(list x y)     ==> (cons x (cons y '()))
(list x)       ==> (cons x '())
(list)         ==> '()
Declaration:

(declare (reduce-operator (- %- (null-value 0 single) (group left))))

Replacements:

(- x y z w) ==> (%- (%- (%- x y) z) w)
(- x y)     ==> (%- x y)
(- x)       ==> (%- 0 x)
(-)         ==> 0
Declaration:

(declare (reduce-operator (+ %+ (null-value 0 none) (group right))))

Replacements:

(+ x y z w) ==> (%+ x (%+ y (%+ z w)))
(+ x y)     ==> (%+ x y)
(+ x)       ==> x
(+)         ==> 0
Note: This declaration does not cause an appropriate definition of %+ (in the last example) to appear in your code. It merely informs the compiler that certain optimizations can be performed on calls to + by replacing them with calls to %+. You should provide a definition of %+ as well, although it is not required.
Declaration:

(declare (reduce-operator (apply (primitive cons)
                                 (group right)
                                 (wrapper (global apply) 1))))

Replacements:

(apply f x y z w) ==> ((access apply ()) f (cons x (cons y (cons z w))))
(apply f x y)     ==> ((access apply ()) f (cons x y))
(apply f x)       ==> (apply f x)
(apply f)         ==> (apply f)
(apply)           ==> (apply)
The general format of the declaration is:

(reduce-operator (name binop
                 [(group ordering)]
                 [(null-value value null-option)]
                 [(singleton unop)]
                 [(wrapper wrap [n])]
                 [(maximum m)]))
where binop and wrap are simple expressions in one of these forms:

'constant
    A constant.
variable
    A variable.
(primitive primitive-name [arity])
    The primitive procedure named primitive-name, optionally with the given arity.
(global var)
    A global variable.

null-option is one of the symbols always, any, one, single, none, or empty, and ordering is one of the symbols left, right, or associative.
The meaning of these fields is:

name is the n-ary operation to be reduced, and binop is the binary operation into which it is reduced.

The group option specifies whether name associates to the right or left.

The null-value option specifies a value to use in the following cases, selected by null-option:

none, empty
    value is the result when name is given no arguments.
one, single
    as above; in addition, when name is given a single argument, value and that argument are passed to binop (as in the - example above).
any, always
    as above; in addition, value is passed to binop in every expansion (as in the list example above, where '() terminates each expansion).

In the above options, when value is supplied to binop, it is supplied on the left if grouping to the left, otherwise it is supplied on the right.
The singleton option specifies a function, unop, to be invoked on the single argument given. This option supersedes the null-value option, which can only take the value none.
The wrapper option specifies a function, wrap, to be invoked on the result of the outermost call to binop after the expansion. If n is provided it must be a non-negative integer indicating a number of arguments that are transferred verbatim from the original call to the wrapper. They are passed to the left of the reduction. The maximum option, if given, specifies that calls with more than m arguments should not be reduced.
4.3 Efficiency Tips
How you write your programs can have a large impact on how efficiently the compiled program runs. The most important thing to do, after choosing suitable data structures, is to put the following declaration near the beginning of the file.
(declare (usual-integrations))
Without this declaration the compiler cannot recognize any of the common operators and compile them efficiently.
The usual-integrations declaration is usually sufficient to get good quality compiled code.
If you really need to squeeze more performance out of your code then we hope that you find the following grab-bag of tips, hints and explanations useful.
4.3.1 Coding style
4.3.2 Global variables
4.3.3 Fixnum arithmetic
4.3.4 Flonum arithmetic
4.3.1 Coding style
Scheme is a rich language, in which there are usually several ways to say the same thing. A coding style is a set of rules that a programmer uses for choosing an expressive form to use in a given situation. Usually these rules are aesthetic, but sometimes there are efficiency issues involved; this section describes a few choices that have non-obvious efficiency consequences.
Consider the following implementation of map as might be found in any introductory book on Scheme:

(define (map f lst)
  (if (null? lst)
      '()
      (cons (f (car lst))
            (map f (cdr lst)))))
The problem with this definition is that at the points where car and cdr are called we still do not know that lst is a pair. The compiler must insert a type check, or if type checks are disabled, the program might give wrong results. Since one of the fundamental properties of map is that it transforms lists, we should make the relationship between the input pairs and the result pairs more apparent in the code:

(define (map f lst)
  (cond ((pair? lst)
         (cons (f (car lst)) (map f (cdr lst))))
        ((null? lst) '())
        (else (error "Not a proper list:" lst))))
Note also that the pair? case comes first because we expect that map will be called on lists which have, on average, length greater than one.
Calls to internal procedures are faster than calls to global procedures. There are two things that make internal procedures faster: First, the procedure call is compiled to a direct jump to a known location, which is more efficient than jumping `via' a global binding. Second, there is a knock-on effect: since the compiler can see the internal procedure, the compiler can analyze it and possibly produce better code for other expressions in the body of the loop too:
(define (map f original-lst)
  (let walk ((lst original-lst))
    (cond ((pair? lst)
           (cons (f (car lst)) (walk (cdr lst))))
          ((null? lst) '())
          (else (error "Not a proper list:" original-lst)))))
Internal definitions are a useful tool for structuring larger procedures. However, certain internal definitions can thwart compiler optimizations. Consider the following two procedures, where compute-100 is some unknown procedure that we just know returns `100'.

(define (f1)
  (define v 100)
  (lambda () v))

(define (f2)
  (define v (compute-100))
  (lambda () v))
The procedure returned by f1 will always give the same result and the compiler can prove this. The procedure returned by f2 may return different results, even if f2 is only called once. Because of this, the compiler has to allocate a memory cell to v. How can the procedure return different results? The fundamental reason is that the continuation may escape during the evaluation of (compute-100), allowing the rest of the body of f2 to be executed again:
(define keep)

(define (compute-100)
  (call-with-current-continuation
   (lambda (k)
     (set! keep k)
     100)))

(define p (f2))

(p)          => 100
(keep -999)  => p      ;; re-defines v and p
(p)          => -999
To avoid the inefficiency introduced to handle the general case, the compiler must prove that the continuation cannot possibly escape. The compiler knows that lambda expressions and constants do not let their continuations escape, so order the internal definitions so that definitions of the following forms come first:
(define x 'something)
(define x (lambda (...) ...))
(define (f u v) ...)
Note: The IEEE Scheme standard permits only lambda expressions and constants as the value of internal defines. Furthermore, all internal definitions must appear before any other expressions in the body. Following the standard simultaneously assures portability and avoids the implementation inefficiencies described in this section.
4.3.2 Global variables
Compiled code usually accesses variables in top-level first-class environments via variable caches. Each compiled procedure has a set of variable caches for the global variables that it uses. There are three kinds of variable cache - read caches for getting the value of a variable (referencing the variable), write caches for changing the value, and execute caches for calling the procedure assigned to that variable.
Sometimes the variable caches contain special objects, called reference traps, that indicate that the operation cannot proceed normally and must either be completed by the system (in order to keep the caches coherent) or must signal an error. For example, the assignment
(set! newline my-better-newline)
will cause the system to go to each compiled procedure that calls newline and update its execute cache to call the new procedure. Obviously you want to avoid updating hundreds of execute caches in a critical loop. Using fluid-let to temporarily redefine a procedure has the same inefficiency (but twice!).
To behave correctly in all situations, each variable reference or assignment must check for the reference traps.
Sometimes you can prove that the variable (a) will always be bound, (b) will never be unassigned, and (c) there will never be any compiled calls to that variable. The compiler can't prove this because it assumes that other independently compiled files might be loaded that invalidate these assumptions. If you know that these conditions hold, the following declarations can speed up and reduce the size of a program that uses global variables. The ignore-reference-traps declaration tells the compiler that it need not check for reference traps when referring to the given variables; the ignore-assignment-traps declaration does the same for assignments to them.
The variables are specified with expressions from the following set language:

all
    the set of all variables.
none
    the empty set.
free
    all of the variables bound outside the current block.
bound
    all of the variables bound in the current block.
assigned
    all of the variables for which there exists an assignment (i.e. set!).
(set name ...)
    the set of the named variables.
(union set1 set2), (intersection set1 set2), (difference set1 set2)
    the usual operations on sets.
For example, to ignore reference traps on all the variables except x, y and any variable that is assigned to:

(declare (ignore-reference-traps
          (difference all (union assigned (set x y)))))
4.3.3 Fixnum arithmetic
The usual arithmetic operations like + and < are called generic arithmetic operations because they work for all (appropriate) kinds of number.
A fixnum is an exact integer that is small enough to fit in a machine word. In MIT Scheme, fixnums are 26 bits on 32-bit machines, and 56 bits on 64-bit machines; it is reasonable to assume that fixnums are at least 24 bits. Fixnums are signed; they are encoded using 2's complement.
All exact integers that are small enough to be encoded as fixnums are always encoded as fixnums -- in other words, any exact integer that is not a fixnum is too big to be encoded as such. For this reason, small constants such as 0 or 1 are guaranteed to be fixnums. In addition, the lengths of and valid indexes into strings and vectors are also always fixnums.
If you know that a value is always a small fixnum, you can substitute the equivalent fixnum operation for the generic operation. However, care should be exercised: if used improperly, these operations can return incorrect answers, or even malformed objects that confuse the garbage collector. For a listing of all fixnum operations, see section `Fixnum Operations' in MIT Scheme Reference Manual.
A fruitful area for inserting fixnum operations is in the index operations in tight loops.
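For example, here is a minimal sketch of such a loop, using the fixnum comparison and increment operations fix:= and fix:+ from the fixnum operations listed in the reference manual:

;; Sum the elements of a vector, using fixnum arithmetic for the
;; index.  The index and the vector length are guaranteed fixnums.
(define (vector-sum v)
  (let loop ((i 0) (sum 0))
    (if (fix:= i (vector-length v))
        sum
        (loop (fix:+ i 1)
              (+ sum (vector-ref v i))))))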
4.3.4 Flonum arithmetic
Getting efficient flonum arithmetic is much more complicated and harder than getting efficient fixnum arithmetic.
One of the main disadvantages of generic arithmetic is that not all kinds of number fit in a machine register. Flonums have to be boxed because a 64-bit IEEE floating-point number (the representation that MIT Scheme uses) does not fit in a regular machine word. This is true even on 64-bit architectures because some extra bits are needed to distinguish floating-point numbers from other objects like pairs and strings. Values are boxed by storing them in a small record in the heap. Every floating-point value that you see at the REPL is boxed. Floating-point values are unboxed only for short periods of time when they are in the machine's floating-point unit and actual floating-point operations are being performed.
Numerical calculations that happen to be using floating-point numbers cause many temporary floating-point numbers to be allocated. It is not uncommon for numerical programs to spend over half of their time creating and garbage collecting the boxed flonums.
Consider the following procedure for computing the distance of a point (x,y) from the origin.

(define (distance x y)
  (sqrt (+ (* x x) (* y y))))

The call (distance 0.3 0.4) returns a new, boxed flonum, 0.5. The calculation also generates three intermediate boxed flonums. This next version works only for flonum inputs, generates only one boxed flonum (the result) and runs eight times faster:

(define (flo:distance x y)
  (flo:sqrt (flo:+ (flo:* x x) (flo:* y y))))
Note that flo: operations are usually effective only within a single arithmetic expression. If the expression contains conditionals or calls to procedures then the values tend to get boxed anyway.
Flonum vectors are vectors that contain only floating-point values, in much the same way as a string is a `vector' containing only character values.
Flonum vectors have the advantages of compact storage (about half that of a conventional vector of flonums) and judicious use of flonum vectors can decrease flonum consing.
The disadvantages are that flonum vectors are incompatible with ordinary vectors, and if not used carefully, can increase flonum consing. Flonum vectors are a pain to use because they require you to make a decision about the representation and stick with it, and it might not be easy to ascertain whether the advantages in one part of the program outweigh the disadvantages in another.
The flonum vector operations are flo:vector-cons, flo:vector-length, flo:vector-ref and flo:vector-set!.
The following operation causes no flonum consing because the flonum is loaded directly from the flonum vector into a floating-point machine register, added, and stored again. There is no need for a temporary boxed flonum.
(flo:vector-set! v 0 (flo:+ (flo:vector-ref v 0) 1.2))
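As a further sketch, here is one way to create and initialize a flonum vector, assuming the operations listed above (flo:vector-cons returns an uninitialized vector):

;; Make a flonum vector of length n with every element set to 0.
(define (make-flo-vector n)
  (let ((v (flo:vector-cons n)))
    (do ((i 0 (fix:+ i 1)))
        ((fix:= i n) v)
      (flo:vector-set! v i 0.))))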
In this next example, every time g is called, a new boxed flonum has to be created so that a valid Scheme object can be returned. If g is called more often than the elements of v are changed then an ordinary vector might be more efficient.

(define (g i) (flo:vector-ref v i))
Pitfall 1: Make sure that your literals are floating-point constants:

(define (f1 a) (flo:+ a 1))
(define (f2 a) (flo:+ a 1.))

f1 will most likely cause a hardware error, and certainly give the wrong answer. f2 is correct.
Pitfall 2: It is tempting to insert calls to exact->inexact to coerce values into flonums. This does not always work because complex numbers may be exact or inexact too. Also, the current implementation of exact->inexact is slow.
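A checked coercion along these lines avoids silently passing a non-flonum into flo: code (a sketch; flo:flonum? is the flonum type test):

;; Return x as a flonum, or signal an error if it cannot be one.
(define (guarantee-flonum x)
  (cond ((flo:flonum? x) x)
        ((and (real? x) (exact? x)) (exact->inexact x))
        (else (error "Not coercible to a flonum:" x))))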
Pitfall 3: A great deal of care has to be taken with the standard math procedures. For example, when called with a flonum, both sqrt and asin can return a complex number (e.g. with argument -1.5).