Notes on potential compiler improvements


* The analysis which generates `block-stack-link' could be improved.
Currently, it fails in the case where the procedure is always invoked
as a subproblem, but there are multiple possible continuations.  The
reason is subtle: we need to know that the `continuation/offset' of
all of the continuations is the same (as well as the frame-size and
the closing-block).  Unfortunately, the computation of the offset
depends on the subproblem ordering, which depends on the stack-link
(to decide whether or not to use static links).  Catch-22.  Probably
this can be solved.

* Pathological case of "takr.scm" can perhaps be solved by integrating
simapp and outer into one pass.  By handling "passed-in" nodes before
other nodes, and not making links to such nodes, the explosion of
useless dataflow information would be avoided.  However, this affects
the static-link analysis, which looks at BOTH the "passed-in" bit as
well as the set of values.  Think of some way to make this degrade
properly.

* Make the static-link analysis more sophisticated so that it uses
static links whenever the current strategy would require going through
at least two links.  This sometimes happens when the parent must be
located through the closing block of the continuation.  In this case
it is probably better to add a redundant static link for speed in
lookup.

* When tail-recursing into an internal procedure, if the procedure has
no free variables, we can erase the calling frame.  In the simplest
case, this means that such a procedure is actually an external
procedure.  However, we could get more sophisticated and notice that
it was OK to delete some of the ancestor stack frames but not others.

* The code generated by the rewrite rule for disjunctions demonstrates
that the decision about whether or not to use registers for LET
parameters does not depend on the entire body of the LET.  In this
case, the predicate parameter can ALWAYS be register allocated,
independent of the complexity of the alternative, because it is unused
once the decision has been made in favor of the alternative.  This can
be generalized to handle more complex cases.

* Change CFG implementation so that `hook' objects are just partially
connected edges.  I think that noop nodes can then be eliminated in
favor of disconnected edges.  This will also solve a potential problem
where deletion of the noop nodes at the entry points of continuations
leaves random hooks scattered around in various subproblems.

* Many closures are never invoked through their external entry points.
For such closures, the external entry point and associated code need
never be generated.  Also, the closure object need not contain a code
pointer.  This is one step closer to just using the closure frame
pointer in place of the closure.

* Perform dead-code-elimination at the same time as constant folding.
Be thorough, deleting all nodes associated with all code that is
eliminated.  This is tricky but pays off handsomely later on.  Also,
doing it after the dataflow but before the rest of the analysis
greatly reduces the amount of details that have to be kept track of
during deletion.

ALSO: note that removal of code to hack known predicates in "rgretn"
may make something like this necessary for simple cases.

Subsequent note: performing dead code elimination prior to subproblem
ordering has a problem in that there are cfg fragments in the
subproblems with invisible pointers into the node structure.  We can't
delete nodes unless we know about these pointers, so we must do dead
code elimination after subproblem ordering.

* Now that RTL generator does not generate temporaries for quantities
that are immediately pushed, tested, etc., we will need to modify the
CSE to generate temporaries for the cases where those quantities are
found in multiple places.  Hopefully this won't break the world.

* The interning of SCode variable objects (for explicit lookup) is
done on a per-block basis.  It should be changed so that stack blocks
are skipped and the interning is done on the nearest IC block.

* Fixnum operations

** Is signed bit-field extraction faster than current strategy if the
operand is in memory?

** In the case of addition of two boxed fixnums to a boxed result,  no
unboxing is needed on the operands provided the result is boxed in the
usual way.


		    Items that have been processed


* Introduction of inline-coded continuations (i.e. continuations of
type UNIQUE or SIMPLE) has invalidated the current method of
maintaining the frame pointer offset.  The reason is that the body of
such a continuation thinks that the frame pointer knows where its
frame is, while the offset in fact refers to some ancestor of that
frame.  I think that ignoring the frame of such a continuation in
`find-block' will produce the desired effect.

* JOIN type blocks aren't needed for offset, but they ARE needed to
prevent continuations from being classified as UNIFORM when they
aren't.

* To do `block-parent' operation on a "real" block, must skip any
intervening JOIN blocks to find the next "real" block.

* `generator/subproblem' has code to mark frame-size of a join block
if the continuation is closed in one.  That needs to be moved
elsewhere?

* Theory: JOIN blocks are always invisible _except_ when needed to
compute a frame pointer offset.  This means:

** `find-block' and friends in "emodel" need to know about them.  Also
the associated `stack-block-parent-locative' and similar
constructions.

** `procedure-closure-block' now refers to the previous
`block-parent'.  The closing code must refer to `block-%parent' to get
the lower-level closing block.

** `block->join/distance' in "rgretn" needs to learn about them.

* (implemented 8/88 -- cph) The code in "rgretn" should be modified as
follows.  At a return point, if the continuation is known, then we can
just jump to the continuation, as long as we set things up correctly
based on the operator class of the continuation.  This might mean, for
example, that we throw away the return address on the stack because we
know that it has a certain value.  In practice, this can occur when we
supply a continuation to a combination that goes to one of two
procedures.  The procedure in which the return appears is ONLY invoked
with this continuation, while the other procedure is sometimes invoked
with another continuation.  Thus we must push the return address,
because we don't know which procedure we're invoking, but at return
time it isn't needed.

* Some procedures that are being considered closures can easily be
open external.  Each of the free variables must satisfy one of the
following criteria: (1) it has a known value, or (2) it is bound in
the IC block being used for cached references.  This optimization will
make an enormous performance improvement on programs that consist of
many procedures closed in a compiled block, with a few external
closure entry points, because it will allow most of the internal
procedures to be open.  Currently they will all become closures.