Notes on potential compiler improvements * The analysis which generates `block-stack-link' could be improved. Currently, it fails in the case where the procedure is always invoked as a subproblem, but there are multiple possible continuations. The reason is subtle: we need to know that the `continuation/offset' of all of the continuations is the same (as well as the frame-size and the closing-block). Unfortunately, the computation of the offset depends on the subproblem ordering, which depends on the stack-link (to decide whether or not to use static links). Catch-22. Probably this can be solved. * Pathological case of "takr.scm" can perhaps be solved by integrating simapp and outer into one pass. By handling "passed-in" nodes before other nodes, and not making links to such nodes, the explosion of useless dataflow information would be avoided. However, this affects the static-link analysis, which looks at BOTH the "passed-in" bit as well as the set of values. Think of some way to make this degrade properly. * Make the static-link analysis more sophisticated so that it uses static links whenever the current strategy would require going through at least two links. This sometimes happens when the parent must be located through the closing block of the continuation. In this case it is probably better to add a redundant static link for speed in lookup. * When tail-recursing into an internal procedure, if the procedure has no free variables, we can erase the calling frame. In the simplest case, this means that such a procedure is actually an external procedure. However, we could get more sophisticated and notice that it was OK to delete some of the ancestor stack frames but not others. * The code generated by the rewrite rule for disjunctions demonstrates that the decision about whether or not to use registers for LET parameters does not depend on the entire body of the LET. In this case, the predicate parameter can ALWAYS be register allocated, independent of the complexity of the alternative, because it is unused once the decision has been made in favor of the alternative. This can be generalized to handle more complex cases. * Change CFG implementation so that `hook' objects are just partially connected edges. I think that noop nodes can then be eliminated in favor of disconnected edges. This will also solve a potential problem where deletion of the noop nodes at the entry points of continuations leaves random hooks scattered around in various subproblems. * Many closures are never invoked through their external entry points. For such closures, the external entry point and associated code need never be generated. Also, the closure object need not contain a code pointer. This is one step closer to just using the closure frame pointer in place of the closure. * Perform dead-code-elimination at the same time as constant folding. Be thorough, deleting all nodes associated with all code that is eliminated. This is tricky but pays off handsomely later on. Also, doing it after the dataflow but before the rest of the analysis greatly reduces the amount of details that have to be kept track of during deletion. ALSO: note that removal of code to hack known predicates in "rgretn" may make something like this necessary for simple cases. Subsequent note: performing dead code elimination prior to subproblem ordering has a problem in that there are cfg fragments in the subproblems with invisible pointers into the node structure. We can't delete nodes unless we know about these pointers, so we must do dead code elimination after subproblem ordering. * Now that RTL generator does not generate temporaries for quantities that are immediately pushed, tested, etc., we will need to modify the CSE to generate temporaries for the cases where those quantities are found in multiple places. Hopefully this won't break the world. * The interning of SCode variable objects (for explicit lookup) is done on a per-block basis. It should be changed so that stack blocks are skipped and the interning is done on the nearest IC block. * Fixnum operations ** Is signed bit-field extraction faster than current strategy if the operand is in memory? ** In the case of addition of two boxed fixnums to a boxed result, no unboxing is needed on the operands provided the result is boxed in the usual way. Items that have been processed * Introduction of inline-coded continuations (i.e. continuations of type UNIQUE or SIMPLE) has invalidated the current method of maintaining the frame pointer offset. The reason is that the body of such a continuation thinks that the frame pointer knows where its frame is, while the offset in fact refers to some ancestor of that frame. I think that ignoring the frame of such a continuation in `find-block' will produce the desired effect. * JOIN type blocks aren't needed for offset, but they ARE needed to prevent continuations from being classified as UNIFORM when they aren't. * To do `block-parent' operation on a "real" block, must skip any intervening JOIN blocks to find the next "real" block. * `generator/subproblem' has code to mark frame-size of a join block if the continuation is closed in one. That needs to be moved elsewhere? * Theory: JOIN blocks are always invisible _except_ when needed to compute a frame pointer offset. This means: ** `find-block' and friends in "emodel" need to know about them. Also the associated `stack-block-parent-locative' and similar constructions. ** `procedure-closure-block' now refers to the previous `block-parent'. The closing code must refer to `block-%parent' to get the lower-level closing block. ** `block->join/distance' in "rgretn" needs to learn about them. * (implemented 8/88 -- cph) The code in "rgretn" should be modified as follows. At a return point, if the continuation is known, then we can just jump to the continuation, as long as we set things up correctly based on the operator class of the continuation. This might mean, for example, that we throw away the return address on the stack because we know that it has a certain value. In practice, this can occur when we supply a continuation to a combination that goes to one of two procedures. The procedure in which the return appears is ONLY invoked with this continuation, while the other procedure is sometimes invoked with another continuation. Thus we must push the return address, because we don't know which procedure we're invoking, but at return time it isn't needed. * Some procedures that are being considered closures can easily be open external. Each of the free variables must satisfy one of the following criteria: (1) it has a known value, or (2) it is bound in the IC block being used for cached references. This optimization will make an enormous performance improvement on programs that consist of many procedures closed in a compiled block, with a few external closure entry points, because it will allow most of the internal procedures to be open. Currently they will all become closures.