The C Users' Group Library 1994 August

home *** CD-ROM | disk | FTP | other *** search

/ The C Users' Group Library 1994 August / wc-cdrom-cusersgrouplibrary-1994-08.iso / vol_300 / 357_01 / cstar1.exe / DESIGN.DOC < prev next >

Wrap

Text File | 1991-06-20 | 17KB | 426 lines

DESIGN.DOC: Design issues pertaining to the CSTAR translator June 25, 1991 The following notes were written in 1987, I believe. The subject of these notes are various issues concerning the design of the CSTAR translator, and in some cases the CSTAR language. Be warned: these notes are extremely technical in nature and assume a deep level of knowledge of the CSTAR translator. They are likely to be of interest only to *serious* students of CSTAR. Parts of this file have the flavor of cryptic "notes to myself." If you have a question, please don't be intimidated by the obscure nature of these notes. Just give me a call and I'll be happy to tell you what I know. By the way, much of this file was written by a brilliant programmer named Paul Schick, who worked on CSTAR for about a year with me. Parts of this file are in dialog form, with my comments being marked by EKR: Another way of looking at this file is as pages from an engineering notebook. Some of the issues discusses here never really got satisfactorily resolved. The section called goals, principles and problems was written by me, and the rest of the file is really a discussion of the problems in meeting those goals and principles in any kind of clean way. Some terminology: LHS stands for the Left Hand Side of an assignment statement and RHS stands for the Right Hand Side. A "star expression" is an expression containing the * operator. A "loc node" is the part of the parse tree containing all information for a "location," which usually corresponds to some identifier. Understanding how loc nodes are created and used is a key to understanding how the compiler works. GOALS, PRINCIPLES AND PROBLEMS The primary goal is to make available all of the efficiency of assembly language in a reasonable manner. 1. Register allocation must be done in an obvious and efficient way. 2. As much as possible, consistent with other principles, we want to allow the programmer to experiment with register allocation without rewriting the form of expressions. 3. The details of the 68000 architecture should be separate from the actual code. I see no necessity of embedding all the details of which addressing modes are allowed in which instruction all through the compiler code. This suggests avoiding flaky rules on expression composition. 4. The code generator ought to function independently of the the precise details of what kinds of expressions are allowed, but that may be wishful thinking. The code generation process can be broken down in concept into three phases: a) Assign result location to each operator. Once this is done, all other decisions are completely determined. For binary operators, one operand will, in general, be a result location from another operator. b) Determine whether a machine instruction exists to apply the indicated operator to the two operands and produce the required result. Again, the previous phase will have tried to ensure that the result location is the same as one of the arguments. c) A peephole optimizer pass operates to handle flow-of-control problems. At present, I do not see a peephole pass being applied to arithmetic expressions, though I have not really looked into this possibility. Consider the following expression. a = b + (c = d + (e = 2 + f))); The CSTAR translator treats this as exactly equivalent to e = 2 + f; c = d + e; a = b + c; with the restriction that c may not be a and e may not be c. But C compilers are free to rearrange the order of evaluation of operands, so additional restrictions are needed to ensure that the corresponding C expression always is evaluated correctly. In the above example, d should not be e and b should not be c. Isn't the assignment a sequence point? ON PREDICTABILITY We are entitled to predict what we have specified, and no more (or not very much more.) It seems to me, right now, that a statement like: m1 = (m2*m3) + (m4*m5); really is, even in CSTAR a specification which says: 1. put the LHS into the RHS and don't tell me how you did it. 2. don't touch anything except the authorized temporaries. I'm really not entitled to know more, because I didn't say more. Now, in practice, I ought to help myself to a little bit more, and say that as long as every non-assignment operator corresponds to an assignment operator, one for one, e. g.: m1 = (d2 = m2 * m3) + (d3 = m4 + m5); then there really aren't any choices left (except for whatever moves are necessary to reconcile operands and results), so now--and only now--I haven't asked for any surprises, and so I'm entitled to not get them. EKR: doesn't this still leave open the question of whether to use the LHS as a temporary or not? What's new and useful is that I have that choice. I didn't have that choice in full C, because I can only specify these things loosely in full C, and the compiler is entitled to ignore me. I didn't have that choice in macro assembler, but for a different reason: I was obliged to specify everything all the time, whether I needed to or not, which is why programming was so impossibly tedious. So I don't want to give up the right to specify register designators, and to know how function calls are done, and go back to C, because I want the option to handle those things in a specific, assembly-like way. Nor do I to eschew all compiler decisions all the time and go back to macro assembler, because I want relief from the burden of specifying everything all the time, needed or not. In a way, I want in-line assembly code, but in-line assembly code that works and is workable. Normally it is not workable, because the register designators can't be treated as variables, so you have to go through a big song and dance with lots of overhead in order to feed something to the in- line code, and then another big song and dance to get it out again; nobody could ever optimize anything that way. It might be helpful to have the compiler issue a message if a "ternary" assignment, etc., compiles to a surprise. The programmer would be informed about an incorrect assumption (possibly resulting from an overlooked instruction-set quirk), and we'd have to decide what a "surprise" is, or whether there's any such thing. HOW CAN WE EXPERIMENT WITH REGISTER/MEMORY ALLOCATION? We can't, unless the rules for composing expressions are blind to register vs. memory allocations. They need not be blind with respect to pointers vs. nonpointers, since that is a question of type. For assignments, there is always at least one data temporary register and one address temporary register, as specified in the command line or in the function. We will refer to the first pair as as and ds for convenience. The programmer should act as though all of these scratch registers are clobbered by any assignment. Any assumption to the contrary may not be portable, which may be a good reason to allow for specifying a temporary-count individually for each function. (E.g. it might be that it is infeasible to put a result into local memory without loading a component of the address into the address temporary.) On the 68000, and other processors that distinguish address and data registers, the type of a hardware address register shall always be pointer. On the 8086/80186/80286, this is guaranteed to get weird. The type of the result of an operator can be calculated in a machine independent way as the tree for the expression is built bottom up by the reduce() routine. The rules for doing this depend only on the rules stated in K & R. Functions which produce a non-pointer result leave that result in ds. Functions which produce a pointer result leave that result in as. Temporaries will be used in a canonical way in coding any expression which invokes them. The compiler will do no rearrangements, and will evaluate operands by order of association unless there are contradicting parentheses. In order to enhance speed and simplify code generation, a data register will be used in lieu of the LHS to accumulate the result, if the LHS is in memory and the RHS is nontrivial. Complex expressions which are speed- or space-critical should (as a matter of style) either be decomposed, or have intermediate assignments explicitly specified on the RHS. In CSTAR, stack temporaries are not used, so there is no indefinitely extensible set of temporaries available. It is therefore advisable to keep expressions as simple as possible consistent with readability, in order to know that there are enough temporaries. In the simple, ternary assignment statement, the LHS will not be used as an accumulator [if there is any alternative], [or maybe never at all, but what about exclusive-or?] because of the following: m = d1 + d2; option1: treat as: m = d1; m += d2; one move, one add, two addresses, 32 clocks option 2: treat as: m = dt = d1 + d2; dt = d1; dt += d2; m = dt; two moves, one add, one address, 20 clocks Most assignments will have implied moves under the covers since the processors generally require results to be placed in at least one, and possibly in both, operands, while the operator form actually written in source code implies no such thing. If speed or compactness is highly critical, the programmer must use forms that are more specific, and correspond more closely to the hardware. However, such forms tend to be less readable, so there is no requirement to use them in non-critical code. EKR: One solution [?] Have assignment operators (+=, *=, etc.) always produce exactly the indicated code. STAR EXPRESSIONS Star expressions arise out of the P> and [ ] operators, but with luck that should not affect this discussion. Star expressions have 3 parts, 1) an address register, 2) a data register and 3) a constant. At least one of these parts must be present. The parser will probably rearrange the parse tree so that the unary * operator has these 3 subtrees under it. Before the parser rearranges the tree, the operand of a unary * operator is either: 1) another unary * operator or 2) a tree containing a star expression whose operands are joined by + operators (or a P operator if the P operator applies to the constant.) The a0 register is used to do multiple indirections as required. For example, the star expression ***a5 generates a0 = *a5; a0 = *a5; and *a0 is then used as the equivalent operand. Scaling is done in d0. Scaling is done if the data register is present and the scale factor is not 1. For example, the star expression *(a5 + d5) which is equivalent to a5 [d5] is done as d0 = d5; /* assembly language (no scaling) */ d0 *= scale_factor; a0 = a5 + d0 and then *a0 is used as the equivalent operand. Note that scaling by constants between 2 and 10 (?) is done by masking and shifting instead of multiplying. Similarly, the star expression * (d5 + base_constant) which is equivalent to base_constant [d5] is done as d0 = d5 d0 *= scale_factor; a0 = base_constant + d0; and then *a0 is used as the equivalent operand. When scaling is not required at run time, i.e., when the data register is not present or when the scale factor is one, star expressions are equivalent to 68000 addressing modes. For example, char * a1; *(a1 + d1 + 5); the star expression is the same as the equivalent operand and generates the #5(a1,d1) addressing mode. One could, in theory, allow more complex star expressions by substituting an inner expression for the data register. For example: * (a1 + (d1 = *a2) + 5); This would be unwrapped as: d1 = *a2; *(a1 + d1 + 5); At present, I see no particular problem with allowing this generality. Indeed, one could use d0 as an implicit d register and write *(a1 + *a2 + 5); which would be the same as d0 = *a2; *(a1 + d0 + 5); It turns out that finding the three separate parts of a star expression is not difficult. At the top level of a tree for the star expression, look for a subtree with pointer type. That subtree contains the address register. If the subtree contains a binary operator, one subtree of the binary operator will contain the pointer and the other will contain the data register and/or the constant. Again, if that subtree contains a binary operator, one subtree must contain the data register and the other subtree must contain the constant or constant expression. WHEN ARE EXPRESSIONS TOO COMPLEX? When we run out of temporaries given the expression and the current settings. Associated with each assignment is a list of forbidden registers which may not appear in the RHS of the assignment. This list is created by the unwrapping program. Actually, it's probably more complicated than this. There must be some canonical tools. The result of a subtree need not be either the LHS or the result temporary if the subtree consists of a primitive. In that case, that subtree does not limit in any way the form of the complexity of the other subtree. Eh? REWRITING RULES An expression of the form a = arg1 op1 arg2 op2 ... opn argn+1 is, by definition, equivalent to a = arg1 a op1 = arg2 a opn = argn+1 I don't know anything about this from here down. As the result of this equivalence, the LHS becomes a hidden common subexpression. The LHS temporary may be used to store this hidden common subexpression. For example, if the LHS temporary is available, the expression ***a5 = a + b; generates a1 = **a5; *a1 = a; *a1 += b; If the LHS temporary is not used, the following code would have to be generated (In general, *a0 might be the effective operand for a or b). a1 = *a5; a1 = *a5; *a1 = a; a1 = *a5; a1 = *a5; *a1 += b; In actuality, the expression ***a5 = a + b would not be allowed if the LHS temporary is not used. Another way around this problem would be to write the expression this way ***a5 = d0 = a + b; In general, if the effective LHS of an expression is *an, the only non-primitive operators allowed on the RHS are + P and ^. Rewriting rules move a lot of "code generation" into the parser. MANAGEMENT OF LOC_NODES DESIGNATING SIMPLE REGISTERS The management of loc_nodes is not as simple as we once thought it was. At this time, in fact, they are not managed at all, but created at will in great profusion. The essence of the problem is that because of the way the parser makes loc_nodes, it is permissible to alter them at will-- until they get attached to the code tree. Once they get attached to the code tree, they can no longer be altered because to do so would cause retroactive changes earlier in the code. This fact underlaid our bafflement at the initial behavior of the code generator. Two techniques are proposed to minimize proliferation, and are already provided for (and a compile-time switch should switch them out, to help us track down the inevitable bugs!) The simpler one is to mark nodes when they get attached to the code list, so that conditional routines like locn_xdupl need not copy them unless they are marked. In most cases, this means that extra copies won't be made except when inner-level assignments are used as values, as in a = b = c; The more complex one involves the locn_chmod routine, which is meant to manage a set of simple register and register-indirect nodes. These permanent nodes will be marked, so that they never get altered, and routines like locn_chmod will operate by substituting one for another. This will limit most copying of temporaries to cases where nodes are actually combined, rather than just referred to. Provision is also made for locn_xconst to do something similar if it turns out that we are generating lots of internal ones. Note that we may want to improve the folding routines in the parser eventually to eliminate the creation of superfluous zeros, in particular. These nodes do need to carry their types, at least transitorily, but perhaps there is some practical way to work this out. VARIABLES AND VARIABLE NAMES Variables are set up in the parser to be suitable for their intended storage class. This fact implies some coupling of the parser to the code generator; the parser is not totally machine-independent. There is as yet no mechanism for properly determining the range of a symbolic name; that is, whether it can be combined into an address mode or not, although portions are in place. In particular, globals have to be always regarded as out-of-range, since the linker may place them anywhere at its own option.