home *** CD-ROM | disk | FTP | other *** search
- .SH
- The Intermediate Language
- .PP
- .FS
- \(dgUNIX is a Trademark of Bell Laboratories.
- .FE
- Communication between the two phases of the compiler proper
- is carried out by means of a pair of intermediate files.
- These files are treated as having identical structure,
- although the second file contains only the code generated for strings.
- It is convenient to write strings out separately to reduce the
- need for multiple location counters in a later assembly
- phase.
- .PP
- The intermediate language is not machine-independent;
- its structure in a number of ways reflects
- the fact that C was originally a one-pass compiler
- chopped in two to reduce the maximum memory
- requirement.
- In fact, only the latest version
- of the compiler has a complete
- intermediate language at all.
- Until recently, the first phase of the compiler generated
- assembly code for those constructions it could deal with,
- and passed expression parse trees, in absolute binary
- form,
- to the second phase for code generation.
- Now, at least, all inter-phase information
- is passed in a describable form, and there are
- no absolute pointers involved, so the coupling between
- the phases is not so strong.
- .PP
- The areas in which the machine
- (and system) dependencies are most noticeable are
- .IP 1.
- Storage allocation for automatic variables and arguments
- has already been performed,
- and nodes for such variables refer to them by offset
- from a display pointer.
- Type conversion
- (for example, from integer to pointer)
- has already occurred using the assumption of
- byte addressing and 2-byte words.
- .IP 2.
- Data representations suitable to the PDP-11 are assumed;
- in particular, floating point constants are passed as
- four words in the machine representation.
- .PP
- As it happens, each intermediate file is represented as a sequence
- of binary numbers without any explicit demarcations.
- It consists of a sequence of
- conceptual lines, each headed by an operator, and possibly containing
- various operands.
- The operators are small numbers;
- to assist in recognizing failure in synchronization,
- the high-order byte of each operator word is always the
- octal number 376.
- Operands are
- either 16-bit binary numbers or strings of characters representing names.
- Each name is terminated by a null character.
- There is no alignment requirement for numerical
- operands and so there is no padding
- after a name string.
- .PP
- The binary representation was chosen to avoid the necessity
- of converting to and from character form
- and to minimize the size of the files.
- It would be very easy to make
- each operator-operand `line' in the file be
- a genuine, printable line, with the numbers in octal or decimal;
- this in fact was the representation originally used.
- .PP
- The operators fall naturally into two classes:
- those which represent part of an expression, and all others.
- Expressions are transmitted in a reverse-Polish notation;
- as they are being read, a tree is built which is isomorphic
- to the tree constructed in the first phase.
- Expressions are passed as a whole, with no non-expression operators
- intervening.
- The reader maintains a stack; each leaf of the expression tree (name, constant)
- is pushed on the stack;
- each unary operator replaces the top of the stack by a node whose
- operand is the old top-of-stack;
- each binary operator replaces the top pair on the stack with
- a single entry.
- When the expression is complete there is exactly one item on the
- stack.
- Following each expression
- is a special operator which passes the unique previous expression
- to the `optimizer' described below and then to the code
- generator.
- .PP
- Here is the list of operators not themselves part of expressions.
- .LP
- .Op EOF
- marks the end of an input file.
- .Op BDATA "flag data ..."
- specifies a sequence of bytes to be assembled
- as static data.
- It is followed by pairs of words; the first member
- of the pair is non-zero to indicate that the data continue;
- a zero flag is not followed by data and terminates
- the operator.
- The data bytes occupy the low-order part of a word.
- .Op WDATA "flag data ..."
- specifies a sequence of words to be assembled as
- static data; it is identical to the BDATA operator
- except that entire words, not just bytes, are passed.
- .Op PROG
- means that subsequent information is to be compiled as program text.
- .Op DATA
- means that subsequent information is to be compiled as static data.
- .Op BSS
- means that subsequent information is to be compiled as unitialized
- static data.
- .Op SYMDEF name
- means that
- the symbol
- .I
- name
- .R
- is an external name defined in the current program.
- It is produced for each external data or function definition.
- .Op CSPACE "name size"
- indicates that the name refers to a data area whose size is the
- specified number of bytes.
- It is produced for external data definitions without explicit initialization.
- .Op SSPACE size
- indicates that
- .I
- size
- .R
- bytes should be set aside for data storage.
- It is used to pad out short initializations of external data
- and to reserve space for static (internal) data.
- It will be preceded by an appropriate label.
- .Op EVEN
- is produced after each
- external data definition whose size is not
- an integral number of words.
- It is not produced after strings except when they initialize
- a character array.
- .Op NLABEL name
- is produced just before a BDATA or WDATA initializing
- external data, and serves as a label for the data.
- .Op RLABEL name
- is produced just before each function definition,
- and labels its entry point.
- .Op SNAME "name number"
- is produced at the start of each function for each static variable
- or label
- declared therein.
- Subsequent uses of the variable will be in terms of the given number.
- The code generator uses this only to produce a debugging symbol table.
- .Op ANAME "name number"
- Likewise, each automatic variable's name and stack offset
- is specified by this operator.
- Arguments count as automatics.
- .Op RNAME "name number"
- Each register variable is similarly named, with its register number.
- .Op SAVE number
- produces a register-save sequence at the start of each function,
- just after its label (RLABEL).
- .Op SETREG number
- is used to indicate the number of registers used
- for register variables.
- It actually gives the register number of the lowest
- free register; it is redundant because the RNAME operators could be
- counted instead.
- .Op PROFIL
- is produced before the save sequence for functions
- when the profile option is turned on.
- It produces code to count the number
- of times the function is called.
- .Op SWIT "deflab line label value ..."
- is produced for switches.
- When control flows into it,
- the value being switched on is in the register
- forced by RFORCE (below).
- The switch statement occurred on the indicated line
- of the source, and the label number of the default location
- is
- .I
- deflab.
- .R
- Then the operator is followed by a sequence of label-number and value pairs;
- the list is terminated by a 0 label.
- .Op LABEL number
- generates an internal label.
- It is referred to elsewhere using the given number.
- .Op BRANCH number
- indicates an unconditional transfer to the internal label number
- given.
- .Op RETRN
- produces the return sequence for a function.
- It occurs only once, at the end of each function.
- .Op EXPR line
- causes the expression just preceding to be compiled.
- The argument is the line number in the source where the
- expression occurred.
- .Op NAME "class type name"
- .Op NAME "class type number"
- indicates a name occurring in an expression.
- The first form is used when the name is external;
- the second when the name is automatic, static, or a register.
- Then the number indicates the stack offset, the label number,
- or the register number as appropriate.
- Class and type encoding is described elsewhere.
- .Op CON "type value"
- transmits an integer constant.
- This and the next two operators occur as part of expressions.
- .Op FCON "type 4-word-value"
- transmits a floating constant as
- four words in PDP-11 notation.
- .Op SFCON "type value"
- transmits a floating-point constant
- whose value is correctly represented by its high-order word
- in PDP-11 notation.
- .Op NULL
- indicates a null argument list of a function call in an expression;
- call is a binary operator whose second operand is the argument list.
- .Op CBRANCH "label cond"
- produces a conditional branch.
- It is an expression operator, and will be followed
- by an EXPR.
- The branch to the label number takes place if the expression's
- truth value is the same as that of
- .I
- cond.
- .R
- That is, if
- .I
- cond=1
- .R
- and the expression evaluates to true, the branch is taken.
- .Op binary-operator type
- There are binary operators corresponding
- to each such source-language operator;
- the type of the result of each is passed as well.
- Some perhaps-unexpected ones are:
- COMMA, which is a right-associative operator designed
- to simplify right-to-left evaluation
- of function arguments;
- prefix and postfix ++ and \-\-, whose second operand
- is the increment amount, as a CON;
- QUEST and COLON, to express the conditional
- expression as `a?(b:c)';
- and a sequence of special operators for expressing
- relations between pointers, in case pointer comparison
- is different from integer comparison
- (e.g. unsigned).
- .Op unary-operator type
- There are also numerous unary operators.
- These include ITOF, FTOI, FTOL, LTOF, ITOL, LTOI
- which convert among floating, long, and integer;
- JUMP which branches indirectly through a label expression;
- INIT, which compiles the value of a constant expression
- used as an initializer;
- RFORCE, which is used before a return sequence or
- a switch to place a value in an agreed-upon register.
-