- Path: sparky!uunet!olivea!decwrl!decwrl!world!iecc!compilers-sender
- From: tarvydas@tsctrl.guild.org (tarvydas)
- Newsgroups: comp.compilers
- Subject: Re: Intermediate code representations (detailed)
- Keywords: Eiffel, design, bibliography
- Message-ID: <92-07-083@comp.compilers>
- Date: 23 Jul 92 16:51:57 GMT
- Sender: compilers-sender@iecc.cambridge.ma.us
- Reply-To: tarvydas@tsctrl.guild.org (tarvydas)
- Organization: Compilers Central
- Lines: 246
- Approved: compilers@iecc.cambridge.ma.us
-
- I've had great pleasure/success using the ideas of Ric Holt (University of
- Toronto) and Jim Cordy (Queen's University, Kingston, Ontario). I've used
- these same ideas in the past to build most of an experimental Eiffel
- compiler (a relatively pure OOP language).
-
- The nucleus for compiler data type representation can be found in "data
- descriptors". Your semantic analyzer should contain at least 3 "tables"
- (semantic domains) - the symbol table, the type table and the scope table.
- The job of the semantic analyzer is to connect elements of these tables,
- for example a struct:
-
- struct s {
-     int a;
-     char b;
- };
-
- would be represented by:
-
- symbol       scope          type
- ======       =====          ====
- s ------------------------> struct -+
-                                     |
-   +----------+++++ <---------------+
-  /  +--------+++++
- /  /
- a ------------------------> int
-  /
- b ------------------------> char
-
-
- You can see that, as this method scales up to handle larger examples,
- specific types will be shared by many symbols. Complex structures, eg.
- structs of arrays of structs, just fall into place (eg. imagine that the
- "int" entry above happens to be another struct, with links to another
- scope and its symbols).
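-
- Purely as an illustration (the type and field names below are my own, not
- taken from any of the papers), the three tables for the struct example
- above might be linked roughly like this in C:
-
- /* Sketch of the three semantic domains and their links.  All names    */
- /* here are invented for illustration.                                  */
-
- typedef struct Type   Type;
- typedef struct Symbol Symbol;
- typedef struct Scope  Scope;
-
- struct Symbol {                 /* symbol table entry                  */
-     const char *name;
-     Type       *type;           /* symbol ---------------------> type  */
- };
-
- struct Scope {                  /* scope table entry: a block of slots */
-     int      count;
-     Symbol **members;           /* scope slots ----> symbols           */
- };
-
- struct Type {                   /* type table entry                    */
-     enum { T_INT, T_CHAR, T_STRUCT } kind;
-     Scope *fields;              /* struct type ----> scope, else null  */
- };
-
- /* struct s { int a; char b; };  built by hand:                        */
- static Type    t_int       = { T_INT,    0 };
- static Type    t_char      = { T_CHAR,   0 };
- static Symbol  sym_a       = { "a", &t_int  };
- static Symbol  sym_b       = { "b", &t_char };
- static Symbol *s_members[] = { &sym_a, &sym_b };
- static Scope   s_fields    = { 2, s_members };
- static Type    t_s         = { T_STRUCT, &s_fields };
- static Symbol  sym_s       = { "s", &t_s };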
-
- Scope entries can be implemented as linked lists, as small hash tables,
- or as blocks of integers/pointers which index/point to entries in the
- symbol table. The set of scopes visible at any point during the
- compilation (eg. the global scope, the routine scope and, say, another
- nested scope) can be represented by a stack of pointers to little scope
- hash tables, or by a framed stack of indices/pointers to symbols (ie.
- information copied directly from the scope table).
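-
- The framed-stack flavour, for instance, might look something like this
- (the names are mine, and a real compiler would add hashing and overflow
- checks):
-
- /* Sketch of a framed scope stack over a symbol table.  Opening a     */
- /* scope pushes a frame mark; closing it pops back to the mark.       */
- /* Lookup scans from the top down, so inner declarations hide outer   */
- /* ones.  All names are invented for illustration.                    */
-
- #include <string.h>
-
- enum { MAX_VISIBLE = 1024, MAX_DEPTH = 64 };
-
- struct Visible { const char *name; int symbol; };  /* symbol-table index */
-
- static struct Visible visible[MAX_VISIBLE]; /* currently visible names  */
- static int top;                             /* next free slot           */
- static int frame[MAX_DEPTH];                /* first slot of each scope */
- static int depth;
-
- static void open_scope(void)  { frame[depth++] = top; }
- static void close_scope(void) { top = frame[--depth]; }
-
- static void declare(const char *name, int symbol)
- {
-     visible[top].name   = name;
-     visible[top].symbol = symbol;
-     top++;
- }
-
- static int lookup(const char *name)          /* innermost wins */
- {
-     for (int i = top - 1; i >= 0; i--)
-         if (strcmp(visible[i].name, name) == 0)
-             return visible[i].symbol;
-     return -1;                               /* not found */
- }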
-
- As your semantic analysis proceeds, it filters out variables and converts
- them to data descriptors, using the information in the tables. Later
- "passes" tweak the data descriptors, eg. the allocator decides where each
- piece of data should be allocated and modifies the corresponding data
- descriptors to contain this machine specific information. The coder
- "pass" then takes the filtered program and emits code. The coder can do a
- good job, without great complexity, because data descriptors give a
- uniform representation of any compile-time object (ie. variables,
- parameters, constants, registers, etc).
-
- A data descriptor is, roughly:
-
- k
- @ b.d.[x.s]
-
- where:
-
- k
- @ represents the number of levels of indirection (0 for constants and
- registers, 1 for "normal" variables in memory, 2 for variables pointed-to
- by pointer variables, etc).
-
- b represents the base (null for constants and absolute addresses, a
- register, another data descriptor, or an abstract lexic base, eg. the
- Global Lexic Base or the Lexic Base for the local variables of a certain
- routine).
-
- d represents the displacement - a numeric displacement from the base,
- plus a set of labels (for representing labelled data, offsets from the
- ".data" beginning address (_edata), and branch labels within the program
- itself).
-
- [x.s] represents an index register x plus a scale s (the multiplier).
-
-
- Here are some simple data descriptor examples:
-
- 68k-like operand          data descriptor
- ----------------          ---------------
-
-                           0
- #5 (constant)             @ .null.5     (index is null, so I dropped it)
-
-                           1
- 1234 (absolute addr)      @ .null.1234
-
-                           0
- a4                        @ .a4.0
-
-                           1
- -8(sp)                    @ .sp.-8
-
-                           2
- 0(-8(sp))                 @ .sp.-8
-
-                           1
- 8(a4,d1:L)                @ .a4.8.[d1.4]
-
-                           1
- xyz (pc-rel label)        @ .pc.xyz
-
-                           1
- 5-th local within         @ .L92.5
- routine #92
-
-
- The translation of this notation into a C struct is straightforward. In
- reality, you need to add a size field, and it's sometimes convenient to
- add a machine-type field (byte/word/long) to help the coder make some of
- its decisions more efficiently.
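-
- Such a struct might look roughly like the sketch below (the field and
- enum names are my own invention; Holt's paper cited at the end gives the
- definitive model):
-
- /* Sketch of a data descriptor record: @k b.d.[x.s] plus the size and */
- /* machine-type fields mentioned above.  A fuller version would also  */
- /* allow another descriptor as the base.  All names are invented.     */
-
- enum BaseKind { B_NULL, B_REGISTER, B_LEXIC };  /* what 'base' means   */
- enum MachType { M_BYTE, M_WORD, M_LONG };       /* b/w/l               */
-
- typedef struct DataDesc {
-     int           k;          /* levels of indirection (the @ count)   */
-     enum BaseKind base_kind;
-     int           base;       /* register number or lexic-base number  */
-     long          disp;       /* numeric displacement d                */
-     const char   *label;      /* label part of d, or null              */
-     int           index_reg;  /* index register x, or -1 if none       */
-     int           scale;      /* scale s (the multiplier)              */
-     int           size;       /* size of the object in bytes           */
-     enum MachType mtype;      /* machine type, to help the coder       */
- } DataDesc;
-
- /* Two rows from the table above, assuming sp is register number 15   */
- /* and a word size of 2:                                               */
- /*   #5      ->  { 0, B_NULL,      0,  5, 0, -1, 1, 2, M_WORD }       */
- /*   -8(sp)  ->  { 1, B_REGISTER, 15, -8, 0, -1, 1, 2, M_WORD }       */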
-
- You can develop a whole data descriptor algebra which then tells you how
- to emit good code. For example, imagine a local array of words (16 bits),
- starting at the -8th byte from the stack pointer. The array starts at
- index 1. We wish to access element 3. The data descriptor algebra shows
- that a manifest indexing operation can be done without emitting any code:
-
-                           1
- array base (first byte) = @ .sp.-8
-
-
-                      0
- index (constant 3) = @ .null.3
-
-
- indexing operation =
-
- take the address of the array, scale the index to
- origin 0, multiply the index by its scaling factor (2
- in this case), add it to the address and promote the
- result back to being a variable
-
-                   1            0           0            0
- = makevar( addrof(@ .sp.-8) + (@ .null.3 - @ .null.1) * @ .null.2 )
-
-           -1   1
- = makevar( @ * @ .sp.-8 + ...)
-
-            0
- = makevar( @ .sp.-8 + ...)
-
-                   0            0
- = makevar( ... + (@ .null.2) * @ .null.2)
-
-                   0
- = makevar( ... + (@ .null.4))
-
-            0          0
- = makevar( @ .sp.-8 + @ .null.4 )
-
-            0
- = makevar( @ .sp.(-8+4) )
-
-            0
- = makevar( @ .sp.-4 )
-
-   1   0
- = @ * @ .sp.-4
-
-   1
- = @ .sp.-4
-
- = -4(sp).
-
-
- We've just watched an automatable compile-time calculation of an indexing
- operation which didn't emit any code. The resulting operand, -4(sp), goes
- on to be used in further operations and will be emitted as necessary - the
- whole array indexing operation was subsumed, at compile time, into a
- single operand. It should be obvious that variable (non-manifest) indices
- can't be folded exactly this way, but, for example, they could be folded
- into the index field of the data descriptor. The code emitted might be
- "move.l var,d2", and the resulting data descriptor might be
-
- 1
- @ .sp.-8.[d2.2]
-
- which still emits the least amount of code possible for this situation
- (I'm just trying to make a point - if you think that a better code
- sequence is possible, you're welcome to twist the decision trees :-)).
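-
- If it helps, the manifest fold above can be written in a few lines of C
- over the DataDesc sketch from earlier (my own names again, and simplified
- to the constant-index case):
-
- /* Sketch: fold array[index] into a single data descriptor when the   */
- /* index is a manifest constant.  It mirrors the derivation above:    */
- /* addrof() drops one level of indirection, the scaled origin-0       */
- /* offset is added to the displacement, and makevar() puts the level  */
- /* back.  Builds on the DataDesc sketch; everything here is invented. */
-
- static DataDesc addrof(DataDesc d)  { d.k -= 1; return d; }
- static DataDesc makevar(DataDesc d) { d.k += 1; return d; }
-
- /* array: descriptor of the first element  (eg. @1 .sp.-8)            */
- /* index: descriptor of a constant index   (eg. @0 .null.3)           */
- /* lo:    lowest legal index (1 in the example); elem_size in bytes   */
- static DataDesc fold_manifest_index(DataDesc array, DataDesc index,
-                                     long lo, long elem_size)
- {
-     DataDesc addr = addrof(array);               /* @0 .sp.-8          */
-     addr.disp += (index.disp - lo) * elem_size;  /* + (3 - 1) * 2 = 4  */
-     return makevar(addr);                        /* @1 .sp.-4          */
- }
-
- A variable (non-manifest) index would instead land in the index_reg and
- scale fields, as in the move.l example above.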
-
- What I've just shown are, literally, some of the steps that the OCG
- (Orthogonal Code Generator) technology takes when optimizing code -
- address-mode arithmetic becomes a no-brainer and can be relegated to
- just a few OCG decision trees when data descriptors are used.
-
-
- Your intermediate code problem can be solved by looking at the operations
- which the OCG can accept. The operations use data descriptors as
- operands - you can interpret the result if you wish, and you can use the
- OCG to compile the operations into locally good code.
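-
- Purely to illustrate the shape of such an intermediate form (the opcode
- names below are invented for this sketch and are not the OCG's actual
- operation list), the operations could be kept as tuples whose operands
- are data descriptors:
-
- /* Sketch only: a generic tuple whose operands are data descriptors   */
- /* (the DataDesc sketch from earlier).  Invented opcode set.          */
-
- enum Op { OP_ASSIGN, OP_ADD, OP_MUL, OP_CALL, OP_BRANCH_IF_ZERO };
-
- typedef struct Tuple {
-     enum Op  op;
-     DataDesc dst;       /* where the result goes              */
-     DataDesc src1;      /* source operands                    */
-     DataDesc src2;
-     int      target;    /* branch target, as a tuple index    */
- } Tuple;
-
- /* An interpreter can walk an array of Tuple directly, or a coder pass */
- /* can hand each tuple to the code generator for locally good code.    */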
-
- Rosselet's paper on PT Pascal, although old and applied to compiling a
- boring language, is an excellent learning tool in that it has pictures of
- these data structures and has the S/SL source code for the whole compiler
- (you have to get S/SL via anonymous ftp and you have to write the simple
- S/SL mechanisms yourself in C or something). PT's coder predates OCG, but
- still demonstrates the evolutionary path which spawned the OCG.
-
- Paul Tarvydas
- TS Controls
- tarvydas@tsctrl.guild.org
-
-
-
- R. C. Holt,
- Data Descriptors: A Compile-Time Model of Data and Addressing,
- ACM Transactions on Programming Languages and Systems,
- Volume 9, pages 367-389, July 1987.
-
-
- J. R. Cordy and R. C. Holt,
- Code Generation Using an Orthogonal Model,
- Software Practice and Experience,
- Volume 20 (3), pages 301-320, March 1990.
-
-
- An Orthogonal Model for Code Generation,
- Technical Report CSRI-177,
- Computer Systems Research Institute, University of Toronto,
- January 1986.
-
-
- Alan Rosselet,
- PT: A Pascal Subset,
- Technical Report CSRG-119,
- University of Toronto, September 1980.
-
- --
- Send compilers articles to compilers@iecc.cambridge.ma.us or
- {ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.
-