NetNews Usenet Archive 1992 #16

home *** CD-ROM | disk | FTP | other *** search

/ NetNews Usenet Archive 1992 #16 / NN_1992_16.iso / spool / comp / compiler / 1262 < prev next >

Wrap

Internet Message Format | 1992-07-25 | 8.5 KB

Path: sparky!uunet!olivea!decwrl!decwrl!world!iecc!compilers-sender From: tarvydas@tsctrl.guild.org (tarvydas) Newsgroups: comp.compilers Subject: Re: Intermediate code representations (detailed) Keywords: Eiffel, design, bibliography Message-ID: <92-07-083@comp.compilers> Date: 23 Jul 92 16:51:57 GMT Sender: compilers-sender@iecc.cambridge.ma.us Reply-To: tarvydas@tsctrl.guild.org (tarvydas) Organization: Compilers Central Lines: 246 Approved: compilers@iecc.cambridge.ma.us I've had great pleasure/success using the ideas of Ric Holt (University of Toronto) and Jim Cordy (Queen's University, Kingston, Ontario). I've used these same ideas in the past to build most of an experimental Eiffel compiler (a relatively pure OOP language). The nucleus for compiler data type representation can be found in "data descriptors". Your semantic analyzer should contain at least 3 "tables" (semantic domains) - the symbol table, the type table and the scope table. The job of the semantic analyzer is to connect elements of these tables, for example a struct: struct s { int a; char b; }; would be represented by: symbol scope type ====== ===== ==== s ----------------------------> struct -+ | +------------+++++ <-----------------+ / +----------+++++ / / a ----------------------------> int / b ----------------------------> char You can see that, as this method scales up to handle larger examples, specific types will be shared by many symbols. Complex structures, eg. structs of arrays of structs, just fall into place (eg. imagine that the "int" entry above happens to be another struct, with links to another scope and its symbols). Scope entries can be implemented as linked lists, or, as small hash tables, or, as blocks of integers/pointers which index/point-to entries in the symbol table. The set of scopes visible at any point during the compilation (eg. the global scope, the routine scope and, say, another nested scope) can be represented by a stack of pointers to little scope hash tables, or, by a framed stack of indices/pointers-to symbols (ie. information copied directly from the scope table). As your semantic analysis proceeds, it filters out variables and converts them to data descriptors, using the information in the tables. Later "passes" tweak the data descriptors, eg. the allocator decides where each piece of data should be allocated and modifies the corresponding data descriptors to contain this machine specific information. The coder "pass" then takes the filtered program and emits code. The coder can do a good job, without great complexity, because data descriptors give a uniform representation of any compile-time object (ie. variables, parameters, constants, registers, etc). A data descriptor is, roughly: k @ b.d.[x.s] where: k @ represents the number of levels of indirection (0 for constants and registers, 1 for "normal" variables in memory, 2 for variables pointed-to by pointer variables, etc). B represents the base (null for constants and absolute addresses, a register, another data descriptor or an abstract lexic base (eg. the Global Lexic Base, the Lexic Base for the local variables for a certain routine, etc). D represents the displacement - a numeric displacement from the base, plus, a set of labels (for representing labelled data, offsets from the ".data" beginning address (_edata), and branch labels within the program itself). [X.S] represents and index register plus a scale (the multiplier). Here are some simple data descriptor examples: 68k-like operand data descriptor ---------------- --------------- 0 #5 (constant) @ .null.5 (index is null, so I dropped it) 1 1234 (absolute addr) @ .null.1234 0 a4 @ .a4.0 1 -8(sp) @ .sp.-8 2 0(-8(sp)) @ .sp.-8 1 8(a4,d1:L) @ .a4.8.[d1.4] 1 xyz (pc-rel label) @ .pc.xyz 1 5-th local within @ .L92.5 routine #92 The translation of this notation into a C struct is straight-forward. In reality, you need to add a size field and it's sometimes convenient to add a machine-type field (b/w/l) to help the coder make some of its decisions more efficiently. You can develop a whole data descriptor algebra which then tells you how to emit good code. For example, imagine a local array of words (16 bits), starting at the -8th byte from the stack pointer. The array starts at index 1. We wish to access element 3. The data descriptor algebra shows that a manifest indexing operation can be done without emitting any code: 1 array base (first byte) = @ .sp.-8 0 index (constant 3) = @ .null.3 indexing operation = take the address of the array, scale the index to origin 0, multiply the index by its scaling factor (2 in this case), add it to the address and promote the result back to being a variable 1 0 0 0 = makevar( addrof(@ .sp.-8) + (@ .null.3 - @ .null.1) * @ null.2) ) -1 0 = makevar( @ * @ .sp.-8 + ...) 0 = makevar( @ .sp.-8 + ...) 0 0 = makevar( ... + (@ .null.2) * @ null.2) 0 = makevar( ... + (@ .null.4)) 0 0 = makevar( @ .sp.-8 + @ .null.4 ) 0 = makevar( @ .sp.(-8+4) ) 0 = makevar( @ .sp.-4 ) 1 0 = @ * @ .sp.-4 1 = @ .sp.-4 = -4(sp). We've just watched an automatable compile-time calculation of an indexing operation which didn't emit any code. The resulting operand, -4(sp), goes on to be used in further operations and will be emitted as necessary - the whole array indexing operation was subsumed, at compile time, into a single operand. It should be obvious that variable (non-manifest) indices can't be folded exactly this way, but, for example, they could be folded into the index field of the data descriptor (the code emitted might be: "move.l var,d2" and the 1 resulting data descriptor might be @ .sp.-8.[d2.2], which still emits the least amount of code possible for this situation (I'm just trying to make a point - if you think that a better code sequence is possible, you're welcome to twist the decision trees :-)). What I've just shown are, literally, some of the steps that the OCG technology takes when optimizing code - address mode arithmetic becomes a no-brainer and can be relegated to just a few OCG decision trees when data descriptors are used. Your intermediate code problem can be solved by looking at the operations which the Orthogonal Code Generator can accept. The operations use data descriptors as operands - you can interpret the result if you wish, and, you can use the OCG to compile the operations into locally good code. Rosselet's paper on PT Pascal, although old and applied to compiling a boring language, is an excellent learning tool in that it has pictures of these data structures and has the S/SL source code for the whole compiler (you have to get S/SL via anonymous ftp and you have to write the simple S/SL mechanisms yourself in C or something). PT's coder predates OCG, but still demonstrates the evolutionary path which spawned the OCG. Paul Tarvydas TS Controls tarvydas@tsctrl.guild.org R. C. Holt Data Descriptors: A Compile-Time Model of Data and Addressing ACM Transactions on Programming Languages and Systems. Volume 9. pages 367-389. July 1987. Code Generation Using an Orthogonal Model Cordy, Holt Software Practice and Experience Volume 20 (3) March 1990 pages 301-320 An Orthogonal Model for Code Generation Technical Report CSRI-177. Computer Systems Research Institute. University of Toronto. January 1986. PT: A Pascal Subset Alan Rosselet Tech Report CSRG-119 University of Toronto September 1980 -- Send compilers articles to compilers@iecc.cambridge.ma.us or {ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.