home *** CD-ROM | disk | FTP | other *** search
Text File | 1986-06-03 | 97.6 KB | 2,047 lines |
-
-
-
- Pascal in 25 Rules or Less
-
- William A. Barrett
- QCAD Systems, Inc., 1164 Hyde Ave, San Jose, CA 95129
-
- In this article, we're going to design and implement a
- small subset Pascal compiler, using the Turbo Pascal
- compiler and a couple of other software tools that you can
- purchase for a nominal charge. It is capable of translating
- a Pascal subset program into 8086 symbolic assembly
- language, which can then be assembled into a running program
- on an IBM PC.
- When you look at our tiny Pascal, it may strike you as
- being so ridiculously simple that it has no useful
- applications. However, as we'll explain, it provides all
- the important features of a high-level programming language,
- and can be extended indefinitely by writing more support
- functions.
- By reading this article, or better, by ordering the
- support software described at the end of this article, you
- will not only have your own extensible compiler going, but
- will have learned how language translators and compilers
- are organized and written. So... carry on, please!
-
-
- Tiny Pascal Features
-
- A full Pascal language, like Turbo, has about a hundred
- features, and a large support library. We're going to
- choose just a few features, and simplify the language to
- such an extent that it probably isn't fair to call it Pascal
- -- it just resembles Pascal in some ways. In particular, we
- want a description of our language in just twenty-five
- production rules or less, since the Qparser demo pack
- supports just that many. Twenty-six rules, and the demo
- system won't work! Does this sound like a contest you've
- entered?
- Anyway, we've managed to pack all this into those
- twenty-five rules:
-
- * All four arithmetic operations in full expressions
- -- but on 16-bit integers only.
- * Assignment, IF-THEN-ELSE, BEGIN-END blocks, and
- WHILE-DO.
- * Functions with parameters
- * READ, WRITE, WRITELN with limited string support
- * Global variables
- * Integer literals
-
- Although our feature list is very short, it's large enough
- to write some impressive and useful programs. The reason is
- that tiny Pascal supports functions that can return values.
- If you need a capability, then you can always write a
- function to provide that capability.
-
- A Tiny Pascal Program
-
- A complete program written in tiny Pascal follows this
- paragraph. You should be able to follow what it does. When
- you run this through the tiny Pascal compiler, you will get
-
-
- page 1
-
-
- Pascal in 25 Rules or Less
-
-
- an assembler program that is ready to be assembled and
- executed on your PC. That program will also contain all the
- high-level tiny Pascal statements as comments. The fully
- compiled and assembled listing is given near the end of this
- article.
-
- {TURUN -- A sample program written in Tiny Pascal }
- var I, J, K, PROBLEM;
-
- {*********************}
- function ISLESS(N1, N2);
- begin {returns 1 if n1<n2, 0 otherwise}
- if n2-n1 then isless:=1 {truth value test is >0}
- else isless:=0;
- end;
-
- function ADDEMUP(LOWER, UPPER, SUM);
- begin end; {makes it a forward declaration}
-
- {*********************}
- function ISEQUAL(N1, N2);
- begin
- if n2-n1 then isequal:=0 {false}
- else
- if n1-n2 then isequal:=0
- else isequal:=1;
- end;
-
- {***********************}
- function ADDEMUP(LOWER, UPPER, SUM);
- {SUM is a local}
- begin
- sum:=0;
- while isless(lower, upper) do begin
- sum:=sum+lower;
- lower:=lower+1;
- end;
- addemup:=sum+lower; { the last one was left out }
- end;
-
- {*********************}
- function MAIN(SUM, UPPER);
- begin
- i:=1;
- j:=i+5;
- k:=j-16;
- problem:=i+(j*k);
- writeln('I: ', i, ' J: ', j, ' K: ', k, ' Problem: ', problem);
- write('Enter upper ');
- upper:=read;
- sum:=addemup(1, upper); {sum of integers 1..upper}
- if isequal(sum, (upper*(upper+1))/2) then
- writeln('Sum = ', sum)
- else begin
- writeln('BUG: Sum = ', sum, '; should be ',
- (upper*(upper+1))/2);
- end;
- end;
-
-
- page 2
-
-
- Pascal in 25 Rules or Less
-
-
-
-
- More About Tiny Pascal
-
- Our functions pass parameters by VALUE only -- there's no
- VAR parameter capability. However, a function can return an
- integer value, or it can set the value of a global variable.
- That's a bit awkward sometimes, but it's still better than
- assembler.
- There are no Boolean data types as such, but the language
- uses 0 and 1 as a substitute where necessary. In
- particular, the expression in an IF or a WHILE tests an
- integer value for greater-than-zero (true) or
- less-than-or-equal-to-zero (false). That's good enough.
- For example, you would write
-
- if n1-n2 then ...
-
- in tiny Pascal, which is equivalent to
-
- if n1-n2>0 then ... or
- if n1>n2 then ...
-
- in full Pascal. The comparison operators can of course be
- added with more production rules.
- There are also no local variables in tiny Pascal, but it
- turns out they really aren't needed. For example,
-
- function ADDEMUP(LOWER, UPPER: integer): integer;
- var SUM: integer;
- begin ... end;
-
- in full Pascal can be written
-
- function ADDEMUP(LOWER, UPPER, SUM);
- begin ... end;
-
- in tiny Pascal. Note that we don't bother with the type
- designators (i.e. : integer) since only one type is
- supported. Also our compiler is arranged so that if ADDEMUP
- is called with only two parameter values, SUM is initialized
- to 0. For example,
-
- addemup(5, 10)
-
- causes LOWER=5, UPPER=10, and SUM=0 inside ADDEMUP. So you
- see the extra formal parameters have the same effect as the
- local variables in full Pascal. Of course, you can also
- call ADDEMUP with all three parameters, or none -- that's an
- unsafe defect in tiny Pascal which can't be avoided with our
- tight production rule limit. You'll just have to be sure to
- call functions with the correct number of parameters.
- Functions can be called within expressions, or on a line
- by themselves. For example, both of these statements are
- OK:
-
- addemup(5, 10); {returned value is ignored}
- x := y - addemup(5, 10); {returned value is subtracted
-
-
- page 3
-
-
- Pascal in 25 Rules or Less
-
-
- from y}
-
- Literal strings are permitted only within a WRITE or
- WRITELN statement; they're intended to decorate messages to
- the standard output device only. If you want strings to
- appear in expressions and other functions, then you will
- have to introduce declarations for them and type designators
- -- that'll take more productions.
-
- READ and WRITE
-
- Tiny Pascal supports three I/O functions: READ, WRITE and
- WRITELN. READ is called with no parameters, and returns an
- integer value entered at the keyboard.
- WRITE takes any number of integer expressions or strings,
- and writes them to the console screen in left-to-right
- order. No line return is emitted at the end, so you can
- call WRITE several times before sending a line return.
- WRITELN is the same as WRITE, except that it emits a line
- return after writing its parameters. Both these can take no
- parameters if you like -- WRITE does nothing, and WRITELN
- emits a return.
- It happens that our integers will be read or written in
- hexadecimal for simplicity, but you can easily change the
- STDIO.HDR file to support decimal notation instead.
-
- The MAIN Program
-
- There must be one function whose name is MAIN in each tiny
- Pascal program. That'll be the first function called. It
- may have parameters, but we haven't provided any way of
- passing parameters to it from the console -- Turbo
- parameters are strings anyway and would have to be
- translated into integers somehow.
- MAIN can of course call any other procedure, and it's
- through procedure calls and use of the control structures
- that algorithms can be built up.
-
- Recursion and FORWARD Functions
-
- Tiny Pascal supports recursion -- a function can call
- itself any number of times. Each time a function is called,
- its local variables are pushed onto a runtime stack, along
- with its return address. A special 8086 register -- the BP
- register -- is used to mark the position of these variables
- so that they can be accessed from within a function. We'll
- explain just how that works later.
- Sometimes you'll have a procedure A that calls B that
- calls A. The problem is that A needs a declaration for B.
- You can supply that in tiny Pascal by writing a function for
- B with an empty Stmt, like this:
-
- function B(P1, P2); begin end;
-
- You can even omit the begin end; if you want to. The tiny
- Pascal compiler will recognize this as a so-called forward
- declaration. In full Pascal, you'd write it like this:
-
-
-
- page 4
-
-
- Pascal in 25 Rules or Less
-
-
- function B(P1, P2: integer): integer; forward;
-
- The Grammar
-
- Computer languages are described by a context-free
- grammar, which is just a list of production rules. Here's
- the grammar for tiny Pascal, all twenty-five rules of it:
-
- \ TU.GRM -- tiny Pascal grammar held to 25 productions
- Goal -> FDeclList
- FDeclList -> FDeclList FuncDecl ;
- -> FuncDecl ;
- FuncDecl -> FUNCTION <identifier> ( ExprList ) ; Stmt #FDECL
- -> VAR ExprList #VDECL \ Global variables
- \ ExprList must be identifiers only
- Stmt -> <identifier> := Expr #ASSIGN
- -> IF Expr THEN Stmt ELSE Stmt #IFTHEN
- -> WHILE Expr DO Stmt #WHILEDO
- -> BEGIN StmtList END #BLOCK
- -> Expr #SEXPR \ procedure call only!
- StmtList -> StmtList Stmt ; #STLIST2
- -> <empty>
- Expr -> Expr + Term #ADDOPR
- -> Expr - Term #SUBOPR
- -> Term
- Term -> Term * Primary #MPYOPR
- -> Term / Primary #DIVOPR
- -> Primary
- Primary -> ( Expr ) #PAREN
- -> <integer> \ only type INTEGER supported
- -> <string>
- -> <identifier> \ variable or function call w/o parameters
- -> <identifier> ( ExprList ) #FUNCP
- ExprList -> ExprList , Expr #EXPRLIST2
- -> Expr #EXPRLIST1
-
- First, a word about the notation. Each rule takes up one
- line. A backslash starts a comment, which runs to the end
- of the line. The pound sign (#) marks a production flag,
- which we'll use as a kind of handle in the compiler's
- parser.
- A rule with nothing to the left of the symbol -> is
- assumed to have the same symbol as the preceding rule. For
- example, the third rule is written
-
- -> FuncDecl ;
-
- but actually means
-
- FDeclList -> FuncDecl ;
-
- The first rule, with the left member Goal, expands into a
- complete tiny Pascal program. From the second and third
- rules, a program is a sequence of FuncDecl, separated by
- semicolons.
- Each FuncDecl in turn is either a FUNCTION declaration or
- a VAR declaration, according to the fourth and fifth rules.
- In fact, we will insist that a program consist only of some
-
-
- page 5
-
-
- Pascal in 25 Rules or Less
-
-
- VAR declarations (the global variables) followed by the
- FUNCTION declarations.
- Each FUNCTION declaration carries a name (the
- <identifier>), an expression list (with at least one
- expression), and a statement Stmt.
- Note that a Stmt can be a list of statements StmtList
- enclosed in a BEGIN-END pair, so that the body of a function
- can consist of one, two, three or more statements.
- There are four kinds of statement, an assignment, an
- IF-THEN-ELSE, a WHILE-DO, a BEGIN-END block, and an
- expression -- which in fact must be just a function call.
- Note that the ELSE must always be present in an
- IF-THEN-ELSE, unlike full Pascal.
- Also note that Stmt can be <empty>, i.e. absent. Thus
-
- if a-b THEN ELSE BEGIN ... END
-
- is a legal statement.
- For expressions, we support all four arithmetic operations
- on our integers, as well as parenthesizing. The production
- rules guarantee that the usual Pascal precedence rules
- apply, and that things inside parentheses will be evaluated
- first.
- The form <integer> stands for any unsigned integer.
- <string> stands for a Pascal literal string, for example
-
- 'Here is a Pascal string'
-
- The form <identifier> stands for any legal Pascal name.
- Inside an expression, the name could stand for a global or
- local variable, or a function name. As a function name, the
- function will be called with zero or more parameters,
- returning some value.
-
- Compiler Overview
-
- Let's first take a broad look at our compiler, then we'll
- look at some of the details. We need to look at the forest
- before we study the trees. (The details are in fact
- voluminous as you might expect, and are best examined from
- the source files directly using your favorite editor.)
- We'll be using a compiler generator tool for the
- backbreaking part of the job -- our choice is called
- Qparser, and is available from QCAD Systems, Inc.
- Qparser accepts the 25-rule grammar we've written above
- and generates a complete Pascal program from it
- automatically. All you have to do is run two programs
- called LR1 and LR1P with the appropriate file names, and
- you'll get a Turbo Pascal source file. (Specific
- instructions are included with the product.)
- Now Qparser only generates a parser -- that's a program
- that will take a tiny Pascal source file and break it down
- into small steps. Each step will be a production rule
- reduce or apply operation. We need to add some more Pascal
- source code to that `front-end' compiler, to get it to
- generate assembler code.
- In particular, the compiler will:
-
-
-
- page 6
-
-
- Pascal in 25 Rules or Less
-
-
- 1. Construct an abstract syntax tree (AST) for a
- whole function. This will be built from unit records
- allocated from the heap, linked together with pointers
- in various ways. The AST will carry a complete
- description of all the variables, statements,
- expressions, function calls, etc. in the function.
- 2. When a function AST is complete, its local
- variables will be added to a symbol table, along with
- their positions relative to register BP.
- 3. Then the statements and expressions are turned
- into 8086 assembly code through a recursive tree-walking
- procedure called EVAL. EVAL is rather simple to
- describe, and -- as we shall see -- produces rather
- tight 8086 code for our integer operations, assignments,
- function calls, IF-THENs, and WHILE-DOs. It's also easy
- to understand, extend or modify.
-
- The tiny Pascal compiler acts somewhat like a batch
- compiler -- it collects all the lines for a function, then
- emits all the code for the function. We've tried to make
- the resulting assembly code easy to read by copying the
- function source lines as comments, and by writing the names
- of local variables as comments. (Local variable references
- will all appear as offsets from register BP in the assembly
- code, rather than as names.)
-
- Designing the Compiler
-
- Once we've written the grammar, we can pass it through
- Qparser, then try out the automatically-generated parser on
- a sample program -- such as TURUN given earlier. That will
- verify that our grammar describes what we expect it to.
- The next big step is designing compiler data structures to
- support the AST that will hold all the features of a
- complete function. It turns out that's easy to do, too --
- the Qparser files contains a sample Pascal record structure
- designed for that purpose, called a SEMREC. Here's what the
- generic SEMREC looks like as supplied in the Qparser
- software:
-
- type
- SEMTYPE = (OTHER, IDENT, FIXED, STRNG);
- SEMRECP = ^semrec;
- SEMREC = record
- case SEMT: semtype of { semantic stack structure }
- ident: (SYMP: symtabp);
- fixed: (NUMVAL: integer);
- strng: (STX: integer); {index to string table}
- end;
-
- This supports only three kinds of object: an identifier, a
- fixed-point integer and a string. Identifiers are actually
- written into a symbol table -- the symtabp is a pointer to a
- symbol table record. An integer has a value, NUMVAL. A
- string is written to a global packed array of char called
- STRTAB, starting at the index STX, and ending on a chr(0).
- Note that the type SEMRECP is a pointer to a SEMREC.
- We're going to build our AST around a tree whose nodes are
-
-
- page 7
-
-
- Pascal in 25 Rules or Less
-
-
- linked by SEMRECP pointers. Each of the SEMREC structures
- will be allocated from the Pascal heap as needed, and we'll
- dispose of them all after we're through with them, at the
- end of scanning a tiny Pascal function.
-
- Arithmetic Operators
-
- The AST for the four arithmetic operators is the easiest
- to grasp. We extend SEMREC to these operators by adding two
- lines to our SEMREC declaration. The result is:
-
- SEMREC = record
- case SEMT: semtype of { semantic stack
- structure }
- ident: (SYMP: symtabp);
- fixed: (NUMVAL: integer);
- strng: (STX: integer); {index to string
- table}
- addop, subop, mpyop, divop:
- (LEFT, RIGHT: semrecp);
- end;
-
- Of course, we also need to add the enumerated type names
- addop, etc. to the SEMTYPE list:
-
- SEMTYPE = (OTHER, IDENT, FIXED, STRNG,
- ADDOP, SUBOP, MPYOP, DIVOP);
-
- Let's look at an example of an arithmetic expression and
- see how it is expressed as an AST: X * (Y-Z) / 15.
- Figure 1 shows the AST we want. The - operator must be
- performed first, since we need its result before the *
- operator, and it in turn is needed before the division / can
- be performed. This ordering is clearly dictated by an
- inflexible AST rule: the children of a node must be
- evaluated before the node itself. The leaves of our tree
- are identifiers (X, Y, Z) or numbers (15). A number clearly
- stands for itself, while each identifier stands for some
- numeric value at runtime. In fact, an identifier will be
- associated with some memory address whose contents will be
- fetched in an appropriate instruction.
- Let's next see how the AST of figure 1 would be described
- by SEMREC structures linked by pointers. That's shown in
- figure 2, where each SEMREC is a box, and its LEFT and RIGHT
- pointers are connected to other SEMREC boxes. The root is a
- SEMREC whose SEMT=divop. Its LEFT child is another SEMREC
- whose SEMT=mpyop. The leaves are also SEMREC's, since a
- SEMRECP pointer must point to a SEMREC, but the leaf nodes
- are either SEMT=ident or SEMT=fixed. The ident nodes point
- to a symbol table entry which will carry the identifier
- name, its type, its address and possible other information.
-
- How the AST is Built
-
- The AST will be built through parser actions. We don't
- have the space here to go into the theory of the parsing
- machine -- that's described in several different references
- [1, 2]. However, we'll try to explain enough to see how our
-
-
- page 8
-
-
- Pascal in 25 Rules or Less
-
-
- AST gets built.
- Let's look at our arithmetic expression again: X * (Y-Z) /
- 15. When the parser scans this, it will go through the
- following sequence of production steps:
-
- Primary -> <identifier> ; <identifier> = X
- Primary -> <identifier> ; <identifier> = Y
- Primary -> <identifier> ; <identifier> = Z
- Expr -> Expr - Term #SUBOPR ; Y - Z
- Primary -> ( Expr ) #PAREN ; (Y-Z)
- Term -> Term * Primary #MPYOPR ; X * (Y-Z)
- Primary -> <integer> ; 15
- Term -> Term / Primary #DIVOPR ; X * (Y-Z) / 15
- Expr -> Term
-
- We've left out several intermediate steps, but these are the
- important ones. Essentially, a derivation of our expression
- is being reconstructed in reverse order -- we start with the
- expression, match various production right parts to the
- expression, and eventually end up with a single expression
- node -- Expr.
- Notice how the identifiers X, Y and Z are picked up first
- and `reduced' to a Primary. Then the Y and Z primaries are
- combined in the SUBOPR production. That agrees with our
- notion that the subtraction has to come first.
- The parentheses are covered by the PAREN production; we
- can then reduce the multiplication with the MPYOPR
- production.
- The integer 15 gets scanned next; we didn't need it until
- now, but it's involved in the division, and finally the
- division is covered through the DIVOPR production.
- This sequence of events is the same as if we evaluated the
- expression with a reverse-Polish calculator, like those
- manufactured by the Hewlett-Packard Co. We enter X, then
- enter Y, enter Z, subtract, multiply, enter 15 and divide.
- On each `enter', a value is pushed into a stack, which saves
- it for future reference. Each operation is between a number
- on the stack top and a number just below it; the stack is
- popped and the result appears on the stack top.
- Thus when the multiply is performed, it acts on the two
- operands X and (Y-Z). When the divide is performed, it acts
- on the operands X*(Y-Z) and 15.
-
- Using the Semantics Stack
-
- Qparser also provides a stack that works almost like that
- on a reverse-Polish calculator. The difference is that the
- Qparser stack can be very deep, and it is keyed to
- production apply actions. Let us explain what that means --
- it's crucial to understanding how our AST is built.
- Qparser's stack is called SEM. SEM is declared as an
- array of SEMRECP pointers. The stack top index is named
- TOS, which increases as more items are pushed onto the stack
- and decreases as the stack is popped. Thus SEM[TOS]^ points
- to a SEMREC object on the top of the stack; SEM[TOS-1]^
- points to an object just below the stack top, etc. (SEM[n]
- may also be NIL, which means there is no object.)
- Now the Qparser software will perform the following
-
-
- page 9
-
-
- Pascal in 25 Rules or Less
-
-
- services for you in response to scanning its input source:
-
- 1. Each tagged production will cause procedure APPLY to
- be called with an integer parameter that designates the
- production. The APPLY procedure looks like this:
-
- procedure APPLY(PFLAG: integer; var TSEMP: semrecp);
- begin
- case PFLAG of
- ADDOPR: ...
- ASSIGN: ...
- BLOCK: ...
- etc.
- WHILEDO: ...
- VDECL: ...
- end
- end;
-
- Note that PFLAG causes an immediate branch to one of the
- legs of the CASE statement -- a leg that corresponds to one
- particular production.
- 2. When a production such as Term -> Term * Primary is
- invoked through an APPLY call, the top three SEM stack
- elements will be aligned with Term, *, and Primary,
- respectively. SEM[TOS] will point to something associated
- with Primary, SEM[TOS-1] will be NIL (since * needn't carry
- information), and SEM[TOS-2] will point to something
- associated with Term.
- This is an important concept, and is illustrated in figure
- 3. Just before our APPLY action, the stack has three
- elements on its top, which carry pointers to the roots of
- the trees X and (Y-Z), respectively. (See figure 2 for the
- whole tree.)
- 3. In the APPLY procedure, you have an opportunity to do
- something about the SEM information, and to return a pointer
- to a SEMREC through the var parameter TSEMP.
- Consider production Term -> Term * Primary again (figure
- 3). We assume that SEM[TOS-2] (= Term) points to a tree
- node, as does SEM[TOS] (= Primary). We want to construct a
- new SEMREC tree node whose SEMT is MPYOP, whose LEFT will
- point to Term and whose RIGHT will point to Primary. TSEMP
- will be assigned to point to this new node. That's easy --
- here's the Pascal code fragment needed for the job:
-
- MPYOPR: { Term -> Term * Primary }
- begin
- new(tsemp); {allocate a new node}
- with tsemp^ do begin {and fill in its record fields}
- left:=sem[tos-2]; {points to Term}
- right:=sem[tos]; {points to Primary}
- semt:=mpyop;
- end
- end;
-
- 4. When APPLY returns, the var parameter TSEMP is
- returned to the parser, which will (a) strip off the top
- three pointers on the SEM stack then (b) push the TSEMP
- pointer. Notice that the top three pointers have already
-
-
- page 10
-
-
- Pascal in 25 Rules or Less
-
-
- been linked to TSEMP, so losing them is no big deal. By
- pushing TSEMP on the stack, we have effectively associated
- our new tree root with the Term on the left side of the
- production Term -> Term * Primary.
- Refer to figure 4. This is how the stack will look just
- after returning from APPLY. The top three pointers have
- been scraped off, and replaced by the TSEMP pointer. We
- clearly have a larger tree rooted in the stack top -- again
- look at figure 2 for the whole tree, and you'll see that
- we've added one node to it.
-
- That closes the loop on our operations. Eventually our
- new Term will show up again in the APPLY procedure, this
- time as an element in the right part of some production; it
- will carry the root of a tree, and that tree will get built
- into something bigger.
- Certain actions are done for us automatically by the
- parser and lexical analyzer system of Qparser -- the
- <identifier>, <integer> and <string> forms are loaded into
- the SEM stack automatically when and where they are needed.
- For example, consider the production Stmt -> <identifier>
- := Expr, tagged ASSIGN in our production set. When this
- pops up in an APPLY call, there will be a SEMREC pointer in
- SEM[TOS-2] that corresponds to the <identifier>; it will
- carry the tag SEMT=ident, and the SYMP pointer will point to
- a symbol table entry that corresponds to the identifier.
- Single productions needn't be tagged -- the parser will
- preserve whatever is attached to the right member, passing
- it along to the left member. (A single production has
- exactly one token in its right member, for example Primary
- -> <identifier>). Also, TSEMP is by default NIL, so if we
- choose not to set TSEMP, a NIL will be pushed on the stack.
-
- Other Tree Nodes
-
- Well, we've seen how an arithmetic expression is built up
- as an AST. The same principle can be applied to any form in
- our language. Let's look at some others. Consider the
- assignment production Stmt -> <identifier> := Expr, tagged
- ASSIGN. We'll use the tag SEMT=assignop for it, and use our
- friends LEFT and RIGHT as pointers to the <identifier> and
- Expr. The APPLY code that builds this tree node looks
- almost the same as for the multiply case:
-
- ASSIGNOP: { Stmt -> <identifier> := Expr }
- begin
- new(tsemp); {allocate a new node}
- with tsemp^ do begin {and fill in its record fields}
- left:=sem[tos-2];
- right:=sem[tos];
- semt:=assignop;
- end
- end;
-
- In fact, it looks so much alike that we've created a
- procedure that deals with all these binary operator cases,
- called BIN_TREE; this takes one parameter, the SEMT value,
- and expects to find a LEFT node in SEM[TOS-2] and a RIGHT
-
-
- page 11
-
-
- Pascal in 25 Rules or Less
-
-
- node in SEM[TOS].
- Not all the tree nodes have binary operators. Let's look
- at an expression list, which is carried by two productions:
-
- ExprList -> ExprList , Expr #EXPRLIST2
- -> Expr #EXPRLIST1
-
- These don't look quite like our binary operators, but in
- fact can be handled in a very similar way. We'll create yet
- another SEMT enumerated type EXPR_LIST with a LEFT and
- RIGHT. LEFT will point to an expression subtree, but RIGHT
- will be NIL or point to another EXPR_LIST. Figure 5 shows
- the general idea for a list of two subtrees. (This
- structure is borrowed from Lisp -- the LEFT pointer acts
- like CAR and the RIGHT pointer acts like CDR).
- The code for these two productions looks like this:
-
- EXPRLIST1: { ExprList -> Expr }
- if not(is_void(sem[tos])) then begin
- tsemp:=new_sem(expr_list);
- tsemp^.left:=sem[tos];
- end;
- EXPRLIST2: { ExprList -> ExprList , Expr }
- if not(is_void(sem[tos])) then begin
- tsemp:=new_sem(expr_list);
- tsemp^.left:=sem[tos];
- tsemp:=nconc(sem[tos-2], tsemp);
- end
- else tsemp:=sem[tos-2];
-
- Some new functions are shown in this code fragment. NEW_SEM
- allocates a SEMREC from the Pascal heap using NEW, and
- assigns its SEMT to the passed parameter. More importantly,
- it also initializes LEFT and RIGHT (or other pointers) to
- NIL, rather than letting them be garbage. NEW_SEM is always
- used rather than NEW directly in order to achieve this
- useful initialization.
- The IS_VOID function tests its SEMREC parameter for NIL.
- Finally, NCONC takes two parameters, a list L and an element
- E. It attaches the element to the end of the list by
- altering the RIGHT pointer of the last element of the list L
- to point to the element E. (Again, this is inspired by the
- Lisp function of the same name).
- Let's now look at these operations in more detail. The
- EXPRLIST1 operation must either return NIL or an EXPR_LIST
- that points to the Expr. It turns out that Expr will never
- be NIL, but we've written it this way for robustness.
- The EXPRLIST2 operation looks at the Expr; if it is NIL,
- then we can simply return the pointer to the ExprList (it,
- too, may be NIL). If Expr isn't NIL, then we call NCONC to
- stitch it onto the end of the ExprList, as suggested by
- figure 5.
-
- Function Declaration
-
- It's time to consider the function declaration production
-
- FuncDecl -> FUNCTION <identifier> ( ExprList ) ; Stmt
-
-
- page 12
-
-
- Pascal in 25 Rules or Less
-
-
- #FDECL
-
- When this pops up in an APPLY action, we have scanned
- through the function name (the <identifier>), its parameter
- list (ExprList), and its body (the Stmt). Assuming we've
- done all our AST building correctly, the ExprList and Stmt
- are the roots of some trees, possibly very large.
- The scanning and parsing action have worked through the
- function declaration completely, but we haven't (1)
- generated any code, nor (2) done anything about declaring
- the procedure or its parameters.
- In fact, we need to perform these actions in this order:
-
- 1. Add the function identifier to the symbol table at
- the global scope level, if it isn't already there. (It
- could have been previously declared with an empty Stmt
- part, i.e. as a forward.)
- 2. Raise the scope level by one.
- 3. Add the ExprList identifiers to the symbol table,
- checking each one for multiple declarations. These
- should each be a single identifier rather than an
- expression; we can easily test their tree roots for this
- case and complain about an error. We did it this way to
- save on productions. Each of the identifiers will be
- assigned a location in the stack relative to base
- register BP -- we'll explain how that works shortly.
- 4. Evaluate the Stmt tree. A recursive procedure is
- needed for this that will work through the entire
- statement tree, emitting code appropriate to each of the
- forms in our language. We'll see that this is much
- easier than it appears on the surface and is almost
- trivial for some of the language forms. All we need are
- some simple translation forms in 8086 assembler and our
- AST will unfold very neatly into code.
- During the evaluation phase, every identifier will
- have a pointer to a symbol table entry which will carry
- a relative address and a type for the name. That will
- be useful in generating instructions for the identifier.
-
- The Symbol Table
-
- We need to describe our symbol table system. There are
- various ways of organizing symbol tables; we're going to
- focus on the one used in the Qparser system.
- User names are associated with the <identifier> form in
- the grammar. The default <identifier> form is a Pascal name
- -- something that starts with a letter and continues with
- letters, digits, or underscores. The first character that
- isn't in this set stops the name -- it could be a space, a
- left parenthesis, or whatever.
- Names will be upper cased as in Pascal.
- Note that certain keywords look like identifiers, for
- example, IF, WHILE, BEGIN, etc. The lexical scanner scans
- one of these as though it were an identifier, then searches
- the symbol table for it. The keywords have previously been
- entered and tagged as such, therefore a successful `find'
- will immediately indicate that the supposed identifier is in
- fact a keyword. Making this distinction is vital to the
-
-
- page 13
-
-
- Pascal in 25 Rules or Less
-
-
- parser, since the keywords are important clues to
- recognition of the program phrases.
- If the identifier isn't a keyword, it will be entered
- anyway under an innocuous tag (user).
- Now that we've discussed how the symbol table works in
- general, let's look at the symbol table record structure:
-
- type
- SYMBOL = string[maxtoklen];
- SYMTYPE = (RESERVED, SYMERR, USER, VAR_TYPE, FUNC_TYPE);
- SYMTABP = ^symtabtype;
- SYMTABTYPE = record
- { structure for <identifier>s and keywords }
- NEXT: symtabp;
- LEVEL: int;
- SYM: symbol;
- case SYMT: symtype of
- reserved: (TOKVAL: tokrange);
- var_type: (VADDR: integer);
- {relative to BP}
- func_type:
- (FADDR: integer; {code location}
- PBYTES: integer; {bytes of parameters}
- IS_ACTUAL, {TRUE if an actual declaration
- has been seen}
- IS_SYSTEM {system procedure, demands special
- treatment}
- : boolean);
- end;
-
- There are five classes of symbol. RESERVED is for the
- keywords. SYMERR is used during error recovery when the
- parser needs to make up an identifier for insertion
- purposes. USER is a generic name attached to user
- identifiers when they are first seen. VAR_TYPE refers to
- the global and local variable names -- its VADDR field
- provides (in the local case only) an offset from BP.
- FUNC_TYPE refers to a declared function. These are
- somewhat complicated in our tiny Pascal.
- FADDR actually isn't used, but could be to refer to a code
- location. Since we generate symbolic assembly code, we will
- just produce a symbol procedure label and let the assembler
- worry about where the function will be located.
- PBYTES is the number of bytes of formal parameters the
- function carries. We need this in order to generate an
- appropriate EXIT instruction, and also to generate calls.
- IS_ACTUAL will be false until a declaration with a full
- statement body is seen; after that, IS_ACTUAL will be true.
- We use this to catch such errors as undeclared functions and
- two functions with full statement bodies.
- IS_SYSTEM designates the function as a `system' function,
- slated for special handling. Our WRITE, WRITELN, READ and a
- couple of other functions are in this class. These are
- supported by special assembler routines and aren't called
- quite the same way as user-declared functions.
-
- A symbol name is stuffed in SYM, which is just a string.
- NEXT carries a link to another SYMTABTYPE record, and is
-
-
- page 14
-
-
- Pascal in 25 Rules or Less
-
-
- used in a hash table linked-list system for rapid symbol
- searching.
- The LEVEL parameter designates the scoping level of the
- name. Reserved names are carried at level -1. Global names
- are at level 0 and locals at level 1. In full Pascal, the
- LEVEL would be incremented by one for every descent into a
- lexically nested procedure or function. Tiny Pascal
- supports only one global level of procedures, so LEVEL never
- exceeds 1.
-
- There are several major symbol table actions of interest
- in tiny Pascal. Reserved names are pushed into an
- otherwise-empty table as the first action of the compiler,
- at level -1. This action is essentially automatic by virtue
- of the Qparser generator.
- When an <identifier> form is seen, the identifier string
- is picked up by the lexical analyzer and entered in the
- table, if not already there. The analyzer also builds a
- SEMREC structure of type IDENT and pushes it into the SEM
- stack at the appropriate place. APPLY will therefore `see'
- an IDENT structure of type USER upon the first appearance of
- some user identifier.
- Eventually, we choose to add attributes to the identifier.
- This will occur when a production appears that reveals what
- attributes are required. In tiny Pascal, that production is
- the FuncDecl production FDECL. The action required is to
- scan through the ExprList, and change the attributes of the
- names to VAR_TYPE, assigning VADDR values as we go.
- Correction: if the name doesn't carry a USER tag, it may
- have been previously declared as a global variable or a
- function; we need to override the previous use without
- destroying the previous use. Our symbol table scheme
- permits doing that by carrying successive names in a
- first-in-first-out stack organization.
- Once these names have been declared, there will be no
- further declarations of names in the function body; only
- references to names that should previously have been
- declared. These name references will show up as we scan the
- AST in the EVAL function. Among other things, we watch out
- for identifiers carrying a USER tag -- these haven't been
- declared anywhere and deserve an error complaint.
-
- Getting Rid of Unwanted Names
-
- When the function body has been fully evaluated and all
- code generated, we need to get rid of the local variables in
- the symbol table. A function called CLEARSYM does the
- trick. It locates all the names in the symbol table whose
- LEVEL is greater than 0, and removes them from the table.
- That's important, since the scope of a function's local
- variables is the lexical range of that function and no
- further. By doing this, we make it possible to use the same
- name in different procedures without confusing the compiler
- or assembler.
-
-
-
-
-
-
- page 15
-
-
- Pascal in 25 Rules or Less
-
-
- The Run Time Environment
-
- We need to describe how we propose to generate 8086 code
- from our AST. (If you're not familiar with the 8086, we
- recommend [3] as a reference.) There are four kinds of code
- generation problem confronting us:
-
- 1) Function declaration -- what assembler forms are
- needed to open a function and what are needed to close
- it?
- 2) Function call -- how are the parameters passed and
- how is the call generated?
- 3) Expression and assignment evaluation -- how are the
- register and memory instructions used to maximum benefit
- and efficiency?
- 4) Control structures -- how are the conditional and
- unconditional branches to be coded?
-
- We'll address each of these in turn. Unfortunately, items
- (1) and (2) depend on each other. Our procedure call
- strategy must be defined before we can decide on just how a
- call and a declaration are to be carried out. That will
- also determine how our local variable addresses are
- computed. So let's look at function calls first.
-
- Function Call Environment
-
- Our language is intended to support recursion, which means
- that a given function may have been called several times
- before any exit has taken place. Each of these uncompleted
- calls is called an activation, and each activation requires
- an activation record that contains a set of local variable
- values, a return value and a return address. The activation
- records will be carved out of the 8086's stack, where the
- stack top is pointed to by register SP.
- The micro's stack will also be used to hold temporary
- expression values, so SP will be moving up and down in a
- somewhat unpredictable fashion. We need a way of marking
- the position of the activation record that doesn't depend on
- SP. The current activation record will be pointed to by
- register BP.
- Figure 6 shows our activation record design. The stack
- top is at the bottom, perversely. (Does that mean that we
- programmers don't know which way is up?) In the 8086, the
- PUSH operation causes SP to decrease in numeric value -- the
- stack will start at the end of some segment and work toward
- its beginning.
- The figure shows an activation record for a function with
- three parameters, P1, P2, and P3, in left-to-right order, as
- it might appear when we're ready to start executing some
- function body instructions. However, the record got that
- way partly through call operations and partly through code
- at the front end of the function.
- The function call will 1) reserve space for a function
- return value through a PUSH AX operation, 2) PUSH each
- parameter in their order of appearance, and then 3) CALL the
- function. The CALL instruction will push the return address
- on the stack. At this point, SP will point to the stack top
-
-
- page 16
-
-
- Pascal in 25 Rules or Less
-
-
- (of course), and BP will be pointing at some previous
- activation record deeper in the stack.
- Among the first instructions executed within the function
- will be
-
- PUSH BP
- MOV BP,SP
-
- The first of these pushes the previous value of BP in the
- stack, and the second sets BP to the new stack top.
- Notice that parameter P1 is at BP+8 (each stack location
- is a 2-byte word), P2 is at BP+6, P3 is at BP+4, and the
- function return value is at BP+10. These offsets can be
- calculated by the compiler and associated with each of the
- respective names. Furthermore, the 8086 instruction set and
- our assembler permit us to use forms like 8[BP] to refer to
- an address offset from BP by +8 bytes. We can therefore
- easily generate symbolic address references to any of the
- function's variables.
-
- Function EXIT
-
- An EXIT from a function is similarly easy. We need to
- restore the previous BP value to the one carried at BP in
- the stack, then return, popping the right number of
- parameters from the stack. For the configuration of figure
- 6, we therefore need
-
- POP BP
- RET 8
-
- We overlooked one thing -- since we are dealing with a
- function, it's important to send the return value back to
- whatever expression the function was called from. We're
- going to adopt the convention that every expression will
- yield its value in the 16-bit AX register. However, our
- function's return value is in the stack. We therefore need
- the instruction
-
- MOV AX,10[BP]
-
- before we POP BP. Then we can remove the parameters and
- return value from the stack upon a RET instruction. Note
- that the `10' in this instruction is equal to 4 plus twice
- the number of parameters of the function. Also the `8' in
- the RET instruction is 2 plus twice the number of
- parameters.
-
- Function CALL
-
- Before we discuss function calling, we need to assert
- something about the way expressions will be evaluated. We
- will write a general-purpose procedure called EVAL that
- takes an expression root (a SEMREC pointer), then generates
- code such that the arithmetic result of the tree will be
- left in the AX register. By asserting this first, we can
- design EVAL to make sure that that is in fact what will
- happen in every case. We can also use EVAL itself to deal
-
-
- page 17
-
-
- Pascal in 25 Rules or Less
-
-
- with subtrees, with assurance that AX will receive the
- result.
- Of course, it's important that EVAL never change the value
- of BP or set SP capriciously. It may push some stuff on the
- stack, but we'll also make sure that whatever is pushed will
- eventually be popped. Certain other 8086 registers may also
- be used in very local situations, for example in connection
- with a multiply or divide instruction.
- Given that EVAL will take an AST tree and produce code for
- AX, calling a function will be very easy. In fact, the
- function call will show up in EVAL itself, whenever we see
- the SEMT=funcall!
- Here's what we need to do (refer to figure 6):
-
- PUSH AX ; this reserves a return value word
- eval(P1) ; parameter P1 is evaluated, result to AX
- PUSH AX ; push result on the stack
- eval(P2)
- PUSH AX
- etc.
- CALL function
-
- We can of course just use the function name in the CALL
- instruction, since the assembler can figure out where it is.
- Note that this code sequence will work nicely within an
- expression, since the result of the CALL is to pop down the
- stack to exactly the place just before the first PUSH AX,
- and to yield the function's result in AX.
- We can also use this sequence for a procedure call, where
- the return value is ignored. AX will just get lost, and
- there's no change in the stack.
-
- ADD and SUB
-
- Let's now turn our attention to evaluating other
- expressions. We need to take a close look at the 8086
- instruction set and decide just how to handle arithmetic.
- We find that integer addition and subtraction are handled
- alike, but are coded somewhat differently than
- multiplication and division.
- Our ADD or SUB will be between AX and some operand, which
- may be another register value, an immediate value, a memory
- value or an indirection through a register. Our AST will
- look like figure 7. We imagine that the LEFT subtree will
- somehow yield AX -- that's easy if we just call EVAL on the
- LEFT subtree. We would then like to emit an ADD or SUB
- instruction whose operand is the right subtree.
- Unfortunately, that's not possible if the right subtree is
- anything other than 1) a literal value, i.e. SEMT=fixed, or
- 2) a global variable, or 3) a local variable. A global
- variable can be accessed by its name alone, while a local
- will be accessed as an offset through BP, e.g. 6[BP].
- So before we just go ahead and evaluate the LEFT subtree,
- we need to test the RIGHT subtree -- is it `simple', i.e.,
- will it fit into an ADD or SUB instruction operand field?
- If it is, then we can just EVAL(LEFT), then emit the
- appropriate ADD or SUB instruction.
- The more difficult case in fact requires that we save the
-
-
- page 18
-
-
- Pascal in 25 Rules or Less
-
-
- result of evaluating one of the subtrees before evaluating
- the other one. Since we want the left one in AX in order to
- emit an instruction, we must evaluate the right one first.
- That will yield the result in AX, but we musn't just leave
- it there while evaluating the left subtree -- AX will very
- likely get changed as a result. Our solution is to PUSH AX
- before evaluating the LEFT subtree. After that evaluation,
- AX will hold the left subtree. We choose to POP DX, then
- emit an ADD or SUB instruction between AX and DX.
- We therefore arrive at the following compiler coding for
- the ADD and SUB cases of EVAL:
-
- addop, subop:
- if is_simple(right) then begin
- eval(left); {goes to AX}
- code3(opcode(semt), ' AX,', nameof(right));
- end
- else begin
- eval(right);
- code('PUSH AX'); {put in stack temporarily}
- eval(left); {left side to AX}
- code('POP DX'); {get the right value back from stack}
- code2(opcode(semt), ' AX,DX');
- end;
-
- Some unfamiliar functions are used in this code fragment.
- OPCODE(SEMT) returns a string that represents the 8086
- instruction for the SEMT. NAMEOF(root) returns a string
- that represents a valid operand for the subtree whose root
- is root -- it may be a literal constant, a named variable,
- or an offset-BP form.
- CODE takes one string and formats it as a symbolic
- assembler line with no label. A space in the string is
- interpreted as a tab stop, to produce a pretty formatted
- effect in the assembly code.
- CODE2 takes two strings, concatenates them and calls CODE
- with the result. CODE3 and CODE4 are similar.
- Note that although we pushed a word in the stack in the
- non-simple case, it gets popped out two lines later. We
- therefore can assert that EVAL will keep the stack orderly,
- provided it does so everywhere else.
-
- MPY and DIV
-
- These operations are somewhat more messy to encode in the
- 8086. The IMUL instruction requires one operand in the AX
- register, the other in another register (we will use CX).
- It returns a result in the pair DX-AX, with AX carrying the
- least significant word. Without attempting any
- optimizations, we therefore evaluate the right subtree, PUSH
- it, evaluate the left, then POP CX and emit the opcode.
- Division requires one additional step -- DX should be a
- sign extension of AX before dividing by CX. Fortunately,
- the instruction CWD performs this for us. Here's the
- resulting EVAL code fragment:
-
- mpyop, divop: begin
- eval(right); {divisor to AX}
-
-
- page 19
-
-
- Pascal in 25 Rules or Less
-
-
- code('PUSH AX');
- eval(left); {dividend to AX}
- if semt=divop then code('CWD'); {sign extend into DX}
- code('POP CX');
- code2(opcode(semt), ' CX');
- end;
-
- Assignment
-
- The assignment operation calls for a certain adjustment,
- depending on whether the right subtree is a literal
- fixed-point number or not. Here's the EVAL fragment:
-
- assignop: begin
- if right^.semt=fixed then {an immediate on the right is OK}
- code4('MOVW ', nameof(left), ',', nameof(right))
- else begin
- eval(right); {goes to AX}
- code3('MOV ', nameof(left), ',AX');
- end
- end;
-
- The use of MOVW is required since the left operand will be a
- name or an indirect BP reference (we can only assign to a
- variable or a function return value), while the right
- operand could be a byte or a word. We therefore need a word
- MOV in this case.
- In the other case, the right subtree is EVALed as usual,
- and the right MOV operand will be AX, stamping this as a
- word operation. Thus we must nurse the assembler along
- which otherwise wouldn't have the foggiest idea of which
- operation we intend.
-
- WHILE DO
-
- The WHILE-DO form is supported by a LEFT pointer and a
- RIGHT pointer. LEFT points to an expression subtree
- carrying the boolean expression (it will be tested for >0 =
- TRUE, and <=0 = FALSE). We have to encode this as 8086
- branch instructions, and have no idea how far the branches
- must reach. Thus we use long branch instructions, hoping
- that an intelligent assembler can substitute short branches
- if possible.
- We need to manufacture a couple of temporary labels, and
- assign that job to a function NEW_LABEL -- this increments a
- global label counter, then appends its string value to the
- string 'XXX'. Thus we create new labels such as XXX1, XXX2,
- etc. as needed.
- The function LCODE takes two strings; the first one will
- become an assembler label, and the second the opcode-operand
- field. Here's the EVAL code fragment for a WHILE-DO:
-
- while_do: begin
- label1:=new_label;
- lcode(label1, 'EQU $');
- eval(left); {boolean condition}
- code('CMP AX,0');
- label2:=new_label;
-
-
- page 20
-
-
- Pascal in 25 Rules or Less
-
-
- code2('JLE ', label2);
- eval(right); {statement or statement list}
- code2('JMP ', label1);
- lcode(label2, 'EQU $');
- end;
-
- Note that we attach a label before evaluating the boolean
- condition; this will later appear in a JMP around the whole
- WHILE-DO form back to the boolean test. Following the
- boolean test is a comparison of AX to 0 (recall that EVAL
- yields a result in AX). We need the CMP since we aren't
- sure how AX got loaded, hence whether the sign flag of the
- 8086 was set correctly. The comparison is followed by a
- conditional jump to the end of the WHILE-DO form -- note how
- label2 reappears on an EQU $ as the last coded line.
- The beauty of EVAL appears between the conditional and the
- unconditional jump. It's quite capable of dealing with any
- number of statements, function and procedure calls, etc., so
- that a quite large and sophisticated sequence of code will
- appear in that innocent eval(right) call. Yet we drop in a
- JMP and the labelled EQU $ at just the right place. And --
- the correctness of this evaluation speaks for itself.
-
- Statement List
-
- We've seen how a sequence of expressions is turned into a
- linked list of SEMREC nodes. We do the same thing with a
- sequence of statements. An EVAL fragment can be written for
- a STMTLIST as follows:
-
- stmtlist:
- while root<>nil do begin
- eval(root^.left);
- root:=root^.right;
- end;
-
- The ROOT is of course a value parameter passed to EVAL; its
- LEFT points to some statement and its RIGHT to the rest of
- the statement list. It's OK in Turbo Pascal to change the
- value of a passed value parameter, so we just do it. Once a
- ROOT is evaluated, we don't need it anymore, so we might as
- well use it to point to the next one.
- This could also have been coded using a recursive call on
- EVAL, but sequences of statements might get very lengthy,
- and could use up lots of stack space before they unwind.
-
- IF-THEN-ELSE
-
- Since an IF-THEN-ELSE has three parts, we need a special
- SEMREC clause to deal with it:
-
- if_then_else: {if B1 then S1 else S2}
- (B1, S1, S2: semrecp);
-
- This node of an AST is then easily coded as follows. We've
- discussed all the functions that appear in this EVAL code
- fragment:
-
-
-
- page 21
-
-
- Pascal in 25 Rules or Less
-
-
- if_then_else: begin
- label1:=new_label;
- eval(b1); {boolean condition}
- code('CMP AX,0');
- code2('JLE ', label1);
- eval(s1); {THEN statement}
- label2:=new_label;
- code2('JMP ', label2);
- lcode(label1, 'EQU $');
- eval(s2); {ELSE statement}
- lcode(label2, 'EQU $');
- end;
-
-
- Summary
-
- The following listing was generated by the Chasm
- assembler. It is the compiled version of program TURUN
- given near the beginning of this paper.
- This is a complete program. It fits into one segment,
- starting at address 100H. The stack pointer is set to
- location STACKORG, which is at the end of a 2000 byte block
- near the end of the program. (This is actually unnecessary
- since DOS will by default set SP to the end of a segment.)
- The program then begins with a call to MAIN. When MAIN
- returns, an INT 020H is executed, which returns control
- cleanly to DOS.
- You can step through this program with the DOS DEBUG
- system if you want to see what's happening in detail. Refer
- to the DOS manual for details.
- A few points about this assembled listing: A bug in our
- version of the Chasm assembler (which I've been assured has
- been repaired) made it impossible to code the RET
- instruction with a number; instead we've written the opcode
- as a DB followed by the number as a DW. You'll see this in
- line 124 for example.
- Chasm also noticed four JMP instructions that could be
- encoded as short jumps. As we've explained, our simple
- compiler doesn't keep track of the span between a JMP and
- its target, and that span could be any number of bytes.
- Thus we've taken the safe approach in our compiler and coded
- long jumps. An optimizing assembler could turn these into
- short jumps.
- Also, our compiler has copied the contents of a support
- file called STDIO.HDR (not to be confused with the C library
- of the same name) into the assembly code. STDIO supports
- the WRITE and READ functions. You can see where this file
- begins and ends from the comments.
- Are you impressed? We hope so. If you've followed our
- development and reasoning this far, you will have noticed
- that the code generation algorithms for our tiny Pascal
- statements seem very clear and transparent. Yet when you
- look at the generated assembly code, it isn't at all clear
- what's going on. Some of the PUSH AX instructions came from
- one part of the compiler algorithm and some from another
- part. But that's the way it is.
- Many software writers prefer a high-level programming
- language because a few very clear source statements can turn
-
-
- page 22
-
-
- Pascal in 25 Rules or Less
-
-
- into a rats nest of assembly code -- but if the compiler
- generates that code correctly, we don't care how convoluted
- and hard-to-understand the resulting code is.
-
- TURUN.ASM 2/19/86
- Page: 1 12:56:04
-
-
- LOC OBJ LINE SOURCE CHASM version 4.00
- 0100 1 ; Tiny Pascal assembler code
- 0100 BC290B 2 MOV SP,OFFSET(STACKORG)
- 0103 8BEC 3 MOV BP,SP
- 0105 E81901 4 CALL MAIN
- 0108 CD20 5 INT 020H
- 010A 6 ; <STDIO.HDR> included
- 010A 7 ; STDIO.HDR
- 010A 8 ;
- 010A 9 ; READ and WRITE routines needed for Tiny Pascal
- 010A 10 ;
- 010A 11 SYS_RCHAR PROC NEAR ; Read single character from
- 010A B401 12 MOV AH,1
- 010C CD21 13 INT 021H
- 010E C3 14 RET ; value comes back in AL
- 010F 15 ENDP
- 010F 16
- 010F 17 SYS_WRCHAR PROC NEAR ; Write a single character (
- 010F B402 18 MOV AH,2
- 0111 CD21 19 INT 021H
- 0113 C3 20 RET
- 0114 21 ENDP
- 0114 22
- 0114 23 SYS_WHEX PROC NEAR ; Write a single HEX number
- 0114 80FA0A 24 CMP DL,10
- 0117 7C07 25 JL SYS_01
- 0119 80C237 26 ADD DL,55 ; 'A' - 10
- 011C E8F0FF 27 CALL SYS_WRCHAR
- 011F C3 28 RET
- 0120 80C230 29 SYS_01 ADD DL,'0'
- 0123 E8E9FF 30 CALL SYS_WRCHAR
- 0126 C3 31 RET
- 0127 32 ENDP
- 0127 33
- 0127 34 SYS_IWRT PROC NEAR ; Write an integer to stdout
- 0127 B604 35 MOV DH,4 ; used as a counter
- 0129 D1C0 36 SYS_11 ROL AX
- 012B D1C0 37 ROL AX
- 012D D1C0 38 ROL AX
- 012F D1C0 39 ROL AX
- 0131 8AD0 40 MOV DL,AL
- 0133 80E20F 41 AND DL,0FH
- 0136 50 42 PUSH AX
- 0137 E8DAFF 43 CALL SYS_WHEX
- 013A 58 44 POP AX
- 013B FECE 45 DEC DH
- 013D 75EA 46 JNZ SYS_11
- 013F C3 47 RET
- 0140 48 ENDP
- 0140 49
-
-
- page 23
-
-
- Pascal in 25 Rules or Less
-
-
- 0140 50 SYS_SWRT PROC NEAR ; Write a string terminated
- 0140 8A5700 51 SYS_21 MOV DL,0[BX]
- 0143 80FA00 52 CMP DL,0
- 0146 7501 53 JNZ SYS_22 ; zero terminator?
- 0148 C3 54 RET
- 0149 E8C3FF 55 SYS_22 CALL SYS_WRCHAR
- 014C 43 56 INC BX
- 014D EBF1 57 JMPS SYS_21
- 014F 58 ENDP
- 014F 59
- 014F 60 SYS_WRTLN PROC NEAR ; write carriage return/lin
- 014F B20D 61 MOV DL,0DH
- 0151 E8BBFF 62 CALL SYS_WRCHAR
- 0154 B20A 63 MOV DL,0AH
- 0156 E8B6FF 64 CALL SYS_WRCHAR
- 0159 C3 65 RET
- 015A 66 ENDP
- 015A 67
- 015A 68 READ PROC NEAR ; read a HEX number fr
- 015A BA0000 69 MOV DX,0 ; clear DX
- 015D E8AAFF 70 READ_01 CALL SYS_RCHAR ; get one character in
- 0160 71 ; won't affect DX
- 0160 3C0D 72 CMP AL,0DH
- 0162 7506 73 JNZ READ_02
- 0164 52 74 PUSH DX ; save the thing we've
- 0165 E8E7FF 75 CALL SYS_WRTLN ; send a carriage retu
- 0168 58 76 POP AX ; was an ENTER
- 0169 C3 77 RET
- 016A 3C20 78 READ_02 CMP AL,' '
- 016C 74EF 79 JZ READ_01 ; ignore spaces
- 016E 2C30 80 SUB AL,'0' ; start conversion to
- 0170 3C09 81 CMP AL,9
- 0172 7E02 82 JLE READ_03
- 0174 2C07 83 SUB AL,7 ; turn 'A' into 0AH
- 0176 3C0F 84 READ_03 CMP AL,0FH
- 0178 7E02 85 JLE READ_04
- 017A 2C20 86 SUB AL,32 ; turn 'a' into 0AH
- 017C 240F 87 READ_04 AND AL,0FH ; clip for good measur
- 017E D1E2 88 SHL DX ; prepare DX for hex v
- 0180 D1E2 89 SHL DX
- 0182 D1E2 90 SHL DX
- 0184 D1E2 91 SHL DX
- 0186 08C2 92 OR DL,AL
- 0188 EBD3 93 JMPS READ_01 ; go do some more
- 018A 94 ENDP
- 018A 95
- 018A 96 READLN PROC NEAR
- 018A EBCE 97 JMPS READ ; does the same thing
- 018C 98 ENDP
- 018C 99
- 018C 100 ; ... end of include STDIO.HDR
- 018C 101 ; {TURUN -- A sample program written in Tiny Pascal
- 018C 102 ; var I, J, K, PROBLEM;
- 018C 103 ;
- 018C 104 ; {*********************}
- 018C 105 ; function ISLESS(N1, N2);
- 018C 106 ; begin {returns 1 if n1<n2, 0 otherwise}
- 018C 107 ; if n2-n1 then isless:=1 {truth value test is
-
-
- page 24
-
-
- Pascal in 25 Rules or Less
-
-
- 018C 108 ; else isless:=0;
- 018C 109 ; end;
- 018C 110 ISLESS PROC NEAR
- 018C 55 111 PUSH BP
- 018D 8BEC 112 MOV BP,SP
- 018F 8B4604 113 MOV AX,4[BP] ; N2
- 0192 2B4606 114 SUB AX,6[BP] ; N1
- 0195 3D0000 115 CMP AX,0
- 0198 7E08 116 JLE XXX0
- 019A C746080100 117 MOVW 8[BP],1 ; ISLESS
- **** Diagnostic: Could use JMPS
- 019F E90500 118 JMP XXX1
- 01A2 119 XXX0 EQU $
- 01A2 C746080000 120 MOVW 8[BP],0 ; ISLESS
- 01A7 121 XXX1 EQU $
- 01A7 8B4608 122 MOV AX,8[BP] ; ISLESS
- 01AA 5D 123 POP BP
- 01AB C2 124 DB 0C2H
- 01AC 0600 125 DW 6
- 01AE 126 ENDP
- 01AE 127 ; SYMBOL TABLE
- 01AE 128 ; ISLESS 8[BP]
- 01AE 129 ; N1 6[BP]
- 01AE 130 ; N2 4[BP]
- 01AE 131
- 01AE 132 ;
- 01AE 133 ; function ADDEMUP(LOWER, UPPER, SUM);
- 01AE 134 ; begin end; {makes it a forward declaration}
- 01AE 135 ;
- 01AE 136 ; {*********************}
- 01AE 137 ; function ISEQUAL(N1, N2);
- 01AE 138 ; begin
- 01AE 139 ; if n2-n1 then isequal:=0 {false}
- 01AE 140 ; else
- 01AE 141 ; if n1-n2 then isequal:=0
- 01AE 142 ; else isequal:=1;
- 01AE 143 ; end;
- 01AE 144 ISEQUAL PROC NEAR
- 01AE 55 145 PUSH BP
- 01AF 8BEC 146 MOV BP,SP
- 01B1 8B4604 147 MOV AX,4[BP] ; N2
- 01B4 2B4606 148 SUB AX,6[BP] ; N1
- 01B7 3D0000 149 CMP AX,0
- 01BA 7E08 150 JLE XXX2
- 01BC C746080000 151 MOVW 8[BP],0 ; ISEQUAL
- **** Diagnostic: Could use JMPS
- 01C1 E91800 152 JMP XXX3
- 01C4 153 XXX2 EQU $
- 01C4 8B4606 154 MOV AX,6[BP] ; N1
- 01C7 2B4604 155 SUB AX,4[BP] ; N2
- 01CA 3D0000 156 CMP AX,0
- 01CD 7E08 157 JLE XXX4
- 01CF C746080000 158 MOVW 8[BP],0 ; ISEQUAL
- **** Diagnostic: Could use JMPS
- 01D4 E90500 159 JMP XXX5
- 01D7 160 XXX4 EQU $
- 01D7 C746080100 161 MOVW 8[BP],1 ; ISEQUAL
- 01DC 162 XXX5 EQU $
-
-
- page 25
-
-
- Pascal in 25 Rules or Less
-
-
- 01DC 163 XXX3 EQU $
- 01DC 8B4608 164 MOV AX,8[BP] ; ISEQUAL
- 01DF 5D 165 POP BP
- 01E0 C2 166 DB 0C2H
- 01E1 0600 167 DW 6
- 01E3 168 ENDP
- 01E3 169 ; SYMBOL TABLE
- 01E3 170 ; ISEQUAL 8[BP]
- 01E3 171 ; N1 6[BP]
- 01E3 172 ; N2 4[BP]
- 01E3 173
- 01E3 174 ;
- 01E3 175 ; {***********************}
- 01E3 176 ; function ADDEMUP(LOWER, UPPER, SUM);
- 01E3 177 ; {SUM is a local}
- 01E3 178 ; begin
- 01E3 179 ; sum:=0;
- 01E3 180 ; while isless(lower, upper) do begin
- 01E3 181 ; sum:=sum+lower;
- 01E3 182 ; lower:=lower+1;
- 01E3 183 ; end;
- 01E3 184 ; addemup:=sum+lower; { the last one was left ou
- 01E3 185 ; end;
- 01E3 186 ADDEMUP PROC NEAR
- 01E3 55 187 PUSH BP
- 01E4 8BEC 188 MOV BP,SP
- 01E6 C746040000 189 MOVW 4[BP],0 ; SUM
- 01EB 190 XXX6 EQU $
- 01EB 50 191 PUSH AX
- 01EC 8B4608 192 MOV AX,8[BP] ; LOWER
- 01EF 50 193 PUSH AX
- 01F0 8B4606 194 MOV AX,6[BP] ; UPPER
- 01F3 50 195 PUSH AX
- 01F4 E895FF 196 CALL ISLESS
- 01F7 3D0000 197 CMP AX,0
- 01FA 7E15 198 JLE XXX7
- 01FC 8B4604 199 MOV AX,4[BP] ; SUM
- 01FF 034608 200 ADD AX,8[BP] ; LOWER
- 0202 894604 201 MOV 4[BP],AX ; SUM
- 0205 8B4608 202 MOV AX,8[BP] ; LOWER
- 0208 050100 203 ADD AX,1
- 020B 894608 204 MOV 8[BP],AX ; LOWER
- **** Diagnostic: Could use JMPS
- 020E E9DAFF 205 JMP XXX6
- 0211 206 XXX7 EQU $
- 0211 8B4604 207 MOV AX,4[BP] ; SUM
- 0214 034608 208 ADD AX,8[BP] ; LOWER
- 0217 89460A 209 MOV 10[BP],AX ; ADDEMUP
- 021A 8B460A 210 MOV AX,10[BP] ; ADDEMUP
- 021D 5D 211 POP BP
- 021E C2 212 DB 0C2H
- 021F 0800 213 DW 8
- 0221 214 ENDP
- 0221 215 ; SYMBOL TABLE
- 0221 216 ; ADDEMUP 10[BP]
- 0221 217 ; LOWER 8[BP]
- 0221 218 ; UPPER 6[BP]
- 0221 219 ; SUM 4[BP]
-
-
- page 26
-
-
- Pascal in 25 Rules or Less
-
-
- 0221 220
- 0221 221 ;
- 0221 222 ; {*********************}
- 0221 223 ; function MAIN(SUM, UPPER);
- 0221 224 ; begin
- 0221 225 ; i:=1;
- 0221 226 ; j:=i+5;
- 0221 227 ; k:=j-16;
- 0221 228 ; problem:=i+(j*k);
- 0221 229 ; writeln('I: ', i, ' J: ', j, ' K: ', k, ' Probl
- 0221 230 ; write('Enter upper ');
- 0221 231 ; upper:=read;
- 0221 232 ; sum:=addemup(1, upper); {sum of integers 1..up
- 0221 233 ; if isequal(sum, (upper*(upper+1))/2) then
- 0221 234 ; writeln('Sum = ', sum)
- 0221 235 ; else begin
- 0221 236 ; writeln('BUG: Sum = ', sum, '; should be ',
- 0221 237 ; (upper*(upper+1))/2);
- 0221 238 ; end;
- 0221 239 ; end;
- 0221 240 MAIN PROC NEAR
- 0221 55 241 PUSH BP
- 0222 8BEC 242 MOV BP,SP
- 0224 C70653030100 243 MOVW I,1 ; I
- 022A A15303 244 MOV AX,I ; I
- 022D 050500 245 ADD AX,5
- 0230 A35503 246 MOV J,AX ; J
- 0233 A15503 247 MOV AX,J ; J
- 0236 2D1000 248 SUB AX,16
- 0239 A35703 249 MOV K,AX ; K
- 023C A15703 250 MOV AX,K ; K
- 023F 50 251 PUSH AX
- 0240 A15503 252 MOV AX,J ; J
- 0243 59 253 POP CX
- 0244 F7E9 254 IMULW CX
- 0246 50 255 PUSH AX
- 0247 A15303 256 MOV AX,I ; I
- 024A 5A 257 POP DX
- 024B 01D0 258 ADD AX,DX
- 024D A35103 259 MOV PROBLEM,AX ; PROBLEM
- 0250 BB4D03 260 MOV BX,OFFSET(SS0)
- 0253 E8EAFE 261 CALL SYS_SWRT
- 0256 A15303 262 MOV AX,I ; I
- 0259 E8CBFE 263 CALL SYS_IWRT
- 025C BB4803 264 MOV BX,OFFSET(SS1)
- 025F E8DEFE 265 CALL SYS_SWRT
- 0262 A15503 266 MOV AX,J ; J
- 0265 E8BFFE 267 CALL SYS_IWRT
- 0268 BB4303 268 MOV BX,OFFSET(SS2)
- 026B E8D2FE 269 CALL SYS_SWRT
- 026E A15703 270 MOV AX,K ; K
- 0271 E8B3FE 271 CALL SYS_IWRT
- 0274 BB3803 272 MOV BX,OFFSET(SS3)
- 0277 E8C6FE 273 CALL SYS_SWRT
- 027A A15103 274 MOV AX,PROBLEM ; PROBLEM
- 027D E8A7FE 275 CALL SYS_IWRT
- 0280 E8CCFE 276 CALL SYS_WRTLN
- 0283 BB2B03 277 MOV BX,OFFSET(SS4)
-
-
- page 27
-
-
- Pascal in 25 Rules or Less
-
-
- 0286 E8B7FE 278 CALL SYS_SWRT
- 0289 E8CEFE 279 CALL READ
- 028C 894604 280 MOV 4[BP],AX ; UPPER
- 028F 50 281 PUSH AX
- 0290 B80100 282 MOV AX,1
- 0293 50 283 PUSH AX
- 0294 8B4604 284 MOV AX,4[BP] ; UPPER
- 0297 50 285 PUSH AX
- 0298 B80000 286 MOV AX,0
- 029B 50 287 PUSH AX
- 029C E844FF 288 CALL ADDEMUP
- 029F 894606 289 MOV 6[BP],AX ; SUM
- 02A2 50 290 PUSH AX
- 02A3 8B4606 291 MOV AX,6[BP] ; SUM
- 02A6 50 292 PUSH AX
- 02A7 B80200 293 MOV AX,2
- 02AA 50 294 PUSH AX
- 02AB 8B4604 295 MOV AX,4[BP] ; UPPER
- 02AE 050100 296 ADD AX,1
- 02B1 50 297 PUSH AX
- 02B2 8B4604 298 MOV AX,4[BP] ; UPPER
- 02B5 59 299 POP CX
- 02B6 F7E9 300 IMULW CX
- 02B8 99 301 CWD
- 02B9 59 302 POP CX
- 02BA F7F9 303 IDIVW CX
- 02BC 50 304 PUSH AX
- 02BD E8EEFE 305 CALL ISEQUAL
- 02C0 3D0000 306 CMP AX,0
- 02C3 7E12 307 JLE XXX8
- 02C5 BB2403 308 MOV BX,OFFSET(SS5)
- 02C8 E875FE 309 CALL SYS_SWRT
- 02CB 8B4606 310 MOV AX,6[BP] ; SUM
- 02CE E856FE 311 CALL SYS_IWRT
- 02D1 E87BFE 312 CALL SYS_WRTLN
- **** Diagnostic: Could use JMPS
- 02D4 E92D00 313 JMP XXX9
- 02D7 314 XXX8 EQU $
- 02D7 BB1803 315 MOV BX,OFFSET(SS6)
- 02DA E863FE 316 CALL SYS_SWRT
- 02DD 8B4606 317 MOV AX,6[BP] ; SUM
- 02E0 E844FE 318 CALL SYS_IWRT
- 02E3 BB0B03 319 MOV BX,OFFSET(SS7)
- 02E6 E857FE 320 CALL SYS_SWRT
- 02E9 B80200 321 MOV AX,2
- 02EC 50 322 PUSH AX
- 02ED 8B4604 323 MOV AX,4[BP] ; UPPER
- 02F0 050100 324 ADD AX,1
- 02F3 50 325 PUSH AX
- 02F4 8B4604 326 MOV AX,4[BP] ; UPPER
- 02F7 59 327 POP CX
- 02F8 F7E9 328 IMULW CX
- 02FA 99 329 CWD
- 02FB 59 330 POP CX
- 02FC F7F9 331 IDIVW CX
- 02FE E826FE 332 CALL SYS_IWRT
- 0301 E84BFE 333 CALL SYS_WRTLN
- 0304 334 XXX9 EQU $
-
-
- page 28
-
-
- Pascal in 25 Rules or Less
-
-
- 0304 8B4608 335 MOV AX,8[BP] ; MAIN
- 0307 5D 336 POP BP
- 0308 C2 337 DB 0C2H
- 0309 0600 338 DW 6
- 030B 3B2073686F75 339 SS7 DB '; should be ',0
- 6C6420626520
- 00
- 0318 4255473A2053 340 SS6 DB 'BUG: Sum = ',0
- 756D203D2000
- 0324 53756D203D20 341 SS5 DB 'Sum = ',0
- 00
- 032B 456E74657220 342 SS4 DB 'Enter upper ',0
- 757070657220
- 00
- 0338 2050726F626C 343 SS3 DB ' Problem: ',0
- 656D3A2000
- 0343 204B3A2000 344 SS2 DB ' K: ',0
- 0348 204A3A2000 345 SS1 DB ' J: ',0
- 034D 493A2000 346 SS0 DB 'I: ',0
- 0351 347 ENDP
- 0351 348 ; SYMBOL TABLE
- 0351 349 ; MAIN 8[BP]
- 0351 350 ; SUM 6[BP]
- 0351 351 ; UPPER 4[BP]
- 0351 352
- 0351 353 ;
- 0351 354 ; GLOBAL VARIABLES
- 0351 0000 355 PROBLEM DW 0
- 0353 0000 356 I DW 0
- 0355 0000 357 J DW 0
- 0357 0000 358 K DW 0
- 0359 359 ; RUNTIME STACK
- 0359 360 DS 2000
- 0B29 0000 361 STACKORG DW 0
- 0B2B 362 ; MAIN stack space
- 0B2B 0000 363 DW 0
- 0B2D 0000 364 DW 0
- 0B2F 0000 365 DW 0
- 0B31 366 ; NO errors
-
-
- 0 Error(s) detected
- 5 Diagnostic(s) offered
-
-
- 817 (331H) Bytes of object code generated
-
- Symbol Table Dump:
-
- ADDEMUP.........P01E3 I...............M0353 ISEQUAL.........P01AE
- ISLESS..........P018C J...............M0355 K...............M0357
- MAIN............P0221 PROBLEM.........M0351 READ............P015A
- READLN..........P018A READ_01.........P015D READ_02.........P016A
- READ_03.........P0176 READ_04.........P017C SS0.............M034D
- SS1.............M0348 SS2.............M0343 SS3.............M0338
- SS4.............M032B SS5.............M0324 SS6.............M0318
- SS7.............M030B STACKORG........M0B29 SYS_01..........P0120
- SYS_11..........P0129 SYS_21..........P0140 SYS_22..........P0149
-
-
- page 29
-
-
- Pascal in 25 Rules or Less
-
-
- SYS_IWRT........P0127 SYS_RCHAR.......P010A SYS_SWRT........P0140
- SYS_WHEX........P0114 SYS_WRCHAR......P010F SYS_WRTLN.......P014F
- XXX0............P01A2 XXX1............P01A7 XXX2............P01C4
- XXX3............P01DC XXX4............P01D7 XXX5............P01DC
- XXX6............P01EB XXX7............P0211 XXX8............P02D7
- XXX9............P0304
-
- Where to Obtain Software
-
- We invite you to try writing your own compiler, or at
- least to try the tools and source code we used in writing
- this one. Here's what you need -- you'll find the cost very
- reasonable:
-
- The Qparser demo system. Order from QCAD Systems, 1164
- Hyde Ave, San Jose, CA 95129, phone 800/538-9787
- (408/727-6671 in CA). Price is $10 plus $2 shipping and
- handling. CA residents add sales tax. Check, money order,
- or bank card accepted. This comes with a booklet that
- carries you through a simple pocket calculator example, but
- contains the all-important parser generator needed for our
- tiny Pascal compiler. If you want to go beyond 25
- productions, the unlimited Qparser system will cost you
- $400.
-
- Source for the tiny Pascal compiler. Order from QCAD
- Systems, same address and phones as above. Ask for the
- "tiny Pascal disc": $10 plus $2 shipping and handling, plus
- sales tax, etc.
-
- The Chasm assembler. You can purchase an abridged version
- of this assembler (but powerful enough to support tiny
- Pascal) from PC Software Interest Group, 1030 E. Duane,
- suite D, Sunnyvale, CA 94086. Call 408/730-9291 for
- membership and ordering information. Ask for disk number
- 10. A membership in PC-SIG is $20 for one year, and
- entitles you to any number of software discs at $6 each from
- their catalog of over 540 discs.
- You can also order a full Chasm assembler for $40 directly
- from the author, David Whitman, 136 Wellington Terrace,
- Lansdale, PA, 19446, phones 215/641-7114 (day), 215/362-8526
- (evenings).
- Tiny Pascal was written to be compatible with Chasm, but
- it can easily be adapted to other assembler conventions.
- The Qparser demo system disc can also be ordered from
- PC-SIG for $6 -- ask for disk number 419. (This version has
- the booklet information on the disc as a README file.)
-
- References
-
- [1] Compiler Construction--Theory and Practice, Barrett,
- Bates, Gustafson and Couch, Science Research Associates,
- Chicago, second edition. This is a companion textbook to
- the Qparser system. This and reference [2] develop the
- theory of grammars, parsers, translation, symbol table
- management and much more.
-
- [2] Principles of Compiler Design, Aho and Ullman,
-
-
- page 30
-
-
- Pascal in 25 Rules or Less
-
-
- Addison-Wesley.
-
- [3] 8086 Family User's Manual, Intel Corp, 9800722-03.
- Order from the Literature Dept, Intel Corp, 3065 Bowers Ave,
- Santa Clara, CA, 95051. A definitive volume on the 8086
- architecture, development systems, software and related
- semiconductor products.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- page 31