home *** CD-ROM | disk | FTP | other *** search
Text File | 1985-09-30 | 51.5 KB | 1,454 lines |
-
-
-
- QPARSER Demonstration System
-
- Version 2.3, September 3, 1985
-
- Copyright (C) 1985, QCAD Systems, Inc.
-
- Introduction
-
- QPARSER provides a strategy and a set of software tools for compiler and
- translator development that we and other professionals have found to be
- highly productive. The strategy is one of writing semantic actions that
- respond to production rule ``triggers''. The tool set includes an LR parser
- generator and several fully worked-out example programs.
-
- QPARSER provides the grammar-directed ``front-end'', as well as several
- software tools useful in developing the semantics ``back-end'' of a
- compiler. A comprehensive user manual is part of the full system.
-
- The front-end is automatically generated, solving several difficult software
- problems. This is the half that scans and analyzes the input language forms,
- and is capable of extracting meaning from them by relating them back to the
- rules of the defining grammar.
-
- The back-end, which must be specially written for each application, is the
- operational side of the translator. It performs actions in response to the
- extracted meaning of input forms. Since translator actions are very general
- in nature, it's necessary to write these as Pascal or C program fragments
- for your application.
-
- QPARSER will generate a parser automatically, but not a complete translator.
- The parser will contain the necessary source file access, a lexical scanner,
- symbol table functions, debugging tools and semantics tools.
-
- QPARSER Features
-
- The principal features of QPARSER are:
-
- * Correctness. The language accepted by the translator will be exactly
- that specified by the grammar.
-
- * Detection of Ambiguity. If your grammar is ambiguous, or if the
- parser is unable to assure a correct translation, a diagnostic message
- will be printed.
-
- * Automatic generation of two key components. The lexical scanner and
- the LR parser are automatically generated.
-
- * Efficient table-driven parser operations. For a large grammar, most
- compiler experts agree that a table-driven LR1 approach is superior in
- both performance and space requirements to any other. Several
- optimizations are made to minimize table size.
-
- * Modularity of semantic operations. A translator programming task
- will become neatly partitioned into a number of relatively water-tight
- compartments.
-
-
-
- page 1
-
-
-
-
- QPARSER Demonstration System
-
-
- * Efficient lexical scanner. The lexical scanner operates by a CASE
- statement transfer on the next character to be scanned from the
- source. No backtracking, table searching or waste motion is involved.
-
- * Efficient and powerful symbol table mechanism. Symbol searching is
- built into the lexical scanner, which is automatically generated.
-
- * Error recovery. A reasonable error recovery system is built into
- each translator. No extra parser table space is required.
-
- * Ease of extending or changing the language. Once a translator is
- written following the guidelines in the full user manual, it'll be
- easy to extend or modify.
-
- * Ease of extending or changing the lexical scanner. The scanner is
- provided in well-documented source form. Its underlying assumptions
- are described in the user manual.
-
- What's on the Demo Disk
-
- The demo system diskette contains the following files:
-
- ShowQP.COM -- a quick demo program
- QPDEMO.TXT -- needed by ShowQP
- LR1.COM -- the parser generator program
- LR1P.COM -- skeleton expander
- PMACS.TXT -- Pascal form macro file
- LR1SKEL.PAS -- a parser ``skeleton'' file
- CALCDBUG.PAS, SKELRTBL.PAS, SKELNUM.PAS,
- SKELSYMS.PAS, SKELDTBL.PAS -- skeleton source files
- CALC.GRM -- grammar definition for calculator
- CALCSKEL.PAS -- skeleton file customized for calculator
- CALCSEM.PAS, CALCUTIL.PAS -- calculator-specific semantics
- CALC.COM -- executable calculator file
- READ.ME -- this text file
-
- The Quick Demo
-
- Select the floppy drive containing the demo diskette as the default drive.
- For example, if your floppy drive is A, then enter A: at the MS-DOS command
- level. Then run ShowQP by typing its name. Youll be able to step through a
- simple demonstration of some of Qparsers features. (If you see some peculiar
- characters, you probably need to adjust your CONFIG.SYS file according to
- the directions at the beginning of the demo.)
-
- Configuration
-
- We're going to assume that your microcomputer system is an IBM PC/XT with
- the PC-DOS operating system, though you can also run the demos on an
- IBM-compatible machine. You'll also be using some of the standard DOS
- functions in the demonstrations. A Pascal compiler and screen-oriented
- editor would make this demo system more meaningful, but aren't necessary.
-
- The full QPARSER system requires a Pascal or C compiler and a text editor.
- We recommend a system with at least 256 Kbytes of memory and a hard disk.
- Turbo Pascal (which includes a screen-oriented text editor) is provided with
-
-
- page 2
-
-
-
-
- QPARSER Demonstration System
-
-
- the full system, but not with the demo system. Order a copy of Turbo Pascal
- from QCAD Systems, and use it for additional experiments with the demo
- system.
-
- Installation
-
- If your computer has a hard disk: Copy the files from the QPARSER demo
- diskette to a new directory using the following commands, which assume that
- the hard disk is the default drive:
-
- mkdir \qparser
- copy a:*.* \qparser
- cd \qparser
-
- You will now be able to run the calculator examples from the \qparser
- directory.
-
- If your computer does NOT have a hard disk: The QPARSER demo diskette is
- double-sided, which may not be readable from certain older PC versions. Call
- us if this is a problem -- we can supply the demo system on two single-sided
- diskettes. In any case, we recommend copying some of the demo diskettes
- files -- for example, its .TXT and .PAS files -- to a blank diskette. The
- following formats a new diskette, then copies all .PAS files from disk drive
- a to drive b:
-
- format b:
- copy a:*.pas b:
-
- Getting Started
-
- QPARSER is easy to try out. The following operations will analyze the
- ``pocket calculator'' grammar and prepare a source Pascal file for a working
- interactive calculator.
-
- Start by entering the two commands
-
- LR1 CALC
- LR1P CALC -s CALCSKEL
-
- CALC.GRM contains the calculator's grammar and is read by the first command
- (the default extension .GRM is assumed). LR1 produces a grammar table
- CALC.TBL. The second command reads CALC.TBL and CALCSKEL.PAS and produces a
- full Pascal program file CALC.PAS. A short report file will appear on the
- console.
-
- Ordinarily, the CALC.PAS file would then need to be compiled, but we've
- supplied a compiled version of this file (CALC.COM) so that you can execute
- the calculator example and try out the parsing diagnostic tools without a
- compiler.
-
- If you have a Pascal compiler, you might try compiling CALC.PAS instead. It
- follows Turbo Pascal conventions, and should compile without any complaints
- with that compiler. If you are using another Pascal compiler, you may have
- to edit CALC.PAS or CALCSKEL.PAS in order to accommodate compiler
- differences.
-
-
-
- page 3
-
-
-
-
- QPARSER Demonstration System
-
-
- Executing CALC.COM
-
- When you execute CALC.COM, you should see a prompt (>). You'll then be able
- to type arithmetic expressions, whose values will be reported on the
- console. For example,
-
- 5 + 6.5*(9 - 3)
-
- yields the result 44. An assignment statement, e.g.,
-
- x := 5 + 15
-
- causes variable x to be declared and receive the value 20. Then an
- assignment such as
-
- y := x + 5
-
- can be evaluated, associating value 25 with variable y.
-
- The point of this exercise is not to run a calculator, but to demonstrate
- that a simple grammar and a short semantics file is sufficient to produce an
- interpreter for a simple language. We're next going to look at the grammar
- file (CALC.GRM) and the semantics files (CALCSEM.PAS and CALCUTIL.PAS) to
- see what QPARSER does and what is required of you as the programmer.
-
- The Calculator Grammar
-
- Any of the source files can be examined by using a text editor of your
- choice, or the EDLIN or MORE command -- see the DOS reference manuals for
- instructions.
-
- The calculator grammar file CALC.GRM is very short, so we'll reproduce it
- below:
-
- \ CALC -- A simple four-function calculator grammar
- \ which allows variable assignments.
- Goal -> Stmts QUIT #quit
- Stmts -> Stmts Stmt
- -> Stmt
- Stmt -> Expr <eol> #prtval
- -> <identifier> := Expr <eol> #assign
- -> <eol> \ allow an empty line
- Expr -> Expr + Term #plus
- -> Expr - Term #minus
- -> Term
- Term -> Term * Fact #mpy
- -> Term / Fact #divide
- -> Fact
- Fact -> Primary
- -> - Primary #uminus
- Primary -> ( Expr ) #parens
- -> <identifier> #variable
- -> <real> #realval
- -> <integer> #intval
-
- What does all this mean? First, anything starting with a backslash (\) is
-
-
- page 4
-
-
-
-
- QPARSER Demonstration System
-
-
- just a comment. Secondly, a symbol preceded by a sharp sign (#) is a
- production rule tag, whose purpose will be explained later. What remains is
- a list of production rules, whose general form is like this:
-
- <left-member> -> <right-member>
-
- Whenever a left-member is missing, it is assumed to be the same as for the
- preceding rule. Thus the third rule, written
-
- -> Stmt
-
- above, is actually
-
- Stmts -> Stmt
-
- The primary purpose of a grammar is to define a language -- in this case,
- the algebraic formulas and assignment statements that can be legally entered
- when the calculator is executed. A grammar is a set of production rules, or
- just rules for short. Each rule has exactly one left member element and one
- or more right member elements, each of which we shall call a token. The
- tokens are separated by one or more spaces.
-
- For example,
-
- Term -> Term * Fact
-
- is a rule in this grammar. Its left member token is Term, and the three
- right member tokens are Term, *, and Fact. It says, in essence, that any
- input string form that is to be considered a Term must consist of a Term,
- followed by an asterisk (*), followed by a Fact.
-
- A Term and a Fact can each be expanded into any of a large number of token
- sequences, while the asterisk (*) stands for itself. Tokens of the former
- kind are called nonterminals, while tokens of the latter kind are called
- terminals.
-
- Another rule is
-
- Term -> Term / Fact
-
- The left member token Term has been omitted in the grammar file, and is
- therefore inherited from the previous rule in the grammars rule list. This
- rule says that a Term consists of a Term followed by a slash (/) followed by
- a Fact.
-
- The first rule in the list
-
- Goal -> Stmts QUIT
-
- will expand into the entire language. The left member token Goal of this
- rule is not permitted anywhere else in the grammar. This rule says that any
- legal language form must consist of a member of Stmts followed by the token
- QUIT.
-
-
-
-
-
- page 5
-
-
-
-
- QPARSER Demonstration System
-
-
- How a GOAL Symbol Expands into Forms
-
- Lets start with the first rule:
-
- Goal -> Stmts QUIT
-
- We see that Stmts appears on the left of two rules:
-
- Stmts -> Stmts Stmt
- Stmts -> Stmt
-
- This means that we can choose either one of these two rules, and then
- substitute the right members for one appearance of the left member. Let's
- choose the first rule, and make the substitution:
-
- Goal -> Stmts QUIT -> Stmts Stmt QUIT
-
- Well, we don't seem to be getting anywhere, because the token Stmts appears
- in our new, expanded form. However, we can close this cycle by choosing the
- second Stmts rule for a substitution:
-
- Goal -> Stmts QUIT -> Stmts Stmt QUIT -> Stmt Stmt QUIT
-
- It should be clear that we can get a sequence of Stmt forms by choosing the
- first rule several times. Each substitution step is called a derivation
- step; we are deriving a sentence in the language from the Goal token.
-
- Now let's look at the Stmt rules. There are three of them, as follows:
-
- Stmt -> Expr <eol>
- Stmt -> <identifier> := Expr <eol>
- Stmt -> <eol>
-
- The special token <eol> is a predefined terminal token that stands for an
- enter. The special token <identifier> stands for any Pascal-like user name;
- for example, L1, SAM_SMITH, MARY_JONES are all legal <identifier>s. The Expr
- is associated with some more rules, permitting more substitutions.
-
- Let's choose the first of these Stmt rules for the first substitution. We
- then have, starting from the Goal again:
-
- Goal -> Stmts QUIT -> Stmt QUIT -> Expr <eol> QUIT
-
- More substitutions are now possible. There are three Expr rules; two of
- these include an Expr and a Term. There are also three Term rules, etc.
- Notice that the algebraic tokens +, -, *, / appear in these.
-
- We'll continue the expansion by making various choices along the way. We
- suggest trying one yourself on paper, to see just how many different ways a
- Goal can be expanded into sequences of algebraic formulas.
-
-
-
-
-
-
-
-
- page 6
-
-
-
-
- QPARSER Demonstration System
-
-
- Goal -> Stmts QUIT
- -> Stmt QUIT
- -> Expr <eol> QUIT
- -> Expr + Term <eol> QUIT
- -> Term + Term <eol> QUIT
- -> Fact + Term <eol> QUIT
- -> Primary + Term <eol> QUIT
- -> <integer> + Term <eol> QUIT
- -> <integer> + Fact <eol> QUIT
- -> <integer> + Primary <eol> QUIT
- -> <integer> + ( Expr ) <eol> QUIT
- -> <integer> + ( Term - Term ) <eol> QUIT
- -> <integer> + ( Term * Fact - Term ) <eol> QUIT
- -> <integer> + ( Fact * Fact - Term ) <eol> QUIT
- -> <integer> + ( Primary * Fact - Term ) <eol> QUIT
- -> <integer> + ( <real> * Fact - Term ) <eol> QUIT
- -> <integer> + ( <real> * Primary - Term ) <eol> QUIT
- -> <integer> + ( <real> * <identifier> - Term ) <eol> QUIT
- -> <integer> + ( <real> * <identifier> - Fact ) <eol> QUIT
- -> <integer> + ( <real> * <identifier> - Primary ) <eol> QUIT
- -> <integer> + ( <real> * <identifier> - <integer> ) <eol> QUIT
-
- Whew! Fortunately, that's as far as we can go, because all the tokens are
- terminal. Now, <integer> stands for any decimal integer, and <real> for a
- floating-point number. So our derived sentence might look like this:
-
- 15 + (17.65 * SAM - 45) <eol> QUIT <eol>
-
- The <eol> will be interpreted as an `enter' when the parser is executed. The
- last enter will cause an exit from the calculator program.
-
- Parsing
-
- We've seen how a grammar can be written that defines some language we're
- interested in. At this point, QPARSER can take over and produce a program
- that can invert these derivation steps. When a derivation step is inverted,
- we say that a reduce step or a reduction has occurred. Such a program is
- called a bottom-up parser.
-
- The program LR1 performs an analysis of the grammar and produces a set of
- parser tables. Then program LR1P produces a parser program in Pascal source,
- using a Pascal-based macro file PMACS.
-
- The actions performed by LR1 and LR1P are roughly as follows:
-
- 1. The nonterminal and terminal tokens are identified. The terminal
- tokens are used to construct a lexical scanner, which when executed
- reads source strings and breaks them into the terminal token pieces.
-
- 2. The grammar rules are then used to construct a so-called LR state
- machine, which is an abstract description of a parser.
-
- 3. The abstract state machine description is turned into a specific
- Pascal or C program. The state machine is carried in the program in
- tabular form, and the parser is a table interpreter. The lexical
- scanner is expressed directly as Pascal or C statements.
-
-
- page 7
-
-
-
-
- QPARSER Demonstration System
-
-
-
- Following the Parser's Steps
-
- You can follow the parsing steps of the CALC.GRM grammar by running CALC.COM
- again. This time, type the character ! before entering an expression, for
- example,
-
- ! 15 + (17.65 * 22 - 45)
-
- The ! character will bring up a debugging prompt before any of the rest of
- the line is scanned. The debug prompt looks like this:
-
- I(dentifier, D(ebug level, A(ll symbols, C(ontinue?
-
- Type D, then 1 (enter), to choose debug level 1. You'll get the debug prompt
- again, so type C to continue. You should see a production printed:
-
- Primary -> <integer>
-
- This indicates that the 15 in the input string has been scanned (the
- <integer>) and has been reduced to a Primary. The debug prompt will again
- appear, so type C. The next production appears:
-
- Expr -> Term
-
- This indicates that Primary has been reduced to Term and now Term is about
- to be reduced to Expr. The production rule Term -> Primary will always be
- skipped over during parsing due to an optimization in the parsing tables.
-
- Typing the next C will bring up
-
- Primary -> <real>
-
- This says that the 17.65 in the input string has been scanned and reduced to
- a Primary. Keep typing C, and you'll see the rest of the productions. Here
- they are, as they will appear during the parsing process; weve added
- comments to show what's happening:
-
- Primary -> <integer> \ 15
- Term -> Term * Fact \ 17.65 * 22 = 388.3
- Expr -> Term
- Primary -> <integer> \ 45
- Expr -> Expr - Term \ 388.3 - 45 = 343.3
- Primary -> ( Expr )
- Expr -> Expr + Term \ 15 + 343.3 = 358.3
- Stmt -> Expr <eol> \ the whole line
-
- At this point, C will print a result and produce another prompt:
-
- = 358.3
- >
-
- Lets type in QUIT. We'll then see one more production:
-
-
-
-
-
- page 8
-
-
-
-
- QPARSER Demonstration System
-
-
- > QUIT
- Goal -> Stmts QUIT
-
- One more C and the CALC program is done.
-
- Looking at the Semantics Stack
-
- In the above example, you've surely noticed that the first number (15) was
- scanned, then set aside temporarily until the parenthesized expression
- (17.65*22-45) could be scanned. Where was the value 15 kept while this extra
- work was going on?
-
- The answer is that it was held on a semantics stack. The semantics stack is
- a fundamental data structure of QPARSER. Every time a token is read from the
- input stream, state information is pushed on the stack associated with the
- token. A token like <integer> causes the integer value to be saved on the
- stack. The token <real> causes a floating-point number to be saved, and the
- token <identifier> causes a name to be saved.
-
- When a production rule is applied, the top of the semantics stack will
- always carry information that corresponds to the right members of the
- production. Let's do another trace and see how that information is carried.
-
- Execute CALC.COM again with the same expression:
-
- > ! 15 + (17.65*22-45)
-
- This time, choose D(ebug level 2, then C(ontinue. Youll see the following
- display:
-
- <integer> FIXED: 15
- Primary -> <integer>
-
- This says that the semantics stack contains just one item, the first integer
- in the input stream. FIXED refers to its type, and the 15 is of course its
- value as held in the stack.
-
- The next C(ontinue produces the following:
-
- Primary FLOAT: 1.500E+01
- Expr -> Term
-
- This announces that the fixed point number 15 has been converted into a
- floating point number. That action was caused by a semantics action in
- response to the production Primary -> <integer>. We'll see where that
- happened in the source program later.
-
- The next C(ontinue produces the following:
-
- Expr FLOAT: 1.500E+01
- + OTHER:
- ( OTHER:
- <real> FLOAT: 1.765E+01 \ TOS
- Primary -> <real>
-
- Here we've scanned the +, the ( and the real number 17.65. The stack has
-
-
- page 9
-
-
-
-
- QPARSER Demonstration System
-
-
- four items -- its top (noted as TOS -- ``top-of-stack'' -- above) is shown
- at the end of the stack report. The production Primary -> <real> is about to
- be applied, and its right member refers to the <real> on the top of the
- stack.
-
- Now we see where the 15 went while the remainder of the expression is being
- worked on -- it's held in the stack beneath the remaining material. It will
- reappear later as the other material is reduced and popped from the stack.
-
- Push C twice -- well skip one step involving the production Primary ->
- <integer>. You should see the following stack:
-
- Expr FLOAT: 1.500E+01
- + OTHER:
- ( OTHER:
- Primary FLOAT: 1.765E+01 \ TOS-2
- * OTHER: \ TOS-1
- Primary FLOAT: 2.200E+01 \ TOS
- Term -> Term * Fact
-
- The productions right member (Term * Fact) seems not to correspond with the
- three top stack members, but it does. The parser generator has optimized
- parsing by eliminating some unnecessary reduction steps. The Term
- corresponds to the Primary at TOS-2, and the Fact corresponds to the Primary
- at TOS.
-
- This production indicates that a multiplication is supposed to take place
- between the numbers in TOS and TOS-2. The product is 388.3 and will appear
- after you hit C again:
-
- Expr FLOAT: 1.500E+01
- + OTHER:
- ( OTHER:
- Primary FLOAT: 3.883E+02 \ TOS
- Expr -> Term
-
- Notice that the top three stack elements have been removed and replaced by a
- single element -- the floating point number 388.3. Hit C twice. You should
- get the following display:
-
- Expr FLOAT: 1.500E+01
- + OTHER:
- ( OTHER:
- Expr FLOAT: 3.883E+02 \ TOS-2
- - OTHER:
- Primary FLOAT: 4.500E+01 \ TOS
- Expr -> Expr - Term
-
- Here, the last number (45) has been scanned. The stack is set up for a
- subtraction. Since the Expr in the production corresponds to TOS-2, and the
- Term to TOS, the calculator program is expected to subtract 45 from 388.3,
- yielding 343.3. That's shown in the next display, after hitting C:
-
-
-
-
-
-
- page 10
-
-
-
-
- QPARSER Demonstration System
-
-
- Expr FLOAT: 1.500E+01
- + OTHER:
- ( OTHER:
- Expr FLOAT: 3.433E+02
- ) OTHER:
- Primary -> ( Expr )
-
- This production won't do any arithmetic, but must make sure that the Expr
- value 343.3 will appear on the stack top, associated with the Primary. The
- reduce action causes the parentheses to disappear, yielding:
-
- Expr FLOAT: 1.500E+01
- + OTHER:
- Primary FLOAT: 3.433E+02
- Expr -> Expr + Term
-
- At last, the 15 reappears at the right place to perform an addition. That
- operation yields 358.3, and the smaller stack:
-
- Expr FLOAT: 3.583E+02
- <eol> OTHER:
- Stmt -> Expr <eol>
-
- This production prints the resulting Expr value on the console, so after
- hitting C, you should see:
-
- = 358.3
- >
-
- Type in ``QUIT'', then another C to end the program.
-
- Some Observations
-
- We've run through a complete trace for the pocket calculator example. What
- should be apparent is how the form of the grammar rules determine when
- operations are needed, and also how the semantics stack mechanism ensures
- that information is available when it is needed.
-
- The parser generator system has solved several algorithmic tasks for you:
-
- * The derivation of an input string has been uncovered and
- reproduced in backward order, as a parser.
- * Information is carried on the semantics stack until needed later.
- * Names and numbers are scanned and put into the stack at the right
- place.
- * When each production rule pops up, the top of the semantics stack
- carries just what is needed for the operation it implies.
-
- You can do any number of different things when a production rule pops up.
- We'll look at how that works next. We'll also take a look at some of the
- type declarations and procedures that implement the parser.
-
-
-
-
-
-
-
- page 11
-
-
-
-
- QPARSER Demonstration System
-
-
- Semantics Coding
-
- Most of what you've seen in the semantics trace comes through the parser
- generator tools and the skeleton files. However, one important part isn't
- automatic -- you must write a semantics procedure called APPLY.
-
- Procedure APPLY is the interface between the automatically generated parser
- and the semantics operations that you must design and write. Please look at
- APPLY, founnd in file CALCSEM.PAS, while you read on.
-
- APPLY is essentially a large CASE statement, whose tag names come from the
- tags attached to the production rules in the grammar. We've deferred
- discussion of these tags until now. Recall that a production rule tag is the
- name following a sharp sign in a production. For example, in CALC.GRM, the
- production
-
- Expr -> Expr + Term #plus
-
- carries the tag plus, which you will find in the CALCSEM.PAS source. The
- APPLY action for this production rule is essentially to collapse the
- information in the top three elements of SEMSTACK (corresponding to the
- right member Expr + Term) into a single element (corresponding to the left
- member Expr), as shown in figure 1.
-
- -----------------------------------------------------
- TOS
- +---+---+---+---+---+
- | | | E | + | T | semstack: BEFORE apply action
- +---+---+---+---+---+ production E -> E + T
- ^ ^ ^
- | | +---- stackx
- | +-------- stackx-1
- +------------ stackx-2
- . . . . . . . . . . . . . . . . . . . . . . . . . . .
-
- TOS
- +---+---+---+
- | | | E | semstack: AFTER apply action
- +---+---+---+
- ^
- +------------ stackx
-
- Figure 1. Before and after APPLY operations on production
- E -> E + T
- ----------------------------------------------------------
-
- Here's what procedure APPLY looks like, with some of the material removed
- for the sake of discussion:
-
-
-
-
-
-
-
-
-
-
- page 12
-
-
-
-
- QPARSER Demonstration System
-
-
- procedure APPLY(PFLAG, PRODLEN: int; TSEMP: semrecp);
- { Apply customized for the calculator. All operations are kept
- track of on the semantic stack. }
- begin
- case pflag of
- { omitted material }
- DIVIDE, {Term -> Term / Fact}
- MINUS, {Expr -> Expr - Term}
- MPY, {Term -> Term * Fact}
- PLUS: {Expr -> Expr + Term}
- eval_binop(pflag, semstack[stackx-2],
- semstack[stackx], tsemp);
- { omitted material }
- end { apply case }
- end;
-
- Notice the PLUS tag. When the production Expr -> Expr + Term is ready to be
- applied, the parser system will call APPLY with a value of PFLAG that will
- direct control to the PLUS statement. This statement happens to be shared
- with DIVIDE, MINUS, and MPY, but it doesn't have to be. You can -- and in
- fact you must -- distinguish production rules with distinct tag names. You
- can also omit the tag on any production that requires no special operation.
-
- Notice that the function EVAL_BINOP passes the flag parameter PFLAG, and
- three more parameters. The second parameter, SEMSTACK[STACKX-2], refers to
- the TOS-2 element of the semantics stack. The third parameter,
- SEMSTACK[STACKX], refers to the TOS element. As we've seen in the trace,
- these carry information associated with the token.
-
- The result of a stack operation is passed back through the variable TSEMP.
-
- Procedure EVAL_BINOP
-
- This procedure is in file CALCUTIL.PAS. Note that the two operands are
- passed as OPND1 and OPND2, respectively, and these are of type SEMRECP,
- which is the type of SEMSTACK[xxx].
-
- EVAL_BINOP just sorts out which operation it is to perform, checking for the
- possibility that the entry might be OTHER. (OTHER will creep in through a
- syntax error recovery patch.) The RESULT value will always be a real number,
- and SEM_TYPE will therefore always be FLOAT. EVAL_BINOP also attempts to
- produce a reasonable result, even though either of the operands might be the
- default type OTHER.
-
- The procedure WRITE_VALUE, also in CALCUTIL.PAS, essentially decides how to
- print a number. All numbers are stored in floating-point form, including
- integers, but a real number might be sufficiently close to an integer to
- justify printing it that way. That's most of WRITE_VALUE. There are clearly
- other strategies that can be adopted.
-
- The last two procedures, INIT_SEM and END_SEM, are called before any parsing
- action is taken, and after all actions are complete, respectively. You can
- use these for any initializations required at these times. Usually none is
- required.
-
-
-
-
- page 13
-
-
-
-
- QPARSER Demonstration System
-
-
- The Types of SEMSTACK and TSEMP
-
- You may wish to look at the Pascal type of the variables SEMSTACK and TSEMP.
- They are part of the global declarations in file CALCSKEL.PAS. SEMSTACK is
- an array of pointers to the record type SEMREC. Its top of stack will be the
- integer index STACKX.
-
- Record SEMREC carries different cases for the different kinds of things to
- be carried in the semantic stack -- IDENT for identifiers, FIXED for
- integers, FLOAT for real numbers, etc. The record variable SEMT indicates
- which kind of object is in place, making it possible for a general function
- like EVAL_BINOP to decide how to carry out an operation.
-
- The default SEMT type OTHER is used for semantic objects that carry no
- special information, such as the operation tokens +, *, etc.
-
- TSEMP is a pointer to a SEMREC record. In passing information back through
- an APPLY call, the record field pointed to by TSEMP should be filled with
- something that will appear on the stack. You can choose to do nothing, in
- which case TSEMP will point to the default case OTHER. There's also a
- special case that's handy to know about: any single production with no tag
- causes its right member to be copied into its left member. A single
- production is a production rule with exactly one right-member token. For
- example, the rule
-
- Fact -> Primary
-
- qualifies as a single production. When its tag is absent, the Primary
- information is just carried along on the stack without change.
-
- Symbols and the Symbol Table
-
- QPARSER provides a complete set of symbol table access functions. You may
- wish to look at their source, in file SKELSYMS.PAS. The global declarations
- associated with these functions are in CALCSKEL.PAS -- look at the record
- SYMTABTYPE, and the other types and variables associated with it.
-
- A symbol is picked up by the lexical scanner whenever the grammar token
- <identifier> is scanned. The lexical scanner in QPARSER assumes that names
- and other tokens follow Pascal rules -- but you can modify or rewrite the
- scanner to suit other language disciplines.
-
- When an <identifier> is scanned, a default symbol table entry is built for
- it, and a pointer to that entry appears in the semantics stack. Symbol table
- entries have the form of the SYMTABTYPE record, and are allocated from the
- Pascal heap. An efficient name search algorithm is used, based on a hash
- table. These search algorithms are expressed in the functions in file
- SKELSYMS.PAS.
-
- The skeleton file symbol table functions are suitable for flat or
- block-scoped names. The user manual and the mini-Pascal compiler (included
- in the full QPARSER package) describe how they can be used in considerable
- detail to cover a variety of special situations. Names can be unlinked from
- the table at the end of a block, and can also be deallocated from the heap.
-
-
-
-
- page 14
-
-
-
-
- QPARSER Demonstration System
-
-
- Tracing Symbols
-
- You can see how the symbol tracing system works by running CALC.COM again.
- This time, keep the debug level at 0, and enter each of the following
- formulas:
-
- X := 15
- Y := X - 40
- Z := X + Y
-
- The calculator should report that X = 15, Y = -25, and Z = -10. Now enter a
- ! character:
-
- !
-
- You will get the usual debug menu:
-
- I(dentifier, D(ebug level, A(ll symbols, C(ontinue?
-
- When you type A, you will be shown all the names known in the symbol table:
-
- Z Y X
-
- (The order of these names is an accident of the hash table.) When you type
- I, you'll be prompted for a name, and then its attributes as found in the
- symbol table will be reported:
-
- I(dentifier, D(ebug level, A(ll symbols, C(ontinue? I
- What symbol? y
- Y REAL_VAR 0: -2.5000E+01
-
- Some of these attributes will always be provided. Others will require
- writing a short program segment to produce. In the above line, Y, REAL_VAR
- and the 0 (scope level) are essentially automatic. Printing the numeric
- value (-2.5000E+01) requires adding a write to the standard debugging file.
- To see what's needed, look at CALCDBUG.PAS, which contains the trace and
- interactive dump procedures. Most of this is generic to any grammar or
- language. The line needed to support the calculator may be found in
- procedure DUMP_SYM, as follows:
-
- {---------- following line is specific to CALCDBUG -----------}
- real_variable: write(rfile, rval);
-
- Error Recovery
-
- The calculator program illustrates the generic error recovery system
- provided with every generated translator. Error recovery operates from the
- parsing tables used by the parser. It attempts to find a local patch in the
- vicinity of the error. The patch must be compatible with the parsing
- operations and is searched for in a special procedure that does not
- interfere with the mainstream of parsing operations. When a patch is found,
- the machine state and stack are returned to the main parser such that
- parsing can continue as though there were no error.
-
- The error recovery system does not depend on line endings or special
-
-
- page 15
-
-
-
-
- QPARSER Demonstration System
-
-
- delimiter tokens. It will operate as effectively for a block-structured
- language as for a line-oriented language. It will usually find a reasonable
- patch in one step. It does not require scanning ahead through more tokens in
- order to find a patch, although that would help make a more effective
- decision.
-
- Try running CALC.COM with some invalid formulas. The following illustrates
- two cases and the recovery report. Note that the syntax error report points
- to the position of the error in the input line:
-
- > x y + 15
- ^
- ERROR: syntax error
- Type any character to continue:
- = 5.0000000001
-
- Here the patch yielded some new expression; the program nevertheless
- produced a partial result.
-
- > x := y ++ 15
- ^
- ERROR: syntax error
- Type any character to continue:
- Undefined variable ERR#AA
- X := -10.000000001
-
- Here, the patch amounted to creating and inserting a new identifier between
- the two + signs, ERR#AA. This will be tagged as an ERRSYM in the symbol
- table, but our semantics routine didn't pay attention to that and complained
- about the undefined variable. It's possible to continue semantics checking,
- working around the possible error insertions, but it would be easier to just
- suppress all semantics operations upon finding the first syntax error.
-
- SUMMARY: What's Automatic and What Isn't
-
- We can now review what QPARSER provides automatically and whats required of
- you as a programmer in order to generate a complete translator or compiler.
-
- In general, you must design and implement the grammar and the semantic
- functions. You will need to expand the number of categories of the SEMREC
- and SYMTABTYPE record structures, since these contain only a few simple
- types.
-
- When these are expanded, some additions will be needed (e.g. to
- SKELDBUG.PAS) to support symbolic tracing and debugging, as weve seen.
-
- Designing and writing the semantic functions start with the APPLY procedure.
- When you are starting from scratch with a grammar, the parser generator will
- produce a dummy APPLY procedure for you automatically. It will look like
- this, using the CALC.GRM grammar as an example:
-
-
-
-
-
-
-
-
- page 16
-
-
-
-
- QPARSER Demonstration System
-
-
- procedure APPLY(PFLAG, PRODLEN: int; TSEMP: semrecp);
- begin
- case pflag of
- ASSIGN:
- begin { Stmt -> <identifier> := Expr <eol> }
- writeln(rfile, ASSIGN applied.);
- end;
- DIVIDE:
- begin { Term -> Term / Fact }
- writeln(rfile, DIVIDE applied.);
- end;
- { ... material omitted ... }
- UMINUS:
- begin { Fact -> - Primary }
- writeln(rfile, UMINUS applied.);
- end;
- VARIABLE:
- begin { Primary -> <identifier> }
- writeln(rfile, VARIABLE applied.);
- end;
- end
- end;
-
- The generator has provided a separate CASE item for each tagged production,
- in alphabetic order, and has included the production for reference purposes
- as a comment. It has also provided a writeln to the report file rfile.
-
- You can compile and execute a generated program based on nothing more than a
- grammar. Tagged productions will announce themselves during its execution as
- indicated above. You can then start to fill in the separate slots in the
- APPLY case table with ``real semantics operations, testing the result as you
- add various modules.
-
- The generic tracing and debugging tools will also operate with no special
- additions, although they cannot report your SEMREC or SYMTABTYPE extensions
- without some help.
-
- The Full QPARSER System
-
- This demo guide is essentially the introductory chapter of the full QPARSER
- user manual. The full system contains, in addition to an unrestricted parser
- generator and the calculator example discussed above, all of the following:
-
- 1. A comprehensive user manual containing a language tutorial, a
- detailed discussion of the lexical scanner, symbol table, semantics
- stack, use of the heap, the example assembler, simulator and
- mini-Pascal compiler, and a technical summary. The user manual also
- contains listings of all supplied source programs.
-
- 2. A C skeleton and a companion macro file CMACS.TXT that will
- produce a complete, ready-to-compile parser in Kernighan and Ritchie
- C.
-
- 3. A grammar checking and analysis program. This can produce a
- cross-reference table of the grammar and report all terminal and
- nonterminal tokens. It looks for useless nonterminals and other
-
-
- page 17
-
-
-
-
- QPARSER Demonstration System
-
-
- grammar problems, such as a single production cycle. It can report
- FIRST, FOLLOW and PREDECESSOR sets, which are valuable in
- understanding LR conflicts and in designing semantics structures.
-
- 4. A simulator and assembler for a simple Pcode-style stack machine
- is provided, in Pascal source form. The assembler makes use of the LR
- translator system -- its grammar is also provided. If you want to
- write an assembler for any purpose or machine, this provides a good
- starting point.
-
- 5. A Pascal subset compiler is provided in Pascal source form, along
- with its grammar and a detailed discussion of its internals. This
- illustrates the design of a user type declaration scheme, full
- expression evaluation, constant folding, several control structures
- with some optimizations, compound name access evaluation with
- optimizations, and a procedure call/exit mechanism. If you want to
- write a compiler or interpreter for a language using any or all of
- these Pascal features, this grammar and source material is a good
- starting point.
-
- 6. A copy of Turbo Pascal is provided. All of the Pascal source
- materials are compatible with this compiler. It contains a full screen
- editor as well. You can order this product separately from us if you
- wish to explore the demo system more fully. Turbo Pascal has certain
- limitations, as summarized below. For a large project, you will likely
- prefer a more powerful Pascal or C compiler.
-
- Educational Applications
-
- Are you an instructor of a course in compiler construction or language
- theory? If so, you will find QPARSER a valuable laboratory addition to your
- course. The system has been adopted by several educational institutions
- (write or call for more information). The second edition of the textbook
- written by Dr. William Barrett supplements QPARSER, and is scheduled for
- release by SRA (Chicago) by December, 1985.
-
- In addition to the obvious grammar and semantics experiments that can be
- performed, the parser generator can produce detailed item sets that
- eventually form the LR state machine. The effects of various optimizations
- can also be followed, and the item-sets traced through to the final state
- machine.
-
- You can observe the development of the item-sets by selecting a fairly high
- report level when executing LR1 -- these range from 0 to 6 and provide an
- increasingly detailed set of reports. For example, here's a command line
- that will provide such a report file:
-
- LR1 CALC -l3 CALCRPT.TXT
-
- There are no hidden object files or undocumented features in QPARSER. LR1
- produces a set of parser tables that are clearly defined by the source
- access procedures and the parser itself. Many of the experiments can be
- performed without writing any Pascal programs.
-
-
-
-
-
- page 18
-
-
-
-
- QPARSER Demonstration System
-
-
- Copy Protection
-
- Only one program is copy protected in the full system -- thats the parser
- generator LR1. Its provided in object form only. The IBM versions use
- SoftGuard. After installing LR1 to a hard disk or floppy diskette, you can
- then freely execute LR1 without having the distribution diskette physically
- present. You can also install a second copy for backup purposes.
-
- The university editions enable multiple copies to be made from a single
- distribution diskette. The instructor must submit a signed qualification
- form to us.
-
- LR1 Specifications
-
- The parser generator LR1 constructs an item-set state machine with an LR(0)
- construction. Inadequate states are identified and converted into lookahead
- states. The lookahead token sets for reduce actions are computed by the
- DeRemer-Penello LALR(1) method, as reported in ACM Transactions on Program
- Languages & Systems, Vol 4, No. 4, October 1982.
-
- A failure to resolve a state by LALR results in a resolution report and a
- default resolution. Special tags may be added to production rules to guide
- the resolution process.
-
- Resolution may result in one or more useless lookahead states; these are
- removed along with any unreachable states found. If the halt state cannot be
- reached from the goal state, an error report is generated.
-
- Untagged single productions are bypassed if the LR state machine so permits.
- The resulting state tables are folded by identifying common subsequences.
- These tables are of a sparse-matrix type, resulting in great economy of
- representation of the final machine, without sacrificing performance.
-
- Several tables are produced to facilitate symbolic debugging -- these
- include production and token tables. These may easily be suppressed along
- with the debugging procedures for a production version of the translator.
-
- The automatically-generated lexical scanner assumes that Pascal-style
- lexical analysis is appropriate. These assumptions are explained in the user
- manual and are also manifest in the skeleton source files. A choice of
- Pascal or C is provided for automatic generation. For other host languages,
- you will have to write your own skeletons, perhaps by editing a Pascal or C
- skeleton.
-
- All the generated material and all the skeleton files are provided in
- machine-readable source form, permitting their adaptation to other host
- languages.
-
- Limitations of LR1
-
- The demo version of LR1 is limited to 25 production rules.
-
- The full system is limited to 255 terminal tokens and 255 nonterminal
- tokens. There are no other hard-bound limits other than maximum available
- heap memory size. LR1s data structures are allocated from memory in just
- sufficient quantities as needed. Space no longer needed is disposed or
-
-
- page 19
-
-
-
-
- QPARSER Demonstration System
-
-
- released. The program requires about 65 Kbytes, and the stack another 65
- Kbytes maximum. MS-DOS uses another 30 Kbytes. The rest of the memory is
- available for heap.
-
- Weve found that the mini-Pascal grammar, with 140 productions, requires
- about 200 Kbytes of heap space. LR1 completes in less than two minutes on an
- IBM XT.
-
- More About Turbo Pascal
-
- All of the examples supplied with QPARSER will compile and execute under
- Turbo Pascal. Turbo has a number of features that make it attractive for
- software development. Its screen-oriented editor is nicely integrated with
- the compiler. Syntax errors are reported through the editor by the screen
- cursor. Run-time errors can be traced to a single line in the program in the
- editor.
-
- Turbo may be ordered separated at the regular retail price from QCAD
- Systems, Inc. One copy is included with the full QPARSER system.
-
- However, you will find that Turbo has a number of size limitations that may
- interfere with the development of a large compiler system, as follows:
-
- * The compiled Turbo program space is limited to 64 Kilobytes. This limit
- can be exceeded somewhat by using overlays. The limit does not depend on the
- total memory of your machine.
-
- * The Turbo compiler does not support independently compiled modules. Thus
- the entire Pascal program, with all include files, must fit within the
- compilers space limitations.
-
- * The Turbo editor is limited to 63 Kilobytes of source data. It is
- possible that the LR1 generator will produce a file too large to be accepted
- by this editor. Note that EDLIN (on the IBM) can accept an arbitrarily large
- source file.
-
- * There are certain other limitations explained in the Turbo manual that
- may affect your development.
-
- The LR1 parser generator will make full use of the maximum memory of your
- system, however, and can generate an arbitrarily large program file.
-
- If you are impacted by the Turbo memory limitations, we recommend the use of
- a different editor and/or compiler for the compilation of the generated
- translators.
-
- Other Systems and Services
-
- QPARSER can be provided on the VAX/VMS, MacIntosh 512K, and the
- Hewlett-Packard series 200. Quantity and dealer discounts available. Site
- and source licenses available. Consultation with our expert staff on
- specific language or translator issues can be arranged. Call for details and
- prices.
-
-
-
-
-
- page 20
-
-
-
-
- QPARSER Demonstration System
-
-
- For Further Study
-
- We hope weve given you some notion of whats required to make effective use
- QPARSER user manual is itself a condensed course in language
- theory, and is available separately. The following is a short bibliography
- of books on compiler theory.
-
- Compiler Construction: Theory and Practice, Barrett and Couch, Science
- Research Associates, Inc, second edition, 1985.
-
- Principles of Compiler Design, Aho and Ullman, Addison-Wesley, 1979.
-
- The Theory of Parsing, Translation, and Compiling, two volumes, Aho and
- Ullman, Prentice-Hall, 1973.
-
-
-
- Turbo Pascal is a registered trademark of Borland International.
- QPARSER is a trademark of QCAD Systems, Inc.
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
- page 21
-
-
-
-
- QPARSER Demonstration System
-
-
- PRODUCT ORDER FORM
-
- QCAD Systems, Inc.
- 1164 Hyde Avenue, San Jose, CA 95129 (408) 727-6671
- TOLL-FREE: (800) 538-9787 (outside California)
-
- Quantity Item Unit Price Total
- -----------------------------------------------------------------------------
- Qparser Industrial system $400.00
-
- Qparser University system* 400.00
-
- Qparser Demo Package 10.00
-
- Qparser User Manual 25.00
-
- Turbo Pascal 3.0 69.95
-
- Qtools -- Unix-like tool kit 49.50
-
- Qroff -- Laserjet typesetting system 79.95
-
- SUB-TOTAL
- TAX (CA residents)
- SHIPPING (overseas)
- GRAND TOTAL
-
- * Please enclose educational agreement
-
- Name __________________________________________________________________
-
- Company _______________________________________________________________
-
- Address _______________________________________________________________
-
- City _________________________________ State ____________ ZIP _________
-
- Computer System _______________________________
-
- Method of payment:
-
- ___ check enclosed
- ___ charge card #: __________________________ Exp: ____________
-
- Name on card: _____________________________________________
- Visa ___ Mastercard ___
-
- ___ purchase order #: _______________________
-
- (prices subject to change without notice)
-
- OVERSEAS SHIPPING: $5.00 each for Qroff, Qtools, Qparser demo;
- $20.00 for full Qparser system; $10.00 for Qparser manual or Turbo.
-
-
-
-
-
- page 22
-
-
-
-