The Education Master 1994 (4th Edition)

home *** CD-ROM | disk | FTP | other *** search

/ The Education Master 1994 (4th Edition) / EDUCATIONS_MASTER_4TH_EDITION.bin / files / progmisc / qparser2 / qparser.doc

Wrap

Text File | 1993-11-07 | 112.2 KB | 2,564 lines

QPARSER+ User Manual page 1 QPARSER+ SHAREWARE User Manual QCAD Systems 1164 Hyde Avenue, San Jose, CA 95129 Table of Contents Chapter 1. Introduction page 2 Chapter 1. QPARSER+ in Brief page 3 Chapter 2. Parser Generator Overview page 7 Chapter 3. Grammars page 11 Chapter 4. Building a Parser page 15 Chapter 5. The Semantics Stack page 17 Chapter 6. Accessing Semantic Elements page 23 Chapter 7. More about the Lexical Analyzer page 27 Chapter 8. Embedded Semantics page 29 Chapter 9. Using the DEBUG Option page 33 Chapter 10. Syntax Trees page 34 Chapter 11. Miscellaneous Topics page 44 Chapter 12. Debugging a Translator page 46 Chapter 13. Registration and Support page 49 Chapter 14. For Additional Reading page 50 QPARSER+ User Manual page 2 Chapter 1. Introduction Almost every computer program depends upon some type of textual input. The language associated with that input may be as complex as a programming language like Pascal and C or as simple as a series of yes and no responses. QPARSER+ provides the utilities for the development of a translator capable of understanding an input form and then converting that form into an alternate output form. This definition of computer translation may sound quite foreign to you, but if you work with computers it is quite possible that you use a `translator' every day since one of the most common translator applications is the language compiler. A product like Turbo Pascal, for example, is a translator used to convert Pascal statements into machine code. QPARSER+ may be used to define new programming languages, but that is by no means the only use for the QPARSER+ system. For example, many of our customers have used the product to write user-friendly interfaces for their programs since QPARSER+ facilitates the development of interfaces capable of understanding complex user-input forms. The applications for QPARSER+ are almost limitless since almost every computer program uses language in some form. With a little imagination, QPARSER+ can be used to develop products ranging from Artificial Intelligence systems to word processor translators capable of converting documents from one word processor `language' into another. Qparser+ is a professional, world-class product with an installed base of over 400 users. It has been preferred to Lex and Yacc by many of its users for its ease of use, its advanced syntax tree features, its generation of full source code, and the ease with which the source code can be modified for the sake of performance or unusual conditions that often arise in language implementation. It was designed and written by Dr. William Barrett, an author of a college text on compiler construction, and a languages/compiler specialist and consultant for more than 15 years. QPARSER+ User Manual page 3 Chapter 1. QPARSER+ in Brief The QPARSER+ user must supply the structural rules of the input language as well as the actions to be taken when language structures are recognized. QPARSER+ will then take these two inputs and produce a complete translator capable of interpreting languages as complex as Pascal or as simple as the set of control codes in a word processor `language'. The QPARSER+ translator system is designed to operate in two phases -- a syntactic phase and a semantics phase. We will sometimes refer to these phases as the front end and back end of a translator. The syntactic phase is almost completely automatic, given a suitable grammar for the language. In this phase, QPARSER+ will take the user-defined grammar and automatically generate a parser capable of understanding any input language form. No programming is required in this phase, although Qparser requires a compiler to generate executable code from the generated source programs. The semantics phase requires some programming skills. In this phase, guided by the grammar's language rules, actions are added to the translator through additions to the C or Pascal source code. Qparser provides a number of useful functions to manage the symbol table, and access the parsing stack, but the central task of providing the actions (semantics) is up to you. This is the back-end, or operational side, of the translator and it must be specially written for each application. There are a number of worked-out examples and tutorials to help you through this task. QPARSER+ provides the tools to develop a complete translator using either C or Pascal as the host language. Features Many of the QPARSER+ features listed here can only be appreciated by those who have had some experience with translator writing, but we feel that everyone will come to appreciate them after having used the product for a while. The principal features of QPARSER+ are: * Correctness. The language accepted by the translator will be exactly that specified by the grammar. * Detection of Ambiguity. If your grammar is ambiguous, or if the parser is unable to assure a correct translation, a diagnostic message will be provided. * Automatic generation of two key components. The lexical scanner and the LR parser are automatically generated. * Automatic generation of Syntax Trees. The QTREES package included in QPARSER+ will automatically build the syntax trees QPARSER+ User Manual page 4 which must be painfully constructed branch by branch with a system like YACC. The use of a syntax tree, generated automatically by Qparser functions, makes code generation a snap. * Flexibility. A translator can be generated in the source language of your choice. The files which define the source language syntax are provided only for Pascal and C, but they can be used as models for other languages. A program like YACC can only produce translators in C. * Efficient table-driven parser operations. For a large grammar, most compiler experts agree that a table-driven LR(1) approach is superior in both performance and space requirements to any other. Several optimizations are made to minimize table size. * Modularity of semantic operations. A translator programming task will become neatly partitioned into a number of relatively water-tight compartments. * Efficient lexical scanner. The lexical scanner operates by a CASE or SWITCH statement transfer on the next character to be scanned from the source. No backtracking, table searching or waste motion is involved. * Efficient and powerful symbol table mechanism. Symbol searching is built into the lexical scanner, which is automatically generated. * Error recovery. A reasonable error recovery system is built into each translator. No extra parser table space is required. * Ease of extending or changing the language. Once a translator is written following the guidelines in this manual, it'll be easy to extend or modify. QPARSER+ was written to be very flexible and you'll find that once you've mastered the basics, you can do translations of any type. * Ease of extending or changing the lexical scanner. The scanner is provided in well-documented source form. Its underlying assumptions are described in the user manual. Required Hardware and Software QPARSER+ runs on a PC AT or compatible under version 5.0 (or later) of MSDOS. We recommend 640 Kbytes RAM for a large grammar, but you can manage with 250 Kbytes. A floating-point coprocessor is not required. There are no special screen or console requirements -- your PC will be used in a 'dumb terminal' mode for all of the Qparser utilities, debugging, etc. So you can run under DOS or Windows -- this guide will refer to a DOS operation. Qparser does not support extended memory, but that shouldn't be a handicap unless you have an exceptionally large grammar. It's optimized to make efficient use of your RAM. A memory crunch -- if any -- will appear in LR1, which requires an unpredictable amount of RAM, depending on the structure of your grammar. QPARSER+ User Manual page 5 A hard disk with at least 2 Mbytes of available capacity is recommended. You can try out the programs on a floppy-based PC, but you'll quickly tire of juggling program and source files among your floppy drives. You'll need an editor. It must be a technical editor, not a word processing editor, unless your word processor can export technical text files. The required file must consist of printable ASCII characters (0x20 - 0x7E) separated by CR-LF pairs. The DOS EDIT utility is acceptable, as is the builtin Turbo C editor. We recommend the Microsoft C or Turbo C compiler. Almost any C compiler will do, but we've tested the product most intensively on Turbo C and Microsoft C, and "make" files are provided for these compilers. DOS Installation You should have been able to install the product if you can read this. It's simple -- 1) create a directory, like C:\QPARSER, 2) choose an executables directory, say C:\BIN, 3) copy all the *.EXE files to C:\BIN, 4) "xcopy" all the remaining files and their subdirectories to C:\QPARSER, and 5) modify the C:\AUTOEXEC.BAT file as shown below. When the installation is complete, you should find the following directories and files in place. We've assumed that the executables are installed in C:\BIN, and the Qparser work files in C:\QPARSER. C:\BIN LR1.EXE C:\BIN OPT.EXE C:\BIN LR1P.EXE C:\BIN GRAMCHK.EXE C:\QPARSER README.DOC C:\QPARSER ORDRFORM.TXT C:\QPARSER QPARSER.DOC C:\QPARSER QPHELP.HLP C:\QPARSER\CSKELS MAIN.C C:\QPARSER\CSKELS CALC.GRM C:\QPARSER\CSKELS CALCC.GRM C:\QPARSER\CSKELS CALCT.GRM C:\QPARSER\CSKELS M C:\QPARSER\CSKELS MAKEFILE.MAK C:\QPARSER\CSKELS SKELEVAL.C C:\QPARSER\CSKELS SKELTAB.C C:\QPARSER\CSKELS SKELDECL.H C:\QPARSER\CSKELS SKELLEX.C C:\QPARSER\CSKELS LEX.C C:\QPARSER\CSKELS DECL.H C:\QPARSER\CSKELS EVAL.C C:\QPARSER\CSKELS TABLE.C C:\QPARSER\CSKELS SYMS.C C:\QPARSER\CSKELS PARS.C QPARSER+ User Manual page 6 C:\QPARSER\CSKELS DEBUG.C C:\QPARSER\LIB M C:\QPARSER\LIB MAKEFILE.MAK C:\QPARSER\LIB SETS.H C:\QPARSER\LIB ALLOC.H C:\QPARSER\LIB HELP.H C:\QPARSER\LIB PROTOS.H C:\QPARSER\LIB RESPF.H C:\QPARSER\LIB HELP.C C:\QPARSER\LIB RESPF.C C:\QPARSER\LIB ALLOC.C C:\QPARSER\LIB SETS.C Your AUTOEXEC.BAT file should contain the following entries: PATH=.... C:\BIN; ... SET QPHELP=C:\QPARSER\QPHELP.HLP You can install the executables under some other directory, if you adjust the PATH variable in the AUTOEXEC.BAT file to suit. Regarding Pascal If you're a Pascal fan, you'll want to register with QCAD to obtain the full Qparser+ system. It contains skeletons and several worked-out examples compatable with Turbo Pascal. (This version contains only C versions.) It probably won't be compatible with the Pascal provided in other computers and operating systems, but you can modify the source files provided to make it compatible with your favorite Pascal. QPARSER+ User Manual page 7 Chapter 2. Parser Generator Overview In order to generate a parser, you need a grammar file (XXX.GRM) and one or more skeleton files (SKELzzzz.C). The grammar defines the rules for parsing your language, and the fragments of C code that are executed when a rule is satisfied. It has to have a particular form that will described in the next chapter. Program LR1.EXE is designed to operate on a grammar file and produce several table files that are used by the remaining utilities. An important table file is XXX.TBL. This contains all the parsing tables that LR1 has worked out from the grammar. LR1 and the other utilities OPT, LR1P and GRAMCHK, will accept a grammar file name with or without the suffix .GRM. LR1 will create several table files based on the grammer prefix XXX. Program OPT.EXE will compress the tables in an XXX.TBL file. The tables will usually take less than half the space when the parser is finished after OPT has done its job. We recommend running it, although it strictly isn't necessary. A skeleton file looks like a C file, and mostly contains C source code. However, there are sections that cause C code to be generated, based on table file information. Program LR1P.EXE is designed to work on a skeleton file, given a set of grammar files. It generates a valid C file, one that now is specific to the grammar's rules. LR1P will usually be used several times, once for each skeleton file. For example, LR1P will be applied to SKELLEX.C to yield LEX.C, in order to specialize the code for the particular token forms found in the grammar. You can write your own skeleton file, or revise the ones provided, in order to set up a new parsing regime. By trying out LR1P on various skeleton segments with a known grammar, you can figure out how it transforms a skeleton file into a compilable C file. (The registered version includes a manual that explains the system in detail). Once you've run LR1P on the appropriate files, they are ready to be compiled and linked. The result should be an operational parser, given only a grammar with no code fragments. If you've attached C code fragments to the rules, the resulting EXE file will not only parse input statement forms, but will also execute the code fragments at the right places during the parse. Operations Summary Here's a summary of the Qparser operations: Grammar File --> LR1 --> Grammar tables --> OPT --> Grammar tables (Skeleton File + Grammar tables) --> LR1P --> Compilable C file QPARSER+ User Manual page 8 Compilable C files --> C COMPILER --> LINKER --> EXECUTABLES MAKE Files In order to make these operations easy, we've included two working examples (CALC.GRM, CALCC.GRM) and make files that carry through all the operations. Look in directory CSKELS. File "m" is a make file designed for the Microsoft C compiler. It runs under the Microsoft "make" utility. Please examine it -- it assumes that LR1, OPT, LR1P and QPHELP.HLP are in the C:\QPARSER directory, and the Microsoft compiler executables are in the C:\MCC directory. is named in the PATH= statement in your autoexec.bat file. Also the Microsoft compiler is assumed to be called by "cl". You call the Microsoft make utility like this: make m File "makefile.mak" is a make file designed for Turbo C; it runs under the Turbo "make" utility. This one contains a specific path directive for the Turbo C compiler directory -- "C:\TC". You will have to change this if it isn't in that directory. It also contains a specific reference to the library files which you may have to change. You can call the Turbo make utility like this: make -fmakefile.mak The default grammar in both of these is CALC.GRM. You can change that with the option GRM=YourGrammar in the make call, e.g. make GRM=YourGrammar m ... using Microsoft make make GRM=YourGrammar -fmakefile.mak ... using Turbo make What's happening is that any appearance of $(GRM) in the makefile is replaced by "YourGrammar". Also, any definition of GRM in the make file is overridden by the command-line definition. Look at the makefile and you'll see where GRM is defined by default, and how $(GRM) is used. +***********************+ NOTE: in older versions of Turbo's make, the -D option doesn't work. You will have to change the GRM=CALC line in makefile.mak +***********************+ The makefiles will yield the executable file MAIN.EXE. We'll assume that you won't change the name in what follows. Memory Model We recommend memory model "large" for both compilers. Check the QPARSER+ User Manual page 9 size of the executable after linking -- if it exceeds 64K you can't use the "compact" or "small" model. Turbo C will compile and link the executable without complaint for a compact model, but you'll run into mysterious execution problems. Microsoft C seems to work with their compact model, although the executable file may exceed 64K. When you change the model, be sure to recompile everything in the LIB and CSKELS directory, and relink everything. Having a mixture of compact and large model object files is a recipe for disaster. Qparser Executable Options You can read about the options for LR1, OPT, LR1P and GRAMCHK by calling them with no parameters. For example, LR1 reports this: QPARSER vs. 4.8.3 Copyright (c) QCAD Systems, Inc. 1990. All rights reserved. Usage: lr1 [options] GrammarFile [RptFile] options: -t dump tokens -p dump productions -n dump nullable nonterminals -i dump itemsets -mL language mode -- L==C, P, N -MN dump state machine N= 1, 2: dump level -r dump reads/includes/lookbacks -T tablefile table file name -s suppress single-production bypassing -? help: more description of the options -C generate table file compatible with 3.0 (default tablefile name is grammar-filename.tbl) Most of these give additional reports of one kind or another without affecting the resulting grammar table files. (-MN, -s and -C will affect the table file form). You can ignore most of them. If you want to know more about any one option, run LR1 with the option -?. (The help file QPHELP.HLP must be installed properly, or you'll get a complaint about it). You can also read QPHELP.HLP in your text editor; it's just an ordinary text file. You should use the option -mC with LR1. This states that you intend to use C as your host language -- this is the compiler you will use to compile your translator, not necessarily the language you are creating with your grammar. Executable OPT requires no options, just the grammar file name, without a suffix: OPT gram QPARSER+ User Manual page 10 You should use the option -mC with LR1P. LR1P is called with two or three files, like this: lr1p -mC gram skeleton [ target ] We recommend leaving the suffix OFF the grammar file, i.e. use CALC rather than CALC.GRM. The skeleton file name should be complete, as well as the target file name. Look at the make file for more details. If you don't specify a target file name, lr1p will send the expanded source file to stdout; you can redirect this in DOS with the ">" operator, i.e. lr1p -mC gram skeleton > target Executable GRAMCHK is used for grammar reporting and checking; you don't need to call it to form your compilable target files. It has many parameters, which are explained through the QPHELP.HLP file. Again, use option -mC to inform it that the target language is C. QPARSER+ User Manual page 11 Chapter 3. Grammars A grammar is a set of production rules. The rules can be in any order, except that the so-called "goal production" must be the first in the set. Here's what a rule looks like: Stmt -> Expr <eol> #PRTVAL This says that there is some class of input strings, or "sentences" that we'll call "Stmt", that consists of the class "Expr" followed by an end-of-line token, "<eol>". The symbol "#PRTVAL" is a "tag" that will used as a handle, or reference, to this particular rule. Symbol PRTVAL will appear later as a C define in the generated code, and will stand for some small index number. The symbols "Stmt" and "Expr" are called "nonterminals". The symbol "<eol>" is called a "terminal". A terminal stands for itself, while a nonterminal can stand for any one of a set of strings. The arrow "->" is just a place-holder in the production rule. You can have several production rules with "Stmt" on the left side. For example, in the grammar CALC.GRM, there are several like that: Stmt -> Expr <eol> #PRTVAL Stmt -> <identifier> := Expr <eol> #ASSIGN1 Stmt -> <eol> \ allow an empty line Whenever several such rules exist, it means that "Stmt" can expand into any one of the given right parts. You have a choice between expanding Stmt into "Expr <eol>", or "<identifier> := Expr <eol>", or just "<eol>". The second rule represents an assignment statement as it might appear in Pascal. The intention is to replace the value associated with the user name "<identifier>" with the current value calculated from "Expr". We'll see that "Expr" can expand into a large number of algebraic expressions, so there's lots of possibilities in that second rule. The symbol ":=" is an example of a "token". A "token" is a short string that appears in the input language. It stands for itself, and is recognized as a whole by a special front-end character scanner called the "lexical analyzer". An <identifier> is a builtin nonterminal that stands for any string starting with a letter and continuing with letters, digits and underscores. (You can change this convention to accept other kinds of identifiers). It's used in programming languages to represent types, constants, variables, functions, etc. You can introduce a name like "X" or "Sensor2", and the parser will scan it and associate with the <identifier> nonterminal. Each such <identifier> is also a "token". QPARSER+ User Manual page 12 Every <identifier> is also plugged into a symbol table, so that when the same name appears more than once, you'll automatically have a handle on the same thing through the symbol table. The third rule, Stmt -> <eol> \ allow an empty line makes use of Qparser's comment convention. The backslash causes it and the rest of the line to be ignored. A production rule has to be written on a single text line. If you insert a line ending in the middle, LR1 will probably complain about a syntax error. Use a backslash as the LAST character in a line in order to cause a line continuation. Or, use long lines -- LR1 will accept a line as long as 255 characters. Recursive Rules An important principle to use in writing a grammar is to use a "recursive rule" when you need to express the idea of repeating something any number of times. Look at the CALC.GRM file again. You'll find a pair of rules like this: Stmts -> Stmts Stmt -> <empty> There are two more ideas here -- the "<empty>" means literally "empty", or nothing, not even a single character. That's OK -- it means that the Stmts nonterminal can expand into nothing at all. You may think that a rule like would be impossible to recognize during parsing. Well, it may cause some problems, but it can be recognized on the basis of what characters are ahead of and behind the "empty". The second idea is that that when there's no left member, just "->", Qparser assumes the same left member is used as in the previous rule. So we really have the rules Stmts -> Stmts Stmt Stmts -> <empty> Now recursion -- the first rule says that a Stmts is another Stmts followed by a "Stmt". It would appear that nothing's gained. What can this possibly mean? Well, assume that we understand what a Stmt stands for -- we've already looked its rule, in fact. Then a Stmts can be "nothing", thanks to rule. Ah ha! ...then the first rule says that a Stmts can be "nothing" followed by a Stmt, or just Stmt. Given that, it's clear that a Stmts can also be a Stmt followed by a Stmt. (The two forms of Stmt don't have to stand for the same thing, of course). And so forth, ad infinitum. QPARSER+ User Manual page 13 This is an example of a "left recursive rule". You'll see this repeated over and over in the grammars. The pattern is this: Something -> Something X where X can be anything at all, except <empty>. There also has to be a non-recursive rule for Something, such as Something -> Y Then it's easy to see that SomeThing will stand for any of these sentences: Y Y X Y X X Y X X X (etc.) Recursion is how we get "any number" of something -- in this case X -- from a small number of rules. GOAL Production The first production rule in a Qparser grammar is assumed to be the "goal production". This is where you start an expansion process. There must also be only ONE such rule with its left member. You are allowed to use the special token "<eof>", which stands for the end-of-file in this rule, as the last token in the rule. You can't use <eof> anywhere else, however. Grammar CALC.GRM has this goal production: Goal -> Stmts QUIT <eol> #QUIT It clearly says that the expanded forms all have to be a "Stmts" followed by the keyword QUIT followed by an end-of-line. An end-of-file is implied at the end. Since "Stmts" expands into nothing or more "Stmt" forms, the CALC grammar is a list of "Stmt" forms followed by QUIT as the last one. Since QUIT doesn't appear anywhere else, it's only allowed at the end of an input file. Grammar Syntax Qparser grammar rules must be expressed through the following rules: * Use a backslash to start a comment, which extends to the end of the current line. * Use one or more spaces to separate the rule tokens. * A rule must occupy a single line. However, a backslash at the QPARSER+ User Manual page 14 end of a line, followed only by a line return character, is interpreted as a continuation to the next line. * Most terminal tokens (like QUIT or :=) don't have to be quoted. They are distinguished from nonterminals by the fact that they don't appear as a left member of any rule. Names are assumed to be case-insensitive in the default lexical analyzer. You need to express them in upper case in the grammar. However, you can override case-insensitivity with suitable changes to file SKELLEX.C. If your language is case-sensitive, make sure the tokens are expressed correctly in the grammar, and remove the "to_upper" function found in function get_symbol, file SKELLEX.C. * If you need a token that starts with a quote mark, a backslash, or a hash mark (#), use a quote (') to open and close the token, for example, '#' However, be careful about quote marks and spaces as tokens -- their use will probably confuse the lexical analyzer. If a quote mark is part of a token, you need to double it, i.e. 'one''token'. * A rule doesn't have to be tagged, i.e. (#GOAL), but we recommend tagging every rule except single productions. A single production has a single nonterminal as its right member. Tag everything else. The tags should be unique, and will reappear as C define names, so they need to look like C identifiers. You can use the same tag on different rules, but you'll be warned, and you need to make sure that having two rules with the same target won't cause problems. Expressions CALC describes a complete set of algebraic expressions with the simple operations + - * / and parenthesizing. You can explore these and their properties by running the CALC grammar on various input forms if you want to. The registered version's manual discusses these and other commonly-found expressions, statements, declarations, etc. It also has a C grammar and an Ada grammar. You can also find grammars in various reference books -- see the reference list at the end of this document. QPARSER+ User Manual page 15 Chapter 4. Building a Parser A grammar describes the structure of a language. You usually aren't interested in using a grammar to generate language strings, as we've described above. You usually have some string, in the form of a file or user input, and you want to break it down into the structure defined by the grammar. The program that does that operation is called a "parser". Qparser will build a parser for you from your grammar. Let's do that by walking you through a parser construction for the CALC.GRM grammar. Start by building the library files found in the QPARSER\LIB directory. These are used over and over and will almost never change, so they've been pulled out. The make file there -- when run -- should create a library file called QP.LIB, containing these oft-used functions. They'll be used in linking your parser from here on. Now build the parser by running the make file found in the CSKELS directory. Make should run LR1 on the grammar file CALC.GRM, then OPT, then LR1P on each of the skeleton files listed in the make file. Finally, it will compile each of the files in CSKELS, including some that LR1P didn't touch. These, along with the QPARSER\LIB library, should link together to form an executable file. It'll be called MAIN.EXE -- the make file renames it. Try executing MAIN as follows: main -d1 This invokes some Qparser debugging tools, so that you can follow what the parser is doing. You'll get a prompt, like this: > Enter a statement, like this, followed by an Enter: X + 16 * (5-6) You can enter spaces freely, or leave them out, between the tokens. The response should be this: Primary -> <identifier> #VARIABLE [idebug] ?(help, N(ame, A(ll, D(bug level, P(rodTrap, T(race, L(ex, S(tack, ENTER? What you've done is cause the rule Stmt -> <empty> to be applied, then go on. The next rule to be applied is Primary-> <identifier>. (The <identifier> stands for the X you entered as the first variable name). QPARSER+ User Manual page 16 Keep typing the Enter key, and you'll get a complete sequence of production rules that corresponds to the sentence you've entered. This sequence comprises the "parsing" of your sentence. It won't end until you enter "quit" followed by an end-of-line, then an end-of-file mark. Under MSDOS, this pair is entered by typing "control-Z Enter" with input from the console. Even after that, you may get one or two more rules before the thing decides everything is done. The last rule applied will be the goal rule, Goal -> Stmts <eol> Bottom-Up Parsing This is a "bottom-up" parser. It collects all the pieces of a rule's right members, like the "Stmts" and the "<eol>" in the Goal rule, then collapses them into the Goal. However, the input sentence is scanned in a left-to-right fashion. QPARSER+ User Manual page 17 Chapter 5. The Semantics Stack The semantics stack is a first-in-last-out (LIFO) stack that carries information about the individual tokens used in the parse. It grows and shrinks as the input sentence is scanned and the rules are applied. Growth occurs when a token is accepted from the input sentence -- each token is pushed into the stack as it appears in a left-to-right fashion in the input list. The stack then shrinks when a production rule is applied. The growth action is called a "read" or "shift" move, from the fact that an input token is shifted into the stack. Each rule application is called an "apply" or "reduce" action. An application causes one or more things on the top of the stack to be peeled off and replaced by a single state representing a nonterminal symbol. The apply action is the one that you'll use to obtain a translation -- the others are just behind-the-scenes window dressing that you can essentially ignore. Qparser is structured to ignore them for you, in fact. A third action, called a "lookahead" is sometimes seen. This action results in a state change, but nothing happens to the stack and no token is shifted from the input list. It's a kind of gear-change required by the parser. Let's watch all this happen next. Run MAIN with the option '-d3' instead of '-d1'. That will cause the semantics stack to appear, and also follow each shift action. When we ran with option '-d1', the shift and lookahead actions were skipped over. Option '-d2' shows the stack with each reduce action, and option '-d3' shows the stack on every action. Option '-d4' will show error recovery action on a syntax error -- that one's really verbose and boring unless you are interested in how Qparser handles syntax errors. So type this expression at the prompt '>': X + 16 * (5 - 6) You'll see this on the screen after entering that expression and touching the ENTER key several times. (We've omitted the two prompt lines for the sake of clarity, and the states are separated by a blank line.) Reduce, state 1 Stmts -> Read, state 18 on id X 1: 1 Stmts NULL Look, state 34, on token + 1: 1 Stmts NULL 2: 18 <identifier> IDENT: X Reduce, state 3 1: 1 Stmts NULL QPARSER+ User Manual page 18 2: 18 <identifier> IDENT: X Primary -> <identifier> #VARIABLE VARIABLE seen Look, state 35, on token + 1: 1 Stmts NULL 2: 18 Primary NULL What does all this mean? Well, the parser has worked through four states, counting the very first reduce (state 1) on Stmts -> <empty>. The second state (18) is a read, which shifts the first token X into the stack. The report tells us that X is an identifier. The third state (34) is a lookahead. Token + is used in this state transition, but it stays in the input list, as we'll see. The token X has been shifted into the stack and occupies the stack top. That leads to the fourth state (3), a reduce. This is the interesting one. Note that the previous lookahead didn't change the stack -- the identifier X is still on top. Under it is a Stmts, which it obtained from the first reduce action. The stack 'top' will always be listed on the bottom. This reduce state (3) is associated with a production about to be applied -- Primary -> <identifier> #VARIABLE You will notice that the production's right member is an identifier, and it happens that an identifier is on the stack top. Coincidence? Not at all. It will always be that way -- ***************************************** The top elements of the stack will always match the applied rule right member. ***************************************** For example, if the grammar rule at the moment of a reduce action is Expr -> Expr + Term #PLUS then the semantics stack will look like this: TOS-3: (stuff below the stack top) TOS-2: Expr TOS-1: + TOS: Term This is an important idea, and we'll keep reminding you of it. Note that we'll show the top of the stack (TOS) as the last thing printed, so the stack appears to be upside down. This is because we are usually only interested in a few elements near the top, not QPARSER+ User Manual page 19 the stuff underneath, which may scroll off the screen. Each stack element is associated with a terminal or nonterminal token. We'll see that nonterminal stack elements can be used to carry a data structure, which can be as simple as an int, or a complex as a pointer to a large data structure. Terminal elements (other than <identifier>, <string>, <real>, and <integer>) won't carry information as a rule, although there's no reason they couldn't. When you type ENTER, you'll get a brief message that announces that this production -- tagged VARIABLE -- has been applied: VARIABLE seen You'll get a similar message every time a tagged production is applied. Untagged productions aren't so announced. This is built in as the default action on every tagged production. It's to remind you to fill in an "apply" action clause as explained later. Getting the Whole Parse You can view the whole parse trail without the debugging prompts by using option -D3 (note the capital D rather than lowercase d), for example: main -s -D3 input > output The "-s" option causes input lines to be printed on the screen along with the debugging information. "input" is a file containing the sentences to parse. It should contain: X + 16*(5 - 6) quit "output" is a file that receives the textual output from the running MAIN program. The -D option causes stack debug information to be sent to the output file, but with no prompts requiring attention. You'll get a whole trail of stack dumps which you can examine with your editor. If you try this without an "input" file, MAIN will expect input from the console, but you won't get a prompt. You need to type in the recommended statements, followed by control-Z ENTER to simulate the file input. The trail (in file "output") is rather long. We'll just look at a few sections of it to bring out a few more points. Find this piece of the trail -- it's been preceded by a couple of dozen states: QPARSER+ User Manual page 20 Reduce, state 14 1: 1 Stmts NULL 2: 18 Expr NULL 3: 23 + NULL 4: 27 Term NULL 5: 32 * NULL 6: 29 ( NULL 7: 19 Expr NULL 8: 25 - NULL 9: 28 Term NULL Expr -> Expr - Term #MINUS MINUS seen This is the first example of a reduce state in which the production's right member has more than one token in it. Notice how the top three states (7, 8, 9) match the production's right part. The very next state is this one: Read, state 25, on token ) 1: 1 Stmts NULL 2: 18 Expr NULL 3: 23 + NULL 4: 27 Term NULL 5: 32 * NULL 6: 29 ( NULL 7: 19 Expr NULL If you compare this stack with the previous one, you'll see that elements 1 through 6 are the same, while elements 7 through 9 have been replaced by a single element corresponding to an Expr. You can think of a reduce action as ********************************************************* replacing the stack top elements matching the right member of its production with a single element equivalent to the production's left member. ********************************************************* We said earlier that the stack grows and shrinks. We've now seen what makes it grow (tokens shifted into the stack through read actions) and what makes it shrink (apply actions). Stack Represents Past History What can be said about the rest of the stack, the portion that is underneath the tokens that match a production right member? Well, the whole stack corresponds to a kind of past history of the parse. It carries condensed representations of what's been scanned in the input string. Look at the stack again for state 14. The deepest member, Stmts, came from an empty reduce action. The next-deepest is an Expr -- this was the result of scanning the identifier X. Then there's a + sign followed by Term -- the Term QPARSER+ User Manual page 21 clearly corresponds to the number 16. Here's a copy of the stack, and this time, we've added the parts of the original expression X + 16*(5 - 6) that go with it: Reduce, state 14 1: 1 Stmts NULL <empty> 2: 18 Expr NULL X 3: 23 + NULL + 4: 27 Term NULL 16 5: 32 * NULL * 6: 29 ( NULL ( 7: 19 Expr NULL 5 8: 25 - NULL - 9: 28 Term NULL 6 Expr -> Expr - Term #MINUS MINUS seen There's a right parenthesis following the 6, but the parser hasn't scanned that yet. If you continue the parse, it'll show up. In fact, here's the next state, and -- sure enough -- the right parenthesis has been shifted into the stack: Reduce, state 12 1: 1 Stmts NULL <empty> 2: 18 Expr NULL X 3: 23 + NULL + 4: 27 Term NULL 16 5: 32 * NULL * 6: 29 ( NULL ( 7: 19 Expr NULL 5 - 6 8: 25 ) NULL ) Primary -> ( Expr ) #PARENS PARENS seen More reduce actions follow in rapid succession. The next one covers the multiplication: Reduce, state 15 1: 1 Stmts NULL <empty> 2: 18 Expr NULL X 3: 23 + NULL + 4: 27 Term NULL 16 5: 32 * NULL * 6: 29 Primary NULL ( 5 - 6 ) Term -> Term * Fact #MPY MPY seen After a lookahead, the addition is reduced: Reduce, state 13 QPARSER+ User Manual page 22 1: 1 Stmts NULL <empty> 2: 18 Expr NULL X 3: 23 + NULL + 4: 27 Term NULL 16 * ( 5 - 6 ) Expr -> Expr + Term #PLUS PLUS seen A few more reduce actions, and the input sentences are completely swallowed up. The parsing action ends on reducing the Goal production, as follows: Reduce, state 10 1: 1 Stmts NULL 2: 18 QUIT NULL 3: 22 <eol> NULL Goal -> Stmts QUIT <eol> #QUIT QUIT seen This reduction obviously leaves just one element on the stack, associated with the Goal token. Two things have to happen in order to terminate a parse with no errors -- the stack should contain just the Goal token, and no more tokens appear in the input sentence. If only one of these is true, then the input sentence is considered to have a syntax error. QPARSER+ User Manual page 23 Chapter 6. Accessing Semantic Elements We now discuss how to do something when a production rule is applied. In particular, there'll be stuff associated with each of the nonterminals on the stack, including "<identifier>", which stands for various names. The rules for handling semantics are fairly simple, as follows: 1. The array "semstack" is an array of pointers to type "semrectype" (found in QPARSER\CSKELS\DECL.H) is created in each stack position. Each element will in general be NULL or a pointer. You should assume that a non-NULL element has been allocated from the heap, and must be released when no longer required. The struct "semrectype" carries all the information associated with the stack element. There are two type indicators, and a union type, corresponding to various kinds of thing that can be on the stack. You will probably want to expand this with your own special types. 2. Variable "stackx" is the index of the top-of-stack element in "semstack". You can access the semrectype struct located at index N below the top of the stack by the expression semstack[stackx-N]->(whatever) N==0 refers to the top of the stack element, N==1 to the one just under it, etc. In relation to a production rule like this: P -> X4 X3 X2 X1 X0 the N==0 member is the RIGHT-MOST element, N==1 is next to it, etc. ALL the right members need to be counted. For example, a <real> in position X2 is accessed as semstack[stackx-2]->usem.rval Note that the "usem" is how the union type is referenced. 3. As a shorthand, the define TOS(N) is equivalent to semstack[stackx-N] Thus the <real> given above can be referenced as TOS(2)->usem.rval 4. Function "apply" is called on each apply action for which a production rule tag has been assigned. (It isn't called otherwise). This function is in file QPARSER\CSKELS\EVAL.C, which has been expanded from the skeleton file SKELEVAL.C. You'll find that this is essentially a switch statement on the production rule tags. When a particular production rule apply action is about to occur, the apply is called, whence the switch will separate out the rule that's being applied. The stack is guaranteed to carry QPARSER+ User Manual page 24 the right member semrectype invocations. It's up to you to use these struct values and construct a semrectype to replace them. 5. Qparser will allocate a "semrectype" for each of the special nonterminal types <identifier>, <integer>, <real>, <string>, and <character>. All other tokens shifted into the stack are associated with NULL. All other nonterminals will have NULL in the stack, unless you have allocated a semrectype for certain of them, as explained later. You need to free these before returning from apply. The safest course of action is to call the function "disp_sem" on each of the right member nodes that might have something in it. It will deal with NULLs and the known semrectype types properly. You need to expand this function if you have other kinds of nodes. 6. For most parsers, you can just use the information given in the stack and generate some code. If the production has something to do with a declaration, it'll have an <identifier> in it -- you'll want to do something with the symbol table attribute associated with the identifier. If that's all you are doing, then make sure that the apply function returns NULL. When "apply" returns, the parsing system will strip the stack back by the number of elements in the rule's right part, then push the return pointer from "apply"; this will be NULL in this case. 7. If you want to save something on the stack from this production rule -- and that gets us into some advanced parsing ideas -- you should allocate a semrectype struct from the heap, and return its pointer from "apply". That pointer will get pushed on the stack. Note that it is effectively associated with some rule left member token, for example "Expr". This pointer will show up later in the parse. You can identify it by the appearance of "Expr" in the right member of a production. Use "new_sem" to allocate a "semrectype". It also initializes various slots in the struct. Say you've done this. You now must make sure that: 7a. You've done the same thing for ALL the rules with "Expr" as their left member. It's OK to push NULL for some of these, provided you test for NULL when you find "Expr" appearing later. But you need to do something sensible for all the productions, since the reappearance of "Expr" may have come from any of them. 7b. Since the "Expr" reappearance may refer to a struct allocated from the heap, you should free that struct before returning from apply. Otherwise, you'll get a memory leak, since the parser doesn't pay attention to what's on the stack when the top part is scraped off. Obviously, it doesn't need to be freed if it's NULL. The <identifier> and the Symbol Table Every appearance of <identifier> in your grammar causes the scanner to expect a user name object, for example, QPARSER+ User Manual page 25 credit_slip itemized_bill sender001 or something like that. Now there will also be tokens in your grammar that look like a user name but are actually "reserved words" or "key words". In C, for example, some of these are while for switch case and so on. A reserved word is distinguished from an <identifier> by the fact that it will explicitly appear in the grammar. For example, in C, there might be the rule Stmt -> while ( Expr ) Stmt in which the reserved word "while" has explicitly appeared. The problem for the scanner is that both appear superficially like a user identifier. There are different ways of distinguishing these, but in Qparser, the solution chosen is this: 1. The lexical analyzer picks up and scans everything that starts with a letter and continues with a letter, digit or underbar. The code that does this specifically is in function "get_symbol" found in skellex.c. You can change this strategy if you want to, in order to accept different kinds of things as user names, but be careful -- the system assumes that keywords look like user names. 2. Each user name is looked up in a symbol table. If it's in the table, it might be there as the same user name seen previously, or it might be marked with an attribute that says "this is a keyword". The keyword attribute is RESERVED. If the name has the RESERVED attribute, the symbol table also carries its token code, and the token scanner returns that, not the <identifier> token code. 3. The keywords are put in the symbol table as part of the parser initialization. This is also in SKELLEX.C -- see function "inittables", which opens the symbol table system for operation and stuffs it with the known keywords. You'll also find a bunch of other things in the symbol table as keywords that don't look like user names -- ":=" for example, in CALC. These are used by a different section of the lexical analyzer to figure out compound tokens. The symbol table functions are defined in the file CSKELS\SYMS.C, and the structs and prototypes in CSKELS\DECL.H. The symbol table consists of an array of pointers to a linked list of "symtabtype" structs. Each symtabtype carries a pointer to the next symtabtype in the list, if any, and NULL if none. Each QPARSER+ User Manual page 26 symtabtype also carries the symbol string as a heap-allocated string "sym". A scope level "level" is used to distinguish reserved tokens, globals, locals, etc. An attribute tag "symt" has a few predefined values, and will usually be expanded by you in order to support a variety of symbol classes, for example, labels, typedefs, constants, etc. A symbol's attribute may also include other values. Several symbol table functions are provided through the SYMS.C module. For example, "findsym" accepts a character string "fsym", and returns a pointer to the symtabtype in the symbol table with that symbol string, or NULL if it can't be found. "makesym" does the same thing, except that it will always insert the symbol at the head of the list if it can't find one by that name already. By using a combination of these functions, you can shape the symbol operations of your language in a variety of ways. QPARSER+ User Manual page 27 Chapter 7. More about the Lexical Analyzer The lexical analyzer has the task of scanning the input string, which may be a user-entered string or a file. It pays attention to line endings, surplus spaces, comments, etc. and attempts to recognize and extract a sequence of tokens that the parser can operate on. The Qparser lexical analyzer (which we'll call QLA) is partially constructed from information obtained from the grammar. When LR1 scans the grammar, it also produces tables of information needed to build the analyzer, for example, what reserved words are found, and what special tokens are found. How all this works is best explained in the registered version's manual. You can of course study the code in SKELLEX.C and a generated file such as LEX.C. In a nutshell, here is a guide to Qparser's lexical system: * QLA operates in a "skipblanks" mode or in a "recognize" mode. It starts in the skipblanks mode, in which it scans over -- and ignores -- spaces, tabs and comments. It goes into the recognize mode when something shows up that is neither of these. See function "get_token" in LEX.C which makes the mode decision and later branches to one of the token functions. * All the QLA functions ultimately appear as explicit C code, not as cryptic tables. You can adjust them for particular circumstances as required. * Identifiers and reserved words are considered to start with a letter, and continue with letters, digits and certain other special characters. Each identifier found in the source is entered in the symbol table, but only if it's not already there. Reserved words are separated from user names by their appearance in the symbol table as such. Reserved words are marked by special initialization code found in function inittables. See function "get_symbol" in LEX.C. * Numbers (integers and floats) are considered to start with a digit, and continue until a C-like floating point number is recognized. Note that a leading sign (+, -) is NOT recognized -- you need to place these explicitly in the grammar. The default CSKELS number scanner accepts simple integers, simple decimal numbers, or decimal numbers followed by an exponent part. The exponent part must be "E" optionally followed by "+" or "-", and followed by an integer. No spaces or line returns may appear within the number. Floating point forms are converted to a double internally. Integers are converted to a long int. See function "get_number" in LEX.C. A more complete C number recognizer is provided in the registered version. * Strings are considered to start with a string quote ("), and continue through the closing quote. A line return may not divide QPARSER+ User Manual page 28 a string. The default string recognizer also accepts \" as an embedded quote, and \\ as an embedded backslash, per the usual C string conventions. No other backslash conventions are provided. See function "get_string" in LEX.C. A more complete C string recognizer is provided in the registered version. * All other tokens, such as "+", ":=", ">>", that appear in the grammar, are recognized through a "greedy" algorithm. Qparser provides a unique way of scanning these that is explained more fully in the registered version. See function "get_special" in LEX.C. This function is modified by LR1P according to information derived from the grammar. Your Very Own Lexical Analyzer You can also replace all of this with your own lexical analyzer. It only needs to do the following: * Scan whatever source you have * Recognize the terminal tokens that appear in your grammar somehow. * Assign a numeric code to each token so recognized that corresponds to the numbers assigned by LR1. You can get a list of the tokens and their code numbers through option -t when you run LR1. Return this int code from "get_token". * If a token represents a class of strings rather than a single string, you need to have your lexical analyzer set up a semrectype for it. This is assigned to the global variable "lsemp", which in turn will appear on the parser stack at the appropriate place. lsemp is allocated from the heap. Follow the examples for <identifier>, <real> and <string> found in LEX.C. You can also keep the lexical analyzer extremely simple and get the grammar parser and your semantics codes to do all the work. However, note that a grammar is not a very good way to describe the idea of a comment that can appear anywhere, so you will need to build this into your lexical analyzer. You'll find similar problems with quoted strings. QPARSER+ User Manual page 29 Chapter 8. Embedded Semantics Qparser supports "embedded semantics", which means that you can write the semantics for a rule directly in the grammar. We recommend doing this, rather than modify file EVAL.C after Qparser has produced it by expanding SKELEVAL.C. An example of this is given in the file QPARSER\CSKELS\CALCC.GRM. Please refer to it in the discussion that follows. An embedded semantic rule looks like this: Stmt -> Expr <eol> #PRTVAL { printf("%lg\n", STACK(1)->usem.rval); FREE(1); } The production rule is of course the first line. The tag #PRTVAL must appear on the same line. You can now write several lines of embedded C code in the following lines, provided that you follow these rules: 1. The left and right braces {...} must be balanced and matched. If they aren't, LR1 will get confused and complain about a syntax error. LR1 understands the C string and comment syntax, so you can have unbalanced braces in those. 2. If you're fussy about indenting rules, the indentation of the left brace in the first line will be fit to the indentation specified in the target file, and the remaining lines will be indented relative to it. Try some examples through LR1P and you'll get the idea. 3. The embedded semantics code fragment will appear in function "apply" found in EVAL.C, after expansion of SKELEVAL.C by LR1P. It will become the particular case branch code associated with the specified tag. The "break" usually used at the end of a case code fragment is provided by the expansion code. CALCC Semantics We're now in a position to review the CALCC semantics. CALCC has the same grammar as CALC, except that it has been embellished with embedded semantic rules. Please refer to the QPARSER\CSKELS\CALCC.GRM file for details. The idea here is that each of the nonterminals Expr, Term, and Primary will be associated with a semrectype struct, which in turn will carry a floating point number in its "rval" slot. (Look at the struct in file DECLS.H and you'll find this). The PRTVAL rule is a simple one to start with: Stmt -> Expr <eol> #PRTVAL QPARSER+ User Manual page 30 { printf("%lg\n", STACK(1)->usem.rval); FREE(1); } Here, we just print the real number in the semrectype struct associated with "Expr". Note that the <eol> is in position 0 on the stack, and "Expr" is in position 1, so STACK(1) expands into semstack[stackx-1]. (The #PRTVAL tag doesn't count in this indexing scheme). The "Expr" element has been allocated from the heap, so it should be freed before returning from the apply action. That's done by the FREE(1) -- the "1" refers to index position 1, or the "Expr" position. Getting a REAL on the Stack Here's how a real semrectype struct gets on the stack: Primary -> <real> #REALVAL { tsemp= TOS(0); } The only place that "<real>" appears in a right member in the grammar is in this production rule. If you follow the rule chain backward, you'll see that a Term can generate a Primary, and an Expr can generate a Term, so it's possible for the <real> element to pop up as an Expr. The action here is simple -- just return the top-of-stack element, TOS(0). What's going on? Well, Qparser will allocate a semrectype when it scans a real number, and will place a pointer to it on the stack. So all we have to do is return it through the apply. The returned pointer will be pushed onto the stack and be associated with the Primary. That in turn will later get associated with Term and ultimately with Expr. Single Productions There's a bit more magic that we haven't mentioned -- the apply function returns the current top-of-stack pointer TOS(0) when the production has no flag. That happens through this code fragment found in "apply": case 0: /* no flag */ if (popno[cstate]==1) tsemp= TOS(0); /* pass this through */ break; This works fine for each untagged single production, for example, A -> B where both A and B are nonterminals. Whatever was associated with QPARSER+ User Manual page 31 B becomes associated with A. However, a tagged single production will have a case branch, and you need to deal with the return value, which will be NULL by default. You also need to make sure that a tag appears on every empty production and every non-single production, otherwise you'll get interesting things on the stack by default. That's really how a Primary turns into a Term, etc. -- through the automatic semantics associated with single productions. Binary Operations Now look at a typical binary operation production: Expr -> Expr + Term #PLUS { CCBINOP(+); } This just calls CCBINOP. This is a macro, as follows: #define CCBINOP(op) { NEW(FLOAT,REALVAR); \ tsemp->usem.rval= TOS(2)->usem.rval op TOS(0)->usem.rval;\ FREE(0); FREE(2); } (This is in the grammar file CALCC.GRM, too, but isolated from the production rules) What CCBINOP does is allocate a new semrectype struct from the heap, with the types FLOAT and REALVAR. This will be assigned to the local pointer variable tsemp. In the second line, the operation "op" expands into "+", and therefore carries out an addition followed by an assignment. The addition is of the two right production member values, "2" and "0", corresponding to "Expr" and "Term", respectively. The values are assigned to the "rval" slot of tsemp. Finally, members 0 and 2 are freed. We could have kept one of them for the result value, but that's dangerous in general. Other Grammar Semantics The CCBINOP define, along with some other useful ones, are part of a section of the grammar that's kept apart from the production rules. The section opens with "#begin" in the leftmost column of a new line. In CALCC.GRM, the line is #begin apply_locals The section continues with whatever C code you want to put in, and ends with #end apply_locals QPARSER+ User Manual page 32 The names "apply_locals" must be matched. You can have any number of such blocks of code in your grammar. They reappear in a skeleton file, such as SKELEVAL.C, wherever you see the phrase {## begin write(apply_locals); end; ##} You'll find this in the skeleval.c file within the function body of "apply". Compare with the eval.c file and you'll see that the code block has been properly inserted from the grammar. In fact, the "write" phrase has been conditioned with if (defined(apply_locals)) then ... ; just in case a block called "apply_locals" doesn't exist. You'll discover some other blocks in skeleval.c that won't be expanded by CALCC.GRM, since they aren't defined. Other code blocks are provided in SKELEVAL.C, to support global functions (eval_functions), a call just before the first production is called (init_sem), and a call just after the last production is called (end_sem). You can add more to any of the skeleton files by following the pattern shown there. In this way, ANY language-dependent feature can be embedded in your grammar, so that you never need to modify the standard Qparser skeleton files again. Running CALCC It's best to delete *.TBL. This will force a rebuild and recompile based on a new grammar. You can compile and link CALCC by changing the makefile to accept CALCC instead of CALC as the grammar, e.g. run make GRM=CALCC (etc.) It should regenerate the grammar files and source files, and link to produce a working calculator as file MAIN.EXE. (NOTE: Turbo's make utility seems to require changing the GRM=CALC line in the makefile instead). Try out the compiled/linked result (without the -d1 option) with some expressions, and you'll see how it evaluates and prints results. QPARSER+ User Manual page 33 Chapter 9. Using the DEBUG Option Qparser provides several tools for checking your grammar and semantics. These are invoked by building your files with the make option -DDEBUG=1, i.e. the DEBUG variable must be defined. You'll find instructions on this in the make file. (Again, Turbo's make utility doesn't accept this; you need to modify the make file directly). You can invoke the stack debugging when executing the parser by entering the special character "!" with the input. When the parser encounters this, it will invoke the debugger on the current stack. You can then set the debugging level to 1 or higher -- try 2 to see the whole stack -- and each Enter will show you the situation just before a particular apply action. The options -dN and -DN cause the stack debugging to start before anything is entered, which is often useful. The lower-case 'd' sets the debug level (N= 0, 1, 2, ...) and pauses on each apply action for your response. The upper-case 'D' sets the debug level, but does not wait for a pause -- you can get a detailed trace of the actions with it. When you execute the CALCC program under debug level 2 or higher, you'll see the values entered appearing on the stack associated with various nonterminal tokens. You should be able to see how the system works using this. The character "!" is caught in the lexical analyzer, file SKELLEX.C, in function get_token. You can remove this or change the character to something else by adjusting the source code and recompiling. All the debug tools are in file DEBUG.C. You'll want to expand certain of its functions if you extend the symtabtype or semrectype structs with your own variant types, so that they will be reported correctly. QPARSER+ User Manual page 34 Chapter 10. Syntax Trees Qparser can automatically build an "abstract syntax tree" that can hold all the information from a parser in a form especially convenient for generating code or handling a declaration. An abstract syntax tree, or AST for short, is rooted in some nonterminal token of the grammar. Each internal nodes also corresponds to a nonterminal token, and the leaf nodes will generally be an <identifier>, <number>, or <string>. (Sometimes a leaf node will be empty). Each node is represented by a "semrectype" struct, which always carries a node designator "semt", a node type "symt", some attributes, and slots for child node pointers. An AST will represent a clause in the input language stream that has been scanned and covered by the root nonterminal token. Let's take an example. Suppose the input clause is x - 15 * z This can be derived from an Expr, using our CALC grammar. Here are some of the key derivations from Expr that show how this clause expands out of Expr: Expr -> Expr - Term -> Expr - Term * Fact -> Expr - Term * <identifier> -> Expr - <integer> * <identifier> -> <identifier> - <integer> * <identifier> -> x - 15 * z We've left out several single-production derivation steps, for example, Fact -> Primary -> <identifier> in order to give you the general idea. In fact, the parser hides steps like these, too. What we'd like to do is describe this as a tree in which the nodes will be production tags. Note that a production tag corresponds to a production rule, which can also be loosely associated with a nonterminal. The tree we want will look like this: QPARSER+ User Manual page 35 MINUS (-) | +-----------------+-----------------+ | | <identifier> (x) MPY (*) | +-----------+-----------+ | | <integer> (15) <identifier> (z) Notice how the "MPY: 15 * z" part comes out of the last few derivation steps in a natural way. It will come out of our bottom-up system first, in fact, and later get attached to the MINUS part. If you trace the productions and stack development by running CALC, you'll see the order in which these fall out. AST Represents Evaluation Order Why is this form useful? Because it concisely represents the order in which the operations need to be carried out. For example, the subtraction can't be done until the subtrahend (x) and the subtractor (MPY) are both known, so that requires that the MPY be done first. This agrees with the precedence rules of algebra, which are in fact built into the structure of the grammar. (You can set up the precedence rules in any manner you like by suitably organizing the production rules.) How Qparser Builds an AST Qparser generates a pair of tables that help it construct an AST like this. The function "make_tree", found in eval.c, is called as the first operation in function apply. Note that it returns a "tsemp" pointer. Each call of make_tree does the following, in general: 1. Allocates a semrectype node from the heap, by calling "new_sem". Let's call the node N. 2. Attaches certain of the stack semrectype nodes as N's children. The child nodes are also semrectype structs, and are linked to N through the array "position0". The size of "position0" is set by Qparser to hold just enough pointers for the longest possible production rule. 3. Returns the "tsemp" pointer which carries the current rule tag, and carries (as its children) the appropriate stuff from the semantics stack that corresponds to the current production rule right members. The "tsemp" pointer is returned by apply in general. If you do nothing else after "make_tree" is called, the apply calls will simply construct a big AST from your input stream. Eventually the goal production is invoked and the tree will represent everything in your input stream. Embellishments QPARSER+ User Manual page 36 Function "make_tree" has several important embellishments, which you need to pay attention to: 1. It doesn't consider any terminal token as a child. For example, stuff like "+", ":=" are ignored, and don't get a placeholder in the tree. 2. A single production with no tag is just skipped over; that is, any tree rooted in B will simply be attached A, when a production like A->B is applied. If the single production IS tagged, a new node with a single child is created. 3. A production with no tag will be assigned the general-purpose tag GENL_KIND. You can sometimes figure out what the node is without knowing the tag. In general, we recommend that you assign a unique tag to EACH production, except for those single productions that can be bypassed in the tree. 4. An empty production, i.e. one like A -> <empty> will have a NULL child, i.e. position0[0] will be NULL for this. 5. The position0 indices go from left to right through the right members of a production as tree children, rather than from right to left as stack elements. Remember that terminals are skipped in the count. So the right-members in the production rule X -> A + B will be accessed like this in the AST: A: root->position0[0] B; root->position0[1] assuming that "root" is the pointer corresponding to X. In order to make it easier to refer to the children, you can use these macros, found in SKELDECL.H: #define POSITION(N) usem.position0[(N)-1] #define CHILD usem.position0[0] #define LEFT usem.position0[0] #define RIGHT usem.position0[1] #define ONE usem.position0[0] #define TWO usem.position0[1] #define THREE usem.position0[2] #define FOUR usem.position0[3] #define FIVE usem.position0[4] Tree Walking We said that an AST makes it easier to generate code than by constructing special semantics code for each production rule. The general idea is to write a tree-walking function that accepts a semrectype pointer as its argument. It then branches through a switch statement on the possible tags that can be associated with the root. You then decide what to do on each node. The biggest advantage of this approach is that the code generation actions can occur through your tree-walk in a different order than QPARSER+ User Manual page 37 that in which they happen to pop up in the bottom-up parse. You can decide whether to inspect the right member ahead of the left member, for example, even though the left member gets built first in the parser. For example, if code generation requires a register assignment, the assignment algorithm can't be expected work in the same order as the production rule parsing. Another advantage is that you can walk over a tree several times if you need to in order to get certain actions to take place. This amounts to using multiple passes, except that you can control the passes to take place in only a limited section of the grammar. A third advantage is that this approach eliminates most of the worry about freeing invalid pointers or having a memory leak -- virtually all the allocation and free operations are done in a safe fashion. Recursive Tree-Walking Functions Tree-walking is done with a recursive function that accepts a tree node (a "semrectype" pointer) as one of its arguments. It may or may not return a value. The art of designing such functions comes with practice. The general idea is that you need to decide just what the function is supposed to do in an abstract way, then make sure it does just that in each case. A common case is for it to just call itself on one or more children. You'll probably want to do this (but not always) on the left member of a recursive rule. For example, if the rule is Expr -> Expr + Term #PLUS then when the node is PLUS, you will probably start with a recursive call on the LEFT child, followed a call on the RIGHT child, followed by whatever stuff you want to do with the PLUS. It's possible for the left and right children to return some mixed types, e.g. double for the left and integer for the right. That's a signal that the Term value requires a cast to double in order to finish the addition. In Pascal, the "+" operation is overloaded as a string operation, so if one of the two things is a string, the other one must be, too, and a string concatenation is called for. CALC by Syntax Tree Evaluation This all sounds very complicated, but in fact, it's easier to do than fussing with specific production rule semantics. Look at the grammar file CALCT.GRM, which is a fully worked-out calculator that behaves just like CALCC.GRM, except that it is implemented through a tree-builder and tree-walker. The tree-walker is function "eval", found in CALCT.GRM. This accepts a tree node as a semrectype pointer, and returns a double QPARSER+ User Manual page 38 value. Since we are implementing a calculator, a double will always be the "value" of any tree node. If we were implementing something else, such as a compiler, the return value might be the type of the tree as inferred from its members. Function "eval" will be called from within itself, and also from "apply" on completion of one the "Stmt" production rules. You can find the "eval" call associated with "Stmt" later in the grammar file. Function eval immediately switches on the "semt" member of the semrectype struct. "semt" will be one of the production rule tags, or one of the predefined tags IDENT, FIXED, FLOAT, STRNG, etc. that are reserved for an <identifier>, <number>, <string>, etc. See decl.h for a complete list. The tag GENL_KIND may show up here if you haven't tagged all your non-single production rules. Finding the Switch Tags to Consider The way to decide what tags need to be among your walker's switch list is to look at the derivations from the "apply" calls. In our case, the apply eval calls are from the nonterminal "Expr". So you need to see what "Expr" can derive -- it can derive a PLUS or a MINUS directly. Through "Term", it can derive MPY and DIVIDE. Then through "Fact" it can derive UMINUS, PARENS, VARIABLE, REALVAL or INTVAL. So all these tags need to appear within the switch statement. Note that this chain is the result of having a series of untagged single productions in the rule set, i.e. Expr -> Term Term -> Primary You can break the chain by tagging one of these, and catching the tag in your tree-walker. Then direct those calls somewhere else. CALC Eval Switch Cases Now let's look at the "eval" switch cases. Case PLUS is easy. We call eval on the left node (root->LEFT), then on the right node (root->RIGHT). These are supposed to each return a double value, and we want the double result of the "+" operation, so we return the sum of these function calls. Case MINUS, MPY and DIVIDE work the same way, except that we've put in a trap to catch a division by zero for the DIVIDE case. (It would be best to also trap on overflow and underflow, which could happen in any of the operations, rather than on just this obvious one). Case UMINUS is also easy. The single child is referred to by root->CHILD. Case PARENS, INTVAL, REALVAL and VARIABLE are also easy -- just return the value of root->CHILD as evaluated by eval. Case IDENT requires that we extract the identifier's value from QPARSER+ User Manual page 39 its symbol table. It happens that we can depend on the identifier having been previously assigned-to through code in the VARIABLE production rule, as we'll see later. Case FIXED picks up the long value in root->usem.numval and casts it to a double. Case FLOAT simply returns the root->usem.rval, which is already a double. Case QUIT is thrown in for good measure. Note that it falls into the "default" case, which should always be included so that you catch any omissions in the tree-walker cases. This refers to the "flags" array, which is guaranteed to deliver a string representation of any of the legal root->semt values. In CALCT, there's no way that QUIT can be passed to eval, nor will the "default" case ever be called, so these could be "assert" calls or just left out completely. The APPLY Actions Now let's look at the apply function actions, which of course are associated with production rules. There isn't much. Note that most of the rules are tagged, but have no semantics. This means that the rule will get built into a syntax tree with nothing further required from you. However, we need to do something on the PRTVAL and ASSIGN1 rules, and in particular to stop the tree building on these rules. In the PRTVAL case, the required action is to evaluate the Expr tree, through the call evalue= eval(tsemp->CHILD); Assuming that goes well (why shouldn't it?), the result is printed. The tree is then disposed by a call to "disp_sem", which incidentally returns NULL so that tsemp returns something sensible from the apply call. In the ASSIGN1 production rule, we again evaluate the Expr tree. The value will then be associated with the <identifier> by the next two statements: tsemp->LEFT->usem.symp->usym.value= evalue; /* the assignment */ tsemp->LEFT->usem.symp->symt= REALVAR; /* defined */ What are these doing? The <identifier> is always associated with a symbol table entry, accessed through the pointer tsemp->LEFT->usem.symp where LEFT refers to the <identifier> position in the rule. The value associated with this identifier is then set to the recently evaluated tree value, evalue. The type of the value "symt" in the QPARSER+ User Manual page 40 symbol table is set to REALVAR -- this states that the identifier has appeared on the left side of an assignment. We'll come back to this later. The remaining two lines are the same as for the PRTVAL case, since we want our calculator to print the result of the assigned-to variable. Note that the AST is unaffected by the two assignments. We have merely changed two entries in the symbol table attribute for the <identifier>. The tree can be disposed as usual, having served its purpose. The VARIABLE production rule has a trap for an unassigned-to identifier. The symptom of such an identifier is that it does not have the REALVAR attribute set in its symbol table entry. Note that this attribute is set when the identifier is assigned-to in the ASSIGN1 production semantics. So we complain about the problem, then set up the identifier with some reasonable values in case someone else notices the problem somewhere else in the code. Running CALCT You can try out the tree-builder calc as follows: 1. Delete the *.TBL, *.OBJ and *.EXE files in the CSKELS subdirectory. This will force "make" to rebuild them. 2. Run make with the option GRM=CALCT. (With the Turbo make file, you need to set GRM=CALCT in the makefile, since it won't take this as an external option.) 3. The result is MAIN.EXE. It should execute and perform in a manner similar to CALCC. You won't see any difference, unless you use the debugging tools to explore the stack and the trees rooted in the stack. Debugging Tree Programs We recommend using your compiler debugger if you have to debug a QTREES program. Set a breakpoint on one of the recursive evaluators, or on one of the switch case statements to see what's happening. If your debugger will allow you to call a function, call "print_tree" with a pointer to some tree root. It'll display what's in the tree. You can also get the "print_tree" display from the Qparser debugging line. You need to trap on a production rule apply operation with a debugging level or 2 or greater. When the trap occurs, select the "S(tack" option, then move up or down in the stack with the + or - key. You'll see an asterisk (*) next to the selected stack item. Then press T(ree to display the tree rooted in that position in the stack. You can move down into the tree by selecting one of the numbers shown at the end of the prompt -- "1" means choose the left-most nonterminal, "2", the one next to it, etc. ("1" corresponds to "position0[0]"). Move back up the tree by pressing "U". With enough "U" presses, you'll be back at the QPARSER+ User Manual page 41 stack level. For more information, consult the help facility, which you can get at any time by pressing "?". Tree Walking Hints The decisions you need to make when designing a tree-walking evaluator are these: 1. On which production should the tree-building be interrupted so that you can apply a tree-walker? We recommend an interruption on each declaration -- which may result in declaring several identifiers. If you care about global optimizations within a function, you may want to build a tree for an entire function, so that you can do an intelligent register assignment, look for common subexpressions, etc. That also makes it easy to expand control statements like "while...do" or "for (){...}" by calling a statement tree-walker. In any case, we recommend building a tree for each assignment statement or expression. Usually (in a compiler) you need to walk through such a tree in order to establish the types, eliminate constant expressions, etc., before generating code segments. 2. What should the tree-walkers do? Again, much depends on your imagination. We've demonstrated a simple numeric evaluator. The extension of that to a code generator is fairly easy -- and in fact is one of the worked-out examples in the registered version of Qparser+. You will probably want a walker that evaluates constant subexpressions. Another one might assign types to all the nodes, based on casts and the implicit type rules of the language. Another one might reorganize parts of the tree around a set of rules that facilitates optimizations, for example, moving all the constants to a canonical location for commutative operations. Another one might perform register assignments. Yet another one will generate code for the control statements -- it will no doubt call an expression evaluator when it runs into expressions. 3. What cases must be tree-walker deal with? We've explained above that you need to trace a derivation through from one or more nonterminals. Start with the nonterminals that appear in the "apply" or other root calls to your evaluator. You need all the tags that can appear in a derivation chain stemming from those. If you choose to stop calling your tree-walker on some tag, that's OK -- it just won't penetrate that tree. If you do, you need to include the tags that come from that production rule's left member. Hints on Identifier Attributes We've pointed out that Qparser assigns every identifier to a symbol table slot upon its first appearance. The attribute tag is the "symt" field of the symtabtype struct, and is USER by default. QPARSER+ User Manual page 42 The identifier assignment is performed with the "make_sym" function -- you can see where it happens by looking at function get_symbol in skellex.c. "make_sym" will accept an identifier already in the symbol table, even though it may be at a deeper scope than you wish. The theory is that the reference for the sake of consistency. may be to the higher-scope identifier, for example, a reference within a function to a global variable. If you discover through your tree-walker that the identifier is being declared, you need to pay attention to this: 1. If the identifier's attribute is USER, it means it has been seen for the first time in this context. You can safely set its attributes to something else appropriate to your declaration, for example, change its symt to REALVAR or something. 2. Or, if the identifier has some other attribute, its scope level will probably also be smaller than the current scope level "clevel". This means it was previously declared in a covering scope. If this is a new declaration, you MUST call "forcesym" with the symbol in order to push a second instance of the identifier into the symbol table, then set up its attributes appropriately. 3. At the end of a scope, you need to decide what has to be saved and what can be discarded. If the names are associated with function parameters, you usually need to save something in order to check the parameter types against subsequent function calls. If the names are truly local in scope, you can safely get rid of all of it. 4. Function "clearsym" will unlink all the symbol table entries at and below a given scope level for you. It will free the symtabtype struct and the "sym" string as well. However, it will NOT free any other space associated with the attribute, unless you explicitly require this. See file "syms.c", and "free_symtab_item" in particular. So you need to embellish this if you've attached other information to the symtabtype struct, and you want to avoid memory leaks. Releasing an AST Releasing the heap space occupied by an AST is easy -- just call "disp_sem" on the root node. Usually this is done within an "apply" case, but you can do this within a tree-walker as well if you like. Of course, you need to make sure that any pointers to the tree node, or any node within it, are neutralized by setting them to NULL. In particular, you should NOT set up some auxiliary data structures with pointers into a tree somewhere, then release the tree space. Learn to live with the AST and stack structures instead. Note that if you embellish semrectype with additional structures, you also need to extend "new_sem", "disp_sem", and "dump_sem". "dump_sem" requires some thought -- you probably won't want to display everything in a complicated node when displaying the stack. It's best to decide where to cut off the display, or how QPARSER+ User Manual page 43 to condense it. QPARSER+ User Manual page 44 Chapter 11. Miscellaneous Topics Skeleton File Semantics The skeleton files are divided into sections of code that are just copied to their target files, and other sections that get expanded by LR1P, according to information derived from the grammar. The sections to be expanded are covered by the special bracket codes {## ... ##} which are recognized by LR1P. Everything outside those brackets are simply copied to the target file by LR1P. Everything inside is interpreted as a Pascal-like source code generator language, with a precisely defined syntax. The registered version of Qparser has a complete explanation of the code generation language syntax, so that you can construct your own generator forms. It's based on a special variant of Pascal, and is quite powerful. You can figure out what most of the forms do by comparing skeleval.c with eval.c, and trying out some cases on your own. The more difficult forms are associated with the parser tables. They will be hard to decipher without some help. For most purposes, you won't need to touch these generator forms. Help Functions We pointed out that when your parser or translator is finally linked and ready for execution, you can invoke the debug functions by either using the option "-d1" or using character "!" in your input. When the debug line appears: [idebug] ?(help, N(ame, A(ll, D(bug level, P(rodTrap, T(race, L(ex, S(tack, ENTER? typing a question mark "?" should produce several screensful of help information about your environment. In this case, the help will be about this particular set of options. The first help report will give you some topics that you can enter by typing ? followed by the option name. Gramchk Program GRAMCHK.EXE hasn't been described yet. Like LR1, it accepts a grammar file name. Unlike LR1, it's just designed to give you lots of information about the grammar that would be hard to figure out otherwise. An accurate description of GRAMCHK's services can be obtained by running GRAMCHK with no parameters. You'll get a list of the options that it understands. To get more information on any one of the options, run it with the option tagged with a question QPARSER+ User Manual page 45 mark, for example, gramchk -t? will generate a screen of information about the -t option. QPARSER+ User Manual page 46 Chapter 12. Debugging a Translator Qparser should always produce a valid parser if your grammar can get through gramchk and lr1 without complaints, and if you have not changed any of the base CSKELS files. However, a parser alone isn't usually what you want. When you start adding semantics operations, particularly operations that call for leaving material in the stack and later using it, you may introduce an error that could cripple the system. Here are some tips on developing reliable Qparser programs: * You should compile with the DEBUG option defined. This will help you trace the stack and parsing operations while setting breakpoints and viewing program variables. * Keep DEBUG.C up to date so that the symtabtype and semrectype are always reported according to your latest revision. Similarly, check new_sem, new_sym, disp_sem and disp_sym for the correct disposition of the structs. * Check the terminal and nonterminal list by using option "-t" in LR1 or GRAMCHK. A common error is omitting the definition of a nonterminal as a production rule, causing it to appear to Qparser as a terminal. GRAMCHK will also warn you of production rules that seem to go nowhere or are not connected to the chain of rules in any way -- this is also a symptom of some grammar error. * Pay attention to the reports from LR1 and GRAMCHK. LR1 may complain about syntax errors -- which you should repair -- or state conflicts. A state conflict can be difficult to repair, and is beyond the scope of this little document to explain and help you to correct. LR1 will nevertheless generate a parser, but it may not behave as you expect. If GRAMCHK complains about a variable name that has been overloaded or some other problem, you should look into it, not just assume that everything will work anyway -- it won't. * If your compiler supports prototypes, and checks parameter types, you should enable that option and change the skeleton and other files so that they make use of this important type-checking feature. Then make sure your C files are type-clean. * If you've changed the internals of the lexical analyzer -- file SKELLEX.C or LEX.C -- you should first test the lexical analyzer separately. That file can be compiled separately by compiling it with the macro TEST defined. It will become a program in its own right, designed to accept a file of characters, then report the tokens that it sees. Only after you've made sure that the lexical analyzer is working correctly should you attempt to move on to generating and testing a parser. * Develop your translator in these stages, making sure that each stage is correct before moving on to the next stage: a) lexical QPARSER+ User Manual page 47 analyzer, if you've changed SKELLEX.C in some way, b) grammar -- it should be complete, free of LR1 and GRAMCHK complaints and correctly represent your intentions, c) symbol attributes and correct scope rules, d) symbol declarations, e) control statements, f) function calls, g) executable statements. * Qparser should correctly call the apply function with the appropriate state number and that in turn will drop into the appropriate switch case number. If you aren't convinced of this, follow the stack traces with some simple input streams and make sure that that mechanism is performing correctly before adding any semantics code to "apply". It's all driven by tables in the code segment, which can still get corrupted by bugs in most systems. * Use the compiler debugger to plant a breakpoint on a particular case in "apply" to trap on one or another apply action. If your debugger can execute a function call from a breakpoint, execute "idebug", which will allow you to interactively examine the stack, symbol table variables, etc. It draws on the global stack information, and requires no parameters. * Make your program report what it's doing with extra printf statements covered by some test option, i.e. TEST1, TEST2, etc. Prepare some input files with situations that you are testing that will set up the symbol table and other environment so that you can track down the problem you are having. Insert "idebug()" calls freely where you suspect trouble; they will help you track them down. * Pay special attention to the use of the heap. In particular, please remember that Qparser allocates each of the following, and it's up to you to dispose of them at the appropriate time: a. One symtabtype type on the appearance of a fresh identifier. An identifier appearing more than once is associated with the same symtabtype. Call "free_symtab_item" to free a symtabtype. Call "clearsym" to remove a set of identifiers from the symbol table when falling out of some inner scope. b. One "char* sym" type within each symtabtype. This is freed with the symtabtype struct by free_symtab_type. c. One semrectype in the semantics stack "semstack" for each appearance of <identifier>, <real>, <integer>, <character> or <string> in the input stream. There is NO automatic disposal of these -- you must dispose of them at the appropriate time, usually after you have made use of their information in an "apply" call, just before its return. Call "disp_sem" to free any semrectype struct. It is also designed to free an entire tree rooted in a semrectype. d. One "char* strx" type on each appearance of a literal string in the input stream. This is associated with a "semrectype" and is normally freed through the "disp_sem" call. * Use functions "new_sem", "new_sym", "disp_sem" and "disp_sym" QPARSER+ User Manual page 48 to allocate and free a semrectype or symtabtype. Keep them up to date with changes you make in those structs. This is much safer than "malloc" and "free", since new_sem and new_sym initialize critical struct members, and the "disp" functions handle NULL and embedded allocated material safely. If your compiler has a library function for checking the heap status, we urge that you enable it and pay attention to its complaints. The nature of this parsing system is such that you will be allocating heap space in one section of the program, but freeing it somewhere else, such that it won't be obvious that every allocated object is freed exactly once. We also recommend that you be fastidious about setting pointers to NULL before allocating space for them, and after freeing them, if they live on thereafter. You can then test a pointer for NULL before freeing it. QPARSER+ User Manual page 49 Chapter 13. Registration and Support We urge that you register your Qparser+ software with QCAD Systems by filling out the order blank file form (file ORDRFORM.TXT) and returning it with the requested payment. Your registration will entitle you to the full version, which contains more worked-out examples and grammars, and skeletons in Pascal as well as C. It comes with a typeset 170-page user manual. You'll be entitled to limited free phone support to answer any questions you have or to help you track down problems with your translator. You'll also learn more about using abstract syntax trees, which are the recommended way to develop a translator product of any reasonable complexity. The Qparser binaries (LR1.EXE, OPT.EXE, LR1P.EXE, GRAMCHK.EXE) can be ported to any computer system with a K&R or ANSI C compiler by purchasing a license for the source code. A make file is provided for Unix system V, and the product has been tested on a wide variety of Unix platforms, as well as VMS and IBM platforms. For more details, please contact us below. If you need support, please write or FAX us at the following: QCAD Systems 1164 Hyde Avenue San Jose, CA 95129 FAX/voice message: 408 727 6884 Even if you don't want to register, let us know with a postcard or a one page FAX that you've acquired this shareware edition, and what you think of it. We'll be working on improvements over time, and are always interested in customer response to the product. Version Number By running one of the executables with no parameters, the version number is reported. Please give us the version number if you want to report a problem or a suggestion -- it's 4.8.3 as of this writing. QPARSER+ User Manual page 50 Chapter 14. For Additional Reading The registered version includes a 170-page printed manual that describes all the Qparser internals, and an introduction to the underlying theory. There are more worked-out examples, and tips on compiler development. Dr. Barrett's compiler construction book "Compiler Construction -- Theory and Practice" was out of print at the time of writing this. A new edition should be available from MacMillan publishers in early 1994. Allen Holub, "Compiler Design in C", Prentice-Hall. We highly recommend this book for the professional language designer. Virtually all the theoretical material is provided in the form of tested C programs, which can be purchased separately in diskette form. It's well written and full of examples and example programs. Aho and Ullman, "The Theory of Parsing, Translation and Compiling", two volumes, Prentice-Hall. This is a classic reference volume on the theory. Those unfamiliar with the structure of proofs and abstract language notation will find it difficult to read. Kernighan and Ritchie, "The C Programming Language", second edition, Prentice-Hall. Contains a C grammar that is LR(1). Bjarne Stroustrup, "The C++ Programming Manual", second edition, Addison-Wesley. Contains a C++ grammar that is LR(1).