home *** CD-ROM | disk | FTP | other *** search
- .SH
- Section 7. Hints for Preparing Specifications
- .PP
- This section contains miscellaneous hints on preparing efficient, easy to change,
- and clear specifications.
- The individual subsections are, more or less,
- independent; the reader seeing Yacc for the first
- time may well find that
- this entire section could be omitted.
- .SH
- Input Style
- .PP
- It is difficult to
- input rules with substantial actions
- and still have a readable specification file.
- The following style hints owe much to Brian Kernighan,
- and are officially endorsed by the author.
- .IP a.
- Use all capital letters for token names, all lower case letters for
- nonterminal names.
- This rule comes under the heading of ``knowing who to blame when
- things go wrong.``
- .IP b.
- Put grammar rules and actions on separate lines.
- This allows either to be changed without
- an automatic need to change the other.
- .IP c.
- Put all rules with the same left hand side together.
- Put the left hand side in only once, and let all
- following rules begin with a vertical bar.
- .IP d.
- Indent rule bodies by one tab stop, and action bodies by two tab stops.
- .PP
- The example in Appendix A is written following this style, as are
- the examples in the text of this paper (where space permits).
- The user must make up his own mind about these stylistic questions;
- the central problem, however, is to make the rules visible through
- the morass of action code.
- .SH
- Common Actions
- .PP
- When several grammar rules have the same action, the user might well wish to
- provide only one code sequence.
- A simple, general mechanism is, of course, to use subroutine calls.
- It is also possible to put a label on the first statement of an action,
- and let other actions be simply a goto to this
- label.
- Thus, if the user had a routine which built trees,
- he might wish to have only one call to it, as follows:
- .DS
- expr :
- expr \'+\' expr =
- { binary:
- $$ = btree( $1, $2, $3 );
- } |
- expr \'\-\' expr =
- {
- goto binary;
- } |
- expr \'*\' expr =
- {
- goto binary;
- } ;
- .DE
- .SH
- Left Recursion
- .PP
- The algorithm used by the Yacc parser encourages so called ``left recursive''
- grammar rules: rules of the form
- .DS
- name : name rest\_of\_rule ;
- .DE
- These rules frequently arise when
- writing specifications of sequences and lists:
- .DS
- list :
- item |
- list \',\' item ;
- .DE
- and
- .DS
- sequence :
- item |
- sequence item ;
- .DE
- Notice that, in each of these cases, the first rule
- will be reduced for the first item only, and the second rule
- will be reduced for the second and all succeeding items.
- .PP
- If the user were to write these rules right recursively, such as
- .DS
- sequence :
- item |
- item sequence ;
- .DE
- the parser would be a bit bigger, and the items would be seen, and reduced,
- from right to left.
- More seriously, an internal stack in the parser
- would be in danger of overflowing if a very long sequence were read.
- Thus, the user should use left recursion wherever reasonable.
- .PP
- The user should also consider whether a sequence with zero
- elements has any meaning, and if so, consider writing
- the sequence specification with an empty rule:
- .DS
- sequence :
- | /* empty */
- sequence item ;
- .DE
- Once again, the first rule would always be reduced exactly once, before the
- first item was read,
- and then the second rule would be reduced once for each item read.
- Experience suggests that permitting empty sequences
- leads to increased generality, which frequently is not evident at the
- time the rule is first written.
- There are cases, however, when the Yacc algorithm can fail when
- such a change is made.
- In effect, conflicts might arise when Yacc is asked to decide
- which empty sequence it has seen, when it hasn\'t seen enough to
- know!
- Nevertheless,
- this principle is still worth following wherever possible.
- .SH
- Lexical Tie-ins
- .PP
- Frequently, there are lexical decisions which depend on the
- presence of various constructions in the specification.
- For example, the lexical analyzer might want to
- delete blanks normally, but not within quoted strings.
- Or names might be entered into a symbol table in declarations,
- but not in expressions.
- .PP
- One way of handling these situations is
- to create a global flag which is
- examined by the lexical analyzer, and set by actions.
- For example, consider a situation where we have a program which
- consists of 0 or more declarations, followed by 0 or more statements.
- We declare a flag called ``dflag'', which is 1 during declarations, and 0 during
- statements.
- We may do this as follows:
- .DS
- %{
- int dflag ;
- %}
- %%
- program :
- decls stats ;
-
- decls :
- = /* empty */
- {
- dflag = 1;
- } |
- decls declaration ;
-
- stats :
- = /* empty */
- {
- dflag = 0;
- } |
- stats statement ;
-
- . . . other rules . . .
- .DE
- The flag dflag is now set to zero when reading statements, and 1 when reading declarations,
- .ul
- except for the first token in the first statement.
- This token must be seen by the parser before it can tell that
- the declaration section has ended and the statements have
- begun.
- Frequently, however, this single token exception does not
- affect the lexical scan required.
- .PP
- Clearly, this kind of ``backdoor'' approach can be elaborated on
- to a noxious degree.
- Nevertheless, it represents a way of doing some things
- that are difficult, if not impossible, to
- do otherwise.
- .SH
- Bundling
- .PP
- Bundling is a technique for collecting together various character strings
- so that they can be output at some later time.
- It is derived from a feature of the same name in the compiler/compiler TMG [6].
- .PP
- Bundling has two components \- a nice user interface,
- and a clever implementation trick.
- They will be discussed in that order.
- .PP
- The user interface consists of two routines, ``bundle'' and ``bprint''.
- .DS
- bundle( a1, a2, . . ., an )
- .DE
- accepts a variable number of arguments which are either character strings or bundles, and
- returns a bundle,
- whose value will be the concatenation of the values of a1, . . ., an.
- .DS
- bprint( b )
- .DE
- accepts a bundle as argument and outputs its value.
- .PP
- For example, suppose that we wish to read arithmetic expressions, and output
- function calls to routines called ``add'', ``sub'',
- ``mul'', ``div'', and ``assign''.
- Thus, we wish to translate
- .DS
- a = b \- c*d
- .DE
- into
- .DS
- assign(a,sub(b,mul(c,d)))
- .DE
- .PP
- A Yacc specification file which does this is given in Appendix D; this includes
- an implementation of the bundle and bprint
- routines.
- A rule and action of the form
- .DS
- expr:
- expr \'+\' expr =
- {
- $$ = bundle( "add(", $1, ",", $3, ")" );
- }
- .DE
- causes the returned value of
- expr to be come a bundle, whose value is the
- character string containing the desired function call.
- Each NAME token has a value which is a pointer to the
- actual name which has been read.
- Finally, when the entire input line has been read
- and the value has been bundled,
- the value is written out
- and the bundles and names
- are cleared, in preparation for the next input line.
- .PP
- Bundles are implemented as arrays of pointers, terminated by a zero pointer.
- Each pointer either points to a bundle or to a character string.
- There is an array, called
- .ul
- bundle space,
- which contains all the bundles.
- .PP
- The implementation trick is to check the values of the pointers in bundles \-
- if the pointer points into bundle space, it is assumed to point to a bundle;
- otherwise it is assumed to point to a character string.
- .PP
- The treatment of functions with a variable number of arguments, like bundle,
- is likely to differ from one implementation of C to another.
- .PP
- In general, one may wish to have a simple storage allocator which
- allocates and frees bundles,
- in order to handle situations where it is not appropriate to completely
- clear all of bundle space at one time.
- .SH
- Reserved Words
- .PP
- Some programming languages
- permit the user to
- use words like ``if'', which are normally reserved,
- as label or variable names, provided that such use does not
- conflict with the legal use of these names in the programming language.
- This is extremely hard to do in the framework of Yacc,
- since it is difficult to pass the required information to the lexical analyzer
- which tells it ``this instance of if is a keyword, and that instance is a variable''.
- The user can make a stab at it, using the
- mechanism described in the last subsection,
- but it is difficult.
- .PP
- A number of ways of making this easier are under advisement, and one
- will probably be supported eventually.
- Until this day comes, I suggest that the keywords be
- .ul
- reserved;
- that is, be forbidden for use as variable names.
- There are powerful stylistic reasons for preferring this, anyway
- (he said weakly . . . ).
- .SH
- Non-integer Values
- .PP
- Frequently, the user wishes to have values which are
- bigger than integers;
- again, this is an area where Yacc does not make the job as easy as it might,
- and some additional support is likely.
- Nevertheless, at the cost of writing a storage manager,
- the user can return pointers or indices to blocks of storage
- big enough to contain the full values desired.
- .SH
- Previous Work
- .PP
- There have been many previous applications of Yacc.
- The user who is contemplating a big application might well
- find that others have developed relevant techniques,
- or even portions of grammars.
- Yacc specifications appear to be easier to change than
- the equivalent computer programs, so that the ``prior art'' is more
- relevant here as well.
-