home *** CD-ROM | disk | FTP | other *** search
- .SH
- Section 1: Basic Specifications
- .PP
- As we noted above, names refer to either tokens or nonterminal symbols.
- Yacc requires those names which will be
- used as token names to be declared as such.
- In addition, for reasons which will be discussed in Section 3, it is usually desirable
- to include the lexical analyzer as part of the specification file;
- it may be useful to include other programs as well.
- Thus, every specification file consists of three sections:
- the
- .ul
- declarations,
- .ul
- (grammar) rules,
- and
- .ul
- programs.
- The sections are separated by double percent ``%%'' marks.
- (The per-cent ``%'' is generally used in Yacc specifications as an escape character.)
- .PP
- In other words, a full specification file looks like
- .DS
- declarations
- %%
- rules
- %%
- programs
- .DE
- .PP
- The declaration section may be empty.
- Moreover, if the programs section is omitted, the second %% mark may be omitted also;
- thus, the smallest legal Yacc specification is
- .DS
- %%
- rules
- .DE
- .PP
- Blanks, tabs, and newlines are ignored except
- that they may not appear in names or multi-character reserved symbols.
- Comments may appear wherever a name or operator is legal; they are enclosed
- in /* . . . */, as in C and PL/I.
- .PP
- The rules section is made up of one or more grammar rules.
- A grammar rule has the form:
- .DS
- A : BODY ;
- .DE
- A represents a nonterminal name, and BODY represents a sequence of zero or more names and literals.
- Notice that the colon and the semicolon are Yacc punctuation.
- .PP
- Names may be of arbitrary length, and may be made up of letters, dot ``.'', underscore ``\_'', and
- non-initial digits.
- Notice that Yacc considers that upper and lower case letters are distinct.
- The names used in the body of a grammar rule may represent tokens or nonterminal symbols.
- .PP
- A literal consists of a character enclosed in single quotes ``\'''.
- As in C, the backslash ``\e'' is an escape character within literals, and all the C escapes
- are recognized.
- Thus
- .DS
- \'\en\' represents newline
- \'\er\' represents return
- \'\e\'\' represents single quote ``\'''
- \'\e\e\' represents backslash ``\e''
- \'\et\' represents tab
- \'\eb\' represents backspace
- \'\exxx\' represents ``xxx'' in octal
- .DE
- For a number of technical reasons, the nul character (\'\e0\' or 000) should never
- be used in grammar rules.
- .PP
- If there are several grammar rules with the same left hand side, the vertical bar ``|''
- can be used to avoid rewriting the left hand side.
- In addition,
- the semicolon at the end of a rule can be dropped before a vertical bar.
- Thus the grammar rules
- .DS
- A : B C D ;
- A : E F ;
- A : G ;
- .DE
- can be given to Yacc as
- .DS
- A : B C D |
- E F |
- G ;
- .DE
- It is not necessary that all grammar rules with the same left side appear together in the grammar rules section,
- although it makes the input much more readable, and easy to change.
- .PP
- If a nonterminal symbol matches the empty string, this can be indicated in the obvious way:
- .DS
- empty : ;
- .DE
- .PP
- As we mentioned above, names which represent
- tokens must be declared as such.
- The simplest way of doing this is to write
- .DS
- %token name1 name2 . . .
- .DE
- in the declarations section.
- (See Sections 3 and 4 for much more discussion).
- Every name not defined in the declarations section is assumed to represent a nonterminal symbol.
- If, by the end of the rules section, some nonterminal symbol has not appeared on the left
- of any rule, then an error message is produced and Yacc halts.
- .PP
- The left hand side of the
- .I
- first
- .R
- grammar rule in the grammar rules section has special importance; it is taken to be the
- controlling nonterminal symbol for the entire input process;
- in technical language it is called the
- .I
- start symbol.
- .R
- In effect, the parser is designed to recognize the start symbol; thus,
- this symbol generally represents the largest,
- most general structure described by the grammar rules.
- .PP
- The end of the input is signaled by a special token, called the
- .ul
- endmarker.
- If the tokens up to, but not including, the endmarker form a structure
- which matches the start symbol, the parser subroutine returns to its caller
- when the endmarker is seen; we say that it
- .ul
- accepts
- the input.
- If the endmarker is seen in any other context, it is an error.
- .PP
- It is the job of the user supplied lexical analyzer
- to return the endmarker when appropriate; see section 3, below.
- Frequently, the endmarker token represents some reasonably obvious
- I/O status, such as ``end-of-file'' or ``end-of-record''.
-