home *** CD-ROM | disk | FTP | other *** search
- .SH
- Section 4: Ambiguity, Conflicts, and Precedence
- .PP
- A set of grammar rules is
- .ul
- ambiguous
- if there is some input string which can be structured in two or more different ways.
- For example, the grammar rule
- .DS
- expr : expr \'\-\' expr ;
- .DE
- is a natural way of expressing the fact that one way of forming an arithmetic expression
- is to put two other expressions together with a minus sign between them.
- Unfortunately, this grammar rule does not
- completely specify the way that all complex inputs
- should be structured.
- For example, if we have input of the form
- .DS
- expr \- expr \- expr
- .DE
- the rule would permit us to treat this input either as
- .DS
- ( expr \- expr ) \- expr
- .DE
- or as
- .DS
- expr \- ( expr \- expr )
- .DE
- (We speak of the first as
- .ul
- left association
- of operators, and the second as
- .ul
- right association).
- .PP
- Yacc detects such ambiguities when it is attempting to build the parser.
- It is instructive to consider the problem that confronts the parser when it is
- given an input such as
- .DS
- expr \- expr \- expr
- .DE
- When the parser has read the second expr, the input which it has seen:
- .DS
- expr \- expr
- .DE
- matches the right side of the grammar rule above.
- One valid thing for the parser to do is to
- .ul
- reduce
- the input it has seen by applying this rule;
- after applying the rule, it would have reduced the input it had already seen to expr (the left side of the rule).
- It could then read the final part of the input:
- .DS
- \- expr
- .DE
- and again reduce by the rule.
- We see that the effect of this is to take the left associative interpretation.
- .PP
- Alternatively, when the parser has seen
- .DS
- expr \- expr
- .DE
- it could defer the immediate application of the rule, and continue reading
- (the technical term is
- .ul
- shifting)
- the input until it had seen
- .DS
- expr \- expr \- expr
- .DE
- It could then apply the grammar rule to the rightmost three symbols, reducing them to expr and leaving
- .DS
- expr \- expr
- .DE
- Now it can reduce by the rule again; the effect is to
- take the right associative interpretation.
- Thus, having read
- .DS
- expr \- expr
- .DE
- the parser can do two legal things, a shift or a reduction, and has no way of
- deciding between them.
- We refer to this as a
- .ul
- shift/reduce conflict.
- It may also happen that the parser has a choice of two legal reductions;
- this is called a
- .ul
- reduce/reduce conflict.
- .PP
- When there are shift/reduce or reduce/reduce conflicts, Yacc still produces a parser.
- It does this by selecting one of the valid steps wherever it has a choice.
- A rule which describes which choice to make in a given situation is called
- a
- .ul
- disambiguating rule.
- .PP
- Yacc has two disambiguating rules which are invoked by default,
- in the absence of any user directives to the contrary:
- .IP 1.
- In a shift/reduce conflict, the default is to do the shift.
- .IP 2.
- In a reduce/reduce conflict, the default is to reduce by the
- .ul
- earlier
- grammar rule (in the input sequence).
- .PP
- Rule 1 implies that reductions are deferred whenever there is a choice,
- in favor of shifts.
- Rule 2 gives the user rather crude control over the behavior of the parser
- in this situation, but the proper use of reduce/reduce conflicts is still a black art, and is
- properly considered an advanced topic.
- .PP
- Conflicts may arise because of mistakes in input or logic, or because the grammar rules, while consistent,
- require a more complex parser than Yacc can construct.
- In these cases, the application of disambiguating rules is inappropriate,
- and leads to a parser which is in error.
- For this reason, Yacc
- always reports the number of shift/reduce and reduce/reduce conflicts which were resolved by Rule 1 and Rule 2.
- .PP
- In general, whenever it is possible to apply disambiguating rules to produce a correct parser, it is also
- possible to rewrite the grammar rules so that the same inputs are read, but there are no
- conflicts.
- For this reason, most previous systems like Yacc
- have considered conflicts to be fatal errors.
- Our experience has suggested that this rewriting is somewhat unnatural to do,
- and produces slower parsers; thus, Yacc will produce parsers even in the presence of conflicts.
- .PP
- As an example of the power of disambiguating rules, consider a fragment from a programming
- language involving an
- ``if-then-else'' construction:
- .DS
- stat : IF \'(\' cond \')\' stat |
- IF \'(\' cond \')\' stat ELSE stat ;
- .DE
- Here, we consider IF and ELSE to be tokens, cond to be a nonterminal symbol describing
- conditional (logical) expressions, and
- stat to be a nonterminal symbol describing statements.
- In the following, we shall refer to these two rules as the
- .ul
- simple-if
- rule and the
- .ul
- if-else
- rule, respectively.
- .PP
- These two rules form an ambiguous construction, since input of the form
- .DS
- IF ( C1 ) IF ( C2 ) S1 ELSE S2
- .DE
- can be structured according to these rules in two ways:
- .DS
- IF ( C1 ) {
- IF ( C2 ) S1
- }
- ELSE S2
- .DE
- or
- .DS
- IF ( C1 ) {
- IF ( C2 ) S1
- ELSE S2
- }
- .DE
- The second interpretation is the one given in most programming languages
- which have this construct.
- Each ELSE is associated with the last preceding ``un-ELSE'd'' IF.
- In this example, consider the situation where the parser has seen
- .DS
- IF ( C1 ) IF ( C2 ) S1
- .DE
- and is looking at the ELSE.
- It can immediately
- .ul
- reduce
- by the simple-if rule to get
- .DS
- IF ( C1 ) stat
- .DE
- and then read the remaining input,
- .DS
- ELSE S2
- .DE
- and reduce
- .DS
- IF ( C1 ) stat ELSE S2
- .DE
- by the if-else rule.
- This leads to the first of the above groupings of the input.
- .PP
- On the other hand, we may
- .ul
- shift
- the ELSE and read S2, and then reduce the right hand portion of
- .a
- .DS
- IF ( C1 ) IF ( C2 ) S1 ELSE S2
- .DE
- by the if-else rule to get
- .DS
- IF ( C1 ) stat
- .DE
- which can be reduced by the simple-if rule.
- This leads to the second of the above groupings of the input, which
- is usually desired.
- .PP
- Once again the parser can do two valid things \- we have a shift/reduce conflict.
- The application of disambiguating rule 1 tells the parser to shift in this case,
- which leads to the desired grouping.
- .PP
- Notice that this shift/reduce conflict arises only when there is a particular current input symbol,
- ELSE, and particular inputs already seen, such as
- .DS
- IF ( C1 ) IF ( C2 ) S1
- .DE
- In general, there may be many conflicts, and each one
- will be associated with an input symbol and
- a set of previously read inputs.
- The previously read inputs are characterized by the
- .ul
- state
- of the parser, which is assigned a nonnegative integer.
- The number of states in the parser is typically two to five times the number of grammar
- rules.
- .PP
- When Yacc is invoked with the verbose
- (\-v) option (see Appendix B), it produces a file of user output
- which includes a description of the states in the parser.
- For example, the output corresponding to the above
- example might be:
- .DS L
- 23: shift/reduce Conflict (Shift 45, Reduce 18) on ELSE
-
- State 23
-
- stat : IF ( cond ) stat\_
- stat : IF ( cond ) stat\_ELSE stat
-
- ELSE shift 45
- . reduce 18
-
- .DE
- The first line describes the conflict, giving the state and the input symbol.
- The state title follows, and a brief description of the grammar
- rules which are active in this state.
- The underline ``\_'' describes the
- portions of the grammar rules which have been seen.
- Thus in the example, in state 23 we have seen input which corresponds
- to
- .DS
- IF ( cond ) stat
- .DE
- and the two grammar rules shown are active at this time.
- The actions possible are, if the input symbol is ELSE, we may shift into state
- 45.
- In this state, we should find as part of the description a line of the form
- .DS
- stat : IF ( cond ) stat ELSE\_stat
- .DE
- because in this state we will have read and shifted the ELSE.
- Back in state 23, the alternative action, described by ``.'',
- is to be done if the input symbol is not mentioned explicitly in the above actions; thus,
- in this case, if the input symbol is not ELSE, we should reduce by grammar rule 18,
- which is presumably
- .DS
- stat : IF \'(\' cond \')\' stat
- .DE
- Notice that the numbers following ``shift'' commands refer to other states,
- while the numbers following ``reduce'' commands refer to grammar
- rule numbers.
- In most states, there will be only one reduce action possible in the
- state, and this will always be the default command.
- The user who encounters unexpected shift/reduce conflicts will probably want to
- look at the verbose output to decide whether the default actions are appropriate.
- In really tough cases, the user might need to know more about
- the behavior and construction of the parser than can be covered here;
- in this case, a reference such as [1]
- might be consulted; the services of a local guru might also be appropriate.
- .PP
- There is one common situation
- where the rules given above for resolving conflicts are not sufficient;
- this is in the area of arithmetic expressions.
- Most of the commonly used constructions for arithmetic expressions can be naturally
- described by the notion of
- .ul
- precedence
- levels for operators, together with information about left
- or right associativity.
- It turns out that ambiguous grammars with appropriate disambiguating rules
- can be used to create parsers which are faster and easier to
- write than parsers constructed from unambiguous grammars.
- The basic notion is to write grammar rules
- of the form
- .DS
- expr : expr OP expr
- .DE
- and
- .DS
- expr : UNARY expr
- .DE
- for all binary and unary operators desired.
- This creates a very ambiguous grammar, with many parsing conflicts.
- As disambiguating rules, the user specifies the precedence, or binding
- strength, of all the operators, and the associativity
- of the binary operators.
- This information is sufficient to allow Yacc to resolve the parsing conflicts
- in accordance with these rules, and construct a parser which realizes the desired
- precedences and associativities.
- .PP
- The precedences and associativities are attached to tokens in the declarations section.
- This is done by a series of lines beginning with a Yacc keyword: %left, %right,
- or %nonassoc, followed by a list of tokens.
- All of the tokens on the same line are assumed to have the same precedence level
- and associativity; the lines are listed in
- order of increasing precedence or binding strength.
- Thus,
- .DS
- %left \'+\' \'\-\'
- %left \'*\' \'/\'
- .DE
- describes the precedence and associativity of the four arithmetic operators.
- Plus and minus are left associative, and have lower precedence than
- star and slash, which are also left associative.
- The keyword %right is used to describe right associative operators,
- and the keyword %nonassoc is used to describe operators, like
- the operator .LT. in Fortran, which may not associate with themselves; thus,
- .DS
- A .LT. B .LT. C
- .DE
- is illegal in Fortran, and such an operator would be described with the keyword
- %nonassoc in Yacc.
- As an example of the behavior of these declarations, the description
- .DS
- %right \'=\'
- %left \'+\' \'\-\'
- %left \'*\' \'/\'
-
- %%
-
- expr :
- expr \'=\' expr |
- expr \'+\' expr |
- expr \'\-\' expr |
- expr \'*\' expr |
- expr \'/\' expr |
- NAME ;
- .DE
- might be used to structure the input
- .DS
- a = b = c*d \- e \- f*g
- .DE
- as follows:
- .DS
- a = ( b = ( ((c*d)\-e) \- (f*g) ) )
- .DE
- When this mechanism is used,
- unary operators must, in general, be given a precedence.
- An interesting situation arises when we have a unary operator and a binary operator
- which have the same symbolic representation, but different precedences.
- An example is unary and binary \'\-\'; frequently, unary minus is given the same
- strength as multiplication, or even higher, while binary minus has a lower strength than
- multiplication.
- We can indicate this situation by use of
- another keyword, %prec, to change the precedence level associated with a particular grammar rule.
- %prec appears immediately after the body of the grammar rule, before the action or closing semicolon,
- and is followed by a token name or literal; it
- causes the precedence of the grammar rule to become that of the token name or literal.
- Thus, to make unary minus have the same precedence as multiplication, we might write:
- .DS
- %left \'+\' \'\-\'
- %left \'*\' \'/\'
-
- %%
-
- expr :
- expr \'+\' expr |
- expr \'\-\' expr |
- expr \'*\' expr |
- expr \'/\' expr |
- \'\-\' expr %prec \'*\' |
- NAME ;
- .DE
- .PP
- Notice that the precedences which are described
- by %left, %right, and %nonassoc are independent of the declarations of token names
- by %token.
- A symbol can be declared by %token, and, later in the declarations section,
- be given a precedence and associativity by one of the above methods.
- It is true, however, that names which are given a precedence or associativity are
- also declared to be token names, and so in general do not need to be
- declared by %token, although it does not hurt to do so.
- .PP
- As we mentioned above, the precedences and associativities are used by Yacc to
- resolve parsing conflicts; they give rise to disambiguating rules.
- Formally, the rules work as follows:
- .IP 1.
- The precedences and associativities are recorded for those tokens and literals
- which have them.
- .IP 2.
- A precedence and associativity is associated with each grammar rule; it is the precedence
- and associativity of the last token or literal in the body of the rule.
- If the %prec construction is used, it overrides this default.
- Notice that some grammar rules may have no precedence and associativity associated with them.
- .IP 3.
- When there is a reduce/reduce conflict, or there is a shift/reduce conflict
- and either the input symbol or the grammar rule, or both, has no precedence and associativity
- associated with it, then the two disambiguating rules given at the beginning of the section are used,
- and the conflicts are reported.
- .IP 4.
- If there is a shift/reduce conflict, and both the grammar rule and the input character
- have precedence and associativity associated with them, then the conflict is resolved
- in favor of the action (shift or reduce) associated with the higher precedence.
- If the precedences are the same, then the associativity is used; left
- associative implies reduce, right associative implies shift, and nonassociating
- implies error.
- .PP
- There are a number of points worth making
- about this use of disambiguation.
- There is no reporting of conflicts which are resolved by this mechanism,
- and these conflicts are not counted in the number of shift/reduce and reduce/reduce
- conflicts found in the grammar.
- This means that occasionally mistakes in the specification of precedences
- disguise errors in the input grammar; it is a good idea to be sparing
- with precedences, and use them in an essentially ``cookbook'' fashion,
- until some experience has been gained.
- Frequently, not enough operators or precedences have been specified;
- this leads to a number of messages about shift/reduce or reduce/reduce conflicts.
- The cure is usually to specify more precedences, or use the %prec mechanism, or both.
- It is generally good to examine the verbose output file to ensure that
- the conflicts which are being reported can be validly resolved by precedence.
-