minnie.tuhs.org

home *** CD-ROM | disk | FTP | other *** search

/ minnie.tuhs.org / unixen.tar / unixen / PDP-11 / Trees / V6 / usr / doc / yacc / ss4 < prev next >

Wrap

Text File | 1975-06-26 | 14.5 KB | 431 lines

.SH Section 4: Ambiguity, Conflicts, and Precedence .PP A set of grammar rules is .ul ambiguous if there is some input string which can be structured in two or more different ways. For example, the grammar rule .DS expr : expr \'\-\' expr ; .DE is a natural way of expressing the fact that one way of forming an arithmetic expression is to put two other expressions together with a minus sign between them. Unfortunately, this grammar rule does not completely specify the way that all complex inputs should be structured. For example, if we have input of the form .DS expr \- expr \- expr .DE the rule would permit us to treat this input either as .DS ( expr \- expr ) \- expr .DE or as .DS expr \- ( expr \- expr ) .DE (We speak of the first as .ul left association of operators, and the second as .ul right association). .PP Yacc detects such ambiguities when it is attempting to build the parser. It is instructive to consider the problem that confronts the parser when it is given an input such as .DS expr \- expr \- expr .DE When the parser has read the second expr, the input which it has seen: .DS expr \- expr .DE matches the right side of the grammar rule above. One valid thing for the parser to do is to .ul reduce the input it has seen by applying this rule; after applying the rule, it would have reduced the input it had already seen to expr (the left side of the rule). It could then read the final part of the input: .DS \- expr .DE and again reduce by the rule. We see that the effect of this is to take the left associative interpretation. .PP Alternatively, when the parser has seen .DS expr \- expr .DE it could defer the immediate application of the rule, and continue reading (the technical term is .ul shifting) the input until it had seen .DS expr \- expr \- expr .DE It could then apply the grammar rule to the rightmost three symbols, reducing them to expr and leaving .DS expr \- expr .DE Now it can reduce by the rule again; the effect is to take the right associative interpretation. Thus, having read .DS expr \- expr .DE the parser can do two legal things, a shift or a reduction, and has no way of deciding between them. We refer to this as a .ul shift/reduce conflict. It may also happen that the parser has a choice of two legal reductions; this is called a .ul reduce/reduce conflict. .PP When there are shift/reduce or reduce/reduce conflicts, Yacc still produces a parser. It does this by selecting one of the valid steps wherever it has a choice. A rule which describes which choice to make in a given situation is called a .ul disambiguating rule. .PP Yacc has two disambiguating rules which are invoked by default, in the absence of any user directives to the contrary: .IP 1. In a shift/reduce conflict, the default is to do the shift. .IP 2. In a reduce/reduce conflict, the default is to reduce by the .ul earlier grammar rule (in the input sequence). .PP Rule 1 implies that reductions are deferred whenever there is a choice, in favor of shifts. Rule 2 gives the user rather crude control over the behavior of the parser in this situation, but the proper use of reduce/reduce conflicts is still a black art, and is properly considered an advanced topic. .PP Conflicts may arise because of mistakes in input or logic, or because the grammar rules, while consistent, require a more complex parser than Yacc can construct. In these cases, the application of disambiguating rules is inappropriate, and leads to a parser which is in error. For this reason, Yacc always reports the number of shift/reduce and reduce/reduce conflicts which were resolved by Rule 1 and Rule 2. .PP In general, whenever it is possible to apply disambiguating rules to produce a correct parser, it is also possible to rewrite the grammar rules so that the same inputs are read, but there are no conflicts. For this reason, most previous systems like Yacc have considered conflicts to be fatal errors. Our experience has suggested that this rewriting is somewhat unnatural to do, and produces slower parsers; thus, Yacc will produce parsers even in the presence of conflicts. .PP As an example of the power of disambiguating rules, consider a fragment from a programming language involving an ``if-then-else'' construction: .DS stat : IF \'(\' cond \')\' stat | IF \'(\' cond \')\' stat ELSE stat ; .DE Here, we consider IF and ELSE to be tokens, cond to be a nonterminal symbol describing conditional (logical) expressions, and stat to be a nonterminal symbol describing statements. In the following, we shall refer to these two rules as the .ul simple-if rule and the .ul if-else rule, respectively. .PP These two rules form an ambiguous construction, since input of the form .DS IF ( C1 ) IF ( C2 ) S1 ELSE S2 .DE can be structured according to these rules in two ways: .DS IF ( C1 ) { IF ( C2 ) S1 } ELSE S2 .DE or .DS IF ( C1 ) { IF ( C2 ) S1 ELSE S2 } .DE The second interpretation is the one given in most programming languages which have this construct. Each ELSE is associated with the last preceding ``un-ELSE'd'' IF. In this example, consider the situation where the parser has seen .DS IF ( C1 ) IF ( C2 ) S1 .DE and is looking at the ELSE. It can immediately .ul reduce by the simple-if rule to get .DS IF ( C1 ) stat .DE and then read the remaining input, .DS ELSE S2 .DE and reduce .DS IF ( C1 ) stat ELSE S2 .DE by the if-else rule. This leads to the first of the above groupings of the input. .PP On the other hand, we may .ul shift the ELSE and read S2, and then reduce the right hand portion of .a .DS IF ( C1 ) IF ( C2 ) S1 ELSE S2 .DE by the if-else rule to get .DS IF ( C1 ) stat .DE which can be reduced by the simple-if rule. This leads to the second of the above groupings of the input, which is usually desired. .PP Once again the parser can do two valid things \- we have a shift/reduce conflict. The application of disambiguating rule 1 tells the parser to shift in this case, which leads to the desired grouping. .PP Notice that this shift/reduce conflict arises only when there is a particular current input symbol, ELSE, and particular inputs already seen, such as .DS IF ( C1 ) IF ( C2 ) S1 .DE In general, there may be many conflicts, and each one will be associated with an input symbol and a set of previously read inputs. The previously read inputs are characterized by the .ul state of the parser, which is assigned a nonnegative integer. The number of states in the parser is typically two to five times the number of grammar rules. .PP When Yacc is invoked with the verbose (\-v) option (see Appendix B), it produces a file of user output which includes a description of the states in the parser. For example, the output corresponding to the above example might be: .DS L 23: shift/reduce Conflict (Shift 45, Reduce 18) on ELSE State 23 stat : IF ( cond ) stat\_ stat : IF ( cond ) stat\_ELSE stat ELSE shift 45 . reduce 18 .DE The first line describes the conflict, giving the state and the input symbol. The state title follows, and a brief description of the grammar rules which are active in this state. The underline ``\_'' describes the portions of the grammar rules which have been seen. Thus in the example, in state 23 we have seen input which corresponds to .DS IF ( cond ) stat .DE and the two grammar rules shown are active at this time. The actions possible are, if the input symbol is ELSE, we may shift into state 45. In this state, we should find as part of the description a line of the form .DS stat : IF ( cond ) stat ELSE\_stat .DE because in this state we will have read and shifted the ELSE. Back in state 23, the alternative action, described by ``.'', is to be done if the input symbol is not mentioned explicitly in the above actions; thus, in this case, if the input symbol is not ELSE, we should reduce by grammar rule 18, which is presumably .DS stat : IF \'(\' cond \')\' stat .DE Notice that the numbers following ``shift'' commands refer to other states, while the numbers following ``reduce'' commands refer to grammar rule numbers. In most states, there will be only one reduce action possible in the state, and this will always be the default command. The user who encounters unexpected shift/reduce conflicts will probably want to look at the verbose output to decide whether the default actions are appropriate. In really tough cases, the user might need to know more about the behavior and construction of the parser than can be covered here; in this case, a reference such as [1] might be consulted; the services of a local guru might also be appropriate. .PP There is one common situation where the rules given above for resolving conflicts are not sufficient; this is in the area of arithmetic expressions. Most of the commonly used constructions for arithmetic expressions can be naturally described by the notion of .ul precedence levels for operators, together with information about left or right associativity. It turns out that ambiguous grammars with appropriate disambiguating rules can be used to create parsers which are faster and easier to write than parsers constructed from unambiguous grammars. The basic notion is to write grammar rules of the form .DS expr : expr OP expr .DE and .DS expr : UNARY expr .DE for all binary and unary operators desired. This creates a very ambiguous grammar, with many parsing conflicts. As disambiguating rules, the user specifies the precedence, or binding strength, of all the operators, and the associativity of the binary operators. This information is sufficient to allow Yacc to resolve the parsing conflicts in accordance with these rules, and construct a parser which realizes the desired precedences and associativities. .PP The precedences and associativities are attached to tokens in the declarations section. This is done by a series of lines beginning with a Yacc keyword: %left, %right, or %nonassoc, followed by a list of tokens. All of the tokens on the same line are assumed to have the same precedence level and associativity; the lines are listed in order of increasing precedence or binding strength. Thus, .DS %left \'+\' \'\-\' %left \'*\' \'/\' .DE describes the precedence and associativity of the four arithmetic operators. Plus and minus are left associative, and have lower precedence than star and slash, which are also left associative. The keyword %right is used to describe right associative operators, and the keyword %nonassoc is used to describe operators, like the operator .LT. in Fortran, which may not associate with themselves; thus, .DS A .LT. B .LT. C .DE is illegal in Fortran, and such an operator would be described with the keyword %nonassoc in Yacc. As an example of the behavior of these declarations, the description .DS %right \'=\' %left \'+\' \'\-\' %left \'*\' \'/\' %% expr : expr \'=\' expr | expr \'+\' expr | expr \'\-\' expr | expr \'*\' expr | expr \'/\' expr | NAME ; .DE might be used to structure the input .DS a = b = c*d \- e \- f*g .DE as follows: .DS a = ( b = ( ((c*d)\-e) \- (f*g) ) ) .DE When this mechanism is used, unary operators must, in general, be given a precedence. An interesting situation arises when we have a unary operator and a binary operator which have the same symbolic representation, but different precedences. An example is unary and binary \'\-\'; frequently, unary minus is given the same strength as multiplication, or even higher, while binary minus has a lower strength than multiplication. We can indicate this situation by use of another keyword, %prec, to change the precedence level associated with a particular grammar rule. %prec appears immediately after the body of the grammar rule, before the action or closing semicolon, and is followed by a token name or literal; it causes the precedence of the grammar rule to become that of the token name or literal. Thus, to make unary minus have the same precedence as multiplication, we might write: .DS %left \'+\' \'\-\' %left \'*\' \'/\' %% expr : expr \'+\' expr | expr \'\-\' expr | expr \'*\' expr | expr \'/\' expr | \'\-\' expr %prec \'*\' | NAME ; .DE .PP Notice that the precedences which are described by %left, %right, and %nonassoc are independent of the declarations of token names by %token. A symbol can be declared by %token, and, later in the declarations section, be given a precedence and associativity by one of the above methods. It is true, however, that names which are given a precedence or associativity are also declared to be token names, and so in general do not need to be declared by %token, although it does not hurt to do so. .PP As we mentioned above, the precedences and associativities are used by Yacc to resolve parsing conflicts; they give rise to disambiguating rules. Formally, the rules work as follows: .IP 1. The precedences and associativities are recorded for those tokens and literals which have them. .IP 2. A precedence and associativity is associated with each grammar rule; it is the precedence and associativity of the last token or literal in the body of the rule. If the %prec construction is used, it overrides this default. Notice that some grammar rules may have no precedence and associativity associated with them. .IP 3. When there is a reduce/reduce conflict, or there is a shift/reduce conflict and either the input symbol or the grammar rule, or both, has no precedence and associativity associated with it, then the two disambiguating rules given at the beginning of the section are used, and the conflicts are reported. .IP 4. If there is a shift/reduce conflict, and both the grammar rule and the input character have precedence and associativity associated with them, then the conflict is resolved in favor of the action (shift or reduce) associated with the higher precedence. If the precedences are the same, then the associativity is used; left associative implies reduce, right associative implies shift, and nonassociating implies error. .PP There are a number of points worth making about this use of disambiguation. There is no reporting of conflicts which are resolved by this mechanism, and these conflicts are not counted in the number of shift/reduce and reduce/reduce conflicts found in the grammar. This means that occasionally mistakes in the specification of precedences disguise errors in the input grammar; it is a good idea to be sparing with precedences, and use them in an essentially ``cookbook'' fashion, until some experience has been gained. Frequently, not enough operators or precedences have been specified; this leads to a number of messages about shift/reduce or reduce/reduce conflicts. The cure is usually to specify more precedences, or use the %prec mechanism, or both. It is generally good to examine the verbose output file to ensure that the conflicts which are being reported can be validly resolved by precedence.