Fresh Fish 5

GNU Info File | 1994-07-11 | 51KB | 1,007 lines

This is Info file bison.info, produced by Makeinfo-1.54 from the input file /home/gd2/gnu/bison/bison.texinfo.

This file documents the Bison parser generator.

Copyright (C) 1988, 1989, 1990, 1991, 1992 Free Software Foundation, Inc.

Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.

Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided also that the sections entitled "GNU General Public License" and "Conditions for Using Bison" are included exactly as in the original, and provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.

Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that the sections entitled "GNU General Public License", "Conditions for Using Bison" and this permission notice may be included in translations approved by the Free Software Foundation instead of in the original English.

File: bison.info,  Node: Mystery Conflicts,  Next: Stack Overflow,  Prev: Reduce/Reduce,  Up: Algorithm

Mysterious Reduce/Reduce Conflicts
==================================

Sometimes reduce/reduce conflicts can occur that don't look warranted. Here is an example:

     %token ID

     %%
     def:    param_spec return_spec ','
             ;
     param_spec:
                  type
             |    name_list ':' type
             ;
     return_spec:
                  type
             |    name ':' type
             ;
     type:        ID
             ;
     name:        ID
             ;
     name_list:
                  name
             |    name ',' name_list
             ;

It would seem that this grammar can be parsed with only a single token of look-ahead: when a `param_spec' is being read, an `ID' is a `name' if a comma or colon follows, or a `type' if another `ID' follows. In other words, this grammar is LR(1).

However, Bison, like most parser generators, cannot actually handle all LR(1) grammars.
In this grammar, two contexts, that after an `ID' at the beginning of a `param_spec' and likewise at the beginning of a `return_spec', are similar enough that Bison assumes they are the same. They appear similar because the same set of rules would be active--the rule for reducing to a `name' and that for reducing to a `type'. Bison is unable to determine at that stage of processing that the rules would require different look-ahead tokens in the two contexts, so it makes a single parser state for them both. Combining the two contexts causes a conflict later. In parser terminology, this occurrence means that the grammar is not LALR(1).

In general, it is better to fix deficiencies than to document them. But this particular deficiency is intrinsically hard to fix; parser generators that can handle LR(1) grammars are hard to write and tend to produce parsers that are very large. In practice, Bison is more useful as it is now.

When the problem arises, you can often fix it by identifying the two parser states that are being confused, and adding something to make them look distinct. In the above example, adding one rule to `return_spec' as follows makes the problem go away:

     %token BOGUS
     ...
     %%
     ...
     return_spec:
                  type
             |    name ':' type
             /* This rule is never used.  */
             |    ID BOGUS
             ;

This corrects the problem because it introduces the possibility of an additional active rule in the context after the `ID' at the beginning of `return_spec'. This rule is not active in the corresponding context in a `param_spec', so the two contexts receive distinct parser states. As long as the token `BOGUS' is never generated by `yylex', the added rule cannot alter the way actual input is parsed.

In this particular example, there is another way to solve the problem: rewrite the rule for `return_spec' to use `ID' directly instead of via `name'.
This also causes the two confusing contexts to have different sets of active rules, because the one for `return_spec' activates the altered rule for `return_spec' rather than the one for `name'.

     param_spec:
                  type
             |    name_list ':' type
             ;
     return_spec:
                  type
             |    ID ':' type
             ;

File: bison.info,  Node: Stack Overflow,  Prev: Mystery Conflicts,  Up: Algorithm

Stack Overflow, and How to Avoid It
===================================

The Bison parser stack can overflow if too many tokens are shifted and not reduced. When this happens, the parser function `yyparse' returns a nonzero value, pausing only to call `yyerror' to report the overflow.

By defining the macro `YYMAXDEPTH', you can control how deep the parser stack can become before a stack overflow occurs. Define the macro with a value that is an integer. This value is the maximum number of tokens that can be shifted (and not reduced) before overflow. It must be a constant expression whose value is known at compile time.

The stack space allowed is not necessarily allocated. If you specify a large value for `YYMAXDEPTH', the parser actually allocates a small stack at first, and then makes it bigger by stages as needed. This increasing allocation happens automatically and silently. Therefore, you do not need to make `YYMAXDEPTH' painfully small merely to save space for ordinary inputs that do not need much stack.

The default value of `YYMAXDEPTH', if you do not define it, is 10000.

You can control how much stack is allocated initially by defining the macro `YYINITDEPTH'. This value too must be a compile-time constant integer. The default is 200.

File: bison.info,  Node: Error Recovery,  Next: Context Dependency,  Prev: Algorithm,  Up: Top

Error Recovery
**************

It is not usually acceptable to have a program terminate on a parse error. For example, a compiler should recover sufficiently to parse the rest of the input file and check it for errors; a calculator should accept another expression.
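
The calculator case can be illustrated with a rough C sketch of a driver that keeps going after a bad line. This is only an illustration under stated assumptions: `parse_line' and `run_lines' are hypothetical names, and `parse_line' is a hand-written stand-in that merely mimics the return convention of `yyparse' (zero on success, nonzero on a syntax error); it is not code that Bison generates.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a Bison-generated yyparse(): returns 0 when the
   line is a valid "<int> + <int>" expression, 1 on a syntax error. */
int parse_line(const char *line, long *result)
{
    long a, b;
    char extra;

    /* Exactly two conversions means nothing followed the second operand. */
    if (sscanf(line, " %ld + %ld %c", &a, &b, &extra) != 2)
        return 1;
    *result = a + b;
    return 0;
}

/* Feed each line to the parser.  On failure, report the line and move on to
   the next one instead of terminating.  Returns the number of bad lines. */
int run_lines(const char *const *lines, int count)
{
    int errors = 0;

    assert(lines != NULL && count >= 0);
    for (int i = 0; i < count; i++) {
        long value;
        if (parse_line(lines[i], &value) != 0) {
            fprintf(stderr, "line %d: parse error\n", i + 1);
            errors++;
            continue;   /* give up on this line, try the next one */
        }
        printf("%ld\n", value);
    }
    return errors;
}
```

A caller might pass an array of input lines to `run_lines' and inspect the returned error count. Restarting per line like this discards all syntactic context at each error, so it suits only line-oriented input.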

In a simple interactive command parser where each input is one line, it may be sufficient to allow `yyparse' to return 1 on error and have the caller ignore the rest of the input line when that happens (and then call `yyparse' again). But this is inadequate for a compiler, because it forgets all the syntactic context leading up to the error. A syntax error deep within a function in the compiler input should not cause the compiler to treat the following line like the beginning of a source file.

You can define how to recover from a syntax error by writing rules to recognize the special token `error'. This is a terminal symbol that is always defined (you need not declare it) and reserved for error handling. The Bison parser generates an `error' token whenever a syntax error happens; if you have provided a rule to recognize this token in the current context, the parse can continue.

For example:

     stmnts:  /* empty string */
             | stmnts '\n'
             | stmnts exp '\n'
             | stmnts error '\n'

The fourth rule in this example says that an error followed by a newline makes a valid addition to any `stmnts'.

What happens if a syntax error occurs in the middle of an `exp'? The error recovery rule, interpreted strictly, applies to the precise sequence of a `stmnts', an `error' and a newline. If an error occurs in the middle of an `exp', there will probably be some additional tokens and subexpressions on the stack after the last `stmnts', and there will be tokens to read before the next newline. So the rule is not applicable in the ordinary way.

But Bison can force the situation to fit the rule, by discarding part of the semantic context and part of the input. First it discards states and objects from the stack until it gets back to a state in which the `error' token is acceptable. (This means that the subexpressions already parsed are discarded, b