home *** CD-ROM | disk | FTP | other *** search
- Path: sparky!uunet!zaphod.mps.ohio-state.edu!cis.ohio-state.edu!rutgers!faatcrl!iecc!compilers-sender
- From: sasghm@unx.sas.com (Gary Merrill)
- Newsgroups: comp.compilers
- Subject: Re: Backtracking yacc
- Keywords: yacc, parse, LL(1)
- Message-ID: <92-09-074@comp.compilers>
- Date: 14 Sep 92 13:02:00 GMT
- References: <92-09-062@comp.compilers> <92-09-059@comp.compilers>
- Sender: compilers-sender@iecc.cambridge.ma.us
- Reply-To: Gary Merrill <sasghm@unx.sas.com>
- Organization: Compilers Central
- Lines: 79
- Approved: compilers@iecc.cambridge.ma.us
-
-
- > Such theorems are rarely of practical value; in practice you will find
- > that rewriting an LR(k) grammar to be LR(1) (or whatever) will involve
- > contorting the original syntax in ways that are perverse. While such a
- > contortion might be acceptable for parsing alone, attempting to add simple
- > semantic actions or, worse, trying to maintain the grammar for an evolving
- > language can lead to premature greying. The Roskind C++ grammar is a
- > notorious example.
-
- I can second this with some enthusiasm. I in fact have written an LL(1)
- parser generator that I transformed into a "grammar munger", and
- ultimately into a tool that does the following:
-
- 1. Sucks in a grammar (in more or less yacc-like format).
- 2. Applies a load of transformations in an attempt to get the
- grammar as close to LL(1) as possible.
- 3. Generates an (LL(1)-like) parse table that contains conflicts.
- 4. Emits C code for a recursive descent parser using that table,
- where this code contains built-in backtracking code.
-
- (Don't get the idea that this is a well-cooked product. I haven't tested
- it that much, but it does seem to work. And at least it produces a table
- and parser when run on our yacc grammar for C++ -- about 750 productions.)
- I had hopes that the resulting RD parser would be maintainable in the
- absence of the generating tool, but I suppose I should have known better.
- The resulting grammar was quite counter-intuitive in having odd
- "artificial" semantic categories. I expect things might be a bit better
- with an LR(1)-ish grammar as the result, but not enough so it would make
- much difference.
-
- In cases such as these I think the primary theorem is something like, "You
- can have a grammar that is pretty natural, or you can have a grammar that
- really works: take your pick."
-
- > As for backtracking, there are additional problems: again, what about
- > semantic processing? If you are merely parsing or constructing a parse
- > tree, you can get away with throwing away an incorrect guess. But as soon
- > as you introduce side effects of even the simplest nature, you are left
- > with the task of figuring out how to undo actions associated with the
- > wrong guess. Even for a language as simle as Ansi C, you must perform
- > certain semantic actions just to parse and C++ is even worse.
-
- Yes, exactly. This is why my changes to Berkeley yacc include treating
- the "usual" ( { ... } ) actions as *conditional* and implementing
- *unconditional* actions ( [ ... ] ) that will be executed even during
- trial parsing.
-
- > A far better approach is to have available a set of tools which allow the
- > disambiguation at the point where it is first encountered. LADE, for
- > example, allows context-sensitive tests to be performed. These can be used
- > to either a) test arbitrary semantic information to decide on a path to
- > take, or b) parse ahead in parse-only mode to find some disambiguating
- > symbol, then return and resume normal, action-associated, parsing. While b
- > approaches a back-tracking capability, it is not, in fact, backtracking
- > since it returns to the originating point regardless of what symbols were
- > found in the scan- ahead.
-
- Such tests can, of course, be performed in the lexical analyzer and you
- can use the results of these tests to drive the invocations of your
- parser. However, it would be *much* nicer to have a parser tool that
- would take care of all this. I have not looked at LADE, but I have
- literature pertaining to it on my desk. I think that if we decide to
- replace our parser at some time in the future, however, we likely will use
- RD.
-
- Note that there are *still* some constructs in some languages (even in
- C++) that require *some* semantic processing even during trial parsing in
- order to parse correctly. (Think of defining a type in an argument
- declaration list and then using that type in a later declaration in the
- list -- this is an error in C++, but in order to perform a truly correct
- parse you need to record the type definition so you can diagnose it and
- not issue a spurious diagnostic on its subsequent use.)
- ---
- Gary H. Merrill [Principal Systems Developer, C Compiler Development]
- SAS Institute Inc. / SAS Campus Dr. / Cary, NC 27513 / (919) 677-8000
- sasghm@theseus.unx.sas.com ... !mcnc!sas!sasghm
- --
- Send compilers articles to compilers@iecc.cambridge.ma.us or
- {ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.
-