NetNews Usenet Archive 1992 #20

home *** CD-ROM | disk | FTP | other *** search

/ NetNews Usenet Archive 1992 #20 / NN_1992_20.iso / spool / comp / compiler / 1556 < prev next >

Wrap

Internet Message Format | 1992-09-15 | 4.8 KB

Path: sparky!uunet!zaphod.mps.ohio-state.edu!cis.ohio-state.edu!rutgers!faatcrl!iecc!compilers-sender From: sasghm@unx.sas.com (Gary Merrill) Newsgroups: comp.compilers Subject: Re: Backtracking yacc Keywords: yacc, parse, LL(1) Message-ID: <92-09-074@comp.compilers> Date: 14 Sep 92 13:02:00 GMT References: <92-09-062@comp.compilers> <92-09-059@comp.compilers> Sender: compilers-sender@iecc.cambridge.ma.us Reply-To: Gary Merrill <sasghm@unx.sas.com> Organization: Compilers Central Lines: 79 Approved: compilers@iecc.cambridge.ma.us > Such theorems are rarely of practical value; in practice you will find > that rewriting an LR(k) grammar to be LR(1) (or whatever) will involve > contorting the original syntax in ways that are perverse. While such a > contortion might be acceptable for parsing alone, attempting to add simple > semantic actions or, worse, trying to maintain the grammar for an evolving > language can lead to premature greying. The Roskind C++ grammar is a > notorious example. I can second this with some enthusiasm. I in fact have written an LL(1) parser generator that I transformed into a "grammar munger", and ultimately into a tool that does the following: 1. Sucks in a grammar (in more or less yacc-like format). 2. Applies a load of transformations in an attempt to get the grammar as close to LL(1) as possible. 3. Generates an (LL(1)-like) parse table that contains conflicts. 4. Emits C code for a recursive descent parser using that table, where this code contains built-in backtracking code. (Don't get the idea that this is a well-cooked product. I haven't tested it that much, but it does seem to work. And at least it produces a table and parser when run on our yacc grammar for C++ -- about 750 productions.) I had hopes that the resulting RD parser would be maintainable in the absence of the generating tool, but I suppose I should have known better. The resulting grammar was quite counter-intuitive in having odd "artificial" semantic categories. I expect things might be a bit better with an LR(1)-ish grammar as the result, but not enough so it would make much difference. In cases such as these I think the primary theorem is something like, "You can have a grammar that is pretty natural, or you can have a grammar that really works: take your pick." > As for backtracking, there are additional problems: again, what about > semantic processing? If you are merely parsing or constructing a parse > tree, you can get away with throwing away an incorrect guess. But as soon > as you introduce side effects of even the simplest nature, you are left > with the task of figuring out how to undo actions associated with the > wrong guess. Even for a language as simle as Ansi C, you must perform > certain semantic actions just to parse and C++ is even worse. Yes, exactly. This is why my changes to Berkeley yacc include treating the "usual" ( { ... } ) actions as *conditional* and implementing *unconditional* actions ( [ ... ] ) that will be executed even during trial parsing. > A far better approach is to have available a set of tools which allow the > disambiguation at the point where it is first encountered. LADE, for > example, allows context-sensitive tests to be performed. These can be used > to either a) test arbitrary semantic information to decide on a path to > take, or b) parse ahead in parse-only mode to find some disambiguating > symbol, then return and resume normal, action-associated, parsing. While b > approaches a back-tracking capability, it is not, in fact, backtracking > since it returns to the originating point regardless of what symbols were > found in the scan- ahead. Such tests can, of course, be performed in the lexical analyzer and you can use the results of these tests to drive the invocations of your parser. However, it would be *much* nicer to have a parser tool that would take care of all this. I have not looked at LADE, but I have literature pertaining to it on my desk. I think that if we decide to replace our parser at some time in the future, however, we likely will use RD. Note that there are *still* some constructs in some languages (even in C++) that require *some* semantic processing even during trial parsing in order to parse correctly. (Think of defining a type in an argument declaration list and then using that type in a later declaration in the list -- this is an error in C++, but in order to perform a truly correct parse you need to record the type definition so you can diagnose it and not issue a spurious diagnostic on its subsequent use.) --- Gary H. Merrill [Principal Systems Developer, C Compiler Development] SAS Institute Inc. / SAS Campus Dr. / Cary, NC 27513 / (919) 677-8000 sasghm@theseus.unx.sas.com ... !mcnc!sas!sasghm -- Send compilers articles to compilers@iecc.cambridge.ma.us or {ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.