home *** CD-ROM | disk | FTP | other *** search
- Newsgroups: comp.compilers
- Path: sparky!uunet!cs.utexas.edu!qt.cs.utexas.edu!yale.edu!yale!mintaka.lcs.mit.edu!spdcc!iecc!compilers-sender
- From: jan@klikspaan.si.hhs.nl
- Subject: Recursive descent parsing is better than LL(1)
- Reply-To: jan@klikspaan.si.hhs.nl
- Organization: Compilers Central
- Date: Fri, 22 Jan 1993 16:41:50 GMT
- Approved: compilers@iecc.cambridge.ma.us
- Message-ID: <93-01-160@comp.compilers>
- Keywords: parse, LL(1), theory
- References: <93-01-122@comp.compilers> <93-01-123@comp.compilers>
- Sender: compilers-sender@iecc.cambridge.ma.us
- Lines: 211
-
- Barton Christopher Massey < bart@cs.uoregon.edu > writes:
-
- > I will personally choose to use YACC or something similar for parsing until
- > I can implement an LL(1) parser generator which will take a reasonably
- > ad-hoc grammar and either transform it to an LL(1) grammar or report
- > failure to do so, "most of the time".
-
- > In order to construct such a tool, I would like to prove or disprove the
- > following conjectures:
-
- > LL(1) Transformability: There exists an algorithm
- > which, given any grammar G which is unambiguous and
- > describes an LL(1) language L(G), produces an LL(1)
- > grammar G' for the language L(G).
-
- > LL(1) Decidability: There exists an algorithm which,
- > given any grammar G which is unambiguous and describes
- > a language L(G), decides whether L(G) is an LL(1)
- > language.
-
- > After some thought, it still seems most likely to me that LL1T is true,
- > but I'm skeptical about LL1D . I posted these conjectures to
- > comp.compilers previously, but didn't get too far with them then. In a
- > way, I'm surprised if such apparently important questions haven't been
- > carefully studied, but I haven't found any very useful references yet.
-
- Bill Purvis <W.Purvis@daresbury.ac.uk> writes:
-
- > In Computer Journal, Vol 11, number 1, May 1968, J. Foster describes SID,
- > a syntax improving program which accepts a fairly arbitrary BNF
- > description and applies the neccesary transformations to produce an LL(1)
- > grammar (assuming that the language is LL(1). I think that that fully
- > covers the first conjecture.
-
- > The program is pretty comprehensive and while it might not cover all
- > cases, I suspect that in practical terms it probably covers the second
- > conjecture.
-
- This discussion is a bit blurred by the use of the clauses `given any
- grammar G' and `reasonably ad-hoc grammar'. I agree with Bill Purvis that
- SID does a good job on some fuzzy set of `reasonably good ad-hoc
- grammars'.
-
- The general case `given any unambigous grammar G' cannot be solved by SID
- if it only uses the transformations substitution, elimination of left
- recursion and factorization.
-
- I present a simple contra-example:
-
- Consider the language defined by the next CFG:
-
- S -> A
- S -> B
- A -> `x' A `y'
- A -> `a'
- B -> `x' B `z'
- B -> `b'
-
- SID uses elimination of left recursion, factorisation and substitution to
- search for a better grammar.
-
- The problem here is that A and B have `x' in common in their FIRST-sets.
- So let's substitute A and B in the S-rules giving:
-
- S -> `x' A `y'
- S -> `a'
- S -> `x' B `z'
- S -> `b'
- A -> `x' A `y'
- A -> `a'
- B -> `x' B `z'
- B -> `b'
-
- Two S-rules now start both with `x' so we have to factorize them giving:
-
- S -> `x' S1
- S -> `a'
- S -> `b'
- A -> `x' A `y'
- A -> `a'
- B -> `x' B `z'
- B -> `b'
- S1 -> A `y'
- S1 -> B `z'
-
- Now after solving the problem with the S-rules, we get the same problem
- with the S1-rules.
-
- We can repeat both substitution and factiorisation on S1 giving:
-
- S -> `x' S1
- S -> `a'
- S -> `b'
- A -> `x' A `y'
- A -> `a'
- B -> `x' B `z'
- B -> `b'
- S1 -> `x' S2
- S1 -> `a'
- S1 -> `b'
- S2 -> A `y' `y'
- S2 -> B `z' `z'
-
- Giving rise to the same kind of conflict, but now on S2-rules. Obviously
- the proces of substitution and factorization can be repeated
- indefinitedly.
-
- The loop suggests the conjecture of decidability doesn't hold.
-
- Foster expresses this problem with `SID may also report failure because
- its attempt to transform the grammar causes it to loop.'
-
- But let's return to the initial discussion: recursive descent versus YACC.
-
- Maybe we underestimate the power of recursive descent technique if only we
- could get rid of the idea each nonterminal should be mapped to one
- function recognizing the right sides of its productions.
-
- Let's try a different approach: Each function recognizes right sides of
- one or more nonterminals, returning as a result the kind(s) of
- nonterminals recognized.
-
- Let's return to the example grammar above:
-
- S -> A
- S -> B
- A -> `x' A `y'
- A -> `a'
- B -> `x' B `z'
- B -> `b'
-
- As we cannot discern A from B let's integrate them into one function AorB.
- This should answer the question `What did you see' with : neither (call
- this an error), A, B, or Both.
-
- AorB might be modelled with a statediagram like:
-
- +-----------+
- +------A-->-+ `y' +---y---> result is A
- | +-----------+
- +--------+ +---+--+ =-----------+
- >----+ `x' +---x-->-+ AorB +---B----+ `z' +---z---> result is B
- +--------+ +-+-+--+ +-----------+
- | | +-----------+---x---> result is A
- V +----Both---+ `y' or z' +
- neither +-----------+---y---> result is B
- |
- conflicting inputs ---+---------------------------------> result is neither
-
- As AorB has no Both exit, this branch could be pruned from the diagram.
-
- In C this would look like:
-
- #define neither 0
- #define Both 1
- #define A 2
- #define B 3
-
- char lookahead;
-
- int match(char c)
- {
- if c == lookahead)
- {
- lookahead = getchar();
- return 1;
- }
- else
- return 0;
- }
-
- int AorB()
- {
- int r;
-
- if (match('x'))
- {
- switch (AorB())
- {
- case neither:
- return neither;
- case A:
- if (match('y'))
- return A;
- else
- return neither;
- case B:
- if (match('z'))
- return B;
- else
- return neither;
- case Both: /* omitted */
- }
- }
- else
- return neither;
- }
-
- Note: This approach is top-down and recursive descent, but not predictive
- parsing! I suppose others have figured this out before. I have no label
- to stick upon it. Maybe someone else on the net.
-
- Regards,
-
- Jan Schramp
- --
- Jan Schramp, Haagse Hogeschool - Sector Informatica. Louis Couperusplein 19,
- 2514 HP Den Haag, Netherlands. E-mail: jan@si.hhs.nl
- --
- Send compilers articles to compilers@iecc.cambridge.ma.us or
- {ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.
-