home *** CD-ROM | disk | FTP | other *** search
- Path: sparky!uunet!dtix!darwin.sura.net!mips!sdd.hp.com!uakari.primate.wisc.edu!usenet.coe.montana.edu!ogicse!das-news.harvard.edu!spdcc!iecc!compilers-sender
- From: jrickard@eoe.co.uk (John Rickard)
- Newsgroups: comp.compilers
- Subject: Re: Generating LALR(1) Grammar from an arbitrary CFG.
- Keywords: LALR, parse, theory
- Message-ID: <92-08-144@comp.compilers>
- Date: 23 Aug 92 16:24:47 GMT
- References: <92-08-118@comp.compilers>
- Sender: compilers-sender@iecc.cambridge.ma.us
- Reply-To: jrickard@eoe.co.uk (John Rickard)
- Organization: Compilers Central
- Lines: 58
- Approved: compilers@iecc.cambridge.ma.us
-
- kumar@ra.csc.ti.com (Sundeep Kumar) writes:
- : Is there a utility that accepts an abitrary CFG and either gives an
- : equivalent LALR(1) grammar or decides that the CFG has no equivalent
- : LALR(1) grammar ? Is this problem in general, solvable ?
-
- No, it's not soluble. (Which doesn't exclude the possibility of such a
- utility that works in the cases one's interested in.)
-
- Sketch proof:
-
- I assume the lemma: There is no algorithm to decide whether a CFG
- generates the language consisting of every word of its alphabet. (This
- result can be found in books on languages and automata, and I give a very
- sketchy proof below.)
-
- Given any language L on alphabet A with a CFG, take a context-free
- language M on a disjoint alphabet B such that M has no LALR(1) grammar.
- Consider the language N on the alphabet A U B consisting of all words S
- such that either discarding all letters in B from S yields a word in L or
- discarding all letters in A from S yields a word in M. If L contains all
- words on A, then N contains all words on A U B and so N has a LALR(1)
- grammar; if L does not contain all words on A, then it is not hard to see
- that N does not have a LALR(1) grammar (since if it did then so would M).
- So if it were decidable whether N had a LALR(1) grammar, then it would be
- decidable whether L contained all words on A, which contradicts the lemma.
-
- Very sketchy proof of lemma:
-
- This uses the well-known fact that it is not decidable whether an
- arbitrary Post correspondence problem is soluble. A Post correspondence
- problem is defined by a finite set of ordered pairs of words over a given
- alphabet; a solution is a finite non-empty sequence of pairs of words,
- each from the given set, such that the concatenation of the first word
- from each pair equals the concatenation of the second word from each pair.
- (For example, is there an English "sentence" with a word-for-word French
- translation that consists of the same letters in the same order, with only
- the positioning of the spaces different?)
-
- Now given a Post correspondence problem, we add two characters "," and ":"
- to its alphabet and consider the language consisting of all words on the
- extended alphabet except those that are generated from a solution of the
- Post correspondence problem by concatenating (to use the terms of the
- example above): the English words, separated by commas; a colon; the
- reverse of the string formed by the French words, separated by commas.
- This language has a CFG, and it contains all words on its alphabet if and
- only if the Post correspondence problem is not soluble; this proves the
- lemma. (Why does the language have a CFG? Well, it's the union of these
- three languages, and it's not hard to find a CFG for each of them: (a)
- words not containing exactly one colon; (b) words containing exactly one
- colon that do not form palindromes if the commas are omitted; (c) words
- containing exactly one colon where the substrings between commas do not
- match up around the colon as English words and reversals of French
- translations.)
-
- -- John Rickard
- --
- Send compilers articles to compilers@iecc.cambridge.ma.us or
- {ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.
-