home *** CD-ROM | disk | FTP | other *** search
- Newsgroups: comp.lang.c++
- Path: sparky!uunet!brunix!brunix!ted
- From: ted@cs.brown.edu (Tony Davis)
- Subject: Re: C++ Grammar
- Message-ID: <1992Sep12.021515.23609@cs.brown.edu>
- Sender: news@cs.brown.edu
- Organization: Brown University Department of Computer Science
- References: <Bu07v4.F66@csfb1.fir.fbc.com> <1992Sep6.232016.6573@cssc-syd.tansu.com.au> <BuD9Lx.JoD@unx.sas.com>
- Date: Sat, 12 Sep 1992 02:15:15 GMT
- Lines: 66
-
- In article <BuD9Lx.JoD@unx.sas.com> sasghm@theseus.unx.sas.com (Gary Merrill) writes:
- >
- > |> kboyle@csfb1.fir.fbc.com (Kevin T. Boyle) writes:
- > |> > I am interested in writing a syntax parser for C++ and
- > |> > I wonder if anyone has a YACC compatible grammar that they
- > |> > could supply me. Your help would be very appreciated.
-
- > The point is that the grammars you will find that *look* like
- > yacc grammars for C++ are not LR(k) for any k. You can write
- > a yaccable grammar that *looks* like a grammar for C++, but yacc
- > won't (can't) generate a correct C++ parser from it. An LALR(1)
- > parser generator will not create for you a parser capable of
- > parsing a grammar that is not LR(k).
-
- This is accurate but somewhat misleading. An LALR(1) parser
- *USED IN ISOLATION* is not capable of parsing a language that
- is not LR(k). However, yacc grammars are rarely used in
- isolation. Even to parse C, the parsers require a symbol table
- because C is not context-free.
-
- > Roskind has apparently constructed a truly yaccable C++ grammar.
- > I believe I know how this was done and considered doing it myself,
- > but rejected the approach for several reasons. If I am correct,
- > you would discover that the grammar differs in some significant
- > respects from what you are thinking of as a C++ grammar. There
- > was a posting a while ago on comp.compilers that confirms this
- > hypothesis (from someone who has seen and used the grammar). I
- > suggest you look at that posting.
-
- I'm not familiar with the posting referred to, but the biggest
- problem I've found with the Roskind grammar is it's complexity.
- There is a good deal of circumlocution in the grammar to get
- yacc to resolve ambiguities properly. Once that has been done
- it is hard to associate the grammar rules with the semantic
- constructs one expects to deal with in a C++ parser. Roskind
- mentions one ambiguity that is not resolved entirely accurately,
- but I believe he is right in concluding that (1) the difference
- between the parser and 'true C++' is insignificant, and (2) his
- grammar probably resolves that ambiguity more accurately than
- most or all production parsers anyway. If one were producing
- a production parser they would have to evaluate this ambiguity
- carefully. However, for other tools it can be ignored because
- code that exercises the ambiguity is likely never to appear in
- practice.
-
- Another common approach to parsing C++ with yacc is to make
- the lexicl analyzer more powerful. With this approach yacc
- can indeed be used to parse non-LR(k) languages. Granted,
- this might be called cheating since the lexical analyzer is
- doing enough parsing so that the language that the yacc portion
- of the system must parse IS now LR(k). Strictly speaking
- yacc does not parse a non-LR(k) language, but the parser as
- a whole does.
-
- A C++ parser that one of my advisors has recently produced takes
- this approach. The lexer does look ahead to produce artificial
- tokens which yacc uses as ques to yield a correct parse (modulo
- any bugs that remain in the system). Soon I will be making that
- parser available for ftp. When it becomes available I will
- announce it here.
-
-
-
- Tony Davis
- ted@cs.brown.edu
-
-