home *** CD-ROM | disk | FTP | other *** search
- ** RDP release version 1.00 flyer. Adrian Johnstone 16 February 1994 **
-
- RDP - a recursive descent parser generator
-
- RDP compiles attributed LL(1) grammars decorated with C-language semantic
- actions into recursive descent compilers. RDP is written in strict ANSI C
- and produces strict ANSI C. RDP was developed using Borland C 3.1 on a PC
- and has also been built with no problems on Alpha/OSF-1, DECstation/Ultrix
- and Sun 4/SunOS hosts. The definition of strict ANSI here means compiling
- with gcc -Wall -pedantic -ANSI and getting no warning messages.
-
- RDP itself and the language processors it generates use standard symbol,
- set and scanner modules. The RDP scanner is programmed by loading tokens
- into the symbol table at the start of each run.
-
- RDP produces complete runnable programs with built-in help information and
- command line switches that are specified as part of the EBNF file. In this
- sense RDP output is far more shrink-wrapped than the usual parser generators
- which helps beginning students. RDP can generate itself which is a nice
- demonstration of the bootstrapping technique used for porting compilers to
- new architectures.
-
- I wrote RDP because in October I start giving a course on compiler design.
- I don't think the world needs another course on parsing techniques and am
- really interested in code generation for exotic processors, so I tried to
- produce a compact parser generator that would enable undergraduates to rip
- through the syntax and standard code generation parts of the syllabus in a
- few weeks, thus leaving me time to get into the black arts of code
- scheduling and optimisation.
-
- What you get.
-
- - An implementation of Pascal style set-handling in C.
-
- - A hash-coded symbol table with support for scoping and arbitrary user data
-
- - A programmable high-speed scanner with integrated error handling and
- hooks for macros (RDP is being used to generate assemblers for two novel
- microprocessors in addition to the usual applications of LL(1) parsers).
-
- - The source to a hand-coded version of the RDP translator. RDP checks that
- the source grammar is LL(1) and explains exactly why a non-LL(1) grammar
- is unacceptable. RDP does not attempt to rework a grammar by itself.
-
- - A decorated EBNF file describing RDP that may be processed by the
- hand-coded RDP translator to produce a machine generated version of RDP.
- This is good for boggling undergraduate's minds with.
-
- - A decorated EBNF file describing an interpreter for a language called TOY.
-
- - An EBNF file describing Pascal which can be used to generate a parser for
- the Pascal language. This description needs some work: it is overstrict
- with respect to semicolons and does not presently handle quites within
- strings.
-
- - A pre-built RDP executable for MS-DOS.
-
- - Sources, makefiles and Borland 3.1 project files for everything which
- you may use freely on condition that you send copies of any modifications,
- enhancements and ill-conceived changes you might make back to me so that
- I can improve RDP.
-
- What you don't get.
-
- - Decent manuals (coming Real Soon Now). When the manuals arrive they will
- be called rdp.ps which is a user manual for RDP proper, and rdp_supp.ps
- which is a user manual for the set, symbol and scanner modules. If you
- get a late version of this distribution you might be lucky and find them
- in the kit.
-
- - Tutorial information on parsing and compiling (try Wirth's Algorithms +
- Data Structures = Programs, Chapter 5 or your favourite compiler book).
-
- - Warranties. (This code has only just escaped from my personal toolkit.
- I've put a lot of effort into making it fit for others to use, but in
- the very nature of compiler compilers it is hard to test all the angles,
- and the Garbage In - Garbage Out principle holds to the highest degree:
- if you write ill-formed semantic actions you won't find out until you try
- and compile the parser that RDP wrote for you.)
-
- What I'd like.
-
- - Guinea pigs: I'm sure there are some bad surprises waiting for me, and I'd
- like to find them before October so that my course doesn't get torpedoed.
-
- - A C grammar. The supplied Pascal grammar was produced by retrieving the
- yacc Pascal description floating around on the net and hacking it into RDP
- source form (which shortens it a great deal BTW). An obvious first
- project is to do the same for the C grammar and maybe produce a
- pretty-printer based on it.
-
- RDP source language.
-
- - The RDP source language is a very conventional EBNF supplemented with:
-
- a few pre-defined tokens such as ID (an alphanumeric identifier),
- NEW_ID (an alphanumeric identifier that has not yet been added to
- the symbol table), EOF and EOLN,
-
- some programmable tokens such as STRING and STRING_ESC which define
- Pascal and C style strings respectively,
-
- a set of directives which are mainly used to programme the scanner
- and the help subsystems
-
- a weird construct < sequence > 'token' which is shorthand for
- sequence {'token' sequence}
-
- attributes of the form :identifier which may be attached to production
- names or tokens where they define the type of variable returned on the
- LHS and the name of the receiving variable on the RHS of an ::=
-
- arbitrary C code semantic actions embedded in double quotes. These are
- simply copied into the generated parser.
-
- - Here's the RDP grammar written in terms of itself. Tokens are enclosed
- in single quotes, directives and built-in tokens are written in upper case
- and comments are denoted by (* ... *). RDP is case sensitive.
-
- (* RDP source grammar *)
- unit ::= { prod | dir} EOF. (* Zero or more productions or directives *)
-
- prod ::= (ID|NEW_ID) [':'(ID|NEW_ID) {'*'} ] '::=' alt '.' .
-
- alt ::= < seq > '|' . (* Alternates are separated by | *)
-
- seq ::= { (item_ret [':' (ID | NEW_ID) ] | item_inl) }.
-
- item_ret ::= ID | NEW_ID | (* Production *)
- token | (* Token *)
- (* Following four are parameterisble tokens for handling strings and comments *)
- 'STRING' '(' token ')' | (* Pascal style string *)
- 'STRING_ESC' '(' token token')' | (* C-style string *)
- 'COMMENT' '(' token token ')' | (* Non-nested comment *)
- 'COMMENT_NEST' '(' token token ')'. (* Nestable comment *)
-
- item_inl ::= code | (* semantic action *)
- '(' alt ')' | (* do first *)
- '{' alt '}' | (* zero or more *)
- '[' alt ']' | (* zero or one *)
- '<' alt '>' token . (* list: <X>'t' expands to X {'t' X} *)
-
- code ::= STRING_ESC('"' '\\'). (* Inline code is delimited by ".." *)
- token ::= STRING_ESC('\'' '\\'). (* Tokens are delimited by '..' *)
- comment ::= COMMENT_NEST('(*' '*)'). (* Nestable comments are delimited by (*..*) *)
-
- dir ::= 'INCLUDE' '(' code ')' | (* Does for RDP what #include does for C *)
- 'HELP' '(' code code ')' | (* Add help line and command line switch *)
- 'USES' '(' code ')' | (* Adds a #include to the generated parser *)
- 'TITLE' '(' code ')' | (* Descriptive programme title for help *)
- 'SUFFIX' '(' code ')' | (* Defalt file suffix for scanner input *)
- 'PRE_PROCESS' '(' code ')' | (* Function to call before activating parser *)
- 'POST_PROCESS' '(' code ')' | (* Function to call after activating parser *)
- 'OUTPUT_FILE' '(' code ')' | (* Default output file name *)
- 'SET_SIZE' '(' NUMBER ')' | (* Set maximum set size (must be >= # of tokens *)
- 'HASH_SIZE' '(' NUMBER ')' | (* Number of hash buckets in hash table *)
- 'HASH_PRIME' '(' NUMBER ')' | (* Hash key: see dragon book on hash functions *)
- 'TAB_WIDTH' '(' NUMBER ')' | (* Tab expansion width in listings *)
- 'TEXT_SIZE' '(' NUMBER ')' | (* Default size of text buffer *)
- 'MAX_ERRORS' '(' NUMBER ')' | (* Stop after this many errors *)
- 'MAX_WARNINGS' '(' NUMBER ')' | (* Stop after this many warnings *)
- 'CASE_INSENSITIVE' | (* Make scanner case insensitive (cf Pascal) *)
- 'SHOW_SKIPS' | (* Issue 'Skipping...' informational messges *)
- 'NEWLINE_VISIBLE' | (* Make newlines visisbles (cf Ada comments, assemblers) *)
- 'COMMENTS_VISIBLE'. (* Return comments to parser (cf assemblers) *)
-
- (* End of grammar *)
-
- How to get RDP.
-
- - Use anonymous ftp to cscx.dcs.rhbnc.ac.uk (134.219.200.45) and download
- pub/rdp/rdp.tar.Z if you are a Unix user or pub/rdp/rdp.zip if you are an
- MS-DOS user. Actually both files contain exactly the same distribution kit,
- so you will find an MS-DOS executable in the tar file.
-
- - If you only have email access to the Internet, try one of the email based
- ftp servers such as @decwrl.
-
- - If you can't manage ftp then I am prepared to send out 3.5in 1.4M disks
- for MS-DOS. If you're really desperate I can supply just any other medium
- that PC's, Amigas, Atari-ST's, Mac's or DEC computers have ever used, but
- I'm very short of time so strange requests may take a while to service.
-
- Other machines.
-
- - An ex-student of mine is going to try and build RDP for an Amiga. IMHO
- anybody with an ANSI C compiler should have almost no problems building
- from the standard kit. On the other hand K&R compiler users may suffer.
-
- Installation.
-
- 1 Unpack the distribution kit into a single directory. You should have the
- following files:
-
- buildlog.dos A log of Borland make and Borland C 3.1 building for DOS
- buildlog.unx A log of OSF/1 make and GNU CC for Alpha building for Unix
- crash.c Fatal error handling and memory allocation
- crash.h
- makefile Customisable make file for Unix and MS-DOS Borland C 3.1
- pascal.bnf Pascal grammar for building a Pascal parser
- rdp.bnf Decorated RDP grammar for building machine generated RDP
- rdp.c Hand written RDP front end
- rdp.h
- rdp.doc This file
- rdp_aux.c Auxilliary file containing RDP semantic actions
- rdp_aux.h
- scan.c A programmable scanner
- scan.h
- set.c An implmentation of Pascal style sets
- set.h
- symbol.c A hash code symbol table
- symbol.h
- test.pas A piece of Pascal for testing the Pascal parser
- test.toy A piece of Toy for testing the Toy parser
- toy.bnf Decorated Toy grammar for building Toy interpreter
-
- 2 The makefile can be used on Unix or on a PC, but you should edit it to
- make sure that it is configured for your operating system. Select one of
- the two blocks of macro definitons at the top of the file.
-
- 3 To build everything, just type 'make'. The default target in the makefile
- builds rdp, the Toy interpreter and the Pascal parser. The Toy interpreter
- is run on test.toy, and the Pascal interpreter is run on test.pas. Neither
- should generate any errors. On successful completion, make uses rdp to
- build rdp_gen, a machine generated version of rdp and then uses rdp_gen to
- reproduce itself. The two machine generated versions are compared with each
- other to make sure that the bootstrap has been successful. They should
- differ only in the program names.
-
- The following messages appearing during the build are normal:
-
- During compilation of toy.c
- Warning toy.c 732: 'base' is assigned a value that is never used in function main
- Warning toy.c 732: 'force' is assigned a value that is never used in function main
-
- During the run of toy on test.toy
- Toy V1.00 Feb 18 1994 20:07:04
- test.toy: toy interpreter OK
- Text mode: 0 errors, 0 warnings
-
- During generation of pascal.c the dangling else causes
- Error: ** Production 'statement__9' contains null but first /\ follow is not empty
-
- During compilation of pascal.c
- Warning pascal.c 1820: 'base' is assigned a value that is never used in function main
- Warning pascal.c 1820: 'force' is assigned a value that is never used in function main
-
- During the run of pascal on test.pas
- Pascal parser V1.00 (generated) Feb 18 1994 20:07:10
- test.pas: 0 errors, 0 warnings
-
-
- During the comparison of the machine generated rdp versions
-
- #include "rdp_genA.h"
- #include "rdp_genB.h"
-
- "Usage: rdp_genA [options] sourcefile \n\n"
- "Usage: rdp_genB [options] sourcefile \n\n"
-
- The buildlog.* files show a complete log for Unix and MS-DOS: if your
- results are different then please let me know.
-
- 4 If you type 'make clean' all the object files and the machine generated
- rdp versions will be deleted, leaving the distribution files plus the new
- executables.
-
- 5 If you type 'make veryclean' then the directory is cleaned and the
- executables are also deleted.
-
- Missing bits to be fixed in version 1.1
-
- There are a couple of things that I managed to break whilst preparing this
- version. I've run out of time to work on RDP so I'll just note them here and
- put out a maintenance release soon, maybe at the end of February 1994.
-
- - Presently the RDP text mode scanner is line oriented, so I have removed the
- interpretation of while loops from Toy because they break if the while
- statement is the first thing on the line. This is top priority to be fixed:
- look out for version 1.1
-
- - Late on, I introduced a stupid bug into the handling of comment and string
- delimiters so thatthey must be two characters long to work. Hence /* .. */
- is fine but { } is not. Hence the Pascal parser does not parse comments. This
- will also be fixed in version 1.1.
-
- - Due to a subtlety of the order in which sequences are evaluated, rdp_gen
- and its first child differ, so I need to generate rdp_gen's child and
- grandchild to do the convergence check in the makefile.
-
- Enhancements for version 2
-
- - RDP parsers have large numbers of constant first and follow sets embedded
- in them. For some applications, especially assemblers, many of these sets
- are identical. In version 2 identical sets will be collapsed. This is only
- likely to save a few kilobytes in even the most pathological parser, but it
- offends my sense of neatness.
-
- - Macro handling code will be incorporated directly into the scanner. At the
- moment I do macros within my assemblers (which I'm not releasing for
- public consumption just yet).
-
- Comments, queries and requests for code to
-
- Dr Adrian Johnstone, CS Dept, Royal Holloway, University of London
- Email: adrian@dcs.rhbnc.ac.uk
-
-
-
-