home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
High Voltage Shareware
/
high1.zip
/
high1
/
DIR9
/
SYACC131.ZIP
/
SYACC.DOC
< prev
next >
Wrap
Text File
|
1993-11-11
|
13KB
|
245 lines
╔═════════════════════════════════════════════════════════════════════════════╗
║ ║
║ SYACC ║
║ Version 1.3.1 ║
║ ║
║ by David H. Chen ║
║ November 1993 ║
║ ║
╚═════════════════════════════════════════════════════════════════════════════╝
I. Introduction
This program is a LALR(1) parser generator.
It generates a look-ahead LR(1) parser from an argumented grammar.
The generated parser is a left-to-right scanning of the input (L) and
constructing from the rightmost derivation first (R). It look-ahead one
input symbol (not a character symbol). It is a "C" source code that will
recognize a language that described by the production rules (grammar).
A grammar that can be translated into a LR driver program with parsing
table is called a LR grammar. A LR grammar is a context-free grammar.
For most of the porgramming languages, they can be described by
a LR grammar.
A skim of the process in the SYACC:
First, the production rules are parsed into a list. For each symbol, the
"FIRST" and "FOLLOW" are constructed. They are also included in a list.
Meanwhile, the propagated "A -> ε" issue is taken cared. The last
step is to construct the LALR parsing table.
II. User's Guide
1. Usage
After dos prompt, type:
syacc <infile> [<outfile>] [<switchs>]
Where: [] means optional.
<infile> is the input file name.
<outfile> is the output file name. It is optional.
The default file name is "lrparser.c".
<switchs> has following switchs:
-n No Main function -
The default is to create a file that has "main()".
When this switch is present, no "main()" is generated.
The user is preparing this file to link with other program.
-m Print processing message on screen -
The default is no message sends from SYACC to screen.
Using the switch to print processing message on screen from
the SYACC. It reports processing status.
-e Exit program on error -
The default is to continue the program execution even an
error found. However, this will set program into an
unexpected environment. When this switch is present, the
program exit if an error is found (in the input file).
-d Create file 'syacc.out' -
This file has symbols, production rules, items and messages
information. This file is for use's reference to debug
the grammar. It also shows what the SYACC based to create
the parsing table.
-sN Create N version number of parser -
User can specify the parser's version number so that
more than two parsers can be linked into a program.
This makes a program can have multiple parsers that
generated by the SYACC. The default of N is 0.
-h Generated the parsing table file -
When the parsing table is too large to initialize in
source code, this is an alternative to generate the
parsing table file which will be read in at the
initialization. The file's name is 'lalr?.tbl'. The '?'
is the version number - default is 0. If this switch
is presented, the generated parser is required to compile
in 'huge' memory model.
-b Create the virtual memory file 'syacc.vm' -
The user can provide a file path by using "-bfile_path" to
replace the file 'syacc.vm'. This file can be used when
the message "Out of core" showed by the SYACC. When the
RAM is short, the SYACC can use hard disk space for
swapping out data to make room for execution. If this
switch is not presented, the SYACC will use RAM only.
2. Input File
The file that is accepted by the SYACC can be divided into four areas
by the line begin with "%%".
A. The first area is the DECLARATIONS area. The SYACC will copy all
lines in this area into the begining of the output file.
The macro definitions: 'PA_STACK_TYPE', 'TOKEN_VALUE', 'LEXER', and
'LRPARSER' define the value type of the 'TOKEN_VALUE', the name of
the 'TOKEN_VALUE', the name of the LEXER and name of the LR parser.
The default of these macros are 'char *', 'token_value', 'lexer?',
and 'lrparser?' accordingly. This allows user to re-define these
names for multiple parsers program. The '?' is the version number
- default is 0.
B. The second area is the RESOLUTION area. The SYACC will consult
this area's definition when a shift/reduce or a reduce/reduce
conflict. It uses '%left' to denote left associative and '%right'
right associative. The precedence is decided by the order of the
appearence in this area. Hence, the later the symbols appears,
the higher the precedence. The SYACC will use defaults if the
definitions in this area can not solve the conflict.
The default for shift/reduce conflict is in favor of shift and for
reduce/reduce conflict is for the production rule that appears first.
C. The third area is the TRANSLATION RULES area. It contains the LR
grammar's production rules and their associated semantic actions.
Each rule should have the following format:
<left-hand-side> : <alt 1> { <semantic action 1> }
| <alt 2> { <semantic action 2> }
.....
| <alt n> { <semantic action n> }
| /* this denotes the ε */
;
where,
<left-hand-side> is always a non-terminal symbol.
<alt n> is a sequence of non-terminals and terminals.
<semantic action n> is a section of code will be
executed when this production is recognized.
All tokens other than non-terminal symbol is treated as terminals.
The token must start with a alphabet. Rest of the letter in the
token can be any charater. Character symbols has two "'" quoted
to be a terminal symbol. For example, '->' is a terminal;
but, -> is an error. Any number of character symbols can be in
these two "'" quotes.
In this area, it is case sensitive. Therefore, an "If" is
different from an "if". When a lexical analyzer returns token,
it must match the upper/lower case to the symbol in the rules.
In the semantic action, it can has a section of the 'C' code.
However, $$ and $n are reserved. The $$ means the left-hand-side's
value. Those $n (n is a positive number) are the symbol's value
corresponding to the ordinal sequence of the symbols at the
right hand side. (in the <alt n>) They are pointers of the value
type that is defined by the macro 'PA_STACK_TYPE' - the same as
the variable 'token_value' is declared.
A '//' signifies the start of a comment string.
Any letter follows it will be ignored by the SYACC.
The presented sequence of the production rules is important in
parsing table construction. When a reduce/reduce conflict found,
the SYACC will use the one that appear first. Also, the left-hand
side of the first production rule must be the "start symbol".
For error recovery, a production rule A -> error ... can serve
the purpose. The error is a reserve terminal symbol for the SYACC.
The ... must be a sequence of terminals. Also, the error must be
at the first right-hand-side. The parser that is generated by the
SYACC will trigger the error recovery routine as soon as an error
is encountered. When an error occurred, the parser will popup
the states (and symbols) in the stack until a state that has the
error production. Then, it will discard any input symbol that does
not match with the following terminals of the error symbol in this
production's right-hand-side. After the matching is complete, it
will reduce this production rule and trigger the semantic action.
Afterward, the mormal parsing is resumed.
D. The fourth area is the AUXILIARY FUNCTIONS area. The SYACC will
copy every line in this area to the end of the output file.
This area is used by user to provide supporting functions and main
functions (if any) to complete the parser.
3. Output File
The output file is the LALR(1) parser's 'C' source code.
There are two possible output files:
One is the file contains the "main()" function. The user can produce
an ".exe" file directly from this file.
Another is a file without the "main()" function. The user can link
this file with his program later.
In either case, the function "int lrparser?()" is generated in the
output file. It returns 0 if successfully parsed; Otherwise, errors.
The '?' is the version number - default is 0.
The generated parser expects to get a token each time it calls the
function "char* LEXER()". This function will return 0 if error
occurred and a "" string if no more token is available. It is the
only input from this parser.
It also expects 'TOKEN_VALUE' variable contains the returning
token's value in which the token returns by the 'LEXER()' function.
This variable must be a pointer and its data type is defined by the
'PA_STACK_TYPE' macro. The default is 'char*'. User can change it by
using '#define PA_STACK_TYPE ...' in the declaration area. Also, this
token_value is associated with $n that defined in the semantics rules.
4. Examples
A typical development procedure of a lrparser can be described
as following:
INPUT OUTPUT
========= =================
yacc.259 -> SYACC -> p259.c
p259.c -> c-compiler -> p259.exe
INPUT -> p259.exe -> results (actions and values)
where,
yacc.259 is the file which has the SYACC's input format
p259.c is the lrparser with parsing table in "C".
p259.exe is the LR parser.
Other two examples are included - yacc.262 and yacc.266.
To see how these files work, use 259.bat, 262.bat and 266.bat files.
These batch files use the example and build '.exe' files. They assumes
you have the Microsoft C compiler environment preset. If using other
compiler, you have to change the line "cl ..." in the batch files to
your compiler's command line features.
III. HISTORY
- Version 1.3.1 November 1993,
Modify for the generated parser using memory effectively.
- Version 1.3 September 1993,
Add generating parsing table file to cope with large scale
array initialization issue in the parser.
- Version 1.2 September 1993,
Add linking mutil-generated-parsers in one program.
- Version 1.1 August 1993,
Add error recovery function.
Using more effective algorithm.
- Version 1.0 June 1993,
Bugs fix and using 16 and 32 bits program.
- Version 0.9 May 1990,
Initial work completed in C++.