home *** CD-ROM | disk | FTP | other *** search
Text File | 1992-11-07 | 29.8 KB | 1,084 lines |
- Newsgroups: comp.lang.perl
- Path: sparky!uunet!decwrl!netcomsv!netcom.com!jeffrey
- From: jeffrey@netcom.com (Jeffrey Kegler)
- Subject: Marpa, prototype of a Hacker's Parser for Perl
- Message-ID: <1992Nov6.120224.12846@netcom.com>
- Sender: jeffrey@algor2.algorists.com
- Organization: Algorists, Inc.
- Date: Fri, 6 Nov 1992 12:02:24 GMT
- Lines: 1073
-
- Following my signature is a shar archive of the working prototype of
- the Marpa parser. Marpa is the name of the parse engine itself --
- it's a library of perl routines. Milarepa is a program which uses a
- subset of Marpa's capabilities to take a file of BNF and associated
- actions (in Perl!), and parse them into a Perl program which parses
- its input standard, performing the actions. Both milarepa.pl and the
- Perl program it writes on its standard output use Marpa. milarepa.pl
- both produces examples of Marpa usage, and is itself a moderately
- sophisticated one.
-
- For example, test2.mr describes a calculator in the Milarepa language,
- a straightforward combination of BNF and Perl. Running the command
- "perl milarepa.pl < test2.mr > testmr2.pl" creates the calculator, and
- "perl testmr2.pl" runs it. The example just adds and multiplies but a
- glance at test2.mr should show how easily it could be extended into a
- very powerful calculator.
-
- The files in the package are
-
- marpa.pl -- the heart of the Marpa prototype, its parsing routines
- milarepa.pl -- creates simple Marpa compilers or interpreters
- from BNF and Perl
- test.mr -- the Milarepa code for an arithmetic expression compiler
- mrtest.pl -- The above compiler compiled into a Perl script which
- prints the parse for simple arithmetic expressions
- test2.mr -- the Milarepa code for an arithmetic expression interpreter
- mrtest2.pl -- The above interpreter compiled into a perl script which
- evaluates simple arithmetic expressions
- tilopa.pl -- A simple lexer used by Milarepa
- naropa.pl -- Other simple routines used by Milarepa
- test.pl -- A simple program using Marpa
- test2.pl -- Another simple program using Marpa
-
- This is intended to be a real hacker's parser. It is not restricted
- to LR(k), and the parse logic follows directly from the BNF. It
- handles ambiguous grammars, ambiguous tokens (tokens which were not
- positively identified by the lexer) and allows the programmer to
- change the start symbol. There is no fixed distinction between
- terminals and non-terminals, that is, a symbol can both match the
- input AND be on the left hand side of a production. Multiple Marpa
- grammars are allowed in a single perl program. The grammar is
- extensible. The BNF may have productions added (or, as an extension,
- deleted) after parsing has begun.
-
- Since, unlike LR parsers, Marpa's logic follows directly from the BNF,
- hackers should be able to invent tricks. For example, the order in
- which productions are tested is controlled by the programmer. He can
- perform error handling by inserting special error productions into the
- grammar, which detect those cases he wishes to report ("Missing comma
- in list", etc.). These productions can come after the others, so that
- no input ever fails to parse in the strict sense, some merely return
- special "error parses" which indicate the problem. It does not bother
- Marpa if the new productions introduce ambiguities into the grammar.
-
- Of course, I happily offer Marpa under the same terms of free
- redistributability that I was offered perl.
-
- There are two restrictions on the grammar, neither of which I believe
- will prevent Marpa from handling any grammar of practical use. First,
- the grammar may not be left recursive. Left recursion makes Marpa
- recurse infinitely. A later version will detect left recursion and
- stop the parse with an error.
-
- Second, the input must be divided into sentences of a finite maximum
- length. This restriction is unusual in parsing theory, but is easy to
- apply to any grammar of practical interest. A language where the
- parseable entities ran many pages would not be readable or writable by
- humans, and even the most obscure computer languages must be divided
- into pieces by the human being reading them, otherwise they would not
- be comprehensible at all.
-
- With this restriction, Marpa runs very fast. I have done a C language
- prototype of Marpa, and it chomps down large, highly ambiguous
- sentences of an English subset rapidly. The theoretical speed, with
- division into sentences of maximum fixed length, is linear, or O(n).
- The C code works by pushing pointers onto and off of stacks, and runs
- very fast.
-
- This implementation is not fast, since it prototypes Marpa in Perl. I
- am now seeking help in converting Marpa into part of Perl.
-
- Any parser needs help in lexing, and in evaluating the semantics of
- the results. Perl's power in these areas makes it an ideal place to
- embed Marpa.
-
- The current documentation stinks. It forms the rest of this message.
-
- This interface didn't come out very "perl-ish", and that may be due to
- my habits of thought. I would like an interface that seemed more
- "perl-ish".
-
- &createGrammar() -- returns a grammar ID for a new grammar, to be
- built with registerAlternate() and registerSymbol() calls. The
- grammarID scalar is actually just an integer. The new grammar becomes
- the current grammar.
-
- &setGrammar($gid) -- sets the current grammar to gid. All Marpa
- routines affect only the current grammar.
-
- ®isterSymbol($name, $pattern) -- a string giving the name and
- another the search pattern. The name is to be used by another set of
- routines (not described) which will use names instead of symbol IDs.
- The pattern is only used by the default lexer supplied with Marpa.
- Returns the symbol ID of a newly created symbol in the current
- grammar.
-
- ®isterAlternate($value, $lhs, @rhs) -- returns the production ID of
- a new production in the current grammar. The $lhs is the symbol ID of
- its left hand side, and @rhs is a list (possibly empty) of symbol IDs
- for the right hand side. Value is the string to be evaluated when
- translating the parse tree for this grammar.
-
- ®isterToken($value, @sidList) -- used to build the sentence to be
- parsed. $value is the string that will be passed up (unevaluated) to
- the upper levels of the parse tree. The @sidList is the list of
- symbol IDs which are possible choices for this token.
-
- &parse($sid) -- the main routine. Returns the result of evaluating
- the parse tree, or undefined if there were no more parses. Repeated
- calls return evaluations of alternative parse trees, if the grammar
- parses the sentence ambiguously. $sid is the start symbol to use.
- The code in this routine is the heart of the parser, and is very hard
- to figure out. The algorithm comes from the two volume _The Theory of
- Parsing, Translation and Compiling_ by Aho & Ullman, and I don't
- recommend you bother trying to figure out what's going on here unless
- you read that section of the book (Vol. I, pp. 289-297). The code in
- the other routines should be accessible to the determined Perl hacker.
-
- &clearParse() -- for parsing ambiguous grammars, &parse save its
- intermediate results. This routine clears them, and deletes all
- tokens in the current sentence.
-
- Marpa, Milarepa, Tilopa and Naropa are the names of Tibetan saints.
- Marpa the Translator was instrumental in bringing Indian Buddhist
- texts to Tibet, which task involved three dangerous journeys across
- the Himalayas, extensive fund raising, great scholarship, linguistic
- ability and deep spiritual development.
-
- Cheers!
-
- Jeffrey Kegler, Independent UNIX Consultant, Algorists, Inc.
- jeffrey@algor2.ALGORISTS.COM or uunet!algor2!jeffrey
- 137 E Fremont AVE #122, Sunnyvale CA 94087
- "No wonder the gods smile so seldom -- we so often fail to notice."
- From _Stardance_ by Spider and Jeanne Robinson
-
- #!/bin/sh
- # This is a shell archive (produced by shar 3.49)
- # To extract the files from this archive, save it to a file, remove
- # everything above the "!/bin/sh" line above, and type "sh file_name".
- #
- # made 11/06/1992 11:54 UTC by jeffrey@netcom
- # Source directory /u25/jeffrey/marpa
- #
- # existing files will NOT be overwritten unless -c is specified
- #
- # This shar contains:
- # length mode name
- # ------ ---------- ------------------------------------------
- # 7038 -rw-r--r-- marpa.pl
- # 4221 -rw-r--r-- milarepa.pl
- # 888 -rw-r--r-- mrtest.pl
- # 1101 -rw-r--r-- mrtest2.pl
- # 966 -rw-r--r-- naropa.pl
- # 865 -rw-r--r-- test.pl
- # 700 -rw-r--r-- test2.pl
- # 813 -rw-r--r-- tilopa.pl
- # 160 -rw-r--r-- test.mr
- # 244 -rw-r--r-- test2.mr
- #
- # ============= marpa.pl ==============
- if test -f 'marpa.pl' -a X"$1" != X"-c"; then
- echo 'x - skipping marpa.pl (File already exists)'
- else
- echo 'x - extracting marpa.pl (Text)'
- sed 's/^X//' << 'SHAR_EOF' > 'marpa.pl' &&
- package Marpa;
- X
- $debug = 0;
- X
- $currentGrammar = 1;
- $nextGrammar = 2;
- X
- $nextAlternate = 1;
- $nextSymbol = 1;
- X
- $listTemplate = "l*";
- X
- sub main'createGrammar
- {
- X local($ret) = $nextGrammar++;
- X return $ret;
- }
- X
- sub main'setGrammar
- {
- X local($gid) = @_;
- X $currentGrammar = $gid;
- }
- X
- sub main'clearParse
- {
- X @TokenValue = ();
- X @Token = ();
- X @L1Position = ();
- X @L1 = ();
- X @L2 = ();
- X $inProgress = 0;
- }
- X
- sub main'registerSymbol
- {
- X local($name, $pattern) = @_;
- X local($ret) = $nextSymbol++;
- X $symbolName{$currentGrammar, $ret} = $name;
- X $symbolID{$currentGrammar, $name} = $ret;
- X $symbolPattern{$currentGrammar, $ret} = $pattern if defined $pattern;
- X return $ret;
- }
- X
- sub main'requireSymbol
- {
- X local($name) = @_;
- X local($ret) = $symbolID{$currentGrammar, $name};
- X $ret = &main'registerSymbol($name) unless defined $ret;
- X return $ret;
- }
- X
- sub main'requirePattern
- {
- X local($name, $pattern) = @_;
- X local($sid) = &main'requireSymbol($name);
- X $symbolPattern{$currentGrammar, $sid} = $pattern;
- X return $sid;
- }
- X
- sub main'symbolName
- {
- X local($sid) = @_;
- X return $symbolName{$currentGrammar, $sid};
- }
- X
- sub main'terminatorSymbol
- {
- X local($sid) = @_;
- X $terminator{$currentGrammar, $sid} = 1;
- }
- X
- sub main'discardSymbol
- {
- X local($sid) = @_;
- X $discard{$currentGrammar, $sid} = 1;
- }
- X
- sub main'registerAlternate
- {
- X local($value, $lhs, @rhs) = @_;
- X local($alt) = $nextAlternate++;
- X print "Registering alternate $alt, $lhs = " . join(",", @rhs) .
- X " -> \"$value\"\n" if $debug;
- X $Value{$currentGrammar, $alt} = $value;
- X $LHS{$currentGrammar, $alt} = $lhs;
- X $RHS{$currentGrammar, $alt} = pack($listTemplate, @rhs);
- X local(@altList) = unpack($listTemplate,
- X $Alternate{$currentGrammar, $lhs});
- X push(@altList, $alt);
- X $Alternate{$currentGrammar, $lhs} =
- X pack($listTemplate, sort { $a <=> $b } @altList);
- X return $alt;
- }
- X
- sub main'registerToken
- {
- X package Marpa;
- X local($value, @symbolList) = @_;
- X
- X push(@TokenValue, $value);
- X push(@Token, pack($listTemplate, @symbolList));
- }
- X
- $inProgress = 0;
- X
- $normal = 1;
- $backtrack = 2;
- $fail = 3;
- $succeed = 4;
- X
- sub elementOf
- {
- X local($element, @set) = @_;
- X
- X foreach $setElement (@set)
- X {
- X return 1 if $element == $setElement;
- X }
- X return 0;
- }
- X
- %evalValues = ();
- X
- sub main'value
- {
- X local($sid, $occurrence) = @_;
- X $occurrence = 1 unless defined $occurrence;
- X local($ret) = $evalValues{$sid, $occurrence};
- X $ret = "[?value($sid, occurrence)?]" unless defined $ret;
- X return $ret;
- }
- X
- sub main'v
- {
- X local($name, $occurrence) = @_;
- X $sid = &main'requireSymbol($name);
- X return &main'value($sid, $occurrence);
- }
- X
- sub symbolEval
- {
- X local($ret);
- X local($L1pos) = $L1Position[$evalPosition];
- X local($L1) = $L1[$evalPosition];
- X print "Starting \$evalPosition=$evalPosition," .
- X "\$L1pos=$L1pos," .
- X "\$L1=$L1\n" if $debug;
- X $evalPosition++;
- X return $TokenValue[$L1pos] if $L1pos >= $[;
- X local(@RHSCount) = ();
- X local(@RHS) = unpack($listTemplate, $RHS{$currentGrammar, $L1});
- X local(%Values) = ();
- X local($i);
- X for ($i = $[; $i <= $#RHS; $i++)
- X {
- X local($sym) = $RHS[$i];
- X local($val) = &symbolEval();
- X return $ret if $parseReject;
- X $RHSCount[$sym] = 0 unless defined $RHSCount[$sym];
- X $RHSCount[$sym]++;
- X $Values{$sym, $RHSCount[$sym]} = $val;
- X }
- X print "Evaluating \$L1pos=$L1pos," . "\$L1=$L1\n" if $debug;
- X %evalValues = %Values;
- X $evalString = $Value{$currentGrammar, $L1};
- X print "Evaluating \"$evalString\"\n" if $debug;
- X die "Marpa: no value, production $L1, grammar $currentGrammar\n"
- X unless defined $evalString;
- X package main;
- X die "Marpa: unable to eval \"$Marpa'evalString\", $@\n"
- X unless $Marpa'ret = eval $Marpa'evalString;
- X package Marpa;
- }
- X
- sub main'parse
- {
- X local($ret);
- X local($startSymbol) = @_;
- X
- X if ($inProgress)
- X {
- X $state = $backtrack;
- X } else {
- X $state = $normal;
- X $position = $[;
- X @L1 = ();
- X @L1Position = ();
- X @L2 = ($startSymbol);
- X $inProgress++;
- X }
- X
- X config: for (;;)
- X {
- X if ($debug)
- X {
- X if ($state == $normal)
- X {
- X print "q";
- X } elsif ($state == $backtrack)
- X {
- X print "b";
- X } elsif ($state == $succeed)
- X {
- X print "t";
- X } else
- X {
- X print "?";
- X }
- X print ",";
- X print $position+1;
- X print ",";
- X if ($#L1 < $[) { print "e"; }
- X else {
- X local($i);
- X for ($i=$[; $i<=$#L1; $i++)
- X {
- X if ($L1Position[$i] >= $[)
- X {
- X print $symbolName{$currentGrammar, $L1[$i]};
- X } else {
- X local($symbol) = $LHS{$currentGrammar, $L1[$i]};
- X local($altList) = $Alternate{$currentGrammar, $symbol};
- X local(@altList) = unpack($listTemplate, $altList);
- X local($j);
- X altCount: for ($j = $[; $j <= $#altList; $j++)
- X {
- X last altCount if $altList[$j] == $L1[$i];
- X }
- X print join("",
- X "<$symbolName{$currentGrammar, $symbol}",
- X ($j+1),
- X ">"
- X );
- X }
- X }
- X }
- X print ",";
- X if ($#L2 < $[) { print "e" if $state == $succeed; }
- X else {
- X local($i);
- X for ($i=$#L2; $i>=$[; $i--)
- X {
- X print "<$symbolName{$currentGrammar, $L2[$i]}>";
- X }
- X }
- X print "$" unless $state == $succeed;
- X print "\n";
- X }
- X
- X if ($state == $fail)
- X {
- X return $ret;
- X }
- X
- X if ($state == $normal)
- X {
- X if ($#L2 == $[ - 1)
- X {
- X $state = $position > $#Token ? $succeed : $backtrack;
- X next config;
- X }
- X
- X if (&elementOf($L2[$#L2],
- X unpack($listTemplate, $Token[$position])))
- X {
- X push(@L1, pop(@L2));
- X push(@L1Position, $position);
- X $position++;
- X next config;
- X }
- X
- X local($altList) = $Alternate{$currentGrammar, $L2[$#L2]};
- X if (defined $altList)
- X {
- X local(@altList) = unpack($listTemplate, $altList);
- X local($alt) = shift @altList;
- X push(@L1, $alt);
- X push(@L1Position, $[ - 1);
- X pop(@L2);
- X push(@L2, reverse unpack($listTemplate,
- X $RHS{$currentGrammar, $alt}));
- X next config;
- X }
- X
- X $state = $backtrack;
- X next config;
- X
- X } # end of if NORMAL
- X
- X if ($state == $backtrack)
- X {
- X if ($#L1 < $[)
- X {
- X $state = $fail;
- X next config;
- X }
- X
- X if ($L1Position[$#L1] >= $[)
- X {
- X $position--;
- X push(@L2, pop(@L1));
- X pop(@L1Position);
- X next config;
- X }
- X
- X $oldAlt = pop(@L1);
- X pop(@L1Position);
- X foreach $element (unpack($listTemplate,
- X $RHS{$currentGrammar, $oldAlt}))
- X {
- X pop(@L2);
- X }
- X
- X @altList = unpack($listTemplate,
- X $Alternate{$currentGrammar, $LHS{$currentGrammar, $oldAlt}});
- X
- X altPass: while ($element = shift(@altList))
- X {
- X last altPass if $element == $oldAlt;
- X }
- X
- X $newAlt = shift(@altList);
- X
- X if (defined $newAlt)
- X {
- X $state = $normal;
- X push(@L1, $newAlt);
- X push(@L1Position, $[ - 1);
- X push(@L2, reverse unpack($listTemplate,
- X $RHS{$currentGrammar, $newAlt}));
- X next config;
- X }
- X
- X push(@L2, $LHS{$currentGrammar, $oldAlt});
- X next config;
- X
- X } # end of if BACKTRACK
- X
- X last config;
- X
- X }
- X
- X $evalPosition = $[;
- X $parseReject = 0;
- X local($val) = &symbolEval();
- X $ret = $val if (!$parseReject);
- X return $ret;
- }
- X
- 1;
- X
- package main;
- SHAR_EOF
- chmod 0644 marpa.pl ||
- echo 'restore of marpa.pl failed'
- Wc_c="`wc -c < 'marpa.pl'`"
- test 7038 -eq "$Wc_c" ||
- echo 'marpa.pl: original size 7038, current size' "$Wc_c"
- fi
- # ============= milarepa.pl ==============
- if test -f 'milarepa.pl' -a X"$1" != X"-c"; then
- echo 'x - skipping milarepa.pl (File already exists)'
- else
- echo 'x - extracting milarepa.pl (Text)'
- sed 's/^X//' << 'SHAR_EOF' > 'milarepa.pl' &&
- require "naropa.pl";
- X
- $S_Semicolon = &main'registerSymbol("semicolon", ";");
- &main'terminatorSymbol($S_Semicolon);
- X
- $S_StringPiece =
- X &main'registerSymbol("string piece", "\"((\\\\.)|([^\\\"]))*\"");
- X
- $S_Whitespace = &main'registerSymbol("whitespace", "\\s+");
- &main'discardSymbol($S_Whitespace);
- X
- $S_Comment = &main'registerSymbol("comment", "#[^\n]*\n");
- &main'discardSymbol($S_Comment);
- X
- $S_ConcatenateSign = &main'registerSymbol("concatenate sign", "\\\.");
- $S_NameComponent = &main'registerSymbol("name component", "\\w+");
- $S_Tilde = &main'registerSymbol("tilde", "~");
- $S_ProduceSign = &main'registerSymbol("produce sign", "::=");
- X
- # action ::= lex pattern | production;
- $S_Action = &main'registerSymbol("action");
- $S_LexPattern = &main'registerSymbol("lex pattern");
- $S_Production = &main'registerSymbol("production");
- &main'registerAlternate('&value($S_LexPattern)', $S_Action, $S_LexPattern);
- &main'registerAlternate('&value($S_Production)', $S_Action, $S_Production);
- X
- # lex pattern ::= symbol name . tilde . string
- $S_SymbolName = &main'registerSymbol("symbol name");
- &main'registerAlternate(
- X qq/join("",
- X "&requirePattern('",
- X &value($S_SymbolName),
- X "', ",
- X &value($S_StringPiece),
- X ");\n")/,
- X $S_LexPattern, $S_SymbolName, $S_Tilde, $S_StringPiece);
- X
- # symbol name ::= name component list
- $S_NameComponentList = &main'registerSymbol("name component list");
- &main'registerAlternate(
- X '&value($S_NameComponentList)',
- X $S_SymbolName, $S_NameComponentList);
- X
- # name component list ::=
- # name component |
- # name component . name component list;
- &main'registerAlternate('&value($S_NameComponent)',
- X $S_NameComponentList, $S_NameComponent);
- &main'registerAlternate(
- X 'join(" ",&value($S_NameComponent),&value($S_NameComponentList))',
- X $S_NameComponentList, $S_NameComponent, $S_NameComponentList);
- X
- # string ::= string piece list
- $S_String = &main'registerSymbol("string");
- $S_StringPieceList = &main'registerSymbol("string piece list");
- &main'registerAlternate(
- X '&value($S_StringPieceList)',
- X $S_String, $S_StringPieceList);
- X
- # string piece list ::=
- # string piece |
- # string piece . string piece list;
- &main'registerAlternate(
- X '&value($S_StringPiece)',
- X $S_StringPieceList, $S_StringPiece);
- &main'registerAlternate(
- X 'join("", &value($S_StringPiece), &value($S_StringPieceList))',
- X $S_StringPieceList, $S_StringPiece, $S_StringPieceList);
- X
- # production ::=
- # production proper |
- # production proper . production action;
- $S_ProductionProper = &main'registerSymbol("production proper");
- $S_ProductionAction = &main'registerSymbol("production action");
- ®isterAlternate(
- X qq/join("",
- X '&defaultAction(\n ',
- X &value($S_ProductionProper),
- X ');\n')/,
- X $S_Production, $S_ProductionProper);
- ®isterAlternate(
- X qq/join("",
- X '®isterAlternate(\n ',
- X &value($S_ProductionAction),
- X '\n ,',
- X &value($S_ProductionProper),
- X ');\n')/,
- X $S_Production, $S_ProductionProper, $S_ProductionAction);
- X
- # production action ::= string piece;
- &main'registerAlternate(
- X '&value($S_StringPiece)',
- X $S_ProductionAction, $S_StringPiece);
- X
- # production proper ::= lhs . produce sign . rhs;
- $S_LHS = &main'registerSymbol("lhs");
- $S_RHS = &main'registerSymbol("rhs");
- ®isterAlternate(
- X qq/join('',
- X '&requireSymbol("',
- X &value($S_LHS),
- X '")\n',
- X &value($S_RHS))/,
- X $S_ProductionProper, $S_LHS, $S_ProduceSign, $S_RHS
- );
- X
- # lhs ::= symbol name;
- ®isterAlternate('&value($S_SymbolName)', $S_LHS, $S_SymbolName);
- X
- # rhs ::= symbol list;
- $S_SymbolList = &main'registerSymbol("symbol list");
- ®isterAlternate(
- X qq/&value($S_SymbolList)/,
- X $S_RHS, $S_SymbolList
- );
- X
- # symbol list ::=
- # symbol name |
- # symbol name . concatenate symbol . symbol list;
- &main'registerAlternate(
- X qq/join(''
- X ,' ,&requireSymbol("'
- X ,&value($S_SymbolName)
- X ,'")\n'
- X )/,
- X $S_SymbolList, $S_SymbolName);
- ®isterAlternate(
- X qq/join('',
- X ' ,&requireSymbol("',
- X &value($S_SymbolName),
- X '")\n',
- X &value($S_SymbolList)
- X )/,
- X $S_SymbolList, $S_SymbolName,
- X $S_ConcatenateSign, $S_SymbolList);
- X
- print "\nrequire 'naropa.pl';\n";
- X
- # $Marpa'debug = 1;
- &naropa($S_Action);
- X
- print "\n&naropa(&requireSymbol('start symbol'));\n";
- SHAR_EOF
- chmod 0644 milarepa.pl ||
- echo 'restore of milarepa.pl failed'
- Wc_c="`wc -c < 'milarepa.pl'`"
- test 4221 -eq "$Wc_c" ||
- echo 'milarepa.pl: original size 4221, current size' "$Wc_c"
- fi
- # ============= mrtest.pl ==============
- if test -f 'mrtest.pl' -a X"$1" != X"-c"; then
- echo 'x - skipping mrtest.pl (File already exists)'
- else
- echo 'x - extracting mrtest.pl (Text)'
- sed 's/^X//' << 'SHAR_EOF' > 'mrtest.pl' &&
- X
- require 'naropa.pl';
- # "start symbol ::= E;"
- &defaultAction(
- X &requireSymbol("start symbol")
- X ,&requireSymbol("E")
- );
- X
- # "E ::= T;"
- &defaultAction(
- X &requireSymbol("E")
- X ,&requireSymbol("T")
- );
- X
- # "E ::= T . plus sign . E;"
- &defaultAction(
- X &requireSymbol("E")
- X ,&requireSymbol("T")
- X ,&requireSymbol("plus sign")
- X ,&requireSymbol("E")
- );
- X
- # "T ::= F ;"
- &defaultAction(
- X &requireSymbol("T")
- X ,&requireSymbol("F")
- );
- X
- # "T ::= F . times sign . T;"
- &defaultAction(
- X &requireSymbol("T")
- X ,&requireSymbol("F")
- X ,&requireSymbol("times sign")
- X ,&requireSymbol("T")
- );
- X
- # "F ::= number;"
- &defaultAction(
- X &requireSymbol("F")
- X ,&requireSymbol("number")
- );
- X
- # "number ~ "\\d+";"
- &requirePattern('number', "\\d+");
- X
- # "plus sign ~ "\\+";"
- &requirePattern('plus sign', "\\+");
- X
- # "times sign ~ "\\*";"
- &requirePattern('times sign', "\\*");
- X
- X
- &naropa(&requireSymbol('start symbol'));
- SHAR_EOF
- chmod 0644 mrtest.pl ||
- echo 'restore of mrtest.pl failed'
- Wc_c="`wc -c < 'mrtest.pl'`"
- test 888 -eq "$Wc_c" ||
- echo 'mrtest.pl: original size 888, current size' "$Wc_c"
- fi
- # ============= mrtest2.pl ==============
- if test -f 'mrtest2.pl' -a X"$1" != X"-c"; then
- echo 'x - skipping mrtest2.pl (File already exists)'
- else
- echo 'x - extracting mrtest2.pl (Text)'
- sed 's/^X//' << 'SHAR_EOF' > 'mrtest2.pl' &&
- X
- require 'naropa.pl';
- # "start symbol ::= E "&v('E')";"
- ®isterAlternate(
- X "&v('E')"
- X ,&requireSymbol("start symbol")
- X ,&requireSymbol("E")
- );
- X
- # "E ::= T "&v('T')";"
- ®isterAlternate(
- X "&v('T')"
- X ,&requireSymbol("E")
- X ,&requireSymbol("T")
- );
- X
- # "E ::= T . plus sign . E "&v('T')+&v('E')";"
- ®isterAlternate(
- X "&v('T')+&v('E')"
- X ,&requireSymbol("E")
- X ,&requireSymbol("T")
- X ,&requireSymbol("plus sign")
- X ,&requireSymbol("E")
- );
- X
- # "T ::= F "&v('F')";"
- ®isterAlternate(
- X "&v('F')"
- X ,&requireSymbol("T")
- X ,&requireSymbol("F")
- );
- X
- # "T ::= F . times sign . T "&v('F')*&v('T')";"
- ®isterAlternate(
- X "&v('F')*&v('T')"
- X ,&requireSymbol("T")
- X ,&requireSymbol("F")
- X ,&requireSymbol("times sign")
- X ,&requireSymbol("T")
- );
- X
- # "F ::= number "&v('number')";"
- ®isterAlternate(
- X "&v('number')"
- X ,&requireSymbol("F")
- X ,&requireSymbol("number")
- );
- X
- # "number ~ "\\d+";"
- &requirePattern('number', "\\d+");
- X
- # "plus sign ~ "\\+";"
- &requirePattern('plus sign', "\\+");
- X
- # "times sign ~ "\\*";"
- &requirePattern('times sign', "\\*");
- X
- X
- &naropa(&requireSymbol('start symbol'));
- SHAR_EOF
- chmod 0644 mrtest2.pl ||
- echo 'restore of mrtest2.pl failed'
- Wc_c="`wc -c < 'mrtest2.pl'`"
- test 1101 -eq "$Wc_c" ||
- echo 'mrtest2.pl: original size 1101, current size' "$Wc_c"
- fi
- # ============= naropa.pl ==============
- if test -f 'naropa.pl' -a X"$1" != X"-c"; then
- echo 'x - skipping naropa.pl (File already exists)'
- else
- echo 'x - extracting naropa.pl (Text)'
- sed 's/^X//' << 'SHAR_EOF' > 'naropa.pl' &&
- require "tilopa.pl";
- X
- sub defaultAction
- {
- X local($lhs, @rhs) = @_;
- X local(@action);
- X push(@action, "join('','", &main'symbolName($lhs), "=',");
- X local(@rhsList) = ();
- X foreach $rhs (@rhs)
- X {
- X push(@rhsList, "'<'", "&value($rhs)", "'>'", "','");
- X }
- X pop(@rhsList); # pop extra comma
- X push(@action, join(",", @rhsList));
- X push(@action, ")");
- X &main'registerAlternate(join('', @action), $lhs, @rhs);
- }
- X
- sub naropa
- {
- X local($top) = @_;
- X local($line);
- X line: while ($line = <main'STDIN>)
- X {
- X chop $line;
- X while (length($line))
- X {
- X print "# \"$line\"\n";
- X local($before) = length($line);
- X $line = &main'lex($line);
- X local($after) = length($line);
- X local($ret);
- X if ($after == $before)
- X {
- X $ret = "Lexer failed!!!, line=\"$line\"";
- X $line = "";
- X } else
- X {
- X $ret = "Parse failed!!!"
- X unless $ret = &main'parse($top);
- X }
- X print "$ret\n";
- X &main'clearParse();
- X }
- X }
- }
- X
- 1;
- SHAR_EOF
- chmod 0644 naropa.pl ||
- echo 'restore of naropa.pl failed'
- Wc_c="`wc -c < 'naropa.pl'`"
- test 966 -eq "$Wc_c" ||
- echo 'naropa.pl: original size 966, current size' "$Wc_c"
- fi
- # ============= test.pl ==============
- if test -f 'test.pl' -a X"$1" != X"-c"; then
- echo 'x - skipping test.pl (File already exists)'
- else
- echo 'x - extracting test.pl (Text)'
- sed 's/^X//' << 'SHAR_EOF' > 'test.pl' &&
- require "marpa.pl";
- X
- $SID_E = ®isterSymbol("E");
- $SID_Plus = ®isterSymbol("+", "\+");
- $SID_T = ®isterSymbol("T");
- $SID_Times = ®isterSymbol("*", "\*");
- $SID_F = ®isterSymbol("F");
- $SID_a = ®isterSymbol("a", "\d*");
- X
- ®isterAlternate("'E(' . &value($SID_T) . ',' . " .
- X " &value($SID_Plus) . ',' . &value($SID_E) . ')'",
- X $SID_E, $SID_T, $SID_Plus, $SID_E);
- ®isterAlternate("'E(' . &value($SID_T) . ')'",
- X $SID_E, $SID_T);
- ®isterAlternate("'T(' . &value($SID_F) . ',' . " .
- X " &value($SID_Times) . ',' &value($SID_T) . ')'",
- X $SID_T, $SID_F, $SID_Times, $SID_T);
- ®isterAlternate("'T(' . &value($SID_F) . ')'",
- X $SID_T, $SID_F);
- ®isterAlternate("'F(' . &value($SID_a) . ')'",
- X $SID_F, $SID_a);
- X
- ®isterToken("a", $SID_a);
- ®isterToken("+", $SID_Plus);
- ®isterToken("a", $SID_a);
- X
- print &parse($SID_E);
- print "\n";
- SHAR_EOF
- chmod 0644 test.pl ||
- echo 'restore of test.pl failed'
- Wc_c="`wc -c < 'test.pl'`"
- test 865 -eq "$Wc_c" ||
- echo 'test.pl: original size 865, current size' "$Wc_c"
- fi
- # ============= test2.pl ==============
- if test -f 'test2.pl' -a X"$1" != X"-c"; then
- echo 'x - skipping test2.pl (File already exists)'
- else
- echo 'x - extracting test2.pl (Text)'
- sed 's/^X//' << 'SHAR_EOF' > 'test2.pl' &&
- require "marpa.pl";
- X
- $SID_E = ®isterSymbol("E");
- $SID_Plus = ®isterSymbol("+");
- $SID_T = ®isterSymbol("T");
- $SID_Times = ®isterSymbol("*");
- $SID_F = ®isterSymbol("F");
- $SID_a = ®isterSymbol("a");
- X
- ®isterAlternate("&value($SID_T)+&value($SID_E)",
- X $SID_E, $SID_T, $SID_Plus, $SID_E);
- ®isterAlternate("&value($SID_T)",
- X $SID_E, $SID_T);
- ®isterAlternate("&value($SID_F)*&value($SID_T)",
- X $SID_T, $SID_F, $SID_Times, $SID_T);
- ®isterAlternate("&value($SID_F)",
- X $SID_T, $SID_F);
- ®isterAlternate("&value($SID_a)",
- X $SID_F, $SID_a);
- X
- ®isterToken("5", $SID_a);
- ®isterToken("+", $SID_Plus);
- ®isterToken("6", $SID_a);
- X
- print &parse($SID_E);
- print "\n";
- SHAR_EOF
- chmod 0644 test2.pl ||
- echo 'restore of test2.pl failed'
- Wc_c="`wc -c < 'test2.pl'`"
- test 700 -eq "$Wc_c" ||
- echo 'test2.pl: original size 700, current size' "$Wc_c"
- fi
- # ============= tilopa.pl ==============
- if test -f 'tilopa.pl' -a X"$1" != X"-c"; then
- echo 'x - skipping tilopa.pl (File already exists)'
- else
- echo 'x - extracting tilopa.pl (Text)'
- sed 's/^X//' << 'SHAR_EOF' > 'tilopa.pl' &&
- require "marpa.pl";
- X
- package Tilopa;
- X
- $debug = 0;
- X
- sub main'lex
- {
- X local($input) = @_;
- X
- X local($key);
- X inputpass: while (length($input))
- X {
- X study $input;
- X pattern: foreach $key (keys %Marpa'symbolPattern)
- X {
- X local($grammar, $sid) = split($;, $key);
- X next pattern unless $grammar == $Marpa'currentGrammar;
- X local($pattern) = $Marpa'symbolPattern{$key};
- X print "Trying pattern \"$pattern\"\n" if $debug;
- X next pattern unless $input =~ /^$pattern/;
- X $input = $';
- X next inputpass if $Marpa'discard{$grammar, $sid};
- X last inputpass if $Marpa'terminator{$grammar, $sid};
- X &main'registerToken($&, $sid);
- X print "Registering $& as $Marpa'symbolName{$grammar, $sid}\n"
- X if $debug;
- X next inputpass;
- X }
- X last inputpass;
- X }
- X return $input;
- }
- X
- 1;
- package main;
- SHAR_EOF
- chmod 0644 tilopa.pl ||
- echo 'restore of tilopa.pl failed'
- Wc_c="`wc -c < 'tilopa.pl'`"
- test 813 -eq "$Wc_c" ||
- echo 'tilopa.pl: original size 813, current size' "$Wc_c"
- fi
- # ============= test.mr ==============
- if test -f 'test.mr' -a X"$1" != X"-c"; then
- echo 'x - skipping test.mr (File already exists)'
- else
- echo 'x - extracting test.mr (Text)'
- sed 's/^X//' << 'SHAR_EOF' > 'test.mr' &&
- start symbol ::= E;
- E ::= T;
- E ::= T . plus sign . E;
- T ::= F ;
- T ::= F . times sign . T;
- F ::= number;
- number ~ "\\d+";
- plus sign ~ "\\+";
- times sign ~ "\\*";
- SHAR_EOF
- chmod 0644 test.mr ||
- echo 'restore of test.mr failed'
- Wc_c="`wc -c < 'test.mr'`"
- test 160 -eq "$Wc_c" ||
- echo 'test.mr: original size 160, current size' "$Wc_c"
- fi
- # ============= test2.mr ==============
- if test -f 'test2.mr' -a X"$1" != X"-c"; then
- echo 'x - skipping test2.mr (File already exists)'
- else
- echo 'x - extracting test2.mr (Text)'
- sed 's/^X//' << 'SHAR_EOF' > 'test2.mr' &&
- start symbol ::= E "&v('E')";
- E ::= T "&v('T')";
- E ::= T . plus sign . E "&v('T')+&v('E')";
- T ::= F "&v('F')";
- T ::= F . times sign . T "&v('F')*&v('T')";
- F ::= number "&v('number')";
- number ~ "\\d+";
- plus sign ~ "\\+";
- times sign ~ "\\*";
- SHAR_EOF
- chmod 0644 test2.mr ||
- echo 'restore of test2.mr failed'
- Wc_c="`wc -c < 'test2.mr'`"
- test 244 -eq "$Wc_c" ||
- echo 'test2.mr: original size 244, current size' "$Wc_c"
- fi
- exit 0
- --
- Jeffrey Kegler, Independent UNIX Consultant, Algorists, Inc.
- jeffrey@algor2.ALGORISTS.COM or uunet!algor2!jeffrey
- 137 E Fremont AVE #122, Sunnyvale CA 94087
- "Nitwit ideas are for emergencies. You use them when you've got
- nothing else to try. If they work, they go in the Book. Otherwise
- you follow the Book, which is largely a collection of nitwit ideas
- that worked." from the _Mote in God's Eye_ by Larry Niven and Jerry
- Pournelle
-