Text File | 1990-01-23 | 85.2 KB | 2,577 lines |
- % Copyright 1989 by Norman Ramsey, Odyssey Research Associates
- % To be used for research purposes only
- % For more information, see file COPYRIGHT in the parent directory
-
- \message{OK, entering \string\batchmode...}
- \batchmode
-
- \let\RA\rightarrow
-
- \def\vert{{\tt\char'174}}
- \def\pb{$\.|\ldots\.|$} % C brackets (|...|)
-
- \def\title{SPIDER}
-
- \def\topofcontents{\null\vfill
- \titlefalse % include headline on the contents page
- \def\rheader{\hfil}
- \centerline{\titlefont The {\ttitlefont SPIDER} processor}
- \vfill}
-
-
- \def\syntax##1{\leavevmode\hbox{$\langle\hbox{\sl ##1\/}\rangle$}}
- \def\produces{\leavevmode\hbox{${}::={}$}}
- \def\opt##1{$[$##1$]$}
-
- #*={\tt SPIDER} proper.
- #*Introduction.
- This is an AWK program designed to read a description of a programming
- language and to write out the language-dependent parts of WEB.
- In the main,
- the description of a programming language is
- a list of all the tokens of the language
- (together with various facts about them) and a grammar for prettyprinting
- code fragments written in that language.
- The ``Spider User's Guide'' describes how to use {\tt SPIDER} to construct
- a {\tt WEB} system for the ALGOL-like language of your choice.
- ({\tt SPIDER} may be able to handle LISP and Miranda and other strange
- languages; the experiment hasn't been tried.
- The unusual lexical requirements of FORTRAN are probably beyond it, at
- least until the lexical analysis is modernized.)
-
- # The outline of the program is fairly straightforward.
- We use |exitcode| throughout to monitor error status.
- If we were more Knuthlike, we would have a |history| variable with values of
- |spotless|, and so on.
- This will have to wait until we get macros back into \.{TANGLE}.
-
- We put the pattern-action statement for productions last, because in
- case of a conflict like \.{token~-->~...}, we want the interpretation
- as {\tt token} to win out over
- the interpretation as a production.
-
- #u#1
- BEGIN {
- #<Set initial values#>
- exitcode=0
- }
- #@
- #<Ignore comments and blank lines#>
- #<Pattern-action statements#>
- #<Production pattern-action statement#>
- #<Default action for patterns we don't recognize#>
- #@
- END {
- #<Write out all of the WEB code#>
- print "Writing out lists" > logfile
- #<Write lists of everything#>
- #<Write statistics for this \.{SPIDER}#>
- #<Check for errors at the very end#>
- if (exitcode != 0) {
- exit exitcode
- }
- }
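The BEGIN/pattern-action/END shape of the outline can be seen in miniature with a standalone awk run (the input lines here are invented for illustration):

```shell
# Miniature of the outline above: BEGIN initializes, pattern-action
# statements run once per input line, and END reports the accumulated status.
printf 'token ==\njunk\n' | awk '
BEGIN { exitcode = 0 }
/^token / { print "recognized:", $2; next }
          { print "unrecognized:", $0; exitcode = -1 }
END { print "final exitcode:", exitcode }'
```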
-
- # There are a couple of actions we may want to perform with just
- about any command.
- If a command fails, we move on to the next, but we remember the fault
- so we can complain at the end.
- #<Punt this command#>=
- exitcode=-1
- next
-
-
- # Throughout \.{SPIDER} we always use the variable |i| to step through the
- fields of a command, so that |$i| is always the next field of interest.
- When we think we have finished a command,
- we will always want to check to make sure there are no unexamined
- fields left over.
- #<Check that we used everything#>=
- if (i<=NF) {
- print "Error: leftover fields", $i, "... on line", NR
- #<Punt...#>
- }
-
-
- # To \.{SPIDER}, any line beginning with |"## "| is a comment.
- \.{SPIDER} also ignores blank lines.
- #<Ignore comments...#>=
- #=/^##|^ *$/#> {
- ## comments, blank lines
- print $0 > logfile
- next
- }
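Since `##` in the WEB source stands for a literal `#`, the pattern above is really `/^#|^ *$/`; a self-contained sketch (sample input invented here) shows the filter in action:

```shell
# Lines beginning with # and blank lines are skipped; anything else falls
# through to the later pattern-action statements.
printf '# a comment\n\ndefault mathness maybe\n' |
awk '/^#|^ *$/ { next }
     { print "kept:", $0 }'
```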
-
- # But if \.{SPIDER} encounters a line it doesn't recognize, it complains.
- #<Default act...#>=
- {
- print "Warning: I don't know what to do with this line:"
- print " ", $0
- print "Warning: I don't know what to do with this line:" > logfile
- print " ", $0 > logfile
- }
-
-
- #*1Files written by {\tt SPIDER}.
- {\tt SPIDER} writes output to a number of files.
- Because 4.3~BSD AWK is limited in the number of files it can write at
- one time, there is substantial overlap.
- Here is a table:
- \noindent\halign{\vrule height10pt depth3.5pt width0pt
- \it##\hfil\tabskip=1em&\tt##\hfil&\tabskip=0pt
- \hsize=4in\vtop{\noindent##\strut\par}\cr
- \noalign{\medskip}
- \bf Internal Name&\bf External Name&\omit\bf Description\hfil\cr
- \noalign{\smallskip}
- categoryfile&names.unsorted&
- names of categories, to be checked for duplicates by {\tt nodups.awk}
- \cr
- cycles&cycle.test&
- potential cycles, to be checked by {\tt cycle.web}
- \cr
- grammarfile&grammar.web&
- grammar; included in {\tt weave.web}
- \cr
- ilkfile&names.unsorted&
- names of ilks, to be checked for duplicates by {\tt nodups.awk}
- \cr
- logfile&spider.slog&
- log file, to be consulted when things go wrong
- \cr
- macrofile&*web.tex&
- language specific macro file, {\tt\string\input} by all \TeX{}
- files created by {\tt weave.web}
- \cr
- productions&productions.list&
- list of the productions (numbered) used in debugging \.{WEAVE}
- \cr
- reserved&scraps.web&
- code for converting the reserved word to scraps.
- {\tt scraps.web} is included by {\tt weave.web}
- \cr
- scrapfile&scraps.web&
- code for converting tokens to scraps.
- {\tt scraps.web} is included by {\tt weave.web}
- \cr
- tlang&outtoks.web&
- Information about what language we're webbing.
- {\tt outtoks.web} is included by {\tt tangle.web}.
- \cr
- tokennamefile&names.unsorted&
- list of names of all the tokens, to be checked by {\tt nodups.awk}
- \cr
- translationfile&trans\_keys.unsorted&
- list of names of all the translation keywords.
- Checked for duplicates by {\tt nodups.awk}, and also for recognizability
- by {\tt transcheck.awk}.
- \cr
- ttokfile&outtoks.web&
- This is the tokenization code for {\tt TANGLE}.
- \cr
- wlang&scraps.web&
- Information about what language we're webbing.
- {\tt scraps.web} is included by {\tt weave.web}.
- \cr
- }
-
- # Every action writes information to a log file.
- This log file can be used to check up on what happened.
- #<Set initial...#>=
- logfile = "spider.slog"
-
- # Here we write the names of the key words used in translations.
- #<Set initi...#>=
- translationfile = "trans_keys.unsorted"
-
- # We write tokens out to two files: |scrapfile| for \.{WEAVE}, and
- |ttokfile| for \.{TANGLE}.
- #<Set init...#>=
- scrapfile = "scraps.web"
- print "@*Scrap code generated by {\\tt SPIDER}." > scrapfile
- ttokfile = "outtoks.web"
- print "@*Token code generated by {\\tt SPIDER}." > scrapfile
- # The reserved word stuff gets a file of its own, or it would in an ideal
- world.
- #<Set init...#>=
- reserved = "scraps.web" ## use same file; not enough files
-
- # We'll also end up writing a list of token names, for name checking
- purposes.
- #<Set initial...#>=
- tokennamefile = "names.unsorted" ## cut down on number of output files
- # We also write out every ilk, so we'll be able to look for name
- clashes with translations and so on.
- #<Set init...#>=
- ilkfile = "names.unsorted" ## cut down on number of output files
- # We also write all the category names to a separate file, so we can
- check for duplicates later.
- #<Set init...#>=
- categoryfile = "names.unsorted" ## cut down on number of output files
- # We use a special file to write grammar information:
- #<Set init...#>=
- grammarfile = "grammar.web"
- print "@*Grammar code generated by {\\tt SPIDER}." > grammarfile
- # We use the language information to write banners and macro information.
- We combine this with other stuff because AWK can't handle more than
- 10 output files.
- #<Set initial...#>=
- tlang = "outtoks.web" ## same as ttokfile
- wlang = "scraps.web" ## same as scrapfile
-
- # We will write a list of the successfully parsed productions to a
- separate file.
- #<Set init...#>=
- productions = "productions.list"
-
- # These productions will get fed to {\tt cycle.awk}, which looks for cycles.
- #<Set initial...#>=
- cycles = "cycle.test"
-
-
-
- #*Processing translations.
- Translations tell \.{WEAVE} or \.{TANGLE} what to write out in
- particular circumstances (e.g.~after scanning a particular token, or
- when firing some production).
- They are described at some length in the ``\.{SPIDER} User's Guide.''
- Translations are enclosed in angle brackets and separated by dashes.
- They can contain key words, digits, the self marker~`{\tt*}',
- or quoted strings.
- Since we can't put a space or dash into strings, we allow the use of
- key words |space| and |dash| to stand for those symbols.
- #^space#>
- #^dash#>
-
- Other key words are interpreted by \.{WEAVE} as prettyprinting instructions:
-
- \yskip\hang |break_space| denotes an optional line break or an en space;
-
- \yskip\hang |force| denotes a line break;
-
- \yskip\hang |big_force| denotes a line break with additional vertical space;
-
- \yskip\hang |opt| denotes an optional line break (with the continuation
- line indented two ems with respect to the normal starting position)---this
- code is followed by an integer |n|, and the break will occur with penalty
- $10n$;
-
- \yskip\hang |backup| denotes a backspace of one em;
-
- \yskip\hang |cancel| obliterates any |break_space| or |force| or |big_force|
- tokens that immediately precede or follow it and also cancels any
- |backup| tokens that follow it;
-
- \yskip\hang |indent| causes future lines to be indented one more em;
-
- \yskip\hang |outdent| causes future lines to be indented one less em.
-
- \yskip\hang |math_rel|, |math_bin|, and |math_op| will be translated into
- \.{\\mathrel\{}, \.{\\mathbin\{}, and \.{\\mathop\{}, respectively.
-
-
- \yskip\noindent All of these tokens are removed from the \TeX\ output that
- comes from programming language text between \pb\ signs; |break_space|
- and |force| and
- |big_force| become single spaces in this mode.
- %The translation of other
- %program texts results in \TeX\
- %control sequences \.{\\1}, \.{\\2},
- %\.{\\3}, \.{\\4}, \.{\\5}, \.{\\6},
- %\.{\\7} corresponding respectively to
- %|indent|, |outdent|, |opt|,
- %|backup|, |break_space|, |force|, and
- %|big_force|. However,
- A sequence of consecutive `\.\ ', |break_space|,
- |force|, and/or |big_force| tokens is first replaced by a single token
- (the maximum of the given ones).
-
- %Some Other control sequences in the \TeX\ output will be
- %`\.{\\\\\{}$\,\ldots\,$\.\}'
- %surrounding identifiers, `\.{\\\&\{}$\,\ldots\,$\.\}' surrounding
- %reserved words, `\.{\\.\{}$\,\ldots\,$\.\}' surrounding strings,
- %`\.{\\C\{}$\,\ldots\,$\.\}$\,$|force|' surrounding comments, and
- %`\.{\\X$n$:}$\,\ldots\,$\.{\\X}' surrounding module names, where
- %|n| is the module number.
-
- # We write out the names of all the key words used in translations,
- so we can check that
- \.{WEAVE} can be expected to recognize them.
- This helps us catch the problem early if a translation given is
- not one of the above
- (as opposed to, say, having the C~compiler fail to compile \.{WEAVE}).
- #<Write lists...#>=
- for (t in translation_keywords) {
- print t > translationfile
- }
-
- # #<Write stat...#>=
- for (t in translation_keywords) {
- num_of_translation_keywords++
- }
- printf "You used %d translation keywords.\n", \
- num_of_translation_keywords > logfile
- printf "You used %d translation keywords.\n", num_of_translation_keywords
-
- # If the macro facility worked right,
- we would use the following patterns to recognize items as they occur:
- #d cat_pattern = #=/[a-zA-Z][a-zA-Z_]*/#>
- #d trans_pattern = #=/<(([0-9]|[a-zA-Z][a-zA-Z_]*|"([^"]*\\")*[^"]*"|\*)-)*#>#&
- #=([0-9]|[a-zA-Z][a-zA-Z_]*|"([^"]*\\")*[^"]*"|\*)>/#>
-
- # Here's where we swallow a translation and spit out the \.{WEAVE} code
- to handle that translation.
- Since AWK has no functions, we define this as a module.
-
- When we're appending a key word {\it in the process of creating a
- scrap from a token}, we use |small_app| in preference to |app|,
- because |app|'s cleverness about mathness and dollar signs only works when
- reducing existing scraps, not when creating scraps from tokens.
- We'll expect the variable |append_keyword| to be set to either
- |"small_app"| or |"app"|.
-
-
- #<Take translation from |transstring| and write corresponding \.{WEAVE} code to
- |outstring|, using |selfstring| as translation of |"<*>"|#>=
- temp = substr(transstring,2,length(transstring)-2) ## kills awk bug
- trcnt = split(temp,trtok,"-")
- outstring = ""
- for (tridx=1;tridx<=trcnt;tridx++) {
- alternate=trtok[tridx]
- #<Convert |"space"| and |"dash"|#>
- if (alternate ~ #=/^[0-9]$/#>) { ## digit
- temp = sprintf("\tapp_str(\"%s\");\n",alternate)
- outstring=outstring temp
- } else if (alternate ~ #=/^[a-zA-Z_]+$/#>) { ## key word
- translation_keywords[alternate]=1 ## remember
- temp = sprintf("\t%s(%s);\n",append_keyword,alternate)
- ##Call |app| or |small_app| depending whether we're reducing or creating scraps
- outstring=outstring temp
- } else if (alternate ~ #=/^\"([^"]*\\\")*[^"]*\"$/#>) { ## string
- temp = sprintf("\tapp_str(%s);\n",alternate)
- outstring=outstring temp
- } else if (alternate ~ #=/^\*$/#>) { ## self marker
- #<If |selfstring==""|, complain loudly#>
- outstring=outstring selfstring
- } else {
- print "Bogus translation", wherestring
- exitcode = -1
- }
- }
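As a standalone sketch of this module's classification step (the translation here is hypothetical, and plain tags stand in for the generated C code), the split-on-dashes loop runs like this:

```shell
# Strip the angle brackets, split the translation on dashes, and classify
# each piece as digit, key word, quoted string, or self marker.
echo '<indent-"x"-*>' | awk '{
  temp = substr($0, 2, length($0)-2)        # strip < and >
  n = split(temp, tok, "-")
  for (i = 1; i <= n; i++) {
    t = tok[i]
    if      (t ~ /^[0-9]$/)              print t, "-> digit"
    else if (t ~ /^[a-zA-Z_]+$/)         print t, "-> key word"
    else if (t ~ /^"([^"]*\\")*[^"]*"$/) print t, "-> string"
    else if (t ~ /^\*$/)                 print t, "-> self marker"
    else                                 print t, "-> bogus"
  }
}'
```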
-
- # Here we convert the key words |space| and |dash| to strings.
- We quote the strings, to be sure that they are handled by the string
- mechanism.
- #<Convert |"space"|...#>=
- if (alternate=="space") {
- alternate="\" \""
- } else if (alternate=="dash") {
- alternate="\"-\""
- }
-
-
- # There are some places (notably in productions) where the translation
- |"<*>"| makes no sense.
- In this case the caller sets |selfstring=""|, and we complain.
- #<If |selfstring==""|, complain...#>=
- if (selfstring=="") {
- print "Translation \"<*>\" makes no sense", wherestring
- exitcode = -1
- }
-
- # There are times when we may want to convert a translation directly
- into a quoted string, usually for \.{TANGLE}'s benefit.
- Here, the only things allowed are quoted strings and |space| and |dash|.
- We peel off quote marks and concatenate things together, and then we
- put the quote marks back on at the end.
- #<Convert restricted translation in |transstring|
- to quoted string in |outstring|#>=
- temp = substr(transstring,2,length(transstring)-2) ## kills awk bug
- trcnt = split(temp,trtok,"-")
- outstring = ""
- for (tridx=1;tridx<=trcnt;tridx++) {
- alternate=trtok[tridx]
- #<Convert |"space"| and |"dash"|#>
- if (alternate ~ #=/^[0-9]$/#>) { ## digit
- print "Digit not allowed in restricted translation", wherestring
- exitcode = -1
- } else if (alternate ~ #=/^[a-zA-Z_]+$/#>) { ## key word
- print "Key word not allowed in restricted translation", wherestring
- exitcode = -1
- } else if (alternate ~ #=/^\"([^"]*\\\")*[^"]*\"$/#>) { ## string
- temp = substr(alternate,2,length(alternate)-2) ## strip quotes
- outstring=outstring temp
- } else if (alternate ~ #=/^\*$/#>) { ## self marker
- print "<*> not allowed in restricted translation", wherestring
- exitcode = -1
- } else {
- print "Bogus restricted translation", wherestring
- exitcode = -1
- }
- }
- outstring = "\"" outstring "\"" ## put quotes back on |outstring|
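A cut-down sketch of the restricted conversion (hypothetical input; the error branches for digits, key words, and `<*>` are omitted here) shows the strip-quotes-and-concatenate step:

```shell
# "space" and "dash" become quoted strings first; then each quoted string
# loses its quote marks and is concatenated, and the quotes go back on at
# the end.
echo '<"("-space>' | awk '{
  temp = substr($0, 2, length($0)-2)
  n = split(temp, tok, "-")
  out = ""
  for (i = 1; i <= n; i++) {
    t = tok[i]
    if (t == "space") t = "\" \""
    else if (t == "dash") t = "\"-\""
    if (t ~ /^"/) out = out substr(t, 2, length(t)-2)   # strip quotes
  }
  print "\"" out "\""
}'
```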
-
-
-
- #*Tokens.
-
- Tokens are pretty complicated.
- Each token has a string by which we recognize it in the input.
- This string is what immediately follows the |token| command.
- Then, there's another string that tells \.{TANGLE} how to write out
- the token.
- Finally, it has a category and a translation (so we can make a scrap out
- of it), and a mathness (to tell us whether it has to be in math
- mode, horizontal mode, or either).
- The
- \.{translation} and \.{mathness} have defaults.
-
- #*2Scanning for token descriptions.
- This module is used everywhere we must scan a line for token descriptions.
- #<Scan this line from |start_place| to finish, looking for
- \.{translation}$\ldots$ and putting results in
- |this_translation|$\ldots$#>=
- for (i=start_place;i<NF;) {
- if ($i=="tangleto") { ## for \.{TANGLE}
- i++
- this_tangleto=$i
- i++
- } else if ($i=="translation") { ## for \.{WEAVE}
- i++
- this_translation=$i
- i++
- } else if ($i=="mathness") { ## for \.{WEAVE}
- i++
- this_mathness=$i
- i++
- } else if ($i=="category") { ## for \.{WEAVE}
- i++
- this_category=$i
- categories[$i]=1
- i++
- } else if ($i=="name") { ## for debugging
- i++
- this_name="SP_" $i ##OK, so it's hacking...
- i++
- } else {
- print "Error: unrecognized token description", $i, "on line", NR
- #<Punt...#>
- }
- }
- #<Check that we used everything#>
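A cut-down version of the field scan (two of the five keywords, on an invented command line) illustrates how |i| walks the keyword/value pairs; the real module also checks for leftover fields afterwards:

```shell
# Starting at field 2, consume keyword/value pairs until the fields run out.
echo 'default translation <dash> mathness math' | awk '{
  for (i = 2; i < NF; ) {
    if      ($i == "translation") { i++; tr = $i; i++ }
    else if ($i == "mathness")    { i++; ma = $i; i++ }
    else { print "unrecognized:", $i; exit 1 }
  }
  print "translation=" tr, "mathness=" ma
}'
```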
-
- # We check for the presence or absence of certain empty strings after
- scanning.
- #<Make sure |this_name| is empty#>=
- if (this_name != "") {
- print "Error: name doesn't apply on line", NR
- #<Punt...#>
- }
- # #<Make sure |this_tangleto| is empty#>=
- if (this_tangleto != "") {
- print "Error: tangleto doesn't apply on line", NR
- #<Punt...#>
- }
- # #<Make sure |this_category| is empty#>=
- if (this_category != "") {
- print "Error: category doesn't apply on line", NR
- #<Punt...#>
- }
- # #<Make sure |this_translation| is empty#>=
- if (this_translation != "") {
- print "Error: translation doesn't apply on line", NR
- #<Punt...#>
- }
- # #<Make sure |this_category| is not empty#>=
- if (this_category == "") {
- print "Error: you must give a category on line", NR
- #<Punt...#>
- }
-
- #*1Setting the default token descriptions.
- \.{SPIDER} maintains default information about {\em mathness}
- and {\em translation}, so these can be omitted from token descriptions.
- We can change the operative defaults at any time by using a
- |"default"| command.
- It, too, scans for key words, using the standard scanning module.
- #<Pattern-action...#>=
- #=/^default /#> {
- print "Setting defaults..." > logfile
- start_place=2
- #<Set |this_mathness| etcetera to the defaults and those with no
- defaults to |""|#>
- #<Scan this line from |start_place| to finish, looking for
- \.{translation}$\ldots$ and putting results in
- |this_translation|$\ldots$#>
- #<Make sure |this_name| is empty#>
- #<Make sure |this_category| is empty#>
- default_translation=this_translation
- default_mathness=this_mathness
- #@ print "\tdefault translation is", default_translation > logfile
- print "\tdefault mathness is", default_mathness > logfile
- #@ next
- }
-
- # Normally, we will set all quantities to the defaults before scanning:
- #<Set |this_mathness| etcetera to the defaults and those with no
- defaults to |""|#>=
- this_translation=default_translation
- this_mathness=default_mathness
- this_name=""
- this_category=""
- this_tangleto=""
- # When \.{SPIDER} starts up, the defaults are already set:
- #<Set initi...#>=
- default_translation="<*>"
- default_mathness="maybe"
-
-
- #*1Recognizing token designators.
- Let's begin by discussing the way \.{WEAVE} and \.{TANGLE} represent
- tokens internally.
-
- \.{WEAVE} and \.{TANGLE} process tokens in a two-step process.
- Both read the token from the input using |get_next|, which returns a
- unique eight-bit number representing the token.
- Generally printable ASCII characters represent themselves, and other
- tokens get numbers in the unprintable range.
- \.{TANGLE} assigns ranges to some tokens ahead of time: |string| is 2,
- |identifier| is #'202, and so on.
- Tokens that we introduce to \.{TANGLE} must have numbers between
- #'13 and #'37 inclusive.
-
- Rather than work with eight-bit numbers themselves, we use names for
- the tokens.
- This makes \.{WEAVE} and \.{TANGLE} easier to debug when things go wrong.
-
- In \.{WEAVE}, the category, mathness, and translation are all
- attached to a scrap based on the eight-bit number returned by
- |get_next|, and this is done at a later time.
-
- In \.{TANGLE}, characters are written to the output file(s) based on
- the token code, which can be either eight bits for simple tokens or
- sixteen for identifiers and things.
-
- Our mission in this section will be to read in all the token
- information from the {\tt token} command,
- and to create the names and numbers used by \.{WEAVE} and \.{TANGLE}
- to represent the tokens.
- In the next section we will
- write the code that processes the tokens for both \.{WEAVE} and
- \.{TANGLE} (lexical analysis in
- |get_next|, and subsequent processing elsewhere).
- You will pardon us if things get a bit tedious.
-
- # The {\tt token} command is used to specify tokens that are not
- reserved words.
- Reserved word tokens get special treatment all their own.
-
- #<Pattern...#>=
- #=/^token /#> {
- print "Token", $2 > logfile
- if ($2=="identifier") {
- #<Process identifier token description#>
- } else if ($2=="number") {
- #<Process numeric token description#>
- } else if ($2=="newline") {
- #<Process newline token description#>
- } else if ($2=="pseudo_semi") {
- #<Process |pseudo_semi| token description#>
- } else if ($2 ~ #=/[a-zA-Z0-9]+/#>) {
- ## we recognize no other names
- print "Error: unknown token species:", $2
- #<Punt this command#>
- } else {
- #<Process a non-alphanumeric token description#>
- }
- categories[this_category]=1 ## is this right?
- #^questions#>
- next
- }
-
- # Identifiers, numbers (and string literals), newlines, and the special
- token \.{pseudo\_semi} are predefined.
- #<Process identifier token description#>=
- #<Set |this_mathness| etcetera to the defaults and those with no
- defaults to |""| #>
- this_translation=""
- start_place=3
- #<Scan this line from |start_place| to finish, looking for
- \.{translation}$\ldots$ and putting results in
- |this_translation|$\ldots$#>
- #<Make sure |this_name| is empty#>
- #<Make sure |this_tangleto| is empty#>
- #<Make sure |this_category| is not empty#>
- #<Make sure |this_translation| is empty#>
- id_category=this_category
- id_mathness=this_mathness
-
- # We have yet to implement a separate procedure for numerics and strings!
- #<Process numeric token description#>=
- print "Warning: numeric constants and strings are",\
- "identified in this WEAVE."
- print "Warning: numeric constants and strings are",\
- "identified in this WEAVE." > logfile
- #<Set |this_mathness| etcetera to the defaults and those with no
- defaults to |""| #>
- this_translation=""
- start_place=3
- #<Scan this line from |start_place| to finish, looking for
- \.{translation}$\ldots$ and putting results in
- |this_translation|$\ldots$#>
- #<Make sure |this_name| is empty#>
- #<Make sure |this_tangleto| is empty#>
- #<Make sure |this_category| is not empty#>
- #<Make sure |this_translation| is empty#>
- number_category=this_category
- number_mathness=this_mathness
-
- #
- #<Process newline token description#>=
- #<Set |this_mathness| etcetera to the defaults and those with no
- defaults to |""| #>
- start_place=3
- #<Scan this line from |start_place| to finish, looking for
- \.{translation}$\ldots$ and putting results in
- |this_translation|$\ldots$#>
- #<Make sure |this_name| is empty#>
- #<Make sure |this_tangleto| is empty#>
- #<Make sure |this_category| is not empty#>
- newline_category=this_category
- newline_mathness=this_mathness
- newline_translation=this_translation
-
- #
- #<Process |pseudo_semi| token description#>=
- #<Set |this_mathness| etcetera to the defaults and those with no
- defaults to |""| #>
- start_place=3
- #<Scan this line from |start_place| to finish, looking for
- \.{translation}$\ldots$ and putting results in
- |this_translation|$\ldots$#>
- #<Make sure |this_name| is empty#>
- #<Make sure |this_tangleto| is empty#>
- #<Make sure |this_category| is not empty#>
- pseudo_semi_category=this_category
- pseudo_semi_mathness=this_mathness
- pseudo_semi_translation=this_translation
-
- # Here is where things get a bit more interesting; we have to
- consider all the other (non-reserved-word) tokens, and find a way to
- convert them to \.{WEAVE} and \.{TANGLE}'s internal form.
- We take single characters straight, except for those that must be
- escaped one way or another.
- For multicharacter tokens, we have to invent a name and a number,
- which process we will describe below.
-
-
- Tokens have a zillion attributes: not just category, translation, and
- their friends, but things like internal representations, the length of
- the input string, you name it.
-
- We remember the length of the longest token in the system,
- because when we go to
- recognize tokens we will look for the longest first and then on down.
- We maintain that length at the very end here.
- #<Process a non-alphanumeric token description#>=
- this_string=$2
- #<Translate |"{space}"| to |" "| in |this_string|#>
- $2 = this_string
- #<Set |tokenname[$2]|, |tokenlength[$2]|, and, for long
- tokens, set |tokentest[$2]| and |tokennumbers[$2]|#>
- if (tokens[$2]!="") {
- print "Warning: token", $2, "defined twice"
- }
- tokens[$2]=1 ## remember this token
- #<Set attributes of token |$2|#>
- #<Make sure token |$2| has a number if it needs one#>
- #<Update record of maximum token length#>
-
-
- # This code represents an undocumented feature.
- We should replace it by allowing restricted translations
- in |$2|, and then document it.
- When doing this, we'll have to match the full |trans_pattern|
- in all its glory; a mere |#=/<.*>/#>| won't do.
-
- #<Translate |"{space}"| to |" "| in |this_string|#>=
- old_string = this_string
- this_string = ""
- ## Invariant: |this_string old_string| corresponds to result, and
- ## |"{space}"| is translated in |this_string| but not |old_string|
- idx = index(old_string,"{space}")
- while (idx != 0) {
- temp =substr(old_string,1,idx-1)
- this_string = this_string temp " "
- old_string = substr(old_string,idx+7)
- idx = index(old_string,"{space}")
- }
- this_string = this_string old_string
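Run standalone on an invented string, the index/substr loop behaves like this (|"{space}"| is seven characters long, hence the |idx+7|):

```shell
# Each {space} is replaced by a blank; the invariant is that new old always
# corresponds to the original string, with {space} translated only in new.
echo 'a{space}b{space}c' | awk '{
  old = $0; new = ""
  idx = index(old, "{space}")
  while (idx != 0) {
    new = new substr(old, 1, idx-1) " "
    old = substr(old, idx+7)
    idx = index(old, "{space}")
  }
  print new old
}'
```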
-
- # Tokens need an internal eight-bit representation.
- For single characters (which are assumed to be printable), we use
- the ASCII code as the internal representation.
- Multicharacter tokens will be assigned a name and a number.
- (The names may be specified by the user or generated by \.{SPIDER}.)
- Unfortunately the numbers for \.{WEAVE} and \.{TANGLE} have to be
- different (the reasons will only depress you).
- We assign \.{WEAVE} numbers by starting numbering from |highesttoken|, and
- working our way down.
- At the moment |highesttoken==200|, and I can't remember whether 200 is
- a ``magic number'' or not, so you'd better assume that it is.
- We get the token numbers for \.{TANGLE} by subtracting an offset,
- as you'll see later.
- #<Set initial...#>=
- highesttoken=200 ## highest numbered token
- tokennumber=highesttoken
-
- # At the end we check to make sure we haven't used up too many numbers
- for tokens.
- \.{WEAVE} token numbers must be |>=127|.
- #<Check for errors at the...#>=
- if (tokennumber<127) {
- print "Error: too many token names for WEAVE --- over by",\
- 127-tokennumber
- exitcode=-1
- }
- # \.{TANGLE} tokens must be between #'13 and #'37 inclusive.
- We add three to the number because \.{TANGLE} has special definitions for
- the three tokens taken off the top.
- #<Check for errors...#>=
- if (highesttoken-tokennumber > #'37-(#'13-1)+3) { \
- ## number of tokens in |#'13|--|#'37|, plus 3
- print "Error: too many token names for TANGLE --- over by",\
- highesttoken-tokennumber - (#'37-(#'13-1)+3)
- exitcode=-1
- }
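Worked out numerically (octal |#'13| is 11 and |#'37| is 31), the two checks bound the number of generated token codes as follows:

```shell
# tokennumber counts down from 200 and may not fall below 127, so WEAVE
# accommodates 200-127 = 73 generated tokens; TANGLE gets the 21 codes in
# octal '13..'37 plus the 3 special definitions taken off the top.
weave_max=$((200 - 127))
tangle_max=$((31 - (11 - 1) + 3))
echo "WEAVE allows $weave_max generated tokens, TANGLE allows $tangle_max"
```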
-
-
-
- # The token name is what \.{WEAVE} and \.{TANGLE} will use internally
- to refer to the token's internal representation as an eight-bit code.
- We use names instead of using the numbers directly in the vague hope that
- it will make \.{WEAVE} and \.{TANGLE} easier to debug when something goes
- wrong.
- For multi-character tokens, the name will be a \.{WEB} macro that is defined
- to be equal to the token's eight-bit code.
- If the token is a single character, its ``name'' will be that character,
- quoted with single quotes.
- The single-character tokens \.{@}, \.{\\}, and \.{'} require special
- handling, since they have to be escaped in some way to be quoted.
-
- Once we've computed the name, we put it in |tokenname[$2]|.
- #<Set |tokenname[$2]|, |tokenlength[$2]|, and, for long
- tokens, set |tokentest[$2]| and |tokennumbers[$2]|#>=
- if ($2=="@") {
- $2="@@"
- tokenname[$2]="'@@'"
- tokenlength[$2]=1
- } else if ($2=="'" || $2 == "\\") {
- $2="\\" $2
- tokenname[$2]="'" $2 "'"
- tokenlength[$2]=1
- } else if (length($2)>1) {
- #<Handle multicharacter tokens#>
- } else {
- temp = sprintf("'%s'", $2)
- tokenname[$2] = temp
- tokenlength[$2]=1
- }
-
- # For the long tokens, we generate a name by which we'll refer to the
- token.
- That name will actually be defined to be a number, which we'll take to
- be the current value of |tokennumber|.
- We'll write in |tokentest[$2]| the C~code used to recognize that token,
- and in |tokenlength[$2]| we'll leave that token's length.
- (The length is used both to find long tokens before short ones, and
- to avoid finding long ``tokens'' that
- actually go beyond the end of the line.)
- #<Handle multicharacter tokens#>=
- tokenname[$2]="SP_gen_token_" tokennumber
- tokennumbers[$2]=tokennumber
- tokennumber--
- ## figure out how to recognize the token
- temp = sprintf( "strncmp(\"%s\",loc-1,%d)==0", $2, length($2))
- tokentest[$2]=temp
- tokenlength[$2]=length($2)
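For a hypothetical two-character token \.{==} assigned number 200, the name and recognition test stored in |tokenname| and |tokentest| come out as:

```shell
# The generated name is SP_gen_token_<number>; the test is a strncmp
# against the input buffer at loc-1, for the token's full length.
echo '==' | awk '{
  tokennumber = 200
  name = "SP_gen_token_" tokennumber
  test = sprintf("strncmp(\"%s\",loc-1,%d)==0", $0, length($0))
  print name
  print test
}'
```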
-
-
- # The setting of attributes is as for all tokens:
- #<Set attributes of token |$2|#>=
- #<Set |this_mathness| etcetera to the defaults and those with no
- defaults to |""| #>
- this_name=tokenname[$2]
- start_place=3
- #<Scan this line from |start_place| to finish, looking for
- \.{translation}$\ldots$ and putting results in
- |this_translation|$\ldots$#>
- #<Make sure |this_category| is not empty#>
- tokencategory[$2]=this_category
- tokenmathness[$2]=this_mathness
- tokentranslation[$2]=this_translation
- tokenname[$2]=this_name
- tokentangleto[$2]=this_tangleto
-
- # We have to remember the length of the longest token so we can
- recognize long tokens before short ones.
- #<Update record of maximum token length#>=
- temp = tokenlength[$2]
- if (temp > maxtokenlength) {
- maxtokenlength=temp
- }
-
- # We're paranoid.
- #<Make sure token |$2| has a number if it needs one#>=
- if (tokenlength[$2]>1 && tokennumbers[$2]=="") {
- print "This can't happen: token", $2, "is long", \
- "but has no number"
- exitcode = -1
- }
-
-
- #*1Writing {\tt WEB}'s lexical analysis code.
- The token recognition problem is the same for \.{WEAVE} and
- \.{TANGLE}.
- Both have routines called |get_next| that recognize the tokens on
- input.
- Most of |get_next| is prefabricated
- (and the same in both \.{WEAVE} and \.{TANGLE}),
- but we have to put in the part that recognizes multi-character
- non-alphanumeric tokens.
-
- We write the same code to both \.{WEAVE} and \.{TANGLE}.
- #<Write out...#>=
- tempfile = scrapfile
- #<Write token recognition code to |tempfile|#>
- tempfile = ttokfile
- #<Write token recognition code to |tempfile|#>
-
- # This is how we do it.
- #<Write token recognition code to |tempfile|#>=
- print "@ Here we input tokens of more than one character" > tempfile
- print "@<Compress two-symbol operator@>=" > tempfile
- #<Look for multicharacter tokens, starting with the longest,
- and working down#>
-
- # We look for long tokens, then shorter, and so on.
- We have to make sure we don't look beyond the end of a line.
- #<Look for multicharacter tokens, starting with the longest,
- and working down#>=
- for (len=maxtokenlength; len>=2; len--) {
- printf "if (loc+%d<=limit) {\n", len-1 > tempfile
- #<Check for tokens in |tokentest| of length |len|#>
- printf "\t}\n" > tempfile
- }
- #<Make sure there are no tokens of length 1 in |tokentest|#>
-
- # #<Check for tokens in |tokentest| of length |len|#>=
- notfirst=0
- for (t in tokentest) {
- if (tokenlength[t]==len) {
- printf "\t" > tempfile
- if (notfirst==1) {
- printf "else " > tempfile
- }
- notfirst=1
- printf "if (%s) {\n", tokentest[t] > tempfile
- printf "\t\tloc += %d;\n", len-1 > tempfile
- printf "\t\treturn %s;\n\t\t}\n", tokenname[t] > tempfile
- }
- }
-
-
- # #<Make sure there are no tokens of length 1 in |tokentest|#>=
- for (t in tokentest) {
- if (tokenlength[t]==1) {
- print "This can't happen: token", t, "is of length 1 but", \
- "it has a test"
- exitcode=-1
- }
- }
-
-
-
- #*1Writing out {\tt WEAVE}'s token-to-scrap code.
- Here is where we write the code that converts an already-recognized
- token (from |get_next|) into a scrap.
- There are several different kinds of tokens, and each requires a
- slightly different treatment.
- We will write out the code for the different species one at a time.
- #<Write out all...#>=
- print "Writing out predefined scraps" > logfile
- #<Write code for identifier scrap#>
- #<Write code for string or constant scrap#>
- #<Write code for newline scrap#>
- #<Write code for |pseudo_semi| scrap#>
- print "Writing out token scraps" > logfile
- #<Write code for ordinary token scraps#>
-
-
-
-
- # This is how we write out the information for the identifier.
- #<Write code for identifier scrap#>=
- if (id_category != "") {
- print "@ @<Append an identifier scrap@>=" > scrapfile
- print "p=id_lookup(id_first, id_loc,normal);" > scrapfile
- print "if (p->ilk==normal) {" > scrapfile
- print " small_app(id_flag+p-name_dir);" > scrapfile
- printf " app_scrap(SP_%s,%s_math);", \
- id_category, id_mathness > scrapfile
- appended[id_category]=1
- print " /* not a reserved word */" > scrapfile
- print "}" > scrapfile
- print "else if reserved(p) {" > scrapfile
- print "@<Decide on reserved word scraps@>;" > scrapfile
- print "}" > scrapfile
- print "else {" > scrapfile
- print " err_print(\"! Identifier with unmentioned ilk\");" > scrapfile
- print "@.Identifier with unmentioned ilk@>" > scrapfile
- print "}" > scrapfile
- } else {
- print "Error: I don't know what to do with an identifier"
- print " Please give me a \"token identifier ...\""
- exitcode = -1
- }
-
- # We hold the name |"identifier"|, and we reserve a number for
- identifiers.
- #<Set initial...#>=
- tokennumbers["identifier"]=tokennumber; tokennumber--
- tokenname["identifier"]="identifier"
-
- # This is how we write out the string or constant scrap, at the end.
- #<Write code for string or constant scrap#>=
- print "Warning: TeX strings have the same category as ", \
- "numeric constants in this WEAVE."
- print "Warning: TeX strings have the same category as ", \
- "numeric constants in this WEAVE." > logfile
- if (number_category != "") {
- print "@ For some reason strings, constants,",\
- " and \TeX\ strings are identified." > scrapfile
- print "That has to be fixed." > scrapfile
- print "@<Do the |app_scrap| for a string or constant@>=" > scrapfile
- printf "app_scrap(SP_%s,%s_math);\n", number_category,\
- number_mathness > scrapfile
- appended[number_category]=1
- } else {
- print "Error: I don't know what to do with a numeric constant"
- print " Please give me a \"token number ...\""
- exitcode = -1
- }
-
-
- # We hold names and numbers for constants and strings, as well as identifiers.
- #<Set initial...#>=
- tokennumbers["constant"]=tokennumber; tokennumber--
- tokenname["constant"]="constant"
- tokennumbers["string"]=tokennumber; tokennumber--
- tokenname["string"]="string"
-
-
- #
- #<Write code for newline scrap#>=
- if (newline_category != "") {
- print "@ @<Append a newline scrap@>=" > scrapfile
- transstring=newline_translation
- selfstring="small_app(next_control);"
- wherestring="in translation of token newline"
- append_keyword="small_app"
- #<Take translation from |transstring| and write corresponding \.{WEAVE} code to
- |outstring|, using |selfstring| as translation of |"<*>"|#>
- print outstring > scrapfile
- printf " app_scrap(SP_%s,%s_math);\n", newline_category,\
- newline_mathness > scrapfile
- appended[newline_category]=1
- } else {
- print "Error: I don't know what to do with a newline"
- print " Please give me a \"token newline ...\""
- exitcode = -1
- }
-
- #
- #<Write code for |pseudo_semi| scrap#>=
- if (pseudo_semi_category != "") {
- print "@ @<Append a |pseudo_semi| scrap@>=" > scrapfile
- transstring=pseudo_semi_translation
- selfstring="small_app(next_control);"
- wherestring="in translation of token pseudo_semi"
- append_keyword="small_app"
- #<Take translation from |transstring| and write corresponding \.{WEAVE} code to
- |outstring|, using |selfstring| as translation of |"<*>"|#>
- print outstring > scrapfile
- printf " app_scrap(SP_%s,%s_math);\n", pseudo_semi_category,\
- pseudo_semi_mathness > scrapfile
- appended[pseudo_semi_category]=1
- } else {
- printf "Error: I don't know what to do with a pseudo_semi (%s;)",\
- substr(at_sign,1,1)
- print " Please give me a \"token pseudo_semi ...\""
- exitcode = -1
- }
-
- # Here is how we write out the code that converts ordinary tokens to scraps:
- #<Write code for ordinary token scraps#>=
- print "@ @<Cases for ordinary tokens@>=" > scrapfile
- for (t in tokens) {
- temp = tokenname[t]
- printf "case %s:\n", temp > scrapfile
- transstring=tokentranslation[t]
- selfstring="small_app(next_control);"
- wherestring= sprintf ("in translation of token %s", t)
- append_keyword="small_app"
- #<Take translation from |transstring| and write corresponding \.{WEAVE} code to
- |outstring|, using |selfstring| as translation of |"<*>"|#>
- print outstring > scrapfile
- printf "\tapp_scrap(SP_%s,%s_math);\n", tokencategory[t], \
- tokenmathness[t] > scrapfile
- temp = tokencategory[t]
- appended[temp]=1
- #^append check#>
- print "\tbreak;" > scrapfile
- }
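For instance, here is a sketch, with made-up values, of one emitted case arm: a token \.{+} whose \.{WEAVE} name is |plus|, with category |binop| and mathness |yes|; |outstring| stands for the already-rendered translation code.

```shell
# Sketch with made-up values of one emitted case arm: token "+",
# WEAVE name plus, category binop, mathness yes.  outstring stands in
# for the already-rendered translation code.
out=$(awk 'BEGIN {
    tokenname = "plus"; outstring = "small_app(next_control);"
    category = "binop"; mathness = "yes"
    printf "case %s:\n", tokenname
    print outstring
    printf "\tapp_scrap(SP_%s,%s_math);\n", category, mathness
    print "\tbreak;"
}')
printf '%s\n' "$out"
```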
-
- #*3{\tt TANGLE}'s token-to-output conversion.
- We have to write special cases for things appearing in |tokennumbers|.
- The output conventions for |string|, |constant| and |identifier| are
- fixed by \.{TANGLE}.
-
- One day we have to improve \.{TANGLE}'s treatment of spacing in the output;
- at the moment it just makes sure there are spaces between adjacent identifiers
- or numbers.
- #^future enhancements#>
- #<Write out...#>=
- print "@ @<Cases for tokens to be output@>=" > ttokfile
- for (t in tokennumbers) {
- #<If |t| is |"string"|, |"constant"|, or |"identifier"|, just |continue|#>
- printf "case %s:\n", tokenname[t] > ttokfile
- this_tangleto = tokentangleto[t]
- if (this_tangleto=="") {
- printf "\tC_printf(\"%%s\",\"%s\");\n",t > ttokfile
- } else {
- printf "\tif (out_state==verbatim) {\n" > ttokfile
- printf "\t\tC_printf(\"%%s\",\"%s\");\n",t > ttokfile
- printf "\t} else {\n" > ttokfile
- #<Write code to print |this_tangleto| onto |ttokfile|#>
- printf "\t}\n" > ttokfile
- }
- print "\tif (out_state!=verbatim) out_state=misc;" > ttokfile
- print "break;" > ttokfile
- }
-
- # We also have to write something for the tokens that aren't in |tokennumbers|
- but which have a nonnull |tokentangleto| anyway.
- #<Write out...#>=
- print "@ @<Cases for tokens to be output@>=" > ttokfile
- for (t in tokentangleto) {
- #<If |t| is |"string"|, |"constant"|, or |"identifier"|, just |continue|#>
- if (tokennumbers[t]!="" || tokentangleto[t]=="")
- continue
- if (t=="@") {
- thistangletokname = "@@"
- } else if (t=="\\" || t=="'") {
- thistangletokname = "\\" t
- } else {
- thistangletokname = t
- }
- printf "case '%s':\n", thistangletokname > ttokfile
- this_tangleto = tokentangleto[t]
- if (this_tangleto=="") {
- print "This can't happen -- null tangleto for", t, wherestring
- exitcode = -1
- } else {
- printf "\tif (out_state==verbatim) {\n" > ttokfile
- printf "\t\tC_printf(\"%%s\",\"%s\");\n",t > ttokfile
- printf "\t} else {\n" > ttokfile
- #<Write code to print |this_tangleto| onto |ttokfile|#>
- printf "\t}\n" > ttokfile
- }
- print "\tif (out_state!=verbatim) out_state=misc;" > ttokfile
- print "break;" > ttokfile
- }
- # The tokens for |string|, |constant|, and |identifier| are treated
- specially by \.{TANGLE}; code to handle them already lives in \.{TANGLE.web}.
- Therefore, we don't gum up the works with our scheming.
- #<If |t| is |"string"|, |"constant"|, or |"identifier"|, just |continue|#>=
- if (t=="string"||t=="constant"||t=="identifier")
- continue
-
-
-
- # This is somewhat like the translation code, but tuned for \.{TANGLE}.
- #<Write code to print |this_tangleto| onto |ttokfile|#>=
- oldwherestring = wherestring
- wherestring = "for tangleto " wherestring
- #@
- transstring=this_tangleto
- #<Convert restricted translation in |transstring|
- to quoted string in |outstring|#>
- printf "\tC_printf(\"%%s\",%s);\n",outstring > ttokfile
- #@
- wherestring=oldwherestring
-
-
-
- #*3Defining the token names.
- At some point we'll have to define all these names, for both
- \.{TANGLE} and \.{WEAVE}. We may as well
- show how we do that now.
- #<Write out...#>=
- tempfile = scrapfile
- #<Write the definitions of the token names to |tempfile|#>
- tempfile = ttokfile
- #<Write the definitions of the token names to |tempfile|#>
-
- # We use an ugly trick to get the token numbers different for
- \.{WEAVE} and \.{TANGLE}:
- #<Write the definitions of the token names to |tempfile|#>=
- print "@ Here are the definitions of the token names" > tempfile
- for (t in tokennumbers) {
- temp = tokennumbers[t]
- if (temp==0)
- continue ## don't know why we need this!!
- if (tempfile==ttokfile) { ## output to \.{TANGLE}
- #<If |t| is |"string"|, |"constant"|, or |"identifier"|,
- just |continue|#> ## already defined in \.{TANGLE}
- temp = temp + #'37 + 3 - highesttoken ## hackety hack!
- ## +3 because three highest are already defined!
- }
- printf "@d %s = %s\n", tokenname[t], temp > tempfile
- }
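Here is a worked sketch of where the hack lands; |highesttoken=200| is made up, and octal \.{37} is decimal 31. The three highest numbers went to |identifier|, |constant|, and |string|, which the |continue| above skips for \.{TANGLE}; the sketch shows where they {\it would} land, and shows that the first ordinary token lands exactly at decimal 31.

```shell
# Worked sketch of the renumbering hack; highesttoken = 200 is made
# up, and octal 37 is decimal 31.  identifier, constant, and string
# (skipped for TANGLE by the continue above) would land at 34, 33,
# and 32; the first ordinary token lands exactly at 31.
out=$(awk 'BEGIN {
    highesttoken = 200
    n["identifier"] = 200; n["constant"] = 199; n["string"] = 198
    n["plus"] = 197                     # first ordinary token
    for (t in n)
        printf "%s -> %d\n", t, n[t] + 31 + 3 - highesttoken
}' | sort)
printf '%s\n' "$out"
```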
-
-
- # Some token names are just characters quoted with |'|. We write out
- all the others.
- #<Write lists...#>=
- for (t in tokenname) {
- temp = tokenname[t]
- if (substr(temp,1,1) != "'") {
- #<Strip opening |"SP_"| from |temp|, if it is there#>
- print temp > tokennamefile
- }
- }
-
- # #<Strip opening |"SP_"| from |temp|, if it is there#>=
- tempa=substr(temp,1,3)
- if (tempa=="SP_") {
- temp = substr(temp,4) ## remove |"SP_"|
- }
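The strip is easy to check in isolation: |"SP_math"| loses its first three characters and becomes the bare category name.

```shell
# Quick sketch of the prefix strip: "SP_math" loses its first three
# characters and becomes the bare category name.
temp=$(echo "SP_math" | awk '{
    if (substr($1, 1, 3) == "SP_")
        $1 = substr($1, 4)
    print $1
}')
echo "$temp"
```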
-
-
- #*Reserved words and ilks.
- \.{TANGLE} doesn't even need the {\it idea} of
- reserved words; it treats them like
- all other identifiers.
- \.{WEAVE}, however, needs to be able to recognize reserved words to do
- prettyprinting.
- \.{WEAVE} uses a two-tiered system for coping with reserved words.
- I think this system was really designed to make it easier to code
- \.{WEAVE} by hand, and is therefore not of much interest for
- \.{SPIDER}, but we retain it as a matter of least resistance.
-
- Every reserved word belongs to an ilk, and it is the ilks that, like
- tokens, have translations, categories, and so on.
-
- I have made a bewildering array of defaults that is probably full of
- bugs.
- We use a special convention to initialize the |this_| family.
-
- #<Pattern-act...#>=
- #=/^ilk /#> {
- print "Ilk", $2 > logfile
- #<Set |this_mathness| etcetera to the defaults and those with no
- defaults to |""| #>
- #<If no category is specified, invent a default if you can#>
- this_name=""
- start_place=3
- #<Scan this line from |start_place| to finish, looking for
- \.{translation}$\ldots$ and putting results in
- |this_translation|$\ldots$#>
- #<Make sure |this_category| is not empty#>
- #<Make sure |this_name| is empty#>
- ilk_category[$2]=this_category
- ilk_mathness[$2]=this_mathness
- ilk_translation[$2]=this_translation
- next
- }
-
-
- # The pernicious option here is to be able to leave off the category, so
- that an item of ilk |fish_like| will get category |fish|.
-
- #<If no category is specified, invent a default if you can#>=
- if ($2 ~ #=/^[a-zA-Z_]+_like$/#> && $0 !~ #=/ category /#>) {
- ## give default category
- this_category = substr($2,1,length($2)-5)
- categories[this_category]=1
- }
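The rule is easy to check in isolation: stripping the five characters of |"_like"| from the ilk name leaves the default category.

```shell
# Sketch of the default-category rule: an ilk name ending in "_like"
# yields, as its default category, the name with those five
# characters (the length of "_like") stripped.
cat=$(echo "fish_like" | awk '{ print substr($1, 1, length($1)-5) }')
echo "$cat"
```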
-
- # For the reserved words, our only option is to set an ilk.
- We go through wild and assuredly ill-advised gyrations attempting to
- set all the default properties of that ilk.
- If the ilk is omitted, we make a new ilk by attaching the string
- |"_like"| to the name of the reserved word.
- {\bf Don't use this feature; it embarrasses the author.}
- #^ill-advised#>
- #<Pattern-action...#>=
- #=/^reserved /#> {
- print "Reserved word", $2 > logfile
- if ($0 !~ #=/ ilk /#>) {
- #<Attempt to make up an ilk, with all its defaults#>
- }
- for (i=3; i<=NF;) {
- if ($i == "ilk") {
- i++
- reservedilk[$2]=$i
- has_reserved[$i]=1 ## remember that ilk has some reserved word
- i++
- } else {
- print "Error: bad reserved word attribute:", $i, \
- "on line", NR
- #<Punt...#>
- }
- }
- #<Check that we used everything#>
- next
- }
-
- # Here is our feeble attempt to make up an ilk for a reserved word for
- which no ilk is given.
- The default ilk for |"with"| is |"with_like"|, and so on.
- {\bf Please, please don't do this.}
- #<Attempt to make up an ilk, with all its defaults#>=
- temp = $2 "_like"
- reservedilk[$2]=temp
- if (ilk_translation[temp]=="") {
- ilk_translation[temp]=default_translation
- }
- has_reserved[temp]=1
- if (ilk_mathness[temp]=="") {
- ilk_mathness[temp]=default_mathness
- }
- ## and default category for that ilk is the resword itself
- if (ilk_category[temp]=="") {
- ilk_category[temp]=$2
- categories[$2]=1
- }
- ilk_is_made_up[temp]=1 ## we really should do something with this
- #^mistakes#>
-
-
- #*1Telling {\tt WEAVE} how to recognize reserved words.
- At the end, we'll write out definitions for the ilk names, and we'll
- write translations of all the ilks.
- #<Write out all...#>=
- print "Writing out reserved words and ilks" > logfile
- ilkno=64
- print "@ Here is a list of all the ilks" > reserved
- for (i in ilk_translation) {
- printf "@d SP_%s = %d\n", i, ilkno > reserved
- ilkno++
- }
-
- # Here is where we write the code that converts reserved word tokens
- into scraps.
- #<Write out all...#>=
- print " " > reserved
- print "@ Here are the scraps we get from the reserved words" > reserved
- print "@d the_word = res_flag+p-name_dir" > reserved
- print "@<Decide on reserved word scraps@>=" > reserved
- print "switch (p->ilk) {" > reserved
- for (t in ilk_translation) {
- printf "\tcase SP_%s: \n\t\t", t > reserved
- transstring=ilk_translation[t]
- selfstring="small_app(the_word);"
- wherestring= sprintf ("in translation of ilk %s", t)
- append_keyword="small_app"
- #<Take translation from |transstring| and
- write corresponding \.{WEAVE} code to
- |outstring|, using |selfstring| as translation of |"<*>"|#>
- if (trcnt>0) ## at least one text in the translation
- has_translation[t]=1
- print outstring > reserved
- printf "\tapp_scrap(SP_%s,%s_math);\n", ilk_category[t], \
- ilk_mathness[t] > reserved
- temp=ilk_category[t]
- appended[temp]=1
- #^append check#>
- printf "\t\tbreak;\n" > reserved
- }
- print "}" > reserved
-
-
- # At the end, we'll have to enter each reserved word in the identifier
- table, along with its ilk.
- #<Write out all...#>=
- print "@ @<Store all the reserved words@>=" > reserved
- for (i in reservedilk) {
- printf "id_lookup(\"%s\",NULL,SP_%s);\n", i, reservedilk[i] > reserved
- }
-
- # At the very end, we'll make sure every ilk has both a reserved word
- and some translation.
- {\bf Perhaps this could be cleaned up a bit?}
- #<Check for errors at...#>=
- for (i in ilk_translation) {
- if (has_reserved[i] != 1) {
- print "Error: there is no reserved word of ilk", i
- exitcode=-1
- }
- if (has_translation[i] != 1) {
- print "Error: ilk", i, "has no translation"
- exitcode=-1
- }
- }
-
- # #<Write lists...#>=
- for (i in ilk_translation) {
- print i > ilkfile
- }
-
- # #<Write stat...#>=
- for (i in ilk_translation) number_of_ilks++
- for (i in reservedilk) number_of_reserved_words++
- printf "You defined %d reserved words of %d ilks.\n", \
- number_of_reserved_words, number_of_ilks
- printf "You defined %d reserved words of %d ilks.\n", \
- number_of_reserved_words, number_of_ilks > logfile
-
- #*The prettyprinting grammar.
- The most intricate part of \.{WEAVE} is its mechanism for converting
- programming language code into \TeX\ code.
- A ``bottom up'' approach is used to parse the
- programming language material, since \.{WEAVE} must deal with fragmentary
- constructions whose overall ``part of speech'' is not known.
-
- At the lowest level, the input is represented as a sequence of entities
- that we shall call {\it scraps}, where each scrap of information consists
- of two parts, its {\it category} and its {\it translation}. The category
- is essentially a syntactic class, and the translation is a token list that
- represents \TeX\ code. Rules of syntax and semantics tell us how to
- combine adjacent scraps into larger ones, and if we are lucky an entire
- program text that starts out as hundreds of small scraps will join
- together into one gigantic scrap whose translation is the desired \TeX\
- code. If we are unlucky, we will be left with several scraps that don't
- combine; their translations will simply be output, one by one.
-
- The combination rules are given as context-sensitive productions that are
- applied from left to right. Suppose that we are currently working on the
- sequence of scraps $s_1\,s_2\ldots s_n$. We try first to find the longest
- production that applies to an initial substring $s_1\,s_2\ldots\,$; but if
- no such productions exist, we try to find the longest production
- applicable to the next substring $s_2\,s_3\ldots\,$; and if that fails, we
- try to match $s_3\,s_4\ldots\,$, etc.
-
- A production applies if the category codes have a given pattern. For
- example, one of the productions is
- $$\hbox{\.{open [ math semi <\.{"\\\\,"}-opt-5> ] -->
- open math}}$$
- and it means that three consecutive scraps whose respective categories are
- |open|, |math|, and |semi| are con\-verted to two scraps whose categories
- are |open| and |math|.
- The |open| scrap has not changed, while the string \.{<"\\\\,"-opt-5>}
- indicates that the new |math| scrap
- has a translation composed of the translation of the original
- |math| scrap followed by the translation of the |semi| scrap followed
- by `\.{\\,}' followed by `|opt|' followed by `\.5'. (In the \TeX\ file,
- this will specify an additional thin space after the semicolon, followed
- by an optional line break with penalty 50.)
-
- There is an extensive discussion of the grammar, with examples, in the
- ``Spider User's Guide.''
- Y'oughta read it.
-
- #*1Scanning a production.
- A production in the grammar is written as a sequence of category
- names and translations, followed by a right arrow (\.{-->}), followed
- by a category name.
- When \.{WEAVE} is scanning the sequence of scraps that makes up a
- module, it checks to see whether the categories of those scraps match
- the categories given on the left side of the production.
- If so, the production fires, and the scraps and translations on the
- left side of the arrow are combined into a single, new scrap, and the
- new scrap is given the category from the right side of the arrow.
- The scraps which are combined are called the firing scraps,
- #^firing scraps#>
- and the category given to the combination is called the target category.
-
- Instead of a category name, e.g.~``\.{math},'' one can write a list of
- category names, e.g.~``\.{(open\vert lsquare)}.''
- A scrap matches the list if and only if its category is one of the
- names listed.
- One can also use the wildcard ``\.?'', which any scrap matches.
-
- On the right-hand side, one can write a \## followed by a number in
- place of the target category name.
- If we specify the target category as ``\.{\##2}'', for example, it
- means ``give the new scrap the same category as the second scrap that
- matched the left side of the production.''
-
- # Here is the whole syntax as quoted from the ``Spider User's Guide'':
-
- \begingroup\def\\{\par\noindent\ignorespaces}\tt
- \noindent\syntax{production} \produces\\\quad
- \syntax{left context} [ \syntax{firing instructions} ] \syntax{right context}
- --> \syntax{left context} \syntax{target category} \syntax{right
- context}\\
- \syntax{left context} \produces~\syntax{scrap designators}\\
- \syntax{right context} \produces~\syntax{scrap designators}\\
- \syntax{firing instruction} \produces \syntax{scrap designator}\\
- \syntax{firing instruction} \produces \syntax{translation}\\
- \syntax{scrap designator} \produces~?\\
- \syntax{scrap designator} \produces~\opt{!}\syntax{marked category}\\
- \syntax{scrap designator} \produces~\opt{!}\syntax{category alternatives}\\
- \syntax{category alternatives} \produces~(\syntax{optional
- alternatives}\syntax{marked category})\\
- \syntax{optional alternative} \produces~\syntax{marked category}\vert\\
- \syntax{marked category} \produces~\syntax{category name}\opt{*}\\
- \syntax{target category} \produces~\#\syntax{integer}\\
- \syntax{target category} \produces~\syntax{category name}\\
- \endgroup
-
- # Here is the pattern that reads productions.
- In most of the modules below, we read through some of the fields of the
- production.
- We use |i| to remember what field we are about to examine.
- When a module terminates, |$i| is left pointing to the first field of
- interest to the next module.
- #<Production patt...#>=
- #=/-->/#> {
- #<Set up to parse this production#>
- #<Read through the fields of the production, up to the arrow#>
- #<Set |lowpos|, |highpos|, and |arrowpos| to their proper values#>
- #<Update |highestposoverall|#>
- #<Update |highestunknownpos|#>
- #<Check to see that left context matches#>
- #<Process scrap to which we are translating#>
- #<Check to see that right context matches#>
- #<Check to make sure we used all the fields of the production#>
- #<Compute the appropriate test for this production and put it
- in |prodtest[prodnum]|#>
- #<Compute the code to write the new translation, should this
- production fire, and put it in |prodtrans[prodnum]|#>
- #<Write the start token in |ppstart[prodnum]| and the number
- of tokens reduced in |tokensreduced[prodnum]|#>
- #<If we only reduced one token, write the reduction
- out to file |cycles| for later testing#>
- next
- } ## \.{/-->/}
-
- # Each scrap in the production will be given a position |pos|,
- beginning with 1. (Using 1 and not 0 lets us make good use of the fact
- that uninitialized AWK variables will have value zero.)
- We will remember the positions of the scraps that get reduced; they
- will be from |lowpos| to |highpos-1|.
- We keep track of the production number in |prodnum|, and we save a
- copy of the input line in |inputline[prodnum]|.
- #<Set up to parse this production#>=
- lowpos=0; highpos=0; pos=1
- prodnum=prodnum+1
- inputline[prodnum]=$0
- print "Parsing production", prodnum, $0 > logfile
-
-
- # This is the guts of the parsing. We have to read each field in the
- production, determine whether it is category or translation
- information, and act accordingly.
- Each scrap will be given a position |pos|.
- We will write in |test[pos]| the code needed to decide whether a
- particular scrap matches the pattern given in the production.
- Scraps can match a single category by name, a list of categories, or
- |"?"|, which every scrap matches.
- Categories can be starred, in which case we underline the index entry
- of the first identifier in the scrap's translation.
-
- We also write in |trans[pos]| the code necessary to produce the
- translations preceding the scrap at |pos|.
-
- #<Read through the fields...#>=
- trans[pos]=""
- for (i=1; i<=NF; i++) {
- if ($i ~ #=/<.*>/#>) { ## should be |trans_pattern|
- #<Process a translation in |$i|#>
- } else if ($i ~ #=/^!?[a-zA-Z_]+(\*\*?)?$/#>) { ## |cat_pattern|
- #<Process a single category#>
- } else if ($i ~ #=/^!?\(([a-zA-Z_]+\|)*[a-zA-Z_]+\)(\*\*?)?$/#>){
- #<Process a list of alternative categories#>
- } else if ($i == "?") {
- #<Process a category wild card#>
- } else if ($i == "[") {
- lowpos=pos
- } else if ($i == "]") {
- highpos=pos
- } else if ($i=="-->") {
- break
- } else { ## we don't recognize the field
- print "Error: bad field is", $i, "in production on line", NR
- #<Forget this production#>
- }
- }
- i++
-
-
- # When we find a mistake, we just abandon the current production.
- Decrementing |prodnum| will make it as if this production never happened.
- #<Forget this production#>=
- prodnum--
- #<Punt this...#>
-
- # We process the translation and add the result to the current
- translation for |pos|.
- #<Process a translation...#>=
- transstring=$i
- selfstring="" ## senseless for productions
- wherestring= sprintf ("in production on line %d", NR)
- append_keyword="app"
- #<Take translation from |transstring| and write corresponding \.{WEAVE} code to
- |outstring|, using |selfstring| as translation of |"<*>"|#>
- trans[pos]=trans[pos] outstring
-
- # Here we'll set |test[pos]|.
- The phrase |test[pos]| will be a single C conjunct; if the test for
- each scrap is true, the whole production will fire.
- If we're called upon to make a scrap underlined or reserved, we'll add
- to |trans[pos]|.
-
- If a category is negated we add an extra clause to make
- sure nothing matches the zero category, since {\tt WEAVE} assumes
- no production ever matches a scrap with category zero.
- #<Process a single category#>=
- field[pos]=$i ## save this field to compare RHS
- #<Set |negation|, and remove leading |"!"| from |$i| if necessary#>
- #<Strip stars from |$i| (if any) and add appropriate
- translations to |trans[pos]|#>
- cat = $i
- categories[cat]=1 ## remember |cat| is a category
- if (negation==0) {
- test[pos]=sprintf("(pp+%d)->cat==SP_%s",pos-1,cat)
- } else {
- test[pos]=sprintf("((pp+%d)->cat!=SP_%s && (pp+%d)->cat != 0)",\
- pos-1,cat,pos-1)
- }
- #<Update the record of the rightmost occurrence of category |cat|#>
- #<Advance |pos|, making the new |trans[pos]| empty#>
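Here is a standalone sketch of the two test strings generated for a scrap at position 2 (so |pos-1| is 1), one for the plain category \.{math} and one for its negation \.{!math}; the category name is arbitrary.

```shell
# Standalone sketch of the two test strings generated for a scrap at
# position 2 (pos-1 == 1), for "math" and for its negation "!math".
plain=$(awk 'BEGIN {
    pos = 2; cat = "math"
    printf "(pp+%d)->cat==SP_%s", pos-1, cat
}')
negated=$(awk 'BEGIN {
    pos = 2; cat = "math"
    printf "((pp+%d)->cat!=SP_%s && (pp+%d)->cat != 0)", pos-1, cat, pos-1
}')
echo "$plain"
echo "$negated"
```

The extra |!= 0| clause in the negated form is what keeps a negated pattern from matching the zero category.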
-
- # The list of categories is enclosed in parentheses and the individual
- categories are separated by vertical bars.
- We have to make the test for these things a disjunction, but
- processing is more or less like the processing for a single category.
-
- If a list of alternatives is negated we add an extra clause to make
- sure nothing matches the zero category, since {\tt WEAVE} assumes
- no production ever matches a scrap with category zero.
- #<Process a list of alternative categories#>=
- field[pos]=$i ## save this field to compare RHS
- #<Set |negation|, and remove leading |"!"| from |$i| if necessary#>
- if (negation==0) {
- test[pos]="(" ## open for a list of good alternatives
- } else {
- temp=sprintf("(pp+%d)->cat==0",pos-1)
- test[pos]="!(" temp "||" ## open for a list of bad alternatives
- }
- #<Strip stars from |$i| (if any) and add appropriate
- translations to |trans[pos]|#>
- temp = substr($i,2,length($i)-2) ## throw out parens
- m = split(temp,tok,"|")
- for (j=1;j<=m;j++) {
- cat = tok[j]
- categories[cat]=1 ## remember it's a category
- #<Update the record of the rightmost occurrence of
- category |cat|#>
- temp=sprintf("(pp+%d)->cat==SP_%s",pos-1,cat)
- test[pos]=test[pos] temp ## add alternative to test
- if (j!=m)
- test[pos]=test[pos] "||\n" ## avoid line-too-long errors
- }
- test[pos]= test[pos] ")"
- #<Advance |pos|, making the new |trans[pos]| empty#>
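Here is a standalone sketch of the disjunction built for the alternative list \.{(open\vert lsquare)} at position 1. The embedded newline is the same device used above to avoid overlong lines in the generated C.

```shell
# Standalone sketch of the disjunction built for the alternative list
# "(open|lsquare)" at position 1.  The embedded newline avoids
# overlong lines in the generated C.
out=$(awk 'BEGIN {
    pos = 1; field = "(open|lsquare)"
    temp = substr(field, 2, length(field)-2)   # throw out the parens
    m = split(temp, tok, "|")
    test = "("
    for (j = 1; j <= m; j++) {
        test = test sprintf("(pp+%d)->cat==SP_%s", pos-1, tok[j])
        if (j != m) test = test "||\n"
    }
    print test ")"
}')
printf '%s\n' "$out"
```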
-
-
- # We keep track of the rightmost occurrence of each category.
- This enables us to backtrack by exactly the right amount when a
- production fires and creates a new scrap.
- #<Update the record of the rightmost occurrence of category |cat|#>=
- if (pos > highestpos[cat]) {
- highestpos[cat]=pos
- }
-
- # If a category or list of alternatives is preceded by an exclamation
- point (|"!"|), we set |negation|, and we will test for scraps that are
- {\it not} of that category or are {\it not} of one of the categories
- listed.
- #<Set |negation|...#>=
- temp = substr($i,1,1)
- if (temp=="!") {
- negation = 1
- $i = substr($i,2)
- } else {
- negation = 0
- }
-
- # Since both translations and tokens can add to |trans[pos]| we must
- make sure it is empty whenever we get a new |pos|.
- This device makes that easy.
-
- #<Advance |pos|, making the new |trans[pos]| empty#>=
- pos=pos+1
- trans[pos]=""
-
- # If a category is single-starred, we take this construct to be the
- {\it definition} of that item, and we underline the index entry for
- this module.
- The |make_underlined| routine finds the first identifier in the
- translation of the starred scrap, and underlines the index entry for
- that identifier in this module.
-
- If a category is double-starred, we used to try to change the ilk of the
- appropriate identifier to make it a reserved word.
- The only use this ever had was in handling C typedefs, and it should
- probably be removed.
- #^mistakes#>
- In the meanwhile, double starring is like single starring.
-
- #<Strip stars from |$i| (if any) and add appropriate
- translations to |trans[pos]|#>=
- if ($i ~ #=/^([a-zA-Z_]+|\(([a-zA-Z_]+\|)*[a-zA-Z_]+\))\*\*$/#>) { ## it's double-starred
- temp = sprintf("\tmake_underlined(pp+%d);\n",pos-1)
- trans[pos] = trans[pos] temp
- $i = substr($i,1,length($i)-2)
- } else if ($i ~ #=/^([a-zA-Z_]+|\(([a-zA-Z_]+\|)*[a-zA-Z_]+\))\*$/#>) { ## it's starred
- temp = sprintf("\tmake_underlined(pp+%d);\n",pos-1)
- trans[pos] = trans[pos] temp
- $i = substr($i,1,length($i)-1)
- } else if ($i ~ #=/\*$/#>) { ## a bad star?
- print "Error: can't remove stars in production on line", NR
- #<Forget this production#>
- }
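A simplified sketch of the single-star case only: a starred category such as \.{exp*} (the name is made up) yields a |make_underlined| call, and the star is stripped before the category is processed further.

```shell
# Simplified sketch of the single-star case only: a starred category
# such as "exp*" (a made-up name) yields a make_underlined() call,
# and the star is stripped before further processing.
out=$(echo "exp*" | awk '{
    pos = 1
    if ($1 ~ /\*$/) {
        printf "make_underlined(pp+%d); ", pos-1
        $1 = substr($1, 1, length($1)-1)
    }
    print $1
}')
echo "$out"
```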
-
- # Wild cards are easy to process, but we do have to remember that
- not even a wild card matches a scrap of category zero.
- #<Process a category wild card#>=
- field[pos]=$i ## save this field to compare RHS
- test[pos]=sprintf("(pp+%d)->cat!=0",pos-1) ## anything nonzero matches
- highwildcard=pos ## we don't really need this?
- #<Advance |pos|, making the new |trans[pos]| empty#>
-
-
-
- #
- We reach this point in the program after we have read the arrow
- into |$i|.
-
- This module establishes in what ranges of |pos| the contexts fall:
- $$\vbox{\halign{##\hfil\tabskip1em&\hfil##\hfil\cr
- \bf Items&\bf Range\cr
- \noalign{\vskip2pt}
- left context&|1..lowpos-1|\cr
- firing instructions&|lowpos..highpos-1|\cr
- right context&|highpos..arrowpos-1|\cr
- }}$$
- If |lowpos| and |highpos| haven't been set by the appearance of square
- brackets, we set them to make the contexts empty.
- Either both or neither should be set.
-
-
- #<Set |lowpos|, |highpos|, and |arrowpos| to their proper values#>=
- arrowpos=pos
- if (lowpos==0 && highpos==0) {
- lowpos=1 ## first transform position
- highpos=arrowpos ## first token not reduced
- ## (or one beyond last token position)
- } else if (lowpos==0 || highpos==0) {
- print "Error: square brackets don't balance in", \
- "production on line", NR
- #<Forget this production#>
- }
-
- # Here is the efficient place to update the rightmost (highest)
- position of {\it any} category.
- #<Update |highestposoverall|#>=
- if (arrowpos-1 > highestposoverall) {
- highestposoverall=arrowpos-1
- }
-
- # Dealing with grammars in which categories can be unnamed (using
- wildcards or negation) can be a pain in the ass.
- What we have to do, when reducing after firing a production, is move
- backwards enough so that we don't miss any earlier productions that
- are supposed to fire.
- This means we have to move back at least far enough so that the new
- scrap will match any unnamed category.
- {\bf But} we don't have to worry about wildcards (|"?"|) at the end of
- a production, because they would have matched anyway, even before the
- current production fired. Hence:
- #<Update |highestunknownpos|#>=
- for (hup=arrowpos-1; field[hup]=="?";) {
- hup--
- }
- for (;hup>highestunknownpos;hup--) {
- temp=field[hup]
- temp=substr(temp,1,1)
- if (temp=="?" || temp =="!") {
- highestunknownpos=hup ## we know |hup>highestunknownpos|
- break ## redundant, since test will fail
- }
- }
-
- # Here is the error checking for context sensitive productions.
- #<Check to see that left context matches#>=
- for (pos=1;pos<lowpos;pos++) {
- #<Check |$i| against |field[pos]|#>
- i++
- }
-
- # #<Check to see that right context matches#>=
- for (pos=highpos;pos<arrowpos;pos++) {
- #<Check |$i| against |field[pos]|#>
- i++
- }
- # #<Check |$i| against |field[pos]|#>=
- if (i>NF || $i != field[pos]) {
- print "Error: token mismatch is: found", $i, \
- "sought", field[pos], "on line", NR
- #<Forget this...#>
- }
-
- # We process our target scrap in between checking the left and right
- contexts.
- This scrap can be the name of a category, or it can be ``$\##nnn$'',
- where $nnn$ refers to the number of a category on the left side of the
- arrow.
- In this way it is possible to match wildcards and lists of alternatives.
- #<Process scrap to which we are translating#>=
- ## i points to the target category
- if (i>NF) {
- print "Error: no target category in production on line", NR
- #<Forget this...#>
- }
- if ($i ~ #=/^##[0-9]+$/#>) { ## a number
- $i = substr($i,2) ## peel off the \##
- #<Make sure |1 <= $i < arrowpos|#>
- targetcategory[prodnum]="Unnamed category"
- temp = sprintf("(pp+%d)->cat", $i-1)
- unnamed_cat[prodnum]=temp
- } else if ($i ~ #=/^[a-zA-Z][a-zA-Z_]*$/#>) { ## a category
- targetcategory[prodnum]=$i
- categories[$i]=1 ## remember this is a category
- } else {
- print "Error: unrecognizable target token", $i, \
- "in production on line", NR
- #<Forget this...#>
- }
- i++
-
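The classification above can be tried on its own. This sketch uses a plain {\tt \#} instead of {\tt WEB}'s doubled {\tt \#\#}, anchors the two regular expressions, and feeds in three invented sample tokens.

```shell
printf '#2\nstmt_list\n@!?\n' | awk '{
  if ($0 ~ /^#[0-9]+$/)                 # a number: refers back to a scrap
    print $0, "-> position", substr($0, 2)
  else if ($0 ~ /^[a-zA-Z][a-zA-Z_]*$/) # an identifier: a category name
    print $0, "-> category"
  else
    print $0, "-> unrecognizable target token"
}'
```

The three input tokens come out as position 2, a category, and an unrecognizable target, respectively.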
- # We call this at the end to make sure there aren't unused fields left over.
- #<Check to make sure we used all the fields of the production#>=
- if (i<=NF) {
- print "Error: used only " i-1 " of " NF " tokens", \
- "in production on line", NR
- #<Forget this...#>
- }
-
- # After having vetted the whole production, we combine the tests and
- translations for each |pos|.
- #<Compute the appropriate test for this production and put it
- in |prodtest[prodnum]|#>=
- prodtest[prodnum]=""
- for (pos=1;pos<arrowpos;pos++) {
- if (pos>1) {
- prodtest[prodnum]=prodtest[prodnum] " &&\n\t\t"
- }
- prodtest[prodnum]=prodtest[prodnum] test[pos]
- }
-
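The joining done above can be seen in isolation. The per-position test strings below are invented stand-ins (the real |test[pos]| strings are built elsewhere in this program); the point is only the |" &&\n\t\t"| glue between positions.

```shell
awk 'BEGIN {
  # Invented per-position tests; the real ones are built elsewhere.
  test[1] = "(pp+0)->cat==SP_math"
  test[2] = "(pp+1)->cat==SP_binop"
  test[3] = "(pp+2)->cat==SP_math"
  arrowpos = 4; s = ""
  for (pos = 1; pos < arrowpos; pos++) {
    if (pos > 1) s = s " &&\n\t\t"
    s = s test[pos]
  }
  print s
}'
# prints the three tests joined by " &&", one per (indented) line
```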
- # #<Compute the code to write the new translation, should this
- production fire, and put it in |prodtrans[prodnum]|#>=
- prodtrans[prodnum]=""
- for (pos=lowpos;pos<highpos;pos++) {
- prodtrans[prodnum]=prodtrans[prodnum] trans[pos]
- ## add code to append this scrap
- temp = sprintf("\tapp1(pp+%d);\n",pos-1)
- prodtrans[prodnum]=prodtrans[prodnum] temp
- #<If not negated, record the fact that a token of
- category satisfying |test[pos]| could have
- been reduced#>
- }
- prodtrans[prodnum]=prodtrans[prodnum] trans[highpos]
-
- # #<Write the start token in |ppstart[prodnum]| and the number
- of tokens reduced in |tokensreduced[prodnum]|#>=
- ppstart[prodnum]=lowpos-1
- tokensreduced[prodnum]=highpos-lowpos
-
- # #<If we only reduced one token, write the reduction
- out to file |cycles| for later testing#>=
- if (highpos-lowpos==1) {
- printf "%d: %s --> %s\n", prodnum, field[lowpos], \
- targetcategory[prodnum] > cycles
- wrotecycles = 1
- }
-
- # If we never even had the possibility of a cycle, we still have to write
- out a dummy file so the cycle checker in the Makefile won't barf.
- # #<Write lists of everything#>=
- if(wrotecycles==0) {
- print "0: dummy --> nodummy" > cycles
- }
-
- # For error checking, we keep track of categories that get reduced in
- productions.
- We can't do this while scanning the production, because we don't know
- at the beginning what |lowpos| will be, since we might or might not
- ever see a left square bracket.
-
- If a particular category is never reduced, that merits a warning later on.
- #<If not negated, record the fact that a token of category satisfying
- |test[pos]| could have been reduced#>=
- temp = field[pos]
- tempa = substr(temp,1,1)
- if (tempa != "!") {
- if (temp ~ #=/^\(([a-zA-Z_]+\|)*[a-zA-Z_]+\)(\*\*?)?$/#>) {
- ## list of alternatives
- #<Remove trailing stars from |temp|#>
- temp = substr(temp,2,length(temp)-2)
- m = split(temp,tok,"|")
- for (j=1;j<=m;j++) {
- alternate = tok[j]
- reduced[alternate]=1
- }
- } else if (temp ~ #=/^[a-zA-Z_]+(\*\*?)?$/#>) {
- #<Remove trailing stars from |temp|#>
- reduced[temp]=1
- } else if (temp != "?") {
- print "Confusion: unintelligible field[pos]:", temp, \
- "in production on line", NR
- #<Forget this...#>
- }
- }
-
- # #<Remove trailing...#>=
- while (temp ~ #=/\*$/#>) {
- temp = substr(temp,1,length(temp)-1)
- }
-
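The star-stripping loop above is easy to watch in plain awk; the category names here are invented.

```shell
echo 'stmt** expr* name' | awk '{
  for (f = 1; f <= NF; f++) {
    temp = $f
    while (temp ~ /\*$/)                 # peel stars off the end
      temp = substr(temp, 1, length(temp) - 1)
    sep = (f < NF) ? " " : "\n"
    printf "%s%s", temp, sep
  }
}'
# prints: stmt expr name
```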
- # #<Check for err...#>=
- for (c in categories) {
- if (reduced[c] != 1) {
- print "Warning: category", c, "never reduced"
- }
- }
-
-
- # Here's a check for the target token number.
- #<Make sure |1 <= $i < arrowpos|#>=
- if ((0+$i)<1 || (0+$i)>=0+arrowpos) {
- print "Error: can't take token number", $i, "of", arrowpos-1, \
- "tokens", "in production on line", NR
- #<Forget this...#>
- }
-
- #*1Writing the scrap reduction code.
- Before writing the grammar, we want to define all of the category codes.
- #<Write out...#>=
- print "Writing out category codes" > logfile
- print "@ Here is a list of category codes scraps can have" > grammarfile
- i=1
- for (t in categories) {
- printf "@d SP_%s = %d\n",t,i > grammarfile
- i++
- }
- print "@c" > grammarfile
- # We also want to make sure we can print the names of categories in
- case we need to debug.
- #<Write out...#>=
- print "##ifdef DEBUG" > grammarfile
- print "##define PRINT_CAT(A,B) case A: printf(B); break" > grammarfile
- print "print_cat(c) /* symbolic printout of a category */" > grammarfile
- print "eight_bits c;" > grammarfile
- print "{" > grammarfile
- print " switch(c) {" > grammarfile
- for (t in categories) {
- printf "PRINT_CAT(SP_%s,\"%s\");\n",t,t > grammarfile
- }
- print " case 0: printf(\"zero\"); break;" > grammarfile
- print " default: printf(\"UNKNOWN\"); break;" > grammarfile
- print " }" > grammarfile
- print "}" > grammarfile
- print "##endif /* DEBUG */" > grammarfile
- print " " > grammarfile
-
- # And there goes the list...
- #<Write lists...#>=
- for (c in categories) {
- print c > categoryfile
- }
-
- # #<Write stat...#>=
- for (c in categories) {
- number_of_categories++
- }
- printf "You used %d different categories in %d productions.\n", \
- number_of_categories, prodnum
- printf "You used %d different categories in %d productions.\n", \
- number_of_categories, prodnum > logfile
- printf "The biggest production had %d scraps on its left-hand side.\n", \
- highestposoverall
- printf "The biggest production had %d scraps on its left-hand side.\n", \
- highestposoverall > logfile
-
-
- # We will write a list of the successfully parsed productions to a
- separate file.
- The list will include
- production numbers, to which the user can refer
- when debugging.
- #<Write lists...#>=
- for (n=1; n<= prodnum; n++) {
- printf "%2d: %s\n",n,inputline[n] > productions
- }
-
- # Finally, we write out the code for all of the productions.
- Here is our first view of category checking: we want to make sure that
- each category can be appended, either by |app_scrap| or by |reduce|.
- We also want to make sure each category can be reduced by firing some
- production.
- We track these things using the arrays |appended| and |reduced|.
-
- We write the definition of |highestposoverall|, for safety.
-
- We used to write this code as a very deeply nested if-then-else,
- but that caused a yacc overflow in the generated code for C~{\tt WEAVE}.
- So now we write
- {\tt if (...) \LB...; goto end\_prods;\RB}
- #<Write out...#>=
- print "Writing out grammar" > logfile
- print "@ Here is where we define |highestposoverall| and where we" > grammarfile
- print "check the productions." > grammarfile
- print "@d highestposoverall =", highestposoverall > grammarfile
- print "@<Test for all of the productions@>=" > grammarfile
- for (n=1; n<=prodnum; n++) {
- if (n%5==0)
- print "@ @<Test for all of the productions@>=" \
- > grammarfile ## avoids overflowing \.{WEAVE} of \.{WEAVE}
- #<Change \vert,\_, and {\tt \##} in |inputline[n]|; put results in |this_string|#>
- #<Make |this_string| no more than 60 characters wide#>
- printf "if (%s) {\n\t/* %d: {\\tt %s} */\n%s",\
- prodtest[n],n,this_string,prodtrans[n] > grammarfile
- #<Write the |reduce| call, taking note of whether the
- category is named#>
- print "\tgoto end_prods;" > grammarfile
- printf "} " > grammarfile
- }
- printf "\n" > grammarfile
- print "end_prods:" > grammarfile
-
- # We do different things for a category that is unnamed.
- #<Write the |reduce| call, taking note of whether the category is named#>=
- ttk=targetcategory[n]
- if (ttk == "Unnamed category") {
- #^append check#>
- printf "\treduce(pp+%d,%d,%s,%d,%d);\n",ppstart[n],\
- tokensreduced[n],unnamed_cat[n],\
- 1-highestposoverall,n > grammarfile
- } else {
- appended[ttk]=1 ## remember we appended this token
- #^append check#>
- reduction=highestpos[ttk]
- if (reduction<highestunknownpos) {
- reduction = highestunknownpos
- }
- printf "\treduce(pp+%d,%d,SP_%s,%d,%d);\n",ppstart[n],\
- tokensreduced[n],targetcategory[n],\
- 1-reduction,n > grammarfile
- }
-
- # This is the place we check for errors.
- #^append check#>
- #^reduce check#>
- #<Check for errors...#>=
- for (c in categories) {
- if (appended[c] != 1) {
- if (c=="ignore_scrap") { ## appended by \.{WEAVE}
- print "Warning: category", c, "never appended"
- } else {
- print "Error: category", c, "never appended"
- exitcode=-1
- }
- }
- }
-
-
-
- # It's desirable to put the production in a comment, but we have to
- get rid of the confusing \vert, or \.{WEAVE} will think it introduces
- code.
- We also have to escape underscores and sharp signs, otherwise \TeX\ will
- think we want math mode.
- #<Change \vert,\_, and {\tt \##} in |inputline[n]|; put results in |this_string|#>=
- this_string = inputline[n]
- tempi = index(this_string,"|")
- while (tempi != 0) {
- tempa = substr(this_string,1,tempi-1)
- tempb = substr(this_string,tempi+1)
- this_string = tempa "\\vert " tempb
- tempi = index(this_string,"|")
- }
- templ = ""; tempr = this_string
- tempi = index(tempr,"_")
- while (tempi != 0) {
- tempa = substr(tempr,1,tempi-1)
- tempr = substr(tempr,tempi+1)
- templ = templ tempa "\\_"
- tempi = index(tempr,"_")
- }
- this_string = templ tempr
- templ = ""; tempr = this_string
- tempi = index(tempr,"##")
- while (tempi != 0) {
- tempa = substr(tempr,1,tempi-1)
- tempr = substr(tempr,tempi+1)
- templ = templ tempa "\\##"
- tempi = index(tempr,"##")
- }
- this_string = templ tempr
-
-
- # We have to keep these productions from making an input line too long.
- #<Make |this_string| no more than 60 characters wide#>=
- toolong=this_string; this_string=""
- while (length(toolong)>60) {
- idx=59
- idchar = substr(toolong,idx,1)
- while (idx>1 && idchar!=" ") {
- idx--
- idchar = substr(toolong,idx,1)
- }
- if (idx==1)
- idx=59
- temp = substr(toolong,1,idx-1)
- toolong = substr(toolong,idx+1)
- this_string = this_string temp "\n"
- }
- this_string = this_string toolong
-
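The wrapping loop above can be run on its own; the 80-character sample string below is invented. It breaks at the last blank before column 60, and simply chops at column 59 if no blank is found.

```shell
awk 'BEGIN {
  toolong = ""
  for (k = 1; k <= 8; k++) toolong = toolong "abcdefghi "  # 80 characters
  this_string = ""
  while (length(toolong) > 60) {
    idx = 59
    while (idx > 1 && substr(toolong, idx, 1) != " ") idx--  # back up to a blank
    if (idx == 1) idx = 59                                   # none found: just chop
    this_string = this_string substr(toolong, 1, idx - 1) "\n"
    toolong = substr(toolong, idx + 1)
  }
  print this_string toolong
}'
```

Every output line comes out at most 60 characters wide.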
-
-
- #*The rest of {\tt SPIDER}.
- We present the remaining features of \.{SPIDER} in the order in which
- they are treated in the ``\.{SPIDER} User's Guide.''
- #*2 Naming the target language.
- \.{SPIDER} is designed to help you build a \.{WEB} system for any
- programming language.
- We need to know the name of the language, and what extension to
- use when writing the tangled unnamed module.
- We use this information to pick a name for the file that will hold
- this \.{WEB}'s special \TeX{} macros, and we write |"\\input webkernel"|
- on that file.
- #<Patt...#>=
- #=/^language /#> {
- language = $2
- extension=language
- for (i=3; i<NF; ) {
- if ($i=="extension") {
- i++
- extension=$i
- i++
- } else if ($i=="version") {
- i++
- version=$i
- i++
- } else {
- print "Error: unknown language property", $i,\
- "on line", NR
- #<Punt...#>
- }
- }
- #<Check that we used everything#>
- #<Write the first line of the macro file#>
- next
- }
-
- # #<Write out...#>=
- if (language != "") {
- print "@ Here is the language-dependent stuff" > tlang
- if (version!="")
- version = ", Version " version
- printf "@d banner = \"This is %s TANGLE%s %s\\n\"\n", language, \
- version, date > tlang
- printf "@<Global...@>=char C_file_extension[]=\"%s\";\n", extension \
- > tlang
- #@
- print "@ Here is the language-dependent stuff" > wlang
- printf "@d banner = \"This is %s WEAVE%s %s\\n\"\n", language, \
- version, date > wlang
- print "@<Set |out_ptr| and do a |tex_printf| to read the macros@>=" \
- > wlang
- printf "*out_ptr='x'; tex_printf(\"\\\\input %sweb.te\");\n", \
- extension > wlang
- printf "@ @<Global...@>=char C_file_extension[]=\"%s\";\n", extension \
- > wlang
- } else {
- print "Error: you haven't given me any \"language\" information"
- exitcode=-1
- }
-
- #*1Defining {\TeX} macros.
- The first thing we do after getting the language is write a line to
- the macro file.
- This makes sure the kernel \.{WEB} macros will be available.
- #<Write the first line of the macro file#>=
- macrofile = extension "web.tex"
- print "\\input webkernel.tex" > macrofile
-
-
- # Processing macros is straightforward: everything between \.{macros
- begin} and \.{macros end} gets copied into the macro file.
- #<Patt...#>=
- #=/^macros begin$/,/^macros end$/#> {
- if (begunmacs==0) {
- begunmacs=1
- next
- }
- if ($0 ~ #=/^macros end$/#>) {
- begunmacs=0
- next
- }
- if (macrofile=="") {
- if (complained==0) {
- print "Error: you must give \"language\"",\
- "before \"macros\""
- complained=1
- #<Punt...#>
- }
- } else {
- print $0 > macrofile
- }
- next
- }
-
-
-
- #*1Handling modules.
- We need to give module names a category, both when we define modules
- and when we use them in other modules.
-
- We might conceivably fool around with mathness, but we don't
- really intend to do so.
- #<Pattern-action...#>=
- #=/^module /#> {
- for (i=2;i<NF;) {
- if ($i=="definition") {
- i++
- mod_def_cat=$i
- categories[$i]=1
- print "Module definition category set to", $i > logfile
- i++
- } else if ($i=="use") {
- i++
- mod_use_cat=$i
- categories[$i]=1
- print "Module use category set to", $i > logfile
- i++
- } else {
- print "Error: unknown module property", $i, \
- "on line", NR
- #<Punt...#>
- }
- }
- #<Check that we used everything#>
- next
- }
-
- # Here's how we rig it:
- #<Write out...#>=
- if (mod_def_cat!="") {
- print "@ @<Call |app_scrap| for a module definition@>=" > scrapfile
- printf "app_scrap(SP_%s,no_math);\n", mod_def_cat > scrapfile
- appended[mod_def_cat]=1
- } else {
- print "Error: I don't know what to do with a module definition"
- print " Give me a \"module definition ...\""
- exitcode=-1
- }
- if (mod_use_cat!="") {
- print "@ @<Call |app_scrap| for a module use@>=" > scrapfile
- printf "app_scrap(SP_%s,maybe_math);\n", mod_use_cat > scrapfile
- appended[mod_use_cat]=1
- } else {
- print "Error: I don't know what to do with a module use"
- print " Give me a \"module use ...\""
- exitcode=-1
- }
-
-
- #*1At sign.
- With \.{SPIDER}, we can designate any character we like as the
- ``magic at sign.''
- #<Pattern-act...#>=
- #=/^at_sign /#> {
- if (NF==2 && length($2)==1) {
- if ($2=="@") {
- at_sign="@@"
- } else {
- at_sign=$2
- }
- } else {
- print "Error: I can't understand", $0
- print " Give me an at sign of length 1"
- #<Punt...#>
- }
- next
- }
-
- # We write the at sign out to the grammar file and to \.{TANGLE}'s token file.
- #<Write out all...#>=
- tempfile = grammarfile
- #<Write |at_sign| definition to |tempfile|#>
- tempfile = ttokfile
- #<Write |at_sign| definition to |tempfile|#>
-
- # It's trivially done:
- #<Write |at_sign| definition to |tempfile|#>=
- print "@ Here is the |at_sign| for the new web" > tempfile
- printf "@d at_sign = @`%s'\n", at_sign > tempfile
- print " " > tempfile
- print "@ Here is |the_at_sign| left for common" > tempfile
- print "@<Global...@>=char the_at_sign = at_sign;" > tempfile
- print " " > tempfile
-
- # We provide a default at sign:
- #<Set init...#>=
- at_sign="@@"
-
-
- #*1Comments.
- We have to explain how our programming language supports comments.
- We give the strings that initiate and terminate a comment.
- We can say comments are terminated by ``newline'' if that's the case.
- #<Pattern-act...#>=
- #=/^comment /#> {
- print $0 > logfile
- for (i=2; i<NF;) {
- if ($i=="begin") {
- i++
- if ($i ~ #=/^<.*>$/#>) {
- transstring = $i
- wherestring = "in \"comment begin\" on line " NR
- #<Convert restricted translation in |transstring|
- to quoted string in |outstring|#>
- begin_comment_string = outstring
- i++
- } else {
- print "Error: \"comment begin\" must have a restricted translation"
- #<Punt...#>
- }
- } else if ($i=="end") {
- i++
- if ($i=="newline") {
- comments_end_with_newline = 1
- end_comment_string = "\"\\n\""
- } else if ($i ~ #=/^<.*>$/#>){
- comments_end_with_newline = 0
- transstring = $i
- wherestring = "in \"comment end\" on line " NR
- #<Convert restricted translation in
- |transstring|
- to quoted string in |outstring|#>
- end_comment_string = outstring
- } else {
- print "Error: \"comment end\" must have a restricted translation"
- #<Punt...#>
- }
- i++
- } else {
- print "Error: bad comment attribute:", $i
- #<Punt...#>
- }
- }
- #<Check that we used everything#>
- #<Write the comment definitions to the macro file#>
- next
- }
-
- # \.{WEAVE} and \.{TANGLE} must be able to recognize comments.
- Here we give \.{TANGLE}
- quoted strings that show the beginning and end of a
- comment.
-
- #<Write out...#>=
- print "@ Here we recognize the comment start sequence" > ttokfile
- print "@<See a comment starting at |loc| and skip it@>=" > ttokfile
- printf "{int len; len=strlen(%s);\n", begin_comment_string > ttokfile
- printf "if (loc+len<=limit && !strncmp(loc,%s,len)) {\n",\
- begin_comment_string > ttokfile
- print "\tloc += len; /* skip the comment opener */" > ttokfile
- print "\tskip_comment(); /* scan to end of comment or newline */" > ttokfile
- print "\tif (comment_continues || comments_end_with_newline)" > ttokfile
- print "\t\treturn('\\n');" > ttokfile
- print "\telse continue;\n}\n}" > ttokfile
-
-
- # Now this is \.{WEAVE} finding the start of a comment
- #<Write out...#>=
- print "@ @<See a comment starting at |loc-1| and return |begin_comment|@>=" \
- > scrapfile
- printf "{int len; len=strlen(%s);\n", begin_comment_string > scrapfile
- printf "if (loc+len-1<=limit && !strncmp(loc-1,%s,len)) {\n",\
- begin_comment_string > scrapfile
- print "\tloc += len-1;" > scrapfile
- print "\t return (begin_comment); /* scan to end of comment or newline */" > scrapfile
- print "}\n}" > scrapfile
-
-
-
-
- # Here \.{TANGLE} spots the end of a comment
- #<Write out...#>=
- print "@ Here we deal with recognizing the end of comments" > ttokfile
- printf "@d comments_end_with_newline = %d\n", comments_end_with_newline >ttokfile
- print "@<Recognize comment end starting at |loc-1|@>=" > ttokfile
- if (comments_end_with_newline != 1) {
- printf "{int len; len=strlen(%s);\n", end_comment_string > ttokfile
- printf "if (loc+len-1<=limit && !strncmp(loc-1,%s,len)) {\n",\
- end_comment_string > ttokfile
- print "loc += len-1; return(comment_continues=0); }}" > ttokfile
- } else {
- print "/* This code will never be executed */ " > ttokfile
- }
-
- # Now here is \.{WEAVE}.
- \.{WEAVE} copes elsewhere with the situation when
- |comments_end_with_newline| holds, so we don't need to consider it here.
- #<Write out...#>=
- print "@ Here we recognize end of comments" > scrapfile
- printf "@d comments_end_with_newline = %d\n",comments_end_with_newline >scrapfile
- print "@<Check for end of comment@>=" > scrapfile
- printf "{int len; len=strlen(%s);\n", end_comment_string > scrapfile
- printf "if (loc+len-1<=limit && !strncmp(loc-1,%s,len)) {\n",\
- end_comment_string > scrapfile
- print " loc++; if(bal==1) {if (phase==2) app_tok('}'); return(0);}" > scrapfile
- print " else {" > scrapfile
- print " err_print(\"! Braces don't balance in comment\");" > scrapfile
- print "@.Braces don't balance in comment@>" > scrapfile
- print " @<Clear |bal| and |return|@>;" > scrapfile
- print " }" > scrapfile
- print "}" > scrapfile
- print "}" > scrapfile
-
-
- # We have to give \.{TANGLE} the beginning and ending comment strings, so
- it can use them when writing its own comments.
- #<Write out...#>=
- print "@ Important tokens:" > ttokfile
- printf "@d begin_comment_string = %s\n", begin_comment_string > ttokfile
- printf "@d end_comment_string = %s\n", end_comment_string > ttokfile
-
- # We also have to write out the starting and ending comment strings to
- the macro file.
- We do this at the time of parsing |#=/^comment /#>|, so the user has a
- chance to override.
- #<Write the comment definitions to the macro file#>=
- if (macrofile!="") {
- this_string=substr(begin_comment_string,2,length(begin_comment_string)-2)
- #<Write |this_string| into |tex_string|, escaping \TeX's specials#>
- printf "\\def\\commentbegin{%s}\n", tex_string > macrofile
- if (comments_end_with_newline==0) {
- this_string=substr(end_comment_string,2,length(end_comment_string)-2)
- #<Write |this_string| into |tex_string|, escaping \TeX's specials#>
- printf "\\def\\commentend{%s}\n", tex_string > macrofile
- } else {
- print "\\def\\commentend{\\relax}" > macrofile
- }
- } else {
- print "Error: I can't write comment info to the macro file---"
- print " you haven't given me any \"language\" information"
- #<Punt...#>
- }
-
-
-
- # Escaping \TeX's specials is pretty easy:
- #<Set initial...#>=
- texof["\\"]="\\BS"
- texof["{"]="\\{"
- texof["}"]="\\}"
- texof["$"]="\\$"
- texof["&"]="\\amp"
- texof["##"]="\\##"
- texof["^"]="\\H"
- texof["_"]="\\_"
- texof["~"]="\\TI"
- texof["%"]="\\%"
-
- #
- #<Write |this_string| into |tex_string|, escaping \TeX's specials#>=
- tex_string=""
- while (length(this_string)>0) {
- c = substr(this_string,1,1)
- this_string = substr(this_string,2)
- cprime = texof[c]
- if (cprime=="") {
- tex_string = tex_string c
- } else {
- tex_string = tex_string cprime
- }
- }
-
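The escaping loop above can be watched in isolation. This sketch loads only a subset of the |texof| table and runs it over an invented sample string; characters without a table entry pass through unchanged.

```shell
awk 'BEGIN {
  texof["$"] = "\\$"; texof["_"] = "\\_"; texof["%"] = "\\%"
  this_string = "50%_off$"; tex_string = ""
  while (length(this_string) > 0) {
    c = substr(this_string, 1, 1)        # take one character...
    this_string = substr(this_string, 2) # ...off the front
    cprime = texof[c]
    tex_string = tex_string ((cprime == "") ? c : cprime)
  }
  print tex_string
}'
# prints: 50\%\_off\$
```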
- #*1Controlling line numbering.
- Here we fart around with line numbering for \.{TANGLE}.
- This lets \.{TANGLE} write an indication of the locations of things in
- the \.{WEB} source.
- The C preprocessor accepts these things as \.{\##line} directives.
- #<Pattern-act...#>=
- #=/^line /#> {
- print $0 > logfile
- for (i=2; i<NF;) {
- if ($i=="begin") {
- i++
- if ($i ~ #=/^<.*>$/#>) {
- transstring = $i
- wherestring = "in \"line begin\" on line " NR
- #<Convert restricted translation in |transstring|
- to quoted string in |outstring|#>
- sharp_line_open = outstring
- i++
- } else {
- print "Error: \"line begin\" must have a restricted translation"
- #<Punt...#>
- }
- } else if ($i=="end") {
- i++
- if ($i ~ #=/^<.*>$/#>){
- transstring = $i
- wherestring = "in \"line end\" on line " NR
- #<Convert restricted translation in
- |transstring|
- to quoted string in |outstring|#>
- sharp_line_close = outstring
- } else {
- print "Error: \"line end\" must have a restricted translation"
- #<Punt...#>
- }
- i++
- } else {
- print "Error: bad line attribute:", $i, "on line", NR
- #<Punt...#>
- }
- } ## |for|
- #<Check that we used everything#>
- next
- }
-
- # We have to give \.{TANGLE} the strings for \&{\##line} commands.
- #<Write out...#>=
- print "@ Important tokens:" > ttokfile
- printf "@d sharp_line_open = %s\n", sharp_line_open > ttokfile
- printf "@d sharp_line_close = %s\n", sharp_line_close > ttokfile
-
- # We'll choose some innocuous defaults.
- #<Set init...#>=
- sharp_line_open = "\"##line\""
- sharp_line_close = "\"\""
-
- #*1Tracking the generation date.
- We want to be able to note the date on which we generate files.
- #<Patt...#>=
- #=/^date /#> {
- ## date returned as ``Fri Dec 11 11:31:18 EST 1987''
- mo = month[$3]
- day = $4
- year = $7
- time = $5
- #<Set |hour|, |minute|, and |ampm| from |time|#>
- date = sprintf ("(generated at %d:%s %s on %s %d, %d)",\
- hour, minute, ampm, mo, day, year)
- next
- }
-
- # We want the months to have their full names
- #<Set init...#>=
- month["Jan"]="January"
- month["Feb"]="February"
- month["Mar"]="March"
- month["Apr"]="April"
- month["May"]="May"
- month["Jun"]="June"
- month["Jul"]="July"
- month["Aug"]="August"
- month["Sep"]="September"
- month["Oct"]="October"
- month["Nov"]="November"
- month["Dec"]="December"
-
- # We make a ``friendly'' time from |time=="hh:mm:ss"|.
- #<Set |hour|, |minute|, and |ampm| from |time|#>=
- hour = 0+substr(time,1,2) ## force numeric, so |hour==0| works at midnight
- if (hour >=12)
- ampm = "PM"
- else
- ampm="AM"
-
- if (hour==0) {
- hour =12
- } else if (hour>12) {
- hour = hour -12
- }
- minute = substr(time,4,2)
-
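The time-formatting chunk above can be exercised on a few sample |hh:mm:ss| strings (forcing |hour| numeric, as the comparisons require); the sample times are invented.

```shell
awk 'BEGIN {
  split("00:05:00 13:31:18 09:00:00", t, " ")
  for (j = 1; j <= 3; j++) {
    time = t[j]
    hour = 0 + substr(time, 1, 2)        # force numeric comparison
    ampm = (hour >= 12) ? "PM" : "AM"
    if (hour == 0) hour = 12; else if (hour > 12) hour -= 12
    minute = substr(time, 4, 2)
    printf "%d:%s %s\n", hour, minute, ampm
  }
}'
# prints: 12:05 AM / 1:31 PM / 9:00 AM, one per line
```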
-
- #*=The {\tt SPIDER} tools.
- #i cycle.web
- #*Flagging duplicate names.
- Detects duplicate names in a sorted list.
- #(nodups.awk#>=
- { if ($0==last) {
- print "Error: duplicate name", $0, "on lines", NR-1"-"NR
- exit -1
- }
- last = $0
- }
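A sample run of the duplicate check above, with the sorted name list given inline; the names are invented.

```shell
printf 'alpha\nbeta\nbeta\ngamma\n' |
awk '{
  if ($0 == last) {
    print "Error: duplicate name", $0, "on lines", NR-1 "-" NR
    exit -1
  }
  last = $0
}' || true   # the script exits nonzero when it finds a duplicate
# prints: Error: duplicate name beta on lines 2-3
```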
- #*Checking translation keywords for validity.
- #(transcheck.awk#>=
- #=/^good translations$/#>,#=/^test translations$/#> {
- if ($0 !~ #=/^good translations$|^test translations$/#>) {
- istranslation[$0]=1
- }
- next
- }
-
- { if (istranslation[$0]!=1) {
- print "Error:", $0, "is not a valid translation"
- exitcode = -1
- }
- }
-
- END {
- exit exitcode
- }
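Here is how {\tt transcheck.awk} might be fed its keyword list followed by a test section, much as the Makefile presumably does; ``frobnicate'' is a made-up invalid translation.

```shell
printf 'good translations\nforce\nindent\ntest translations\nforce\nfrobnicate\n' |
awk '
/^good translations$/,/^test translations$/ {
  if ($0 !~ /^good translations$|^test translations$/) istranslation[$0] = 1
  next
}
{
  if (istranslation[$0] != 1) {
    print "Error:", $0, "is not a valid translation"
    exitcode = -1
  }
}
END { exit exitcode }' || true   # exits nonzero when a bad translation is seen
# prints: Error: frobnicate is not a valid translation
```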
- # This is a copy of {\tt transcheck.list}, which should be the first
- part of the input to {\tt transcheck.awk}.
- Since \.{TANGLE} will insert its own stuff, we can't use it.
- {\tt transcheck.awk} {\em could} be updated to work with the
- tangled output, though, if it seemed desirable.
- #(junk.list#>=
- good translations
- break_space
- force
- big_force
- opt
- backup
- big_cancel
- cancel
- indent
- outdent
- math_rel
- math_bin
- math_op
- test translations
-
- #*=Index.
- This is a combined index to {\tt SPIDER} and the {\tt SPIDER} tools.
- Since the {\tt SPIDER} tools are nearly trivial, it's really just
- {\tt SPIDER}.
-
-
-