Text File | 1990-10-28 | 50.1 KB | 1,114 lines |
- Program structure
- Patterns
- Skeletons
- Comments
- Program File
- Library and Initialization
- Boolean Patterns
- Constant Patterns and Skeletons
- Input-Output Skeletons
- Directory Skeletons
- Arithmetic Skeletons
- Conditional and Iterative Skeletons
- Variables
- Intervals
- Lexicographic Intervals
- Rule Set
- Debugging Aids
- Performance
- Bibliography
- :Structure of a Convert program.
-
- A Convert program has the format
-
- ((p)(s)(v)(r)) x
-
- where (p) is a list of pattern definitions ((P1) p1 (P2) p2 ...)
- (Pj a pattern, pj its name)
-
- (s) is a list of skeleton definitions ((S1) s1 (S2) s2 ...)
- (Sj a skeleton, sj its name)
-
- (v) is a list of variables (v1 v2 ...) (each vi a number between
- 0 and 30, a single space separating variables in the list;
- no space whatever if no variables are declared)
-
- (r) is a list of rules, each having one of two possible forms:
- (pattern,skeleton): [for a repeating rule]
- (pattern,skeleton); [for a terminal rule]
-
- x is the name of the program (null for the main program,
- which must appear last)
-
- Every Convert program is a function which maps a string into another
- string; this mapping is performed by the list of rules. Application
- of the list of rules to a string proceeds as follows:
-
- - Rules are tried in order, from first to last.
-
- - Trying a rule means trying to match the pattern to the
- string. If there is a match, the rule is said to apply
- and the string is replaced by whatever text is produced by
- the skeleton.
-
- - If a rule ending in a colon applies, the list is
- tried from the beginning on the new string; if a rule
- ending in a semicolon applies, the string produced
- by the corresponding skeleton is the value of the function.
-
- - If no rule applies (none of the patterns match), the
- string remains unchanged and is returned as the value
- of the function.
-
- Thus patterns are predicates which determine whether a string has a certain
- form or not, while skeletons are string-valued entities.
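-
- The rule-application procedure above can be modelled in a few lines of Python.
- This is only a sketch of the control flow, not of Convert itself: the names
- apply_rules, two_spaces and one_space are hypothetical, patterns are modelled
- as functions returning a match or None, and the step limit is an added
- safeguard that Convert does not have.

```python
import re
from typing import Callable, List, Optional, Tuple

# A rule is (pattern, skeleton, terminal). The pattern returns a match
# (any non-None value) or None; the skeleton builds the replacement string.
Rule = Tuple[Callable[[str], Optional[object]], Callable[[str, object], str], bool]


def apply_rules(rules: List[Rule], text: str, max_steps: int = 10_000) -> str:
    """Model of the rule loop: hypothetical helper, not part of Convert."""
    for _ in range(max_steps):
        for pattern, skeleton, terminal in rules:
            m = pattern(text)
            if m is None:
                continue          # rule does not apply; try the next one
            text = skeleton(text, m)
            if terminal:          # rule ended in ';': this is the value
                return text
            break                 # rule ended in ':': restart from rule one
        else:
            return text           # no rule applied: string returned unchanged
    raise RuntimeError("rule set did not terminate")


# A repeating rule in the spirit of sample program w: squeeze runs of spaces.
def two_spaces(text: str):
    return re.search(r" {2,}", text)


def one_space(text: str, m) -> str:
    return text[:m.start()] + " " + text[m.end():]


collapse = [(two_spaces, one_space, False)]
hello = [(lambda text: True, lambda text, m: "Hello, world!", True)]

print(apply_rules(collapse, "a   b    c"))
print(apply_rules(hello, ""))
```

- The empty rule list behaves like the null program: the string comes back
- unchanged.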
-
- Sample programs:
-
- [the null program; as a function, it is the identity mapping]
- (()()()())
- -------------------
- [the canonical "display a greeting" program]
- (()()()(
- (,Hello<,> world!);
- ))
- -------------------
- [a function named w to replace strings of two or more spaces by a single space]
- (()()(0 1)(
- (<0> (ITR, )<1>,<0> <1>):
- )) w
- -------------------
- [a function named x to replace trailing spaces by tabs, for each eight columns]
- (()()(0 1)(
- ((and,<[8]>,<0> (ITR, ))<1>,<0>(^I)(x,<1>));
- ((and,<[8]>,<0>)<1>,<0>(x,<1>));
- )) x
- -------------------
- The first two may be placed in files by themselves, compiled and executed.
-
- Two programs can be used together, with one calling the other. In the second
- one below, the pattern 0, referenced as <:0:>, represents the more complicated
- composite (and,<[8]>,<0>), although it is used only once. This whole phrase
- is a pattern, and must be enclosed in parentheses to make a definition. There
- is no conflict between variable names and pattern or skeleton names; here 0
- is used in both senses.
-
- [replace 8-column tabs by spaces]
- (()()(0 1 2)(
- ((and,<[8]>,<0>(^I)<1>)<2>,(z,<0>)(y,<1><2>));
- ((and,<[8]>,<0>)<2>,<0>(y,<2>));
- (<0>(^I)<1>,(z,<0>)<1>):
- )) y
-
- [fill out tab space]
- ((
- ((and,<[8]>,<0>)) 0 [a pattern definition]
- )()(0)(
- (<:0:>,<0>); [a rule using the above definition]
- (,<=> ): [this rule appends 8 blanks to the text]
- )) z
-
-
- Skeletons can also be symbolized by single letters. The following program
- copies one file to another, which COPY could do more efficiently. (R)
- and (W) are read and write skeletons, respectively; (W) incorporates a
- variable that must have been defined by a pattern match.
-
- [read word, write word]
- (()()(0)(
- (<0>(^Z),(W));
- (<0>,(W)(R)):
- )) a
-
- [main program]
- (()(
- ((%R,<9>.ONE,(ITR, )<-->(or, ,(^Z)))) R
- ((%W,<9>.TWO,<0>)) W
- )(9)(
- (<9>(or, ,.,<>),<<
- >>(%Or,<9>.ONE)<<
- >>(%Ow,<9>.TWO)<<
- >>(a,(R)));
- ))
- :Patterns may have one of the forms
-
- xxx [constant: string of ASCII characters between
- ! and ~, except parentheses, angle brackets,
- comma and single and double quotes; it may
- include SP, CR, LF and HT]
- (<) (>) [constants: left and right angle brackets]
- <(> <)> [constants: left and right parentheses]
- <,> <'> <"> [constants: comma, single quote, double quote]
- (QUO/.../) [constant ... with delimiters /]
- (DEC,n) [the decimal byte n (mod 256)]
- (HEX,n) [the hexadecimal number n]
- (^xxx) [control characters xxx, each x in [@,_]]
- <()> [balanced parentheses]
- <:a:> [defined pattern a]
- <[n]> [interval of length n]
- <[-n]> [remaining text but the last n bytes]
- <[s]> [interval of length given by skeleton s]
- <--> [indefinite length interval]
- <n> [variable n, n between 0 and 30]
- <> [null string at the end of the text]
- (LAM,(v),p) [p with local variables given in list (v)]
-
- (AND,p1,p2,...,pn) [matches if all patterns p1,..pn match]
- (and,p1,p2,...,pn) [simplified version of AND]
- (OR,p1,p2,...,pn) [first of p1,...pn to match]
- (or,p1,p2,...,pn) [simplified version of OR]
- (NOT,p) [string does not match p]
- (ITR,p) [Repeat p as much as possible]
- (itr,p) [Repeat p as little as necessary]
- (IVL,m,n,) [lexicographic interval [m,n]; m, n constants]
- (ivl,s1,s2,...) [lexicographic intervals [s1,s2],... ]
- (^) [any control character below ASCII SP]
- (PWS) [print remaining workspace]
- (PWS,r) [print remaining workspace after message r]
- (PVR,n) [print value of variable n]
- (HLT) [wait for keystroke]
- (HLT,mssg) [print mssg, wait]
- (NOP,p) [null pattern, disables p]
- (f,s) [matches text identical to value of function
- f applied to argument s (another skeleton)]
- <<...>> [null pattern]
- :Skeletons may take any one of the following forms
-
- xxx [constant: string of ASCII characters between
- ! and ~, except parentheses, angle brackets,
- comma and single and double quotes; it may
- include SP, CR, LF and HT]
- (<) (>) [constants: left and right angle brackets]
- <(> <)> [constants: left and right parentheses]
- <,> <'> <"> [constants: comma, single quote, double quote]
- (QUO/.../) [constant ... between delimiters /]
- (DEC,n) [the decimal byte n (mod 256)]
- (HEX,n) [the hexadecimal number n]
- (^xxx) [control characters xxx]
- <n> [variable n, n between 0 and 30]
- (f) [function (skeleton) f, null argument]
- (f,s) [function f, argument s (another skeleton)]
- <=> [same text used in the last comparison, or
- argument of skeleton in a definition]
- <<...>> [null skeleton]
-
-
-
-
- (if,s{,p,s,s}[,p,s]) [conditional]
- (IF,(v),s{,p,s,s}[,p,s]) [conditional with variables]
- (nf,s{,p,s,s}[,p,s]) [negative conditional]
- (NF,(v),s{,p,s,s}[,p,s]) [negative conditional with variables]
- (while,s{,p,s,s}[,s]) [iterative]
- (WHILE,(v),s{,p,s,s}[,p,s]) [iterative with variables]
- (until,s{,p,s,s}[,s]) [iterative, negative form]
- (UNTIL,(v),s{,p,s,s}[,p,s]) [iterative with variables, negative form]
-
- :Comments enclosed in square brackets (which may be nested) may appear at any
- of the places indicated by periods in the following sample structure:
-
- .(.(.(P1)p1.(P2)p2.).(.(S1)s1.(S2)s2.).(v1 v2).(.(Pa,Sa);.(Pb,Sb):.).)x.
-
- They may NOT appear:
- - within patterns or skeletons,
- - between definitions or functions and their names,
- - between rules and the colon or semicolon following them.
-
- Spaces and tabs are allowed in either of the latter two places, as well as
- wherever comments are legal. No spaces are allowed within a list of
- variables, except single spaces used to separate variable numbers appearing
- in the list. Since all spaces, tabs, and similar characters are taken
- literally in patterns and skeletons, the pair <<...>> may be used to format
- programs into lines and columns; any text enclosed (as long as it does not
- contain a pair of right angle brackets) will be ignored. For example
-
- (and,<[8]>,<<
- >><0>)
- is identical to
- (and,<[8]>,<0>)
-
- :Program file.
- A Convert program will eventually have to be compiled and executed if it is to
- produce any practical results. The program responsible for first compiling it
- is CONVERT.REC, which requires a disk file containing the program, and which
- will produce a REC program to be executed. Both steps require having a REC
- compiler in the system, usually REC86.EXE, although REC86F.EXE and REC87.EXE
- may also be used; either of the latter two is required for the execution of
- Convert programs involving floating point operations. The disk file bearing
- a Convert program must have the extension CNV; the command line
-
- REC86 CONVERT SOURCE
-
- will compile the file SOURCE.CNV to produce SOURCE.REC, which can then be
- executed with the command line
-
- REC86 SOURCE [argument]
-
- where [argument] is an optional string received by program SOURCE as the
- initial text when its execution begins.
-
- Explicit disk assignments and pathnames can be given for any of the files
- mentioned in these command lines. REC and CNV extensions are always assumed.
-
- The execution of the compiled program requires the presence of a library,
- CNVLIB.REC, in the current directory or in any of the directories specified
- by the PATH environment variable. The command interpreter environment is
- available in variable <30>; it consists of a concatenation of NUL-terminated
- strings, each having the form VAR=value.
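-
- As an illustration of the layout of the environment string held in <30>, the
- following Python sketch splits such a block into names and values. The
- function name parse_environment and the sample values are hypothetical; only
- the NUL-terminated VAR=value layout comes from the description above.

```python
def parse_environment(block: str) -> dict:
    """Split a concatenation of NUL-terminated VAR=value strings."""
    env = {}
    for entry in block.split("\x00"):
        if "=" in entry:                      # skip the empty tail after the last NUL
            name, value = entry.split("=", 1)  # the value may itself contain '='
            env[name] = value
    return env


# Hypothetical sample block; a real environment will differ.
sample = "PATH=C:\\DOS;C:\\REC\x00COMSPEC=C:\\COMMAND.COM\x00"
print(parse_environment(sample))
```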
-
- When execution terminates, all open files are closed and whatever text
- results from application of the main program to the initial string is
- displayed on the console.
-
- Since a Convert program will be compiled into a REC program, it is convenient
- to require that it be laid out like a REC program:
-
- (subroutine 1) n1 (subroutine 2) n2 ... (main program)
-
- wherein subroutine definitions alternate with subroutine names, which are
- printing ASCII characters excluding the space and the 11 characters ( ) % &
- # ~ < > , @ and }. CONVERT will insert the appropriate braces to get a REC
- program, as well as a subroutine to initialize and load the required routines
- from the library.
-
-
-
- Certain comments enclosed in square brackets are processed by the compiler;
- in order for them to be noted the program should be structured according to
- the following scheme:
-
- [SOURCE.CNV]
- [Author, Date]
- [comments]
- [Exclude LIB]
- [Include ..]
-
- [subroutine 'a' descriptor]
- (()()()()) a
-
- [another subroutine]
- (()()()()) b
-
- [main program]
- (()()()())
-
-
- The comments which are processed are [xx.CNV], [Exclude LIB], and
- [Include ..].
-
- The header [SOURCE.CNV] should be upper case, and will be transformed into
- [SOURCE.REC] by the compiler. The header, together with the attribution of
- author and date, helps to identify the program and, through the date, its version.
- All this header material should be placed in balanced square brackets.
-
- The [Exclude LIB] comment indicates the program should not include the initial-
- izing routine; if appearing, it overrides any [Include ..] also present.
- [Include ..] indicates which routines must be loaded at run time in addition
- to those whose inclusion is automatically determined at compile time. Both
- types of comment are required only in special circumstances.
-
- The [Include ..] comment is only required by the root program in an overlay
- tree, by a program requiring floating point arithmetic, or by a program using
- the operators ^ or ** (for exponentiation) or % (for remainder) in arguments
- to the formula-evaluating skeleton (#f,s) and not using explicitly the skele-
- tons #^ or #%. For example, the comment [Include #.#^#%] is required for a
- program which would evaluate formulae containing floating point numbers and
- exponentiation and remainder operations, and invoking only the library function
- #f.
-
- The [Exclude LIB] comment is required by the overlaying segments of an overlay
- tree. More about Include and Exclude may be found in CNVADV.HLP.
-
- The subroutines and main program then follow. During compilation, the
- compiler displays for each one of them the names of patterns and skeletons
- they define, the variables they declare, the punctuation following each rule
- (: or ;), the name of the subroutine (if it is one), and a series of dots,
- one for each sector of compiled code written to the disk.
-
- :Library and Initialization. The library is contained in file CNVLIB.REC, which
- is sought and read during the initializing process. The library should reside
- in the current directory or any of the directories given by PATH unless the
- command line contains an argument having one of the following forms:
-
- L/d: L/pathname-ending-in-\ L/filename.ext
-
- The first and second forms indicate CNVLIB.REC resides on the specified disk
- or directory, respectively; the third form indicates that the file whose name
- is given (and which may include a disk identifier and subdirectory path)
- should be used as source for the library. No extension is assumed in the
- last form if one is not given explicitly.
-
- The argument L/ is transparent to the program, that is, it will neither appear
- in the program's initial text nor interfere with other arguments; for instance,
-
- REC86 DSASM L/A: WRTSYS
- REC86 DSASM WRTSYS L/A:
-
- both produce the same result, passing the string "WRTSYS" as initial text to
- the program being executed (DSASM.REC).
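-
- The transparency of the L/ argument can be pictured with the Python sketch
- below. It is an assumption-laden model, not the actual initializing routine:
- the function name split_library_argument is hypothetical, and joining the
- remaining arguments with single spaces is assumed rather than documented.

```python
from typing import List, Optional, Tuple


def split_library_argument(args: List[str]) -> Tuple[Optional[str], str]:
    """Return (library location or None, initial text seen by the program)."""
    library = None
    rest = []
    for arg in args:
        if library is None and arg.startswith("L/"):
            library = arg[2:]      # disk, directory, or file name for the library
        else:
            rest.append(arg)       # everything else reaches the program
    return library, " ".join(rest)


print(split_library_argument(["L/A:", "WRTSYS"]))
print(split_library_argument(["WRTSYS", "L/A:"]))
```

- Both calls yield the same pair, mirroring the two DSASM command lines above.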
- :Boolean patterns. One of the mechanisms for generating complex patterns
- from simpler constituents is to form Boolean combinations of patterns. The
- fundamental Boolean connectives AND, OR, and NOT may be used. In Convert, AND
- and OR are not binary operations, but rather may have any number of arguments;
- a Boolean function of a variable number of arguments has a standard definition,
- which follows the associative law and reduces to the binary function in that
- special case. Thus it is not capriciousness which requires that
-
- (AND) always matches (to the null string)
- (AND,p) is the same as p (matches if p matches)
- (AND,p1,p2,...) matches when all of the pj match
- (OR) never matches
- (OR,p) is the same as p (matches if p matches)
- (OR,p1,p2,...) matches if at least one pi matches.
-
- As is customary in many programming languages, Boolean combinations are
- executed progressively, so that no more arguments are evaluated than the
- minimum needed to reach a decision. The first failure in an AND, the first
- match in an OR decides the expression.
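-
- The same conventions appear in Python's all() and any(), which gives a quick
- way to check the empty and one-argument cases and the progressive evaluation.
- In this sketch patterns are reduced to plain predicates; variable binding and
- backtracking are deliberately left out, and the helper traced is hypothetical.

```python
from typing import Callable, List

Pattern = Callable[[str], bool]


def AND(*patterns: Pattern) -> Pattern:
    # all() stops at the first failure; all() of nothing is True, like (AND).
    return lambda text: all(p(text) for p in patterns)


def OR(*patterns: Pattern) -> Pattern:
    # any() stops at the first success; any() of nothing is False, like (OR).
    return lambda text: any(p(text) for p in patterns)


calls: List[str] = []


def traced(name: str, result: bool) -> Pattern:
    """Predicate that records its own evaluation (for demonstration only)."""
    def p(text: str) -> bool:
        calls.append(name)
        return result
    return p


AND(traced("a", False), traced("b", True))("x")   # "b" is never evaluated
OR(traced("c", True), traced("d", False))("x")    # "d" is never evaluated
print(calls)
```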
-
-
-
-
- Convert has an alternative series of Boolean functions, namely (and,...),
- (or,...), and (nor,...) [which is equivalent to (NOT,(or,...))]. They exist
- only for reasons of efficiency. The issue is that Convert combines variable
- matching with variable generation, rather than separating these two activities.
- This is in turn more efficient, but subject to logical paradoxes if not done
- correctly. Concretely, if a variable match fails, Convert will back up and
- retry the previous variable it identified, or will progress to the next
- alternative of a preceding OR. The lower case Boolean operators forsake all
- this backtracking in the interest of speed: the decision they reach is never
- reconsidered.
-
- Thus, lower case Boolean operators are to be used when the decision which
- they represent is final - for example if their arguments are constants. In
- practice, if the patterns (ITR,xxx) and (itr,xxx) are used, the searching
- required by a program can be assigned to them, and the lower case operators
- used exclusively. The upper case operators work correctly albeit more slowly.
-
- Combining variable generation with variable matching, as Convert does, restricts
- the participation of NOT in patterns; its arguments must not contain unbound
- variables.
-
-
-
- One of the most common uses of AND is to impose some condition on a variable,
- although it could also be used to parse a string in two or more different ways.
- For example, consider CP/M's directory entry from which we may wish to extract
- the file name and the extension, and ignore the rest of the block. The pattern
-
- (and,<[1]>(and,<[8]>,<1>)(and,<[3]>,<2>)<[20]>,<4>)
-
- will bind the whole 32-byte string to the variable <4>, but at the same
- time allow us to retrieve the file name as <1>, and the extension as <2>.
- Both uses of AND are shown in this example. The pattern
-
- (and,(ITR,(and,<[1]>,(nor, ,(^I)))),<0>)
-
- will match a string free of spaces or tabs - whitespace as some say - and
- associate it with the variable <0>. The inner "and" prevents the null string
- from matching the "nor". The first argument of an AND establishes the length
- of the string that the remaining arguments must match. Another example would
- be to identify a decimal number through
-
- (and,(ITR,(and,<[1]>,(IVL,0,9,))),<0>).
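-
- What the AND patterns of this section bind can be mimicked with ordinary
- slicing in Python. The sketch is illustrative only: the function names and
- the sample directory entry are hypothetical, and the 1 + 8 + 3 + 20 byte
- layout is taken from the directory pattern shown earlier.

```python
def split_directory_entry(entry: bytes):
    """Mimic (and,<[1]>(and,<[8]>,<1>)(and,<[3]>,<2>)<[20]>,<4>)."""
    if len(entry) != 32:
        raise ValueError("a directory entry is 32 bytes long")
    name = entry[1:9]        # <1>: the 8 bytes after the first byte
    ext = entry[9:12]        # <2>: the next 3 bytes
    return name, ext, entry  # <4>: the whole 32-byte string


def leading_digits(text: str) -> str:
    """Mimic (and,(ITR,(and,<[1]>,(IVL,0,9,))),<0>): longest run of digits."""
    i = 0
    while i < len(text) and "0" <= text[i] <= "9":
        i += 1
    return text[:i]


# Hypothetical entry: one leading byte, name, extension, 20 remaining bytes.
sample_entry = b"\x00" + b"CONVERT " + b"REC" + bytes(20)
print(split_directory_entry(sample_entry)[:2])
print(leading_digits("1990-10-28"))
```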
-
-
-
- The most apparent use of the Boolean OR is to express alternatives. Taken
- together with a mechanism to assign a name to a pattern, OR can be used to
- generate recursive patterns. Given the definition
-
- ((or, <:s:>,)) s
-
- we can define a series of spaces - which is either a space followed by the
- series, or the null string. This definition also makes use of the property,
- that the first viable alternative satisfies an OR. Alternatively,
-
- ((OR,, <:m:>)) m
-
- would be a pattern that rendered the null string unless a reconsideration
- were forced upon it. These two alternatives define respectively a maximal
- and a minimal string satisfying a condition defined by an OR. Equivalently,
-
- ((ITR, )) s
- ((itr, )) m
-
- Many other kinds of recursive definitions can be made with OR, but it is
- interesting to note that ITR and itr seem to be sufficient for the
- applications that have been encountered.
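-
- Readers who know regular expressions may find the maximal/minimal distinction
- familiar: it resembles greedy and non-greedy repetition. The Python sketch
- below is an analogy only and says nothing about how Convert implements ITR
- and itr.

```python
import re

text = "   x"

# Like (ITR, ): take as many spaces as possible.
maximal = re.match(r"( *)", text).group(1)

# Like (itr, ): take as few as necessary, which is none when nothing follows.
minimal = re.match(r"( *?)", text).group(1)

# A following constant forces reconsideration, so the minimal form must grow.
forced = re.match(r"( *?)x", text).group(1)

print(repr(maximal), repr(minimal), repr(forced))
```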
-
- An OR with a null terminal argument is a convenient way to express optional
- elements of a string. Consider
-
- ((or,+,-,)<:d:>(or,.<:d:>,)) f
-
- as a definition of a signed floating point number. The sign is entirely
- optional, as is the decimal point. Depending on whether <:d:> accepts the
- null string, <:f:> may or may not match a single isolated decimal point.
- If <:d:> rejects the null string, the definition would have to be modified
- to make 1. into an acceptable number.
-
- :Constant Patterns and Skeletons. There are two types of constants: default
- constants and distinguished constants. A default constant is any string
- including spaces, tabs, carriage returns, line feeds and printing ASCII
- characters (those between ! and ~) except any of the seven characters ( )
- < > , " and '. The first five exceptions have been reserved to delimit
- patterns and skeletons of diverse kinds; the last two play an exceptional role
- in REC and so must be given special treatment; it is the conflict with REC
- which keeps them from being used for quoting in Convert. The distinguished
- constants are described in the remainder of this section.
-
- The very delimiters themselves have to be quoted; for conciseness and to
- avoid reserving yet more symbols, we use them to quote each other:
-
- <(> left parenthesis
- <)> right parenthesis
- (<) left angle
- (>) right angle
- <,> comma
- <'> single quote
- <"> double quote
-
-
-
- It is not convenient to incorporate control characters directly into programs,
- because they interfere with printing the program for reference. The pattern
- (CTL,xxx) proved to be unsightly, so we use the form (^xxx), in analogy to
- the common representation ^X for a single control character. The characters
- allowed in the string xxx are those between @ and _; these bounds correspond
- to control characters whose values in ASCII are 0 and 31, respectively. Some
- useful control characters to remember are:
-
- (^I) horizontal tab
- (^Z) end of file
- (^MJ) carriage return, line feed
- (^[) escape
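-
- The correspondence between the letters in (^xxx) and control codes is simply
- "subtract the value of @". A short Python sketch makes this concrete; the
- function name control_chars is hypothetical.

```python
def control_chars(xxx: str) -> str:
    """Translate the letters of (^xxx) into the control characters they name."""
    out = []
    for c in xxx:
        if not "@" <= c <= "_":
            raise ValueError(f"{c!r} is not between @ and _")
        out.append(chr(ord(c) - ord("@")))   # @ -> 0, A -> 1, ..., _ -> 31
    return "".join(out)


print(repr(control_chars("I")))    # horizontal tab
print(repr(control_chars("MJ")))   # carriage return, line feed
```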
-
- Provision has been made for quoting a long string, too cumbersome to represent
- character by character using the foregoing conventions; we could write
-
- (QUO/.../)
-
- wherein / could be any character not occurring in the text ..., and terminates
- the string with its second appearance in the pattern.
-
-
-
- Two constants have been provided for recognizing (as patterns) or generating
- (as skeletons) bytes of binary data:
-
- (DEC,n) decimal byte n (mod 256)
- (HEX,k) the shortest byte string needed to represent
- the hex value k (including leading zeros in k)
-
- For example, (DEC,2035) represents a byte whose decimal value is 243, whereas
- (HEX,5C) represents a byte whose decimal value is 92. For hexadecimal strings
- with 3 or more digits, the string represented by (HEX,k) is machine-dependent;
- in microprocessors following Intel's convention, (HEX,05C) would represent two
- bytes, the first of which is 92 (decimal) and the second zero (the least
- significant byte in the lowest addressed location); on other microprocessors
- like the MC68000, (HEX,05C) would represent two bytes, the first one zero and
- the second one 92 (the most significant byte at the lowest address). In any
- case, the string represented is |_ (|k|+1)/2 _| bytes long.
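-
- The arithmetic behind these two constants can be checked with the Python
- sketch below. The function names are hypothetical; the byteorder argument
- stands in for the machine dependence described above ("little" for Intel's
- convention, "big" for the MC68000).

```python
def dec_constant(n: int) -> bytes:
    """(DEC,n): a single byte holding n modulo 256."""
    return bytes([n % 256])


def hex_constant(k: str, byteorder: str = "little") -> bytes:
    """(HEX,k): floor((len(k) + 1) / 2) bytes holding the hex value k."""
    length = (len(k) + 1) // 2
    return int(k, 16).to_bytes(length, byteorder)


print(dec_constant(2035)[0])               # 2035 mod 256
print(list(hex_constant("5C")))            # one byte
print(list(hex_constant("05C")))           # two bytes, least significant first
print(list(hex_constant("05C", "big")))    # two bytes, most significant first
```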
-
- :Input-Output skeletons exploiting MS-DOS's BDOS:
-
- (%Or,pathname) open file for reading
- (%Ow,pathname) open file for writing
- (%r) read from standard input
- (%r,pathname) read file
- (%r,pathname,pattern) read until match
- (%r,pathname,patt,skt) read, match, substitute
- (%r,pathname,patt,skt,skf) read, match, two options
- (%W,pathname,xxx) write file
- (%C,pathname) close file
- (%E) close all files
- (%Lr) get id of currently logged-in disk
- (%Lw,D) log in the given disk
- (%S,pathname) search
- (%A) search again
- (%D,pathname) delete
- (%N,new_pathname,old_pathname) rename
- (%T,xxx) type, preserve
- (%t,xxx) type, erase
- (%+) type CR,LF
- (%P,xxx) print, preserve
- (%p,xxx) print, erase
-
- The pattern-directed READ operation is worthy of attention. There are
- several forms of the READ function-skeleton %r,
-
- (%r) read from default
- (%r,pathname) read file
- (%r,pathname,pattern) read until match
- (%r,pathname,patt,skt) read, match, substitute
- (%r,pathname,patt,skt,skf) read, match, two options
-
- which are progressively more complex. Beginning with the first, the standard
- input is assigned, and a single line is delivered to the workspace with each
- invocation, without the terminating carriage return or line feed. The
- standard input is the keyboard unless modified by means of redirection by
- the command interpreter. This form gives a simple means of communication
- that can be especially helpful in the initial stages of program development.
-
- The next alternative in the sequence of complexity requires the programmer
- to assign a specific file. The latter can be a constant, or it can be a
- skeleton which evaluates into the name of the file. An example would be
- (%r,<7>:<8>.<9>), where the disk, file and extension have been determined
- separately and bound to variables 7, 8 and 9, respectively. (%r,) is the
- same as (%r).
-
- Just as in the null name case, a single line, without its terminator, is placed
- in the workspace with each execution of the skeleton. If the end of file has
- been reached, a control-Z is inserted and will be produced repeatedly each
- additional time an attempt is made to read the exhausted file. The file itself
- is buffered so that partial contents such as a single line can be read at will.
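-
- The line-at-a-time behaviour, including the repeated control-Z once the file
- is exhausted, can be modelled as follows. This is a Python sketch of the
- described behaviour only; the class name LineReader is hypothetical and an
- in-memory stream stands in for the disk file.

```python
import io


class LineReader:
    """Deliver one line per read, without its terminator; ^Z after the end."""

    def __init__(self, stream):
        self.stream = stream

    def read(self) -> str:
        line = self.stream.readline()
        if line == "":                 # end of file: report control-Z, every time
            return "\x1a"
        return line.rstrip("\r\n")     # strip the carriage return / line feed


reader = LineReader(io.StringIO("one\r\ntwo\r\n"))
print([reader.read() for _ in range(4)])
```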
-
- Sometimes data is not divided into lines, or it may be that the program finds
- it inconvenient to receive a whole line at a time. A binary file would typify
- the former case, the scrutiny of a file word by word or by sentences would
- exemplify the second. In these cases the third form of %r is useful.
-
- The third variant, to which a pattern has been adjoined, will search the
- input stream until the first instance of the pattern is found, which will
- then be given to the workspace. Using the skeleton again will locate the
- second instance of the pattern, and so on.
-
- If the file is exhausted and no match was obtained, all of the material read
- remains in the workspace; subsequent reads will leave the null string. Care
- in the selection of the pattern must be exercised to avoid the possibility
- of overflowing the workspace, which would cause program termination.
-
-
- The final two forms practically allow the incorporation of a whole
- preprocessor into the read command, since they allow any recognition and
- transformation that Convert is capable of expressing to be incorporated
- into the act of reading a file; furthermore this processing can be tailored
- individually for each file, and even for each instance of reading.
-
- In the next to the last case, %r leaves the value of skt if the pattern
- does match; otherwise all of the text (to the end of the file) is left. In
- the last case, skf is a skeleton which generates the text left in the
- workspace if the pattern does not match the text read.
-
- In these skeletons, if the null pattern is given [e.g. (%r,,,skt)], %r reads
- up to the next carriage return (or to the end of the file if no CR is found).
-
- Finally, patterns within READ skeletons requiring unbound variables for
- binding may be accommodated by including a list of variables (a list of one
- or more numbers between 0 and 30 separated by single blanks) following the
- alternative name %R:
-
- (%R,(v1 ...),pathname,pattern)
- (%R,(v1 ...),pathname,pattern,skt)
- (%R,(v1 ...),pathname,pattern,skt,skf)
-
- Skeletons performing writing operations vary as to the result they leave in
- the workspace. Skeletons %T and %P leave the value of their argument; %t and
- %p always erase it. In addition, %T and %t write CR and LF before sending
- their argument to the console, whereas %P, %p and %W write no more than the
- value of their arguments.
-
- If %W is not able to write on the indicated file due to lack of space, it will
- leave on the workspace the unwritten portion of its argument.
-
- (%Or,pathname) leaves the null string if it was possible to open the
- indicated file; it leaves the string "Not Found" if the file could not be
- opened.
-
- In functions %r, %W, %Or, %Ow and %C the value of the skeleton designating the
- file is not restricted to an actual disk file, in conformity with CP/M's
- conventions (inherited by MS-DOS) that there may be such devices as TTY: or
- EOF:. The assortment is not exactly the same, but it includes
-
- TTY: console keyboard MEM:X named memory pseudofile
- NUL: the null file ARR:X named array pseudofile
- CTR:X named counter STK:X named stack pseudofile
-
-
- The operations of reading via %r and writing via %W may be performed on
- these devices just as though they were disk files. They may be opened and
- closed by using %O and %C; in fact these operations are a necessity if these
- false devices are to be given a meaningful definition, or if they are to be
- removed from a program in an orderly fashion when they are no longer required.
-
- (%W,TTY:,skel) works like (%t,skel), but without the CR/LF included by %t in
- its operation. (%r,TTY:prompt) reads a line from the keyboard regardless of
- standard input redirection, and allows a user-defined prompt to be provided.
-
- Every write operation on NUL: simply erases its argument; every read from NUL:
- returns control-Z; that is, NUL: functions as a file to which one can "write"
- without limit but which is permanently exhausted when read. This pseudofile
- is useful in compiler and assembler construction, where for test purposes one
- may not want to produce, say, an object file. In this case a variable could
- be bound at the beginning of the program to NUL: or to a disk file, and this
- variable be used in all references to the object file.
-
- MEM:X, ARR:X and STK:X pseudofiles are dealt with in detail in CNVADV.HLP.
-
-
-
-
- To open and then read a counter requires only a name, of eight or fewer
- ASCII characters. Neither counters nor pseudofiles use extensions.
- To write a counter we have a series of parameters, whose tail may optionally
- be discarded at any point after the counter's name. Altogether, we can write
-
- (%Or,CTR:XXXXXXXX)
- (%W,CTR:XXXXXXXX,val,incr)
- (%r,CTR:XXXXXXXX)
-
- Argument val is the initial value; its default is 0; incr is the increment,
- which may be signed and whose default is +1; the assigned value is any
- (sixteen bit) integer which is modified in modulo-65536 (2^16) arithmetic. Both of the
- parameters are ASCII strings representing decimal numbers, introduced into
- the skeleton as constants or as other skeletons which evaluate into constants
- of the required form. Every time that a counter is read, its present value
- is reported, but the increment is added to its stored value so that it will
- appear at the next reading. In the language of "C", it is a postincrementing
- counter. When a counter is opened, it is assigned default value 0, increment 1.
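-
- The postincrementing behaviour can be sketched in Python as below. Several
- points are assumptions made for the sake of a runnable model: the class name
- Counter is hypothetical, the sixteen-bit value is treated as unsigned (0 to
- 65535) with wraparound modulo 65536, and the methods write and read stand in
- for (%W,CTR:...) and (%r,CTR:...).

```python
class Counter:
    """Model of a CTR: pseudofile; values travel as decimal ASCII strings."""

    MODULUS = 1 << 16                  # assumed sixteen-bit wraparound

    def __init__(self):
        self.value = 0                 # defaults assigned when opened
        self.incr = 1

    def write(self, val: str = "0", incr: str = "+1") -> None:
        self.value = int(val) % self.MODULUS
        self.incr = int(incr)          # the increment may be signed

    def read(self) -> str:
        current = self.value           # report the present value ...
        self.value = (self.value + self.incr) % self.MODULUS
        return str(current)            # ... the increment shows on the next read


c = Counter()
print(c.read(), c.read())
c.write("10", "-5")
print(c.read(), c.read())
```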
-
- :Directory Skeletons. It is possible to write a Convert program which will
- process a series of files, for example to pack several short files into
- a single large file. Later, with another similar program, they can be restored
- to their original condition. This process can be used advantageously to build
- up libraries, for example. Another use would be to search for a series of
- files and to offer the user to process each one in turn, interactively.
-
- To realize this sort of operation Convert offers access to the BDOS functions
- for directory and disk system access via the operators K and k in REC.
-
- (%S,pathname) initial search for file
- (%A) subsequent search for file
- (%D,pathname) delete specified files
- (%N,newpathname,oldpathname) rename file
- (%Lr) get id of currently logged-in disk
- (%Lw,D) log in the given disk
-
- Arguments for %S and %D may include the wildcard characters * and ?, allowing
- families of files to be referenced. The following scheme, showing a Main
- program and satellite, can be used as the basis of a Convert program which
- will process a series of files as given by a possibly ambiguous file reference
- in the initial command line.
-
- [Program heading, including Name and Comments]
-
- [the bulk of the program, accessed through a program called "y"]
- (()()()()) y
- ...
- [Gather directory entries in WS, removing status bytes]
- (()()(0 1)(
- (Not Found(^@)<0>,<0>);
- (<[9]><0>(^@)<1>,(%A)(^@)<0>(^@)<1>):
- )) z
-
- [Main program: search for first]
- (()()(8)(
- ((and,(or,<[1]>:,)(ITR,<-->\),<8>),(y,(z,(%S,<=>)(^@))));
- ))
-
- %S and %A return a string starting with nine bytes (containing attributes,
- file length, time stamp, etc.) and the base name of the file (name, period,
- extension). Thus the main program binds any disk identifier and subdirectory
- path to variable <8>, which then is available to any subroutine called as long
- as its variable list does not include 8. "Not Found" is returned whenever the
- search fails. (^@) --NUL-- is used to separate file names in "z".
-
- On arriving at the execution of "y" the workspace contains all the relevant
- file names --possibly none-- which were found by "z". It is advisable
- to gather them up all at once on entering the program, before the directory
- begins to change. This avoids the conflict of a new file having the same name
- as an old file.
-
- For other programs to use the workspace, this directory extract must be stored
- in a variable and parcelled out to the programs as they are required. It will
- generally be sufficient to call the following program "y" and interpose it
- between the actual program, "x", and the main program, which already calls it.
-
- [Get next name]
- (()()(0 1)(
- [separate next name] (<0>(^@)<1>,<<
- [process this file] >>(x,<8><0>)<<
- [rest of files] >><1>):
- )) y
-
- "y" rebuilds a full name by prefixing the filename (bound to variable 0) with
- the (possibly null) disk-subdirectory path bound to <8> by the main program.
- "x" receives a pathname on entry and normally leaves a null string when it
- finishes, but it may instead return some additional file names for processing.
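-
- The control flow of "y" can be pictured outside Convert. The following Python
- sketch is only an illustration and not part of Convert: process_all and record
- are hypothetical names, record stands in for "x", and the NUL character plays
- the role of (^@). It assumes that every name in the list ends with a NUL.
-
```python
NUL = "\0"

def process_all(workspace, prefix, handler):
    """Apply handler to each NUL-terminated name in workspace, in order.

    Mirrors the repeating rule (<0>(^@)<1>,(x,<8><0>)<1>): the first name is
    split off, handed to the handler with the path prefix, and whatever the
    handler returns is placed in front of the remaining names.
    """
    while NUL in workspace:
        name, rest = workspace.split(NUL, 1)   # <0>(^@)<1>
        extra = handler(prefix + name)         # (x,<8><0>)
        workspace = extra + rest               # result of x, then <1>
    return workspace                           # no rule applies: unchanged


seen = []

def record(path):
    """Hypothetical stand-in for "x": note the path, return a null string."""
    seen.append(path)
    return ""

names = NUL.join(["A.TXT", "B.TXT", ""])       # two NUL-terminated names
leftover = process_all(names, "C:\\WORK\\", record)
```
-
- In this model a handler that returns further names must terminate each of them
- with NUL, and a handler that returns names on every call never lets the loop
- finish.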
-
- :Arithmetic Skeletons. Convert provides facilities for integer and floating
- point arithmetic (through skeleton #) and for character "arithmetic" (through
- skeleton &). The latter facilitates upper/lower case conversions, hexadecimal
- dumps and access to individual bits or groups of bits within bytes.
-
- The file CNVADV.HLP contains a detailed description of the operations possible
- with these two library skeletons.
-
- :Conditional and iterative skeletons. The fundamental skeleton forms in
- Convert are constants, variables and functions. In principle it is sufficient
- to work with this combination because functions are readily defined and
- capable of defining any construction that one wants. Thus the motivation for
- introducing further skeleton forms would have to be to offer some frequently
- used function as an inherent feature of the language. Other motives would be
- to avoid the cumbersome ritual of defining a function if a particularly
- simple action were desired, or to avoid preparing an argument for a function
- and then finding that the argument would not be used after all.
-
- Most languages have facilities for the selection of alternatives (an IF
- statement) or for an orderly repetition of some activity (a DO or a WHILE
- statement). The convenience of these constructs is widely recognized. Convert
- offers the conditional skeletons if, IF, nf and NF, and the iterative
- skeletons while, WHILE, until and UNTIL, all of which are described in
- detail in CNVADV.HLP.
-
- :Variables are distinguished by decimal numbers in Convert. There is no
- theoretical limit to their range, but the practical range is 0-30 with the
- present structure of the underlying REC compiler. Few programs go beyond
- ten variables, many subsist with one or two, and it is possible to have a
- program without any variables at all.
-
- Even if a program binds no variables, the variable list must still appear as
- an empty pair of parentheses. The same is true of pattern and skeleton
- definitions, so that a program quadruple (()()()()) may contain some null
- parentheses.
-
- A program should not bind a variable which it has not declared - that is,
- if a previously undefined variable appears in the pattern of one of the
- rules of a program where it may become defined, it must be declared in the
- variable list. The variable list guarantees that the program has at its
- disposal new instances of the variables listed, and that those variables are
- unbound prior to trial of each rule in the rule set.
-
- A variable is designated by enclosing its number within angle brackets. <0>,
- <8>, or <15> are examples of variables. Variables are defined by patterns and
- used by skeletons. Convert combines variable matching and variable generation
- so that a variable can be defined and then matched within the same pattern.
-
-
- Generally speaking, variables are defined by the constants which surround
- them; thus v<0>e would assign the value ariabl to <0> when matched against
- the word variable, or a null value when matched to ve. Since the null string
- is accepted throughout Convert, a very common type of error arises from matches
- involving a null string in a way that the programmer had not foreseen. Since
- Convert uses other pattern forms than constants, a variable need not always be
- delimited by a constant. <[9]>(and,<[3]>,<0>)<[20]> could be used to pick the
- extension out of a CP/M-style directory entry, for example.
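-
- The two examples above can be modelled in a few lines of Python. This is a
- sketch under assumptions, not a description of the Convert matcher: the names
- bind_between and extension_of are hypothetical, the pattern is taken to be
- anchored at the start of the text, and the shortest value of <0> is assumed
- to be tried first.
-
```python
def bind_between(text, left, right):
    """Model the pattern left<0>right matched at the start of text.

    Returns the value bound to <0>, or None when the pattern fails.
    Text following the final constant is ignored.
    """
    if not text.startswith(left):
        return None
    end = text.find(right, len(left))   # shortest candidate for <0> first
    if end < 0:
        return None
    return text[len(left):end]


def extension_of(entry):
    """Model <[9]>(and,<[3]>,<0>)<[20]> on a 32-byte directory entry."""
    if len(entry) < 9 + 3 + 20:
        return None                     # a definite interval ran out of text
    return entry[9:12]
```
-
- Here bind_between("variable", "v", "e") gives "ariabl" and
- bind_between("ve", "v", "e") gives the null string, which is the unforeseen
- case the paragraph above warns about.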
-
- Persons who like long or arbitrary variable names will be disappointed with
- Convert. When CONVERT was built over a LISP substrate, it was possible to make
- arbitrary choices of variable names; this was because LISP contained a kind
- of preprocessor which parsed individual atoms and replaced them by their own
- address in a dictionary. A string parser does not necessarily want to isolate
- atoms, so another symbolism must be found.
-
- Other persons find that it is much easier to work with a very concise symbolism
- even though it means reusing the same few symbols over and over again in each
- different context. Thus one may see a complicated file of Convert programs all
- of which use the same series of variables <0>, <1>, ... up to the number that
- the given program requires. It is not hard to become accustomed to this way of
- thinking.
-
- :Intervals. There is a kind of pattern, which is technically a variable when
- giving a formal definition of the syntax of Convert, but which is not treated
- as a variable in this description because it is not assigned a name, and so
- can't be referred to again, either in the same pattern or in the paired
- skeleton of its rule. Of course, it is not lost entirely; it can be named by
- participating in an AND together with a named variable. We call these unnamed
- variables intervals, of which there are four kinds:
-
- <--> indefinite interval
- <[n]> interval of length n
- <[-n]> all but the last n bytes
- <> null interval (no more text in workspace)
-
- The indefinite interval simply allows us to skip over uninteresting parts of a
- text - for example an end-of-file embedded in a line of whatsoever length can
- be detected by <-->(^Z).
-
- Tabular information can be broken into columns by specifying an interval of a
- determined length. Such a pattern is often paired with a named variable through
- an and, as in (and,<[8]>,<0>). In this sense it is a predicate, describing some
- property of a string, and it is reasonable that it should be compounded using
- the Boolean AND.
-
- Easy access to the last n bytes of a string is obtained with <[-n]>, where
- n is an integer. A pattern such as <[-1]><0> would bind the very last byte
- to variable <0>; if we need to keep the text preceding the last byte we can
- always bind it to a variable through an and: (and,<[-1]>,<1>)<0> binds all
- but the last byte to <1> and the last byte to <0>. <[-n]> will of course
- fail if fewer than n bytes remain in the workspace.
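-
- As an illustration only, the effect of (and,<[-n]>,<1>)<0> can be written in
- Python. The name split_tail is hypothetical; the sketch assumes the pattern is
- applied to the whole remaining workspace.
-
```python
def split_tail(text, n):
    """Model (and,<[-n]>,<1>)<0>: return (<1>, <0>) or None on failure."""
    if n < 0 or len(text) < n:
        return None                  # fewer than n bytes remain
    cut = len(text) - n
    return text[:cut], text[cut:]    # all but the last n bytes, last n bytes
```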
-
- A null pattern will always match a null interval, but the null designator <>
- refers to something slightly different; it requires that it be matched to the
- entire remainder of the workspace, and that that remainder be null. Thus the
- rule
- (<>,there<'>s nothing here)
-
- will guarantee that the workspace is empty and say so; the rule
-
- (,goodbye)
-
- will ALWAYS succeed. It is a useful final rule to give a function a value.
-
- The reason that we have to insist explicitly on a null workspace has to do
- with a feature of the pattern matching process which usually simplifies
- programs and makes them run faster.
-
- A final UNBOUND VARIABLE, including <-->, will always match the entire
- remainder of the workspace. A final CONSTANT, including definite intervals
- and BOUND variables, will only require corresponding text, but will remain
- indifferent to any text or lack thereof which follows it. This is why null
- text will match anything, because "anything" always begins with a null
- string. Historically CONVERT programs spent far too much time seeking
- character by character for the end of the text when it corresponded to a
- final variable. Likewise, it is bothersome to append a final <--> to a
- pattern when only the beginning of the text interests us.
-
- Consider the two patterns
-
- <4><4> and <4><4><>
-
- The first pattern matches anything: assuming it is initially unbound, the
- first instance of <4> binds tentatively to the null string. Then, since
- the second instance of <4> is an instance of a BOUND variable, it will also
- match. No constraints are placed on the rest of the workspace, so the
- tentative binding becomes final. On the other hand, the second pattern
- matches only strings made up of two identical substrings: if the workspace
- text is non-null, the initial null-string trial value for the first instance
- of <4> causes <> to fail after the second instance of <4> matches, so that
- longer trial values must be attempted.
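-
- The difference between the two patterns can be reproduced with a small Python
- model. It is a sketch, not the Convert matcher: the name match_twice is
- hypothetical, and it assumes that trial values for <4> are taken from the
- start of the text, shortest first.
-
```python
def match_twice(text, anchored):
    """Model <4><4> (anchored=False) and <4><4><> (anchored=True).

    Returns the value bound to <4>, or None if no trial value works.
    """
    for k in range(len(text) // 2 + 1):
        value = text[:k]                  # trial value for the first <4>
        if text[k:2 * k] != value:
            continue                      # second <4> does not match
        if anchored and 2 * k != len(text):
            continue                      # <> finds text left over
        return value
    return None
```
-
- With the text abcabc the unanchored form succeeds at once with a null binding,
- while the anchored form goes on to bind abc.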
-
- There is another context in which the null-seeker <> is implicit. Suppose
- that we have defined balanced parentheses using the following two pattern
- definitions:
-
- [non-parenthesis] ((and,<[1]>,(nor,<(>,<)>))) n
- [balanced parenthesis] (<(>(ITR,(or,<:p:>,<:n:>))<)>) p
-
- and that we want to bind the variable <0> to the contents of a parenthesis
- pair. We would then define
-
- [parenthesis interior] ((and,<:p:>,<(><0><)>)) i
-
- where the second pattern in the "and" has implicitly the form <(><0><)><>,
- which ensures that the right parenthesis following <0> is the final right
- parenthesis that <:p:> picked out, not the first one to be found; the
- implicit <> is required by the definition of "and".
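-
- For comparison, the same three definitions can be written as a Python sketch.
- The names balanced_end and interior are hypothetical, and the code models only
- the behaviour described above (a group anchored at the start of the text); it
- is not a translation of how Convert evaluates ITR.
-
```python
def balanced_end(text, pos=0):
    """Model <:p:>: return the index just past a balanced group at pos.

    A group is "(" followed by any mix of nested groups and
    non-parenthesis characters, followed by ")". Returns None on failure.
    """
    if pos >= len(text) or text[pos] != "(":
        return None
    pos += 1
    while pos < len(text) and text[pos] != ")":
        if text[pos] == "(":
            pos = balanced_end(text, pos)     # nested <:p:>
            if pos is None:
                return None
        else:
            pos += 1                          # <:n:>, one character
    if pos >= len(text):
        return None                           # ran out before ")"
    return pos + 1


def interior(text):
    """Model <:i:>: the contents of the leading balanced group, or None."""
    end = balanced_end(text)
    if end is None:
        return None
    return text[1:end - 1]
```
-
- interior("(a(b)c)d") yields a(b)c: the closing parenthesis is the one that
- ends the balanced group, not the first one encountered.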
-
- :A different kind of interval is the lexicographic interval. This is a
- pattern that matches a string if it lies within an interval defined by
- two strings. The lexicographic ordering is that induced by the ASCII
- code. There are three lexicographic interval patterns:
-
- (IVL/x/y/) Interval defined by constant strings x and y
- (which do not contain the delimiter /).
- (ivl,s1,s2,...) Multiple intervals delimited pairwise by skeletons
- s1, s2, ... .
- (^) A single control character between NUL and US; this
- is equivalent to (ivl,(^@),(^_)).
-
- (IVL/x/y/) matches strings lying in the interval [x,y] (limits included);
- it will match the shortest string t satisfying x <= t <= y. If x is the
- null string, IVL matches the null string trivially; if y is the null string,
- IVL is interpreted to match the shortest string t satisfying x <= t.
-
- In (ivl,s1,s2,...) skeletons are taken by pairs and evaluated; if vj is the
- value of skeleton sj, the effect is the same as (or,(IVL/v1/v2/),...). If
- there is an odd number of skeletons, a null string is assumed for the second
- string of the last interval. Delimiters in ivl are always commas; the
- delimiter in IVL may be any character not in the interval bound strings.
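-
- One reading of these rules is sketched below in Python. The names ivl and
- ivl_multi are hypothetical, the skeletons of (ivl,...) are assumed to have
- been evaluated already, and candidate strings are taken from the start of the
- text, shortest first. Python compares strings by character code, which agrees
- with ASCII order for ASCII text.
-
```python
def ivl(text, low, high):
    """Model (IVL/low/high/): shortest leading t with low <= t <= high.

    A null high bound means "no upper limit". Returns t or None.
    """
    for k in range(len(text) + 1):
        t = text[:k]
        if low <= t and (high == "" or t <= high):
            return t
    return None


def ivl_multi(text, *bounds):
    """Model (ivl,s1,s2,...) on already evaluated skeleton values."""
    values = list(bounds)
    if len(values) % 2:
        values.append("")              # odd count: last interval is open
    for low, high in zip(values[0::2], values[1::2]):
        t = ivl(text, low, high)
        if t is not None:
            return t                   # behaves like (or,(IVL/../../),...)
    return None
```
-
- Under this reading ivl("42x", "0", "9") returns 4, and a null lower bound
- returns the null string for any text.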
-
- :The working part of a Convert quadruple is its rule set, the fourth member
- of the quadruple. There is not much point to a null rule set, but it is
- quite possible for a program to depend on a single rule. The set has the
- form
- (
- (p1,s1):
- (p2,s2);
- ...
- (pn,sn);
- )
-
- The outer parentheses are necessary; they define the set. The inner parentheses
- are also necessary, for they define the rule. It is a matter of one's personal
- preference as to where the external parentheses are located, but it is good
- programming practice to adopt a definite style and follow it, as it makes
- errors much easier to spot when reviewing a program. In contrast, once the
- rule has commenced, every space, tab, line feed or what not counts. The comma
- which separates the pattern from the skeleton is a prominent part of the rule
- and must be placed accurately. Commas which occur within the pattern are also
- crucial in their placement. Commas which are really constants and part of the
- text must be quoted: <,>.
-
-
- The colons or semicolons which immediately follow the rules are essential, and
- are an inheritance from REC. They determine whether rules are to be repeated,
- when the colon is used, or whether the program has terminated, signalled by the
- semicolon. Rules are tried out in sequence, from top to bottom, left to right
- as they are written on paper. Several short rules can occupy the same line if
- convenient. If no rule applies, the workspace is left unchanged. This is still
- another way that the execution of a program may terminate.
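-
- The three ways in which a rule set can finish are summarized by the following
- Python sketch. It is a model only: run_rules, strip_x and the triple
- (match, build, repeats) are hypothetical names, and the max_steps guard is an
- addition of the sketch which Convert itself does not have.
-
```python
def run_rules(rules, workspace, max_steps=1000):
    """Apply a rule set to workspace and return the resulting string.

    Each rule is (match, build, repeats): match(workspace) returns a dict of
    bindings or None; build(bindings) returns the new workspace; repeats is
    True for a rule ending in ":" and False for one ending in ";".
    """
    for _ in range(max_steps):          # guard is not part of Convert
        for match, build, repeats in rules:
            bindings = match(workspace)
            if bindings is None:
                continue                # try the next rule
            workspace = build(bindings)
            if not repeats:
                return workspace        # ";": value of the program
            break                       # ":": start again from the top
        else:
            return workspace            # no rule applied: unchanged
    raise RuntimeError("rule set did not terminate")


def strip_x(ws):
    """Hypothetical pattern x<0>: a leading "x" followed by anything."""
    return {"0": ws[1:]} if ws.startswith("x") else None

rules = [(strip_x, lambda b: b["0"], True)]     # (x<0>,<0>):
result = run_rules(rules, "xxxabc")             # leading x's are removed
```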
-
- There are some fine points to be considered in designing a rule set. Using a
- colon to produce an iterative transformation of the workspace supposes that
- the whole workspace is available for the transformation. If only a part of
- the workspace is to be transformed, the remainder must be preserved, and it
- is then preferable to make a recursive call even though it is to the same
- rule set. This makes counting the number of preserved fragments automatic.
-
- Since Convert is a pattern directed language, and the pattern is presented
- by example and not in some other way, the rules must be stated precisely.
- Outside the rule more freedom is permitted, allowing the rule set to be
- formatted according to what the programmer considers attractive. Comments,
- enclosed in square brackets, have no effect on the compilation and their
- liberal use will enhance the program's quality.
-
- :Debugging Aids. Convert offers several aids to debugging programs. As with
- any language, programs that don't contain errors don't have to be debugged.
- This may look like redundant advice, but it is sound. Good programming habits
- reduce errors. Although there are free-form aspects to Convert, it is still
- amenable to developing typical program formats and adhering to them. They
- promote good programming, and make lapses easier to detect. The patterns
-
- (PWS) [print remaining workspace]
- (PWS,mssg) [identify printed workspace with message]
- (HLT) [stop program, wait for keystroke]
- (HLT,mssg) [print message, then halt]
- (PVR,n) [print value of variable n]
- (NOP,p) [null pattern, disables p]
-
- are provided primarily for debugging. The rule
-
- ((PWS,subroutine: )(or),);
-
- placed at the beginning of a program allows tracing by showing the workspace as
- each subroutine is entered; also the evolution of the workspace as repetitive
- rules change it. If a given rule hangs up a calculation, it can be found
- by interspersing this same rule among the normal rules.
-
- On the skeleton side of a rule, the only debugging aids are the skeleton
- functions %t and %T which display their argument on the console (the former
- nulls its argument, the latter returns it unchanged). %T may be applied to a
- problematical skeleton to see what kind of a result it produces when evaluated.
- If a function is under suspicion, the combination
-
- (%T,(f,(%T,...)))
-
- will show both the argument and the result. To display messages without their
- remaining in the workspace, %t may be used:
-
- (%t,...)
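-
- The same tracing idiom exists in most languages. The Python analogue below is
- offered only as a comparison: trace_keep, trace_drop and shout are
- hypothetical names, and a list is used in place of the console so that the
- order of the displayed values can be inspected.
-
```python
def trace_keep(value, log=print):
    """Analogue of %T: show the argument, return it unchanged."""
    log(value)
    return value


def trace_drop(value, log=print):
    """Analogue of %t: show the argument, return a null string."""
    log(value)
    return ""


def shout(text):
    """Hypothetical function under suspicion."""
    return text.upper()

shown = []
# Corresponds to (%T,(f,(%T,...))): argument first, then result.
result = trace_keep(shout(trace_keep("abc", shown.append)), shown.append)
```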
-
- Selective printing suffices to find the majority of errors in Convert programs
- which are syntactically correct but poorly designed or simply wrong. There
- are occasional errors which are due to flaws in the support
- programs which Convert uses. These are gradually being caught and eliminated,
- but the possibility always exists that an error has occurred on this more
- fundamental level.
-
-
-
-
- There are certain possibilities for error arising out of the resemblance
- between the symbolism of REC and the symbolism of Convert, which are activated
- when a bit of source code passes directly from Convert to REC. For example, the
- notation <:0:> for a defined pattern reference can be miswritten <0:>,
- causing ":" to pass unchanged to REC; this may well provoke an unending loop.
-
- Another source of error lies in the use of double angle brackets to generate
- continuation lines for patterns or skeletons. Consider the code
-
- (pattern,<<
- >>skeleton 1<<
- >>skeleton 2>>
- >>);
-
- The double angle following skeleton 2 is reversed; such errors are common.
- Note also that the double angles are effective only within a pattern or within
- a skeleton; they cannot bridge across from one to the other. Thus,
-
- <<(pattern, skeleton)>>
-
- cannot be used to "comment out" a rule.
-
-
- If a program cannot be debugged with judicious use of print statements, and
- following it with DEBUG appears too formidable, it is useful to inspect the
- intermediate REC code. If sections appear more like Convert code than REC code,
- a parenthesis, angle bracket or quote is probably unbalanced. It should be
- located and corrected, and the compilation tried anew.
-
- It is possible to put much more selective print statements or comments in the
- REC intermediate, using the operator T or inserting 'message'TL at appropriate
- points. Of course, they will be lost when the intermediate is discarded in
- favor of a new one when the Convert source is corrected and recompiled, so one
- should not expend the effort to make elaborate insertions unless the problem
- is very difficult, the intermediate is renamed to save it, or one is willing
- to forego a new compilation until significant progress has been made in
- correcting the difficulties in the program.
-
- :Performance. As a very rough estimate, a REC program compiles into three times
- as many bytes as the source code. This factor is reduced according to the ratio
- of comments to program, and when large quantities of text are quoted. Comments
- result in no compiled code; quoted material goes over byte for byte plus a tiny
- overhead to load it onto the pushdown list at execution time.
-
- Convert programs typically produce 30% more REC object code than there was
- source code, again modified by the presence of comments and the ratio of
- constant patterns and skeletons to the use of variables and Boolean composites.
- To this one must add the initializing code inserted by the compiler, occupying
- about 1K. Once loaded and ready to execute, a Convert program takes up about
- 3.5 times as many bytes as in the source file, plus 2K of initializing code
- and between 2K and 12K of library routines, depending on the program's
- requirements as determined by its use of library calls.
-
- As a practical matter, the maximum size Convert program (in source) that a
- REC compiler can execute (without recourse to overlays) is about a quarter
- of the compilation area size at REC's disposal, which is about 23 or 24K
- for REC80 and up to 60K for REC86.
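-
- These rules of thumb can be put into a small calculation. The Python sketch
- below is an illustration with hypothetical names (loaded_size, max_source); it
- assumes that the figures of 23 or 24K and 60K refer to the compilation area
- itself, which is one reading of the sentence above, and the results are
- estimates rather than measurements.
-
```python
K = 1024

def loaded_size(source_bytes, library_bytes=2 * K):
    """Rough loaded size: 3.5 x source + 2K initialization + library."""
    if not 2 * K <= library_bytes <= 12 * K:
        raise ValueError("library size is quoted as between 2K and 12K")
    return 3.5 * source_bytes + 2 * K + library_bytes


def max_source(compile_area_bytes):
    """Rule of thumb: source may be about a quarter of the compile area."""
    return compile_area_bytes / 4
```
-
- For example, a 4K source file with the smallest library comes to roughly 18K
- when loaded, and a 60K compilation area suggests a source limit near 15K.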
-
-
-
-
- Many of the simple programs that can be written in Convert can also be written
- in assembly language, and the amount of resultant code compared. In very
- general terms, the factor ranges between ten and twenty, tending to twenty.
- A similar inflation is experienced in some other "high level" languages - the
- "C" compiler for example. The reason for this inflation is readily found in
- the formal expansion of certain structures which have to be fairly intricate
- to handle general cases and can be much reduced in particular instances.
-
- Programming time in Convert can be extremely short - minutes in some cases. The
- inflation in program size is fully recovered when considering development time
- for a program.
-
- Execution time tends to follow program size. Programs written in Convert to
- process files can handle them at a rate on the order of lines per second:
- slow compared to good assembly language programming, not unacceptable in an
- absolute sense, and vastly faster than doing the same job manually through an
- editor.
-
- :Bibliography.
- Convert is a string-oriented adaptation of the LISP-based CONVERT in
- Adolfo Guzman and Harold V. McIntosh
- "CONVERT"
- Communications of the ACM 9 604-615 (1966).
-
- Articles on string Convert are
- Gerardo Cisneros and Harold V. McIntosh
- "Introduction to the programming language Convert"
- SIGPLAN Notices 21 #4 (Apr) 48-57 (1986)
- [Translated from "Introduccion al lenguaje de
- programacion Convert", Acta Mex. de Ciencia y
- Tecnologia 3 #9 (Ene-Mar) 65-74 (1985)]
-
- Harold V. McIntosh and Gerardo Cisneros
- "The Programming Languages REC and Convert"
- SIGPLAN Notices 25 #7 (Jul) 81-94 (1990)
-
- [CNVRT.HLP]
- [Harold V. McIntosh, 11 March 1984]
- [Rev.: G. Cisneros, 23 January 1986]
- [Rev. for MS-DOS: G. Cisneros, 25 September 1990]
- :[end]