Turbo Toolbox




Text File | 1990-10-28 | 50.1 KB | 1,114 lines

    Program structure
    Patterns
    Skeletons
    Comments
    Program File
    Library and Initialization
    Boolean Patterns
    Constant Patterns and Skeletons
    Input-Output Skeletons
    Directory Skeletons
    Arithmetic Skeletons
    Conditional and Iterative Skeletons
    Variables
    Intervals
    Lexicographic Intervals
    Rule Set
    Debugging Aids
    Performance
    Bibliography

:Structure of a Convert program.

A Convert program has the format

    ((p)(s)(v)(r)) x

where

    (p) is a list of pattern definitions ((P1) p1 (P2) p2 ...)
        (Pj a pattern, pj its name)
    (s) is a list of skeleton definitions ((S1) s1 (S2) s2 ...)
        (Sj a skeleton, sj its name)
    (v) is a list of variables (v1 v2 ...)
        (each vi a number between 0 and 30, a single space separating
        variables in the list; no space whatever if no variables are
        declared)
    (r) is a list of rules, each having one of two possible forms:
        (pattern,skeleton):    [for a repeating rule]
        (pattern,skeleton);    [for a terminal rule]
    x   is the name of the program (null for the main program, which
        must appear last)

Every Convert program is a function which maps a string into another string; this mapping is performed by the list of rules. Application of the list of rules to a string proceeds as follows:

  - Rules are tried in order, from first to last.
  - Trying a rule means trying to match the pattern to the string. If there is a match, the rule is said to apply and the string is replaced by whatever text is produced by the skeleton.
  - If a rule ending in a colon applies, the list is tried from the beginning on the new string; if a rule ending in a semicolon applies, the string produced by the corresponding skeleton is the value of the function.
  - If no rule applies (none of the patterns match), the string remains unchanged and is returned as the value of the function.

Thus patterns are predicates which determine whether a string has a certain form or not, while skeletons are string-valued entities.
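The rule-application procedure described above can be modeled outside Convert. The following is a minimal Python sketch, not Convert code and not the compiler's actual implementation. The names apply_rules, match_spaces and W_RULES are hypothetical, the max_restarts guard is an added safety assumption, and a Python regular expression stands in for the Convert pattern of the sample function w (collapse runs of two or more spaces).

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

Bindings = Dict[int, str]                      # variable number -> bound text
Pattern = Callable[[str], Optional[Bindings]]  # None means "no match"
Skeleton = Callable[[Bindings], str]           # builds the replacement string
Rule = Tuple[Pattern, Skeleton, bool]          # True = ':' rule, False = ';' rule


def apply_rules(rules: List[Rule], text: str, max_restarts: int = 10_000) -> str:
    """Apply a rule list to text in the manner described above.

    max_restarts is an assumption of this sketch: it only guards against
    rule sets that never terminate.
    """
    for _ in range(max_restarts):
        for pattern, skeleton, repeating in rules:
            bindings = pattern(text)
            if bindings is None:
                continue              # rule does not apply, try the next one
            text = skeleton(bindings)
            if not repeating:
                return text           # ';' rule: this is the function value
            break                     # ':' rule: restart from the first rule
        else:
            return text               # no rule applied: string is unchanged
    raise RuntimeError("rule set did not terminate")


def match_spaces(text: str) -> Optional[Bindings]:
    """Regex stand-in for a pattern that finds two or more spaces."""
    m = re.fullmatch(r"(.*?) {2,}(.*)", text, flags=re.DOTALL)
    return None if m is None else {0: m.group(1), 1: m.group(2)}


# One repeating rule: rejoin the two bound pieces with a single space.
W_RULES: List[Rule] = [(match_spaces, lambda b: b[0] + " " + b[1], True)]

if __name__ == "__main__":
    print(apply_rules(W_RULES, "a   b    c"))
```

An empty rule list behaves like the null program: the input comes back unchanged.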
Sample programs:

[the null program; as a function, it is the identity mapping]
(()()()())
-------------------
[the canonical "display a greeting" program]
(()()()(
  (,Hello<,> world!);
))
-------------------
[a function named w to replace strings of two or more spaces by a single space]
(()()(0 1)(
  (<0>  (ITR, )<1>,<0> <1>):
)) w
-------------------
[a function named x to replace trailing spaces by tabs, for each eight columns]
(()()(0 1)(
  ((and,<[8]>,<0> (ITR, ))<1>,<0>(^I)(x,<1>));
  ((and,<[8]>,<0>)<1>,<0>(x,<1>));
)) x
-------------------

The first two may be placed in files by themselves, compiled and executed.

Two programs can be used together, with one calling the other. In the second one below, the pattern named 0, referenced as <:0:>, stands for the more complicated composite (and,<[8]>,<0>), although it is used only once. This whole phrase is a pattern, and must be enclosed in parentheses to make a definition. There is no conflict between variable names and pattern or skeleton names; here 0 is used in both senses.

[replace 8-column tabs by spaces]
(()()(0 1 2)(
  ((and,<[8]>,<0>(^I)<1>)<2>,(z,<0>)(y,<1><2>));
  ((and,<[8]>,<0>)<2>,<0>(y,<2>));
  (<0>(^I)<1>,(z,<0>)<1>):
)) y

[fill out tab space]
((
  ((and,<[8]>,<0>)) 0    [a pattern definition]
)()(0)(
  (<:0:>,<0>);           [a rule using the above definition]
  (,<=>        ):        [this rule appends 8 blanks to the text]
)) z

Skeletons can also be symbolized by single letters. The following program copies one file to another, which COPY could do more efficiently. (R) and (W) are read and write skeletons, respectively; (W) incorporates a variable that must have been defined by a pattern match.

[read word, write word]
(()()(0)(
  (<0>(^Z),(W));
  (<0>,(W)(R)):
)) a

[main program]
(()(
  ((%R,<9>.ONE,(ITR, )<-->(or, ,(^Z)))) R
  ((%W,<9>.TWO,<0>)) W
)(9)(
  (<9>(or, ,.,<>),<< >>(%Or,<9>.ONE)<< >>(%Ow,<9>.TWO)<< >>(a,(R)));
))

:Patterns may have one of the forms

xxx                 [constant: string of ASCII characters between !
                    and ~, except parentheses, angle brackets, comma and
                    single and double quotes; it may include SP, CR, LF
                    and HT]
(<) (>)             [constants: left and right angle brackets]
<(> <)>             [constants: left and right parentheses]
<,> <'> <">         [constants: comma, single quote, double quote]
(QUO/.../)          [constant ... with delimiters /]
(DEC,n)             [the decimal byte n (mod 256)]
(HEX,n)             [the hexadecimal number n]
(^xxx)              [control characters xxx, each x in [@,_]]
<()>                [balanced parentheses]
<:a:>               [defined pattern a]
<[n]>               [interval of length n]
<[-n]>              [remaining text but the last n bytes]
<[s]>               [interval of length given by skeleton s]
<-->                [indefinite length interval]
<n>                 [variable n, n between 0 and 30]
<>                  [null string at the end of the text]
(LAM,(v),p)         [p with local variables given in list (v)]
(AND,p1,p2,...,pn)  [matches if all patterns p1,..pn match]
(and,p1,p2,...,pn)  [simplified version of AND]
(OR,p1,p2,...,pn)   [first of p1,...pn to match]
(or,p1,p2,...,pn)   [simplified version of OR]
(NOT,p)             [string does not match p]
(ITR,p)             [Repeat p as much as possible]
(itr,p)             [Repeat p as little as necessary]
(IVL,m,n,)          [lexicographic interval [m,n]; m, n constants]
(ivl,s1,s2,...)     [lexicographic intervals [s1,s2],... ]
(^)                 [any control character below ASCII SP]
(PWS)               [print remaining workspace]
(PWS,r)             [print remaining workspace after message r]
(PVR,n)             [print value of variable n]
(HLT)               [wait for keystroke]
(HLT,mssg)          [print mssg, wait]
(NOP,p)             [null pattern, disables p]
(f,s)               [matches text identical to value of function f
                    applied to argument s (another skeleton)]
<<...>>             [null pattern]

:Skeletons may take any one of the following forms

xxx                 [constant: string of ASCII characters between ! and ~,
                    except parentheses, angle brackets, comma and single
                    and double quotes; it may include SP, CR, LF and HT]
(<) (>)             [constants: left and right angle brackets]
<(> <)>             [constants: left and right parentheses]
<,> <'> <">         [constants: comma, single quote, double quote]
(QUO/.../)          [constant ...
                    between delimiters /]
(DEC,n)             [the decimal byte n (mod 256)]
(HEX,n)             [the hexadecimal number n]
(^xxx)              [control characters xxx]
<n>                 [variable n, n between 0 and 30]
(f)                 [function (skeleton) f, null argument]
(f,s)               [function f, argument s (another skeleton)]
<=>                 [same text used in the last comparison, or argument
                    of skeleton in a definition]
<<...>>             [null skeleton]
(if,s{,p,s,s}[,p,s])           [conditional]
(IF,(v),s{,p,s,s}[,p,s])       [conditional with variables]
(nf,s{,p,s,s}[,p,s])           [negative conditional]
(NF,(v),s{,p,s,s}[,p,s])       [negative conditional with variables]
(while,s{,p,s,s}[,s])          [iterative]
(WHILE,(v),s{,p,s,s}[,p,s])    [iterative with variables]
(until,s{,p,s,s}[,s])          [iterative, negative form]
(UNTIL,(v),s{,p,s,s}[,p,s])    [iterative with variables, negative form]

:Comments enclosed in square brackets (which may be nested) may appear at any of the places indicated by periods in the following sample structure:

    .(.(.(P1)p1.(P2)p2.).(.(S1)s1.(S2)s2.).(v1 v2).(.(Pa,Sa);.(Pb,Sb):.).)x.

They may NOT appear:

  - within patterns or skeletons,
  - between definitions or functions and their names,
  - between rules and the colon or semicolon following them.

Spaces and tabs are allowed in any of the latter two categories, besides places where comments are legal. No spaces are allowed within a list of variables, except single spaces used to separate variable numbers appearing in the list.

Since all spaces, tabs, and similar characters are taken literally in patterns and skeletons, the pair <<...>> may be used to format programs into lines and columns; any text enclosed (as long as it does not contain a pair of right angle brackets) will be ignored. For example

    (and,<[8]>,<< >><0>)

is identical to

    (and,<[8]>,<0>)

:Program file.

A Convert program will eventually have to be compiled and executed if it is to produce any practical results.
The program responsible for first compiling it is CONVERT.REC, which requires a disk file containing the program, and which will produce a REC program to be executed. Both steps require having a REC compiler in the system, usually REC86.EXE, although REC86F.EXE and REC87.EXE may also be used; either of the latter two is required for the execution of Convert programs involving floating point operations. The disk file bearing a Convert program must have the extension CNV; the command line REC86 CONVERT SOURCE will compile the file SOURCE.CNV to produce SOURCE.REC, which can then be executed with the command line REC86 SOURCE [argument] where [argument] is an optional string received by program SOURCE as the initial text when its execution begins. Explicit disk assignments and pathnames can be given for any of the files mentioned in these command lines. REC and CNV extensions are always assumed. The execution of the compiled program requires the presence of a library, CNVLIB.REC, in the current directory or in any of the directories specified by the PATH environment variable. The command interpreter environment is available in variable <30>; it consists of a concatenation of NUL-terminated strings, each having the form VAR=value. When execution terminates, all open files are closed and whatever text results from application of the main program to the initial string is displayed on the console. Since a Convert program will be compiled into a REC program, it is convenient to require that it be laid out like a REC program: (subroutine 1) n1 (subroutine 2) n2 ... (main program) wherein subroutine definitions alternate with subroutine names, which are printing ASCII characters excluding the space and the 11 characters ( ) % & # ~ < > , @ and }. CONVERT will insert the appropriate braces to get a REC program, as well as a subroutine to initialize and load the required routines from the library. 
Certain comments enclosed in square brackets are processed by the compiler; in order for them to be noted the program should be structured according to the following scheme:

    [SOURCE.CNV]
    [Author, Date]
    [comments]
    [Exclude LIB]
    [Include ..]

    [subroutine 'a' descriptor]
    (()()()()) a

    [another subroutine]
    (()()()()) b

    [main program]
    (()()()())

The comments which are processed are [xx.CNV], [Exclude LIB], and [Include ..]. The header [SOURCE.CNV] should be upper case, and will be transformed into [SOURCE.REC] by the compiler. The header, together with the attribution of author and date, helps to identify the program and, through the date, its version. All this header material should be placed in balanced square brackets.

The [Exclude LIB] comment indicates the program should not include the initializing routine; if appearing, it overrides any [Include ..] also present. [Include ..] indicates which routines must be loaded at run time in addition to those whose inclusion is automatically determined at compile time. Both types of comment are required only in special circumstances.

The [Include ..] comment is only required by the root program in an overlay tree, by a program requiring floating point arithmetic, or by a program using the operators ^ or ** (for exponentiation) or % (for remainder) in arguments to the formula-evaluating skeleton (#f,s) and not using explicitly the skeletons #^ or #%. For example, the comment [Include #.#^#%] is required for a program which would evaluate formulae containing floating point numbers and exponentiation and remainder operations, and invoking only the library function #f.

The [Exclude LIB] comment is required by the overlaying segments of an overlay tree. More about Include and Exclude may be found in CNVADV.HLP.

The subroutines and main program then follow.
During compilation, the compiler displays for each one of them the names of patterns and skeletons they define, the variables they declare, the punctuation following each rule (: or ;), the name of the subroutine (if it is one), and a series of dots, one for each sector of compiled code written to the disk.

:Library and Initialization.

The library is contained in file CNVLIB.REC, which is sought and read during the initializing process. The library should reside in the current directory or any of the directories given by PATH unless the command line contains an argument having one of the following forms:

    L/d:
    L/pathname-ending-in-\
    L/filename.ext

The first and second forms indicate CNVLIB.REC resides on the specified disk or directory, respectively; the third form indicates that the file whose name is given (and which may include a disk identifier and subdirectory path) should be used as source for the library. No extension is assumed in the last form if one is not given explicitly.

The argument L/ is transparent to the program, that is, it will neither appear in the program's initial text nor interfere with other arguments; for instance,

    REC86 DSASM L/A: WRTSYS
    REC86 DSASM WRTSYS L/A:

both produce the same result, passing the string "WRTSYS" as initial text to the program being executed (DSASM.REC).

:Boolean patterns.

One of the mechanisms for generating complex patterns from simpler constituents is to form Boolean combinations of patterns. The fundamental Boolean connectives AND, OR, and NOT may be used. In Convert, AND and OR are not binary operations, but rather may have any number of arguments; a Boolean function of a variable number of arguments has a standard definition, which follows the associative law and reduces to the binary function in that special case. Thus it is not capriciousness which requires that

    (AND)             always matches (to the null string)
    (AND,p)           is the same as p (matches if p matches)
    (AND,p1,p2,...)
                      matches when all of the pj match
    (OR)              never matches
    (OR,p)            is the same as p (matches if p matches)
    (OR,p1,p2,...)    matches if at least one pi matches.

As is customary in many programming languages, Boolean combinations are executed progressively, so that no more arguments are evaluated than the minimum needed to reach a decision. The first failure in an AND, or the first match in an OR, decides the expression.

Convert has an alternative series of Boolean functions, namely (and,...), (or,...), and (nor,...) [which is equivalent to (NOT,(or,...))]. They exist only for reasons of efficiency. The issue is that Convert combines variable matching with variable generation, rather than separating these two activities. This is in turn more efficient, but subject to logical paradoxes if not done correctly. Concretely, if a variable match fails, Convert will back up and retry the previous variable it identified, or will progress to the next alternative of a preceding OR.

In using the lower case Boolean operators we forsake all this jockeying in the interest of speed, but it is forsaken nevertheless. Thus, lower case Boolean operators are to be used when the decision which they represent is final, for example if their arguments are constants. In practice, if the patterns (ITR,xxx) and (itr,xxx) are used, the searching required by a program can be assigned to them, and the lower case operators used exclusively. The upper case operators work correctly, albeit more slowly.

Combining variable generation with variable matching, as Convert does, restricts the participation of NOT in patterns; its arguments must not contain unbound variables.

One of the most common uses of AND is to impose some condition on a variable, although it could also be used to parse a string in two or more different ways. For example, consider CP/M's directory entry from which we may wish to extract the file name and the extension, and ignore the rest of the block.
The pattern

    (and,<[1]>(and,<[8]>,<1>)(and,<[3]>,<2>)<[20]>,<4>)

will bind the whole 32-byte string to the variable <4>, but at the same time allow us to retrieve the file name as <1>, and the extension as <2>. Both uses of AND are shown in this example.

The pattern

    (and,(ITR,(and,<[1]>,(nor, ,(^I)))),<0>)

will match a string free of spaces or tabs (whitespace, as some say) and associate it with the variable <0>. The inner "and" prevents the null string from matching the "nor". The first argument of an AND establishes the length of the string that the remaining arguments must match. Another example would be to identify a decimal number through

    (and,(ITR,(and,<[1]>,(IVL,0,9,))),<0>).

The most apparent use of the Boolean OR is to express alternatives. Taken together with a mechanism to assign a name to a pattern, OR can be used to generate recursive patterns. Given the definition

    ((or, <:s:>,)) s

we can define a series of spaces, which is either a space followed by the series, or the null string. This definition also makes use of the property that the first viable alternative satisfies an OR. Alternatively,

    ((OR,, <:m:>)) m

would be a pattern that rendered the null string unless a reconsideration were forced upon it. These two alternatives define respectively a maximal and a minimal string satisfying a condition defined by an OR. Equivalently,

    ((ITR, )) s
    ((itr, )) m

Many other kinds of recursive definitions can be made with OR, but it is interesting to note that ITR and itr seem to be sufficient for the applications that have been encountered.

An OR with a null terminal argument is a convenient way to express optional elements of a string. Consider

    ((or,+,-,)<:d:>(or,.<:d:>,)) f

as a definition of a signed floating point number. The sign is entirely optional, as is the decimal point. According to whether <:d:> accepts the null string or not, <:f:> could match or not a single isolated decimal point. If it does not, we would have to modify the definition to make 1.
into an acceptable number. :Constant Patterns and Skeletons. There are two types of constants: default constants and distinguished constants. A default constant is any string including spaces, tabs, carriage returns, line feeds and printing ASCII characters (those between ! and ~) except any of the seven characters ( ) < > , " and '. The first five exceptions have been reserved to delimit patterns and skeletons of diverse kinds; the last two play an exceptional role in REC and so must be given special treatment; it is the conflict with REC which keeps them from being used for quoting in Convert. The distinguished constants are described in the remainder of this section. The very delimiters themselves have to be quoted; for conciseness and to avoid reserving yet more symbols, we use them to quote each other: <(> left parenthesis <)> right parenthesis (<) left angle (>) right angle <,> comma <'> single quote <"> double quote It is not convenient to incorporate control characters directly into programs, because they interfere with printing the program for reference. The pattern (CTL,xxx) proved to be unsightly, so we use the form (^xxx), in analogy to the common representation ^X for a single control character. The characters allowed in the string xxx are those between @ and _; these bounds correspond to control characters whose values in ASCII are 0 and 31, respectively. Some useful control characters to remember are: (^I) horizontal tab (^Z) end of file (^MJ) carriage return, line feed (^[) escape Provision has been made for quoting a long string, too cumbersome to represent character by character using the foregoing conventions; we could write (QUO/.../) wherein / could be any character not occurring in the text ..., and terminates the string with its second appearance in the pattern. 
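The (^xxx) notation described above maps each character between @ and _ onto the ASCII control codes 0 through 31. A minimal Python sketch of that mapping follows; the function name control_chars is hypothetical, and the code only models the arithmetic, not the Convert compiler.

```python
def control_chars(letters: str) -> str:
    """Return the control characters named by the letters of a (^xxx) form.

    Each letter must lie between '@' and '_'; its control code is its
    ASCII value minus the ASCII value of '@' (64).
    """
    result = []
    for ch in letters:
        if not "@" <= ch <= "_":
            raise ValueError(f"{ch!r} is outside the range @ .. _")
        result.append(chr(ord(ch) - ord("@")))
    return "".join(result)


if __name__ == "__main__":
    # (^I), (^Z), (^MJ) and (^[) from the list of useful control characters
    for name in ("I", "Z", "MJ", "["):
        print(name, [ord(c) for c in control_chars(name)])
```

Under this mapping (^MJ) yields the two bytes 13 and 10, which is the carriage return, line feed pair mentioned above.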
Two constants have been provided for recognizing (as patterns) or generating (as skeletons) bytes of binary data:

    (DEC,n)    decimal byte n (mod 256)
    (HEX,k)    the shortest byte string needed to represent the hex
               value k (including leading zeros in k)

For example, (DEC,2035) represents a byte whose decimal value is 243, whereas (HEX,5C) represents a byte whose decimal value is 92. For hexadecimal strings with 3 or more digits, the string represented by (HEX,k) is machine-dependent; in microprocessors following Intel's convention, (HEX,05C) would represent two bytes, the first of which is 92 (decimal) and the second zero (the least significant byte in the lowest addressed location); on other microprocessors like the MC68000, (HEX,05C) would represent two bytes, the first one zero and the second one 92 (the most significant byte at the lowest address). In any case, the string represented is floor((|k|+1)/2) bytes long, where |k| is the number of hexadecimal digits in k.

:Input-Output skeletons exploiting MS-DOS's BDOS:

    (%Or,pathname)                  open file for reading
    (%Ow,pathname)                  open file for writing
    (%r)                            read from standard input
    (%r,pathname)                   read file
    (%r,pathname,pattern)           read until match
    (%r,pathname,patt,skt)          read, match, substitute
    (%r,pathname,patt,skt,skf)      read, match, two options
    (%W,pathname,xxx)               write file
    (%C,pathname)                   close file
    (%E)                            close all files
    (%Lr)                           get id of currently logged-in disk
    (%Lw,D)                         log in the given disk
    (%S,pathname)                   search
    (%A)                            search again
    (%D,pathname)                   delete
    (%N,new_pathname,old_pathname)  rename
    (%T,xxx)                        type, preserve
    (%t,xxx)                        type, erase
    (%+)                            type CR,LF
    (%P,xxx)                        print, preserve
    (%p,xxx)                        print, erase

The pattern-directed READ operation is worthy of attention. There are several forms of the READ function-skeleton %r,

    (%r)                            read from default
    (%r,pathname)                   read file
    (%r,pathname,pattern)           read until match
    (%r,pathname,patt,skt)          read, match, substitute
    (%r,pathname,patt,skt,skf)      read, match, two options

which are progressively more complex.
Beginning with the first, the standard input is assigned, and a single line is delivered to the workspace with each invocation, without the terminating carriage return or line feed. The standard input is the keyboard unless modified by means of redirection by the command interpreter. This form gives a simple means of communication that can be especially helpful in the initial stages of program development. The next alternative in the sequence of complexity requires the programmer to assign a specific file. The latter can be a constant, or it can be a skeleton which evaluates into the name of the file. An example would be (%r,<7>:<8>.<9>), where the disk, file and extension have been determined separately and bound to variables 7, 8 and 9, respectively. (%r,) is the same as (%r). Just as in the null name case, a single line, without its terminator, is placed in the workspace with each execution of the skeleton. If the end of file has been reached, a control-Z is inserted and will be produced repeatedly each additional time an attempt is made to read the exhausted file. The file itself is buffered so that partial contents such as a single line can be read at will. Sometimes data is not divided into lines, or it may be that the program finds it inconvenient to receive a whole line at a time. A binary file would typify the former case, the scrutiny of a file word by word or by sentences would exemplify the second. In these cases the third form of %r is useful. The third variant, to which a pattern has been adjoined, will search the input stream until the first instance of the pattern is found, which will then be given to the workspace. Using the skeleton again will locate the second instance of the pattern, and so on. If the file is exhausted and no match was obtained, all of the material read remains in the workspace; subsequent reads will leave the null string. 
Care in the selection of the pattern must be exercised to avoid the possibility of overflowing the workspace, which would cause program termination.

The final two forms practically allow the incorporation of a whole preprocessor into the read command, since they allow any recognition and transformation that Convert is capable of expressing to be incorporated into the act of reading a file; furthermore this processing can be tailored individually for each file, and even for each instance of reading. In the next to the last case, %r leaves the value of skt if the pattern does match; otherwise all of the text (to the end of the file) is left. In the last case, skf is a skeleton which generates the text left in the workspace if the pattern does not match the text read.

In these skeletons, if the null pattern is given [e.g. (%r,,,skt)], %r reads up to the next carriage return (or to the end of the file if no CR is found).

Finally, patterns within READ skeletons requiring unbound variables for binding may be accommodated by including a list of variables (a list of one or more numbers between 0 and 30 separated by single blanks) following the alternative name %R:

    (%R,(v1 ...),pathname,pattern)
    (%R,(v1 ...),pathname,pattern,skt)
    (%R,(v1 ...),pathname,pattern,skt,skf)

Skeletons performing writing operations vary as to the result they leave in the workspace. Skeletons %T and %P leave the value of their argument; %t and %p always erase it. In addition, %T and %t write CR and LF before sending their argument to the console, whereas %P, %p and %W write no more than the value of their arguments. If %W is not able to write on the indicated file due to lack of space, it will leave on the workspace the unwritten portion of its argument.

(%Or,pathname) leaves the null string if it was possible to open the indicated file; it leaves the string "Not Found" if the file could not be opened.
In functions %r, %W, %Or, %Ow and %C the value of the skeleton designating the file is not restricted to an actual disk file, in conformity with CP/M's conventions (inherited by MS-DOS) that there may be such devices as TTY: or EOF:. The assortment is not exactly the same, but it includes TTY: console keyboard MEM:X named memory pseudofile NUL: the null file ARR:X named array pseudofile CTR:X named counter STK:X named stack pseudofile The operations of reading via %r and writing via %W may be performed on these devices just as though they were disk files. They may be opened and closed by using %O and %C; in fact these operations are a necessity if these false devices are to be given a meaningful definition, or if they are to be removed from a program in an orderly fashion when they are no longer required. (%W,TTY:,skel) works like (%t,skel), but without the CR/LF included by %t in its operation. (%r,TTY:prompt) reads a line from the keyboard regardless of standard input redirection, and allows a user-defined prompt to be provided. Every write operation on NUL: simply erases its argument; every read from NUL: returns control-Z; that is, NUL: functions as a file to which one can "write" without limit but which is permanently exhausted when read. This pseudofile is useful in compiler and assembler construction, where for test purposes one may not want to produce, say, an object file. In this case a variable could be bound at the beginning of the program to NUL: or to a disk file, and this variable be used in all references to the object file. MEM:X, ARR:X and STK:X pseudofiles are dealt with in detail in CNVADV.HLP. To open and then read a counter requires only a name, of eight or less ASCII characters. Neither counters nor pseudofiles use extensions. To write a counter we have a series of parameters, whose tail may optionally be discarded at any point after the counter's name. 
Altogether, we can write

    (%Or,CTR:XXXXXXXX)
    (%W,CTR:XXXXXXXX,val,incr)
    (%r,CTR:XXXXXXXX)

Argument val is the initial value; its default is 0; incr is the increment, which may be signed and whose default is +1; the assigned value is any (sixteen bit) integer, which is modified in modulo 65536 (2^16) arithmetic. Both of the parameters are ASCII strings representing decimal numbers, introduced into the skeleton as constants or as other skeletons which evaluate into constants of the required form.

Every time that a counter is read, its present value is reported, but the increment is added to its stored value so that it will appear at the next reading. In the language of "C", it is a postincrementing counter. When a counter is opened, it is assigned default value 0, increment 1.

:Directory Skeletons.

It is possible to write a Convert program which will process a series of files, for example to pack several short files into a single large file. Later, with another similar program, they can be restored to their original condition. This process can be used advantageously to build up libraries, for example. Another use would be to search for a series of files and to offer the user the chance to process each one in turn, interactively.

To realize this sort of operation Convert offers access to the BDOS functions for directory and disk system access via the operators K and k in REC.

    (%S,pathname)                   initial search for file
    (%A)                            subsequent search for file
    (%D,pathname)                   delete specified files
    (%N,newpathname,oldpathname)    rename file
    (%Lr)                           get id of currently logged-in disk
    (%Lw,D)                         log in the given disk

Arguments for %S and %D may include the wildcard characters * and ?, allowing families of files to be referenced.

The following scheme, showing a Main program and satellite, can be used as the basis of a Convert program which will process a series of files as given by a possibly ambiguous file reference in the initial command line.
[Program heading, including Name and Comments]

[the bulk of the program, accessed through a program called "y"]
(()()()()) y

...

[Gather directory entries in WS, removing status bytes]
(()()(0 1)(
  (Not Found(^@)<0>,<0>);
  (<[9]><0>(^@)<1>,(%A)(^@)<0>(^@)<1>):
)) z

[Main program: search for first]
(()()(8)(
  ((and,(or,<[1]>:,)(ITR,<-->\),<8>),(y,(z,(%S,<=>)(^@))));
))

%S and %A return a string starting with nine bytes (containing attributes, file length, time stamp, etc.) and the base name of the file (name, period, extension). Thus the main program binds any disk identifier and subdirectory path to variable <8>, which then is available to any subroutine called as long as its variable list does not include 8. "Not Found" is returned whenever the search fails. (^@), the NUL character, is used to separate file names in "z".

On arriving at the execution of "y" the workspace contains all the relevant file names (possibly none) which were found by "z". It is advisable to gather them up all at once on entering the program, before the directory begins to change. This avoids the conflict of a new file having the same name as an old file.

For other programs to use the workspace, this directory extract must be stored in a variable and parcelled out to the programs as they are required. It will generally be sufficient to call the following program "y" and interpose it between the actual program, "x", and the main program, which already calls it.

[Get next name]
(()()(0 1)(
  [separate next name]
  (<0>(^@)<1>,<< [process this file] >>(x,<8><0>)<< [rest of files] >><1>):
)) y

"y" rebuilds a full name by prefixing the filename (bound to variable 0) with the (possibly null) disk-subdirectory path bound to <8> by the main program. "x" encounters a pathname on entry and must leave a null string when it finishes; but it could possibly return some additional files for processing.

:Arithmetic Skeletons.
Convert provides facilities for integer and floating point arithmetic (through skeleton #) and for character "arithmetic" (through skeleton &). The latter facilitates upper/lower case conversions, hexadecimal dumps and access to individual bits or groups of bits within bytes. The file CNVADV.HLP contains a detailed description of the operations possible with these two library skeletons. :Conditional and iterative skeletons. The fundamental skeleton forms in Convert are constants, variables and functions. In principle it is sufficient to work with this combination because functions are readily defined and capable of defining any construction that one wants. Thus the motivation for introducing further skeleton forms would have to be to offer some frequently used function as an inherent feature of the language. Other motives would be to avoid the cumbersome ritual of defining a function if a particularly simple action were desired, or to avoid preparing an argument for a function and then finding that the argument would not be used after all. Most languages have facilities for the selection of alternatives - an IF statement, or for an orderly repetition of some activity - a DO or a WHILE statement. The convenience of these constructs is widely recognized. Convert offers the conditional skeletons if, IF, nf and NF, and the iterative skeletons while, WHILE, until and UNTIL, all of which are described in detail in CNVADV.HLP. :Variables are distinguished by decimal numbers in Convert. There is no theoretical limit to their range, but the practical range is 0-30 with the present structure of the underlying REC compiler. Few programs go beyond ten variables, many subsist with one or two, and it is possible to have a program without any variables at all. Even if a program binds no variables, they must be enclosed in parentheses. The same is true of pattern and skeleton definitions, so that a program quadruple (()()()()) may contain some null parentheses. 
A program should not bind a variable which it has not declared - that is, if a previously undefined variable appears in the pattern of one of the rules of a program where it may become defined, it must be declared in the variable list. The variable list guarantees that the program has at its disposal new instances of the variables listed, and that those variables are unbound prior to trial of each rule in the rule set. A variable is designated by enclosing its number within angle brackets. <0>, <8>, or <15> are examples of variables. Variables are defined by patterns and used by skeletons. Convert combines variable matching and variable generation so that a variable can be defined and then matched within the same pattern. Generally speaking, variables are defined by the constants which surround them; thus v<0>e would assign the value ariabl to <0> when matched against the word variable, or a null value when matched to ve. Since the null string is accepted throughout Convert, a very common type of error arises from matches involving a null string in a way that the programmer had not foreseen. Since Convert uses other pattern forms than constants, a variable need not always be delimited by a constant. <[9]>(and,<[3]>,<0>)<[20]> could be used to pick the extension out of a CP/M-style directory entry, for example. Persons who like long or arbitrary variable names will be disappointed with Convert. When CONVERT was built over a LISP substrate, it was possible to make arbitrary choices of variable names; this was because LISP contained a kind of preprocessor which parsed individual atoms and replaced them by their own address in a dictionary. A string parser does not necessarily want to isolate atoms, so another symbolism must be found. Other persons find that it is much easier to work with a very concise symbolism even though it means reusing the same few symbols over and over again in each different context. 
Thus one may see a complicated file of Convert programs all of which use the same series of variables <0>, <1>, ... up to the number that the given program requires. It is not hard to become accustomed to this way of thinking.

:Intervals.

There is a kind of pattern which is technically a variable in a formal definition of the syntax of Convert, but which is not treated as a variable in this description because it is not assigned a name, and so cannot be referred to again, either in the same pattern or in the paired skeleton of its rule. Of course, it is not lost entirely; it can be named by participating in an AND together with a named variable. We call these unnamed variables intervals, of which there are four kinds:

    <-->     indefinite interval
    <[n]>    interval of length n
    <[-n]>   all but the last n bytes
    <>       null interval (no more text in workspace)

The indefinite interval simply allows us to skip over uninteresting parts of a text - for example an end-of-file embedded in a line of whatsoever length can be detected by <-->(^Z).

Tabular information can be broken into columns by specifying an interval of a determined length. Such a pattern is often paired with a named variable through an and, as in (and,<[8]>,<0>). In this sense it is a predicate, describing some property of a string, and it is reasonable that it should be compounded using the Boolean AND.

Easy access to the last n bytes of a string is obtained with <[-n]>, where n is an integer. A pattern such as <[-1]><0> would bind the very last byte to variable <0>; if we need to keep the text preceding the last byte we can always bind it to a variable through an and: (and,<[-1]>,<1>)<0> binds all but the last byte to <1> and the last byte to <0>. <[-n]> will of course fail if fewer than n bytes remain in the workspace.
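The four interval kinds described above can be modelled outside Convert. The following Python sketch is only an illustration of the matching behaviour as this text describes it, not Convert code and not the REC implementation; the function names are hypothetical, and the leftmost-occurrence behaviour of skip_to is an assumption about how <--> meets a following constant.

```python
# Each function takes the remaining workspace text and returns a pair
# (matched_text, rest_of_workspace), or None when the interval fails.

def fixed(n, text):
    """<[n]>: exactly n bytes; fails if fewer than n remain."""
    if len(text) < n:
        return None
    return text[:n], text[n:]

def all_but_last(n, text):
    """<[-n]>: everything except the last n bytes; fails if fewer than n remain."""
    if len(text) < n:
        return None
    cut = len(text) - n
    return text[:cut], text[cut:]

def null_interval(text):
    """<>: succeeds only when no text is left in the workspace."""
    return ("", "") if text == "" else None

def skip_to(constant, text):
    """<--> followed by a constant: skip to the constant (assumed leftmost)."""
    i = text.find(constant)
    if i < 0:
        return None
    return text[:i], text[i + len(constant):]

# (and,<[-1]>,<1>)<0> applied to "hello": <1> gets "hell", <0> gets "o".
var1, var0 = all_but_last(1, "hello")
```

In this model the pair returned by all_but_last(1, "hello") corresponds to the bindings of <1> and <0> in the last example, and fixed(8, ...) corresponds to cutting the first column out of a table row.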
A null pattern will always match a null interval, but the null designator <> refers to something slightly different; it requires that it be matched to the entire remainder of the workspace, and that that remainder be null. Thus the rule (<>,there<'>s nothing here) will guarantee that the workspace is empty and say so; the rule (,goodbye) will ALWAYS succeed. It is a useful final rule to give a function a value.

The reason that we have to insist explicitly on a null workspace has to do with a feature of the pattern matching process which usually simplifies programs and makes them run faster. A final UNBOUND VARIABLE, including <-->, will always match the entire remainder of the workspace. A final CONSTANT, including definite intervals and BOUND variables, will only require corresponding text, but will remain indifferent to any text or lack thereof which follows it. This is why null text will match anything, because "anything" always begins with a null string. Historically CONVERT programs spent far too much time seeking character by character for the end of the text when it corresponded to a final variable. Likewise, it is bothersome to append a final <--> to text only whose beginning interests us.

Consider the two patterns

    <4><4>    and    <4><4><>

The first pattern matches anything: assuming it is initially unbound, the first instance of <4> binds tentatively to the null string. Then, since the second instance of <4> is an instance of a BOUND variable, it will also match. No constraints are placed on the rest of the workspace, so the tentative binding becomes final. On the other hand, the second pattern matches only strings made up of two identical substrings: if the workspace text is non-null, the initial null-string trial value for the first instance of <4> causes <> to fail after the second instance of <4> matches.

There is another context in which the null-seeker <> is implicit.
Suppose that we have defined balanced parentheses using the following two pattern definitions:

    [non-parenthesis]       ((and,<[1]>,(nor,<(>,<)>)) n
    [balanced parenthesis]  (<(>(ITR,(or,<:p:>,<:n:>))<)>) p

and that we want to bind the variable <0> to the contents of a parenthesis pair. We would then define

    [parenthesis interior]  ((and,<:p:>,<(><0><)>)) i

where the second pattern in the "and" has implicitly the form <(><0><)><>, which ensures that the right parenthesis following <0> is the final right parenthesis that <:p:> picked out, not the first one to be found; the implicit <> is required by the definition of "and".

:A different kind of interval is the lexicographic interval. This is a pattern that matches a string if it lies within an interval defined by two strings. The lexicographic ordering is that induced by the ASCII code. There are three lexicographic interval patterns:

    (IVL/x/y/)       Interval defined by constant strings x and y
                     (which do not contain the delimiter /).
    (ivl,s1,s2,...)  Multiple intervals delimited pairwise by
                     skeletons s1, s2, ... .
    (^)              A single control character between NUL and US;
                     this is equivalent to (ivl,(^@),(^_)).

(IVL/x/y/) matches strings lying in the interval [x,y] (limits included); it will match the shortest string t satisfying x <= t <= y. If x is the null string, IVL matches the null string trivially; if y is the null string, IVL is interpreted to match the shortest string t satisfying x <= t.

In (ivl,s1,s2,...) skeletons are taken by pairs and evaluated; if vj is the value of skeleton sj, the effect is the same as (or,(IVL/v1/v2/),...). If there is an odd number of skeletons, a null string is assumed for the second string of the last interval. Delimiters in ivl are always commas; the delimiter in IVL may be any character not in the interval bound strings.

:The working part of a Convert quadruple is its rule set, the fourth member of the quadruple.
There is not much point to a null rule set, but it is quite possible for a program to depend on a single rule. The set has the form ( (p1,s1): (p2,s2); ... (pn,sn); ) The outer parentheses are necessary; they define the set. The inner parentheses are also necessary, for they define the rule. It is a matter of one's personal preference as to where the external parentheses are located, but it is good programming practice to adopt a definite style and follow it, as it makes errors much easier to spot when reviewing a program. In contrast, once the rule has commenced, every space, tab, line feed or what not counts. The comma which separates the pattern from the skeleton is a prominent part of the rule and must be placed accurately. Commas which occur within the pattern are also crucial in their placement. Commas which are really constants and part of the text must be quoted: <,>. The colons or semicolons which immediately follow the rules are essential, and are an inheritance from REC. They determine whether rules are to be repeated, when the colon is used, or whether the program has terminated, signalled by the semicolon. Rules are tried out in sequence, from top to bottom, left to right as they are written on paper. Several short rules can occupy the same line if convenient. If no rule applies, the workspace is left unchanged. This is still another way that the execution of a program may terminate. There are some fine points to be considered in designing a rule set. Using a colon to produce an iterative transformation of the workspace supposes that the whole workspace is available for the transformation. If only a part of the workspace is to be subject to the transformation, the preservation of the remainder arises in a way that makes it preferable to exercise a recursive call even though it is to the same rule set. This makes counting the number of preserved fragments automatic. 
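The control flow of a rule set as described above (a colon restarts the list on the new workspace, a semicolon ends the program, and no match leaves the workspace unchanged) can be sketched in Python. This is a loose model and not Convert itself: regular expressions stand in for Convert patterns, lambdas stand in for skeletons, and the names run_rules, RULES and max_steps are hypothetical.

```python
import re

def run_rules(rules, workspace, max_steps=1000):
    """Apply (pattern, skeleton, repeating) triples in order.

    repeating=True models a rule ending in ':' (retry the list from the top),
    repeating=False models a rule ending in ';' (the result is final).
    """
    for _ in range(max_steps):
        for pattern, skeleton, repeating in rules:
            m = pattern.match(workspace)
            if m is None:
                continue                 # rule does not apply, try the next
            workspace = skeleton(m)      # skeleton output replaces the workspace
            if repeating:
                break                    # colon: start again from the first rule
            return workspace             # semicolon: value of the function
        else:
            return workspace             # no rule applied: workspace unchanged
    raise RuntimeError("rule set did not terminate within max_steps")

RULES = [
    # roughly (<>,(empty));  an empty workspace yields a fixed text
    (re.compile(r"\A\Z"), lambda m: "(empty)", False),
    # roughly (<0>  <1>,<0> <1>):  squeeze one pair of spaces, then repeat
    (re.compile(r"\A(.*?)  (.*)\Z", re.S),
     lambda m: m.group(1) + " " + m.group(2), True),
]

squeezed = run_rules(RULES, "a    b")
```

The max_steps guard is an addition of this sketch; a repeating rule whose skeleton never stops its own pattern from matching would otherwise loop without end.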
Since Convert is a pattern directed language, and the pattern is presented by example and not in some other way, the rules must be stated precisely. Outside the rule more freedom is permitted, allowing the rule set to be formatted according to what the programmer considers attractive. Comments, enclosed in square brackets, have no effect on the compilation and their liberal use will enhance the program's quality.

:Debugging Aids.

Convert offers several aids to debugging programs. As with any language, programs that don't contain errors don't have to be debugged. This may look like redundant advice, but it is sound. Good programming habits reduce errors. Although there are free-form aspects to Convert, it is still amenable to developing typical program formats and adhering to them. They promote good programming, and make lapses easier to detect.

The patterns

    (PWS)        [print remaining workspace]
    (PWS,mssg)   [identify printed workspace with message]
    (HLT)        [stop program, wait for keystroke]
    (HLT,mssg)   [print message, then halt]
    (PVR,n)      [print value of variable n]
    (NOP,p)      [null pattern, disables p]

are provided primarily for debugging. The rule

    ((PWS,subroutine: )(or),);

placed at the beginning of a program allows tracing by showing the workspace as each subroutine is entered; also the evolution of the workspace as repetitive rules change it. If a given rule hangs up a calculation, it can be found by interspersing this same rule among the normal rules.

On the skeleton side of a rule, the only debugging aids are the skeleton functions %t and %T which display their argument on the console (the former nulls its argument, the latter returns it unchanged). %T may be applied to a problematical skeleton to see what kind of a result it produces when evaluated. If a function is under suspicion, the combination (%T,(f,(%T,...))) will show both the argument and the result. To display messages without their remaining in the workspace, %t may be used: (%t,...)
Selective printing suffices to find the majority of errors in Convert programs which are syntactically correct but poorly designed or erroneous. There are occasional errors which are due to flaws in the support programs which Convert uses. These are gradually being caught and eliminated, but the possibility always exists that an error has occurred on this more fundamental level.

There are certain possibilities for error arising out of the resemblance between the symbolism of REC and the symbolism of Convert, which are activated when a bit of source code passes directly from Convert to REC. For example, the notation <:0:> for a defined pattern reference can be miswritten <0:>, causing ":" to pass unchanged; this may well provoke an unending loop.

Another source of error lies in the use of double angle brackets to generate continuation lines for patterns or skeletons. Consider the code

    (pattern,<<
        >>skeleton 1<<
        >>skeleton 2>>
        >>);

The double angle following skeleton 2 is reversed; such errors are common. Note also that the double angles are effective only within a pattern or within a skeleton; they cannot bridge across from one to the other. Thus, <<(pattern, skeleton)>> cannot be used to "comment out" a rule.

If a program cannot be debugged with judicious use of print statements, and following it with DEBUG appears too formidable, it is useful to revise the intermediate REC code. If sections appear more like Convert code than REC code, a parenthesis, angle bracket or quote is probably unbalanced. It should be located and corrected, and the compilation tried anew.

It is possible to put much more selective print statements or comments in the REC intermediate, using the operator T or inserting 'message'TL at appropriate points.
Of course, they will be lost when the intermediate is discarded in favor of a new one when the Convert source is corrected and recompiled, so one should not expend the effort to make elaborate insertions unless the problem is very difficult, the intermediate is renamed to save it, or one is willing to forgo a new compilation until significant progress has been made in correcting the difficulties in the program.

:Performance.

As a very rough estimate, a REC program compiles into three times as many bytes as the source code. This factor is reduced according to the ratio of comments to program, and when large quantities of text are quoted. Comments result in no compiled code; quoted material goes over byte for byte plus a tiny overhead to load it onto the pushdown list at execution time.

Convert programs typically produce 30% more REC object code than there was source code, again modified by the presence of comments and the ratio of constant patterns and skeletons to the use of variables and Boolean composites. To this one must add the initializing code inserted by the compiler, occupying about 1K. Once loaded and ready to execute, a Convert program takes up about 3.5 times as many bytes as in the source file, plus 2K of initializing code and between 2K and 12K of library routines, depending on the program requirements determined by its use of library calls. As a practical matter, the maximum size Convert program (in source) that a REC compiler can execute (without recourse to overlays) is about a quarter of the compilation area size at REC's disposal, which is about 23 or 24K for REC80 and up to 60K for REC86.

Many of the simple programs that can be written in Convert can also be written in assembly language, and the amount of resultant code compared. In very general terms, the factor ranges between ten and twenty, tending to twenty. A similar inflation is experienced in some other "high level" languages - the "C" compiler for example.
The reason for this inflation is readily found in the formal expansion of certain structures which have to be fairly intricate to handle general cases and can be much reduced in particular instances.

Programming time in Convert can be extremely short - minutes in some cases. The inflation in program size is fully recovered when considering development time for a program. Execution time tends to follow program size. Programs written in Convert to process files can handle them at the rate of lines per second - slow compared to good assembly language programming, not unacceptable in an absolute sense, and vastly faster than doing the same job manually through an editor.

:Bibliography.

Convert is a chain oriented adaptation of the LISP based CONVERT in

    Adolfo Guzman and Harold V. McIntosh
    "CONVERT"
    Communications of the ACM 9 604-615 (1966).

Articles on string Convert are

    Gerardo Cisneros and Harold V. McIntosh
    "Introduction to the programming language Convert"
    SIGPLAN Notices 21 #4 (Apr) 48-57 (1986)
    [Translated from "Introduccion al lenguaje de programacion
    Convert", Acta Mex. de Ciencia y Tecnologia 3 #9 (Ene-Mar)
    65-74 (1985)]

    Harold V. McIntosh and Gerardo Cisneros
    "The Programming Languages REC and Convert"
    SIGPLAN Notices 25 #7 (Jul) 81-94 (1990)
]

[CNVRT.HLP]
[Harold V. McIntosh, 11 March 1984]
[Rev.: G. Cisneros, 23 January 1986]
[Rev. for MS-DOS: G. Cisneros, 25 September 1990]
:[end]