Text File | 1986-12-01 | 44.0 KB | 1,008 lines |
- Program
- Pattern
- Skeleton
- Comments
- Program File
- Library and Initialization
- Boolean Patterns
- Constant Patterns and Skeletons
- Input-Output Skeletons
- Directory Skeletons
- Arithmetic Skeletons
- Conditional and Iterative Skeletons
- Variables
- Intervals
- Rule Set
- Debugging Aids
- Performance
- Bibliography
- :[Structure of a Convert program]
- [format: ((p)(s)(v) (
- (pattern, skeleton): [repeating]
- [or] (pattern, skeleton); [terminal]
- )) x]
- where (p) is a list of pattern definitions ((P1) p1 (P2) p2 ...)
- (s) is a list of skeleton definitions ((S1) s1 (S2) s2 ...)
- (v) is a list of variables (v1 v2 ...) (each vi between 0 and 30)
- x is the name of the program
-
- Sample programs:
-
- [replace two or more spaces by a single space]
- (()()(0 1)(
- (<0>  (ITR, )<1>,<0> <1>):
- )) w
-
- [insert tabs for each eight columns]
- (()()(0 1)(
- ((and,<[8]>,<0> (ITR, ))<1>,<0>(^I)(x,<1>));
- ((and,<[8]>,<0>)<1>,<0>(x,<1>));
- )) x
- -
- Two programs can be used together, with one calling the other. In the
- second of the following programs, pattern 0, written as <:0:>, is used only
- once, yet it stands for the more complicated composite (and,<[8]>,<0>). This
- whole phrase is a pattern, and must be enclosed in parentheses to make a
- definition. There is no conflict between variable names and pattern or
- skeleton names; here 0 is used in both senses.
-
- [replace 8-column tabs by spaces]
- (()()(0 1 2)(
- ((and,<[8]>,<0>(^I)<1>)<2>,(z,<0>)(y,<1><2>));
- ((and,<[8]>,<0>)<2>,<0>(y,<2>));
- (<0>(^I)<1>,(z,<0>)<1>):
- )) y
-
- [fill out tab space]
- ((
- ((and,<[8]>,<0>)) 0
- )()(0)(
- (<:0:>,<0>);
- (<0>,<0> ):
- )) z
-
- -
- Skeletons can also be symbolized by single letters. The following program
- copies one file to another, which PIP.COM could do more efficiently. (R)
- and (W) are read and write skeletons, respectively; (W) incorporates a
- variable that must have been defined by a pattern match.
-
- [read word, write word]
- (()()(0)(
- (<0>(^Z),(W));
- (<0>,(W)(R)):
- )) a
-
- [main program]
- (()(
- ((%R,<9>.ONE,(ITR, )<-->(or, ,(^Z)))) R
- ((%W,<9>.TWO,(%T,<0>))) W
- )(9)(
- (<9>(or, ,.,<>),<<
- >>(%Or,<9>.ONE)<<
- >>(%Ow,<9>.TWO)<<
- >>(a,(R)));
- ))
-
- -
- :Patterns may have one of the forms
- (AND,p1,p2,...,pn) [matches if all patterns p1,..pn match]
- (and,p1,p2,...,pn) [simplified version of AND]
- (OR,p1,p2,...,pn) [first of p1,...pn to match]
- (or,p1,p2,...,pn) [simplified version of OR]
- (NOT,p) [pattern does not start w/p]
- (DEF,p1,n1,...,pk,nk,p) [p defined in terms of pj w/ names nj]
- (ITR,p) [Repeat p as much as possible]
- (itr,p) [Repeat p as little as possible]
- (QUO/.../) [constant ... with delimiters /]
- (DEC,n) [the decimal byte n (mod 256)]
- (HEX,n) [the hexadecimal number n]
- (^xxx) [control characters xxx]
- (IVL,m,n,) [lexicographic interval [m,n]; m, n constants]
- (ivl,s1,s2,...) [lexicographic intervals [s1,s2],... ]
- (PWS) [print remaining workspace]
- (PWS,r) [print remaining workspace after message r]
- (PVR,n) [print value of variable n]
- (HLT) [wait for keystroke]
- (HLT,mssg) [print mssg, wait]
- (NOP,p) [null pattern, disables p]
- (<) (>) [constants: left and right angle brackets]
- -
- <(> <)> [constants: left and right parentheses]
- <,> <'> <"> [constants: comma, single quote, double quote]
- <()> [balanced parentheses]
- (LAM,(v),p) [p with local variables given in list (v)]
- <:a:> [defined pattern a]
- <[n]> [interval of length n]
- <[s]> [interval of length given by skeleton s]
- <--> [indefinite interval]
- <n> [variable n]
- s [matches text identical to value of skel. s]
- xxx [constant: string of ASCII characters between
- ! and ~, except parentheses, angle brackets,
- comma and single and double quotes; it may
- include SP, CR, LF and HT]
- <> [null interval: no more text in workspace]
- <<...>> [null pattern]
- :Skeletons may take any one of the following forms
- (<) (>) [constants: left and right angle brackets]
- -
- <(> <)> [constants: left and right parentheses]
- <,> <'> <"> [constants: comma, single quote, double quote]
- xxx [constant: string of ASCII characters between
- ! and ~, except parentheses, angle brackets,
- comma and single and double quotes; it may
- include SP, CR, LF and HT]
- (QUO/.../) [constant ... between delimiters /]
- (^xxx) [control characters xxx]
- <n> [variable n]
- (f) [function (skeleton) f, no argument]
- (f,s) [function f, argument s (another skeleton)]
- <=> [text used in the last comparison]
- (LAM,(v),s) [s with local variables given in list (v)]
- <<...>> [null skeleton]
- (if,s{,p,s,s}[,p,s]) [conditional]
- (IF,(v),s{,p,s,s}[,p,s]) [conditional with variables]
- (nf,s{,p,s,s}[,p,s]) [negative conditional]
- (NF,(v),s{,p,s,s}[,p,s]) [negative conditional with variables]
- (while,s{,p,s,s}[,s]) [iterative]
- (WHILE,(v),s{,p,s,s}[,p,s]) [iterative with variables]
- (until,s{,p,s,s}[,s]) [iterative, negative form]
- (UNTIL,(v),s{,p,s,s}[,p,s]) [iterative with variables, negative form]
-
- :Since all spaces, tabs, and similar characters are taken literally
- in patterns and skeletons, the pair <<...>> may be used to format
- programs into lines and columns. Comments in square brackets will
- appear in the compiled program; text between the double angle brackets
- << and >> will be ignored; for example
-
- (and,<[8]>,<<
- >><0>)
-
- is identical to
-
- (and,<[8]>,<0>)
-
- :Program file. A Convert program will eventually have to be compiled and
- executed if it is to produce any practical results. The program responsible for
- first compiling it is CONVERT.REC, which requires a disk file containing the
- program, and which will produce a REC program to be executed. Both steps
- require having a REC compiler in the system - REC80.COM (for CP/M or CDOS with
- an 8080, 8085 or Z80), REC86.CMD (for CP/M-86) or REC86.EXE (for MS-DOS).
-
- The disk file bearing a Convert program may have any extension but REC; if
- not given, the assumed extension is CNV.
-
- The command line
-
- REC80 CONVERT SOURCE
-
- will automatically compile the file SOURCE.CNV to produce SOURCE.REC, which
- can be executed with the command line
-
- REC80 SOURCE [argument]
-
- where [argument] is an optional string received by program SOURCE as initial
- text when its execution begins.
-
- -
- Explicit disk assignments can be made for any of the files mentioned in these
- command lines. Explicit extensions can also be given, but their omission has
- proved to be extremely convenient.
-
- The execution of the compiled program requires the presence of a library,
- CNVLIB.REC, on the current disk. The operation of I/O skeletons varies
- depending on the presence or absence of the additional argument in the command
- line; the variants are described in the corresponding section. When execution
- terminates, all open files are closed and the last contents of the workspace
- are displayed on the console.
-
- Since a Convert program will be compiled into a REC program, it is convenient
- to require that it be laid out like a REC program:
-
- (subroutine 1) n1 (subroutine 2) n2 ... (main program)
-
- wherein subroutine definitions alternate with subroutine names, which are
- printing ASCII characters excluding the space and the 11 characters ( ) % &
- # ~ < > , @ and }. CONVERT will insert the appropriate braces to get a REC
- program, as well as a subroutine to initialize and load the required routines
- from the library.
-
- -
- Certain comments enclosed in square brackets are processed by the compiler;
- in order for them to be noted the program should be structured according to
- the following scheme:
-
- [SOURCE.CNV]
- [Author, Date]
- [comments]
- [Exclude LIB]
- [Include ..]
- [[Logon message at startup.]]
-
- [subroutine a descriptor]
- (()()()()) a
-
- [another subroutine]
- (()()()()) b
-
- [main program]
- (()()()())
-
- The comments which are processed are [xx.CNV], [Exclude LIB], [Include ..]
- and [[...]].
- -
- The header [SOURCE.CNV] should be upper case, and will be transformed into
- [SOURCE.REC] by the compiler. The header, together with the attribution of
- author and date, helps to identify the program and, through the date, its
- version. All this header material should be placed in balanced square brackets.
-
- The [Exclude LIB] comment indicates the program should not include the
- initializing routine; if it appears, it overrides any [Include ..] also
- present. [Include ..] indicates which routines must be loaded at run time in
- addition to those whose inclusion is automatically determined at compile time.
- Both types of comment are optional.
-
- The double square brackets may enclose a logon message that will be displayed
- at startup. It may be shown again during the program execution by using the
- skeleton function (%Q). It should be short and concise, as it will occupy
- space at runtime. If none is included in the program, Convert will insert the
- message "convert/icuap/1985".
-
- The subroutines and main program then follow. The compiler displays for each
- one of them the names of patterns and skeletons they define, the variables they
- declare, the punctuation following each rule (: or ;), the name of the
- subroutine (if it is one), and a series of dots, one for each sector of
- compiled code written to the disk.
- -
- :Library and Initialization. The library is contained in file CNVLIB.REC, which
- is sought and read during the initializing process. The library should reside
- in the currently logged disk unless the command line contains an argument
- having one of the following forms:
-
- L/B:
- L/d:filename.ext
-
- The first form indicates CNVLIB.REC resides on the specified disk (B in the
- example); the second form indicates that the file whose name is given (in
- which both the disk identifier and the extension are optional) should be used
- as library. No extension is assumed in the second form if one is not given
- explicitly.
-
- The argument L/ is transparent to the program, that is, it will neither appear
- in the program's initial text nor interfere with other arguments; for instance,
-
- REC80 DSASM L/A: WRTSYS
- REC80 DSASM WRTSYS L/A:
-
- both produce the same result, passing the string "WRTSYS" as initial text to
- the program being executed (DSASM.REC).
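- The transparency of the L/ argument can be pictured with a small Python
- sketch. It is only an illustration of the behaviour described above, not the
- code used by the library; the helper name split_library_argument is
- hypothetical, and splitting the command tail on blanks is an assumption.

```python
def split_library_argument(tail):
    """Separate an L/ argument from the rest of a command tail.

    Hypothetical helper: returns (library_spec, initial_text), where
    library_spec is None when the tail holds no L/ argument.
    """
    library = None
    rest = []
    for word in tail.split():
        if library is None and word.startswith("L/"):
            library = word[2:]      # text after L/, e.g. "A:" or "B:MYLIB.REC"
        else:
            rest.append(word)       # everything else reaches the program
    return library, " ".join(rest)


# Both orderings hand the same initial text to the program.
print(split_library_argument("L/A: WRTSYS"))
print(split_library_argument("WRTSYS L/A:"))
```

- In both calls the second element of the result is WRTSYS, mirroring the two
- DSASM command lines shown above.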
- -
- The [Include ..] comment is only required by the root program in an overlay
- tree, by a program requiring floating point arithmetic, or by a program using
- the operators ^ or ** (for exponentiation) or % (for remainder) in arguments
- to the formula-evaluating skeleton (#f,s) and not calling explicitly the skele-
- tons #^ or #%. For example, the comment [Include #.#^#%] is required for a
- program which would evaluate formulae containing floating point numbers and
- exponentiation and remainder operations, and invoking only the library function
- #f.
-
- The [Exclude LIB] comment is required by the overlaying segments of an overlay
- tree. More about Include and Exclude may be found in CNVADV.HLP.
-
- :Boolean patterns. One of the mechanisms for generating complex patterns
- from simpler constituents is to form Boolean combinations of patterns. The
- fundamental Boolean connectives AND, OR, and NOT may be used. In Convert, AND
- and OR are not binary operations, but rather may have any number of arguments;
- a Boolean function of a variable number of arguments has a standard definition,
- which follows the associative law and reduces to the binary function in that
- special case. Thus it is not capriciousness which requires that
-
- (AND) always matches (to the null string)
- (AND,p) is the same as p (matches if p matches)
- (AND,p1,p2,...) matches when all of the pj match
- (OR) never matches
- (OR,p) is the same as p (matches if p matches)
- (OR,p1,p2,...) matches if at least one pi matches.
-
- As is customary in programming languages, Boolean combinations are executed
- progressively, so that no more arguments are evaluated than the minimum
- needed to reach a decision. The first failure in an AND, or the first match
- in an OR, decides the expression.
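- The empty cases and the progressive evaluation can be mimicked in Python,
- whose all() and any() follow the same conventions. This is a sketch of the
- Boolean skeleton only: it treats each pattern as a yes/no test and ignores
- variable binding and the text actually matched. The names match_and,
- match_or and probe are hypothetical.

```python
def match_and(tests, text):
    """True when every test accepts text; stops at the first failure."""
    return all(test(text) for test in tests)


def match_or(tests, text):
    """True when some test accepts text; stops at the first success."""
    return any(test(text) for test in tests)


def probe(name, result, log):
    """Build a test that records its name in log and returns a fixed result."""
    def test(text):
        log.append(name)
        return result
    return test


log = []
match_and([probe("p1", True, log), probe("p2", False, log),
           probe("p3", True, log)], "x")

# The empty AND succeeds, the empty OR fails, and p3 was never consulted.
print(match_and([], "x"), match_or([], "x"), log)
```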
-
-
-
- -
- Convert has an alternative series of Boolean functions, namely (and,...),
- (or,...), and (nor,...) [which is equivalent to (NOT,(or,...))]. They exist
- only for reasons of efficiency. The issue is that Convert combines variable
- matching with variable generation, rather than separating these two activities.
- This is in turn more efficient, but subject to logical paradoxes if not done
- correctly. Concretely, if a variable match fails, Convert will back up and
- retry the previous variable it identified, or will progress to the next
- alternative of a preceding OR. In using the lower case Boolean operators we
- forsake all this jockeying in the interest of speed, but it is forsaken
- nevertheless.
-
- Thus, lower case Boolean operators are to be used when the decision which
- they represent is final - for example if their arguments are constants. In
- practice, if the patterns (ITR,xxx) and (itr,xxx) are used, the searching
- required by a program can be assigned to them, and the lower case operators
- used exclusively. The upper case operators work correctly albeit more slowly.
-
- Combining variable generation with variable matching, as Convert does,
- restricts the participation of NOT in patterns; its arguments must not contain
- unbound variables.
-
-
- -
- One of the most common uses of AND is to impose some condition on a variable,
- although it could also be used to parse a string in two or more different ways.
- For example, consider CP/M's directory entry from which we may wish to extract
- the file name and the extension, and ignore the rest of the block. The pattern
-
- (and,<[1]>(and,<[8]>,<1>)(and,<[3]>,<2>)<[20]>,<4>)
-
- will bind the whole 32-byte string to the variable <4>, but at the same
- time allow us to retrieve the file name as <1>, and the extension as <2>.
- Both uses of AND are shown in this example. The pattern
-
- (and,(ITR,(and,<[1]>,(nor, ,(^I)))),<0>)
-
- will match a string free of spaces or tabs - whitespace as some say - and
- associate it with the variable <0>. The inner "and" prevents the null string
- from matching the "nor". The first argument of an AND establishes the length
- of the string that the remaining arguments must match. Another example would
- be to identify a decimal number through
-
- (and,(ITR,(and,<[1]>,(IVL,0,9,))),<0>).
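- For readers more used to slicing than to nested ANDs, the directory-entry
- pattern given earlier corresponds roughly to the following Python sketch. The
- field widths (1, 8, 3 and 20 bytes) come from the pattern itself; the function
- name and the sample entry are made up for the illustration.

```python
def split_directory_entry(entry):
    """Split a 32-byte directory entry the way the AND pattern does.

    Returns (name, extension, whole_entry), the counterparts of the
    variables <1>, <2> and <4> in the pattern.
    """
    if len(entry) != 32:
        raise ValueError("a directory entry is 32 bytes long")
    name = entry[1:9]         # <[1]> is skipped, then (and,<[8]>,<1>)
    extension = entry[9:12]   # (and,<[3]>,<2>)
    return name, extension, entry   # the remaining <[20]> is left unnamed


# A made-up entry: one leading byte, padded name, extension, 20 further bytes.
sample = bytes([0]) + b"CONVERT " + b"HLP" + bytes(20)
name, extension, whole = split_directory_entry(sample)
print(name, extension, len(whole))
```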
-
-
- -
- The most apparent use of the Boolean OR is to express alternatives. Taken
- together with a mechanism to assign a name to a pattern, OR can be used to
- generate recursive patterns. Given the definition
-
- ((or, <:s:>,)) s
-
- we can define a series of spaces - which is either a space followed by the
- series, or the null string. This definition also makes use of the property,
- that the first viable alternative satisfies an OR. Alternatively,
-
- ((OR,, <:m:>)) m
-
- would be a pattern that rendered the null string unless a reconsideration
- were forced upon it. These two alternatives define respectively a maximal
- and a minimal string satisfying a condition defined by an OR. Equivalently,
-
- ((ITR, )) s
- ((itr, )) m
-
- Many other kinds of recursive definitions can be made with OR, but it is
- interesting to note that ITR and itr seem to be sufficient for the
- applications that have been encountered.
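- The difference between the maximal and the minimal series resembles the
- difference between greedy and lazy repetition in regular expressions. The
- Python sketch below is an analogy only (Convert is not a regular-expression
- engine), and the two function names are hypothetical.

```python
import re


def maximal_spaces(text, then=""):
    """Like (ITR, ): take as many leading spaces as possible."""
    found = re.match(r"( *)" + re.escape(then), text)
    return None if found is None else found.group(1)


def minimal_spaces(text, then=""):
    """Like (itr, ): take as few leading spaces as the rest allows."""
    found = re.match(r"( *?)" + re.escape(then), text)
    return None if found is None else found.group(1)


text = "   x"
print(repr(maximal_spaces(text)))        # all three spaces
print(repr(minimal_spaces(text)))        # the null string
print(repr(minimal_spaces(text, "x")))   # forced to grow until x follows
```

- The last call shows the "reconsideration": the minimal form starts from the
- null string and lengthens only because the following constant demands it.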
-
- -
- An OR with a null terminal argument is a convenient way to express optional
- elements of a string. Consider
-
- ((or,+,-,)<:d:>(or,.<:d:>,)) f
-
- as a definition of a signed floating point number. The sign is entirely
- optional, as is the decimal point. According to whether <:d:> accepts the
- null string or not, <:f:> could match or not a single isolated decimal point.
- If it does not, we would have to modify the definition to make 1. into an
- acceptable number.
-
-
- :Constant Patterns and Skeletons. There are two types of constants: default
- constants and distinguished constants. A default constant is any string
- including spaces, tabs, carriage returns, line feeds and printing ASCII
- characters (those between ! and ~) except any of the seven characters ( )
- < > , " and '. The first five exceptions have been reserved to delimit
- patterns and skeletons of diverse kinds; the last two play an exceptional role
- in REC and so must be given special treatment; it is the conflict with REC
- which keeps them from being used for quoting in Convert. The distinguished
- constants are described in the remainder of this section.
-
- The very delimiters themselves have to be quoted; for conciseness and to
- avoid reserving yet more symbols, we use them to quote each other:
-
- <(> left parenthesis
- <)> right parenthesis
- (<) left angle
- (>) right angle
- <,> comma
- <'> single quote
- <"> double quote
-
-
- -
- It is not convenient to incorporate control characters directly into programs,
- because they interfere with printing the program for reference. The pattern
- (CTL,xxx) proved to be unsightly, so we use the form (^xxx), in analogy to
- the common representation ^X for a single control character. The characters
- allowed in the string xxx are those between @ and _; these bounds correspond
- to control characters whose values in ASCII are 0 and 31, respectively. Some
- useful control characters to remember are:
-
- (^I) horizontal tab
- (^Z) end of file
- (^MJ) carriage return, line feed
- (^[) escape
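- The correspondence between the letters written after ^ and the control
- characters they stand for is simply "subtract 64 from the ASCII code". A
- short Python sketch, with a hypothetical function name, makes the rule
- concrete:

```python
def control_string(names):
    """Translate the xxx of (^xxx) into the control characters it denotes."""
    result = []
    for ch in names:
        if not "@" <= ch <= "_":
            raise ValueError("only characters from @ to _ are allowed")
        result.append(chr(ord(ch) - 64))   # @ gives 0, _ gives 31
    return "".join(result)


print(repr(control_string("I")))    # horizontal tab
print(repr(control_string("MJ")))   # carriage return, line feed
print(repr(control_string("Z")))    # end of file
print(repr(control_string("[")))    # escape
```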
-
- Provision has been made for quoting a long string, too cumbersome to represent
- character by character using the foregoing conventions; we could write
-
- (QUO/.../)
-
- wherein / could be any character not occurring in the text ..., and terminates
- the string with its second appearance in the pattern.
-
-
- -
- Two constants have been provided for recognizing (as patterns) or generating
- (as skeletons) bytes of binary data:
-
- (DEC,n) decimal byte n (mod 256)
- (HEX,k) the shortest byte string needed to represent
- the hex value k (including leading zeros in k)
-
- For example, (DEC,2035) represents a byte whose decimal value is 243, whereas
- (HEX,5C) represents a byte whose decimal value is 92. For hexadecimal strings
- with 3 or more digits, the string represented by (HEX,k) is machine-dependent;
- in microprocessors following Intel's convention, (HEX,05C) would represent two
- bytes, the first of which is 92 (decimal) and the second zero (the least
- significant byte in the lowest addressed location); on other microprocessors
- like the MC68000, (HEX,05C) would represent two bytes, the first one zero and
- the second one 92 (the most significant byte at the lowest address).
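- The arithmetic behind these two constants can be checked with a few lines
- of Python. This is a sketch of the rules as stated above, not of the
- library's own code; dec_byte and hex_bytes are hypothetical names, and the
- byteorder parameter stands in for the machine dependence.

```python
def dec_byte(n):
    """(DEC,n): the single byte n modulo 256."""
    return bytes([n % 256])


def hex_bytes(digits, byteorder="little"):
    """(HEX,k): shortest byte string holding every digit of k.

    Leading zeros count, so three digits need two bytes. byteorder is
    "little" for Intel-style machines and "big" for the MC68000.
    """
    length = (len(digits) + 1) // 2
    return int(digits, 16).to_bytes(length, byteorder)


print(dec_byte(2035)[0])              # 243
print(hex_bytes("5C")[0])             # 92
print(list(hex_bytes("05C")))         # [92, 0]
print(list(hex_bytes("05C", "big")))  # [0, 92]
```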
-
- :Input-Output skeletons exploiting CP/M's BDOS and BIOS:
-
- (%Or,D:FILENAME.EXT) open file for reading
- (%Ow,D:FILENAME.EXT) open file for writing
- (%R) read from default
- (%R,D:FILENAME.EXT) read file
- (%R,source,pattern) read until match
- (%R,source,patt,skt) read, match, substitute
- (%R,source,patt,skt,skf) read, match, two options
- (%W,D:FILENAME.EXT,xxx) write file
- (%C,D:FILENAME.EXT) close file
- (%E) close all files
- (%Z) reset disk system
- (%Lr) get id of currently logged-in disk
- (%Lw,D) log in the given disk
- (%S,D:FILENAME.EXT) search
- (%A,D:FILENAME.EXT) search again
- (%D,D:FILENAME.EXT) delete
- (%N,D:NEW.NXT,D:OLD.OXT) rename
- (%T,xxx) type, preserve
- (%t,xxx) type, erase
- (%+) type CR,LF
- -
- (%P,xxx) print, preserve
- (%p,xxx) print, erase
- (%B,dfc,hcx,hdx) direct BIOS call (CP/M-86)
-
- The pattern-directed READ operation is worthy of attention. There are
- several forms of the READ function-skeleton %R,
-
- (%R) read from default
- (%R,D:FILENAME.EXT) read file
- (%R,source,pattern) read until match
- (%R,source,patt,skt) read, match, substitute
- (%R,source,patt,skt,skf) read, match, two options
-
- which are progressively more complex. Beginning with the first, a default
- input device is assigned, and a single line is delivered to the workspace
- with each invocation, without the terminating carriage return or line feed.
- The default will be the disk file mentioned on the command line when the
- program was loaded for execution, unless the line was blank. In that case
- the console is assigned. This form gives a simple means of communication
- that can be especially helpful in the initial stages of program development.
-
-
- -
- The next alternative in the sequence of complexity requires the programmer
- to assign a specific disk file. The latter can be a constant, as illustrated
- above, or it can be a skeleton which evaluates into the name of the disk file.
- An example would be (%R,<7>:<8>.<9>), where the disk, file and extension have
- been determined separately and bound to variables 7, 8 and 9, respectively.
- (%R,) is the same as (%R).
-
- Just as in the default case, a single line, without its terminator, is placed
- in the workspace with each execution of the skeleton. If the end of file has
- been reached, a control-Z is inserted and will be produced repeatedly each
- additional time an attempt is made to read the exhausted file. The file itself
- is buffered so that partial contents such as a single line can be read at will.
-
- Sometimes data is not divided into lines, or it may be that the program finds
- it inconvenient to receive a whole line at a time. A binary file would typify
- the former case, the scrutiny of a file word by word or by sentences would
- exemplify the second. In these cases the third form of %R is useful.
-
- The third variant, to which a pattern has been adjoined, will search the
- input stream until the first instance of the pattern is found, which will
- then be given to the workspace. Using the skeleton again will locate the
- second instance of the pattern, and so on.
- -
- If the file is exhausted and no match was obtained, all of the material read
- remains in the workspace; subsequent reads will leave the null string.
-
- The final two forms practically allow the incorporation of a whole
- preprocessor into the read command, since they allow any recognition and
- transformation that Convert is capable of expressing to be incorporated
- into the act of reading a file; furthermore this processing can be tailored
- individually for each file, and even for each instance of reading.
-
- In the next to the last case, %R leaves the value of skt if the pattern
- does match; otherwise all of the text (to the end of the file) is left. In
- the last case, skf is a skeleton which generates the text left in the
- workspace if the pattern does not match the text read.
-
- In these skeletons, if the null pattern is given [e.g. (%R,,,skt)], %R reads
- up to the next carriage return (or to the end of the file if no CR is found).
-
-
-
-
-
-
- -
- Skeletons performing writing operations vary as to the result they leave in
- the workspace. Skeletons %T and %P leave the value of their argument; %t and
- %p always erase it. In addition, %T and %t write CR and LF before sending
- their argument to the console, whereas %P, %p and %W write no more than the
- value of their arguments.
-
- If %W is not able to write on the indicated file due to lack of space, it will
- leave on the workspace the unwritten portion of its argument.
-
- (%Or,D:FILENAME.EXT) leaves the null string if it was possible to open the
- indicated file; it leaves the string "Not Found" if the file could not be
- opened.
-
- In functions %R, %W, %Or, %Ow and %C the value of the skeleton designating the
- file is not restricted to an actual disk file, in conformity with CP/M's PIP
- conventions that there may be such devices as TTY: or EOF:. The assortment
- is not exactly the same, but it includes
-
- TTY: console keyboard
- NUL: the null file
- CTR:X named counter
- MEM:X named memory pseudofile
- -
- The operations of reading via %R and writing via %W may be performed on
- these devices just as though they were disk files. They may be opened and
- closed by using %O and %C; in fact these operations are a necessity if these
- false devices are to be given a meaningful definition, or if they are to be
- removed from a program in an orderly fashion when they are no longer required.
-
- (%W,TTY:,skel) works like (%t,skel), but without the CR/LF included by %t
- in its operation.
-
- Every write operation on NUL: simply erases its argument; every read from NUL:
- returns control-Z; that is, NUL: functions as a file to which one can "write"
- without limit but which is permanently exhausted when read. This pseudofile
- is useful in compiler and assembler construction, where for test purposes one
- may not want to produce, say, an object file. In this case a variable could
- be bound at the beginning of the program to NUL: or to a disk file, and this
- variable be used in all references to the object file.
-
- MEM:X pseudofiles are dealt with in detail in CNVADV.HLP.
-
-
-
-
- -
- To open and then read a counter requires only a name, of eight or fewer
- ASCII characters. Neither counters nor memory regions use extensions.
- To write a counter we have a series of parameters, whose tail may optionally
- be discarded at any point after the counter's name. Altogether, we can write
-
- (%Or,CTR:XXXXXXXX)
- (%W,CTR:XXXXXXXX,val,incr)
- (%R,CTR:XXXXXXXX)
-
- Argument val is the initial value; its default is 0; incr is the increment,
- which may be signed and whose default is +1; the assigned value is any
- sixteen-bit integer, which is modified in modulo 2^16 arithmetic. Both of the
- parameters are ASCII strings representing decimal numbers, introduced into
- the skeleton as constants or as other skeletons which evaluate into constants
- of the required form. Every time that a counter is read, its present value
- is reported, but the increment is added to its stored value so that it will
- appear at the next reading. In the language of "C", it is a postincrementing
- counter. When a counter is opened, it is assigned default value 0, increment 1.
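- A counter's life cycle can be modelled in a few lines of Python. The sketch
- assumes sixteen-bit wraparound (values kept modulo 65536 and shown unsigned)
- and uses integers where Convert uses ASCII decimal strings; the class and
- method names are hypothetical stand-ins for %Or, %W and %R.

```python
class NamedCounter:
    """Post-incrementing counter in the spirit of the CTR: pseudofile."""

    MODULUS = 1 << 16   # assumption: sixteen-bit wraparound

    def __init__(self):
        # Corresponds to opening the counter: value 0, increment 1.
        self.value = 0
        self.increment = 1

    def write(self, value=0, increment=1):
        """Corresponds to (%W,CTR:X,val,incr); both arguments optional."""
        self.value = value % self.MODULUS
        self.increment = increment

    def read(self):
        """Corresponds to (%R,CTR:X): report the value, then step it."""
        current = self.value
        self.value = (self.value + self.increment) % self.MODULUS
        return current


counter = NamedCounter()
print(counter.read(), counter.read())   # 0 1
counter.write(10, -2)
print(counter.read(), counter.read())   # 10 8
```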
-
-
- :Directory Skeletons. It is possible to write a Convert program which will
- process a series of files, for example to pack several short files into
- a single large file. Later, with another similar program, they can be restored
- to their original condition. This process can be used advantageously to build
- up a library for Microsoft's F80, for example. Although Microsoft offers a
- library function which can combine several .REL files, there is no way to
- assemble hundreds of small .MAC files individually without overflowing the
- directory of the single disk which could hold them.
-
- To realize this sort of operation Convert offers access to the BDOS functions
- for directory and disk system access via the operators K and k in REC.
-
- (%S,D:FILENAME.EXT) initial search for file
- (%A,D:FILENAME.EXT) subsequent search for file
- (%D,D:FILENAME.EXT) delete specified files
- (%N,D:NEWFILE.EXT,D:OLDFILE.EXT) rename file
- (%Z) reset disk system
- (%Lr) get id of currently logged-in disk
- (%Lw,D) log in the given disk
-
- The %Z and %L functions are required when disk changes are contemplated.
-
- -
- The following scheme, showing a Main program and satellite, can be used as
- the basis of a Convert program which will process a series of files as shown
- by a possibly ambiguous file reference in the initial command line.
-
- [Program heading, including Name, Excludes, Comments and Logon Message]
-
- [the bulk of the program, accessed through a program called "y"]
- (()()()()) y
- ...
-
- [Gather directory entries in WS]
- (()()(0)(
- (Not Found<0>,<0>);
- (<0>,(%A,<8>:<9>)<0>):
- )) z
-
- [Main program: search for first]
- (()()(8 9)(
- (<8>:<9>,(y,(z,(%S,<8>:<9>))));
- (<9>,@:<9>):
- ))
-
- -
- On arriving at the execution of "y" the workspace contains all the relevant
- directory lines - possibly none - which were found by "z". It is advisable
- to gather them up all at once on entering the program, before the directory
- begins to change. This avoids the conflict of a new file having the same name
- as an old file, avoids the loss of the internal state which is stored in BDOS
- between calls, and avoids an error latent in some versions of CP/M.
-
- For other programs to use the workspace, this directory extract must be stored
- in a variable and parcelled out to the programs as they are required. It will
- generally be sufficient to call the following program "y" and interpose it
- between the actual program, "x", and the main program, which already calls it.
-
- [Get next name]
- (()()(1 2 3)(
- [quit if no more] (<>,);
- [parse dir entry] (<[1]>(and,<[8]>,<1>)(and,<[3]>,<2>)<[20]><3>,<<
- [process this file] >>(x,<8>:<1>.<2>)<<
- [rest of files] >><3>):
- )) y
-
- "x" finds its file name, D:FILENAME.EXT, in the workspace on entry and must
- leave a null string when it finishes, unless it returns some additional files
- for processing.
- -
- :Arithmetic Skeletons. Convert provides facilities for integer and floating
- point arithmetic (through skeleton #) and for character "arithmetic" (through
- skeleton &). The latter facilitates upper/lower case conversions, hexadecimal
- dumps and access to individual bits or groups of bits within bytes.
-
- The file CNVADV.HLP contains a detailed description of the operations possible
- with these two library skeletons.
-
- :Conditional and iterative skeletons. The fundamental skeleton forms in
- Convert are constants, variables and functions. In principle it is sufficient
- to work with this combination because functions are readily defined and
- capable of defining any construction that one wants. Thus the motivation for
- introducing further skeleton forms would have to be to offer some frequently
- used function as an inherent feature of the language. Other motives would be
- to avoid the cumbersome ritual of defining a function if a particularly
- simple action were desired, or to avoid preparing an argument for a function
- and then finding that the argument would not be used after all.
-
- Most languages have facilities for the selection of alternatives - an IF
- statement, or for an orderly repetition of some activity - a DO or a WHILE
- statement. The convenience of these constructs is widely recognized. Convert
- offers the conditional skeletons if, IF, nf and NF, and the iterative
- skeletons while, WHILE, until and UNTIL, all of which are described in
- detail in CNVADV.HLP.
-
- :Variables are numbered decimally in Convert. There is no theoretical limit
- to their range, but the practical range is 0-30 with the present structure
- of the underlying REC compiler. Few programs go beyond ten variables, many
- subsist with one or two, and it is possible to have a program without any
- variables at all.
-
-
- Even if a program binds no variables, the variable list must still be written
- as a (null) pair of parentheses. The same is true of pattern and skeleton
- definitions, so that a program quadruple (()()()()) may contain some null
- parentheses.
-
- A program should not bind a variable which it has not declared - that is,
- if a previously undefined variable appears in the pattern of one of the
- rules of a program where it may become defined, it must be declared in the
- variable list. A program in which this has not been done may work, but the
- order in which variables are pushed and popped in the internal dictionary
- will be violated, and it can only result that obscure errors will occur.
-
- A variable is designated by enclosing its number within angle brackets. <0>,
- <8>, or <15> are examples of variables. Variables are defined by patterns and
- used by skeletons. Convert combines variable matching and variable generation
- so that a variable can be defined and then matched within the same pattern.
- -
- Generally speaking, variables are defined by the constants which surround
- them; thus v<0>e would assign the value ariabl to <0> when matched against
- the word variable, or a null value when matched to ve. Since the null string
- is accepted throughout Convert, a very common type of error arises from matches
- involving a null string in a way that the programmer had not foreseen. Since
- Convert uses other pattern forms than constants, a variable need not always be
- delimited by a constant. <[9]>(and,<[3]>,<0>)<[20]> could be used to pick the
- extension out of a CP/M directory entry, for example.
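-
- The two cases above can be modelled outside Convert. What follows is a
- minimal Python sketch, not Convert code: the function names are hypothetical,
- and it assumes that <0> takes the shortest text which lets the closing
- constant match, which this description does not state explicitly.
-
```python
def match_between(prefix, suffix, text):
    """Model the pattern  prefix<0>suffix  (for example v<0>e).

    Returns the text bound to <0>, or None when the match fails.
    Assumption: <0> is the shortest string that lets `suffix` match.
    """
    if not text.startswith(prefix):
        return None
    end = text.find(suffix, len(prefix))
    if end < 0:
        return None
    return text[len(prefix):end]


def directory_extension(entry):
    """Model <[9]>(and,<[3]>,<0>)<[20]> on a 32-character directory entry.

    Skips 9 characters, binds the next 3, and requires 20 more to follow.
    """
    if len(entry) < 9 + 3 + 20:
        return None
    return entry[9:12]
```
-
- Here match_between("v", "e", "variable") gives "ariabl", while the same call
- on "ve" gives the null string rather than a failure, which is exactly the
- kind of match that tends to surprise the programmer.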
-
- Persons who like long or arbitrary variable names will be disappointed with
- Convert. When CONVERT was built over a LISP substrate, it was possible to make
- arbitrary choices of variable names; this was because LISP contained a kind
- of preprocessor which parsed individual atoms and replaced them by their own
- address in a dictionary. A string parser does not necessarily want to isolate
- atoms, so another symbolism must be found.
-
- Other persons find that it is much easier to work with a very concise symbolism
- even though it means reusing the same few symbols over and over again in each
- different context. Thus one may see a complicated file of Convert programs all
- of which use the same series of variables <0>, <1>, ... up to the number that
- the given program requires. It is not hard to become accustomed to this way of
- thinking.
-
- :Intervals. There is a kind of pattern which is technically a variable in the
- formal syntax of Convert, but which is not treated as a variable in this
- description because it is not assigned a name, and so cannot be referred to
- again, either in the same pattern or in the paired skeleton of its rule. Its
- value is not lost entirely; it can be named by participating in an AND
- together with a named variable. We call these unnamed variables intervals,
- of which there are three kinds:
-
- <--> indefinite interval
- <[n]> interval of length n
- <> null interval (no more text in workspace)
-
- The indefinite interval simply allows us to skip over uninteresting parts of a
- text - for example an end-of-file embedded in a line of whatsoever length can
- be detected by <-->(^Z).
-
- Tabular information can be broken into columns by specifying an interval of a
- determined length. Such a pattern is often paired with a named variable through
- an and, as in (and,<[8]>,<0>). In this sense it is a predicate, describing some
- property of a string, and it is reasonable that it should be compounded using
- the Boolean AND.
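-
- As an illustration only, the indefinite and the fixed-length interval can be
- imitated in Python. The names below are hypothetical, and the sketch models
- the described behaviour rather than the way Convert implements it.
-
```python
CTRL_Z = "\x1a"  # CP/M end-of-file mark, written (^Z) in Convert


def has_eof_mark(line):
    """Model <-->(^Z): skip any amount of text, then require a ^Z."""
    return CTRL_Z in line


def take_field(text, width=8):
    """Model (and,<[8]>,<0>) followed by a variable holding the remainder.

    Returns (field, rest), or None if fewer than `width` characters remain.
    """
    if len(text) < width:
        return None
    return text[:width], text[width:]
```
-
- take_field fails outright on a text shorter than the requested width, just
- as an interval of length n cannot match fewer than n characters.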
-
- -
- A null pattern will always match a null interval, but the null designator <>
- refers to something slightly different; it requires that it be matched to the
- entire remainder of the workspace, and that that remainder be null. Thus the
- rule
- (<>,there<'>s nothing here)
-
- will guarantee that the workspace is empty and say so; the rule
-
- (,goodbye)
-
- will ALWAYS succeed. It is a useful final rule to give a function a value.
-
- The reason that we have to insist explicitly on a null workspace has to do
- with a quirk of the pattern matching process which usually simplifies programs
- and makes them run faster. A final VARIABLE, including <-->, will always match
- the entire remainder of the workspace. A final CONSTANT, including definite
- intervals, will only require corresponding text, but will remain indifferent to
- any text or lack thereof which follows it. This is why a null pattern will
- match anything: every text, even an empty one, begins with the null string.
- Historically, CONVERT programs spent far too much time seeking character by
- character for the end of the text when it corresponded to a final variable.
- Likewise, it is bothersome to append a final <--> to a text when only its
- beginning interests us.
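-
- The difference between a final constant, the null pattern and the null
- interval <> can be summarized by a small Python model. It is a sketch under
- the rules just stated; the function names are invented for the illustration.
-
```python
def matches_final_constant(constant, workspace):
    """A final constant only has to agree with the front of the workspace."""
    return workspace.startswith(constant)


def matches_null_pattern(workspace):
    """The null pattern is a constant of length zero, so it always matches."""
    return matches_final_constant("", workspace)


def matches_null_interval(workspace):
    """<> demands that the entire remaining workspace be empty."""
    return workspace == ""
```
-
- With these definitions the rule (,goodbye) corresponds to
- matches_null_pattern, which is true for every workspace, whereas the rule
- beginning with <> corresponds to matches_null_interval.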
- -
- There is another context in which the null-seeker <> is implicit. Suppose
- that we have defined balanced parentheses using the following two pattern
- definitions:
-
- [non-parenthesis] ((and,<[1]>,(nor,<(>,<)>)) n
- [balanced parenthesis] (<(>(ITR,(or,<:p:>,<:n:>))<)>) p
-
- and that we want to bind the variable <0> to the contents of a parenthesis
- pair. We would then define
-
- [parenthesis interior] ((and,<:p:>,<(><0><)>)) i
-
- where the second pattern in the "and" has implicitly the form <(><0><)><>,
- which ensures that the right parenthesis following <0> is the final right
- parenthesis that <:p:> picked out, not the first one to be found; the
- implicit <> is required by the definition of "and".
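-
- To see why the final right parenthesis matters, consider this Python sketch
- of what pattern i is meant to extract. It is only a model (the function name
- is hypothetical, and depth counting replaces Convert's recursive pattern p).
-
```python
def paren_interior(text):
    """Return the contents of the balanced pair which opens at text[0].

    Returns None if text does not begin with "(" or the pair never closes.
    """
    if not text.startswith("("):
        return None
    depth = 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                # This is the parenthesis that closes the outermost pair.
                return text[1:i]
    return None
```
-
- For the text (a(b)c)d the interior is a(b)c. Stopping at the first right
- parenthesis found would give a(b instead, which is the error that the
- implicit <> prevents.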
-
- :The working part of a Convert quadruple is its rule set, the fourth member
- of the quadruple. There is not much point to a null rule set, but it is
- quite possible for a program to depend on a single rule. The set has the
- form
- (
- (p1,s1):
- (p2,s2);
- ...
- (pn,sn);
- )
-
- The outer parentheses are necessary; they define the set. The inner parentheses
- are also necessary, for they define the rule. It is a matter of one's personal
- preference as to where the external parentheses are located, but it is good
- programming practice to adopt a definite style and follow it, as it makes
- errors much easier to spot when reviewing a program. In contrast, once the
- rule has commenced, every space, tab, line feed or what not counts. The comma
- which separates the pattern from the skeleton is a prominent part of the rule
- and must be placed accurately. Commas which occur within the pattern are also
- crucial in their placement. Commas which are really constants and part of the
- text must be quoted: <,>.
-
- -
- The colons or semicolons which immediately follow the rules are essential, and
- are an inheritance from REC. They determine whether a rule is to be repeated,
- when the colon is used, or whether the program has terminated, signalled by the
- semicolon. Rules are tried out in sequence, from top to bottom, left to right
- as they are written on paper. Several short rules can occupy the same line if
- convenient. If no rule applies, the workspace is left unchanged. This is still
- another way that the execution of a program may terminate.
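-
- The control flow just described can be sketched in Python. This is a model
- under stated assumptions, not the REC implementation: each rule is
- represented by a hypothetical function which returns the new workspace or
- None on failure, paired with a flag that is True for ":" and False for ";".
- The step limit is an addition of the sketch, to guard against unending loops.
-
```python
def run_rule_set(rules, workspace, max_steps=10_000):
    """Apply the first rule that matches; repeat on ":", stop on ";"."""
    for _ in range(max_steps):
        for rewrite, repeat in rules:
            result = rewrite(workspace)
            if result is None:
                continue          # pattern failed; try the next rule
            workspace = result
            if not repeat:
                return workspace  # ";" rule: the program terminates
            break                 # ":" rule: start again from the first rule
        else:
            return workspace      # no rule applied: workspace left unchanged
    raise RuntimeError("rule set did not terminate")


def squeeze_once(workspace):
    """Drop one space from the first run of two or more spaces."""
    i = workspace.find("  ")
    return None if i < 0 else workspace[:i] + workspace[i + 1:]
```
-
- Running run_rule_set([(squeeze_once, True)], "a   b  c") repeats the single
- rule until it fails and then returns "a b c", in the manner of a rule ending
- in a colon.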
-
- There are some fine points to be considered in designing a rule set. Using a
- colon to produce an iterative transformation of the workspace supposes that
- the whole workspace is available for the transformation. If only a part of
- the workspace is to be transformed, the remainder has to be preserved, and
- it is then preferable to make a recursive call, even though it is to the
- same rule set. This makes counting the number of preserved fragments
- automatic.
-
- Since Convert is a pattern directed language, and the pattern is presented
- by example and not in some other way, the rules must be stated precisely.
- Outside the rule more freedom is permitted, allowing the rule set to be
- formatted according to what the programmer considers attractive. Comments,
- enclosed in square brackets, have no effect on the compilation and their
- liberal use will enhance the program's quality.
-
- :Debugging Aids. Convert offers several aids to debugging programs. As with
- any language, programs that don't contain errors don't have to be debugged.
- This may look like redundant advice, but it is sound. Good programming habits
- reduce errors. Although there are free-form aspects to Convert, it is still
- amenable to developing typical program formats and adhering to them. They
- promote good programming, and make lapses easier to detect. The patterns
-
- (PWS) [print remaining workspace]
- (PWS,mssg) [identify printed workspace with message]
- (HLT) [stop program, wait for keystroke]
- (HLT,mssg) [print message, then halt]
- (PVR,n) [print value of variable n]
- (NOP,p) [null pattern, disables p]
-
- are provided primarily for debugging. The rule
-
- ((PWS,subroutine: )(or),);
-
- placed at the beginning of a program allows tracing by showing the workspace as
- each subroutine is entered, as well as the evolution of the workspace as
- repetitive rules change it. If a given rule hangs up a calculation, it can be
- found by interspersing this same rule among the normal rules.
- -
- On the skeleton side of a rule, the only debugging aids are the skeleton
- functions %t and %T which display their argument on the console (the former
- nulls its argument, the latter returns it unchanged). %T may be applied to a
- problematical skeleton to see what kind of a result it produces when evaluated.
- If a function is under suspicion, the combination
-
- (%T,(f,(%T,...)))
-
- will show both the argument and the result. To display messages without their
- remaining in the workspace, %t may be used:
-
- (%t,...)
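-
- The behaviour of the two display functions can be imitated in Python. The
- names tee_keep and tee_drop are hypothetical, and the log parameter is an
- assumption of the sketch that stands in for the console.
-
```python
def tee_keep(value, log=print):
    """Model %T: display the argument and return it unchanged."""
    log(value)
    return value


def tee_drop(value, log=print):
    """Model %t: display the argument and return the null string."""
    log(value)
    return ""


def trace_call(f, argument, log=print):
    """Model (%T,(f,(%T,...))): show the argument, then the result of f."""
    return tee_keep(f(tee_keep(argument, log)), log)
```
-
- trace_call(str.upper, "abc") displays abc and then ABC, and returns ABC, so
- the traced expression can stay in place without altering the result.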
-
- Selective printing suffices to find the majority of errors in Convert programs
- which are syntactically correct but poorly designed or simply erroneous.
- There are occasional errors which are due to flaws in the support
- programs which Convert uses. These are gradually being caught and eliminated,
- but the possibility always exists that an error has occurred on this more
- fundamental level.
-
-
-
- -
- There are certain possibilities for error arising out of the resemblance
- between the symbolism of REC and the symbolism of Convert, which are activated
- when a bit of source code passes directly from Convert to REC. For example, the
- notation <:0:> for a defined pattern reference can be miswritten <0:>,
- causing ":" to pass unchanged to REC, where it may well provoke an unending
- loop.
-
- Another source of error lies in the use of double angle brackets to generate
- continuation lines for patterns or skeletons. Consider the code
-
- (pattern,<<
- >>skeleton 1<<
- >>skeleton 2>>
- >>);
-
- The double angle following skeleton 2 is reversed; such errors are common.
- Note also that the double angles are effective only within a pattern or within
- a skeleton; they cannot bridge across from one to the other. Thus,
-
- <<(pattern, skeleton)>>
-
- cannot be used to "comment out" a rule.
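-
- A reversed double angle can be caught mechanically before compilation. The
- following Python check is a hypothetical helper, not part of Convert; it
- assumes that every << must be closed by a >> before the next << appears, and
- it ignores the possibility of quoted angle brackets.
-
```python
import re


def continuation_ok(source):
    """Return True if << and >> alternate strictly, starting with <<."""
    inside = False  # True while between a << and its closing >>
    for marker in re.finditer(r"<<|>>", source):
        if marker.group() == "<<":
            if inside:
                return False  # a second << before the first was closed
            inside = True
        else:
            if not inside:
                return False  # a >> with no << open, as after skeleton 2
            inside = False
    return not inside
```
-
- Applied to the faulty rule shown above, the check fails at the >> which
- follows skeleton 2; with that marker written as << it succeeds.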
-
- -
- If a program cannot be debugged with judicious use of print statements, and
- following it with DDT appears too formidable, it is useful to inspect the
- intermediate REC code. If sections appear more like Convert code than REC code,
- a parenthesis, angle bracket or quote is probably unbalanced. It should be
- located and corrected, and the compilation tried anew.
-
- It is possible to put much more selective print statements or comments in the
- REC intermediate, using the operator T or inserting 'message'TL at appropriate
- points. Of course, they will be lost when the intermediate is discarded in
- favor of a new one once the Convert source is corrected and recompiled. One
- should therefore not expend the effort to make elaborate insertions unless the
- problem is very difficult, the intermediate is renamed to save it, or one is
- willing to forgo a new compilation until significant progress has been made
- in correcting the difficulties in the program.
-
-
- :Performance. As a very rough estimate, a REC program compiles into three times
- as many bytes as the source code. This factor is reduced according to the ratio
- of comments to program, and when large quantities of text are quoted. Comments
- result in no compiled code; quoted material goes over byte for byte plus a tiny
- overhead to load it onto the pushdown list at execution time.
-
- Convert programs typically produce 30% more REC object code than there was
- source code, again modified by the presence of comments and the ratio of
- constant patterns and skeletons to the use of variables and Boolean composites.
- To this one must add the initializing code inserted by the compiler, occupying
- about 1K. Once loaded and ready to execute, a Convert program takes up about
- 3.5 times as many bytes as in the source file, plus 2K of initializing code
- and between 2K and 12K of library routines, depending on the program's
- requirements as determined by its use of library calls.
-
- As a practical matter, the maximum size Convert program (in source) that a
- REC compiler can execute (without recourse to overlays) is about a quarter
- of the compilation area size at REC's disposal, which is about 23 or 24K
- for REC80 and up to 60K for REC86.
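-
- The rules of thumb above amount to simple arithmetic, written out here as a
- Python sketch. The function names are hypothetical, and the constants are
- only the rough estimates quoted in this section.
-
```python
K = 1024  # bytes per K


def loaded_size(source_bytes, library_bytes):
    """Estimate memory taken by a Convert program ready to execute.

    About 3.5 times the source size, plus 2K of initializing code, plus
    the library routines (quoted above as between 2K and 12K).
    """
    return 3.5 * source_bytes + 2 * K + library_bytes


def max_source_size(compilation_area_bytes):
    """Largest source that can be run: about a quarter of the area."""
    return compilation_area_bytes / 4
```
-
- By these estimates a 4K source with the smallest library occupies about 18K
- when loaded, and a 24K compilation area admits about 6K of Convert source.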
-
-
-
- -
- Many of the simple programs that can be written in Convert can also be written
- in assembly language, and the amounts of resultant code compared. In very
- general terms, the Convert version is larger by a factor of ten to twenty,
- tending to twenty.
- A similar inflation is experienced in some other "high level" languages - the
- "C" compiler for example. The reason for this inflation is readily found in
- the formal expansion of certain structures which have to be fairly intricate
- to handle general cases and can be much reduced in particular instances.
-
- Programming time in Convert can be extremely short - minutes in some cases. The
- inflation in program size is fully recovered when considering development time
- for a program.
-
- Execution time tends to follow program size. Programs written in Convert to
- process files handle them at a rate measured in lines per second - slow compared
- to good assembly language programming, not unacceptable in an absolute sense,
- and vastly faster than doing the same job manually through an editor.
-
-
- :Bibliography.
-
- [CNVRT.HLP]
- [Harold V. McIntosh, 11 March 1984]
- [Rev.: G. Cisneros, 23 January 1986]
-
- [Convert is a chain oriented adaptation of the LISP based CONVERT in
-
- Adolfo Guzman and Harold V. McIntosh
- CONVERT
- Communications of the ACM 9 604-615 (1966).]
-
-
-
- :[end]