home *** CD-ROM | disk | FTP | other *** search
-
- [HOWTOUSE.DOC]
- [Harold V. McIntosh, 28 December 1983]
-
- This disk is intended as a collection of examples for the programming
- language REC, although many of the examples are useful in their own
- right, particularly CNVRT.REC which compiles the programming language
- CONVERT. The files REC80.COM and REC86.CMD are binary files containing
- object code for REC destined respectively for the Intel 8080 and the
- Intel 8086. REC80 may be executed by the Intel 8080, Intel 8085, or the
- Zilog Z80, since it uses none of the instructions which differentiate
- these CPU's. REC86 will presumably execute in any of the Intel family
- of 8086 CPU's, that is the 8088, 80186, 80286, although in fact it has
- only been proven on Godbout's dual 8085-8088 board.
-
- REC80 partitions the available memory in the ratio 5-4-1 for compile
- area, pushdown area and workspace, after setting aside space for the
- 8080's pushdown stack and having lost whatever space was occupied by the
- binary program and the tables FXT and VRT. REC86 is configured for 8080
- mode with a 64K code segment, and a fixed partition for the available
- memory. Thus REC80 will work in almost any system - at least 16K or
- larger, but since the source code is available on the companion disk,
- either of the two versions can be reassembled to suit whatever system
- desired by changing the EQU's defining the memory partition or else
- by reprogramming the partitioning alogrithm.
-
- The assumed operating system is CP/M 2.2, on the one hand, or CP/M86
- on the other hand. However, deliberate advantage is taken of the CP/M
- transfer vector in BIOS to get direct access to console I-O. This was
- necessary in some editor-like applications of REC, for which CP/M's
- tampering with the data stream was too slow and conflicted with the
- natural editing functions of the programs involved. The source code
- contains provision to restore the CP/M I-O. We have observed that the
- direct access prevents REC programs from working with MicroShell in all
- respects. REC80 will run with CP/M 1.4, because it does not use any of
- the features which distinguish the two versions.
-
- There are some minor aspects in which the programs are dependent on
- the nature of the console equipment, again when used for editing and
- similar purposes. Although some ASCII standards are universally res-
- pected, such as for the assignment of carriage return and line feed,
- many others are not, or do not even exist. Thus screen clear, and the
- cursor movements may require some adjustment, although this adjustment
- can be made in the applications programs and would not require any
- change in the binary REC programs. As written, they have all worked
- successfully with the TeleVideo 912-C and TeleVideo 910 terminals.
-
- REC has operators which give direct access to the I-O ports. None of
- these applications programs use them. Anyone planning to use them
- would clearly have to know the configuration of his own system to do
- so. The operators in REC86 are literal transcriptions of those in
- REC80, and have not been tested for 16-bit port addresses.
-
- REC expects to find one or two file names on its command line. If it
- finds neither, it enters a conversational mode, in which it displays
- a logo with the version date. In this mode REC expressions may be
- entered, using CP/M's editing facilities [which means that you can get
- back to CP/M by typing ^C at the beginning of a blank line]. They are
- compiled and promptly executed, giving one an opportunity to experiment
- with REC as a language. Try ('hello'TL;) for example. A line may be
- edited, but once a carriage return is given, it is compiled and cannot
- be retrieved for additional changes. As many lines will be prompted as
- required to balance parentheses; there is no guarantee that the program
- entered makes sense, of course.
-
- When REC encounters a file name on the command line, it supposes that
- it is the name of a source file which is to be loaded and executed. A
- second file name is preserved and transmitted to the executing program
- in the CP/M location 05CH, which means that an application program can
- make use of CP/M's file handling facilities. REC does not preserve the
- remainder of the command line, however.
-
- There is no such thing as REC object code, because the compiler is a
- loader, which loads for immediate execution.
-
- REC, as a language, can be described rather simply. There are four
- control symbols - the two parentheses, colon and semicolon. There are
- also operations, and some of these can have truth values [true or
- false], and are designated predicates. Operations are executed in the
- sequence they are written, which can be interrupted in two ways. A
- colon means to repeat the sequence from the beginning, a semicolon means
- to quit entirely. A predicate can interrupt the sequence in one further
- way - when it is false you take up a new sequence. The new sequence can
- be written after the nearest colon or semicolon, a place which otherwise
- is inaccessible. After mentioning two idiosyncracies the description of
- the control is complete.
-
- First, several predicates in a sequence all go over to the same alternate
- sequence in the case they are false. Instead of giving a program the form
- of a binary tree in which the point at which execution has arrived is
- well defined by the hisory of previous definitions, many of the branches
- are allowed to overlap. Often this is of no concern, but it will now and
- then happen that it is necessary to know how the program got to a given
- point. Setting and testing variables is a simple resolution which goes
- outside the framework of a simple control structure - it implies a theory
- of variables. Repeating a past predicate may be inconvenient [say there
- is a very long calculation involved] or impossible [a real-time event
- has passed]. This quandray already appears in the calculation of an
- exclusive or [a and not-b or not-a and b; you may have to do one or the
- other twice]. Recognizing the existence of this point goes a long way
- toward meeting the confusion it may cause, which in many years of REC
- programming has seemed to be the only really serious source of indecision
- in laying out programs.
-
- The second point is that logical negatives are essential in setting up
- a program, and can be obtained in a slightly backhanded way by reversing
- the sense of all predicates in the final segment of a REC expression. In
- a simpler form, if p is a predicate, (p) is its negative.
-
- Indeed, boolean combinations of predicates can be realized through the
- flow of control set up by the syntactic elements we have mentioned. If
- p and q are two predicates, (p;q;) is their logical or, (pq;) is their
- and. (p(q);(p)q;) is the exclusive or but, as remarked, it may involve
- doing p or q twice. Since the evaluation of a series of predicates is
- sequential, from left to right, no more of them will be executed than is
- necessary to reach a decision. As a consequence the boolean operations are
- not really commutative, and care must be taken with their ordering. In
- practice this situation is more of an advantage than a disadvantage.
-
- Few programs can subsist without subroutines, and REC is no exception.
- The mechanism of subroutine definition is to enclose a list of the
- desired subroutines and their names, together with the program which
- is to use them, in a pair of curly braces. The combination might show
- the form {(www) a (xxx) b ... (yyy) n (zzz)}, in which the first
- parenthesis [which itself could be enclosed in braces because of some
- subroutine nesting] the definition of the subroutine a, and so on. It
- is agreed that all the subroutine definitions will be valid within the
- main program and each other, but only within the curly braces, and
- unless superceded at a lower level by a new subroutine definition on
- that level.
-
- In reality, the syntactical definition of REC says nothing whatsoever
- about the choice of operators or predicates for the language, nor about
- the symbolism used to represent them, beyond the reservation of the
- symbols which have been mentioned. It is a historical custom that REC
- has used single letters for symbols, mainly because of the convenience
- of a single keystroke in its interactive use, and the resulting economy
- of not having to parse identifiers. REC80, which is linguistically
- identical to REC86, displays a collection of operators and predicates
- assigned to slightly less than the 96 printable characters of the ASCII
- alphabet, whose detailed description is often quite intricate, and which
- may be deduced from the explanations given in the source listings. Some
- twenty years of experience, and the influence of several people have gone
- into the choice of a relatively general-purpose collection. Anyone could
- start afresh and create a new version of REC by changing its inherent
- function library; something which has been done from time to time in the
- past.
-
- The REC in this collection makes use of three principal data structures,
- to which a proportionate share of the available memory is dedicated. They
- are: the compiling area, the pushdown list, and the workspace. The first
- of these is where the program which REC compiles is normally stored;
- if access to the compiler itself is granted in the choice of functions
- [C, in the present example], the use of this area can be programmed.
- The second, the pushdown list, reflects a decision to work on the level
- of reversed Polish notation, passing arguments to functions through a
- stack and recovering results from the same place. Chains of characters
- are likely to be the principal data type used, which means that some
- provision must be made for handling variable length arguments. The
- third structure is a sort of accumulator for long character strings,
- and is serviced by specialized functions such as searches, insertions,
- and deletions.
-
- The source listings contain descriptions of each of the operators and
- predicates, and should be consulted for the details of their functioning.
- The listing is broken down into separate modules according to these
- functional groupings, namely compiler, pushdown list, workspace, main
- program and input-output.
-
- REC interacts with the CP/M operating system by treating BDOS as a
- subroutine [k, or K] to which the necessary parameters are passed.
- Space for buffers, file control blocks and the like can be taken from
- the pushdown list, the workspace, or from absolute locations in the
- memory.
-
- The disk is mostly filled with applications programs. Five of them
- are compilers for "languages," such as one would meet in a Computer
- Science course. Indeed, in Puebla they are used for exercises in just
- such courses. Consequently, a certain background in things mathematical
- - symbolic logic, for example - would probably be useful preparation
- for the examples. Be that as it may, the applications of these languages
- are in turn readily comprehensible and rather amusing. Much sport can
- be had making up Turing Machines or Markov Algorithms to do simple
- things.
-
- The languages are:
-
- Turing Machine
- Markov Algorithm
- Post Production
- LISP
- CONVERT.
-
- They range from simple to complex; by far the most attention was
- devoted to the last of these, because it is a practical language of
- many applications. So is LISP, of course, but here it was written in
- REC as an exercise, which means that one should turn to one of the
- many serious LISP processors available if the intention is to use it
- for a major undertaking.
-
- In all cases one or two sample programs are shown for these languages
- so that some comparisons can be made between the ways that each of
- these languages goes about doing things; a binary sum is one example,
- a self-compiler of the language where appropriate is another. The
- user of this disk is urged to prepare for unpleasant surprises by
- making a backup copy before doing any experimentation. The REC versions
- of the sample programs will actually run, and are intended to be
- significant and illustrative in their own right. However, the five
- programs cited above will compile them from their own source programs.
- If this compilation is botched, one will be deprived of the enjoyment
- of the sample programs until the art of compilation has been mastered;
- this may or may not take a while.
-
- One of the great stimulants for the evolution of REC in particular
- directions was the desire to have a language with which CONVERT could
- be compiled; CONVERT has previously existed as an interpreter within
- LISP. Even yet there are details to be mastered in the generation of
- search patterns for CONVERT, so that CNVRT.REC will suffice for a time.
- CONVERT is a pattern matching language whose evolution from a scheme
- such as Post Productions is relatively evident. The idea behind its
- operation is to isolate segments of a text, either because they satisfy
- some requirement or because they are situated in a prescribed envoiron-
- ment [letters between a pair of parentheses, for example]. The segments
- so isolated are to be combined into new text, preferably with the help
- of very general functions and control structures. The whole process can
- be repeated over and over until a desired result is obtained.
-
- A brief outline of CNVRT is the following. The same structure that REC
- has is followed, to facilitate compilation. Thus a string of subroutines
- is to be written terminated by a main program; CNVRT.REC will insert the
- curly braces and the set of system subroutines.
-
- Each individual program is a quadruple, (()()()()), consisting of a list
- of pattern definitions, a list of skeleton definitions, a list of the
- variables to be used, and a rule set. Some of these elements may be null,
- but they must nevertheless be represented in the quadruple by a parenthesis
- pair.
-
- Pattern and skeleton lists follow the convention of alternating the body
- of the definition with the name which will represent it. Variables are
- simply numbers, in the range 0 to 20 or so (32, minus variables used by
- the system), separated by spaces. The only penalty attached to listing
- more variables than are needed is that they will be needlessly saved and
- unsaved many times.
-
- The rule set is a sequence of pairs, each pair must be followed by a
- colon or semicolon; in the nature of REC this determines whether the
- rule is a repetitive rule or a terminal rule. The pair consists of a
- pattern and a skeleton; the pattern is a predicate which ascertains
- whether the content of REC's workspace has a prescribed form or not.
- If it does, it may have identified some strings and labelled them as
- values of the variables. A successful pattern match is followed by
- substitution in the skeleton, an unsuccessful match results in passing
- to the next rule. If no rules remain, the process halts, the workspace
- remains unchanged.
-
- The skeleton portion of the rule determines the new content of the
- workspace. The workspace may be transformed iteratively by a repetitive
- rule, in which case the same rule set is scanned anew. Among the patterns
- are functions, whose arguments fill the workspace, and whose own rule set
- determines the further course of transformation. Several functions can
- work in sequence, each building up its part of the new workspace.
-
- Patterns are primitive and composite. Primitive patterns can be constants
- or variables. Implicitly, any character string which is not otherwise a
- pattern is a constant, recognizing the same character string in the
- workspace text. An interplay between round parentheses and angular
- brackets is used to quote these syntactic elements. Variables are strings
- of characters which are identified by special symbols, and which ought to
- have been selected in advance of the pattern matching. Part of the intricacy
- of CNVRT.REC, whose development has still not been brought to completion,
- lies in the mixing of the generation of variable candidates with the search
- and matching mechanism. If they are not mixed, however, the search becomes
- even more time consuming than it already is.
-
- Composite patterns are either boolean combinations of simpler patterns, or
- pattern definitions, or recursive combinations of the two. Two special
- cases, maximal iteration and minimal iteration, handle the majority of
- applications of recursively defined patterns [a parenthesis nest, a list
- of even length, or something similar], and may indeed be sufficient for
- all such combinations.
-
- Skeletons are either constants, variables, or functions. Again, anything
- which is not explicitly something else is implicitly a constant, which
- saves a great deal of quotation. Variables are character strings which
- were dissected out of the source text by the matching pattern paired with
- the skeleton. For the moment, functions are defined externally, but with
- time and experience inherent functions will probably be included in the
- language - counterparts of such things as do, while, or if in some of the
- other popular languages.
-
- Two principal examples of the use of CNVRT are included; one is the
- traditional symbol manipulation exercise of calculating symbolic
- derivatives without deltas and epsilons, but rather from a systematic
- application of the rules for the construction of algebraic expressions
- and the rules for the derivatives of these constructions; both can be
- given a very elegant formulation in recursive terms.
-
- The second example is not complete, but is included because of its
- utility in understanding another popular language - "C". The five
- programs
-
- ASMBL.CNV
- STMNT.CNV
- DCLRN.CNV
- EXPRN.CNV
- PRGRM.CNV
-
- each illustrate one aspect of"C". They can be used to test one's
- understanding of respectively, "C", Kernighan and Ritchey's book,
- CNVRT, or any combination of the three. In particular, STMNT does
- not exactly handle "CASE" correctly; this can be easily corrected,
- it is a useful exercise to see how to do so. It is interesting that
- the five programs were written in one month's time, using appendix
- A of the book, with little previous understanding of many of the
- principal ingredients. Two months are still lacking to realize the
- hypothetical goal of establishing a "C" compiler in a new machine,
- and will evidently be spent in learning how to bring down the size
- of all the intermediates a little.
-
- As a final illustration, a CNVRT "compiler" of REC is shown, which
- in turn may be examined for some insight into how REC works.
-
- One portion of the REC compiler, which occupies about 1.5K bytes,
- does the actual compilation, and functions just about as efficiently
- as a loader would. In fact, the custom of restricting REC functions
- to single letters results in the compiled code occupying about three
- times the volume of the source code. This is understandable in terms
- of the jumps and calls, each three byte instructions, into which the
- majority of REC characters translate. Particularly in the days of
- paper tape readers or punched card readers, the saving in volume amply
- repaid the time dedicated to compilation.
-
- A second portion of REC contains functions managing the pushdown list,
- with another third involved in the operation of the workspace. Some
- space is dedicated to tables, and to input-output routines. However,
- the majority of these depend upon CP/M for their operation, which is
- a part of the system overhead. All told, REC for either machine, 8080
- or 8086, occupies 5K bytes of code.
-
- There is obviously some inefficiency resulting from compiling a program
- in REC rather than writing it in assembly language. The restriction of
- the number of functions to the ASCII alphabet has resulted in some
- ingenious combinations and programming conventions which all take their
- toll in space. As usual, the rapidity with which certain tasks can be
- programmed in any language overshadows many of the penalties which are
- paid for a systematic and stylized way of doing things.
-
- The situation becomes more pronounced in CNVRT; to program every
- possible combination within a general class without fail requires
- elaborate precautions which are hardly ever necessary in the simpler
- examples. One has only to think of the similar situation in "C",
- FORTRAN, and other languages where the generality of indexing in
- arrays leads to formulations which appear excessive when applied to
- one-dimensional character strings. CNVRT seems to inflate by a
- factor between two and five when compared to REC, again depending
- on the proportion of comments present. This can mean a factor of
- ten to twenty compared to machine language.
-
- This condition will have to be overcome before a "C" compiler written
- in CNVRT becomes a practical reality. As it would seem, the same could
- be said of a "C" compiler written in "C".
-
- The programming examples are intended to be self-sufficient. Thus,
- when executed, say by a command line such as:
-
- A>REC80 DERIV.REC
-
- they will respond with an indication of the kind of input that they
- expect. Many are one-shot, running through a single cycle of oper-
- ation before returning to CP/M, others will cycle indefinitely until
- given null input. A person who wishes to improvise some programs of
- his own should note the trick employed by the five language compilers.
- Any commentary enclosed between double square brackets - [[...]] -
- will be echoed by the program as part of its initialization. Also,
- the source programs have three blank lines near their beginning, which
- designates a preferred location for the pre-defined subroutines which
- the compiler will incorporate into its object program.
-
- In general, these two features must be found within the first 1K bytes
- of the source program, and considerable caution should be exercised
- regarding the placement of parentheses, double square brackets, and
- triple blank lines at the beginning of programs. Also, a title such
- as [PROGRAM.XXX] will be changed to [PROGRAM.REC], helping to identify
- the compiled program.
-
- The compilers require a command line of the form:
-
- A>REC80 LANGUAGE.REC PROGRAM.LNG
-
- where LNG is the extension - TNG, MKV, PST, LSP, or CNV - identifying
- the language. In each case, the program will expect the user to employ
- it as an editor, compiling the source code line by line. Inexperienced
- users will probably not want to follow this procedure; therefore when
- the language extension is omitted:
-
- A>REC80 LANGUAGE.REC PROGRAM
-
- the compiler will follow an automatic sequence which will require no
- operator intervention. However, the file must already exist on the disk
- with the appropriate extension. The usual CP/M conventions for designating
- alternative disks may be used, nor need REC80 reside on disk A. In this
- mode of operation there is no interaction via the console, and thus there
- need be no preoccupation with the cursor controls and screen clearing
- commands.
-
- [end]
-