home *** CD-ROM | disk | FTP | other *** search
- Ch. 4. The Workspace, Part I (I, Q, F, D, J, Z)
- Ch. 4. The Workspace, Part II (E, U, V, M, j, z, A, B, q, Y)
- Ch. 4. The Workspace, Part III (a, b, e, f, w, <, >)
- : Chapter 4. The Workspace
-
- The workspace (WS) is a linear array of bytes of up to 64K, otherwise limited
- by the available memory, with which five pointers are associated: p0, p1, p2,
- p3 and p4. Given any pair of pointers p[j] and p[k], with j<k, pointer p[j]
- is inclusive and pointer p[k] is exclusive, so that null strings may be
- validly delimited in the workspace: the string delimited by pointers p[j] and
- p[k] starts at p[j] and its length is p[k]-p[j]. The absolute limits of the
- WS are defined by p0 (the lower limit) and p4 (the upper limit); the last
- address of the WS is p4-1 and the workspace length is p4-p0.
-
- The workspace can be used to carry out insertions, searches and
- modifications of character strings. The text contained by the WS begins at
- p0 and ends at p3-1, so that its total length is p3-p0; clearly p3 may not
- exceed p4. Pointers p1 and p2 are altered by the text insertion, search and
- modification operations; p1 and p2 always point to the lower and upper ends,
- respectively, of some substring of the text delimited by p0 and p3.
-
- Thus, the five workspace pointers always satisfy the following
- relationship:
-
- p0 ≤ p1 ≤ p2 ≤ p3 ≤ p4
-
- In the figure we show a representation of the workspace and its
- pointers; tick marks are intended to denote byte boundaries. The workspace
- shown contains a 7-byte string (delimited by p0 and p3); pointers p1 and p2
- delimit the single character c. The marker at p4 is required by the
- operator >, which we describe later on in this chapter.
-
- ┌─┬─┬─┬─┬─┬──┬──┬────────────────────────────────────────┬─┐
- │z a b c d CR LF│ │▒│
- └─┴─┴─┴─┴─┴──┴──┴────────────────────────────────────────┴─┘
- p0 ┘ p1 ┘ └ p2 └ p3 └ p4
-
- The initial contents of the workspace depend on the operating system.
- If the command line has the form
-
- recxx X
-
- where xx is 80 or 86 depending on the processor used and X is absent or is
- the name of a file containing a REC program to be compiled and executed, the
- workspace will be initially empty in CP/M, while under MS-DOS it will
- contain a CR, a LF, and a copy of the environment strings each terminated by
- the NUL character (a byte whose value is 0). Each environment string has the
- form VAR=value. In both cases the workspace pointers satisfy the relationship
- p0 = p1 = p2 ≤ p3 < p4.
-
- If the command line has the form
-
- recxx Y Z
-
- where xx is 80 or 86, Y is the name of a file and Z is a string consisting of
- at least one nonblank character, the initial contents of the workspace under
- CP/M will be the string Z (i.e., the command line tail); while under MS-DOS
- it will be the string Z followed by a CR, a LF and the environment strings as
- above, with the following relationship among the pointers: p0=p1=p2<p3<p4.
-
- In the MC68000 version of REC, the initial workspace contents are
- about the same as in REC86; the main differences are that argument strings
- are terminated with NULs and the arguments are separated from the environment
- strings by a single LF.
-
- Several of the operators and predicates which perform their main
- function on the workspace require one or more arguments from the PDL. All of
- them lift from it the arguments they require; some leave a result on the PDL.
-
- Next we describe the first group of predicates and operators which
- act on the workspace.
-
- Operator/Predicate Function performed
-
- I Operator. Inserts in the workspace the top PDL argument, if
- it will fit (i.e., if its length is less than p4-p3). The
- string is inserted starting at pointer p2, and the text
- originally delimited by p2 and p3 is shifted towards p4 to
- make space for the insertion. Pointers p1 and p2 are set to
- delimit the inserted text. If the text won't fit, the error
- message "WS ovfl" is issued and program execution is
- terminated.
-
- Q Operator. Copies the text delimited by p1 and p2 to the PDL.
- Both the workspace and its pointers remain unaltered.
-
- F Predicate, searches for text. The text in the workspace is
- examined from left to right beginning at p1 to see if the
- object of comparison on the PDL can be found. If so, its
- replica on the workspace is bracketed by p1 and p2 and F
- becomes true; otherwise p1 and p2 retain thir original values
- and F becomes false. In both cases the top argument is
- lifted from the PDL.
-
-
- D Operator which erases from the AT the text delimited by p1
- and p2. The text delimited by p2 and p3 is shifted towards
- p1 to close the gap generated by the deletion; when the
- operation ends p2=p1 holds.
-
- J Operator. Moves p1 to the beginning of the text, that is to
- say, assigns to p1 the value of p0.
-
- Z Operator. Moves p1 to the end of the text, i.e., assigns the
- value of p3 to p2.
-
- With these operators it becomes possible to write in REC and execute
- Markov Algorithms, which involve a series of modifications to an initially
- given text. The modifications can be described by a series of substitution
- rules of the form
-
- a1 => b1
- a2 => b2
- ...
-
- where a[i] and b[i] are (possibly null) character strings.
-
-
- In Markov algorithms, text is always searched from the left, and the
- substitution list is always consulted from top to bottom. If the chain a[i]
- is found, it is replaced by b[i], and the substitution list is consulted once
- again from the beginning. If a[i] is not found, a[i+1] is sought, until no
- further transformation is possible. Markov Algorithms also allow one or more
- substitutions to be declared terminal, conventionally by placing a point
- after the substituion arrow. Many examples and a thorough discussion may be
- found in Markov's book, "Theory of Algorithms".
-
- A Markov Algorithm may be translated directly into REC by writing
-
- J 'a' F D 'b' I :
-
- for an ordinary substitution or
-
- J 'a' F D 'b' I ;
-
- for a terminal substitution.
-
- To give a specific example, consider the problem of incrementing by
- one a number written out in binary form. A Markov algorithm to accomplish
- this is shown in the next panel.
-
- *1 => 1*
- *0 => 0*
- * => +
- 1+ => +0
- 0+ =>. 1
- + =>. 1
- => *
-
- It operates as follows: If neither a * nor a + is seen, a * is inserted at
- the front of the word (last substitution). It will then travel to the end of
- the word (first and second substitutions), changing places with 0's and 1's,
- wherupon it mutates into a + (third substitution). The + is the actual
- incrementer, traveling backwards generating a carry if it sees a 1, writing a
- 1 in place of a 0 or placing a 1 in front of the whole string.
-
- A REC program which reads in a binary number (as a string of ASCII 0s
- and 1s) and carries out this algorithm follows.
-
-
-
-
-
-
- [Markov Algorithm to increment a binary number]
- {(2573 TL;) L [new line]
- (R 13%= @L; "0"="0"T|: "1"="1"T|:
- TL "< 0, 1 or 'Return'"TL @L:) T [text]
- (@L "> "TL ""@T I
- (JZ QTL @L "*1" FD "1*" I:
- "*0" FD "0*" I:
- "*" FD "+" I:
- "1+" FD "+0" I:
- "0+" FD "1" I;
- "+" FD "1" I;
- "" FD "*" I: )
- JZQDT"1"=;L:)}
-
- It is not necessary to include J in each substitution because p1
- remains unchanged when F fails. Z is required in order to copy with Q the
- entire text to the PDL. The program stops when a null string or a single 0
- is given in response to the prompt "> " sent to the console at the beginning
- of each iteration in the main program's outer parenthesis level.
-
-
-
-
- When this program is applied to, e.g., the string 1011, the succesive
- steps in the transformation are
-
- 1011 [initial word]
- *1011
- 1*011
- 10*11
- 101*1
- 1011*
- 1011+
- 101+0
- 10+00
- 1100 [final word].
-
- Phrases can be created out of nothing only at the extreme left of the
- text, and must be moved around very laboriously because only one single
- consecutive string can be modified at a time. For example, to add two binary
- numbers, their digits first have to be intermingled, and then an adder moved
- through them. For this reason Markov Algorithms are not very practical.
- Their merit lies in their simplicity, and the ease with which theoretical
- results about computing can be established using them.
-
- : The Workspace, Part II
-
- Operator/Predicate Function performed
-
- E Predicate which becomes true if the WS contains a string
- which starts at p1 and is identical to the top PDL argument,
- in which case p2 is modified to delimit the matching text.
- If E fails, no pointer is moved, but in either case the PDL
- argument is lifted.
-
- U Predicate used to delimit an interval, given strings to
- delimit it. Given an argument on the PDL, U searches in the
- WS for a string identical to the argument, beginning at p2.
- If found, p1 will take the starting value of p2 and p2 will
- point to the beginning of the string found by U, and the
- predicate becomes true; otherwise U becomes false and the
- pointers are not moved. The top argument is lifted in either
- case. U differs from F in that its search starts at p2, and
- on succeeding the interval defined by p1 and p2 is the one
- lying between the previous (p1,p2) interval and the text
- found by U. For instance, if the workspace contains the
- string AABBCCDDEEFFGG, the program (J'BB'F 'F'U;) will result
- in pointer p1 and p2 delimiting the substring CCDDEE.
- V Predicate, similar to U in its operation. Searches in WS,
- beginning at p2, for a string matching exactly the top PDL
- argument (which will be lifted regardless of the outcome of
- the search). If found, value of p2 is altered so that it
- points to the end of the text found by V; p2 does not change
- otherwise. The difference between U and V is that U excludes
- the original interval and the text found from the interval
- defined by p1 and p2, whereas V includes betwwen p1 and p2
- the original interval and the text found. Thus, if the WS
- contains the string AABBCCDDEEFFGG, the program (J'BB'F
- 'F'V;) causes p1 and p2 to delimit the substring BBCCDDEEF.
-
- M Predicate requiring two arguments from the PDL, which define
- a lexicographic interval whose lower bound is the lower
- argument and whose upper bound is the top argument. M
- becomes true if the workspace contains a string which starts
- at p1 and is greater than or equal to the lower bound and
- smaller than or equal to the upper bound; if such a string
- exists p1 and p2 will delimit the smallest string lying in
- the interval; if M becomes false no pointers are moved, but
- in either case both PDL arguments are lifted. Lexicographic
- order is defined in the following panel.
-
- M (cont.) An n-byte string is equal to an m-byte string if and only if
- m=n and both strings consist of the same sequence of bytes;
- an n-byte string is greater than an m-byte string if there
- exists an integer k between 1 and the smaller of m and n,
- such that the strings coincide in the first k-1 bytes and the
- k-th byte of the first string is greater than the
- corresponding byte of the second string; an n-byte string is
- greater than an m-byte string if n>m and the first m
- characters of the first string are identical to the
- corresponding bytes of the second string. The ordering
- relation between bytes containing alphanumeric data is the
- one defined by the ASCII code, so that 0<1<...
- <9<...<A<B<...<Z<...<a<b<...<z. Notice that "" "xxx" M is
- always true regardless of the string xxx and causes p2 to
- take the value of p1, whereas "xxx" "" M, which would
- normally be false when xxx is nonnull, is interpreted in such
- a way that it is true only if the workspace contains a string
- that starts at p1 and is greater than or equal to xxx,
- effectively omitting the null string given as top argument
- from the comparison.
-
-
-
- j Operator, delimits a null interval at p1, i.e., assigns the
- value of p1 to p2.
-
- z Operator, delimits a null string at p2, i.e., it assigns the
- value of p2 to p1.
-
- A Predicate, makes p1 advance one byte if it can. If p1 is at
- the right end of the text (p1=p3) before the operation, A
- becomes false; otherwise p1 advances and drags p2 with it if
- necessary (so that p2<p1 never holds).
-
- B Predicate, makes p2 back up one byte if possible. If p1 is at
- the left end of the text (p1=p0), B becomes false; otherwise
- p1 backs up.
-
- q Operator which leaves two numbers on the PDL: the value of p1
- (as lower argument) and the difference p2-p1 (as top
- argument), i.e., it leaves the origin and length of the
- interval defined by p1 and p2. In the 8086 version, the
- lower argument (p1) consists of 4 bytes, the lower two
- containing p1 and the upper two containing the base address
- of the memory segment in which the workspace resides.
-
- Y Predicate requiring an argument on the PDL previously
- produced by either of the operator combinations qL or q|.
- There are several possibilities, depending on the argument
- provided and the processor used. On the 8080, if a 2-byte
- argument is given (generated by qL), Y is always true and
- assigns a value to p1, depending on the PDL value, say x, as
- follows: if x<p0, it moves p1 to p0; if x>p3, it moves both
- p1 and p2 to p3; if x>p2 but within p0 and p3, it assigns x
- to both p1 and p2, and in the remaining case, x is assigned
- to p1 and p2 remains unchanged. On the 8086, given a 4 byte
- argument (produced by qL), Y is false (and does not alter any
- of the pointers) if the segment base contained in the PDL
- argument is different from the base of the segment containing
- the WS currently in use (which may be changed as will be seen
- later on), otherwise, Y assigns values to p1 (and possibly
- p2) as above. To analyze the case where the PDL argument is
- the product of q|, call x and y the values of p1 and p2 used
- by the operator q, respectively. Then Y will be false (and
- not alter any pointers) if x<p0 or y>p3 (or if the segment
- bases are not equal, on the 8086), and true otherwise, in
- which case x is assigned to p1 and y to p2.
-
-
- We illustrate the use of these operators and predicates with a
- program which carries out the syntactical analysis of an arithmetic
- expression written out in the usual algebraic notation, in which all operands
- are decimal integer constants. The operators recognized are +, -, * and /;
- no blanks are allowed nor are + or - permitted as unary operators, but
- parenthesis may be used. (We will later extend this program to include
- constants containing a decimal point and a power of ten factor, and unary +
- and - operators.)
-
- [Arithmetic expression parser]
- {("0""9"Mz;) d [digit]
- (@d(@d:;);) q [integer]
- ('+'Ez; '-'Ez; '*'Ez; '/'Ez;) o [operator]
- (@q; '('Ez @e ')'Ez;) p [operand]
- (@p (qL @o@p L: Y;) ;) e [expression]
-
- (R 13%=""; T@J|;) J [read a line]
-
- (2573TL '> 'TL @J ""=;
- JZD I @e (A) ": yes"TL: ": no"TL:)}
-
-
-
- The main program signals with a ">" that it is ready to receive a
- string, reads a line and inserts it in the workspace after deleting the
- entire previous contents. If subroutine e recognizes an arithmetic
- expression spanning the entire text, p1 will be at the right end of the text
- so that (A) is true if the WS does contain an expression (p1 cannot advance)
- and false otherwise. Looking at the definition of an expression given in
- subroutine e, it is seen that an expression is a single operand or an
- alternating succession of operands and operators starting and ending with an
- operand (the REC operators qL and Y are used to guarantee that p1 ends up
- pointing to the first character not stasifying this definition of an
- expression); in turn, an operand may be an integer or an expression enclosed
- in parentheses, and an integer consists of one or more decimal digits
- (subroutines q and d). Subroutine q is true if there is a digit at p1
- (recognized by subroutine d) and makes p1 advance through all digits
- following the first one, if any; subroutine d is true only if there is a
- digit at the byte pointed to by p1, in which case the operator z cause p1 to
- advance to the next character. In a similar fashion, subroutine o becomes
- true if p1 points to a byte containing any one of the four symbols +, -, *
- or /.
-
- : The Workspace, Part III
-
- The last operators and predicates affecting the workspace and its
- pointers are the following:
-
- Operator/Predicate Function performed
-
- a Predicate. Given a two-byte integer n on the PDL, attempts
- to delimit an n-byte interval between p1 and p2 keeping p1
- fixed; if it can, it becomes true, lifts its argument and
- moves p2; if there are fewer than n bytes between p1 and p3,
- it becomes false, leaves p2 unchanged and replaces its PDL
- argument with the deficit, i.e., the difference n-(p3-p1).
-
- b Predicate. Given a two-byte integer n on the PDL, attempts
- to bracket between p1 and p2 an n-byte interval holding p2
- fixed, if it can, it becomes true and moves p1; if there are
- fewer than n bytes between p0 and p2, p1 remains unchanged
- and the predicate becomes false. The argument n is lifted in
- either case.
-
-
-
- e Predicate. Given a two-byte integer n on the PDL, attempts
- to extend the text by making p3 advance n bytes. If it can,
- e becomes true, p3 advances, p1 and p2 move to delimit the
- new n-byte extent and n is lifted from the PDL; otherwise no
- pointers are moved, e becomes false, and n is replaced on the
- PDL by the deficit, i.e., n-(p4-p3).
-
- f Predicate; it requires an argument on the PDL whose length we
- will call l for the sake of discussion. f replaces the text
- starting at p1 with the PDL argument if p1+l does not exceed
- p2, in which case it becomes true, leaves its argument on the
- PDL and advances p1 to the end of the replaced text. If the
- replacement would go beyond p2, f becomes false, no
- replacement takes place, p1 does not move and the argument is
- lifted from the PDL. For example, "a"(f:;) would fill all
- bytes delimited by p1 and p2 with the letter a.
-
-
-
-
-
-
-
- w Operator whose operation may take one of three forms,
- depending on the size of the argument it finds on the PDL.
- If this argument is the null string, it replaces it with a
- 10-byte argument consisting of the concatenated values of the
- WS pointers, p0 through p4. If its argument is 10 bytes
- long, assigns from it values to p0, p1, p2, p3 and p4 (but it
- does not check that the above-mentioned inequalities hold;
- the 10-byte argument should have been generated by a previous
- usage of w). Finally, if the PDL contains two 2-byte
- arguments, w assumes the top one to be a length (say l) and
- the lower one to be an address (say x); it will then replace
- both arguments with a ten-byte header containing the current
- values of p0 through p4 and defines a new workspace from x
- and l as follows: p0=p1=x, p2=p3=p4=x+l. In the 8086 version,
- the workspace headers are 12 bytes long (the two extra bytes
- being used to store the base address of the segment in which
- the WS resides) and the address argument of the third option
- may be 4 bytes long, the lower two containing an address and
- the upper two containing the base address of the segment that
- is to hold the new workspace.
-
-
-
- < Operator which closes down the workspace. The WS becomes
- restricted to the interval between pointers p1 and p2. The
- reason for this could be to restrict the editing operations
- to a smaller range, or it could be to have absolute freedom
- to work over some material before incorporating it into the
- rest of the text. As a practical matter, the text between
- pointers p2 and p3 is displaced to the upper end of the WS
- and the original values of p0 and p4 are recorded with it
- before setting up new values of the pointers, which are set
- as follows: p1 and p2 remain unchanged, p0 is set to the
- value of p1, p3 moves to p2 and p4 moves to the left of the
- string moved to the right end of the original WS.
-
- > Operator which reopens the WS (which must have been closed
- previously by <). The text lying between p0 and p3 takes the
- place of the text which lay between p1 and p2 before < was
- used; p0 and p4 are restored to their previous scope, p1 and
- p2 remain unaltered and p3 moves to the right end of the text
- brought back from the far end of the WS. Opening without
- prior closing causes an error to be noted.
-
-
-
- The program shown next parses arithmetic expressions which may
- contain floating point constants with power-of-ten factors of the form En or
- Dn, where n is an optionally signed integer and E and D denote single and
- double precision, respectively. The operators recognized by the program
- include ** and ^, which denote exponentiation; + and - are also accepted as
- unary operators.
-
- [Arithmetic expression parser]
- {("0""9"Mz;) d [digit]
- (@d:;) D [0 or more digits]
- (qL('E'Ez;'D'Ez;)(@a;;) @d@D L; Yj;) E [power of 10 factor]
- (Z< (@d@D'.'Ez; J'.'Ez@d; J@d; J>)
- >@D@E;) q [number]
- ('+'Ez; '-'Ez;) a [additive operator]
- ('*'Ez; '/'Ez;) m [mult. operator]
- ('**'Ez; '^'Ez;) i [power operator]
- (@i; @m; @a;) o [operator]
- (@q; '('Ez @e ')'Ez;) p [operand]
- (Z<(@p; J@a@p; J>)> (qL @o@p L: Y;) ;) e [expression]
-
- (R 13%= ""; T@J|;) J [read a line]
- (2573TL '> 'TL @J ""=; JZD I @e (A) ": yes" TL: ": no" TL:)}
-
- Notice the use of < and > in subroutines q and e. The purpose of <
- in both cases is to mark the beginning of the substring being parsed when
- several options are possible. In subroutine e, the fragment Z<(@p;J@a@p;J>)>
- could have been coded as qL(@p;YqL@a@p; Y)L. The advantage of using < and >
- is that the PDL is not altered in any way since the action of both operators
- is limited exclusively to the workspace.
-
- The next example shows the use of w and f to initialize to 0 all
- bytes in a memory block created on the PDL by the operator c:
-
- [creates a block and fills it with zeros]
- (c m p w 0% (f:;) w n;) 0
-
- This subroutine requires an argument on the PDL for c (the length of the
- block to be created); when done, the top PDL argument is the block's starting
- address. Observe that the operator combination pw turns the top argument
- into a workspace; this is one way to handle multiple workspaces. A very
- important item to note about w is that when used to set up a new workspace
- given an address and a length, the new workspace will appear to be completely
- filled, so that restricting it (with <) or inserting text (with I) will cause
- termination with the "WS overflow" message, unless some or all of the text
- is deleted with D. Our example above does not have this problem, since f
- replaces text rather than inserting it.
-
- The last example of the chapter shows how to fill or truncate a
- string to a specified length: it is a subroutine which leaves an 8 byte
- argument starting from the argument provided to it; if this argument is over
- 8 bytes long, it truncates it to the first 8; otherwise the predicates a, e
- and f are used to pad it with blanks on the right:
-
- [truncate or pad with blanks to 8 bytes]
- (I< (8azZD; e ' '(f:;) ;) JQD>;) 8
-
- There is a possibility that on execution this subroutine will become
- false; it will happen when the workspace is so full that predicate e cannot
- extend the text. The calling program should then take the appropriate action
- to correct the error.
-
- :[REC4.HLP]
- [(c) G. Cisneros, 1985, 1990]