home *** CD-ROM | disk | FTP | other *** search
-
-
- Chapter 4. The Workspace
-
- The workspace (WS) is a linear array of bytes limited by the
- available memory, with which five pointers are associated: p0, p1,
- p2, p3 and p4. Pointers p0 and p1 are inclusive and pointers p2, p3
- and p4 are exclusive, so that null strings may be validly delimited
- by the workspace pointers. The absolute limits of the WS are defined
- by p0 (the lower limit) and p4 (the upper limit); the last address of
- the WS is p4-1 and the workspace length is p4-p0.
-
- The workspace can be used to carry out insertions, searches
- and modifications of character strings. The text contained by the WS
- begins at p0 and ends at p3-1, so that its total length is p3-p0;
- clearly p3 may not exceed p4. Pointers p1 and p2 are altered by the
- text insertion, search and modification operations; p1 and p2 always
- point to the lower and upper ends, respectively, of some substring of
- the text delimited by p0 and p3.
-
- From the foregoing, it may be seen that the five workspace
- pointers always satisfy the following relationship:
-
- p0 < p1 < p2 < p3 < p4
- - - - -
-
- where p0 and p4 are the physical bounds of the WS, p0 and p3 are the
- bounds of the text contained by the workspace and p1 and p2 are
- auxiliary pointers acted upon by the operators and predicates to be
- described in this chapter.
-
- The initial contents of the workspace depend on the command
- line used to invoke REC's execution. If the command line has the
- form
-
- rec8x Y
-
- where x is 0 or 6 depending on the processor used and Y is absent or
- is the name of a file containing a REC program to be compiled and
- executed, the workspace will be initially empty, so that the
- workspace pointers satisfy the relationship p0=p1=p2=p3<p4. If the
- command line has the form
-
- rec8x Y Z
-
- where x is 0 or 6, Y is the name of a file and Z is a string
- consisting of at least one nonblank character, the initial contents
- of the workspace will be the string Z (i.e., the command line tail),
- with the following relationship among the pointers: p0=p1=p2<p3<p4.
-
- Several of the operators and predicates which perform their
- main function on the workspace require one or more arguments from the
- PDL. All of them lift from it the arguments they require; some leave
- a result on the PDL.
-
- Next we describe the first group of predicates and operators
- which act on the workspace.
-
- Operator/Predicate Function performed
-
- I Operator. Inserts in the workspace the top
- PDL argument, if it will fit (i.e., if its
- length is less than p4-p3). The string is
- inserted starting at pointer p2, and the text
- originally delimited by p2 and p3 is shifted
- towards p4 to make space for the insertion.
- Pointers p1 and p2 are set to delimit the
- inserted text. If the text won't fit, the
- error message "WS ovfl" is issued and program
- execution is terminated.
-
- Q Operator. Copies the text delimited by p1
- and p2 to the PDL. Both the workspace and
- its pointers remain unaltered.
-
- F Predicate, searches for text. The text in
- the workspace is examined from left to right
- beginning at p1 to see if the object of
- comparison on the PDL can be found. If so,
- its replica on the workspace is bracketed by
- p1 and p2 and F becomes true; otherwise p1
- and p2 retain thir original values and F
- becomes false. In both cases the top
- argument is lifted from the PDL.
-
- D Operator which erases from the AT the text
- delimited by p1 and p2. The text delimited
- by p2 and p3 is shifted towards p1 to close
- the gap generated by the deletion; when the
- operation ends p2=p1 holds.
-
- J Operator. Moves p1 to the beginning of the
- text, that is to say, assigns to p1 the value
- of p0.
-
- Z Operator. Moves p1 to the end of the text,
- i. e., assigns the value of p3 to p2.
-
- With these operators it becomes possible to write in REC and
- execute Markov Algorithms, which involve a series of modifications to
- an initially given text. The modifications can be described by a
- series of substitution rules of the form
-
- a1 => b1
- a2 => b2
- ...
-
- where a[i] and b[i] are (possibly null) character strings.
-
- Text is always searched from the left, and the substitution
- list is always consulted from top to bottom. If the chain a[i] is
- found, it is replaced by b[i], and the substitution list is consulted
- once again from the beginning. If a[i] is not found, a[i+1] is
- sought, until no further transformation is possible. Markov
- Algorithms also allow one or more substitutions to be declared
- terminal, conventionally by placing a point after the substituion
- arrow. Many examples and a thorough dicussion may be found in
- Markov's book, "Theory of Algorithms".
-
- A Markov Algorithm may be translated directly into REC by
- writing the fragment
-
- J 'a' F D 'b' I :
-
- for an ordinary substitution, and
-
- J 'a' F D 'b' I ;
-
- for a terminal substitution.
-
- To give a specific example, consider the problem of
- incrementing by one a number written out in binary form. A
- Markov algorithm to accomplish this is
-
- *1 => 1*
- *0 => 0*
- * => +
- 1+ => +0
- 0+ =>. 1
- + =>. 1
- => *
-
- and its operation is the following: If neither a * nor a + is seen,
- a * is inserted at the front of the word (last substitution). It
- will then travel to the end of the word (first and second
- substitutions), changing places with 0's and 1's, wherupon it mutates
- into a + (third substitution). The + is the actual incrementer,
- traveling backwards generating a carry if it sees a 1, writing a 1 in
- place of a 0 or placing a 1 in front of the whole string.
-
- A REC program which reads in a binary number (as a string of
- ASCII 0s and 1s) and carries out this algorithm is the following:
-
- [Markov Algorithm to increment a binary number]
- {(2573 TL;) [new line]
- (R 13%= @L; "0"="0"T|: "1"="1"T|:
- TL "< 0, 1 or 'Return'"TL @L:) T [text]
- (@L "> "TL ""@T I
- (JZ QTL @L "*1" FD "1*" I:
- "*0" FD "0*" I:
- "*" FD "+" I:
- "1+" FD "+0" I:
- "0+" FD "1" I;
- "+" FD "1" I;
- "" FD "*" I: )
- JZQDT"1"=;L:)}
-
- It is not necessary to include J in each substitution because
- p1 remains unchanged when F fails. Z is required in order to copy
- with Q the entire text to the PDL. The program stops when a null
- string or a single 0 is given in response to the prompt "> " sent to
- the console at the beginning of each iteration in the main program's
- outer parenthesis level.
-
- When this program is applied to, e.g., the string 1011, the
- succesive steps in the transformation are
-
- 1011 [initial word]
- *1011
- 1*011
- 10*11
- 101*1
- 1011*
- 1011+
- 101+0
- 10+00
- 1100 [final word].
-
- Phrases can be created out of nothing only at the extreme
- left of the text, and must be moved around very laboriously because
- only one single consecutive string can be modified at a time. For
- example, to add two binary numbers, their digits first have to be
- intermingled, and then an adder moved through them. For this reason
- Markov Algorithms are not very practical. Their merit lies in their
- simplicity, and the ease with which theoretical results about
- computing can be established using them.
-
- We now introduce a second group of operators and predicates
- referencing the text contained in the workspace.
-
- Operator/Predicate Function performed
-
- E Predicate which becomes true if the WS
- contains a string which starts at p1 and is
- identical to the top PDL argument, in which
- case p2 is modified to delimit the matching
- text. If E fails, no pointer is altered, but
- in either case the PDL argument is lifted.
-
- U Predicate used to delimit an interval, given
- strings to delimit it. Given an argument on
- the PDL, U searches in the WS for a string
- identical to the argument, beginning at p2.
- If found, p1 will take the starting value of
- p2 and p2 will point to the beginning of the
- string found by U, and the predicate becomes
- true; otherwise U becomes false and the
- pointers are not moved. The top argument is
- lifted in either case. U differs from F in
- that its search starts at p2, and on
- succeeding the interval defined by p1 and p2
- is the one lying between the previous (p1,p2)
- interval and the text found by U. For
- instance, if the workspace contains the
- string AABBCCDDEEFFGG, the program (J'BB'F
- 'F'U;) will result in pointer p1 and p2
- delimiting the substring CCDDEE.
-
- V Predicate, similar to U in its operation.
- Searches in AT, beginning at p2, for a string
- matching exactly the top PDL argument (which
- will be lifted regardless of the outcome of
- the search). If found, value of p2 is
- altered so that it points to the end of the
- text found by V; p2 does not change otherwise.
- The difference between U and V is that U
- excludes the original interval and the text
- found from the interval defined by p1 and p2,
- whereas V includes betwwen p1 and p2 the
- original interval and the text found. Thus,
- if the WS contains the string AABBCCDDEEFFGG,
- the program (J'BB'F 'F'V;) causes p1 and p2
- to delimit the substring BBCCDDEEF.
-
- M Predicate requiring two arguments from the
- PDL, which define a lexicographic interval
- whose lower bound is the lower argument and
- whose upper bound is the top argument. M
- becomes true if the workspace contains a
- string which starts at p1 and is greater than
- or equal to the lower bound and smaller than
- or equal to the upper bound; if such a string
- exists p1 and p2 will delimit the smallest
- string lying in the interval; if M becomes
- false no pointers are moved, but in either
- case both PDL arguments are lifted.
- Lexicographic order is defined as follows:
- An n-byte string is equal to an m-byte string
- if and only if m=n and both strings consist of
- the same sequence of bytes; an n-byte string
- is greater than an m-byte string if there
- exists an integer k between 1 and the smaller
- of m and n, such that the strings coincide in
- the first k-1 bytes and the k-th byte of the
- first string is greater than the corresponding
- byte of the second string; an n-byte string is
- greater than an m-byte string if n>m and the
- first m characters of the first string are
- identical to the corresponding bytes of the
- second string. The ordering relation between
- bytes containing alphanumeric data is the one
- defined by the ASCII code, so that 0<1<...
- <9<...<A<B<...<Z<...<a<b<...<z. Notice that
- "" "xxx" M is always true regardless of the
- string xxx and causes p2 to take the value of
- p1, whereas "xxx" "" M, which would normally
- be false when xxx is nonnull, is interpreted
- in such a way that it is true only if the
- workspace contains a string that starts at
- p1 and is greater than or equal to xxx,
- effectively omitting the null string given as
- top argument from the comparison.
-
- j Operator, delimits a null interval at p1,
- i.e., assigns the value of p1 to p2.
-
- z Operator, delimits a null string at p2, i.e.,
- it assigns the value of p2 to p1.
-
- A Predicate, makes p1 advance one byte if it
- can. If p1 is at the right end of the text
- (p1=p3) before the operation, A becomes false;
- otherwise p1 advances and drags p2 with it if
- necessary (so that p2<p1 never holds).
-
- B Predicate, makes p2 back up one byte if
- possible. If p1 is at the left end of the
- text (p1=p0), B becomes false; otherwise p1
- backs up.
-
- q Operator which leaves two numbers on the PDL:
- the value of p1 (as lower argument) and the
- difference p2-p1 (as top argument), i.e., it
- leaves the origin and length of the interval
- defined by p1 and p2. In the 8086 version,
- the lower argument (p1) consists of 4 bytes,
- the lower two containing p1 and the upper two
- containing the base address of the memory
- segment in which the workspace resides.
-
- Y Predicate requiring an argument on the PDL
- previously produced by either of the operator
- combinations qL or q|. There are several
- possibilities, depending on the argument
- provided and the processor used. On the 8080,
- if a 2-byte argument is given (generated by
- qL), Y is always true and assigns a value to
- p1, depending on the PDL value, say x, as
- follows: if x<p0, it moves p1 to p0; if x>p3,
- it moves both p1 and p2 to p3; if x>p2 but
- within p0 and p3, it assigns x to both p1 and
- p2, and in the remaining case, x is assigned
- to p1 and p2 remains unchanged. On the 8086,
- given a 4 byte argument (produced by qL), Y is
- false (and does not alter any of the pointers)
- if the segment base contained in the PDL
- argument is different from the base of the
- segment containing the WS currently in use
- (which may be changed as will be seen later
- on), otherwise, Y assigns values to p1 (and
- possibly p2) as above. To analyze the case
- where the PDL argument is the product of q|,
- call x and y the values of p1 and p2 used by
- the operator q, respectively. Then Y will
- be false (and not alter any pointers) if x<p0
- or y>p3 (or if the segment bases are not
- equal, on the 8086), and true otherwise, in
- which case x is assigned to p1 and y to p2.
-
- We illustrate the use of these operators and predicates with
- a program which carries out the syntactical analysis of an arithmetic
- expression written out in the usual algebraic notation, in which all
- operands are decimal integer constants. The operators recognized are
- +, -, * and /, no blanks are allowed nor are + or - permitted as
- unary operators, but parenthesis may be used. (We will later extend
- this program to include constants containing a decimal point and
- a power of ten factor, and unary + and - operators.)
-
- [Arithmetic expression parser]
- {("0""9"Mz;) d [digit]
- (@d(@d:;);) q [integer]
- ('+'Ez; '-'Ez; '*'Ez; '/'Ez;) o [operator]
- (@q; '('Ez @e ')'Ez;) p [operand]
- (@p (qL @o@p L: Y;) ;) e [expression]
-
- (R 13%=""; T@J|;) J [read a line]
-
- (2573TL '> 'TL @J ""=;
- JZD I @e (A) ": yes"TL: ": no"TL:)}
-
- The main program signals with a ">" that it is ready to
- receive a string, reads a line and inserts it in the workspace after
- deleting the entire previous contents. If subroutine e recognizes an
- arithmetic expression spanning the entire text, p1 will be at the
- right end of the text so that (A) is true if the WS does contain an
- expression (p1 cannot advance) and false otherwise. Looking at the
- definition of an expression given in subroutine e, it is seen that an
- expression is a single operand or an alternating succession of
- operands and operators starting and ending with an operand (the REC
- operators qL and Y are used to guarantee that p1 ends up pointing to
- the first character not stasifying this definition of an expression);
- in turn, an operand may be an integer or an expression enclosed in
- parentheses, and an integer consists of one or more decimal digits
- (subroutines q and d). Subroutine q is true if there is a digit at
- p1 (recognized by subroutine d) and makes p1 advance through all
- digits following the first one, if any; subroutine d is true only if
- there is a digit at the byte pointed to by p1, in which case the
- operator z cause p1 to advance to the next character. In a similar
- fashion, subroutine o becomes true if p1 points to a byte containing
- any one of the four symbols +, -, * or /.
-
- The last operators and predicates affecting the workspace and
- its pointers are the following:
-
-
- Operator/Predicate Function performed
-
- a Predicate. Given a two-byte integer n on the
- PDL, attempts to delimit an n-byte interval
- between p1 and p2 keeping p1 fixed; if it can,
- it becomes true, lifts its argument and moves
- p2; if there are fewer than n bytes between
- p1 and p3, it becomes false, leaves p2
- unchanged and replaces its PDL argument with
- the deficit, i.e., the difference n-(p3-p1).
-
- b Predicate. Given a two-byte integer n on the
- PDL, attempts to bracket between p1 and p2 an
- n-byte interval holding p2 fixed, if it can,
- it becomes true and moves p1; if there are
- fewer than n bytes between p0 and p2, p1
- remains unchanged and the predicate becomes
- false. The argument n is lifted in either
- case.
-
- e Predicate. Given a two-byte integer n on the
- PDL, attempts to extend the text by making
- p3 advance n bytes. If it can, e becomes
- true, p3 advances, p1 and p2 move to delimit
- the new n-byte extent and n is lifted from
- the PDL; otherwise no pointers are moved, e
- becomes false, and n is replaced on the PDL
- by the deficit, i.e., n-(p4-p3).
-
- f Predicate; it requires an argument on the
- PDL whose length we will call l for the sake
- of discussion. f replaces the text starting
- at p1 with the PDL argument if p1+l does not
- exceed p2, in which case it becomes true,
- leaves its argument on the PDL and advances
- p1 to the end of the replaced text. If the
- replacement would go beyond p2, f becomes
- false, no replacement takes place, p1 does
- not move and the argument is lifted from the
- PDL. For example, "a"(f:;) would fill all
- bytes delimited by p1 and p2 with the letter
- a.
-
- w Operator whose operation may take one of
- three forms, depending on the size of the
- argument it finds on the PDL. If this
- argument is the null string, it replaces it
- with a 10-byte argument consisting of the
- concatenated values of the WS pointers, p0
- through p4. If its argument is 10 bytes
- long, assigns from it values to p0, p1, p2,
- p3 and p4 (but it does not check that the
- above-mentioned inequalities hold; the
- 10-byte argument should have been generated
- by a previous usage of w). Finally, if the
- PDL contains two 2-byte arguments, w assumes
- the top one to be a length (say l) and the
- lower one to be an address (say x); it will
- then replace both arguments with a ten-byte
- header containing the current values of p0
- through p4 and defines a new workspace from
- x and l as follows: p0=p1=x, p2=p3=p4=x+l.
- In the 8086 version, the workspace headers
- are 12 bytes long (the two extra bytes being
- used to store the base address of the segment
- in which the WS resides) and the address
- argument of the third option may be 4 bytes
- long, the lower two containing an address
- and the upper two containing the base address
- of the segment that is to hold the new
- workspace.
-
- < Operator which closes down the workspace.
- The workspace becomes restricted to the
- interval between pointers p1 and p2. The
- reason for this could be to restrict the
- editing operations to a smaller range, or
- it could be to have absolute freedom to
- work over some material before incorporating
- it into the rest of the text. As a practical
- matter, the text between pointers p2 and p3
- is displaced to the upper end of the
- workspace and the original values of p0 and
- p4 are recorded with it before setting up
- new values of the pointers, which are set as
- follows: p1 and p2 remain unchanged, p0 is
- set to the value of p1, p3 moves to p2 and
- p4 moves to the left of the string moved
- to the right end of the original workspace.
-
- > Operator which reopens the workspace (which
- must have been closed previously by <). The
- text lying between p0 and p3 takes the place
- of the text which lay between p1 and p2
- before < was used; p0 and p4 are restored to
- their previous scope, p1 and p2 remain
- unaltered and p3 moves to the right end of
- the text brought back from the far end of the
- workspace. If > is executed without a prior
- execution of <, no action takes place and a
- note is made of the error.
-
- The program shown next parses arithmetic expressions which
- may contain floating point constants with power-of-ten factors of the
- form En or Dn, where n is an optionally signed integer and E and D
- denote single and double precision, respectively. The operators
- recognized by the program include ** and ^, which denote
- exponentiation; + and - are also accepted as unary operators.
-
- [Arithmetic expression parser]
- {("0""9"Mz;) d [digit]
- (@d:;) D [0 or more digits]
- (qL('E'Ez;'D'Ez;)(@a;;) @d@D L; Yj;) E [power of 10 factor]
- (Z< (@d@D'.'Ez; J'.'Ez@d; J@d; J>)
- >@D@E;) q [number]
- ('+'Ez; '-'Ez;) a [additive operator]
- ('*'Ez; '/'Ez;) m [mult. operator]
- ('**'Ez; '^'Ez;) i [power operator]
- (@i; @m; @a;) o [operator]
- (@q; '('Ez @e ')'Ez;) p [operand]
- (Z<(@p; J@a@p; J>)>
- (qL @o@p L: Y;) ;) e [expression]
-
- (R 13%= ""; T@J|;) J [read a line]
- (2573TL '> 'TL @J ""=;
- JZD I @e (A) ": yes" TL: ": no" TL:)}
-
- Notice the use of < and > in subroutines q and e. The
- purpose of < in both cases is to mark the beginning of the substring
- being parsed when several options are possible. In subroutine e,
- the fragment Z<(@p;J@a@p;J>)> could have been coded as qL(@p;YqL@a@p;
- Y)L. The advantage of using < and > is that the PDL is not altered
- in any way since the action of both operators is limited exclusively
- to the workspace.
-
- The next example shows the use of w and f to initialize to 0
- all bytes in a memory block created on the PDL by the operator c:
-
- [creates a block and fills it with zeros]
- (c m p w 0% (f:;) w n;) 0
-
- This subroutine requires an argument on the PDL for c (the length of
- the block to be created); when done, the top PDL argument is the
- block's starting address. Observe that the operator combination pw
- turns the top argument into a workspace; this is one way to handle
- multiple workspaces.
-
- The following subroutine leaves an 8 byte argument starting
- from the argument provided to it; if this argument is over 8 bytes
- long, it truncates it to the first 8; otherwise the predicates a, e
- and f are used to pad it with blanks on the right:
-
- [truncate or pad with blanks to 8 bytes]
- (I< (8azZD; e ' '(f:;) ;_) JQD>;) 8
-
- The operator _ performs an unconditional return to CP/M, the
- operating system; this is included as a pretection in case e turns
- out to be false (which would mean there is no space in the workspace
- to carry out the operation).