Programmer 7500

home *** CD-ROM | disk | FTP | other *** search

/ Programmer 7500 / MAX_PROGRAMMERS.iso / INFO / MISC / REC.ZIP / RECC.DOC < prev next >

Wrap

Text File | 1986-12-01 | 22.5 KB | 545 lines

Chapter 4. The Workspace The workspace (WS) is a linear array of bytes limited by the available memory, with which five pointers are associated: p0, p1, p2, p3 and p4. Pointers p0 and p1 are inclusive and pointers p2, p3 and p4 are exclusive, so that null strings may be validly delimited by the workspace pointers. The absolute limits of the WS are defined by p0 (the lower limit) and p4 (the upper limit); the last address of the WS is p4-1 and the workspace length is p4-p0. The workspace can be used to carry out insertions, searches and modifications of character strings. The text contained by the WS begins at p0 and ends at p3-1, so that its total length is p3-p0; clearly p3 may not exceed p4. Pointers p1 and p2 are altered by the text insertion, search and modification operations; p1 and p2 always point to the lower and upper ends, respectively, of some substring of the text delimited by p0 and p3. From the foregoing, it may be seen that the five workspace pointers always satisfy the following relationship: p0 < p1 < p2 < p3 < p4 - - - - where p0 and p4 are the physical bounds of the WS, p0 and p3 are the bounds of the text contained by the workspace and p1 and p2 are auxiliary pointers acted upon by the operators and predicates to be described in this chapter. The initial contents of the workspace depend on the command line used to invoke REC's execution. If the command line has the form rec8x Y where x is 0 or 6 depending on the processor used and Y is absent or is the name of a file containing a REC program to be compiled and executed, the workspace will be initially empty, so that the workspace pointers satisfy the relationship p0=p1=p2=p3<p4. If the command line has the form rec8x Y Z where x is 0 or 6, Y is the name of a file and Z is a string consisting of at least one nonblank character, the initial contents of the workspace will be the string Z (i.e., the command line tail), with the following relationship among the pointers: p0=p1=p2<p3<p4. Several of the operators and predicates which perform their main function on the workspace require one or more arguments from the PDL. All of them lift from it the arguments they require; some leave a result on the PDL. Next we describe the first group of predicates and operators which act on the workspace. Operator/Predicate Function performed I Operator. Inserts in the workspace the top PDL argument, if it will fit (i.e., if its length is less than p4-p3). The string is inserted starting at pointer p2, and the text originally delimited by p2 and p3 is shifted towards p4 to make space for the insertion. Pointers p1 and p2 are set to delimit the inserted text. If the text won't fit, the error message "WS ovfl" is issued and program execution is terminated. Q Operator. Copies the text delimited by p1 and p2 to the PDL. Both the workspace and its pointers remain unaltered. F Predicate, searches for text. The text in the workspace is examined from left to right beginning at p1 to see if the object of comparison on the PDL can be found. If so, its replica on the workspace is bracketed by p1 and p2 and F becomes true; otherwise p1 and p2 retain thir original values and F becomes false. In both cases the top argument is lifted from the PDL. D Operator which erases from the AT the text delimited by p1 and p2. The text delimited by p2 and p3 is shifted towards p1 to close the gap generated by the deletion; when the operation ends p2=p1 holds. J Operator. Moves p1 to the beginning of the text, that is to say, assigns to p1 the value of p0. Z Operator. Moves p1 to the end of the text, i. e., assigns the value of p3 to p2. With these operators it becomes possible to write in REC and execute Markov Algorithms, which involve a series of modifications to an initially given text. The modifications can be described by a series of substitution rules of the form a1 => b1 a2 => b2 ... where a[i] and b[i] are (possibly null) character strings. Text is always searched from the left, and the substitution list is always consulted from top to bottom. If the chain a[i] is found, it is replaced by b[i], and the substitution list is consulted once again from the beginning. If a[i] is not found, a[i+1] is sought, until no further transformation is possible. Markov Algorithms also allow one or more substitutions to be declared terminal, conventionally by placing a point after the substituion arrow. Many examples and a thorough dicussion may be found in Markov's book, "Theory of Algorithms". A Markov Algorithm may be translated directly into REC by writing the fragment J 'a' F D 'b' I : for an ordinary substitution, and J 'a' F D 'b' I ; for a terminal substitution. To give a specific example, consider the problem of incrementing by one a number written out in binary form. A Markov algorithm to accomplish this is *1 => 1* *0 => 0* * => + 1+ => +0 0+ =>. 1 + =>. 1 => * and its operation is the following: If neither a * nor a + is seen, a * is inserted at the front of the word (last substitution). It will then travel to the end of the word (first and second substitutions), changing places with 0's and 1's, wherupon it mutates into a + (third substitution). The + is the actual incrementer, traveling backwards generating a carry if it sees a 1, writing a 1 in place of a 0 or placing a 1 in front of the whole string. A REC program which reads in a binary number (as a string of ASCII 0s and 1s) and carries out this algorithm is the following: [Markov Algorithm to increment a binary number] {(2573 TL;) [new line] (R 13%= @L; "0"="0"T|: "1"="1"T|: TL "< 0, 1 or 'Return'"TL @L:) T [text] (@L "> "TL ""@T I (JZ QTL @L "*1" FD "1*" I: "*0" FD "0*" I: "*" FD "+" I: "1+" FD "+0" I: "0+" FD "1" I; "+" FD "1" I; "" FD "*" I: ) JZQDT"1"=;L:)} It is not necessary to include J in each substitution because p1 remains unchanged when F fails. Z is required in order to copy with Q the entire text to the PDL. The program stops when a null string or a single 0 is given in response to the prompt "> " sent to the console at the beginning of each iteration in the main program's outer parenthesis level. When this program is applied to, e.g., the string 1011, the succesive steps in the transformation are 1011 [initial word] *1011 1*011 10*11 101*1 1011* 1011+ 101+0 10+00 1100 [final word]. Phrases can be created out of nothing only at the extreme left of the text, and must be moved around very laboriously because only one single consecutive string can be modified at a time. For example, to add two binary numbers, their digits first have to be intermingled, and then an adder moved through them. For this reason Markov Algorithms are not very practical. Their merit lies in their simplicity, and the ease with which theoretical results about computing can be established using them. We now introduce a second group of operators and predicates referencing the text contained in the workspace. Operator/Predicate Function performed E Predicate which becomes true if the WS contains a string which starts at p1 and is identical to the top PDL argument, in which case p2 is modified to delimit the matching text. If E fails, no pointer is altered, but in either case the PDL argument is lifted. U Predicate used to delimit an interval, given strings to delimit it. Given an argument on the PDL, U searches in the WS for a string identical to the argument, beginning at p2. If found, p1 will take the starting value of p2 and p2 will point to the beginning of the string found by U, and the predicate becomes true; otherwise U becomes false and the pointers are not moved. The top argument is lifted in either case. U differs from F in that its search starts at p2, and on succeeding the interval defined by p1 and p2 is the one lying between the previous (p1,p2) interval and the text found by U. For instance, if the workspace contains the string AABBCCDDEEFFGG, the program (J'BB'F 'F'U;) will result in pointer p1 and p2 delimiting the substring CCDDEE. V Predicate, similar to U in its operation. Searches in AT, beginning at p2, for a string matching exactly the top PDL argument (which will be lifted regardless of the outcome of the search). If found, value of p2 is altered so that it points to the end of the text found by V; p2 does not change otherwise. The difference between U and V is that U excludes the original interval and the text found from the interval defined by p1 and p2, whereas V includes betwwen p1 and p2 the original interval and the text found. Thus, if the WS contains the string AABBCCDDEEFFGG, the program (J'BB'F 'F'V;) causes p1 and p2 to delimit the substring BBCCDDEEF. M Predicate requiring two arguments from the PDL, which define a lexicographic interval whose lower bound is the lower argument and whose upper bound is the top argument. M becomes true if the workspace contains a string which starts at p1 and is greater than or equal to the lower bound and smaller than or equal to the upper bound; if such a string exists p1 and p2 will delimit the smallest string lying in the interval; if M becomes false no pointers are moved, but in either case both PDL arguments are lifted. Lexicographic order is defined as follows: An n-byte string is equal to an m-byte string if and only if m=n and both strings consist of the same sequence of bytes; an n-byte string is greater than an m-byte string if there exists an integer k between 1 and the smaller of m and n, such that the strings coincide in the first k-1 bytes and the k-th byte of the first string is greater than the corresponding byte of the second string; an n-byte string is greater than an m-byte string if n>m and the first m characters of the first string are identical to the corresponding bytes of the second string. The ordering relation between bytes containing alphanumeric data is the one defined by the ASCII code, so that 0<1<... <9<...<A<B<...<Z<...<a<b<...<z. Notice that "" "xxx" M is always true regardless of the string xxx and causes p2 to take the value of p1, whereas "xxx" "" M, which would normally be false when xxx is nonnull, is interpreted in such a way that it is true only if the workspace contains a string that starts at p1 and is greater than or equal to xxx, effectively omitting the null string given as top argument from the comparison. j Operator, delimits a null interval at p1, i.e., assigns the value of p1 to p2. z Operator, delimits a null string at p2, i.e., it assigns the value of p2 to p1. A Predicate, makes p1 advance one byte if it can. If p1 is at the right end of the text (p1=p3) before the operation, A becomes false; otherwise p1 advances and drags p2 with it if necessary (so that p2<p1 never holds). B Predicate, makes p2 back up one byte if possible. If p1 is at the left end of the text (p1=p0), B becomes false; otherwise p1 backs up. q Operator which leaves two numbers on the PDL: the value of p1 (as lower argument) and the difference p2-p1 (as top argument), i.e., it leaves the origin and length of the interval defined by p1 and p2. In the 8086 version, the lower argument (p1) consists of 4 bytes, the lower two containing p1 and the upper two containing the base address of the memory segment in which the workspace resides. Y Predicate requiring an argument on the PDL previously produced by either of the operator combinations qL or q|. There are several possibilities, depending on the argument provided and the processor used. On the 8080, if a 2-byte argument is given (generated by qL), Y is always true and assigns a value to p1, depending on the PDL value, say x, as follows: if x<p0, it moves p1 to p0; if x>p3, it moves both p1 and p2 to p3; if x>p2 but within p0 and p3, it assigns x to both p1 and p2, and in the remaining case, x is assigned to p1 and p2 remains unchanged. On the 8086, given a 4 byte argument (produced by qL), Y is false (and does not alter any of the pointers) if the segment base contained in the PDL argument is different from the base of the segment containing the WS currently in use (which may be changed as will be seen later on), otherwise, Y assigns values to p1 (and possibly p2) as above. To analyze the case where the PDL argument is the product of q|, call x and y the values of p1 and p2 used by the operator q, respectively. Then Y will be false (and not alter any pointers) if x<p0 or y>p3 (or if the segment bases are not equal, on the 8086), and true otherwise, in which case x is assigned to p1 and y to p2. We illustrate the use of these operators and predicates with a program which carries out the syntactical analysis of an arithmetic expression written out in the usual algebraic notation, in which all operands are decimal integer constants. The operators recognized are +, -, * and /, no blanks are allowed nor are + or - permitted as unary operators, but parenthesis may be used. (We will later extend this program to include constants containing a decimal point and a power of ten factor, and unary + and - operators.) [Arithmetic expression parser] {("0""9"Mz;) d [digit] (@d(@d:;);) q [integer] ('+'Ez; '-'Ez; '*'Ez; '/'Ez;) o [operator] (@q; '('Ez @e ')'Ez;) p [operand] (@p (qL @o@p L: Y;) ;) e [expression] (R 13%=""; T@J|;) J [read a line] (2573TL '> 'TL @J ""=; JZD I @e (A) ": yes"TL: ": no"TL:)} The main program signals with a ">" that it is ready to receive a string, reads a line and inserts it in the workspace after deleting the entire previous contents. If subroutine e recognizes an arithmetic expression spanning the entire text, p1 will be at the right end of the text so that (A) is true if the WS does contain an expression (p1 cannot advance) and false otherwise. Looking at the definition of an expression given in subroutine e, it is seen that an expression is a single operand or an alternating succession of operands and operators starting and ending with an operand (the REC operators qL and Y are used to guarantee that p1 ends up pointing to the first character not stasifying this definition of an expression); in turn, an operand may be an integer or an expression enclosed in parentheses, and an integer consists of one or more decimal digits (subroutines q and d). Subroutine q is true if there is a digit at p1 (recognized by subroutine d) and makes p1 advance through all digits following the first one, if any; subroutine d is true only if there is a digit at the byte pointed to by p1, in which case the operator z cause p1 to advance to the next character. In a similar fashion, subroutine o becomes true if p1 points to a byte containing any one of the four symbols +, -, * or /. The last operators and predicates affecting the workspace and its pointers are the following: Operator/Predicate Function performed a Predicate. Given a two-byte integer n on the PDL, attempts to delimit an n-byte interval between p1 and p2 keeping p1 fixed; if it can, it becomes true, lifts its argument and moves p2; if there are fewer than n bytes between p1 and p3, it becomes false, leaves p2 unchanged and replaces its PDL argument with the deficit, i.e., the difference n-(p3-p1). b Predicate. Given a two-byte integer n on the PDL, attempts to bracket between p1 and p2 an n-byte interval holding p2 fixed, if it can, it becomes true and moves p1; if there are fewer than n bytes between p0 and p2, p1 remains unchanged and the predicate becomes false. The argument n is lifted in either case. e Predicate. Given a two-byte integer n on the PDL, attempts to extend the text by making p3 advance n bytes. If it can, e becomes true, p3 advances, p1 and p2 move to delimit the new n-byte extent and n is lifted from the PDL; otherwise no pointers are moved, e becomes false, and n is replaced on the PDL by the deficit, i.e., n-(p4-p3). f Predicate; it requires an argument on the PDL whose length we will call l for the sake of discussion. f replaces the text starting at p1 with the PDL argument if p1+l does not exceed p2, in which case it becomes true, leaves its argument on the PDL and advances p1 to the end of the replaced text. If the replacement would go beyond p2, f becomes false, no replacement takes place, p1 does not move and the argument is lifted from the PDL. For example, "a"(f:;) would fill all bytes delimited by p1 and p2 with the letter a. w Operator whose operation may take one of three forms, depending on the size of the argument it finds on the PDL. If this argument is the null string, it replaces it with a 10-byte argument consisting of the concatenated values of the WS pointers, p0 through p4. If its argument is 10 bytes long, assigns from it values to p0, p1, p2, p3 and p4 (but it does not check that the above-mentioned inequalities hold; the 10-byte argument should have been generated by a previous usage of w). Finally, if the PDL contains two 2-byte arguments, w assumes the top one to be a length (say l) and the lower one to be an address (say x); it will then replace both arguments with a ten-byte header containing the current values of p0 through p4 and defines a new workspace from x and l as follows: p0=p1=x, p2=p3=p4=x+l. In the 8086 version, the workspace headers are 12 bytes long (the two extra bytes being used to store the base address of the segment in which the WS resides) and the address argument of the third option may be 4 bytes long, the lower two containing an address and the upper two containing the base address of the segment that is to hold the new workspace. < Operator which closes down the workspace. The workspace becomes restricted to the interval between pointers p1 and p2. The reason for this could be to restrict the editing operations to a smaller range, or it could be to have absolute freedom to work over some material before incorporating it into the rest of the text. As a practical matter, the text between pointers p2 and p3 is displaced to the upper end of the workspace and the original values of p0 and p4 are recorded with it before setting up new values of the pointers, which are set as follows: p1 and p2 remain unchanged, p0 is set to the value of p1, p3 moves to p2 and p4 moves to the left of the string moved to the right end of the original workspace. > Operator which reopens the workspace (which must have been closed previously by <). The text lying between p0 and p3 takes the place of the text which lay between p1 and p2 before < was used; p0 and p4 are restored to their previous scope, p1 and p2 remain unaltered and p3 moves to the right end of the text brought back from the far end of the workspace. If > is executed without a prior execution of <, no action takes place and a note is made of the error. The program shown next parses arithmetic expressions which may contain floating point constants with power-of-ten factors of the form En or Dn, where n is an optionally signed integer and E and D denote single and double precision, respectively. The operators recognized by the program include ** and ^, which denote exponentiation; + and - are also accepted as unary operators. [Arithmetic expression parser] {("0""9"Mz;) d [digit] (@d:;) D [0 or more digits] (qL('E'Ez;'D'Ez;)(@a;;) @d@D L; Yj;) E [power of 10 factor] (Z< (@d@D'.'Ez; J'.'Ez@d; J@d; J>) >@D@E;) q [number] ('+'Ez; '-'Ez;) a [additive operator] ('*'Ez; '/'Ez;) m [mult. operator] ('**'Ez; '^'Ez;) i [power operator] (@i; @m; @a;) o [operator] (@q; '('Ez @e ')'Ez;) p [operand] (Z<(@p; J@a@p; J>)> (qL @o@p L: Y;) ;) e [expression] (R 13%= ""; T@J|;) J [read a line] (2573TL '> 'TL @J ""=; JZD I @e (A) ": yes" TL: ": no" TL:)} Notice the use of < and > in subroutines q and e. The purpose of < in both cases is to mark the beginning of the substring being parsed when several options are possible. In subroutine e, the fragment Z<(@p;J@a@p;J>)> could have been coded as qL(@p;YqL@a@p; Y)L. The advantage of using < and > is that the PDL is not altered in any way since the action of both operators is limited exclusively to the workspace. The next example shows the use of w and f to initialize to 0 all bytes in a memory block created on the PDL by the operator c: [creates a block and fills it with zeros] (c m p w 0% (f:;) w n;) 0 This subroutine requires an argument on the PDL for c (the length of the block to be created); when done, the top PDL argument is the block's starting address. Observe that the operator combination pw turns the top argument into a workspace; this is one way to handle multiple workspaces. The following subroutine leaves an 8 byte argument starting from the argument provided to it; if this argument is over 8 bytes long, it truncates it to the first 8; otherwise the predicates a, e and f are used to pad it with blanks on the right: [truncate or pad with blanks to 8 bytes] (I< (8azZD; e ' '(f:;) ;_) JQD>;) 8 The operator _ performs an unconditional return to CP/M, the operating system; this is included as a pretection in case e turns out to be false (which would mean there is no space in the workspace to carry out the operation).