- CP/M Assembly Language
- Part VII: Filter Programs
- by Eric Meyer
-
- Last time we put together the basic subroutines needed to
- read and write text files.
- Now we'll use these to construct "filter programs": programs
- that read text in, process it in some way, then write the result
- back out.
- One use for such a program is to convert text back and forth
- between WordStar document (henceforth "DOC") and non-document
- (plain ASCII) form.
-
-
- 1. AND and/or OR
- First we need to introduce a family of 8080 instructions we
- avoided until now: the logical operations ANA (and), ORA (or),
- and XRA (exclusive or), and their immediate cousins ANI, ORI,
- XRI.
- Each operates with the accumulator and another 8-bit value,
- combining them one bit at a time ("bitwise") to produce a result.
- Logical AND, e.g., produces a 1 if both its arguments are 1,
- and a 0 otherwise. That is, 1 and 1 is 1, and anything else (1
- and 0, or 0 and 0) is 0.
- When applied bitwise, you will find for example that 61h AND
- 5Fh is 41h:
-
-       61h = 01100001 binary
-   AND 5Fh = 01011111
-       ---------------
-       41h = 01000001
-
- Why is this interesting?
- Well, 61h is the ASCII code for the character a, and 41h is
- A. (If you don't have a nice ASCII table with hex/decimal/binary
- values, make or find one.)
- That is, by ANDing it with 5Fh, we have uppercased the
- letter a. Because of the way the ASCII codes are assigned, upper
- and lower case letters all differ only by a single bit (the 6th
- from the right, "bit 5" in assembler speak), and the same trick
- works for all letters.
- Note, of course, that the operation ANI 5FH changes (zeros)
- not only bit 5, but also bit 7, the high (parity) bit.
- (You can also zero just bit 7, by using ANI 7FH, since 7Fh =
- 01111111b.) Remember that much of the difference between a
- WordStar and a plain ASCII file is that WordStar sets the high
- bit on many characters; so undoing this is part of the task of
- converting between these two formats.
- Logical OR produces a 1 if either argument is 1, and a 0
- otherwise. So 1 or 1, 1 or 0 are both 1, while 0 or 0 is 0.
- Thus you can turn on certain bits, by ORing with a certain
- value. For example, we can undo what we did above:
-
-       41h = 01000001 binary
-    OR 20h = 00100000
-       ---------------
-       61h = 01100001
-
- Thus the operation ORI 20h will lowercase a letter.
- Logical XOR produces a 1 if either argument, but not both,
- is 1, and a 0 otherwise.
- We won't have any immediate use for this now, but you might
- note that programmers commonly use XRA A to zero the accumulator,
- since any value XORed with itself gives 0.
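-
- As a small illustration of XOR at work (a sketch only; nothing in
- the filter programs below depends on it), XRI 20H flips bit 5, so
- it swaps the case of a letter each time it is applied:
-
```asm
        MVI  A,'a'      ;A = 61h
        XRI  20H        ;flip bit 5: A = 41h, which is 'A'
        XRI  20H        ;flip it back: A = 61h, 'a' again
        XRA  A          ;A XOR A: A = 00h, Z set, Carry cleared
```
-
- XRA A takes one byte where MVI A,0 takes two, but unlike MVI it
- also changes the flags, so don't use it between a test and the
- jump that depends on it.
-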
- Let's quickly embody this case business in two routines
- which you may find useful: UCASE and LCASE. These respectively
- convert the ASCII value in the accumulator to upper or lower
- case.
-
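- UCASE: CPI 'a' ;below 'a'?
- RC ;yes, not a lowercase letter
- CPI 'z'+1 ;above 'z'?
- RNC ;yes, leave it alone
- ANI 5FH ;clear bit 5 (and bit 7)
- RET
- LCASE: CPI 'A' ;below 'A'?
- RC ;yes, not an uppercase letter
- CPI 'Z'+1 ;above 'Z'?
- RNC ;yes, leave it alone
- ORI 20H ;set bit 5
- RET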
-
- Note that before applying the ANI or ORI operation, we first
- check to make sure the character in A is in fact a letter!
- For example, in UCASE, we simply return if the value is less
- than a or greater than z (not less than z+1). This is because
- ANDing other characters with 5FH could change them in undesirable
- ways. (It would convert - to ^M.)
- The logical operations affect the flags, too: the Z flag
- will be set if the result of the operation is 0, otherwise
- cleared. The C flag will always be cleared. (You will often see
- something like ORA A used just to clear Carry, instead of STC,
- CMC.)
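-
- One way to put the Z flag to work is to test a single bit. The
- routine below is a sketch under the made-up name HIBIT: it reports
- whether bit 7 of the character in A was set, without disturbing
- the character. It uses B as scratch space.
-
```asm
HIBIT:  MOV  B,A        ;save the character; ANI will overwrite A
        ANI  80H        ;A = 80h if bit 7 was set, 00h if not
        MOV  A,B        ;recover the character (MOV leaves flags alone)
        RET             ;Z set: bit 7 was clear; Z clear: it was set
```
-
- A caller would follow CALL HIBIT with a JZ or JNZ, depending on
- which case it wants to handle.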
-
-
- 2. The Filter Program
- The basic "filter" program reads a byte of text from an
- input file, processes it in some way, then writes it to an output
- file. It would look something like this:
-
- ;*** FILTER.ASM
- ;*** General Filter Program
- ;
- BDOS EQU 0005H ;basic equates
- FCB1 EQU 005CH
- FCB2 EQU 006CH
- ;
- ORG 0100H ;programs start here
- ;
- START: LXI D,FCB1 ;point to 1st FCB
- ; (source file)
- CALL GCOPEN ;open it for reading
- JC IOERR ;complain if error
- LXI D,FCB2 ;point to 2nd FCB
- ; (destination)
- CALL PCOPEN ;open it for writing
- JC IOERR ;complain if error
- ;
- LOOP: CALL GETCH ;get a character
- JC IOERR ;complain if error
- CPI 1AH ;EOF?
- JZ DONE ;quit if at end of file
- CALL FILTER ;process it in some way
- JMP LOOP ;keep going
- ;
- DONE: CALL PCLOSE ;close the output file
- JC IOERR ;error?
- RET ;all finished
- ;
- IOERR: RET ;error? just quit, for now
- ;
- ;Here is the processing routine
- FILTER: CALL PUTCH ;just write it out, for now
- RET
- ;
- ;*** Be sure to include here the following disk
- ;*** file subroutines from our previous column:
- ;*** GETCH, PUTCH, GCOPEN, PCOPEN, PCLOSE
- ;
- END
-
- If you assemble this as written here, you will have a
- program called FILTER.COM that simply makes a copy of a disk
- file; i.e., if you say
-
- A>filter oldfile newfile<cr>
-
- then FILTER will read OLDFILE and construct an identical copy
- NEWFILE.
- If you want, you can spruce it up a bit, by adding a signon
- message like FILTER 1.0 (8/19/86) at the START, or an error
- message like I/O ERROR at the IOERR routine. (Use BDOS function 9
- or the SPMSG routine, described in earlier columns.)
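-
- Using BDOS function 9, which prints the string DE points to up to
- a terminating '$', those two additions might look like the sketch
- below. The labels SIGNON and ERRMSG are made up for this example;
- the rest of the program is assumed to be unchanged.
-
```asm
START:  LXI  D,SIGNON   ;point to the sign-on message
        MVI  C,9        ;BDOS function 9: print string up to '$'
        CALL BDOS
        ;the original START code (LXI D,FCB1 and so on) follows here

IOERR:  LXI  D,ERRMSG   ;point to the error message
        MVI  C,9
        CALL BDOS
        RET             ;then quit as before

SIGNON: DB   'FILTER 1.0 (8/19/86)',0DH,0AH,'$'
ERRMSG: DB   'I/O ERROR',0DH,0AH,'$'
```
-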
- Of course, what we really want is to do something to the
- text en route. As you can see, you can put any further code you
- want at the location FILTER, which now just writes the character
- out as is. For example, you can add the UCASE routine above, and
- you will have a program that makes an uppercase copy of a file.
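-
- A minimal sketch of that uppercasing version, assuming UCASE has
- been added to the program exactly as given in section 1:
-
```asm
FILTER: CALL UCASE      ;fold lowercase letters to uppercase
        CALL PUTCH      ;write the result out
        RET
```
-
- Everything that is not a lowercase letter passes through UCASE
- untouched, so the rest of the file is copied as is.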
-
-
- 3. The WordStar To ASCII Filter
- To get the FILTER program to convert a WordStar DOC to a
- plain ASCII file, you have to know what's in a DOC file.
- We've already said that a lot of characters have their high
- bits set (such as "soft" spaces and returns), so the first thing
- we want to do to them is ANI 7FH to strip that off.
- But there's more than that!
- For example, there are hyphens. WordStar has "soft hyphens",
- which are represented by 1Eh (when not in use) or 1Fh (when in
- use).
- Thus you want to ignore 1Eh, and translate 1Fh to a real
- hyphen. Adding this also to our FILTER routine would produce:
-
- FILTER: ANI 7FH ;strip parity bit
- CPI 1EH ;is it dead soft hyphen?
- RZ ;if so, quit (ignore it)
- CPI 1FH ;is it live soft hyphen?
- JNZ FLT1 ;if not, skip following
- MVI A,'-' ;if so, replace with '-'
- FLT1: CALL PUTCH ;okay, now write it out
- RNC ;return if all clear
- POP H ;ERROR, kill return to
- ; LOOP
- JMP IOERR ;and go here instead
-
- If you use this code above, you will have a FILTER program
- that does a pretty credible job of converting WordStar DOC to
- ASCII files.
- FILTER.COM will take up only 1k on disk, and will be quite
- fast, and much easier to use than the equivalent program in, say,
- MBASIC.
- Of course, you will eventually want to add more processing,
- to suit your taste. For example you may decide you want to ignore
- all the funny control codes like ^S that WordStar uses for
- printer functions, or instead, translate them to the actual
- control codes your printer will need to perform those functions.
- It's your program; you are in control.
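-
- For instance, a version that simply throws away every control
- code except CR, LF and TAB might look like the sketch below. The
- label FLT0 is new, and the choice of which codes to keep is an
- assumption you should adjust to your own files.
-
```asm
FILTER: ANI  7FH        ;strip parity bit
        CPI  1FH        ;is it live soft hyphen?
        JNZ  FLT0       ;if not, skip following
        MVI  A,'-'      ;if so, replace with '-'
FLT0:   CPI  ' '        ;20h or above?
        JNC  FLT1       ;yes, not a control code: keep it
        CPI  0DH        ;keep CR
        JZ   FLT1
        CPI  0AH        ;keep LF
        JZ   FLT1
        CPI  09H        ;keep TAB
        RNZ             ;any other control code (1Eh too): ignore it
FLT1:   CALL PUTCH      ;okay, now write it out
        RNC             ;return if all clear
        POP  H          ;ERROR, kill return to LOOP
        JMP  IOERR      ;and go here instead
```
-
- The dead soft hyphen (1Eh) no longer needs its own test, since it
- falls out with the other control codes.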
-
-
- 4. Buffering Characters
- Now let's consider how you might write another filter
- program to go the other way.
- How often have you encountered files you'd like to edit (and
- reformat) with WordStar, but they're full of hard returns, so you
- can't?
- This is a slightly harder problem.
- You don't just want to turn all hard returns into soft ones,
- because there are places where you want them left hard (like the
- end of a paragraph).
- How can we tell when this is the case?
- No routine will do this perfectly. However, if you can
- assume that paragraphs are always indented (always good
- practice), you can use the following pretty good rule:
-
- A return is the end of a paragraph, and should be left hard,
- if:
-
- (1) the next line is blank;
- (2) the next line begins with a space.
-
- In terms of character values, this means that the next
- character, after this CR and LF, is (1) another CR, or (2) a
- space.
- Notice that what we do with the current character (in this
- case a CR) depends on the value of the character after next!
- How can we cope with this?
- We must be able to look ahead and see what's coming, without
- affecting our position in the file: to read characters from the
- source file, but then save them to read again later.
- This can be done by storing them in a special little buffer,
- and modifying our GETCH routine to see if there are any
- characters in this buffer before going to look in the file again.
- Here's the new UNGETC routine, which will "unget" a
- character:
-
- ;Routine to UNGET a character, saving
- ; it for GETCH
- UNGETC: PUSH H ;save registers here
- PUSH D ;(if you don't do this,
- PUSH B ; UNGETC will be a
- ; hassle to use)
- PUSH PSW ;save the character last
- LDA BUFCNT ;fetch buffer count
- CPI 5 ;already maximal?
- JNC UNG0 ;yes, leave it
- INR A ;no, increase it
- STA BUFCNT ;and put it back
- UNG0: LXI H,UGBUF+3 ;point from next-last
- LXI D,UGBUF+4 ;to last position
- MVI B,4 ;prepare to move 4 bytes
- UNGLP: MOV A,M ;get a byte
- STAX D ;move it up ahead
- DCX H ;back up
- DCX D ;to previous
- DCR B ;count down on B
- JNZ UNGLP ;loop if more to go
- POP PSW ;recover new character
- STA UGBUF ;put it at front of
- ; buffer
- POP B ;restore
- POP D ; the
- POP H ; registers
- RET
- BUFCNT: DB 0 ;count chars in UGBUF
- UGBUF: DS 5 ;room for 5 characters
-
- UNGETC maintains a list of characters that have been read and put
- back for future use, at UGBUF. The most recently ungotten one is
- first, the oldest last -- BUFCNT holds the count.
- To unget a character, we increment the count, move the
- existing ones ahead to make room, and then put in the new one.
- (Don't try to unget more than the maximum of 5 characters, or the
- earlier ones will disappear into the bit bucket.
- Of course, you could make this value larger if you want.)
- Now what does GETCH have to do?
-
- ;Modified GETCH routine for use with UNGETC
- GETCH: LDA BUFCNT ;check UNGETC buffer
- CPI 0 ;is it empty?
- JZ FGETCH ;if so go read file
- PUSH H ;save HL and DE, so callers
- PUSH D ; can keep values in them
- DCR A ;decrease count
- STA BUFCNT ;and put it back
- MOV E,A ;put count (less 1) in E
- MVI D,0 ;now D-E is 16-bit
- ; version
- LXI H,UGBUF ;point to buffer
- DAD D ;now HL points to eldest
- ; character
- MOV A,M ;get it
- POP D ;restore the registers
- POP H
- STC
- CMC ;clear C flag
- RET ;and return
- FGETCH: .... ;put the old GETCH here
-
- If there are characters in the UGBUF buffer, we decrement
- the count, then fetch the oldest one and return with it; if the
- buffer is empty, we just go ahead and do the usual read from the
- file.
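-
- Note that the two routines together behave as a queue: characters
- come back from GETCH in the same order in which they were ungotten.
- A short trace, assuming the buffer starts out empty (the letters
- are just sample data):
-
```asm
        MVI  A,'A'
        CALL UNGETC     ;UGBUF holds: A      BUFCNT = 1
        MVI  A,'B'
        CALL UNGETC     ;UGBUF holds: B A    BUFCNT = 2
        CALL GETCH      ;returns 'A' (the oldest), BUFCNT = 1
        CALL GETCH      ;returns 'B',              BUFCNT = 0
        CALL GETCH      ;buffer empty: this one reads the file
```
-
- So if you read two characters ahead, unget them in the order you
- read them, first one first.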
-
-
- 5. The ASCII to WordStar Filter
- If you will add UNGETC, and make the above changes to GETCH,
- we can now get the FILTER program to "soften" CRs more or less
- properly. The processing routine will look like this:
-
- FILTER: CPI 0DH ;is it a CR?
- JNZ FLT1 ;no, just go on
- CALL GETCH ;get the next char (LF?)
- JC FLTERR ;error?
- MOV D,A ;and save it
- CALL GETCH ;once more: this is the one we want
- JC FLTERR ;error?
- MOV E,A ;save it too
- MOV A,D ;recover the first
- CALL UNGETC ;unget it
- MOV A,E ;now the second
- CALL UNGETC ;unget it too
- MOV A,E ;okay, here it is
- CPI 0DH ;here goes: is it a CR?
- JZ FLTH ;yes, make current CR HARD
- CPI ' ' ;or a space?
- JZ FLTH ;yes, HARD again
- FLTS: MVI A,8DH ;no, use a SOFT CR here
- JMP FLT1
- FLTH: MVI A,0DH ;use a HARD CR
- FLT1: CALL PUTCH ;write the char out
- RNC ;return if all clear
- FLTERR: POP H ;ERROR, kill return to LOOP
- JMP IOERR ;and go here, instead
-
- If we've read a CR, and the character after next (the second
- character we look ahead at) is a space or CR, we write a hard CR;
- otherwise, it gets softened. Other characters go through unaffected.
- This is the central task in creating DOC files from ASCII
- files. Of course you can do as much more as you want: e.g.,
- soften hyphens if they occur at the end of a line (before a CR).
- It's all up to you.
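-
- As a rough sketch of that hyphen idea, the test below could be
- placed at the top of the FILTER routine above. The label FLTCR is
- made up; FLT1 and FLTERR are the ones already in the routine. Be
- warned that it treats every hyphen at the end of a line as soft,
- which will also catch genuine hyphens in compound words.
-
```asm
FILTER: CPI  '-'        ;is it a hyphen?
        JNZ  FLTCR      ;no, go on to the CR test
        CALL GETCH      ;look one character ahead
        JC   FLTERR     ;error?
        CALL UNGETC     ;put it straight back (A and flags survive)
        CPI  0DH        ;was the hyphen at the end of a line?
        MVI  A,'-'      ;(MVI leaves the flags alone)
        JNZ  FLT1       ;no, write a real hyphen
        MVI  A,1FH      ;yes, write a soft hyphen "in use"
        JMP  FLT1
FLTCR:  CPI  0DH        ;is it a CR?
        ;the rest of the original FILTER routine continues from here
```
-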
-
-
- 6. Other Applications
- You probably will be able to think of other filtering tasks
- as well.
- One possibility is communication with various mainframe
- computers, which have differing requirements for text formats.
- Another is encrypting and decrypting text, using anything from a
- simple substitution cipher on up.
- And if you eliminate the output file routines, you can turn
- the FILTER program into a simple SEARCH program that just reads
- through a disk file: perhaps counting words, or looking for a
- particular string and printing out every line that contains it.
- You will find that the resulting program is remarkably
- compact and fast.
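-
- As one simple example of a program with no output file, the sketch
- below sends each character to the console with BDOS function 2
- (console output, character in E), giving a TYPE-like viewer that
- also makes WordStar files legible. It assumes you have removed the
- PCOPEN call from START and the PCLOSE call from DONE.
-
```asm
DONE:   RET             ;nothing to close any more

FILTER: ANI  7FH        ;strip the high bit
        MOV  E,A        ;BDOS wants the character in E
        MVI  C,2        ;BDOS function 2: console output
        CALL BDOS
        RET
```
-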
- If you want to make it even more efficient, you can try your
- hand at increasing the buffering of the GETCH and PUTCH routines.
- As they stand they each use a simple 128-byte DMA buffer, which means
- your computer will have to alternately read data from the source,
- and write to the destination, in small pieces (the BDOS does its
- own buffering, in units of "blocks", usually from 1K to 4K in
- size).
- You can speed all this up if you use buffers larger than
- this; 16K apiece would be a good choice. This would require
- increasing the GCDMA and PCDMA buffers from 128 bytes to 16*1024
- bytes, and modifying the read/write code in GETCH and PUTCH to do
- the whole 16K a record at a time, stepping the DMA address along
- in 128-byte increments. (An exercise for the stout-hearted
- reader.)
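-
- To give the stout-hearted a starting point, here is a sketch of
- just the read half: filling a 16K buffer one record at a time with
- BDOS function 26 (set DMA address) and function 20 (read
- sequential). The names FILL, NRECS, BIGBUF and GCFCB are all made
- up; GCFCB stands for whichever FCB your GETCH reads through.
- Reworking GETCH to take its characters from this buffer is still
- left to you.
-
```asm
NRECS   EQU  128        ;128 records x 128 bytes = 16K

FILL:   LXI  H,BIGBUF   ;HL = where the next record goes
        MVI  B,NRECS    ;B = records still wanted
FILL1:  PUSH B          ;BDOS may destroy any register
        PUSH H
        XCHG            ;DE = DMA address for this record
        MVI  C,26       ;BDOS function 26: set DMA address
        CALL BDOS
        LXI  D,GCFCB    ;point to the input FCB
        MVI  C,20       ;BDOS function 20: read sequential
        CALL BDOS
        POP  H
        POP  B
        ORA  A          ;A = 0 means a record was read
        JNZ  FILL2      ;anything else: end of file
        LXI  D,128
        DAD  D          ;step the DMA address along
        DCR  B          ;one record fewer to go
        JNZ  FILL1
FILL2:  MVI  A,NRECS
        SUB  B          ;A = number of records actually read
        RET

BIGBUF: DS   NRECS*128  ;the 16K buffer itself
```
-
- On return A holds the number of records in the buffer, anywhere
- from 0 (already at end of file) to 128 (buffer full).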
-
-
- 7. Coming Up
- Next time we'll learn how to input and output numbers.