CP/M Assembly Language
Part VII: Filter Programs
by Eric Meyer
Last time we put together the basic subroutines needed to
read and write text files.
Now we'll use these to construct "filter programs": programs
that read text in, process it in some way, then write the result
back out.
One use for such a program is to convert text back and forth
from WordStar document (henceforth "DOC") to non-document (plain
ASCII) form.
1. AND and/or OR
First we need to introduce a family of 8080 instructions we
avoided until now: the logical operations ANA (and), ORA (or),
and XRA (exclusive or), and their immediate cousins ANI, ORI,
XRI.
Each operates with the accumulator and another 8-bit value,
combining them one bit at a time ("bitwise") to produce a result.
Logical AND, e.g., produces a 1 if both its arguments are 1,
and a 0 otherwise. That is, 1 and 1 is 1, and anything else (1
and 0, or 0 and 0) is 0.
When applied bitwise, you will find for example that 61h AND
5Fh is 41h:
        61h = 01100001 binary
    AND 5Fh = 01011111
        ---------------
        41h = 01000001
Why is this interesting?
Well, 61h is the ASCII code for the character a, and 41h is
A. (If you don't have a nice ASCII table with hex/decimal/binary
values, make or find one.)
That is, by ANDing it with 5Fh, we have uppercased the
letter a. Because of the way the ASCII codes are assigned, upper
and lower case letters all differ only by a single bit (the 6th
from the right, "bit 5" in assembler speak), and the same trick
works for all letters.
Note, of course, that the operation ANI 5FH changes (zeros)
not only bit 5, but also bit 7, the high (parity) bit.
(You can also zero just bit 7, by using ANI 7FH, since 7Fh =
01111111b.) Remember that much of the difference between a
WordStar and a plain ASCII file is that WordStar sets the high
bit on many characters; so undoing this is part of the task of
converting between these two formats.
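For example, WordStar's "soft" carriage return is stored as 8Dh.
Worked out the same way as above, ANI 7FH turns it back into an
ordinary CR (0Dh):
        8Dh = 10001101
    AND 7Fh = 01111111
        ---------------
        0Dh = 00001101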
Logical OR produces a 1 if either argument is 1, and a 0
otherwise. So 1 or 1, 1 or 0 are both 1, while 0 or 0 is 0.
Thus you can turn on certain bits, by ORing with a certain
value. For example, we can undo what we did above:
        41h = 01000001b
     OR 20h = 00100000
        ---------------
        61h = 01100001
Thus the operation ORI 20h will lowercase a letter.
Logical XOR produces a 1 if either argument, but not both,
is 1, and a 0 otherwise.
We won't have any immediate use for this now, but you might
note that programmers commonly use XRA A to zero the accumulator,
since any value XORed with itself gives 0.
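As a small sketch of XOR at work (the label FLIP is made up here,
and the routine assumes A already holds a letter): since upper and
lower case differ only in bit 5, XRI 20H flips a letter to the
opposite case, whichever it was.
FLIP:   XRI     20H             ; Toggle bit 5: 'a' <-> 'A'
        RET
Like the other case tricks, this is only safe on letters; it would
mangle anything else.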
Let's quickly embody this case business in two routines
which you may find useful: UCASE and LCASE. These respectively
convert the ASCII value in the accumulator to upper or lower
case.
UCASE:  CPI     'a'             LCASE:  CPI     'A'
        RC                              RC
        CPI     'z'+1                   CPI     'Z'+1
        RNC                             RNC
        ANI     5FH                     ORI     20H
        RET                             RET
Note that before applying the ANI or ORI operation, we first
check to make sure the character in A is in fact a letter!
For example, in UCASE, we simply return if the value is less
than a or greater than z (not less than z+1). This is because
ANDing other characters with 5FH could change them in undesirable
ways. (It would convert - to ^M.)
The logical operations affect the flags, too: the Z flag
will be set if the result of the operation is 0, otherwise
cleared. The C flag will always be cleared. (You will often see
something like ORA A used just to clear Carry, instead of STC,
CMC.)
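These flags make the logical operations handy for testing bits.
Here is a quick sketch (the label HIBIT is hypothetical, standing
for wherever you want to go when the high bit is set) that checks
bit 7 of the character in A without losing the character:
        MOV     B,A             ; Keep a copy of the character
        ANI     80H             ; Isolate bit 7; Z is set if it was 0
        MOV     A,B             ; Get it back (MOV leaves flags alone)
        JNZ     HIBIT           ; High bit was set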
2. The Filter Program
The basic "filter" program reads a byte of text from an
input file, processes it in some way, then writes it to an output
file. It would look something like this:
; *** FILTER.ASM
; *** General Filter Program
;
BDOS    EQU     0005H           ; Basic equates
FCB1    EQU     005CH
FCB2    EQU     006CH
;
        ORG     0100H           ; Programs start here
;
START:  LXI     D,FCB1          ; Point to 1st FCB (source file)
        CALL    GCOPEN          ; Open it for reading
        JC      IOERR           ; Complain if error
        LXI     D,FCB2          ; Point to 2nd FCB (destination)
        CALL    PCOPEN          ; Open it for writing
        JC      IOERR           ; Complain if error
;
LOOP:   CALL    GETCH           ; Get a character
        JC      IOERR           ; Complain if error
        CPI     1AH             ; EOF?
        JZ      DONE            ; Quit if at end of file
        CALL    FILTER          ; Process it in some way
        JMP     LOOP            ; Keep going
;
DONE:   CALL    PCLOSE          ; Close the output file
        JC      IOERR           ; Error?
        RET                     ; All finished
;
IOERR:  RET                     ; Error? just quit, for now
;
; Here is the processing routine
;
FILTER: CALL    PUTCH           ; Just write it out, for now
        RET
;
; *** Be sure to include here the following disk
; *** file subroutines from our previous column:
; *** GETCH, PUTCH, GCOPEN, PCOPEN, PCLOSE
;
        END
If you assemble this as written here, you will have a
program called FILTER.COM that simply makes a copy of a disk
file; i.e., if you say
A>FILTER OLDFILE NEWFILE<ret>
then FILTER will read OLDFILE and construct an identical copy,
NEWFILE.
If you want, you can spruce it up a bit, by adding a signon
message like FILTER 1.0 (8/19/86) at the START, or an error
message like I/O ERROR at the IOERR routine. (Use BDOS function 9
or the SPMSG routine, described in earlier columns.)
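A minimal sketch of the signon, assuming you call BDOS function 9
directly (the label SIGNON is made up for illustration). Function 9
prints the string that DE points to, up to a terminating '$':
START:  LXI     D,SIGNON        ; Point to the message
        MVI     C,9             ; BDOS function 9: print string
        CALL    BDOS
        LXI     D,FCB1          ; Existing code continues from here
The message itself goes with the other data, somewhere before END:
SIGNON: DB      'FILTER 1.0 (8/19/86)',0DH,0AH,'$'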
Of course, what we really want is to do something to the
text en route. As you can see, you can put any further code you
want at the location FILTER, which now just writes the character
out as is. For example, you can add the UCASE routine above, and
you will have a program that makes an uppercase copy of a file.
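Assuming you have typed in UCASE from section 1, the uppercasing
version of the processing routine might be as simple as this:
FILTER: CALL    UCASE           ; Lowercase letters become uppercase
        CALL    PUTCH           ; Write out the result
        RET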
3. The WordStar To ASCII Filter
To get the FILTER program to convert a WordStar DOC to a
plain ASCII file, you have to know what's in a DOC file.
We've already said that a lot of characters have their high
bits set (such as "soft" spaces and returns), so the first thing
we want to do to them is ANI 7FH to strip that off.
But there's more than that!
For example, there are hyphens. WordStar has "soft hyphens",
which are represented by 1Eh (when not in use) or 1Fh (when in
use).
Thus you want to ignore 1Eh, and translate 1Fh to a real
hyphen. Adding this also to our FILTER routine would produce:
FILTER: ANI     7FH             ; Strip parity bit
        CPI     1EH             ; Is it dead soft hyphen?
        RZ                      ; If so, quit (ignore it)
        CPI     1FH             ; Is it live soft hyphen?
        JNZ     FLT1            ; If not, skip following
        MVI     A,'-'           ; If so, replace with '-'
FLT1:   CALL    PUTCH           ; Okay, now write it out
        RNC                     ; Return if all clear
        POP     H               ; ERROR, kill return to loop
        JMP     IOERR           ; And go here instead
If you use the code above, you will have a FILTER program
that does a pretty credible job of converting WordStar DOC to
ASCII files.
FILTER.COM will take up only 1k on disk, and will be quite
fast, and much easier to use than the equivalent program in, say,
MBASIC.
Of course, you will eventually want to add more processing,
to suit your taste. For example you may decide you want to ignore
all the funny control codes like ^S that WordStar uses for
printer functions, or instead, translate them to the actual
control codes your printer will need to perform those functions.
It's your program; you are in control.
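As one possible starting point (a sketch only: the label FLT0 is
made up, and which codes to keep is your call), the middle of the
WordStar-to-ASCII FILTER routine could be changed as follows to
throw away every control code except CR, LF and tab:
        CPI     1FH             ; Is it live soft hyphen?
        JNZ     FLT0            ; If not, go check for control codes
        MVI     A,'-'           ; If so, replace with '-'
        JMP     FLT1            ; And write it out
;
FLT0:   CPI     ' '             ; Printable (20h or above)?
        JNC     FLT1            ; Yes, write it out
        CPI     0DH             ; CR?
        JZ      FLT1            ; Keep it
        CPI     0AH             ; LF?
        JZ      FLT1            ; Keep it
        CPI     09H             ; Tab?
        RNZ                     ; No, some other control code: ignore it
;
FLT1:   CALL    PUTCH           ; Okay, now write it out
The rest of the routine stays as it was. A tab simply falls through
to FLT1, and a code like ^S (13h) returns to the main loop without
being written.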
4. Buffering Characters
Now let's consider how you might write another filter
program to go the other way.
How often have you encountered files you'd like to edit (and
reformat) with WordStar, but they're full of hard returns, so you
can't?
This is a slightly harder problem.
You don't just want to turn all hard returns into soft ones,
because there are places where you want them left hard (like the
end of a paragraph).
How can we tell when this is the case?
No routine will do this perfectly. However, if you can
assume that paragraphs are always indented (always good
practice), you can use the following pretty good rule:
A return is the end of a paragraph, and should be left hard,
if either:
(1) the next line is blank; or
(2) the next line begins with a space.
In terms of character values, this means that the next
character, after this CR and LF, is (1) another CR, or (2) a
space.
Notice that what we do with the current character (in this
case a CR) depends on the value of the character after next!
How can we cope with this?
We must be able to look ahead and see what's coming, without
affecting our position in the file: to read characters from the
source file, but then save them to read again later.
This can be done by storing them in a special little buffer,
and modifying our GETCH routine to see if there are any
characters in this buffer before going to look in the file again.
Here's the new UNGETC routine, which will "unget" a
character:
; Routine to UNGET a character, saving it for GETCH
;
UNGETC: PUSH    H               ; Save registers here
        PUSH    D               ; (if you don't do this, UNGETC
        PUSH    B               ; will be a hassle to use)
        PUSH    PSW             ; Save the character last
        LDA     BUFCNT          ; Fetch buffer count
        CPI     5               ; Already maximal?
        JNC     UNG0            ; Yes, leave it
        INR     A               ; No, increase it
        STA     BUFCNT          ; And put it back
;
UNG0:   LXI     H,UGBUF+3       ; Point from next-last
        LXI     D,UGBUF+4       ; To last position
        MVI     B,4             ; Prepare to move 4 bytes
;
UNGLP:  MOV     A,M             ; Get a byte
        STAX    D               ; Move it up ahead
        DCX     H               ; Back up
        DCX     D               ; To previous
        DCR     B               ; Count down on B
        JNZ     UNGLP           ; Loop if more to go
        POP     PSW             ; Recover new character
        STA     UGBUF           ; Put it at front of buffer
        POP     B               ; Restore the registers
        POP     D
        POP     H
        RET
;
BUFCNT: DB      0               ; Count chars in UGBUF
UGBUF:  DS      5               ; Room for 5 characters
UNGETC maintains a list of characters read, and put back for
future use, at UGBUF. The most recently read one is first, the
oldest last -- BUFCNT holds the count.
To unget a character, we increment the count, move the
existing ones ahead to make room, and then put in the new one.
(Don't try to unget more than the maximum of 5 characters, or the
earlier ones will disappear into the bit bucket.
Of course, you could make this value larger if you want.)
Now what does GETCH have to do?
; Modified GETCH routine for use with UNGETC
;
GETCH:  LDA     BUFCNT          ; Check UNGETC buffer
        ORA     A               ; Is it empty?
        JZ      FGETCH          ; If so go read file
        DCR     A               ; Decrease count
        STA     BUFCNT          ; And put it back
        MOV     E,A             ; Put count (less 1) in E
        MVI     D,0             ; Now D-E is 16-bit version
        LXI     H,UGBUF         ; Point to buffer
        DAD     D               ; Now HL points to eldest char.
        MOV     A,M             ; Get it
        STC
        CMC                     ; Clear C flag
        RET                     ; And return
;
FGETCH: ....                    ; Put the old GETCH here
If there are characters in the UGBUF buffer, we decrement
the count, then fetch the oldest one and return with it; if the
buffer is empty, we just go ahead and do the usual read from the
file.
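To see the order in which things come back, suppose the buffer is
empty and you unget first X and then Y (the letters are arbitrary,
chosen just for this walk-through):
        after UNGETC of X:   BUFCNT = 1    UGBUF = X
        after UNGETC of Y:   BUFCNT = 2    UGBUF = Y X
        GETCH returns X:     BUFCNT = 1    (Y is still at the front)
        GETCH returns Y:     BUFCNT = 0
So the characters you look ahead at should be ungotten in the same
order they were read, and GETCH will hand them back in that order.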
5. The ASCII to WordStar Filter
If you will add UNGETC, and make the above changes to GETCH,
we can now get the FILTER program to "soften" CRs more or less
properly. The processing routine will look like this:
FILTER: CPI     0DH             ; Is it a CR?
        JNZ     FLT1            ; No, just go on
        CALL    GETCH           ; Get the next char (LF?)
        JC      FLTERR          ; Error?
        MOV     D,A             ; And save it
        CALL    GETCH           ; Once more we want this one
        JC      FLTERR          ; Error?
        MOV     E,A             ; Save it too
        MOV     A,D             ; Recover the first
        CALL    UNGETC          ; Unget it
        MOV     A,E             ; Now the second
        CALL    UNGETC          ; Unget it too
        MOV     A,E             ; Okay, here it is
        CPI     0DH             ; Here goes: is it a CR?
        JZ      FLTH            ; Yes, make current CR HARD
        CPI     ' '             ; Or a space?
        JZ      FLTH            ; Yes, HARD again
;
FLTS:   MVI     A,8DH           ; No, use a SOFT CR here
        JMP     FLT1
;
FLTH:   MVI     A,0DH           ; Use a HARD CR
;
FLT1:   CALL    PUTCH           ; Write the char out
        RNC                     ; Return if all clear
;
FLTERR: POP     H               ; ERROR, kill return to LOOP
        JMP     IOERR           ; And go here, instead
If we've read a CR, and the character after next (the second
one we looked ahead at) is a space or CR, we write a hard CR;
otherwise, it gets softened. Other characters go through unaffected.
This is the central task in creating DOC files from ASCII
files. Of course you can do as much more as you want: e.g.,
soften hyphens if they occur at the end of a line (before a CR).
It's all up to you.
6. Other Applications
You probably will be able to think of other filtering tasks
as well.
One possibility is communication with various mainframe
computers, which have differing requirements for text formats.
Another is encrypting and decrypting text, using anything from a
simple substitution cipher on up.
And if you eliminate the output file routines, you can turn
the FILTER program into a simple SEARCH program that just reads
through a disk file: perhaps counting words, or looking for a
particular string and printing out every line that contains it.
You will find that the resulting program is remarkably
compact and fast.
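For instance, here is a sketch of a processing routine that counts
lines instead of writing anything (LINES is a made-up name, and this
assumes you have taken the PCOPEN and PCLOSE calls out of the main
program). Displaying the total needs number output, which we
haven't covered yet.
FILTER: CPI     0AH             ; End of a line (LF)?
        RNZ                     ; No, nothing to do
        LHLD    LINES           ; Yes, fetch the 16-bit count
        INX     H               ; Add one
        SHLD    LINES           ; And store it back
        RET
;
LINES:  DW      0               ; Number of LFs seen so far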
If you want to make it even more efficient, you can try your
hand at increasing the buffering of the GETCH and PUTCH routines.
As they stand they use a simple 128-byte DMA buffer, which means
your computer will have to alternately read data from the source,
and write to the destination, in small pieces (the BDOS does its
own buffering, in units of "blocks", usually from 1K to 4K in
size).
You can speed all this up if you use buffers larger than
this; 16K apiece would be a good choice. This would require
increasing the GCDMA and PCDMA buffers from 128 bytes to 16*1024
bytes, and modifying the read/write code in GETCH and PUTCH to do
the whole 16K a record at a time, stepping the DMA address along
in 128-byte increments. (An exercise for the stout-hearted
reader.)
7. Coming Up
Next time we'll learn how to input and output numbers.