Chapter 19 - Strings 199
____________________
By this time you may have become annoyed by the fact that there
is no instruction for moving data from one place in memory to
another. That is, you can't have:
mov variable2, variable1
Instead you have to have:
mov ax, variable1
mov variable2, ax
There is one instruction, however, where you can move a BLOCK of
data from one place in memory to another. It is called MOVS. As
usual with these instructions, there are two forms.
movsb
moves a byte from DS:SI to ES:DI and increments or decrements SI
and DI by one, depending on the setting of DF, the direction
flag. Notice that either both are incremented or both are
decremented. You can't have one pointer incrementing while the
other one is decrementing.
movsw
moves a word from DS:SI to ES:DI and increments or decrements SI
and DI by two, depending on the setting of DF, the direction
flag. MOVS requires the same amount of setup as all the other
string instructions we have looked at so far, so it is not
efficient to use it for just a few bytes. For 30 or so, it is very
efficient. Though the following is not a legal instruction, it
shows the effect of MOVS (apart from the changes to DI and SI):
mov WORD PTR es:[di], ds:[si] ; or BYTE PTR for bytes
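If you find a high-level language easier to read, here is a rough C model of what MOVSB and MOVSW do. It is only a sketch: one byte array stands in for memory, so DS:SI and ES:DI become plain array indexes, and the names StringRegs, model_movsb and model_movsw are invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: a flat byte array stands in for memory, so
   DS:SI and ES:DI are reduced to two indexes into that array. */
typedef struct {
    size_t si;   /* source index */
    size_t di;   /* destination index */
    int    df;   /* direction flag: 0 = increment, 1 = decrement */
} StringRegs;

/* Model of MOVSB: copy one byte, then step SI and DI together by 1. */
void model_movsb(uint8_t *mem, StringRegs *r)
{
    mem[r->di] = mem[r->si];
    if (r->df) { r->si -= 1; r->di -= 1; }
    else       { r->si += 1; r->di += 1; }
}

/* Model of MOVSW: copy one word (two bytes), then step both by 2. */
void model_movsw(uint8_t *mem, StringRegs *r)
{
    mem[r->di]     = mem[r->si];
    mem[r->di + 1] = mem[r->si + 1];
    if (r->df) { r->si -= 2; r->di -= 2; }
    else       { r->si += 2; r->di += 2; }
}
```

Notice that both indexes always move in the same direction, which is the point made above about DF.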
We are going to write some subroutines which copy strings from
one place in memory to another.{1} But first we need to review
text strings.
The text string world is divided into Pascal and C. A Pascal
string has its length in the first byte and the first character
in the second byte. Since the length is in one byte, the string
length may only be 0 to 255. You read the first byte of the
string to get the length.
A C string can have any length. The end of a C string is marked
by a byte with the value 0. This is not the character '0', it is
the number 0 (00 hex). In order to find the end of a C string, you
need to check each character to see if it is 0h.
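To make the two layouts concrete, here is a small C sketch that stores the text "Hello" both ways and finds its length each way. The names pascal_length, c_length, pascal_hello and c_hello are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Length of a Pascal string: just read the count in the first byte. */
size_t pascal_length(const unsigned char *s)
{
    return s[0];
}

/* Length of a C string: scan until the byte with value 0 turns up. */
size_t c_length(const char *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* The same text, "Hello", stored both ways. Each takes 6 bytes. */
const unsigned char pascal_hello[] = { 5, 'H', 'e', 'l', 'l', 'o' };
const char c_hello[] = { 'H', 'e', 'l', 'l', 'o', 0 };
```

The Pascal version gets its answer in one read. The C version has to look at every character, but it is not limited to 255 characters.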
____________________
1. For the technically minded, these routines will be only
half of a real life subroutine, since they assume that the two
strings do not overlap. In robust subroutines, these routines
would be the section for when the destination address is lower
than the source address.
The PC Assembler Tutor 200
______________________
We'll do the Pascal string first. We are going to pass the
addresses of the strings. If you have the Pascal call:
move_pascal_string (from_string, to_string) ;
The first thing you need to know is that Pascal pushes things on
the stack from left to right. In other words, Pascal will
generate the following code:
lea ax, from_string
push ax
lea ax, to_string
push ax
call move_pascal_string
We will start by assuming both near data (all data is in DS), and
near subroutines. After setting up BP, the stack will look like
this:{2}
      from_string address     bp + 6
      to_string address       bp + 4
      old IP                  bp + 2
bp -> old BP                  bp + 0
Here's the subroutine. Remember, PUSHREGS and POPREGS are macros:
; ----------
move_pascal_string proc near
FROM_PTR EQU [bp+6]
TO_PTR EQU [bp+4]
push bp ; set up bp
mov bp, sp
pushf ; push the flags
PUSHREGS cx, si, di, es ; push the registers
push ds ; move ds to es
pop es
mov si, FROM_PTR ; load pointers
mov di, TO_PTR
cld ; clear DF (increment)
sub cx, cx ; zero cx
mov cl, [si] ; length to cl
inc cx ; increment count by one
rep movsb ; the actual move
POPREGS cx, si, di, es
popf ; pop the flags
pop bp
ret (4) ; pop pointers and return
move_pascal_string endp
; ----------
____________________
2. If you forgot about BP, go back to the chapter on
subroutines.
____________________
The count is increased by one since we need to move not only the
text, but the count itself. If the length is 0, we still need to
move one byte - the count byte. The value in DS is moved to ES
with a PUSH and a POP. You cannot move directly from one segment
register to another. Also, at the return we POP 4 bytes (2 words)
to get the pointers off the stack. Remember, in Pascal, it is the
subroutine's responsibility to get rid of the arguments from a
subroutine call.
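The reason for the extra byte may be easier to see in a C sketch of the same copy. This models only the REP MOVSB part of the routine with DF cleared, it assumes the strings do not overlap, and the name copy_pascal_string is invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the copy in move_pascal_string: move the length byte
   plus that many characters. Assumes the strings do not overlap. */
void copy_pascal_string(const unsigned char *from, unsigned char *to)
{
    size_t count = from[0];   /* sub cx, cx / mov cl, [si] */
    count += 1;               /* inc cx: include the length byte */
    while (count != 0) {      /* rep movsb with DF = 0 */
        *to++ = *from++;
        count--;
    }
}
```

With a zero-length string the loop still runs once, so the destination receives the count byte 0 and nothing else.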
You will notice that this time we saved the flags register. Why?
Because we are clearing DF. When we return from the subroutine,
we want DF to be exactly the same as it was on entry to the
subroutine. The calling program may have DF set in some special
way and we don't want to interfere with that.
There are three flags which I will call 'hard' flags. Once they
are set they keep their setting until an instruction explicitly
changes them. These are (1) TF, the trap flag, (2) IEF, the
interrupt enable flag, and (3) DF, the direction flag.
The 'soft' flags are CF, OF, ZF, etc. If you call a subroutine
you expect CF, OF, ZF etc. to be unreliable, but you expect these
three 'hard' flags to remain the same. TF is the domain of a
debugger, so it is none of your business. IEF is only of interest
to you if you are writing an interrupt procedure.{3} The third
one, DF, is your concern. If you use DF in a subroutine, you MUST
save the flags to ensure that the DF flag has the same value at
the return that it had on entry.
Now for the C subroutine. If we have a C subroutine call:
move_c_string ( from_string, to_string ) ;
C pushes things on the stack from right to left (the exact
opposite of Pascal). The C compiler will generate the following
code.
lea ax, to_string
push ax
lea ax, from_string
push ax
call move_c_string
add sp, 4
After setting up BP, the stack will look like this:
      to_string address       bp + 6
      from_string address     bp + 4
      old IP                  bp + 2
bp -> old BP                  bp + 0
____________________
3. If you do an interrupt procedure you don't have to worry
because INT automatically saves the flags while clearing IEF, and
IRET restores the flags on exiting.
______________________
Here's the C subroutine:
; ----------
move_c_string proc near
FROM_PTR EQU [bp+4]
TO_PTR EQU [bp+6]
push bp ; set up bp
mov bp, sp
pushf ; push the flags
PUSHREGS ax, si, di, es ; push the registers
push ds ; move ds to es
pop es
mov si, FROM_PTR ; load pointers
mov di, TO_PTR
cld ; clear DF (increment)
move_loop:
lodsb ; source to al
stosb ; al to destination
and al, al ; check for 0
jnz move_loop
POPREGS ax, si, di, es
popf ; pop the flags
pop bp
ret
move_c_string endp
; ----------
We set up the routine the same way, but we cannot use MOVSB. We
need to check each individual byte to see if it is 0 hex, so we
move it to AL, move it from AL to the destination, and then check
AL for 0. Also note that we did not pop the addresses off the
stack with the return statement, since in C it is the calling
program's responsibility to do that. If you look at the calling
code above, you will see:
add sp, 4
which gets rid of the two pointers from the stack. Remember, the
stack grows downward, so you ADD to decrease the size of the
stack.
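Here is the copy loop again as a C sketch, in case the order of the steps is not obvious. The store comes before the test, so the terminating 0 is copied along with the text. The name copy_c_string is made up, and the sketch assumes the two strings do not overlap.

```c
#include <assert.h>

/* Sketch of move_loop: the byte is stored before it is tested, so
   the terminating 0 byte is copied too. Assumes no overlap. */
void copy_c_string(const char *from, char *to)
{
    char al;
    do {
        al = *from++;    /* lodsb */
        *to++ = al;      /* stosb */
    } while (al != 0);   /* and al, al / jnz move_loop */
}
```

If the test came before the store, the destination would be left without its end marker.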
Let's do the same Pascal program again, but this time use long
pointers, that is, give both the segment and offset of the
string. This means that we will be able to move from any place in
memory to any place in memory. Here is the calling code.
mov ax, seg from_string
push ax
mov ax, offset from_string
push ax
mov ax, seg to_string
push ax
mov ax, offset to_string
push ax
call move_pascal_string
We will still keep it a near subroutine. After setting up BP, the
stack will look like this:
      from_string segment     bp + 10
      from_string offset      bp + 8
      to_string segment       bp + 6
      to_string offset        bp + 4
      old IP                  bp + 2
bp -> old BP                  bp + 0
Here's the subroutine:
; ----------
move_pascal_string proc near
FROM_PTR EQU [bp+8]
TO_PTR EQU [bp+4]
push bp ; set up bp
mov bp, sp
pushf ; push the flags
PUSHREGS cx, si, di, ds, es ; push the registers
lds si, FROM_PTR ; load pointers
les di, TO_PTR
cld ; clear DF (increment)
sub cx, cx ; zero cx
mov cl, [si] ; length to cl
inc cx ; increment count by one
rep movsb ; the actual move
POPREGS cx, si, di, ds, es
popf ; pop the flags
pop bp
ret (8) ; pop pointers and return
move_pascal_string endp
; ----------
This takes slightly less code since we load SI and DS at the same
time (with LDS -load DS) and we load DI and ES at the same time
(with LES - load ES). Remember, 8086 instructions which move an
offset:segment pair always have the offset in low memory and the
segment in high memory; the offset is the first two bytes and the
segment is the next two bytes.
We changed the EQU statements, and the return statement is now:
ret (8)
so we take 8 bytes (4 words) off the stack, but the rest is the
same.
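The memory layout that LDS and LES expect can be sketched in C as follows. The four bytes are read the way the 8086 stores them: offset word first, segment word second, each word low byte first. FarPointer and read_far_pointer are names invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* A far pointer as LDS/LES read it: offset first, then segment. */
typedef struct {
    uint16_t offset;
    uint16_t segment;
} FarPointer;

/* Decode the 4 bytes at p. Each word is stored low byte first. */
FarPointer read_far_pointer(const uint8_t *p)
{
    FarPointer fp;
    fp.offset  = (uint16_t)(p[0] | (p[1] << 8));   /* bytes 0 and 1 */
    fp.segment = (uint16_t)(p[2] | (p[3] << 8));   /* bytes 2 and 3 */
    return fp;
}
```

This is why the calling code pushes the segment before the offset: the stack grows downward, so the offset ends up at the lower address (bp + 8) and the segment at the higher one (bp + 10).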
CMPS
The final instruction in this group is CMPS, and as usual, it
comes in two varieties.
cmpsb
compares the byte addressed by DS:SI to the byte addressed by
ES:DI. It is the same as the CMP instruction. It moves both bytes
into the 8086, subtracts the DI byte from the SI byte and sets
the flags. The two bytes in memory remain unchanged. You can look
at the flags to see which byte is larger, or if they are equal.
As usual, both SI and DI are incremented or decremented by one,
depending on the setting of DF, the direction flag.
cmpsw
compares the word addressed by DS:SI to the word addressed by
ES:DI. It is the same as the CMP instruction. It moves both words
into the 8086, subtracts the DI word from the SI word and sets
the flags. The two words in memory remain unchanged. You can then
look at the flags to see which word is larger, or if they are
equal. Both SI and DI are incremented or decremented by two,
depending on the setting of DF, the direction flag. This
instruction has the same effect on the flags as:
push ax
mov ax, ds:[si] ; or AL for bytes
cmp ax, es:[di] ; performs ( DS:[si] - ES:[DI] )
pop ax
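As a cross-check, here is a rough C model of CMPSB. It tracks only ZF and CF (the real instruction sets the other arithmetic flags too), it uses array indexes in place of DS:SI and ES:DI, and the names CmpRegs and model_cmpsb are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t si, di;  /* indexes standing in for DS:SI and ES:DI */
    int df;         /* 0 = increment, 1 = decrement */
    int zf;         /* set when the two bytes are equal */
    int cf;         /* set when the SI byte is below the DI byte */
} CmpRegs;

/* Model of CMPSB: work out (SI byte - DI byte) for the flags only,
   leave memory alone, then step both indexes. Only ZF and CF are
   modelled here. */
void model_cmpsb(const uint8_t *src, const uint8_t *dst, CmpRegs *r)
{
    uint8_t a = src[r->si];
    uint8_t b = dst[r->di];
    r->zf = (a == b);
    r->cf = (a < b);
    if (r->df) { r->si--; r->di--; }
    else       { r->si++; r->di++; }
}
```

After a call, zf = 1 corresponds to JE, and cf = 1 corresponds to JB (the SI byte is the smaller one when both are treated as unsigned).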
What use is this instruction? It is possible to use this for word
find, and we will do that later, but it is a little
unsophisticated for that. It is great for data verification,
however.
When you use the DISKCOMP utility in DOS which compares two
floppy disks, it reads each of the disks sector by sector, and
then compares them. A sector is 512 bytes. The heart of such a
utility might look like this:
; - - - - - DATA - - - - -
error_message db "Sectors are not the same", 0
disk1_buffer db 512 dup (?)
disk2_buffer db 512 dup (?)
; - - - - - CODE - - - - -
get_next_sector:
; the code for reading one sector from each disk goes here.
; then we have the code to compare the two sets of data.
mov si, offset disk1_buffer
mov di, offset disk2_buffer
mov cx, 256 ; 512 / 2 = 256
cld ; clear DF (increment)
repe cmpsw
je get_next_sector
lea ax, error_message ; we had an unequal comparison
call print_string
jmp get_next_sector
; - - - - - - - - - -
We do a word compare since it takes only half as many steps. If
there is an unequal comparison at any time, the REPE instruction
will terminate the loop. We can test for this inequality with JE
or JNE. In this example we assume that DS and ES have the same
segment address.
Any time you need to verify data, this is the instruction to use.
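The compare step can be modelled in C like this. It is a sketch of the REPE CMPSW loop only, with pointers standing in for DS:SI and ES:DI, and sectors_equal and SECTOR_WORDS are names invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_WORDS 256   /* 512 bytes / 2 */

/* Model of "mov cx, 256 / repe cmpsw": returns 1 if every word
   matches (the JE case), 0 as soon as one pair differs. */
int sectors_equal(const uint16_t *buf1, const uint16_t *buf2)
{
    size_t cx = SECTOR_WORDS;
    int zf = 1;
    while (cx != 0) {
        zf = (*buf1++ == *buf2++);   /* cmpsw sets ZF */
        cx--;
        if (!zf)
            break;                   /* REPE stops when ZF = 0 */
    }
    return zf;
}
```

The value returned is the ZF left by the last comparison, which is exactly what the JE after the REPE tests.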
We are going to build a word search program. It is not very
valuable since 'a' will not match 'A', but it is a good exercise
to look at CMPS. We will use ch1str.obj, the file we used at the
beginning of the chapter, as the text file and you can try to
find individual words in the file. Remember, the file is
continuous characters (no spaces), and all characters are small.
If you didn't save the file length, you will have to run that
program again to find the length of the file.
Here's the word_search program:
; + + + + + + + + + + + + + + + START DATA BELOW THIS LINE
EXTRN ch1str:BYTE
entry_banner db 13,10, "Enter a word for a word search", 0
no_match_banner db "There was no match", 0
input_buffer db 80 dup (?)
letter_count dw ?
; + + + + + + + + + + + + + + + END DATA ABOVE THIS LINE
; + + + + + + + + + + + + + + + START CODE BELOW THIS LINE
mov ax, seg ch1str ; load es register
mov es, ax
cld ; clear DF (increment)
big_loop:
; get a word for the word search
mov ax, offset entry_banner
call print_string
mov ax, offset input_buffer
call get_string
; find the end of string
mov al, 0 ; compare with 0
mov bx, offset input_buffer
mov cx, 0 ; letter count
letter_count_loop:
cmp al, [bx] ; compare to 0
je end_of_count_loop
inc cx ; increment count
inc bx ; increment pointer
jmp letter_count_loop
end_of_count_loop:
jcxz big_loop ; if cx = 0, string is empty so redo
mov letter_count, cx ; store our count
; look for word match
mov di, offset ch1str
mov cx, $$$$ ; $$$$ = length of ch1str
sub cx, letter_count ; calculate last possible match
inc cx ; starting places to try = length - count + 1
word_search_loop:
push di ; start of search
push cx ; count for ch1str
mov si, offset input_buffer
mov cx, letter_count
repe cmpsb ; the actual comparison
je found_it ; if equal, we have a match
; no match. are we finished?
pop cx
pop di
inc di ; move to next starting address
loop word_search_loop
; we fell through. finished, but no match
mov ax, offset no_match_banner
call print_string
jmp big_loop
found_it:
pop cx ; clear cx off the stack
pop di ; start of the match
mov si, offset input_buffer
mov cx, 25 ; move 25 characters to buffer
transfer_loop:
mov al, es:[di]
mov [si], al
inc si
inc di
loop transfer_loop
mov BYTE PTR [si], 0 ; end of a C string
mov ax, offset input_buffer
call print_string
jmp big_loop
; + + + + + + + + + + + + + + + END CODE ABOVE THIS LINE
The code is so long that the whole assembler file has been put on
disk so you don't have to do all the typing. The pathname is
\XTRAFILE\COMPARE.ASM. All you need to do is enter the length of
ch1str in the MOV instruction where the dollar signs are:
mov cx, $$$$ ; $$$$ = length of ch1str
Link with 'link compare+ch1str+\asmhelp'. You enter a text
string and the program looks for an exact match in ch1str. Here
is how the program is structured.
First, the program prompts you to enter a string. The program
then counts the number of bytes in the string. It must have a
non-zero length or the program will prompt you again for a
string. The program then starts at the beginning of the text. It
saves a copy of the pointer to the start of the comparison so if
we fail we can start over again at the next character. The actual
comparison is:
repe cmpsb
If that makes it through all the letters in the search string,
REPE will quit because CX = 0, not because we have an unequal
character, so ZF is still set and the JE takes us to found_it. If
the comparison failed we pop CX and DI (the text pointer) and
start at the next character.
If there is a match, we move 25 characters (starting with the
matching characters) from the text to the buffer. It is necessary
to move these because when you call print_string, the string must
be in the DATASTUFF segment, and ch1str isn't. We haven't used
MOVSB here because ES and DS are in the wrong place. For 25
characters there is only a marginal advantage to setting up for
MOVS. Finally, the 25 characters are printed. If there is no
match, a message to that effect is printed.
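The search strategy described above can be sketched in C. This is a model of the two nested loops only, not of the input and printing, and it makes one assumption of its own: it tries every starting offset from 0 through the text length minus the word length. The name find_word is invented; text_len plays the role of $$$$ and word_len the role of letter_count.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the search loop. Returns the offset of the first match,
   or -1 if there is none. */
long find_word(const char *text, size_t text_len,
               const char *word, size_t word_len)
{
    if (word_len == 0 || word_len > text_len)
        return -1;

    /* Offsets 0 .. text_len - word_len can all hold a match. */
    size_t starts = text_len - word_len + 1;

    for (size_t start = 0; start < starts; start++) {
        size_t cx = word_len;
        size_t i = 0;
        int zf = 1;
        while (cx != 0) {                    /* repe cmpsb */
            zf = (word[i] == text[start + i]);
            i++;
            cx--;
            if (!zf)
                break;
        }
        if (zf)                              /* je found_it */
            return (long)start;
    }
    return -1;                               /* no match */
}
```

As in the assembler version, the comparison is exact, so "Fox" will not match "fox".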
The text in ch1str is the first draft of chapter 1, but just for
interest, I have hidden eight C keywords and eight of your
favorite Middle English words in the text.{4} See if you can
find them.
____________________
4. Two hints. You might find four of these Middle English
words in the name of a boutique. The other four of the Middle
English words are some of your favorite monosyllabic words.
____________________
SEGMENT OVERRIDES
Here are the string instructions and the override rules for each
one.
LODS moves a byte or word from DS:[si] to AL or AX. You may
use CS:[si], SS:[si] or ES:[si].
STOS moves a byte (or a word) from AL (or AX) to ES:[di]. NO
OVERRIDES ARE ALLOWED.
SCAS compares AL (or AX) to the byte (or word) pointed to by
ES:[di]. NO OVERRIDES ARE ALLOWED.
MOVS moves a byte (or a word) from DS:[si] to ES:[di]. You
may use CS:[si], SS:[si] or ES:[si], but you MAY NOT
OVERRIDE ES:[di].
CMPS compares the byte (or a word) from DS:[si] to ES:[di].
You may use CS:[si], SS:[si] or ES:[si], but you MAY NOT
OVERRIDE ES:[di].
Looking at the whole group, you may override DS:[si], but you may
not override ES:[di]. The form of the override is strict. We will
take MOVS as an example. Till now, the instructions were written:
movsb ; byte move
movsw ; word move
If you want to do an override, the syntax is:
movs BYTE PTR ES:[di], SS:[si]
movs WORD PTR ES:[di], SS:[si]
If you write:
movsb ES:[di], SS:[si]
you will get an assembler error. Here are all the legal forms:
LODS
lodsb
lodsw
lods BYTE PTR SS:[si] ; or CS:[si], DS:[si], ES:[si]
lods WORD PTR SS:[si] ; or CS:[si], DS:[si], ES:[si]
STOS
stosb
stosw
stos BYTE PTR ES:[di] ; no override allowed
stos WORD PTR ES:[di] ; no override allowed
SCAS
scasb
scasw
scas BYTE PTR ES:[di] ; no override allowed
scas WORD PTR ES:[di] ; no override allowed
MOVS
movsb
movsw
movs BYTE PTR ES:[di], SS:[si] ;or CS, DS, ES:[si]
movs WORD PTR ES:[di], SS:[si] ;or CS, DS, ES:[si]
CMPS
cmpsb
cmpsw
cmps BYTE PTR SS:[si], ES:[di] ;or CS, DS, ES:[si]
cmps WORD PTR SS:[si], ES:[di] ;or CS, DS, ES:[si]
Just because you can do overrides with these instructions doesn't
mean that you should. In fact, there is a problem. If you are
using the REP instruction with an override:
rep movs WORD PTR ES:[di], SS:[si]
and the 8086 gets a hardware interrupt,{5} the 8086 forgets the
override. What this means is that one moment you are moving data
from the SS segment, and the next moment you are moving data from
the same offset, but in the DS segment. This just won't do. Thus
the rule is:
NEVER USE AN OVERRIDE WITH A REP/REPE/REPNE INSTRUCTION
This actually is no hardship. Using the override adds time to the
instruction. All you need to do is change the segment addresses
for the duration of the string instruction, and the code will run
faster. Of course, there is the setup time, but the break even
point is say, 20 repeats. Here is what you would do if you needed
an SS segment override:
push ds ; save old DS
push ss ; move SS to DS
pop ds ; the same as an SS:[si] override
rep movsb
pop ds ; get old DS back
The other possibility is to use LOOP instead of REP. It is
slower, but better slower and reliable than faster and
unreliable.
rep movs BYTE PTR ES:[di], SS:[si]
is the same (provided CX is not 0 on entry) as:
repeat_loop:
movs BYTE PTR ES:[di], SS:[si]
loop repeat_loop
There are even three forms of the LOOP instruction: LOOP, LOOPE,
LOOPNE which are the exact counterparts to REP, REPE, REPNE.
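There is one case where REP and a LOOP-based replacement behave differently, and a small C model shows it. REP tests CX before each repetition, while LOOP decrements CX after the body has already run once. The function names rep_iterations and loop_iterations are invented for this sketch, which only counts how often the string instruction would execute.

```c
#include <assert.h>
#include <stdint.h>

/* How many times a REP-prefixed instruction runs: CX is tested
   before each repetition, so CX = 0 means no repetitions. */
unsigned long rep_iterations(uint16_t cx)
{
    unsigned long n = 0;
    while (cx != 0) {
        n++;        /* the string instruction */
        cx--;
    }
    return n;
}

/* How many times a body ending in LOOP runs: the body executes
   first, then LOOP decrements CX and jumps back if CX is not 0. */
unsigned long loop_iterations(uint16_t cx)
{
    unsigned long n = 0;
    do {
        n++;        /* the string instruction */
        cx--;       /* a 16-bit CX wraps from 0 to 65535 */
    } while (cx != 0);
    return n;
}
```

For any non-zero count the two agree. With CX = 0, REP does nothing, but the LOOP version runs 65536 times, so if the count might be zero, put a JCXZ in front of the loop.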
____________________
5. Which can be caused by such rare occurrences as your
pressing a key on the keyboard or one of the 18 timer interrupts
that happen each second.
______________________
SUMMARY
LODS (load from string) moves a byte or word from DS:[si] to AL
or AX, and increments (or decrements) SI depending on the setting
of DF, the direction flag (by 1 for bytes and by 2 for words).
You may use CS:[si], SS:[si] or ES:[si]. This performs the same
action (except for changing SI) as:
mov ax, DS:[SI] ; or AL for bytes
The allowable forms are:
lodsb
lodsw
lods BYTE PTR SS:[si] ; or CS:[si], DS:[si], ES:[si]
lods WORD PTR SS:[si] ; or CS:[si], DS:[si], ES:[si]
STOS (store to string) moves a byte (or a word) from AL (or AX)
to ES:[di], and increments (or decrements) DI depending on the
setting of DF, the direction flag (by 1 for bytes and by 2 for
words). NO OVERRIDES ARE ALLOWED. This performs the same action
(except for changing DI) as:
mov ES:[DI], ax ; or AL for bytes
The allowable forms are:
stosb
stosw
stos BYTE PTR ES:[di] ; no override allowed
stos WORD PTR ES:[di] ; no override allowed
SCAS compares AL (or AX) to the byte (or word) pointed to by
ES:[di], and increments (or decrements) DI depending on the
setting of DF, the direction flag (by 1 for bytes and by 2 for
words). NO OVERRIDES ARE ALLOWED. This sets the flags the same
way as:
cmp ax, ES:[DI] ; or AL for bytes
The allowable forms are:
scasb
scasw
scas BYTE PTR ES:[di] ; no override allowed
scas WORD PTR ES:[di] ; no override allowed
MOVS moves a byte (or a word) from DS:[si] to ES:[di], and
increments (or decrements) SI and DI, depending on the setting of
DF, the direction flag (by 1 for bytes and by 2 for words). You
may use CS:[si], SS:[si] or ES:[si], but you MAY NOT OVERRIDE
ES:[di]. Though the following is not a legal instruction, it
signifies the equivalent action to MOVS (not including changing
DI and SI):
mov WORD PTR ES:[DI], DS:[SI] ; or BYTE PTR for bytes
The allowable forms are:
movsb
movsw
movs BYTE PTR ES:[di], SS:[si] ;or CS, DS, ES:[si]
movs WORD PTR ES:[di], SS:[si] ;or CS, DS, ES:[si]
CMPS compares the byte (or a word) at DS:[si] to the one at
ES:[di], and increments (or decrements) SI and DI, depending on
the setting of DF, the direction flag (by 1 for bytes and by 2
for words). You may use CS:[si], SS:[si] or ES:[si], but you MAY
NOT OVERRIDE ES:[di]. Although the following is not a legal
instruction, it signifies the equivalent action to CMPS (not including
changing DI and SI):
cmp WORD PTR DS:[SI], ES:[DI] ; or BYTE PTR for bytes
The allowable forms are:
cmpsb
cmpsw
cmps BYTE PTR SS:[si], ES:[di] ;or CS, DS, ES:[si]
cmps WORD PTR SS:[si], ES:[di] ;or CS, DS, ES:[si]
The string instructions may be prefixed by REP/REPE/REPNE which
will repeat the instructions according to the following
conditions:
rep      decrement cx ; repeat if cx is not zero
repe     decrement cx ; repeat if cx not zero AND zf = 1
repz     decrement cx ; repeat if cx not zero AND zf = 1
repne    decrement cx ; repeat if cx not zero AND zf = 0
repnz    decrement cx ; repeat if cx not zero AND zf = 0
Here, 'e' stands for equal, 'z' is zero and 'n' is not. These
repeat instructions should NEVER be used with a segment override,
since the 8086 will forget the override if a hardware interrupt
occurs in the middle of the REP loop.
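To see the two stopping conditions working together, here is a C sketch of REPNE SCASB with DF cleared. It scans a buffer for the byte in AL and stops either when the byte is found (ZF = 1) or when CX runs out. The name model_repne_scasb is invented, and an array index stands in for ES:DI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of "repne scasb" with DF = 0: scan up to cx bytes for al.
   Returns the value left in CX; *zf is 1 if the byte was found. */
uint16_t model_repne_scasb(const uint8_t *buf, uint16_t cx,
                           uint8_t al, int *zf)
{
    size_t di = 0;
    *zf = 0;   /* modelling choice: the real CPU leaves the flags
                  untouched when CX is 0 on entry */
    while (cx != 0) {
        *zf = (al == buf[di]);   /* scasb compares AL with ES:[DI] */
        di++;
        cx--;
        if (*zf)
            break;               /* REPNE stops when ZF = 1 */
    }
    return cx;
}
```

Note that CX can reach 0 whether or not the last byte matched, so after a REPE or REPNE you should test ZF (with JE or JNE), not CX, to find out why the loop ended.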
'HARD' FLAGS
IEF, TF and DF are 'hard' flags. Once they are set they remain in
the same setting. If you use DF, the direction flag, in a
subroutine, you must save the flags upon entry and restore the
flags on exiting to make sure that DF has not been altered.