-
-
-
- Chapter 19 - Strings 199
- ____________________
-
- By this time you may have become annoyed by the fact that there
- is no instruction for moving data from one place in memory to
- another. That is, you can't have:
-
- mov variable2, variable1
-
- Instead you have to have:
-
- mov ax, variable1
- mov variable2, ax
-
- There is one instruction, however, where you can move a BLOCK of
- data from one place in memory to another. It is called MOVS. As
- usual with these instructions, there are two forms.
-
- movsb
-
- moves a byte from DS:SI to ES:DI and increments or decrements SI
- and DI by one, depending on the setting of DF, the direction
- flag. Notice that either both are incremented or both are
- decremented. You can't have one pointer incrementing while the
- other one is decrementing.
-
- movsw
-
- moves a word from DS:SI to ES:DI and increments or decrements SI
- and DI by two, depending on the setting of DF, the direction
- flag. This requires the same amount of setup as all the other
- routines we have looked at so far, so it is not efficient to use
- it for just a few bytes. For 30 or so, it is very efficient.
- Though the following is not a legal instruction, it signifies the
- equivalent action to MOVSW (not including changing DI and SI):
-
- mov WORD PTR es:[di], ds:[si] ; or BYTE PTR for bytes
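-
- As a rough sketch of the setup involved, here is one way to copy
- 30 bytes with MOVSB. The names source_buffer and dest_buffer are
- made up for this example, and it assumes both buffers are in the
- DS segment:
-
-         push  ds                        ; copy DS to ES
-         pop   es
-         mov   si, offset source_buffer  ; hypothetical source
-         mov   di, offset dest_buffer    ; hypothetical destination
-         mov   cx, 30                    ; number of bytes to move
-         cld                             ; DF = 0, SI and DI increment
-         rep   movsb                     ; move CX bytes
-
- Afterwards CX is 0, and SI and DI point one byte past the end of
- each block.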
-
- We are going to write some subroutines which copy strings from
- one place in memory to another.{1} But first we need to review
- text strings.
-
- The text string world is divided into Pascal and C. A Pascal
- string has its length in the first byte and the first character
- in the second byte. Since the length is in one byte, the string
- length may only be 0 to 255. You read the first byte of the
- string to get the length.
-
- A C string can have any length. The end of a C string is marked
- by a byte with the value 0. This is not the character '0' (30h);
- it is the number 0 (0h). To find the end of a C string, you need
- to check each character to see if it is 0h.
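-
- To make the difference concrete, here is how the word "hello"
- might be stored each way. The labels pascal_text and c_text are
- only illustrative names:
-
-         pascal_text  db  5, "hello"    ; length byte, then the text
-         c_text       db  "hello", 0    ; the text, then the 0 byte
-
- Both take up six bytes. The Pascal string spends its extra byte
- on the length, and the C string spends it on the terminator.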
-
- ____________________
-
- 1. For the technically minded, these routines will be only
- half of a real life subroutine, since they assume that the two
- strings do not overlap. In robust subroutines, these routines
- would be the section for when the destination address is lower
- than the source address.
-
-
- The PC Assembler Tutor 200
- ______________________
-
- We'll do the Pascal string first. We are going to pass the
- addresses of the strings. If you have the Pascal call:
-
- move_pascal_string (from_string, to_string) ;
-
- The first thing you need to know is that Pascal pushes things on
- the stack from left to right. In other words, Pascal will
- generate the following code:
-
- lea ax, from_string
- push ax
- lea ax, to_string
- push ax
- call move_pascal_string
-
- We will start by assuming both near data (all data is in DS), and
- near subroutines. After setting up BP, the stack will look like
- this:{2}
-
- from_string address bp + 6
- to_string address bp + 4
- old IP bp + 2
- bp -> old BP bp + 0
-
- Here's the subroutine. Remember, PUSHREGS and POPREGS are macros:
-
- ; ----------
- move_pascal_string proc near
-
- FROM_PTR EQU [bp+6]
- TO_PTR EQU [bp+4]
-
- push bp ; set up bp
- mov bp, sp
- pushf ; push the flags
- PUSHREGS cx, si, di, es ; push the registers
- push ds ; move ds to es
- pop es
-
- mov si, FROM_PTR ; load pointers
- mov di, TO_PTR
- cld ; clear DF (increment)
-
- sub cx, cx ; zero cx
- mov cl, [si] ; length to cl
- inc cx ; increment count by one
-
- rep movsb ; the actual move
-
- POPREGS cx, si, di, es
- popf ; pop the flags
- pop bp
- ret (4) ; pop pointers and return
-
- move_pascal_string endp
- ; ----------
-
- ____________________
-
- 2. If you forgot about BP, go back to the chapter on
- subroutines.
-
- The count is increased by one since we need to move not only the
- text, but the count itself. If the length is 0, we still need to
- move one byte - the count byte. The value in DS is moved to ES
- with a PUSH and a POP. You cannot move directly from one segment
- register to another. Also, at the return we POP 4 bytes (2 words)
- to get the pointers off the stack. Remember, in Pascal, it is the
- subroutine's responsibility to get rid of the arguments from a
- subroutine call.
-
- You will notice that this time we saved the flags register. Why?
- Because we are clearing DF. When we return from the subroutine,
- we want DF to be exactly the same as it was on entry to the
- subroutine. The calling program may have DF set in some special
- way and we don't want to interfere with that.
-
- There are three flags which I will call 'hard' flags. Once they
- are set they do not change. These are (1) TF, the trap flag, (2)
- IEF, the interrupt enable flag, and (3) DF, the direction flag.
- The 'soft' flags are CF, OF, ZF, etc. If you call a subroutine
- you expect CF, OF, ZF etc. to be unreliable, but you expect these
- three 'hard' flags to remain the same. TF is the domain of a
- debugger, so it is none of your business. IEF is only of interest
- to you if you are writing an interrupt procedure.{3} The third
- one, DF, is your concern. If you use DF in a subroutine, you MUST
- save the flags to ensure that the DF flag has the same value at
- the return that it had on entry.
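-
- Here is a minimal sketch of that rule in action, assuming a
- routine that has to copy backwards (the overlapping case
- mentioned in the first footnote). It assumes SI and DI hold the
- starting offsets of the two blocks, CX holds a non-zero byte
- count, and ES is already equal to DS:
-
-         pushf                 ; save the caller's DF
-         add   si, cx
-         dec   si              ; SI -> last byte of the source
-         add   di, cx
-         dec   di              ; DI -> last byte of the destination
-         std                   ; DF = 1, SI and DI decrement
-         rep   movsb           ; copy from the top down
-         popf                  ; the caller's DF is back
-
- Whatever the caller had in DF, it has the same thing after the
- POPF.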
-
-
- Now for the C subroutine. If we have a C subroutine call:
-
- move_c_string ( from_string, to_string ) ;
-
- C pushes things on the stack from right to left (the exact
- opposite of Pascal). The C compiler will generate the following
- code:
-
- lea ax, to_string
- push ax
- lea ax, from_string
- push ax
- call move_c_string
- add sp, 4
-
- After setting up BP, the stack will look like this:
-
- to_string address bp + 6
- from_string address bp + 4
- old IP bp + 2
- bp -> old BP bp + 0
- ____________________
-
- 3. If you do an interrupt procedure you don't have to worry
- because INT automatically saves the flags while clearing IEF, and
- IRET restores the flags on exiting.
-
-
-
- Here's the C subroutine:
-
- ; ----------
- move_c_string proc near
-
- FROM_PTR EQU [bp+4]
- TO_PTR EQU [bp+6]
-
- push bp ; set up bp
- mov bp, sp
- pushf ; push the flags
- PUSHREGS ax, si, di, es ; push the registers
- push ds ; move ds to es
- pop es
-
- mov si, FROM_PTR ; load pointers
- mov di, TO_PTR
- cld ; clear DF (increment)
-
- move_loop:
- lodsb ; source to al
- stosb ; al to destination
- and al, al ; check for 0
- jnz move_loop
-
-
- POPREGS ax, si, di, es
- popf ; pop the flags
- pop bp
- ret
-
- move_c_string endp
- ; ----------
-
-
- We set up the routine the same way, but we cannot use MOVSB. We
- need to check each individual byte to see if it is 0 hex, so we
- move it to AL, move it from AL to the destination, and then check
- AL for 0. Also note that we did not pop the addresses off the
- stack with the return statement, since in C it is the calling
- program's responsibility to do that. If you look at the calling
- code above, you will see:
-
- add sp, 4
-
- which gets rid of the two pointers from the stack. Remember, the
- stack grows downward, so you ADD to decrease the size of the
- stack.
-
- Let's do the same Pascal program again, but this time use long
- pointers, that is, give both the segment and offset of the
- string. This means that we will be able to move from any place in
- memory to any place in memory. Here is the calling code.
-
-
- mov ax, seg from_string
- push ax
- mov ax, offset from_string
- push ax
- mov ax, seg to_string
- push ax
- mov ax, offset to_string
- push ax
- call move_pascal_string
-
- We will still keep it a near subroutine. After setting up BP, the
- stack will look like this:
-
- from_string segment bp + 10
- from_string offset bp + 8
- to_string segment bp + 6
- to_string offset bp + 4
- old IP bp + 2
- bp -> old BP bp + 0
-
- Here's the subroutine:
-
- ; ----------
- move_pascal_string proc near
-
- FROM_PTR EQU [bp+8]
- TO_PTR EQU [bp+4]
-
- push bp ; set up bp
- mov bp, sp
- pushf ; push the flags
- PUSHREGS cx, si, di, ds, es ; push the registers
-
- lds si, FROM_PTR ; load pointers
- les di, TO_PTR
- cld ; clear DF (increment)
-
- sub cx, cx ; zero cx
- mov cl, [si] ; length to cl
- inc cx ; increment count by one
-
- rep movsb ; the actual move
-
- POPREGS cx, si, di, ds, es
- popf ; pop the flags
- pop bp
- ret (8) ; pop pointers and return
-
- move_pascal_string endp
- ; ----------
-
- This takes slightly less code since we load SI and DS at the same
- time (with LDS - load DS) and we load DI and ES at the same time
- (with LES - load ES). Remember, 8086 instructions which move an
- offset:segment pair always have the offset in low memory and the
- segment in high memory; the offset is the first two bytes and the
- segment is the next two bytes.
-
-
- We changed the EQU statements, and the return statement is now:
-
- ret (8)
-
- so we take 8 bytes (4 words) off the stack, but the rest is the
- same.
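-
- If the long pointers are stored in variables rather than on the
- stack, the same two instructions load them. In this sketch
- from_ptr and to_ptr are hypothetical names, and it assumes both
- variables are in the DS segment:
-
-         from_ptr  dd  from_string   ; offset word, then segment word
-         to_ptr    dd  to_string
-
-         les   di, to_ptr            ; ES:DI first, DS is still good
-         lds   si, from_ptr          ; DS changes here, so do it last
-
- The order matters in this version. Once LDS has changed DS, you
- can no longer reach to_ptr through DS. In the subroutine above
- the pointers are addressed through BP (which uses SS), so the
- order makes no difference there.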
-
-
- CMPS
-
- The final instruction in this group is CMPS, and as usual, it
- comes in two varieties.
-
- cmpsb
-
- compares the byte addressed by DS:SI to the byte addressed by
- ES:DI. It is the same as the CMP instruction. It moves both bytes
- into the 8086, subtracts the DI byte from the SI byte and sets
- the flags. The two bytes in memory remain unchanged. You can look
- at the flags to see which byte is larger, or if they are equal.
- As usual, both SI and DI are incremented or decremented by one,
- depending on the setting of DF, the direction flag.
-
- cmpsw
-
- compares the word addressed by DS:SI to the word addressed by
- ES:DI. It is the same as the CMP instruction. It moves both words
- into the 8086, subtracts the DI word from the SI word and sets
- the flags. The two words in memory remain unchanged. You can then
- look at the flags to see which word is larger, or if they are
- equal. Both SI and DI are incremented or decremented by two,
- depending on the setting of DF, the direction flag. This
- instruction has the same effect on the flags as:
-
- push ax
- mov ax, ds:[si] ; or AL for bytes
- cmp ax, es:[di] ; performs ( DS:[si] - ES:[DI] )
- pop ax
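-
- As an illustration of reading the flags afterwards, here is a
- sketch of a three-way branch on a single byte compare. The labels
- are made up for the example. JA and JB treat the bytes as
- unsigned numbers, which is what you want for text:
-
-         cmpsb                   ; flags from (DS:[si]) - (ES:[di])
-         ja    si_byte_larger    ; SI byte was above the DI byte
-         jb    di_byte_larger    ; SI byte was below the DI byte
-         ; fall through when the two bytes are equal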
-
-
- What use is this instruction? It is possible to use this for word
- find, and we will do that later, but it is a little
- unsophisticated for that. It is great for data verification,
- however.
-
- When you use the DISKCOMP utility in DOS which compares two
- floppy disks, it reads each of the disks sector by sector, and
- then compares them. A sector is 512 bytes. The heart of such a
- utility might look something like this:
-
-
- ; - - - - - DATA - - - - -
- error_message db "Sectors are not the same", 0
- disk1_buffer db 512 dup (?)
- disk2_buffer db 512 dup (?)
-
- ; - - - - - CODE - - - - -
-
-
- get_next_sector:
-
- ; the code for reading one sector from each disk goes here.
- ; then we have the code to compare the two sets of data.
-
- mov si, offset disk1_buffer
- mov di, offset disk2_buffer
-
- mov cx, 256 ; 512 / 2 = 256
- repe cmpsw
- je get_next_sector
-
- lea ax, error_message ; we had an unequal comparison
- call print_string
- jmp get_next_sector
-
- ; - - - - - - - - - -
-
-
- We do a word compare since it takes only half as many steps. If
- there is an unequal comparison at any time, the REPE instruction
- will terminate the loop. We can test for this inequality with JE
- or JNE. In this example we assume that DS and ES have the same
- segment address.
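-
- If you also want to know where the sectors differ, remember that
- CMPSW has already moved SI and DI past the words it just
- compared. A sketch, using the made-up label sectors_match:
-
-         mov   si, offset disk1_buffer
-         mov   di, offset disk2_buffer
-         mov   cx, 256
-         cld                           ; DF = 0, pointers increment
-         repe  cmpsw
-         je    sectors_match           ; all 256 words were equal
-         sub   si, 2                   ; back up to the bad word
-         sub   di, 2
-         mov   ax, si
-         sub   ax, offset disk1_buffer ; AX = byte offset in sector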
-
- Any time you need to verify data, this is the instruction to use.
-
- We are going to build a word search program. It is not very
- valuable since 'a' will not match 'A', but it is a good exercise
- to look at CMPS. We will use ch1str.obj, the file we used at the
- beginning of the chapter, as the text file and you can try to
- find individual words in the file. Remember, the file is
- continuous characters (no spaces), and all characters are
- lowercase.
- If you didn't save the file length, you will have to run that
- program again to find the length of the file.
-
- Here's the word_search program:
-
- ; + + + + + + + + + + + + + + + START DATA BELOW THIS LINE
- EXTRN ch1str:BYTE
- entry_banner db 13,10, "Enter a word for a word search", 0
- no_match_banner db "There was no match", 0
- input_buffer db 80 dup (?)
- letter_count dw ?
- ; + + + + + + + + + + + + + + + END DATA ABOVE THIS LINE
-
- ; + + + + + + + + + + + + + + + START CODE BELOW THIS LINE
- mov ax, seg ch1str ; load es register
- mov es, ax
- cld ; clear DF (increment)
-
- big_loop:
- ; get a word for the word search
- mov ax, offset entry_banner
- call print_string
- mov ax, offset input_buffer
- call get_string
-
- ; find the end of string
- mov al, 0 ; compare with 0
- mov bx, offset input_buffer
- mov cx, 0 ; letter count
- letter_count_loop:
- cmp al, [bx] ; compare to 0
- je end_of_count_loop
- inc cx ; increment count
- inc bx ; increment pointer
- jmp letter_count_loop
- end_of_count_loop:
- jcxz big_loop ; if cx = 0, string is empty so redo
- mov letter_count, cx ; store our count
-
- ; look for word match
- mov di, offset ch1str
- mov cx, $$$$ ; $$$$ = length of ch1str
- sub cx, letter_count ; calculate last possible match
-
- word_search_loop:
- push di ; start of search
- push cx ; count for ch1str
- mov si, offset input_buffer
- mov cx, letter_count
- repe cmpsb ; the actual comparison
- je found_it ; if equal, we have a match
-
- ; no match. are we finished?
- pop cx
- pop di
- inc di ; move to next starting address
- loop word_search_loop
-
- ; we fell through. finished, but no match
- mov ax, offset no_match_banner
- call print_string
- jmp big_loop
-
- found_it:
- pop cx ; clear cx off the stack
- pop di ; start of the match
- mov si, offset input_buffer
- mov cx, 25 ; move 25 characters to buffer
- transfer_loop:
- mov al, es:[di]
- mov [si], al
- inc si
- inc di
- loop transfer_loop
- mov BYTE PTR [si], 0 ; end of a C string
-
- mov ax, offset input_buffer
- call print_string
- jmp big_loop
- ; + + + + + + + + + + + + + + + END CODE ABOVE THIS LINE
-
-
- The code is so long that the whole assembler file has been put on
- disk so you don't have to do all the typing. The pathname is
- \XTRAFILE\COMPARE.ASM. All you need to do is enter the length of
- ch1str in the MOV instruction where the dollar signs are:
-
- mov cx, $$$$ ; $$$$ = length of ch1str
-
- Link with 'link compare+ch1str+\asmhelp'. You enter a text
- string and the program looks for an exact match in ch1str. Here
- is how the program is structured.
-
- First, the program prompts you to enter a string. The program
- then counts the number of bytes in the string. It must have a
- non-zero length or the program will prompt you again for a
- string. The program then starts at the beginning of the text. It
- saves a copy of the pointer to the start of the comparison so if
- we fail we can start over again at the next character. The actual
- comparison is:
-
- repe cmpsb
-
- If that makes it through all the letters in the search string,
- REPE will quit because CX = 0, not because we have an unequal
- character. If the comparison failed we pop DI (the text pointer)
- and start at the next character.
-
- If there is a match, we move 25 characters (starting with the
- matching characters) from the text to the buffer. It is necessary
- to move these because when you call print_string, the string must
- be in the DATASTUFF segment, and ch1str isn't. We haven't used
- MOVSB here because ES and DS are in the wrong place. For 25
- characters there is only a marginal advantage to setting up for
- MOVS. Finally, the 25 characters are printed. If there is no
- match, a message to that effect is printed.
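-
- For comparison, here is roughly what the setup for MOVSB would
- look like at found_it, after the two POPs. This is only a sketch,
- not the code in COMPARE.ASM. It swaps DS and ES for the duration
- of the move and relies on DF still being clear from the CLD at
- the top of the program:
-
-         mov   si, di                   ; start of match is the source
-         mov   di, offset input_buffer  ; buffer is the destination
-         push  ds                       ; save both segment registers
-         push  es
-         push  es                       ; swap DS and ES
-         push  ds
-         pop   es                       ; ES = data segment
-         pop   ds                       ; DS = segment of ch1str
-         mov   cx, 25
-         rep   movsb                    ; DS:[si] to ES:[di]
-         pop   es                       ; restore the registers
-         pop   ds
-         mov   BYTE PTR [di], 0         ; end of a C string
-
- Most of the instructions are bookkeeping for the segment
- registers, which is why the gain is so small for 25 characters.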
-
- The text in ch1str is the first draft of chapter 1, but just for
- interest, I have hidden eight C keywords and eight of your
- favorite Middle English words in the text.{4} See if you can
- find them.
-
- ____________________
-
- 4. Two hints. You might find four of these Middle English
- words in the name of a boutique. The other four of the Middle
- English words are some of your favorite monosyllabic words.
-
-
- SEGMENT OVERRIDES
-
- Here are the string instructions and the override rules for each
- one.
-
- LODS moves a byte or word from DS:[si] to AL or AX. You may
- use CS:[si], SS:[si] or ES:[si].
-
- STOS moves a byte (or a word) from AL (or AX) to ES:[di]. NO
- OVERRIDES ARE ALLOWED.
-
- SCAS compares AL (or AX) to the byte (or word) pointed to by
- ES:[di]. NO OVERRIDES ARE ALLOWED.
-
- MOVS moves a byte (or a word) from DS:[si] to ES:[di]. You
- may use CS:[si], SS:[si] or ES:[si], but you MAY NOT
- OVERRIDE ES:[di].
-
- CMPS compares the byte (or a word) from DS:[si] to ES:[di].
- You may use CS:[si], SS:[si] or ES:[si], but you MAY NOT
- OVERRIDE ES:[di].
-
- Looking at the whole group, you may override DS:[si], but you may
- not override ES:[di]. The form of the override is strict. We will
- take MOVS as an example. Till now, the instructions were written:
-
- movsb ; byte move
- movsw ; word move
-
- If you want to do an override, the syntax is:
-
- movs BYTE PTR ES:[di], SS:[si]
- movs WORD PTR ES:[di], SS:[si]
-
- If you write:
-
- movsb ES:[di], SS:[si]
-
- you will get an assembler error. Here are all the legal forms:
-
- LODS
- lodsb
- lodsw
- lods BYTE PTR SS:[si] ; or CS:[si], DS:[si], ES:[si]
- lods WORD PTR SS:[si] ; or CS:[si], DS:[si], ES:[si]
-
- STOS
- stosb
- stosw
- stos BYTE PTR ES:[di] ; no override allowed
- stos WORD PTR ES:[di] ; no override allowed
-
- SCAS
- scasb
- scasw
- scas BYTE PTR ES:[di] ; no override allowed
- scas WORD PTR ES:[di] ; no override allowed
-
- MOVS
- movsb
- movsw
- movs BYTE PTR ES:[di], SS:[si] ;or CS, DS, ES:[si]
- movs WORD PTR ES:[di], SS:[si] ;or CS, DS, ES:[si]
-
- CMPS
- cmpsb
- cmpsw
- cmps BYTE PTR SS:[si], ES:[di] ;or CS, DS, ES:[si]
- cmps WORD PTR SS:[si], ES:[di] ;or CS, DS, ES:[si]
-
-
- Just because you can do overrides with these instructions doesn't
- mean that you should. In fact, there is a problem. If you are
- using the REP instruction with an override:
-
- rep movs WORD PTR ES:[di], SS:[si]
-
- and the 8086 gets a hardware interrupt,{5} the 8086 forgets the
- override. What this means is that one moment you are moving data
- from the SS segment, and the next moment you are moving data from
- the same offset, but in the DS segment. This just won't do. Thus
- the rule is:
-
- NEVER USE AN OVERRIDE WITH A REP/REPE/REPNE INSTRUCTION
-
- This actually is no hardship. Using the override adds time to the
- instruction. All you need to do is change the segment addresses
- for the duration of the string instruction, and the code will run
- faster. Of course, there is the setup time, but the break even
- point is say, 20 repeats. Here is what you would do if you needed
- an SS segment override:
-
- push ds ; save old DS
- push ss ; move SS to DS
- pop ds ; the same as an SS:[si] override
-
- rep movsb
-
- pop ds ; get old DS back
-
- The other possibility is to use LOOP instead of REP. It is
- slower, but better slower and reliable than faster and
- unreliable.
-
- rep movs BYTE PTR ES:[di], SS:[si]
-
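- is the same, provided CX is not zero on entry, as:
-
- repeat_loop:
- movs BYTE PTR ES:[di], SS:[si]
- loop repeat_loop
-
- If CX could be zero, put a JCXZ in front of the loop to jump
- around it. REP does nothing when CX is zero, but LOOP decrements
- first, so it would go around 65536 times.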
-
- There are even three forms of the LOOP instruction: LOOP, LOOPE,
- LOOPNE which are the exact counterparts to REP, REPE, REPNE.
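-
- As a sketch of the LOOPE counterpart, this loop does the same job
- as REPE CMPS with an SS override on the source. The labels are
- made up, and the JCXZ is there because REPE does nothing when CX
- is zero while LOOPE would not notice:
-
-         jcxz  compare_done
- compare_loop:
-         cmps  BYTE PTR SS:[si], ES:[di]
-         loope compare_loop      ; repeat while CX <> 0 and ZF = 1
- compare_done:
-
- LOOPE does not change the flags, so you can still test the result
- of the last comparison with JE or JNE afterwards.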
-
-
-
-
- ____________________
-
- 5. Which can be caused by such rare occurrences as your
- pressing a key on the keyboard or one of the 18 timer interrupts
- that happen each second.
-
-
- SUMMARY
-
-
-
- LODS (load from string) moves a byte or word from DS:[si] to AL
- or AX, and increments (or decrements) SI depending on the setting
- of DF, the direction flag (by 1 for bytes and by 2 for words).
- You may use CS:[si], SS:[si] or ES:[si]. This performs the same
- action (except for changing SI) as:
-
- mov ax, DS:[SI] ; or AL for bytes
-
- The allowable forms are:
-
- lodsb
- lodsw
- lods BYTE PTR SS:[si] ; or CS:[si], DS:[si], ES:[si]
- lods WORD PTR SS:[si] ; or CS:[si], DS:[si], ES:[si]
-
-
- STOS (store to string) moves a byte (or a word) from AL (or AX)
- to ES:[di], and increments (or decrements) DI depending on the
- setting of DF, the direction flag (by 1 for bytes and by 2 for
- words). NO OVERRIDES ARE ALLOWED. This performs the same action
- (except for changing DI) as:
-
- mov ES:[DI], ax ; or AL for bytes
-
- The allowable forms are:
-
- stosb
- stosw
- stos BYTE PTR ES:[di] ; no override allowed
- stos WORD PTR ES:[di] ; no override allowed
-
-
- SCAS compares AL (or AX) to the byte (or word) pointed to by
- ES:[di], and increments (or decrements) DI depending on the
- setting of DF, the direction flag (by 1 for bytes and by 2 for
- words). NO OVERRIDES ARE ALLOWED. This sets the flags the same
- way as:
-
- cmp ax, ES:[DI] ; or AL for bytes
-
- The allowable forms are:
-
- scasb
- scasw
- scas BYTE PTR ES:[di] ; no override allowed
- scas WORD PTR ES:[di] ; no override allowed
-
-
- MOVS moves a byte (or a word) from DS:[si] to ES:[di], and
- increments (or decrements) SI and DI, depending on the setting of
- DF, the direction flag (by 1 for bytes and by 2 for words). You
- may use CS:[si], SS:[si] or ES:[si], but you MAY NOT OVERRIDE
- ES:[di]. Though the following is not a legal instruction, it
- signifies the equivalent action to MOVS (not including changing
- DI and SI):
-
- mov WORD PTR ES:[DI], DS:[SI] ; or BYTE PTR for bytes
-
- The allowable forms are:
-
- movsb
- movsw
- movs BYTE PTR ES:[di], SS:[si] ;or CS, DS, ES:[si]
- movs WORD PTR ES:[di], SS:[si] ;or CS, DS, ES:[si]
-
-
- CMPS compares the byte (or a word) at DS:[si] to the one at
- ES:[di], and increments (or decrements) SI and DI, depending on
- the setting of DF, the direction flag (by 1 for bytes and by 2
- for words). You may use CS:[si], SS:[si] or ES:[si], but you MAY
- NOT OVERRIDE ES:[di]. Although the following is not a legal
- instruction, it signifies the equivalent action to CMPS (not
- including changing DI and SI):
-
- cmp WORD PTR DS:[SI], ES:[DI] ; or BYTE PTR for bytes
-
- The allowable forms are:
-
- cmpsb
- cmpsw
- cmps BYTE PTR SS:[si], ES:[di] ;or CS, DS, ES:[si]
- cmps WORD PTR SS:[si], ES:[di] ;or CS, DS, ES:[si]
-
-
-
-
- The string instructions may be prefixed by REP/REPE/REPNE which
- will repeat the instructions according to the following
- conditions:
-
-
- rep decrement cx ; repeat if cx is not zero
- repe decrement cx ; repeat if cx not zero AND zf = 1
- repz decrement cx ; repeat if cx not zero AND zf = 1
- repne decrement cx ; repeat if cx not zero AND zf = 0
- repnz decrement cx ; repeat if cx not zero AND zf = 0
-
- Here, 'e' stands for equal, 'z' is zero and 'n' is not. These
- repeat instructions should NEVER be used with a segment override,
- since the 8086 will forget the override if a hardware interrupt
- occurs in the middle of the REP loop.
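-
- As one example of REPNE, here is a common way to find the length
- of a C string with SCASB. It is a sketch only: c_text is a made-up
- name, and the string is assumed to be in the DS segment and to
- have its 0 byte within the first 65535 bytes:
-
-         push  ds                  ; SCAS uses ES:[di], so ES = DS
-         pop   es
-         mov   di, offset c_text   ; hypothetical C string
-         mov   al, 0               ; the byte we are looking for
-         mov   cx, 0FFFFh          ; largest possible count
-         cld                       ; DF = 0, DI increments
-         repne scasb               ; scan until a byte equals AL
-         not   cx                  ; CX = bytes scanned (incl. the 0)
-         dec   cx                  ; CX = length of the text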
-
-
- 'HARD' FLAGS
-
- IEF, TF and DF are 'hard' flags. Once they are set they remain in
- the same setting. If you use DF, the direction flag, in a
- subroutine, you must save the flags upon entry and restore the
- flags on exiting to make sure that DF has not been altered.
-
-