The Unsorted BBS Collection

Text File | 1990-08-02 | 33.3 KB | 833 lines

Chapter 19 - Strings 199
____________________

By this time you may have become annoyed by the fact that there is no instruction for moving data from one place in memory to another. That is, you can't have:

    mov   variable2, variable1

Instead you have to have:

    mov   ax, variable1
    mov   variable2, ax

There is one instruction, however, which moves a BLOCK of data from one place in memory to another. It is called MOVS. As usual with these instructions, there are two forms.

movsb moves a byte from DS:SI to ES:DI and increments or decrements SI and DI by one, depending on the setting of DF, the direction flag. Notice that either both are incremented or both are decremented. You can't have one pointer incrementing while the other one is decrementing.

movsw moves a word from DS:SI to ES:DI and increments or decrements SI and DI by two, depending on the setting of DF, the direction flag.

This requires the same amount of setup as the other string instructions we have looked at so far, so it is not efficient to use it for just a few bytes. For 30 bytes or so, it is very efficient. Apart from changing DI and SI, it has the same effect as the following (which is not a legal instruction, since MOV cannot take two memory operands):

    mov   WORD PTR es:[di], ds:[si]    ; or BYTE PTR for bytes

We are going to write some subroutines which copy strings from one place in memory to another.{1} But first we need to review text strings. The text string world is divided into Pascal and C.

A Pascal string has its length in the first byte and the first character in the second byte. Since the length is in one byte, the string length may only be 0 to 255. You read the first byte of the string to get the length.

A C string can have any length. The end of a C string is marked by a byte with the value 0. This is not the character '0' (30h), it is the number 0 (0h). In order to find the end of a C string, you need to check each character to see if it is 0h.

We'll do the Pascal string first. We are going to pass the addresses of the strings.

____________________
1. For the technically minded, these routines will be only half of a real life subroutine, since they assume that the two strings do not overlap. In robust subroutines, these routines would be the section for when the destination address is lower than the source address.

The PC Assembler Tutor 200
______________________

If you have the Pascal call:

    move_pascal_string (from_string, to_string) ;

The first thing you need to know is that Pascal pushes things on the stack from left to right. In other words, Pascal will generate the following code:

    lea   ax, from_string
    push  ax
    lea   ax, to_string
    push  ax
    call  move_pascal_string

We will start by assuming both near data (all data is in DS) and near subroutines. After setting up BP, the stack will look like this:{2}

    from_string address     bp + 6
    to_string address       bp + 4
    old IP                  bp + 2
    bp ->  old BP           bp + 0

Here's the subroutine. Remember, PUSHREGS and POPREGS are macros:

    ; ----------
    move_pascal_string  proc  near

    FROM_PTR  EQU  [bp+6]
    TO_PTR    EQU  [bp+4]

        push  bp                  ; set up bp
        mov   bp, sp
        pushf                     ; push the flags
        PUSHREGS  cx, si, di, es  ; push the registers

        push  ds                  ; move ds to es
        pop   es
        mov   si, FROM_PTR        ; load pointers
        mov   di, TO_PTR
        cld                       ; clear DF (increment)
        sub   cx, cx              ; zero cx
        mov   cl, [si]            ; length to cl
        inc   cx                  ; increment count by one
        rep   movsb               ; the actual move

        POPREGS  cx, si, di, es
        popf                      ; pop the flags
        pop   bp
        ret   (4)                 ; pop pointers and return

    move_pascal_string  endp
    ; ----------

The count is increased by one since we need to move not only the text, but the count itself. If the length is 0, we still need to move one byte: the count byte. The value in DS is moved to ES with a PUSH and a POP, because you cannot move directly from one segment register to another. Also, at the return we POP 4 bytes (2 words) to get the pointers off the stack.

____________________
2. If you forgot about BP, go back to the chapter on subroutines.
Remember, in Pascal, it is the subroutine's responsibility to get rid of the arguments from a subroutine call.

You will notice that this time we saved the flags register. Why? Because we are clearing DF. When we return from the subroutine, we want DF to be exactly the same as it was on entry to the subroutine. The calling program may have DF set in some special way and we don't want to interfere with that.

There are three flags which I will call 'hard' flags. Once they are set, they stay set: arithmetic and logical instructions do not change them. These are (1) TF, the trap flag, (2) IEF, the interrupt enable flag, and (3) DF, the direction flag. The 'soft' flags are CF, OF, ZF, etc. If you call a subroutine you expect CF, OF, ZF etc. to be unreliable on return, but you expect these three 'hard' flags to remain the same. TF is the domain of a debugger, so it is none of your business. IEF is only of interest to you if you are writing an interrupt procedure.{3} The third one, DF, is your concern. If you change DF in a subroutine, you MUST save the flags to ensure that DF has the same value at the return that it had on entry.

Now for the C subroutine. If we have a C subroutine call:

    move_c_string ( from_string, to_string ) ;

C pushes things on the stack from right to left (the exact opposite of Pascal). The C compiler will generate the following code:

    lea   ax, to_string
    push  ax
    lea   ax, from_string
    push  ax
    call  move_c_string
    add   sp, 4

After setting up BP, the stack will look like this:

    to_string address       bp + 6
    from_string address     bp + 4
    old IP                  bp + 2
    bp ->  old BP           bp + 0

____________________
3. If you do an interrupt procedure you don't have to worry, because INT automatically saves the flags while clearing IEF, and IRET restores the flags on exiting.
Here's the C subroutine:

    ; ----------
    move_c_string  proc  near

    FROM_PTR  EQU  [bp+4]
    TO_PTR    EQU  [bp+6]

        push  bp                  ; set up bp
        mov   bp, sp
        pushf                     ; push the flags
        PUSHREGS  ax, si, di, es  ; push the registers

        push  ds                  ; move ds to es
        pop   es
        mov   si, FROM_PTR        ; load pointers
        mov   di, TO_PTR
        cld                       ; clear DF (increment)

    move_loop:
        lodsb                     ; source to al
        stosb                     ; al to destination
        and   al, al              ; check for 0
        jnz   move_loop

        POPREGS  ax, si, di, es
        popf                      ; pop the flags
        pop   bp
        ret

    move_c_string  endp
    ; ----------

We set up the routine the same way, but we cannot use MOVSB. We need to check each individual byte to see if it is 0 hex, so we move it to AL, move it from AL to the destination, and then check AL for 0. Because the check comes after the STOSB, the terminating 0 is copied too.

Also note that we did not pop the addresses off the stack with the return statement, since in C it is the calling program's responsibility to do that. If you look at the calling code above, you will see:

    add   sp, 4

which gets rid of the two pointers on the stack. Remember, the stack grows downward, so you ADD to decrease the size of the stack.

Let's do the same Pascal program again, but this time use long pointers, that is, give both the segment and offset of the string. This means that we will be able to move from any place in memory to any place in memory. Here is the calling code:

    mov   ax, seg from_string
    push  ax
    mov   ax, offset from_string
    push  ax
    mov   ax, seg to_string
    push  ax
    mov   ax, offset to_string
    push  ax
    call  move_pascal_string

We will still keep it a near subroutine.
After setting up BP, the stack will look like this:

    from_string segment     bp + 10
    from_string offset      bp + 8
    to_string segment       bp + 6
    to_string offset        bp + 4
    old IP                  bp + 2
    bp ->  old BP           bp + 0

Here's the subroutine:

    ; ----------
    move_pascal_string  proc  near

    FROM_PTR  EQU  [bp+8]
    TO_PTR    EQU  [bp+4]

        push  bp                       ; set up bp
        mov   bp, sp
        pushf                          ; push the flags
        PUSHREGS  cx, si, di, ds, es   ; push the registers

        lds   si, FROM_PTR             ; load pointers
        les   di, TO_PTR
        cld                            ; clear DF (increment)
        sub   cx, cx                   ; zero cx
        mov   cl, [si]                 ; length to cl
        inc   cx                       ; increment count by one
        rep   movsb                    ; the actual move

        POPREGS  cx, si, di, ds, es
        popf                           ; pop the flags
        pop   bp
        ret   (8)                      ; pop pointers and return

    move_pascal_string  endp
    ; ----------

This takes slightly less code since we load SI and DS at the same time (with LDS, load DS) and we load DI and ES at the same time (with LES, load ES). LES can still find TO_PTR after LDS has changed DS, because [bp+4] is addressed through SS, not DS. Remember, 8086 instructions which move a segment:offset pair always have the offset in low memory and the segment in high memory; the offset is the first two bytes and the segment is the next two bytes.

We changed the EQU statements, and the return statement is now:

    ret   (8)

so we take 8 bytes (4 words) off the stack, but the rest is the same.

CMPS

The final instruction in this group is CMPS, and as usual, it comes in two varieties.

cmpsb compares the byte addressed by DS:SI to the byte addressed by ES:DI. It sets the flags the same way as the CMP instruction: it moves both bytes into the 8086, subtracts the DI byte from the SI byte and sets the flags. The two bytes in memory remain unchanged. You can look at the flags to see which byte is larger, or if they are equal. As usual, both SI and DI are incremented or decremented by one, depending on the setting of DF, the direction flag.

cmpsw compares the word addressed by DS:SI to the word addressed by ES:DI. It sets the flags the same way as the CMP instruction: it moves both words into the 8086, subtracts the DI word from the SI word and sets the flags.
The two words in memory remain unchanged. You can then look at the flags to see which word is larger, or if they are equal. Both SI and DI are incremented or decremented by two, depending on the setting of DF, the direction flag.

This instruction has the same effect on the flags as:

    push  ax
    mov   ax, ds:[si]       ; or AL for bytes
    cmp   ax, es:[di]       ; performs ( DS:[SI] - ES:[DI] )
    pop   ax

What use is this instruction? It is possible to use it for a word find, and we will do that later, but it is a little unsophisticated for that. It is great for data verification, however. When you use the DISKCOMP utility in DOS, which compares two floppy disks, it reads each of the disks sector by sector, and then compares them. A sector is 512 bytes. In outline, the comparison part of such a utility could look like this:

    ; - - - - - DATA - - - - -
    error_message  db  "Sectors are not the same", 0
    disk1_buffer   db  512 dup (?)
    disk2_buffer   db  512 dup (?)

    ; - - - - - CODE - - - - -
    get_next_sector:
        ; the code for reading one sector from each disk goes here.
        ; then we have the code to compare the two sets of data.
        mov   si, offset disk1_buffer
        mov   di, offset disk2_buffer
        mov   cx, 256                ; 512 / 2 = 256
        repe  cmpsw
        je    get_next_sector
        lea   ax, error_message      ; we had an unequal comparison
        call  print_string
        jmp   get_next_sector
    ; - - - - - - - - - -

We do a word compare since it takes only half as many steps. If there is an unequal comparison at any time, the REPE instruction will terminate the loop, and we can test for the inequality with JE or JNE. In this example we assume that DS and ES have the same segment address and that DF has already been cleared with CLD. Any time you need to verify data, this is the instruction to use.

We are going to build a word search program. It is not very valuable since 'a' will not match 'A', but it is a good exercise to look at CMPS. We will use ch1str.obj, the file we used at the beginning of the chapter, as the text file and you can try to find individual words in the file.
Remember, the file is continuous characters (no spaces), and all characters are lowercase. If you didn't save the file length, you will have to run that program again to find the length of the file. Here's the word_search program:

    ; + + + + + + + + + + + + + + + START DATA BELOW THIS LINE
    EXTRN  ch1str:BYTE

    entry_banner     db  13,10, "Enter a word for a word search", 0
    no_match_banner  db  "There was no match", 0
    input_buffer     db  80 dup (?)
    letter_count     dw  ?
    ; + + + + + + + + + + + + + + + END DATA ABOVE THIS LINE

    ; + + + + + + + + + + + + + + + START CODE BELOW THIS LINE
        mov   ax, seg ch1str           ; load es register
        mov   es, ax
        cld                            ; clear DF (increment)

    big_loop:
        ; get a word for the word search
        mov   ax, offset entry_banner
        call  print_string
        mov   ax, offset input_buffer
        call  get_string

        ; find the end of string
        mov   al, 0                    ; compare with 0
        mov   bx, offset input_buffer
        mov   cx, 0                    ; letter count
    letter_count_loop:
        cmp   al, [bx]                 ; compare to 0
        je    end_of_count_loop
        inc   cx                       ; increment count
        inc   bx                       ; increment pointer
        jmp   letter_count_loop
    end_of_count_loop:
        jcxz  big_loop                 ; if cx = 0, string is empty so redo
        mov   letter_count, cx         ; store our count

        ; look for word match
        mov   di, offset ch1str
        mov   cx, $$$$                 ; $$$$ = length of ch1str
        sub   cx, letter_count         ; calculate last possible match
        inc   cx                       ; number of possible starting positions
    word_search_loop:
        push  di                       ; start of search
        push  cx                       ; count for ch1str
        mov   si, offset input_buffer
        mov   cx, letter_count
        repe  cmpsb                    ; the actual comparison
        je    found_it                 ; if equal, we have a match

        ; no match. are we finished?
        pop   cx
        pop   di
        inc   di                       ; move to next starting address
        loop  word_search_loop

        ; we fell through: finished, but no match
        mov   ax, offset no_match_banner
        call  print_string
        jmp   big_loop

    found_it:
        pop   cx                       ; clear cx off the stack
        pop   di                       ; start of the match
        mov   si, offset input_buffer
        mov   cx, 25                   ; move 25 characters to buffer
    transfer_loop:
        mov   al, es:[di]
        mov   [si], al
        inc   si
        inc   di
        loop  transfer_loop
        mov   BYTE PTR [si], 0         ; end of a C string
        mov   ax, offset input_buffer
        call  print_string
        jmp   big_loop
    ; + + + + + + + + + + + + + + + END CODE ABOVE THIS LINE

The code is so long that the whole assembler file has been put on disk so you don't have to do all the typing. The pathname is \XTRAFILE\COMPARE.ASM. All you need to do is enter the length of ch1str in the MOV instruction where the dollar signs are:

    mov   cx, $$$$                 ; $$$$ = length of ch1str

Link with 'link compare+ch1str+\asmhelp'. You enter a text string and the program looks for an exact match in ch1str.

Here is how the program is structured. First, the program prompts you to enter a string. The program then counts the number of bytes in the string. It must have a non-zero length or the program will prompt you again for a string.

The program then starts at the beginning of the text. It saves a copy of the pointer to the start of the comparison so if we fail we can start over again at the next character. The actual comparison is:

    repe  cmpsb

If that makes it through all the letters in the search string, REPE will quit because CX = 0, not because we have an unequal character. If the comparison failed, we pop DI (the text pointer) and start at the next character. The number of possible starting positions is the length of ch1str minus the length of the search string, plus one; that is why CX is incremented after the SUB, since otherwise a match at the very end of ch1str would never be tried.

If there is a match, we move 25 characters (starting with the matching characters) from the text to the buffer. It is necessary to move these because when you call print_string, the string must be in the DATASTUFF segment, and ch1str isn't. We haven't used MOVSB here because ES and DS are the wrong way around: the source is in ES and the destination is in DS, while MOVSB always moves from DS:SI to ES:DI. For 25 characters there is only a marginal advantage to setting up for MOVS.
Finally, the 25 characters are printed. If there is no match, a message to that effect is printed.

The text in ch1str is the first draft of chapter 1, but just for interest, I have hidden eight C keywords and eight of your favorite Middle English words in the text.{4} See if you can find them.

____________________
4. Two hints. You might find four of these Middle English words in the name of a boutique. The other four of the Middle English words are some of your favorite monosyllabic words.

SEGMENT OVERRIDES

Here are the string instructions and the override rules for each one.

LODS moves a byte or word from DS:[si] to AL or AX. You may use CS:[si], SS:[si] or ES:[si].

STOS moves a byte (or a word) from AL (or AX) to ES:[di]. NO OVERRIDES ARE ALLOWED.

SCAS compares AL (or AX) to the byte (or word) pointed to by ES:[di]. NO OVERRIDES ARE ALLOWED.

MOVS moves a byte (or a word) from DS:[si] to ES:[di]. You may use CS:[si], SS:[si] or ES:[si], but you MAY NOT OVERRIDE ES:[di].

CMPS compares the byte (or word) at DS:[si] to the one at ES:[di]. You may use CS:[si], SS:[si] or ES:[si], but you MAY NOT OVERRIDE ES:[di].

Looking at the whole group, you may override DS:[si], but you may not override ES:[di].

The form of the override is strict. We will take MOVS as an example. Till now, the instructions were written:

    movsb          ; byte move
    movsw          ; word move

If you want to do an override, the syntax is:

    movs  BYTE PTR ES:[di], SS:[si]
    movs  WORD PTR ES:[di], SS:[si]

If you write:

    movsb  ES:[di], SS:[si]

you will get an assembler error.
Here are all the legal forms:

    LODS    lodsb
            lodsw
            lods  BYTE PTR SS:[si]           ; or CS:[si], DS:[si], ES:[si]
            lods  WORD PTR SS:[si]           ; or CS:[si], DS:[si], ES:[si]

    STOS    stosb
            stosw
            stos  BYTE PTR ES:[di]           ; no override allowed
            stos  WORD PTR ES:[di]           ; no override allowed

    SCAS    scasb
            scasw
            scas  BYTE PTR ES:[di]           ; no override allowed
            scas  WORD PTR ES:[di]           ; no override allowed

    MOVS    movsb
            movsw
            movs  BYTE PTR ES:[di], SS:[si]  ; or CS, DS, ES:[si]
            movs  WORD PTR ES:[di], SS:[si]  ; or CS, DS, ES:[si]

    CMPS    cmpsb
            cmpsw
            cmps  BYTE PTR SS:[si], ES:[di]  ; or CS, DS, ES:[si]
            cmps  WORD PTR SS:[si], ES:[di]  ; or CS, DS, ES:[si]

Just because you can do overrides with these instructions doesn't mean that you should. In fact, there is a problem. If you are using the REP instruction with an override:

    rep   movs  WORD PTR ES:[di], SS:[si]

and the 8086 gets a hardware interrupt,{5} the 8086 forgets the override. What this means is that one moment you are moving data from the SS segment, and the next moment you are moving data from the same offset, but in the DS segment. This just won't do. Thus the rule is:

    NEVER USE AN OVERRIDE WITH A REP/REPE/REPNE INSTRUCTION

This actually is no hardship. Using the override adds time to the instruction. All you need to do is change the segment registers for the duration of the string instruction, and the code will run faster. Of course, there is the setup time, but the break-even point is around 20 repeats. Here is what you would do if you needed an SS segment override:

    push  ds          ; save old DS
    push  ss          ; move SS to DS
    pop   ds          ; the same as an SS:[si] override
    rep   movsb
    pop   ds          ; get old DS back

The other possibility is to use LOOP instead of REP. It is slower, but better slower and reliable than faster and unreliable.
    rep   movs  BYTE PTR ES:[di], SS:[si]

is the same as:

    repeat_loop:
        movs  BYTE PTR ES:[di], SS:[si]
        loop  repeat_loop

provided CX is not 0 to begin with. (With CX = 0, REP does nothing, while LOOP first decrements CX to 0FFFFh and goes around 65,536 times.) There are even three forms of the LOOP instruction, LOOP, LOOPE and LOOPNE, which are the counterparts of REP, REPE and REPNE.

____________________
5. Which can be caused by such rare occurrences as your pressing a key on the keyboard or one of the roughly 18 timer interrupts that happen each second.

SUMMARY

LODS (load from string) moves a byte or word from DS:[si] to AL or AX, and increments (or decrements) SI depending on the setting of DF, the direction flag (by 1 for bytes and by 2 for words). You may use CS:[si], SS:[si] or ES:[si]. This performs the same action (except for changing SI) as:

    mov   ax, DS:[SI]       ; or AL for bytes

The allowable forms are:

    lodsb
    lodsw
    lods  BYTE PTR SS:[si]           ; or CS:[si], DS:[si], ES:[si]
    lods  WORD PTR SS:[si]           ; or CS:[si], DS:[si], ES:[si]

STOS (store to string) moves a byte (or a word) from AL (or AX) to ES:[di], and increments (or decrements) DI depending on the setting of DF, the direction flag (by 1 for bytes and by 2 for words). NO OVERRIDES ARE ALLOWED. This performs the same action (except for changing DI) as:

    mov   ES:[DI], ax       ; or AL for bytes

The allowable forms are:

    stosb
    stosw
    stos  BYTE PTR ES:[di]           ; no override allowed
    stos  WORD PTR ES:[di]           ; no override allowed

SCAS compares AL (or AX) to the byte (or word) pointed to by ES:[di], and increments (or decrements) DI depending on the setting of DF, the direction flag (by 1 for bytes and by 2 for words). NO OVERRIDES ARE ALLOWED. This sets the flags the same way as:

    cmp   ax, ES:[DI]       ; or AL for bytes

The allowable forms are:

    scasb
    scasw
    scas  BYTE PTR ES:[di]           ; no override allowed
    scas  WORD PTR ES:[di]           ; no override allowed

MOVS moves a byte (or a word) from DS:[si] to ES:[di], and increments (or decrements) SI and DI, depending on the setting of DF, the direction flag (by 1 for bytes and by 2 for words).
You may use CS:[si], SS:[si] or ES:[si], but you MAY NOT OVERRIDE ES:[di]. Though the following is not a legal instruction, it signifies the equivalent action to MOVS (not including changing DI and SI):

    mov   WORD PTR ES:[DI], DS:[SI]    ; or BYTE PTR for bytes

The allowable forms are:

    movsb
    movsw
    movs  BYTE PTR ES:[di], SS:[si]  ; or CS, DS, ES:[si]
    movs  WORD PTR ES:[di], SS:[si]  ; or CS, DS, ES:[si]

CMPS compares the byte (or word) at DS:[si] to the one at ES:[di], and increments (or decrements) SI and DI, depending on the setting of DF, the direction flag (by 1 for bytes and by 2 for words). You may use CS:[si], SS:[si] or ES:[si], but you MAY NOT OVERRIDE ES:[di]. Although the following is not a legal instruction, it signifies the equivalent action to CMPS (not including changing DI and SI):

    cmp   WORD PTR DS:[SI], ES:[DI]    ; or BYTE PTR for bytes

The allowable forms are:

    cmpsb
    cmpsw
    cmps  BYTE PTR SS:[si], ES:[di]  ; or CS, DS, ES:[si]
    cmps  WORD PTR SS:[si], ES:[di]  ; or CS, DS, ES:[si]

The string instructions may be prefixed by REP/REPE/REPNE, which will repeat the instructions according to the following conditions:

    rep      decrement cx ; repeat if cx is not zero
    repe     decrement cx ; repeat if cx not zero AND zf = 1
    repz     decrement cx ; repeat if cx not zero AND zf = 1
    repne    decrement cx ; repeat if cx not zero AND zf = 0
    repnz    decrement cx ; repeat if cx not zero AND zf = 0

Here, 'e' stands for equal, 'z' is zero and 'n' is not. These repeat instructions should NEVER be used with a segment override, since the 8086 will forget the override if a hardware interrupt occurs in the middle of the REP loop.

'HARD' FLAGS

IEF, TF and DF are 'hard' flags. Once they are set, they keep that setting until an instruction explicitly changes them. If you change DF, the direction flag, in a subroutine, you must save the flags upon entry and restore the flags on exiting to make sure that DF has not been altered.