Frostbyte's 1980s DOS Shareware Collection

Text File | 1990-05-04 | 79.8 KB | 1,755 lines

Masterful Disassembler - Intel 8086 version 1.00

1.0) Introduction

The MD86 program is a powerful utility for examining and disassembling any executable program or any series of machine instructions (such as a ROM image). MD86 is designed to run on any IBM PC, XT, or AT or compatible with at least 128k of RAM. Neither a graphics adaptor nor a color monitor is required. A hard disk is desirable, but MD86 runs fine (if a trifle slower) on floppy-based systems.

MD86 was developed with one goal in mind: produce usable source code from an executable program file. By usable, we mean that the resulting assembly instructions should be understandable. This necessitates meaningful label names and comments. Normally the disassembly of a large program is a time-consuming, laborious task. MD86 speeds this up as much as possible.

MD86 produces source files that are compatible with the Microsoft assembler MASM version 4.00 and reasonably compatible with the IBM assembler. While this is not the easiest assembler to use (in fact it is downright difficult), it was chosen because it is more "standard" than any other assembler. Even though the instruction syntax is compatible, the organization of the segments may not be for some programs. After MD86 has produced a source file, it is not uncommon that an editor is needed to make some minor changes before the file can be assembled without error. This will be especially true with EXE type programs, which have complex segment structures.

1.1) What MD86 Looks Like

MD86's unique video display works very much like a full screen editor, allowing movement within the disassembled source file with single-key ease. Most of the difficulties associated with other disassemblers are gone.

When executed, MD86 presents the user with a full screen of information that looks very similar to the printed output from an assembler. Figure I shows a typical screen from a freshly disassembled program.
This is actually the file COMMAND.COM for PCDOS v3.1. The bold line towards the top of the display is the active line. The cursor (shown as an underline "_") is at the start of the label field. Both the label field and the comment field may be edited.

Note how the display does not seem cluttered. The label field only has an entry if the address is referenced. The comments shown here have been automatically inserted by MD86. These help you remember the less common instructions and MSDOS function calls.

Figure I, Typical Display Of Freshly Disassembled Program

    05DB:51          L05DBH   PUSH   CX              ;
    05DC:1E                   PUSH   DS              ;
    05DD:07                   POP    ES              ;
    05DE:C536610A    _        LDS    SI,[L0A61H]     ;Load DS:reg with 32b pointr
    05E2:57                   PUSH   DI              ;
    05E3:BF5A08               MOV    DI,#L085AH      ;
    05E6:B90B00               MOV    CX,#L000BH      ;
    05E9:FC                   CLD                    ;Set forward direction for
    05EA:F3A4                 REPZ   MOVSB           ;Move byt. (SI)+- to (DX)+-
    05EC:5F                   POP    DI              ;
    05ED:06                   PUSH   ES              ;
    05EE:1F                   POP    DS              ;
    05EF:59                   POP    CX              ;
    05F0:BA4708               MOV    DX,#L0847H      ;
    05F3:B409CD21             MSDOS  _OUTSTR         ;Display string at (DX)
    05F7:BA9E08      L05F7H   MOV    DX,#L089EH      ;
    05FA:E8A300               CALL   L06A0H          ;
    05FD:F606D30AFF           TEST   [L0AD3H],0FFH   ;
    0602:7404                 JZ     SHORT L0608H    ;
    0604:B403                 MOV    AH,#3           ;
    0606:EB7B                 JMP    L0683H          ;
    0608:B8010C      L0608H   MOV    AX,#L0C01H      ;Flush buffer, read keyboard
    060B:CD21                 MSDOS                  ;
    060D:E88D00               CALL   L069DH          ;

    CS:: Labels= 185/ 8%, Types=   0/ 0%,   0 cmnts   No Edit   10/ 2/87  1:20:35

A note to programmers familiar with the Microsoft assembler MASM: MD86 creates compatible data files, but the screen display has been simplified. In particular, the word OFFSET (as required by MASM) is replaced with the pound sign ("#") and all WORD PTR and BYTE PTR phrases have been removed. Generated labels are shown without a trailing colon. When a source file is generated, the source code will be compatible.

The line at the bottom contains status information.
This tells you that the code being viewed is in the code segment, 185 address labels have been identified, no data types have been defined, and no user-entered comment records exist. In this case we are not yet editing any field, so "No Edit" is displayed. If we were, then either "INSERT" or "REPLACE" would show, indicating how characters are being added to the field. At the far right corner, the current date and time are shown.

The comment field may extend past the right edge of the screen, and the active line scrolls horizontally as necessary to keep the cursor within view.

The function keys are used to control MD86. A window "pops up" in the upper left corner for instructions. Inadvertently entered commands may generally be aborted by a null response (i.e., only pressing the RETURN key) to one of the questions.

2.0) Using MD86

To disassemble a file, it must first exist in the current directory. MD86 may be placed in any other directory as long as the PATH command includes that directory. The companion file, MD86.CMT, is only used to supply the automatic comments and is only needed when a program is disassembled for the first time. If this file is not found, MD86 will turn off automatic commenting. If this is not acceptable, then quit (see the F10 command in Section 2.3), move MD86.CMT to the current directory, and begin again.

To disassemble the program COMMAND.COM, use the following command.

    C>MD86 COMMAND.COM

If MD86 cannot locate the associated data files, then MD86 will create them (you will be asked for confirmation first, just in case you misspelled the program name). MD86 will automatically determine the extent of the program and put the cursor on the first address of the program. Note that COM type files start at 100 (hex) and EXE type files start at 0000. See reference 1 for a discussion of the dissection of EXE type files. The Alter Parameters command may be used to override the choices made by MD86.
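
The default start addresses above, together with the end-address rule given under the Alter System Parameters command (physical end of the program, or FFFE hex if the program is more than 64k), can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the file type is guessed from the extension (the manual does not say how MD86 decides), the image is assumed to be loaded contiguously from the start address, and the size used in the example is made up.

```python
COM_START = 0x100   # COM type files start at 100 hex
EXE_START = 0x0000  # EXE type files start at 0000
MAX_END = 0xFFFE    # cap applied when the program exceeds 64k


def default_start(filename):
    """Return the default start offset, guessed from the file extension."""
    return COM_START if filename.upper().endswith(".COM") else EXE_START


def default_end(start, size):
    """Return the offset of the last byte, capped at FFFE hex.

    Assumes `size` bytes are loaded contiguously beginning at `start`.
    """
    return min(start + size - 1, MAX_END)


# Hypothetical 8k COM image: offsets 0100 through 20FF.
start = default_start("COMMAND.COM")
print(f"{start:04X}-{default_end(start, 0x2000):04X}")  # prints 0100-20FF
```

For EXE type files the manual says the segment ranges come from the file header, so the end-address calculation here would apply only to COM files and raw images such as ROMs.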
MD86 creates two data files when it disassembles a program. These have the same name as the program with the extensions .001 and .002 (i.e., COMMAND.001 and COMMAND.002). The first file contains the symbol table and other parameters, and the second file contains the comment records. If neither of these files is present, then MD86 assumes this is the first disassembly of the file. If they are both present (and readable), then MD86 will pick up right where you left off. If only one of these files is present, or one is unreadable, then MD86 issues an error message and terminates. Refer to Section 7 for a discussion of error causes and cures.

During the disassembly process, there are three groups of commands that MD86 will recognize. Commands that require additional input will cause a window to pop up in the upper left corner of the display. User dialogue occurs within this window.

The three command groups consist of 1) editing commands, 2) non-editing commands, and 3) general commands. The editing and non-editing commands are mutually exclusive. When editing, the non-editing commands are not allowed, and vice versa. General commands are always valid.

MD86 will "beep" when an invalid command character is entered. Note that some keys generate more than one character, and while the first character may be invalid, the others may not. So when you hear the beep, examine the characters around the cursor to be sure no extraneous characters were inserted.

2.1) General Command Keys

The general commands can be typed at any time. They will always be recognized.

o Left-Arrow
  This will move the cursor one position left within the current field (either the label field or the comment field). If the cursor is already at the beginning column, then a "beep" will be heard.

o Right-Arrow
  This will move the cursor one position right within the current field (either the label field or the comment field).
  Note that the fields are always filled with blanks on the right. If the cursor is already past the right-hand column, then a "beep" is heard.

o Insert
  This toggles the way typed characters are entered. They will either replace existing characters or be inserted in front of them. In editing mode, either INSERT or REPLACE is displayed in the bottom status line.

2.2) Editing Command Keys

The label and comment fields can be edited by moving the cursor to the desired location and just typing, similar to a word processor. This allows label names and/or comments to be associated with an address. The current line may be in the code segment or data segment (EXE type programs only). Once editing has begun, only the ESCAPE key or the RETURN key will revert to non-editing command mode.

When a temporary label field is edited, it is initially blanked out, eliminating the "L1234H" that was present. If the cursor was not at the first column of the label, then leading blanks will exist there. Since labels must begin with a letter or underscore, such a label will be rejected when the RETURN key is pressed.

Editing the automatic comments causes MD86 to first ask whether the comment field should be blanked out. After this question is answered, your key is processed.

o Letters, Numbers, and Symbols
  These characters are entered into the field. If the current mode is INSERT, then the characters to the right are moved (the rightmost character is lost) to make room. Otherwise, in REPLACE mode, the new character overwrites the character under the cursor. Note that within the label field only the characters A-Z, a-z, 0-9, $, and _ are valid. Other characters cause a "beep" and are ignored. Note that the further restriction that labels not begin with a number is not checked until the RETURN key is pressed.

o Escape
  This key will cancel any editing on the current field and its original contents will be restored.
  This effectively returns to non-editing mode without saving any changes to the current field.

o Return
  Use this key to tell MD86 that the editing changes you have made are correct and should be remembered. Note that this is not the same as saving the data, as nothing is actually written to the data files yet. If the field contents are valid, then the cursor will be returned to the starting column of the current field and the mode is set to non-editing.

o Backspace
  If the cursor is not already at the left edge, this will erase the character immediately to the left of the cursor. The remainder of the line will be shifted left and the rightmost column will be blank filled. If the cursor is at the left edge, then this just "beeps".

o Delete
  This will erase the character immediately under the cursor and cause the remainder of the line to be shifted left. The rightmost column will be blank filled.

o End
  This moves the cursor to the last column of the field.

o Home
  This moves the cursor to the leftmost column of the field.

2.3) Non-editing Command Keys

A good portion of the time spent disassembling a program goes to roaming around various areas and to other non-editing functions. The simpler cursor movement functions use a single keystroke for this work, while the more involved commands use the function keys and pop-up windows. The cursor movement keys are as follows.

o Down-Arrow
  This moves the cursor down one line and to the beginning of the same field. You cannot move beyond the end of the current segment.

o Up-Arrow
  This moves the cursor up one line to the beginning of the same field. You cannot move past the beginning of the current segment.

o Home
  If the cursor is right of the leftmost column of the comment field, then it is moved to the start of the comment field. Otherwise the cursor moves to the first column in the label field within the same line.
o End
  This moves the cursor to the beginning of the comment field if it was within the label field. Otherwise the cursor moves to the end of the comment field.

o Page-Up
  This moves the cursor up approximately a full page. This may be more or less than 24 lines, as it assumes an average of three bytes per line. Note that no attempt is made to locate the beginning of an instruction, so it is probable that the first line or so will be disassembled incorrectly. The cursor will be positioned at the start of the label field in the top line.

o Page-Down
  This moves the cursor to the top of the next page. The line that is currently at the bottom of the screen will be at the top after this command. The cursor will be positioned at the start of the label field.

The following commands utilize the function keys on the PC keyboard, either alone or in combination with the shift key. Remember that these only function in non-editing mode.

o F1 - Help Command
  This displays a one-screen summary of the function keys. Press any key to refresh the screen.

o Shift-F1 - Alter System Parameters
  This command is used to make changes (if possible) to the default parameters. For COM files, the beginning and ending addresses can be modified. EXE files, however, have ranges for the data and code segments that are defined in the header. These cannot be changed.

  For COM files, the following parameters can be changed.

  o The Start Address. Normally this is 100 hex for COM files and 0000 hex for EXE files. But for special work, like ROM disassembly, this may be set to something else.

  o The End Address. MD86 sets this to the physical end of the program, or FFFE hex if the program is more than 64k. If you find that a smaller value is correct, then change it here. This will prevent MD86 from accessing garbage areas and contaminating the label table.

  The following parameters affect how MD86 displays the disassembled lines.
  These are changeable at all times.

  o Translate MSDOS Functions. MD86 normally tries to translate common MSDOS functions into a pseudo instruction that has more meaning when trying to understand the code. However, if you don't want this done, then it can be disabled here.

  o Enable Automatic Comments. When MD86 finds certain instructions, it tries to add a comment line explaining the instruction in more or less plain English. If you don't want to see these comments, then they can be eliminated.

o F2 - Goto a Specified Address
  This allows a quick jump to any valid address to begin disassembly. A null response (only the RETURN key pressed) causes MD86 to try to return to the location of the last Goto command. A stack with the previous 16 locations is maintained. This is handy for jumping to one location and then returning without having to remember where you were. Valid destinations are anywhere within the data segment or the code segment. Note that for a COM type file, these are assumed to be the same.

o Shift-F2 - Follow That Instruction
  If the current line contains a direct jump or call instruction, this command will do an automatic Goto to the destination address. This does not apply to inter-segment (FAR) calls or jumps or to any indirect calls or jumps. MD86 saves the current address on its internal stack so that a return can be made via the F2 command. With this you can conveniently examine a subroutine and then continue from where you left off.

o F3 - Set Data Type
  When MD86 first looks at a program, it assumes that all of the data segment is made up of 8-bit data bytes and all of the code segment is machine instructions. This is more than likely not 100% correct. When disassembling a portion of the program, you may notice that the present interpretation does not make sense. Some other data type is necessary. MD86 can recognize one of four data types.
  These are instructions (type #0), 8-bit binary data (type #1), 8-bit ASCII characters (type #2), and 16-bit addresses (type #3). See Section 2.6 for more details.

  This command allows any range of the program to be set to a specific type. You will be asked for the data type and the first and last addresses. Addresses must be in the same segment, of course.

  The internal type table has room for 512 entries. This is the total of the data and code segment types. The current total is displayed on the bottom status line along with a percent-used figure.

o F4 - Set Data Type for Unspecified Range
  Often the extent of a different data type is not known. What is known is the initial address and a suspected data type. This command uses the current line as the beginning address and will request the suspected data type (0 to 3). Then MD86 temporarily considers all data following the cursor to be this type. You would move the cursor down until you reach an address that is not of this type and press F4 again to fix the range for the specified type. In this special mode, only one data item per line is displayed.

o F5 - Write Source to Disk
  MD86 would be of limited use if you could not generate a disk file with the source code. Use this command, specify the file name, and MD86 will write out the data. If an extension of PRN is used, then the file will have the address and binary code along with each instruction line. This is the way the screen appears. Before this file can be used by an assembler, some hand editing will be required. Segments may have to be specified differently than MD86 specifies them.

o F6 - Scan Code Segment
  MD86 builds the label table as code is disassembled. At times, the disassembly is not correct and erroneous address references may be entered into the label table. This function cleans this up.
  When all of the code segment has been given the correct data type, this function should be used to properly build the label table. It will remove any temporary labels and begin disassembling the entire code segment. When this has completed, the label table will be correct and erroneous references will have been removed.

o F7 - Dump Program in Hexadecimal and ASCII
  It is difficult to determine the location of data areas and character strings by just looking at a page of disassembled instructions. This function will begin in the data segment (for EXE programs) and then dump the code segment. The data is displayed 16 bytes per line in hexadecimal and also in ASCII (if possible). Pressing any key will halt the display so you can inspect the data and maybe write down addresses of obvious data areas. Pressing any key other than ESCAPE will continue the display. Press the ESCAPE key to end this segment and dump the next segment, or to return to where you were when this command was initiated.

o F8 - Set Label Name
  If it is desired to associate a name with a particular address, you can either move the cursor to that address (if possible) and edit the label field, or use this command to set the name without having to move there. Enter any valid label name and the address to be set. A null label name will delete a label from the tables. Invalid names cause a "beep" and are ignored. Valid names start with a non-digit and have no embedded spaces.

  The label table is limited to 2048 label names, which should be satisfactory for any reasonably sized program.

o F9 - Search for Address Reference
  This command will allow any address to be searched for. Use this when it is desired to find out how (or if) a particular area is referenced within the code segment. The initial address to start from is requested. A null response causes the search to begin at the start address.
  You will see the program disassembled on the screen, and it will stop when the specified address is referenced. During the search, press any key to abort. Note that the search is limited to the code segment. However, the particular address may be in any other segment. For example, to search for address 1234 within the extra segment, enter ES:1234 as the search string.

o Shift-F9 - Search for Next Reference
  Once function F9 has been used to find the first occurrence of an address, use this command to locate the next. As with function F9, press any key to abort the search.

o F10 - Save and/or Exit
  Use this command to save your current data tables (often!) and exit or quit MD86. You are given the option to save the data or not and to exit to MSDOS or not. If you wish to quit without saving any of the work you have done, then respond No to saving the data and Yes to exiting.

2.4) Label Name Specification

MD86 allows, even encourages, you to associate a label name with each referenced address. Names are far more understandable than numbers. The label field within the display is either blank (the address has not been referenced), contains a temporary label (the form is LnnnnH for address nnnn), or contains a user-defined name.

Label names can be up to eight characters long and may contain letters, digits, the dollar sign "$", or the underscore "_". Labels may not begin with a digit, however.

Label names may contain upper and lower case letters, and the case is maintained. However, when searching for a name, MD86 ignores differences in case. Thus the name "HelpMsg" is perfectly valid and will appear this way in the output file. You could also jump to address "HELPMSG" and "HelpMsg" would be found. Upper and lower case letters make reading names easier, but you don't have to remember the exact form to reference the name.
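
The naming rules above can be sketched as a small validator and a case-insensitive lookup table. This is a hedged Python illustration, not MD86's own code: `is_valid_label` and `LabelTable` are hypothetical names, and a leading "$" is accepted here only because this section rules out nothing but a leading digit (Section 2.2 words the rule more strictly).

```python
import re

# Up to eight characters from letters, digits, "$" and "_"; no leading digit.
LABEL_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]{0,7}")


def is_valid_label(name):
    """Return True if `name` satisfies the label rules sketched above."""
    return LABEL_RE.fullmatch(name) is not None


class LabelTable:
    """Hypothetical table that keeps the typed case but matches any case."""

    def __init__(self):
        self._entries = {}  # upper-cased name -> (name as typed, address)

    def set(self, name, address):
        if not is_valid_label(name):
            raise ValueError(f"invalid label name: {name!r}")
        self._entries[name.upper()] = (name, address)

    def find(self, name):
        """Look a name up ignoring case; return (typed name, address) or None."""
        return self._entries.get(name.upper())


table = LabelTable()
table.set("HelpMsg", 0x0050)
print(table.find("HELPMSG"))  # ('HelpMsg', 80)
```

Keying the table on the upper-cased name while storing the name as typed is one simple way to get both behaviors the section describes: the output keeps "HelpMsg", and a search for "HELPMSG" still finds it.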
2.5) Specification of Addresses

When MD86 requests an address (like the destination of a Jump command), the address must be entered in the following form.

    Address ?{ss:}nnnn

The braces indicate optional qualifiers. If the address is within a segment other than the current segment, then the segment name must be included. The "ss:" in the line above is the segment name, and it must be either "CS:", "DS:", "ES:", or "SS:". The case of the letters is not important, but the segment name must precede the address (or offset) portion. If the actual address within the segment is entered as a number, then it must be in hexadecimal. In place of a number, a label name may be used. This name must be resolvable within the segment. For example, the following are valid addresses.

    Address ?100
    Address ?ds:HelpMsg

The label table is stored internally and has room for 2048 entries. This is generally enough to disassemble a 10,000 to 15,000 line program. For larger programs it is recommended that they be divided into smaller sections if at all possible.

2.6) Data Type Specification

MD86 initially assumes the entire code segment contains instructions and the data segment (for EXE type files) contains 8-bit binary data. This is a good place to start, but there will be other data types mixed in with these. Functions F3 and F4 can be used to tell MD86 to assume a different data type for a specified address range. The types are specified by a numeric code, and the ones recognized are:

    0 - Machine instructions.
    1 - 8-bit binary data.
    2 - 8-bit ASCII character data.
    3 - 16-bit address data.

When using function F3, MD86 must be told the first address of the newly defined type and the last address with this type. For data types that occupy more than one byte (type 0 or 3), the last address must be the address at the end of the field, not the start.
Thus if address 100 contains a single 16-bit address, then MD86 is given the first address as 100 and the last address as 101 (not 102 as you might think).

Function F4 works a little differently. The start of the current line is taken as the first address when this is initiated. When F4 is pressed again, the start of the now current line (if below the first address) is assumed to be just past the type being defined. In other words, if the address range 120 to 140 is being defined as type 2 (ASCII character data), then the current line should be at address 120 when F4 is pressed the first time and at address 141 when it is pressed the second time.

A code type table is maintained internally by MD86 that contains the beginning and ending addresses (with segment, of course) and the type of data each address range contains. Instructions are the default type and (to save memory space) are not actually stored in the table. There is room for 512 entries, which should be plenty for most normal applications.

2.7) Output Source File Format

MD86 produces a standard ASCII text file as output. This should be suitable as input to almost any assembler and editor. Note that MD86 does not insert tab characters, and thus the lines will contain many blanks. This causes the files to be quite large. The judicious insertion of tabs would shrink the file size significantly.

When MD86 disassembles a program, it remembers how addresses are referenced. As a convenience, the output file will have separating lines just in front of all subroutines, that is, those addresses that were the target of near call instructions.

3.0) The Inner Details of MD86

In the next few sections we will describe in more detail how MD86 functions. It is not necessary that you remember or understand all of this material, but when questions arise this will make a handy reference.

Many choices made during the construction of MD86 were ones of programmer preference.
There are several ways to tackle the many problems encountered in creating a disassembler. Which one is better? If the choice was not obvious, then personal preference was the deciding factor.

MD86 was written in TURBO Pascal and owes a lot of its speed to this fine compiler. Some of the limitations of this compiler necessitated tradeoffs in the design of MD86. In particular, to avoid the use of overlays, the program was limited to a 61k code segment. Not all of the "nice" options could be included.

3.1) Moving the Cursor Upward

When MD86 displays a full screen of disassembled lines, it remembers the exact starting address for each line. Thus to move upward within the display screen, MD86 only has to display the new current line in bright characters again.

However, when the cursor moves off the top of the screen (either by an Up-Arrow or Page-Up command), MD86 has a difficult task in determining the starting address for the instruction. There are times when more than one legitimate starting address is located. For this situation, MD86 chooses the longest instruction. This may not be correct.

The problem comes in when MD86 has moved up more than one time (you pressed Up-Arrow more than once when at the top of the screen) and it finally comes to a point where it cannot find any legitimate instruction to disassemble. You hear a "beep" and MD86 only backs up one byte. This tells you that the screen is not displaying correct instructions. One or more of those at the top (where MD86 backed up) is not correct.

If the screen is not correct, then how do you make it correct? The easiest way is to back up a full page (Page-Up command) and then go forward a full page (Page-Down command). More than likely, MD86 will correctly synchronize somewhere on the screen when you backed up (probably at a labeled instruction), and everything from there on downward will be correct.
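
The upward search described above can be sketched as follows. This is an illustration in Python, not MD86's actual TURBO Pascal code: `previous_start` and `toy_length` are hypothetical names, the six-byte look-back window is taken from Section 3.3, and the length table covers only a handful of opcodes rather than the full 8086 instruction set.

```python
MAX_LOOKBACK = 6  # Section 3.3: only the previous six bytes are examined

# Toy length table for a few single-opcode encodings. An assumption made
# for illustration only; it is not a real 8086 decoder.
TOY_LENGTHS = {
    0x90: 1,  # NOP
    0x50: 1,  # PUSH AX
    0x51: 1,  # PUSH CX
    0xB4: 2,  # MOV AH,imm8
    0xCD: 2,  # INT imm8
    0xB8: 3,  # MOV AX,imm16
    0xBA: 3,  # MOV DX,imm16
    0xE8: 3,  # CALL rel16
}


def toy_length(code, addr):
    """Return the instruction length at addr, or None if it does not decode."""
    if addr < 0 or addr >= len(code):
        return None
    length = TOY_LENGTHS.get(code[addr])
    if length is None or addr + length > len(code):
        return None
    return length


def previous_start(code, addr, length_at=toy_length):
    """Guess where the instruction that ends just before addr begins.

    Returns (start, ok). Candidates are tried longest first, so when several
    starting addresses are legitimate the longest instruction wins. If none
    fits, back up a single byte and report failure (the "beep" case).
    """
    for back in range(MAX_LOOKBACK, 0, -1):
        start = addr - back
        if start >= 0 and length_at(code, start) == back:
            return start, True
    return max(addr - 1, 0), False


code = bytes([0xBA, 0x47, 0x08, 0xB4, 0x09, 0xCD, 0x21])
print(previous_start(code, 7))                 # (5, True)
print(previous_start(bytes([0x47, 0x08]), 2))  # (1, False)
```

Trying the longest candidate first mirrors the rule stated in this section, and it also shows why the guess can be wrong: a data byte that happens to look like a longer instruction will be preferred over the true shorter one.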
One point of interest will surely pop up. When a NOP instruction is disassembled, it is flagged as "questionable" unless the previously disassembled instruction was a short JMP. However, if the cursor is moved upward to a NOP instruction, MD86 marks this line incorrectly, since the instruction disassembled prior to this was actually a following instruction.

A word of caution: when MD86 incorrectly disassembles a line for whatever reason, erroneous address references may be added to the label pool. This is where the Scan command (F6 key) comes in. It erases all temporary labels (ones that have not been given names by you) and begins disassembling the whole program from the starting address. This will correctly build the temporary label pool.

3.2) Questionable Instructions

When MD86 disassembles an instruction line, it checks whether or not the instruction makes sense. If it does not, a flag ("?" to the left of the label field) is set, but the instruction is disassembled anyway if possible. Valid instructions are considered "questionable" if they are very rarely used or are "meaningless".

In the rarely used category are the instructions "LOCK", "ESC", "INT", "XLAT", "WAIT", "HLT", and far returns within a "COM" file. Further, any instruction made up of the exact same bytes (e.g., "ADD [BX+SI],AL", which is two bytes of 00) is considered questionable.

Meaningless instructions are "NOP" and "MOV destination,source" where the source and destination are the same. Note that a "NOP" is allowed following a two-byte forward jump instruction.

Depending on the type of program being disassembled, there may be a few or a lot of such "questionable" instructions that are actually supposed to be there. Thus this "questionable" instruction flag is just a guide to help you locate embedded data areas.

3.3) Instruction Prefix Bytes

Like the Intel 8086 processor, MD86 recognizes certain prefix bytes.
These are the segment override instructions and the repeat instructions. The bus lock instruction is considered separately, as it is very rarely used (it is also flagged as "questionable").

When moving the cursor around, it is possible to miss the prefix byte. This will occur most often when moving upward. MD86 only looks at the previous six bytes to determine where an instruction starts. If the seventh byte were a prefix byte, it would be missed.

When a jump is made (function F2), MD86 does not check to see if the destination is in the middle of an instruction. Here too, a prefix byte could be missed.

Missing a prefix byte could cause label addresses to be associated with the wrong segment. The scan option (function F6) can be used to clean up this type of misinformation.

While this is not the same as a prefix byte, MD86 checks for NOP instructions that are preceded by a short forward jump instruction. If this is not the case, then the NOP is marked as "questionable". This logic fails when the NOP is in the top line of the screen. There is no preceding instruction to check, and the NOP is marked "questionable" even though it may be perfectly valid. When moving the cursor upward, MD86 will incorrectly mark a NOP as questionable. The logic only works when moving the cursor downward.

3.4) Segment Handling

MD86 recognizes references to the four segments of the Intel 8086 processor. It keeps four separate tables for the labels within these segments. An exception is with COM type files. For these, the data segment and the code segment are assumed to be the same. Data segment references are forced into the code segment space.

This is generally correct, but it is possible that the program creates a separate data segment. In this case the labels generated by MD86 will be put into the code segment when they belong in the data segment. Not much can be done about this until after a source file has been produced.
Use an editor to fix these things.

3.5) Known Compatibility Problems With MASM

MD86 was designed to produce source files to be used with the MASM assembler from Microsoft. This is the most common assembler and, as you probably know, it is not the easiest assembler to use.

MASM tries to guard you against yourself. When variables are defined as bytes, MASM checks to be sure they are referenced as such. At times this is handy, but mostly it is an annoyance.

When disassembling a program, it is often difficult to determine how labels are referenced. At the very least it would consume lots of time that would be better spent on other aspects of the code. To prevent MASM from generating numerous errors due to seemingly inconsistent references, MD86 inserts WORD PTR or BYTE PTR to force MASM into accepting these references. This results in the overuse of these override phrases (and a larger than necessary source file).

When MD86 notices a reference to an item within the data or code segments that is outside the limits of the actual program file, it inserts an EQU statement to equate the label with its value. MD86 is thus assuming that the reference is to a constant value and not a variable address. Under some assemblers there is no difference between a constant and an address (or offset). But MASM does make the distinction and flags an inconsistent reference as an error. The following error message is typical of this occurrence.

    CMP.ASM(176) : error 56: No immediate mode

This is MASM's way of telling you that line 176 in file CMP.ASM contains a reference to a variable where the label has been defined as a number. The solution is to modify the source code file and change the definition of this label from an EQU into a DB or DW. Thus the line

    HELPTXT EQU 00050H

should be changed into the following lines.
ORG 00050H HELPTXT: DB 0 You won't need the ORG statement in front of every definition line as long as the data type is consistent. EXE programs pose other problems that must be dealt with. MD86 does not resolve FAR jumps and calls. You will have to determine the destinationè address and equate this to a label. The absolute address is shown but MD86 cannot find out where this ends up. 1 page 16 Masterful Disassembler - Intel 8086 version 1.00 page 16 Returns from FAR procedures are flaged by MD86, but the instruction inserted is RET. MASM will determine if this is a NEAR or FAR return from the definition of the procedure. Thus, you will have to define the procedure containing the FAR return as a FAR procedure. If you don't, then MASM will assume a NEAR return is to be generated. No error message is displayed but this will not execute the same as a FAR return. Another problem with EXE type files concerns segment definition. The output file generated by MD86 should assemble and even link okay, but it probably won't execute correctly without being changed. You will need to look into adding ASSUME and GROUP statements in the various segment blocks. EXE files can be constructed in many, many ways. It will take some persistence to resolve these differences. 4.0) A Short Course In Generating Source Code The process of disassembling a program and recreating source code is more art than science. The more practice you have the easier this becomes. MD86 (as well as other programs) have been written to make this process as easy as possible. As "smart" as they are, there is a long way to go before this can be considered "automated". Prior to starting to disassemble that super new program, there are some questions you must ask your self. o Do I really need the source code for this program? o Is the program small enough for me to handle? o Was the program written in a lower level language (C or assembler)? o Do I really know how this program functions and all of what it does? 
These questions are important as the answers can give you an idea of whether you can finish the job once you start it. There is no point in "cheating" on the answers either. You only have yourself to convince.

Source code generation is a three step process; users go through three distinct phases when they disassemble a program.

The first phase is to identify the type of data the program is composed of. Programs consist of machine instructions and data. But which is which? You must follow the logic to tell. The only technical difference is that machine instructions are executed while data is referenced. It is quite possible for instructions in one part of a program to be data to another part. With the different segments of the Intel 8086 this is not very common. But it is certainly possible.

Once the program has been divided into instruction and data areas, the second phase begins. This is the process of identifying the different logical parts. This is usually the most difficult and time consuming part. It is not easy to understand what purpose a sequence of instructions has, but with persistence this can be done.

The third phase involves generating an assembler source file and getting it to re-assemble properly. Disassemblers are only "human". Their output may assemble without error but it probably won't be a byte-for-byte copy of the original file. Some "touch-up" will be required to rectify such things as long and short jumps. While you are at it, you could clean up the comments and "pretty up" the source file.

4.1) Identifying Data Types

There is a real knack to separating the code into data and instruction areas. MD86 goes a long way by marking "questionable" instructions with a question mark to the left of the label field. This mark will appear on the screen as well as in a PRN type output source file. It will be removed when a non-PRN output file is generated.
Initially MD86 sees the entire code segment as instructions and the entire data segment (EXE type programs only) as 8-bit binary data. However, most of the time this is not the case. It is very common to find character strings embedded within the code as well as normal data areas. When MD86 marks an instruction line as "questionable", examine the lines above and below to determine where the instructions end and data begins. Of course it is possible that MD86 was wrong in its judgment and the line is correct.

MD86 assumes that memory is made up of either machine instructions or data. The data may be either 8-bit binary (numbers in the range 0-255), 8-bit character data, or 16-bit address (or offset) data. As mentioned above, the Intel 8086 processor sees instructions as data that it is to execute and other data is just referenced. In the sections below we will assume that instruction areas are sequences of instructions that are executed in order and everything else is data.

There are five basic rules that can be used to determine data area types. When you identify data areas, make sure these rules have been satisfied. If not, be very suspicious.

     o rule 1  The instruction preceding a data area must be a transfer (jump, call, interrupt, or return). Conditional jumps are not allowed unless the condition is ALWAYS met.

     o rule 2  The first instruction in an instruction area must have a label unless the preceding data area was an argument to a call or interrupt instruction.

     o rule 3  An absolute transfer of control (jump or return) may be followed only by a labeled instruction or a labeled data area.

     o rule 4  For the type of data to change (from instructions to data or from ASCII data to 16-bit address data etc.), the first line of the newer type must have a label.

     o rule 5  ASCII character data (including carriage returns, line feeds, etc.) must either begin with a character count byte (or word) or it must end with a special (generally non-ASCII) byte. It is common within MSDOS applications that character strings end with a dollar sign. This is the way the console output and printer output functions know the end of a string. Assembly programmers also like to use null characters (value of zero) as an end of string mark. The Intel 8086 processor can easily detect these.

For purposes of an example, Figures IIa through IIc will be used. This is fairly typical of the kind of code you will encounter. But be forewarned, by its very nature assembly code can be very obscure. If the programmer wishes, it could be extremely difficult to decipher.

Referring to Figure IIa, note how several lines have been marked as "questionable". Here it is obvious that the lines following the jump instruction at address 1283 cannot be instructions. The PUSH instruction at address 1286 is erroneous because of rule #1. Notice how most of the bytes following address 1283 have a value in the range 20 to 7E (hex). It is quite possible that this area consists mainly of ASCII characters. But where does this area end? Rule #2 says we should look for the next valid instruction line containing a label. In this example we find this at address 12A2.

A word of caution here. Since we may not have disassembled the entire program, the label pool may be incomplete. It is then possible that at this time an instruction does not have a label. We need to be cautious in the application of rule #2.

       Figure IIa, Typical Display Of Partially Disassembled Program

 127D:BE5011       _        MOV    SI,#L1150H       ;
 1280:E822FC                CALL   L0EA5H           ;
 1283:E9DB1B                JMP    L2E61H           ;
 1286:55            L1286H  PUSH   BP               ;
 1287:6E           ?        DB     06EH             ;
 1288:6B           ?        DB     06BH             ;
 1289:6E           ?        DB     06EH             ;
 128A:6F           ?        DB     06FH             ;
 128B:776E                  JA     L12FBH           ;
 128D:207665                AND    [BP]+65H,DH      ;
 1290:7273                  JC     L1305H           ;
 1292:69           ?        DB     069H             ;
 1293:6F           ?        DB     06FH             ;
 1294:6E           ?        DB     06EH             ;
 1295:206F66                AND    [BX]+66H,CH      ;
 1298:205475                AND    [SI]+75H,DL      ;
 129B:7262                  JC     L12FFH           ;
 129D:6F           ?        DB     06FH             ;
 129E:0D0A00                OR     AX,#L000AH       ;
 12A1:FF           ?L12A1H  DB     0FFH             ;
 12A2:9C            L12A2H  CBW                     ;Convert byte (AL) to word
 12A3:2E803EA112FF          CMP    CS:[L12A1H],0FFH ;
 12A9:7404                  JZ     L12AFH           ;
 12AB:9D                    CWD                     ;Convert word (AX) to dbl w
 CS:: Labels= 492/23%, Types= 21/ 4%,   0 cmnts   No Edit   10/ 3/87 10:20:35

As a first step, we use Function key F3 to set the area from 1286 to 12A1 as characters (type #2). The code now makes more sense (see Figure IIb). But now notice address 12B4. This instruction does not have a label and yet it follows an unconditional transfer (return) instruction. Rule #3 says this is not correct. Now it could be that there should be a label here and we have just not disassembled the section of code that references it, but the instructions don't look right, do they? The hex sequences 06, 07, 08, 09, and C3, C4, C5, C6, C7, C8 would not likely be instructions (although this is obviously possible). It looks more like numbers or data. In fact the whole area from address 12AD up to 12B8 does not look like instructions at all. Most probably this is just a data area containing numerical values, and 8-bit values at that. If they were 16-bit values (or addresses), they would be way beyond the bounds of our code. So again using Function key F3, we set this area to 8-bit binary data (type #1).

Figure IIc shows what the screen looks like now. Compare this with Figure IIa and you can see the improvement. In this way areas of the program are disassembled one section at a time. Progress at first seems slow, but after a while the pieces start to fit together. As you begin to understand these small portions the remainder of the program becomes that much easier. You are well on your way to a useful source file.
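
The test applied to Figure IIa (bytes in the range 20 to 7E hex, plus carriage return and line feed, ended by a terminator byte as described in rule 5) can also be tried outside MD86. The Python sketch below is only an illustration and is not how MD86 itself works. The function name looks_like_string and the choice of terminators (a zero byte or a dollar sign) are assumptions made for this example. The bytes are the ones shown at addresses 1286 through 12A0 in Figure IIa.

```python
# Illustrative sketch only; not part of MD86.
PRINTABLE = set(range(0x20, 0x7F)) | {0x0D, 0x0A}   # 20-7E hex plus CR and LF
TERMINATORS = {0x00, ord("$")}                      # assumed end-of-string marks


def looks_like_string(data: bytes) -> bool:
    """Return True if data is printable text followed by one terminator."""
    if len(data) < 2 or data[-1] not in TERMINATORS:
        return False
    return all(b in PRINTABLE for b in data[:-1])


# Bytes at addresses 1286 through 12A0 in Figure IIa.
area = bytes.fromhex(
    "556E6B6E6F776E2076657273696F6E206F6620547572626F0D0A00"
)

if looks_like_string(area):
    # Show the text without its terminating zero byte.
    print(repr(area[:-1].decode("ascii")))
```

A check like this only suggests where a string might be. As with the rules above, the surrounding code still has to be examined before the area is marked as character data.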

       Figure IIb, Typical Display Of Partially Disassembled Program

 127D:BE5011       _        MOV    SI,#L1150H       ;
 1280:E822FC                CALL   L0EA5H           ;
 1283:E9DB1B                JMP    L2E61H           ;
 1286:556E6B6E6F77  L1286H  DB     'Unknown version ';
 1296:6F6620547572          DB     'of Turbo',CR,LF,0;
 12A1:FF            L12A1   DB     0FFH             ;
 12A2:9C            L12A2H  CBW                     ;Convert byte (AL) to word
 12A3:2E803EA112FF          CMP    CS:[L12A1H],0FFH ;
 12A9:7404                  JZ     L12ACH           ;
 12AB:9D                    CWD                     ;Convert (AX) to dbl word
 12AC:C3            L12ACH  RET                     ;
 12AD:2EC606070809  L12ADH  MOV    CS:[L0807H],#09  ;
 12B3:C3                    RET                     ;
 12B4:C4C5                  LES    AX,BP            ;
 12B6:C6C7C8                MOV    BH,C8            ;
 12B9:E8C6FC        L12B8H  CALL   L0F82H           ;
 12BE:8B4616                MOV    AX,[BP]+16H      ;
 12C1:A38A01                MOV    [L018AH],AX      ;
 12C4:8B4604                MOV    AX,[BP]+4        ;
 12C7:A38C01                MOV    [L018CH],AX      ;
 12CA:1E                    PUSH   DS               ;
 12CB:C516AA11              LDS    DX,[L11AAH]      ;Load DS:DX with 32b pointe
 12CF:B010                  MOV    AL,#10H          ;
 12D1:B425CD21              MSDOS  _SIVEC           ;Set vector.
 CS:: Labels= 492/23%, Types= 21/ 4%,   0 cmnts   No Edit   10/ 3/87 10:20:35

For EXE type programs, there is a separate data segment to worry about. While this probably does not contain instructions, it is still necessary to determine if there are any address references stored here. If there are, then they should be identified as such so they can be entered into the label pool. In some cases tables of addresses can be spotted easily. If most of the addresses are close (within a few pages) then you will see similar hexadecimal values every other byte. For example:

 1234:017F097F0F7F  L1234H  DB     1,7FH,9,7FH,0FH,7FH;
 123A:137F4F7F1080          DB     13H,7FH,4FH,7FH,10H,80H;

When these areas are changed into 16-bit address data (type #3) they appear as follows.

 1234:017F097F0F7F  L1234H  DW     7F01H,7F09H,7F0FH;
 123A:137F4F7F1080          DW     7F13H,7F4FH,8010H;

       Figure IIc, Typical Display Of Partially Disassembled Program

 127D:BE5011       _        MOV    SI,#L1150H       ;
 1280:E822FC                CALL   L0EA5H           ;
 1283:E9DB1B                JMP    L2E61H           ;
 1286:556E6B6E6F77  L1286H  DB     'Unknown version ';
 1296:6F6620547572          DB     'of Turbo',CR,LF,0;
 12A1:FF            L12A1   DB     0FFH             ;
 12A2:9C            L12A2H  CBW                     ;Convert byte (AL) to word
 12A3:2E803EA112FF          CMP    CS:[L12A1H],0FFH ;
 12A9:7404                  JZ     L12ACH           ;
 12AB:9D                    CWD                     ;Convert (AX) to dbl word
 12AC:C3            L12ACH  RET                     ;
 12AD:2EC606070809  L12ADH  DB     2EH,0C6H,6,7,8,9,0C3H;
 12B4:C4C5C6C7              DB     0C4H,0C5H,0C6H,0C7H;
 12B8:C8                    DB     0C8H             ;
 12B9:E8C6FC        L12B8H  CALL   L0F82H           ;
 12BC:8B4616                MOV    AX,[BP]+16H      ;
 12BF:A38A01                MOV    [L018AH],AX      ;
 12C2:8B4604                MOV    AX,[BP]+4        ;
 12C5:A38C01                MOV    [L018CH],AX      ;
 12C8:1E                    PUSH   DS               ;
 12C9:C516AA11              LDS    DX,[L11AAH]      ;Load DS:DX with 32b pointe
 12CD:B010                  MOV    AL,#10H          ;
 12CF:B425CD21              MSDOS  _SIVEC           ;Set vector.
 12D3:1F                    POP    DS               ;
 CS:: Labels= 492/23%, Types= 21/ 4%,   0 cmnts   No Edit   10/ 3/87 10:20:35

The contents of these areas are then added to the address label pool. When disassembled, these areas will have a label to let you know that they are referenced somewhere.

Notice how the first address of this table has a reference. Rule 4 indicates that this is required. However this is not strictly true. It is possible that the beginning of this area is implied by the end of the previous structure. One common approach is to have a sequence of flag bytes that is followed by a corresponding address table. Because the program "knows" how long the leading byte table is, it then knows the start of the address table.

MD86 assumes that any address references present in the data segment refer to offsets within the code segment. While this is generally true, at times it is incorrect. If it is known to be incorrect (by examination of the code that refers to the table of addresses), then a choice has to be made.
Either these addresses must not be defined as 16-bit addresses (change them to 8-bit binary data), or the erroneous references to the code segment must be tolerated. It is suggested that these be changed into 8-bit binary data. You could then add label names to these references within the data segment to keep this correct.

4.2) Understanding the Code

This is the part you have been waiting for. The real guts of the job! You have now separated all data from instructions but what do the instructions mean?

The Intel 8086 executes instructions in a logical order; the order chosen by the programmer. To truly understand the function of the instructions you must know how they are executed. For example, just knowing that the instruction

 123A:2C07                  SUB    AL,7

will subtract 7 from the contents of register AL is not very helpful. However, if the surrounding instructions were

 1234:8A07                  MOV    AL,[BX]          ;
 1236:3C3A                  CMP    AL,':'           ;
 1238:7202                  JC     L123CH           ;
 123A:2C07                  SUB    AL,7             ;
 123C:2C30          L123CH  SUB    AL,'0'           ;
 123E:8807                  MOV    [BX],AL          ;

you then have the feeling that register BX is pointing to one or more bytes. If such a byte is greater than the digit 9 (the character ":" is just past the digit "9" in the ASCII character set) then 7 is subtracted. Looking 7 past the "9" digit in the table of ASCII characters you find the letter "A". Then in either case the value of the digit "0" is subtracted. In other words, if register BX were pointing to an "8", then this would be replaced with the binary value of 8. If, however, BX points to the letter "C", it will be replaced with the value 12. So this is just converting a hexadecimal digit or digits from ASCII to binary. Well of course! We "know" this program asks for hexadecimal values and has to interpret them because in this case we are looking at a DEBUG.

Because the processor executes instructions in a certain order, we must examine them in that order. This might seem obvious (and in the above example it is) but in many cases it is not easy to determine the way in which instructions are executed. Consider the following code.

 1234:E83033                CALL   L4567H           ;
 1237:0130                  ADD    [BX+SI],SI       ;
 1239:337200                XOR    SI,[BP+SI]+0     ;

The ADD instruction following the CALL is not actually executed at all. By looking at the routine at address 4567 we find that the byte following the initial CALL is just a parameter. This byte gets used and the return will be to the following address (1238). We would not have been able to tell this if we hadn't looked at the instructions in the same order the processor does.

When you pick apart even a small section of code you should enter a few comments and add a label name if you can. Then you won't have to reinvent the wheel the next time you look at this code (and you will look at it more than once!).

This process is going to be very laborious. It takes many instructions in assembly language to accomplish seemingly trivial functions. For example, the simple BASIC statement "LET A(1,2)=B+C^2" may take thousands of instructions and involve many subroutines. But all is not lost. Because you know how the program executes (at least in a gross sense), you will be able to tackle small portions of it at a time.

Any information you can get your hands on will help. User manuals, especially reference manuals, are a valuable source of information. Some go so far as to include memory maps and descriptions of internal data types. Take TURBO Pascal for example; the manual is a real gold mine!

A bottom-up approach has proven to be the most useful when disassembling a program. Start from the lowest level. Look for the operating system interface. The reason is that these calls are well defined and have a specific calling sequence. MD86 recognizes many of the MSDOS system calls and uses more meaningful representations.
For example the instructions

 1234:B409                  MOV    AH,9             ;
 1236:CD21                  INT    21H              ;

are replaced with a single macro instruction

 1234:B409CD21              MSDOS  _OUTSTR          ;Display string at (DX)

In this way you can identify the lowest level routines: those that write characters to the screen or read the keyboard. How about opening and closing files and input and output from the communications ports? Generally these are short subroutines (under 100 lines) that you can comprehend. Try to find as many of these routines as possible and give each one a name that will help you to remember what it does. Also toss in as many comments as you can.

Once the lowest routines have been worked on, the next higher level becomes easier. Now you can find those routines that read and write to file buffers without worrying about all those instructions required to actually get the data out to the disk. In this way the program gradually starts to unravel and before you know it you will actually understand how the programmer was able to write it.

Executable files (those with the extension EXE) introduce a whole set of additional problems. Not the least of these is determining the actual physical address for instructions. You see, the Intel 8086 constructs the physical address at run time from a segment register and an offset. The relationship is:

     physical address = segment*16 + offset

Because each register is 16 bits long, there is the possibility of tremendous overlap. An offset of 100 hex into segment 1234 hex is the same as offset 110 hex into segment 1233 hex. To further complicate matters, the segment registers can be changed at will. Thus when an instruction is executed, the contents of the segment registers (which may have been defined who knows where) are of vital importance. The more segment registers are modified within a program, the tougher the job of disassembly is.

As an example of a typical executable program, let's look at EXE2BIN.EXE. Within the first few instructions we see the following code.

 0000:1E                    PUSH   DS
 0001:33C0                  XOR    AX,AX
 0003:50                    PUSH   AX
 0004:B430CD21              MSDOS  _GETVER
 0008:3C02                  CMP    AL,2
 000A:7D13                  JGE    L001FH
 000C:BB3900                MOV    BX,#L0039H
 000F:8EDB                  MOV    DS,BX
 0011:BA5B01                MOV    DX,#L015BH
 0014:0E                    PUSH   CS
 0015:1F                    POP    DS
 0016:B409CD21              MSDOS  _OUTSTR
 001A:06                    PUSH   ES
 001B:33C0                  XOR    AX,AX
 001D:50                    PUSH   AX
 001E:CB                    RET
 001F:BE8100                MOV    SI,#L0081H
 0022:BB3900                MOV    BX,#L0039H
 0025:8EC3                  MOV    ES,BX

Let's look at this code for a second. We see that almost the first action is to call MSDOS and find out what its version number is. If this number is greater than or equal to 2 then the code jumps to offset 001F. So the code between 000C and 001E is only executed if the version number is less than 2. Following the jump instruction, the next two instructions initialize the data segment register (DS) to 39 hex. That means that further references into the data segment will get to physical address 390 hex + offset. The next instruction loads the DX register with the value 15B hex. Now if we take a quick look at address 4EB hex (390+15B=4EB) in our code we will find the start of the ASCII message "Incorrect DOS version$". A quick note: normally these addresses (i.e. 4EB) will be relative to the start of the data segment within the EXE file and the code segment follows this immediately. Thus we have to look at 4EB - data_segment_size within our code. But for EXE2BIN.EXE, the data segment size was zero so we can look directly at address 4EB.

Now the two following instructions are very curious. By executing the PUSH CS and POP DS we will effectively reset the data segment register to the code segment register, or zero within our file. Thus the call to the MSDOS function to display an ASCII character string will try to get the characters from offset 15B instead of 4EB. This is a definite bug in EXE2BIN.EXE! The PUSH and POP instructions should not be there.
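
The address arithmetic used above can be checked with a few lines of Python. This is only a sketch for illustration; the function name physical_address is an assumption of this example and does not come from MD86 or MSDOS. The segment values are relative to the start of the loaded program, as in the text, and the 20-bit mask reflects the width of the 8086 address bus.

```python
def physical_address(segment: int, offset: int) -> int:
    """Combine a 16-bit segment and a 16-bit offset the way the 8086 does."""
    return ((segment << 4) + offset) & 0xFFFFF   # 8086 addresses are 20 bits


# DS = 39 hex and DX = 15B hex locate the start of the message text.
print(hex(physical_address(0x0039, 0x015B)))

# Different segment:offset pairs can name the same byte.
print(physical_address(0x1234, 0x0100) == physical_address(0x1233, 0x0110))
```

With DS reset to the code segment (zero within the file), the same call gives 15B hex instead of 4EB hex, which is the wrong offset described above.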
Even the best programs can contain bugs. Don't be too alarmed when you run into one.

Moving on, at addresses 22 and 25 we see that the extra segment register (ES) is being set to 39 hex just like the data segment register was set. This should give us a real strong indication that at address 390 hex (or a few bytes beyond) we will find the start of a data area within our code. This will help us later on.

One further note: when MSDOS executes an EXE type program, it initializes the data segment and extra segment registers to point to an area called the Program Segment Prefix (PSP). This area contains many useful items that the program will need. So prior to changing these registers, the program will examine this area for those items it needs. Figure III lists those items that are of most interest to us. Refer to reference 1 for a more complete discussion of this area.

           Figure III, The Program Segment Prefix Summary

 Offset | Contents
 ------ | ------------------------------------------------------------
  0002  | System memory size in paragraphs (16 byte blocks). This is a
        | 16 bit integer.
        |
  000E  | Control-C exit address. First 2 bytes are offset and second
        | 2 bytes are the segment.
        |
  0012  | Hard error exit. 2 byte offset and 2 byte segment.
        |
  005C  | Unopened file control block for first file specified after
        | command. Only valid if a path is not specified.
        |
  006C  | Unopened file control block for second file specified after
        | command. Only valid if a path is not specified.
        |
  0080  | Entire text string that follows the command. The first byte
        | is a character count. Note redirection information is not
        | passed on to the program (it is stripped first).

4.3) Polishing the Source Code

Sooner or later you will come to the point where you must abandon the disassembler. It has done its job but now an editor would be better suited to working on the files.
Once you get a source file out of MD86 then you can try assembling it. There will undoubtedly be many areas where MASM will complain. Segments may be defined in the wrong order or some external references may not be defined at all.

Get yourself a good screen oriented editor. One with virtual memory support is vital. Assembly programs tend to be very large and it will be a real pain if you have to break one into small pieces because your editor limits the code to 64k. You are going to especially need global search and replace functions. WordStar, although rather slow, does work fine for this type of work as long as you don't use document mode.

MD86 always inserts data type pointer override instructions. These are the WORD PTR and BYTE PTR sequences you see all over the place. MASM does not require an override if the types already match. That is, if a value is referenced as a 16-bit word and it has previously been defined as this type, then an override is not required. Since MD86 does not know enough to be sure these conditions have been met, WORD PTR will be inserted. One of the first things you will want to do is to remove these phrases where they are not needed. They just clutter the code.

EXE type files pose the biggest challenge to MD86 and MASM will certainly complain about some aspect of the way the different segments are handled. MD86 rather simplemindedly inserts tables for each segment that has any labels defined at the start of the program. Although careful use of MD86 will limit the number of erroneous labels, some extra ones will exist and these tables will end up being quite long.

When MD86 encounters an instruction that references a 16-bit quantity it assumes that this is an address (or more properly an offset into a segment). This address is put into the label pool. It is not possible to distinguish an address reference from a pure constant. Thus you will see many labels in the segment tables (mainly the data segment) with values like 0, 1, 2, 7, etc. Now these may be valid addresses, but most likely they are just constants. A worthwhile exercise is to eliminate as many of these as possible. Change the address reference into a constant (i.e., change "MOV AX,OFFSET L03E8H" into "MOV AX,1000") so you can eliminate the "L03E8H:" definition from the data segment table.

4.4) Deciphering More Obscure Code

In the good old days when memory was expensive and processors had a limited address range, assembly programmers delighted in seeing how much they could squeeze into small spaces. This tendency has lessened somewhat with the newer processors and cheap memory but you will still find some real funny looking code. Consider the following which was found at the start of a disk input and output routine.

 1234:F9                    STC
 1235:73F8                  JNC    L122FH
 1237:B80100                MOV    AX,1
 123A:7304                  JNC    L4567H
 123C:7204                  JC     L7654H

Wait a minute, you say. How can you have a set carry instruction (STC) immediately followed by a jump on no carry (JNC)? There must be something wrong. No one writes code like that! Actually this code is correct. Since the jump on no carry is never taken, the displacement byte is always skipped if the instructions are executed in the order shown. However, the programmer sometimes jumps directly to address 1236 which is in the middle of the jump instruction. In this case, the displacement byte is executed and this becomes a clear carry instruction (the F8 byte). What happens is that the routine has two functions that are very similar (like keyboard input with and without echo) and the state of the carry flag is used to determine which function is desired. A jump to address 1234 does one thing and a jump to 1236 does the other. Very sneaky!

Or how about this piece of code.

 1234:40                    INC    AX
 1235:40                    INC    AX
 1236:40                    INC    AX
 1237:40                    INC    AX
 1238:40                    INC    AX
 1239:E82B33                CALL   L4567H

Surely it doesn't make sense to have that many increment instructions in a row. Or does it? Actually this is part of an error handling routine. The idea is to load the AX register with an error number and call the routine at 4567 to print out a message based on the error number. To display error number 1, the programmer writes the code

 2345:31C0                  XOR    AX,AX
 2347:E8EEEE                CALL   L1238H

To display error message number 4, the call goes to address 1235 instead. For this particular procedure, the AX register always contains a zero (it is used as an error flag) and so the XOR AX,AX instruction can be eliminated. Then this requires only a three byte call instruction to flag an error condition (instead of the usual five bytes). Some programmers go to great lengths to save a few bytes of code!

5.0) Examples

A couple of example disassembly files have been included on the distribution disk. These give you an idea of how a typical (if there is such a thing) disassembly proceeds.

The first example is the complete disassembly of a disk file comparison utility program called CMP.COM. This is a short (1/2k) program that took about an hour or so to disassemble. Using MD86 to examine the progress, you will note that all labels have been given names that more or less make sense. In addition, numerous comments have been entered. You can write a source file to the disk and try to assemble it or print it. If you use MASM to assemble this file, you will run into error 56 (No immediate mode) a few times. Refer to section 3.5 as to why this happens and how to correct it.

The second example was included to show the results of a basic disassembly. Here the program EXE2BIN.EXE has been disassembled but only the first step has been completed. Only the data areas have been separated.
Note that this EXE program does not have a separate data segment. When MD86 reaches a statement like "MOV DX,L0582H" it notes that offset 582 hex within the data segment has been referenced. Since there isn't any data segment in the file, an equate statement is inserted when a source file is generated. But note that the code internally sets the data segment address to be within the code segment. Thus the reference to DS:582 is really somewhere within the code segment. MD86 does not know this and the corresponding address within the code segment does not appear to be referenced. This is all too typical of EXE programs. They are a real bear to disassemble.

6.0) MD86 Limitations

MD86 is designed to provide as much functionality as is reasonably possible without requiring any special equipment. There are some restrictions imposed on the user although the disassembly of a normal file should not be hampered. These are:

     o 2048 Address References.
     o 512 Entries in the Data Type Tables.
     o 2048 Comment Strings.
     o 64k Maximum Data Segment Size.
     o 64k Maximum Code Segment Size.

These parameters have been chosen such that a program up to 30k can be disassembled as a single file. A 30k program would result in a 15,000 line assembly file. A larger file should be broken up before it is disassembled. This can be very difficult for EXE type programs. Even if MD86 could process larger files, MASM has its own restrictions which would require smaller sections.

7.0) MD86 Error Messages

During the process of initializing its internal tables, MD86 may display one of a few error messages. At other times, MD86 just beeps to indicate that some process could not be completed properly. After the beep, MD86 just waits for a "correct" response or for the error to be corrected. The error messages that may appear are:

     o Help, file filename does not exist.

       MD86 tried to locate the file with the name "filename" and it did not exist.
       You are requested to enter another filename. A leading path may be included if the file is under another directory or on another drive.

     o Help, error reading the auto comment file. Cannot continue.

       While reading the file MD86.CMT (which did exist), MSDOS or TURBO Pascal returned an error code. Try again. If this error persists, then re-copy this file from the master distribution disk. If this does not help, then send a copy of MD86.CMT and MD86.COM to CC Software for analysis. You will be contacted with the solution as soon as possible.

     o Help, auto comment file MD86.CMT cannot be found on the current directory. Automatic comment generation is disabled.

       The file MD86.CMT, which contains the automatic comment strings, could not be located. MD86 looks only at the current drive and directory. It does not use the PATH variable. Under most circumstances, you should quit (F10) without saving the data. Then copy this file from the master distribution disk over to the correct directory. Re-execute MD86 and this error message should not appear. In the event that you do not need these comments, you may continue. MD86 will just ignore attempts to insert the comment strings.

     o Help, data file filename.001 does not exist!
       Help, Data file filename.002 does not exist!

       One or both of the required data files could not be found. Both of these files are used by MD86 for label and comment storage. When disassembling file "filename", MD86 creates files "filename.001" and "filename.002" to store related parameters. These files are created under the same directory where "filename.com" or "filename.exe" was located. More than likely you are trying to disassemble another copy of this program under a different directory. Another possibility is that the filename was not in a correct form (i.e. C>MD86 MYFILE..COM). If neither of these situations is the cause, then contact CC Software for additional help.
   o Help, one or more data files cannot be read properly.

One of the data files (either filename.001 or filename.002) was not in the correct form for MD86 to read. This can occur when an I/O error or run-time error aborts the writing of these files. One of these files could also just have a bad block. If there has not been a lot of work already invested in these files, the safest procedure is to erase them and start over. Otherwise, use DEBUG to read these files into memory to check that they are at least readable. If they are, then send copies of these files to CC Software for analysis.

7.1) Error Beep While Editing a Field

If you hear a beep while you are editing a field, MD86 is saying that the last command could not be completed or that it was an illegal command. The status line at the bottom of the display will show either INSERT or REPLACE if you are editing. The following are the sources of editing errors.

   o Trying to move the cursor beyond the edges of the field with the left or right arrow keys.
   o Entering a non-editing command (a function key or an up or down arrow).
   o Trying to update a field that contains an illegal character. In particular, labels must begin with an alphabetic character and embedded spaces are not allowed.

If you are editing, the ESCAPE key can always be used to cancel and restore the field to its original content.

Often a key is pressed by mistake which causes edit mode to be entered but is itself illegal. An example is pressing the backspace key (<X) at the first column of a field. MD86 enters edit mode but rejects the key because there is nothing to delete. However, MD86 remains in edit mode. Press the ESCAPE key to cancel.

7.2) Error Beep While Not Editing a Field

If the last line of the display contains "No Edit", then you are not editing a field. In this case MD86 beeps when an illegal command is entered.
This could be an unbound function key, one of the key pad keys, or the ESCAPE key. MD86 ignores these keys. No harm is done.

References

   1) "MS-DOS Developer's Guide", John Angermeyer and Kevin Jaeger, Howard W. Sams & Co, 1986
   2) "Peter Norton's Assembly Language Book for the IBM PC", Peter Norton and John Socha, Prentice Hall Press, 1986

               C O M M A N D   K E Y   S U M M A R Y

Key           Mode  Description
------------  ----  -----------------------------------------------------------
Left-Arrow     3    Move left one space.
Right-Arrow    3    Move right one space.
Up-Arrow       2    Move up one line.
Down-Arrow     2    Move down one line.
Page-Up        2    Move up about one page.
Page-Down      2    Move down one page.
Home           1    Move to beginning of field.
Home           2    Move to beginning of label field.
End            1    Move to end of field.
End            2    Move to beginning of comment field.
Insert         3    Switch between insert and replace modes for editing.
Delete         1    Delete character under the cursor.
Backspace      1    Delete the character to the left of the cursor.
Escape         1    Cancel editing changes and return cursor to start of field.
Return/Enter   1    Make editing changes permanent.
Return/Enter   2    Move down one line (same as Down-Arrow).
F1             2    Display a one page help summary.
Shift-F1       2    Alter system parameters.
F2             2    Go to a specified address or return from a previous Goto.
Shift-F2       2    Follow current instruction (jump or call only).
F3             2    Set data type for given address range.
F4             2    Set data type for unspecified address range.
F5             2    Write source file to disk.
F6             2    Scan code segment to build label table.
F7             2    Dump program in hex and ascii.
F8             2    Set label name for specified address.
F9             2    Search for an address reference.
Shift-F9       2    Search for next reference.
F10            2    Save and/or exit.

Modes: 1 = Editing, 2 = Non-editing, 3 = Either editing or non-editing.
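
The mode column in the command key summary can be read as a simple lookup rule: a key is accepted only if its mode matches the current state or is 3 (either). The Python sketch below illustrates that reading with a few rows copied from the summary. It is only an illustration; the names KEY_TABLE and describe are hypothetical and do not come from MD86, which may handle keys quite differently internally.

```python
# Illustrative sketch only: these names are assumptions, not MD86 internals.
EDITING, NON_EDITING, EITHER = 1, 2, 3

# A few rows copied from the command key summary: (key, mode, description).
KEY_TABLE = [
    ("Left-Arrow", EITHER, "Move left one space."),
    ("Home", EDITING, "Move to beginning of field."),
    ("Home", NON_EDITING, "Move to beginning of label field."),
    ("End", EDITING, "Move to end of field."),
    ("End", NON_EDITING, "Move to beginning of comment field."),
    ("Escape", EDITING,
     "Cancel editing changes and return cursor to start of field."),
    ("Return/Enter", EDITING, "Make editing changes permanent."),
    ("Return/Enter", NON_EDITING, "Move down one line (same as Down-Arrow)."),
    ("F10", NON_EDITING, "Save and/or exit."),
]


def describe(key, editing):
    """Return the action for `key` in the current state.

    Returns None when the key is not valid in that state, which is the
    situation in which the manual says MD86 beeps.
    """
    state = EDITING if editing else NON_EDITING
    for name, mode, description in KEY_TABLE:
        if name == key and mode in (state, EITHER):
            return description
    return None


if __name__ == "__main__":
    print(describe("Home", editing=True))
    print(describe("Home", editing=False))
    print(describe("F10", editing=True))
```

Under this reading, the same key (Home, End, Return/Enter) can appear twice with different actions, and a function key pressed while editing finds no matching row, which corresponds to the beep described in section 7.1.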