Text File | 1990-05-04 | 79.8 KB | 1,755 lines |
-
-
- page 1 Masterful Disassembler - Intel 8086 version 1.00 page 1
-
-
-
- 1.0) Introduction
-
- The MD86 program is a powerful utility for examining and disassembling any
- executable program or any series of machine instructions (like a ROM
- image). MD86 is designed to run on any IBM PC, XT, or AT or compatible with
- at least 128K of RAM. Neither a graphics adapter nor a color monitor
- is required. A hard disk is desirable, but MD86 runs fine (if a trifle
- slower) on floppy-based systems.
-
- MD86 was developed with one goal in mind: to produce usable source code from
- an executable program file. By usable, we mean that the resulting assembly
- instructions should be understandable. This necessitates meaningful label
- names and comments. Normally the disassembly of a large program is a time
- consuming, laborious task. MD86 speeds this up as much as possible.
-
- MD86 produces source files that are compatible with the Microsoft assembler
- MASM version 4.00 and reasonably compatible with the IBM assembler. While
- this is not the easiest assembler to use (in fact it is downright
- difficult), it was chosen because it is more "standard" than any other
- assembler. Even though the instruction syntax is compatible, the
- organization of the segments may not be for some programs. After MD86 has
- produced a source file, an editor is often needed to make
- some minor changes before it can be assembled without error. This will be
- especially true with EXE type programs which have complex segment
- structures.
-
-
- 1.1) What MD86 Looks Like
-
- MD86's unique video display works very much like a full screen editor,
- allowing movement within the disassembled source file with single key
- ease. Most of the difficulties associated with other disassemblers are
- gone.
-
- When executed, MD86 presents the user with a full screen of information
- that looks very similar to the printed output from an assembler. Figure I
- shows a typical screen from a freshly disassembled program. This is
- actually the file COMMAND.COM for PCDOS v3.1.
-
- The bold line towards the top of the display is the active line. The
- cursor (shown as an underline "_") is at the start of the label field.
- Both the label field and the comment field may be edited.
-
- Note how the display does not seem cluttered. The label field only has an
- entry if the address is referenced. The comments shown here have been
- automatically inserted by MD86. These help you remember the less common
- instructions and MSDOS function calls.
-
-
- Figure I, Typical Display Of Freshly Disassembled Program
-
-
- 05DB:51          L05DBH   PUSH    CX              ;
- 05DC:1E                   PUSH    DS              ;
- 05DD:07                   POP     ES              ;
- 05DE:C536610A    _        LDS     SI,[L0A61H]     ;Load DS:reg with 32b pointr
- 05E2:57                   PUSH    DI              ;
- 05E3:BF5A08               MOV     DI,#L085AH      ;
- 05E6:B90B00               MOV     CX,#L000BH      ;
- 05E9:FC                   CLD                     ;Set forward direction for
- 05EA:F3A4                 REPZ    MOVSB           ;Move byt. (SI)+- to (DI)+-
- 05EC:5F                   POP     DI              ;
- 05ED:06                   PUSH    ES              ;
- 05EE:1F                   POP     DS              ;
- 05EF:59                   POP     CX              ;
- 05F0:BA4708               MOV     DX,#L0847H      ;
- 05F3:B409CD21             MSDOS   _OUTSTR         ;Display string at (DX)
- 05F7:BA9E08      L05F7H   MOV     DX,#L089EH      ;
- 05FA:E8A300               CALL    L06A0H          ;
- 05FD:F606D30AFF           TEST    [L0AD3H],0FFH   ;
- 0602:7404                 JZ      SHORT L0608H    ;
- 0604:B403                 MOV     AH,#3           ;
- 0606:EB7B                 JMP     L0683H          ;
- 0608:B8010C      L0608H   MOV     AX,#L0C01H      ;Flush buffer, read keyboard
- 060B:CD21                 MSDOS                   ;
- 060D:E88D00               CALL    L069DH          ;
- CS:: Labels= 185/ 8%, Types= 0/ 0%, 0 cmnts No Edit 10/ 2/87 1:20:35
-
-
- A note to programmers familiar with the Microsoft assembler MASM: MD86
- creates compatible source files, but the screen display has been
- simplified. In particular, the word OFFSET (as required by MASM) is
- replaced with the pound sign ("#") and all WORD PTR and BYTE PTR phrases
- have been removed. Generated labels are shown without a trailing
- colon. When a source file is generated, the source code will be
- MASM compatible.
-
- The line at the bottom contains status information. This tells you that
- the code being viewed is in the code segment, 185 address labels have
- been identified, no data types have been defined and no user entered
- comment records exist. In this case we are not yet editing any field,
- so "No Edit" is displayed. If we were, then either "INSERT" or
- "REPLACE" would show, indicating how characters are being added to the
- field. At the far right, the current date and time are shown.
-
- The comment field may extend past the right edge of the screen and the
- active line scrolls horizontally as necessary to keep the cursor within
- view.
-
- The function keys are used to control MD86. A window "pops up" in the
- upper left corner for instructions. Inadvertently entered commands may
- generally be aborted by a null response (i.e., pressing only the RETURN
- key) to one of the questions.
-
-
- 2.0) Using MD86
-
-
- To disassemble a file it must first exist in the current directory. MD86
- may be placed in any other directory as long as the PATH command includes
- that directory. The companion file, MD86.CMT, is used only to supply the
- automatic comments and is needed only when a program is disassembled for
- the first time. If this file is not found, MD86 will turn off automatic
- commenting. If this is not acceptable, then QUIT (see Section 2.3), move
- MD86.CMT to the current directory and begin again.
-
- To disassemble the program COMMAND.COM, use the following command.
-
-
- C>MD86 COMMAND.COM
-
-
- If MD86 cannot locate the associated data files, then MD86 will create them
- (you will be asked for confirmation first just in case you misspelled the
- program name). MD86 will automatically determine the extent of the program
- and put the cursor on the first address of the program. Note that COM type
- files start at 100 (hex) and EXE type files start at 0000. See reference 1
- for a discussion of the dissection of EXE type files. The Alter Parameters
- command may be used to override the choices made by MD86.
-
- MD86 creates two data files when it disassembles a program. These have the
- same name with the extensions of .001 and .002 (i.e., COMMAND.001 and
- COMMAND.002). The first file contains the symbol table and other parameters
- and the second file contains the comment records. If neither of these files
- is present, then MD86 assumes this is a disassembly of a file for the
- first time. If they are both present (and readable) then MD86 will pick up
- right where you left off. If only one of these files is present or one is
- unreadable, then MD86 issues an error message and terminates. Refer to
- Section 7 for a discussion of error causes and cures.
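-
The start-up decision described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated, not MD86's actual code (MD86 is written in TURBO Pascal). The names `Startup`, `startup_action`, and `data_file_names` are hypothetical.

```python
from enum import Enum


class Startup(Enum):
    NEW = "first-time disassembly"
    RESUME = "pick up where you left off"
    ERROR = "error message and terminate"


def data_file_names(program: str) -> tuple[str, str]:
    """Map a program name to its two data file names."""
    base = program.rsplit(".", 1)[0]
    return base + ".001", base + ".002"


def startup_action(has_001: bool, has_002: bool) -> Startup:
    """Each flag is True when that file is present and readable."""
    if has_001 and has_002:
        return Startup.RESUME
    if not has_001 and not has_002:
        return Startup.NEW
    # Exactly one file is missing or unreadable.
    return Startup.ERROR
```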
-
- During the disassembly process, there are three groups of commands that
- MD86 will recognize. Commands that require additional input will cause a
- window to pop up in the upper left corner of the display. User dialogue
- occurs within this window.
-
- The three command groups consist of 1) editing commands, 2) non-editing
- commands, and 3) general commands. The editing and non-editing commands are
- mutually exclusive. When editing, the non-editing commands are not allowed
- and vice versa. General commands are always valid.
-
- MD86 will "beep" when an invalid command character is entered. Note that
- some keys generate more than one character and while the first character
- may be invalid, the others may not. So when you hear the beep, examine the
- characters around the cursor to be sure no extraneous characters were
- inserted.
-
-
- 2.1) General Command Keys
-
- The general commands can be typed at any time. They will always be
- recognized.
-
-
- o Left-Arrow
-
- This will move the cursor one position left within the current field
- (either the label field or the comment field). If the cursor is already
- at the beginning column, then a "beep" will be heard.
-
-
- o Right-Arrow
-
- This will move the cursor one position right within the current field
- (either the label field or the comment field). Note that the fields are
- always filled with blanks on the right. If the cursor is already past
- the right hand column, then a "beep" is heard.
-
-
- o Insert
-
- This will change the way editing character keys are entered. They will
- either replace existing characters or insert in front of the
- characters. In editing mode, either INSERT or REPLACE is displayed in
- the bottom status line.
-
-
- 2.2) Editing Command Keys
-
- The label and comment fields can be edited by moving the cursor to the
- desired location and just typing; similar to a word processor. This
- allows label names and/or comments to be associated with an address. The
- current line may be in the code segment or data segment (EXE type
- programs only). Once editing has begun, then only the ESCAPE key or the
- RETURN key will revert to non-editing command mode.
-
- When a temporary label field is edited, it is initially blanked out
- eliminating the "L1234H" that was present. If the cursor was not at the
- first column of the label, then leading blanks will exist there. Since
- labels must begin with a letter or underscore, this will be rejected when
- a RETURN key is pressed.
-
- Editing the automatic comments causes MD86 to first ask if the comment
- field should be blanked out or not. After this question is answered, your
- key is processed.
-
-
- o Letters, Numbers, and Symbols
-
- These characters are entered into the field. If the current mode is
- INSERT then the characters to the right are moved (the rightmost
- character is lost) to make room. Otherwise, in REPLACE mode the new
- character overwrites the current cursor character. Note that within the
- label field only the characters A-Z, a-z, 0-9, $, and _ are valid.
- Other characters cause a "beep" and are ignored. Note that a further
- restriction that labels not begin with a number is not checked until a
- RETURN key is pressed.
-
-
- o Escape
-
- This key will cancel any editing on the current field and its original
- contents will be restored. This effectively returns to non-editing mode
- without saving any changes to the current field.
-
-
- o Return
-
- Use this key to tell MD86 that the editing changes you have made are
- correct and should be remembered. Note that this is not the same as
- saving the data as it is not actually written to the data files yet. If
- the field contents are valid, then the cursor will be returned to the
- starting column of the current field and the mode is set to
- non-editing.
-
-
- o Backspace
-
- If the cursor is not at the left edge already, this will erase the
- character immediately to the left of the cursor. The remainder of the
- line will be shifted left and the rightmost column will be blank
- filled. If at the left, then this just "beeps".
-
-
- o Delete
-
- This will erase the character immediately under the cursor and cause
- the remainder of the line to be shifted left. The rightmost column will
- be blank filled.
-
-
- o End
-
- This moves the cursor to the last column of the field.
-
-
- o Home
-
- This moves the cursor to the leftmost column of the field.
-
-
- 2.3) Non-editing Command Keys
-
- A good portion of the time spent disassembling a program goes to roaming
- around various areas and other non-editing type functions. The simpler
- cursor movement functions use a single key stroke for this work while the
- more involved commands use the function keys and pop up windows.
-
- The cursor movement keys are as follows.
-
-
- o Down-Arrow
-
- This moves the cursor down one line and to the beginning of the same
- field. You cannot move beyond the end of the current segment.
-
-
- o Up-Arrow
-
- Move the cursor up one line to the beginning of the same field. You
- cannot move past the beginning of the current segment.
-
-
- o Home
-
- If the cursor is right of the leftmost column of the comment field,
- then it is moved to the start of the comment field. Otherwise the
- cursor moves to the first column in the label field within the same
- line.
-
-
- o End
-
- This moves the cursor to the beginning of the comment field if it was
- within the label field. Otherwise the cursor moves to the end of the
- comment field.
-
-
- o Page-Up
-
- This moves the cursor up approximately a full page. This may be more or
- less than 24 lines as this assumes there are three bytes per line on
- the average. Note that no attempt is made to locate the beginning of an
- instruction. It is probable that the first line or so will be
- disassembled incorrectly. The cursor will be positioned at the start of
- the label field in the top line.
-
-
- o Page-Down
-
- This moves the cursor to the top of the next page. The line that is
- currently at the bottom of the screen will be at the top after this
- command. The cursor will be positioned at the start of the label field.
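-
The Page-Up estimate described above (24 lines at an assumed average of three bytes per line) amounts to stepping back a fixed 72 bytes. A minimal sketch in Python follows. The function name is hypothetical, and clamping at the start of the segment is an assumption carried over from the Up-Arrow rule.

```python
LINES_PER_PAGE = 24      # lines assumed per page
AVG_BYTES_PER_LINE = 3   # average bytes per disassembled line


def page_up_address(top: int, segment_start: int) -> int:
    """Estimate the address of the new top line after Page-Up."""
    step = LINES_PER_PAGE * AVG_BYTES_PER_LINE   # 72 bytes
    # Assumption: the cursor cannot leave the current segment.
    return max(segment_start, top - step)
```

Because the result is only a byte offset and not a known instruction boundary, the first line or so at the new position may be disassembled incorrectly, as noted under Page-Up.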
-
-
- The following commands utilize the function keys on the PC keyboard
- either alone or in combination with the shift key. Remember these only
- function in non-editing mode.
-
-
- o F1 - Help Command
-
- This displays a one screen summary of the function keys. Press any key
- to refresh the screen.
-
-
- o Shift-F1 - Alter System Parameters
-
- This command is used to make changes (if possible) to the default
- parameters. For COM files, the beginning and ending addresses can be
- modified. EXE files however, have ranges for the data and code segments
- that are defined in the header. These cannot be changed.
-
- For COM files, the following parameters can be changed.
-
-
- o The Start Address. Normally this is 100 hex for COM files and 0000
- hex for EXE files. But for special work, like ROM disassembly, this
- may be set to something else.
-
- o The End Address. MD86 sets this to the physical end of the program
- or FFFE hex if this is more than 64k. If you find that a smaller
- value is correct, then change it here. This will prevent MD86 from
- accessing garbage areas and contaminating the label table.
-
-
- The following parameters affect how MD86 displays the disassembled
- lines. These are changeable at all times.
-
-
- o Translate MSDOS Functions. MD86 normally tries to translate common
- MSDOS functions into a pseudo instruction that has more meaning when
- trying to understand the code. However, if you don't want this done,
- then it can be disabled here.
-
- o Enable Automatic Comments. When MD86 finds certain instructions, it
- tries to add a comment line explaining the instruction in more or
- less English. If you don't want to see these comments, then they can
- be eliminated.
-
-
- o F2 - Goto a Specified Address
-
- This allows a quick jump to any valid address to begin disassembly. A
- null response (only the RETURN key pressed) causes MD86 to try to
- return to the location of the last Goto command. A stack with the
- previous 16 locations is maintained. This is handy for jumping to one
- location and then returning without having to remember where you were.
- location and then returning without having to remember where you were.
-
- Valid destinations are anywhere within the data segment or the code
- segment. Note that for a COM type file, these are assumed to be the
- same.
-
-
- o Shift-F2 - Follow That Instruction
-
- If the current line contains a direct jump or call instruction, this
- command will do an automatic Goto to the destination address. This does
- not apply to inter-segment calls or jumps or any indirect calls or
- jumps.
-
- MD86 saves the current address on its internal stack so that a return
- can be made via the F2 command. With this you can conveniently examine
- a subroutine and then continue from where you left off.
-
-
- o F3 - Set Data Type
-
- When MD86 first looks at a program, it thinks that all of the data
- segment is made up of 8-bit data bytes and all of the code segment is
- machine instructions. This more than likely is not 100% correct. When
- disassembling a portion of the program, you may notice that the present
- interpretation does not make sense. Some other data type is necessary.
- MD86 can recognize one of four data types. These are instructions (type
- #0), 8-bit binary data (type #1), 8-bit ASCII characters (type #2), and
- 16-bit addresses (type #3). See Section 2.6 for more details.
-
- This command allows any range of the program to be set to a specific
- type. You will be asked for the data type and the first and last
- addresses. Addresses must be in the same segment of course.
-
- The internal type table has room for 512 entries. This is the total of
- the data and code segment types. The current total is displayed on the
- bottom status line along with a percent used figure.
-
-
- o F4 - Set Data Type for Unspecified Range
-
- Often the extent of a different data type is not known. What is known
- is the initial address and a suspected data type. This command uses the
- current line as the beginning address and will request the suspected
- data type (0 to 3). Then MD86 temporarily considers all data following
- the cursor to be this type. You would move the cursor down until you
- reach an address that is not of this type and press F4 again to fix the
- range for the specified type. In this special mode, only one data item
- per line is displayed.
-
-
- o F5 - Write Source to Disk
-
- MD86 would be of limited use if you could not generate a disk file with
- the source code. Use this command and specify the file name and MD86
- will write out the data. If an extension of PRN is used, then the file
- will have the address and binary code along with each instruction line.
- This is the way the screen appears.
-
- Before this file can be used by an assembler, some hand editing will be
- required. Segments may have to be specified differently from the way
- MD86 specifies them.
-
-
- o F6 - Scan Code Segment
-
- MD86 builds the label table as code is disassembled. At times, the
- disassembly is not correct and erroneous address references may be
- entered into the label table. This function cleans this up.
-
- When all of the code segment has been given the correct data type, then
- this function should be used to properly build the label table. It will
- remove any temporary labels and begin disassembling the entire code
- segment. When this has completed, the label table will be correct and
- erroneous references will be removed.
-
-
- o F7 - Dump Program in Hexadecimal and ASCII
-
- It is difficult to determine the location of data areas and character
- strings by just looking at a page of disassembled instructions. This
- function will begin in the data segment (for EXE programs) and then
- dump the code segment. The data is displayed 16 bytes per line in
- hexadecimal and also in ASCII (if possible). Pressing any key will halt
- the display so you can inspect the data and maybe write down addresses
- of obvious data areas. Pressing any key other than ESCAPE will
- continue the display. Press the ESCAPE key to end this segment and dump
- the next segment or return to where you were when this command was
- initiated.
-
-
- o F8 - Set Label Name
-
- If it is desired to associate a name with a particular address you can
- either move the cursor to that address (if possible) and edit the label
- field or use this command to set the name without having to move there.
- Enter any valid label name and address to be set. A null label name
- will delete a label from the tables. Invalid names cause a "beep" and
- are ignored. Valid names start with a non-digit and have no
- embedded spaces.
-
- The label table is limited to 2048 label names, which should be
- satisfactory for any reasonable size program.
-
-
- o F9 - Search for Address Reference
-
- This command will allow any address to be searched for. Use this when
- it is desired to find out how (or if) a particular area is referenced
- within the code segment. The initial address to start is requested. A
- null response causes the search to begin at the start address. You will
- see the program disassembled on the screen and it will stop when the
- specified address is referenced. During the search, press any key to
- abort.
-
- Note that the search is limited to the code segment. However, the
- particular address may be in any other segment. For example to search
- for address 1234 within the extra segment, enter ES:1234 as the search
- string.
-
-
- o Shift-F9 - Search for Next Reference
-
- Once function F9 has been used to find the first occurrence of an
- address, use this command to locate the next. As with function F9,
- press any key to abort the search.
-
-
- o F10 - Save and/or Exit
-
- Use this command to save your current data tables (often!) and exit or
- quit MD86. You are given the option to save the data or not and to exit
- to MSDOS or not.
-
- If you wish to quit without saving any of the work you have done, then
- respond No to saving the data and Yes to exiting.
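-
The Goto stack used by F2 and Shift-F2 can be pictured as a small fixed-depth stack of addresses. The Python sketch below is illustrative only. The class and method names are hypothetical, and discarding the oldest entry when the stack is full is an assumption, since the manual only states that the previous 16 locations are kept.

```python
from collections import deque


class GotoStack:
    """Remembers where each Goto (or Follow) was issued from."""

    def __init__(self, depth: int = 16):
        # Assumption: when full, the oldest location is dropped.
        self._stack = deque(maxlen=depth)

    def goto(self, current: int, target: int) -> int:
        """Save the current address and move to the target."""
        self._stack.append(current)
        return target

    def go_back(self, current: int) -> int:
        """Null response to F2: return to the last saved location."""
        return self._stack.pop() if self._stack else current
```

For example, following a CALL at 05FA to 06A0 saves 05FA, and a null response to F2 then returns there.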
-
-
- 2.4) Label Name Specification
-
- MD86 allows, even encourages, you to associate a label name with each
- referenced address. Names are far more understandable than numbers. The
- label field within the display is either blank (the address has not been
- referenced), contains a temporary label (the form is LnnnnH for address
- nnnn), or contains a user defined name. Label names can be up to eight
- characters long and may contain letters, digits, the dollar sign "$", or
- the underscore "_" characters. Labels may not begin with a digit however.
-
- Label names may contain upper and lower case letters and the case is
- maintained. However, when searching for a name, MD86 ignores differences
- in case. Thus the name "HelpMsg" is perfectly valid and will appear this
- way in the output file. You could also jump to address "HELPMSG" and
- "HelpMsg" would be found.
-
- Upper and lower case letters make reading names easier, but you don't
- have to remember the exact form to reference the name.
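-
The naming rules in this section can be expressed compactly. The following Python sketch is not taken from MD86. It assumes the first character must be a letter or underscore (as stated in Section 2.2), and the names `is_valid_label` and `LabelTable` are hypothetical.

```python
import re

# Up to eight characters: letters, digits, "$" and "_";
# the first character is a letter or underscore.
LABEL_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_$]{0,7}")


def is_valid_label(name: str) -> bool:
    return LABEL_RE.fullmatch(name) is not None


class LabelTable:
    """Keeps the case as typed, but finds names regardless of case."""

    def __init__(self):
        self._by_key = {}   # upper-cased name -> (name as typed, address)

    def set(self, name: str, address: int) -> None:
        if not is_valid_label(name):
            raise ValueError("invalid label name: " + name)
        self._by_key[name.upper()] = (name, address)

    def find(self, name: str):
        return self._by_key.get(name.upper())
```

Storing the name under an upper-cased key is one simple way to get the behavior described: "HELPMSG" finds "HelpMsg", and the output still shows the name as it was typed.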
-
-
- 2.5) Specification of Addresses
-
- When MD86 requests an address (like the destination of a Jump command),
- the address must be entered in the following form.
-
-
- Address ?{ss:}nnnn
-
-
- The braces indicate an optional qualifier. If the address is within a
- segment other than the current segment, then the segment name must be
- included. The "ss:" in the line above is the segment name and it must
- then be either "CS:", "DS:", "ES:", or "SS:". The case of the letters is
- not important, but the segment name must precede the address (or offset)
- portion.
-
- If the actual address within the segment is entered as a number then it
- must be in hexadecimal. In place of a number, a label name could be used.
- This name must be resolvable within the segment.
-
- For example, the following are valid addresses.
-
-
- Address ?100
-
- Address ?ds:HelpMsg
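-
To make the address form concrete, here is a hedged Python sketch of a parser for `{ss:}nnnn`. It is not MD86's code. The function name and the layout of the `labels` dictionary are hypothetical, and trying a label name before a hexadecimal number is an assumption (the manual does not say how a name such as "FACE" is resolved).

```python
SEGMENTS = ("CS", "DS", "ES", "SS")


def parse_address(text: str, current_segment: str, labels: dict):
    """Return (segment, offset) for an address typed by the user.

    labels maps (segment, upper-cased label name) -> offset.
    """
    text = text.strip()
    segment = current_segment
    if ":" in text:
        prefix, text = text.split(":", 1)
        segment = prefix.strip().upper()    # case is not important
        if segment not in SEGMENTS:
            raise ValueError("unknown segment: " + prefix)
    key = (segment, text.upper())
    if key in labels:                       # assumption: labels win
        return segment, labels[key]
    return segment, int(text, 16)           # numbers are hexadecimal
```

With the two examples above, `100` in the code segment gives `("CS", 0x100)`, and `ds:HelpMsg` gives the offset stored for HelpMsg in the data segment.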
-
-
- The label table is stored internally and has room for 2048 entries. This
- is generally enough to disassemble a 10,000 to 15,000 line program. For
- larger programs it is recommended that they be divided into smaller
- sections if at all possible.
-
-
- 2.6) Data Type Specification
-
- MD86 initially thinks the entire code segment contains instructions and
- the data segment (for EXE type files) contains 8-bit binary data. This is
- a good place to start but there will be other data types mixed in with
- these. Functions F3 and F4 can be used to tell MD86 to assume a different
- data type for a specified address range. The types are specified by a
- numeric code number and the ones recognized are:
-
-
- 0 - Machine instructions.
-
- 1 - 8-bit binary data.
-
- 2 - 8-bit ASCII character data.
-
- 3 - 16-bit address data.
-
-
-
- When using function F3, MD86 must be told the first address of the newly
- defined type and the last address with this type. For data types that
- occupy more than one byte (type 0 or 3), the last address must be the
- address at the end of the field not the start. Thus if address 100
- contains a single 16 bit address, then MD86 is given the first address as
- 100 and the last address as 101 (not 102 as you might think).
-
- Function F4 works a little differently. The start of the current line is
- taken as the first address when this is initiated. When this is pressed
- again, then the start of the now current line (if below the first
- address) is assumed to be just past the type being defined. In other
- words, if the address range 120 to 140 is being defined as type 2 (ASCII
- character data), then the current line should be at address 120 when F4
- is pressed the first time and then moved to address 141 when pressed the
- second time.
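-
The difference between the two commands is easy to get wrong by one byte, so the arithmetic is sketched below in Python. The function names are hypothetical and the code only restates the two rules given above.

```python
def f3_last_address(first: int, item_size: int, count: int = 1) -> int:
    """Last address to give F3: the final byte of the final item."""
    return first + item_size * count - 1


def f4_range(first_press: int, second_press: int):
    """Range fixed by F4: the second press is one past the end."""
    if second_press <= first_press:
        raise ValueError("second F4 must be below the first address")
    return first_press, second_press - 1
```

A single 16-bit address at 100 gives a last address of 101 for F3, and pressing F4 at 120 and again at 141 defines the range 120 to 140.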
-
- A code type table is maintained internally by MD86 that contains the
- beginning and ending addresses (with segment of course) and the type of
- data this address range contains. Instructions are the default type and
- (to save memory space) are not actually stored in the table. There is
- room for 512 entries which should be plenty for most normal applications.
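-
A rough model of such a table, in Python, is shown below. It is a sketch under stated assumptions and not the internal layout MD86 uses: the class name is hypothetical, overlapping ranges are resolved by letting the most recent entry win, and the data segment default of 8-bit binary data is assumed to be seeded as an ordinary entry.

```python
INSTRUCTIONS, BINARY, ASCII, ADDRESS = 0, 1, 2, 3
MAX_ENTRIES = 512


class TypeTable:
    def __init__(self):
        self.entries = []   # (segment, first, last, data type)

    def set_range(self, segment, first, last, data_type):
        if len(self.entries) >= MAX_ENTRIES:
            raise OverflowError("type table full")
        self.entries.append((segment, first, last, data_type))

    def type_at(self, segment, address):
        # Assumption: the most recently added range wins.
        for seg, first, last, data_type in reversed(self.entries):
            if seg == segment and first <= address <= last:
                return data_type
        # Instructions are the default and are never stored.
        return INSTRUCTIONS

    def percent_used(self):
        return 100 * len(self.entries) // MAX_ENTRIES
```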
-
-
- 2.7) Output Source File Format
-
- MD86 produces a standard ASCII text file as output. This should be
- suitable as input to almost any assembler and editor. Note that MD86 does
- not insert tab characters and thus the lines will contain many blanks.
- This causes the files to be quite large. The judicious insertion of tabs
- would shrink the file size significantly.
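-
One way to insert tabs after the fact is sketched below in Python. It assumes tab stops every eight columns and that the input line contains no tabs. Lines containing a quote character are left untouched, as a cautious assumption, so that blanks inside character strings are never altered. The function name `entab` is hypothetical.

```python
def entab(line: str, tab: int = 8) -> str:
    """Replace runs of blanks that reach a tab stop with a tab."""
    if "'" in line or '"' in line:
        return line                 # do not risk changing string data
    out = []
    col = 0
    pending = 0                     # blanks seen but not yet written
    for ch in line:
        if ch == " ":
            pending += 1
            col += 1
            if col % tab == 0:      # the run ends exactly on a tab stop
                out.append("\t" if pending > 1 else " ")
                pending = 0
        else:
            out.append(" " * pending)
            pending = 0
            out.append(ch)
            col += 1
    out.append(" " * pending)
    return "".join(out)
```

Expanding the tabs again (for example with Python's `str.expandtabs(8)`) gives back the original line, so the column layout is unchanged.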
-
- When MD86 disassembles a program, it remembers how addresses are
- referenced. As a convenience, the output file will have separating lines
- just in front of all subroutines. That is, those addresses that were the
- target of near call instructions.
-
-
- 3.0) The Inner Details of MD86
-
- In the next few sections we will describe in more detail how MD86
- functions. It is not necessary that you remember or understand all of this
- material, but when questions arise this will make a handy reference.
-
- Many choices made during the construction of MD86 were ones of programmer
- preference. There are several ways to tackle the many problems encountered
- in creating a disassembler. Which one is better? If the choice was not
- obvious, then personal preference would be the deciding factor.
-
- MD86 was written in TURBO Pascal and owes a lot of its speed to this fine
- compiler. Some of the limitations of this compiler necessitated tradeoffs
- in the design of MD86. In particular, to avoid the use of overlays, the
- program was limited to a 61k code segment. Not all of the "nice" options
- could be included.
-
-
- 3.1) Moving the Cursor Upward
-
- When MD86 displays a full screen of disassembled lines it remembers the
- exact starting address for each line. Thus to move upward within the
- display screen, MD86 only has to display the new current line in bright
- characters again. However, when the cursor moves off the top of the
- screen (either by an Up Arrow or Up Page command), MD86 has a difficult
- task in determining the starting address for the instruction. There are
- times when more than one legitimate starting address is located. For this
- situation, MD86 chooses the longest instruction. This may not be correct.
- The problem comes in when MD86 has moved up more than one time (you
- pressed Up Arrow more than once when at the top of the screen) and it
- finally comes to a point where it cannot find any legitimate instruction
- to disassemble. You hear a "beep" and MD86 only backs up one byte. This
- tells you that the screen is not displaying correct instructions. One or
- more of those at the top (where MD86 backed up) is not correct.
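-
The backward search can be sketched as follows in Python. This is an illustration of the behavior described, not MD86's routine. The six-byte window comes from Section 3.3. The decoder `length_at` is a hypothetical callable that returns the length of a legitimate instruction at an address, or None when the bytes there do not decode.

```python
from typing import Callable, Optional

MAX_BACK = 6   # only the previous six bytes are examined


def previous_start(top: int,
                   length_at: Callable[[int], Optional[int]],
                   segment_start: int = 0) -> tuple[int, bool]:
    """Return (new top address, True if an instruction was found)."""
    for back in range(MAX_BACK, 0, -1):      # longest candidate first
        start = top - back
        # A candidate is legitimate if it ends exactly at the old top.
        if start >= segment_start and length_at(start) == back:
            return start, True
    # Nothing fits: "beep" and back up a single byte.
    return max(segment_start, top - 1), False
```

Trying the longest candidate first mirrors the stated preference for the longest instruction, which is also why the choice is sometimes wrong.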
-
- If the screen is not correct, then how do you make it correct? The
- easiest way is to back up a full page (Up Page command) and then go
- forward a full page (Down Page command). More than likely, MD86 will
- correctly synchronize somewhere on the screen when you backed up
- (probably at a labeled instruction) and everything from there on downward
- will be correct.
-
- One point of interest will surely pop up. When a NOP instruction is
- disassembled, it is flagged as "questionable" unless the previously
- disassembled instruction was a short JMP. However, if the cursor is moved
- upward to a NOP instruction, MD86 marks this line incorrectly since the
- instruction disassembled prior to this was actually a following
- instruction.
-
- A word of caution. When MD86 incorrectly disassembles a line for whatever
- reason, erroneous address references may be added to the label pool. This
- is where the Scan command (F6 key) comes in. It erases all temporary
- labels (ones that have not been given names by you) and begins
- disassembling the whole program from the starting address. This will
- correctly build the temporary label pool.
-
-
- 3.2) Questionable Instructions
-
- When MD86 disassembles an instruction line, it will check to see whether
- or not this instruction makes sense. If it does not, a flag ("?" to the
- left of the label field) is set but it is disassembled anyway if
- possible. Valid instructions are considered "questionable" if they are
- very rarely used or are "meaningless".
-
- In the rarely used category are the instructions "LOCK", "ESC", "INT",
- "XLAT", "WAIT", "HLT", and far returns within a "COM" file. Further, any
- instruction made up of identical bytes (e.g., "ADD [BX+SI],AL", which
- is two bytes of 00) is considered questionable.
-
- Meaningless instructions are "NOP" and "MOV destination,source" where the
- source and destinations are the same. Note that a "NOP" is allowed
- following a two byte forward jump instruction.
-
- Depending on the type of program being disassembled, there may be a few
- or a lot of such "questionable" instructions that are actually supposed
- to be there. Thus this "questionable" instruction flag is just a guide to
- help you locate embedded data areas.
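-
The rules of this section can be collected into a single test, sketched here in Python. The record type `Insn`, the mnemonic "RETF" for a far return, and the function names are all hypothetical, and the rarely used list is taken literally from the text above. The NOP exception is treated as a two-byte JMP with a forward displacement.

```python
from dataclasses import dataclass
from typing import Optional

RARE = {"LOCK", "ESC", "INT", "XLAT", "WAIT", "HLT"}


@dataclass
class Insn:
    mnemonic: str
    operands: tuple = ()
    raw: bytes = b""


def is_short_forward_jump(insn: Insn) -> bool:
    # Two bytes, and a displacement byte with the sign bit clear.
    return insn.mnemonic == "JMP" and len(insn.raw) == 2 and insn.raw[1] < 0x80


def is_questionable(insn: Insn, prev: Optional[Insn] = None,
                    is_com: bool = False) -> bool:
    if insn.mnemonic in RARE:
        return True
    if insn.mnemonic == "RETF" and is_com:        # far return in a COM file
        return True
    if len(insn.raw) > 1 and len(set(insn.raw)) == 1:
        return True                               # e.g. 00 00
    if (insn.mnemonic == "MOV" and len(insn.operands) == 2
            and insn.operands[0] == insn.operands[1]):
        return True                               # MOV AX,AX
    if insn.mnemonic == "NOP":
        return not (prev is not None and is_short_forward_jump(prev))
    return False
```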
-
-
- 3.3) Instruction Prefix Bytes
-
- Like the Intel 8086 processor, MD86 recognizes certain prefix bytes. These
- are the segment override instructions and the repeat instructions. The
- bus lock instruction is considered separate as this is very rarely used
- (it is also flagged as "questionable"). When moving the cursor around, it
- is possible to miss the prefix byte. This will occur most often when moving
- upward. MD86 only looks at the previous six bytes to determine where an
- instruction starts. If the seventh byte were a prefix byte, this would be
- missed. When a jump is made (Function F2) MD86 does not check to see if
- this is in the middle of an instruction. Here too, a prefix byte could be
- missed.
-
- Missing a prefix byte could cause label addresses to be
- associated with the wrong segment. The scan option (Function F6) can be
- used to clean up this type of misinformation.
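-
To show why a missed prefix matters, the Python sketch below separates the 8086 prefix bytes from the opcode that follows. The byte values are the standard 8086 encodings. The function name is hypothetical and the code is an illustration, not MD86's decoder.

```python
SEGMENT_OVERRIDE = {0x26: "ES", 0x2E: "CS", 0x36: "SS", 0x3E: "DS"}
REPEAT = {0xF2: "REPNZ", 0xF3: "REPZ"}


def split_prefixes(code: bytes):
    """Return (segment override, repeat prefix, offset of the opcode)."""
    segment = None
    repeat = None
    i = 0
    while i < len(code):
        byte = code[i]
        if byte in SEGMENT_OVERRIDE:
            segment = SEGMENT_OVERRIDE[byte]
        elif byte in REPEAT:
            repeat = REPEAT[byte]
        else:
            break                      # first non-prefix byte
        i += 1
    return segment, repeat, i
```

The bytes 26 8A 07 decode as MOV AL,ES:[BX]. If disassembly starts one byte late, at 8A 07, the override is lost and any label created for the operand lands in the default segment instead of the extra segment.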
-
- While this is not the same as a prefix byte, MD86 checks for NOP
- instructions that are preceded by a short forward jump instruction. If
- this is not the case, then the NOP is marked as "questionable". This
- logic fails when the NOP is in the top line of the screen. There is no
- preceding instruction to check and the NOP is marked "questionable" even
- though it may be perfectly valid. When moving the cursor upward, MD86
- will incorrectly mark a NOP as questionable. The logic only works when
- moving the cursor downward.
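-
- The NOP check can be pictured roughly as follows. This is only a sketch
- under assumptions: it treats the unconditional short jump (opcode EB) as
- the "short forward jump", and it simply looks at the two bytes in front of
- the NOP. The name nop_is_questionable is invented for the example.
-
```python
NOP = 0x90
JMP_SHORT = 0xEB  # two byte unconditional jump: opcode + 8-bit displacement


def nop_is_questionable(code, pos):
    """Return True when the NOP at offset pos should be flagged.

    Assumption: only the short JMP (EBh) qualifies; the manual does not
    say exactly which jump opcodes MD86 accepts.
    """
    if code[pos] != NOP:
        raise ValueError("byte at pos is not a NOP")
    if pos < 2:
        # Nothing in front of the NOP to examine (the "top line" case).
        return True
    is_short_jump = code[pos - 2] == JMP_SHORT
    is_forward = code[pos - 1] < 0x80  # displacement is a signed byte
    return not (is_short_jump and is_forward)


print(nop_is_questionable(bytes([0xEB, 0x05, 0x90]), 2))  # forward jump
print(nop_is_questionable(bytes([0xEB, 0xFB, 0x90]), 2))  # backward jump
print(nop_is_questionable(bytes([0x90]), 0))              # nothing before it
```
-
- The last call mirrors the top-of-screen case described above: with no
- preceding bytes to look at, the NOP gets flagged whether it is valid or not.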
-
-
- 3.4) Segment Handling
-
- MD86 recognizes references to the four segments of the Intel 8086
- processor. It keeps four separate tables for the labels within these
- segments. An exception is with COM type files. For these, the data
- segment and the code segment are assumed to be the same. Data segment
- references are forced into the code segment space. This is generally
- correct, but it is possible that the program creates a separate data
- segment. In this case the labels generated by MD86 will be put into the
- code segment when they belong in the data segment. Not much can be done
- about this until after a source file has been produced. Use an editor to
- fix these things.
-
-
- 3.5) Known Compatibility Problems With MASM
-
- MD86 was designed to produce source files to be used with the MASM
- assembler from Microsoft. This is the most common assembler and as you
- probably know it is not the easiest assembler to use. MASM tries to guard
- you against yourself. When variables are defined as bytes, then MASM
- checks to be sure they are referenced as such. At times this is handy.
- But mostly it is an annoyance. When disassembling a program it is often
- difficult to determine how labels are referenced. At the very least
- it would consume lots of time which would be better spent on other
- aspects of the code. To prevent MASM from generating numerous errors due
- to seemingly inconsistent references, MD86 inserts WORD PTR or BYTE PTR
- to force MASM into accepting these references. This results in the
- overuse of these override phrases (and a larger than necessary source
- file).
-
- When MD86 notices a reference to an item within the data or code segments
- that is outside of the limits of the actual program file, it inserts an
- EQU statement to equate the label with its value. MD86 is thus assuming
- that the reference is to a constant value and not a variable address.
- Under some assemblers there is no difference between a constant and an
- address (or offset). But MASM does make the distinction and flags as an
- error an inconsistent reference. The following error message is typical
- of this occurrence.
-
-
- CMP.ASM(176) : error 56: No immediate mode
-
-
- This is MASM's way of telling you that line 176 in file CMP.ASM contains
- a reference to a variable where the label has been defined as a number.
- The solution is to modify the source code file and change the definition
- of this label from an EQU into a DB or DW. Thus the line
-
-
- HELPTXT EQU 00050H
-
-
- should be changed into the following lines.
-
-
- ORG 00050H
- HELPTXT: DB 0
-
-
- You won't need the ORG statement in front of every definition line as
- long as the data type is consistent.
-
- EXE programs pose other problems that must be dealt with. MD86 does not
- resolve FAR jumps and calls. You will have to determine the destination
- address and equate this to a label. The absolute address is shown but
- MD86 cannot find out where this ends up.
-
- Returns from FAR procedures are flagged by MD86, but the instruction
- inserted is RET. MASM will determine if this is a NEAR or FAR return from
- the definition of the procedure. Thus, you will have to define the
- procedure containing the FAR return as a FAR procedure. If you don't,
- then MASM will assume a NEAR return is to be generated. No error message
- is displayed but this will not execute the same as a FAR return.
-
- Another problem with EXE type files concerns segment definition. The
- output file generated by MD86 should assemble and even link okay, but it
- probably won't execute correctly without being changed. You will need to
- look into adding ASSUME and GROUP statements in the various segment
- blocks.
-
- EXE files can be constructed in many, many ways. It will take some
- persistence to resolve these differences.
-
-
- 4.0) A Short Course In Generating Source Code
-
- The process of disassembling a program and recreating source code is more
- art than science. The more practice you have the easier this becomes. MD86
- (as well as other programs) has been written to make this process as easy
- as possible. As "smart" as they are, there is a long way to go before this
- can be considered "automated".
-
- Prior to starting to disassemble that super new program, there are some
- questions you must ask yourself.
-
-
- o Do I really need the source code for this program?
-
- o Is the program small enough for me to handle?
-
- o Was the program written in a lower level language (C or assembler)?
-
- o Do I really know how this program functions and all of what it does?
-
-
- These questions are important as the answers can give you an idea of
- whether you can finish the job once you start it. There is no point in
- "cheating" on the answers either. You only have yourself to convince.
-
- Source code generation is a three step process.
-
- There are three distinct phases users go through when they disassemble a
- program. The first phase is to identify the type of data the program is
- composed of. Programs consist of machine instructions and data. But which
- is which? You must follow the logic to tell. The only technical difference
- is that machine instructions are executed while data is referenced. It is
- quite possible for instructions in one part of a program to be data to
- another part. With the different segments of the Intel 8086 this is not
- very common. But it is certainly possible.
-
- Once the program has been divided into instruction and data areas, the
- second phase begins. This is the process of identifying the different
- logical parts. This is usually the most difficult and time consuming part.
- It is not easy to understand what purpose a sequence of instructions has,
- but with persistence this can be done.
-
- The third phase involves generating an assembler source file and getting it
- to re-assemble properly. Disassemblers are only "human". Their output may
- assemble without error but it probably won't be a byte-for-byte copy of the
- original file. Some "touch-up" will be required to rectify such things as
- long and short jumps. While you are at it, you could clean up the comments
- and "pretty up" the source file.
-
-
- 4.1) Identifying Data Types
-
- There is a real knack to separating the code into data and instruction
- areas. MD86 goes a long way by marking "questionable" instructions with a
- question mark to the left of the label field. This mark will appear on
- the screen as well as in a PRN type output source file. It will be
- removed when a non-PRN output file is generated.
-
- Initially MD86 sees the entire code segment as instructions and the
- entire data segment (EXE type programs only) as 8-bit binary data.
- However, most of the time this is not the case. It is very common to find
- character strings embedded within the code as well as normal data areas.
- When MD86 marks an instruction line as "questionable", examine the lines
- above and below to determine where the instructions end and data begins.
- Of course it is possible that MD86 was wrong in its judgement and the line
- is correct.
-
- MD86 assumes that memory is made up of either machine instructions or
- data. The data may be either 8-bit binary (numbers in the range 0-255),
- 8-bit character data, or 16-bit address (or offset) data. As mentioned
- above, the Intel 8086 processor sees instructions as data that it is to
- execute and other data is just referenced. In the sections below we will
- assume that instruction areas are sequences of instructions that are
- executed in order and everything else is data.
-
- There are five basic rules that can be used to determine data area types.
- When you identify data areas, make sure these rules have been satisfied.
- If not, be very suspicious.
-
-
- o rule 1
-
- The instruction preceding a data area must be a transfer (jump, call,
- interrupt, or return). Conditional jumps would not be allowed unless the
- condition was ALWAYS met.
-
-
- o rule 2
-
- The first instruction in an instruction area must have a label unless
- the preceding data area was an argument to a call or interrupt
- instruction.
-
-
- o rule 3
-
- An absolute transfer of control (jump or return) may be followed only
- by a labeled instruction or a labeled data area.
-
-
- o rule 4
-
- For the type of data to change (from instructions to data or from ASCII
- data to 16-bit address data etc.), the first line of the newer type
- must have a label.
-
-
- o rule 5
-
- ASCII character data (including carriage returns, line feeds, etc.)
- must either begin with a character count byte (or word) or it must end
- with a special (generally non-ASCII) byte. It is common within
- MSDOS applications that character strings end with a dollar sign. This
- is the way the console output and printer output functions know the end
- of a string. Assembly programmers also like to use null characters
- (value of zero) as an end of string mark. The Intel 8086 processor can
- easily detect these.
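-
- Rule 5 lends itself to a quick mechanical check. The sketch below (Python,
- not part of MD86; the names find_terminated_strings, TEXT_BYTES and
- min_len are invented here) scans a block of bytes for runs of text that
- end in a dollar sign or a null. Strings that start with a character count
- are not handled. The sample bytes are the ones that follow the JMP in
- Figure IIa.
-
```python
# Bytes treated as text: printable ASCII plus tab, LF and CR, but not '$',
# which is reserved here as a terminator.
TEXT_BYTES = (set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}) - {ord("$")}
TERMINATORS = {0x00, ord("$")}


def find_terminated_strings(data, min_len=4):
    """Return (start, end, text) for each text run closed by a terminator.

    `end` is the offset of the terminator byte itself. Runs shorter than
    min_len (an arbitrary threshold) are ignored.
    """
    found = []
    start = None
    for pos, byte in enumerate(data):
        if byte in TEXT_BYTES:
            if start is None:
                start = pos
        else:
            long_enough = start is not None and pos - start >= min_len
            if byte in TERMINATORS and long_enough:
                found.append((start, pos, data[start:pos].decode("ascii")))
            start = None
    return found


# JMP (E9 DB 1B) followed by the message, a null and the 0FFH flag byte.
sample = bytes.fromhex("E9DB1B") + b"Unknown version of Turbo\r\n\x00\xff"
for start, end, text in find_terminated_strings(sample):
    print(hex(start), hex(end), repr(text))
```
-
- A hit is only a candidate. The other rules (a transfer in front, a label
- on the first byte) still have to agree before the area is changed to
- character data.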
-
-
- For purposes of an example, Figures IIa through IIc will be used. This is
- fairly typical of the kind of code you will encounter. But be forewarned,
- by its very nature assembly code can be very obscure. If the programmer
- wishes, it could be extremely difficult to decipher.
-
- Referring to Figure IIa, note how several lines have been marked as
- "questionable". Here it is obvious that the lines following the jump
- instruction at address 1283 cannot be instructions. The PUSH instruction
- at address 1286 is erroneous because of rule #1. Notice how most of the
- bytes following address 1283 have a value in the range 20 to 7E (hex). It
- is quite possible that this area consists mainly of ASCII characters. But
- where does this area end? Rule #2 says we should look for the next valid
- instruction line containing a label. In this example we find this at
- address 12A2. A word of caution here. Since we may not have disassembled
- the entire program, the label pool may be incomplete. It is then possible
- that at this time an instruction does not have a label. We need to be
- cautious in the application of rule #2.
-
-
- Figure IIa, Typical Display Of Partially Disassembled Program
-
-
- 127D:BE5011 _ MOV SI,#L1150H ;
- 1280:E822FC CALL L0EA5H ;
- 1283:E9DB1B JMP L2E61H ;
- 1286:55 L1286H PUSH BP ;
- 1287:6E ? DB 06EH ;
- 1288:6B ? DB 06BH ;
- 1289:6E ? DB 06EH ;
- 128A:6F ? DB 06FH ;
- 128B:776E JA L12FBH ;
- 128D:207665 AND [BP]+65H,DH ;
- 1290:7273 JC L1305H ;
- 1292:69 ? DB 069H ;
- 1293:6F ? DB 06FH ;
- 1294:6E ? DB 06EH ;
- 1295:206F66 AND [BX]+66H,CH ;
- 1298:205475 AND [SI]+75H,DL ;
- 129B:7262 JC L12FFH ;
- 129D:6F ? DB 06FH ;
- 129E:0D0A00 OR AX,#L000AH ;
- 12A1:FF ?L12A1H DB 0FFH ;
- 12A2:9C L12A2H PUSHF ;
- 12A3:2E803EA112FF CMP CS:[L12A1H],0FFH;
- 12A9:7401 JZ L12ACH ;
- 12AB:9D POPF ;
- CS:: Labels= 492/23%, Types= 21/ 4%, 0 cmnts No Edit 10/ 3/87 10:20:35
-
-
- As a first step, we use Function key F3 to set the area from 1286 to 12A1
- as characters (type #2). The code now makes more sense (see Figure IIb).
- But now notice address 12B4. This instruction does not have a label and
- yet it follows an unconditional transfer (return) instruction. Rule #3
- says this is not correct. Now it could be that there should be a label
- here and we have just not disassembled the section of code that
- references it, but the instructions don't look right do they? The hex
- sequences 06, 07, 08, 09, and C3, C4, C5, C6, C7, C8 would not likely be
- instructions (although obviously possible). It looks more like numbers or
- data. In fact the whole area from address 12AD up to 12B8 does not look
- like instructions at all. Most probably this is just a data area
- containing numerical values. And 8-bit values at that. If they were
- 16-bit values (or addresses), they would be way beyond the bounds of our
- code.
-
- So again using Function key F3, we set this area to 8-bit binary data
- (type #1). Figure IIc shows what the screen looks like now. Compare this
- with Figure IIa and you can see the improvement. In this way areas of the
- program are disassembled one section at a time. Progress at first seems
- slow I realize, but after a while the pieces start to fit together. As
- you begin to understand these small portions the remainder of the program
- becomes that much easier. You are well on your way to a useful source
- file.
-
-
- Figure IIb, Typical Display Of Partially Disassembled Program
-
-
- 127D:BE5011 _ MOV SI,#L1150H ;
- 1280:E822FC CALL L0EA5H ;
- 1283:E9DB1B JMP L2E61H ;
- 1286:556E6B6E6F77 L1286H DB 'Unknown version ';
- 1296:6F6620547572 DB 'of Turbo',CR,LF,0;
- 12A1:FF L12A1H DB 0FFH ;
- 12A2:9C L12A2H PUSHF ;
- 12A3:2E803EA112FF CMP CS:[L12A1H],0FFH;
- 12A9:7401 JZ L12ACH ;
- 12AB:9D POPF ;
- 12AC:C3 L12ACH RET ;
- 12AD:2EC606070809 L12ADH MOV CS:[L0807H],#09;
- 12B3:C3 RET ;
- 12B4:C4C5 LES AX,BP ;
- 12B6:C6C7C8 MOV BH,C8 ;
- 12B9:E8C6FC L12B9H CALL L0F82H ;
- 12BC:8B4616 MOV AX,[BP]+16H ;
- 12BF:A38A01 MOV [L018AH],AX ;
- 12C2:8B4604 MOV AX,[BP]+4 ;
- 12C5:A38C01 MOV [L018CH],AX ;
- 12C8:1E PUSH DS ;
- 12C9:C516AA11 LDS DX,[L11AAH] ;Load DS:DX with 32b pointe
- 12CD:B010 MOV AL,#10H ;
- 12CF:B425CD21 MSDOS _SIVEC ;Set vector.
- CS:: Labels= 492/23%, Types= 21/ 4%, 0 cmnts No Edit 10/ 3/87 10:20:35
-
-
- For EXE type programs, there is a separate data segment to worry about.
- While this probably does not contain instructions, it is still necessary
- to determine if there are any address references stored here. If there
- are, then they should be identified as such so they can be entered into
- the label pool.
-
- In some cases tables of addresses can be spotted easily. If most of the
- addresses are close (within a few pages) then you will see similar
- hexadecimal values every other byte. For example:
-
-
- 1234:017F097F0F7F L1234H DB 1,7FH,9,7FH,0FH,7FH;
- 123A:137F4F7F1080 DB 13H,7FH,4FH,7FH,10H,80H;
-
-
- When these areas are changed into 16-bit addresses (type #3) then they
- appear as follows.
-
- 1234:017F097F0F7F L1234H DW 7F01H,7F09H,7F0FH;
- 123A:137F4F7F1080 DW 7F13H,7F4FH,8010H;
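-
- The byte-to-word change is simply a little-endian regrouping: the low
- byte comes first in memory. As a small sketch (Python; the helper names
- as_words and high_bytes_cluster and the max_spread threshold are
- assumptions made for this example, not anything MD86 exposes):
-
```python
import struct


def as_words(data):
    """Regroup bytes into 16-bit little-endian words (low byte first)."""
    count = len(data) // 2
    return list(struct.unpack("<%dH" % count, data[:count * 2]))


def high_bytes_cluster(data, max_spread=2):
    """Rough test for an address table: every other byte (the high byte
    of each word) stays within a few pages of the others."""
    highs = data[1::2]
    return len(highs) > 0 and max(highs) - min(highs) <= max_spread


table = bytes.fromhex("017F097F0F7F137F4F7F1080")
print(", ".join("%04XH" % word for word in as_words(table)))
print(high_bytes_cluster(table))
```
-
- The high bytes here are all 7F or 80 hex, which is the "similar values
- every other byte" pattern described above.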
-
-
- Figure IIc, Typical Display Of Partially Disassembled Program
-
-
- 127D:BE5011 _ MOV SI,#L1150H ;
- 1280:E822FC CALL L0EA5H ;
- 1283:E9DB1B JMP L2E61H ;
- 1286:556E6B6E6F77 L1286H DB 'Unknown version ';
- 1296:6F6620547572 DB 'of Turbo',CR,LF,0;
- 12A1:FF L12A1H DB 0FFH ;
- 12A2:9C L12A2H PUSHF ;
- 12A3:2E803EA112FF CMP CS:[L12A1H],0FFH;
- 12A9:7401 JZ L12ACH ;
- 12AB:9D POPF ;
- 12AC:C3 L12ACH RET ;
- 12AD:2EC606070809 L12ADH DB 2EH,0C6H,6,7,8,9,0C3H;
- 12B4:C4C5C6C7 DB 0C4H,0C5H,0C6H,0C7H;
- 12B8:C8 DB 0C8H ;
- 12B9:E8C6FC L12B9H CALL L0F82H ;
- 12BC:8B4616 MOV AX,[BP]+16H ;
- 12BF:A38A01 MOV [L018AH],AX ;
- 12C2:8B4604 MOV AX,[BP]+4 ;
- 12C5:A38C01 MOV [L018CH],AX ;
- 12C8:1E PUSH DS ;
- 12C9:C516AA11 LDS DX,[L11AAH] ;Load DS:DX with 32b pointe
- 12CD:B010 MOV AL,#10H ;
- 12CF:B425CD21 MSDOS _SIVEC ;Set vector.
- 12D3:1F POP DS ;
- CS:: Labels= 492/23%, Types= 21/ 4%, 0 cmnts No Edit 10/ 3/87 10:20:35
-
-
- The contents of these areas are then added to the address label pool.
- When disassembled, these areas will have a label to let you know that
- they are referenced somewhere.
-
- Notice how the first address of this table has a reference. Rule 4
- indicates that this is required. However this is not strictly true. It is
- possible that the beginning of this area is implied by the end of the
- previous structure. One common approach is to have a sequence of flag
- bytes that is followed by a corresponding address table. Because the
- program "knows" how long the leading byte table is, it then knows the
- start of the address table.
-
- MD86 assumes that any address references present in the data segment
- refer to offsets within the code segment. While this is generally true,
- at times this is incorrect. If it is known to be incorrect (by
- examination of the code that refers to the table of addresses), then a
- choice has to be made. Either these addresses must not be defined as
- 16-bit addresses (change this to 8-bit binary data), or the erroneous
- references to the code segment must be tolerated. It is suggested that
- these be changed into 8-bit binary data. You could then add label names
- to these references within the data segment to keep this correct.
-
-
- 4.2) Understanding the Code
-
- This is the part you have been waiting for. The real guts of the job! You
- have now separated all data from instructions but what do the
- instructions mean?
-
- The Intel 8086 executes instructions in a logical order; the order chosen
- by the programmer. To truly understand the function of the instructions
- you must know how they are executed. For example, just knowing that the
- instruction
-
-
- 123A:2C07 SUB AL,7
-
-
- will subtract 7 from the contents of register AL is not very helpful.
- However, if the surrounding instructions were
-
-
- 1234:8A07 MOV AL,[BX] ;
- 1236:3C3A CMP AL,':' ;
- 1238:7202 JC L123CH ;
- 123A:2C07 SUB AL,7 ;
- 123C:2C30 L123CH SUB AL,'0' ;
- 123E:8807 MOV [BX],AL ;
-
-
- you then have the feeling that register BX is pointing to one or more
- bytes. And if these bytes are greater than the digit 9 (the character ":"
- is just past the digit "9" in the ASCII character set) then 7 is
- subtracted. Looking 7 past the ":" character in the table of ASCII
- characters you find the letter "A". Then in either case the value of the
- digit "0" is subtracted. In other words, if register BX were pointing to
- an "8", then this would be replaced with the binary value of 8. If,
- however, BX points to the letter "C", it will be replaced with the value
- 12. So this is just converting a hexadecimal digit or digits from ASCII
- to binary. Well of course! We "know" this program asks for hexadecimal
- values and has to interpret them because in this case we are looking at
- DEBUG.
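-
- If it helps, the same steps can be written out in a higher level
- language. This is only a restatement of the listing above in Python; the
- function name hex_digit_value is made up, and like the original code it
- does no checking for characters that are not upper case hex digits.
-
```python
def hex_digit_value(ch):
    """Mimic the six-instruction sequence for one ASCII character."""
    al = ord(ch)                 # MOV AL,[BX]
    if al >= ord(":"):           # CMP AL,':' / JC skips when AL < ':'
        al = (al - 7) & 0xFF     # SUB AL,7
    al = (al - ord("0")) & 0xFF  # SUB AL,'0'
    return al                    # MOV [BX],AL stores this value back


for digit in "8", "A", "C", "F":
    print(digit, hex_digit_value(digit))
```
-
- Working a couple of values through by hand like this is often the
- quickest way to confirm a guess about what a short sequence is for.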
-
- Because the processor executes instructions in a certain order, we must
- examine them in that order. This might seem obvious (and in the above
- example it is) but in many cases it is not easy to determine the way in
- which instructions are executed. Consider the following code.
-
-
- 1234:E83033 CALL L4567H ;
- 1237:0130 ADD [BX+SI],SI ;
- 1239:337200 XOR SI,[BP+SI]+0 ;
-
-
- The ADD instruction following the CALL is not actually executed at all.
- By looking at the routine at address 4567 we find that the byte following
- the initial CALL is just a parameter. This byte gets used and the return
- will be to the following address (1238). We would not have been able to
- tell this if we hadn't looked at the instructions in the same order the
- processor does.
-
- When you pick apart even a small section of code you should enter a few
- comments and add a label name if you can. Then you won't have to reinvent
- the wheel the next time you look at this code (and you will look at it
- more than once!).
-
- This process is going to be very laborious. It takes many instructions in
- assembly language to accomplish seemingly trivial functions. Even the
- simple BASIC statement "LET A(1,2)=B+C^2" may take thousands of
- instructions and involve many subroutines. But all is not lost. Because
- you know how the program executes (at least in a gross sense), you will
- be able to tackle small portions of it at a time.
-
- Any information you can get your hands on will help. User manuals,
- especially reference manuals, are a valuable source of information. Some
- go so far as to include memory maps and descriptions of internal data
- types. Take TURBO Pascal for example; the manual is a real gold mine!
-
- A bottom-up approach has proven to be the most useful when disassembling
- a program. Start from the lowest level. Look for the operating system
- interface. The reason is that these are well defined and have a specific
- calling sequence. MD86 recognizes many of the MSDOS system calls and uses
- more meaningful representations. For example the instructions
-
-
- 1234:B409 MOV AH,9 ;
- 1236:CD21 INT 21H ;
-
-
- are replaced with a single macro instruction
-
-
- 1234:B409CD21 MSDOS _OUTSTR ;Display string at (DX)
-
-
- In this way you can identify the lowest level routines. Those that write
- characters to the screen or read the keyboard. How about opening and
- closing files and input and output from the communications ports?
- Generally these are short subroutines (<100 lines) that you can
- comprehend. Try to find as many of these routines as possible and give
- each one a name that will help you to remember what it does. Also toss in
- as many comments as you can.
-
- Once the lowest routines have been worked on, the next higher level
- becomes easier. Now you can find those routines that read and write to
- file buffers without worrying about all those instructions required to
- actually get the data out to the disk.
-
- In this way the program gradually starts to unravel and before you know
- it you will actually understand how the programmer was able to write it.
-
- Executable files (those with the extension EXE) introduce a whole set of
- additional problems. Not the least of which is determining the actual
- physical address for instructions. You see, the Intel 8086 constructs the
- physical address at run time from a segment register and an offset. The
- relationship is:
-
-
- physical address = segment*16 + offset
-
-
- Because each register is 16 bits long, there is the possibility of
- tremendous overlap. An offset of 100 hex into segment 1234 hex is the same
- as offset 110 hex into segment 1233 hex. To further complicate matters,
- the segment registers can be changed at will. Thus when an instruction is
- executed, the contents of the segment registers (which may have been
- defined who knows where) are of vital importance. The more segment
- registers are modified within a program, the tougher the job of
- disassembly is.
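-
- The overlap is easy to check with a few lines of code. The sketch below
- is plain Python and the function name physical_address is our own; the
- mask reflects the 20 address lines of the 8086.
-
```python
def physical_address(segment, offset):
    """8086 address formation: segment*16 + offset, wrapped to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


# Two different segment:offset pairs that name the same byte.
first = physical_address(0x1234, 0x0100)
second = physical_address(0x1233, 0x0110)
print("%05X %05X" % (first, second))

# A segment value of 39 hex with offset 15B hex.
print("%05X" % physical_address(0x0039, 0x015B))
```
-
- Because 16 bytes separate one segment value from the next, any physical
- address can be written in many different segment:offset forms.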
-
- As an example of a typical executable program, let's look at EXE2BIN.EXE.
- Within the first few instructions we see the following code.
-
-
- 0000:1E PUSH DS
- 0001:33C0 XOR AX,AX
- 0003:50 PUSH AX
- 0004:B430CD21 MSDOS _GETVER
- 0008:3C02 CMP AL,2
- 000A:7D13 JGE L001FH
- 000C:BB3900 MOV BX,#L0039H
- 000F:8EDB MOV DS,BX
- 0011:BA5B01 MOV DX,#L015BH
- 0014:0E PUSH CS
- 0015:1F POP DS
- 0016:B409CD21 MSDOS _OUTSTR
- 001A:06 PUSH ES
- 001B:33C0 XOR AX,AX
- 001D:50 PUSH AX
- 001E:CB RET
- 001F:BE8100 MOV SI,#L0081H
- 0022:BB3900 MOV BX,#L0039H
- 0025:8EC3 MOV ES,BX
-
-
- Let's look at this code for a second. We see that almost the first action
- of this is to call MSDOS and find out what its version number is. If this
- number is greater than or equal to 2 then this jumps to offset 001F. So
- the code between 000C and 001E is only executed if the version number is
- less than 2. Following the jump instruction, the next two instructions
- initialize the data segment register (DS) to 39 hex. That means that
- further references into the data segment will get to physical address 390
- hex + offset. The next instruction loads the DX register with the value
- 15B hex. Now if we take a quick look at address 4EB hex (390+15B=4EB) in
- our code we will find the start of the ASCII message "Incorrect DOS
- version$". A quick note, normally these addresses (i.e. 4EB) will be
- relative to the start of the data segment within the EXE file and the
- code segment follows this immediately. Thus we have to look at 4EB -
- data_segment_size within our code. But for EXE2BIN.EXE, the data segment
- size was zero so we can look directly at address 4EB. Now the two
- following instructions are very curious. By executing the PUSH CS and POP
- DS we will effectively reset the data segment register to the code
- segment register, or zero within our file. Thus the call to the MSDOS
- function that displays an ASCII character string will try to get the
- characters from offset 15B instead of 4EB. This is a definite bug in
- EXE2BIN.EXE! The PUSH and POP instructions should not be there. Even the
- best programs can contain bugs. Don't be too alarmed when you run into
- one.
-
- Moving on, at addresses 22 and 25 we see that the extra segment register
- (ES) is being set to 39 hex just like the data segment register was set.
- This should give us a real strong indication that at address 390 hex (or
- a few bytes beyond) we will find the start of a data area within our code.
- This will help us later on.
-
- One further note, when MSDOS executes an EXE type program, it initializes
- the data segment and extra segment registers to point to an area called
- the Program Segment Prefix (PSP). This area contains many useful items
- that the program will need. So prior to changing these registers, the
- program will examine this area for those items it needs. Figure III lists
- those items that are of most interest to us. Refer to reference 1 for a
- more complete discussion of this area.
-
-
- Figure III, The Program Segment Prefix Summary
-
-
- Offset | Contents
- ------ | ------------------------------------------------------------
- 0002 | System memory size in paragraphs (16 byte blocks). This is a
- | 16 bit integer.
- |
- 000E | Control-C exit address. First 2 bytes are offset and second
- | 2 bytes are the segment.
- |
- 0012 | Hard error exit. 2 byte offset and 2 byte segment.
- |
- 005C | Unopened file control block for first file specified after
- | command. Only valid if a path is not specified.
- |
- 006C | Unopened file control block for second file specified after
- | command. Only valid if a path is not specified.
- |
- 0080 | Entire text string that follows the command. The first byte
- | is a character count. Note redirection information is not
- | passed on to the program (it is stripped first).
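-
- As a rough illustration of how these fields sit in memory, the sketch
- below pulls them out of a 256 byte PSP image in Python. The function name
- parse_psp, the dictionary keys and the sample values are all invented for
- the example; only the offsets come from Figure III.
-
```python
import struct


def parse_psp(psp):
    """Pick the Figure III fields out of a 256-byte PSP image."""
    if len(psp) < 256:
        raise ValueError("a PSP is 256 bytes long")
    (memory_paragraphs,) = struct.unpack_from("<H", psp, 0x02)
    # Far pointers are stored offset first, then segment.
    ctrl_c_offset, ctrl_c_segment = struct.unpack_from("<HH", psp, 0x0E)
    error_offset, error_segment = struct.unpack_from("<HH", psp, 0x12)
    tail_length = psp[0x80]  # character count byte
    tail = bytes(psp[0x81:0x81 + tail_length]).decode("ascii", "replace")
    return {
        "memory_paragraphs": memory_paragraphs,
        "ctrl_c": (ctrl_c_segment, ctrl_c_offset),
        "hard_error": (error_segment, error_offset),
        "fcb1": bytes(psp[0x5C:0x6C]),
        "fcb2": bytes(psp[0x6C:0x7C]),
        "command_tail": tail,
    }


# A made-up PSP image, for demonstration only.
psp = bytearray(256)
struct.pack_into("<H", psp, 0x02, 0xA000)
struct.pack_into("<HH", psp, 0x0E, 0x0123, 0x0456)
text = b" CMP.EXE CMP.COM"
psp[0x80] = len(text)
psp[0x81:0x81 + len(text)] = text

info = parse_psp(psp)
print(hex(info["memory_paragraphs"]), repr(info["command_tail"]))
```
-
- When you see a program reading offsets like 80 hex or 5C hex through DS or
- ES early on, it is a good bet that it is picking up these PSP items.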
-
-
- 4.3) Polishing the Source Code
-
- Sooner or later you will come to the point where you must abandon the
- disassembler. It has done its job but now an editor would be better
- suited to working on the files.
-
- Once you get a source file out of MD86 then you can try assembling it.
- There will undoubtedly be many areas where MASM will complain. Segments
- may be defined in the wrong order or some external references are not
- defined at all.
-
- Get yourself a good screen oriented editor. One with virtual memory
- support is vital. Assembly programs tend to be very large and it will be
- a real pain if you have to break it into small pieces because your editor
- limits the code to 64k. You are going to especially need global search
- and replace functions. WordStar, although rather slow, does work fine for
- this type of work as long as you don't use document mode.
-
- MD86 always inserts data type pointer override instructions. These are
- the WORD PTR and BYTE PTR sequences you see all over the place. MASM does
- not require an override if the types already match. That is, if a value is
- referenced as a 16-bit word and it has previously been defined as this
- type, then an override is not required. Since MD86 does not know enough
- to be sure these conditions have been met, WORD PTR will be inserted. One
- of the first things you will want to do is to remove these phrases where
- they are not needed. They just clutter the code.
-
- EXE type files pose the biggest challenge to MD86 and MASM will certainly
- complain about some aspect of the way the different segments are handled.
-
- MD86 rather simplemindedly inserts tables for each segment that has any
- labels defined at the start of the program. Although careful use of MD86
- will limit the number of erroneous labels, some extra ones will exist and
- these tables will end up being quite long.
-
- When MD86 encounters an instruction that references a 16-bit quantity it
- assumes that this is an address (or more properly an offset into a
- segment). This address is put into the label pool. It is not possible to
- distinguish an address reference from a pure constant. Thus you will see
- many labels in the segment tables (mainly the data segment) with values
- like 0, 1, 2, 7, etc. Now these may be valid addresses, but most likely
- they are just constants. A worthwhile exercise is to eliminate as many
- of these as possible. Change the address reference into a constant (i.e.,
- change "MOV AX,OFFSET L03E8H" into "MOV AX,1000") so you can
- eliminate the "L03E8H:" definition from the data segment table.
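-
- If your editor's search and replace is not up to this, a few lines of
- script will do. The following is a sketch only: the function name
- constants_for and the idea of passing in the set of values you have
- decided are constants are our own, and the pattern assumes the
- "OFFSET Lxxxxh" form shown above.
-
```python
import re

# Matches references of the form "OFFSET L03E8H" (four hex digits).
_REFERENCE = re.compile(r"OFFSET\s+L([0-9A-F]{4})H")


def constants_for(line, values):
    """Rewrite label references whose value is in `values` as decimal
    constants; every other reference is left untouched."""
    def replace(match):
        value = int(match.group(1), 16)
        return str(value) if value in values else match.group(0)
    return _REFERENCE.sub(replace, line)


print(constants_for("MOV AX,OFFSET L03E8H", {0x03E8}))
print(constants_for("MOV DX,OFFSET L015BH", {0x03E8}))
```
-
- Only the values you list are changed, so genuine address references keep
- their labels.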
-
-
- 4.4) Deciphering More Obscure Code
-
- In the good old days when memory was expensive and processors had a
- limited address range, assembly programmers delighted in seeing how much
- they could squeeze into small spaces. This tendency has lessened somewhat
- with the newer processors and cheap memory but you will still find some
- real funny looking code.
-
- Consider the following which was found at the start of a disk input and
- output routine.
-
-
- 1234:F9 STC
- 1235:73F8 JNC L122FH
- 1237:B80100 MOV AX,1
- 123A:7304 JNC L1240H
- 123C:7204 JC L1242H
-
-
- Wait a minute, you say. How can you have a set carry instruction (STC)
- immediately followed by a jump on no carry (JNC)? There must be
- something wrong. No one writes code like that! Actually this code is
- correct. Since the jump on no carry is never taken, the displacement
- byte is always skipped if the instructions are executed in the order
- shown. However, the programmer sometimes jumps directly to address 1236
- which is in the middle of the jump instruction. In this case, the
- displacement is executed and this becomes a clear carry instruction (the
- F8 byte). What happens is that the routine has two functions that are
- very similar (like keyboard input with and without echo) and the state of
- the carry flag is used to determine which function is desired. A jump to
- address 1234 does one thing and a jump to 1236 does the other. Very
- sneaky!
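-
- The two readings of the same bytes can be shown with a toy decoder. This
- sketch knows only the five opcodes needed here (it is in no way a real
- disassembler, and the names OPCODES and linear_decode are invented), but
- it makes the point that the entry address decides what the bytes mean.
-
```python
# opcode -> (mnemonic, instruction length); just enough for this example.
OPCODES = {
    0xF8: ("CLC", 1),
    0xF9: ("STC", 1),
    0x72: ("JC", 2),
    0x73: ("JNC", 2),
    0xB8: ("MOV AX,imm16", 3),
}


def linear_decode(code, base, entry):
    """Decode straight through from `entry`, ignoring jumps.

    Raises KeyError on an opcode outside the tiny table above.
    """
    pos = entry - base
    listing = []
    while pos < len(code):
        name, size = OPCODES[code[pos]]
        if pos + size > len(code):
            break
        listing.append((base + pos, name))
        pos += size
    return listing


code = bytes.fromhex("F973F8B80100")  # the bytes at 1234 through 1239
for entry in (0x1234, 0x1236):
    print([("%04X" % addr, name) for addr, name in linear_decode(code, 0x1234, entry)])
```
-
- Entered at 1234 the F8 byte is a jump displacement; entered at 1236 it is
- a CLC. Both paths meet again at the MOV instruction at 1237.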
-
- Or how about this piece of code.
-
-
- 1234:40 INC AX
- 1235:40 INC AX
- 1236:40 INC AX
- 1237:40 INC AX
- 1238:40 INC AX
- 1239:E82B33 CALL L4567H
-
-
- Surely it doesn't make sense to have that many increment instructions in
- a row. Or does it? Actually this is part of an error handling routine.
- The idea is to load the AX register with an error number and call the
- routine at 4567 to print out a message based on the error number. To
- display error number 1, the programmer writes the code
-
-
- 2345:31C0 XOR AX,AX
- 2347:E8EEEE CALL L1238H
-
-
- To display error message number 4, the call goes to address 1235
- instead. For this particular procedure, the AX register always contains a
- zero (it is used as an error flag) and so the XOR AX,AX instruction can
- be eliminated. Then this requires only a three byte call instruction to
- flag an error condition (instead of the usual five bytes). Some
- programmers go to great lengths to save a few bytes of code!
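-
- The arithmetic behind this trick is simple enough to write down. In the
- sketch below (Python; the names entry_for_error and call_bytes are made
- up, and the addresses are the ones used in the listings above) the entry
- point for error n is n bytes before the CALL at 1239, and the three byte
- near CALL is encoded relative to the address that follows it.
-
```python
LADDER_START = 0x1234  # first INC AX
CALL_ADDRESS = 0x1239  # CALL that follows the five INC AX instructions


def entry_for_error(number):
    """Address to call so that `number` INC AX instructions run."""
    if not 0 <= number <= CALL_ADDRESS - LADDER_START:
        raise ValueError("the ladder only has five steps")
    return CALL_ADDRESS - number


def call_bytes(from_address, target):
    """Encode a near CALL (E8 + 16-bit displacement, low byte first)."""
    displacement = (target - (from_address + 3)) & 0xFFFF
    return bytes([0xE8, displacement & 0xFF, displacement >> 8])


print("%04X" % entry_for_error(1))
print("%04X" % entry_for_error(4))
print(call_bytes(0x2347, entry_for_error(1)).hex().upper())
```
-
- The last line reproduces the E8 EE EE bytes shown at address 2347. When
- you meet a CALL into the middle of a run of identical one byte
- instructions, counting the bytes to the end of the run usually reveals
- the value being passed.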
-
-
- 5.0) Examples
-
- A couple of example disassembly files have been included on the
- distribution disk. These give you an idea of how a typical (if there is
- such a thing) disassembly proceeds.
-
- The first example is the complete disassembly of a disk file comparison
- utility program called CMP.COM. This is a short (1/2k) program that took
- about an hour or so to disassemble. Using MD86 to examine the progress, you
- will note that all labels have been given names that more or less make
- sense. In addition, numerous comments have been entered. You can write a
- source file to the disk and try to assemble it or print it. If you use MASM
- to assemble this file, you will run into error 56 (No immediate mode) a few
- times. Refer to section 3.5 as to why this happens and how to correct it.
-
- The second example was included to show the results of a basic disassembly.
- Here the program EXE2BIN.EXE has been disassembled but only the first step
- has been completed. Only the data areas have been separated. Note that this
- EXE program does not have a separate data segment. When MD86 reaches a
- statement like "MOV DX,L0582H" it notes that offset 582 hex within the data
- segment has been referenced. Since there isn't any data segment in the
- file, an equate statement is inserted when a source file is generated. But
- note that the code internally sets the data segment address to be within
- the code segment. Thus the reference to DS:582 is really somewhere within
- the code segment. MD86 does not know this and the corresponding address
- within the code segment does not appear to be referenced. This is all too
- typical of EXE programs. They are a real bear to disassemble.
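-
- When the program points DS into the code segment, the matching code
- segment offset can be worked out by hand. The Python sketch below shows the
- calculation under 8086 real-mode addressing (linear address = segment * 16
- + offset). It is only an illustration: the segment values in the example
- are hypothetical, not taken from EXE2BIN.EXE, and the function names are
- made up.

```python
# Sketch: translate a DS-relative offset into the equivalent CS-relative
# offset under 8086 real-mode addressing.
def linear_address(segment, offset):
    """20-bit linear address formed from a segment:offset pair."""
    return ((segment << 4) + offset) & 0xFFFFF


def offset_in_code_segment(cs, ds, ds_offset):
    """Return the CS-relative offset of DS:ds_offset, or None if out of reach."""
    delta = linear_address(ds, ds_offset) - linear_address(cs, 0)
    return delta if 0 <= delta <= 0xFFFF else None


# Hypothetical load addresses: DS set 10h paragraphs above CS.
print(hex(offset_in_code_segment(0x1000, 0x1010, 0x0582)))
```

- With these assumed values, DS:0582 is the same byte as CS:0682, so the
- label belongs at offset 0682 of the code segment.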
-
-
- 6.0) MD86 Limitations
-
- MD86 is designed to provide as much functionality as is reasonably possible
- without requiring any special equipment. There are some restrictions
- imposed on the user although the disassembly of a normal file should not be
- hampered. These are:
-
-
- o 2048 Address References.
-
- o 512 Entries in the Data Type Tables.
-
- o 2048 Comment Strings.
-
- o 64k Maximum Data Segment Size.
-
- o 64k Maximum Code Segment Size.
-
-
- These parameters have been chosen such that a program up to 30k can be
- disassembled as a single file. A 30k program would result in a 15,000 line
- assembly file. A larger file should be broken up and disassembled in pieces.
- This can be very difficult for EXE type programs. Even if MD86 could
- process larger files, MASM has its own restrictions which would require
- smaller sections.
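-
- The figures above allow a rough size estimate before starting. The Python
- sketch below is an assumption-laden illustration: it takes "30k" to mean
- 30 * 1024 bytes, scales the 15,000 line figure linearly, and ignores the
- other table limits, which may be reached first. The names are hypothetical.

```python
# Rough sizing sketch based on the figures quoted above.
SINGLE_FILE_LIMIT = 30 * 1024    # largest program handled as a single file
LINES_AT_LIMIT = 15000           # source lines quoted for a 30k program


def plan_disassembly(program_bytes):
    """Return (estimated source lines, number of pieces to split into)."""
    if program_bytes <= 0:
        raise ValueError("program size must be positive")
    estimated_lines = program_bytes * LINES_AT_LIMIT // SINGLE_FILE_LIMIT
    pieces = (program_bytes + SINGLE_FILE_LIMIT - 1) // SINGLE_FILE_LIMIT
    return estimated_lines, pieces


print(plan_disassembly(30 * 1024))
print(plan_disassembly(100 * 1024))
```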
-
-
- 7.0) MD86 Error Messages
-
- During the process of initializing its internal tables, MD86 may display
- one of a few error messages. At other times, MD86 just beeps to indicate
- that some process could not be completed properly. After the beep, MD86
- simply waits for a "correct" response or for the error to be corrected.
-
- The error messages that may appear are:
-
-
- o Help, file filename does not exist.
-
- MD86 tried to locate the file with the name "filename" and it did not
- exist. You are requested to enter another filename. A leading path may
- be included if the file is under another directory or on another drive.
-
-
- o Help, error reading the auto comment file. Cannot continue.
-
- While reading the file MD86.CMT (which did exist), MS-DOS or Turbo
- Pascal returned an error code. Try again. If this error persists, then
- re-copy this file from the master distribution disk. If this does not
- help, then send a copy of MD86.CMT and MD86.COM to CC Software for
- analysis. You will be contacted with the solution as soon as possible.
-
-
- o Help, auto comment file MD86.CMT cannot be found on the current
- directory.
- Automatic comment generation is disabled.
-
- The file MD86.CMT, which contains the automatic comment strings, could
- not be located. MD86 looks only at the current drive and directory. It
- does not use the PATH variable. Under most circumstances, you should
- quit (F10) without saving the data. Then copy this file from the master
- distribution disk over to the correct directory. Re-execute MD86 and
- this error message should not appear.
-
- In the event that you do not need these comments, you may continue.
- MD86 will just ignore attempts to insert the comment strings.
-
-
- o Help, data file filename.001 does not exist!
- Help, Data file filename.002 does not exist!
-
- One or both of the required data files could not be found. Both of
- these files are used by MD86 for label and comment storage. When
- disassembling file "filename", MD86 creates files "filename.001" and
- "filename.002" to store related parameters. These files are created
- under the same directory as "filename.com" or "filename.exe" was
- located. More than likely you are trying to disassemble another copy of
- this program under a different directory. Another possibility is that
- the filename was not in a correct form (e.g., C>MD86 MYFILE..COM). If
- neither of these situations is the cause, then contact CC Software for
- additional help.
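-
- The naming rule for the two data files can be sketched as follows. This
- Python fragment is an illustration, not MD86's own code: the function name
- is hypothetical, and the name check is a simplified 8.3 pattern that
- accepts only letters, digits, underscores and hyphens.

```python
import re
from pathlib import PureWindowsPath

# Simplified 8.3 check (assumption: a restricted character set, COM or EXE).
_PROGRAM_NAME = re.compile(r"^[A-Za-z0-9_\-]{1,8}\.(COM|EXE)$", re.IGNORECASE)


def companion_files(program_path):
    """Return the .001 and .002 paths that sit beside the program file."""
    path = PureWindowsPath(program_path)
    if not _PROGRAM_NAME.match(path.name):
        raise ValueError(f"filename not in a correct form: {path.name}")
    return [str(path.with_suffix(ext)) for ext in (".001", ".002")]


print(companion_files(r"C:\WORK\CMP.COM"))
```

- A malformed name such as MYFILE..COM is rejected by the check, and a copy
- of the program in another directory yields data file paths in that other
- directory, which is the usual reason the files appear to be missing.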
-
-
- o Help, one or more data files cannot be read properly.
-
- One of the data files (either filename.001 or filename.002) was not in
- the correct form for MD86 to read. This can occur when an I/O error or
- run-time error aborts the writing of these files. One of these files
- could also simply have a bad block. If there has not been a lot of work
- already invested in these files, the safest procedure is to erase them
- and start over. Or use DEBUG to read these files into memory to check
- that they are at least readable. If they are, then send copies of these
- files to CC Software for analysis.
-
-
- 7.1) Error Beep While Editing a Field
-
- If you hear a beep while you are editing a field, MD86 is saying that the
- last command could not be completed or it was an illegal command. The
- status line at the bottom of the display will show either INSERT or
- REPLACE if you are editing. The following are the sources of editing
- errors.
-
-
- o Trying to move the cursor beyond the edges of the field with the left
- or right arrow keys.
-
- o Entering a non-editing command (a function key or an up or down
- arrow).
-
- o Trying to update a field that contains an illegal character. In
- particular, labels are restricted to a leading alphabetic character and
- embedded spaces are not allowed.
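-
- The two label rules named in the last item can be expressed as a simple
- check. The Python sketch below covers only those two rules. MD86 and MASM
- may impose further restrictions (length, permitted characters) that are not
- modelled here, and the function name is hypothetical.

```python
import string


def is_valid_label(text):
    """Check the two label rules: leading letter and no embedded spaces."""
    if not text:
        return False
    return text[0] in string.ascii_letters and " " not in text


for candidate in ("READ_KEY", "1LOOP", "READ KEY"):
    print(candidate, is_valid_label(candidate))
```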
-
-
- If you are editing, the ESCAPE key can always be used to cancel and
- restore the field to its original content. Often a key pressed by mistake
- causes edit mode to be entered but is itself illegal; for example,
- pressing the backspace key (<X) at the first column of a field. MD86
- enters edit mode but rejects the key because there is nothing to delete.
- However, MD86 remains in edit mode. Press the ESCAPE key to cancel.
-
-
- 7.2) Error Beep While Not Editing a Field
-
- If the last line of the display contains "No Edit", then you are not
- editing a field. In this case MD86 beeps when an illegal command is
- entered. This could be an unbound function key, one of the keypad keys,
- or the ESCAPE key. MD86 ignores these keys. No harm is done.
-
-
-
-
-
-
-
-                     C O M M A N D   K E Y   S U M M A R Y
-
- Key          Mode Description
- ------------ ---- -----------------------------------------------------------
- Left-Arrow    3   Move left one space.
- Right-Arrow   3   Move right one space.
- Up-Arrow      2   Move up one line.
- Down-Arrow    2   Move down one line.
- Page-Up       2   Move up about one page.
- Page-Down     2   Move down one page.
- Home          1   Move to beginning of field.
- Home          2   Move to beginning of label field.
- End           1   Move to end of field.
- End           2   Move to beginning of comment field.
-
- Insert        3   Switch between insert and replace modes for editing.
- Delete        1   Delete character under the cursor.
- Backspace     1   Delete the character to the left of the cursor.
- Escape        1   Cancel editing changes and return cursor to start of field.
- Return/Enter  1   Make editing changes permanent.
- Return/Enter  2   Move down one line (same as Down-Arrow).
-
- F1            2   Display a one page help summary.
- Shift-F1      2   Alter system parameters.
- F2            2   Goto a specified address or return from previous Goto.
- Shift-F2      2   Follow current instruction (jump or call only).
- F3            2   Set data type for given address range.
- F4            2   Set data type for unspecified address range.
- F5            2   Write source file to disk.
- F6            2   Scan code segment to build label table.
- F7            2   Dump program in hex and ascii.
- F8            2   Set label name for specified address.
- F9            2   Search for an address reference.
- Shift-F9      2   Search for next reference.
- F10           2   Save and/or exit.
-
-
-
- Modes: 1 = Editing, 2 = Non-editing, 3 = Either editing or non-editing.