MD86.DOC | Text File | 1990-05-04 | 82KB | 1,755 lines
page 1 Masterful Disassembler - Intel 8086 version 1.00 page 1
1.0) Introduction
The MD86 program is a powerful utility for examining and disassembling any
executable program or any series of machine instructions (like a ROM
image). MD86 is designed to run on any IBM PC, XT, or AT or compatible with
at least 128K of RAM. Neither a graphics adaptor nor a color monitor is
required. A hard disk is desirable, but MD86 runs fine (if a trifle slower)
on floppy-based systems.
MD86 was developed with one goal in mind: to produce usable source code from
an executable program file. By usable, we mean that the resulting assembly
instructions should be understandable. This necessitates meaningful label
names and comments. Normally the disassembly of a large program is a
time-consuming, laborious task. MD86 speeds this up as much as possible.
MD86 produces source files that are compatible with the Microsoft assembler
MASM version 4.00 and reasonably compatible with the IBM assembler. While
this is not the easiest assembler to use (in fact it is downright
difficult), it was chosen because it is more "standard" than any other
assembler. Even though the instruction syntax is compatible, the
organization of the segments may not be for some programs. After MD86 has
produced a source file, it is not uncommon that an editor is needed to make
some minor changes before it can be assembled without error. This is
especially true of EXE type programs, which have complex segment
structures.
1.1) What MD86 Looks Like
MD86's unique video display works very much like a full screen editor,
allowing movement within the disassembled source file with single key
ease. Most of the difficulties associated with other disassemblers are
gone.
When executed, MD86 presents the user with a full screen of information
that looks very similar to the printed output from an assembler. Figure I
shows a typical screen from a freshly disassembled program. This is
actually the file COMMAND.COM for PCDOS v3.1.
The bold line towards the top of the display is the active line. The
cursor (shown as an underline "_") is at the start of the label field.
Both the label field and the comment field may be edited.
Note how the display does not seem cluttered. The label field only has an
entry if the address is referenced. The comments shown here have been
automatically inserted by MD86. These help you remember the less common
instructions and MSDOS function calls.
Figure I, Typical Display Of Freshly Disassembled Program
05DB:51          L05DBH  PUSH  CX                ;
05DC:1E                  PUSH  DS                ;
05DD:07                  POP   ES                ;
05DE:C536610A    _       LDS   SI,[L0A61H]       ;Load DS:reg with 32b pointr
05E2:57                  PUSH  DI                ;
05E3:BF5A08              MOV   DI,#L085AH        ;
05E6:B90B00              MOV   CX,#L000BH        ;
05E9:FC                  CLD                     ;Set forward direction for
05EA:F3A4                REPZ  MOVSB             ;Move byt. (SI)+- to (DX)+-
05EC:5F                  POP   DI                ;
05ED:06                  PUSH  ES                ;
05EE:1F                  POP   DS                ;
05EF:59                  POP   CX                ;
05F0:BA4708              MOV   DX,#L0847H        ;
05F3:B409CD21            MSDOS _OUTSTR           ;Display string at (DX)
05F7:BA9E08      L05F7H  MOV   DX,#L089EH        ;
05FA:E8A300              CALL  L06A0H            ;
05FD:F606D30AFF          TEST  [L0AD3H],0FFH     ;
0602:7404                JZ    SHORT L0608H      ;
0604:B403                MOV   AH,#3             ;
0606:EB7B                JMP   L0683H            ;
0608:B8010C      L0608H  MOV   AX,#L0C01H        ;Flush buffer, read keyboard
060B:CD21                MSDOS                   ;
060D:E88D00              CALL  L069DH            ;
CS:: Labels= 185/ 8%, Types= 0/ 0%, 0 cmnts No Edit 10/ 2/87 1:20:35
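The temporary labels on CALL and JMP lines in Figure I follow from the address and instruction bytes shown at the left of each line. The short Python sketch below reproduces that arithmetic for relative calls and jumps. It is only an illustration and is not taken from MD86 itself; the function name relative_target is hypothetical.

```python
def relative_target(address, code_bytes):
    """Return the target offset of a relative CALL or jump.

    address    -- offset of the first byte of the instruction
    code_bytes -- the instruction bytes as shown in the listing
    """
    opcode = code_bytes[0]
    next_ip = address + len(code_bytes)   # displacement is relative to this
    if opcode in (0xE8, 0xE9):            # CALL / JMP with 16-bit displacement
        disp = int.from_bytes(code_bytes[1:3], "little", signed=True)
    elif opcode == 0xEB or 0x70 <= opcode <= 0x7F:   # short JMP / Jcc
        disp = int.from_bytes(code_bytes[1:2], "little", signed=True)
    else:
        raise ValueError("not a relative jump or call")
    return (next_ip + disp) & 0xFFFF      # offsets wrap within the segment


# Two lines from Figure I
print(hex(relative_target(0x05FA, bytes.fromhex("E8A300"))))  # 0x6a0
print(hex(relative_target(0x0606, bytes.fromhex("EB7B"))))    # 0x683
```

The results correspond to the labels L06A0H and L0683H shown on those two lines.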
A note to programmers familiar with the Microsoft assembler MASM: MD86
creates compatible data files, but the screen display has been
simplified. In particular, the word OFFSET (as required by MASM) is
replaced with the pound sign ("#") and all WORD PTR and BYTE PTR phrases
have been removed. Generated labels are shown without a trailing
colon. When a source file is generated, the source code will be
compatible.
The line at the bottom contains status information. This tells you that
the code being viewed is in the code segment, 185 address labels have
been identified, no data types have been defined and no user entered
comment records exist. In this case we are not yet editing any field,
so "No Edit" is displayed. If we were, then either "INSERT" or
"REPLACE" would show, indicating how characters are being added to the
field. At the far right, the current date and time are shown.
The comment field may extend past the right edge of the screen and the
active line scrolls horizontally as necessary to keep the cursor within
view.
The function keys are used to control MD86. A window "pops up" in the
upper left corner for instructions. Inadvertently entered commands may
generally be aborted by a null response (i.e., pressing only the RETURN
key) to one of the questions.
2.0) Using MD86
To be disassembled, a file must first exist in the current directory. MD86
itself may be placed in any other directory as long as the PATH command
includes that directory. The companion file, MD86.CMT, is only used to
supply the automatic comments and is only needed when a program is
disassembled for the first time. If it is not found, MD86 will turn off
automatic commenting. If this is not acceptable, then QUIT (see Section
2.3), move MD86.CMT to the current directory and begin again.
To disassemble the program COMMAND.COM, use the following command.
C>MD86 COMMAND.COM
If MD86 cannot locate the associated data files, then MD86 will create them
(you will be asked for confirmation first just in case you misspelled the
program name). MD86 will automatically determine the extent of the program
and put the cursor on the first address of the program. Note that COM type
files start at 100 (hex) and EXE type files start at 0000. See reference 1
for a discussion of the dissection of EXE type files. The Alter Parameters
command may be used to override the choices made by MD86.
MD86 creates two data files when it disassembles a program. These have the
same name as the program with the extensions .001 and .002 (i.e.,
COMMAND.001 and COMMAND.002). The first file contains the symbol table and
other parameters and the second file contains the comment records. If
neither of these files is present, then MD86 assumes this is a disassembly
of a file for the first time. If they are both present (and readable) then
MD86 will pick up right where you left off. If only one of these files is
present or one is unreadable, then MD86 issues an error message and
terminates. Refer to Section 7 for a discussion of error causes and cures.
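The three start-up cases just described can be summarized in a short sketch. This is an illustration in Python, not MD86's own code (MD86 is written in TURBO Pascal); the function name session_state and the strings it returns are hypothetical, and the sketch only tests whether the files exist, not whether they are readable.

```python
import os


def session_state(program_name, directory="."):
    """Classify the start-up situation from the two MD86 data files."""
    base, _ext = os.path.splitext(program_name)
    have_symbols = os.path.exists(os.path.join(directory, base + ".001"))
    have_comments = os.path.exists(os.path.join(directory, base + ".002"))

    if have_symbols and have_comments:
        return "resume"        # pick up where the user left off
    if not have_symbols and not have_comments:
        return "first-time"    # create both files after confirmation
    return "error"             # only one file present: report and terminate
```

For COMMAND.COM this looks for COMMAND.001 and COMMAND.002 in the given directory.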
During the disassembly process, there are three groups of commands that
MD86 will recognize. Commands that require additional input will cause a
window to pop up in the upper left corner of the display. User dialogue
occurs within this window.
The three command groups consist of 1) editing commands, 2) non-editing
commands, and 3) general commands. The editing and non-editing commands are
mutually exclusive. When editing, the non-editing commands are not allowed
and vice versa. General commands are always valid.
MD86 will "beep" when an invalid command character is entered. Note that
some keys generate more than one character and while the first character
may be invalid, the others may not. So when you hear the beep, examine the
characters around the cursor to be sure no extraneous characters were
inserted.
2.1) General Command Keys
The general commands can be typed at any time. They will always be
recognized.
o Left-Arrow
This will move the cursor one position left within the current field
(either the label field or the comment field). If the cursor is already
at the beginning column, then a "beep" will be heard.
o Right-Arrow
This will move the cursor one position right within the current field
(either the label field or the comment field). Note that the fields are
always filled with blanks on the right. If the cursor is already past
the right hand column, then a "beep" is heard.
o Insert
This will change the way editing character keys are entered. They will
either replace existing characters or insert in front of the
characters. In editing mode, either INSERT or REPLACE is displayed in
the bottom status line.
2.2) Editing Command Keys
The label and comment fields can be edited by moving the cursor to the
desired location and just typing, similar to a word processor. This
allows label names and/or comments to be associated with an address. The
current line may be in the code segment or data segment (EXE type
programs only). Once editing has begun, then only the ESCAPE key or the
RETURN key will revert to non-editing command mode.
When a temporary label field is edited, it is initially blanked out
eliminating the "L1234H" that was present. If the cursor was not at the
first column of the label, then leading blanks will exist there. Since
labels must begin with a letter or underscore, this will be rejected when
a RETURN key is pressed.
Editing the automatic comments causes MD86 to first ask if the comment
field should be blanked out or not. After this question is answered, your
key is processed.
o Letters, Numbers, and Symbols
These characters are entered into the field. If the current mode is
INSERT then the characters to the right are moved (the rightmost
character is lost) to make room. Otherwise, in REPLACE mode the new
character overwrites the current cursor character. Note that within the
label field only the characters A-Z, a-z, 0-9, $, and _ are valid.
Other characters cause a "beep" and are ignored. A further restriction,
that labels may not begin with a digit, is not checked until a
RETURN key is pressed.
o Escape
This key will cancel any editing on the current field and its original
contents will be restored. This effectively returns to non-editing mode
without saving any changes to the current field.
o Return
Use this key to tell MD86 that the editing changes you have made are
correct and should be remembered. Note that this is not the same as
saving the data as it is not actually written to the data files yet. If
the field contents are valid, then the cursor will be returned to the
starting column of the current field and the mode is set to
non-editing.
o Backspace
If the cursor is not at the left edge already, this will erase the
character immediately to the left of the cursor. The remainder of the
line will be shifted left and the rightmost column will be blank
filled. If at the left, then this just "beeps".
o Delete
This will erase the character immediately under the cursor and cause
the remainder of the line to be shifted left. The rightmost column will
be blank filled.
o End
This moves the cursor to the last column of the field.
o Home
This moves the cursor to the leftmost column of the field.
2.3) Non-editing Command Keys
A good portion of the time spent disassembling a program is spent roaming
around various areas and on other non-editing functions. The simpler
cursor movement functions use a single key stroke for this work while the
more involved commands use the function keys and pop up windows.
The cursor movement keys are as follows.
o Down-Arrow
This moves the cursor down one line and to the beginning of the same
field. You cannot move beyond the end of the current segment.
o Up-Arrow
Move the cursor up one line to the beginning of the same field. You
cannot move past the beginning of the current segment.
o Home
If the cursor is right of the leftmost column of the comment field,
then it is moved to the start of the comment field. Otherwise the
cursor moves to the first column in the label field within the same
line.
o End
This moves the cursor to the beginning of the comment field if it was
within the label field. Otherwise the cursor moves to the end of the
comment field.
o Page-Up
This moves the cursor up approximately a full page. This may be more or
less than 24 lines as this assumes there are three bytes per line on
the average. Note that no attempt is made to locate the beginning of an
instruction. It is probable that the first line or so will be
disassembled incorrectly. The cursor will be positioned at the start of
the label field in the top line.
o Page-Down
This moves the cursor to the top of the next page. The line that is
currently at the bottom of the screen will be at the top after this
command. The cursor will be positioned at the start of the label field.
The following commands utilize the function keys on the PC keyboard
either alone or in combination with the shift key. Remember that these
function only in non-editing mode.
o F1 - Help Command
This displays a one screen summary of the function keys. Press any key
to refresh the screen.
o Shift-F1 - Alter System Parameters
This command is used to make changes (if possible) to the default
parameters. For COM files, the beginning and ending addresses can be
modified. EXE files however, have ranges for the data and code segments
that are defined in the header. These cannot be changed.
For COM files, the following parameters can be changed.
o The Start Address. Normally this is 100 hex for COM files and 0000
hex for EXE files. But for special work, like ROM disassembly, this
may be set to something else.
o The End Address. MD86 sets this to the physical end of the program
or FFFE hex if this is more than 64k. If you find that a smaller
value is correct, then change it here. This will prevent MD86 from
accessing garbage areas and contaminating the label table.
The following parameters affect how MD86 displays the disassembled
lines. These are changeable at all times.
o Translate MSDOS Functions. MD86 normally tries to translate common
MSDOS functions into a pseudo instruction that has more meaning when
trying to understand the code. However, if you don't want this done,
then it can be disabled here.
o Enable Automatic Comments. When MD86 finds certain instructions, it
tries to add a comment line explaining the instruction in more or
less English. If you don't want to see these comments, then they can
be eliminated.
o F2 - Goto a Specified Address
This allows a quick jump to any valid address to begin disassembly. A
null response (only the RETURN key pressed) causes MD86 to try and
return to the location of the last Goto command. A stack with the
previous 16 locations is maintained. This is handy in jumping to one
location and then returning without having to remember where you were.
Valid destinations are anywhere within the data segment or the code
segment. Note that for a COM type file, these are assumed to be the
same.
o Shift-F2 - Follow That Instruction
If the current line contains a direct jump or call instruction, this
command will do an automatic Goto to the destination address. This does
not apply to inter-segment (FAR) calls or jumps or to any indirect calls
or jumps.
MD86 saves the current address on its internal stack so that a return
can be made via the F2 command. With this you can conveniently examine
a subroutine and then continue from where you left off.
o F3 - Set Data Type
When MD86 first looks at a program, it thinks that all of the data
segment is made up of 8-bit data bytes and all of the code segment is
machine instructions. This more than likely is not 100% correct. When
disassembling a portion of the program, you may notice that the present
interpretation does not make sense. Some other data type is necessary.
MD86 can recognize one of four data types. These are instructions (type
#0), 8-bit binary data (type #1), 8-bit ASCII characters (type #2), and
16-bit addresses (type #3). See Section 2.6 for more details.
This command allows any range of the program to be set to a specific
type. You will be asked for the data type and the first and last
addresses. Addresses must be in the same segment of course.
The internal type table has room for 512 entries. This is the total of
the data and code segment types. The current total is displayed on the
bottom status line along with a percent used figure.
o F4 - Set Data Type for Unspecified Range
Often the extent of a different data type is not known. What is known
is the initial address and a suspected data type. This command uses the
current line as the beginning address and will request the suspected
data type (0 to 3). Then MD86 temporarily considers all data following
the cursor to be this type. You would move the cursor down until you
reach an address that is not of this type and press F4 again to fix the
range for the specified type. In this special mode, only one data item
per line is displayed.
o F5 - Write Source to Disk
MD86 would be of limited use if you could not generate a disk file with
the source code. Use this command and specify the file name and MD86
will write out the data. If an extension of PRN is used, then the file
will have the address and binary code along with each instruction line.
This is the way the screen appears.
Before this file can be used by an assembler, some hand editing will be
required. Segments may have to be specified differently than MD86
specifies them.
o F6 - Scan Code Segment
MD86 builds the label table as code is disassembled. At times, the
disassembly is not correct and erroneous address references may be
entered into the label table. This function cleans this up.
When all of the code segment has been given the correct data type, then
this function should be used to properly build the label table. It will
remove any temporary labels and begin disassembling the entire code
segment. When this has completed, the label table will be correct and
erroneous references will be removed.
o F7 - Dump Program in Hexadecimal and ASCII
It is difficult to determine the location of data areas and character
strings by just looking at a page of disassembled instructions. This
function will begin in the data segment (for EXE programs) and then
dump the code segment. The data is displayed 16 bytes per line in
hexadecimal and also in ASCII (if possible). Pressing any key will halt
the display so you can inspect the data and maybe write down addresses
of obvious data areas. Pressing any key other than ESCAPE will
continue the display. Press the ESCAPE key to end this segment and dump
the next segment or return to where you were when this command was
initiated.
o F8 - Set Label Name
If you want to associate a name with a particular address, you can
either move the cursor to that address (if possible) and edit the label
field or use this command to set the name without having to move there.
Enter any valid label name and the address to be set. A null label name
will delete a label from the tables. Invalid names cause a "beep" and
are ignored. Valid names start with a non-digit and have no
embedded spaces.
The label table is limited to 2048 label names, which should be
satisfactory for any reasonably sized program.
o F9 - Search for Address Reference
This command will allow any address to be searched for. Use this when
it is desired to find out how (or if) a particular area is referenced
within the code segment. The initial address to start is requested. A
null response causes the search to begin at the start address. You will
see the program disassembled on the screen and it will stop when the
specified address is referenced. During the search, press any key to
abort.
Note that the search is limited to the code segment. However, the
particular address may be in any other segment. For example to search
for address 1234 within the extra segment, enter ES:1234 as the search
string.
o Shift-F9 - Search for Next Reference
Once function F9 has been used to find the first occurrence of an
address, use this command to locate the next. As with function F9,
press any key to abort the search.
o F10 - Save and/or Exit
Use this command to save your current data tables (often!) and exit or
quit MD86. You are given the option to save the data or not and to exit
to MSDOS or not.
If you wish to quit without saving any of the work you have done, then
respond No to saving the data and Yes to exiting.
2.4) Label Name Specification
MD86 allows, even encourages, you to associate a label name with each
referenced address. Names are far more understandable than numbers. The
label field within the display is either blank (the address has not been
referenced), contains a temporary label (the form is LnnnnH for address
nnnn), or contains a user defined name. Label names can be up to eight
characters long and may contain letters, digits, the dollar sign "$", or
the underscore "_" characters. Labels may not begin with a digit however.
Label names may contain upper and lower case letters and the case is
maintained. However, when searching for a name, MD86 ignores differences
in case. Thus the name "HelpMsg" is perfectly valid and will appear this
way in the output file. You could also jump to address "HELPMSG" and
"HelpMsg" would be found.
Upper and lower case letters make reading names easier, but you don't
have to remember the exact form to reference the name.
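The naming rules in this section can be expressed compactly. The following Python sketch is an illustration only and does not reflect how MD86 stores its tables internally; the names is_valid_label and LabelTable are hypothetical. It follows the rule as stated in this section (the first character may be anything but a digit), which is an assumption where other sections are worded more strictly.

```python
import re

# Up to eight characters from A-Z, a-z, 0-9, "$" and "_";
# the first character may not be a digit (assumption: "$" may lead).
_LABEL_RE = re.compile(r"[A-Za-z$_][A-Za-z0-9$_]{0,7}")


def is_valid_label(name):
    return _LABEL_RE.fullmatch(name) is not None


class LabelTable:
    """Keeps each name as typed but finds it regardless of case."""

    def __init__(self):
        self._by_key = {}   # upper-cased name -> (name as typed, address)

    def define(self, name, address):
        if not is_valid_label(name):
            raise ValueError("invalid label: %r" % name)
        self._by_key[name.upper()] = (name, address)

    def find(self, name):
        return self._by_key.get(name.upper())
```

With this sketch, defining "HelpMsg" and then looking up "HELPMSG" returns the entry with the original spelling "HelpMsg", which matches the behavior described above.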
2.5) Specification of Addresses
When MD86 requests an address (like the destination of a Jump command),
the address must be entered in the following form.
Address ?{ss:}nnnn
The braces indicate an optional qualifier. If the address is within a
segment other than the current segment, then the segment name must be
included. The "ss:" in the line above is the segment name and it must
then be either "CS:", "DS:", "ES:", or "SS:". The case of the letters is
not important, but the segment name must precede the address (or offset)
portion.
If the actual address within the segment is entered as a number then it
must be in hexadecimal. In place of a number, a label name could be used.
This name must be resolvable within the segment.
For example, the following are valid addresses.
Address ?100
Address ?ds:HelpMsg
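The address form above could be parsed along the following lines. This Python sketch is only an illustration; the function name parse_address and the labels dictionary (standing in for MD86's per-segment label tables) are hypothetical. It also assumes that text made up solely of one to four hexadecimal digits is treated as a number before it is tried as a label name, a point the manual does not specify.

```python
import re

SEGMENTS = ("CS", "DS", "ES", "SS")


def parse_address(text, current_segment="CS", labels=None):
    """Split '{ss:}nnnn' into (segment, offset).

    labels maps (segment, UPPER-CASED name) to an offset.
    """
    labels = labels or {}
    text = text.strip()
    segment = current_segment
    if ":" in text:
        prefix, text = text.split(":", 1)
        segment = prefix.strip().upper()      # case is not important
        if segment not in SEGMENTS:
            raise ValueError("unknown segment: " + prefix)
        text = text.strip()
    if re.fullmatch(r"[0-9A-Fa-f]{1,4}", text):
        return segment, int(text, 16)         # numbers are hexadecimal
    key = (segment, text.upper())
    if key in labels:
        return segment, labels[key]           # name resolved in that segment
    raise ValueError("cannot resolve: " + text)
```

Under these assumptions, "100" in a COM file yields ("CS", 0x100), and "ds:HelpMsg" resolves only if HelpMsg is defined in the data segment.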
The label table is stored internally and has room for 2048 entries. This
is generally enough to disassemble a 10,000 to 15,000 line program. For
larger programs it is recommended that they be divided into smaller
sections if at all possible.
2.6) Data Type Specification
MD86 initially thinks the entire code segment contains instructions and
the data segment (for EXE type files) contains 8-bit binary data. This is
a good place to start but there will be other data types mixed in with
these. Functions F3 and F4 can be used to tell MD86 to assume a different
data type for a specified address range. The types are specified by a
numeric code number and the ones recognized are:
0 - Machine instructions.
1 - 8-bit binary data.
2 - 8-bit ASCII character data.
3 - 16-bit address data.
When using function F3, MD86 must be told the first address of the newly
defined type and the last address with this type. For data types that
occupy more than one byte (type 0 or 3), the last address must be the
address at the end of the field not the start. Thus if address 100
contains a single 16 bit address, then MD86 is given the first address as
100 and the last address as 101 (not 102 as you might think).
Function F4 works a little differently. The start of the current line is
taken as the first address when this is initiated. When this is pressed
again, then the start of the now current line (if below the first
address) is assumed to be just past the type being defined. In other
words, if the address range 120 to 140 is being defined as type 2 (ASCII
character data), then the current line should be at address 120 when F4
is pressed the first time and then moved to address 141 when pressed the
second time.
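The two conventions are easy to confuse, so here is the arithmetic as a small Python sketch. It is only an illustration of the rules stated above; the function names f3_last_address and f4_range are hypothetical.

```python
def f3_last_address(first, item_count, item_size):
    """Last address to give F3 for item_count items of item_size bytes.

    The range is inclusive, so it ends on the final byte of the final item.
    """
    return first + item_count * item_size - 1


def f4_range(first_press, second_press):
    """Inclusive address range fixed by the two F4 presses.

    The second press is made on the first line that is NOT of the new
    type, so the range ends one byte before that line.
    """
    if second_press <= first_press:
        raise ValueError("second F4 must be below the first address")
    return first_press, second_press - 1


# One 16-bit address at 100 hex: the last address is 101 hex, not 102.
print(hex(f3_last_address(0x100, 1, 2)))              # 0x101
# ASCII data from 120 to 140 hex: press F4 at 120, then again at 141.
print([hex(a) for a in f4_range(0x120, 0x141)])       # ['0x120', '0x140']
```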
A code type table is maintained internally by MD86 that contains the
beginning and ending addresses (with segment of course) and the type of
data this address range contains. Instructions are the default type and
(to save memory space) are not actually stored in the table. There is
room for 512 entries which should be plenty for most normal applications.
2.7) Output Source File Format
MD86 produces a standard ASCII text file as output. This should be
suitable as input to almost any assembler and editor. Note that MD86 does
not insert tab characters and thus the lines will contain many blanks.
This causes the files to be quite large. The judicious insertion of tabs
would shrink the file size significantly.
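As an illustration of how much the blanks cost, the Python sketch below packs runs of blanks into tab characters. It is not part of MD86; the function name spaces_to_tabs is hypothetical and tab stops every eight columns are an assumption. Note that it makes no attempt to recognize quoted strings, so blanks inside a character constant would be converted as well. This is why the insertion has to be "judicious".

```python
def spaces_to_tabs(line, tab_width=8):
    """Replace runs of blanks that reach a tab stop with a single tab."""
    out = []
    pending = 0      # blanks seen but not yet written out
    column = 0       # column position in the original line
    for ch in line:
        if ch == " ":
            pending += 1
            column += 1
            if column % tab_width == 0:
                # The run reaches a tab stop; one tab replaces it.
                # A single blank is left alone since a tab saves nothing.
                out.append("\t" if pending > 1 else " ")
                pending = 0
        else:
            out.append(" " * pending)   # blanks that stop short of a tab stop
            pending = 0
            out.append(ch)
            column += 1
    out.append(" " * pending)
    return "".join(out)


line = "L05DBH  PUSH    CX              ;save count"
packed = spaces_to_tabs(line)
print(len(line), len(packed))
```

Expanding the tabs again (for example with Python's str.expandtabs) gives back the original line, so the column layout is preserved.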
When MD86 disassembles a program, it remembers how addresses are
referenced. As a convenience, the output file will have separating lines
just in front of all subroutines. That is, those addresses that were the
target of near call instructions.
3.0) The Inner Details of MD86
In the next few sections we will describe in more detail how MD86
functions. It is not necessary that you remember or understand all of this
material, but when questions arise this will make a handy reference.
Many choices made during the construction of MD86 were ones of programmer
preference. There are several ways to tackle the many problems encountered
in creating a disassembler. Which one is better? If the choice was not
obvious, then personal preference was the deciding factor.
MD86 was written in TURBO Pascal and owes a lot of its speed to this fine
compiler. Some of the limitations of this compiler necessitated tradeoffs
in the design of MD86. In particular, to avoid the use of overlays, the
program was limited to a 61k code segment. Not all of the "nice" options
could be included.
3.1) Moving the Cursor Upward
When MD86 displays a full screen of disassembled lines it remembers the
exact starting address for each line. Thus to move upward within the
display screen, MD86 only has to display the new current line in bright
characters again. However, when the cursor moves off the top of the
screen (either by an Up Arrow or Up Page command), MD86 has a difficult
task in determining the starting address for the instruction. There are
times when more than one legitimate starting address is located. For this
situation, MD86 chooses the longest instruction. This may not be correct.
The problem comes in when MD86 has moved up more than one time (you
pressed Up Arrow more than once when at the top of the screen) and it
finally comes to a point where it cannot find any legitimate instruction
to disassemble. You hear a "beep" and MD86 only backs up one byte. This
tells you that the screen is not displaying correct instructions. One or
more of those at the top (where MD86 backed up) is not correct.
If the screen is not correct, then how do you make it correct? The
easiest way is to back up a full page (Up Page command) and then go
forward a full page (Down Page command). More than likely, MD86 will
correctly synchronize somewhere on the screen when you backed up
(probably at a labeled instruction) and everything from there on downward
will be correct.
One point of interest will surely pop up. When a NOP instruction is
disassembled, it is flagged as "questionable" unless the previously
disassembled instruction was a short JMP. However, if the cursor is moved
upward to a NOP instruction, MD86 marks this line incorrectly since the
instruction disassembled prior to this was actually a following
instruction.
A word of caution. When MD86 incorrectly disassembles a line for whatever
reason, erroneous address references may be added to the label pool. This
is where the Scan command (F6 key) comes in. It erases all temporary
labels (ones that have not been given names by you) and begins
disassembling the whole program from the starting address. This will
correctly build the temporary label pool.
3.2) Questionable Instructions
When MD86 disassembles an instruction line, it will check to see whether
or not this instruction makes sense. If it does not, a flag ("?" to the
left of the label field) is set but it is disassembled anyway if
possible. Valid instructions are considered "questionable" if they are
very rarely used or are "meaningless".
In the rarely used category are the instructions "LOCK", "ESC", "INT",
"XLAT", "WAIT", "HLT", and far returns within a "COM" file. Further, any
instruction made up of identical bytes (e.g. "ADD [BX+SI],AL", which
is two bytes of 00) is considered questionable.
Meaningless instructions are "NOP" and "MOV destination,source" where the
source and destinations are the same. Note that a "NOP" is allowed
following a two byte forward jump instruction.
Depending on the type of program being disassembled, there may be a few
or a lot of such "questionable" instructions that are actually supposed
to be there. Thus this "questionable" instruction flag is just a guide to
help you locate embedded data areas.
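Two of these tests are simple enough to sketch. The Python below is an illustration only, not MD86's actual logic; the function name is_questionable is hypothetical. It covers the "identical bytes" test and the NOP test, and it assumes that the two byte forward jump means a short JMP (opcode EB hex) with a positive displacement. The rarely used instructions and the MOV test are left out.

```python
NOP = 0x90
SHORT_JMP = 0xEB


def is_questionable(code_bytes, previous=None):
    """Apply two of the manual's tests to one instruction's raw bytes.

    previous is the raw bytes of the instruction disassembled just
    before this one, or None if there was none.
    """
    # An instruction made up of identical bytes, e.g. 00 00,
    # which decodes as ADD [BX+SI],AL.
    if len(code_bytes) > 1 and len(set(code_bytes)) == 1:
        return True
    # A NOP is accepted only directly after a two byte forward jump.
    if code_bytes == bytes([NOP]):
        after_forward_jump = (
            previous is not None
            and len(previous) == 2
            and previous[0] == SHORT_JMP
            and previous[1] < 0x80      # positive displacement = forward
        )
        return not after_forward_jump
    return False
```

Because the NOP test depends on the previously disassembled instruction, the answer changes with the direction in which the code is read.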
3.3) Instruction Prefix Bytes
Like the Intel 8086 processor, MD86 recognizes certain prefix bytes. These
are the segment override instructions and the repeat instructions. The
bus lock instruction is considered separate as this is very rarely used
(it is also flagged as "questionable"). When moving the cursor around, it
is possible to miss the prefix byte. This will occur most often when moving
upward. MD86 only looks at the previous six bytes to determine where an
instruction starts. If the seventh byte were a prefix byte, this would be
missed. When a jump is made (Function F2) MD86 does not check to see if
this is in the middle of an instruction. Here too, a prefix byte could be
missed.
Missing a prefix byte could cause label addresses to be
associated with the wrong segment. The scan option (Function F6) can be
used to clean up this type of misinformation.
While this is not the same as a prefix byte, MD86 checks for NOP
instructions that are preceded by a short forward jump instruction. If
this is not the case, then the NOP is marked as "questionable". This
logic fails when the NOP is in the top line of the screen. There is no
preceding instruction to check and the NOP is marked "questionable" even
though it may be perfectly valid. When moving the cursor upward, MD86
will incorrectly mark a NOP as questionable. The logic only works when
moving the cursor downward.
3.4) Segment Handling
MD86 recognizes references to the four segments of the Intel 8086
processor. It keeps four separate tables for the labels within these
segments. An exception is with COM type files. For these, the data
segment and the code segment are assumed to be the same. Data segment
references are forced into the code segment space. This is generally
correct, but it is possible that the program creates a separate data
segment. In this case the labels generated by MD86 will be put into the
code segment when they belong in the data segment. Not much can be done
about this until after a source file has been produced. Use an editor to
fix these things.
3.5) Known Compatibility Problems With MASM
MD86 was designed to produce source files to be used with the MASM
assembler from Microsoft. This is the most common assembler and as you
probably know it is not the easiest assembler to use. MASM tries to guard
you against yourself. When variables are defined as bytes, then MASM
checks to be sure they are referenced as such. At times this is handy.
But mostly it is an annoyance. When disassembling a program it is often
difficult to determine how labels are referenced. At the very least
it would consume lots of time which would be better spent on other
aspects of the code. To prevent MASM from generating numerous errors due
to seemingly inconsistent references, MD86 inserts WORD PTR or BYTE PTR
to force MASM into accepting these references. This results in the over
use of these override phrases (and a larger than necessary source file).
When MD86 notices a reference to an item within the data or code segments
that is outside of the limits of the actual program file, it inserts an
EQU statement to equate the label with its value. MD86 is thus assuming
that the reference is to a constant value and not a variable address.
Under some assemblers there is no difference between a constant and an
address (or offset). But MASM does make the distinction and flags as an
error an inconsistent reference. The following error message is typical
of this occurrence.
CMP.ASM(176) : error 56: No immediate mode
This is MASM's way of telling you that line 176 in file CMP.ASM contains
a reference to a variable where the label has been defined as a number.
The solution is to modify the source code file and change the definition
of this label from an EQU into a DB or DW. Thus the line
HELPTXT EQU 00050H
should be changed into the following lines.
ORG 00050H
HELPTXT: DB 0
You won't need the ORG statement in front of every definition line as
long as the data type is consistent.
EXE programs pose other problems that must be dealt with. MD86 does not
resolve FAR jumps and calls. You will have to determine the destination
address and equate this to a label. The absolute address is shown but
MD86 cannot find out where this ends up.
Returns from FAR procedures are flagged by MD86, but the instruction
inserted is RET. MASM will determine if this is a NEAR or FAR return from
the definition of the procedure. Thus, you will have to define the
procedure containing the FAR return as a FAR procedure. If you don't,
then MASM will assume a NEAR return is to be generated. No error message
is displayed but this will not execute the same as a FAR return.
Another problem with EXE type files concerns segment definition. The
output file generated by MD86 should assemble and even link okay, but it
probably won't execute correctly without being changed. You will need to
look into adding ASSUME and GROUP statements in the various segment
blocks.
EXE files can be constructed in many, many ways. It will take some
persistence to resolve these differences.
4.0) A Short Course In Generating Source Code
The process of disassembling a program and recreating source code is more
art than science. The more practice you have the easier this becomes. MD86
(as well as other programs) has been written to make this process as easy
as possible. As "smart" as these programs are, there is a long way to go
before this can be considered "automated".
Prior to starting to disassemble that super new program, there are some
questions you must ask yourself.
o Do I really need the source code for this program?
o Is the program small enough for me to handle?
o Was the program written in a lower level language (C or assembler)?
o Do I really know how this program functions and all of what it does?
These questions are important as the answers can give you an idea of
whether you can finish the job once you start it. There is no point in
"cheating" on the answers either. You only have yourself to convince.
Source code generation is a three step process.
There are three distinct phases users go through when they disassemble a
program. The first phase is to identify the type of data the program is
composed of. Programs consist of machine instructions and data. But which
is which? You must follow the logic to tell. The only technical difference
is that machine instructions are executed while data is referenced. It is
quite possible for instructions in one part of a program to be data to
another part. With the different segments of the Intel 8086 this is not
very common. But it is certainly possible.
Once the program has been divided into instruction and data areas, the
second phase begins. This is the process of identifying the different
logical parts. This is usually the most difficult and time consuming part.
It is not easy to understand what purpose a sequence of instructions has,
but with persistence this can be done.
The third stage involves generating an assembler source file and getting it
to re-assemble properly. Disassemblers are only "human". Their output may
assemble without error but it probably won't be a byte-for-byte copy of the
original file. Some "touch-up" will be required to rectify such things as
long and short jumps. While you are at it, you could clean up the comments
and "pretty up" the source file.
4.1) Identifying Data Types
There is a real knack to separating the code into data and instruction
areas. MD86 goes a long way by marking "questionable" instructions with a
question mark to the left of the label field. This mark will appear on
the screen as well as in a PRN type output source file. It will be
removed when a non-PRN output file is generated.
Initially MD86 sees the entire code segment as instructions and the
entire data segment (EXE type programs only) as 8-bit binary data.
However, most of the time this is not the case. It is very common to find
character strings embedded within the code as well as normal data areas.
When MD86 marks an instruction line as "questionable", examine the lines
above and below to determine where the instructions end and data begins.
Of course it is possible that MD86 was wrong in its judgement and the line
is correct.
MD86 assumes that memory is made up of either machine instructions or
data. The data may be either 8-bit binary (numbers in the range 0-255),
8-bit character data, or 16-bit address (or offset) data. As mentioned
above, the Intel 8086 processor sees instructions as data that it is to
execute and other data is just referenced. In the sections below we will
assume that instruction areas are sequences of instructions that are
executed in order and everything else is data.
There are five basic rules that can be used to determine data area types.
When you identify data areas, make sure these rules have been satisfied.
If not, be very suspicious.
o rule 1
The instruction preceding a data area must be a transfer (jump, call,
interrupt, or return). Conditional jumps would not be allowed unless the
condition was ALWAYS met.
o rule 2
The first instruction in an instruction area must have a label unless
the preceding data area was an argument to a call or interrupt
instruction.
o rule 3
An absolute transfer of control (jump or return) may be followed only
by a labeled instruction or a labeled data area.
o rule 4
For the type of data to change (from instructions to data or from ASCII
data to 16-bit address data etc.), the first line of the newer type
must have a label.
o rule 5
ASCII character data (including carriage returns, line feeds, etc.)
must either begin with a character count byte (or word) or it must end
with a special (generally non-ASCII) byte. It is common within
MSDOS applications that character strings end with a dollar sign. This
is the way the console output and printer output functions know the end
of a string. Assembly programmers also like to use null characters
(value of zero) as an end of string mark. The Intel 8086 processor can
easily detect these.
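Rule 5 lends itself to a quick mechanical check. The Python sketch below
is not part of MD86; it simply scans a block of bytes for runs of text
that end in a dollar sign or a null. The function name, the minimum
length of 4, and the decision to count CR and LF as text are all
assumptions made for the illustration. Count-prefixed strings are not
handled.

```python
PRINTABLE = set(range(0x20, 0x7F)) | {0x0D, 0x0A}  # ASCII text plus CR and LF
TERMINATORS = {0x00, ord("$")}


def find_terminated_strings(data, min_len=4):
    """Yield (offset, text_bytes, terminator) for each run of at least
    min_len text bytes that is immediately followed by a terminator."""
    start = None
    for i, b in enumerate(data):
        if start is not None and b in TERMINATORS and i - start >= min_len:
            yield start, bytes(data[start:i]), b
            start = None
        elif b in PRINTABLE:
            if start is None:
                start = i
        else:
            start = None


# A jump instruction, a message ending in CR, LF and a null, then two
# unrelated bytes.
blob = bytes.fromhex("E9DB1B") + b"Unknown version of Turbo\r\n\x00" + bytes.fromhex("FF9C")
found = list(find_terminated_strings(blob))
```

A hit from a scan like this is only a hint. The rules above still have to
be satisfied before the area is marked as character data.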
For purposes of an example, Figures IIa through IIc will be used. This is
fairly typical of the kind of code you will encounter. But be forewarned,
by its very nature assembly code can be very obscure. If the programmer
wishes, it could be extremely difficult to decipher.
Referring to Figure IIa, note how several lines have been marked as
"questionable". Here it is obvious that the lines following the jump
instruction at address 1283 cannot be instructions. The PUSH instruction
at address 1286 is erroneous because of rule #1. Notice how most of the
bytes following address 1283 have a value in the range 20 to 7E (hex). It
is quite possible that this area consists mainly of ASCII characters. But
where does this area end? Rule #2 says we should look for the next valid
instruction line containing a label. In this example we find this at
address 12A2. A word of caution here. Since we may not have disassembled
the entire program, the label pool may be incomplete. It is then possible
that at this time an instruction does not have a label. We need to be
cautious in the application of rule #2.
Figure IIa, Typical Display Of Partially Disassembled Program
127D:BE5011 _ MOV SI,#L1150H ;
1280:E822FC CALL L0EA5H ;
1283:E9DB1B JMP L2E61H ;
1286:55 L1286H PUSH BP ;
1287:6E ? DB 06EH ;
1288:6B ? DB 06BH ;
1289:6E ? DB 06EH ;
128A:6F ? DB 06FH ;
128B:776E JA L12FBH ;
128D:207665 AND [BP]+65H,DH ;
1290:7273 JC L1305H ;
1292:69 ? DB 069H ;
1293:6F ? DB 06FH ;
1294:6E ? DB 06EH ;
1295:206F66 AND [BX]+66H,CH ;
1298:205475 AND [SI]+75H,DL ;
129B:7262 JC L12FFH ;
129D:6F ? DB 06FH ;
129E:0D0A00 OR AX,#L000AH ;
12A1:FF ?L12A1H DB 0FFH ;
12A2:9C L12A2H CBW ;Convert byte (AL) to word
12A3:2E803EA112FF CMP CS:[L12A1H],0FFH;
12A9:7404 JZ L12AFH ;
12AB:9D CWD ;Convert word (AX) to dbl w
CS:: Labels= 492/23%, Types= 21/ 4%, 0 cmnts No Edit 10/ 3/87 10:20:35
As a first step, we use Function key F3 to set the area from 1286 to 12A1
as characters (type #2). The code now makes more sense (see Figure IIb).
But now notice address 12B4. This instruction does not have a label and
yet it follows an unconditional transfer (return) instruction. Rule #3
says this is not correct. Now it could be that there should be a label
here and we have just not disassembled the section of code that
references it, but the instructions don't look right do they? The hex
sequences 06, 07, 08, 09, and C3, C4, C5, C6, C7, C8 would not likely be
instructions (although obviously possible). It looks more like numbers or
data. In fact the whole area from address 12AD up to 12B8 does not look
like instructions at all. Most probably this is just a data area
containing numerical values. And 8-bit values at that. If they were
16-bit values (or addresses), they would be way beyond the bounds of our
code.
So again using Function key F3, we set this area to 8-bit binary data
(type #1). Figure IIc shows what the screen looks like now. Compare this
with Figure IIa and you can see the improvement. In this way areas of the
program are disassembled one section at a time. Progress at first seems
slow I realize, but after a while the pieces start to fit together. As
you begin to understand these small portions the remainder of the program
becomes that much easier. You are well on your way to a useful source
file.
Figure IIb, Typical Display Of Partially Disassembled Program
127D:BE5011 _ MOV SI,#L1150H ;
1280:E822FC CALL L0EA5H ;
1283:E9DB1B JMP L2E61H ;
1286:556E6B6E6F77 L1286H DB 'Unknown version ';
1296:6F6620547572 DB 'of Turbo',CR,LF,0;
12A1:FF L12A1 DB 0FFH ;
12A2:9C L12A2H CBW ;Convert byte (AL) to word
12A3:2E803EA112FF CMP CS:[L12A1H],0FFH;
12A9:7404 JZ L12ACH ;
12AB:9D CWD ;Convert (AX) to dbl word
12AC:C3 L12ACH RET ;
12AD:2EC606070809 L12ADH MOV CS:[L0807H],#09;
12B3:C3 RET ;
12B4:C4C5 LES AX,BP ;
12B6:C6C7C8 MOV BH,C8 ;
12B9:E8C6FC L12B8H CALL L0F82H ;
12BC:8B4616 MOV AX,[BP]+16H ;
12BF:A38A01 MOV [L018AH],AX ;
12C2:8B4604 MOV AX,[BP]+4 ;
12C5:A38C01 MOV [L018CH],AX ;
12C8:1E PUSH DS ;
12C9:C516AA11 LDS DX,[L11AAH] ;Load DS:DX with 32b pointe
12CD:B010 MOV AL,#10H ;
12CF:B425CD21 MSDOS _SIVEC ;Set vector.
CS:: Labels= 492/23%, Types= 21/ 4%, 0 cmnts No Edit 10/ 3/87 10:20:35
For EXE type programs, there is a separate data segment to worry about.
While this probably does not contain instructions, it is still necessary
to determine if there are any address references stored here. If there
are, then they should be identified as such so they can be entered into
the label pool.
In some cases tables of addresses can be spotted easily. If most of the
addresses are close (within a few pages) then you will see similar
hexadecimal values every other byte. For example:
1234:017F097F0F7F L1234H DB 1,7FH,9,7FH,0FH,7FH;
123A:137F4F7F1080 DB 13H,7FH,4FH,7FH,10H,80H;
When these areas are changed into 16-bit address (type #3) then they
appear as follows.
1234:017F097F0F7F L1234H DW 7F01H,7F09H,7F0FH;
123A:137F4F7F1080 DW 7F13H,7F4FH,8010H;
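The byte order is worth a second look. The 8086 stores the low byte of a
word first, so each pair of bytes is swapped when it is read as a 16-bit
value. A small Python sketch of the regrouping (the function name is
ours, not something MD86 provides):

```python
def bytes_to_words(data):
    """Pair up bytes as little-endian 16-bit values (low byte first)."""
    if len(data) % 2:
        raise ValueError("need an even number of bytes")
    return [data[i] | (data[i + 1] << 8) for i in range(0, len(data), 2)]


table = bytes.fromhex("017F097F0F7F137F4F7F1080")  # the 12 bytes at 1234
words = bytes_to_words(table)
print(", ".join(f"{w:04X}H" for w in words))
```

The six values printed match the two DW lines above.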
Figure IIc, Typical Display Of Partially Disassembled Program
127D:BE5011 _ MOV SI,#L1150H ;
1280:E822FC CALL L0EA5H ;
1283:E9DB1B JMP L2E61H ;
1286:556E6B6E6F77 L1286H DB 'Unknown version ';
1296:6F6620547572 DB 'of Turbo',CR,LF,0;
12A1:FF L12A1 DB 0FFH ;
12A2:9C L12A2H CBW ;Convert byte (AL) to word
12A3:2E803EA112FF CMP CS:[L12A1H],0FFH;
12A9:7404 JZ L12ACH ;
12AB:9D CWD ;Convert (AX) to dbl word
12AC:C3 L12ACH RET ;
12AD:2EC606070809 L12ADH DB 2EH,0C6H,6,7,8,9,0C3H;
12B4:C4C5C6C7 DB 0C4H,0C5H,0C6H,0C7H;
12B8:C8 DB 0C8H ;
12B9:E8C6FC L12B8H CALL L0F82H ;
12BC:8B4616 MOV AX,[BP]+16H ;
12BF:A38A01 MOV [L018AH],AX ;
12C2:8B4604 MOV AX,[BP]+4 ;
12C5:A38C01 MOV [L018CH],AX ;
12C8:1E PUSH DS ;
12C9:C516AA11 LDS DX,[L11AAH] ;Load DS:DX with 32b pointe
12CD:B010 MOV AL,#10H ;
12CF:B425CD21 MSDOS _SIVEC ;Set vector.
12D3:1F POP DS ;
CS:: Labels= 492/23%, Types= 21/ 4%, 0 cmnts No Edit 10/ 3/87 10:20:35
The contents of these areas are then added to the address label pool.
When disassembled, these areas will have a label to let you know that
they are referenced somewhere.
Notice how the first address of this table has a reference. Rule 4
indicates that this is required. However this is not strictly true. It is
possible that the beginning of this area is implied by the end of the
previous structure. One common approach is to have a sequence of flag
bytes that is followed by a corresponding address table. Because the
program "knows" how long the leading byte table is, it then knows the
start of the address table.
MD86 assumes that any address references present in the data segment
refer to offsets within the code segment. While this is generally true,
at times this is incorrect. If it is known to be incorrect (by
examination of the code that refers to the table of addresses), then a
choice has to be made. Either these addresses must not be defined as
16-bit addresses (change this to 8-bit binary data), or the erroneous
references to the code segment must be tolerated. It is suggested that
these be changed into 8-bit binary data. You could then add label names
to these references within the data segment to keep this correct.
4.2) Understanding the Code
This is the part you have been waiting for. The real guts of the job! You
have now separated all data from instructions but what do the
instructions mean?
The Intel 8086 executes instructions in a logical order; the order chosen
by the programmer. To truly understand the function of the instructions
you must know how they are executed. For example, just knowing that the
instruction
123A:2C07 SUB AL,7
will subtract 7 from the contents of register AL is not very helpful.
However, if the surrounding instructions were
1234:8A07 MOV AL,[BX] ;
1236:3C3A CMP AL,':' ;
1238:7202 JC L123CH ;
123A:2C07 SUB AL,7 ;
123C:2C30 L123CH SUB AL,'0' ;
123E:8807 MOV [BX],AL ;
you then have the feeling that register BX is pointing to one or more
bytes. And if these bytes are greater than the digit 9 (the character ":"
is just past the digit "9" in the ASCII character set) then 7 is
subtracted. Looking 7 past the ":" character in the table of ASCII
characters you find the letter "A". Then in either case the value of the
digit "0" is subtracted. In other words, if register BX were pointing to
an "8", then this would be replaced with the binary value of 8. If,
however, BX points to the letter "C", it will be replaced with the value
12. So this is just converting a hexadecimal digit or digits from ASCII
to binary. Well of course! We "know" this program asks for hexadecimal
values and has to interpret them because in this case we are looking at
DEBUG.
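If the arithmetic is hard to follow, it can be tried out in a few lines
of Python. This is just a model of the compare, jump, and two subtract
instructions; the function name is made up. Note that, like the original
code, it only handles the digits and upper case letters.

```python
def hex_digit_value(ch):
    """Mirror the CMP / JC / SUB / SUB sequence for one ASCII character."""
    al = ord(ch)
    if al >= ord(":"):        # JC is not taken when AL >= ':'
        al -= 7               # SUB AL,7 closes the gap between '9' and 'A'
    al -= ord("0")            # SUB AL,'0'
    return al & 0xFF          # AL is an 8-bit register
```

Feeding it "8" gives 8 and feeding it "C" gives 12, the same results
worked out above.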
Because the processor executes instructions in a certain order, we must
examine them in that order. This might seem obvious (and in the above
example it is) but in many cases it is not easy to determine the way in
which instructions are executed. Consider the following code.
1234:E83033 CALL L4567H ;
1237:0130 ADD [BX+SI],SI ;
1239:337200 XOR SI,[BP+SI]+0 ;
The ADD instruction following the CALL is not actually executed at all.
By looking at the routine at address 4567 we find that the byte following
the initial CALL is just a parameter. This byte gets used and the return
will be to the following address (1238). We would not have been able to
tell this if we hadn't looked at the instructions in the same order the
processor does.
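The bookkeeping in this example can be checked with a short Python
sketch. It decodes the CALL displacement and models a routine that picks
up one byte at its return address and then returns just past it. Both
function names are invented here, and the model assumes the parameter is
exactly one byte long as described above.

```python
def near_call_target(addr, insn):
    """Target of a 3-byte near CALL (E8 lo hi) located at offset addr."""
    if insn[0] != 0xE8:
        raise ValueError("not a near CALL")
    disp = insn[1] | (insn[2] << 8)
    return (addr + 3 + disp) & 0xFFFF


def consume_inline_byte(memory, return_addr):
    """Model a routine that reads one parameter byte at its return
    address and then returns to the byte after it."""
    param = memory[return_addr]
    return param, (return_addr + 1) & 0xFFFF


code = bytes.fromhex("E830330130337200")   # the bytes starting at 1234
memory = {0x1234 + i: b for i, b in enumerate(code)}
target = near_call_target(0x1234, code[:3])
param, resume = consume_inline_byte(memory, 0x1237)
```

Here target comes out as 4567 hex, the parameter byte is 01, and
execution resumes at 1238 hex.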
When you pick apart even a small section of code you should enter a few
comments and add a label name if you can. Then you won't have to reinvent
the wheel the next time you look at this code (and you will look at it
more than once!).
This process is going to be very laborious. It takes many instructions in
assembly language to accomplish seemingly trivial functions. For example,
the simple BASIC statement "LET A(1,2)=B+C^2" may take thousands of
instructions and involve many subroutines. But all is not lost. Because
you know how the program executes (at least in a gross sense), you will
be able to tackle small portions of it at a time.
Any information you can get your hands on will help. User manuals,
especially reference manuals are a valuable source of information. Some
go so far as to include memory maps and descriptions of internal data
types. Take TURBO Pascal for example, the manual is a real gold mine!
A bottom-up approach has proven to be the most useful when disassembling
a program. Start from the lowest level. Look for the operating system
interface. The reason is that these are well defined and have a specific
calling sequence. MD86 recognizes many of the MSDOS system calls and uses
more meaningful representations. For example the instructions
1234:B409 MOV AH,9 ;
1236:CD21 INT 21H ;
are replaced with a single macro instruction
1234:B409CD21 MSDOS _OUTSTR ;Display string at (DX)
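This substitution amounts to a four byte pattern match: B4, a function
number, then CD 21. The Python sketch below shows the idea. It is not
how MD86 is actually written, and the table holds only the three macro
names that appear in the listings in this manual.

```python
# Function numbers and macro names taken from the listings in this manual.
MSDOS_MACROS = {0x09: "_OUTSTR", 0x25: "_SIVEC", 0x30: "_GETVER"}


def match_msdos_call(code, pos=0):
    """Return the macro text if code[pos:pos+4] is MOV AH,n / INT 21H
    (bytes B4 n CD 21) with a known n; otherwise return None."""
    chunk = code[pos:pos + 4]
    if len(chunk) == 4 and chunk[0] == 0xB4 and chunk[2:] == b"\xCD\x21":
        name = MSDOS_MACROS.get(chunk[1])
        if name is not None:
            return "MSDOS " + name
    return None
```

An unknown function number, or an interrupt other than 21 hex, falls
through and is left as ordinary instructions.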
In this way you can identify the lowest level routines. Those that write
characters to the screen or read the keyboard. How about opening and
closing files and input and output from the communications ports?
Generally these are short subroutines (<100 lines) that you can
comprehend. Try to find as many of these routines as possible and give
each one a name that will help you to remember what it does. Also toss in
as many comments as you can.
Once the lowest routines have been worked on, the next higher level
becomes easier. Now you can find those routines that read and write to
file buffers without worrying about all those instructions required to
actually get the data out to the disk.
In this way the program gradually starts to unravel and before you know
it you will actually understand how the programmer was able to write it.
Executable files (those with the extension EXE) introduce a whole set of
additional problems. Not the least of which is determining the actual
physical address for instructions. You see, the Intel 8086 constructs the
physical address at run time from a segment register and an offset. The
relationship is:
physical address = segment*16 + offset
Because each register is 16 bits long, there is the possibility of
tremendous overlap. An offset of 100 into segment 1234 is the same as
offset 110 into segment 1233. To further complicate matters, the segment
registers can be changed at will. Thus when an instruction is executed,
the contents of the segment registers (which may have been defined who
knows where) are of vital importance. The more segment registers are
modified within a program, the tougher the job of disassembly is.
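The overlap is easy to verify with a calculator or a couple of lines of
Python. The function name below is ours. All values are hexadecimal, and
the result is masked to 20 bits because the 8086 has only 20 address
lines.

```python
def physical_address(segment, offset):
    """20-bit 8086 physical address for a segment:offset pair."""
    return ((segment << 4) + offset) & 0xFFFFF


a = physical_address(0x1234, 0x100)
b = physical_address(0x1233, 0x110)
print(f"{a:05X} {b:05X}")
```

Both pairs land on the same physical address, 12440 hex.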
As an example of a typical executable program, let's look at EXE2BIN.EXE.
Within the first few instructions we see the following code.
0000:1E PUSH DS
0001:33C0 XOR AX,AX
0003:50 PUSH AX
0004:B430CD21 MSDOS _GETVER
0008:3C02 CMP AL,2
000A:7D13 JGE L001FH
000C:BB3900 MOV BX,#L0039H
000F:8EDB MOV DS,BX
0011:BA5B01 MOV DX,#L015BH
0014:0E PUSH CS
0015:1F POP DS
0016:B409CD21 MSDOS _OUTSTR
001A:06 PUSH ES
001B:33C0 XOR AX,AX
001D:50 PUSH AX
001E:CB RET
001F:BE8100 MOV SI,#L0081H
0022:BB3900 MOV BX,#L0039H
0025:8EC3 MOV ES,BX
Let's look at this code for a second. We see that almost the first action
of this is to call MSDOS and find out what its version number is. If this
number is greater than or equal to 2 then this jumps to offset 001F. So
the code between 000C and 001E is only executed if the version number is
less than 2. Following the jump instruction, the next two instructions
initialize the data segment register (DS) to 39 hex. That means that
further references into the data segment will get to physical address 390
hex + offset. The next instruction loads the DX register with the value
15B hex. Now if we take a quick look at address 4EB hex (390+15B=4EB) in
our code we will find the start of the ASCII message "Incorrect DOS
version$". A quick note: normally these addresses (i.e. 4EB) will be
relative to the start of the data segment within the EXE file and the
code segment follows this immediately. Thus we have to look at 4EB -
data_segment_size within our code. But for EXE2BIN.EXE, the data segment
size was zero so we can look directly at address 4EB. Now the two
following instructions are very curious. By executing the PUSH CS and POP
DS we will effectively reset the data segment register to the code
segment register, or zero within our file. Thus the call to MSDOS
function to display an ASCII character string will try to get the
characters from offset 15B instead of 4EB. This is a definite bug in
EXE2BIN.EXE! The PUSH and POP instructions should not be there. Even the
best programs can contain bugs. Don't be too alarmed when you run into
one.
Moving on, at addresses 22 and 25 we see that the extra segment register
(ES) is being set to 39 hex just like the data segment register was set.
This should give us a real strong indication that at address 390 hex (or a
few bytes beyond) we will find the start of a data area within our code.
This will help us later on.
One further note, when MSDOS executes an EXE type program, it initializes
the data segment and extra segment registers to point to an area called
the Program Segment Prefix (PSP). This area contains many useful items
that the program will need. So prior to changing these registers, the
program will examine this area for those items it needs. Figure III lists
those items that are of most interest to us. Refer to reference 1 for a
more complete discussion of this area.
Figure III, The Program Segment Prefix Summary
Offset | Contents
------ | ------------------------------------------------------------
0002 | System memory size in paragraphs (16 byte blocks). This is a
| 16 bit integer.
|
000E | Control-C exit address. First 2 bytes are offset and second
| 2 bytes are the segment.
|
0012 | Hard error exit. 2 byte offset and 2 byte segment.
|
005C | Unopened file control block for first file specified after
| command. Only valid if a path is not specified.
|
006C | Unopened file control block for second file specified after
| command. Only valid if a path is not specified.
|
0080 | Entire text string that follows the command. The first byte
| is a character count. Note redirection information is not
| passed on to the program (it is stripped first).
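When you have a memory dump of a PSP in hand, the items in Figure III can
be pulled out mechanically. The Python sketch below assumes a 256 byte
image of the PSP; the function name and the dictionary keys are invented
for this illustration. Only the offsets listed in Figure III are used.

```python
import struct


def summarize_psp(psp):
    """Pull the Figure III fields out of a 256-byte PSP image."""
    if len(psp) < 256:
        raise ValueError("a PSP is 256 bytes long")
    mem_paragraphs, = struct.unpack_from("<H", psp, 0x02)
    # Far addresses are stored offset first, then segment.
    ctrl_c_off, ctrl_c_seg = struct.unpack_from("<HH", psp, 0x0E)
    hard_err_off, hard_err_seg = struct.unpack_from("<HH", psp, 0x12)
    count = psp[0x80]                       # length of the command text
    tail = bytes(psp[0x81:0x81 + count])
    return {
        "memory_paragraphs": mem_paragraphs,
        "ctrl_c": (ctrl_c_seg, ctrl_c_off),
        "hard_error": (hard_err_seg, hard_err_off),
        "command_tail": tail.decode("ascii", errors="replace"),
    }
```

The two file control blocks at 005C and 006C are left out of the sketch
to keep it short.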
4.3) Polishing the Source Code
Sooner or later you will come to the point where you must abandon the
disassembler. It has done its job but now an editor would be better
suited to working on the files.
Once you get a source file out of MD86 then you can try assembling it.
There will undoubtedly be many areas where MASM will complain. Segments
may be defined in the wrong order or some external references are not
defined at all.
Get yourself a good screen oriented editor. One with virtual memory
support is vital. Assembly programs tend to be very large and it will be
a real pain if you have to break it into small pieces because your editor
limits the code to 64k. You are going to especially need global search
and replace functions. WordStar, although rather slow, does work fine for
this type of work as long as you don't use document mode.
MD86 always inserts data type pointer override instructions. These are
the WORD PTR and BYTE PTR sequences you see all over the place. MASM does
not require an override if the types already match. That is, if a value is
referenced as a 16-bit word and it has previously been defined as this
type, then an override is not required. Since MD86 does not know enough
to be sure these conditions have been met, WORD PTR will be inserted. One
of the first things you will want to do is to remove these phrases where
they are not needed. They just clutter the code.
EXE type files pose the biggest challenge to MD86 and MASM will certainly
complain about some aspect of the way the different segments are handled.
MD86 rather simplemindedly inserts tables for each segment that has any
labels defined at the start of the program. Although careful use of MD86
will limit the number of erroneous labels, some extra ones will exist and
these tables will end up being quite long.
When MD86 encounters an instruction that references a 16-bit quantity it
assumes that this is an address (or more properly an offset into a
segment). This address is put into the label pool. It is not possible to
distinguish an address reference from a pure constant. Thus you will see
many labels in the segment tables (mainly the data segment) with values
like 0, 1, 2, 7, etc. Now these may be valid addresses, but most likely
they are just constants. A worthwhile exercise is to eliminate as many
of these as possible. Change the address reference into a constant (i.e.,
change "MOV AX,OFFSET L03E8H" into "MOV AX,1000") so you can
eliminate the "L03E8H:" definition from the data segment table.
4.4) Deciphering More Obscure Code
In the good old days when memory was expensive and processors had a
limited address range, assembly programmers delighted in seeing how much
they could squeeze into small spaces. This tendency has lessened somewhat
with the newer processors and cheap memory but you will still find some
real funny looking code.
Consider the following which was found at the start of a disk input and
output routine.
1234:F9 STC
1235:73F8 JNC L122FH
1237:B80100 MOV AX,1
123A:7304 JNC L1240H
123C:7204 JC L1242H
Wait a minute, you say. How can you have a set carry instruction (STC)
immediately followed by a jump on no carry (JNC)? There must be
something wrong. No one writes code like that! Actually this code is
correct. Since the jump on no carry is never executed, the destination
byte is always skipped if the instructions are executed in the order
shown. However, the programmer sometimes jumps directly to address 1236
which is in the middle of the jump instruction. In this case, the
displacement is executed and this becomes a clear carry instruction (the
F8 byte). What happens is that the routine has two functions that are
very similar (like keyboard input with and without echo) and the state of
the carry flag is used to determine which function is desired. A jump to
address 1234 does one thing and a jump to 1236 does the other. Very
sneaky!
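To convince yourself, the first six bytes can be run through a toy
interpreter. The Python sketch below knows only the four opcodes needed
here (STC, CLC, JNC, and MOV AX) and reports the state of the carry flag
when the MOV is reached. It is an illustration only; the function name
and the step limit are arbitrary.

```python
def run_from(code, base, entry, max_steps=8):
    """Tiny interpreter for the four opcodes in the listing above.
    Returns the carry flag at the moment MOV AX,imm16 is reached."""
    ip, carry = entry - base, False
    for _ in range(max_steps):
        if not 0 <= ip < len(code):
            raise ValueError("execution left the sample bytes")
        op = code[ip]
        if op == 0xF9:            # STC
            carry, ip = True, ip + 1
        elif op == 0xF8:          # CLC
            carry, ip = False, ip + 1
        elif op == 0x73:          # JNC rel8
            disp = code[ip + 1] - 256 if code[ip + 1] >= 0x80 else code[ip + 1]
            ip += 2
            if not carry:
                ip += disp
        elif op == 0xB8:          # MOV AX,imm16: common code starts here
            return carry
        else:
            raise ValueError(f"unhandled opcode {op:02X}")
    raise RuntimeError("step limit reached")


code = bytes.fromhex("F973F8B80100")        # the bytes at 1234 through 1239
carry_from_1234 = run_from(code, 0x1234, 0x1234)
carry_from_1236 = run_from(code, 0x1234, 0x1236)
```

Entering at 1234 arrives at the MOV with carry set. Entering at 1236
executes the F8 byte as a clear carry and arrives with carry clear.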
Or how about this piece of code.
1234:40 INC AX
1235:40 INC AX
1236:40 INC AX
1237:40 INC AX
1238:40 INC AX
1239:E82B33 CALL L4567H
Surely it doesn't make sense to have that many increment instructions in
a row. Or does it? Actually this is part of an error handling routine.
The idea is to load the AX register with an error number and call the
routine at 4567 to print out a message based on the error number. To
display error number 1, the programmer writes the code
2345:31C0 XOR AX,AX
2347:E8EEEE CALL L1238H
To display error message number 4, the call goes to address 1235
instead. For this particular procedure, the AX register always contains a
zero (it is used as an error flag) and so the XOR AX,AX instruction can
be eliminated. Then this requires only a three byte call instruction to
flag an error condition (instead of the usual five bytes). Some
programmers go to great lengths to save a few bytes of code!
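The entry point for any error number follows directly from the layout:
each INC AX is one byte long, so you count back from the CALL at 1239.
The Python sketch below works this out and also checks the displacement
bytes in the caller. The names are made up, and the addresses are the
ones from this example only.

```python
CALL_ADDR = 0x1239   # offset of the CALL that follows the INC AX chain
CHAIN_LEN = 5        # number of one-byte INC AX instructions before it


def entry_for_error(n):
    """Offset to call so that n INC AX instructions run before the CALL."""
    if not 0 <= n <= CHAIN_LEN:
        raise ValueError("error number outside the chain")
    return CALL_ADDR - n


def near_call_disp(call_addr, target):
    """16-bit displacement for a 3-byte near CALL at call_addr."""
    return (target - (call_addr + 3)) & 0xFFFF


print(f"{entry_for_error(1):04X} {entry_for_error(4):04X}")
print(f"{near_call_disp(0x2347, 0x1238):04X}")
```

The displacement printed on the second line is the EEEE seen in the CALL
at 2347.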
5.0) Examples
A couple of example disassembly files have been included on the
distribution disk. These give you an idea of how a typical (if there is
such a thing) disassembly proceeds.
The first example is the complete disassembly of a disk file comparison
utility program called CMP.COM. This is a short (1/2k) program that took
about an hour or so to disassemble. Using MD86 to examine the progress, you
will note that all labels have been given names that more or less make
sense. In addition, numerous comments have been entered. You can write a
source file to the disk and try to assemble it or print it. If you use MASM
to assemble this file, you will run into error 56 (No immediate mode) a few
times. Refer to section 3.5 as to why this happens and how to correct it.
The second example was included to show the results of a basic disassembly.
Here the program EXE2BIN.EXE has been disassembled but only the first step
has been completed. Only the data areas have been separated. Note that this
EXE program does not have a separate data segment. When MD86 reaches a
statement like "MOV DX,L0582H" it notes that offset 582 hex within the data
segment has been referenced. Since there isn't any data segment in the
file, an equate statement is inserted when a source file is generated. But
note that the code internally sets the data segment address to be within
the code segment. Thus the reference to DS:582 is really somewhere within
the code segment. MD86 does not know this and the corresponding address
within the code segment does not appear to be referenced. This is all too
typical of EXE programs. They are a real bear to disassemble.
6.0) MD86 Limitations
MD86 is designed to provide as much functionality as is reasonably possible
without requiring any special equipment. There are some restrictions
imposed on the user although the disassembly of a normal file should not be
hampered. These are:
o 2048 Address References.
o 512 Entries in the Data Type Tables.
o 2048 Comment Strings.
o 64k Maximum Data Segment Size.
o 64k Maximum Code Segment Size.
These parameters have been chosen such that a program up to 30k can be
disassembled as a single file. A 30k program would result in a 15,000 line
assembly file. When disassembling a larger file, it should be broken up.
This can be very difficult for EXE type programs. Even if MD86 could
process larger files, MASM has its own restrictions which would require
smaller sections.
7.0) MD86 Error Messages
During the process of initializing its internal tables, MD86 may display
one of a few error messages. At other times, MD86 just beeps to indicate
that some process could not be completed properly. After the beep, MD86
just waits for a "correct" response or the error to be corrected.
The error messages that may appear are:
o Help, file filename does not exist.
MD86 tried to locate the file with the name "filename" and it did not
exist. You are requested to enter another filename. A leading path may
be included if the file is under another directory or on another drive.
o Help, error reading the auto comment file. Cannot continue.
While reading the file MD86.CMT (which did exist), MSDOS or TURBO
Pascal returned an error code. Try again. If this error persists, then
re-copy this file from the master distribution disk. If this does not
help, then send a copy of MD86.CMT and MD86.COM to CC Software for
analysis. You will be contacted with the solution as soon as possible.
o Help, auto comment file MD86.CMT cannot be found on the current
directory.
Automatic comment generation is disabled.
The file MD86.CMT, which contains the automatic comment strings, could
not be located. MD86 looks only at the current drive and directory. It
does not use the PATH variable. Under most circumstances, you should
quit (F10) without saving the data. Then copy this file from the master
distribution disk over to the correct directory. Re-execute MD86 and
this error message should not appear.
In the event that you do not need these comments, you may continue.
MD86 will just ignore attempts to insert the comment strings.
o Help, data file filename.001 does not exist!
Help, Data file filename.002 does not exist!
One or both of the required data files could not be found. Both of
these files are used by MD86 for label and comment storage. When
disassembling file "filename", MD86 creates files "filename.001" and
"filename.002" to store related parameters. These files are created
under the same directory as "filename.com" or "filename.exe" was
located. More than likely you are trying to disassemble another copy of
this program under a different directory. Another possibility is that
the filename was not in a correct form (e.g. C>MD86 MYFILE..COM). If
neither of these situations is the cause, then contact CC Software for
additional help.
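The naming rule described above can be sketched as follows, to help work out where MD86 expects the two data files to be. This is an illustration only. The path C:\WORK\MYFILE.COM is invented, the function names are hypothetical, and the "at most one period" check is an assumption about what a correctly formed DOS filename looks like, not MD86's actual test.

```python
from pathlib import PureWindowsPath


def data_files(program: str):
    """Names of the .001 and .002 files, in the same directory as the program."""
    p = PureWindowsPath(program)
    return [str(p.with_suffix(ext)) for ext in (".001", ".002")]


def looks_well_formed(program: str) -> bool:
    """Assumed check: the file name part holds at most one period."""
    return PureWindowsPath(program).name.count(".") <= 1


for name in (r"C:\WORK\MYFILE.COM", "MYFILE..COM"):
    if looks_well_formed(name):
        print(name, "->", data_files(name))
    else:
        print(name, "-> filename is not in a correct form")
```

If the program was copied to another directory without its .001 and .002 files, the names produced here show which two files also need to be copied.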
o Help, one or more data files cannot be read properly.
One of the data files (either filename.001 or filename.002) was not in
the correct form for MD86 to read. This can occur when an I/O error or
Run-time error aborts the writing of these files. Also one of these
files could just have a bad block. If there has not been a lot of work
already invested in these files, the safest procedure is to erase them
and start over. Or use DEBUG to read these files into memory to check
that they are at least readable. If they are, then send copies of these
files to CC Software for analysis.
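Where DEBUG is not at hand, the same basic readability check can be sketched in a few lines. This only confirms that the operating system can read every byte of the file (which catches a bad block). It does not check that the contents are in the form MD86 expects. The function name and the file names in the loop are hypothetical.

```python
def is_readable(path: str, chunk_size: int = 4096):
    """Read the whole file in chunks.

    Returns (ok, bytes_read). ok is False if the file cannot be opened
    or the operating system reports an error part way through.
    """
    total = 0
    try:
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    except OSError:
        return False, total
    return True, total


# Assumed file names, following the filename.001 / filename.002 pattern.
for name in ("MYFILE.001", "MYFILE.002"):
    ok, count = is_readable(name)
    print(name, "readable" if ok else "NOT readable", f"({count} bytes read)")
```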
7.1) Error Beep While Editing a Field
If you hear a beep while you are editing a field, MD86 is saying that the
last command could not be completed or it was an illegal command. The
status line at the bottom of the display will show either INSERT or
REPLACE if you are editing. The following are the sources of editing
errors.
o Trying to move the cursor beyond the edges of the field. Left and
right arrow keys.
o Entering a non-editing command (a function key or an up or down
arrow).
o Trying to update a field that contains an illegal character. In
particular, labels are restricted to a leading alphabetic character and
embedded spaces are not allowed.
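The two label restrictions named in the last item can be sketched as a simple check. This covers only those two rules. MD86 and MASM may reject other characters as well, so a label that passes here is not guaranteed to be accepted. The function name is hypothetical.

```python
def is_legal_label(text: str) -> bool:
    """Check the two stated rules: leading letter, no embedded spaces."""
    if not text:
        return False
    first = text[0]
    starts_with_letter = first.isascii() and first.isalpha()
    return starts_with_letter and " " not in text


for candidate in ("L0582H", "0582H", "MY LABEL"):
    verdict = "ok" if is_legal_label(candidate) else "beep"
    print(f"{candidate!r}: {verdict}")
```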
If you are editing, the ESCAPE key can always be used to cancel and
restore the field to its original content. Often a key is pressed by
mistake which causes edit mode to be entered but is itself illegal, for
example pressing the backspace key (<X) at the first column of a field.
MD86 enters edit mode but rejects the key because there is nothing to
delete. However, MD86 remains in edit mode. Press the ESCAPE key to
cancel.
7.2) Error Beep While Not Editing a Field
If the last line of the display contains "No Edit", then you are not
editing a field. In this case MD86 beeps when an illegal command is
entered. This could be an unbound function key, one of the key pad keys,
or the ESCAPE key. MD86 ignores these keys. No harm is done.
References
1) "MS-DOS Developer's Guide", John Angermeyer and Kevin Jaeger, Howard W.
Sams & Co, 1986
2) "Peter Norton's Assembly Language Book for the IBM PC", Peter Norton and
John Socha, Prentice Hall Press, 1986
                 C O M M A N D   K E Y   S U M M A R Y
Key           Mode  Description
------------  ----  -----------------------------------------------------------
Left-Arrow     3    Move left one space.
Right-Arrow    3    Move right one space.
Up-Arrow       2    Move up one line.
Down-Arrow     2    Move down one line.
Page-Up        2    Move up about one page.
Page-Down      2    Move down one page.
Home           1    Move to beginning of field.
Home           2    Move to beginning of label field.
End            1    Move to end of field.
End            2    Move to beginning of comment field.
Insert         3    Switch between insert and replace modes for editing.
Delete         1    Delete character under the cursor.
Backspace      1    Delete the character to the left of the cursor.
Escape         1    Cancel editing changes and return cursor to start of field.
Return/Enter   1    Make editing changes permanent.
Return/Enter   2    Move down one line (same as Down-Arrow).
F1             2    Display a one page help summary.
Shift-F1       2    Alter system parameters.
F2             2    Goto a specified address or return from previous Goto.
Shift-F2       2    Follow current instruction (jump or call only).
F3             2    Set data type for given address range.
F4             2    Set data type for unspecified address range.
F5             2    Write source file to disk.
F6             2    Scan code segment to build label table.
F7             2    Dump program in hex and ascii.
F8             2    Set label name for specified address.
F9             2    Search for an address reference.
Shift-F9       2    Search for next reference.
F10            2    Save and/or exit.
Modes: 1=Editing, 2=Non-editing, 3=Either editing or non-editing.