UASM.DOC
UASM (for Unassembler) consists of five files at this time:
UASM.DOC, UASM-JMP.BAS, UASM-INT.BAS, UASM-STR.BAS and UASM-
DOS.MAC, with the purpose of converting the unassembled listing
of a .COM file from DEBUG into a .ASM file which can be modified
and re-assembled with the Macro assembler.
**************************** NOTICE *****************************
USER SUPPORTED SOFTWARE (With thanks to Andrew Flugelman)
A limited license is granted to all users of these programs,
to make and distribute copies for other users subject to the
following conditions:
1. None of the notices or credits are to be bypassed,
altered, or removed.
2. The programs are not to be distributed in modified form.
(Users are encouraged to distribute MERGE files.)
3. No fee is to be charged (or any other consideration
received) for copying or distributing the programs without
an express written agreement with White Crane Systems.
***************************************************************
UASM - The White Crane Systems Unassembler
If you are using these programs and finding them of value
please send a cash contribution to support their upkeep and
distribution. Use the UASM system of programs to unassemble
one average length .COM file, look over the results and calculate
how many hours this would have taken you to produce. Multiply
this by the minimum wage, contribute that amount and use the
program free thereafter. If that's too much just send $20.
Supporters will receive free notice of enhancements and updates.
In any case you are encouraged to copy and distribute UASM
to your friends provided you do so free of charge and in unmodi-
fied form.
Guy C. Gordon
White Crane Systems
3194 Friar Tuck Way
Doraville, GA 30340
INTRODUCTION
The strategy used in this system is to capture the output
of DEBUG and run it through a series of BASIC programs, each
of which modifies one type of statement in the listing, making
it more like an .ASM source file. This keeps each program short
and fast, and allows you to look over the output at each step
to make sure no mistakes have been entered. It also makes the
programs easy to understand and improve as new steps can be
added without interfering with the first steps. Later in its
development UASM will combine these steps. I hope that users
of these programs will send me their improvements so that I
may add them to future releases.
UASM-JMP takes captured unassembled code from DEBUG (which
we will name FILE.DB) and finds all addresses referenced by
the various Jump, Call, and Loop instructions. These referenced
addresses are made into labels of the form Lhhhh (where hhhh
is the hex address). A new file (FILE.JMP) is then written
in the form of assembler source code. All of the addresses
and hex opcodes in the left two columns of the DEBUG listing
are left out. Referenced lines are appropriately labeled as
Lhhhh:. In addition, unconditional program transfers such as
JMP, JMPS, RET and IRET have blank lines inserted after them.
If the next line is not referenced it will be force labeled,
and a warning comment will be appended. The line after a RET
or IRET is most likely the beginning of a Procedure, and is
preceded by three blank lines.
UASM-INT reads FILE.JMP and writes FILE.INT in which it
has added Macro calls and comments explaining the various Inter-
rupts. The macros, symbols, and comments are read from the
file UASM-DOS.MAC. This file contains a table of EQUates which
define the symbols for the various DOS function calls and the
DOSCALL macro. It is included in FILE.INT by means of an INCLUDE
directive.
UASM-STR reads FILE.INT and writes FILE.STR. Whenever it
encounters a DOSCALL PRINT$ hhhh it reads the string beginning
at hhhh from the original .COM file and prints it as a comment
beside the macro call. It also generates a Dhhhh: DB 'string'
instruction at the end of the file. Carriage Returns, Line
Feeds, TABs and ESCapes are expressed as symbols. All other
non-printing characters are expressed as hex data bytes. Because
this will not catch all text strings in the file, you are also
allowed to specify ranges of DEBUG addresses in which UASM-STR
is to find all the strings it can. Whenever the code loads
the DX register with the address of one of these strings, that
address is converted to a label and the string is added to the
line as a comment.
From that point on, you must take over and supply the remain-
ing text strings and variables that are addressed. You should
heavily comment the code as you go through it and change the
labels that UASM has assigned into more meaningful names. This
is best done with the global change command in your text editor.
I also recommend using the Macro CREF program to obtain a cross
reference map of the symbols.
These programs are by no means infallible, and they can
no more read the programmer's mind than you or I can, so you will
have to check the output closely. If you expect to simply run
UASM and be handed a usable source file you're going to be disap-
pointed. On the other hand, if you've ever tried to understand
a program from just a DEBUG listing you will be pleasantly sur-
prised. UASM will aid you in studying other programs by doing
a lot of the dirty work for you, but if you don't study the
code you won't get usable output. For example an interrupt
handling subroutine will not necessarily be assigned a label
by UASM-JMP since it is not accessed by a Jump but by an
interrupt. Therefore if you find a DOSCALL SET$INT hhhh in the
UASM-INT output you must check to see if the label Lhhhh was
generated. If not, you will have to go back to the DEBUG output to
find the routine at address hhhh and assign it a label of your own.
At present, UASM-INT only keeps track of the AX, AH, AL,
DX, and DL registers. Future improvements will involve a more
complete (and much more complicated) DOSCALL macro in the UASM-
DOS.MAC file and the proper calling of it by UASM-INT. For
now, keep a close eye on the interrupts.
I have been using these programs to unassemble DEBUG.COM
and COMMAND.COM. When I have them sufficiently commented I
will post them on the BBS's. At present I use mainly the Multi-
Link BBS at (404) 252-9438. It is my hope that UASM will lead
to a whole library of well commented, "reverse engineered" source
code for the MS-DOS operating system and utilities. I would
appreciate it if anyone else working on the same would upload
their results to the BBS. Suggestions and improvements are welcome. Please
post them on the MultiLink BBS or send them directly to:
Guy C. Gordon
White Crane Systems
3194 Friar Tuck Way
Doraville, GA 30340
OPERATING INSTRUCTIONS
-DEBUG-
As an example, we will unassemble a fictitious file, FILE.COM:
A>debug file.com
-r
.....CX=1780 ... ;file length in hex bytes
-d 100 l 1780 ;display entire file
In the listing that follows you should be able to spot ASCII
text and any regular binary tables. Write down the beginning
and ending addresses of these, as we do not want to unassemble
them, but we will want a printed copy. Our aim is to put togeth-
er a list of all blocks of code to be unassembled and string
addresses for UASM-STR. Look at the code before each block
of text. Usually it will be preceded by a hex C3 which is a
RET instruction, but there may be a JMP, JMPS, IRET, or RETF
instead. This is the last instruction we want to unassemble
in the block of code preceding the text. Take your time and
go through the entire file, unassembling code and making sure
that the output looks reasonable.
Reasonable code contains such things as CALL or Jump instruc-
tions to nearby addresses, INT 21 instructions and multiple
operations on single registers. It does not contain DB instruc-
tions or very many 00 bytes. Also the ASCII display of a section
of code will look totally random, with about 50% of it being
displayable characters. (The rest will be periods.) Peter
Norton has given a good demonstration of this in chapter 6 of
"Inside the IBM-PC". One warning--the DEBUG unassembler tends
to lock into phase with the correct code, which is very nice,
but be certain that the beginning few instructions are also
in phase. Sections of code that are in phase will contain Jumps
and CALLs to other sections, thus telling you where to start
unassembling.
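The 50% rule of thumb above can be checked mechanically. What follows is a minimal Python sketch and is not part of UASM. The function name printable_ratio is hypothetical, and it assumes that bytes 20 through 7E hex are the displayable ones. A block of text should score close to 1.0, while code should land somewhere near the middle.

```python
def printable_ratio(data: bytes) -> float:
    """Fraction of bytes in the displayable ASCII range 20h..7Eh (assumed range)."""
    if not data:
        return 0.0
    shown = sum(1 for b in data if 0x20 <= b <= 0x7E)
    return shown / len(data)


# A message block scores high because only CR and LF are non-displayable.
text_block = b"File not found\r\n$"
print(round(printable_ratio(text_block), 2))
```

The score is only a hint. A table of small binary values scores near zero and a run of code can score high by accident, so the listing still has to be read.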
At the end of this investigation of the .COM file you should
have a list of the starting and ending addresses of all the
code blocks and all the string blocks. The next step depends
upon whether you have DOS 2.0 or not. It is much easier if
you have 2.0, or can do this part on the machine of a friend who
has it. This is because under DOS 2.0 we can pipe the output
of DEBUG into a file thus capturing the unassembled code for
input to UASM-JMP. Under DOS version 1 we must modify DEBUG
(using DEBUG of course) to get it to write the file we need.
DEBUG - 2.0 Instructions
Create a file, FILE.IN, with the following DEBUG instruc-
tions:
u addr 1 addr 2 ;addresses of blocks of
u addr 3 addr 4 ; code to unassemble
u addr 5 addr 6 ; from our initial investigation
q ;Quit instruction at end
Now we can run DEBUG and pipe the output to a disk file
DEBUG FILE.COM <FILE.IN >FILE.DB
FILE.DB is the input for UASM-JMP.
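If the list of code blocks is long, FILE.IN can be generated rather than typed. This is a small Python sketch under stated assumptions: debug_script is a hypothetical helper, the block addresses are the example ranges used later in this document, and CR LF line endings are assumed for DOS.

```python
def debug_script(blocks):
    """Build the text of FILE.IN from (start, end) address pairs."""
    lines = [f"u {start:04X} {end:04X}" for start, end in blocks]
    lines.append("q")                      # Quit instruction at end
    return "\r\n".join(lines) + "\r\n"


script = debug_script([(0x0100, 0x04D5), (0x06B0, 0x0799)])
print(script)
```

Writing the result to FILE.IN gives the same file as the hand-made one above, minus the explanatory comments.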
DEBUG - 1.1 Instructions
While it is quite easy to capture the output of DEBUG under
DOS 2.0 since we can pipe it to a file, under earlier versions
of DOS we have no such option. However, DEBUG is an exceptional-
ly powerful program, and already contains the code necessary
to write a disk file with the Write command. We will use this
to capture the Unassembled code.
If we unassemble and examine DEBUG, we can find the following
subroutine:
02C8:02C0  PUSH AX          ;save registers
           PUSH DX
           AND  AL,7F       ;ensure character is ASCII
           XCHG DX,AX       ;put character in DL
           MOV  AH,02       ;DOS Function 2 to display DL
           INT  21
           POP  DX          ;restore registers
           POP  AX
           RET              ;return
As it turns out, DEBUG does all of its screen output through
this subroutine. Thus we can modify just this subroutine and
capture each character as it is displayed. What we will do
with it is write it out to an unused portion of memory. From
there we can write all the output to a file using the Write
command.
Our subroutine to store character AL in consecutive memory
locations will be very small--about 20 bytes. We'll need some-
place to put it. For DEBUG 1.07 I chose to put it inside a
string which is only printed once--the message "DEBUG version
1.07" located at 0102. Here is the subroutine:
02C8:0102  DW   3300        ;pointer to memory
           PUSH DI          ;save index register
           SEG  CS          ;offset from code, not ES
           MOV  DI,[0102]   ;get pointer
           SEG  CS          ;
           STOSB            ;store char in AL into memory
           SEG  CS          ;
           MOV  [0102],DI   ;store incremented pointer
           POP  DI          ;restore register
           XCHG DX,AX       ;complete the instructions that
           MOV  AH,02       ; CALL to this routine replaced
           RET              ;Return to Display routine
We can store this subroutine over the string with the Enter
command. (here 02C8 is the base address where DEBUG is loaded):
E 2C8:102 00 33 57 2E 8B 3E 02 01 2E AA 2E 89 3E 02 01 5F 92 B4 02 C3
We can check that this was entered correctly by Unassembling
it:
U 2C8:104 ;you should see the subroutine listed above.
The choice of memory location is up to you. 3300 is the
value I used while unassembling DEBUG. It should be larger
than the sum of the sizes (in bytes) of DEBUG and the program
you are unassembling. To have this subroutine called each time
DEBUG writes a character, we insert a subroutine Call:
E 2C8:2C4 E8 3D FE ;Call 0104
This puts a CALL 0104 in place of XCHG DX,AX and MOV AH,02.
That is why we perform those instructions before returning to
the display routine. The very next character printed by DEBUG
after you Enter the above command will be stored in location
2C8:3300 as well as displayed on the screen.
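The three bytes E8 3D FE can be worked out by hand or with a few lines of code. The Python sketch below is an illustration only, and near_call_bytes is a hypothetical name. It relies on the standard 8086 encoding of a near CALL: opcode E8 followed by a 16-bit displacement, low byte first, measured from the address of the instruction after the CALL.

```python
def near_call_bytes(call_at: int, target: int) -> bytes:
    """Encode an 8086 near CALL placed at offset call_at that calls target."""
    next_ip = call_at + 3                   # the CALL itself is 3 bytes long
    disp = (target - next_ip) & 0xFFFF      # wrap the displacement to 16 bits
    return bytes([0xE8, disp & 0xFF, disp >> 8])


patch = near_call_bytes(0x02C4, 0x0104)
print(" ".join(f"{b:02X}" for b in patch))   # E8 3D FE
```

If your display routine or your capture subroutine sits at a different address, the same calculation gives the bytes to Enter.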
Immediately after entering the CALL instruction above you
should begin the Unassemble commands that you determined will
give you all the code for the program.
U 100 4D5
U 6b0 799
etc.
D 2C8:102 103      ;This displays the pointer to the end of text
B3 D9              ;This means we filled memory to D9B3
                   ;(remember the 8088 stores words backwards)
H D9B3 3300        ;Hex arithmetic
0CB3 A6B3          ; D9B3 - 3300 = A6B3
R CX
CX=1748
:A6B3              ;load CX register with number of bytes to write
N FILE.DB          ;name the output file
W 2C8:3300         ;start writing at 3300 off. from DEBUG base
Writing A6B3 bytes
E 2C8:102 00 33 ;reset pointer if out of space
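The byte swap and subtraction in the session above are easy to get wrong by hand. This Python sketch repeats the arithmetic; captured_length is a hypothetical name, and the values are the ones from the example.

```python
def captured_length(pointer_bytes: bytes, buffer_start: int) -> int:
    """Bytes captured so far, given the two pointer bytes as DEBUG displays them."""
    end = int.from_bytes(pointer_bytes, "little")   # low byte is stored first
    return (end - buffer_start) & 0xFFFF


length = captured_length(bytes([0xB3, 0xD9]), 0x3300)
print(f"{length:04X}")   # A6B3
```

The result is the value to load into CX before the Write command.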
Remember, you can only write text to memory up to 2C8:FFFF.
If you exceed that you will write over DEBUG at 2C8:0000 and
will probably have to re-boot. If FILE.COM is too big to Unas-
semble in one pass you'll have to do it in pieces and append
them together with your text editor. For this reason it is
a good idea to modify and save a copy of DEBUG under another
name such as UDEBUG. If you need to perform any other operations
with a modified DEBUG that you do not want written to memory
you can restore DEBUG to normal operation with:
E 2C8:2C4 92 B4 02 ;restores XCHG DX,AX and MOV AH,02
Now text edit FILE.DB and remove any extraneous lines such
as debug prompts that might have been displayed. If there are
any TABs in FILE.DB they will confuse UASM-JMP and the others.
DEBUG 1.1 appears to put a TAB after each instruction while
version 2.0 does not. I always use the text editor to change
all TABs to the appropriate number of spaces. (Users of PMATE,
use the YF command.)
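For those without an editor command for this, TAB expansion takes only a line of code. The Python sketch below assumes tab stops every 8 columns, which is an assumption and should be checked against your own listing. The name detab is hypothetical.

```python
def detab(line: str, tab_width: int = 8) -> str:
    """Replace each TAB with spaces up to the next tab stop."""
    return line.expandtabs(tab_width)


print(detab("MOV\tAX,2522"))
```

Applying this to every line of FILE.DB before running UASM-JMP keeps the columns where DEBUG put them.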
Any of the memory addresses above may vary with your operat-
ing system and DEBUG version. The values given are for the
Victor 9000, MS-DOS 1.25a, and DEBUG 1.07. The Base Segment
where DEBUG is loaded (2C8 above) will depend upon your machine
and operating system, and is found by using DEBUG to Search
for itself in memory. The display subroutine (2C0 above) depends
upon your DEBUG version number. The same subroutine occurs
at 2B5 in the DEBUG that comes with PC-DOS 1.10, and will appear
near these locations in any other version 1 DEBUGs. If you
store the capture subroutine at some other place in memory you
need to change the two [0102] references and the CALL 0104 in-
struction.
UASM-JMP Instructions
Run UASM-JMP as you would any BASIC program. It will prompt
you for the names of the input and output files. Respond with
FILE.DB, which we created above, and B:FILE.JMP for output.
If file extensions are not provided, .DB and .JMP will be assumed
for input and output respectively. Also the output file name
will default to the input file name. I highly recommend putting
these files on separate drives if you don't have a fixed disk.
This will speed up the program and save wear on your floppies.
UASM-JMP will make two passes through the input file. On
the first pass it will build a list of all referenced lines.
It then sorts this list (shell sort), eliminates duplicate ref-
erences, and on the second pass, labels all of the references.
The output will be displayed on your screen as well as written
out on the second pass.
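The two passes can be sketched in a few lines. The following Python is an illustration of the idea and not a translation of UASM-JMP.BAS. The layout of a DEBUG line (segment:offset, hex bytes, mnemonic, operands) and the set of transfer mnemonics are assumptions, and a Python set stands in for the shell sort and duplicate removal described above.

```python
import re

# Assumed shape of a DEBUG "U" line: SSSS:OOOO  hexbytes  MNEMONIC  operands
LINE = re.compile(r"^[0-9A-F]{4}:([0-9A-F]{4})\s+[0-9A-F]+\s+(\S+)\s*(.*)$")
# Mnemonics treated as transfers; illustrative, not UASM's actual list.
TRANSFER = re.compile(r"^(J[A-Z]+|CALL|LOOP[A-Z]*)$")
HEX4 = re.compile(r"[0-9A-F]{4}")


def label_listing(lines):
    parsed = [m.groups() for m in map(LINE.match, lines) if m]
    # Pass 1: collect every 4-digit hex target of a jump, call, or loop.
    targets = set()
    for _, op, arg in parsed:
        if TRANSFER.match(op) and HEX4.fullmatch(arg.strip()):
            targets.add(arg.strip())
    # Pass 2: drop the address and opcode columns and label referenced lines.
    out = []
    for addr, op, arg in parsed:
        if TRANSFER.match(op) and arg.strip() in targets:
            arg = "L" + arg.strip()
        label = f"L{addr}:" if addr in targets else ""
        out.append(f"{label:<7}{op:<8}{arg}".rstrip())
    return out


sample = [
    "02C8:0100 EB1B          JMPS    011D",
    "02C8:011D BC2218        MOV     SP,1822",
    "02C8:0120 E8F407        CALL    0917",
]
for line in label_listing(sample):
    print(line)
```

In this sample the target 0917 never appears as an address in the input, which is the situation UASM-JMP reports with its "No code for this label" warning. The sketch does not produce the warnings or the blank lines after unconditional transfers.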
If the program finds a Jump or CALL to an address not con-
tained in the file you will get the message "WARNING! No code
for this label". This most likely means you missed the block
of code starting at address hhhh and will have to add it to
the input file for DEBUG. The statement after an unconditional
program transfer (JMP or RET) is always labeled. The message
"WARNING! This label not referenced" means that there is no
Jump or CALL to this label. It might be an interrupt handler,
or it might just be left over code in a modified program. A
large number of these errors might indicate that they are ac-
cessed by an address table. Both of the above errors might
occur if you miss a block of code, unassemble a data area, or
the code modifies itself.
UASM-INT Instructions
To run UASM-INT you must also have the data file UASM-DOS.MAC
on the default drive. UASM-INT will prompt you for input
and output file names. If extensions are not provided, .JMP
and .INT will be assumed for input and output respectively.
The program then loads the symbol table contained in UASM-
DOS.MAC. While reading through FILE.JMP, whenever UASM-INT
encounters an INT instruction it adds a Macro call, Symbols
for the DOS function calls, and Comments from the UASM-DOS.MAC
file. These lines will also be displayed on the screen as the
program progresses. Note that the DOSCALL Macro is inserted
in the text, but the INT instruction is not deleted. After
you have checked the code you must delete the INT and any MOV
instructions that will be duplicated by the Macro.
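The register tracking behind this step can be sketched as follows. This Python is a simplified illustration, not the logic of UASM-INT.BAS. It follows only AH (directly or through AX), handles only INT 21, and uses the six symbol names that appear in the sample output of this document. The real table is read from UASM-DOS.MAC and is larger. The name annotate_ints is hypothetical.

```python
import re

# Function numbers and symbols as seen in this document's sample output.
DOS_SYMBOLS = {0x09: "PRINT$", 0x0A: "INSTR$", 0x1A: "SET$DTA",
               0x25: "SET$INT", 0x26: "BUILD$PS", 0x29: "PARSE$"}


def annotate_ints(lines):
    """Insert a DOSCALL line after each INT 21, keeping the INT itself."""
    ah = None                                     # last known value of AH
    out = []
    for line in lines:
        out.append(line)
        text = line.split(";")[0].strip()         # ignore any comment
        text = re.sub(r"^\w+:\s*", "", text)      # ignore a leading label
        m = re.fullmatch(r"MOV\s+AH,([0-9A-F]{2})", text)
        if m:
            ah = int(m.group(1), 16)
        m = re.fullmatch(r"MOV\s+AX,([0-9A-F]{4})", text)
        if m:
            ah = int(m.group(1), 16) >> 8         # AH is the high byte of AX
        if re.fullmatch(r"INT\s+21", text) and ah in DOS_SYMBOLS:
            out.append(f"DOSCALL {DOS_SYMBOLS[ah]}")
    return out


for line in annotate_ints(["MOV AX,2522", "MOV DX,01E6", "INT 21"]):
    print(line)
```

Because the sketch never forgets a value of AH, it can mislabel an interrupt that is reached by a jump from elsewhere. That is one more reason to check each DOSCALL before deleting the INT.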
UASM-STR Instructions
To run UASM-STR you must have the original FILE.COM or other
binary file on disk. The program will prompt you for the input,
output, and binary file names. These will default to .INT,
.STR and .COM if no other extension is given. As usual, the
input file name will be used as a default if you do not specify
the others, and you should put the output file on a different
floppy drive than the input file.
You will then be prompted for any string area addresses
that you may have found while examining FILE.COM with DEBUG.
You may enter an address range (hhhh kkkk) or the address of
a single string (hhhh) on each line. (Up to ten lines) Each
address must be a four digit hex offset (taken directly from
DEBUG). Upon receiving a blank line as input, the program will
find all strings terminated with a $ starting at the first
address in a range and continue finding multiple strings to
the second address if present. If a single address is given
on a line a single string will be read. Each string is dis-
played as it is found.
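The search over a string area can be pictured with this short Python sketch. It is an assumption about how such a scan might work, not the code of UASM-STR.BAS, and find_strings is a hypothetical name. Addresses are DEBUG offsets, so 100 hex is subtracted to index into the .COM file image.

```python
def find_strings(image: bytes, start: int, end: int = None, base: int = 0x100):
    """Return {address: raw bytes} for '$'-terminated strings in a .COM image.

    With no end address exactly one string is read; with an end address,
    strings are read one after another until the range is used up.
    """
    found = {}
    pos = start - base
    limit = len(image) if end is None else min(end - base + 1, len(image))
    while 0 <= pos < limit:
        stop = image.find(b"$", pos)
        if stop == -1:
            break                                  # no terminator left
        found[pos + base] = image[pos:stop + 1]    # keep the '$'
        pos = stop + 1
        if end is None:
            break
    return found


image = b"\x90" * 0x10 + b"Disk$Write protect$"
for addr, raw in find_strings(image, 0x0110, 0x0122).items():
    print(f"{addr:04X}", raw)
```

Each string found this way starts right after the terminator of the one before it, which matches the consecutive Dhhhh addresses in the sample output.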
Following this the program reads through FILE.INT. For
each "DOSCALL PRINT$ hhhh" encountered it reads the string
from FILE.COM at the specified location (taking into account
the 100H byte program prefix) and prints that string as a comment
next to the Macro. Also, each time the DX register is loaded
with the address of a string, that string is shown next to the
code. At the end of the file, UASM-STR will append a number
of EQUates and Data statements and define the string variables
with names Dhhhh. Non-printing characters are converted into
hex bytes. CR, LF, TAB, ESC, and $ are defined as symbols.
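The conversion of a raw string into a data statement can be sketched as below. This Python is an illustration under stated assumptions and db_line is a hypothetical name. It always writes $ as a symbol, whereas the sample output keeps a $ inside the quotes when it directly follows text. It writes an apostrophe as a hex byte, which is a choice made here and not taken from UASM. Hex bytes are written as two digits as in the sample output, so a value beginning with a letter would need a leading 0 before the assembler would accept it.

```python
SYMBOLS = {0x0D: "CR", 0x0A: "LF", 0x09: "TAB", 0x1B: "ESC", 0x24: "$"}


def db_line(address: int, raw: bytes) -> str:
    """Format raw string bytes as a 'Dhhhh DB ...' data statement."""
    parts, run = [], ""
    for b in raw:
        if 0x20 <= b <= 0x7E and b not in SYMBOLS and b != 0x27:
            run += chr(b)                          # extend the quoted run
            continue
        if run:
            parts.append(f"'{run}'")               # close the quoted run
            run = ""
        parts.append(SYMBOLS.get(b, f"{b:02X}"))   # symbol, else hex byte
    if run:
        parts.append(f"'{run}'")
    return f"D{address:04X} DB {','.join(parts)}"


print(db_line(0x16B7, b"File not found\r\n$"))
```

For this input the sketch prints the same statement as the D16B7 line in the sample output that follows.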
SAMPLE OUTPUT - Excerpts from DEBUG.STR
INCLUDE UASM-DOS.MAC
.RADIX 16
START: JMPS L011D
L011D: MOV SP,1822
MOV [1897],AL
MOV DX,0102
MOV AH,09
INT 21
DOSCALL PRINT$,D0102 ;CR,LF,'DEBUG-86 version 1.07',CR,LF,$
MOV AX,2522
MOV DX,01E6
INT 21
DOSCALL SET$INT 01E6 ; Set interrupt vector (AL=INT, DS:DX=VECTOR)
MOV AL,23
MOV DX,01EB
INT 21
DOSCALL SET$INT 01EB ; Set interrupt vector (AL=INT, DS:DX=VECTOR)
MOV DX,CS
ADD DX,01AB
MOV AH,26
INT 21
DOSCALL BUILD$PS 01AB ; Create new program segment (DX=SEGMENT)
MOV AX,DX
MOV DI,1832
STOSW
MOV DX,0080
MOV AH,1A
INT 21
DOSCALL SET$DTA 0080 ; Set Disk Transfer Address to DX
MOV AX,[0006]
MOV BX,AX
CMP AX,FFF0
PUSH CS
POP DS
ADD [0008],BX
MOV DI,005C
MOV SI,0081
MOV AX,2901
INT 21
DOSCALL PARSE$ ; Parse Filespec (SI -> LINE, DI -> FCB, AL=CODE)
CALL L0917
PUSH CS
POP ES
CMP B,[005D],20
JZ L01B5
JMPS L01B5
L01E3: JMP L04CB
L01E6: MOV DX,167A ;WARNING! This label not referenced
MOV DS,AX
MOV SS,AX
MOV SP,1822
MOV AH,09
INT 21
DOSCALL PRINT$ ; Display string @DX till terminator
JMPS L01B5
L01FD: MOV AH,0A
MOV DX,1844
INT 21
DOSCALL INSTR$ 1844 ; Input keyboard string (DX -> size,cnt,buffer)
MOV SI,1846
;END CODE
.RADIX 16
CR EQU 0D
LF EQU 0A
TAB EQU 09
ESC EQU 1B
$ EQU 24
D167A DB CR,LF,'Program terminated normally',CR,LF,$
D169A DB 'Invalid drive or file name',CR,LF,$
D16B7 DB 'File not found',CR,LF,$
D16C8 DB 'No room in disk directory',CR,LF,$
D16E4 DB 'Insufficient space on disk',CR,LF,$
D1701 DB 'Disk$'
D1706 DB 'Write protect$'
D1714 DB ' error reading drive A',CR,LF,$
D172D DB 'readwritInsufficient memory',CR,LF,$
D174B DB '^ Error',CR,8A,' ',88,'Error in EXE/HEX file',CR,LF,$
D176E DB 'EXE/HEX file cannot be written',CR,LF,$
D178F DB 'Writing $'
D1798 DB ' bytes',CR,LF,$
D0102 DB CR,LF,'DEBUG-86 version 1.07',CR,LF,$