- UASM.DOC
-
- UASM (for Unassembler) consists of five files at this time:
- UASM.DOC, UASM-JMP.BAS, UASM-INT.BAS, UASM-STR.BAS and UASM-
- DOS.MAC, with the purpose of converting the unassembled listing
- of a .COM file from DEBUG into a .ASM file which can be modified
- and re-assembled with the Macro assembler.
-
-
-
- **************************** NOTICE *****************************
-
- USER SUPPORTED SOFTWARE (With thanks to Andrew Flugelman)
-
- A limited license is granted to all users of these programs,
- to make and distribute copies for other users subject to the
- following conditions:
-
- 1. None of the notices or credits are to be bypassed,
- altered, or removed.
- 2. The programs are not to be distributed in modified form.
- (Users are encouraged to distribute MERGE files.)
- 3. No fee is to be charged (or any other consideration
- received) for copying or distributing the programs without
- an express written agreement with White Crane Systems.
-
- ***************************************************************
-
-
-
- UASM - The White Crane Systems Unassembler
-
- If you are using these programs and finding them of value
- please send a cash contribution to support their upkeep and
- distribution. Use the UASM system of programs to unassemble
- one average length .COM file, look over the results and calculate
- how many hours this would have taken you to produce. Multiply
- this by the minimum wage, contribute that amount and use the
- program free thereafter. If that's too much just send $20.
- Supporters will receive free notice of enhancements and updates.
-
- In any case you are encouraged to copy and distribute UASM
- to your friends provided you do so free of charge and in unmodi-
- fied form.
-
- Guy C. Gordon
- White Crane Systems
- 3194 Friar Tuck Way
- Doraville, GA 30340
-
-
-
- INTRODUCTION
-
- The strategy used in this system is to capture the output
- of DEBUG and run it through a series of BASIC programs, each
- of which modifies one type of statement in the listing, making
- it more like an .ASM source file. This keeps each program short
- and fast, and allows you to look over the output at each step
- to make sure no mistakes have been entered. It also makes the
- programs easy to understand and improve as new steps can be
- added without interfering with the first steps. Later in its
- development UASM will combine these steps. I hope that users
- of these programs will send me their improvements so that I
- may add them to future releases.
-
- UASM-JMP takes captured unassembled code from DEBUG (which
- we will name FILE.DB) and finds all addresses referenced by
- the various Jump, Call, and Loop instructions. These referenced
- addresses are made into labels of the form Lhhhh (where hhhh
- is the hex address). A new file (FILE.JMP) is then written
- in the form of assembler source code. All of the addresses
- and hex opcodes in the left two columns of the DEBUG listing
- are left out. Referenced lines are appropriately labeled as
- Lhhhh:. In addition, unconditional program transfers such as
- JMP, JMPS, RET and IRET have blank lines inserted after them.
- If the next line is not referenced it will be force labeled,
- and a warning comment will be appended. The line after a RET
- or IRET is most likely the beginning of a Procedure, and is
- preceded by three blank lines.
-
- UASM-INT reads FILE.JMP and writes FILE.INT in which it
- has added Macro calls and comments explaining the various Inter-
- rupts. The macros, symbols, and comments are read from the
- file UASM-DOS.MAC. This file contains a table of EQUates which
- define the symbols for the various DOS function calls and the
- DOSCALL macro. It is included in FILE.INT by means of an INCLUDE
- directive.
-
- UASM-STR reads FILE.INT and writes FILE.STR. Whenever it
- encounters a DOSCALL PRINT$ hhhh it reads the string beginning
- at hhhh from the original .COM file and prints it as a comment
- beside the macro call. It also generates a Dhhhh: DB 'string'
- instruction at the end of the file. Carriage Returns, Line
- Feeds, TABs and ESCapes are expressed as symbols. All other
- non-printing characters are expressed as hex data bytes. Because
- this will not catch all text strings in the file, you are also
- allowed to specify ranges of DEBUG addresses in which UASM-STR
- is to find all the strings it can. Whenever the code loads
- the DX register with the address of one of these strings, that
- address is converted to a label and the string is added to the
- line as a comment.
-
- From that point on, you must take over and supply the remain-
- ing text strings and variables that are addressed. You should
- heavily comment the code as you go through it and change the
- labels that UASM has assigned into more meaningful names. This
- is best done with the global change command in your text editor.
- I also recommend using the Macro CREF program to obtain a cross
- reference map of the symbols.
-
- These programs are by no means infallible, and they can
- no more read the programmer's mind than you or I, so you will
- have to check the output closely. If you expect to simply run
- UASM and be handed a usable source file you're going to be disap-
- pointed. On the other hand, if you've ever tried to understand
- a program from just a DEBUG listing you will be pleasantly sur-
- prised. UASM will aid you in studying other programs by doing
- a lot of the dirty work for you, but if you don't study the
- code you won't get usable output. For example an interrupt
- handling subroutine will not necessarily be assigned a label
- by UASM-JMP since it is not accessed by a Jump but by an inter-
- rupt. Therefore if you find a DOSCALL SET$INT hhhh in the UASM-
- INT output you must check to see if the label Lhhhh was generated.
- If not, you will have to go back to the DEBUG output to find
- the routine at address hhhh and assign it a label of your own.
-
- At present, UASM-INT only keeps track of the AX, AH, AL,
- DX, and DL registers. Future improvements will involve a more
- complete (and much more complicated) DOSCALL macro in the UASM-
- DOS.MAC file and the proper calling of it by UASM-INT. For
- now, keep a close eye on the interrupts.
-
- I have been using these programs to unassemble DEBUG.COM
- and COMMAND.COM. When I have them sufficiently commented I
- will post them on the BBS's. At present I use mainly the Multi-
- Link BBS at (404) 252-9438. It is my hope that UASM will lead
- to a whole library of well commented, "reverse engineered" source
- code for the MS-DOS operating system and utilities. I would
- appreciate it if anyone else working on the same would upload their
- results to the BBS. Suggestions and improvements are welcome. Please
- post them on the MultiLink BBS or send them directly to:
-
- Guy C. Gordon
- White Crane Systems
- 3194 Friar Tuck Way
- Doraville, GA 30340
-
-
- OPERATING INSTRUCTIONS
- -DEBUG-
-
- As an example, we will unassemble a fictitious file, FILE.COM:
- A>debug file.com
- -r
- .....CX=1780 ... ;file length in hex bytes
- -d 100 l 1780 ;display entire file
-
-
-
- In the listing that follows you should be able to spot ASCII
- text and any regular binary tables. Write down the beginning
- and ending addresses of these, as we do not want to unassemble
- them, but we will want a printed copy. Our aim is to put togeth-
- er a list of all blocks of code to be unassembled and string
- addresses for UASM-STR. Look at the code before each block
- of text. Usually it will be preceded by a hex C3 which is a
- RET instruction, but there may be a JMP, JMPS, IRET, or RETF
- instead. This is the last instruction we want to unassemble
- in the block of code preceding the text. Take your time and
- go through the entire file, unassembling code and making sure
- that the output looks reasonable.
-
- Reasonable code contains such things as CALL or Jump instruc-
- tions to nearby addresses, INT 21 instructions and multiple
- operations on single registers. It does not contain DB instruc-
- tions or very many 00 bytes. Also the ASCII display of a section
- of code will look totally random, with about 50% of it being
- displayable characters. (The rest will be periods.) Peter
- Norton has given a good demonstration of this in chapter 6 of
- "Inside the IBM-PC". One warning--the DEBUG unassembler tends
- to lock into phase with the correct code, which is very nice,
- but be certain that the beginning few instructions are also
- in phase. Sections of code that are in phase will contain Jumps
- and CALLs to other sections, thus telling you where to start
- unassembling.
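
The byte-level hints above (few 00 bytes, roughly half the bytes displayable) can be checked mechanically before deciding what to unassemble. The following is only a rough sketch in Python: the function names, the 20h-7Eh definition of "displayable" and the 0.9 text threshold are assumptions made for illustration, not part of UASM.

```python
def byte_profile(data: bytes) -> tuple:
    """Return (fraction of printable ASCII bytes, fraction of 00 bytes)."""
    if not data:
        return 0.0, 0.0
    # Assumption: "displayable" means 20h..7Eh; DEBUG shows the rest as periods.
    printable = sum(1 for b in data if 0x20 <= b <= 0x7E)
    zeros = data.count(0)
    return printable / len(data), zeros / len(data)


def looks_like_text(data: bytes, threshold: float = 0.9) -> bool:
    """Guess whether a block is a text area rather than code (arbitrary threshold)."""
    printable, _ = byte_profile(data)
    return printable >= threshold
```

A block that scores near 0.5 printable with few zero bytes is a candidate for code; a block scoring near 1.0 is more likely a string area for UASM-STR. The numbers are hints only and do not replace reading the listing.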
-
- At the end of this investigation of the .COM file you should
- have a list of the starting and ending addresses of all the
- code blocks and all the string blocks. The next step depends
- upon whether you have DOS 2.0 or not. It is much easier if
- you have 2.0, or can do this part on the machine of a friend who
- has it. This is because under DOS 2.0 we can pipe the output
- of DEBUG into a file thus capturing the unassembled code for
- input to UASM-JMP. Under DOS version 1 we must modify DEBUG
- (using DEBUG of course) to get it to write the file we need.
-
-
- DEBUG - 2.0 Instructions
-
- Create a file, FILE.IN, with the following DEBUG instruc-
- tions:
-
- u addr 1 addr 2 ;addresses of blocks of
- u addr 3 addr 4 ; code to unassemble
- u addr 5 addr 6 ; from our initial investigation
- q ;Quit instruction at end
-
- Now we can run DEBUG and pipe the output to a disk file
-
- DEBUG FILE.COM <FILE.IN >FILE.DB
-
- FILE.DB is the input for UASM-JMP.
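
If the list of code blocks is long, FILE.IN can be generated rather than typed. This is a minimal Python sketch; the function name and the CR/LF line endings are assumptions, and the addresses in the usage note are simply the ones used in the DEBUG 1.1 example in this document.

```python
def debug_script(code_blocks):
    """Build the text of FILE.IN from (start, end) offsets of code blocks."""
    lines = [f"u {start:04X} {end:04X}" for start, end in code_blocks]
    lines.append("q")  # Quit instruction at end, so DEBUG returns to DOS
    # Assumption: DOS-style CR/LF line endings.
    return "\r\n".join(lines) + "\r\n"
```

For example, `debug_script([(0x100, 0x4D5), (0x6B0, 0x799)])` yields the three lines `u 0100 04D5`, `u 06B0 0799` and `q`.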
-
-
- DEBUG - 1.1 Instructions
-
- While it is quite easy to capture the output of DEBUG under
- DOS 2.0 since we can pipe it to a file, under earlier versions
- of DOS we have no such option. However, DEBUG is an exceptional-
- ly powerful program, and already contains the code necessary
- to write a disk file with the Write command. We will use this
- to capture the Unassembled code.
-
- If we unassemble and examine DEBUG, we can find the following
- subroutine:
-
- 02C8:02C0 PUSH AX ;save registers
- PUSH DX
- AND AL,7F ;ensure character is ASCII
- XCHG DX,AX ;put character in DL
- MOV AH,02 ;DOS Function 2 to display DL
- INT 21
- POP DX ;restore registers
- POP AX
- RET ;return
-
- As it turns out, DEBUG does all of its screen output through
- this subroutine. Thus we can modify just this subroutine and
- capture each character as it is displayed. What we will do
- with it is write it out to an unused portion of memory. From
- there we can write all the output to a file using the Write
- command.
-
- Our subroutine to store character AL in consecutive memory
- locations will be very small--about 20 bytes. We'll need some-
- place to put it. For DEBUG 1.07 I chose to put it inside a
- string which is only printed once--the message "DEBUG version
- 1.07" located at 0102. Here is the subroutine:
-
- 02C8:0102 DW 3300 ;pointer to memory
- PUSH DI ;save index register
- SEG CS ;offset from code, not ES
- MOV DI,[0102] ;get pointer
- SEG CS ;
- STOSB ;store char in AL into memory
- SEG CS ;
- MOV [0102],DI ;store incremented pointer
- POP DI ;restore register
- XCHG DX,AX ;complete the instructions that
- MOV AH,02 ; CALL to this routine replaced
- RET ;Return to Display routine
-
- We can store this subroutine over the string with the Enter
- command (here 02C8 is the base address where DEBUG is loaded):
-
- E 2C8:102 00 33 57 2E 8B 3E 02 01 2E AA 2E 89 3E 02 01 5F 92 B4 02 C3
-
- We can check that this was entered correctly by Unassembling
- it:
-
- U 2C8:104 ;you should see the subroutine listed above.
-
- The choice of memory location is up to you. 3300 is the
- value I used while unassembling DEBUG. It should be larger
- than the sum of the sizes (in bytes) of DEBUG and the program
- you are unassembling. To have this subroutine called each time
- DEBUG writes a character, we insert a subroutine Call:
-
- E 2C8:2C4 E8 3D FE ;Call 0104
-
- This puts a CALL 0104 in place of XCHG DX,AX and MOV AH,02.
- That is why we perform those instructions before returning to
- the display routine. The very next character printed by DEBUG
- after you Enter the above command will be stored in location
- 2C8:3300 as well as displayed on the screen.
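
The three bytes E8 3D FE follow from the way the 8088 encodes a near CALL: opcode E8, then a 16-bit displacement (low byte first) measured from the address of the instruction after the CALL. If your display routine or capture subroutine sits at a different offset, the bytes must be recomputed. A small Python sketch of that arithmetic (the function name is hypothetical):

```python
def near_call_bytes(call_offset: int, target_offset: int) -> bytes:
    """Encode CALL target_offset for a near CALL placed at call_offset."""
    next_ip = (call_offset + 3) & 0xFFFF          # the CALL itself is 3 bytes long
    disp = (target_offset - next_ip) & 0xFFFF     # wraps around within the segment
    return bytes([0xE8, disp & 0xFF, disp >> 8])  # low byte of displacement first
```

With the values used here, `near_call_bytes(0x2C4, 0x104)` gives E8 3D FE: the next instruction is at 02C7, and 0104 - 02C7 wraps to FE3D.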
-
-
- Immediately after entering the CALL instruction above you
- should begin the Unassemble commands that you determined will
- give you all the code for the program.
-
- U 100 4D5
- U 6b0 799
- etc.
- D 2C8:102 103 ;This displays the pointer to the end of text
- B3 D9 ;This means we filled memory to D9B3
- ;(remember the 8088 stores words backwards)
- H D9B3 3300 ;Hex arithmetic
- 0CB3 A6B3 ; D9B3 - 3300 = A6B3
- R CX
- CX=1748
- :A6B3 ;load CX register with number of bytes to write
- N FILE.DB ;name the output file
- W 2C8:3300 ;start writing at 3300 off. from DEBUG base
- Writing A6B3 bytes
- E 2C8:102 00 33 ;reset pointer if out of space
-
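
The pointer arithmetic in the session above can be double-checked away from DEBUG. The Python sketch below (function names are hypothetical) reads the two displayed bytes as a low-byte-first word and reproduces what the H command prints, namely the sum and the difference, each truncated to 16 bits.

```python
def captured_length(pointer_bytes: bytes, buffer_start: int) -> int:
    """Number of bytes captured, given the pointer as displayed by D (low byte first)."""
    end = int.from_bytes(pointer_bytes, "little")
    return (end - buffer_start) & 0xFFFF


def debug_h(a: int, b: int) -> tuple:
    """Mimic DEBUG's H command: 16-bit sum and difference of two hex values."""
    return (a + b) & 0xFFFF, (a - b) & 0xFFFF
```

Here `captured_length(bytes([0xB3, 0xD9]), 0x3300)` is A6B3, the value loaded into CX, and `debug_h(0xD9B3, 0x3300)` is (0CB3, A6B3), matching the H output shown.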
- Remember, you can only write text to memory up to 2C8:FFFF.
- If you exceed that you will write over DEBUG at 2C8:0000 and
- will probably have to re-boot. If FILE.COM is too big to Unas-
- semble in one pass you'll have to do it in pieces and append
- them together with your text editor. For this reason it is
- a good idea to modify and save a copy of DEBUG under another
- name such as UDEBUG. If you need to perform any other operations
- with a modified DEBUG that you do not want written to memory
- you can restore DEBUG to normal operation with:
-
- E 2C8:2C4 92 B4 02 ;restores XCHG DX,AX and MOV AH,02
-
- Now text edit FILE.DB and remove any extraneous lines such
- as debug prompts that might have been displayed. If there are
- any TABs in FILE.DB they will confuse UASM-JMP and the others.
- DEBUG 1.1 appears to put a TAB after each instruction while
- version 2.0 does not. I always use the text editor to change
- all TABs to the appropriate number of spaces. (Users of PMATE,
- use the YF command.)
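
If your editor has no such command, the TABs can be expanded with a few lines of Python. This sketch assumes tab stops every 8 columns, which may not match the DEBUG output on every system; the function name is hypothetical.

```python
def detab(text: str, tabsize: int = 8) -> str:
    """Replace each TAB with spaces up to the next tab stop (column resets at line ends)."""
    return text.expandtabs(tabsize)
```

For instance, `detab("MOV\tAH,02")` places the operand at column 8, giving `MOV` followed by five spaces and `AH,02`.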
-
- Any of the memory addresses above may vary with your operat-
- ing system and DEBUG version. The values given are for the
- Victor 9000, MS-DOS 1.25a, and DEBUG 1.07. The Base Segment
- where DEBUG is loaded (2C8 above) will depend upon your machine
- and operating system, and is found by using DEBUG to Search
- for itself in memory. The display subroutine (2C0 above) depends
- upon your DEBUG version number. The same subroutine occurs
- at 2B5 in the DEBUG that comes with PC-DOS 1.10, and will appear
- near these locations in any other version 1 DEBUGs. If you
- store the capture subroutine at some other place in memory you
- need to change the two [0102] references and the CALL 0104 in-
- struction.
-
- UASM-JMP Instructions
-
- Run UASM-JMP as you would any BASIC program. It will prompt
- you for the names of the input and output files. Respond with
- FILE.DB, which we created above, and B:FILE.JMP for output.
- If file extensions are not provided, .DB and .JMP will be assumed
- for input and output respectively. Also the output file name
- will default to the input file name. I highly recommend putting
- these files on separate drives if you don't have a fixed disk.
- This will speed up the program and save wear on your floppies.
-
- UASM-JMP will make two passes through the input file. On
- the first pass it will build a list of all referenced lines.
- It then sorts this list (shell sort), eliminates duplicate ref-
- erences, and on the second pass, labels all of the references.
- The output will be displayed on your screen as well as written
- out on the second pass.
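
The two passes can be pictured with the following Python sketch. It is an illustration of the idea only, not a translation of UASM-JMP.BAS: it assumes each DEBUG line has the form "ssss:oooo hexbytes mnemonic operand", it uses a set and Python's built-in sort in place of the shell sort and duplicate elimination, and it omits the blank lines and forced labels inserted after unconditional transfers. All names are hypothetical.

```python
import re

# Assumed layout of one DEBUG line: segment:offset, hex bytes, mnemonic, operand.
LINE = re.compile(r"^[0-9A-F]{4}:([0-9A-F]{4})\s+[0-9A-F]+\s+(\S+)\s*(.*)$")
TARGET = re.compile(r"^[0-9A-F]{4}$")
TRANSFER = ("J", "CALL", "LOOP")  # mnemonic prefixes for Jump, Call and Loop


def label_listing(lines):
    """Return (labelled source lines, referenced addresses with no code)."""
    parsed, refs = [], set()
    # Pass 1: collect every address referenced by a Jump, Call or Loop.
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue
        offset, mnem, operand = m.groups()
        parsed.append((offset, mnem, operand))
        if mnem.startswith(TRANSFER) and TARGET.match(operand):
            refs.add(operand)
    # Pass 2: drop the address and opcode columns, add Lhhhh labels.
    out = []
    for offset, mnem, operand in parsed:
        if mnem.startswith(TRANSFER) and operand in refs:
            operand = "L" + operand
        label = f"L{offset}:" if offset in refs else ""
        out.append(f"{label:<8}{mnem:<8}{operand}".rstrip())
    present = {offset for offset, _, _ in parsed}
    missing = [t for t in sorted(refs) if t not in present]
    return out, missing
```

The second value returned lists targets for which no line exists in the input, which corresponds to the "No code for this label" situation described below.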
-
- If the program finds a Jump or CALL to an address not con-
- tained in the file you will get the message "WARNING! No code
- for this label". This most likely means you missed the block
- of code starting at address hhhh and will have to add it to
- the input file for DEBUG. The statement after an unconditional
- program transfer (JMP or RET) is always labeled. The message
- "WARNING! This label not referenced" means that there is no
- Jump or CALL to this label. It might be an interrupt handler,
- or it might just be left over code in a modified program. A
- large number of these errors might indicate that they are ac-
- cessed by an address table. Both of the above errors might
- occur if you miss a block of code, unassemble a data area, or
- the code modifies itself.
-
-
- UASM-INT Instructions
-
- To run UASM-INT you must also have the data file UASM-DOS.MAC
- on the default drive. UASM-INT will prompt you for the input
- and output file names. If extensions are not provided, .JMP
- and .INT will be assumed for input and output respectively.
- The program then loads the symbol table contained in UASM-
- DOS.MAC. While reading through FILE.JMP, whenever UASM-INT
- encounters an INT instruction it adds a Macro call, Symbols
- for the DOS function calls, and Comments from the UASM-DOS.MAC
- file. These lines will also be displayed on the screen as the
- program progresses. Note that the DOSCALL Macro is inserted
- in the text, but the INT instruction is not deleted. After
- you have checked the code you must delete the INT and any MOV
- instructions that will be duplicated by the Macro.
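
To show what "keeping track of registers" amounts to, here is a much reduced Python sketch. It follows only immediate loads of AX, AH and DX, and its table holds just the six function names that appear in the sample output at the end of this document; the real symbol table lives in UASM-DOS.MAC, and the real rules of UASM-INT are certainly more involved. All Python names are hypothetical.

```python
import re

# Subset of DOS function symbols, taken from the sample output in this document.
DOS_FUNCTIONS = {
    0x09: "PRINT$", 0x0A: "INSTR$", 0x1A: "SET$DTA",
    0x25: "SET$INT", 0x26: "BUILD$PS", 0x29: "PARSE$",
}
MOV = re.compile(r"^MOV\s+(AX|AH|DX),([0-9A-F]{1,4})$")
LABEL = re.compile(r"^\w+:\s*")


def annotate(lines):
    """Copy lines, adding a DOSCALL line after each INT 21 whose AH value is known."""
    ah = dx = None
    out = []
    for line in lines:
        out.append(line)                      # the INT itself is kept, not deleted
        text = LABEL.sub("", line.strip())    # ignore a leading Lhhhh: label
        m = MOV.match(text)
        if m:
            reg, value = m.group(1), int(m.group(2), 16)
            if reg == "AX":
                ah = value >> 8               # AH is the high byte of AX
            elif reg == "AH":
                ah = value
            else:
                dx = value
        elif text == "INT 21" and ah in DOS_FUNCTIONS:
            arg = f" {dx:04X}" if dx is not None else ""
            out.append(f"DOSCALL {DOS_FUNCTIONS[ah]}{arg}")
    return out
```

Note that AH is deliberately not forgotten after an INT 21: in the sample output, a second SET$INT call reloads only AL and DX and relies on AH still holding 25.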
-
- UASM-STR Instructions
-
- To run UASM-STR you must have the original FILE.COM or other
- binary file on disk. The program will prompt you for the input,
- output, and binary file names. These will default to .INT,
- .STR and .COM if no other extension is given. As usual, the
- input file name will be used as a default if you do not specify
- the others, and you should put the output file on a different
- floppy drive than the input file.
-
- You will then be prompted for any string area addresses
- that you may have found while examining FILE.COM with DEBUG.
- You may enter an address range (hhhh kkkk) or the address of
- a single string (hhhh) on each line (up to ten lines). Each
- address must be a four digit hex offset (taken directly from
- DEBUG). Upon receiving a blank line as input, the program will
- find all strings terminated with a $ starting at the first
- address in a range and continue finding multiple strings to
- the second address if present. If a single address is given
- on a line a single string will be read. Each string is dis-
- played as it is found.
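
The search for $-terminated strings can be sketched as follows in Python. This is an assumption-laden illustration rather than the UASM-STR code: the function name is hypothetical, and it treats a .COM file as loaded at offset 100H, so a DEBUG address hhhh corresponds to byte hhhh - 100H of the file.

```python
def find_strings(image: bytes, start: int, end=None, origin: int = 0x100):
    """Return [(DEBUG address, string bytes including '$')] for one input line.

    With only `start`, a single string is read; with `end`, strings are read
    for as long as the next one begins at or before `end`.
    """
    found = []
    pos = start - origin                       # file position of the DEBUG address
    limit = pos if end is None else end - origin
    while 0 <= pos <= limit and pos < len(image):
        stop = image.find(b"$", pos)
        if stop == -1:                         # no terminator left in the file
            break
        found.append((pos + origin, image[pos:stop + 1]))
        pos = stop + 1                         # next string starts after the '$'
    return found
```

Given a file image in which 'Disk$' starts at 0101 and 'Write protect$' at 0106, `find_strings(image, 0x101, 0x113)` returns both strings, while `find_strings(image, 0x106)` returns only the second.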
-
- Following this the program reads through FILE.INT. For
- each "DOSCALL PRINT$ hhhh" encountered it reads the string
- from FILE.COM at the specified location (taking into account
- the 100H byte program prefix) and prints that string as a comment
- next to the Macro. Also, each time the DX register is loaded
- with the address of a string, that string is shown next to the
- code. At the end of the file, UASM-STR will append a number
- of EQUates and Data statements and define the string variables
- with names Dhhhh. Non-printing characters are converted into
- hex bytes. CR, LF, TAB, ESC, and $ are defined as symbols.
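
As an illustration of the conversion just described, the Python sketch below turns the bytes of one string into the operand list of a DB statement. It was written to reproduce the DB lines shown in the sample output, including the way a $ stays inside the quotes when it directly follows text; that rule is inferred from the sample, not taken from the UASM-STR source. Two choices are assumptions of this sketch: an apostrophe is emitted as a hex byte, and a hex byte beginning with a letter gets a leading 0 so the assembler reads it as a number.

```python
SYMBOLS = {0x0D: "CR", 0x0A: "LF", 0x09: "TAB", 0x1B: "ESC", 0x24: "$"}


def db_operands(data: bytes) -> str:
    """Format string bytes as DB operands: quoted text, symbols and hex bytes."""
    parts, run = [], ""
    for b in data:
        printable = 0x20 <= b <= 0x7E and b != 0x27
        # A '$' is kept inside the quotes only when it continues a run of text.
        if printable and (b != 0x24 or run):
            run += chr(b)
            continue
        if run:
            parts.append(f"'{run}'")
            run = ""
        if b in SYMBOLS:
            parts.append(SYMBOLS[b])
        else:
            text = f"{b:02X}"
            parts.append(text if text[0].isdigit() else "0" + text)
    if run:
        parts.append(f"'{run}'")
    return ",".join(parts)
```

For example, the bytes of the "Program terminated normally" message come out as `CR,LF,'Program terminated normally',CR,LF,$`.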
-
- SAMPLE OUTPUT - Excerpts from DEBUG.STR
-
- INCLUDE UASM-DOS.MAC
- .RADIX 16
-
- START: JMPS L011D
-
- L011D: MOV SP,1822
- MOV [1897],AL
- MOV DX,0102
- MOV AH,09
- INT 21
-
- DOSCALL PRINT$,D0102 ;CR,LF,'DEBUG-86 version 1.07',CR,LF,$
- MOV AX,2522
- MOV DX,01E6
- INT 21
-
- DOSCALL SET$INT 01E6 ; Set interrupt vector (AL=INT, DS:DX=VECTOR)
-
- MOV AL,23
- MOV DX,01EB
- INT 21
-
- DOSCALL SET$INT 01EB ; Set interrupt vector (AL=INT, DS:DX=VECTOR)
-
- MOV DX,CS
- ADD DX,01AB
- MOV AH,26
- INT 21
-
- DOSCALL BUILD$PS 01AB ; Create new program segment (DX=SEGMENT)
-
-
- MOV AX,DX
- MOV DI,1832
- STOSW
- MOV DX,0080
- MOV AH,1A
- INT 21
-
- DOSCALL SET$DTA 0080 ; Set Disk Transfer Address to DX
-
- MOV AX,[0006]
- MOV BX,AX
- CMP AX,FFF0
- PUSH CS
- POP DS
- ADD [0008],BX
- MOV DI,005C
- MOV SI,0081
- MOV AX,2901
-
- INT 21
-
- DOSCALL PARSE$ ; Parse Filespec (SI -> LINE, DI -> FCB, AL=CODE)
-
- CALL L0917
- PUSH CS
- POP ES
- CMP B,[005D],20
- JZ L01B5
- JMPS L01B5
-
- L01E3: JMP L04CB
-
- L01E6: MOV DX,167A ;WARNING! This label not referenced
- MOV DS,AX
- MOV SS,AX
- MOV SP,1822
- MOV AH,09
- INT 21
-
- DOSCALL PRINT$ ; Display string @DX till terminator
-
- JMPS L01B5
-
- L01FD: MOV AH,0A
- MOV DX,1844
- INT 21
-
- DOSCALL INSTR$ 1844 ; Input keyboard string (DX -> size,cnt,buffer)
-
- MOV SI,1846
- ;END CODE
- .RADIX 16
- CR EQU 0D
- LF EQU 0A
- TAB EQU 09
- ESC EQU 1B
- $ EQU 24
- D167A DB CR,LF,'Program terminated normally',CR,LF,$
- D169A DB 'Invalid drive or file name',CR,LF,$
- D16B7 DB 'File not found',CR,LF,$
- D16C8 DB 'No room in disk directory',CR,LF,$
- D16E4 DB 'Insufficient space on disk',CR,LF,$
- D1701 DB 'Disk$'
- D1706 DB 'Write protect$'
- D1714 DB ' error reading drive A',CR,LF,$
- D172D DB 'readwritInsufficient memory',CR,LF,$
- D174B DB '^ Error',CR,8A,' ',88,'Error in EXE/HEX file',CR,LF,$
- D176E DB 'EXE/HEX file cannot be written',CR,LF,$
- D178F DB 'Writing $'
- D1798 DB ' bytes',CR,LF,$
- D0102 DB CR,LF,'DEBUG-86 version 1.07',CR,LF,$
-
-