- Dis - an AmigaDOS File Disassembler/Dumper
-
-
- General
-
- Dis is a program which will symbolically disassemble AmigaDOS object
- files, whether unlinked or fully linked. It can also operate in a non-
- disassembly mode and just dump the hunks in the files, or even just
- list which hunks are found. It uses my 'disassemble.library' as the
- disassembler, so it has all of the features of that library, including
- support for the full MC68000 family, label generation, controllable
- output format, etc. It also has special support for the case/switch
- constructs generated by several compilers, and hooks to allow better
- disassembly of AmigaDOS libraries and devices. 'Dis' will also generate
- appropriate output for data and bss hunks.
-
- The simplest way to use Dis is:
-
- Dis <object-file>
-
- Dis itself has been compiled with symbolic information left in, so a
- good sample of what it does can be seen by having it disassemble
- itself, as in
-
- Dis Dis
-
- Some interesting things can be seen from the disassembly. First, it was
- linked with an old version of BLink, since the first hunk is an extra
- text hunk added to just branch to the actual starting point. Second, it
- was not linked with 'smalldata', since there are multiple bss sections.
- Third, it was linked with 'smallcode', since there is only one text
- hunk (other than the one BLink added). The first few hunks are bss
- hunks that correspond to file level variables in the source files for
- Dis. Symbolic information for them was present, so Dis was able to show
- the individual variables. (This symbolic information is, unfortunately,
- only available with my Beta 1.3 version of Draco, and not with any
- version that has been distributed.)
-
- Hunk #1 is the space used by the run-time system to save the contents
- of registers D0 and A0, so that the command-line parameters can be
- accessed. Hunks 2, 3, 4, 5, and 6 are file-level storage for the
- various source files of Dis. Hunks 7 and 8 come from the interface
- libraries for dos.library and exec.library. Hunk 9 is the main code
- hunk. It will take a while to disassemble.
-
- The lines reading "Loop 1: 594 new labels.", etc. need some
- explanation. They are to show that Dis is actually working and hasn't
- gone away somehow. In its default mode of operation, it takes multiple
- passes through each text hunk, initially assuming that only the first
- part is code. It runs through all parts known to be code, identifying
- all branches, jumps, calls, etc. Each branch or jump target is taken
- to be code. If the branch is unconditional or is a jump, the bytes
- immediately after it are not assumed to be code unless some other
- instruction branches to them. Subroutine returns of various forms are
- handled the same way. This process is repeated over the entire hunk
- until no new targets are found. A final pass then produces the actual
- output. See the flag descriptions below for ways of altering this
- behavior. The goal behind this
- special handling is to accurately distinguish between actual
- instructions in a text hunk and non-instructions in the hunk. Other
- things that can be in a text hunk are constants of various types,
- tables of labels used for switch/case statements, etc.
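
The multi-pass scan described above can be sketched roughly as follows. This is an illustration in Python, not the code Dis uses: the instruction table, the kind names ('op', 'bcc', 'bra', 'rts') and the function find_code are all made up for the example, and real MC68000 decoding is replaced by a simple lookup table.

```python
def find_code(instrs, entry=0):
    """Return (code_offsets, labels) for a toy text hunk.

    instrs maps offset -> (size, kind, target). Hypothetical kinds:
    'op' and 'bcc' fall through; 'bra', 'jmp' and 'rts' do not.
    """
    labels = {entry}          # offsets known to start code
    passes = 0
    while True:
        passes += 1
        code = set()          # offsets decoded as instructions this pass
        found = set()         # branch targets seen this pass
        for start in labels:
            pc = start
            while pc in instrs and pc not in code:
                size, kind, target = instrs[pc]
                code.add(pc)
                if target is not None:
                    found.add(target)
                if kind in ("bra", "jmp", "rts"):
                    break     # what follows is not assumed to be code
                pc += size
        new = found - labels
        print(f"Loop {passes}: {len(new)} new labels.")
        if not new:
            return code, labels
        labels |= new


# A made-up hunk: offset 6 follows an unconditional branch and is
# never a branch target, so it should be left as data.
SAMPLE = {
    0: (2, "op", None),
    2: (2, "bcc", 8),
    4: (2, "bra", 12),
    6: (2, "op", None),
    8: (2, "op", None),
    10: (2, "rts", None),
    12: (2, "op", None),
    14: (2, "rts", None),
}

if __name__ == "__main__":
    code, labels = find_code(SAMPLE)
    print(sorted(code), sorted(labels))
```

Each pass rescans from every known label, which is simple but slow; that cost is the reason a progress line per pass is worth printing.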
-
- As you can see, symbolic information is used to provide labels, which
- here are procedure names, and symbolic addresses of variables. This
- type of information can be produced by various compilers - see the
- instructions for the compiler for how to do it. Often, however, fully
- linked object files have little or no symbolic information left in
- them. In these cases Dis will not be able to show symbolic labels and
- variable names. It will still be able to generate labels for referenced
- locations, however. Variable or data references (or references to code
- in another hunk) will be shown as a hunk-number/offset pair.
-
- Since unlinked object files contain references, by name, to symbols
- defined elsewhere, Dis is able to produce symbolic information in a
- disassembly even though explicit HUNK_SYMBOLs are not present. This
- type of disassembly is one of my main uses for Dis - to see what code
- my compiler is generating in test cases.
-
- There are many more special cases in Dis. Those interested in the gory
- details are invited to examine the source, which is included here.
- In particular, special handling is done for two types of Draco case
- statements, two types of Lattice C switch statements, one type of Aztec
- C switch statement, and a switch statement from some other compiler,
- which I assume is the Green Hills compiler used by Commodore.
-
- Dis can be stopped at any time by typing control-C. It may take a few
- seconds to stop if it is in the initial phase of processing symbolic
- and relocation information for a large text hunk. Those who get
- impatient with this phase are invited to rewrite the code to make it
- quicker. Perhaps a binary tree of relocation offsets would work. The
- culprit routines are 'insertRelocs' and 'mergeRelocs' in 'symbol.d'.
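
As a starting point for such a rewrite, here is a minimal sketch in Python of keeping relocation offsets ordered. It uses a sorted list with binary search rather than a binary tree, and the names insert_reloc and merge_relocs are hypothetical: they only echo the routine names above and do not reproduce what the routines in 'symbol.d' actually do.

```python
import bisect
from heapq import merge


def insert_reloc(relocs, offset):
    """Insert offset into the sorted list relocs, skipping duplicates."""
    i = bisect.bisect_left(relocs, offset)      # binary search
    if i == len(relocs) or relocs[i] != offset:
        relocs.insert(i, offset)


def merge_relocs(a, b):
    """Merge two sorted offset lists into one sorted list, no duplicates."""
    out = []
    for off in merge(a, b):                     # linear-time merge
        if not out or out[-1] != off:
            out.append(off)
    return out


if __name__ == "__main__":
    relocs = []
    for off in (0x20, 0x04, 0x10, 0x04):
        insert_reloc(relocs, off)
    print([hex(r) for r in relocs])
    print(merge_relocs([4, 16], [8, 16, 32]))
```

The search is logarithmic, but inserting into the middle of a list still shifts the elements after it, so a real tree may pay off for very large hunks.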
-
-
- Dis Options
-
- Several flags can be given to Dis to control how it works. A brief
- summary of them can be obtained by entering
-
- Dis -?
-
- In more detail, the flags are:
-
- -h - show all hunks. Some hunks, such as HUNK_NAME, HUNK_UNIT,
- HUNK_END, are of interest only to those who have to deal with
- object files directly, so Dis does not normally show them.
- Specifying '-h' will force it to do so.
-
- -x - raw dump of hunks. In this mode of operation, no disassembly or
- symbolic output is done. Instead, each hunk is dumped in a raw
- form. Data and text hunks are dumped in hexadecimal and ascii.
- Again, this form is probably only of use to those who need to know
- the details of AmigaDOS object files.
-
- -a - absolute. By default, Dis will use relative addresses on
- disassembly. This flag forces it to use absolute (hunk-relative)
- addresses. For text hunks with symbolic information, typically
- generated by a compiler, Dis will display position information and
- branch labels as relative to the most recent symbolic label, which
- means, for high-level languages, relative to the current function.
- With '-a' specified, all addresses and labels are expanded to 32
- bits and are relative to the beginning of the text hunk.
-
- -s - show hex addresses. If Dis can find a symbolic value to show for
- an address, it will use that, rather than a hex value. '-s' forces
- it to show both. This can be useful if Dis is getting confused by
- some symbolic information, and is using it in inappropriate places.
-
- -f - force disassembly of all code. As explained previously, Dis by
- default only disassembles those parts of text hunks that it thinks
- are "reachable" from the initial code in the hunk. This method can
- miss parts of the code if those parts are not referenced directly.
- This is sometimes because a function is not being called, but can
- also be because parts of the code are only referenced indirectly,
- through a function table or jump table of some kind. Specifying
- flag '-f' forces Dis to treat the entire text hunk as code. This
- can lead to a mess, however, since most strings and constants don't
- disassemble very well.
-
- -n - Dis normally tries to generate labels for all branch and jump
- targets, and to use those labels at the branch points. Specifying
- '-n' tells it to not do that. Branches will then display both the
- hex form of the target and, as appropriate, the PC-relative form.
-
- -w - the previously mentioned technique of doing multiple scans of the
- code to find out what is code and what is data can be quite time
- consuming. Flag '-w' tells Dis to not work so hard. If '-n' is not
- specified, it will do a single pre-pass to identify labels, and
- then disassemble with what it has found by then. This will often
- work, but is confused by switch/case statements. If '-n' is
- specified along with '-w', then Dis will do no prescans, but will
- just disassemble in one pass. This can cut down on the time quite a
- bit, but can also miss a lot. Adding '-f' to the flag list gives a
- raw, non-labelled disassembly of the entire text hunk.
-
- -l - AmigaDOS libraries (in LIBS:) and devices (in DEVS:) have a very
- specific format, with a couple of header sections which point to
- the routines of the library/device. Flag '-l' tells Dis that it is
- looking at one of these. With this flag, it will try to find the
- special parts, and treat them separately. The code to do this works
- well for libraries generated by my 'fdCompile' program, but
- sometimes has trouble with other libraries/devices. (The difficulty
- seems to be that they put parts of the special stuff a long way
- down in the code, so that Dis doesn't find it early enough. Also,
- it seems that some of the Commodore-supplied ones don't follow the
- form laid out in the ROM Kernel Manuals.)
-
- -r - this flag tells Dis to show the relative form of branches and data
- references as well as a symbolic or label form.
-
- -p - by default, Dis shows a hexadecimal offset to the left of the
- disassembly it generates. This flag suppresses that part, which
- makes the output closer to something that an assembler will like.
-
- -c - by default, Dis will capitalize any non-MC68000 instructions or
- addressing forms that it finds. This is useful for people with
- compilers that can handle the MC68020/MC68881, etc. Specifying '-c'
- will suppress that capitalization.
-
- -q - this flag enables a quick summary mode. A single line only is
- printed for each hunk found in the object file.
-
- -d - a fully linked object file with no symbolic information has jumps
- and branches within it - to subroutines, run-time library routines,
- interface stubs, etc. One of the special cases in Dis is for a type
- of case statement in Draco which JMPs to a run-time routine to do a
- binary search through a sorted table. If no symbolic information is
- present, Dis has no way to distinguish this JMP from others. Flag
- '-d' tells Dis to treat all such JMPs as JMPs to those special run-
- time system routines. If you know that an object file is not fully
- linked, isn't compiled by Draco, or has symbolic information, this
- flag will not be needed.
-
- -k - problems similar to the above can occur in other object files with
- "unusual" code. Flag '-k' tells Dis to assume that the entire text
- hunk is code when doing prepasses to find labels. The normal checks
- for code vs. data will be done when the actual disassembly is
- produced. The disadvantage of this flag is that spurious labels can
- be produced, and occasionally, data will be treated as code.
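
For readers unfamiliar with the kind of output the '-x' flag describes, a hexadecimal-plus-ascii dump can be sketched in a few lines of Python. The column layout here is an assumption made for the example; it is not the format Dis prints.

```python
def hex_dump(data, width=16):
    """Return dump lines: offset, hex bytes, then printable ascii."""
    pad = width * 3 - 1                  # width of a full hex column
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hexpart = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as '.'
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{off:06x}: {hexpart:<{pad}}  {text}")
    return lines


if __name__ == "__main__":
    for line in hex_dump(b"Dis\x00 disassembles AmigaDOS hunks\n"):
        print(line)
```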
-
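
The difference between the default relative addresses and the '-a' form can be illustrated with a small Python sketch. The function format_address, the symbol list and both output formats are assumptions made for the example; only the idea (offset from the most recent symbolic label versus a 32-bit offset from the start of the hunk) comes from the description of '-a' above.

```python
import bisect


def format_address(offset, symbols, absolute=False):
    """Format a text-hunk offset.

    symbols is a list of (offset, name) pairs sorted by offset.
    """
    if absolute or not symbols:
        return f"{offset:08x}"           # 32 bits, from start of hunk
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, offset) - 1
    if i < 0:
        return f"{offset:08x}"           # before the first symbol
    base, name = symbols[i]
    return f"{name}+{offset - base:04x}"  # relative to latest symbol


if __name__ == "__main__":
    syms = [(0x00, "main"), (0x40, "usage")]
    print(format_address(0x46, syms))
    print(format_address(0x46, syms, absolute=True))
```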
-
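
A one-line-per-hunk listing of the kind '-q' produces can be approximated by walking the hunk structure directly. The Python sketch below handles only a handful of common hunk types and stops with an error on anything else; the function list_hunks and its output tuples are made up for the example and are not how Dis itself is written.

```python
import struct

HUNK_NAMES = {
    0x3E7: "HUNK_UNIT", 0x3E8: "HUNK_NAME", 0x3E9: "HUNK_CODE",
    0x3EA: "HUNK_DATA", 0x3EB: "HUNK_BSS", 0x3EC: "HUNK_RELOC32",
    0x3F0: "HUNK_SYMBOL", 0x3F1: "HUNK_DEBUG", 0x3F2: "HUNK_END",
    0x3F3: "HUNK_HEADER",
}


def list_hunks(data):
    """Return a list of (hunk name, size in bytes) for an object file."""
    pos = 0

    def long():
        # Read one big-endian 32-bit longword and advance.
        nonlocal pos
        (value,) = struct.unpack_from(">I", data, pos)
        pos += 4
        return value

    found = []
    while pos < len(data):
        kind = long() & 0x3FFFFFFF       # top two bits: memory flags
        name = HUNK_NAMES.get(kind)
        if name is None:
            raise ValueError(f"unsupported hunk type 0x{kind:x}")
        size = 0
        if kind == 0x3F3:                # load-file header
            while (n := long()) != 0:    # resident library names
                pos += 4 * n
            long()                       # hunk table size
            first, last = long(), long()
            pos += 4 * (last - first + 1)    # one size per hunk
        elif kind in (0x3E7, 0x3E8, 0x3E9, 0x3EA, 0x3F1):
            size = 4 * long()            # length in longwords, then data
            pos += size
        elif kind == 0x3EB:
            size = 4 * long()            # bss: a length but no data
        elif kind == 0x3EC:
            while (n := long()) != 0:
                pos += 4 * (n + 1)       # target hunk number + n offsets
        elif kind == 0x3F0:
            while (n := long()) != 0:
                pos += 4 * ((n & 0xFFFFFF) + 1)  # name + value
        found.append((name, size))
    return found


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:
        for name, size in list_hunks(f.read()):
            print(f"{name:<14}{size:>8}")
```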
- Some Sample Disassemblies to Try
-
- dis dis
- dis -dl libs:disassemble.library
- dis -lkf libs:mathieeedoubtrans.library
-     if you have enough patience, you can see the FPU
-     instructions start at offset 0x385c
- dis c:format
-     What? you don't have format copied to your C: directory so
-     you can use it from CLI? Note the empty code and data hunks
-     - must be a funny compiler Commodore uses.
- dis -q c:format
- dis -x c:format
-     So does the trashcan icon look like a trashcan in hex?
-
-
- Expansion
-
- My original need was for a disassembler that would just nicely
- disassemble the object files my compiler produces for small test files.
- Dis has grown considerably from that initial requirement. As with any
- program like this, there is still much that could be done:
-
- - better heuristics for distinguishing code from data in text hunks
- - better handling of libraries and devices that don't follow the
- model that Dis likes
- - flags to allow some hunks to be skipped
- - flags to force certain regions to be treated as code or data
- - an interactive mode
- an Intuition interface
- - flags to force output to be directly acceptable to assemblers
-
- I don't plan to do any of these things myself. Some minor changes may
- get done if something becomes a nuisance. Since the full source to Dis
- is included, however, others are invited to try their hand at some of
- them. I would especially be interested in improvements to the code/data
- distinguishing code.
-