-
-
-
-
-
-
- DISTOMWC: By Howard C. Johnson
- 30 Roosevelt Ave.
- Morganville, NJ 07751
-
-
- PURPOSE
-
- Disassemble DRI formatted files to Mark Williams C (MWC)
- assembler directives. This produces output similar to as68toas
- output. Ideally, output of distomwc may be assembled to a file
- that is identical to the one being disassembled.
-
- HEREDITY
-
- DISTOMWC was created by disassembling DIS2ND by Scott
- Swintec and then modifying it. To be used by MWC, the output of
- dis2nd must be further processed by as68toas. Four basic
- problems are addressed:
- 1. The operand order is reversed for 'eor' instructions.
- 2. Addresses in the data section do not become labels,
- producing many undefined symbols.
- 3. As68toas does not recognize either short branches (bxx.s)
- or short addresses (wwww:s). It also cannot handle intermixed
- hex and ascii strings.
- 4. Data embedded in the text section is disassembled as
- meaningless, and often illegal, instructions.
-
- DISTOMWC OPERATION
-
- Symbol Table:
-
- Input to DISTOMWC is a DRI compatible object file. Several
- assemblers/loaders may produce load modules that are not DRI
- compatible. MWC is one of these. The principal differences lie
- in handling symbols. DRI symbols have two properties:
-
- a. The length of the symbol table is in the file header.
- b. Symbol names are a maximum of 8 characters long.
-
- MWC, for example, has 16 character names and the length of the
- symbol table is 0 in the file header. The program MWTODRI (GEnie
- # 4098) will convert an MWC file to DRI symbol format. It
- includes source, so that other files can be handled with
- reasonable effort. However, as very few files worth disassembling
- have symbol tables attached, symbol handling is rarely a concern.
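
The two DRI properties listed above can be checked with a short routine. This is only a sketch: the header layout it assumes (28 bytes, big-endian, magic 0x601A at offset 0, symbol table length as a long at offset 14, 14-byte symbol entries) is the commonly documented GEMDOS/CP/M-68K layout and is not taken from this README or from the DISTOMWC source. The names dri_symtab_len, DRI_MAGIC, DRI_HEADER_SIZE and DRI_SYMBOL_SIZE are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define DRI_MAGIC       0x601AU  /* assumed magic word at offset 0 */
#define DRI_HEADER_SIZE 28       /* assumed header length in bytes */
#define DRI_SYMBOL_SIZE 14       /* assumed: 8-byte name + 2-byte type + 4-byte value */

/* Read a big-endian 32-bit value, independent of host byte order. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Return the symbol table length in bytes as stored in the header,
   or -1 if the buffer is too short or the magic word does not match. */
int64_t dri_symtab_len(const unsigned char *hdr, size_t len)
{
    if (hdr == NULL || len < DRI_HEADER_SIZE)
        return -1;
    if ((((unsigned)hdr[0] << 8) | hdr[1]) != DRI_MAGIC)
        return -1;
    return (int64_t)be32(hdr + 14);  /* assumed offset of the length field */
}
```

Dividing the result by DRI_SYMBOL_SIZE gives the number of symbols. A result of 0 on a file that clearly has symbols would match the MWC behavior described above.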
- External variable names are preceded with a '_' in DRI. MWC
- uses a trailing '_' for the same purpose. All symbol names are
- converted in DISTOMWC.
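
The underscore conversion described above might look like the following. It is a minimal sketch, not the code DISTOMWC uses: the function name dri_to_mwc_name is hypothetical, and it assumes the symbol has already been copied out of the table into a NUL-terminated string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Convert a DRI external name ("_foo") to the MWC form ("foo_").
   Names without a leading '_' are copied unchanged.
   Returns 0 on success, -1 if dst is too small. */
int dri_to_mwc_name(const char *dri, char *dst, size_t dstsize)
{
    size_t n;

    assert(dri != NULL && dst != NULL);
    n = strlen(dri);
    if (n + 1 > dstsize)
        return -1;
    if (n > 0 && dri[0] == '_') {
        memcpy(dst, dri + 1, n - 1);  /* drop the leading underscore */
        dst[n - 1] = '_';             /* and append it at the end */
        dst[n] = '\0';
    } else {
        memcpy(dst, dri, n + 1);      /* copy including the terminator */
    }
    return 0;
}
```

The length of the name does not change, so an 8 character DRI name still fits wherever it fit before.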
-
- Backward and Forward References:
-
- All text references are resolved by processing the text
- section twice. Thus a branch backward will produce a label for
- the target of the branch. However, problems occur when longs in
- the data section reference otherwise unreferenced addresses.
- This occurs in 'C' from two constructs:
-
- a. switch (c) { case: .... }
- b. char *list[] = {"a", "b", ...};
-
- The switch constructs produce text references that must be known.
- Normally these target addresses are not the object of a branch,
- but are preceded by one. Text processing will add labels to the
- addresses following bra, jmp, and rts instructions. This picks
- up most switch constructs.
- Data lists that are pointers to strings usually precede the
- strings. All forward references in the data section are resolved.
- A .long in the data section will show up as undefined if it
- references an address preceding it that isn't preceded by a
- branch.
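
The rule of labelling the address after a bra, jmp, or rts needs a test on the opcode word. The sketch below uses the standard 68000 encodings for those three instructions. How DISTOMWC itself makes the test is not stated here, and the function name ends_flow is hypothetical.

```c
#include <stdint.h>

/* Return 1 if the opcode word is an unconditional transfer of control
   (rts, jmp, or bra), so that the address following the instruction
   can only be reached through a reference and deserves a label. */
int ends_flow(uint16_t op)
{
    if (op == 0x4E75)
        return 1;                    /* rts */
    if ((op & 0xFFC0) == 0x4EC0)
        return 1;                    /* jmp <ea> */
    if ((op & 0xFF00) == 0x6000)
        return 1;                    /* bra, any displacement size */
    return 0;
}
```

The caller still has to skip any extension words (a 16-bit branch displacement, or the effective address words of a jmp) before placing the label. Conditional branches and jsr return 0 because execution can fall through to the next instruction.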
-
- Data Embedded In Text:
-
- Some programs have variables and initialized data in the
- text section rather than the data section. When disassembled,
- this produces both unrealistic and invalid instructions. A human
- is best at detecting this problem. The resolution is to
- introduce a new file, name.emb, that allows the operator to mark
- special areas that need exception handling. The 'name' in
- name.emb is the name of the file being disassembled. This file
- may be located in either the current directory or the same
- directory as the file being disassembled.
-
- .EMB File:
- The name.emb file allows three kinds of addresses to be
- indicated:
-
- 1. S lines, such as 's a010' permit labels to be defined. A
- location that needs a label can be handled with this. All S
- lines must be located in the first part of the file before any
- other lines.
- 2. W lines such as 'w b124 b135' will cause the text
- addresses within the range, inclusively, to be output as words or
- longs. The first address must be even, the second must be odd.
- 3. A lines such as 'a b136 c001' will cause the text
- addresses within the range, inclusively, to be output as ASCII
- pairs. The first address must be even, the second must be odd.
-
- W and A lines can be intermixed as necessary after any S lines.
- The addresses must increase monotonically.
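
The rules above can be checked before running the disassembler. The following is a sketch only. The function name emb_check is hypothetical, and it makes three assumptions the text does not settle: the letter may be upper or lower case, the monotonic rule applies to the W and A ranges and not to S lines, and each range must start above the end of the previous one.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Check name.emb lines against the rules listed above.
   Returns 0 if every line passes, otherwise the 1-based number
   of the first offending line. */
int emb_check(const char *const lines[], int n)
{
    unsigned long prev = 0;  /* end of the last W or A range */
    int seen_range = 0;      /* set once a W or A line appears */
    int i;

    assert(lines != NULL);
    for (i = 0; i < n; i++) {
        char kind;
        unsigned long lo = 0, hi = 0;
        int got = sscanf(lines[i], " %c %lx %lx", &kind, &lo, &hi);

        if (got < 2)
            return i + 1;                    /* no letter or no address */
        kind = (char)tolower((unsigned char)kind);
        if (kind == 's') {
            if (got != 2 || seen_range)
                return i + 1;                /* S lines must come first */
        } else if (kind == 'w' || kind == 'a') {
            if (got != 3)
                return i + 1;                /* a range needs two addresses */
            if ((lo & 1) != 0 || (hi & 1) == 0)
                return i + 1;                /* first even, second odd */
            if (hi < lo || (seen_range && lo <= prev))
                return i + 1;                /* addresses must increase */
            prev = hi;
            seen_range = 1;
        } else {
            return i + 1;                    /* unknown line type */
        }
    }
    return 0;
}
```

With the three example lines from the list ('s a010', 'w b124 b135', 'a b136 c001') the function returns 0. Moving the S line to the end makes it return 3.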
-
- Longs in Text:
-
- 1. A .long is recognized in the text section because the
- opcode word is a relocatable value. Whenever this is
- encountered, a .long will be produced, and the symbol it
- references will be entered into the symbol table.
-
- 2. As many instructions are multiple words, .longs may not
- appear until the area in which they are located is first marked
- in the name.emb file as W or A. However, they will become longs,
- overriding the name.emb file.
-
- Special Notes:
-
- 1. When a disassembled file is reassembled it will (by
- default) contain a symbol table of all the generated labels. The
- presence of these symbols may prevent the program from running.
- I believe that this is caused by symbol table residue producing
- nonzero values in the bss (.bssi) section. This occurs because
- MWC fails to indicate the symbol table size in the file header.
- Stripping the symbols or using the mwtodri program will correct
- this.
- 2. MWC will not reassemble addresses in the form
- 0x416:s
- which is produced by the disassembler. It is necessary to equate
- the values to a name, and use the name in the program. I.e.,
- my_name = 0x416
- move my_name:s, d0
- 3. Short ascii lines in the data section may actually be hex
- binary values. The binary values are output as a comment to the
- ascii lines. These are suppressed if the HEX output option is
- not used.
- 4. Embedded ASCII (or constants) can and will produce
- nonsense instructions that can affect other valid instructions.
- For example, ASCII produces many b.. instructions whose apparent
- destination addresses are added to the symbol table. Besides
- producing strange internal symbols, absolute address fields can
- become marked as relocatable. This in turn can cause really
- strange output. Further, some of these corrupted instructions
- may break the disassembler, causing it to fail with no useful
- output. When this occurs, try marking the whole text space as
- ASCII in the name.emb file to identify the ASCII areas first.
-
-
-
- A little more now that MWC 3.0 is out.
-
- 1. The symbol table length is now in the header, but the table
- is still not in DRI format.
- 2. Don't throw away the 2.0 version of the MWC assembler. The
- 3.0 assembler is broken and cannot handle the addr:s format at all.
-
-
-