- Directory Structure
- ===================
-
-                                      MOD_DIS
-                                         |
-      +--------+----+-----+----+----------+-----------+-----------+---------+
-      |        |    |     |    |          |           |           |         |
- GetStarted    | Changes  | Checker    MODULES    ORIGINALS    SCRIPTS   SOURCES
-               |          |               |           |           |         |
-            Manual    ModDis201        ---+---     ---+---     ---+---   ---+---
-
-
- GetStarted  Text   How to start quickly without reading the manual
- Manual      Text   The manual
- Changes     Text   Major changes since version 1.04
- ModDis201   BASIC  The main program
- Checker     BASIC  Checks the validity of a re-assembled module
- MODULES     Dir    Newly-assembled modules are saved here
- ORIGINALS   Dir    The module to be disassembled should be put here
- SCRIPTS     Dir    User responses are stored here
- SOURCES     Dir    Disassembled modules are stored here
-
-
- This is version 2.01 of ModDis, and replaces those versions previously
- published (0.E, 0.F and 1.04). It has various bug fixes and enhancements not
- available in previous versions. If you already have version 1.04, see the
- file Changes for details of important changes. If you don't want to read
- this manual yet, try reading GetStarted and following its instructions.
-
- ModDis is a semi-intelligent module disassembler. It will read a relocatable
- module file from the ORIGINALS directory and attempt to disassemble it into
- a form suitable for the BASIC assembler. If successful, the output file is
- stored in the SOURCES directory, under the same filename as the module. If,
- as is likely, the program cannot resolve the disassembly unaided, it will
- prompt the user for help, hence the "semi-intelligent" description. These
- responses can be stored in the SCRIPTS directory and played back later to
- re-create the disassembly. The validity of the disassembly can be tested by
- re-assembling, which makes a new version in the MODULES directory. The BASIC
- program Checker can then be used to do a byte-by-byte comparison of the
- original and new versions.
-
- I had already disassembled several modules by hand, and, while it is
- relatively easy, it becomes quite tedious as the module size increases. This
- program started as a quick utility to do some of the mechanical
- "donkey-work" aspects of disassembly, but has grown out of all proportion to
- its original aims. I have successfully disassembled about 30 of the RISC OS
- modules, and there are usually only a handful of errors to sort out in the
- output assembler listing (see the Problems section for details). You should
- be warned, however, that some of the larger RISC OS modules can produce
- thousands of lines of assembler listing.
-
- There are more enhancements in the pipeline, but it will probably be some
- time before they become available.
-
- In use
- ======
- Start by making MOD_DIS your current directory, and load the ModDis201
- program. There is one feature which the user can change, located in
- PROCinit, around line 8000:
- debug%=TRUE
- If debug% is TRUE, extra information is printed out in cyan while the
- program is running. I find it reassuring to be able to see what the program
- is doing. Setting it to FALSE will suppress this printout, and there will
- sometimes be long pauses when it is not obvious if the program is doing
- anything.
-
- How the program works
- =====================
- The program identifies the "type" of each byte of the module, according to
- the following scheme:
-
- type  description
-  -4   'Acorn'-format compressed string
-  -3   Padded string, ie a string padded with zeros up to a fixed length
-  -2   Carriage-return-terminated string
-  -1   Zero-terminated string
-   0   Don't know
-   1   A byte of data
-   2   A word (2 bytes of data)
-   3   Address, ie a 4-byte number pointing to another point in the module
-   4   A double word (4 bytes of data)
-   5   ARM code
-   6   Alignment bytes, 1, 2 or 3 bytes to bring us to a word boundary
-   7   External. See below
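-
- For readers who find it easier to see the scheme as code, here is a minimal
- sketch in Python (not the BASIC used by ModDis itself). The names ByteType
- and new_type_map are hypothetical; only the numeric values come from the
- table above. The list plays the role of a per-byte type array.
-
```python
from enum import IntEnum


class ByteType(IntEnum):
    """Numeric byte types, copied from the table in this manual."""
    ACORN_STRING = -4   # 'Acorn'-format compressed string
    PADDED_STRING = -3  # string padded with zeros to a fixed length
    CR_STRING = -2      # carriage-return-terminated string
    ZERO_STRING = -1    # zero-terminated string
    UNKNOWN = 0         # don't know
    BYTE = 1            # a byte of data
    WORD = 2            # 2 bytes of data
    ADDRESS = 3         # 4-byte offset to another point in the module
    DOUBLE = 4          # 4 bytes of data
    CODE = 5            # ARM code
    ALIGNMENT = 6       # 1, 2 or 3 bytes up to a word boundary
    EXTERNAL = 7        # handled by user-written code


def new_type_map(module_length):
    """Return one type entry per module byte, all initially unknown."""
    return [ByteType.UNKNOWN] * module_length
```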
-
- The first pass through the module is made armed with the entry points, eg
- initialisation code, finalisation code, etc. At this stage, ModDis is just
- setting the "type" of each byte it comes across, code, data or whatever. If
- it finds code, it follows it, and investigates branches and subroutine calls
- recursively, asking the user for help with identifying what it has found,
- where necessary. The program is smart enough to spot when the code ends, so
- it doesn't "fall off" into following data. It also recognises such things as
- SWI "OS_Exit", the zero-terminated string which always follows SWI
- "OS_WriteS", and stuff like that.
-
- After investigating all the entry points, it does one sequential pass
- through the module, searching for any bytes whose type has not yet been
- identified, and asking the user what they are.
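-
- The code-following part of the first pass can be sketched as follows. This
- is an illustration in Python under simplifying assumptions, not the method
- ModDis actually uses: it follows only B and BL instructions, treats an
- unconditional B or a MOV PC,R14 as the end of a run of code, and never asks
- the user anything. The names branch_target and follow_code are hypothetical.
-
```python
UNKNOWN = 0
CODE = 5
MOV_PC_R14 = 0xE1A0F00E  # the usual subroutine return


def branch_target(word, addr):
    """Return the offset a B/BL at `addr` jumps to, or None if not a branch."""
    if (word >> 25) & 0b111 != 0b101:
        return None
    offset = word & 0x00FFFFFF
    if offset & 0x00800000:          # sign-extend the 24-bit field
        offset -= 0x01000000
    return addr + 8 + (offset << 2)  # PC reads as instruction address + 8


def follow_code(module, entry, types):
    """Mark every instruction reachable from `entry` as code, in place."""
    pending = [entry]
    while pending:
        addr = pending.pop()
        while 0 <= addr <= len(module) - 4 and types[addr] == UNKNOWN:
            types[addr:addr + 4] = [CODE] * 4
            word = int.from_bytes(module[addr:addr + 4], "little")
            target = branch_target(word, addr)
            if target is not None:
                pending.append(target)  # investigate the destination later
            unconditional = (word >> 28) == 0xE
            is_link = (word >> 24) & 1  # set for BL, clear for B
            if word == MOV_PC_R14:
                break                   # code ends here
            if target is not None and unconditional and not is_link:
                break                   # plain B: nothing falls through
            addr += 4
```
-
- A worklist is used here instead of true recursion, so that a long chain of
- branches cannot exhaust the call stack; the set of bytes marked is the same.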
-
- Once all the types have been sorted out, the disassembly proper starts. If
- parts of the module are marked as data, they are expanded with EQU
- directives. The code is disassembled with calls to the Debugger module, but
- the following enhancements are made to its output:
-
- 1. Labels are placed at all necessary points, as .xABCD where ABCD is the
- offset in hex from the module start.
- 2. Wherever possible, sensible label names are constructed, eg SWI tables,
- Keyword code, etc.
- 3. Branches are expanded to refer to labels.
- 4. Subroutines are identified by sABCD, rather than xABCD
- 5. PC-relative addressing is expanded as the ADR macro.
- 6. SWI names are enclosed in quotes.
- 7. The 2 instructions making up a 'long ADR' are recognised and represented
- by a macro, FNadrl, to aid readability.
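-
- The label convention in items 1 and 4 above amounts to something like the
- following Python sketch. The function name make_label is hypothetical, and
- the absence of zero-padding is an assumption based on the labels quoted in
- this manual (xF70, s1234).
-
```python
def make_label(offset, is_subroutine=False):
    """Build a label from a module offset: 's' for subroutines, else 'x'."""
    prefix = "s" if is_subroutine else "x"
    return f"{prefix}{offset:X}"  # upper-case hex, no padding (assumed)
```
-
- For example, make_label(&F70) would give xF70, and the target of a BL at
- offset &1234 would be named s1234.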
-
-
- The "What's this?" prompt
- =========================
- During the first pass through the module, the program will ask for help if
- it finds a reference to another part of the module, asking whether this
- reference is code to be disassembled, or data, in which case it wants to
- know what kind of data it is. The display looks like this:
-
- [              ]   This is the line currently being disassembled
- [   Block of   ]   and the three following, to show the context.
- [ yellow text  ]
- [              ]
-
- [              ]
- [   Block of   ]   A few lines preceding the reference
- [  white text  ]
- [              ]
-
- [   RED LINE   ]   The address being referred to
-
- [              ]
- [     More     ]   A few lines after the reference
- [  white text  ]
- [              ]
-
- What's this?       The prompt
-
- The program expects a numeric response in the range -4 to 7.
-
- It should be easy to tell if it's a string. Most strings are zero-terminated
- (type -1), but OS_CLI strings may be CR-terminated (-2). RISC OS menu labels
- are usually padded with zeros to 12 bytes; if you reply -3 you will be asked
- for the length of the padded string. 'Acorn'-format strings contain ASCII 27
- to indicate compressed dictionary entries; although zero-terminated they may
- also contain ASCII 0 immediately after ASCII 27.
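-
- To make the four string types concrete, here is a hedged Python sketch that
- works out how many bytes a string occupies. The function string_extent is
- hypothetical, and the assumption that ASCII 27 is always followed by exactly
- one token byte is mine, based only on the description above.
-
```python
def string_extent(data, start, kind, padded_len=None):
    """Return the number of bytes the string at `start` occupies,
    including its terminator. `kind` is one of -1, -2, -3 or -4."""
    if kind == -3:
        return padded_len            # the user supplies the fixed length
    pos = start
    while pos < len(data):
        byte = data[pos]
        if kind == -4 and byte == 27:
            pos += 2                 # skip the token, which may itself be 0
            continue
        if kind in (-1, -4) and byte == 0:
            return pos - start + 1   # zero terminator found
        if kind == -2 and byte == 13:
            return pos - start + 1   # carriage-return terminator found
        pos += 1
    raise ValueError("string is not terminated")
```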
-
- Try to avoid replying 0 (don't know) if at all possible, because you'll have
- to answer sooner or later, and this may be your best opportunity. There are,
- however, two occasions when it is useful to answer 0. One is when the offset
- in question is obviously something like the module's title string, help
- string, or some other known entry point which is going to be recognised by
- the program later on. If you identify this offset now, it may not be fully
- followed up by the program later. The other case is when the offset is
- pointing to the middle of an instruction, somewhere where a label cannot be
- placed. See 'Odd labels' in the Problems section for more details.
-
- Byte, word and address (1,2,3) are fairly rare. Make sure that the reference
- is word-aligned before replying 4 or 5 (double or code); getting this wrong
- will almost certainly cause an execution error later on. It is unlikely that
- a reference will be made to alignment bytes (6). See the External section
- below for details of type 7.
-
- The "bytes of unknown type" prompt
- ==================================
- This prompt will occur during the final sequential pass through the module.
- The display looks like this:
-
- [              ]
- [   Block of   ]   A few lines preceding the address
- [  white text  ]
- [              ]
-
- [   RED LINE   ]   The address being referred to
-
- [              ]
- [     More     ]   A few lines after the address
- [  white text  ]
- [              ]
-
- n byte(s) at ABC of unknown type   The prompt
-
- This is your last chance to work out unresolved byte types. Note that if
- n is 69, this means that there is a block of more than 68 bytes whose types
- are unknown. The allowed responses are similar to those above, with the
- following changes:
-
- You must NOT reply "don't know" (0) at this prompt. If you answer 1,2,3 or 4
- (byte, word, address, double) the program will ask you how many of these
- items there are. If you answer 5 (code), you may get some more "What's
- this?" type prompts as the program re-enters its recursive code-following
- phase, but it will eventually return to where it left off. Alignment bytes
- (6) are easily recognised as 1, 2 or 3 zeros at the end of a string,
- bringing it to a word boundary. See the External section below for details
- of type 7.
-
- If you can see by inspection that the block of bytes contains more than one
- "type", simply reply with the type number of whatever comes first, and the
- program will re-prompt you for the remainder.
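-
- The search behind this prompt can be pictured with the Python sketch below.
- It is an assumption about the general shape of the scan, not ModDis's own
- code; the name next_unknown_run is hypothetical, and the cap of 69 simply
- mirrors the "n is 69" remark above.
-
```python
UNKNOWN = 0
MAX_REPORTED = 69  # a count of 69 stands for "more than 68"


def next_unknown_run(types, start=0):
    """Return (offset, n) for the next run of unknown bytes at or after
    `start`, with n capped at MAX_REPORTED, or None if none remain."""
    pos = start
    while pos < len(types) and types[pos] != UNKNOWN:
        pos += 1
    if pos == len(types):
        return None
    length = 0
    while (pos + length < len(types)
           and types[pos + length] == UNKNOWN
           and length < MAX_REPORTED):
        length += 1
    return pos, length
```
-
- After each answer the caller would mark the identified bytes and call the
- function again, which is how a mixed block gets re-prompted piece by piece.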
-
-
- The external type (7)
- =====================
- Rather than making the program general enough to cope with anything, I have
- allowed for the heading "none of the above" by creating type 7. If your
- module contains something weird, you can edit the program to cope with it,
- by writing your own PROCexternal, which marks the weird block as type 7,
- and PROCdisexternal, which copes with the disassembly of the block.
- For example, the RISC OS Desktop module contains 5 template files embedded
- in it (these appear under the DeskFS filing system). The PROCexternal
- supplied simply marks these 5 blocks as type 7 throughout, and
- PROCdisexternal loads the files in and increments the assembler counters
- O% and P% to jump over them. Something similar would be needed for the 6502
- emulator module, which contains the BASIC ROM from the BBC embedded in it.
-
- Another good use for PROCexternal is if you know that the module contains a
- large block of, say, zero-terminated strings. Rather than identifying each
- one individually at run time, you could put something like this in
- PROCexternal:
-
- FOR offset%=&1234 TO &2468
- type%(offset%)=-1
- NEXT
-
- This will mark the block en-masse before the program proper starts.
-
- CHECKER
- =======
- To verify that the disassembly worked, you can re-assemble the module; the
- new version is stored in directory MODULES. The program Checker does a
- byte-by-byte comparison of the original and new module, reporting on any
- differences it finds, as a disassembled instruction from each version. Any
- differences should be unimportant, as outlined in the Problems section below.
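-
- The idea of the comparison can be sketched in a few lines of Python. This
- is not the Checker program: compare_modules is a hypothetical name, the
- comparison is grouped into 4-byte words because that is the size of one
- instruction, and the differing words are shown as hex rather than being
- disassembled.
-
```python
def compare_modules(original, rebuilt):
    """Return (offset, original_word, rebuilt_word) for each differing
    4-byte word, with the words shown as hex strings in file order."""
    differences = []
    length = max(len(original), len(rebuilt))
    for offset in range(0, length, 4):
        old = original[offset:offset + 4]
        new = rebuilt[offset:offset + 4]
        if old != new:
            differences.append((offset, old.hex(), new.hex()))
    return differences


# The two modules would be read as bytes, for example with
# open(path, "rb").read(); the paths depend on your own setup.
```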
-
-
- PROBLEMS
- ========
- This is a list of known problem areas with ModDis. In most cases the
- problems are obscure, or difficult to correct in the program, but easy to
- spot and fix in the final assembler listing.
-
- Immediate constants
- -------------------
- An immediate constant is expressed as an 8-bit value with a 4-bit rotation
- applied to it. This means that there can be more than one way of expressing
- a constant, eg #1 can be represented as #1 with 0 rotation, #4 with a
- rotation of 2, etc. The BASIC assembler hides this process from the
- programmer, who merely supplies the required constant and lets the assembler
- figure out how to represent it. If the original module has #1 stored as
- 4>>2, the disassembler will convert this to #1, and the BASIC assembler will
- reassemble it as 1>>0. This is functionally identical, but a byte-by-byte
- comparison of the two modules will show a difference.
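-
- The ambiguity is easy to demonstrate. The Python sketch below (function
- names are hypothetical) lists every way a constant can be written as an
- 8-bit value rotated right by an even amount from 0 to 30, which is how the
- 4-bit rotation field is interpreted on the ARM.
-
```python
def rotate_right(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def encodings(constant):
    """Return every (imm8, rotation) pair that produces `constant`."""
    found = []
    for rotation in range(0, 32, 2):   # the 4-bit field counts in steps of 2
        for imm8 in range(256):
            if rotate_right(imm8, rotation) == constant:
                found.append((imm8, rotation))
    return found


print(encodings(1))  # [(1, 0), (4, 2), (16, 4), (64, 6)]
```
-
- The constant 1 therefore has four encodings, and the constant 0 has
- sixteen, one for each rotation.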
-
- Another related problem is that the program sometimes fails to expand the
- constant at all, giving something like MOV R0,#0,24, ie 0 rotated by 24,
- which is still 0. This format, which the program generates through the
- debugger module, is not valid for the BASIC assembler and should be replaced
- by MOV R0,#0. Some of these are detected and translated by this version
- of the program, but it's not foolproof, and some still slip through. This
- problem is caused by a bug in the Debugger module. To solve it permanently,
- you need to modify the Debugger module itself, changing offset &920 from
- &1A000028 to &FA000028, effectively changing a BNE to a BNV. This seems to
- solve the problem, although I haven't tested this fix exhaustively, so
- proceed with caution.
-
-
- The ADR directive
- -----------------
- ADR Rn,#address is converted by the assembler into either SUB Rn,PC,#offset
- or ADD Rn,PC,#offset, depending on whether the address comes before or after
- the current PC value. A problem arises in that ADD Rn,PC,#0 is functionally
- identical to SUB Rn,PC,#0. The BASIC assembler always chooses the former, so
- if the latter occurs in a module it will be converted from SUB to ADD,
- leading to an apparent difference between the modules when compared
- byte-by-byte. It should have no effect on running the module.
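-
- As a rough illustration of the choice being made, here is a Python sketch.
- The function expand_adr is hypothetical, it assumes the PC reads as the
- instruction's address plus 8, and it ignores the question of whether the
- offset can actually be encoded as an immediate constant.
-
```python
def expand_adr(instruction_addr, target_addr):
    """Return the ('ADD' or 'SUB', offset) pair that an ADR stands for."""
    offset = target_addr - (instruction_addr + 8)
    if offset >= 0:
        return "ADD", offset   # a zero offset is emitted as ADD, as above
    return "SUB", -offset
```
-
- An ADR at &100 referring to &108 gives ADD with offset 0. A module that
- originally held SUB with offset 0 at that point will therefore differ
- after reassembly.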
-
-
- Long labels
- -----------
- The assembler mnemonics start in column 20, leaving room for long labels.
- Sometimes, however, the labels are too long, in which case the mnemonic is
- moved to the next line, this being still syntactically correct. But if the
- label is exactly 19 characters long then it 'touches' the mnemonic, and BASIC
- sees this as one big label with a bad mnemonic after it. It's rare, so fix it
- manually if it happens.
-
- The comment field
- -----------------
- The program generates a comment field which may be useful. If, however, it
- contains a colon, the BASIC assembler thinks this is a new statement
- starting. It's rare, so fix it manually. I also had a case once when the
- comment field contained the character &8D, which BASIC tried to tokenise. I
- did consider removing the comment field altogether, as the simplest way of
- solving these problems, but overall it's too useful to lose, despite its
- problems.
-
- SWI branch table
- ----------------
- The program attempts to match the SWI names to the SWI branch table, but it
- sometimes fails to identify the branch table as belonging to the SWI
- routine. In this case it substitutes labels of the type xABCD, so the
- assembly is still syntactically (and logically) correct. I think
- this is one of those things you're going to have to live with.
-
- Embedded text
- -------------
- Something that cropped up in the NetPrint module was like this:
- BL s1234
- EQUS "A message"
- More code...
- where the subroutine at s1234 adjusts the return address to skip the string.
- While ModDis recognises this if used with SWI "OS_WriteS", it cannot check
- for a user routine which does the same. ModDis will try to interpret the
- string as code, giving a warning message if this is not possible. It's up to
- you to be aware of this pitfall and watch out for it.
-
- Floating point
- --------------
- Although ModDis could disassemble instructions intended for the FPU, in
- practice there isn't much point, since the BASIC assembler doesn't recognise
- them, so you could not re-assemble the module. Instead, it assumes that any
- FPU instruction it comes across is a mistake caused by an incorrect user
- response, and gives a warning message. I may do something about FPU at some
- time, but it's a low priority. In the meantime, if you want to disassemble a
- module which contains some FPU instructions, you will have to mark them as
- double words (type 4, 4-byte data).
-
- Odd labels
- ----------
- Sometimes a reference is made to, for example, the byte BEFORE a block of
- data, the pointer being incremented before use. Something like:
- ADR xF6F
- ........
- MOV PC,R14
- .xF70 EQUS "...."
- It is not possible to place a label at &F6F, since it lies in the middle of
- an instruction. Instead, you should manually substitute something like this:
- ADR xF70-1
- This sort of thing is usually fairly easy to spot.
-
- ALIGN
- -----
- When assembling a module, the ALIGN directive is used to move to a word
- boundary, simply incrementing the pointer by 1, 2 or 3. The bytes "skipped"
- like this are not explicitly set to zero, and will contain whatever was
- there at the time the module was originally assembled. When reassembling
- after running ModDis, these "skipped" bytes may well have different values,
- and the Checker program will detect this, but since they play no part in the
- module, they are unimportant.
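-
- The number of bytes skipped follows directly from the current offset, as
- this small Python sketch shows (alignment_padding is a hypothetical name).
-
```python
def alignment_padding(offset):
    """Return how many bytes (0 to 3) must be skipped after `offset`
    to reach the next 4-byte boundary."""
    return -offset % 4
```
-
- A string ending at &F71, for instance, would be followed by 3 skipped
- bytes, and those are the only bytes whose contents may legitimately differ.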
-
-
- THE END
- =======
-
- My thanks to those users who reported bugs in the previous versions of
- ModDis, and to those who made suggestions on improving it. If you have any
- problems or (constructive) comments about ModDis, please get in touch with
- me. If you have any ideas for improving it, or if you extend it yourself,
- I'd be very glad to hear from you.
-
- Lorcan Mongey
- 56 Salisbury Court
- Dublin Road
- Belfast BT7 1DD
- Tel: 0232 234386
-
- I can also be reached at the following BBS:
-
- BBS                  Telephone     Username         Number
- ---                  ---------     --------         ------
- CIX                  081 390 1244  lorcan@cix.uucp
- The World of Cryton  0749 670030   lorcan           #237
- Arcade               081 654 2212  lorcan           #417
-