- >Manual
-
-
- DIRECTORY STRUCTURE
- ===================
-
- !Boot              Obey    Boot file
- !Help              Text    Help file
- !Run               Obey    Run file
- !Sprites           Sprite  Sprite file
- Checker            BASIC   Checks the validity of a re-assembled module
- Docs.Changes12     Text    Major changes between versions 1 and 2
- Docs.Changes23     Text    Major changes between versions 2 and 3
- Docs.Copyright     Text    Copyright details
- Docs.GetStarted    Text    How to start quickly without reading the manual
- Docs.Manual        Text    This manual
- Externals.Default  BASIC   Library containing default external procedures
- Flop               BASIC   Library containing floating-point macros
- ModDis304          BASIC   The main program
- Modules.           Dir     Newly-assembled modules are saved here
- Originals.         Dir     The module to be disassembled should be put here
- Scripts.           Dir     User responses are stored here
- Sources.           Dir     Disassembled modules are stored here
-
-
- PROGRAM DESCRIPTION
- ===================
- This is version 3.04 of ModDis, and replaces those versions previously
- published (0.E, 0.F, 1.04 and 2.01). It has various bug fixes and
- enhancements not available in previous versions. If you already have version
- 2.01, see the file Changes23 for details of important changes. If you don't
- want to read this manual yet, try reading GetStarted and following its
- instructions.
-
- ModDis is a semi-intelligent module disassembler. It will read a relocatable
- module file from the Originals directory and attempt to disassemble it into
- a form suitable for the BASIC assembler. If successful, the Source file is
- stored in the Sources directory, under the same filename as the module. If,
- as is likely, the program cannot resolve the disassembly unaided, it will
- prompt the user for help, hence the "semi-intelligent" description. These
- responses will be stored in a Script file in the Scripts directory and can
- be played back later to re-create the disassembly. The validity of the
- disassembly can be tested by re-assembling, which makes a new version in the
- Modules directory. The BASIC program Checker can then be used to do a
- byte-by-byte comparison of the original and new versions.
-
- I had already disassembled several modules by hand, and, while it is
- relatively easy, it becomes quite tedious as the module size increases. This
- program started as a quick utility to do some of the mechanical
- "donkey-work" aspects of disassembly, but has grown out of all proportion to
- its original aims. I have successfully disassembled all of the RISC OS 2
- modules, and there are usually only a handful of errors to sort out in the
- output assembler listing (see the Problems section for details). You should
- be warned, however, that some of the larger RISC OS modules can produce
- thousands of lines of assembler listing.
-
- There are more enhancements in the pipeline, but it will probably be some
- time before they become available.
-
-
- IN USE
- ======
- Start by storing the module you want to disassemble in the Originals
- directory inside the ModDis app. Then double-click on !ModDis. There are 6
- logical variables which control the disassembly and which the user can
- change, located in PROCinit, around line 12300. All are TRUE by default,
- and the program can normally be run with these defaults.
-
- debug%        If TRUE, debugging information is displayed in cyan while
-               the program is running. This is useful for keeping an eye
-               on where the program has got to.
- flop%         If TRUE, floating-point instructions are recognised and
-               disassembled using the FNflop macro; otherwise floating-point
-               instructions are rejected as coprocessor instructions.
- verbose%      If TRUE, full output is enabled while Script files are being
-               replayed. This displays the full prompts and responses. If you
-               are re-creating a particularly large Source file from its
-               Script you may like to set verbose% to FALSE to speed up the
-               process.
- guessalign%   If TRUE, alignment-byte guessing is enabled.
- guessbratab%  If TRUE, branch-table guessing is enabled.
- guessldrstr%  If TRUE, load/store double guessing is enabled.
-
-
- HOW THE PROGRAM WORKS
- =====================
- The program identifies the "type" of each byte of the module, according to
- the following scheme:
-
- type  description
-  -4   'Acorn'-format compressed string. This is expanded as an EQUS
-       directive until a null-byte is encountered, provided the null-byte
-       is not preceded by an ASCII 27 byte.
-  -3   Padded string, ie a string padded with zeros up to a fixed length.
-       The program prompts the user for the length of the string and
-       expands it as an EQUS directive. It is also useful for dealing
-       with 4-byte strings with no terminator, eg "TASK".
-  -2   Carriage-return-terminated string. This is expanded as an EQUS
-       directive until a carriage-return is encountered.
-  -1   Zero-terminated string. This is expanded as an EQUS directive
-       until a null-byte is encountered.
-   0   Don't know. All bytes have this value when the program starts, and
-       none should have it when the program ends.
-   1   A byte of data. Expanded as EQUB.
-   2   A word (2 bytes of data). Expanded as EQUW.
-   3   Address, ie a 4-byte number pointing to another point in the
-       module. It is expanded as EQUD followed by a label.
-   4   A double word (4 bytes of data). It is expanded as EQUD.
-   5   ARM code.
-   6   Alignment bytes, 1, 2 or 3 bytes to bring us to a word boundary.
-       It is expanded as an ALIGN directive.
-   7   External. See below.
-
- The following four types should normally never be encountered by the user.
- They are used internally by the program whenever LDF or STF instructions
- are found.
-
-   8   Floating-point, single precision (4 bytes). Expanded as FNequfs.
-   9   Floating-point, double precision (8 bytes). Expanded as FNequfd.
-  10   Floating-point, extended precision (12 bytes). Expanded as FNequfe.
-  11   Floating-point, packed decimal (12 bytes). Expanded as FNequfp.
-
- The first pass through the module is made armed with the entry points, eg
- initialisation code, finalisation code, etc. At this stage, ModDis is just
- setting the "type" of each byte it comes across, code, data or whatever. If
- it finds code, it follows it, and investigates branches and subroutine calls
- recursively, asking the user for help with identifying what it has found,
- where necessary. The program is smart enough to spot when the code ends, so
- it doesn't "fall off" into following data. It also recognises such things as
- SWI "OS_Exit", the zero-terminated string which always follows SWI
- "OS_WriteS", and stuff like that.
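
The recursive code-following pass described above can be sketched roughly as follows. This is only an illustration in Python, not the ModDis source: the function name follow_code, the worklist approach and the very small set of instructions recognised (B, BL and MOV/MOVS PC,R14) are all assumptions, and the real program also prompts the user and recognises many more cases.

```python
import struct

CODE = 5  # ModDis type number for ARM code


def follow_code(module: bytes, entry: int, types: list) -> None:
    """Mark every instruction reachable from `entry` as type 5 (code).

    Hypothetical sketch: only B/BL and MOV(S) PC,R14 are understood.
    """
    pending = [entry]                      # offsets still to be investigated
    while pending:
        off = pending.pop()
        # Stop when we run off the module or reach bytes already typed.
        while 0 <= off <= len(module) - 4 and types[off] == 0:
            types[off:off + 4] = [CODE] * 4
            (word,) = struct.unpack_from("<I", module, off)
            cond = word >> 28
            if (word >> 25) & 7 == 0b101:              # B or BL
                disp = word & 0xFFFFFF
                if disp & 0x800000:                    # sign-extend 24 bits
                    disp -= 0x1000000
                pending.append(off + 8 + (disp << 2))  # PC is 8 bytes ahead
                if cond == 0xE and not word & (1 << 24):
                    break                              # unconditional B ends the code
            elif word in (0xE1A0F00E, 0xE1B0F00E):     # MOV / MOVS PC,R14
                break                                  # unconditional return
            off += 4
```

Each entry point would be passed to follow_code in turn; bytes that are never reached keep type 0 and are left for the later sequential pass.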
-
- After investigating all the entry points, it does one sequential pass
- through the module, searching for any bytes whose type has not yet been
- identified, and asking the user what they are.
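
As a rough illustration of this sequential pass, the Python sketch below finds each run of bytes whose type is still 0. The name unknown_runs and the list-of-integers type map are assumptions made for the example; ModDis itself is written in BASIC and prompts the user for each run it finds.

```python
def unknown_runs(types):
    """Yield (offset, length) for each run of bytes still of type 0."""
    start = None
    for off, t in enumerate(types):
        if t == 0 and start is None:
            start = off                    # a new unknown run begins here
        elif t != 0 and start is not None:
            yield start, off - start       # the run ended just before off
            start = None
    if start is not None:                  # run reaches the end of the module
        yield start, len(types) - start
```

For example, a type map of [5, 5, 5, 5, 0, 0, -1, -1, 0] gives one run of 2 bytes at offset 4 and one run of 1 byte at offset 8.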
-
- Once all the types have been sorted out, the disassembly proper starts. If
- parts of the module are marked as data, they are expanded with EQU
- directives. The code is disassembled with calls to the Debugger module, but
- the following enhancements are made to its output:
-
- 1. Labels are placed at all necessary points, as .xABCD where ABCD is the
- offset in hex from the module start.
- 2. Wherever possible, sensible label names are constructed, eg SWI tables,
- Keyword code, etc.
- 3. Branches are expanded to refer to labels.
- 4. Subroutines are identified by sABCD, rather than xABCD.
- 5. PC-relative addressing is expanded as the ADR macro.
- 6. SWI names are enclosed in quotes.
- 7. The 2 instructions making up a 'long ADR' are recognised and represented
- by a macro, FNadrl, to aid readability.
- 8. Floating-point instructions, if enabled, are expanded as the FNflop
- macro.
-
-
- THE "What's this?" PROMPT
- =========================
- During the first pass through the module, the program will ask for help if
- it finds a reference to another part of the module, asking whether this
- reference is code to be disassembled, or data, in which case it wants to
- know what kind of data it is. The display looks like this:
-
- [ ] This is the line currently being disassembled
- [ Block of ] and the three following, to show the context.
- [ yellow text ]
- [ ]
-
- [ ]
- [ Block of ] A few lines preceding the reference
- [ white text ]
- [ ]
-
- [ RED LINE ] The address being referred to
-
- [ ]
- [ More ] A few lines after the reference
- [ white text ]
- [ ]
-
- What's this? The prompt
-
- The program expects a numeric response in the range -4 to 7.
-
- It should be easy to tell if it's a string. Most strings are zero-terminated
- (type -1), but OS_CLI strings may be CR-terminated (-2). RISC OS menu labels
- are usually padded with zeros to 12 bytes; if you reply -3 you will be asked
- for the length of the padded string. 'Acorn'-format strings contain ASCII 27
- to indicate compressed dictionary entries; although zero-terminated they may
- also contain ASCII 0 immediately after ASCII 27.
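
The terminator rule for 'Acorn'-format strings can be sketched as below. This is a minimal Python illustration, assuming that the byte following each ASCII 27 is a dictionary token to be skipped whatever its value; the function name acorn_string_end is hypothetical and the ModDis code may handle edge cases differently.

```python
def acorn_string_end(module: bytes, start: int) -> int:
    """Return the offset of the null byte that terminates the string.

    Assumption: the byte after ASCII 27 is a token and never a terminator.
    """
    i = start
    while i < len(module):
        if module[i] == 27:
            i += 2                  # skip the escape and its token byte
        elif module[i] == 0:
            return i                # a genuine terminator
        else:
            i += 1
    raise ValueError("no terminator found")
```

A plain zero-terminated scan (type -1) would stop too early on such a string, at the zero that follows the ASCII 27.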
-
- Try to avoid replying 0 (don't know) if at all possible, because you'll have
- to answer sooner or later, and this may be your best opportunity. There are,
- however, two occasions when it is useful to answer 0. One is when the offset
- in question is obviously something like the module's title string, help
- string, or some other known entry point which is going to be recognised by
- the program later on. If you identify this offset now, it may not be fully
- followed-up by the program later. The other case is when the offset is
- pointing to the middle of an instruction, somewhere where a label cannot be
- placed. See 'Odd labels' in the Problems section for more details.
-
- Byte, word and address (1,2,3) are fairly rare. Make sure that the reference
- is word-aligned before replying 4 or 5 (double or code); if you get this
- wrong the program will keep re-prompting you until you give a valid
- response. It is unlikely that a reference will be made to alignment bytes
- (6). See the External section below for details of type 7.
-
-
- THE "bytes of unknown type" PROMPT
- ==================================
- This prompt will occur during the final sequential pass through the module.
- The display looks like this:
-
- [ ]
- [ Block of ] A few lines preceding the address
- [ white text ]
- [ ]
-
- [ RED LINE ] The address being referred to
-
- [ ]
- [ More ] A few lines after the address
- [ white text ]
- [ ]
-
- n byte(s) at ABC of unknown type The prompt
-
- This is your last chance to work out unresolved byte types. Note that if
- n is 69, this means that there is a block of more than 68 bytes whose types
- are unknown. The allowed responses are similar to those above, with the
- following changes:
-
- You must NOT reply "don't know" (0) at this prompt. If you answer 1,2,3 or 4
- (byte, word, address, double) the program will ask you how many of these
- items there are. If you answer 5 (code), you may get some more "What's
- this?" type prompts as the program re-enters its recursive code-following
- phase, but it will eventually return to where it left off. Alignment bytes
- (6) are easily recognised as 1, 2 or 3 zeros at the end of a string,
- bringing it to a word boundary. See the External section below for details
- of type 7.
-
- If you can see by inspection that the block of bytes contains more than one
- "type", simply reply with the type number of whatever comes first, and the
- program will re-prompt you for the remainder.
-
-
- WHEN IT'S FINISHED
- ==================
- Once the program has finished, the Source file will be left in the Sources
- directory. Although it has a BASIC filetype, it is still in text format and
- the lines are terminated by carriage-return and linefeed characters. Before
- using it you should load it into an editor such as Edit or Twin (to help
- Twin users, the !Boot file programs the F8 key to do this) and remove the
- carriage returns. If you have Twin you can then load it into BASIC by simply
- exiting to BASIC; otherwise you should use the command
- *BASIC -load <filename>
- If the Source file is very large you may find that this latter method
- doesn't work (there seems to be a problem with BASIC V). In this case you
- could add the line
- AUTO 1,1
- at the start of the Source file, go to the BASIC prompt and type
- *Exec <filename>
- pressing Escape when the loading stops.
-
-
- CHECKER
- =======
- To verify that the disassembly worked, you can re-assemble the module; the
- new version is stored in directory Modules. The program Checker does a
- byte-by-byte comparison of the original and new module, reporting on any
- differences it finds, as a disassembled instruction from each version. Any
- differences should be unimportant, as outlined in the Problems section below.
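
The comparison itself is simple; a minimal Python sketch is given below. The function name compare_modules is an assumption, and unlike the real Checker this sketch reports raw byte values rather than a disassembled instruction from each version.

```python
def compare_modules(original: bytes, rebuilt: bytes):
    """Compare two module images byte by byte.

    Returns (diffs, same_length), where diffs is a list of
    (offset, original_byte, rebuilt_byte) over the common length.
    """
    diffs = [
        (off, a, b)
        for off, (a, b) in enumerate(zip(original, rebuilt))
        if a != b
    ]
    return diffs, len(original) == len(rebuilt)
```

Rounding each reported offset down to a multiple of 4 gives the instruction word to inspect in both versions.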
-
-
- FLOATING POINT MNEMONICS
- ========================
- The program will now disassemble floating-point instructions using a macro
- FNflop. This is accessed by a LIBRARY call which is built into every source
- file generated by ModDis. Also enabled are assembler directives FNequfs,
- FNequfd, FNequfe and FNequfp which store floating-point numbers to single,
- double, extended precision and packed decimal respectively. These directives
- are used by the program in response to LDF and STF instructions.
-
-
- THE EXTERNAL TYPE (7)
- =====================
- Rather than making the program general enough to cope with anything, I have
- allowed for the heading "none of the above" by creating type 7. If your
- module contains something weird, you can tell the program to cope with it,
- by writing your own PROCexternal, which marks the weird block as type 7, and
- PROCdisexternal, which copes with the disassembly of the block. These
- procedures should be put in a file in the Externals directory and the Script
- file should be edited to tell the program the name of the new External
- file. (See the section on Script file format below).
-
- The default file (Externals.Default) consists of two empty procedures,
- namely:
-
- DEFPROCdisexternal
- ENDPROC
-
- DEFPROCexternal
- ENDPROC
-
- PROCexternal is called once before the analysis of the module begins.
- PROCdisexternal is called during the final phase when the assembly listing
- is being spooled to file whenever a type 7 (external) is encountered. For
- example, if, say, a Sprite file of length &100 bytes is embedded in a module
- at offset &1234 then the user might create the following External file:
-
- DEFPROCdisexternal
- PRINT"]"
- PRINT"OSCLI""Load SpriteFile ""+STR$~(O%)"
- PRINT"O%+=&100"
- PRINT"P%+=&100"
- PRINT"[OPT pass%"
- offset%+=&100
- ENDPROC
-
- DEFPROCexternal
- FORoffset%=&1234TO&1333
- type%(offset%)=7
- NEXT
- ENDPROC
-
- In PROCexternal all the bytes corresponding to the Sprite file are set to
- type 7 (external) at the start of the program. When the disassembly proper
- comes across the first type 7 byte it calls PROCdisexternal. This prints the
- appropriate messages to exit the assembler, load the Sprite file at the
- assembler pointer O%, increment both O% and P% and re-enter the assembler.
- Finally it increments the ModDis pointer offset%.
-
- Another good use for PROCexternal is if you know that the module contains a
- large block of, say, zero-terminated strings. Rather than identifying each
- one individually at run time, you could put something like this in
- PROCexternal:
- FOR offset%=&1234 TO &2468
- type%(offset%)=-1
- NEXT
- This will mark the block en masse before the program proper starts.
-
-
- SCRIPT FILE FORMAT
- ==================
- The Script file format is a series of blocks like this:
-
- begin keyword
- ...
- ...
- ...
- end
-
- A line starting with the word "begin" marks the beginning of a block and a
- line consisting only of the word "end" marks the end. There can be any
- number of lines between the begin and end, although most blocks use only
- one. The order of the blocks is not important, except that the responses
- block must come last. The keywords currently recognised are as follows:
-
- KEYWORD MEANING
-
- version The version number of ModDis used to produce the script file.
- eg ModDis 3.04. This information is placed in the Script
- file by ModDis when it's created, and is used to make sure
- that the Script file is compatible with the version of the
- program in use.
-
- module The name and version number of the module eg
- SpriteUtils 1.04 (22 Jul 1988)
- This is placed in the Script file by ModDis when it's
- created and is used to make sure that the Script file
- refers to the same module.
-
- nonreturn The number of subroutines which modify their return address
- in some way, followed by the offsets of these subroutines.
- In most cases this will simply be a single zero. It is
- intended to cope with things like this:
- BL s1234
- EQUS "Hello world"
- ....
- where the subroutine at offset &1234 does not return to
- execute the following instruction. To cope with this, the
- user would enter 1,1234 in the nonreturn block, meaning one
- subroutine at offset &1234. Note that ModDis does not
- automatically recognise these cases - it's up to you to spot
- them and modify the Script file manually to cope with them.
-
- external The name of the BASIC file in the Externals directory which
- contains the external procedures used for this module. The
- default is "Default", which uses a file with empty
- procedures. If you want to use the external facility you
- should write your own external routines, store them in a file
- in the Externals directory and edit this block to contain
- the name of your External file.
-
- responses The user responses, written as 3 numbers separated by
- commas. They consist of the offset in hex, the type of data
- at this offset and the number of items. For example,
- 1240,-1,1 would mean one zero-terminated string at offset
- &1240. At present the responses block must be the last one
- in the Script file, and is not terminated by an "end". This
- may change in the future.
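
To make the format concrete, here is a small Python sketch of a reader for Script files as described above. It is not taken from ModDis: the function name parse_script is hypothetical, and it assumes that the keyword follows "begin" on the same line, that blank lines can be ignored, and that "!" comment lines occur only in the responses block.

```python
def parse_script(text: str) -> dict:
    """Split a Script file into {keyword: [lines]}.

    Lines of the responses block become (offset, type, count) tuples,
    with the offset read as hex.
    """
    blocks = {}
    keyword = None                          # None means "outside any block"
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if keyword is None:
            if line.startswith("begin"):
                keyword = line[len("begin"):].strip()
                blocks[keyword] = []
        elif line == "end" and keyword != "responses":
            keyword = None                  # responses has no closing "end"
        elif keyword == "responses":
            if line.startswith("!"):
                continue                    # comment line, ignored
            offset, kind, count = line.split(",")
            blocks[keyword].append((int(offset, 16), int(kind), int(count)))
        else:
            blocks[keyword].append(line)
    return blocks
```

With this sketch, a responses line of 1240,-1,1 is returned as (0x1240, -1, 1), ie one zero-terminated string at offset &1240.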
-
- If you try to run this version with a Script file from version 2 it will
- display an error message and stop, but will write a file called "Header"
- into the current directory. You can simply paste this header into your old
- Script file to bring it up to version 3 standard. (Note: A few people have
- pre-release copies of version 3. If you are one of these, you should delete
- the first 2 lines of the Script file before adding the new header.)
-
-
- TYPE GUESSING
- =============
- Version 3 of ModDis includes "guessing" for the first time. If it
- encounters some bytes of unknown type and there is no entry in the Script
- file the program applies some simple rules to try to establish what their
- type is before asking the user for help. At the moment there are three types
- of guessing:
-
- Alignment bytes If the program is not on a word boundary, and all the
- bytes to the word boundary are zero then it guesses
- that these are alignment bytes. This action is
- controlled by the variable guessalign% - it takes
- place if the variable is TRUE.
-
- Branch tables If the program is on a word boundary and both this
- instruction and the previous one are unconditional
- branches then it guesses that this is part of a
- branch table. This action is controlled by the
- variable guessbratab% - it takes place if the
- variable is TRUE.
-
- Load/store word If the program is at a load/store register
- instruction, then the location loaded from/stored to
- is guessed to be a double. This action is controlled
- by the variable guessldrstr% - it takes place if the
- variable is TRUE.
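
The first two rules are simple enough to sketch in a few lines of Python. This is an illustration only, under stated assumptions: the function names are hypothetical, words are read little-endian, and "unconditional branch" is taken to mean a B instruction with the always condition (top byte &EA), which may be narrower or wider than the test ModDis actually applies.

```python
import struct


def guess_alignment(module: bytes, off: int) -> int:
    """Return the number of alignment bytes guessed at off, or 0 for no guess."""
    if off % 4 == 0:
        return 0                            # already on a word boundary
    boundary = (off + 3) & ~3
    gap = module[off:boundary]
    if len(gap) == boundary - off and not any(gap):
        return len(gap)                     # all zero up to the boundary
    return 0


def is_unconditional_branch(module: bytes, off: int) -> bool:
    """True if the word at off is B with condition AL (assumed test)."""
    if off < 0 or off + 4 > len(module):
        return False
    (word,) = struct.unpack_from("<I", module, off)
    return word >> 24 == 0xEA


def guess_branch_table(module: bytes, off: int) -> bool:
    """True if this word and the previous one are both unconditional branches."""
    return (
        off % 4 == 0
        and is_unconditional_branch(module, off)
        and is_unconditional_branch(module, off - 4)
    )
```

Both functions only report a guess; in ModDis the Script file is consulted first, and the guess is used only when it has no entry for the offset.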
-
- These three types of guessing were chosen initially because they were easy
- to implement and because they give very good results in practice. In most
- cases the amount of user interaction (and hence the size of the Script file)
- is reduced by about 30% and in extreme cases by as much as 60%. In the case
- of some very short modules guessing allows disassembly to be completed with
- no user intervention at all. The branch table guessing is particularly
- welcome since the old recognition routines never worked particularly well,
- and it was very tedious to be forced to single-step through a long table
- when it was perfectly obvious what it was.
-
- There is, however, a penalty to be paid in that the guesses may sometimes
- not be what you would have chosen. For instance, it is common for certain
- strings (like TASK) to be stored as a double, but if it is accessed by an
- LDR then guessing will disassemble it as a double rather than a string.
- Also, guessing alignment bytes will only work if the original programmer has
- taken the trouble to initialise his workspace to zeros when the original
- module was assembled. If there are any garbage bytes showing through
- guessing will fail to recognise them as alignment bytes.
-
- Note that the program always checks the Script file first before resorting
- to guessing, so if you find that it's guessing wrong in one particular case
- you can manually add this one case to the Script file and leave guessing
- enabled for the rest of the module. If you find the program is guessing
- consistently wrong you should disable guessing and carry on as normal.
-
-
- CONFLICTS
- =========
- Conflicts can arise if the program identifies a byte as one type and then
- finds it was already marked as another. When this happens, the program
- offers the prompt "[A]dvance, [R]etreat, [S]top ?". Pressing "A" means go
- ahead and overwrite the old type with the new one, "R" means back off and
- leave things as they are, and "S" means stop the program while you figure
- out what's happening.
-
-
- SYSTEM VARIABLES
- ================
- ModDis now stores the current filename in a system variable called
- ModDis$File. When prompted for a filename you can just press Return and
- ModDis will look up the system variable and use it as a filename. This
- system variable is also used by the Checker program if you don't give it a
- filename. The idea is that if you need to exit ModDis to edit the Script
- file manually, you can restart easily. Obviously the system variable is
- created the first time ModDis is run and is lost if the computer is reset
- so you must supply a filename at least once at the start of the ModDis
- session.
-
-
- COMMENTS
- ========
- Comments can now be placed in the Script file, but only in the responses
- block at the moment. Putting a ! as the first character on a line will cause
- that line to be ignored by the program. I only added this feature so that I
- could experiment with removing certain responses from the Script file to see
- if the guessing routines could cope without them, so it will probably be of
- limited use to the user.
-
-
- PROBLEMS
- ========
- This is a list of known problem areas with ModDis. In most cases the
- problems are obscure, or difficult to correct in the program, but easy to
- spot and fix in the final assembler listing.
-
- Immediate constants
- -------------------
- An immediate constant is expressed as an 8-bit value with a 4-bit rotation
- applied to it. This means that there can be more than one way of expressing
- a constant, eg #1 can be represented as #1 with 0 rotation, #4 with a
- rotation of 2, etc. The BASIC assembler hides this process from the
- programmer, who merely supplies the required constant and lets the assembler
- figure out how to represent it. If the original module has #1 stored as
- 4>>2, the disassembler will convert this to #1, and the BASIC assembler will
- reassemble it as 1>>0. This is functionally identical, but a byte-by-byte
- comparison of the two modules will show a difference.
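
The ambiguity is easy to demonstrate. The Python sketch below lists every (8-bit value, 4-bit rotation field) pair that produces a given constant, using the standard ARM rule that the value is rotated right by twice the rotation field. The function names are made up for the example.

```python
def ror32(value: int, bits: int) -> int:
    """Rotate a 32-bit value right by the given number of bits."""
    bits %= 32
    return ((value >> bits) | (value << (32 - bits))) & 0xFFFFFFFF


def immediate_encodings(constant: int):
    """List every (imm8, rot) pair with imm8 ROR (2*rot) == constant."""
    return [
        (imm8, rot)
        for rot in range(16)
        for imm8 in range(256)
        if ror32(imm8, 2 * rot) == constant
    ]
```

For the constant 1 this gives four pairs, (1, 0), (4, 1), (16, 2) and (64, 3), the second being the "#4 with a rotation of 2" case mentioned above. The constant 0 can be encoded with any of the 16 rotation fields, and a value such as &101 has no encoding at all.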
-
- Another related problem is that the program sometimes fails to expand the
- constant at all, giving something like MOV R0,#0,24, ie 0 rotated by 24,
- which is still 0. This format, which the program generates through the
- debugger module, is not valid for the BASIC assembler and should be replaced
- by MOV R0,#0. Some of these are detected and translated by this version
- of the program, but it's not foolproof, and some still slip through. This
- problem is caused by a bug in the Debugger module. To solve it permanently,
- you need to modify the Debugger module itself, changing offset &920 from
- &1A000028 to &FA000028, effectively changing a BNE to a BNV. In months of use
- I have found this modification seems to have no drawbacks and I would
- recommend that you store a modified copy of the Debugger module in the
- ModDis directory and make the !Boot file load it automatically.
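
If you do patch a copy of the Debugger module, it is worth checking that the word really holds the expected value first, since the offset given above may not apply to other versions of the module. The following Python sketch shows the idea on an in-memory copy; the function name patch_word is hypothetical, and loading and saving the file is left to you.

```python
import struct


def patch_word(module: bytearray, offset: int, expected: int, replacement: int) -> None:
    """Replace the little-endian word at offset, but only if it matches expected."""
    (current,) = struct.unpack_from("<I", module, offset)
    if current != expected:
        raise ValueError(
            f"word at &{offset:X} is &{current:08X}, not &{expected:08X}"
        )
    struct.pack_into("<I", module, offset, replacement)
```

The call for the change described above would be patch_word(image, 0x920, 0x1A000028, 0xFA000028), where image is a bytearray holding the module. Only the top four bits (the condition field) differ between the two words: 1 is NE and &F is NV.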
-
-
- The ADR directive
- -----------------
- ADR Rn,#address is converted by the assembler into either SUB Rn,PC,#offset
- or ADD Rn,PC,#offset, depending on whether the address comes before or after
- the current PC value. A problem arises in that ADD Rn,PC,#0 is functionally
- identical to SUB Rn,PC,#0. The BASIC assembler always chooses the former, so
- if the latter occurs in a module it will be converted from SUB to ADD
- leading to an apparent difference between the modules when compared
- byte-by-byte. It should have no effect on running the module.
-
-
- SWI branch table
- ----------------
- The program attempts to match the SWI names to the SWI branch table, but it
- sometimes fails to identify the branch table as belonging to the SWI
- routine. In this case it substitutes labels of the type xABCD, so the
- assembly is still syntactically correct (and logically) correct. I think
- this is one of those things you're going to have to live with.
-
-
- Embedded text
- -------------
- Something that cropped up in the NetPrint module was like this:
- BL s1234
- EQUS "A message"
- More code...
- where the subroutine at s1234 adjusts the return address to skip the string.
- While ModDis recognises this if used with SWI "OS_WriteS", it cannot check
- for a user routine which does the same. ModDis will try to interpret the
- string as code, and there are two possible outcomes. Either it will succeed,
- and "drop through" into some following code, in which case you will never
- suspect anything untoward has happened unless you examine the Source file
- carefully, or it will find that the string disassembles as an undefined or
- coprocessor instruction, in which case it will print a warning message. In
- the latter case you can note the offset of the offending subroutine and edit
- the Script file to take account of it (See the section on Script file
- format).
-
-
- Odd labels
- ----------
- Sometimes a reference is made to, for example, the byte BEFORE a block of
- data, the pointer being incremented before use. Something like:
- ADR Rn,xF6F
- ........
- MOV PC,R14
- .xF70 EQUS "...."
- It is not possible to place a label at &F6F, since it lies in the middle of
- an instruction. Instead, you should manually substitute something like this:
- ADR Rn,xF70-1
- This sort of thing is usually fairly easy to spot.
-
-
- ALIGN
- -----
- When assembling a module, the ALIGN directive is used to move to a word
- boundary, simply incrementing the pointer by 1, 2 or 3. The bytes "skipped"
- like this are not explicitly set to zero, and will contain whatever was
- there at the time the module was originally assembled. When re-assembling
- after running ModDis, these "skipped" bytes may well have different values, and the
- Checker program will detect this, but since they play no part in the module,
- they are unimportant.
-
-
- Weirdos
- -------
- I have come across some code which looked like this:
- BICEQ PC,R14,#&10000000
- ORRNE PC,R14,#&10000000
- EQUS "...."
- Now it's obvious that the program is returning from a subroutine with the
- overflow flag either set or clear, depending on the state of the zero flag.
- Unfortunately, the program sees both of these as *conditional* end-of-codes
- and assumes that there is more code following, and goes blundering on into
- the string. I can't see any obvious simple way to cope with this, and I wish
- the author had used
- BICEQ PC,R14,#&10000000
- ORR PC,R14,#&10000000
- EQUS "...."
- instead, in which case ModDis would have had no trouble with it.
-
-
- THE END
- =======
- My thanks to those users who reported bugs in the previous versions of
- ModDis, and to those who made suggestions on improving it, especially Ralph
- Corderoy who converted the program to a RISC-OS compliant application and
- made many improvements to the code (not all of which I have included yet -
- patience Ralph :-)). If you have any problems or (constructive) comments
- about ModDis, please get in touch with me. If you have any ideas for
- improving it, or if you extend it yourself, I'd be very glad to hear from
- you. I'd also be interested in comparing notes with anyone else out there
- who has written an ARM disassembler (anybody from BASS there? I can't find
- a copy of !Dissi anywhere).
-
- Lorcan Mongey
- 56 Salisbury Court
- Dublin Road
- Belfast BT7 1DD
- Tel: 0232 234386
-
- I can also be reached at the following BBS:
-
- BBS                  Telephone     Username                    Number
- ---                  ---------     --------                    ------
- CIX                  081 390 1244  lorcan@cix.compulink.co.uk
- The World of Cryton  0749 670030   lorcan                      #237
- Arcade               081 654 2212  lorcan                      #417
-