-
- Forthmacs Implementation
- ************************
-
- This chapter describes how RISC OS Forthmacs implements the Forth virtual
- machine on the ARM processors. It assumes that you have a fairly good
- knowledge of conventional Forth implementations; it does not attempt to be a
- tutorial on how Forth works.
-
-
- Dialect
- =======
-
- RISC OS Forthmacs began as an implementation of the Forth-83 standard, with a
- few exceptions. It is now well on its way to being an ANS-compliant
- implementation. It remains largely compatible with the other implementations
- for Sun-68k, Sparc, Atari, Macintosh and OS-9 computers.
-
-
- Stack Width and Addressing
- ==========================
-
- In RISC OS Forthmacs, all stack items and memory cells are 32 bits wide;
- remember this when writing portable programs. Use the portable ANS operators
- such as cell+ and cells, or the RISC OS Forthmacs specific words such as /cell
- and cells+.
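-
- A minimal sketch of cell-size-independent code, using only the ANS words named
- above; the names samples and sample are hypothetical and chosen for
- illustration:
-     create samples  4 cells allot     \ room for four cells, whatever /cell is
-     : sample ( i -- addr )  cells samples + ;
-     10 0 sample !   20 1 sample !     \ store into the first two cells
-     1 sample @ .                      \ prints 20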
-
- The address space could conceivably grow to 2 to the 32nd power (4 gigabytes),
- but this is restricted by the current CPU/MMU versions to 16/256 MBytes. 16-bit
- or 2-byte memory accesses are no longer supported and must be emulated if
- necessary.
-
- Note: word accesses are simulated by two byte accesses; beware of
- interrupts occurring between them!
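-
- Where a 16-bit quantity has to be assembled by hand, something like the
- following sketch can be used. It assumes little-endian byte order; le-w@ and
- le-w! are hypothetical names, not the system's own word-access operators. The
- two byte accesses are not atomic with respect to interrupts.
-     : le-w@ ( addr -- u )   dup c@  swap char+ c@  8 lshift  or ;
-     : le-w! ( u addr -- )   over 255 and over c!
-                             swap 8 rshift 255 and  swap char+ c! ;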
-
- The current ARM MMUs don't support non-aligned memory accesses. NOTE: They
- don't abort or run any exception vector but just do something UNDEFINED and
- CPU core dependent. Take care over this; it took me hours to find a bug!
-
- All accesses must be one of:
-
- 1) byte-wide access to any address in the address area
-
- 2) cell-wide ( 32-bit ) access to any aligned address
-
- The word-wide (16-bit) access that is possible at least on StrongARM and ARM8
- CPUs is NOT supported by the RiscPC platforms. Special source extensions are
- available for single-board platforms.
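-
- One defensive option while debugging is to check alignment before a cell
- access. This is only a sketch in ANS Forth; aligned? and checked-@ are
- hypothetical names:
-     : aligned? ( addr -- flag )   dup aligned = ;
-     : checked-@ ( addr -- x )
-        dup aligned? 0= abort" unaligned cell access"  @ ;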
-
-
- Both stacks are pre-decrementing/post-incrementing. The parameter stack holds
- its top-of-stack item in the register top (r10); this allows much faster code
- definitions because of the CPU's load-and-store architecture.
-
-
- Register Usage
- ==============
-
-
-     r9    user area pointer          up
-     r10   top-of-stack register      top
-     r11   return stack pointer       rp
-     r12   instruction pointer        ip
-     r13   stack pointer              sp
-     r14   link register              lk
-     r15   pc + status + flags        pc
-     r15   flags part of r15          sr
-
- Note: The internal structure of the pc and flags registers differs between
- CPUs. It is generally better to think of the pc and the status register as
- two separate registers. The hardware-errors and the .registers instruction
- know about this.
-
- r0, r1, r2, r3, r4, r5, and r6 are available for use within code definitions.
- Don't try to use them for permanent storage, because they are used by many
- code words with no attempt to preserve the previous contents.
-
- Registers r7-r14 can be used within code definitions with great care, but you
- have to save and restore their values at the beginning/end of the definition.
-
-
- Inner <address> Interpreter
- ===========================
-
- The inner interpreter next is direct threaded and post-incrementing. The
- compilation address of every definition contains machine code to be executed,
- not a pointer. Each code definition ends with the next code, assembled
- in-line. The next code is:
-     pc ip )+ ldr
- This means: load the program counter pc ( without affecting the CPU status )
- from the 4-byte cell pointed to by the instruction pointer ip, then
- post-increment the instruction pointer. So next is a single CPU instruction
- and very fast. It is much faster than
-     address dolink branch
-     ...
-     pc link mov
- constructions because there is only one pipeline reload per next. On the other
- hand, there is definitely a larger overhead for calling secondaries.
-
- RISC OS Forthmacs versions >= 3.1/2.70 can switch to another next scheme. The
- first 8 cells in the user area are free for debugging purposes. SLOW-NEXT ( you
- find this in lib.arm.debugm ) patches all next as well as conditional next
- calls to
-     pc up mov
- Normally there is a normal next instruction at up@, but you may install any
- service routine there to do additional checking at run time. The new debugger
- uses this to branch into the debugger handler. After 'debugging' you switch
- back to the normal next with FAST-NEXT.
-
- If you want to use this scheme, there is one thing to remember. As SLOW-NEXT
- patches all instructions
-     pc ip )+ condition ldr
- your handler might be patched as well. In these cases you should use
-     pc 1 ip ia! ldm
- instead. This does the same - at least from RISC OS Forthmacs' point of view -
- but isn't patched. See lib.arm.debugm again for an example in the debugger.
-
- For discussions about subroutine threaded ( macro extended ) versus threaded
- code implementations see the Forth literature. Generally, macros do bring
- some advantage in execution speed but give less information about the code
- itself, so debuggers are less useful. The penalty for direct threaded code is
- hard to predict; it depends very much on the type of application. Something
- like 50% sounds reasonable, so optimising the bottlenecks could bring big
- advantages. The 'runtimer' utilities might help you do this.
-
- The assembler macro c; assembles the next instruction and ends assembly with
- end-code. A fast conditional next can be done by
-     ...
-     r2 0 cmp
-     eq next
-     ...
-
-
- Other Definitions
- =================
-
- Any word that is not a code definition contains a branch+link instruction at
- the code field. This makes a relative branch to an inline address and saves
- the pc+sr in the lk register.
-     runtime-addr dolink branch
- The inline address points to a code fragment (headerless in most cases) that
- implements the run-time action of the word. The parameter field starts just
- after this branch+link instruction and can be found by clearing the flags in
- the link register like this:
-     r0 lk th fc000003 # bic
-     r0 get-link
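-
- The mask th fc000003 covers the six bits at the top and the two bits at the
- bottom of the combined pc+sr value, leaving only the address bits. The same
- arithmetic in high-level Forth, as an illustration only ( link>pfa is a
- hypothetical name, and 32-bit cells are assumed ):
-     hex
-     : link>pfa ( lk -- pfa )   fc000003 invert and ;
-     6000a127 link>pfa u.       \ prints a124 ( in hex )
-     decimal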
-
- The run-time codes may have to push the top-register to the stack, save the
- return pointer to the return-stack and set the instruction or stack pointer to
- the parameter field address. All standard runtime codes (those of variables,
- constants, colon definitions, user variables ...) have been optimized for best
- cache-hit rates.
-
- Note: word-type ( cfa -- addr ) finds the address of the word's runtime code
- in this implementation.
-
-
- Colon definitions
- =================
-
- The runtime code:
-     mlabel docolon assembler
-     ip rp push
-     ip get-link c;
- The body of a Colon Definition starts 4 bytes after the compilation address.
- The body contains a list of compilation addresses of other words. Each such
- compilation address is a 32-bit number which is an absolute address.
-
-
- Variable
- ========
-
- The Parameter Field of a variable contains a 32-bit number which is the value
- of the variable. The runtime code:
-     mlabel dovariable assembler
-     top sp push
-     top get-link c;
-
-
- Constants
- =========
-
- The Parameter Field of a constant contains the 32-bit value of the constant.
- The runtime code:
-     mlabel doconstant assembler
-     top sp push
-     r0 get-link
-     top r0 ) ldr c;
-
-
- User Variables
- ==============
-
- The value of a user variable is stored in the user area as a 32-bit number.
- The Parameter Field of a user variable contains a 32-bit offset into the user
- area of the current task. The register defined symbolically as up in the
- assembler ( listed as r9 under Register Usage ) contains the base address of
- the current user area. The runtime code:
-     mlabel douser assembler
-     top sp push
-     r0 get-link
-     r0 r0 ) ldr
-     top r0 up add c;
-
-
- Deferred words
- ==============
-
- The compilation address of the word to be executed by a defer word is stored
- as a 32-bit absolute address in the user area. The Parameter Field of a
- deferred word contains a 32-bit number which is an offset into the user area
- of the current task. The runtime code:
-     mlabel dodefer assembler
-     r0 get-link
-     r0 r0 ) ldr
-     pc r0 up ib ldr end-code
- The last line holds a somewhat optimized next instruction. It means: load the
- pc from the address in the user area at the offset held in r0.
-
-
- ;code
- =====
-
- The compilation address of a word created by a create ... ;code data type
- construction contains the standard branch+link instruction that branches to
- the runtime code.
-
- The runtime code is defined by the programmer in the ;code part of the
- definition.
-
- In versions up to 3.1/2.62 ;code assembled two instructions for your
- convenience:
-     top sp push
-     top get-link
- This is no longer the case. I changed this to be more portable with the
- FirmWorks implementation, and I feel that all Forth programmers using ;code
- should be able to handle this.
-
-
- does>
- =====
-
-
-     mlabel dodoes assembler
-     ip rp push
-     ip get-link c;
- The runtime code is defined by the programmer in the does> part of the
- definition. Before branching to the dodoes code, the does> instruction
- assembles
-     top sp push
-     top lk th fc000003 # bic
- to get the parameter field address.
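-
- For comparison, a defining word built with does> in high-level Forth. This is
- a generic ANS sketch; the names offset-by and +10 are made up for the example:
-     : offset-by ( n "name" -- )   create ,      \ n goes into the parameter field
-        does> ( x pfa -- x+n )     @ + ;         \ pfa is pushed before this code runs
-     10 offset-by +10
-     5 +10 .                       \ prints 15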
-
-
- local variables
- ===============
-
- RISC OS Forthmacs has built-in ANS Forth conforming local variables, which
- spend their lifetime on the return stack in stack frames. The stack frames are
- linked via a user variable LOCAL-FRAME, which is also used to locate a local
- variable's value. The frame structure is like:
-     | cfa:frame> | old-frame | old-rs | loc | loc | .........
- with cfa:pop-frame on top of the return stack. pop-frame removes the current
- frame and switches to the previous frame.
-     headerless code pop-frame \ this routine is pushed on return stack by push-locals
-     here /cell+ token,
-     r0 rp 2 rp ia ldm
-     r0 'user local-frame str
-     ip rp pop c;
-
- The local variables are accessed using (loc) followed by a stack-frame index.
-     code (loc) \ ( -- n ) runtime-code of any local
-     r0 'user local-frame ldr
-     r1 ip )+ ldr
-     top sp push
-     top r0 r1 2 #asl db ldr c;
-
- Note: The decompiler cannot know the local variables' names, so it assumes
- names like ( v0 v1 ...).
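-
- From the programmer's side, locals are declared with the ANS Locals wordset. A
- small sketch, assuming the standard locals| syntax is available in your
- version ( sum-squares is a hypothetical name ):
-     : sum-squares ( a b -- n )
-        locals| b a |           \ b takes the top stack item, a the one below
-        a a *  b b *  + ;
-     3 4 sum-squares .          \ prints 25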
-
-
- Tokens
- ======
-
- Within the body of a colon definition, calls to other Forth words are compiled
- as the 32-bit absolute compilation address of those words. These tokens have
- a corresponding bit in the relocation table.
-
-
- Branching
- =========
-
- Branch targets are offsets relative to the location that contains the branch
- offset. They are stored as 32-bit twos-complement numbers representing the
- number of bytes between the offset location and the branch target. For
- example, a branch to the following location could be compiled with:
-
- postpone branch 4 ,
-
- NOTE: This is implemented differently in version 3.1/2.00. The relative offset
- is replaced by an immediate absolute relocated address.
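-
- The offset arithmetic for the relative scheme can be written out as follows.
- This only illustrates the calculation ( branch-offset is a hypothetical name )
- and is not how the compiler itself is coded:
-     : branch-offset ( target offset-addr -- n )   - ;
-     hex  8104 8100 branch-offset .  decimal      \ prints 4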
-
-
- Doubles
- =======
-
- RISC OS Forthmacs versions newer than 1.83 have full double number support.
- All conversion tools ( convert number? d. ) use doubles, and the 'scaling'
- words */ */mod um/mod use double intermediate results.
-
- The text interpreter and compiler also accept a literal as a double when there
- is a period at the end of it.
-     : test 1234. d. ;
- 1234. is a double number and d. displays it.
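-
- A short sketch of what the double intermediate result buys you. The numbers
- are chosen so that the product overflows 32 bits while the final result does
- not:
-     1000000 1000000 1000 */ .      \ prints 1000000000
-     1234. 2. d+ d.                 \ prints 1236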
-
- This could only be achieved by changing the stack effects of a number of words.
- So these new RISC OS Forthmacs versions are no longer compatible where these
- words are used. The lib.compatible tool does not cover these changes.
-
- The advantages of the new stack behaviour are its ANS compliance and the
- improved arithmetic capabilities.
-
-
- Floats
- ======
-
- RISC OS Forthmacs versions newer than 3.1/2.13 have the ANS Floating and
- Floating Extended wordsets included. There isn't any further documentation
- available so far, please use the ANS docs for this purpose.
-
-
- StrongARM compatibility
- =======================
-
- Versions from 3.1/2.30 run on StrongARM based machines, but optimized code is
- available only from 3.1/2.40 onwards.
-
-
- Cache
- =====
-
- The newer ARM based CPUs ( ARM8, StrongARM ) have a different cache structure
- from the older versions. Separate instruction and data caches are used, and
- the code must be synchronized after any change to the code space.
-
- flush-cache and sync-cache are both implemented in current RISC OS Forthmacs
- versions >3.1/2.40 in such a way that the compiler is not significantly
- slowed down; in fact, compilation on a StrongARM is much faster than on the
- older ARM710.
-
-
- Header format - # of bytes in parentheses
- =========================================
-
- Source Field (4), Link Field (4), Name Field (n), Padding (0 to 3), Flags (1),
- Code Field (4), Parameter Field (n).
-
- As all addresses need to be, the Link Field, Name Field, and Code Field are
- all aligned.
-
- Links point to links ( not to Name Fields, as in FIG Forth! )
-
- The name field is a normal Forth packed string. (Many Forth implementations
- set the high bit in the first and last characters of the name field;
- RISC OS Forthmacs does not).
-
- Name Field: length-byte, 0-31 character name.
-
-
-
- Vocabularies
- ============
-
- Vocabularies have #threads-way hashing. This means that each vocabulary has
- 16 separate linked lists of words. The threads are stored in the user area.
- The Parameter Field of a vocabulary contains the 32-bit offset of the threads
- in the user area, followed by the vocabulary-link, a 32-bit pointer to the
- previous vocabulary. The runtime high-level code is:
-     does> body> context token!
-
- Before searching a vocabulary, a hashing function is applied to the name to be
- located. The hashing function selects one of the 16 linked lists to search.
-
- The hashing function is very simple. The lower 4 bits of the first character
- in the name (the first name character, not the length byte) are interpreted as
- a number from 0 to 15, selecting a linked list.
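-
- The selection step can be written out in high-level Forth as below. This is a
- sketch of the rule as described, not the system's own code; thread# and
- .thread are hypothetical names:
-     : thread# ( c-addr u -- n )   drop c@ 15 and ;   \ low 4 bits of first character
-     : .thread ( -- )   s" dup" thread# . ;
-     .thread                                          \ prints 4, since d is hex 64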
-
- Vocabularies are not chained to one another. Search order is implemented
- using the also / only scheme. Each vocabulary thread is terminated with a
- special link field in the final word. The special link address is the address
- of the origin of the Forth system (which may change from session to session
- due to the relocation that the operating system applies when loading and
- executing the Forth system).
-
- The parameter field for a vocabulary looks like:
-
- User number (/cell), Voc-link (/cell)
-
- The user number selects the place in the user area where the head-of-list
- pointers for the 16 vocabulary threads are stored. Each vocabulary requires
- 16 cells of user area storage for these 16 threads. The values stored
- in the user area are the Link Field addresses for the top word in each thread.
-
-
- Relocation
- ==========
-
- In the RISC OS environment all programs of the absolute type are loaded at
- $8000 and executed from there. So on first sight the relocation table doesn't
- make much sense in this version if you don't care about being portable to
- other RISC OS Forthmacs implementations.
-
- But the relocation table can be used for target/meta-compiling or for
- relocating code during run-time. This is necessary for producing turnkey
- applications with an 'Application Stripper'.
-
- If the program is not relocatable, then a file saved with save-forth will work
- only if it is later executed at the same address where it was executing when
- it was saved. If saved with the application stripper, a program that is not
- relocatable will not work at all, regardless of the address where it is later
- executed. Consequently, use of the application stripper requires strict
- adherence to the rules of relocatability.
-
- In most cases, the relocation bitmap is maintained automatically, without
- requiring any special effort on the part of the programmer. However, there
- are some cases where the programmer must take explicit actions to ensure that
- the program is relocatable.
-
- The executable file contains a relocation list used to identify the locations
- in the program's binary image which contain absolute addresses. When the
- program is loaded, each of these locations is modified by adding the starting
- address of the program to the number contained in that location. Only 32-bit
- numbers may be so modified.
-
- While RISC OS Forthmacs is running, it maintains its own relocation table,
- identifying those locations in the Forth dictionary which must be relocated
- during cold-code. Each bit in the map represents the address of one aligned
- location. This relocation table is completely different from the standard
- RISC OS relocation tables; it is only used from within RISC OS Forthmacs.
-
- In order for this to work properly, the programmer must be careful to use
- token, A, link, token!, A! or link! to store an address or token in the
- dictionary; all six set the relocation flags.
-
- Addresses may be stored into variables with ! ( without requiring the use of
- token! ) if the variable is re-initialized every time that the application is
- started. token! is only necessary if the variable's value must be set before
- save-forth is executed and is then used when the saved application is later
- invoked, without being re-initialized by the application's initialization
- code.
-
- If , or ! is used instead, the address will not be properly relocated if
- save-forth has been used to write the dictionary image to an executable file.
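-
- A sketch of the difference, using only words named in this section. The stack
- effect assumed for token, is ( cfa -- ), by analogy with , and the names
- op-table, last-op and init-ops are hypothetical:
-     create op-table   ' + token,   ' - token,   \ relocation bits set, survives save-forth
-     variable last-op
-     : init-ops ( -- )   ['] + last-op ! ;         \ plain ! is enough if this runs at every start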
-
- Note: The lib/checkrel.fth program can help you catch relocation problems in
- your applications. It should be loaded before you load your application, and
- will warn you if your application does things that may not be relocatable.
- After you have fixed the relocation problems, you can load your application
- without lib/checkrel.fth .
-
-
- See: .buffers .pointers token! token, A! A, link! link, set-relocation-bit
- relocation-map
-
-
- Program header
- ==============
-
- The header of the executable binary image looks like this:
-
-     h_magic  (  0)  \ Magic Number
-     h_tlen   (  4)  \ length of text (code)
-     h_dlen   (  8)  \ length of initialised data
-     h_blen   (  c)  \ length of BSS uninitialised data
-     h_slen   ( 10)  \ length of symbol table
-     h_entry  ( 14)  \ Entry address
-     h_trlen  ( 18)  \ Text Relocation Table length
-     h_drlen  ( 1c)  \ Data Relocation Table length
-
- The magic number is the branch+link instruction just behind this header.
- Note: this header might be changed in future releases to follow Acorn's
- executable binary code standard.
-
-
- Heap memory
- ===========
-
- RISC OS Forthmacs is loaded to $8000 and will have as much memory available as
- was defined by 'WimpSlot' .
-
- The main task's user area immediately follows the first instructions and some
- permanent data at $8040.
-
- $600 bytes will be allocated in the module heap RMA; it will hold the env-area,
- the command-line area plus all handlers used by shelled programs.
-
- The implementation of the dynamic memory manager changed in version
- 3.1/2.00. Since then the dictionary and the heap have shared the same memory
- area; the dictionary grows from lower addresses, and the heap can be as large
- as the area between the stacks and here.
-
- Note: Of course you may install another memory manager or add more heaps.
-
-
- Dictionary memory
- =================
-
- At the top of the dictionary are both stacks, defined by rp0 - rs-size and sp0
- - ps-size, and the tib. Below this are MBytes of free memory (well, hopefully).
- here marks the end of the allocated dictionary; classically, pad is here plus
- something.
-
- RISC OS Forthmacs knows about two dictionary areas, the resident (which is the
- dictionary you know in all implementations) and the transient. The transient
- dictionary is in the heap memory; definitions made there won't use
- dictionary space in the target application. So it might be useful to do:
-     transient
-     fload assembler
-     fload debugger
-     resident
-     fload myapplication
- Now the debugger and assembler will be in transient address space. To remove
- all links, pointers etc. into the transient address space, use dispose; it
- will do this for you. .dispose will also give some information about what is
- removed while executing dispose.
-
-