home *** CD-ROM | disk | FTP | other *** search
-
-
- Forthmacs Implementation
- ************************
-
- This chapter describes how Risc-OS Forthmacs implements the Forth
- virtual machine on the ARM processors. The chapter assumes that you
- have a fairly good knowledge of conventional Forth implementations; it
- does not attempt to be a tutorial on how Forth works.
-
-
- Dialect
- =======
-
- Risc-OS Forthmacs is an implementation of the Forth-83 standard, with a
- few exceptions. It is a descendent of the public-domain F83
- implementation by Laxen and Perry, and contains most of the F83
- extensions, as well as many new ones. It is compatible with the other
- implementations for Sun-68k, Sparc, Atari, Macintosh and OS-9 computers.
-
-
- Stack Width and Addressing
- ==========================
-
- Risc-OS Forthmacs deviates from the 83 Standard in the width of the
- stack. Forth 83 specifies that stack items are 16-bit numbers, and that
- the address space is 64K. This wastes much of the power of modern CPUs
- and is almost impossible to implement on ARM based computers.
-
- In Risc-OS Forthmacs, all stack items as well as memory cells are 32-bit
- wide, remember this when writing portable programs.
-
- The address could conceivably grow to 2 to the 32nd power (4 gigabytes),
- but this is restricted by the current CPU/MMU versions to 16 MBytes.
- 16-bit or 2-byte memory accesses are not supported any longer and must
- be emulated if necessary.
-
- Note: word accesses are simulated by two byte accesses, take care about
- interrupts occurring here!
-
- The current ARM MMUs don't support non-aligned memory accesses. NOTE:
- They don't abort or run any exception vector but just do something not
- clearly defined and CPU dependent. Take care of this, it took me hours
- to find a bug!
-
- All accesses must be one of:
-
- 1) byte-wide access to any address in the address area
-
- 2) cell-wide ( 32-bit ) access to any aligned address
-
-
- Both stacks are pre-decrementing/post-incrementing. The parameter stack
- holds its top-of-stack in the top-register r10, this allowes much faster
- code definitions because of the CPUs load-and-store architecture.
-
-
- Register Usage
- ==============
-
-
- r7 rx reserved for later usage
- r8 instruction pointer ip
- r9 user area pointer up
- r10 top-of-stack register top
- r11 returnstack pointer rp
- r12 Risc-Os frame pointer fp never use this!
- r13 stack pointer sp
- r14 link register lk
- r15 pc + status + flags pc
- r15 sr sr hold the flags part of r15
-
- Note: In future CPU Versions the internal structure of the pc-register
- might be different, it seems to be better, to imagine pc and status
- register as two registers. The hardware-errors and the `.registers`
- instruction know about this.
-
- r0, r1, r2, r3, r4, r5, and r6 are available for use within code
- definitions. Don't try to use them for permanent storage, because they
- are used by many code words with no attempt to preserve the previous
- contents.
-
-
- Inner <address> Interpreter
- ===========================
-
- The inner interpreter `next` is direct threaded, post incrementing. The
- compilation address of all definitions contain machine code to be
- executed, not a pointer. Each `code` definition ends with the `next`
- code, assembled in-line. The `next` code is:
- pc ip )+ ldr
- This means: Load the programm-counter `pc` ( don't affect the CPU status
- ) from the 4-byte cell pointed to by the instruction pointer ip,
- postincrement the instruction-pointer. So the `next` is only one CPU
- instruction and very fast. It is much faster than
- address dolink branch
- ...
- pc link mov
- constructions because of only one pipeline reload per `next.` But on the
- other hand, there is definitly a larger overhead for calling
- secondaries.
-
- For discussions about subroutine threaded ( macro extended ) versus
- threaded code implementations see the Forth literature. Generally
- macros bring some advantage in execution speed but give less information
- about the code itself, so debuggers are less usefull. The penalty for
- direct threaded code is very hard to predict, it depends very much on
- the type of application. Something like 50% sounds reasonable, so
- optimizing the bottlenecks could bring large advantages. The 'runtimer '
- utilities might help you doing this.
-
- The assembler macro `c;` assembles the `next` instruction and ends
- assembling by `end-code.` A fast conditional next can be done by
- ...
- r2 0 cmp
- eq next
- ...
-
-
- Other Definitions
- =================
-
- Any word that is not a code definition contains a branch+link
- instruction at the code-field, this makes a relative branch to an
- inline-address and saves the pc+sr to the lk register.
- runtime-addr dolink branch
- The inline address points to a code fragment (headerless in most cases)
- that implements the run-time action of the word. The parameter field
- starts just after this branch+link instruction and can be found by
- clearing the flags in the link register like this:
- r0 lk th fc000003 # bic
- r0 get-link
-
- The run-time codes may have to push the top-register to the stack, save
- the return pointer to the return-stack and set the instruction or stack
- pointer to the parameter field address. All standard runtime codes
- (those of variables, constants, colon definitions, user variables ...)
- have been optimized for best cache-hit rates on ARM3/6 machines.
-
- Note: `word-type` ( cfa -- addr ) finds the address of the words runtime
- code in this implementation.
-
-
- Colon definitions
- =================
-
- The runtime code:
- mlabel docolon assembler
- ip rp push
- ip get-link c;
- The body of a Colon Definition starts 4 bytes after the compilation
- address. The body contains a list of compilation addresses of other
- words. Each such compilation address is a 32-bit number which is an
- absolute address.
-
-
- Variable
- ========
-
- The Parameter Field of a `variable` contains a 32-bit number which is
- the value of the variable. The runtime code:
- mlabel dovariable assembler
- top sp push
- top get-link c;
-
-
- Constants
- =========
-
- The Parameter Field of a `constant` contains the 32-bit value of the
- constant. The runtime code:
- mlabel doconstant assembler
- top sp push
- r0 get-link
- top r0 ) ldr c;
-
-
- User Variables
- ==============
-
- The value of a `user` variable is stored in the `user` area as a 32-bit
- number. The Parameter Field of a user variable contains a 32-bit offset
- into the user area of the current task. r8 contains the base address of
- the current user area. r8 is symbolically defined as `up` in the
- assembler. The runtime code:
- mlabel douser assembler
- top sp push
- r0 get-link
- r0 r0 ) ldr
- top r0 up add c;
-
-
- Deferred words
- ==============
-
- The compilation address of the word to be executed by a `defer` word is
- stored as a 32-bit absolute address in the `user` area. The Parameter
- Field of a deferred word contains a 32-bit number which is an offset
- into the user area of the current task. The runtime code:
- mlabel dodefer assembler
- r0 get-link
- r0 r0 ) ldr
- pc r0 up ib ldr end-code
- The last line holds a somewhat optimized `next` instruction, it means:
- Load the pc from the address in the user area with the offset r0.
-
-
- ;code
- =====
-
- The compilation address of a word created by a `create` ... `;code`
- data type construction contains the standard branch+link instruction
- that branches to the runtime code.
-
- The runtime code is defined by the programmer in the `;code` part of the
- definition. At first the `;code` instruction assembles
- top sp push
- top get-link
- before the programmers code. This is mainly for convenience, so the top
- is already saved to the stack and points to the parameter field. The
- programmer might do '-2 /n* allot ' to forget this for speed optimized
- code.
-
-
- does>
- =====
-
-
- mlabel dodoes assembler
- ip rp push
- ip get-link c;
- The runtime code is defined by the programmer in the `does>` part of the
- definition. Before branching to the dodoes code, the does> instruction
- assembles
- top sp push
- top lk th fc000003 # bic
- to get the parameter field address.
-
-
- local variables
- ===============
-
- Risc-OS Forthmacs has built in ANS Forth conforming local variables
- spending their lifetime on the return-stack in stack-frames. The
- stack-frames are linked via a `user` variable LOCAL-FRAME which is also
- used to locate a local variables value. The frame structure is like:
- | cfa:frame> | old-frame | old-rs | loc | loc | .........
- with cfa:pop-frame on top of the return-stack. pop-frame removes the
- current frame and switches to the last frame.
- headerless code pop-frame \ this routine is pushed on return stack by push-locals
- here-t /token-t + token,-t
- r0 rp 2 rp ia ldm
- r0 'user local-frame str
- ip rp pop c;
-
- The local variables are accessed using (loc) followed by an stack frame
- index.
- code (loc) \ ( -- n ) runtime-code of any local
- r0 'user local-frame ldr
- r1 ip )+ ldr
- top sp push
- top r0 r1 2 #asl db ldr c;
-
- Note: The disassembler can `not` know the local variables names, so it
- assumes names like V0 V1 ... .
-
-
- Vocabularies
- ============
-
- Each `vocabulary` has `#threads` ( currently 16 ) 32-bit pointers which
- are called "threads". A thread is the head of a linked list of words.
- A hashing function selects which of the 16 linked lists a particular
- word belongs to. The threads are stored in the `user` area. The
- Parameter Field of a `vocabulary` contains the 32-bit offset of the
- threads in the `user` area, followed by the vocabulary-link, a 32-bit
- pointer to the previous `vocabulary.` The runtime high-level code is:
- does> body> context token!
-
-
- Tokens
- ======
-
- Within the body of a colon definition, calls to other Forth words are
- compiled as the 32-bit absolute compilation address of those words.
- These tokens have a corresponding bit in the relocation table.
-
-
- Branching
- =========
-
- Branch targets are offsets relative to the location that contains the
- branch offset. They are stored as 32-bit twos-complement numbers
- representing the number of bytes between the offset location and the
- branch target. For example, a branch to the following location could be
- compiled with:
-
- compile branch 4 ,
-
-
- Header format - # of bytes in parentheses
- =========================================
-
- Source Field (4), Link Field (4), Name Field (n), Padding (0 to 3),
- Flags (1), Code Field (4), Parameter Field (n).
-
- As all addresses must be, the Link Field, Name Field, and Code Field are
- all aligned.
-
- Links point to links ( `not` to Name Fields, as in FIG Forth! )
-
- The name field is a normal Forth packed string. (Many Forth
- implementations set the high bit in the first and last characters of the
- name field; Risc-OS Forthmacs does not).
-
- Name Field: length-byte, 0-31 character name.
-
-
-
- Vocabulary Format
- =================
-
- Vocabularies have 8-way hashing. This means that each vocabulary has 16
- separate linked lists of words. Before searching a vocabulary, a
- hashing function is applied to the name to be located. The hashing
- function selects one of the 8 linked lists to search.
-
- The hashing function is very simple. The lower 3 bits of the first
- character in the name (the first name character, not the length byte)
- are interpreted as a number from 0 to 7, selecting a linked lists.
-
- Vocabularies are not chained to one another. Search order is
- implemented using the `also` / `only` scheme. Each vocabulary thread is
- terminated with a special link field in the final word. The special
- link address is the address of the origin of the Forth system (which may
- change from session to session due to the relocation that the operating
- system applies when loading and executing the Forth system.
-
- The parameter field for a vocabulary looks like:
-
- User number (4), Voc-link (4)
-
- The user number selects the place in the user area where the head of
- list pointers for the 4 vocabulary threads are stored. Each vocabulary
- requires 8 bytes of user area storage for these 4 threads. The values
- stored in the user area are the Link field Addresses for the top word in
- each thread.
-
-
- Relocation
- ==========
-
- In the RISC-OS environment all programs of the absolute type are loaded
- at $8000 and executed from there. So on first sight the relocation
- table doesn't make much sense in this version.
-
- But the relocation table can be used for target/meta-compiling or for
- relocating code during run-time.
-
- The executable file contains a relocation list used to identify the
- locations in the program's binary image which contain absolute
- addresses. When the program is loaded, each of these locations is
- modified by adding the starting address of the program to the number
- contained in that location. Only 32-bit numbers may be so modified.
-
- While Risc-OS Forthmacs is running, it maintains its own relocation
- table, identifying those locations in the Forth dictionary which must be
- relocated during COLD-CODE. Each bit in the map represents the address
- of one aligned location. This relocation table is completely different
- from the standard Risc OS relocation tables, it is only used from within
- Risc-OS Forthmacs.
-
- In order for this to work properly, the programmer must be careful to
- use `token,` or `token!` to store an address in the dictionary, both set
- the relocation flags. If , or ! is used instead, the address will not
- be properly relocated if `save-forth` has been used to write the
- dictionary image to an executable file.
-
- See: `token!` `token,` `set-relocation-bit,` `relocation-map`
-
-
- Program header
- ==============
-
- The header of the executable binary image looks like this:
-
- h_magic ( 0) \ Magic Number
- h_tlen ( 4) \ length of text (code)
- h_dlen ( 8) \ length of initialized data
- h_blen ( c) \ length of BSS unitialized data
- h_slen ( 10) \ length of symbol table
- h_entry ( 14) \ Entry address
- h_trlen ( 18) \ Text Relocation Table length
- h_drlen ( 1c) \ Data Relocation Table length
-
- the magic number is the branch+link instruction just behind this header.
- Note: this header might be changed with future releases according to
- Acorns executable binary code standard.
-
-
- Heap memory
- ===========
-
- Risc-OS Forthmacs is loaded to $8000 and will have as much memory
- available as was defined by 'WimpSlot' .
-
- The `main-tasks` user area immediatly follows the first instructions at
- $8050.
-
- $600 byte will be allocated in module-heap, it will hold the ENV-AREA,
- the command-line area, the `interrupt-code` at TICKER plus all handlers
- used by shelled programs.
-
- At the top of the programs private memory is the programs heap, it's
- default size is 64kB but can be set to any possible size by setting '
- Forthmacs$heapsize' (see "General Information"). All `buffer:s` will be
- allocated during bootup.
-
- The bottom of the heap is `main-heap,` it's size is `heap-size.` All
- dynamic memory handling words as `allocate,` `free,` `available` and `
- resize` all handle memory inside this heap. Note: Of course you may
- install another memory manager or add more heaps.
-
-
- Dictionary memory
- =================
-
- At the top of the dictionary are both stacks defined by `rp0` - RS-SIZE
- and `sp0` - `ps-size` and the `tib,` below this are MBytes of free
- memory (well, hopefully). `here` marks the end of the allocated
- dictionary, classically `pad` is `here` plus something.
-
- Risc-OS Forthmacs knows about two dictionary areas, the `resident`
- (which is the dictionary you know in all implementations) and the `
- transient.` The transient dictionary is in the heap memory, definitions
- defined here won't use dictionary space in the target application. So
- it might be useful to do:
- transient
- fload assembler
- fload debugger
- resident
- fload myapplication
- Now the debugger and asembler will be in transient address space. To
- remove all links, pointers etc. into the transient address space use `
- dispose,` it will do this for you. .DISPOSE will also give some
- informations what is removed.
-
-