- Chapter 13 L.O.V.E. FORTH
-
-
-
- 13.0 Third Party Assembler Interface and Linker
- ------------------------------------------
-
- Traditionally, Forth systems have included a "Forth assembler".
- Adding assembler components to a high-level language can produce
- dramatic improvements in performance and capability over high-level Forth.
- Unfortunately, these assemblers are usually written in Forth and have
- serious limitations. Often their syntax is markedly different from the
- standard syntax for the particular processor. Most programmers find it
- difficult enough to work in normal assembler syntax, without having to
- learn a new one.
-
-
- L.O.V.E. Forth has been designed to use virtually any third party
- assembler, with standard assembler syntax. Whenever CODE, ;CODE or
- ASM is encountered, Forth calls the third party assembler to process
- the word, and then links in the resulting object file with a built-in
- linker. This means that not only can normal syntax be used in words
- written by the programmer, but assembly language program sections from
- other sources can also be included with little or no modification.
-
- The authors recommend the excellent assembler A86 by Isaacson, also
- available as shareware. The original L.O.V.E. Forth RPN assembler is
- included with the system as source code, for use if desired.
-
-
- 13.1 Operation
- ---------
-
- A small amount of set-up is required to configure the system. The
- authors have already included configuration files for A86, Microsoft's
- MASM and Borland's TASM (see 13.11 Assembler Set-up below). For simple
- code words, like those supported by the old RPN assemblers, use is
- straightforward. For example, here is a word that makes four copies of
- the top of the stack:
-
- CODE DUP4 ; ( n -- n,n,n,n )
- pop ax
- push ax ; push some copies
- push ax
- push ax
- push ax
- next c;
- The operation NEXT above is a pre-defined macro.
-
- Another powerful feature of this facility is the use of declarations
- in the assembly code. Not only machine code can be assembled, but also
- any other kind of dictionary content, including threads, heads and data.
- Words can be defined using PUBLIC, and existing words can be referenced
- with EXTRN. These declarations are all interpreted by the linker portion
- of this interface.
-
-
- 13.2 Errors during assembly
- ----------------------
-
- If the assembler fails to produce an object file, an error message
- is displayed, and compilation is aborted. The programmer must then examine
- the error or listing file mentioned in the error message, in order to
- determine the problem. The file containing the code to assemble is usually
- called CODE-4TH.ASM, and the file with the errors is usually named
- CODE-4TH.ERR or CODE-4TH.LST.
-
-
-
- 13.3 SEGMENT Declarations
- --------------------
-
- The linker supports several reserved segment and class names for
- directing code into the various segments. These are: 'CODE',
- 'THREADS', 'DATA', 'HEADS' and 'STACKS'. These reserved names can
- be used either as segment names (most common) or as class names. When
- used as segment names, any class name then specified is ignored.
-
- The following segments are declared automatically for the
- programmer at the beginning of each assembly. The programmer need only
- switch between them (e.g. HEADS SEGMENT is sufficient to switch to
- heads, without all the other parts of the declaration).
-
- code segment byte public 'CODE'
- code ends
- threads segment word public 'THREADS'
- threads ends
- data segment byte public 'DATA'
- data ends
- heads segment byte public 'HEADS'
- heads ends
- stacks segment byte public 'STACKS'
- stacks ends
-
- The code segment is the default, if no other is specified, allowing
- simple words to assemble with no declarations whatsoever. There is a
- statement CODE SEGMENT automatically inserted before the assembler
- statements, and the statements CODE ENDS and END after the end of the
- assembler word. The directive:
- ASSUME CS:CODE, DS:CODE, ES:CODE
- is also inserted, so no segment overrides will be inserted by the
- assembler, unless the programmer explicitly includes them.
-
-
- 13.4 Origins
- -------
-
- When any segment is declared in an assembler, the origin is assumed
- to be 0. This is fine when all of the code being dealt with is produced
- by the assembler, and the programmer is in complete control. Here,
- however, the code must be loaded on top of an existing program,
- L.O.V.E. Forth itself. Therefore, the origins follow a slightly
- different pattern.
-
- When a reserved name is used as a segment name, the real segment
- origin is at 0000 in the corresponding L.O.V.E. Forth segment. The origin
- (if any) given by the programmer is incremented by HERE (or CS:HERE,
- TS:HERE, etc.) before the code is loaded in. This ensures that no
- existing areas of memory are overwritten. The alignment attribute is not
- meaningful for standard segments, since they already start on a boundary
- that satisfies byte, word, paragraph and page alignment.
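- The origin rule for reserved segment names comes down to a single
- addition. The Python sketch below models it. The function name
- load_offset and the HERE value 3A10h are hypothetical, and the sketch
- does not check the 64K limit of a real segment.

```python
# Hypothetical model of the origin rule for reserved segment names:
# the origin recorded in the object file is incremented by the current
# dictionary pointer of that segment (HERE, CS:HERE, TS:HERE, ...).
# The 64K segment limit is deliberately not checked in this sketch.

def load_offset(object_origin: int, here: int) -> int:
    """Return the offset in the L.O.V.E. Forth segment where the byte lands."""
    return here + object_origin

# The first byte of a code word (origin 0) lands exactly at CS:HERE.
print(hex(load_offset(0x0000, here=0x3A10)))  # 0x3a10
# An explicit ORG 10h in the source lands 10h bytes above HERE.
print(hex(load_offset(0x0010, here=0x3A10)))  # 0x3a20
```

- Because every reserved-name origin is shifted by HERE, an assembler
- that always starts at 0 still cannot overwrite the existing dictionary.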
-
-
- Should the programmer desire an origin of 0 in the segment being
- declared, a different (unreserved) name should be used. In this case,
- the linker uses the class name to decide where in memory to load the
- code. If no class name is specified, the code is loaded into the CODE
- segment. The alignment type may be specified if desired; the combine
- type is ignored.
-
-
- 13.5 SEGMENT Examples
- ----------------
-
- The most common declaration is:
- CODE SEGMENT
- which causes the code following it to be placed in the code
- segment. The origin coming in from the object file (normally
- 0 for the first code in that segment) is incremented by the
- dictionary pointer. Therefore the ORG is forced to be CS:HERE.
-
- Another more complex example is:
- MYTHREADS SEGMENT WORD PUBLIC 'THREADS'
- which causes the following code to be loaded into the thread
- segment. The origin is relative to the start of this declared
- segment.
-
- MYSEG SEGMENT
- Code/data in this segment has its own origin of 0.
- If grouped, however, it has an offset within 64K of the
- start of the group. It is placed in RAM in one of the
- standard segments (in this case the code segment).
-
- THREADS SEGMENT byte public 'code'
- The segment and class conflict - in this case, the class is
- ignored.
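- The placement rules of sections 13.3 to 13.5 can be summarised as a
- small decision function. This Python sketch is only a model. The name
- target_segment is hypothetical, class names are passed without their
- quotes, and names are compared case-insensitively (an assumption). An
- unreserved class name is treated like a missing one, which is also an
- assumption, since the text does not cover that case.

```python
# Hypothetical model of how the linker picks the standard L.O.V.E. Forth
# segment that receives the contents of a SEGMENT declaration.
from typing import Optional

RESERVED = {"CODE", "THREADS", "DATA", "HEADS", "STACKS"}

def target_segment(segment_name: str, class_name: Optional[str] = None) -> str:
    """Return the standard segment into which the declared segment is loaded."""
    seg = segment_name.upper()          # case-insensitivity is an assumption
    if seg in RESERVED:
        # A reserved segment name wins; any class name is ignored.
        return seg
    if class_name is not None and class_name.upper() in RESERVED:
        # An unreserved segment name is directed by its class name.
        return class_name.upper()
    # No class name given (or, by assumption, an unreserved one): CODE.
    return "CODE"

# The four declarations from the examples above:
print(target_segment("CODE"))                   # CODE
print(target_segment("MYTHREADS", "THREADS"))   # THREADS
print(target_segment("MYSEG"))                  # CODE
print(target_segment("THREADS", "code"))        # THREADS
```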
-
-
- 13.6 GROUP Declaration
- -----------------
-
- The programmer may declare any group that does not combine different
- L.O.V.E. Forth segments (this is impossible, because they are 64K or
- more apart). A segment may be part of only one group.
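- The two GROUP restrictions can be expressed as a simple check. This
- Python sketch is hypothetical: it assumes that the standard segment each
- segment is loaded into is already known, and the group and segment names
- (G1, G2, MYTAB and so on) are invented for illustration.

```python
# Hypothetical check of the GROUP rules: a group may not combine segments
# that are loaded into different standard L.O.V.E. Forth segments, and a
# segment may belong to only one group.

def check_groups(groups: dict[str, list[str]],
                 placement: dict[str, str]) -> list[str]:
    """Return the rule violations found (an empty list means the groups are legal).

    groups    maps each group name to the segments declared in it
    placement maps each segment name to the standard segment it is loaded into
    """
    errors: list[str] = []
    owner: dict[str, str] = {}
    for group, members in groups.items():
        for seg in members:
            if seg in owner:
                errors.append(f"{seg} is in both {owner[seg]} and {group}")
            else:
                owner[seg] = group
        targets = sorted({placement[seg] for seg in members})
        if len(targets) > 1:
            errors.append(f"{group} spans standard segments {targets}")
    return errors

placement = {"MYSEG": "CODE", "MYTAB": "CODE", "MYTHREADS": "THREADS"}
print(check_groups({"G1": ["MYSEG", "MYTAB"]}, placement))   # []
print(check_groups({"G2": ["MYSEG", "MYTHREADS"]}, placement))
```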
-
- EXTRN declarations
-
- The address or value of existing Forth words may be referenced in
- the assembler code using the EXTRN declaration. Since words in
- L.O.V.E. Forth have several parts, the address of each part can be
- obtained by adding a special prefix to the name desired. The prefixes
- are resolved by the linker.
-
- Prefix               Segment    Purpose
-                      Register
- CODE@ (or none)      CS         address of machine code
- THREADS@             DS         compilation address
- DATA@                ES         parameter field address
- HEADS@               n/a        name field address
- IMMEDIATE@           n/a        special: executes the following word
-                                 at link-time to obtain its value
-
- For example:
-
- EXTRN CODE@COUNT:NEAR, DATA@TIB:BYTE, IMMEDIATE@HERE:ABS
-
- MOV BYTE PTR ES:DATA@TIB, 0DH ; install carriage return
- ADD AX,IMMEDIATE@HERE ; add HERE
- JMP CODE@COUNT ; exit via a forth word
-
-
- If the word appears without a prefix or if CODE@ is in front of the
- word, then the address of the related machine code is returned. This is
- the same as is returned with 'CODE . Similarly THREADS@ returns the
- compilation address of the following word. The most useful prefix is
- perhaps DATA@ which returns the parameter field address, the address
- returned by a VARIABLE or other word created by CREATE. HEADS@
- returns the name field address. This is relative to the head segment, the
- actual value of which can be obtained from the label HSEG (see Frame
- Fixups below).
-
- The prefix IMMEDIATE@ executes a word at link-time. This is
- typically a CONSTANT whose value is required, or a VARIABLE whose
- address is required in assembly code (e.g. IMMEDIATE@BL). It can be any
- word that returns a single cell on the stack. If HERE or the other
- dictionary values are referenced, they return the values they had before
- linking.
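- How the linker separates a prefix from the Forth word name can be
- sketched in Python as follows. The name split_extrn and the
- case-insensitive matching are assumptions for illustration; the prefix
- meanings are taken from the table above.

```python
# Hypothetical model of how an EXTRN symbol splits into a link-time
# prefix and a Forth word name (see the prefix table above).

PREFIXES = {
    "CODE@": "address of machine code (CS)",
    "THREADS@": "compilation address (DS)",
    "DATA@": "parameter field address (ES)",
    "HEADS@": "name field address (relative to HSEG)",
    "IMMEDIATE@": "single cell left by executing the word at link-time",
}

def split_extrn(symbol: str) -> tuple[str, str]:
    """Return (prefix, forth_word); a name without a prefix counts as CODE@."""
    upper = symbol.upper()  # case-insensitive matching is an assumption
    for prefix in PREFIXES:
        if upper.startswith(prefix) and len(symbol) > len(prefix):
            return prefix, symbol[len(prefix):]
    return "CODE@", symbol

for sym in ("CODE@COUNT", "DATA@TIB", "IMMEDIATE@HERE", "COUNT"):
    prefix, word = split_extrn(sym)
    print(f"{sym:15} -> {word:6} {PREFIXES[prefix]}")
```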
-
- If using MASM, the programmer must pay particular attention to
- how the external references are declared. When a reference is used as a
- memory pointer (e.g. BYTE PTR), it must be declared as :BYTE or
- :WORD (or another address declaration). A value used as an immediate
- operand must be declared :ABS. If a reference is mis-declared, MASM ignores
- the addressing mode explicitly used in the instruction, in favour of the one
- implied by the EXTRN declaration. Therefore a single reference cannot be
- used both as an immediate operand and as a memory reference.
-
-
- If using A86, the programmer need not include the EXTRN
- directive, as any undefined symbols are automatically declared
- external. If the EXTRN directive is used, any type declaration
- (:NEAR, :WORD, :ABS, etc.) may be given; A86 handles all cases correctly.
-
-
- 13.7 Forth Words with Illegal Characters
- -----------------------------------
-
- When words contain characters that are illegal for the assembler, a
- prefix of %% may be used. This prefix is dealt with before assembly begins,
- and changes the name to one acceptable for the assembler. Illegal
- characters include: +-*/%^() and many more. The word prefixed by %% must,
- however, be terminated by a space, tab or end of line. For example:
-
- %%-TRAILING %%+! %%2DUP
-
- A complete example: a word that exits via */
-
- CODE 550_337_*/ ; ( scale n by this fraction to get m ( n -- m )
- extrn %%*/ :near ; reference to the word */
- mov ax,550
- push ax
- mov ax,337
- push ax
- jmp %%*/ c;
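- The manual does not say what name the %% prefix is turned into, only
- that the result is acceptable to the assembler and that the word ends at
- a space, tab or end of line. The Python sketch below shows one possible
- scheme. The encoding (a leading Q, with every character other than a
- letter or digit written as an underscore and two hex digits) is entirely
- hypothetical and assumes 8-bit characters. The unmangle function shows
- that the linker side can recover the original Forth name.

```python
import re

# %% followed by the rest of the word; the word must end at a space,
# tab or end of line, as the manual requires.
PERCENT_WORD = re.compile(r"%%([^ \t\r\n]+)")

def mangle(word: str) -> str:
    """Hypothetical encoding: keep ASCII letters and digits, hex-encode the rest."""
    body = "".join(
        c if c.isascii() and c.isalnum() else f"_{ord(c):02X}" for c in word
    )
    return "Q" + body  # leading letter keeps names such as 2DUP legal

def unmangle(name: str) -> str:
    """Recover the Forth name, as the linker side would have to."""
    return re.sub(r"_([0-9A-F]{2})", lambda m: chr(int(m.group(1), 16)), name[1:])

def preprocess(line: str) -> str:
    """Replace every %%word in one assembler source line before assembly."""
    return PERCENT_WORD.sub(lambda m: mangle(m.group(1)), line)

print(preprocess("extrn %%*/ :near"))          # extrn Q_2A_2F :near
print(preprocess("%%-TRAILING %%+! %%2DUP"))   # Q_2DTRAILING Q_2B_21 Q2DUP
```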
-
-
- 13.8 PUBLIC declarations
- -------------------
- Just as it is possible to reference Forth words from within
- assembler code with EXTRN, it is also possible to create new words. This
- is done with the PUBLIC directive. This can be used to create multiple
- entry points in words, or simply to create address references available in
- high level code or other code definitions. The %% prefix described above,
- can be used to make names with assembler-illegal characters. Example:
-
- CODE QDROP ; ( q -- )
- POP AX ; yes, there are more efficient ways of coding
- POP AX ; this word
- DDROP: POP AX
- DROP: POP AX
- NEXT
-
- PUBLIC DDROP ; ( d -- )
- PUBLIC DROP ; ( n -- )
- c;
-
- As the table below shows, PUBLIC declarations work differently
- depending on which segment the label is declared in. Note that a label in
- the data segment effectively becomes a VARIABLE, since the CONSTANT
- created for it returns its address.
-
- Segment of label     Result
- code segment        A CODE word is created
- threads segment     The PUBLIC address is assumed to be
-                     the compilation address of a word
- other segments      A CONSTANT is created with the value
-                     of the PUBLIC address
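- The table amounts to a three-way choice made by the linker. This Python
- sketch models it. The Definition record, its kind tags and the example
- label names and addresses are hypothetical, chosen only to make the
- three cases visible.

```python
# Hypothetical model of the table above: what the linker creates for a
# PUBLIC label, given the standard segment the label ends up in.
from dataclasses import dataclass

@dataclass
class Definition:
    name: str
    kind: str   # illustrative tags: "CODE", "THREAD" or "CONSTANT"
    value: int  # offset of the label within its segment

def public_definition(name: str, segment: str, address: int) -> Definition:
    seg = segment.upper()
    if seg == "CODE":
        # a CODE word whose machine code starts at the label
        return Definition(name, "CODE", address)
    if seg == "THREADS":
        # the label is taken to be the compilation address of a word
        return Definition(name, "THREAD", address)
    # any other segment: a CONSTANT whose value is the label's address;
    # for a label in the DATA segment this behaves like a VARIABLE
    return Definition(name, "CONSTANT", address)

print(public_definition("DDROP", "code", 0x0412))
print(public_definition("MYTABLE", "data", 0x0100))
```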
-
-
- A PUBLIC Caution about FORGET
-
- Words declared PUBLIC are CREATED at link-time.
- Unfortunately, most linkers do not provide PUBLIC declarations in any
- reasonable order. This means that a word created later may refer to a
- word lower in memory. This conflicts with FORGET, which removes
- everything above the forgotten word. When using FORGET, be sure to
- forget all of the words created by PUBLIC within one code word or ASM
- section.
-
-
-
- 13.9 The Command ASM
- ---------------
-
- ASM is the best way to include a large body of assembly code in
- Forth. ASM simply begins a section of assembly language code; unlike
- CODE, it does not CREATE a word. Words that must be accessible from
- high-level Forth or from other assembler words should be declared PUBLIC
- as described above. Many code words can thus be included in one section.
- Example:
-
- ASM
- code segment
- BIT: ; ( access a table of bits ( n -- bit )
- POP BX
- ADD BX,BX
- PUSH es: [BX+bittable]
- NEXT
- code ends
-
- data segment
- assume cs:data
- bittable: dw 1,2,4,8,16,32,64,128,256,512,1024,2048,4096,8192
- dw 16384,32768
- data ends
- PUBLIC BIT
- end c;
-
-
- 13.10 Linking Object Files
- --------------------
-
-
- The linker is started automatically after a code word is assembled
- with CODE, ;CODE or ASM. The linker can also operate on existing
- object files; the authors may deliver object file versions of
- utilities and upgrades in the future. The syntax for this command is
- LINK" followed by the path and file name of a Microsoft format OBJ file.
-
- For example:
- LINK" MATRIX.OBJ"
- would link in the specified file.
-
-
- 13.11 Assembler Set-up
- ----------------
-
- Three assemblers are currently supported directly: A86, Microsoft
- MASM (versions 5 and 6) and Borland TASM. To use one of these, its
- configuration file must be copied to the name ASSEMBLY.CFG. For
- example, to use A86, type: COPY LOVEA86.CFG ASSEMBLY.CFG. For MASM 5,
- MASM 6 and TASM, the files are LOVEMASM.CFG, LOVEML6.CFG and
- LOVETASM.CFG respectively. MASM version 6 takes so much memory that
- its extended memory version must be used. This only works if EMM386 is
- not loaded.
-
- If using another assembler, any of the above files can be modified
- according to what that assembler needs. Read the instructions in the CFG
- files (standard ASCII text). The following information must be provided:
-
- command line
- input, output, listing, error files
- the macro definition for NEXT
- the segment declarations
-
- lines to precede the lines parsed from CODE or ;CODE
- lines to follow the lines from CODE or ;CODE
-
-
- When the assembly file is created, the macro definition is inserted
- first, then the segment declarations described above, along with the
- name of the word being assembled (if applicable). If the word is being
- assembled with CODE or ;CODE, the "lines to precede" are inserted next,
- then the lines between CODE (or ;CODE) and C;. The file is terminated
- with the "lines to follow". If the command ASM is used, the lines
- between ASM and C; are inserted following the segment declarations,
- and the file is terminated.
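- The order in which the temporary assembly file is put together can be
- sketched in Python. This is a model, not the actual implementation. The
- function name build_asm_file and the comment line used for the word name
- are hypothetical, and the NEXT macro text is a placeholder. The
- precede/follow lines in the example are the CODE SEGMENT, ASSUME,
- CODE ENDS and END statements described in section 13.3.

```python
# Hypothetical sketch of how the temporary assembly file (usually
# CODE-4TH.ASM) is built from the pieces described in ASSEMBLY.CFG.
from typing import Optional

def build_asm_file(next_macro: list[str], segment_decls: list[str],
                   body: list[str], precede: list[str], follow: list[str],
                   word_name: Optional[str] = None, is_asm: bool = False) -> str:
    lines: list[str] = []
    lines += next_macro                  # 1. macro definition for NEXT
    lines += segment_decls               # 2. segment declarations
    if word_name is not None:
        # the exact form in which the word name is written is an assumption
        lines.append(f"; {word_name}")
    if is_asm:
        lines += body                    # ASM: lines between ASM and C;, then stop
    else:
        lines += precede                 # CODE/;CODE: "lines to precede"
        lines += body                    # lines between CODE (;CODE) and C;
        lines += follow                  # "lines to follow"
    return "\n".join(lines) + "\n"

text = build_asm_file(
    next_macro=["; NEXT macro from ASSEMBLY.CFG (placeholder)"],
    segment_decls=["code segment byte public 'CODE'", "code ends"],
    body=["pop ax", "push ax", "push ax", "push ax", "push ax", "next"],
    precede=["code segment", "assume cs:code, ds:code, es:code"],
    follow=["code ends", "end"],
    word_name="DUP4",
)
print(text)
```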
-
-
- 13.12 Improving performance
- ---------------------
- This method of assembly can be slow on any machine. Calling
- another program (the assembler) through DOS is time consuming,
- especially because of the disk accesses involved. There are two ways to
- speed this up:
-
- 1. Use the ASM facility to group CODE words together. The
- words which would otherwise have been declared separately
- will all be declared at one time, using the PUBLIC
- declaration. The assembler is only invoked once per ASM
- section.
-
- 2. Create a small RAM disk to include the temporary files
- listed in ASSEMBLY.CFG (just change the drive and/or
- directory where these are stored). For most words a size of
- 30k should be more than enough. The assembler itself can
- also be copied to the RAM disk if it is big enough.
-
-
- 13.13 Frame Fixups
- ------------
-
- Frame fixups are not supported. This means that explicit references
- to segments are not allowed. Keep in mind that, on entry to any code word,
- the segment registers contain the usual segment values. In addition, there
- are locations defined in the CS: (CODE segment) that contain the current
- addresses of the standard segments. (These are CONSTANTS).
-
- Address   Contains segment value       Also in register
- CSEG      CODE                         CS
- TSEG      THREADS                      DS
- VSEG      DATA                         ES
- SSEG      STACKS                       SS
- HSEG      HEADS                        n/a
- PSPSEG    DOS program segment prefix   n/a
-
- Access to these values is therefore via the CS register. For
- example, to load the DATA segment value (stored at VSEG) into DS:
-
- MOV DS, word ptr CS: IMMEDIATE@VSEG
-
- 13.14 Why frame fixups are not supported:
- -----------------------------------
-
-
- In order to be used interactively, any frame numbers included in
- code would have to be resolved immediately on assembly. This is not a
- problem; the problems occur later. When an application is SAVED and
- then re-executed at a later time, the location in memory where DOS loads
- the program is often different. Relocation is supported by DOS; the EXE
- file header can contain relocation items. However, when the program is
- SAVED, the segment memory images are concatenated and the result is saved
- in the EXE file. It is difficult to determine both where the fixup
- locations are, and where they are to point to, since on re-execution the
- image is expanded again. In addition, before the image is saved,
- these references would have to be de-relocated: not impossible,
- but difficult. Further difficulties ensue if the program is saved as a
- final APPLICATION, where the program is both saved and executed in its
- concatenated form.
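- The de-relocation problem can be made concrete with a simplified model
- of DOS EXE loading. At load time, DOS adds the start segment of the
- program image to every word listed in the relocation table. A SAVE would
- therefore have to subtract that segment again before writing the image
- out. In this Python sketch, the fixups are given as plain byte offsets
- into one flat image. Real EXE relocation entries are segment:offset
- pairs, so this is a simplification, and the function names are
- hypothetical.

```python
# Simplified model of DOS EXE relocation, and of the "de-relocation"
# that SAVE would need for any frame fixups stored in the image.
import struct

def apply_relocations(image: bytearray, fixups: list[int], load_segment: int) -> None:
    """Add load_segment to each 16-bit little-endian word at the given offsets."""
    for off in fixups:
        (word,) = struct.unpack_from("<H", image, off)
        struct.pack_into("<H", image, off, (word + load_segment) & 0xFFFF)

def derelocate(image: bytearray, fixups: list[int], load_segment: int) -> None:
    """Undo apply_relocations before the image is written back to disk."""
    for off in fixups:
        (word,) = struct.unpack_from("<H", image, off)
        struct.pack_into("<H", image, off, (word - load_segment) & 0xFFFF)

# A frame number 0010h (relative to the image) followed by ordinary data.
image = bytearray(struct.pack("<HH", 0x0010, 0x1234))
apply_relocations(image, [0], load_segment=0x2000)
print(hex(struct.unpack_from("<H", image, 0)[0]))  # 0x2010
derelocate(image, [0], load_segment=0x2000)
print(hex(struct.unpack_from("<H", image, 0)[0]))  # 0x10
```

- Forgetting the second step, or losing track of which words are fixups
- once the segment images have been concatenated, would leave absolute
- segment values from one run in the saved file.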
-
- A version of L.O.V.E. Forth in preparation is able to perform frame
- fixups (the fixup information is stored as a field in each dictionary
- head). When saving an application with APPLICATION" these data are
- transferred to the .EXE header.