Chapter 13    L.O.V.E. FORTH
13.0 Third Party Assembler Interface and Linker
------------------------------------------
Traditionally in Forth systems, a "Forth Assembler" has been
included. Adding assembler components to a high-level language can produce
dramatic improvements in performance and capability over high-level Forth.
Unfortunately, these assemblers are usually written in Forth, and have
serious limitations. Often the syntax is markedly different from the
expected syntax for the particular processor. It is usually difficult
enough for most programmers to work in normal assembler syntax, without
having to learn a new one.
L.O.V.E. Forth has been designed to use virtually any third party
assembler, using standard assembler syntax. Whenever CODE, ;CODE or
ASM is encountered, Forth calls in the third party assembler to process
the word, and links in the resulting object file, with a built-in linker.
This means that not only can normal syntax be used in words created by the
programmer, but that assembly language program sections from other sources
can be included with little or no modification.
The authors recommend the excellent assembler A86 by Isaacson, also
available as shareware. The original L.O.V.E. Forth RPN assembler is
included with the system as source code, to be used, if desired.
13.1 Operation
---------
A small amount of set-up is required in order to configure the
system. The authors have already included configuration files for A86,
Microsoft's MASM and Borland's TASM (see Assembler Set-up below). For
simple code words, like those supported by the old RPN assemblers, use
is straightforward. For example, a word to make four copies of the top of
stack:
        CODE DUP4       ; ( n -- n,n,n,n )
                pop ax
                push ax         ; push some copies
                push ax
                push ax
                push ax
                next c;
The operation NEXT above is a pre-defined macro.
This facility has other powerful features, notably the use of
declarations in the assembly code. Not only can machine code be
assembled, but any other type of data, including threads, heads, and data.
Words can be defined using PUBLIC and existing words can be referenced
with EXTRN. These are all interpreted by the linker portion of this
interface.
13.2 Errors during assembly
----------------------
If the assembler fails to produce an object file, an error message
is displayed, and compilation is aborted. The programmer must then examine
the error or listing file mentioned in the error message, in order to
determine the problem. The file containing the code to assemble is usually
called CODE-4TH.ASM, and the file with the errors is usually named
CODE-4TH.ERR or CODE-4TH.LST.
13.3 SEGMENT Declarations
--------------------
The linker supports several reserved segment and class names, for
use in directing code into various segments. These are: 'CODE',
'THREADS', 'DATA', 'HEADS', and 'STACKS'. These reserved names can
be used either as segment names (most common) or as class names. When
used as segment names, any class name then specified is ignored.
The following segments are declared automatically for the
programmer at the beginning of each assembly. The programmer need only
switch between them (e.g. HEADS SEGMENT is sufficient to switch to
heads, without all the other parts of the declaration).
        code    segment byte public 'CODE'
        code    ends
        threads segment word public 'THREADS'
        threads ends
        data    segment byte public 'DATA'
        data    ends
        heads   segment byte public 'HEADS'
        heads   ends
        stacks  segment byte public 'STACKS'
        stacks  ends
The code segment is the default, if no other is specified, allowing
simple words to assemble with no declarations whatsoever. There is a
statement CODE SEGMENT automatically inserted before the assembler
statements, and the statements CODE ENDS and END after the end of the
assembler word. The directive:
        ASSUME CS:CODE, DS:CODE, ES:CODE
is also inserted, so no segment overrides will be inserted by the
assembler, unless the programmer explicitly includes them.
13.4 Origins
-------
When any segment is declared in an assembler, the origin is assumed
to be 0. This is fine, when the only code being dealt with is produced by
the assembler; the programmer is in complete control. Here, however, the
code must be loaded on top of an existing program: L.O.V.E. Forth. Therefore, the
origins have been constructed to follow a slightly different pattern.
When a reserved name is used for a segment name, the real segment
origin is at 0000 in the L.O.V.E. Forth segment. The origin (if any) given
by the programmer is incremented by HERE (or CS:HERE, TS:HERE, etc),
prior to the code being loaded in. This ensures that there are no
overwritten areas of memory. The alignment attribute is not meaningful for
standard segments; they already start on even byte, word, paragraph and
page boundaries.
Should the programmer desire an origin of 0, in the segment being
declared, a different name (unreserved) should be used. In this case, the
linker looks to the class name for direction, on where to load the code
into memory. If the class name is not specified, the code is loaded into
the CODE segment. The alignment type may be specified, if so desired; the
combine type is ignored.
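As a worked illustration of the reserved-name rule, the fragment below
annotates where two instructions would end up. The value 4A20h for CS:HERE
and the label FIRST are assumptions made for the arithmetic only; the
one-byte encodings of POP AX and PUSH AX are standard 8086. It is a
fragment, not a complete word.

```asm
; Assumption: CS:HERE = 4A20h at the moment the code is linked.
CODE SEGMENT            ; reserved name, so offsets are relative to CS:HERE
first:  pop ax          ; object-file offset 0000h -> loaded at CS:4A20h
        push ax         ; object-file offset 0001h -> loaded at CS:4A21h
```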
13.5 SEGMENT Examples
----------------
The most common declaration is:
        CODE SEGMENT
which causes the code following it to be placed in the code
segment. The origin coming in from the object file (normally
0 for the first code in that segment) is incremented by the
dictionary pointer. Therefore the ORG is forced to be CS:HERE.
Another more complex example is:
        MYTHREADS SEGMENT WORD PUBLIC 'THREADS'
which causes the following code to be loaded into the thread
segment. The origin is relative to the start of this declared
segment.
        MYSEG SEGMENT
Code/data in this segment has its own origin of 0.
If grouped, however, it has an offset (<=64k) from the start of the
group. It is placed in RAM in one of the standard
segments (in this case the code segment).
        THREADS SEGMENT byte public 'code'
The segment and class conflict; in this case, the class is
ignored.
13.6 GROUP Declaration
-----------------
The programmer may declare any group that does not combine different
L.O.V.E. Forth segments (this is not possible because they are >=64k
apart). A segment may be part of only one group.
EXTRN declarations
The address or value of existing Forth words may be referenced in
the assembler code, using the EXTRN declaration. Since words in
L.O.V.E. Forth have several parts, the address of each part may be
obtained, by adding a special prefix to the name desired. The prefixes are
sorted out by the linker.
        Prefix                  Segment   Purpose
                                Register
        CODE@ (or no prefix)    CS        address of machine code
        THREADS@                DS        compilation address
        DATA@                   ES        parameter field address
        HEADS@                  n/a       name field address
        IMMEDIATE@              n/a       special - executes the following
                                          word at link-time to obtain value
For example:
        EXTRN CODE@COUNT:NEAR, DATA@TIB:BYTE, IMMEDIATE@HERE:ABS
        MOV BYTE PTR ES:DATA@TIB, 0DH   ; install carriage return
        ADD AX,IMMEDIATE@HERE           ; add HERE
        JMP CODE@COUNT                  ; exit via a forth word
If the word appears without a prefix or if CODE@ is in front of the
word, then the address of the related machine code is returned. This is
the same as is returned with 'CODE . Similarly THREADS@ returns the
compilation address of the following word. The most useful prefix is
perhaps DATA@ which returns the parameter field address, the address
returned by a VARIABLE or other word created by CREATE. HEADS@
returns the name field address. This is relative to the head segment, the
actual value of which can be obtained from the label HSEG (see Frame
Fixups below).
The word IMMEDIATE@ can execute a word at link-time. This is
typically a CONSTANT whose value is required, or a VARIABLE whose
address is required in assembly code ( eg. IMMEDIATE@BL ). It can be any
word that returns a single cell on the stack. If HERE or the other
dictionary values are referenced, they return the values they had, prior to
linking.
If using MASM the programmer must pay particular attention to
how the external references are declared. When using the reference as a
memory pointer (eg. BYTE PTR ) the reference must be declared as :BYTE or
:WORD (or other address declaration). A value used as an immediate type
operand must be declared :ABS . If mis-declared, MASM ignores the
addressing mode explicitly used in the instruction, in favour of what is
implied in the EXTRN declaration. A single reference therefore cannot be
used both as an immediate type operand and as a memory reference.
If using A86, the programmer need not include the EXTRN
directive, as any symbols that are undefined are automatically declared
external. If the EXTRN directive is used, any type declaration
(:NEAR, :WORD, :ABS, etc.) may be given; A86 handles all cases correctly.
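With A86, the store from the MASM-style example above can be sketched
without any EXTRN line, since the undefined prefixed symbol is declared
external automatically. The word name TIBCR is hypothetical, and the sketch
assumes that A86 is the configured assembler and that ES addresses the data
segment on entry, as the prefix table indicates.

```asm
CODE TIBCR              ; ( -- ) hypothetical word, A86 assumed
        mov byte ptr es:DATA@TIB, 0Dh   ; install carriage return, no EXTRN needed
        next c;
```

Under MASM the same word would need the line EXTRN DATA@TIB:BYTE before the
MOV, as described above.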
13.7 Forth Words with Illegal Characters
-----------------------------------
When words contain characters that are illegal for the assembler, a
prefix of %% may be used. This prefix is dealt with before assembly begins,
and changes the name to one acceptable for the assembler. Illegal
characters include: +-*/%^() and many more. The word prefixed by %% must,
however, be terminated by a space, tab or end of line. For example:
        %%-TRAILING %%+! %%2DUP
Complete example, a word which exits via */ :
        CODE 550_337_*/ ; ( scale n by this fraction to get m ( n -- m )
                extrn %%*/ :near        ; reference to the word */
                mov ax,550
                push ax
                mov ax,337
                push ax
                jmp %%*/ c;
13.8 PUBLIC declarations
-------------------
Just as it is possible to reference Forth words from within
assembler with EXTRN, it is also possible to create new words. This is
done with the PUBLIC directive. This can be used to create multiple
entry points in words, or simply to create address references available in
high level code or other code definitions. The %% prefix described above
can be used to make names with assembler-illegal characters. Example:
        CODE QDROP              ; ( q -- )
                POP AX          ; yes, there are more efficient ways of coding
                POP AX          ; this word
        DDROP:  POP AX
        DROP:   POP AX
                NEXT
                PUBLIC DDROP    ; ( d -- )
                PUBLIC DROP     ; ( n -- )
                c;
As shown in the table below, PUBLIC declarations work
differently, depending on which segment the label is declared in. Note
that a reference to the data segment effectively becomes a VARIABLE .
        code segment            A CODE word is created
        threads segment         The PUBLIC address is assumed to be
                                the compilation address of a word
        other segment names     A CONSTANT is created with the value
                                of the PUBLIC address
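A minimal sketch of the data-segment case: a PUBLIC label in the data
segment, which high-level Forth could then use like a VARIABLE. The name
COUNTER is hypothetical, and the ASSUME line simply mirrors the ASM example
in section 13.9 for a label defined with a colon; it may not be required by
every assembler.

```asm
ASM
data segment
        assume cs:data          ; as in the ASM example of section 13.9
COUNTER: dw 0                   ; one cell, initialised to zero
data ends
        PUBLIC COUNTER          ; linker creates a Forth word returning this address
end c;
```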
A PUBLIC Caution about FORGET
Words declared PUBLIC are CREATED at link-time.
Unfortunately, most linkers do not provide PUBLIC declarations in any
reasonable order. This means that a word declared later, may refer to a
word lower in memory. This conflicts with FORGET which removes
everything above the forgotten word. When using FORGET, be sure to forget
all of the words PUBLICLY CREATED within one code word or ASM section.
13.9 The Command ASM
---------------
ASM is the best way to include a large body of assembly code into
Forth. ASM simply begins a section of assembly language code. Unlike
CODE , it does not CREATE a word. Words that require access from high-level
Forth or other assembler words should be declared PUBLIC as described
above. Many code words can thus be included in one section. Example:
        ASM
        code segment
        BIT:            ; ( access a table of bits ( n -- bit )
                POP BX
                ADD BX,BX
                PUSH es: [BX+bittable]
                NEXT
        code ends
        data segment
                assume cs:data
        bittable: dw 1,2,4,8,16,32,64,128,256,512,1024,2048,4096,8192
                dw 16384,32768
        data ends
                PUBLIC BIT
        end c;
13.10 Linking Object Files
--------------------
The linker is automatically started after assembling a code word
with CODE, ;CODE or ASM . It is also possible for the linker to
operate on existing object files. The authors may also be delivering
object file versions of utilities and upgrades in the future. The syntax
for this command is LINK" followed by the path and file name of a
Microsoft format OBJ file.
For example:
        LINK" MATRIX.OBJ"
would link in the specified file.
13.11 Assembler Set-up
----------------
Three assemblers are currently supported directly: A86, Microsoft
MASM (versions 5 and 6) and Borland TASM. In order to use one
of these, its configuration file must be copied to the name ASSEMBLY.CFG.
For example, to use A86 type: COPY LOVEA86.CFG ASSEMBLY.CFG . For MASM,
MASM 6 and TASM, the files are LOVEMASM.CFG, LOVEML6.CFG and
LOVETASM.CFG respectively. MASM version 6 takes so much memory that
the extended memory version must be used. This only works if you omit
EMM386.
If using another assembler, any of the above files can be modified
according to what the assembler needs. Read the instructions in the CFG
files (standard ASCII). The following information must be provided:
        command line
        input, output, listing, error files
        the macro definition for NEXT
        the segment declarations
        lines to precede the lines parsed from CODE or ;CODE
        lines to follow the lines from CODE or ;CODE
When the assembly file is created, first the macro definition, then
the segment declarations described above are inserted into the file, along
with the name of the word being assembled (if applicable). If assembling
the words CODE or ;CODE, the "lines to precede" from the list above
are inserted, then the lines between CODE (;CODE) and C;. The file is
terminated with the "lines to follow" from above. If the command ASM is
used, the lines between ASM and C; are inserted following the segment
declarations, and the file is terminated.
13.12 Improving performance
---------------------
This method of assembly can be slow on any machine. The act of
calling another program (assembler) through DOS is time consuming,
especially in disk accesses. There are two ways to speed this up:
1. Use the ASM facility to group CODE words together. The
words which would otherwise have been declared separately
will all be declared at one time, using the PUBLIC
declaration. The assembler is only invoked once per ASM
section.
2. Create a small RAM disk to include the temporary files
listed in ASSEMBLY.CFG (just change the drive and/or
directory where these are stored). For most words a size of
30k should be more than enough. The assembler itself can
also be copied to the RAM disk if it is big enough.
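The first suggestion can be sketched as follows: two small words that would
otherwise be separate CODE definitions are assembled in a single pass. The
names TWICE and HALF are hypothetical; the layout follows the ASM example
in section 13.9.

```asm
ASM
code segment
TWICE:                  ; ( n -- 2n )
        pop ax
        add ax,ax       ; double the top of stack
        push ax
        NEXT
HALF:                   ; ( n -- n/2 )
        pop ax
        sar ax,1        ; arithmetic shift right, rounds toward minus infinity
        push ax
        NEXT
code ends
        PUBLIC TWICE
        PUBLIC HALF
end c;
```

Because both words are created PUBLIC within one section, the caution
about FORGET in section 13.8 applies to them.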
13.13 Frame Fixups
------------
Frame fixups are not supported. This means that explicit references
to segments are not allowed. Keep in mind that, on entry to any code word,
the segment registers contain the usual segment values. In addition, there
are locations defined in the CS: (CODE segment) that contain the current
addresses of the standard segments. (These are CONSTANTS).
        Address   contains segment value       also in register
        CSEG      CODE                         CS
        TSEG      THREADS                      DS
        VSEG      DATA                         ES
        SSEG      STACKS                       SS
        HSEG      HEADS                        n/a
        PSPSEG    DOS program segment prefix   n/a
So access to these values is via the CS register. For example,
to load the VSEG value into DS:
        MOV DS, word ptr CS: IMMEDIATE@VSEG
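The same addressing form can be used to reach the head segment, which has
no segment register of its own. The sketch below fetches one byte at a
given offset in the heads, for example a name field address obtained with
HEADS@ . The word name HEADBYTE is hypothetical; the sketch assumes A86 (so
no EXTRN line is written) and saves and restores ES on the assumption that
NEXT expects the usual segment values.

```asm
CODE HEADBYTE           ; ( nfa -- byte ) hypothetical word, A86 assumed
        pop bx                                  ; offset within the head segment
        push es                                 ; preserve the usual ES value
        mov es, word ptr cs:IMMEDIATE@HSEG      ; ES = head segment value
        mov al, es:[bx]                         ; fetch one byte from the heads
        xor ah, ah                              ; zero-extend to a full cell
        pop es                                  ; restore ES before leaving
        push ax
        next c;
```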
13.14 Why frame fixups are not supported:
-----------------------------------
In order to be used interactively, any frame numbers included in
code would have to be resolved immediately on assembly. This is not a
problem; the problems occur later. When an application is SAVED and
then re-executed at a later time, the location in memory where DOS loads
the program is often different. Relocation is supported by DOS; the EXE
file header can contain relocation items. However, when the program is
SAVED, the segment memory images are concatenated and the result is saved
in the EXE file. It is difficult to determine both where the fixup
locations are, and where they are to point to, since on re-execution the
image is expanded again. In addition, before the image is to be saved,
these references would have to be de-relocated. Not completely impossible,
but difficult. Further difficulties ensue if the program is saved as a
final APPLICATION, where the program is both saved and executed in its
concatenated form.
A version of L.O.V.E. Forth in preparation is able to perform frame
fixups (the fixup information is stored as a field in each dictionary
head). When saving an application with APPLICATION" these data are
transferred to the .EXE header.