Text File | 1993-04-11 | 19KB | 430 lines

Chapter 13                                             L.O.V.E. FORTH


13.0 Third Party Assembler Interface and Linker
     ------------------------------------------

Traditionally in Forth systems, a "Forth Assembler" has been
included. Adding assembler components to a high-level language can
produce dramatic improvements in performance and capability over
high-level Forth. Unfortunately, these assemblers are usually written
in Forth, and have serious limitations. Often the syntax is markedly
different from the expected syntax for the particular processor. It
is usually difficult enough for most programmers to work in normal
assembler syntax, without having to learn a new one.

L.O.V.E. Forth has been designed to use virtually any third party
assembler, using standard assembler syntax. Whenever CODE, ;CODE or
ASM is encountered, Forth calls in the third party assembler to
process the word, and links in the resulting object file with a
built-in linker. This means that not only can normal syntax be used
in words created by the programmer, but that assembly language
program sections from other sources can be included with little or no
modification.

The authors recommend the excellent assembler A86 by Isaacson, also
available as shareware. The original L.O.V.E. Forth RPN assembler is
included with the system as source code, to be used if desired.


13.1 Operation
     ---------

A small amount of set-up is required in order to configure the
system. The authors have already included configuration files for
A86, Microsoft's MASM and Borland's TASM (see Assembler Set-up
below).

For simple code words, like those supported by the old RPN
assemblers, use is straightforward. For example, a word to make four
copies of the top of stack:

    CODE DUP4 ; ( n -- n,n,n,n )
        pop ax
        push ax         ; push some copies
        push ax
        push ax
        push ax
        next
    c;

The operation NEXT above is a pre-defined macro.

There are many other powerful features of this facility, namely, the
use of declarations in the assembly code.
Not only can machine code be assembled, but any other type of data,
including threads, heads, and data. Words can be defined using PUBLIC
and existing words can be referenced with EXTRN. These are all
interpreted by the linker portion of this interface.


13.2 Errors during assembly
     ----------------------

If the assembler fails to produce an object file, an error message is
displayed, and compilation is aborted. The programmer must then
examine the error or listing file mentioned in the error message, in
order to determine the problem. The file containing the code to
assemble is usually called CODE-4TH.ASM, and the file with the errors
is usually named CODE-4TH.ERR or CODE-4TH.LST.


13.3 SEGMENT Declarations
     --------------------

The linker supports several reserved segment and class names, for use
in directing code into various segments. These are: 'CODE',
'THREADS', 'DATA', 'HEADS', and 'STACKS'. These reserved names can be
used either as segment names (most common), or as class names. When
used as segment names, any class name then specified is ignored.

The following segments are declared automatically for the programmer
at the beginning of each assembly. The programmer need only switch
between them (e.g. HEADS SEGMENT is sufficient to switch to heads,
without all the other parts of the declaration).

    code    segment byte public 'CODE'
    code    ends
    threads segment word public 'THREADS'
    threads ends
    data    segment byte public 'DATA'
    data    ends
    heads   segment byte public 'HEADS'
    heads   ends
    stacks  segment byte public 'STACKS'
    stacks  ends

The code segment is the default, if no other is specified, allowing
simple words to assemble with no declarations whatsoever. The
statement CODE SEGMENT is automatically inserted before the assembler
statements, and the statements CODE ENDS and END are inserted after
the end of the assembler word.
The directive:

    ASSUME CS:CODE, DS:CODE, ES:CODE

is also inserted, so no segment overrides will be inserted by the
assembler, unless the programmer explicitly includes them.


13.4 Origins
     -------

When any segment is declared in an assembler, the origin is assumed
to be 0. This is fine when the only code being dealt with is produced
by the assembler; the programmer is in complete control. Here,
however, the code must be loaded on top of an existing program:
L.O.V.E. Forth. Therefore, the origins have been constructed to
follow a slightly different pattern.

When a reserved name is used for a segment name, the real segment
origin is at 0000 in the L.O.V.E. Forth segment. The origin (if any)
given by the programmer is incremented by HERE (or CS:HERE, TS:HERE,
etc.), prior to the code being loaded in. This ensures that no areas
of memory are overwritten. The alignment attribute is not meaningful
for standard segments; they already start on even byte, word,
paragraph and page boundaries.

Should the programmer desire an origin of 0 in the segment being
declared, a different (unreserved) name should be used. In this case,
the linker looks to the class name for direction on where to load the
code into memory. If the class name is not specified, the code is
loaded into the CODE segment. The alignment type may be specified, if
so desired; the combine type is ignored.


13.5 SEGMENT Examples
     ----------------

The most common declaration is:

    CODE SEGMENT

which causes the code following it to be placed in the code segment.
The origin coming in from the object file (normally 0 for the first
code in that segment) is incremented by the dictionary pointer.
Therefore the ORG is forced to be CS:HERE.

Another, more complex, example is:

    MYTHREADS SEGMENT WORD PUBLIC 'THREADS'

which causes the following code to be loaded into the thread segment.
The origin is relative to the start of this declared segment.

    MYSEG SEGMENT

Code/data in this segment has its own origin of 0.
If grouped, however, it has an offset from the start of the group
(<=64k). It is placed in RAM in one of the standard segments (in this
case the code segment).

    THREADS SEGMENT byte public 'code'

The segment and class names conflict. In this case, the class is
ignored.


13.6 GROUP Declaration
     -----------------

The programmer may declare any group that does not group different
L.O.V.E. Forth segments together (these cannot be grouped because
they are >=64k apart). A segment may be part of only one group.

EXTRN declarations

The address or value of existing Forth words may be referenced in the
assembler code, using the EXTRN declaration. Since words in L.O.V.E.
Forth have several parts, the address of each part may be obtained by
adding a special prefix to the name desired. The prefixes are sorted
out by the linker.

    Prefix              Segment    Purpose
                        Register
    CODE@ (no prefix)   CS         address of machine code
    THREADS@            DS         compilation address
    DATA@               ES         parameter field address
    HEADS@              n/a        name field address
    IMMEDIATE@          n/a        special - executes the following
                                   word at link-time to obtain value

For example:

    EXTRN CODE@COUNT:NEAR, DATA@TIB:BYTE, IMMEDIATE@HERE:ABS

    MOV BYTE PTR ES:DATA@TIB, 0DH   ; install carriage return
    ADD AX,IMMEDIATE@HERE           ; add HERE
    JMP CODE@COUNT                  ; exit via a forth word

If the word appears without a prefix, or if CODE@ is in front of the
word, then the address of the related machine code is returned. This
is the same as is returned with 'CODE . Similarly, THREADS@ returns
the compilation address of the following word.

The most useful prefix is perhaps DATA@, which returns the parameter
field address: the address returned by a VARIABLE or other word
created by CREATE.

HEADS@ returns the name field address. This is relative to the head
segment, the actual value of which can be obtained from the label
HSEG (see Frame Fixups below).

The prefix IMMEDIATE@ executes a word at link-time. This is typically
a CONSTANT whose value is required, or a VARIABLE whose address is
required in assembly code (e.g. IMMEDIATE@BL ).
It can be any word that returns a single cell on the stack. If HERE
or the other dictionary values are referenced, they return the values
they had prior to linking.

If using MASM, the programmer must pay particular attention to how
the external references are declared. When using the reference as a
memory pointer (e.g. BYTE PTR ) the reference must be declared as
:BYTE or :WORD (or other address declaration). A value used as an
immediate type operand must be declared :ABS . If mis-declared, MASM
ignores the addressing mode explicitly used in the instruction, in
favour of what is implied in the EXTRN declaration. A reference
therefore cannot be used both as an immediate type operand and as a
memory reference.

If using A86, the programmer need not include the EXTRN directive, as
any symbols that are undefined are automatically declared external.
If the EXTRN directive is used, any type declaration (:NEAR, :WORD,
:ABS, etc.) may be given; A86 handles all cases correctly.


13.7 Forth Words with Illegal Characters
     -----------------------------------

When words contain characters that are illegal for the assembler, a
prefix of %% may be used. This prefix is dealt with before assembly
begins, and changes the name to one acceptable to the assembler.
Illegal characters include: +-*/%^() and many more. The word prefixed
by %% must, however, be terminated by a space, tab or end of line.

For example:

    %%-TRAILING
    %%+!
    %%2DUP

Complete example, a word which exits via */

    CODE 550_337_*/ ; ( scale n by this fraction to get m ( n -- m )
        extrn %%*/ :near        ; reference to the word */
        mov ax,550
        push ax
        mov ax,337
        push ax
        jmp %%*/
    c;


13.8 PUBLIC declarations
     -------------------

Just as it is possible to reference Forth words from within assembler
with EXTRN, it is also possible to create new words. This is done
with the PUBLIC directive.
This can be used to create multiple entry points in words, or simply
to create address references available in high level code or other
code definitions. The %% prefix described above can be used to make
names with assembler-illegal characters.

Example:

    CODE QDROP ; ( q -- )
            POP AX      ; yes, there are more efficient ways of coding
            POP AX      ; this word
    DDROP:  POP AX
    DROP:   POP AX
            NEXT
            PUBLIC DDROP    ; ( d -- )
            PUBLIC DROP     ; ( n -- )
    c;

As shown in the table below, PUBLIC declarations work differently,
depending on which segment the label is declared in. Note that a
reference to the data segment effectively becomes a VARIABLE .

    code segment          A CODE word is created
    threads segment       The PUBLIC address is assumed to be the
                          compilation address of a word
    other segment names   A CONSTANT is created with the value of
                          the PUBLIC address

A PUBLIC Caution about FORGET

Words declared PUBLIC are created at link-time. Unfortunately, most
linkers do not provide PUBLIC declarations in any reasonable order.
This means that a word declared later may refer to a word lower in
memory. This conflicts with FORGET, which removes everything above
the forgotten word. When using FORGET, be sure to forget all of the
words PUBLICly created within one code word or ASM section.


13.9 The Command ASM
     ---------------

ASM is the best way to include a large body of assembly code into
Forth. ASM simply begins a section of assembly language code. Unlike
CODE, it does not create a word. Words that require access from
high-level Forth or other assembler words should be declared PUBLIC
as described above. Many code words can thus be included in one
section.
Example:

    ASM
    code segment
    BIT:    ; ( access a table of bits ( n -- bit )
            POP BX
            ADD BX,BX
            PUSH es: [BX+bittable]
            NEXT
    code ends
    data segment
            assume cs:data
    bittable:
            dw 1,2,4,8,16,32,64,128,256,512,1024,2048,4096,8192
            dw 16384,32768
    data ends
    PUBLIC BIT
    end
    c;


13.10 Linking Object Files
      --------------------

The linker is automatically started after assembling a code word with
CODE, ;CODE or ASM . It is also possible for the linker to operate on
existing object files. The authors may also be delivering object file
versions of utilities and upgrades in the future.

The syntax for this command is LINK" followed by the path and file
name of a Microsoft format OBJ file. For example:

    LINK" MATRIX.OBJ"

would link in the specified file.


13.11 Assembler Set-up
      ----------------

The following assemblers are currently supported directly: A86,
Microsoft MASM version 5, MASM version 6 and Borland TASM. In order
to use one of these, the configuration file must be copied to the
name ASSEMBLY.CFG. For example, to use A86 type:

    COPY LOVEA86.CFG ASSEMBLY.CFG

For MASM, MASM 6 and TASM, the files are LOVEMASM.CFG, LOVEML6.CFG
and LOVETASM.CFG respectively. MASM version 6 takes so much memory
that the extended memory version must be used. This only works if you
omit EMM386.

If using another assembler, any of the above files can be modified
according to what the assembler needs. Read the instructions in the
CFG files (standard ASCII). The following information must be
provided:

    command line
    input, output, listing, error files
    the macro definition for NEXT
    the segment declarations
    lines to precede the lines parsed from CODE or ;CODE
    lines to follow the lines from CODE or ;CODE

When the assembly file is created, first the macro definition, then
the segment declarations described above are inserted into the file,
along with the name of the word being assembled (if applicable).
If assembling the words CODE or ;CODE, the "lines to precede" from
above are inserted, then the lines between CODE (;CODE) and C;. The
file is terminated with the "lines to follow" from above. If the
command ASM is used, the lines between ASM and C; are inserted
following the segment declarations, and the file is terminated.


13.12 Improving performance
      ---------------------

This method of assembly can be slow on any machine. The act of
calling another program (the assembler) through DOS is time
consuming, especially in disk accesses. There are two ways to speed
this up:

1. Use the ASM facility to group CODE words together. The words which
   would otherwise have been declared separately will all be declared
   at one time, using the PUBLIC declaration. The assembler is only
   invoked once per ASM section.

2. Create a small RAM disk to hold the temporary files listed in
   ASSEMBLY.CFG (just change the drive and/or directory where these
   are stored). For most words a size of 30k should be more than
   enough. The assembler itself can also be copied to the RAM disk if
   it is big enough.


13.13 Frame Fixups
      ------------

Frame fixups are not supported. This means that explicit references
to segments are not allowed. Keep in mind that, on entry to any code
word, the segment registers contain the usual segment values. In
addition, there are locations defined in the CS: (CODE segment) that
contain the current addresses of the standard segments. (These are
CONSTANTS.)

    Address   contains segment value        also in register
    CSEG      CODE                          CS
    TSEG      THREADS                       DS
    VSEG      DATA                          ES
    SSEG      STACKS                        SS
    HSEG      HEADS                         n/a
    PSPSEG    DOS program segment prefix    n/a

Access to these values is therefore via the CS register. For example,
to load the VS value into DS:

    MOV DS, word ptr CS: IMMEDIATE@VSEG


13.14 Why frame fixups are not supported:
      -----------------------------------

In order to be used interactively, any frame numbers included in code
would have to be resolved immediately on assembly.
This is not a problem; the problems occur later. When an application
is SAVED and then re-executed at a later time, the location in memory
where DOS loads the program is often different. Relocation is
supported by DOS; the EXE file header can contain relocation items.
However, when the program is SAVED, the segment memory images are
concatenated and the result is saved in the EXE file. It is difficult
to determine both where the fixup locations are, and where they are
to point to, since on re-execution the image is expanded again. In
addition, before the image is to be saved, these references would
have to be de-relocated. Not completely impossible, but difficult.

Further difficulties ensue if the program is saved as a final
APPLICATION, where the program is both saved and executed in its
concatenated form.

A version of L.O.V.E. Forth in preparation is able to perform frame
fixups (the fixup information is stored as a field in each dictionary
head). When saving an application with APPLICATION" these data are
transferred to the .EXE header.
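
The relocation and de-relocation steps mentioned in 13.14 can be
sketched outside of Forth. The Python fragment below is illustrative
only and is not part of L.O.V.E. Forth: the function names, the flat
list of fixup offsets and the sample segment values are all
assumptions made for the example. It treats a fixup as a 16-bit
little-endian word in the image that holds a segment number,
relocates it by adding the segment at which the program was loaded,
and de-relocates it by subtracting that segment again before the
image is saved.

```python
import struct

def relocate(image, fixup_offsets, load_segment):
    """Add load_segment to the 16-bit word at each fixup offset."""
    for offset in fixup_offsets:
        (value,) = struct.unpack_from("<H", image, offset)
        struct.pack_into("<H", image, offset, (value + load_segment) & 0xFFFF)

def derelocate(image, fixup_offsets, load_segment):
    """Undo relocate(): subtract load_segment at each fixup offset."""
    for offset in fixup_offsets:
        (value,) = struct.unpack_from("<H", image, offset)
        struct.pack_into("<H", image, offset, (value - load_segment) & 0xFFFF)

def read_word(image, offset):
    """Return the 16-bit little-endian word stored at offset."""
    return struct.unpack_from("<H", image, offset)[0]

# Hypothetical 8-byte image with one segment reference at offset 2,
# stored relative to the (as yet unknown) load segment.
image = bytearray(8)
struct.pack_into("<H", image, 2, 0x0010)
fixups = [2]                        # assumed to be known; see note below

relocate(image, fixups, 0x1234)     # first run: loaded at segment 1234h
first_run = read_word(image, 2)

derelocate(image, fixups, 0x1234)   # before saving the image
saved = read_word(image, 2)

relocate(image, fixups, 0x2000)     # later run: loaded at segment 2000h
second_run = read_word(image, 2)

print(hex(first_run), hex(saved), hex(second_run))
```

The sketch takes the list of fixup offsets as given. The difficulty
described in 13.14 is in producing that list, and the targets it
points to, for a concatenated image.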