CHAPTER 8. PASM, THE F-PC ASSEMBLER

PASM is an assembler which is based on the F83 8088/86 assembler and an 8086 assembler published in Dr. Dobb's Journal, February 1982, by Ray Duncan. This assembler was subsequently modified by Robert L. Smith to repair bugs and to support the prefix assembler notation. Bob discovered a very simple method to force a postfix assembler to assemble prefix code, by deferring assembly until the next assembler command or the end of line, when all the arguments for the previous assembler command are piled up on the top of the data stack. Tom Zimmer has made additional modifications to allow syntax switching, and to increase compatibility in postfix mode with the F83 Assembler.

Writing assembly programs is black magic. It is not appropriate to discuss the joys and frustrations in working at such a low level in this manual. However, F-PC provides the best environment for you to do experiments using assembly language, because you can first verify the algorithm and methodology in high level Forth code and gradually reduce the code to the assembly level. You will find numerous examples in which the high level code in F83 is recoded in assembly, in addition to many of the F83 kernel words which were in assembly already.

The best way to learn 8086 assembly language is to use PASM, armed with all the code words in F-PC as templates and examples. Factor your high level words carefully so that words at the bottom level can be conveniently recoded in assembly. Take the F-PC kernel words as templates to start with, and modify them so that they will do exactly what you want them to do.

8.1. PREFIX OR POSTFIX ?

PASM supports dual syntaxes. The words PREFIX and POSTFIX switch between the two supported modes. The postfix mode is very similar to F83's CPU8086 Assembler. Prefix mode, which is the default mode, allows a syntax which is much closer to MASM used by Intel and Microsoft.
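
As an illustration of the mode switch, here is a minimal sketch of the same stack-swapping code word written once in each syntax. The names MY-SWAP1 and MY-SWAP2 are hypothetical, and NEXT is assumed to be the macro that returns control to the Forth inner interpreter, as in F83; it is not described in this chapter. The instruction forms follow the POP and PUSH rows of Figure 8.1.

```forth
\ Sketch only: MY-SWAP1 and MY-SWAP2 are hypothetical names, and NEXT is
\ assumed to be the macro that returns to the Forth inner interpreter.

PREFIX                          \ prefix mode (the default)
CODE MY-SWAP1   ( n1 n2 -- n2 n1 )
        POP DX                  \ DX = n2
        POP AX                  \ AX = n1
        PUSH DX                 \ push n2 first, so it ends up underneath
        PUSH AX                 \ n1 is now on top
        NEXT
        END-CODE

POSTFIX                         \ F83-style postfix mode
CODE MY-SWAP2   ( n1 n2 -- n2 n1 )
        DX POP
        AX POP
        DX PUSH
        AX PUSH
        NEXT
        END-CODE

PREFIX                          \ restore the default mode
```

Both definitions should assemble the same machine instructions; only the order of mnemonic and operands in the source differs. AX and DX are used because they are scratch registers that a code word need not preserve.
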
The assembler supports prefix syntax in an attempt to provide a syntax which is more readable to programmers of other languages. The use of sequential text files for source code encourages the programmer to write programs in the vertical code style, with one statement per line. This style is what a traditional assembler requires. F-PC works well in this style, if you choose to use it. However, F-PC does not prevent you from writing in the horizontal code style, by which you can squeeze many statements into one line and make your own life miserable.

PASM supports postfix syntax to prevent alienating the established base of F83 users. The prefix notation is close to the original Intel assembly syntax, and certainly will be more familiar to programmers of other languages. All the code words defined in F-PC are coded in the prefix notation. Please consider writing any new assembly code you need in the prefix mode for distribution and assimilation with F-PC.

The assembly of a machine instruction is generally deferred until one of the following three events: when the next assembly mnemonic is encountered, at the end of a line, or when the command END-CODE or A; is executed. Therefore, a good style in writing code words in F-PC is to put one assembly instruction on each line, with the mnemonic followed by its parameter specification or arguments. Multiple assembly instructions are allowed on the same line. It is a good idea to put the assembly structure words on separate lines with proper indentation, so that the nested structures in a code definition can be perceived more readily.

8.2. PASM GLOSSARY

Here we will only give a small list of PASM words in this glossary. All assembly mnemonics are identical to those defined in the F83 8086 Assembler. All the structure directives and test conditions are also identical to those in F83. Only the most important FORTH words controlling the assembler are listed here.

PREFIX  ( -- )
        Assert prefix mode for the following code definitions.

POSTFIX  ( -- )
        Assert postfix mode for the following code definitions.

CODE  ( -- )
        Define a new code definition using the following string as its name. Assembly commands follow, terminated by END-CODE.

END-CODE  ( -- )
        Terminate a code definition, check error conditions, and make the code definition available for searching and execution.

LOCAL_REFS  ( -- )
        This mode WILL NOT allow local labels to cross CODE word boundaries. The local label mechanism is cleared each time a new CODE word is started. This is the DEFAULT mode.

GLOBAL_REFS  ( -- )
        All local labels will be available across all following code definitions. The label mechanism is NOT reset at the beginning of a CODE definition, so a local label reference can cross CODE word boundaries. The local label mechanism MUST be reset before use in this mode, with the CLEAR_LABELS word.

CLEAR_LABELS  ( -- )
        Clear the local label mechanism, to the unused or clean state, in preparation for using local labels. This word need only be used in the GLOBAL_REFS mode. The LOCAL_REFS mode automatically performs a CLEAR_LABELS each time a CODE definition is started.

A;  ( -- )
        Complete the assembly of the previous machine instruction.

BYTE  ( -- )
        Assemble current and subsequent code using byte arguments, if register size is not explicitly specified in prefix mode. WORD is default in postfix mode.

WORD  ( -- )
        Assemble current and subsequent code using 16 bit arguments, if register size is not explicitly specified in postfix mode. BYTE is default in prefix mode.

LABEL  ( -- )
        Start an assembly subroutine or mark the current code address to be referenced later.

Figure 8.1.
Comparison of assembly syntax

PREFIX                  POSTFIX                 MASM

AAA                     AAA                     AAA
ADC AX, SI              SI AX ADC               ADC AX,SI
ADC DX, 0 [SI]          0 [SI] DX ADC           ADC DX,0[SI]
ADC 2 [BX+SI], DI       DI 2 [BX+SI] ADC        ADC 2[BX][SI],DI
ADC MEM BX              BX MEM #) ADC           ADC MEM,BX
ADC AL, # 5             5 # AL ADC              ADC AL,5
AND AX, BX              BX AX AND               AND AX,BX
AND CX, MEM             CX MEM #) AND           AND CX,MEM
AND DL, # 3             3 # DL AND              AND DL,3
CALL NAME               NAME #) CALL            CALL NAME
CALL FAR [] NAME        FAR [] NAME #) CALL     ?????
CMP DX, BX              BX DX CMP               CMP DX,BX
CMP 2 [BP], SI          SI 2 [BP] CMP           CMP [BP+2],SI
DEC BP                  BP DEC                  DEC BP
DEC MEM                 MEM DEC                 DEC MEM
DEC 3 [SI]              3 [SI] DEC              DEC 3[SI]
DIV CL                  CL DIV                  DIV CL
DIV MEM                 MEM DIV                 DIV MEM
IN PORT# WORD           WORD PORT# IN           IN AX,PORT#
IN PORT#                PORT# IN                IN AL,PORT#
IN AX, DX               DX AX IN                IN AX,DX
INC MEM BYTE            MEM INC                 INC MEM BYTE
INC MEM WORD            MEM #) INC              INC MEM WORD
INT 16                  16 INT                  INT 16
JA NAME                 NAME JA                 JA NAME
JNBE NAME               NAME #) JNBE            JNBE NAME
JMP NAME                NAME #) JMP             JMP
JMP FAR [] NAME         NAME [] FAR JMP         JMP [NAME]
JMP FAR $F000 $E987                             JMP F000:E987
LODSW                   AX LODS                 LODS WORD
LODSB                   AL LODS                 LODS BYTE
LOOP NAME               NAME #) LOOP            LOOP NAME
MOV DX, NAME            NAME #) DX MOV          MOV DX,[NAME]
MOV AX, BX              BX AX MOV               MOV AX,BX
MOV AH, AL              AL AH MOV               MOV AH,AL
MOV BP, 0 [BX]          0 [BX] BP MOV           MOV BP,0[BX]
MOV ES: BP, SI          ES: BP SI MOV           MOV ES:BP,SI
MOVSW                   AX MOVS                 MOVS WORD
POP DX                  DX POP                  POP DX
POPF                    POPF                    POPF
PUSH SI                 SI PUSH                 PUSH SI
REP                     REP                     REP
RET                     RET                     RET
ROL AX, # 1             AX ROL                  ROL AX,1
ROL AX, CL              AX CL ROL               ROL AX,CL
SHL AX, # 1             AX SHL                  SHL AX,1
XCHG AX, BP             BP AX XCHG              XCHG AX,BP
XOR CX, DX              DX, CX XOR              XOR CX,DX

8.3. SYNTAX COMPARISON

The differences among the F-PC prefix mode, the F83 postfix mode, and the Intel MASM notation are best illustrated by the table in Figure 8.1. Although the table is not exhaustive, it covers most of the cases useful in doing PASM programming. You are welcome to suggest additional cases to be included in this table.

8.4. USAGE OF 8086 MACHINE REGISTERS IN F-PC

To write assembly code, you have to know the CPU real well.
Most CPUs can be understood and programmed using a CPU model, consisting of the register set and the instructions which manipulate data among the registers, memory, and external devices. In F83, only one 64K byte segment of memory is used, and all segment registers in the 8086 generally point to the same code segment. Since F-PC uses many segments to store code, heads, lists, and other data, you have to know how these segment registers are used and how information in different segments can be accessed conveniently.

Following is a list of the 8086 registers and their usage in F-PC:

CS      Code seg: used for any code definitions.
        (Must be preserved by code word.)

DS      Data seg: used for data other than ." strings.
        (NOTE: CS=DS and underlying kernel primitives rely on this correspondence! Must be preserved by code word.)

ES      Extra seg: used as the segment location for the current instruction pointer (IP).
        (Must be preserved by code word.)

SS      Stack seg: used as the segment location for the current stack pointer (SP).
        (Must be preserved by code word.)

BP      Return Pointer (RP).
        (Must be preserved by code word.)

SP      Stack Pointer (SP).
        (Must be preserved by code word.)

SI      Instruction Pointer (IP).
        (Must be preserved by code word.)

AX, BX, CX, DX, & DI
        Scratch registers free to use without restoration.

PC      Program Counter. Not used by F-PC.

DF      Direction Flag. Assumed to be 0/increment. Some older FF (or before?) words do an initial CLD (e.g., CMOVE), but this shouldn't be necessary. If you specifically need DF=1, then do:
                STD
                ...code...
                CLD

AX is used as a general purpose accumulator. BX is most useful as a base register for indexing into an array. CX is used to hold a count for looping and repeating operations. DX is useful in holding the address of an I/O port. The DS:SI pair is used to read memory with auto-indexing, and the ES:DI pair is used to write memory with auto-indexing. F-PC uses SP as the data stack pointer and BP as the return stack pointer.
SP is convenient in the PUSH and POP instructions, while BP is more convenient in indexing. There are many occasions when you might want to swap SP and BP, so that you can use the most effective way to address data on either stack.

The F-PC Technical Reference Manual discusses in great detail how F-PC itself is constructed based on the 8086 assembly code. If you are interested in squeezing the last drop of blood from the PC/XT/AT, be sure to study carefully the Technical Reference Manual and the kernel files in F-PC.

8.5. ADDRESSING MODES

The most difficult problem in using the 8086 assembler is to figure out the correct addressing mode and code it into an instruction. You can get a good idea of the addressing mode syntax, and probably figure out most of it, from the above table. However, there are cases where the table falls short. Here we will try to summarize the addressing syntax more systematically to show you how F-PC handles addresses in the prefix mode.

Register Mode

Source or destination is a register in the CPU. The source registers are:

        AL  BL  CL  DL  AH  BH  CH  DH
        AX  BX  CX  DX  SP  BP  SI  DI
        IP  RP  CS  DS  SS  ES

Destination register specifications are:

        AL, BL, CL, DL, AH, BH, CH, DH,
        AX, BX, CX, DX, SP, BP, SI, DI,
        IP, RP, CS, DS, SS, ES,

The register name must end with a comma to be recognized by PASM as a destination register.

Immediate Mode

The argument is assembled as a literal in the instruction. The immediate value must be preceded by the symbol #, which is a word and must be delimited by spaces:

        MOV AX, # 1234
        ADD CL, # 32
        ROL AX, # 3

Direct Mode

An address is assembled into the instruction. This is used to specify an address to be jumped to or a memory location for data reference. The address is used directly as a 16 bit number. Depending on the instruction, the address may be assembled unmodified or, in the branch instructions, assembled as an 8 or 16 bit offset. To jump or call beyond a 64K byte segment, the address must be preceded by FAR [] . Examples are:

        CALL FAR []