Text File | 1995-08-20 | 97KB | 2,254 lines

A Proposed Assembly Language Syntax For 65c816 Assemblers

by Randall Hyde

This is a proposed standard for 65c816 assembly language. The proposed standard comes in three levels: subset, full, and extended.

The subset standard is intended for simple (or inexpensive) products, particularly those aimed at beginning 65c816 assembly language programmers. The full standard is the focus of this proposal: an assembler meeting the full level adopts all of the requirements outlined in this paper. The extended level is a mechanism whereby a vendor can claim full compliance with the standard and point out that there are extensions as well.

An assembler cannot claim extended level compliance unless it also complies with the full standard. No matter how many extensions it incorporates, an assembler can claim only subset level compliance unless it supports the full standard. This ensures that programmers who do not use any assembler extensions can assemble their programs on any assembler meeting the full or extended compliance levels.

In addition to the items required for compliance, this proposal suggests several extensions in the interest of compatibility with existing 65c816 assemblers. These recommendations are not required for full compliance with the standard; they are included as suggestions to make conversion of existing programs easier. The suggestions come in two levels: recommended and optional. Recommended items should be present in any decent 65c816 package. Inclusion of the optional items is discouraged (since there are other ways to accomplish the same operations within the confines of the standard), but a vendor may include them at its discretion to help alleviate conversion problems.

65c816 Instruction Mnemonics
----------------------------

All of the following mnemonics are required at the subset, full, and extended standard levels.

The following mnemonics handle the basic 65c816 instruction set:

     ADC - add with carry
     AND - logical AND
     BCC - branch if carry clear
     BCS - branch if carry set
     BEQ - branch if equal
     BIT - bit test
     BMI - branch if minus
     BNE - branch if not equal
     BPL - branch if plus
     BRA - branch always
     BRK - break point instruction
     BVC - branch if overflow clear
     BVS - branch if overflow set
     CLC - clear the carry flag
     CLD - clear the decimal flag
     CLI - clear the interrupt flag
     CLP - clear bits in P
     CLR - store a zero into memory
     CMP - compare accumulator
     CPX - compare x register
     CPY - compare y register
     CSP - call system procedure
     DEC - decrement acc or memory
     DEX - decrement x register
     DEY - decrement y register
     EOR - exclusive-or accumulator
     HLT - halt (stop) the clock
     INC - increment acc or memory
     INX - increment x register
     INY - increment y register
     JMP - jump to new location
     JSR - jump to subroutine
     LDA - load accumulator
     LDX - load x register
     LDY - load y register
     MVN - block move (increment)
     MVP - block move (decrement)
     NOP - no operation
     ORA - logical or accumulator
     PHA - push accumulator
     PHP - push p
     PHX - push x register
     PHY - push y register
     PLA - pop accumulator
     PLP - pop p
     PLX - pop x register
     PLY - pop y register
     PSH - push operand
     PUL - pop operand
     RET - return from subroutine
     ROL - rotate left acc/mem
     ROR - rotate right acc/mem
     RTI - return from interrupt
     RTL - return from long subroutine
     RTS - return from short subroutine
     SBC - subtract with carry
     SED - set decimal flag
     SEI - set interrupt flag
     SEP - set bits in P
     SHL - shift left acc/mem
     SHR - shift right acc/mem
     STA - store accumulator
     STX - store x register
     STY - store y register
     SWA - swap accumulator halves
     TAD - transfer acc to D
     TAS - transfer acc to S
     TAX - transfer acc to x
     TAY - transfer acc to y
     TCB - test and clear bit
     TDA - transfer D to acc
     TSA - transfer S to acc
     TSB - test and set bit
     TSX - transfer S to X
     TXA - transfer x to acc
     TXS - transfer x to S
     TXY - transfer x to y
     TYA - transfer y to acc
     TYX - transfer y to x
     WAI - wait for interrupt
     XCE - exchange carry with emulation bit

Comments:

CLP replaces REP in the original 65c816 instruction set, since CLP is a tad more consistent with the original 6502 instruction set. See "Recommended Options" for the status of REP.

CLR replaces the STZ instruction. Since STA, STX, and STY store 65c816 registers, STZ seems to imply that there is a Z register. Using CLR (clear) eliminates any confusion.

CSP (call system procedure) replaces the COP mnemonic. COP was little more than a software interrupt in both intent and implementation; CSP makes this usage a little clearer.

HLT replaces the STP mnemonic. STP, like STZ, implies that the P register is being stored somewhere. HLT (for halt) is just as obvious as "stop the clock", yet it doesn't have the "look and feel" of a store instruction.

JML and JSL are not required by the new standard, but see "Recommended Options" concerning these two instructions.

Most of the new 65c816 push and pull instructions have been collapsed into two instructions, PSH and PUL:

     PEA label      becomes   PSH #label
     PEI (label)    becomes   PSH label
     PER label      becomes   PSH @label
     PHB            becomes   PSH DBR
     PHD            becomes   PSH D
     PHK            becomes   PSH PBR
     PLB            becomes   PUL DBR
     PLD            becomes   PUL D

These mnemonics are more in line with the original design of the 6502 instruction set, whereby the mnemonic specifies the operation and the operand specifies the addressing mode and address.

The RET instruction is converted to RTS or RTL, depending on the type of subroutine being declared. RTS and RTL still exist in order to force a short or long return.

SHL and SHR (shift left and shift right) are used instead of ASL and LSR. The 6500 family has NEVER supported an arithmetic shift left instruction; the operation performed by the ASL mnemonic is really a logical shift left. To simplify matters, SHL and SHR are used to specify shift left and shift right.
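The renames in these comments are mechanical enough to automate when converting existing source. The following Python sketch is a hypothetical conversion helper, not something the proposal defines; it assumes it is handed a single instruction (mnemonic plus operand, with no label or comment) and handles only the renames listed above.

```python
# Hypothetical helper (not part of the proposal): rewrites one instruction
# written with standard 65c816 mnemonics into the names proposed above.
# Only the renames listed in these comments are handled.

SIMPLE_RENAMES = {
    "REP": "CLP",  # clear bits in P
    "STZ": "CLR",  # store a zero into memory
    "COP": "CSP",  # call system procedure
    "STP": "HLT",  # halt (stop) the clock
    "ASL": "SHL",  # shift left
    "LSR": "SHR",  # shift right
}

# Push/pull instructions collapsed into PSH and PUL:
# old mnemonic -> (new mnemonic, function that rewrites the operand)
STACK_RENAMES = {
    "PEA": ("PSH", lambda op: "#" + op),        # PEA label   -> PSH #label
    "PEI": ("PSH", lambda op: op.strip("()")),  # PEI (label) -> PSH label
    "PER": ("PSH", lambda op: "@" + op),        # PER label   -> PSH @label
    "PHB": ("PSH", lambda op: "DBR"),
    "PHD": ("PSH", lambda op: "D"),
    "PHK": ("PSH", lambda op: "PBR"),
    "PLB": ("PUL", lambda op: "DBR"),
    "PLD": ("PUL", lambda op: "D"),
}


def convert(instruction: str) -> str:
    """Convert 'MNEMONIC operand' (no label or comment) to the proposed syntax."""
    parts = instruction.split(None, 1)
    if not parts:
        return instruction
    mnemonic = parts[0].upper()
    operand = parts[1].strip() if len(parts) > 1 else ""
    if mnemonic in SIMPLE_RENAMES:
        mnemonic = SIMPLE_RENAMES[mnemonic]
    elif mnemonic in STACK_RENAMES:
        mnemonic, rewrite = STACK_RENAMES[mnemonic]
        operand = rewrite(operand)
    return f"{mnemonic} {operand}".rstrip()


if __name__ == "__main__":
    for old in ["PEA label", "PEI (ptr)", "PHK", "stz count", "lsr"]:
        print(f"{old:10} -> {convert(old)}")
```

Mnemonics that are not renamed (LDA, STA, and so on) pass through unchanged apart from being upper-cased.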

SWA (swap accumulator halves) is used instead of XBA. Since this is the only instruction that references the "B" accumulator, there's no valid reason to treat the accumulator as two distinct entities (this is just a carry-over from the 6800 MPU). Likewise, since the eight-bit accumulator cannot be distinguished from the 16-bit accumulator on an instruction-by-instruction basis (it depends on the setting of the M bit in the P register), the accumulator should always be referred to as A, regardless of whether the CPU is in eight- or sixteen-bit mode. Therefore, instructions like TCD, TCS, TDC, and TSC are replaced by TAD, TAS, TDA, and TSA. For more information on these mnemonics, see "Recommended Options".

Built-in Macros
---------------

Each of the following mnemonics actually generates one or more instructions. They are not required at the subset level, but are required at the full and extended levels.

     ADD - emits CLC then ADC
     BFL - emits BEQ (branch if false)
     BGE - emits BCS
     BLT - emits BCC
     BTR - emits BNE (branch if true)
     BSR - emits PER *+2 then BRA (short) or PER *+3 then BRL (long)
     SUB - emits SEC then SBC

Recommended Options
-------------------

The following mnemonics are aliases of existing instructions. The (proposed) standard recommends that the assembler support these mnemonics, mainly to provide compatibility with older source code, but does not recommend their use in new programs. Some (or all) of these items may be removed from the recommended list in future revisions of the standard. None of these recommended items need be present at the subset level. If these are the only extensions over and above the full syntax, the assembler CANNOT claim to be an extended level assembler.

     ASL  BRL  COP  JML  JSL  LSR  PEA  PEI  PER  PHB  PHD
     PHK  PLB  PLD  REP  TCD  TCS  TDC  TSC  TRB  WDM  XBA

Symbols, Constants, and Other Items
-----------------------------------

Symbols may contain any reasonable number of characters at the full level.
At the subset compliance level, at least 16 characters should be supported, and 32 is recommended. If the implementor needs a maximum value, a "reasonable" number of characters should be at least 64. Symbols must begin with an alphabetic character and may contain only the following characters: A-Z, a-z, 0-9, "_", "$", and "!".

The assembler must be capable of treating upper and lower case alphabetic characters identically. This does not prevent an assembler from letting the programmer choose to make upper and lower case distinct; it simply requires that, by default, upper and lower case characters are treated identically. Note that the standard does not require case sensitivity in the assembler (and, in fact, recommends against it). Therefore, anyone foolish enough to create variables that differ only in the case of their letters (a bad idea for many, many reasons) is risking portability problems (as well as maintenance, readability, and other problems).

The following symbols are reserved and may not be redefined within the program:

     A, X, Y, S, DBR, PBR, D, M, P

Nor may these symbols appear as fields of a record or type definition (described later).

Constants take six different forms: character constants, string constants, binary constants, decimal constants, hexadecimal constants, and set constants.

Character constants are created by surrounding a single character with a pair of apostrophes or quotation marks, e.g., "s", "a", '$', and 'p'. If the character is surrounded by apostrophes, then the ASCII code for that character WITH THE H.O. BIT CLEAR is used. If quotation marks are used, then the ASCII code for the character WITH THE H.O. BIT SET is used. To represent the apostrophe with the H.O. bit clear or a quotation mark with the H.O. bit set, simply double up the character:

     ''''   - emits a single apostrophe.
     """"   - emits a single quotation mark.
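The H.O. bit rule for character constants can be summarized in a few lines. This is a hypothetical Python helper (the name char_constant is illustrative only); it assumes plain 7-bit ASCII input and treats anything other than a single character between the delimiters as an error.

```python
def char_constant(literal: str) -> int:
    """Value of a character constant such as 'a' or "a".

    Apostrophes give the ASCII code with the H.O. bit clear; quotation marks
    give it with the H.O. bit set. A doubled delimiter stands for itself.
    """
    if len(literal) < 3 or literal[0] not in "'\"" or literal[-1] != literal[0]:
        raise ValueError(f"not a character constant: {literal!r}")
    delim = literal[0]
    body = literal[1:-1].replace(delim * 2, delim)  # '''' -> ' and """" -> "
    if len(body) != 1 or ord(body) > 0x7F:
        raise ValueError(f"expected exactly one ASCII character: {literal!r}")
    code = ord(body)
    return code | 0x80 if delim == '"' else code  # quotation marks set bit 7
```

For instance, 'a' yields $61 while "a" yields $E1.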

String constants are created by placing a sequence of two or more characters within a pair of apostrophes or quotation marks. As for character constants, the choice of apostrophe or quotation mark controls the H.O. bit. Likewise, to place an apostrophe or quotation mark within a string delimited by the same character, just double it up:

     'This isn''t bad!'      - generates --This isn't bad!--
     "He said ""Hello"""     - generates --He said "Hello"--

Binary integer constants consist of a sequence of 1 through 32 zeros or ones preceded by a percent sign ("%"). Examples:

     %10110010    %001011101    %10    %1100

Decimal integer constants consist of strings of decimal digits without any preceding characters, e.g., 25, 235, 8325, etc. Decimal constants may optionally be preceded by a minus sign.

Hexadecimal constants consist of a dollar sign ("$") followed by a string of hexadecimal digits (0..9 and A..F). Values in the range $0 through $FFFFFFFF are allowed.

Set constants are only required at the full and extended compliance levels. A set constant consists of a list of items surrounded by braces, e.g., {0,3,5}. For more information, see the .SET directive.

Address Expressions
-------------------

Most instructions and many pseudo-opcodes and assembler directives require operands of some sort. Often these operands contain some sort of address expression (an expression that ultimately yields a numeric or string value). This proposed standard defines the operands, precision, accuracy, and available operations that constitute an address expression.

Precision: all integer expressions are computed using 32 bits. All string expressions are computed with strings up to 255 characters in length. All floating point operations are performed using IEEE 80-bit extended floating point values (i.e., Apple SANE routines). All set operations are performed using 32 bits of precision.
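As a rough sketch, the three integer constant forms and the 32-bit limit can be parsed as follows. The helper name is hypothetical, and two behaviors are assumptions not spelled out above: a decimal constant larger than 32 bits is rejected, and a leading minus sign produces the 32-bit two's complement value (mirroring the unary minus operator described in the next section).

```python
HEX_DIGITS = set("0123456789ABCDEF")


def parse_int_constant(text: str) -> int:
    """Parse a %binary, $hexadecimal, or (optionally negative) decimal constant."""
    text = text.strip()
    negative = False
    if text.startswith("%"):
        digits, base = text[1:], 2
        if not 1 <= len(digits) <= 32 or not set(digits) <= set("01"):
            raise ValueError("binary constants take 1 through 32 zeros or ones")
    elif text.startswith("$"):
        digits, base = text[1:], 16
        if not digits or not set(digits.upper()) <= HEX_DIGITS:
            raise ValueError("bad hexadecimal constant")
    else:
        negative = text.startswith("-")
        digits, base = (text[1:] if negative else text), 10
        if not digits or not set(digits) <= set("0123456789"):
            raise ValueError("bad decimal constant")
    value = int(digits, base)
    if value > 0xFFFFFFFF:
        raise ValueError("constant does not fit in 32 bits")  # assumed: an error
    if negative:
        value = -value & 0xFFFFFFFF  # assumed: 32-bit two's complement
    return value
```

With these rules, %10110010, $B2, and 178 all parse to the same value.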

Accuracy: all integer operations (consisting of two 32-bit operands and an operator on those operands) must produce the correct result if the actual result fits within 32 bits. If an overflow occurs, the value is truncated and only the low order 32 bits are retained. If an underflow occurs, zero is used as the result. If an overflow or underflow occurs, a special bit is set (until the next value is computed) that can be tested by the ".IFOVR" and ".IFUNDR" directives; other than that, such errors are ignored. All arithmetic is performed using unsigned arithmetic operations. All floating point operations follow the IEEE (and Apple SANE) suggestions; floating point exceptions are otherwise ignored by the assembler. Any string operation producing a string longer than 255 characters produces an assembly time error. All set operations must be exact.

Integer operations: the following integer operations must be provided at all compliance levels:

     +            (binary) adds the two operands.
     -            (binary) subtracts the second operand from the first.
     *            multiplies the two operands.
     /            divides the first operand by the second.
     \            divides the first operand by the second and returns the remainder.
     &            logically ANDs the two operands.
     |            logically ORs the two operands.
     ^            logically XORs the two operands.
     =  <>  <     these operators compare the two operands (unsigned comparison)
     >  <=  >=    and return 1 if the comparison is true, 0 otherwise.
     -            (unary) negates (2's complement) the operand.
     ~            (unary) complements (inverts, 1's complement) the operand.

The following operators must be provided at the full and extended compliance levels:

     <-           shifts the first operand to the left the number of bits specified by the second operand.
     ->           shifts the first operand to the right the number of bits specified by the second operand.
     @            (unary) subtracts the location counter at the beginning of the current statement from the following address expression.
     %            (ternary, e.g., X%Y:Z) extracts bits Y through Z from X and returns that result right justified.

Floating point operations: floating point numbers and operations are required only at the full and extended levels. The following operations must be available:

     +            adds the two operands.
     -            subtracts the second operand from the first.
     *            multiplies the two operands.
     /            divides the first operand by the second.
     -            (unary) negates the operand.
     =  <>  <     these operators compare the two operands and return
     >  <=  >=    1 if the comparison is true, 0 otherwise.

String operations: strings and string operations are not required at the subset level, but the standard recommends their presence. The following string operations must be provided at the full and extended levels:

     +            concatenates two strings.
     %            (ternary, e.g., X%Y:Z) returns the substring of X that starts at position Y and is Z characters long. An error is generated if X doesn't contain enough characters.
     =  <>  <     these operators compare the two operands and return
     >  <=  >=    1 if the comparison is true, 0 otherwise.

Set operations: sets and set operations are required only at the full and extended levels. The following set operations must be provided:

     +            union of two sets (logical OR of the bits).
     *            intersection of two sets (logical AND of the bits).
     -            set difference (the first set ANDed with the NOT of the second set).
     =            returns 1 if the two sets are equal, zero otherwise.
     <>           returns 1 if the two sets are not equal, zero otherwise.
     <            returns 1 if the first set is a proper subset of the second.
     <=           returns 1 if the first set is a subset of the second.
     >            returns 1 if the first set is a proper superset of the second.
     >=           returns 1 if the first set is a superset of the second.
     %            (ternary, e.g., X%Y:Z) extracts elements Y..Z from X and returns those items.

In addition to the above operators, several pre-defined functions are also available.
Note that these functions are not required at the subset compliance level, only at the full and extended levels:

     float(i)   - converts integer "i" to a floating point value.
     trunc(r)   - converts real "r" to a 32-bit unsigned integer (or generates an error).
     valid(r)   - returns 1 if r is a valid floating point value, 0 otherwise (for example, if r is NaN, infinity, etc.).
     length(s)  - returns the length of string s.
     lookup(s)  - returns 1 if s is a valid symbol in the symbol table.
     value(s)   - returns the value of the symbol specified by string "s" in the symbol table.
     type(s)    - returns the type of symbol "s" in the symbol table. Actual values returned are yet to be defined.
     mode(a)    - returns the addressing mode of item "a". Used mainly in macros.
     STR(s)     - returns string s with a prefixed length byte.
     ZRO(s)     - returns string s with a suffixed zero byte.
     DCI(s)     - returns string s with the H.O. bit of its last character inverted.
     RVS(s)     - returns string s with its characters reversed.
     FLP(s)     - returns string s with its H.O. bits inverted.
     IN(v,s)    - returns one if value v is in set s, zero otherwise.

The following integer functions must be present at all compliance levels:

     LB(i), LBYTE(i), BYTE(i)   - returns the L.O. byte of i.
     HB(i), HBYTE(i)            - returns byte #1 (bits 8-15) of i.
     BB(i), BBYTE(i)            - returns the bank byte (bits 16-23) of i.
     XB(i), XBYTE(i)            - returns the H.O. byte of i.
     LW(i), LWORD(i), WORD(i)   - returns the L.O. word of i.
     HW(i), HWORD(i)            - returns the H.O. word of i.
     Pack(i,j)                  - returns a 16-bit value whose L.O. byte is the L.O. byte of i and whose H.O. byte is the L.O. byte of j.
     Pack(i,j,k,l)              - returns a 32-bit value consisting of (i,j,k,l), where i is the L.O. byte and l is the H.O. byte. Note: l is optional; if it isn't present, zero is substituted for l.

The order of evaluation for an expression is strictly left to right unless parentheses are used to modify the precedence of a sub-expression.
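The integer functions and the strict left-to-right evaluation order can be sketched in Python as follows. None of this is part of the proposal: the lower-case helper names are hypothetical, bit_field assumes bit 0 is the L.O. bit and Y <= Z, and only a handful of binary operators are modeled (binary minus is left out because the underflow rule above is open to interpretation).

```python
import operator

MASK32 = 0xFFFFFFFF  # integer expressions are computed with 32 bits


# Byte and word extraction (LB, HB, BB, XB, LW, HW in the proposal).
def lb(i):
    return i & 0xFF  # L.O. byte, bits 0-7


def hb(i):
    return (i >> 8) & 0xFF  # byte #1, bits 8-15


def bb(i):
    return (i >> 16) & 0xFF  # bank byte, bits 16-23


def xb(i):
    return (i >> 24) & 0xFF  # H.O. byte, bits 24-31


def lw(i):
    return i & 0xFFFF  # L.O. word, bits 0-15


def hw(i):
    return (i >> 16) & 0xFFFF  # H.O. word, bits 16-31


def pack(b0, b1, b2=None, b3=0):
    """Pack(i,j) gives 16 bits; Pack(i,j,k{,l}) gives 32 bits, i is the L.O. byte."""
    value = lb(b0) | lb(b1) << 8
    if b2 is not None:
        value |= lb(b2) << 16 | lb(b3) << 24  # l defaults to zero
    return value


def bit_field(x, y, z):
    """X%Y:Z: bits y through z of x, right justified."""
    return (x >> y) & ((1 << (z - y + 1)) - 1)


# A few binary integer operators; note that there is no precedence table.
OPS = {
    "+": operator.add,
    "*": operator.mul,
    "/": operator.floordiv,
    "\\": operator.mod,  # remainder
    "&": operator.and_,
    "|": operator.or_,
    "^": operator.xor,
}


def eval_left_to_right(tokens):
    """Evaluate [operand, op, operand, op, ...] strictly left to right."""
    acc = tokens[0] & MASK32
    for op, rhs in zip(tokens[1::2], tokens[2::2]):
        acc = OPS[op](acc, rhs & MASK32) & MASK32  # overflow keeps the low 32 bits
    return acc
```

With this evaluator, 2+3*4 evaluates to 20 rather than the 14 that a conventional precedence table would give, which is exactly why parentheses matter in the discussion that follows.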

Since parentheses are also used to specify certain indirect addressing modes, using parentheses to override the strict left-to-right evaluation order introduces some ambiguity. For example, should the following be treated as a jump indirect through location $1001 or a jump directly to location $1001?

     JMP ($1000+1)

The ambiguity is resolved as follows: if the parenthesis is the first character in the operand field, then the indirect addressing mode is assumed. Otherwise, the parentheses are used to override the left-to-right precedence. The example above is therefore treated as a jump indirect through location $1001. To jump directly to location $1001 in this fashion, modify the statement to

     JMP 0+($1000+1)

so that the parenthesis is no longer the first character in the operand field.

The use of parentheses to override the left-to-right precedence is only required at the full and extended compliance levels. It is not required at the subset compliance level.

Expression Types
----------------

Expressions, in addition to having a value associated with them, also have a specific type. The three basic types of expressions are integer, floating point, and string expressions. Integer expressions can be broken down into subtypes as well. A hierarchical diagram is the easiest way to describe integer expressions:

     integers --+-- constants --+-- user defined (enumerated) types
                |               |
                |               +-- simple numeric constants
                |
                +-- addresses --+-- direct page addresses
                                |
                                +-- absolute addresses --+-- full 16-bit
                                |                        |
                                |                        +-- relative 8-bit
                                |
                                +-- long addresses

This diagram points out that there are two types of integer expressions: constants and addresses. Further, there are two types of constants and four types of addresses. Before discussing operations on these different types of integer values, their purpose should be presented. Until now, most 65xxx assemblers did little to differentiate between the different types of integer values.
In this proposed standard, however, strong type checking is enforced. Previous assemblers accepted code like the following:

     label    equ    $1000
              lda    #Label
              sta    Label

Such operations are illegal within the confines of the new standard. The problem with this short code segment is that the symbol "label" is used both as an integer constant (in the LDA instruction) and as an address expression (in the STA instruction). To help prevent logical errors from creeping into a program, the assembler doesn't allow the use of addresses where constants are expected and vice versa. To that end, a new assembler directive, CON, is used to declare constants, while EQU is used to declare an (absolute) address. Symbols declared by CON cannot be (directly) used as an address. Likewise, symbols declared by EQU (and others) cannot be used where a constant is expected (such as in an immediate operand).

Although this type checking can be quite useful for locating bugs within the source file, it can also be a source of major annoyance. Sometimes (quite often, in fact) you may want to treat an address expression as a constant or a constant expression as an address. Two functions coerce these expressions to their desired form: PTR and OFS. PTR(expr) converts the supplied constant expression to an address expression. OFS(expr) converts the supplied address expression to a constant expression. The following is perfectly legal:

     Cons1      CON    $5A
     DataLoc    EQU    $1000
                lda    #OFS(DataLoc)
                sta    PTR(Cons1)

For more information, see the section on assembler directives. PTR and OFS are required at all compliance levels of this proposed standard.

While any constant value may be used anywhere a constant is allowed, the assembler must often differentiate between the various types of address expressions. This is particularly true when emitting code, since the length of an instruction depends on the particular address expression.
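A minimal Python sketch of the CON/EQU type check, under the simplifying assumption that every integer value is either a "constant" or an "address" (the direct, absolute, and long subtypes are ignored here). The names Value, con, equ, ptr, ofs, immediate, and memory are hypothetical stand-ins for what an assembler would do internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    number: int
    kind: str  # "constant" or "address" (address subtypes not modeled)


def con(n):  # a symbol declared with CON
    return Value(n, "constant")


def equ(n):  # a symbol declared with EQU
    return Value(n, "address")


def ptr(v):  # PTR(expr): treat a constant as an address
    return Value(v.number, "address")


def ofs(v):  # OFS(expr): treat an address as a constant
    return Value(v.number, "constant")


def immediate(v):
    """An immediate operand such as '#expr' must be a constant."""
    if v.kind != "constant":
        raise TypeError("address used where a constant is expected")
    return v.number


def memory(v):
    """A memory operand such as 'sta expr' must be an address."""
    if v.kind != "address":
        raise TypeError("constant used where an address is expected")
    return v.number
```

Mirroring the legal example above, immediate(ofs(equ(0x1000))) and memory(ptr(con(0x5A))) both succeed, while immediate(equ(0x1000)) raises TypeError just as "lda #Label" is rejected.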

If an expression contains only constants, only direct page values, only absolute values, or only long values, there isn't much of a problem: the assembler uses the specified type as the addressing mode. If the expression contains mixed types, the resulting type is as follows:

     Expression contains          Result is
     ----------------------       ---------
     constant                     constant
     direct                       direct
     constant+direct              direct
     absolute                     absolute
     constant+absolute            absolute
     long                         long
     constant+long                long
     absolute+long                long
     constant+absolute+long       long

These are the only allowable forms. If your expression contains only constants, the result is a constant. If it contains a mixture of constants and direct page addresses, the result is a direct page address. Note that direct page addresses cannot be mixed with other types of addresses; an error must be reported in this situation (although you can get around it with an expression of the form "abs+OFS(direct)"). Likewise, adding a constant to an absolute address produces an absolute address, adding an absolute and a long address produces a long address, and so on.

Sometimes you need to force an expression to be a certain type. For example, the instruction "LDA $200" normally assembles to a load absolute from location $200 in the current data bank. If you need to force this to location $200 in bank zero, regardless of the contents of the DBR, the address expression must be coerced to a long address. Coercion of this type is accomplished with the ":D", ":A", ":L", and ":S" expression suffixes. To force "LDA $200" to be assembled using the long addressing mode, modify the instruction to "LDA $200:L". The coercion suffix must always follow the full address expression. The ":S" suffix (for short branches) is never required, since a short branch is always assumed for BRA and BSR, but it is included for completeness. For BRA and BSR, the ":L" suffix implies a long branch (+/- 32K) rather than the long addressing mode.
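The mixing rules in the table can be expressed as a small function. This Python sketch is illustrative only: type names are plain strings, the relative 8-bit subtype is not modeled, and coerce assumes that a suffix simply keeps the low 8, 16, or 24 bits of the value. That matches the caveat given for ":D" and ":A"; extending it to ":L" is an assumption.

```python
RANK = {"constant": 0, "absolute": 1, "long": 2}
COERCE_MASK = {":D": 0xFF, ":A": 0xFFFF, ":L": 0xFFFFFF}  # ":S" not modeled


def result_type(kinds):
    """Type of an expression whose terms have the given types."""
    kinds = set(kinds)
    if "direct" in kinds:
        if kinds - {"constant", "direct"}:
            raise TypeError("direct page addresses cannot be mixed with other addresses")
        return "direct"
    # Otherwise the "widest" type present wins: constant < absolute < long.
    return max(kinds, key=RANK.__getitem__)


def coerce(value, suffix):
    """Apply a :D, :A, or :L suffix by keeping only the low 8, 16, or 24 bits."""
    return value & COERCE_MASK[suffix]
```

For example, result_type(["constant", "absolute", "long"]) gives "long", while mixing "direct" with "absolute" raises an error unless the direct term is first wrapped in OFS.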

Caveats: if ":D" or ":A" is used to coerce a large address expression to direct or absolute, the high order byte(s) of the expression are truncated and ignored. The assembler must assume that a programmer who uses these constructs knows exactly what he's doing. Therefore, "LDA $1001:D" will happily assemble into a "LDA $01" instruction, despite the actual value of the address expression.

Addressing Mode Specification
-----------------------------

65c816 addressing modes are specified by certain symbols in the operand field. A quick rundown follows:

     Addressing mode                       Format(s)                  Example(s)
     ---------------                       ---------                  ----------
     Immediate                             #<expression>              LDA #0
                                           =<expression>              CMP =LastValue
     Direct Page                           <expression>               LDA DPG
                                           <expression>:D             LDA ANY:D
     Absolute                              <expression>               LDA ABS
                                           <expression>:A             LDA ANY:A
     Long                                  <expression>               LDA LONG
                                           <expression>:L             LDA ANY:L
     Accumulator                           {no operand}               ASL
                                                                      INC
     Implied                               {no operand}               CLC
                                                                      SED
     Direct, Indirect, Indexed by Y        (<direct expr>),Y          LDA (DPG),Y
                                           (<direct expr>).Y          LDA (ANY:D).Y
     Direct, Indirect, Indexed by Y, Long  [<direct expr>],Y          LDA [DPG],Y
                                           [<direct expr>].Y          LDA [DPG].Y
     Direct, Indexed by X, Indirect        (<direct expr>,X)          LDA (DPG,X)
                                           (<direct expr>.X)          LDA (ANY:D.X)
     Direct, Indexed by X                  <direct expr>,X            LDA DPG,X
                                           <direct expr>.X            LDA DPG.X
     Direct, Indexed by Y                  <direct expr>,Y            LDX DPG,Y
                                           <direct expr>.Y            LDX DPG.Y
     Absolute, Indexed by X                <abs expr>,X               LDA ABS,X
                                           <abs expr>.X               LDA ANY:A.X
     Long, Indexed by X                    <long expr>,X              LDA ANY:L,X
                                           <long expr>.X              LDA LONG.X
     Absolute, Indexed by Y                <abs expr>,Y               LDA ANY:A,Y
                                           <abs expr>.Y               LDA ABS.Y
     Program Counter Relative (branches)   <expression>               BRA ABS
                                           @<expression>              BRA @ABS
     PC Relative (PSH)                     @<expression>              PSH @ABS
     Absolute, Indirect                    (<abs expr>)               JMP (ABS)
     Absolute, Indexed, Indirect           (<abs expr>,X)             JMP (ABS,X)
                                           (<abs expr>.X)             JMP (ABS.X)
     Direct, Indirect                      (<dpg expr>)               LDA (DPG)
                                                                      STA (ANY:D)
     Stack Relative                        <expr8>,S                  LDA 2,S
                                           <expr8>.S                  LDA 2.S
     Stack Relative, Indirect, Indexed     (<expr8>,S),Y              LDA (2,S),Y
                                           (<expr8>.S),Y              LDA (2.S),Y
                                           (<expr8>,S).Y              LDA (2,S).Y
                                           (<expr8>.S).Y              LDA (2.S).Y
     Block Move                            <long expr>,<long expr>    MVN LONG,LONG
                                                                      MVP LONG,LONG

     <dpg expr>, DPG     - any direct page expression or symbol.
     <abs expr>, ABS     - any absolute expression or symbol.
     <long expr>, LONG   - any long expression or symbol.
     <expr8>             - any expression evaluating to a value less than 256.

Note: the only real difference between the existing standard and the proposed standard is that the period (".") can be used to form an indexed address expression. This is compatible (in practice as well as philosophy) with the record structure mechanism supported by this proposed standard. This syntax for the various addressing modes is required at all compliance levels.

Suggestion: (<dpg expr>):L, (<dpg expr>):L,Y, and (<dpg expr>):L.Y should be allowed as substitutes for [<dpg expr>], [<dpg expr>],Y, and [<dpg expr>].Y, respectively. This, however, is not required by this proposed standard.

Assembler Directives and Pseudo-Opcodes
---------------------------------------

An assembler directive is a message to the assembler to change some status or otherwise affect the assembly operation. It does not generate any object code. A pseudo-opcode, on the other hand, is not a standard 65c816 instruction but does generate object code. Examples of assembler directives include instructions that turn the listing on or off, define procedures, equate labels to values, etc. Examples of pseudo-opcodes include instructions like .BYTE, which emit bytes of object code based on the instruction's parameters.

Equates:
--------

Probably the most important assembler directives are the equates. The equate directives let you associate a value and a type with a symbol. The possible equates use the syntax:

     <label>   .EQU    <16-bit value>
     <label>   .EDP    <8-bit value>
     <label>   .EQL    <24-bit value>
     <label>   .CON    <32-bit value>
     <label>   .FCON   <SANE floating point value>

All except .FCON are required at all compliance levels.
.FCON is required at the full and extended levels.

.EQU lets you define an absolute symbol: an address whose value is relative to the DBR. An error should be generated if the value in the operand field requires more than 16 bits. The type of the operand expression is ignored; it may be a constant expression, a direct page expression, or even a long address expression. As long as it's an integer expression and fits into 16 bits, it's quite acceptable.

.EDP (equate to direct page) is used to define direct page symbols. Again, the operand field may be of any integer type as long as the result fits into 8 bits. A recommended synonym for .EDP is .EPZ (equate to page zero), in deference to the 6502's zero page addressing mode.

.EQL (equate long) defines long address expressions. As usual, the operand field may contain any integer expression that fits within 24 bits.

.CON (constant) is used to define integer numeric constants. Any 32-bit numeric value may be specified in the operand field.

.FCON (floating point constant) is used to declare symbolic floating point constants. Such constants must be stored in the symbol table as 80-bit SANE extended values.

In addition to the typed equates, this proposed standard also allows an untyped equate, which takes the form:

     <label> = <operand>

where "<operand>" is any valid operand that may appear in the operand field of any instruction. The operand's type may be integer, string, or floating point, and it may also include an addressing mode. The following are all legal:

     lbl = 5
     lbl = 5.5
     lbl = "Five"
     lbl = Array,X
     lbl = (dp,s),y

Labels defined by "=" may appear anywhere the operand specified for that label is allowed. In general, a simple string substitution should be performed when a label defined by "=" is used. Note: a label declared by "=" can be redefined without error throughout the program. The "=" directive is required only at the full and extended compliance levels.
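The equate rules above translate naturally into range checks on a symbol table. This Python sketch is hypothetical: it covers only the integer equates and "=" (not .FCON), folds in the reserved-symbol list and the default case insensitivity described earlier, and assumes, since the text implies but does not state it, that only labels declared by "=" may be redefined.

```python
RESERVED = {"A", "X", "Y", "S", "DBR", "PBR", "D", "M", "P"}

# Typed equates: directive -> (kind of symbol, maximum operand width in bits)
TYPED_EQUATES = {
    ".EQU": ("absolute", 16),
    ".EDP": ("direct", 8),
    ".EPZ": ("direct", 8),  # recommended synonym for .EDP
    ".EQL": ("long", 24),
    ".CON": ("constant", 32),
}


class SymbolTable:
    """Hypothetical symbol table for the integer equates and '='."""

    def __init__(self):
        self.symbols = {}  # upper-cased name -> (kind, value)

    def define(self, label, directive, operand):
        key = label.upper()  # upper and lower case are identical by default
        if key in RESERVED:
            raise ValueError(f"{label} is a reserved symbol")
        previous = self.symbols.get(key)
        # Assumed: only "=" labels may be redefined, and only by another "=".
        if previous is not None and not (previous[0] == "untyped" and directive == "="):
            raise ValueError(f"{label} is already defined")
        if directive == "=":
            # Untyped equate: keep the operand text for later string substitution.
            self.symbols[key] = ("untyped", operand)
            return
        kind, bits = TYPED_EQUATES[directive]
        if not 0 <= operand < (1 << bits):
            raise ValueError(f"{directive} operand must fit in {bits} bits")
        self.symbols[key] = (kind, operand)
```

A caller would create one table per assembly, call define for each equate line, and look symbols up by their upper-cased names.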

Data Definitions:
-----------------

While the equates are probably the most important assembler directives, the data definition instructions are probably the most important pseudo-opcodes around. These instructions are classed into four groups determined by the types of operands they accept. In the following paragraphs, all optional items are enclosed within braces.

The first group of data reservation instructions accepts any integer type expression as operands. They are:

     {label}   .BYTE   {expr1, expr2, ..., exprn}
     {label}   .WORD   {expr1, expr2, ..., exprn}
     {label}   .LONG   {expr1, expr2, ..., exprn}

If a label is present, it is treated as a statement label within the current segment and assigned the value of the location counter before any bytes are emitted. For the .BYTE opcode, one byte of data is emitted for each operand in the operand field, that byte being the L.O. byte of each expression. Operands are purely optional; if no operand appears, then an indeterminate value is emitted. The .WORD opcode outputs two bytes for each expression in the operand field (or two indeterminate bytes if no operand is present). The .LONG instruction outputs four bytes for each operand. These three pseudo-opcodes must be present at all compliance levels.

The next group of pseudo-opcodes is used to create tables of addresses. As such, they only allow symbols that have been defined by .EQU, .EQL, "=" (as applicable), statement labels, procedure labels, and segment labels in their operand fields. They are:

     {label}   .OFFS   expr1 {,expr2, ..., exprn}
     {label}   .ADRS   expr1 {,expr2, ..., exprn}
     {label}   .PTR    expr1 {,expr2, ..., exprn}

.OFFS outputs two bytes for each operand, .ADRS outputs three bytes for each operand, and .PTR outputs four bytes for each operand. These three pseudo-opcodes are only required at the full and extended compliance levels.

The third group of declarations is used to create constant tables. As such, they only allow symbols declared by .CON.
They are:

     {label}   .SHORT     expr1 {,expr2, ..., exprn}
     {label}   .INTEGER   expr1 {,expr2, ..., exprn}
     {label}   .LONGINT   expr1 {,expr2, ..., exprn}

These pseudo-ops output one, two, and four bytes, respectively. They are not required at the subset compliance level; they are required only at the full and extended levels.

Note: non-symbolic constants are allowed in any of the above pseudo-opcodes. Only symbols should have their type information checked.

The last group of data declaration pseudo-opcodes is used to initialize floating point values. These pseudo-ops are:

     {label}   .FLOAT      {item1, item2, ..., itemn}
     {label}   .DOUBLE     {item1, item2, ..., itemn}
     {label}   .EXTENDED   {item1, item2, ..., itemn}
     {label}   .COMP       {item1, item2, ..., itemn}

These instructions generate operands of 4, 8, 10, and 8 bytes in length, respectively. If the operand field is left blank, the corresponding bytes contain an indeterminate value, but the assembler should initialize them to NaN (not a number). These four pseudo-opcodes are required only at the full and extended levels.

Although not required by the standard, the following data declaration directives are recommended and should be supported:

     {label}   .HBYTE   expr1 {,expr2, ..., exprn}
     {label}   .BBYTE   expr1 {,expr2, ..., exprn}
     {label}   .XBYTE   expr1 {,expr2, ..., exprn}
     {label}   .HWORD   expr1 {,expr2, ..., exprn}

The first three reserve one byte of memory for each operand and store the H.O. byte (bits 8-15), bank byte (bits 16-23), or extra byte (bits 24-31), respectively. .HWORD reserves two bytes composed of bits 16-31 for each operand.

Arrays:
-------

Space for arrays and data tables can be reserved using the data declaration statements mentioned above in conjunction with the "DUP" operator. DUP is a binary operator that takes the form:

     count DUP (list)

where count is some constant value and list is a (possibly empty) list of values. The items in (list) are repeated "count" times.
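The DUP expansion rule is naturally recursive. In this hypothetical Python sketch, a nested DUP is represented by the list it expands to, and None stands for an uninitialized item; the examples given in the text can be reproduced with it.

```python
def dup(count, items):
    """Expand 'count DUP (items)' into a flat list of values.

    A nested DUP is passed in as its already-expanded list and is spliced in.
    None stands for an uninitialized item.
    """
    flat = []
    for item in items:
        if isinstance(item, list):
            flat.extend(item)  # nested DUP: splice its expansion into the list
        else:
            flat.append(item)
    if not flat:
        return [None] * count  # empty list: one uninitialized item per entry
    return flat * count


table = dup(16, [0, 1, dup(2, [3, 4, 5])])  # 16 DUP (0,1,2 DUP (3,4,5))
print(len(table), table[:8])
```

A .BYTE statement would then emit one byte per element of the expansion, a .WORD statement two, and so on.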
For example, the following .BYTE statement reserves space for an array of 64 bytes and initializes each byte to zero:

        MyArray  .BYTE  64 DUP (0)

The following statement reserves 256 bytes consisting of the values 1, 2, 3, 4, 5, 6, 7, and 8 repeated 32 times:

        MyArray  .BYTE  32 DUP (1,2,3,4,5,6,7,8)

The DUP operator is fully recursive. That is, one of the items in the list may, itself, be a list defined by the DUP operator. For example,

        Example  .BYTE  16 DUP (0,1,2 DUP (3,4,5))

reserves 128 bytes consisting of the list "0,1,3,4,5,3,4,5" repeated 16 times.

If the DUP list is empty, e.g., "16 DUP ()", then exactly one item is reserved for each repetition, but it is not initialized. The following example reserves space for 128 uninitialized words:

        OffsetTable  .WORD  128 DUP ()

Type definitions:
-----------------

Enumerated data types can be declared with the ".TYPE" directive. This directive takes the form:

        {label}  .TYPE  item1 {,item2, ..., itemn}

The items in the list are assigned consecutive values starting from zero. For example, in the following .TYPE statement, the symbols red, green, and blue are assigned the values zero, one, and two, respectively:

        colors  .TYPE  red,green,blue

The symbols in the operand field of a .TYPE statement must be unique and undefined elsewhere (within the current scope; more on that later). The .TYPE statement above is almost identical to the statements:

        red    .con  0
        green  .con  1
        blue   .con  2

However, there is one major difference: the .TYPE statement also defines the symbol specified in the label field. This symbol can be used as a pseudo-opcode to reserve space for values of the specified type. In the example above, "colors" could be used as a pseudo-opcode to reserve space for the values red, green, and blue. To distinguish such type-name pseudo-opcodes from ordinary instructions, a special lead-in character is used.
The slash ("/") is recommended by this standard, but the user should have the option of choosing this character via a setup program for the assembler. Using the example above, colors could be used as a pseudo-opcode in the following manner:

        Christmas  /colors  red,green
        Ocean      /colors  blue,green
        Sky        /colors  blue
                   /colors  red
        Primaries  /colors  red,blue,green

Unlike other data reserving pseudo-opcodes, a "/colors" definition only allows symbols that appear in the operand field of the associated .TYPE statement, or an expression containing a single such symbol plus or minus a numeric constant, as long as the result is still within the range of symbols declared for that type. E.g.,

        Okay      /colors  red,green+1,blue
        NotOkay1  /colors  blue+2     ;Outside allowable range
        NotOkay2  /colors  red+blue   ;Can't add two such symbols
        NotOkay3  /colors  $25        ;Not red, green, or blue

If you need to coerce an expression to the proper form, simply use the type name as a pseudo-function. E.g.,

        ThisIsOkay  /colors  colors(0),blue   ;Same as red, blue

If an operand is not appropriate, the assembler should generate a warning and emit the code as though the .BYTE statement were used.

If there isn't a label starting in column one of a .TYPE statement, then the symbols defined in the operand field are appended to the previous .TYPE statement. This allows you to declare a type with more symbols than could possibly fit on a single line. E.g.,

        colors  .TYPE  red, yellow, blue
                .TYPE  orange, green, violet
                .TYPE  brown, black, white

All of these symbols will be associated with "colors". A maximum of 256 symbols can be associated with a single type via the .TYPE statement.

Whenever the data reservation form is used, exactly one byte is reserved for each item in the operand field. If you need to reserve more than a single byte for each item, use the record declarations described next.
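As a rough illustration of the .TYPE rules, the following Python sketch models one way an assembler might track an enumeration. It is not part of the proposal. The class EnumType and its methods are hypothetical, and operands are assumed to arrive already parsed as (symbol, offset) pairs. Where the proposal asks for a warning followed by .BYTE-style output, the sketch simply raises an error to keep it short.

```python
class EnumType:
    """Hypothetical model of a .TYPE enumeration (not defined by the proposal)."""

    MAX_SYMBOLS = 256  # one byte per item, so at most 256 distinct values

    def __init__(self, name):
        self.name = name
        self.values = {}  # symbol -> value, kept in declaration order

    def add(self, *symbols):
        """Handle '.TYPE a, b, c' or an unlabeled continuation line."""
        for sym in symbols:
            if sym in self.values:
                raise ValueError(f"{sym} is already defined")
            if len(self.values) >= self.MAX_SYMBOLS:
                raise ValueError(f"{self.name} has more than 256 symbols")
            self.values[sym] = len(self.values)  # consecutive values from zero

    def value_of(self, symbol, offset=0):
        """Value of an operand such as 'green+1'; it must stay inside the type."""
        if symbol not in self.values:
            raise ValueError(f"{symbol} is not a member of {self.name}")
        value = self.values[symbol] + offset
        if not 0 <= value < len(self.values):
            raise ValueError(f"{symbol}{offset:+d} is outside {self.name}")
        return value

    def emit(self, operands):
        """Bytes reserved by '/name op1, op2, ...': exactly one byte per operand."""
        return bytes(self.value_of(sym, off) for sym, off in operands)


colors = EnumType("colors")
colors.add("red", "green", "blue")
print(colors.emit([("red", 0), ("green", 1), ("blue", 0)]))  # b'\x00\x02\x02'
```

Calling `add` a second time on the same object plays the role of an unlabeled continuation line: the new symbols continue numbering where the previous ones stopped.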
The DUP operator can be used to define enumerated data type arrays, e.g.,

        LotsOfRed  /colors  16 DUP (red)

Another form of the .TYPE statement allows you to declare byte subrange values. A definition of this type takes the form:

        label  .TYPE  start..stop

where start and stop are constant values in the range 0..255 and start <= stop. Examples:

        LessThan10     .TYPE  0..9
        Nibbles        .TYPE  0..$F
        PrimaryColors  .TYPE  red..blue   ;From above: red, yellow, blue

Implementation of the .TYPE statement is required only at the full and extended compliance levels.

Records:
--------

A record data structure can be defined with the ".RECORD" and ".ENDR" directives using the syntax:

        label  .RECORD
               <data declarations>
               .ENDR

This creates a template, but does not generate any code. An example might be:

        CursorPosn  .RECORD
        ROW         .BYTE  0
        COLUMN      .BYTE  0
                    .ENDR

This definition creates the type "CursorPosn". Like the .TYPE definitions, the symbol defined by .RECORD can be used as a pseudo-opcode to reserve storage for a variable. For example, the following statement declares a variable of type "CursorPosn":

        MyCursor  /CursorPosn

This statement reserves two bytes, initialized to zeros, at the current location counter. The fields of the record are accessed with the "." operator, just as in Pascal. E.g.,

                  lda  MyCursor.ROW      ;Fetches the first byte.
                  lda  MyCursor.COLUMN   ;Fetches the second byte.

In the example above, the ROW and COLUMN fields of each variable declared with CursorPosn are always initialized to zero. Any other value could have been used by substituting the appropriate value, or an indeterminate value could have been specified by the definition:

        CursorPosn  .RECORD
        ROW         .BYTE
        COLUMN      .BYTE
                    .ENDR

On occasion, you may want each record variable definition to specify its own initial values. This can be accomplished by specifying parameters in the record definition. Parameters are specified by the symbols ?0, ?1, ..., ?9: ?0 corresponds to the first parameter, ?1 to the second, etc.
Consider the following record and variable definitions:

        CursorPosn  .RECORD
        ROW         .BYTE  ?0
        COLUMN      .BYTE  ?1
                    .ENDR

        HomePosn    /CursorPosn  0,0
        LowerRight  /CursorPosn  23,79
        MyCursor    /CursorPosn  5,10

The only problem with this definition form is that each CursorPosn variable must supply exactly two operands. Sometimes you may want a default value in the event an operand isn't specified. This is accomplished using a record definition of the form:

        CursorPosn  .RECORD  ?0=0,?1=0
        ROW         .BYTE    ?0
        COLUMN      .BYTE    ?1
                    .ENDR

This definition instructs the assembler to allow operands to be omitted, defaulting ?0 and ?1 to zero if their respective entries aren't present. The .DEFAULT directive can also be used, particularly if you run out of room on the .RECORD line:

        OpenRec     .RECORD   ?0=0, ?1=1
                    .DEFAULT  ?2=ZRO('Hello there'), ?3=2
        FirstItem   .WORD     ?0
                    .LONG     ?3
        SecondItem  .BYTE     ?1, ?2
                    .ENDR

Record definitions are required at the full and extended compliance levels; they are not required at the subset compliance level.

Sets:
-----

Bit string types can be declared using the .SET directive. .SET is used in a manner quite similar to .TYPE, except that the items in the operand field can be any constant whose value is less than 32. Up to 32 items may appear in the operand field of a .SET definition. The syntax is

        label  .SET  item1 {,item2, ..., itemn}     ;n <= 32.

An alternate form is to specify the name of a type in the operand field. The following definition creates a set of integers in the range 0..9:

        LessThan10   .TYPE  0..9
        SetOfDigits  .SET   LessThan10

Declaring a set variable is quite similar to declaring an enumerated type variable or a record variable: simply use the set name as a pseudo-opcode prefaced by a "/":

        Digits  /SetOfDigits

Set constants are specified by placing the items in the set within a pair of braces.
E.g.:

        BitValues       .TYPE  0..7
        SetOfBitValues  .SET   BitValues
        Bits            /SetOfBitValues  {0,1,2,3}
        ;
        ;
                        lda  #{0,2,7}
                        sta  Bits

The assembler, by default, should allow set constants composed of the integer values 0..31. This allows programmers to deal with bits by bit number rather than by the integers those bit patterns represent. For example, to strip all but the H.O. two bits in the (8-bit) accumulator, the instruction "AND #{6,7}" makes a lot more sense than "AND #$C0". All other entities appearing within "{" and "}" must appear somewhere in the operand field of a .SET statement (or must be a member of a .TYPE definition if that type appears in the operand field of a .SET).

Macros:
-------

Macros are created using the .MACRO and .ENDM directives. The syntax for a macro definition is

        label  .MACRO  {default parameter values}
               <macro body>
               .ENDM

A macro is invoked by writing an underscore followed by the macro name (the label in the .MACRO statement). The user should be able to change the macro lead-in character from underscore to some other character via an assembler setup program.

All labels declared within the macro are local to that definition unless the ".GLOBAL" directive is used to extend their scope. In general, global macro labels (except, possibly, those defined by "=") are not useful anyway, since a duplicate label error might occur on the second invocation of the macro.

The macro body consists of a sequence of assembler statements. Most reasonable statements may be included in the macro body. The standard does not require nested macro definitions, nor need macro definitions allow .RECORD, .TYPE, or .SET definitions (since labels are local to the macro, such definitions are dubious anyway).

Macro parameters are specified using ?0, ?1, ..., ?9, just as for .RECORD definitions. "?#" can be used to determine the actual number of parameters present. "?:expr" can be used to select a parameter using a numeric expression.
For example, "?:?#-1" returns the value of the last parameter specified.

Default values for the parameters can be specified in the .MACRO operand field, or in a .DEFAULT statement, just like default values for .RECORD parameters. E.g.,

        MyMacro  .MACRO    ?0=0, ?1=2
                 .DEFAULT  ?2="Hello there"
                 .BYTE     ?0
                 .WORD     ?1
                 .BYTE     ?2
                 .ENDM

then:

                 _MyMacro  10,20

generates the bytes 10, 20, 0 (the .WORD value 20, stored L.O. byte first), followed by the characters of "Hello there".

Macros, by their very nature, allow a variable number of parameters. If more parameters are specified than there are references for, the extra parameters are ignored. If fewer parameters are specified than there are references for, the additional references will be treated as undefined symbols. If you want to force the user to enter an exact number of parameters, use ?# in the default field to specify a fixed number of parameters. The following macro definition requires the user to enter exactly two parameters whenever TwoParms is invoked:

        TwoParms  .MACRO  ?#=2
                  lda     ?0
                  sta     ?1
                  .ENDM

If the number of parameters is fixed at a certain value, default values are not allowed in the macro definition.

Since macro parameters in a macro invocation are separated by commas, you cannot directly create a macro of the form:

        LDAIX  .MACRO  ?#=1
               lda     ?0
               .ENDM

and invoke it by:

               _LDAIX  LBL,X

intending the "LDA LBL,X" instruction to be generated. Instead, the macro mechanism will treat LBL and X as two different parameters and generate an error, since only a single parameter is allowed. The "<<" and ">>" symbols are used as an escape mechanism to bracket such operands. To handle the case above, the following statement could be used:

               _LDAIX  <<LBL,X>>

and this would generate the instruction "LDA LBL,X".

The lookup, value, type, and mode functions are quite useful for dealing with macro parameters. The exact values returned by these functions will be described at a later time.
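The parameter-passing rules above can be sketched in Python. The helpers split_macro_args and substitute are hypothetical names and are not part of the proposal. The sketch covers four things: comma splitting with the << >> escape, references to ?0 through ?9, default values, and ?#. It does not implement ?:expr, and it assumes no commas appear inside quoted string operands.

```python
import re


def split_macro_args(text):
    """Split a macro operand field on commas; <<...>> keeps its commas together."""
    args, current, depth, i = [], [], 0, 0
    if not text.strip():
        return args  # no parameters at all
    while i < len(text):
        if text.startswith("<<", i):
            if depth > 0:  # nested escape: keep it literally
                current.append("<<")
            depth += 1
            i += 2
        elif text.startswith(">>", i) and depth > 0:
            depth -= 1
            if depth > 0:
                current.append(">>")
            i += 2
        elif text[i] == "," and depth == 0:
            args.append("".join(current).strip())
            current = []
            i += 1
        else:
            current.append(text[i])
            i += 1
    if depth != 0:
        raise ValueError("unbalanced << >> in macro operands")
    args.append("".join(current).strip())
    return args


def substitute(line, args, defaults=None):
    """Replace ?# and ?0..?9 in one macro body line (hypothetical helper)."""
    params = dict(defaults or {})
    params.update(enumerate(args))  # actual parameters override defaults

    def repl(match):
        key = match.group(1)
        if key == "#":
            return str(len(args))  # number of parameters actually supplied
        if int(key) not in params:
            # a reference with no parameter and no default: an undefined symbol
            raise NameError(f"?{key} is undefined")
        return params[int(key)]

    return re.sub(r"\?([#0-9])", repl, line)


print(split_macro_args("LBL,X"))                          # ['LBL', 'X']
print(substitute("lda ?0", split_macro_args("<<LBL,X>>")))  # lda LBL,X
```

The << >> brackets are removed after splitting. This is why `_LDAIX <<LBL,X>>` arrives in the macro body as the single parameter `LBL,X`.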
For additional information on macros and dealing with macro parameters, see the sections on conditional assembly and while loops. Macros are required only at the full and extended compliance levels.

Address Expression Functions:
-----------------------------

Format:

        label  .FUNC    {default parameter values}
               <function body>
               .RETURN  expr
               .ENDF

The .FUNC statement lets programmers define their own address expression functions that can be used in the operand fields of assembly language statements. The function body typically contains a sequence of equates and other value-computing statements; it may not contain any code-generating statements. As in a macro definition, all symbols defined inside an address expression function are local to that function. Likewise, default parameters may be declared in the operand field of the .FUNC statement or via the .DEFAULT statement. Alternatively, you can specify that a fixed number of parameters is required by using the "?#=expr" item in the operand field of the .FUNC statement.

The expression following the .RETURN statement is the value returned by the address expression function. Note that more than one .RETURN may appear within the function (perhaps within the confines of a conditional assembly sequence). If more than one .RETURN statement is encountered, all but the last are ignored. The expression returned in the .RETURN operand field may contain addressing modes in addition to the actual expression value. In general, anything allowed as a macro parameter can be returned as an address expression value.

An address expression function is invoked by placing the function name in some other expression, followed by the parameters enclosed within parentheses. The parentheses are required even if the parameter list is empty (just like the "C" programming language). Examples follow:

        StripLONibble  .FUNC    ?#=1
        value          =        ?0 AND $F0
                       .RETURN  value
                       .ENDF
        ;
        AppendTXT      .FUNC    ?#=1
        string         =        ?0 + ".TXT"
                       .RETURN  string
                       .ENDF
        ;
                       .
                       .
                       .
                       LDA      #StripLONibble($FF)
                       .
                       .
                       .
                       .BYTE    AppendTxt("MyString")

The LDA instruction generates LDA #$F0, and the .BYTE statement becomes .BYTE "MyString.TXT". The latter example demonstrates that address expression functions can return any valid type. This includes strings, records, sets, and any other entity allowed in an operand field. Consider the following:

        LBLX  .FUNC    ?#=2
        L     =        ?0-?1,X
              .RETURN  L
              .ENDF

              LDA      LBLX($100,10)

This generates the code:

              LDA      $100-10,X

Address expression functions are required only at the full and extended compliance levels.

The Label Type
--------------

The ".LABEL" directive is used to declare a valueless symbol, that is, one which is defined but is assigned no particular value. The syntax for the .LABEL directive is:

        .LABEL  symbol1 {, symbol2, ..., symboln}

Each symbol appearing in the operand field is inserted into the symbol table as a "label" typed symbol. Label-typed symbols are useful mainly in macros and in the operand fields of conditional assembly statements. The only operations you can perform on label-typed symbols are "=" and "<>". Most of the reserved symbols in the assembler (such as A, X, Y, DBR, D, M, S, etc.) are actually label-typed symbols. An example of where you might use a label-typed symbol follows:

        CmpReg  .MACRO  ?#=2
                .IF     ?0=A
                cmp     ?1
                .ELSE
                .IF     ?0=X
                cpx     ?1
                .ELSE
                .IF     ?0=Y
                cpy     ?1
                .ELSE
                .PAUSE
                .ENDIF
                .ENDIF
                .ENDIF
                .ENDM

The "=" equate can also be used to define label-typed symbols by specifying a label-typed symbol in the operand field, e.g.,

        ACC   =  A
        XReg  =  X
        etc.

Note that the last equate above does not allow you to enter indexed-by-X addressing modes as

        <expression>,XReg

It simply allows you to use a statement of the form:

        .IF  XReg=X

and wind up assembling the code after the ".IF".

The ".LABEL" directive is required at the full and extended compliance levels; it is not required at the subset compliance level.
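Here is a minimal Python model of label-typed symbols. It assumes that an equate such as "ACC = A" simply binds a second name to the same symbol. The LabelSymbol class, the symbols table, and cmp_reg are illustrative names only. cmp_reg mirrors the CmpReg macro, and raising an exception stands in for its .PAUSE branch, whose exact behavior the text does not define.

```python
class LabelSymbol:
    """Hypothetical valueless (label-typed) symbol: only = and <> are meaningful."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        if not isinstance(other, LabelSymbol):
            raise TypeError("label-typed symbols compare only with each other")
        return self is other  # identity: 'ACC = A' makes ACC the same symbol as A

    def __ne__(self, other):
        return not self == other

    __hash__ = object.__hash__  # defining __eq__ would otherwise drop hashing
    # No arithmetic or ordering methods exist, so 'A+1' or 'A<X' raise TypeError.


symbols = {name: LabelSymbol(name) for name in ("A", "X", "Y")}
symbols["ACC"] = symbols["A"]   # ACC = A
symbols["XReg"] = symbols["X"]  # XReg = X


def cmp_reg(reg, operand):
    """Mirror of the CmpReg macro: choose CMP, CPX, or CPY from a label symbol."""
    if reg == symbols["A"]:
        return f"cmp {operand}"
    if reg == symbols["X"]:
        return f"cpx {operand}"
    if reg == symbols["Y"]:
        return f"cpy {operand}"
    raise ValueError("CmpReg: first parameter must be A, X, or Y")  # the .PAUSE branch


print(cmp_reg(symbols["XReg"], "Count"))  # cpx Count
```

Identity comparison is the design choice here. Label-typed symbols carry no value, so two names can only be "equal" when one was equated to the other.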
Procedures:
-----------

At the full and extended compliance levels, the .PROC and .ENDP directives can be used to declare 65c816 procedures (subroutines). Procedure declarations take the form:

        procname  .PROC  {near|far}
                  <procedure body>
                  .ENDP

If an operand appears after the .PROC statement, it must be either "near" or "far". If no operand appears, "near" is assumed. The procedure name that appears in the label field of the .PROC statement is assigned the current value of the location counter at that point in the program. It is also given the type near procedure or far procedure, depending upon the .PROC operand field.

All labels defined inside a procedure are local to that procedure unless the .GLOBAL directive is used to extend their scope beyond the procedure. Therefore, labels inside one procedure may be reused outside that procedure. If a label inside a procedure is already defined outside that procedure, an error is not generated; instead, the new label supersedes the old one INSIDE THE PROCEDURE (the scoping rules are the same as for Pascal). Procedures may be nested inside one another; the scoping rules used by Pascal apply in such situations.

Inside a procedure, RET can be used in place of RTS or RTL. The assembler will automatically choose the appropriate version depending upon whether the procedure is a near or far procedure. If RTS is used inside a FAR procedure or RTL is used inside a NEAR procedure, the assembler will generate a warning. Likewise, the assembler automatically assembles JSR using the absolute or long addressing mode, depending upon the procedure definition. If the assembler supports the JSL mnemonic and a JSL is used to call a NEAR procedure, the assembler must generate a warning. If the address expression following a JSR was coerced using the ":A" or ":L" suffixes, no warning will be generated even if the incorrect distance was specified. I.e., the following does NOT generate a warning:

                  JSR    mysub:L
                  .
                  .
                  .
        mysub     .PROC  NEAR
                  .
                  .
                  .
If you use a coercion operator, the assembler assumes that you know what you are doing.

Note that the use of the .PROC statement is optional. You may continue to build and call subroutines without the .PROC directive. However, using .PROC allows the assembler to perform additional type checking on certain operations. An external data flow analysis program can also use the procedure declarations to help locate logical bugs in your code.

.PROC and .ENDP are required at all compliance levels of the standard.

Module Communication Directives:
--------------------------------

Three directives, .GLOBAL, .PUBLIC, and .EXTERNAL, are used to communicate symbolic values across procedure, segment, and module boundaries (a module is any one source file which is assembled as a whole unit). The .GLOBAL directive is used to make symbols visible outside of procedures, macros, functions, and records. The .PUBLIC directive is used to make certain symbols visible outside the current module. The .EXTERNAL directive is used to make symbols defined outside the current module visible within the module.

The syntax for the .PUBLIC and .GLOBAL directives is identical. It takes the form:

        .PUBLIC  symbol1 {,symbol2, ..., symboln}

and

        .GLOBAL  symbol1 {,symbol2, ..., symboln}

A label is not allowed in the label field of either directive. The symbols specified in the operand field of these two directives are made known outside the procedure or module where they currently reside. If a procedure is nested inside another, the .GLOBAL statement makes its symbols known only to the procedure encompassing the nested procedure. In the following example, LCL is known only inside procedures X1 and X2, not to the whole program:

        X1  .PROC
            .
            .
        X2  .PROC
            .GLOBAL  LCL
            .
            .
            .ENDP
            .ENDP

If you wanted to make LCL visible at the level above X1, then another .GLOBAL statement declaring LCL global must appear inside the X1 procedure. Another alternative is to use the .PUBLIC statement.
Any symbol declared public with .PUBLIC is instantly visible throughout the program (within the confines of the scoping rules). However, keep in mind that symbols declared as public are visible outside the current module as well and may interfere with other modules.

The .EXTERNAL directive is used to obtain access to symbols declared outside the current module. The syntax for the .EXTERNAL directive is:

        .EXTERNAL  symbol1:type {,symbol2:type, ..., symboln:type}

Again, no label is allowed in the label field of the .EXTERNAL directive. The type item is any of NEAR, FAR, CONST, DIRECT, ABS, or LONG.

Note: symbols declared with "=", .MACRO, .RECORD, .SET, and .TYPE may not appear as operands to the .GLOBAL, .PUBLIC, or .EXTERNAL directives.

These directives are not required at the subset compliance level, only at the full and extended levels.

Segments:
---------

Segments are used to group a collection of logically and physically related entities within a program. A segment may contain the program code, variables, stack area, direct page area, or other such data. Typically a segment is a load module; that is, a segment is loaded into memory as a whole. If a program consists of two or more segments, they need not all reside in memory at the same time; the memory manager/loader may load segments into memory as needed.

Segment definitions are required at all compliance levels. All programs must consist of at least one segment (this is a source of minor incompatibility with existing assemblers).

The most general form of the segment definition is:

        label  .SEGMENT  TYPE=expr {,ALIGN=expr} {,ORG=expr} {,NOCODE}
               <segment body>
               .ENDS

.SEGMENT lets you declare any general type of segment. The symbol in the label field need not be unique, but if it is redefined elsewhere within the current scope, it must appear on a .SEGMENT definition whose type is exactly the same as the current definition.
Unlike .PROCs, .MACROs, etc., symbols defined inside a segment are not local to the segment, but are instantly visible to the rest of the module. If you need to declare local variables within a segment, use the .LOCAL and .RELEASE directives.

The type of segment must be specified in the .SEGMENT operand field. The actual segment types will be defined at a later date. For now, assume that the types used by the Apple //GS loader are specified after the TYPE= item. The segment type describes the attributes of the segment, such as whether the segment is relocatable or absolute, fixed or movable, etc.

The optional ALIGN operand specifies the byte boundary on which this segment (portion) must be aligned. If ALIGN=1, the segment will be aligned on any byte boundary. If ALIGN=2, the segment will be aligned on a word boundary, etc. Any value between 1 and $10000 can be used (ALIGN=$10000 will align the segment on a bank boundary).

The ORG=expr option can be used to fix the starting address of the segment. This option isn't normally used with code-generating segments. It's mainly used to define I/O port addresses and other absolute variables.

The NOCODE option is used to declare that a segment will not generate any code (i.e., it's just used to declare variables). If any 65c816 instruction appears in a NOCODE segment, an error will be generated. All data-declaring pseudo-opcodes (e.g., .BYTE) must specify indeterminate values, or else an error will be reported.

If multiple segments with the same name appear in a module (or across modules, for that matter), they will be combined into a single, contiguous segment by the assembler and/or linker. Consider the following:

        MyCode  .SEGMENT  Type=$1AF
                .
                .
                .
                .ENDS
        ;
        MyData  .SEGMENT  Type=$100
                .
                .
                .
                .ENDS
        ;
        MyCode  .SEGMENT  Type=$1AF
                .
                .
                .
                .ENDS

Although MyCode appears in two completely disjoint areas, the assembler/linker will combine these items into a single segment.
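The combining and alignment rules for segments can be sketched as follows. This Python fragment is an assumption-laden model, not a linker design. The (name, type, bytes) tuples, align_up, and combine_segments are hypothetical. The type check reflects the rule that a redeclared segment name must carry exactly the same type, and each combined segment keeps the position of its name's first declaration. ALIGN is accepted as any value from 1 to $10000, because the text does not say whether it must be a power of two.

```python
def align_up(address, align):
    """Round address up to a multiple of ALIGN (ALIGN=1 means any byte boundary)."""
    if not 1 <= align <= 0x10000:
        raise ValueError("ALIGN must be between 1 and $10000")
    return -(-address // align) * align  # ceiling division, then scale back up


def combine_segments(pieces):
    """Merge same-named (name, type, bytes) pieces into one contiguous segment each.

    A segment keeps the position of its first declaration; dicts preserve
    insertion order in Python 3.7+, which gives exactly that ordering.
    """
    merged, types = {}, {}
    for name, seg_type, code in pieces:
        if name in types and types[name] != seg_type:
            raise ValueError(f"segment {name} redeclared with a different type")
        types[name] = seg_type
        merged[name] = merged.get(name, b"") + code
    return list(merged.items())


pieces = [
    ("MyCode", 0x1AF, b"\xEA"),      # NOP
    ("MyData", 0x100, b"\x00\x00"),  # two data bytes
    ("MyCode", 0x1AF, b"\x60"),      # RTS
]
combined = combine_segments(pieces)    # MyCode (NOP, RTS) first, then MyData
print(hex(align_up(0x1234, 0x10000)))  # 0x10000
```

The two MyCode pieces end up adjacent (NOP followed by RTS), and MyCode stays ahead of MyData even though part of it was declared later.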
Segments appear in the load module in the order they are declared in the source file. In the example above, segment MyCode appears before segment MyData (even though a portion of MyCode appears after MyData, MyCode was still declared before MyData).

Segments may be nested, but they don't follow any scoping rules. Declaring one segment inside another is no different from declaring those two segments completely separately.

If you have two separate segments (different names but the same type), you can combine them using the .GROUP directive. This directive takes the form

        label  .GROUP  seg1, seg2 {,seg3, ..., segn}

Referring to "label" refers to the segment obtained by combining the segments in the .GROUP operand field.

To simplify segment usage, there are six predeclared segments. They may be declared with the directives:

        .CODE
        .DATA
        .DIRECT
        .STACK
        .VAR
        .CONST

.CODE is used to declare static, code-generating segments which allow 65c816 instructions. .DATA is used to declare static data-generating segments. .CONST is identical to .DATA except that data items inside a .CONST segment are read-only; any attempt to write to items inside a .CONST segment should generate an error from the assembler or data flow analysis programs. .DIRECT is used to declare segments containing direct page variables. It is a NOCODE segment, so only definitions are allowed; initial values are illegal. .STACK segments are also NOCODE segments. They are useful for declaring stack space down in bank zero. The .VAR segment is used like the .DATA segment, except .VAR segments are NOCODE segments. They are used for declaring uninitialized variables in main RAM.
The syntax for these six directives is

        label  .xxxx  {ALIGN=expr | ORG=expr}
               <segment body>
               .ENDS

The ASSUME Directive
--------------------

With the addition of the bank registers and the mode bits in the 65c816 processor, an assembler can no longer determine the proper addressing mode to use in all circumstances without help from the programmer. For example, if the assembler encounters an instruction of the form "LDA Label" and Label is a statement label inside some segment (i.e., not declared with EDP, EQU, EQL, or another type-defining directive), it has no idea whether to use the direct, absolute, or long addressing mode. To decide, the assembler would need to know the current values of the direct page and data bank registers at assembly time. Frankly, it is not possible for the assembler to always know the contents of these registers; hence the programmer must manually supply this information to the assembler. This information, as well as some other useful information, is supplied to the assembler via the .ASSUME directive.

The .ASSUME directive uses the syntax:

        .ASSUME  operand1 {,operand2, ..., operandn}

where operand(i) is one of the following:

        DBR:expression24
        DBR:NOTHING
        DP:expression16
        DP:NOTHING
        M:expression1
        M:NOTHING
        X:expression1
        X:NOTHING
        CPU:cpu_type

where expression24 is an expression yielding a 24-bit value, expression16 is an expression yielding a 16-bit value, expression1 is an expression yielding zero or one, NOTHING is a reserved word, and cpu_type is one of {6502, 65c02, 65802, 65816} or one of the later versions of the 65c816 microprocessor.

DP (direct page) is used to let the assembler know where the direct page register is pointing. If a segment name is given as the expression, that segment must be one that resides in bank zero and is of type DIRECT.
If the assembler encounters a symbol declared in a segment that is assumed to be a direct page segment via the DP:expression operand, the assembler will reference that location using the direct page addressing mode (if possible). If the "DP:NOTHING" form is used, the assembler will only use the direct page addressing mode if a symbol was declared with the EDP equate; none of the segments will be treated as direct page segments, even if they were declared as type DIRECT. If you want to refer to several segments as direct page segments simultaneously, group them together using the .GROUP directive and specify the group name as the expression value after the DP:, i.e.,

        DPGroup  .GROUP   DPSeg1, DPSeg2, DPSeg3
                 .ASSUME  DP:DPGroup

By default, the assembler should assume DP:NOTHING.

DBR is used to tell the assembler which segment/bank the DBR (data bank register) points at. References to variables within that segment will be assembled as absolute references (unless that segment name is also specified after DP:expr, in which case the direct page addressing mode will be used, if possible). If DBR:NOTHING is specified, absolute addressing will be used only for those symbols declared via EQU; all other references will be assumed to be long references. Note that only the H.O. eight bits of the 24-bit expression are used. Therefore, to set the DBR assumption to an absolute bank in memory, an expression of the form:

        .ASSUME  DBR:$200000     ;Assume DBR=$20

must be used. By default, the assembler should assume DBR:NOTHING.

Normally, a programmer should use "#" and "=" to specify eight- or sixteen-bit immediate operand sizes. To help ensure upwards compatibility with existing source code, a mechanism has been added whereby "#" is used for all immediate operands and the .ASSUME directive controls their size. This task is achieved using the M:expr and X:expr operands. Normally the assembler defaults to M:NOTHING and X:NOTHING.
In this mode, "#" specifies 8-bit immediate operands and "=" specifies 16-bit operands. If the expression following the M or X is zero or one, then any immediate operand containing an equal sign is flagged as an error, and "#" specifies an eight-bit operand if the expression was 1, or a sixteen-bit operand if the expression was zero. If the expression evaluates to any other value, an error is generated. Note that M only affects accumulator and memory operations, while X affects the index register operations. It is perfectly permissible to have an .ASSUME of the form:

        .ASSUME  M:NOTHING,X:1

The "=" immediate specifier would then be allowed for accumulator operations but not for X/Y index register operations.

To help ensure compatibility with the existing de facto standard, LONGI, LONGA, SHORTI, and SHORTA should be provided as built-in macros generating the appropriate .ASSUME statement.

The "CPU:cpu_type" operand to the .ASSUME statement lets users specify the exact 6500 family CPU they are using. The effect of this operand is to "disconnect" certain instructions: if a certain CPU is specified and a programmer uses an addressing mode or instruction which isn't available on that CPU, the assembler will generate an error. By default, the assembler should assume the CPU of the machine on which the assembler is intended to run (e.g., 65c816 for Apple //GS machines). If the assembler runs on a processor outside the 6500 family, it should default to 65c816. The user should be able to choose this default value from an assembler setup program.

The .ASSUME directive, and all operands available to it, must be supported at all compliance levels.

Local Symbols
-------------

In addition to local labels automatically specified inside procedures, macros, and expression functions, you can also explicitly declare local symbols within the source file. User-defined local symbols come in two varieties: numeric and symbolic.
Up to 10 active numeric local labels can be specified at any given time. The numeric local labels are similar to those used by D. E. Knuth in "The Art of Computer Programming, Vol 1", although the syntax is different. Numeric local labels are declared by placing a caret (up-arrow) in front of a single decimal digit in the label field. Examples follow:

        ^0  LDX  #05
        ^9  DEX
        ^4  LDA  LBL

Numeric local labels are referenced with the ">n" and "<n" items, where "n" represents a single decimal digit. If the greater-than (">") symbol prefaces a digit, then the next occurrence of that numeric local label in the source file is referenced. If the less-than ("<") symbol is used, then the previous occurrence of that numeric local label is referenced. Examples:

            LDX  #5
        ^0  CLR  Array,X
            DEX
            BPL  <0          ;References 2nd line above.
        ;
            LDA  Array+2
            bne  >0          ;References ^0 below.
            TXA
        ^0  STA  Array+1

Note that multiple occurrences of the same numeric local label may appear within the program. They are differentiated by the "<" and ">" symbols.

Since "<" and ">" may appear both as operators and as the beginning of an operand, a minor ambiguity results. If you see a portion of an expression like ">0", does it mean 'is some value greater than zero' or does it refer to the next occurrence of "^0"? This is easily resolved from context. If the ">" or "<" appears where an operator is expected, then the appropriate operation is performed. If it appears where an operand is expected and is followed by a single decimal digit, then it is used as the lead-in for a numeric local label. Otherwise an error must be generated.

Numeric local labels are great for those cases where you need to perform a short branch or set up a small loop and you don't want to use meaningless mnemonics like "loop1", "SkipInstr12", etc. Other times, you may want to use a meaningful name like "MainLoop" or "ElseQuit", without having to worry about conflicts in other parts of the program.
Meaningful local names are provided by the symbolic local label facility specified by this proposal. Two assembler directives, .LOCAL and .RELEASE, define the scope of user-specified local labels. Both directives have the same syntax:

            .LOCAL    label1 {,label2, ..., labeln}
            .RELEASE  label1 {,label2, ..., labeln}

A label named in a .LOCAL directive is confined to the scope of that .LOCAL/.RELEASE pair. .LOCAL/.RELEASE pairs may be nested, allowing you to redefine a symbol to any reasonable depth (say, a minimum of 8 levels).

Numeric local labels are required at all compliance levels. Symbolic local labels are required at the full and extended compliance levels.

Conditional Assembly
--------------------

Conditional assembly is handled by the .IF, .ELSE, .IF1, .IF2, .IFDEF, .IFNDEF, and .ENDIF directives. .IF is followed by a numeric address expression that yields a zero (false) or non-zero (true) result. If the result is true, the code that follows (up to the .ELSE or .ENDIF) is assembled; otherwise the code after the .ELSE, if present, is assembled instead. .IF1 and .IF2 assemble their code only during pass one and pass two, respectively. .IFDEF and .IFNDEF accept a single symbol as their parameter and test whether or not that symbol is currently defined. With any of these forms, the .ELSE directive can be used to assemble alternative code when the tested condition is false. Finally, the .ENDIF directive terminates a conditional assembly sequence.

Conditional assembly blocks can be nested to at least eight levels, preferably more. Since every form of the IF statement is terminated with its own .ENDIF, there is no need to worry about which .IF an .ELSE belongs to, as you would, say, in Pascal.

The .IF1 and .IF2 directives are normally used to print messages and perform other minor housekeeping chores.

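The nesting rule described above maps naturally onto a stack. This Python sketch filters a list of source lines the way a conditional-assembly front end might. It is an illustration under stated assumptions, not a specified algorithm: directives are assumed to start in the first field (no label), and the hypothetical `evaluate` helper accepts only an integer literal or a symbol name, standing in for the assembler's full expression evaluator.

```python
from typing import Dict, List


def evaluate(expr: str, symbols: Dict[str, int]) -> int:
    """Stand-in evaluator (hypothetical): an integer literal or a defined symbol."""
    try:
        return int(expr)
    except ValueError:
        return symbols[expr]


def filter_conditionals(lines: List[str], symbols: Dict[str, int], pass_no: int) -> List[str]:
    """Return the lines that survive .IF/.IFDEF/.IFNDEF/.IF1/.IF2/.ELSE/.ENDIF.

    Each stack entry is [enclosing_block_active, condition, else_seen].
    """
    stack: List[List[bool]] = []
    active = True
    kept: List[str] = []
    for line in lines:
        fields = line.split()
        op = fields[0].upper() if fields else ""
        arg = fields[1] if len(fields) > 1 else ""
        if op in (".IF", ".IFDEF", ".IFNDEF", ".IF1", ".IF2"):
            if not active:
                cond = False  # skipped region: don't evaluate (symbols may be undefined)
            elif op == ".IF":
                cond = evaluate(arg, symbols) != 0
            elif op == ".IFDEF":
                cond = arg in symbols
            elif op == ".IFNDEF":
                cond = arg not in symbols
            else:  # .IF1 or .IF2: true only during that pass
                cond = pass_no == int(op[-1])
            stack.append([active, cond, False])
            active = active and cond
        elif op == ".ELSE":
            if not stack or stack[-1][2]:
                raise SyntaxError(".ELSE without a matching .IF")
            stack[-1][2] = True
            active = stack[-1][0] and not stack[-1][1]
        elif op == ".ENDIF":
            if not stack:
                raise SyntaxError(".ENDIF without a matching .IF")
            active = stack.pop()[0]  # restore the enclosing block's state
        elif active:
            kept.append(line)
    if stack:
        raise SyntaxError("missing .ENDIF")
    return kept


source = [
    ".IFDEF DEBUG",
    "  JSR Trace",
    "  .IF LEVEL",
    "  JSR Dump",
    "  .ELSE",
    "  NOP",
    "  .ENDIF",
    ".ELSE",
    "  RTS",
    ".ENDIF",
]
print(filter_conditionals(source, {"DEBUG": 1, "LEVEL": 0}, 2))  # ['  JSR Trace', '  NOP']
print(filter_conditionals(source, {}, 2))                          # ['  RTS']
```

Because each .ENDIF simply pops one level, every .ELSE unambiguously belongs to the innermost open block. The sketch also chooses not to evaluate conditions inside a block that is already being skipped, so an undefined symbol there is not an error; the proposal does not specify this behavior.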
In general, there's absolutely no reason why anyone would want to generate code inside an .IF1 or .IF2 block, since code emitted during only one pass would shift the location counter, and with it the value of every label defined afterward, between passes. Therefore, the assembler may optionally generate an error message if the location counter is modified anywhere inside an .IF1 or .IF2 conditional assembly block.

.IF, .ELSE, and .ENDIF are required at all compliance levels. .IF1, .IF2, .IFDEF, and .IFNDEF are required only at the full and extended compliance levels.

While Loops
-----------

Sometimes, especially within macros, you will need a looping structure to process parameters or otherwise generate sequences of code; the .WHILE/.ENDW directives serve this purpose. The syntax is:

            .WHILE  expression
            <body of loop>
            .ENDW

The instructions in the loop body are repeated as long as the expression yields a non-zero value. For the loop to terminate, the variable(s) controlling the loop must be defined with the "=" assembler directive, since it is the only directive that allows you to redefine a symbol.

The .WHILE directive is especially useful for processing a macro (or record definition) that takes a variable number of parameters. Consider the following macro:

    ByteTable .MACRO
    ParmCnt   =       ?#
              .WHILE  ParmCnt
              .BYTE   ?:(?#-ParmCnt)
    ParmCnt   =       ParmCnt-1
              .ENDW
              .ENDM

              _ByteTable 0,5,4,2,7

This example emits the five bytes 0, 5, 4, 2, and 7 into the object code stream. On the first iteration ParmCnt equals ?#, the number of parameters (five here), so ?:(?#-ParmCnt) selects parameter 0; each iteration then decrements ParmCnt and selects the next parameter, and the loop ends when ParmCnt reaches zero.

INCLUDE Mechanism
-----------------

A source file include mechanism is provided by the .INCLUDE directive. Its syntax is

            .INCLUDE "filename"

The contents of the specified file are assembled at the point of the .INCLUDE directive, exactly as though they had been typed there. The include mechanism must support includes nested at least four levels deep. The .INCLUDE directive must be supported at all compliance levels, although assemblers at the subset compliance level need not support nested include files.

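The .INCLUDE rules can be sketched as a recursive expansion with a depth limit. In this Python illustration a dictionary stands in for the host file system, and the function name `expand_includes` and its `max_depth` parameter are hypothetical; `max_depth=4` mirrors the proposal's minimum nesting requirement.

```python
import re
from typing import Dict, List

# Matches:   .INCLUDE "filename"   with an optional trailing ;comment.
INCLUDE_RE = re.compile(r'^\s*\.INCLUDE\s+"([^"]+)"\s*(;.*)?$', re.IGNORECASE)


def expand_includes(name: str, files: Dict[str, str], max_depth: int = 4, depth: int = 0) -> List[str]:
    """Return the lines of `name`, with each .INCLUDE replaced (recursively)
    by the lines of the named file.

    `files` stands in for the host file system. Nesting deeper than
    `max_depth` is an error, which also stops a file that includes itself.
    """
    if depth > max_depth:
        raise ValueError(f'.INCLUDE nested more than {max_depth} levels deep at "{name}"')
    lines: List[str] = []
    for line in files[name].splitlines():
        match = INCLUDE_RE.match(line)
        if match:
            lines.extend(expand_includes(match.group(1), files, max_depth, depth + 1))
        else:
            lines.append(line)
    return lines


files = {
    "main.asm": '        LDA #1\n        .INCLUDE "defs.asm" ;shared equates\n        RTS',
    "defs.asm": 'PORT    = $C000\n        .INCLUDE "more.asm"',
    "more.asm": "MASK    = $7F",
}
print(expand_includes("main.asm", files))
```

Calling `expand_includes("main.asm", files, max_depth=1)` models a subset-level assembler, which need not support nested includes: it fails here because defs.asm itself includes more.asm.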
Programs, Modules, and Units
----------------------------

The assembler handles three types of source files: programs, modules, and units. Unless otherwise specified, all source files are assumed to be programs. A program differs from a module or unit in that the assembler/linker assumes control is transferred to some point in a program when it is loaded into memory. Modules and units are assumed to be subservient sections of code that contain data and/or code used by programs.

By default, a piece of code is assumed to be a program, and control is transferred to its first byte when the program is loaded into memory. This default helps improve compatibility with existing source files. The .PROGRAM directive can be used to explicitly declare a piece of code as a main program, as well as to provide an entry address other than the first byte of code emitted. The syntax for the .PROGRAM directive is

            .PROGRAM label

where "label" is a statement label somewhere within the current assembly. The address of this label is passed on to the linker/loader, which uses it as the starting address for the code. All of the statements in the source file from the .PROGRAM directive through the .END directive are assembled into the program. If a .PROGRAM directive appears in the source file, it must appear before any other statement (other than a comment or listing directive), and only one .PROGRAM directive may appear per assembly. No modules or units may appear as part of a program assembly (see below).

The .MODULE directive tells the assembler that it is assembling an object code module that is to be linked into a separate program before execution. The .PUBLIC statement is the means of communicating linkage information to other modules, units, and programs. Like the .PROGRAM directive, the .MODULE directive must appear before most statements in the source file, and the module is terminated with the .END directive.

However, another module may appear in the source file immediately after the .END directive. Such modules are assembled as independent entries in a library. The syntax for the .MODULE directive is:

            .MODULE ModuleName

The module name operand is stored in the object file for use by the linker, but is not otherwise referenced during the assembly process. In fact, this symbol may be redefined later in the source file.

The .LINK directive can be used to link a module into another module, unit, or program at assembly time. The syntax for this directive is:

            .LINK "filename",ModuleName

where filename is the operating system name of the object code file or library file containing the module, and ModuleName is the module name specified with its .MODULE directive. The specified object code is inserted into the assembly at the point of the .LINK directive. Symbols declared public within the module are accessed using the .EXTERNAL directive.

Units are a much more structured form of module. With a unit, you specify not only the symbols visible to the code using the unit, but also how they are used. Units also let you pass type-checking information so the assembler can check for possible logical errors during assembly. Finally, as an added bonus, units let you share macros, records, types, symbols defined with "=", and other entities that modules and the .PUBLIC/.EXTERNAL mechanism cannot handle. A unit takes the form:

            .UNIT   UnitName
            <interface section>
            .BEGIN
            <implementation section>
            .END

As with modules, several units may appear in one source file; simply follow each unit's .END directive with the next unit definition. In fact, modules and units can be intermixed in the same source file. If more than one module or unit appears in the source file, each is assembled into a separate slot in the generated object file (i.e., a library file is generated).

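How several modules and units in one source file become separate library entries can be sketched by splitting the file at .END boundaries. This Python sketch assumes that the .MODULE, .UNIT, and .END directives carry no label, and that only comments and blank lines appear between one entry's .END and the next header. The `LibraryEntry` type and the `split_library` function are hypothetical names used only for illustration.

```python
from typing import List, NamedTuple, Optional


class LibraryEntry(NamedTuple):
    kind: str         # "MODULE" or "UNIT"
    name: str         # operand of the .MODULE/.UNIT directive
    body: List[str]   # statements between the header and its .END


def split_library(lines: List[str]) -> List[LibraryEntry]:
    """Split one source file into independent library entries."""
    entries: List[LibraryEntry] = []
    current: Optional[LibraryEntry] = None
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith(";"):
            continue  # blank or comment-only line (dropped in this sketch)
        op = fields[0].upper()
        if op in (".MODULE", ".UNIT"):
            if current is not None:
                raise SyntaxError(f"{op} inside {current.name}; missing .END?")
            if len(fields) < 2:
                raise SyntaxError(f"{op} needs a name")
            current = LibraryEntry(op[1:], fields[1], [])
        elif op == ".END":
            if current is None:
                raise SyntaxError(".END outside a module or unit")
            entries.append(current)  # this entry gets its own library slot
            current = None
        elif current is None:
            raise SyntaxError(f"statement outside any module or unit: {line.strip()}")
        else:
            current.body.append(line)
    if current is not None:
        raise SyntaxError(f"missing .END for {current.name}")
    return entries


source = [
    "        .MODULE Math",
    "Mul16   LDA #0",
    "        .END",
    "; the next entry begins right after .END",
    "        .UNIT   Util",
    "        .BEGIN",
    "        RTS",
    "        .END",
]
for entry in split_library(source):
    print(entry.kind, entry.name, len(entry.body))  # MODULE Math 1, then UNIT Util 2
```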
The interface section of a unit contains the items that the unit makes public. Equates, records, macros, types, sets, and any other declaration that doesn't generate code can appear in the interface section (note: an exact list of items will be specified later). Such definitions are made available to the code that uses the unit as well as to the code in the implementation section. In addition, the interface section may contain .PROC definitions and .ENTRY definitions. A .PROC definition here consists of just the .PROC statement (which must also appear in the implementation section). An .ENTRY definition is used in lieu of the .PUBLIC directive and takes the form:

    label     .ENTRY  {NEAR or FAR}

An example of a simple unit:

              .UNIT   SimpleUnit
    MyMac     .macro
              lda     #0
              sta     ?0
              .endm
    ;
    ClrSub    .proc   near
    SetTrue   .proc   far
    SetIt     .entry  far
    ;
              .BEGIN          ;Start implementation section.
    ;
    ClrSub    .proc   near
              _MyMac  $11
              ret
              .endp
    ;
    SetTrue   .proc   far
              lda     #1
    SetIt     sta     $23
              ret
              .endp
              .end

To use the code defined in a unit, use the .USE directive in much the same way as the .LINK directive:

            .USE "filename",UnitName

where filename is the operating system pathname and UnitName is the name given in the operand field of the .UNIT directive. Wherever the .USE directive appears in a source file, the contents of the unit's implementation section are listed if the source listing option is turned on.

Whenever the .USE or .LINK directive is employed, the corresponding object code is always inserted into the assembly. The assembler is therefore performing double duty, acting as both assembler and linker. With units, the assembler always performs the link operation. With modules, you can defer linking to a separate step, although there are only a few cases where this is beneficial (for example, when creating libraries).

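Because every .PROC in a unit's interface must also appear in its implementation section, an assembler can check a unit mechanically. The Python sketch below does so for the SimpleUnit example. It assumes that labels start in column one, that a matching .PROC must repeat the same NEAR/FAR operand, and that an .ENTRY label must be defined as a statement label in the implementation. The last two rules are a reasonable reading of the text rather than stated requirements, and the function names `parse_decls` and `check_unit` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def parse_decls(lines: List[str]) -> Tuple[Dict[str, str], Dict[str, str], Set[str]]:
    """Collect .PROC and .ENTRY declarations plus every statement label.

    Assumes a label, when present, starts in column one. The NEAR/FAR
    operand is stored upper-cased ("" if omitted).
    """
    procs: Dict[str, str] = {}
    entries: Dict[str, str] = {}
    labels: Set[str] = set()
    for line in lines:
        code = line.split(";", 1)[0]  # strip any comment
        if not code.strip() or code[0].isspace():
            continue  # blank, comment-only, or unlabeled line
        fields = code.split()
        labels.add(fields[0])
        if len(fields) > 1 and fields[1].upper() in (".PROC", ".ENTRY"):
            distance = fields[2].upper() if len(fields) > 2 else ""
            table = procs if fields[1].upper() == ".PROC" else entries
            table[fields[0]] = distance
    return procs, entries, labels


def check_unit(lines: List[str]) -> List[str]:
    """Return error messages for interface declarations that the
    implementation section doesn't satisfy (empty list = consistent)."""
    begin = next((i for i, line in enumerate(lines)
                  if line.split(";", 1)[0].strip().upper() == ".BEGIN"), None)
    if begin is None:
        return ["missing .BEGIN"]
    iface_procs, iface_entries, _ = parse_decls(lines[:begin])
    impl_procs, _, impl_labels = parse_decls(lines[begin + 1:])
    errors: List[str] = []
    for name, distance in iface_procs.items():
        if name not in impl_procs:
            errors.append(f"{name}: .PROC in interface but not in implementation")
        elif impl_procs[name] != distance:
            errors.append(f"{name}: declared {distance}, implemented {impl_procs[name]}")
    for name in iface_entries:
        if name not in impl_labels:
            errors.append(f"{name}: .ENTRY label not defined in implementation")
    return errors


SIMPLE_UNIT = """\
          .UNIT   SimpleUnit
MyMac     .macro
          lda     #0
          sta     ?0
          .endm
;
ClrSub    .proc   near
SetTrue   .proc   far
SetIt     .entry  far
;
          .BEGIN          ;Start implementation section.
;
ClrSub    .proc   near
          _MyMac  $11
          ret
          .endp
;
SetTrue   .proc   far
          lda     #1
SetIt     sta     $23
          ret
          .endp
          .end
"""

print(check_unit(SIMPLE_UNIT.splitlines()))  # []
```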
All of the program linkage directives are optional at the subset compliance level, but required at the full and extended levels.

Listing Controls
----------------

Several directives control the appearance of the assembled source listing. The exact format of the listing will be specified within this proposal (although at a later date). That format must be followed exactly so that symbolic debuggers can use an assembled source listing, saved as a text file, when stepping through a program.

            .ON     operands
            .OFF    operands

These two directives turn certain listing options on or off. Valid operands include LIST, OBJ, MAC, and COND:

    LIST   Controls whether or not the source file is listed at all; it supersedes all other options.
    OBJ    If on, forces the assembler to display all bytes of object code emitted by an instruction, even if it takes more than one line to display them all; if off, only the object code bytes that fit on the current source line are displayed.
    MAC    Controls the listing of macro expansions. If off, only the macro invocation, not its expansion, is displayed.
    COND   Controls the listing of statements in false conditional assembly sections and in .WHILE loop sections.

The .TITLE and .SUBTITLE directives let you assign titles and subtitles to the source file. The syntax for these directives is

            .TITLE    "Title of source file"
            .SUBTITLE "Subtitle for this section"

The title is displayed at the top of each page, and the subtitle is displayed immediately below the title. .TITLE always forces a page eject; .SUBTITLE never does.

The .PAGE directive forces an immediate page eject in the listing. It requires no operands.

The .PRINTF directive has the syntax:

            .PRINTF "Control string" {,operands}

It is used in a manner analogous to the printf function in the C programming language. If expressions follow the control string, "%" modifiers in the control string specify their output format.

For example,

            .PRINTF "Label = $%4h",Label

would print

    Label = $1234

assuming the value associated with Label was $1234.

The .PAUSE directive can be used to force an assembly-time error. It is useful mainly in macros, records, expression functions, etc., to force an error if an illegal condition (such as a bad number of parameters) occurs.

The listing control directives are required only at the full and extended compliance levels.

Data Flow Analysis Directives
-----------------------------

The following directives are quite useful to add-on debuggers and data flow analysis programs. They are required only at the full and extended compliance levels:

    label     .table
              <data table>
              .endt

For .table, the label is assigned the current value of the location counter and is treated like a statement label. .TABLE and .ENDT are otherwise ignored.

    label     .REF    label1 {, label2, ..., labeln}

This statement is ignored by the assembler. The statement label, if present, is also ignored.

Other Optional Goodies
----------------------

The following are not required by this proposal, but should be provided nonetheless:

            .system "DOS command"

.SYSTEM issues the specified command to the operating system. This directive is useful for deleting files during assembly, changing directories, etc.

Operation of the Assembler
--------------------------

Given the structure of the assembler, there's no way it can accomplish its job in fewer than three passes without placing severe burdens on the user (I could provide you with a mathematical proof of this, but I don't want to bore you to death). Therefore, the standard specifies that the assembler must use three (or more) passes to do its job. During the first pass the assembler associates labels with segments (and groups of segments), determines whether or not those symbols are near or far, and performs other housekeeping chores suited to pass one.

Pass two of the assembler is equivalent to the traditional pass one of an assembler: it computes the values of all the symbols in the program. Pass three generates the actual object code.

In Addition to the Assembler...
-------------------------------

The standard should also include specifications for a run-time library to be provided with the assembler, as well as a list of tools (e.g., debugger, linker, librarian, etc.) that must be provided with the product to meet the full compliance level. I would like to propose the following items for the run-time library:

TTY_IO: A set of routines to communicate with a text-based user console. INIT, GETC, and PUTC are the basic routines; these three are easily supported on any system with a user console.
TERMINAL_IO: A set of routines to communicate with a cursor-based terminal device. Routines supported should include INIT, GETC, PUTC, GOTOXY, HOME, CLREOLN, and CLREOP.
CONSOLE_IO: A set of routines to communicate with a DMA-based video display device. See the specifications for ANIX's CHARIO driver for the routines to be supplied with this library entry.
AUX_IO: A driver for a set of one or more serial communication ports. Routines should include INITA, SETUPA, GETA, PUTA, and STATUSA.
PRT_IO: A driver for a set of one or more printer ports. Routines should include INITP, SETUPP, PUTP, and STATUSP.
NET_IO: A driver for a set of one or more network ports. Routines should include INITN, SETUPN, GETPacket, SendPacket, etc.
CLK_IO: A driver for a real time clock or clock-calendar unit.
FP: An IEEE floating point package for the 65c816 chip.
MATH: A set of integer math routines (multiply, divide, extended precision, etc.).
CONV: A set of conversion routines (binary to decimal, etc.).
FILE_IO: A set of routines that interface to the host's operating system, providing a common interface to various operating systems.
DVC_IO: A hardware-independent device I/O package that allows named devices to be connected, through a BIOS (like the AUX_IO and PRT_IO packages), to various hardware devices.
STD_IO: A set of routines to perform various I/O operations such as PRINT, PRINTF, SCANF, PUTI (integer), GETI, PUTH (hex), GETH, etc.
MEM_MGR: A set of memory management routines to efficiently allocate and deallocate memory.

This is by no means an exhaustive list, just a quick sample of the types of routines that should be provided. Apple //GS users may complain that many of these routines already exist within the confines of the Apple toolbox. The intent, however, is to provide a set of useful routines that can be used on ANY 65c816 system, so that 65c816 code can be easily ported to systems other than the Apple //GS.

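The proposal names these routines but does not define their calling conventions. Purely to illustrate the layering it describes (drivers offering INIT, GETC, and PUTC, a DVC_IO-style table of named devices, and STD_IO routines written only in terms of those primitives), here is a Python sketch. Every class and function name in it (`CharDevice`, `MemoryConsole`, `attach`, `puth`) is hypothetical, and a real library would of course be 65c816 code.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class CharDevice(ABC):
    """The three basic TTY_IO routines: INIT, GETC, and PUTC."""

    @abstractmethod
    def init(self) -> None: ...

    @abstractmethod
    def getc(self) -> int: ...

    @abstractmethod
    def putc(self, ch: int) -> None: ...


class MemoryConsole(CharDevice):
    """Stand-in driver: queues input bytes and records output bytes."""

    def __init__(self, pending_input: bytes = b"") -> None:
        self.pending = list(pending_input)
        self.output: List[int] = []

    def init(self) -> None:
        self.output.clear()

    def getc(self) -> int:
        return self.pending.pop(0)

    def putc(self, ch: int) -> None:
        self.output.append(ch & 0xFF)


# DVC_IO-style table mapping device names to whichever driver is attached.
devices: Dict[str, CharDevice] = {}


def attach(name: str, driver: CharDevice) -> None:
    """Initialize `driver` and make it reachable under `name`."""
    driver.init()
    devices[name] = driver


def puth(name: str, value: int, digits: int = 4) -> None:
    """STD_IO-style PUTH: print `value` in hex using only the device's putc."""
    device = devices[name]
    for shift in range(4 * (digits - 1), -1, -4):
        device.putc(ord("0123456789ABCDEF"[(value >> shift) & 0xF]))


console = MemoryConsole(b"Y")
attach("CON", console)
puth("CON", 0x1234)
print(bytes(console.output))  # b'1234'
print(chr(console.getc()))    # Y
```

Because `puth` touches the hardware only through `putc`, porting it to a new system means writing one new driver class rather than rewriting the STD_IO layer, which is the portability argument made above.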