The C Users' Group Library 1994 August

Text File | 1990-09-25 | 24KB | 595 lines

as68 - 68000 Assembler, version 1.02
(c) copyright 1982 Steve Passe
all rights reserved

Modified to support CC68K C Compiler by Brian Brown, Nov 1989


TABLE OF CONTENTS

    Chapter 1    Introduction
    Chapter 2    Usage
    Chapter 3    Pseudo-ops
    Chapter 4    Mnemonics
    Chapter 5    Expressions
    Chapter 6    S File Format
    Chapter 7    Error Messages
    Chapter 8    Differences


CHAPTER 1

INTRODUCTION

The as68 assembler is a disk to disk assembler for the Motorola 68000
microprocessor chip. Written in the C programming language, it may be
used as a cross assembler on any machine supporting C, or as a native
assembler if compiled with a C compiler that produces 68000 output.
Its directives and mnemonic set closely follow those of the Motorola
Resident Structured Assembler.

It has been altered to accept assembler output from the 68000 C
Compiler. This modification was performed by B Brown at the Central
Institute of Technology, Heretaunga, New Zealand in 1989.

SOURCE PROGRAM

The input to the assembler is an ascii text file, consisting of a
series of statements written in the assembly language. Each statement
consists of one or more fields within a line. The assembler is free
format within each line, i.e. there is no need to start a specific
field of a statement in a particular column. Fields are separated from
one another with whitespace (tabs or spaces).

STATEMENTS

There are 3 basic statement types. The most common is an assembly
language instruction or mnemonic. It is a command to the assembler to
produce a machine operation code to carry out a specific action. The
second type of statement is called an assembly directive or pseudo-op.
Pseudo-ops tell the assembler how to assemble the program. The third
statement type is called a comment. It is ignored by the assembler,
its purpose being to allow the programmer to insert descriptions of
what the code is doing within the text of the source program. Comments
may exist as the final field of the other two statement types.

INSTRUCTION STATEMENTS

An instruction statement consists of from one to four fields:

    [label] <mnemonic> [operand] [comment]

LABEL FIELD

The first field, the label field, is optional. It is used to create a
symbolic name for the address of the code generated by the following
assembler mnemonic. This label is stored in the symbol table and any
references to it evaluate to the associated address. The label field
may be the only field of a statement, and multiple label-only
statements may follow one another. In all cases the label(s) will
evaluate to the address of the first mnemonic to be assembled after
the label(s) is specified.

Labels are composed of alphanumeric characters and may be up to 30
characters long. All characters of a label are significant, as is the
case of alphabetic characters (i.e. "Foo" is different from "foo").
The first character of a label must be either alphabetic or the
character '.' (period). Following characters may also include the
underscore (_), dollar sign ($), and the digits '0' thru '9'. Labels
starting in any column other than the first must be terminated with a
colon (:).

Certain symbols are reserved for the use of the assembler and thus may
not be used as labels. These include "SP", "USP", "SR", "CCR", "A0"
through "A7" and "D0" through "D7".

MNEMONIC FIELD

The second field is the mnemonic or assembly instruction field. It
will always be present in a statement except in the case of a label
only statement (label only statements might more properly be described
as assembler directives). If the line is unlabeled the mnemonic field
must be preceded by whitespace. A mnemonic will consist of from 3 to 5
ascii characters, the case of which is not significant. This assembler
recognizes the standard Motorola instruction set. The complete
mnemonic instruction set is described in chapter 4, "Mnemonics".

Many 68000 instructions may work on different data sizes.
The desired data size is specified by appending a length modifier or
data size code to the mnemonic. A '.b' extension specifies a data size
of byte (8 bits) while '.l' will cause the data size to be a long word
(32 bits). No extension will cause the data size to be a word (16
bits). A '.w' extension may be used for data sizes of word, although
this size is the default and as such the '.w' modifier is unnecessary.

OPERAND FIELD

The operand field is necessary only for those statements whose
mnemonic requires one or more operands. It will contain one or two
operands. When two operands are present they must be separated with a
comma (no whitespace allowed between operands). The first of two
operands is referred to as the source operand while the second is the
destination operand.

COMMENT FIELD

The comment field is optional and consists of all text following the
above fields.

DIRECTIVES

Label Field - Labels used with directive statements follow the general
rules of those used in assembly statements with one important
exception: they may only be used with the following directives:

    1. EQU
    2. SET
    3. DC
    4. DS

Directive Field - The directive field contains an instruction to the
assembler as to how the program should be assembled. This includes
such things as the base address of the program, setting of symbol
values, allocation of program memory storage, conditional assembly,
etc. The complete list of available assembly directives is given in
chapter 3, "Pseudo-ops".

Operand Field - The operand field of a directive statement will
consist of zero or more operands, as needed by the pseudo-op in
question. Multiple operands are separated with a comma (,). No
whitespace may exist between operands.

Comment Field - The comment field is identical to that used in
instruction statements and is optional.

Comments - Comments may exist alone as separate statements. In such
cases an asterisk (*) must be the first character on the line.
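
The label rules given under LABEL FIELD can be summarised in a few lines of code. The following Python sketch is only an illustration of those rules and is not taken from the as68 source (which is written in C). The function name `is_valid_label` is hypothetical, and treating the reserved names as case-insensitive is an assumption: the text above lists only the upper-case forms.

```python
import re

# Reserved names listed in Chapter 1. Matching them without regard to
# case is an assumption of this sketch, not something the manual states.
RESERVED = {"SP", "USP", "SR", "CCR"} | {
    f"{reg}{num}" for reg in "AD" for num in range(8)
}

# First character: a letter or '.'; remaining characters: letters,
# digits, '_' or '$'.
LABEL_RE = re.compile(r"[A-Za-z.][A-Za-z0-9_$]*")

MAX_LABEL_LEN = 30  # "up to 30 characters long"


def is_valid_label(name: str) -> bool:
    """Return True if `name` satisfies the label rules described above."""
    if not 1 <= len(name) <= MAX_LABEL_LEN:
        return False
    if LABEL_RE.fullmatch(name) is None:
        return False
    return name.upper() not in RESERVED


if __name__ == "__main__":
    for candidate in ("loop1", ".L12", "out_port$", "1abc", "D0"):
        print(candidate, is_valid_label(candidate))
```

The trailing colon required on labels that do not start in column one is a matter of statement parsing rather than of the label itself, so it is left out of this check.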


CHAPTER 2

USAGE

Command Line Format - The command line format is:

    as68 <sourcefile>[.ext] [option[ option]]

where:

    sourcefile  is the source file name.

    ext         is an optional file extension identifier. By default
                the assembler expects source files to have an ext of
                ".asm".

    option      is one of several possible options in assembly.
                Whitespace must separate multiple options when they
                occur. Individual options are described below.

OPTIONS

The following options are available:

e   destination of error messages. If absent all error messages will
    go to the console by default. If present, one or more of the
    following destinations may be specified:

        c   error messages to the console.
        e   error messages to a file named "sourcefile.err".
        f   errors reported in listing.

l   destination of assembly listing. If absent no listing is made.
    One or more of the following option extensions are available:

        c   listing to console.
        f   listing to a file named "sourcefile.ls"

o   type and destination of object file. If not present the object
    file will be in Motorola 'S' FILE format.

        s   object to a file named "sourcefile.s19"
        x   no object file to be made.

s   set the symbol table size. The symbol table requires 8 bytes plus
    the length of the symbol for each entry. The argument should be in
    decimal bytes. The symbol table defaults to 2000 bytes (decimal).
    To reserve 3500 bytes the option would be: 's3500'.

t   truncate source code lines in listing. This option will cause the
    source code lines sent to any open list channel to be truncated at
    the normal wrap position (see option 'w' below). It defaults to
    being off; a 't' in the command line will turn it on.

w   set the value of wrap. Source code lines in the listing(s) will
    normally be 'wrapped' to the next line if they extend beyond the
    column number specified by this option. If the 't' option is
    active this option specifies the column beyond which source lines
    are truncated. The default value of 'w' is 80, but it can be set
    to 60 or any reasonable larger number of columns.
    Note that this number should be set to the width of your list
    device; it is not the number of columns of source code to a line
    (i.e. a value of 80 allows 40 characters of source per line after
    accounting for the 40 columns used by the line/loc/code fields).

OTHER SYSTEMS

The code presently supports the MSDOS operating system. Some work
would have to be done in the command line parser to bring it up on
other operating systems.


CHAPTER 3

PSEUDO-OPS

Assembly Control

The following assembler directives are used to control the assembly:

ORG     is used to specify the absolute memory origin of the code to
        be assembled. The operand is an expression that evaluates to
        an address within the first 64 kilobytes of memory space (0
        thru $FFFF). Any memory references outside this range will
        cause an assembly error. Be aware that the 68000 chip sign
        extends absolute short addresses. Thus address references at
        or above 32k ($8000) will access hardware memory in the range
        of $FF8000 thru $FFFFFF. This pseudo-op will cause the
        assembler to generate code using absolute short addressing.

ORG.L   is also used to specify the absolute memory origin. However,
        in this case the entire address range of the 68000 is usable
        (0 thru $FFFFFF). The assembler will generate code using
        absolute long addressing.

RORG    causes the assembler to generate program counter relative
        code. The memory range restrictions of the "ORG" directive
        apply.

RORG.L  causes the assembler to generate program counter relative
        code as above; however, the expression in the operand field
        may evaluate to any value within the 68000's address range.

END     signals the end of the assembly language program.

SYMBOL DEFINITION

The following directives control the definition of symbols:

EQU     defines a symbol and sets its value to that of the operand.
        This symbol value is permanent, i.e. it cannot be changed
        later in a program. The operand may be a complex expression
        but cannot make forward references.

SET     defines or redefines a symbol and sets its value to that of
        the operand. The value is temporary and may be reset with
        another 'SET' directive. Again, forward references are not
        allowed.

MEMORY ALLOCATION

These directives are used to reserve and/or initialize memory:

DC      fills memory locations with constant value(s). An extension
        of '.B' causes individual bytes to be filled with the value of
        the operand(s). An extension of '.L' will cause the operands
        to be evaluated as 32 bit values, which are placed in 4 byte
        blocks, one for each operand. No extension or '.W' signifies
        that the operand(s) are to be evaluated as 16 bit values, each
        being stored in consecutive 2 byte locations. Word and long
        word values are aligned on even address boundaries. DC.B
        directives causing the location counter to end on an odd
        address will pad 1 byte with a zero value unless the next
        source statement is another DC.B.

DS      reserves memory locations. Again, a data size extension may
        be appended to 'DS' to specify either byte or long word
        allocation. The operand specifies the number of data cells to
        reserve, i.e. if the extension was '.L' and the operand
        evaluated to 5, twenty bytes (5 * 4 bytes for a long word)
        would be reserved. Word alignment is not automatic as in the
        DC directive. To force alignment after a DS.B statement use a
        "ds 0" statement.

LIST CONTROLS

These directives are used to control the listing output:

LIST    causes listing output (if enabled from the command line) to
        be sent to each of the open list channels. LIST is active by
        default and remains so until a NOLIST pseudo is encountered.

NOLIST  causes listing output to be turned off until a LIST pseudo is
        encountered.


CHAPTER 4

MNEMONICS

The mnemonics used by this assembler follow the standard mnemonic
instruction set as defined by Motorola. Mnemonics may exist in either
upper or lower case in the source file; the assembler makes no
distinction.
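
To illustrate how the case-insensitive mnemonic and the data size code from chapter 1 fit together, here is a small Python sketch. It is not taken from the as68 source. The function name `split_mnemonic` is hypothetical, and two points are assumptions: that the size code is matched without regard to case like the mnemonic itself, and that an unknown size code is rejected.

```python
# Data size codes from chapter 1: '.b' = 8 bits, '.w' = 16 bits,
# '.l' = 32 bits.
SIZE_BITS = {"B": 8, "W": 16, "L": 32}
DEFAULT_BITS = 16  # no extension means a word


def split_mnemonic(field: str) -> tuple:
    """Split a mnemonic field into (upper-case mnemonic, size in bits).

    Folding the size code to upper case and raising ValueError for an
    unknown code are assumptions of this sketch.
    """
    base, dot, ext = field.upper().partition(".")
    if not dot:
        return base, DEFAULT_BITS
    if ext not in SIZE_BITS:
        raise ValueError(f"unknown size code in {field!r}")
    return base, SIZE_BITS[ext]


if __name__ == "__main__":
    for text in ("move.l", "ADDQ", "Clr.B", "tst.w"):
        print(text, split_mnemonic(text))
```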

    ABCD    add binary coded decimal
    ADD     add binary
    ADDQ    add quick binary, operand in range of 1 thru 8
    ADDX    add binary with extend
    AND     logical and
    ASL     arithmetic shift left
    ASR     arithmetic shift right
    Bcc     branch conditionally
    BCHG    bit test and change
    BCLR    bit test and clear
    BRA     branch unconditional
    BSET    bit test and set
    BSR     branch to subroutine
    BTST    bit test
    CHK     check register against boundaries
    CLR     clear operand
    CMP     compare
    CMPM    compare memory
    DBcc    test condition, decrement and branch
    DIVS    signed divide
    DIVU    unsigned divide
    EOR     exclusive or
    EXG     exchange registers
    EXT     sign extend
    JMP     jump to address
    JSR     jump to subroutine
    LEA     load effective address
    LINK    link and allocate
    LSL     logical shift left
    LSR     logical shift right
    MOVE    move
    MOVEM   move multiple registers
    MOVEP   move peripheral data
    MOVEQ   move quick, operand in range of -128 thru 127
    MULS    signed multiply
    MULU    unsigned multiply
    NBCD    negate binary coded decimal
    NEG     negate
    NEGX    negate with extend
    NOP     no operation
    NOT     bitwise complement
    OR      logical or
    PEA     push effective address
    RESET   reset external devices
    ROL     rotate left
    ROR     rotate right
    ROXL    rotate left with extend
    ROXR    rotate right with extend
    RTE     return from exception
    RTR     return and restore condition codes
    RTS     return from subroutine
    SBCD    subtract binary coded decimal
    Scc     set conditional
    STOP    stop
    SUB     subtract binary
    SUBQ    subtract quick binary, operand in range of 1 thru 8
    SUBX    subtract binary with extend
    SWAP    swap register halves
    TAS     test and set operand
    TRAP    trap
    TRAPV   trap on overflow
    TST     test an operand
    UNLK    unlink


CHAPTER 5

EXPRESSIONS

Expressions consist of one or more symbols combined by binary and/or
unary (algebraic) operators. Possible symbols include:

    - Symbols defined with the EQU and SET directives.
    - Program labels.
    - Numeric values.
    - The asterisk ('*'), which equates to the present value of the
      program location counter.

ALGEBRAIC OPERATORS INCLUDE:

    - arithmetic operators:
        * multiplication: '*'
        * division: '/'
        * addition: '+'
        * subtraction: '-'

LOGICAL OPERATORS:

        * logical (bitwise) AND: '&'
        * logical (bitwise) OR: '!'
        * left shift: '<<'
        * right shift: '>>'

UNARY OPERATORS:

        * unary minus: '-'
        * location counter value: '*'

SYMBOLS

Symbols are composed of alphanumeric characters and may be up to 30
characters long. All characters of a symbol are significant, as is the
case of alphabetic characters (i.e. "Foo" is different from "foo").
The first character of a symbol must be either alphabetic or the
character '.' (period). Following characters may also include the
underscore (_), dollar sign ($), and the digits '0' thru '9'.

Numbers may be represented as either decimal, hexadecimal, or binary:

    - decimal numbers are represented by the normal ascii digits '0'
      thru '9'.
    - hexadecimal numbers start with a dollar sign ('$') followed by
      the ascii digits '0' thru '9' and the ascii characters 'A' thru
      'F'.
    - binary numbers start with a percent sign ('%') followed by the
      ascii digits '0' and '1'.


CHAPTER 6

S FILE FORMAT

A Motorola "S file" is similar in structure to an Intel hex file. It
consists of a series of ascii records, each in the following format:

    - The record start character, an uppercase ascii 'S', followed by
      an ascii numeral, '0' thru '9':
        * a '0' for the file header record.
        * a '1' for records with short (16 bit) addresses.
        * a '2' for records with medium (24 bit) addresses.
        * a '3' for records with long (32 bit) addresses.
        * ......
        * a '9' for the tail record.
    - The third and fourth bytes forming an ascii hex representation
      of the number of bytes in the body of the record:
        * an address field consisting of 4, 6, or 8 ascii bytes
          representing the load address of the record (record types
          S1, S2, S3, respectively).
    - The body of the record consists of up to 16 bytes of data, each
      byte represented as 2 ascii hex bytes.
    - A checksum byte, again represented by two ascii characters:
        * The checksum for each record (except the last, S9) is the
          least significant byte of the 1's complement of the sum of
          the byte values of the count, address, and data fields
          (i.e., everything in the record except 'Sx').
        * The checksum on the 'S9' record is undefined/nonexistent.


CHAPTER 7

ERROR MESSAGES

- 1:  statement parsing error. The occurrence of this error indicates
      that the line totally confuses the parser and no further
      diagnostic comments (legitimate comments anyway) can be made.

- 2:  bad character in mnemonic-psdo field. An illegal character
      exists in the word present in the mnemonic field, see chapter 1,
      introduction, instruction statements.

- 3:  instruction or pseudo not found in tables. Unless error 2 is
      also generated the instr/pseudo word is properly formed (i.e. no
      illegal characters are in it) but not a recognized instruction.

- 4:  bad character in macro field. Version 1.xx does not recognize
      macros!

- 5:  macro not found in macro table(s). Again, macros are not yet
      recognized.

- 6:  improper use of label. Most often this error flags the use of a
      label on a pseudo op line that does not allow the use of labels.
      See chapter 1, introduction, directives:label field.

- 7:  can't evaluate operand. The assembler cannot evaluate the value
      of an operand. This may have a variety of causes such as
      embedded spaces, unrecognized operators, unbalanced parentheses,
      etc. Additional errors will usually be reported that will help
      clarify the problem.

- 8:  can't evaluate equ operand. As above, generated in the case of
      an EQU statement.

- 9:  can't evaluate set operand. As above, generated in the case of
      a SET statement.

- 10: attempt to redefine a permanent symbol. The symbol was
      previously defined via an EQU statement and thus cannot be
      changed.

- 11: symbol table full. This is a fatal error and will cause the
      assembly to abort.
      You may try again, setting an optional symbol table size from
      the command line. See chapter 2, usage, options:s.

- 12: unrecognized operand. A legal operand cannot be formed from the
      data in the operand field.

- 13: symbol not defined in symbol table. The symbol was not
      encountered in the program up to this point. Remember that
      forward references are not allowed in EQU and SET pseudo
      statements.

- 14: label out of range for current addressing mode. The value of
      the label is outside the limits of the current statement. This
      usually is an attempt to reference an address beyond the 32k
      byte range imposed in the short addressing mode. See chapter 3,
      pseudo-ops, assembly control.

- 15: operand 1 is not valid for instruction type. The first operand
      encountered is not a valid operand for this particular
      instruction/addressing mode.

- 16: operand 2 is not valid for instruction type. The second operand
      encountered is not a valid operand for this particular
      instruction/addressing mode.

- 17: operand 1 is not correctly formed. This may be caused by
      illegal use of operators, undefined labels, etc. The assembler
      may or may not attempt to evaluate the second operand. If it
      does you are not assured that it will do so correctly. For more
      on this see error #18.

- 18: operand 2 is not correctly formed. This may be for any of the
      reasons stated above for error 17. Remember that it could be
      incorrectly evaluated if error 17 was also generated and that
      the second operand may be incorrectly formed but not reported as
      such after error 17 occurs. This is dependent on the assembler
      correctly determining where the poorly formed operand 1 ends and
      operand 2 starts.

- 19: code building function failed. The binary code building
      function has failed to properly construct a sequence of code for
      the statement. This error will usually be followed with
      additional error #s pinpointing the problem.

- 20: a 3 bit immediate data value out of bounds, i.e. it is greater
      than 7 or less than 0.

- 21: 8 bit bit field specifier out of range.

- 22: 32 bit bit field specifier out of range.

- 23: attempt to generate a bit field specifier failed.

- 24: count operand out of range (1-8).

- 25: destination register specifier illegal or out of range.

- 26: source or destination register specifier illegal or out of
      range.

- 27: attempt to generate register mask list from operand failed.

- 28: operand failed to evaluate to a proper source or destination
      effective address.

- 29: illegal destination effective address, probably label or label
      with index reference.

- 30: illegal destination effective address.

- 31: illegal multiple destination effective address, either label,
      label with index, or address register indirect with predecrement
      or postincrement.

- 32: illegal jump effective address, usually address register
      indirect with predecrement or postincrement.

- 33: 4 bit vector out of range.

- 34: expected address displacement failed to evaluate correctly.

- 35: 8 bit displacement out of range (-128 thru 127)

- 36: 16 bit displacement out of range (-32768 thru 32766)

- 37: extension word of operand failed to evaluate correctly.

- 38: 8 bit operand out of range (-128 thru 255, sign is
      responsibility of programmer).

- 39: 8 bit extension word value out of range (-128 thru 255).

- 40: 16 bit extension word out of range (-32768 thru 65535).

- 41: 32 bit extension word value out of range (overflow).

- 42: attempt to generate a 16 bit displacement failed, probably out
      of range.

- 43: attempt to generate an 8 bit displacement failed, check range.


CHAPTER 8

DIFFERENCES

This list must be considered incomplete and will be added to as the
facts are brought to my attention.

In General - As68 allows addresses assembled under an rorg.l or org.l
condition to be coerced into a short address by enclosing the entire
expression in a set of parentheses followed by '.s'. This allows
references to the first/last 32k with a code sequence that is 2 bytes
shorter than the long address mode.
As an example:

            org.l   $8000
    outport equ     $ffa000         * port mapped into last
                                    * page of address space

            move.b  d0,outport      * will generate 6 bytes of code

while

            move.b  d0,(outport).s  * will only generate 4 bytes of code

MODIFIED format of movem instruction.

            movem   /d4/d5/, (a7)+

NOTE the trailing /
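
The '.s' coercion described above only makes sense for addresses that survive the 68000's sign extension of a 16 bit absolute short operand, i.e. the first and last 32k of the 24 bit address range. The following Python sketch is an illustration of that rule and is not as68's own range check. The function names are hypothetical.

```python
ADDRESS_MASK = 0xFFFFFF  # 68000 address range is 0 thru $FFFFFF


def sign_extend_short(word: int) -> int:
    """Address actually accessed for a 16-bit absolute short operand."""
    word &= 0xFFFF
    if word & 0x8000:      # bit 15 set: upper address bits become ones
        word -= 0x10000
    return word & ADDRESS_MASK


def fits_short(address: int) -> bool:
    """True if `address` can be encoded as an absolute short operand."""
    return sign_extend_short(address) == (address & ADDRESS_MASK)


if __name__ == "__main__":
    for addr in (0x0000, 0x7FFF, 0x8000, 0xFF8000, 0xFFA000):
        print(f"${addr:06X}: short ok = {fits_short(addr)}")
```

With the `outport` value from the example, `fits_short(0xFFA000)` is true, which is why `(outport).s` can be used there. An address such as $8000 itself cannot be reached with a short operand, because the operand $8000 sign-extends to $FF8000.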