- CHAPTER 9 DIRECTIVES IN A86
-
-
- Segments in A86
-
- The following discussion applies when A86 is assembling a .COM
- file. See the next chapter for the discussion of segmentation for
- .OBJ files.
-
- A86 views the 86 computer's memory space as having two parts: The
- first part is the program, whose contents are the object bytes
- generated by A86 during its assembly of the source. A86 calls
- this area the CODE SEGMENT. The second part is the data area,
- whose contents are generated by the program after it starts
- running. A86 calls this area the DATA SEGMENT.
-
- Please note well that the only difference between the CODE and
- DATA segments is whether the contents are generated by the
- program or the assembler. The names CODE and DATA suggest that
- program code is placed in the CODE segment, and data structures
- go in the DATA segment. This is mostly true, but there are
- exceptions. For example, there are many data structures whose
- contents are determined by the assembler: pointer tables, arrays
- of pre-defined constants, etc. These tables are assembled in the
- CODE segment.
-
- In general, you will want to begin your program with the
- directive DATA SEGMENT, followed by all your program variables
- and uninitialized data structures, using the directives DB, DW,
- and STRUC. If you do not give an ORG directive, A86 will begin
- the allocation immediately following the end of the .COM program.
- You can end the DATA SEGMENT allocation lines with the DATA ENDS
- directive, followed by the program code itself. A short program
- illustrating this suggested usage follows:
-
-     DATA SEGMENT
-       ANSWER_BYTE  DB ?
-       CALL_COUNT   DW ?
-
-     CODE SEGMENT
-       JMP MAIN
-
-     TRAN_TABLE:
-       DB 16,3,56,23,0,9,12,7
-
-     MAIN:
-       MOV BX,TRAN_TABLE
-       XLATB
-       MOV ANSWER_BYTE,AL
-       INC CALL_COUNT
-       RET
-
- A86 allows you to intersperse CODE SEGMENTs and DATA SEGMENTs
- throughout your program; but in general it is best to put all
- your DATA SEGMENT declarations at the top of your program, to
- avoid problems with forward referencing.
-
- CODE ENDS and DATA ENDS Statements
-
- For compatibility with Intel/IBM assemblers, A86 provides the
- CODE ENDS and DATA ENDS statements. The CODE ENDS statement is
- ignored; we assume that you have not nested a CODE segment inside
- a DATA segment. The DATA ENDS statement is equivalent to a CODE
- SEGMENT statement.
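-
- As a minimal sketch of that equivalence (the variable name COUNT
- is made up for illustration), the DATA ENDS line below switches
- assembly back to the CODE segment, just as a CODE SEGMENT line
- would:
-
-     DATA SEGMENT
-       COUNT  DW ?      ; hypothetical word variable
-     DATA ENDS          ; same effect as CODE SEGMENT
-
-       MOV COUNT,0      ; assembled into the CODE segment
-       RET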
-
-
-
- The ORG Directive
-
- Syntax: ORG address
-
- ORG moves the output pointer (the location counter at which
- assembly is currently taking place within the current segment) to
- the value of the operand. In the CODE segment, the operand
- should be an absolute constant, or an expression evaluating to an
- absolute, non-forward-referenced constant. In the DATA segment,
- the operand may be a forward reference or an expression
- containing one or more forward references. All symbols in the
- segment will be resolved when the forward references to the ORG
- operand are all resolved.
-
- There is a special side effect to ORG when it is used in the CODE
- segment. If you begin your code segment with ORG 0, then A86
- knows that you are not assembling a .COM program; but are instead
- assembling a code segment to be used in some other context
- (examples: programming a ROM, or assembling a procedure for older
- versions of Turbo Pascal). The output file will start at 0, not
- 0100 as in a .COM file; and the default extension for the output
- file will be .BIN, not .COM. However, if you later issue an ORG
- 0100 directive, the default will revert back to .COM.
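-
- A minimal sketch of such a non-.COM module, assuming nothing more
- than the rule just stated (the label ENTRY is hypothetical):
-
-       ORG 0            ; first statement: default output becomes .BIN
-     ENTRY:             ; assembled at offset 0, not 0100
-       MOV AL,1
-       RET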
-
- Other than in the above example, you should not in general issue
- an ORG within the CODE segment that would lower the value of the
- output pointer. This is because you thereby put yourself in
- danger of losing part of your assembled program. If you
- re-assemble over space you have already assembled, you will
- clobber the previously-assembled code. Also, be aware that the
- size of the output program file is determined by the value of the
- code segment output pointer when assembly ends. If you ORG
- to a lower value at the end of your program, the output program
- file will be truncated to the lower-value address.
-
- Again, almost no program producing a .COM file will need any ORG
- directive in the code segment. There is an implied ORG 0100 at
- the start of the program. You just start coding instructions,
- and the assembler will put them in the right place.
-
- The EVEN Directive
-
- Syntax: EVEN constant
-
- The EVEN directive coerces the current output pointer to a value
- which is an exact multiple of the operand. If no operand is
- given, a value of 2 is assumed. In a DATA SEGMENT or STRUC, it
- does so by adding to the current output pointer if necessary. In
- a code segment, it outputs an appropriate number of NOP
- instruction bytes. EVEN is most often used in data segments,
- before a sequence of DW directives. Machines beyond the original
- 8088 fetch words more quickly when they are aligned onto even
- addresses; so the EVEN directive ensures that your program will
- have the faster access to those DW's that follow it. Also useful
- are EVEN 4 for doubleword alignment, and EVEN 16 for paragraph
- alignment.
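-
- The following sketch shows both forms in a data segment. The
- variable names are hypothetical, chosen only for illustration:
-
-     DATA SEGMENT
-       FLAG      DB ?          ; one byte; the pointer may now be odd
-                 EVEN          ; no operand: advance to a multiple of 2
-       TOTALS    DW 8 DUP ?    ; these words start on an even address
-                 EVEN 16       ; advance to a multiple of 16
-       PARA_BUF  DB 64 DUP ?   ; starts on a paragraph boundary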
-
-
- Data Allocation Using DB, DW, DD, DQ, and DT
-
- The 86 computer family supports the three fundamental data types
- BYTE, WORD, and DWORD. A byte is eight bits, a word is 16 bits
- (2 bytes), and a doubleword is 32 bits (4 bytes). In addition,
- the 87 floating point processor manipulates 8-byte quantities,
- which we call Q-words, and 10-byte quantities, which we call
- T-bytes. The A86 data allocation statement is used to specify
- the bytes, words, doublewords, Q-words, and T-bytes which your
- program will use as data. The syntax for the data allocation
- statement is as follows:
-
- (optional var-name) DB (list of values)
- (optional var-name) DW (list of values)
- (optional var-name) DD (list of values)
- (optional var-name) DQ (list of values)
- (optional var-name) DT (list of values)
-
- The variable name, if present, causes that name to be entered
- into the symbol table as a memory variable with type BYTE (for
- DB), WORD (for DW), DWORD (for DD), QWORD (for DQ), or TBYTE (for
- DT). The variable name should NOT have a colon after it, unless
- you wish the name to be a label (instructions referring to it
- will interpret the label as the constant pointer to the memory
- location, not its contents).
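-
- To illustrate the difference the colon makes, here is a short
- sketch; the names VTAB, LTAB, and START are invented for the
- example:
-
-       JMP START             ; skip over the tables
-     VTAB  DB 10,20,30       ; no colon: VTAB is a BYTE memory variable
-     LTAB: DB 10,20,30       ; colon: LTAB is a label
-     START:
-       MOV AL,VTAB           ; loads the contents of the byte at VTAB
-       MOV BX,LTAB           ; loads the address of LTAB, not its contents
-       RET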
-
- The DB statement is used to reserve bytes of storage; DW is used
- to reserve words. The list of values to the right of the DB or
- DW serves two purposes. It specifies how many bytes or words are
- allocated by the statement, as well as what their initial values
- should be. The list of values may contain a single value or more
- than one, separated by commas. The list can even be missing;
- meaning that we wish to define a byte or word variable at the
- same location as the next variable.
-
- If the data initialization is in the DATA segment, the values
- given are ignored, except as place markers to reserve the
- appropriate number of units of storage. The use of "?", which in
- .COM mode is a synonym for zero, is recommended in this context
- to emphasize the lack of actual memory initialization. When A86
- is assembling .OBJ files, the ?-initialization will cause a break
- in the segment (unless ? is embedded in a nested DUP containing
- non-? terms, in which case it is a synonym for zero).
-
- A special value which can be used in data initializations is the
- DUP construct, which allows the allocation and/or initialization
- of blocks of data. The expression n DUP x is equivalent to a
- list with x repeated n times. "x" can be either a single value,
- a list of values, or another DUP construct nested inside the
- first one. The nested DUP construct needs to be surrounded by
- parentheses. All other assemblers, and earlier versions of A86,
- require parentheses around all right operands to DUP, even simple
- ones; but this requirement has been removed for simple operands
- in the current A86.
-
- Here are some examples of data initialization statements, with
- and without DUP constructs:
-
-     CODE SEGMENT
-       DW 5                  ; allocate one word, init. to 5
-       DB 0,3,0              ; allocate three bytes, init. to 0,3,0
-       DB 5 DUP 0            ; equivalent to DB 0,0,0,0,0
-       DW 2 DUP (0,4 DUP 7)  ; equivalent to DW 0,7,7,7,7,0,7,7,7,7
-
-     DATA SEGMENT
-       XX      DW ?          ; define a word variable XX
-       YYLOW   DB            ; no init value: YYLOW is low byte of word var YY
-       YY      DW ?
-       X_ARRAY DB 100 DUP ?  ; X_ARRAY is a 100-byte array
-       D_REAL  DQ ?          ; double precision floating variable
-       EX_REAL DT ?          ; extended precision floating variable
-
- A character string value may be used to initialize consecutive
- bytes in a DB statement. Each character will be represented by
- its ASCII code. The characters are stored in the order that they
- appear in the string, with the first character assigned to the
- lowest-addressed byte. In the DB statement that follows, five
- bytes are initialized with the ASCII representation of the
- characters in the string 'HELLO':
-
-     DB 'HELLO'
-
- Note that except for string comparisons described in the previous
- chapter, the DB directive is the only place in your program that
- strings of length greater than 2 may occur. In all other
- contexts (including DW), a string is treated as the constant
- number representing the ASCII value of the string; for example,
- CMP AL,'@' is the instruction comparing the AL register with the
- ASCII value of the at-sign. Note further that 2-character string
- constants, like all constants in the 8086, have their bytes
- reversed. Thus, while DB 'AB' will produce hex 41 followed by
- hex 42, the similar looking DW 'AB' reverses the bytes: hex 42
- followed by hex 41.
-
- For compatibility, A86 now accepts double quotes, as well as
- single quotes, for strings in DB directives.
-
-
- The DD directive is used to initialize 32-bit doubleword pointers
- to locations in arbitrary segments of the 86's memory space.
- Values for such pointers are given by two numbers separated by a
- colon. The segment register value appears to the left of the
- colon; and the offset appears to the right of the colon. In
- keeping with the reversed-bytes nature of memory storage in the
- 86 family, the offset comes first in memory. For example, the
- statement
-
-     DD 01234:05678
-
- appearing in a CODE segment will cause the hex bytes 78 56 34 12
- to be generated, which is a long pointer to segment 01234, offset
- 05678.
-
- DD, DQ, and DT can also be used to initialize large integers and
- floating point numbers. Examples:
-
-     DD 500000  ; half million, too big for most 86 instructions
-     DD 3.5     ; single precision floating point number
-     DQ 3.5     ; the same number in a double precision format
-     DT 3.5     ; the same number in an extended precision format
-
-
- The STRUC Directive
-
- The STRUC directive is used to define a template of data to be
- addressed by one of the 8086's base and/or index registers. The
- syntax of STRUC is as follows:
-
- (optional strucname) STRUC (optional effective address)
-
- The optional structure name given at the beginning of the line
- can appear in subsequent expressions in the program, with the
- operator TYPE applied to it, to yield the number of bytes in the
- structure template.
-
- The STRUC directive causes the assembler to enter a mode similar
- to DATA SEGMENT: assembly within the structure declares symbols
- (the elements of the structure), using a location counter that
- starts out at the address following STRUC. If no address is
- given, assembly starts at location 0. An option not available to
- the DATA SEGMENT is that the address can include one base
- register [BX] or [BP] and/or one index register [SI] or [DI]. The
- registers are part of the implicit declaration of all structure
- elements, with the offset value increasing by the number of bytes
- allocated in each structure line. For example:
-
-     LINE STRUC [BP]       ; the template starts at [BP]
-            DB 80 DUP (?)  ; these 80 bytes advance us to [BP+80]
-     LSIZE  DB ?           ; this 1 byte advances us to [BP+81]
-     LPROT  DB ?
-            ENDS
-
- The STRUC just given defines the variables LSIZE, equivalent to
- B[BP+80], and LPROT, equivalent to B[BP+81]. You can now issue
- instructions such as MOV AL,LSIZE; which automatically generates
- the correct indexing for you.
-
- The mode entered by STRUC is terminated by the ENDS directive,
- which returns the assembler to whatever segment (CODE or DATA) it
- was in before the STRUC, with the location counter restored to
- its value within that segment before the STRUC was declared.
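-
- As a further sketch, here is a structure addressed through BX,
- with the structure name used under the TYPE operator to step from
- one record to the next. The names POINT, PX, and PY are
- hypothetical:
-
-     POINT STRUC [BX]        ; the template starts at [BX]
-       PX  DW ?              ; PX is equivalent to W[BX]
-       PY  DW ?              ; PY is equivalent to W[BX+2]
-     ENDS
-
-       MOV AX,PX             ; load the word at [BX]
-       MOV DX,PY             ; load the word at [BX+2]
-       ADD BX,TYPE POINT     ; TYPE POINT is 4: advance to the next record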
-
-
-
- Forward References
-
- A86 allows names for a variety of program elements to be forward
- referenced. This means that you may use a symbol in one
- statement and define it later with another statement. For
- example:
-
-       JNZ TARGET
-       .
-       .
-     TARGET:
-       ADD AX,10
-
- In this example, a conditional jump is made to TARGET, a label
- farther down in the code. When JNZ TARGET is seen, TARGET is
- undefined, so this is a forward reference.
-
- Earlier versions of A86 were much more restricted in the kinds of
- forward references allowed. Almost all of the restrictions have
- now been eased, for convenience as well as compatibility with
- other assemblers. In particular, you may now make forward
- references to variable names. You just need to see to it that
- A86 has enough information about the type of the operand to
- generate the correct instruction. For example, MOV FOO,AL will
- cause A86 to correctly deduce that FOO is a byte variable. You
- can even code a subsequent MOV FOO,1 and A86 will remember that
- FOO was assumed to be a byte variable. But if you code MOV FOO,1
- first, A86 won't know whether to issue a byte or a word MOV
- instruction; and will thus issue an error message. You then
- specify the type by MOV FOO B,1.
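-
- A short sketch of these rules, with the variables defined after
- the code that uses them (BAR is a hypothetical second variable):
-
-       MOV FOO B,1           ; FOO not yet defined: B says it is a byte
-       MOV BAR,AX            ; AX implies that BAR is a word variable
-       RET
-
-     FOO  DB 0               ; the definitions follow the references
-     BAR  DW 0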
-
- In general, A86's compatibility with other assemblers has
- improved dramatically for forward references. You'll need only
- sprinkle a very few B's and W's into your references. And you'll
- be rewarded: in many cases the word form is longer than the byte
- form, so that other assemblers wind up inserting a wasted NOP in
- your program. You'll wind up with tighter code by using A86!
-
-
- Forward References in Expressions
-
- A86 now allows you to include any number of forward-reference
- symbols in expressions of arbitrary complexity. If the
- expression is legal when the forward references are resolved,
- then it will be accepted by the assembler.
-
- A86 will also accept the reserved symbol END as a
- forward-reference quantity, either by itself as an operand, or
- within an expression. END will be resolved when assembly is
- complete, as a label pointing to the end of the program.
-
- For example, suppose you wish to advance the ES segment register
- to point immediately beyond your program. You can code:
-
-     MOV AX,CS           ; fetch the program's segment value
-     ADD AX,(END+15)/16  ; add in the number of paragraphs
-     MOV ES,AX           ; ES is now loaded as desired
-
-
-
- The EQU Directive
-
- Syntax: symbol-name EQU expression
-         symbol-name EQU built-in-symbol
-         symbol-name EQU INT n
-
- The expression field may specify an operand of any type that
- could appear as an operand to an instruction.
-
- As a simple example, suppose you are writing a program that
- manipulates a table containing 100 names and that you want to
- refer to the maximum number of names throughout the source file.
- You can, of course, use the number 100 to refer to this maximum
- each time, as in MOV CX,100, but this approach suffers from two
- weaknesses. First of all, 100 can mean a lot of things; in the
- absence of comments, it is not obvious that a particular use of
- 100 refers to the maximum number of names. Secondly, if you
- extend the table to allow 200 names, you will have to locate each
- 100 and change it to a 200. Suppose, instead, that you define a
- symbol to represent the maximum number of names with the
- following statement:
-
-     MAX_NAMES EQU 100
-
- Now when you use the symbol MAX_NAMES instead of the number 100
- (for example, MOV CX,MAX_NAMES), it will be obvious that you are
- referring to the maximum number of names in the table. Also, if
- you decide to extend the table, you need only change the 100 in
- the EQU directive to a 200 and every reference to MAX_NAMES will
- reflect the change.
-
- You could also take advantage of A86's strong typing, by changing
- MAX_NAMES to a variable:
-
-     MAX_NAMES DB ?
-
- or even an indexed quantity:
-
-     MAX_NAMES EQU [BX+1]
-
- Because the A86 language is strongly typed, the instruction for
- loading MAX_NAMES into the CX register remains exactly the same
- in all cases: simply MOV CX,MAX_NAMES.
-
-
- Equates to Built-In Symbols
-
- A86 allows you to define synonyms for any of the assembler
- reserved symbols, by EQUating an alternate name of your choosing,
- to that symbol. For example, suppose you were coding a source
- module that is to be incorporated into several different
- programs. In some programs, a certain variable will exist in the
- code segment. In others, it will exist in the stack segment. You
- want to address the variable in the common source module, but you
- don't know which segment override to use. The solution is to
- declare a synonym, QS, for the segment register. QS will be
- defined by each program: the code-segment program will have a QS
- EQU CS at the top of it; the stack-segment program will have QS
- EQU SS. The source module can use QS as an override, just as if
- it were CS or SS. The code would be, for example, QS MOV
- AL,VARNAME.
-
- The NIL Prefix
-
- A86 provides a mnemonic, NIL, that generates no code. NIL can be
- used as a prefix to another instruction (which will have no
- effect on that instruction), or it can appear by itself on a
- line. NIL is provided to extend the example in the previous
- section, to cover the possibility of no overrides. If your
- source module goes into a program that fits into 64K, so that all
- the segment registers have the same value, then code QS EQU NIL
- at the top of that program.
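-
- Putting the QS example together, a sketch of the two pieces might
- look as follows. VARNAME is assumed to be defined elsewhere in the
- program:
-
-     QS EQU CS               ; per program: CS, SS, or NIL
-
-       QS MOV AL,VARNAME     ; CS override here; no prefix under NIL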
-
-
- Interrupt Equates
-
- A86 allows you to equate your own name to an INT instruction with
- a specific interrupt number. For example, if you place TRAP EQU
- INT 3 at the top of your program, you can use the name TRAP as a
- synonym for INT 3 (the debugger trap on the 8086).
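-
- In source form, that usage would look something like this:
-
-     TRAP EQU INT 3
-
-       MOV AX,0
-       TRAP                  ; assembles exactly as INT 3 would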
-
-
- Duplicate Definitions
-
- A86 contains the unique feature of duplicate definitions. We
- have already discussed local symbols, which can be redefined to
- different values without restriction. Local symbols are the only
- symbols that can be redefined. However, any symbol can be
- defined more than once, as long as the symbol is defined to be
- the same value and type in each definition.
-
- This feature has two uses. First, it eases modular program
- development. For example, if two independently-developed source
- files both use the symbol ESC to stand for the ASCII code for
- ESCAPE, they can both contain the declaration ESC EQU 01B, with
- no problems if they are combined into the same program.
-
- The second use for this feature is assertion checking. Your
- deliberate redeclaration of a symbol name is an assertion that
- the value of the symbol has not changed; and you want the
- assembler to issue you an error message if it has changed.
- Example: suppose you have declared a table of options in your
- DATA segment; and you have another table of initial values for
- those options in your CODE segment. If you come back months
- later and add an option to your tables, you want to be reminded
- to update both tables in the same way. You should declare your
- tables as follows:
-
-     DATA SEGMENT
-     OPTIONS:
-       .
-       .
-     OPT_COUNT EQU $-OPTIONS    ; OPT_COUNT is the size of the table
-
-     CODE SEGMENT
-     OPT_INITS:
-       .
-       .
-     OPT_COUNT EQU $-OPT_INITS  ; second OPT_COUNT had better be the same!
-
- The = Directive
-
- Syntax: symbol-name = expression
-         symbol-name = built-in-symbol
-         symbol-name = INT n
-
- The equals sign directive is provided for compatibility. It is
- identical to the EQU directive, with one exception: if the first
- time a symbol appears in a program is in an = directive, that
- symbol will be taken as a local symbol. It can be redefined to
- other values, just like the generic local symbols (letter
- followed by digits) that A86 supports. (If you try to redefine an
- EQU symbol to a different value, you get an error message.) The =
- facility is most often used to define "assembler variables", that
- change value as the assembly progresses.
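-
- A minimal sketch of such an assembler variable follows. The name
- INDEX is hypothetical, and the sketch assumes that an = expression
- may refer to the symbol's previous value:
-
-     INDEX = 0
-       DB INDEX              ; assembles a byte containing 0
-     INDEX = INDEX+1         ; redefinition is allowed for an = symbol
-       DB INDEX              ; assembles a byte containing 1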
-
-
- The PROC Directive
-
- Syntax: name PROC NEAR
-         name PROC FAR
-         name PROC
-
- PROC is a directive provided for compatibility with Intel/IBM
- assemblers. I don't like PROC; and I recommend that you do not
- use it, even if you are programming for those assemblers.
-
- The idea behind PROC is to give the assembler a mechanism whereby
- it can decide for you what kind of RET instruction you should be
- providing. If you specify NEAR in your PROC directive, then the
- assembler will generate a near (same segment) return when it sees
- RET. If you specify FAR in your PROC directive, the assembler
- will generate a far RETF return (which will cause both IP and CS
- to be popped from the stack). If you simply leave well enough
- alone, and never code a PROC in your program, then RET will mean
- near return throughout your program.
-
- The reason I don't like PROC is that it is yet another attempt
- by the assembler to do things "behind your back". This goes
- against the reason why you are programming in assembly language
- in the first place, which is to have complete control over the
- code generated by your source program. It leads to nothing but
- trouble and confusion.
-
- Another problem with PROC is its verbosity. It replaces a simple
- colon, given right after the label it defines. This creates a
- visual clutter in the program, that makes the program harder to
- read.
-
- A86 provides an explicit RETF mnemonic so that you don't need to
- use PROC to distinguish between near and far return instructions.
- You can use RET for a near return and RETF for a far return.
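-
- For instance, two routines written without PROC might look like
- this (the label names are hypothetical):
-
-     NEAR_SUB:               ; reached by a near CALL
-       INC AX
-       RET                   ; near return: pops IP only
-
-     FAR_SUB:                ; reached by a far CALL
-       INC AX
-       RETF                  ; far return: pops IP and CS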
-
- The ENDP Directive
-
- Syntax: [name] ENDP
-
- The only action A86 takes when it sees an ENDP directive is to
- return the assembler to its (sane) default state, in which RET is
- a near return.
-
- NOTE that this means that A86 does not support nested PROCs, in
- which anything but the innermost PROC has the FAR attribute. I'm
- sorry if I am blunt, but anybody who would subject their program
- to that level of syntactic clutter has rocks in their head.
-
-
- The LABEL Directive
-
- Syntax: name LABEL NEAR
-         name LABEL FAR
-         name LABEL BYTE
-         name LABEL WORD
-         name LABEL DWORD
-         name LABEL QWORD
-         name LABEL TBYTE
-
- LABEL is another directive provided for compatibility with
- Intel/IBM assemblers. A86 provides less verbose ways of
- specifying all the above LABEL forms, except for LABEL FAR.
-
- LABEL defines "name" to have the type given, and a value equal to
- the current output pointer. Thus, LABEL NEAR is synonymous with
- a simple colon following the name; and LABEL BYTE, LABEL WORD,
- LABEL DWORD, etc., are synonymous with DB, DW, DD, etc., with no
- operands.
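-
- As a sketch of that equivalence, the following lines overlay a
- word variable on two byte variables (all three names are invented
- for the example):
-
-     DATA SEGMENT
-       WHOLE_W LABEL WORD    ; same as WHOLE_W DW with no operands
-       LOW_B   DB ?          ; low byte of WHOLE_W
-       HIGH_B  DB ?          ; high byte of WHOLE_W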
-
- LABEL FAR does have a unique functionality, not found in other
- assemblers. It identifies "name" as a procedure that can be
- called from outside this program's code segment. Such procedures
- should have RETFs instead of RETs. Furthermore, I have provided
- the following feature, unique to A86: if you CALL the procedure
- from within your program, A86 will generate a PUSH CS instruction
- followed by a NEAR call to the procedure. Other assemblers will
- generate a FAR call, having the same functional effect; but the
- FAR call consumes more program space, and takes more time to
- execute.
-
- WARNING: you cannot use the above CALL feature as a forward
- reference; the LABEL FAR definition must precede any CALLs to it.
- This is unavoidable, since the assembler must assume that a CALL
- to an undefined symbol takes 3 program bytes. All assemblers
- will issue an error in this situation.
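-
- A sketch of the ordering this requires, using hypothetical label
- names, with the LABEL FAR definition placed ahead of the CALL:
-
-       JMP MAIN              ; skip over the procedure
-     FAR_SUB LABEL FAR       ; defined before any CALL to it
-       INC AX
-       RETF                  ; far procedures return with RETF
-     MAIN:
-       CALL FAR_SUB          ; A86 emits PUSH CS, then a near CALL
-       RET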
-
- The INCLUDE Directive
-
- A86 now allows the inclusion of alternate source files within the
- middle of a "parent" source file, via the INCLUDE directive.
- When you give the name INCLUDE followed by the name of a file,
- A86 will insert the contents of the named file into the assembly
- source stream, as if it were substituted for the INCLUDE line.
- There is no limit to the size of an INCLUDE file, and INCLUDEs
- may be nested (the file included may itself contain INCLUDE
- directives) to any level within reason. Parentheses are optional
- around the file name; if you don't give them, there must be at
- least one blank between the INCLUDE and the file name.
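-
- Both forms are sketched below; the file names are hypothetical:
-
-     INCLUDE EQUATES.8       ; blank between INCLUDE and the file name
-     INCLUDE (TABLES.8)      ; the same directive, with parentheses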
-
- If there is no file name whatever following the INCLUDE, A86 will
- perform an A86LIB library search (see Chapter 13), and INCLUDE
- all library files necessary to resolve all undefined symbols at
- the point of the INCLUDE. This provides an "in-file" equivalent
- to the pound-sign given on the invocation line.
-
-