Microsoft Programmer's Library 1.3


Microsoft(R) QuickAssembler Programmer's Guide
Version 2.01
════════════════════════════════════════════════════════════════════════════

Information in this document is subject to change without notice and does not represent a commitment on the part of Microsoft Corporation. The software described in this document is furnished under a license agreement or nondisclosure agreement. The software may be used or copied only in accordance with the terms of the agreement. It is against the law to copy the software on any medium except as specifically allowed in the license or nondisclosure agreement. No part of this manual may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying and recording, for any purpose without the express written permission of Microsoft.

(C)Copyright Microsoft Corporation, 1989. All rights reserved.
Simultaneously published in the U.S. and Canada.
Printed and bound in the United States of America.

Microsoft, MS, MS-DOS, GW-BASIC, QuickC, and XENIX are registered trademarks of Microsoft Corporation.
IBM is a registered trademark of International Business Machines Corporation.
Intel is a registered trademark of Intel Corporation.

Document No. LN0114-201-R00-0689
Part No. 06792
10 9 8 7 6 5 4 3 2 1

────────────────────────────────────────────────────────────────────────────

Table of Contents

Introduction

Chapter 1 The QuickAssembler Interface
  1.1 Creating the Program
  1.2 Building and Running a Program
  1.3 Assembling from the Command Line
  1.4 Choosing C or Assembler Defaults
  1.5 Using the Quick Advisor (Help)
  1.6 Debugging Assembly Code
    1.6.1 Debugging .COM Files
    1.6.2 Specifying Expressions
    1.6.3 Tracing Execution
    1.6.4 Modifying Registers and Flags
  1.7 Viewing a Listing File

Chapter 2 Introducing 8086 Assembly Language
  2.1 Programming the 8086 Family
  2.2 Instructions, Directives, and Operands
    2.2.1 The Name Field
    2.2.2 The Operation Field
    2.2.3 The Operand Field
    2.2.4 The Comment Field
    2.2.5 Entering Numbers in Different Bases
    2.2.6 Line-Continuation Character
  2.3 8086-Family Instructions
    2.3.1 Data-Manipulation Instructions
      2.3.1.1 The MOV Instruction
      2.3.1.2 The ADD Instruction
      2.3.1.3 The SUB Instruction
      2.3.1.4 The INC and DEC Instructions
      2.3.1.5 The AND Instruction
      2.3.1.6 The MUL Instruction
    2.3.2 Control-Flow Instructions
      2.3.2.1 The JMP Instruction
      2.3.2.2 The CMP Instruction
      2.3.2.3 The Conditional Jump Instructions
  2.4 Declaring Simple Data Objects
  2.5 8086-Family Registers
    2.5.1 The General-Purpose Registers
      2.5.1.1 The AX Register
      2.5.1.2 The BX Register
      2.5.1.3 The CX Register
      2.5.1.4 The DX Register
    2.5.2 The Index Registers
    2.5.3 The Pointer Registers
      2.5.3.1 The BP Register
      2.5.3.2 The SP Register
      2.5.3.3 The IP Register
    2.5.4 The Flags Register
  2.6 Addressing Modes
    2.6.1 Immediate Operands
    2.6.2 Register Operands
    2.6.3 Direct Memory Operands
    2.6.4 Indirect Memory Operands
  2.7 Segmented Addressing and Segment Registers

Chapter 3 Writing Assembly Modules for C Programs
  3.1 A Skeleton for Procedure Modules
    3.1.1 The .MODEL Directive
    3.1.2 The .CODE Directive
    3.1.3 The PROC Directive
    3.1.4 The ENDP and END Statements
  3.2 Instructions Used in This Chapter
  3.3 Decimal Conversion Example
  3.4 Decimal Conversion with Far Data Pointers
    3.4.1 Writing a Model-Independent Procedure
    3.4.2 Accessing Far Data through ES
  3.5 Hexadecimal Conversion Example

Chapter 4 Writing Stand-Alone Assembly Programs
  4.1 A Skeleton for Stand-Alone Programs
    4.1.1 The .MODEL Directive
    4.1.2 The .STACK, .CODE, and .DATA Directives
    4.1.3 The .STARTUP Directive
  4.2 Instructions Used in This Chapter
  4.3 A Program That Says Hello
  4.4 Inside the Stack Segment
  4.5 Inside the Data Segment
  4.6 Inside the Code Segment
  4.7 Making the Program Repeat Itself
  4.8 Creating .COM Files
  4.9 Creating .COM Files with Full Segment Definitions

Chapter 5 Defining Segment Structure
  5.1 Simplified Segment Directives
    5.1.1 Understanding Memory Models
    5.1.2 Specifying DOS Segment Order
    5.1.3 Defining Basic Attributes of the Module
    5.1.4 Defining Simplified Segments
      5.1.4.1 How to Use Simplified Segments
      5.1.4.2 How Simplified Segments Are Implemented
    5.1.5 Using Predefined Segment Equates
    5.1.6 Simplified Segment Defaults
    5.1.7 Default Segment Names
  5.2 Full Segment Definitions
    5.2.1 Setting the Segment-Order Method
    5.2.2 Defining Full Segments
      5.2.2.1 Controlling Alignment with Align Type
      5.2.2.2 Defining Segment Combinations with Combine Type
      5.2.2.3 Controlling Segment Structure with Class Type
  5.3 Defining Segment Groups
  5.4 Associating Segments with Registers
  5.5 Initializing Segment Registers
    5.5.1 Initializing the CS and IP Registers
    5.5.2 Initializing the DS Register
    5.5.3 Initializing the SS and SP Registers
    5.5.4 Initializing the ES Register
  5.6 Nesting Segments

Chapter 6 Defining Constants, Labels, and Variables
  6.1 Constants
    6.1.1 Integer Constants
      6.1.1.1 Specifying Integers with Radix Specifiers
      6.1.1.2 Setting the Default Radix
    6.1.2 Packed Binary Coded Decimal Constants
    6.1.3 Real-Number Constants
    6.1.4 String Constants
    6.1.5 Determining Floating-Point Format
  6.2 Assigning Names to Symbols
  6.3 Using Type Specifiers
  6.4 Defining Code Labels
    6.4.1 Near-Code Labels
    6.4.2 Anonymous Labels
    6.4.3 Procedure Labels
    6.4.4 Code Labels Defined with the LABEL Directive
  6.5 Defining and Initializing Data
    6.5.1 Variables
      6.5.1.1 Integer Variables
      6.5.1.2 Binary Coded Decimal Variables
      6.5.1.3 String Variables
      6.5.1.4 Real-Number Variables
    6.5.2 Arrays and Buffers
    6.5.3 Labeling Variables
    6.5.4 Pointer Variables
  6.6 Setting the Location Counter
  6.7 Aligning Data

Chapter 7 Using Structures and Records
  7.1 Structures
    7.1.1 Declaring Structure Types
    7.1.2 Defining Structure Variables
    7.1.3 Using Structure Operands
  7.2 Records
    7.2.1 Declaring Record Types
    7.2.2 Defining Record Variables
    7.2.3 Using Record Operands and Record Variables
    7.2.4 Record Operators
      7.2.4.1 The MASK Operator
      7.2.4.2 The WIDTH Operator
    7.2.5 Using Record-Field Operands

Chapter 8 Creating Programs from Multiple Modules
  8.1 Declaring Symbols Public
  8.2 Declaring Symbols External
  8.3 Using Multiple Modules
  8.4 Declaring Symbols Communal
  8.5 Specifying Library Files

Chapter 9 Using Operands and Expressions
  9.1 Using Operands with Directives
  9.2 Using Operators
    9.2.1 Calculation Operators
      9.2.1.1 Arithmetic Operators
      9.2.1.2 Structure-Field-Name Operator
      9.2.1.3 Index Operator
      9.2.1.4 Shift Operators
      9.2.1.5 Bitwise Logical Operators
    9.2.2 Relational Operators
    9.2.3 Segment-Override Operator
    9.2.4 Type Operators
      9.2.4.1 PTR Operator
      9.2.4.2 SHORT Operator
      9.2.4.3 THIS Operator
      9.2.4.4 HIGH and LOW Operators
      9.2.4.5 SEG Operator
      9.2.4.6 OFFSET Operator
      9.2.4.7 .TYPE Operator
      9.2.4.8 TYPE Operator
      9.2.4.9 LENGTH Operator
      9.2.4.10 SIZE Operator
    9.2.5 Operator Precedence
  9.3 Using the Location Counter
  9.4 Using Forward References
    9.4.1 Forward References to Labels
    9.4.2 Forward References to Variables
  9.5 Strong Typing for Memory Operands

Chapter 10 Assembling Conditionally
  10.1 Using Conditional-Assembly Directives
    10.1.1 Testing Expressions with IF and IFE Directives
    10.1.2 Testing the Pass with IF1 and IF2 Directives
    10.1.3 Testing Symbol Definition with IFDEF and IFNDEF Directives
    10.1.4 Verifying Macro Parameters with IFB and IFNB Directives
    10.1.5 Comparing Macro Arguments with IFIDN and IFDIF Directives
    10.1.6 ELSEIF Directives
  10.2 Using Conditional-Error Directives
    10.2.1 Generating Unconditional Errors with .ERR, .ERR1, and . Directives
    10.2.2 Testing Expressions with .ERRE or .ERRNZ Directives
    10.2.3 Verifying Symbol Definition with .ERRDEF and .ERRNDEF Directives
    10.2.4 Testing for Macro Parameters with .ERRB and .ERRNB Directives
    10.2.5 Comparing Macro Arguments with .ERRIDN and .ERRDIF Directives

Chapter 11 Using Equates, Macros, and Repeat Blocks
  11.1 Using Equates
    11.1.1 Redefinable Numeric Equates
    11.1.2 Nonredefinable Numeric Equates
    11.1.3 String Equates
    11.1.4 Predefined Equates
  11.2 Using Macros
    11.2.1 Defining Macros
    11.2.2 Calling Macros
    11.2.3 Using Local Symbols
    11.2.4 Exiting from a Macro
  11.3 Text-Macro String Directives
    11.3.1 The SUBSTR Directive
    11.3.2 The CATSTR Directive
    11.3.3 The SIZESTR Directive
    11.3.4 The INSTR Directive
    11.3.5 Using String Directives Inside Macros
  11.4 Defining Repeat Blocks
    11.4.1 The REPT Directive
    11.4.2 The IRP Directive
    11.4.3 The IRPC Directive
  11.5 Using Macro Operators
    11.5.1 Substitute Operator
    11.5.2 Literal-Text Operator
    11.5.3 Literal-Character Operator
    11.5.4 Expression Operator
    11.5.5 Macro Comments
  11.6 Using Recursive, Nested, and Redefined Macros
    11.6.1 Using Recursion
    11.6.2 Nesting Macro Definitions
    11.6.3 Nesting Macro Calls
    11.6.4 Redefining Macros
    11.6.5 Avoiding Inadvertent Substitutions
  11.7 Managing Macros and Equates
    11.7.1 Using Include Files
    11.7.2 Purging Macros from Memory

Chapter 12 Controlling Assembly Output
  12.1 Sending Messages to the Standard Output Device
  12.2 Controlling Page Format in Listings
    12.2.1 Setting the Listing Title
    12.2.2 Setting the Listing Subtitle
    12.2.3 Controlling Page Breaks
    12.2.4 Naming the Module
  12.3 Controlling the Contents of Listings
    12.3.1 Suppressing and Restoring Listing Output
    12.3.2 Controlling Listing of Conditional Blocks
    12.3.3 Controlling Listing of Macros

Chapter 13 Loading, Storing, and Moving Data
  13.1 Transferring Data
    13.1.1 Copying Data
    13.1.2 Exchanging Data
    13.1.3 Looking Up Data
    13.1.4 Transferring Flags
  13.2 Converting between Data Sizes
    13.2.1 Extending Signed Values
    13.2.2 Extending Unsigned Values
  13.3 Loading Pointers
    13.3.1 Loading Near Pointers
    13.3.2 Loading Far Pointers
  13.4 Transferring Data to and from the Stack
    13.4.1 Pushing and Popping
    13.4.2 Using the Stack
    13.4.3 Saving Flags on the Stack
    13.4.4 Saving All Registers on the Stack
  13.5 Transferring Data to and from Ports

Chapter 14 Doing Arithmetic and Bit Manipulations
  14.1 Adding
    14.1.1 Adding Values Directly
    14.1.2 Adding Values in Multiple Registers
  14.2 Subtracting
    14.2.1 Subtracting Values Directly
    14.2.2 Subtracting with Values in Multiple Registers
  14.3 Multiplying
  14.4 Dividing
  14.5 Calculating with Binary Coded Decimals
    14.5.1 Unpacked BCD Numbers
    14.5.2 Packed BCD Numbers
  14.6 Doing Logical Bit Manipulations
    14.6.1 AND Operations
    14.6.2 OR Operations
    14.6.3 XOR Operations
    14.6.4 NOT Operations
  14.7 Shifting and Rotating Bits
    14.7.1 Multiplying and Dividing by Constants
    14.7.2 Moving Bits to the Least-Significant Position
    14.7.3 Adjusting Masks
    14.7.4 Shifting Multiword Values

Chapter 15 Controlling Program Flow
  15.1 Jumping
    15.1.1 Jumping Unconditionally
    15.1.2 Jumping Conditionally
      15.1.2.1 Comparing and Jumping
      15.1.2.2 Jumping Based on Flag Status
      15.1.2.3 Testing Bits and Jumping
  15.2 Looping
  15.3 Using Procedures
    15.3.1 Calling Procedures
    15.3.2 Defining Procedures
    15.3.3 Passing Arguments on the Stack
    15.3.4 Declaring Parameters with the PROC Directive
    15.3.5 Using Local Variables
    15.3.6 Creating Locals Automatically
    15.3.7 Variable Scope
    15.3.8 Setting Up Stack Frames
  15.4 Using Interrupts
    15.4.1 Calling Interrupts
    15.4.2 Defining and Redefining Interrupt Routines
  15.5 Checking Memory Ranges

Chapter 16 Processing Strings
  16.1 Setting Up String Operations
  16.2 Moving Strings
  16.3 Searching Strings
  16.4 Comparing Strings
  16.5 Filling Strings
  16.6 Loading Values from Strings
  16.7 Transferring Strings to and from Ports

Chapter 17 Calculating with a Math Coprocessor
  17.1 Coprocessor Architecture
    17.1.1 Coprocessor Data Registers
    17.1.2 Coprocessor Control Registers
  17.2 Emulation
  17.3 Using Coprocessor Instructions
    17.3.1 Using Implied Operands in the Classical-Stack Form
    17.3.2 Using Memory Operands
    17.3.3 Specifying Operands in the Register Form
    17.3.4 Specifying Operands in the Register-Pop Form
  17.4 Coordinating Memory Access
  17.5 Transferring Data
    17.5.1 Transferring Data to and from Registers
      17.5.1.1 Real Transfers
      17.5.1.2 Integer Transfers
      17.5.1.3 Packed BCD Transfers
    17.5.2 Loading Constants
    17.5.3 Transferring Control Data
  17.6 Doing Arithmetic Calculations
  17.7 Controlling Program Flow
    17.7.1 Comparing Operands to Control Program Flow
      17.7.1.1 Compare
      17.7.1.2 Compare and Pop
    17.7.2 Testing Control Flags after Other Instructions
  17.8 Using Transcendental Instructions
  17.9 Controlling the Coprocessor

Chapter 18 Controlling the Processor
  18.1 Controlling Timing and Alignment
  18.2 Controlling the Processor
  18.3 Processor Directives

Appendix A Mixed-Language Mechanics
  A.1 Writing the Assembly Procedure
    A.1.1 Setting Up the Procedure
    A.1.2 Entering the Procedure
    A.1.3 Allocating Local Data (Optional)
    A.1.4 Preserving Register Values
    A.1.5 Accessing Parameters
    A.1.6 Returning a Value (Optional)
    A.1.7 Exiting the Procedure
  A.2 Calls from Modules Using C Conventions
  A.3 Calls from Non-C Modules
  A.4 Calling High-Level Languages from Assembly Language
  A.5 Using Full Segment Definitions

Appendix B Using Assembler Options with QCL
  B.1 Specifying the Segment-Order Method
  B.2 Checking Code for Tiny Model
  B.3 Selecting Case Sensitivity
  B.4 Defining Assembler Symbols
  B.5 Displaying Error Lines on the Screen
  B.6 Creating Code for a Floating-Point Emulator
  B.7 Creating Listing Files
  B.8 Enabling One-Pass Assembly
  B.9 Listing All Lines of Macro Expansions
  B.10 Creating a Pass 1 Listing
  B.11 Specifying an Editor-Oriented Listing
  B.12 Suppressing Tables in the Listing File
  B.13 Adding a Line-Number Index to the Listing
  B.14 Listing False Conditionals
  B.15 Controlling Display of Assembly Statistics
  B.16 Setting the Warning Level

Appendix C Reading Assembly Listings
  C.1 Reading Code in a Listing
  C.2 Reading a Macro Table
  C.3 Reading a Structure and Record Table
  C.4 Reading a Segment and Group Table
  C.5 Reading a Symbol Table
  C.6 Reading Assembly Statistics
  C.7 Reading a Pass 1 Listing

Index

────────────────────────────────────────────────────────────────────────────

Introduction

If you're a C programmer who has been wanting to try out the full power of assembly language, this is the product for you. Microsoft(R) QuickC(R) with QuickAssembler is a package you install along with Microsoft QuickC Version 2.0 to create a single powerful environment in which you can develop C, assembly, and mixed-language programs. What's more, QuickAssembler is an integrated environment, containing tools for editing, assembling, compiling, and linking. Integrated tools help you develop assembly-language programs faster.

Each MS-DOS(R) and IBM(R) PC-DOS computer is driven by one of the processors in the 8086 family. A processor is the central motor of a computer. It responds to its own numeric language, called "machine code." Assembly language is very close to machine code, but it lets you use meaningful keywords and variable names instead of difficult-to-remember numeric codes. As a result, assembly language is far more convenient to use than machine code, yet it still gives you the ultimate in ability to control hardware and optimize code.

To support the low-level operations of assembly language, QuickAssembler expands the general power of the QuickC environment. Increased debugging capabilities let you change flag settings and modify registers, including the registers of the 8087 math coprocessor.
Furthermore, the Quick Advisor (the on-line Help system) is expanded to provide help on QuickAssembler keywords as well as on DOS and ROM-BIOS services.

A Note about Operating-System Terms

Microsoft documentation uses the term "OS/2" to refer to the OS/2 systems: Microsoft Operating System/2 (MS(R) OS/2) and IBM OS/2. Similarly, the term "DOS" refers to both the MS-DOS and IBM Personal Computer DOS operating systems. The name of a specific operating system is used when it is necessary to note features unique to that system.

General Features

QuickAssembler does not replace the QuickC in-line assembler, which you can continue to use inside .C files. The joint QuickC/QuickAssembler environment puts both QuickAssembler and the in-line assembler at your disposal. But Microsoft QuickAssembler supports a number of features beyond those supported by the in-line assembler:

■ You can write stand-alone assembly programs. These programs begin and end with assembly code and do not include the C start-up code. Unlike programs written from within C modules, useful stand-alone assembly programs can be 1K (kilobyte) or even smaller.

■ You can use the assembler's rich set of macro-definition capabilities, which go far beyond the macro capabilities supported by C. An assembly-language macro can handle variable parameter lists, recursion, and repeated operations. These macros are roughly as powerful and flexible as procedure calls, but they execute faster because they are expanded in line, avoiding the overhead of a call and return.

■ Your assembly modules can be shared by many different programs. Since an assembly-language module is in its own file, you can write the module once and link it to any program you want.

■ QuickAssembler is a full implementation of 8086 assembly language. You can use the full set of the Microsoft Macro Assembler 5.1 directives and operators.
In addition, QuickAssembler provides the best set of keywords yet available for simplifying tedious programming tasks, such as initializing registers at the beginning of a program or determining how to access parameters on the stack. (Part 1 of this manual focuses on the use of these keywords.)

QuickAssembler for QuickC is a DOS-based product, and it does not include the following extensions to 8086 assembly language:

■ 80386 extended registers and special instructions

■ 80387 extended instructions

■ OS/2 protected-mode operation

QuickAssembler does support the 80286 extended instruction set, as well as the 8087 and 80287 coprocessors. The 80386 processor can run all QuickAssembler programs; the only limitation is that QuickAssembler does not support the extended capabilities of the 80386. The Microsoft Macro Assembler supports 80386 extended features and the development of protected-mode applications.

System Requirements

In addition to a computer with one of the 8086-family processors, you must have Version 2.1 or later of the MS-DOS or IBM PC-DOS operating system. You can also run QuickAssembler in the 3.x compatibility box of OS/2 systems. Your system needs approximately 512K of memory. A hard-disk setup is strongly recommended.

To enable the use of QuickAssembler, you should first choose Full Menus from the Options menu.

──────────────────────────────────────────────────────────────────────────
NOTE
The 8086 family is a set of processors that all support the same basic instruction set. This family includes the 8088, 8086, 80188, 80186, 80286, and 80386 chips. All of these processors support the entire instruction set of the 8086 itself; some support additional instructions. Rather than list the entire set of chips, this manual often discusses the core instruction set by referring only to the 8086.
──────────────────────────────────────────────────────────────────────────

Installing QuickAssembler

If you purchased QuickC and QuickAssembler together, the installation procedure described in Up and Running automatically installs both products. A few of the questions shown in that booklet are reworded in the install program to make more sense for the joint QuickC/QuickAssembler installation.

If you purchased QuickAssembler separately, run the installation program on the QuickAssembler distribution disks. The first screen asks you the following questions:

     Source of assembler files [A:]:
     Installing on a hard disk drive [Y]:
     Copy QuickAssembler documentation files [Y]:
     Copy sample Assembler programs [N]:
     Do you want to change any of the above options? [N]

As with the QuickC installation program, the default responses are indicated in brackets ([]). Each of these questions is accompanied by an explanation at the bottom of the screen. To accept a default response, press ENTER. If you enter an incorrect response, answer yes (Y) to the last question so that you can change it.

The second screen asks you the following questions:

     Directory for QuickC executable files [C:\QC2\BIN]:
     Directory for Sample files [C:\QC2\SAMPLES]:
     Do you want to change any of the above options? [N]

The QuickAssembler installation program replaces some of the existing QuickC files, so QuickAssembler must be installed in the directory that currently contains QC.EXE. Make sure you enter the location of your current QuickC executable files. If you're not sure, press CTRL+C to stop the installation and examine your setup.

Getting Information about Assembly Language

The combined paper and on-line documentation with QuickAssembler gives you a complete reference to the language. This manual provides three basic kinds of information:

■ Part 1, "Introducing QuickAssembler," provides a basic introduction to programming in assembly language. Chapter 1 describes how the interface changes when you install QuickAssembler.
Chapter 2 gives a general background to 8086 architecture and assembly-language concepts. Chapters 3 and 4 demonstrate how to use special QuickAssembler keywords to simplify programming. Even if you have used assembly language before, you should take a look at these chapters.

■ Parts 2 ("Using Directives") and 3 ("Using Instructions") provide reference material on directives and instructions. This material is much less tutorial than Part 1, but it does illustrate the use of each directive and instruction in context.

■ The appendixes explain low-level mixed-language techniques, the use of assembler options with the QCL driver, and how to read listing files.

This manual does not teach systems programming or advanced programming techniques. Even with the tutorial material provided in this manual, you may want to purchase other books on assembly language, such as the ones listed in the next section. In addition, this manual assumes you understand certain basic concepts of programming, such as modules, variables, and pointers. If you need more background in one of these topics, you should first read the appropriate sections in C For Yourself. Part 1 of this manual often explains concepts by comparing a language feature to C.

The Quick Advisor (the on-line Help system) is an integral part of the overall documentation. As explained in Section 1.5, "Using the Quick Advisor (Help)," QuickAssembler provides help on all keywords; in particular, you get instant reference information on each instruction, including timing, encoding, and flag settings. The Help Contents and Index screens also provide information on each DOS service.

Books on Assembly Language

The following books may be useful in learning to program in assembly language:

Duncan, Ray. Advanced MS-DOS. Redmond, WA: Microsoft Press, 1986.
     An intermediate book on writing C and assembly-language programs that interact with MS-DOS (includes DOS and BIOS function descriptions)

Jourdain, Robert. Programmer's Problem Solver for the IBM PC, XT and AT. New York: Brady Communications Company, Inc., 1986.
     Reference of routines and techniques for interacting with hardware devices through DOS, BIOS, and ports (high-level routines in BASIC and low- or medium-level routines in assembler)

Lafore, Robert. Assembly Language Primer for the IBM PC & XT. New York: Plume/Waite, 1984.
     An introduction to assembly language, including some information on DOS function calls and IBM-type BIOS

Metcalf, Christopher D., and Sugiyama, Marc B. COMPUTE!'s Beginner's Guide to Machine Language on the IBM PC & PCjr. Greensboro, NC: COMPUTE! Publications, Inc., 1985.
     Beginning discussion of assembly language, including information on the instruction set and MS-DOS function calls

Microsoft MS-DOS Programmer's Reference. Redmond, WA: Microsoft Press, 1986, 1987.
     Reference manual for MS-DOS

Morgan, Christopher, and the Waite Group. Bluebook of Assembly Routines for the IBM PC. New York: New American Library, 1984.
     Sample assembly routines that can be integrated into assembly or high-level-language programs

Norton, Peter. The Peter Norton Programmer's Guide to the IBM PC. Redmond, WA: Microsoft Press, 1985.
     Information on using IBM-type BIOS and MS-DOS function calls

Scanlon, Leo J. IBM PC Assembly Language: A Guide for Programmers. Bowie, MD: Robert J. Brady Co., 1983.
     An introduction to assembly language, including information on DOS function calls

Schneider, Al. Fundamentals of IBM PC Assembly Language. Blue Ridge Summit, PA: Tab Books Inc., 1984.
     An introduction to assembly language, including information on DOS function calls

These books are listed for your convenience only. Microsoft Corporation does not endorse these books (with the exception of those published by Microsoft) or recommend them over others on the same subjects.
Document Conventions

The following document conventions are used throughout this manual:

Example of               Description
Convention
──────────────────────────────────────────────────────────────────────────
SAMPLE2.ASM              Uppercase letters indicate file names, segment
                         names, registers, and terms used at the
                         DOS-command level.

.MODEL                   Boldface type indicates assembly-language
                         directives, instructions, type specifiers, and
                         predefined equates, as well as keywords in other
                         programming languages.

placeholders             Italic letters indicate placeholders for
                         information you must supply, such as a file
                         name. Italics are also occasionally used for
                         emphasis in the text.

target                   This font is used to indicate example programs,
                         user input, and screen output.

SHIFT                    Names of keys on the keyboard appear in small
                         capital letters. Notice that a plus (+)
                         indicates a combination of keys. For example,
                         CTRL+E means to hold down the CTRL key while
                         pressing the E key.

[[argument]]             Items inside double square brackets are
                         optional.

{register | memory}      Braces and a vertical bar indicate a choice
                         between two or more items. You must choose one
                         of the items unless double square brackets
                         surround the braces.

Repeating elements...    Three dots following an item indicate that more
                         items having the same form may appear.

Program                  A column of three dots tells you that part of a
  .                      program has been intentionally omitted.
  .
  .
Fragment

"processor flag"         The first time a new term is defined, it is
                         enclosed in quotation marks.

Color Graphics           The first time an acronym is used, it is spelled
Adapter (CGA)            out.

Getting Assistance or Reporting Problems

If you need help or feel you have discovered a problem in the software, please provide the following information to help us locate the problem:

■ The version of DOS you are running (use the DOS VER command)

■ Your system configuration (the type of machine you are using, its total memory, and its total free memory at assembler execution time, as well as any other information you think might be useful)

■ The assembly command line used (or the link command line if the problem occurred during linking)

■ Any object files or libraries you linked with if the problem occurred at link time

If your program is very large, please try to reduce its size to the smallest possible program that still produces the problem.

Use the Product Assistance Request form at the back of this manual to send this information to Microsoft.

If you have comments or suggestions regarding any of the manuals accompanying this product, please indicate them on the Document Feedback card at the back of this manual.

If you are not already a registered QuickAssembler owner, you should fill out and return the Registration Card. This enables Microsoft to keep you informed of updates and other information about the assembler.

────────────────────────────────────────────────────────────────────────────

PART 1: Using Assembler Programs

Part 1 of the Programmer's Guide (comprising Chapters 1-4) will help you start using assembly language quickly. Chapter 1 summarizes all the differences between the standard QuickC interface and the expanded QuickC/QuickAssembler interface. Read this chapter to learn how to enter, assemble, and run an assembly-language program.

Read Chapter 2 if you are new to 8086 assembly language or need to review basic concepts. Chapter 2 explains the architecture of 8086-family processors, as well as how to write simple code and data statements.
Whether or not you're new to assembly language, you'll want to read Chapters 3 and 4, which show the use of QuickAssembler's simplified keywords in useful examples. These keywords make programming easier.

────────────────────────────────────────────────────────────────────────────

Chapter 1: The QuickAssembler Interface

After you install Microsoft QuickC with QuickAssembler, you'll have a single environment for both compiling and assembling. You can create C programs, assembly-language programs, and programs that combine both languages. The environment completely supports the standard QuickC features, including all editing commands as well as mouse, keyboard, and menu techniques.

This manual assumes you have read QuickC Up and Running and have used the on-line Help system to learn how to use each menu. Refer to these sources of information for basic help on using the interface.

The combined QuickC/QuickAssembler interface provides some new menu selections and dialog boxes to support development of assembly-language programs. This chapter describes the new features, focusing on areas where the interface adds new functionality: creating a program, building a program, getting help, debugging, and viewing a listing file.

To enable all the features described in this chapter, you should first choose Full Menus from the Options menu if you are not already using full menus.

1.1 Creating the Program

Start the environment with the QC command, regardless of whether you're creating a C or assembly-language source file. You can type QC by itself or QC followed by the name of a file.

By default, QC assumes a .C extension for a file name given on the command line without one. You'll learn how to change this behavior later (by choosing Display from the Options menu), but for now, make sure you include the .ASM file extension when you want to create an assembly-language module:

     QC SAMPLE.ASM

If the file is new, the QuickC/QuickAssembler environment asks you if you would like to create the file.
Once inside the QuickC/QuickAssembler environment, you can enter a program by using all the QuickC editing commands. You can get started by entering the following stand-alone assembly program. By default, QuickAssembler is not case sensitive (except for external symbols), so you can enter statements in uppercase or lowercase.

        .MODEL  small
        .STACK
        .CODE
        .STARTUP
        mov     ah,2
        mov     dl,7
        int     21h
        mov     ax,4c00h
        int     21h
        END

Enter the program above in a file with a .ASM extension. No other modules and no special assembly or link flags are required. When run, the program beeps and exits. For now, you may just want to run the program to see how the QuickC/QuickAssembler environment works. However, you can read the rest of this section for a brief explanation of why the program works.

The first four statements are directives, which give basic structure to the program rather than performing its actions: they declare a memory model, a stack segment, and a code segment, and mark where execution begins (.STARTUP, described in Section 4.1.3). The next five statements perform the actions of the program. The first three set up a call to a DOS function that prints the beep character. (The QuickAssembler Advisor, which you access through the Help menu, provides information on each DOS function.) These three statements are shown below, with comments:

        mov     ah,2            ; Move 2 to AH (select Print function)
        mov     dl,7            ; Move 7 to DL (select Beep character)
        int     21h             ; Call DOS function

The next two statements, shown below with comments, call DOS to exit gracefully. Unlike C programs, assembly-language programs must make an explicit function call to exit; otherwise, the processor goes on executing meaningless instructions beyond the end of the program.

        mov     ax,4c00h        ; Move 4c00h to AX (select Exit
                                ;   function and 0 return code)
        int     21h             ; Call DOS function

The last statement ends the module.

1.2 Building and Running a Program

Once inside the QuickC/QuickAssembler environment, you build an assembly-language program the same way you build a C program.
Choose the Go command from the Run menu, or press the F5 key. The environment assembles and links the program if it needs to be built. Then, if there are no errors, it executes the program.

You can also assemble a program by using the Make menu. The Compile File command assembles your file rather than compiling it, assuming the current file has a .ASM extension.

To help you create assembly-language programs, the QuickC/QuickAssembler interface adds the following extensions to QuickC:

■ A program list can now have .ASM files as well as .C, .OBJ, and .LIB files if you work with multiple modules.

■ The Make dialog box from the Options menu has a new option button: Assembler Flags.

■ The Assembler Flags dialog box lets you control how .ASM files are assembled.

If your program has multiple modules, you can add .ASM files to the program list as well as other kinds of files. When you build the program, the environment compiles each .C file module that needs to be built and assembles each .ASM module that needs to be built. For example, the program list in Figure 1.1 creates a mixed-language program with both C and assembly-language source files.

[Figure 1.1: Program list for a mixed-language program (see Section 1.2 of the printed manual)]

The environment sets the default file extension by looking at the extension of the last file loaded. If the last file loaded had a .ASM extension, the File List field now displays all the .ASM files for the current directory. If the last file loaded had a .C extension, the File List field displays all .C files. You can alter this behavior by choosing Display from the Options menu, as explained in Section 1.4, "Choosing C or Assembler Defaults." In any case, you can always control which files are displayed by entering a wildcard expression, such as *.asm, in the File Name field.
The environment lets you set assembler options as well as compiler options. When you open the Options menu and choose Make, the dialog box shown in Figure 1.2 appears.

[Figure 1.2: The Make dialog box (see Section 1.2 of the printed manual)]

This dialog box contains one new field: Assembler Flags. When you choose this field, a new dialog box, shown in Figure 1.3, appears.

[Figure 1.3: The Assembler Flags dialog box (see Section 1.2 of the printed manual)]

By setting flags in the Assembler Flags dialog box, you control the action of the assembler whenever it builds a program. These settings have no effect on .C modules, but do affect how each .ASM module is assembled.

This dialog box contains a Debug Flags section, which has options that apply only to Debug builds, and a Global Flags section, which has options that apply to every build. Choose the Help button for an explanation of each option.

──────────────────────────────────────────────────────────────────────────
NOTE
You control the type of build operation (Debug or Release) by choosing the appropriate option button in the dialog box shown in Figure 1.2. You can return to that dialog box by choosing the OK or Cancel command button. By choosing Debug (the default), you can use all of the QuickC debugging commands while running the program. By choosing Release, you produce a program that cannot be debugged but is somewhat smaller.
──────────────────────────────────────────────────────────────────────────

The Custom Flags section lets you enter additional options. In the three Custom Flags text boxes, you can type any of the assembly options accepted by the QCL driver. See Appendix B for a description of these options.
The next section describes how to use the QCL driver to assemble programs.

1.3 Assembling from the Command Line

You can run QuickAssembler from the command line, just as you can run QuickC. One utility, QCL, invokes both the assembler and compiler. You can even use it to compile, assemble, and link mixed-language programs in one step. However, make sure you use the version of QCL copied during QuickAssembler installation.

If you type a file name that has a .C extension, QCL invokes the C compiler. For example, the following command compiles and links the file SAMPLE1.C:

    QCL SAMPLE1.C

If you type a file name that has a .ASM extension, QCL invokes QuickAssembler. For example, the following command assembles and links the file SAMPLE2.ASM:

    QCL SAMPLE2.ASM

In any case, QCL links all resulting object files to create a .EXE file, unless you specify /c on the command line. (You can also create a .COM file if the program is written entirely in assembly language.) For example, the following command compiles SAMPLE1.C and assembles SAMPLE2.ASM, but does not link the resulting object files:

    QCL /c SAMPLE1.C /Cl SAMPLE2.ASM

As always, you can specify .LIB files and .OBJ files on the QCL command line. A file with no extension is assumed to have a .OBJ extension by default. For example, the following QCL command compiles M1.C, assembles M2.ASM (with lowercase symbols preserved), and links M1.OBJ, M2.OBJ, and M3.OBJ. Finally, QCL searches M4.LIB for any unresolved references.

    QCL /Cx M1.C M2.ASM M3 M4.LIB

You can specify a number of QuickAssembler options, in addition to the ones provided specifically for C. See Appendix B, "Using Assembler Options with QCL," for a description of all these options.

1.4 Choosing C or Assembler Defaults

At all times, you can use the QuickC/QuickAssembler environment to create either C modules or assembly-language modules. However, there are some details of operation that make it a little easier to work with one language or the other.
For example, one consideration is whether the dialog box starts by displaying all the C files in the directory (*.c) or all the assembly-language files (*.asm) when you choose the Open command from the File menu. You can control this behavior by choosing Display from the Options menu. Figure 1.4 shows the dialog box that appears.

[Figure 1.4: The Display dialog box (see Section 1.4 of the printed manual)]

In the Language section of this dialog box, select C, Assembler, or Auto. The Auto selection uses C or Assembler defaults, depending on what file was last loaded into the active window. For example, if you load the PROG.ASM file into the source window, all the defaults (described below) change to assembly-language settings.

──────────────────────────────────────────────────────────────────────────
NOTE
When you first use QuickAssembler, the environment starts up in Auto mode. Thereafter, it looks at the settings in QC.INI to determine what mode to start in; this feature has the effect of saving display-mode settings between sessions.
──────────────────────────────────────────────────────────────────────────

The following items change when the display mode changes──either because you change the mode manually or because you are in Auto mode and load a different kind of file:

■ For commands on the File menu, the default file name changes to *.c or *.asm.

■ The Include command on the View menu responds to .H files if the display mode is C, or .INC files if the display mode is Assembler.

■ The Index and Contents items from the Help menu bring up lists of topics for either C or Assembly, as determined by the display mode.

Auto display mode assumes C defaults until you load a .ASM file.
When you start the environment with the QC command, QC assumes that file names on the command line have .C extensions, unless the environment is in Assembler display mode.

1.5 Using the Quick Advisor (Help)

QuickAssembler extends the number of topics you can get information on, and updates QCENV.HLP so you can get context-sensitive help on the new menu items and dialog boxes. In addition, you continue to get help on all of the C-language topics. The new topics, added for use with assembly language, are shown below:

■ QuickAssembler instructions

■ QuickAssembler directives and operators

■ DOS and ROM-BIOS services

You can get help on assembly-language topics by using one of two different methods:

1. Topical Help (press F1)

2. The Help menu

At all times, the expanded environment provides topical Help for both assembler and C keywords. Place the cursor on the keyword, then press F1. You can also get topical Help by moving the mouse cursor to the desired word and clicking the Right mouse button. The display mode (described in the previous section) determines whether C help files or assembly help files are searched first.

──────────────────────────────────────────────────────────────────────────
NOTE
If the keyword starts with a dot (.), do not place the cursor on the dot or click on the dot to get topical Help. Place the cursor on the keyword or click on the keyword.
──────────────────────────────────────────────────────────────────────────

QuickAssembler keywords include instructions, directives, and operators. Chapter 2, "Introducing 8086 Assembly Language," provides information on each of these concepts.

An "instruction" is a specific action that the processor executes. Instructions are the primary building blocks of an assembly-language program. The Help screens on instructions are particularly useful, because they provide detailed information on timing, syntax, and processor flags.
This manual features a topical discussion of instructions, but provides only limited information on timing and flags. To write the most efficient assembly-language programs, you should refer often to the on-line Help for instructions.

To get help on DOS or ROM-BIOS services, select Contents or Index from the Help menu. These menu items give you help on assembly-language topics rather than C topics whenever the display mode (described in the previous section) is set to Assembler.

The Help system offers other paths to information on DOS and BIOS functions. Move the mouse cursor to an interrupt number (such as 21H or 33) and click the Right mouse button, or move the cursor to the number and press F1. The Help system responds by showing a screen listing all the functions accessed through that interrupt number. You can then go to the specific Help screen you want. You can also get help on interrupt functions by selecting context-sensitive help for the INT keyword.

You call these DOS and BIOS functions by using the INT instruction, as described in Chapter 4. These services perform basic input and output functions for you, giving you access to DOS and to hardware.

By default, the Smart Help display option is on. This option makes the system more flexible by ignoring the presence or absence of a leading underscore (_) on a name. Consequently, pressing F1 while on _printf gives you help for the printf function.

1.6 Debugging Assembly Code

You can run a Debug build by choosing Debug in the dialog box opened by the Options menu's Make command. Debug is the default setting, so you probably won't need to choose it.

You can use all of QuickC's debugging commands with programs written in assembly language. But keep in mind these considerations:

■ You must use an extra file with a .DBG file extension to debug programs in .COM format.

■ You must use C syntax to specify expressions to watch or modify, even when you debug assembly code.
  In addition, you can use the BY, WO, and DW memory operators, register names, and the colon (:) operator. The colon operator helps to specify segmented addresses.

■ When you trace execution of an assembly-language module, the behavior of the environment changes. Screen swapping is turned on by default, and the first line of code is never highlighted.

■ You can alter flag values and registers from within the environment.

Sections 1.6.1-1.6.4 discuss each debugging feature in turn.

1.6.1 Debugging .COM Files

Section 4.8, "Creating .COM Files," explains how to use the tiny memory model, along with a linker flag, to generate a program in the .COM-file format. A .COM file has a total size limitation of 64K, but is slightly smaller and loads faster than a similar .EXE file.

When you run a Debug build that creates a .COM file, the linker places debugging information in a separate file with the same base name as the program and with a .DBG extension. If you delete the .DBG file, you cannot debug your program until you run another Debug build. Apart from this, all the considerations that apply to debugging .EXE files apply to .COM files as well.

1.6.2 Specifying Expressions

The Debug menu provides two commands──Watch Value and Watchpoint──that let you specify an expression for the QuickC/QuickAssembler environment to dynamically update and display. The environment displays the updated values in the Watch window.

When you choose one of these commands, a dialog box appears, prompting you to enter an expression. Figure 1.5 shows the dialog box for the Watch Value command.

[Figure 1.5: The Watch Value dialog box (see Section 1.6.2 of the printed manual)]

You can enter any combination of variable names, constants, and C-language syntax. You cannot enter assembly-language keywords.
However, the environment does recognize all valid register names (including names of both 8-bit and 16-bit registers). See Chapter 2, "Introducing 8086 Assembly Language," for information on registers.

In addition to register names, the expanded environment supports the optional use of the colon operator (:) for specifying segmented addresses:

    segment:offset

In the syntax display above, segment can be a constant or a register; offset can be any expression. The QuickC/QuickAssembler environment combines the segment and offset addresses to determine a physical address, as described in Section 2.7, "Segmented Addressing and Segment Registers."

The following examples demonstrate the use of the colon in valid expressions. Note that you use C-language syntax to specify hexadecimal numbers:

    0xb000:0x0000
    es:0x0100
    es:(array[2])
    ss:bp

The QuickC/QuickAssembler environment considers a segmented-address expression to be a pointer to a character, which the Watch window evaluates by displaying the character pointed to. However, you can use QuickC type specifiers to alter how an expression is displayed. For example, the Watch window evaluates the following expression by displaying the numeric value of the address es:(warray+3):

    es:(warray+3),p

You can use the three memory operators──BY, WO, and DW──to view the byte, word, or doubleword of memory at a given address. With pointer expressions and registers, BY returns the byte pointed to by the expression. (Segmented addresses are pointer expressions, as are procedure parameters declared with PTR.) With nonpointer variables, BY returns the byte at the same address as the variable. WO and DW work the same way, but return a word or doubleword, respectively.

The rest of this section demonstrates how to use the three memory operators to specify useful expressions.

To watch the contents of a register, enter just the register's name.
To examine the value that the register points to, enter the BY, WO, or DW operator followed by the register name.

Example      Value Specified
──────────────────────────────────────────────────────────────────────────
bx           The contents of the BX register
BY bx        The byte pointed to by the BX register
WO bx        The word pointed to by the BX register
DW es:si     The doubleword pointed to by the SI register, relative to
             the segment address in ES

To watch the value of a variable, enter the variable's name. To watch the byte, word, or doubleword at the same address as the variable, use the BY, WO, and DW operators. In this context, these operators function as the QuickAssembler PTR operator does: they change the size of data to be examined. They are similar, but not identical, to C type casts. In the following examples, assume that Var is a word variable defined with DW:

Example      Value Specified
──────────────────────────────────────────────────────────────────────────
Var          The variable Var (the word at the address of Var)
BY Var       The byte at the address of Var
DW Var       The doubleword at the address of Var

You can use BY, WO, and DW to specify an array element, but you must understand that expressions in the Debug window are treated like C expressions rather than assembler expressions. As a result, the syntax you use to watch a memory location in the Debug window is often different from the syntax in your assembly source. For example, assume you have the following data and code:

    warr    DW      1, 2, 3, 4, 5, 6
            .
            .
            .
            mov     bx,0
            mov     cx,5
    loop1:  add     ax,warr[bx]
            add     bx,2
            loop    loop1

You cannot watch the assembler expression warr[bx] directly. However, you can put an equivalent C expression in the Debug window:

    WO (char*)&warr + bx

The address-of operator is necessary to make the C debugger look at the MASM array as a C array──that is, as an address. The value must be cast to a character pointer because the debugger otherwise treats bx as a scaled C index rather than an unscaled assembler index.
In the warr example, the assembler code adds 2 to the pointer BX to adjust for the variable size. You must tell the debugger to ignore its normal word scaling. Expressions are scaled only when there is a variable in the expression. In the expression

    WO BP+6

the 6 is not scaled──the expression means, "look at the word six bytes beyond the address that is in BP." However, in the expression WO &warr+6, the 6 is scaled because of the word size of the variable. Note that the variable size, not the memory operator (BY, WO, or DW), determines the size of scaling.

If you are comfortable with C, you can also use C expressions to look at assembler expressions. Here are some examples you might find useful:

Example      Value Specified
──────────────────────────────────────────────────────────────────────────
&Var         The address of Var
es:0x81,s    The string at es:[81h] (the DOS command line when a
             program is started)
&Arr[3]      The third element of an array (note that the 3 will be
             scaled)
*(&Arr+3)    Equivalent to the previous expression

1.6.3 Tracing Execution

The Run menu's Trace Into, Animate, and Step Over commands execute one statement of your program at a time. These commands are fully functional with assembly-language programs. However, debugging commands behave differently when you trace execution of an assembly-language module, as summarized below:

■ By default, screen swapping is on.

■ If the main module of the program is an assembly-language module, the first line of the program is never highlighted.

■ The Calls menu does not function unless you write your program according to certain guidelines.

The rest of this section elaborates on these differences.

When you trace execution of an assembly-language module, screen swapping is turned on. The environment does not support an Auto screen-swapping mode for assembly-language programs because it cannot detect when a program writes to the screen.
Therefore, when executing a .ASM file, the environment equates the Auto screen-swapping selection with screen swapping turned on. You can always turn screen swapping off manually by choosing the Run/Debug command from the Options menu. When a dialog box appears, choose the Off option button in the Screen Swapping field.

Screen swapping causes the environment to switch to a full output screen each time the program executes code. The effect is particularly noticeable when you choose the Animate command. Leaving screen swapping on preserves program output. However, if large portions of your program do not write to the video display, you may want to turn screen swapping off temporarily.

The second debugging feature that operates differently for assembly-language programs is current-line highlighting. When you restart a program, the environment does not highlight the first line of code. The debugging facility does not know which line of code is the first to be executed, since this information is stored in the executable-file header. After you execute a trace, the second program line is highlighted, and thereafter current-line highlighting works as you would expect.

The third feature that operates differently is the Calls command from the Debug menu. To ensure that the command works with assembly-language modules, either use the PROC directive with an argument list or local variables, as described in Chapter 3, "Writing Assembly Modules for C Programs," or else set up the frame pointer (the BP register) as described in Appendix A, "Mixed-Language Mechanics." Both of these methods set up a stack frame for each procedure, using the standard Microsoft conventions. The environment checks stack frames to see what procedures have been called, and with what parameters.

1.6.4 Modifying Registers and Flags

With the expanded QuickC/QuickAssembler environment, you can get much greater use from the Registers window.
The Registers window displays more information than it does in the simple QuickC environment, and you can also use the window to alter register and flag values.

──────────────────────────────────────────────────────────────────────────
NOTE
By default, the environment does not display the Registers window. To open this window, choose the Window command from the View menu. A dialog box appears that lists all windows. Move the cursor to Registers and press the ENTER key, or move the mouse cursor to Registers and double-click the Left mouse button. To close the window, repeat the procedure.
──────────────────────────────────────────────────────────────────────────

The Registers window displays the contents of both 8086 and 8087 registers. You can remove 8087 registers from the Registers window by choosing Display from the Options menu. When the dialog box appears, turn the Show 8087 option button off. The environment displays 8087 registers only if you have a math coprocessor or have a program that calls floating-point emulator routines from a high-level language.

You can alter values in the window by using either the mouse or the keyboard. To alter a value, you first select the item you want to change:

■ To alter a value with the mouse, select an item by clicking the Left mouse button.

■ To alter a value with the keyboard, first place the cursor on an item in the window. (Press TAB or SHIFT+TAB to cycle quickly through the items.) Then select the item by pressing the ENTER key.

The List field has no function in this context and should be ignored.

Choosing a flag toggles the flag to the opposite setting. Choosing a register brings up a dialog box. Type the new value for the register and press ENTER.

1.7 Viewing a Listing File

When you assemble a module with the Debug build setting (the default), QuickAssembler can create a listing file. Choose the type of listing by using the Assembler Flags dialog box.
(To access this dialog box, choose Make from the Options menu, then choose Assembler Flags.) You should also make sure that the One Pass Assembly option is not selected.

A QuickAssembler listing file shows precisely how the assembler translated each line of code during the last program build. Each instruction in the source code is listed next to its corresponding numeric code (machine instruction).

Listing files are particularly useful if your program uses macro calls or include files. The listing file displays each statement generated by a macro call and each line of code copied from an include file. Tables at the end of the listing file give information on macros, symbols, structures, groups, and records. Part 2 of this manual describes each of these features of assembly language.

To view the listing file, assemble the source code at least once. You can view the listing file for the current module by choosing the Listing command from the View menu. You can also view the file with the CTRL+F2 shortcut key. The listing file is then displayed in the Source window, as shown in Figure 1.6.

You can page through this file by using all the normal cursor-movement commands. When you want to return to the previous file, press F2 or use the File menu. You can also leave the listing file by choosing the Listing command again; this action causes the environment to switch to the original line of source code that generated the current line of code. In particular, if you are in a listing file and move the cursor to a line generated from an include file (.INC), the Listing command switches directly to that include file.

[Figure 1.6: A listing file displayed in the Source window (see Section 1.7 of the printed manual)]

Normally, you would choose the Listing command when in a .LST file or in a .ASM file with a corresponding .LST file (previously generated by a program build).
If you are not in either of these types of files, the environment responds by displaying a dialog box for opening a file; *.lst is the default file name.

────────────────────────────────────────────────────────────────────────────
Chapter 2: Introducing 8086 Assembly Language

Assembly-language programs control hardware directly, giving you the ability to write the fastest, smallest programs possible and to execute any operation. But assembly-language programming also requires an understanding of the architecture of 8086-family processors.

Assembly language is close to machine code──the processor's numeric language of 1's and 0's. Each QuickAssembler instruction corresponds to an 8086 instruction but consists of a meaningful name (mnemonic) instead of a number. For example, the ADD instruction computes the sum of two items. QuickAssembler translates this instruction to produce a numeric code, such as 10000010 binary. The processor responds to this code when you run the program. This process of translation is called "assembling."

Before you can assemble a program, you need to understand the basic concepts of the processor and of assembly language. This chapter presents these concepts.

2.1 Programming the 8086 Family

If you have programmed in C, you can get a good grasp of 8086 assembly language by focusing on the differences between the two languages:

1. A C statement may combine many complex operations, but each line of assembly language specifies just one limited action called an "instruction." QuickAssembler also supports a number of nonexecutable statements called "directives," which provide structure to the program, declare data objects, and provide other information. Sections 2.2-2.4 explain the basics of writing instructions and directives.

2. C programs deal with memory locations (known as variables), but assembly-language programs must deal with registers as well.
   A "register" is a special memory location inside the processor itself, having a permanent name rather than a numeric address. Section 2.5, "8086-Family Registers," describes the use of each register.

3. A data object in a C program can be arbitrarily complicated. Assembly-language statements work on objects accessed through four specific modes: immediate, register, direct memory, and indirect memory. Each mode has specific properties and limitations imposed by the processor. Section 2.6, "Addressing Modes," explains each of these four modes and gives examples.

4. The processor combines two 16-bit addresses to access each memory location. This mechanism is called "segmented addressing." Assembly language often requires a more complete understanding of segmented addressing than C does. Section 2.7, "Segmented Addressing and Segment Registers," explains the full implications of segmented addressing.

Of the features listed above, segmented addressing is unique to the 8086 family. The 8086 is further distinguished from other processors by its set of string operations, which permit fast initialization and copying of blocks of data. You can read more about the string operations in Chapter 16, "Processing Strings."

2.2 Instructions, Directives, and Operands

The 8086-family processors understand only one kind of statement: an instruction. QuickAssembler understands two kinds of statements: instructions and directives.

As explained above, an instruction corresponds to a specific action that the processor executes at run time. The fundamental task of the assembler is to correctly translate each of these statements to specific machine-code instructions.

As nonexecutable statements, directives are not translated to machine actions. However, they give information to the assembler that affects how other statements are translated. For example, some of the most important directives declare data.
These directives, in turn, help the assembler correctly interpret instructions that refer to the data.

The rest of this section explains each part of an assembly-language statement; the general syntax applies to both instructions and directives. The section ends by stating the basics of entering numbers in different radixes.

Syntax

Each line of source code is either a blank line or a statement. Each statement is an instruction or directive, and can contain as many as 512 characters. Statements can have up to four fields, as shown below:

    [[name]] [[operation]] [[operands]] [[;comment]]

Each field (except the comment field) must be separated from the other fields by a space or tab character.

You can enter statements in uppercase or lowercase letters. By default, QuickAssembler is not case sensitive, but it does preserve case for external variables──thus providing compatibility with C, which is case sensitive. You can control case sensitivity by using the Assembler Flags dialog box.

As a convention, sample code in this manual uses uppercase letters for directives, hexadecimal letter digits, and segment definitions.

2.2.1 The Name Field

The name field labels the statement with a symbolic name that other parts of the program can reference. The meaning of the name depends on the type of statement.

One of the most important uses of this field occurs in data declarations. These declarations are much like variable declarations in C. The statement defines the type and initial value. You use the name elsewhere in the program when you want to access the data.

QuickAssembler is different from C, however, in that the symbolic name occurs in the first field. For example, the following DB directive (Declare Bytes) associates the name string with a series of characters:

    string  DB      "Hello, world"

In instructions, the name field functions like a program label in C: it provides a target for a jump or call instruction elsewhere in the program.
To label an instruction, follow the name field with a colon (:). You can place the name on the same line as the rest of the instruction or, to improve readability, on a separate line. The following example shows the latter case:

    top:                    ; This label marks the top of the loop
            mov     ax,1    ; This is the first instruction in the loop

There are other ways to label instructions. See Section 6.4, "Defining Code Labels," for more information on how to declare labels.

2.2.2 The Operation Field

The operation field states the action of the statement. This field determines the fundamental type of the statement──instruction or directive. It also determines what additional syntax, if any, is required. Some operations require an entry in the name field; most do not. If the operation is an instruction, it strictly determines how many and what kind of operands are legal.

This field contains exactly one item──an instruction or directive mnemonic. "Mnemonics" are abbreviated, easy-to-remember names that each symbolize a different operation (for instance, ADD, SUB, and OR). Examples of directive mnemonics include EQU (Equate) and DB (Declare Bytes).

2.2.3 The Operand Field

The operand field lists the objects on which the statement operates. Multiple operands are separated by commas. These objects can be registers, constants, or memory locations. A memory location is typically represented as a variable, although it can also be expressed as a numeric address or complex expression.

Registers and constants require no previous declaration. To refer to a variable, however, you should first declare the name with a data directive, such as DB (Declare Bytes). The following example declares the variable count and then uses it in an instruction:

    count   DB      7       ; Declare count as a byte variable
            .
            .
            .
            inc     count   ; count = count + 1

In the first statement, count appears in the name field and the number 7 appears in the operand field. The DB directive associates count with the address of a byte initialized to 7.
In the second statement, count appears in the operand field. The INC (Increment) instruction adds 1 to count, increasing the value of the data to 8.

Section 2.4, "Declaring Simple Data Objects," gives more information on how to declare memory locations as data types. Section 2.6, "Addressing Modes," gives a complete description of all the different methods for specifying operands.

2.2.4 The Comment Field

The comment field lets you add text that appears in source code but is ignored by the assembler. You can enter any text you want in this field. Typically, you would use it to document the purpose of the statement. The purpose of an assembly-language statement is not always self-explanatory, and for this reason, programs often contain at least one comment for each instruction.

Single-line comments always begin with a semicolon (;). You can create a multiline comment by either of two methods. First, you can enter successive comment lines as shown below:

            add count,5     ; Add 5 to count.
                            ; ADD is the operation.
                            ; count and 5 are operands.
            sub Sum,12      ; Subtract 12 from Sum.
                            ; SUB is the operation.
                            ; Sum and 12 are operands.

Second, you can use the COMMENT directive, which lets you enter multiline comments without using the semicolon. This directive has the following syntax:

    COMMENT delimiter [[text]]
    text
    [[text]] delimiter [[text]]

The assembler ignores all text between the first delimiter and the line containing a second delimiter. The delimiter character is the first nonblank character after the COMMENT directive. The ignored text extends up to and including the line containing the next occurrence of the delimiter.

Example

    COMMENT +   The plus sign is the delimiter. The assembler
                ignores the statement following the last
                delimiter
    + mov ax,1  (ignored)

2.2.5 Entering Numbers in Different Bases

As with C, you can enter assembly-language constants as decimal, hexadecimal, or octal. You can also enter binary constants.
By default, all constants are decimal, but you can specify a different default with the RADIX directive.

Hexadecimal constants appear frequently in assembly-language programs. To indicate a hexadecimal constant, add an uppercase or lowercase H suffix. If the first digit is one of the letters A-F, prefix the constant with a leading 0 so that the assembler does not mistake the number for a symbolic name.

Examples

    100H
    10FAh
    0be03H
    0FFh

You may often want to enter binary constants as well, particularly when constructing bit masks. To indicate a binary constant, simply add an uppercase or lowercase B suffix (for example, 00001111B).

For more information on using different bases and using the RADIX directive, see Section 6.1.1.2, "Setting the Default Radix."

2.2.6 Line-Continuation Character

You can create program lines that extend over more than one physical line by using the backslash (\) as a line-continuation character. The backslash must be the last character on the line; comments cannot follow it. A backslash is not considered a continuation character if it occurs in a comment.

Example

    BigProc PROC    FAR \
                    USES DS SI DI, \
                    IntArg:WORD, \
                    String:FAR PTR BYTE, \
                    StrucPtr:FAR PTR BIGSTRUC, \
                    Long:DWORD
            .
            .
            .
            ret
    BigProc ENDP

In this example, the line-continuation character is used to specify multiple procedure arguments with the extended PROC syntax. All the arguments must be placed on a single logical line, but they would go past the edge of the editor screen if not placed on separate physical lines. (The fourth parameter is not named Ptr, because PTR is a reserved operator.) The continuation character is also useful for long macro calls and data initializations.

2.3 8086-Family Instructions

The 8086-family processors support more than 80 instructions, but you don't need to memorize the entire instruction set. Once inside the expanded QuickC environment, you can get instant information on any instruction: move the cursor to an instruction keyword on the screen, then press F1.
To find the appropriate instruction for the action you want to perform, refer to Part 3 of this book, which provides a topical survey of instructions.

Many programs can be written with just a few of the most common instructions. Sections 2.3.1 and 2.3.2 introduce some of these instructions, grouping them into two sets: instructions that manipulate data and instructions that control program flow. The programs in Chapters 3 and 4 use these same instructions to illustrate basic concepts of 8086 assembly language.

2.3.1 Data-Manipulation Instructions

The first group of instructions manipulates data. Each causes the processor to copy data or perform a calculation at run time. Some of the simpler C statements translate directly into a single instruction, so this section uses C statements for illustration. Here are the seven basic data-manipulation instructions introduced in this section:

■ MOV (move data)
■ ADD (add second operand to first)
■ SUB (subtract second operand from first)
■ INC (increment operand)
■ DEC (decrement operand)
■ AND (bitwise AND of two operands)
■ MUL (integer multiplication)

The processor supports a great many other data-manipulation instructions, which are covered in Part 3 of this manual.

2.3.1.1 The MOV Instruction

The MOV instruction, probably the most frequently used 8086 instruction, copies data from one location to another. The instruction leaves the source data unaffected, so it is more a copy than a move. The MOV instruction takes two operands:

    MOV destination,source

The instruction copies the value of the source to the destination. It might seem more logical to place the source operand first, until you consider that C and BASIC assignments use the same order. For example, the instruction

    mov count,5

places the value 5 at the memory location count and thus performs the same action as the C statement

    count = 5;

The destination operand is similar to an "lvalue" in C. Instructions that have two operands always interpret the leftmost operand as the destination, or lvalue.
The destination is the operand that the instruction can alter; thus, it can't be a constant. Another limitation on instructions with two operands is that the operands cannot both be memory locations.

2.3.1.2 The ADD Instruction

The ADD instruction, like MOV, takes two operands: a destination and a source. The processor adds the two operands together, storing the result in the destination (on the left). This action will be familiar to C programmers, since the instruction

    add sum,10

adds 10 to the memory location sum and thus performs the same action as the C statement

    sum += 10;

The 8086 does not perform automatic scaling for pointer addition as C does. The program itself must perform scaling for all pointer arithmetic: to advance a pointer to the next word, for example, add 2 rather than 1.

2.3.1.3 The SUB Instruction

The SUB instruction is the counterpart of ADD: it subtracts the source operand from the destination operand, storing the result in the destination (on the left). Thus, the instruction

    sub total,7

performs the same action as the C statement

    total -= 7;

2.3.1.4 The INC and DEC Instructions

The INC (Increment) and DEC (Decrement) instructions add and subtract 1, respectively. They are similar to, but faster than, ADD and SUB, and are provided because adding and subtracting by 1 are such common operations. The instruction

    inc count

performs the same action as the C statement

    count++;

Similarly, dec count performs the same action as count--;.

2.3.1.5 The AND Instruction

The AND instruction is one of several bitwise logic operations supported by the 8086. AND provides an efficient way to mask out (clear) bits. The instruction

    and stuff,0FFF0h

clears the four lowest bits of stuff, as does the C statement

    stuff &= 0x0FFF0;

2.3.1.6 The MUL Instruction

The MUL instruction multiplies two items, but one of these items is an "implied operand"──that is, an operand you do not specify. For example, the 16-bit version of the MUL instruction takes one explicit 16-bit operand:

    mul factor

The other operand is the AX register.
The processor multiplies factor by the value of AX, storing the low 16 bits of the result in AX (the high 16 bits go to DX). The description of the AX register in Section 2.5.1, "The General-Purpose Registers," gives more information on MUL.

2.3.2 Control-Flow Instructions

The control-flow instructions enable the program to execute loops and to make decisions. Some of these instructions transfer control of the program to a new address. The conditional jump instructions let you provide program logic: they look at the result of a previous operation, and then decide whether to jump or not. Here are the five basic control-flow instructions introduced in this section:

■ JMP (Jump unconditionally)
■ CMP (Compare──subtract without storing result)
■ JE (Jump If Equal)
■ JA (Jump If Above)
■ JB (Jump If Below)

The processor supports a number of other control-flow instructions, including several other conditional jumps. See Section 15.1.2, "Jumping Conditionally," for a description of these instructions.

2.3.2.1 The JMP Instruction

The JMP instruction causes the processor to jump to a new program address. Like the C goto statement, JMP takes one operand: a label associated with another statement. The instruction

    jmp begin

jumps to the label begin, and thus performs the same action as the C statement

    goto begin;

2.3.2.2 The CMP Instruction

The CMP instruction, like SUB, performs a subtraction. But CMP doesn't store the result; instead, it just sets processor flags in preparation for a conditional jump (such as JE, JA, or JB).

A "processor flag" is a bit that resides in the processor and indicates whether a specific condition is on or off. For example, the Zero flag indicates that the last operation produced a result of zero. The JE (Jump If Equal) instruction checks this one flag only, jumping if it is set. Other conditional jumps decide whether to jump by checking a combination of flag settings. See Section 2.5.4, "The Flags Register," for a description of all the flags.
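To see how CMP and the conditional jumps fit together, here is a small C model. It is an illustration only, not anything QuickAssembler provides, and struct cmp_flags, cmp16, and the *_taken functions are hypothetical names. cmp16 computes the Zero, Carry, Sign, and Overflow flags the way a 16-bit CMP would, and each *_taken function tests the flag combination that JE, JA, or JB (or the signed counterparts JG and JL) examines.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flags that matter to the jumps in this section (hypothetical model). */
struct cmp_flags {
    bool zero;      /* result was 0                           */
    bool carry;     /* unsigned borrow: dest below src        */
    bool sign;      /* high-order bit of the result           */
    bool overflow;  /* signed result did not fit in 16 bits   */
};

/* Model of "cmp dest,src": subtract as SUB does, but keep only the flags. */
struct cmp_flags cmp16(uint16_t dest, uint16_t src)
{
    uint16_t result = (uint16_t)(dest - src);   /* wraps modulo 65536 */
    struct cmp_flags f;

    f.zero = (result == 0);
    f.carry = (dest < src);
    f.sign = (result & 0x8000u) != 0;
    /* Signed overflow: the operands differ in sign and the result's
       sign differs from the destination's sign. */
    f.overflow = (((dest ^ src) & (dest ^ result)) & 0x8000u) != 0;
    return f;
}

/* Conditions each jump tests after "cmp dest,src". */
bool je_taken(struct cmp_flags f) { return f.zero; }                          /* dest == src          */
bool ja_taken(struct cmp_flags f) { return !f.carry && !f.zero; }             /* unsigned dest > src  */
bool jb_taken(struct cmp_flags f) { return f.carry; }                         /* unsigned dest < src  */
bool jg_taken(struct cmp_flags f) { return !f.zero && f.sign == f.overflow; } /* signed dest > src    */
bool jl_taken(struct cmp_flags f) { return f.sign != f.overflow; }            /* signed dest < src    */
```

For example, je_taken(cmp16(10, 10)) is true. Both ja_taken(cmp16(0xFFFF, 1)) and jl_taken(cmp16(0xFFFF, 1)) are also true, because FFFFH is above 1 as an unsigned value but is -1, and therefore less than 1, as a signed value.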
Many instructions, including SUB, set processor flags. However, some of these instructions have strong side effects: SUB, for example, also changes its destination operand. Use ADD or SUB to prepare for a conditional jump when convenient, but use CMP when you need to make a simple comparison without altering data.

2.3.2.3 The Conditional Jump Instructions

The JE, JA, and JB instructions are conditional jumps (meaning Jump If Equal, Jump If Above, and Jump If Below, respectively). Like JMP, they each take one argument: a program label to which to jump. Unlike JMP, they cause the processor to jump only when certain flag settings are detected. The result is that when you use CMP in combination with a conditional jump instruction, you create an if-then relationship similar to an if statement in a high-level language. Consider the following instructions:

            cmp sum,10      ; Compare sum to 10
            ja  top         ; If sum > 10, jump to top

This logic is a little different from a C program. The first instruction makes the comparison. The second states, "If the previous comparison found sum above 10, then jump." Taken together, these two instructions perform the same action as the C statement

    if( sum > 10 )
        goto top;

Of course, most C programmers do not use many goto statements. Typically, you would test for a condition and execute a series of statements if the condition is true, as in the following code:

    if( sum >= 10 )
    {
        sum = 1;
        count += 2;
        delta = 5;
    }

To implement this code in assembly language, test for the opposite condition, then jump past the statements if they should not be executed.
For example, the following code executes the three statements inside the if block only if sum is greater than or equal to 10:

    TopOfBlock:
            cmp sum,10          ; Compare sum to 10
            jb  SumNotGreater   ; If sum < 10, do NOT do
                                ;   next three statements
            mov sum,1           ; sum = 1
            add count,2         ; count = count + 2
            mov delta,5         ; delta = 5
    SumNotGreater:

──────────────────────────────────────────────────────────────────────────
NOTE
JA (Jump If Above) and JB (Jump If Below) work properly only when you compare unsigned integers. To compare signed integers, use JG (Jump If Greater) and JL (Jump If Less). See Section 15.1.2, "Jumping Conditionally," for a complete list of conditional jump instructions.
──────────────────────────────────────────────────────────────────────────

2.4 Declaring Simple Data Objects

This section describes how to declare global variables──often called "static" because each corresponds to a fixed memory location.

Programs generally require data. If you wrote a program in machine code, you'd have to reserve locations in memory for data, determine the address of each data object, and remember these addresses whenever you operated on memory. Fortunately, the assembler reserves memory locations for you and associates each location with a symbolic name. You use data directives to tell the assembler how to allocate and refer to memory. The most common data directives for characters and integers are:

Directive   Description
──────────────────────────────────────────────────────────────────────────
DB          Declare byte (either a small integer or a character)
DW          Declare word (2-byte integer)
DD          Declare doubleword (4-byte integer)

To use these directives, place the name of the variable first, then enter the data directive. The third column (operand field) contains one or more initial values. Use a question mark to indicate an item with no initial value.

    aByte       DB  1       ; aByte is a 1-byte integer, initialized to 1
    area        DW  500     ; area is a 2-byte integer, initialized to 500
    population  DD  ?       ; population is a 4-byte integer, no initial value

These directives correspond roughly to the following C statements:

    char aByte = 1;
    int  area = 500;
    long population;

Assembly data declarations differ from C declarations, however, in that they are not declared signed or unsigned. Instead, you must remember whether you intend to treat a variable as signed or unsigned, and choose the appropriate operations.

Data directives reserve memory in the object file. They also associate each variable with a name and a size attribute. The assembler uses this information to correctly assemble instructions that operate on variables. For example, at the machine-code level, the INC instruction can be encoded to increment either a byte or a word of data. The way the assembler encodes the instruction

    inc myvar

depends on whether myvar was declared as a byte or word. (If it was declared a doubleword, the instruction is illegal.)

Another important use of size attributes is in checking the validity of two operands. For example, the following instruction causes the assembler to print a warning message, because aByte and BX do not have the same size attribute:

            mov bx,aByte            ; Move aByte into a word register

Moving a byte into a word location is not possible. After issuing the warning, the assembler adjusts the instruction as if it were written as follows:

            mov bx,WORD PTR aByte   ; Move the word at aByte to BX

The PTR operator temporarily modifies the size attribute of the object that follows it. PTR can be used with a number of different data types, as shown below:

Keywords            Refers to
──────────────────────────────────────────────────────────────────────────
BYTE PTR object     The byte at the address of object
WORD PTR object     The word at the address of object
DWORD PTR object    The doubleword at the address of object

However, the assembler's adjustment may not produce the action you really want. The PTR operator is not quite the same as a type cast in C.
The C (int) type cast manipulates data so that it represents the same value, but in a different format. WORD PTR does no data manipulation──it simply causes the instruction to operate on the word at the given address. In the example above, the use of WORD PTR causes two adjacent bytes of data (aByte and the byte that follows it) to be loaded from memory into BX. If what you really want is to move a single byte of data to BX, but convert it to a word, use the following code:

            mov bl,aByte        ; Lower byte of BX = aByte
            sub bh,bh           ; Higher byte of BX = 0

This code works properly only when handling unsigned numbers. When working with signed quantities, use the CBW instruction, as described in Section 13.2.1, "Extending Signed Values."

By far the most common use of WORD PTR is in operations on objects 32 bits or longer. An 8086 instruction can operate only on a byte or a word, so you use WORD PTR to tell the assembler to operate on one word at a time. For example, the following code copies the 32-bit integer X to a similar integer, Y, one word at a time:

    X       DD  80000                   ; X is a long integer = 80,000
    Y       DD  ?                       ; Y is a long integer
            .
            .
            .
            mov ax,WORD PTR X           ; Move word at X to word at Y
            mov WORD PTR Y,ax           ;   (using AX as intermediate register)
            mov ax,WORD PTR X[2]        ; Move word 2 bytes past X to
            mov WORD PTR Y[2],ax        ;   word 2 bytes past Y

Brackets ([ ]) are used with arrays as well as with portions of large data objects, as shown here; they let you add a displacement to an address. The use of brackets is further explained in the next few paragraphs.

Assembly language makes almost no distinction between simple variables and arrays. You refer to the first element of an array just as you would a simple variable──index brackets are optional. To declare an array or string, just give a series of initial values:

    warray      DW  ?,?,?,?
    xarray      DW  1,2,3,4
    mystring    DB  "Hello, there."

To refer to the first element of warray, type warray into your program (no brackets required).
To refer to the next element, use either of these two forms, each of which refers to the object two bytes past the beginning of warray:

    warray+2
    warray[2]

When used with a variable name, the brackets do nothing but add a number to the address. If warray refers to the address 2400h, then warray[2] refers to the address 2402h. However, the brackets have an additional function when used with registers. See Section 2.6.4, "Indirect Memory Operands," for more information.

In assembly language, array indexes are zero-based, as in C; but unlike C, they are unscaled. The number inside brackets always represents an absolute distance in bytes. In practical terms, the fact that indexes are unscaled means that if the size of an element is larger than one byte, you must multiply the index of the element by its size (in this case, 2), then add the result to the address of the array. Thus, the expression warray[4] represents the third element, which is 4 bytes past the beginning of the array. Similarly, the expression warray[6] represents the fourth element. In general, the numeric offset required to access an array element can be calculated as shown in the following formula:

    Nth element of Array = Array[(N-1) * size of element]

2.5 8086-Family Registers

A "register" is a special memory location inside the processor itself. Operations on registers execute faster than operations on main memory. The processor has a limited number of registers. Moreover, many operations on the 8086 are impossible without the use of registers at some point. For example, you cannot copy data between two memory locations without first moving it into a register.

Figure 2.1 shows the registers common to all the 8086-family processors. The 8086 registers can be grouped by function into the following sets: general-purpose registers, index registers, pointer registers, and segment registers. Each set corresponds to a different ending letter (X, I, P, or S).
The registers in each set are as follows:

┌────────────────────────────────────────────────────────────────────────┐
│         This figure can be found in Section 2.5 of the manual          │
└────────────────────────────────────────────────────────────────────────┘

■ The four general-purpose registers are AX, BX, CX, and DX. These registers exist for the general use of the program. You can use these registers to store temporary values and perform calculations.
■ The two index registers are SI (Source Index) and DI (Destination Index). These registers can also be used for general storage, but are less flexible than the general-purpose registers. SI and DI have a special purpose in string instructions.
■ The pointer registers are IP (Instruction Pointer), SP (Stack Pointer), and BP (Base Pointer). These registers should not be confused with BX, which is the register normally used for pointer indirection. IP, SP, and BP each have a special purpose in conjunction with procedure calls. SP and BP should be altered with care; IP cannot be altered or referenced directly at all.
■ The segment registers are CS, DS, SS, and ES. This section does not describe these registers. You generally don't alter or reference them except when starting the program or accessing data from multiple segments. Section 2.7, "Segmented Addressing and Segment Registers," describes each segment register and why it is important to programs.

In addition, there is a flags register that indicates the status of the processor.

2.5.1 The General-Purpose Registers

The general-purpose registers have many important uses in an 8086 assembly-language program, including:

■ Storing the values most frequently used. Operations on registers are much faster than operations on memory. Therefore, place the program's principal values in registers. In larger programs, you will probably have too many variables to place them all into registers. You can, however, place a value in a register while it is in heavy use.
■ Supporting operations with two or more variables. Direct memory-to-memory operations are illegal with 8086 processors. To operate on two memory locations, you need to first load one of the values into a register.
■ Enabling use of all the instructions. Many instructions require the use of a particular register. For example, the MUL instruction always works with the AX register (or AL, if you specify a byte operand).
■ Passing or returning values in a procedure or interrupt call.

Each of the general-purpose registers──AX, BX, CX, and DX──can be accessed as single 16-bit registers, or as two 8-bit registers. As shown in Figure 2.1, the AH, BH, CH, and DH registers represent the high-order 8 bits of the corresponding registers. Similarly, AL, BL, CL, and DL represent the low-order 8 bits. This design lets you operate directly on two-byte and one-byte objects. It also lets you load a two-byte object and then manipulate one byte at a time.

Each of the general-purpose registers has special uses, discussed below.

2.5.1.1 The AX Register

The AX (Accumulator) register is ideal for repeated calculations. It accumulates totals as well as the results of multiplication and division. Using AX can add speed to your program, because some instructions have special encodings optimized for use with AX.

Multiplication instructions always use AX. In the 16-bit version of the MUL instruction, you specify one 16-bit value. The processor multiplies this value by the contents of AX and stores the 16 least significant binary digits of the result in AX. (The 16 most significant digits are stored in DX.)

The following example multiplies base times height, and stores the result in area. These instructions are sufficient if the result does not exceed the limit for two-byte numbers (otherwise, the DX register will contain the overflow):

    base    DW  5           ; base is a word, initialized to 5
    height  DW  3           ; height is a word, initialized to 3
    area    DW  ?           ; area stores 16-bit (word) product
            .
            .
            .
            mov ax,base     ; AX = base
            mul height      ; AX = AX * height
            mov area,ax     ; area = result

AX has a similar use in division instructions (DIV and IDIV). See Section 14.4, "Dividing," for examples of division. Also, in port I/O instructions, AX holds the data to write to a port and receives data read from a port.

By convention, AX has another special use: Microsoft high-level languages expect AX to contain a function's return value. If the return value is longer than four bytes, the high-level languages expect DX:AX to point to the location of the return value.

2.5.1.2 The BX Register

The BX (Base) register has great importance as a pointer or address register. All 16-bit registers can hold addresses, but not all registers can be used to retrieve the contents of an address. In C this operation is called "pointer dereferencing," or "indirection." The C source code to implement this action might look like this:

    value = *pVar;

The following assembly code achieves the same effect. Because the 8086 cannot move data directly from one memory location to another, the value passes through AX:

            mov bx,pVar     ; BX = pVar
            mov ax,[bx]     ; AX = object pointed to by BX
            mov value,ax    ; value = AX

The brackets around BX in the second instruction direct QuickAssembler to consider BX a pointer to the actual operand. The item [bx] is an example of an indirect memory operand. See Section 2.6.4, "Indirect Memory Operands," for more information.

2.5.1.3 The CX Register

The CX (Count) register has special meaning to instructions with a repeat-operation feature. The contents of CX indicate how many times to repeat execution. Loops, string operations, certain jump instructions, and shifts and rotates all use CX this way.

A common instruction that uses CX to repeat execution is LOOP, which is analogous to the C for statement. This instruction subtracts one from CX, then jumps to the given label if CX is not equal to 0. Thus, the following loop executes 20 times:

            mov cx,20
    top:    .
            .
            .
            loop top

In the case of shifts and rotates, CL (the lower byte of CX) indicates how many bit positions to shift.
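The register behavior described for AX, DX, and CX can be modeled in C. In this minimal sketch (struct regs, mul16, and loop_iterations are hypothetical names, not part of any library), mul16 forms the 32-bit product of AX and its operand and splits it into DX:AX, and loop_iterations counts how often a loop body runs when LOOP decrements CX before testing it.

```c
#include <stdint.h>

/* Registers used by the examples (hypothetical model, not a real API). */
struct regs {
    uint16_t ax, dx;
};

/* "mul src" (16-bit form): DX:AX = AX * src, treating both as unsigned. */
void mul16(struct regs *r, uint16_t src)
{
    uint32_t product = (uint32_t)r->ax * src;
    r->ax = (uint16_t)(product & 0xFFFFu);  /* low 16 bits  */
    r->dx = (uint16_t)(product >> 16);      /* high 16 bits */
}

/* Number of times a body ending in "loop label" runs when CX starts at
   start_cx: LOOP decrements CX first, then jumps back if CX != 0. */
uint32_t loop_iterations(uint16_t start_cx)
{
    uint16_t cx = start_cx;
    uint32_t runs = 0;

    do {
        runs++;                   /* the loop body executes           */
        cx = (uint16_t)(cx - 1);  /* LOOP: CX = CX - 1 (wraps at 0)   */
    } while (cx != 0);            /* LOOP: jump to label if CX != 0   */
    return runs;
}
```

With AX = 500 and an operand of 300, mul16 leaves 0002H in DX and 49F0H in AX, which together form 150,000. Because LOOP decrements CX before testing it, starting with CX = 0 does not skip the loop: CX wraps to FFFFH and the body runs 65,536 times.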
See Section 14.7, "Shifting and Rotating Bits," for more information. Also, when an instruction has a REP (repeat) prefix, the value in CX determines how many times the instruction is executed.

2.5.1.4 The DX Register

The DX (Data) register often is used only for storage of temporary values. However, DX has a special function in some versions of the multiplication, division, and port instructions. Each of these uses is closely related to AX. In fact, DX is located next to AX in the actual physical layout of the 8086 chip. (Figure 2.1 places the registers in the order AX, BX, CX, and DX merely for ease of reference.)

When you multiply 16-bit values with MUL, DX holds the high 16 bits of the 32-bit result. The following example is a variation of the one given for AX. In this example, area is a 32-bit value (a long integer), and it stores the entire 32-bit result of the MUL instruction:

    base    DW  500                     ; base is a word, initialized to 500
    height  DW  300                     ; height is a word, initialized to 300
    area    DD  ?                       ; area stores doubleword product
            .
            .
            .
            mov ax,base                 ; AX = base
            mul height                  ; DX:AX = AX * height
            mov WORD PTR area[0],ax     ; Store low 16 bits
            mov WORD PTR area[2],dx     ; Store high 16 bits

By convention, Microsoft high-level languages use both DX and AX to return four-byte values from procedures. The high 16 bits are placed in DX.

2.5.2 The Index Registers

The two index registers are SI (Source Index) and DI (Destination Index). These registers are similar to the general-purpose registers, but cannot be accessed one byte at a time. Index registers are efficient places to store general data, pointers, array indexes, and pointers to blocks of memory. They have the following special uses:

■ You can use both SI and DI for pointer indirection, as you can BX and BP. "Pointer indirection" is the process of retrieving the value that a pointer points to.
■ You can use SI or DI to hold an array index. Indirect memory operands can combine this index with a base address stored in BX or BP.
■ You prepare for string instructions, which execute highly efficient block operations, by loading SI with a source address and DI with a destination address. See Chapter 16, "Processing Strings," for information on how to use string instructions.

When you write a procedure to be called by C, be careful to leave SI and DI in the same state they were in before C called your procedure. Microsoft QuickC allocates register variables in SI and DI.

2.5.3 The Pointer Registers

The pointer registers──BP, SP, and IP──are all special-purpose registers that help implement procedure calls. The processor alters SP (Stack Pointer) and IP (Instruction Pointer) whenever you call a procedure, and you can use BP (Base Pointer) to access parameters placed on the stack. Despite their names, pointer registers are not good places to store pointer variables or other general program data; you should generally use BX, SI, and DI for that purpose.

2.5.3.1 The BP Register

You can use BP (Base Pointer) to retrieve the contents pointed to by an address. However, by default, the BP register points into the stack segment rather than the data segment. Therefore, BP is typically used to access items on the stack. The "stack" is the area of memory that holds parameters, local variables, and return addresses for each procedure being executed.

Although you can store general data in BP, it is commonly used to access parameters of the current procedure. When you use the PROC statement with a parameter list as explained in the next chapter, avoid altering the value of BP. The PROC directive generates instructions that set BP to point to the procedure's local stack area, and then use BP to access parameters and local data. If BP changes, all your references to parameters will be wrong. To learn how to set BP yourself, see Section 15.3.3, "Passing Arguments on the Stack," or Appendix A, "Mixed-Language Mechanics."
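A minimal C model can show why parameters sit at fixed offsets from BP, and why changing BP breaks every parameter reference. It assumes a near procedure called with the C calling convention (arguments pushed right to left) and the customary entry sequence push bp / mov bp,sp. The machine structure and the push, word_at, enter_frame, and param functions are hypothetical, and a real procedure may also reserve space for local variables below BP.

```c
#include <stdint.h>

#define STACK_BYTES 64

/* Hypothetical model of a tiny stack segment plus SP and BP. */
struct machine {
    uint16_t mem[STACK_BYTES / 2];  /* one entry per word; index = byte offset / 2 */
    uint16_t sp, bp;
};

static void push(struct machine *m, uint16_t value)
{
    m->sp -= 2;                     /* the stack grows toward lower addresses */
    m->mem[m->sp / 2] = value;
}

static uint16_t word_at(const struct machine *m, uint16_t offset)
{
    return m->mem[offset / 2];
}

/* Assumed call sequence: caller pushes b, then a (right to left), CALL
   pushes a near return address, and the entry code does "push bp" and
   "mov bp,sp". */
void enter_frame(struct machine *m, uint16_t a, uint16_t b, uint16_t ret_addr)
{
    push(m, b);
    push(m, a);
    push(m, ret_addr);
    push(m, m->bp);     /* save the caller's BP; it ends up at [bp] */
    m->bp = m->sp;      /* BP now anchors this procedure's frame    */
}

/* Parameter n (0 = leftmost) of a near procedure lives at [bp+4+2n]. */
uint16_t param(const struct machine *m, int n)
{
    return word_at(m, (uint16_t)(m->bp + 4 + 2 * n));
}
```

Under these assumptions the saved BP is at [bp], the return address at [bp+2], and the first parameter at [bp+4]. Moving BP by even one word makes the same offsets read the wrong stack slots.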
2.5.3.2 The SP Register

The SP (Stack Pointer) register points to the current location within the stack segment. As you add or remove items from the stack, the processor changes the value of SP, so that SP always points to the top of the stack.

The processor stack works like a stack of dishes: you push items onto the top of the stack as you need to save them, then pop them off the stack when you're ready to use them again. The stack is a last-in-first-out mechanism. You can remove only the item currently at the top of the stack, so items must be removed in the reverse of the order in which they were placed there.

The processor automatically pushes and pops return addresses for you when you call or return from a procedure. A "return address" is the place a procedure or routine returns to when done.

You can also place other values on the stack by using the PUSH and POP instructions. The PUSH instruction saves the value of a register or memory location by placing it on the stack. POP removes the value from the stack and places it back in the original location. (You can also pop the contents into some other location if you wish.) Use these instructions when you need to preserve a value. In the following example, BX holds an important value, but the program needs temporary use of BX (the pointed-to value passes through AX, since a memory-to-memory MOV is illegal):

            push bx             ; Save BX on the stack
            mov  bx,pointer     ; Load pointer into BX
            mov  ax,[bx]        ; AX = *pointer
            mov  value,ax       ; value = AX
            pop  bx             ; Pop old value back into BX

The stack also holds parameters and local variables during procedure calls. Sections 13.4.2, "Using the Stack," and 15.3.3, "Passing Arguments on the Stack," provide more information on using the stack. Appendix A, "Mixed-Language Mechanics," explains how to manipulate the stack to make room for local variables──one of the few times you should change the value of SP directly.

2.5.3.3 The IP Register

You cannot adjust the IP (Instruction Pointer) register directly; it can only be adjusted indirectly, through control-flow instructions.
For this reason, QuickAssembler does not even recognize IP as a keyword.

The IP register contains the address of the next instruction to execute. The instructions that control program flow (calls, jumps, loops, and interrupts) automatically set the instruction pointer to the proper value. The processor pushes the address of the next instruction onto the stack when you call a procedure, and pops this address back into IP when the procedure returns. When no control-flow instruction intervenes, the processor simply advances IP to point to the next instruction in memory.

2.5.4 The Flags Register

The flags register, shown in Figure 2.2, is a 16-bit register made up of bits that each indicate some specific condition. Most of the flags help determine the behavior of conditional jump instructions. Many instructions──most notably CMP──set these flags in a meaningful way. Other flags (Trap, Interrupt Enable, and Direction) do not affect conditional jump instructions but control the processor's general operation.

┌────────────────────────────────────────────────────────────────────────┐
│        This figure can be found in Section 2.5.4 of the manual         │
└────────────────────────────────────────────────────────────────────────┘

The nine flags common to all 8086-family processors are summarized below, progressing from the low-order to high-order flags. In these descriptions, the term "set" means the bit value is 1, and "cleared" means the bit value is 0.

Instructions actively set and clear various flags. For example, if the result of a SUB or CMP instruction is zero, it sets the Zero flag. This flag setting can, in turn, affect subsequent instructions──in particular, conditional jumps. Some instructions do not change the flags at all, or leave some flags in an undefined state. Consult on-line Help for each instruction to see precisely how it affects flag settings.
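To make these set/clear rules concrete, the following C sketch computes the six status flags (Carry, Parity, Auxiliary Carry, Zero, Sign, and Overflow, defined in the table that follows) that a 16-bit ADD would produce. It is a hypothetical model for illustration: struct add_flags and add16_flags are not part of any library, and the formulas are the usual bit tests for unsigned carry and signed overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Status flags left by a 16-bit ADD (hypothetical model). */
struct add_flags {
    bool carry, parity, aux_carry, zero, sign, overflow;
};

/* Model of "add dest,src": stores the 16-bit sum in *result and
   returns the flags the processor would set. */
struct add_flags add16_flags(uint16_t dest, uint16_t src, uint16_t *result)
{
    uint32_t wide = (uint32_t)dest + src;   /* 17 bits: keeps the carry-out  */
    uint16_t r = (uint16_t)wide;
    uint8_t low = (uint8_t)r;               /* Parity looks only at low byte */
    int ones = 0;
    struct add_flags f;

    for (int bit = 0; bit < 8; bit++)
        ones += (low >> bit) & 1;

    f.carry     = (wide >> 16) != 0;                  /* unsigned view           */
    f.parity    = (ones % 2) == 0;                    /* even number of 1 bits   */
    f.aux_carry = ((dest ^ src ^ r) & 0x0010u) != 0;  /* carry out of bit 3      */
    f.zero      = (r == 0);
    f.sign      = (r & 0x8000u) != 0;                 /* high-order bit          */
    /* Signed view: both operands have the same sign, but the result differs. */
    f.overflow  = ((~(dest ^ src) & (dest ^ r)) & 0x8000u) != 0;
    *result = r;
    return f;
}
```

Calling add16_flags(0xFFFF, 0xFFFE, &r) reproduces the Carry-versus-Overflow example discussed after the table: r is FFFDH, Carry is set, and Overflow is clear. Adding 7FFFH and 1 shows the opposite case: Overflow is set, Carry is clear.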
Flag              Description
──────────────────────────────────────────────────────────────────────────
Carry             Is set if an operation generates a carry to or a borrow
                  from a destination operand. (Operation viewed as
                  unsigned.)

Parity            Is set if the low-order eight bits of the result of an
                  operation contain an even number of set bits.

Auxiliary Carry   Is set if an operation generates a carry to or a borrow
                  from the low-order four bits of an operand. This flag
                  is used for binary coded decimal arithmetic.

Zero              Is set if the result of an operation is 0.

Sign              Equal to the high-order bit of the result of an
                  operation (0 is positive, 1 is negative).

Trap              If set, the processor generates a single-step interrupt
                  after each instruction. Debugging programs, including
                  the QuickC/QuickAssembler debugging facility, use this
                  feature to execute a program one instruction at a time.

Interrupt Enable  If set, interrupts will be recognized and acted on as
                  they are received. The bit can be cleared to
                  temporarily turn off interrupt processing.

Direction         Can be set to make string operations process down from
                  high addresses to low addresses, or can be cleared to
                  make string operations process up from low addresses
                  to high addresses.

Overflow          Is set if the result of an operation is too large or
                  too small to fit in the destination operand.
                  (Operation viewed as signed.)

The Carry and Overflow flags are similar, but have one major difference: the Carry flag is set according to the rules of unsigned operations, and the Overflow flag is set according to the rules of signed operations. A signed operation uses two's complement arithmetic to represent negative numbers. One of the features of this system is that a number is negative if the most significant bit is set. Unsigned operations do not view any number as negative. Thus, the same ADD operation can be viewed as adding FFFFH to FFFEH (unsigned) or -1 to -2 (signed).
This operation would set the Carry flag (because the unsigned sum, 1FFFDH, exceeds the maximum unsigned value of FFFFH), but not the Overflow flag (because the signed sum, -3, fits easily in 16 bits).

──────────────────────────────────────────────────────────────────────────
NOTE
This manual does not describe the details of two's-complement arithmetic. For more information, see one of the references listed in the Introduction.
──────────────────────────────────────────────────────────────────────────

Each of the conditional jump instructions responds to a particular flag or combination of flags. For example, the JZ (Jump If Zero) instruction jumps if the Zero flag is set. The JBE (Jump If Below or Equal) instruction jumps if either the Zero flag or the Carry flag is set. For a description of all the conditional jump instructions, see Section 15.1.2, "Jumping Conditionally."

2.6 Addressing Modes

You can specify several kinds of operands: immediate, register, direct memory, and indirect memory. Each type of operand corresponds to a different addressing mode. The "addressing mode" is the method that the processor uses to calculate the actual value of the operand at run time. You don't specify addressing modes explicitly. You simply give an operand, and the assembler determines the corresponding addressing mode.

The four types of operands are summarized below, and described at length in the rest of this section.

Operand Type      Description
──────────────────────────────────────────────────────────────────────────
Immediate         A constant value contained in the instruction itself
Register          A 16-bit or 8-bit register
Direct memory     A fixed location in memory
Indirect memory   A memory location determined at run time by using the
                  address stored in one or two registers

Direct memory and indirect memory operands are closely related. Syntax displays in this manual, as well as in on-line Help, often refer to memory operands. You can use either type of memory operand wherever memory is specified.
From the processor's viewpoint, the only difference between these types of operands is how the address is determined. The address specified in the memory operand is called the "effective address" of the instruction.

Most two-operand instructions require operands of the same size. When one of the operands is a register, QuickAssembler adjusts the size of the other, if possible, to be the size of the register──either 8 or 16 bits. An instruction that operates on AX and BL is illegal, since these registers are different sizes. If the sizes conflict, you can sometimes use the PTR operator to override the size attribute of an operand.

Sections 2.6.1-2.6.4 discuss each of the four operand types (and corresponding addressing modes) in detail.

2.6.1 Immediate Operands

An "immediate operand" is a constant value on which the instruction operates directly. This is the only addressing mode that involves no further access of registers or memory. The data follows the instruction right inside the executable code, thus giving rise to the name "immediate." Use immediate operands for the same reasons you would use a literal or symbolic constant in C. The value of an immediate operand never changes.

An immediate operand can be a symbolic constant declared with the EQU directive. This directive is often used for the same purpose as the C #define directive. For example, consider the constant declaration:

     magic EQU 7243

You could use this the same way as the C statement:

     #define magic 7243

Chapter 11, "Using Equates, Macros, and Repeat Blocks," tells more about defining constants with the EQU or = directive.

An immediate operand can also be an expression made up of constants. For example, the following code directs QuickAssembler to calculate the difference between two ASCII values, then use this difference as the source (rightmost) operand:

     mov bigdiff,'a'-'A'

The assembler interprets the one-byte strings 'a' and 'A' as the ASCII values 97 and 65.
The assembler calculates the difference──in this case, 32──and places the resulting value into the object code. At run time, this value is fixed. Each time the instruction is executed, the processor moves the value 32 into the memory location bigdiff. This instruction is precisely equivalent to, but more readable than, the following:

     mov bigdiff,32

One-byte and two-byte strings can be immediate operands. Larger strings cannot be processed by a single 8086 instruction. Chapter 3, "Writing Assembly Modules for C Programs," explains how to process longer strings, one character at a time.

The OFFSET and SEG operators turn variable names (which normally are memory operands) into immediate operands. These operators are similar to the address operator (&) in C. In Chapter 4, "Writing Stand-Alone Assembly Programs," you'll see how to use the OFFSET operator to treat an address as immediate data.

When an instruction has two operands, you cannot place immediate data in the destination (leftmost) operand. (The OUT instruction is the one exception.)

Examples

     var     DW  ?
     college DW  1636
     nine    EQU 5+4              ; Declare nine as symbolic constant
             .
             .
             .
             mov var,nine         ; Move immediate data to memory
             mov bx,'ab'          ; Move ASCII values for 'a' and 'b'
                                  ;   into BH and BL
             mov college,1701     ; Move immediate data to memory
             mov ax,1+2+3+4       ; Move immediate data to AX
             mov ax,OFFSET var    ; Move address of var to AX
             int 21h              ; Immediate data is single operand,
                                  ;   21 hexadecimal (33 decimal)

2.6.2 Register Operands

A register operand consists of one of the 20 register names. The processor operates directly on the data stored in the register. "Register-direct" mode refers to the direct use of the value of the register rather than a memory location. Registers can also be used indirectly, to point to memory locations as described in Section 2.6.4, "Indirect Memory Operands."

Most instructions can take one or more register operands.
You generally can use any of the general-purpose registers with these instructions, although some instructions require specific registers. The use of segment registers (CS, DS, SS, and ES) is restricted. You can refer to segment registers only under special circumstances.

Table 2.1 shows all the valid register names for 8086 processors. You can use any of these names as a register-direct operand.

Table 2.1 Register Operands

Register Type                Register Name
──────────────────────────────────────────────────────────────────────────
8-bit high registers         AH   BH   CH   DH
8-bit low registers          AL   BL   CL   DL
16-bit general purpose       AX   BX   CX   DX
16-bit pointer and index     SP   BP   SI   DI
16-bit segment               CS   DS   SS   ES
──────────────────────────────────────────────────────────────────────────

Section 2.5, "8086-Family Registers," discusses registers in more detail. Limitations on register use for specific instructions are discussed in sections on the specific instructions throughout Part 3, "Using Instructions."

Examples

     mov ds,ax            ; Both operands are register direct
     mov stuff,dx         ; Source operand is register direct
     mov ax,1             ; Destination is register direct
     mul bx               ; Single operand, register direct

2.6.3 Direct Memory Operands

A direct memory operand specifies a fixed address in main memory containing the data to operate on. At the machine level, a direct memory operand is a numeric address. In your QuickAssembler source code, you usually represent a direct memory operand by entering a symbolic name previously declared with a data directive such as DB (Declare Bytes).

A direct memory operand is similar to a simple variable in C or an array element with a constant index. Any object in memory can be a direct memory operand as long as the exact location is fixed in the executable code. The data at the location can change, but the location itself is the same each time the processor executes the instruction. This fact gives direct memory operands a static character.
For more dynamic operations, use indirect memory operands.

Examples

     mov ax,count         ; Source operand is direct memory
     mov count,ax         ; Destination operand is direct memory
     inc total            ; Single operand is direct memory

Typically, a direct memory operand is a simple label. As with immediate operands, you can specify a direct memory operand by entering an expression. As long as the address can be determined at assembly time, the operand is direct memory.

──────────────────────────────────────────────────────────────────────────
NOTE
Technically, a program address is not determined until link time (in the case of near addresses) or load time (in the case of segment addresses). These adjustments are necessary to support multiple modules and to enable the program to run anywhere in memory. However, you can ignore these details. If the assembler can determine the operand's address relative to the rest of the module, the operand is direct memory.
──────────────────────────────────────────────────────────────────────────

The following example uses an expression that translates to a direct memory operand. If area is an array of words declared with DW, this example stores the value of DX in the array's second element. (As with all assembly-language indexing, the 2 is a byte offset, not an element number.) QuickAssembler considers area[2] as equivalent to area+2.

     mov area[2],dx       ; Move DX to memory location 2 bytes
                          ;   past the address of "area"

In the statement above, the assembler calculates an address by adding 2 to the address of area. The resulting address will be the same no matter what values are stored in registers. At run time, the address is fixed. Thus, the operand is direct memory.

You can use a numeric constant as a direct memory operand. Normally, QuickAssembler interprets a numeric constant as an immediate operand. To ensure interpretation as a memory operand, prefix the number with a segment register and colon (:). Brackets are optional.
The following instructions each load AX with the contents of memory address 100 hexadecimal in the data segment:

     mov ax,ds:[100h]
     mov ax,ds:100h

Section 2.7, "Segmented Addressing and Segment Registers," provides more information on segment registers and the use of the colon (:). By default, the processor assumes that data references lie in the segment pointed to by DS.

2.6.4 Indirect Memory Operands

With indirect memory operands, the processor calculates the address of the data at execution time, by referring to the contents of one or two registers. Since values in the registers can change at run time, indirect memory operands provide the most dynamic method for accessing data.

Indirect memory operands make possible run-time operations such as pointer indirection, dynamic indexing of array elements──including indexing of multi-dimensional arrays──and dynamic accessing of members of a structure. All these operations are similar to operations in high-level languages. The major difference is that assembly language requires you to use one of several specific registers: BX, BP, SI, and DI.

You indicate an indirect memory operand by using at least one pair of brackets. Use of the index operator ([ ]) is explained in more detail in Section 9.2.1.3. When you place a register name in brackets, the processor uses the data pointed to by the register. For example, the following instruction accesses the data at the address contained in BX, and then moves this data into AX:

     mov ax,[bx]

When you specify more than one register, the processor adds the contents together to determine the effective address (the address of the data to operate on). One register must be a base register (BX or BP), and the other must be an index register (SI or DI):

     mov ax,[bx+si]

You can specify one or more displacements. A "displacement" is a constant value to add to the effective address.
A simple use of a displacement is to add a base address to a register:

     mov ax,table[si]

In the example above, the displacement table is the address of an array; SI holds an index to an array element. (Unlike C, an assembly-language index always indicates the distance in bytes between the beginning of the array and the element.) Each time the instruction executes, it may load a different element into AX. The value of SI determines which array element to load.

Each displacement can be an address or numeric constant. If there is more than one displacement, the assembler adds them all together at assembly time, and places the total displacement into the executable code. For example, in the statement

     mov ax,table[bx][di]+6

both table and 6 are displacements. The assembler adds the value of table to 6 to get the total displacement.

Table 2.2 shows the modes in which registers can be used to specify indirect memory operands.

Table 2.2 Indirect Addressing Modes

Mode                Syntax                  Description
──────────────────────────────────────────────────────────────────────────
Register indirect   [BX]      [BP]          Effective address is contents
                    [SI]      [DI]          of register
──────────────────────────────────────────────────────────────────────────
Based or indexed    displacement[BX]        Effective address is contents
                    displacement[BP]        of register plus displacement
                    displacement[DI]
                    displacement[SI]
──────────────────────────────────────────────────────────────────────────
Based indexed       [BX][DI]  [BP][DI]      Effective address is contents
                    [BX][SI]  [BP][SI]      of base register plus contents
                                            of index register
──────────────────────────────────────────────────────────────────────────
Based indexed with  displacement[BX][DI]    Effective address is the sum
displacement        displacement[BP][DI]    of base register, index
                    displacement[BX][SI]    register, plus displacement
                    displacement[BP][SI]
──────────────────────────────────────────────────────────────────────────

You can enclose each register in its own pair of brackets, or you can place the registers in the same pair of brackets separated by a plus sign (+). The period (.) is normally used with structures, but it also indicates addition. The following statements are equivalent:

     mov ax,table[bx][di]
     mov ax,table[bx+di]
     mov ax,[table+bx+di]
     mov ax,[bx][di].table
     mov ax,[bx][di]+table
     mov ax,table[di][bx]

2.7 Segmented Addressing and Segment Registers

"Segmented addressing" is the internal mechanism that enables the processor to address up to one megabyte of main memory. This mechanism accesses each physical memory location by combining two 16-bit addresses. The two addresses can be represented in source code as follows:

     segment:offset

The first 16-bit address is the "segment address." The second 16-bit address is the "offset address." In effect, the segment address selects a 64K region of memory, and the offset address selects a byte within this region. Here's how it works:

  1. The processor shifts the segment address left by four bits, producing a 20-bit address ending in four zeros. This operation has the effect of multiplying the segment address by 16.

  2. The processor adds this 20-bit address to the 16-bit offset address. The offset address is not shifted.

  3. The processor uses the resulting 20-bit address, often called the "physical address," to access an actual location in the one-megabyte address space.

Figure 2.3 illustrates this process.

The 8086-family processors were developed to use this mechanism because 16 bits (the size of an 8086 register) can only address 64K at a time. However, the combined 20-bit address is sufficient to address a full megabyte. Note that DOS and ROM BIOS reserve part of this area, so that no more than 640K is available for program addresses.
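The three steps above, together with the effective-address rule from Section 2.6.4, can be sketched in C as follows. This is an illustrative model only, and the function names are hypothetical. It assumes that the sum of base, index, and displacement wraps around within 64K (as 16-bit offset arithmetic does), and that the physical address wraps within one megabyte because the 8086 has only 20 address lines.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of an operand such as displacement[BX][SI]:
   the sum of base, index, and displacement, truncated to 16 bits. */
uint16_t effective_address(uint16_t base, uint16_t index, uint16_t disp) {
    return (uint16_t)(base + index + disp);
}

/* Physical address for segment:offset, following the three steps. */
uint32_t physical_address(uint16_t segment, uint16_t offset) {
    uint32_t shifted = (uint32_t)segment << 4;  /* step 1: segment times 16 */
    uint32_t sum = shifted + offset;            /* step 2: add unshifted offset */
    return sum & 0xFFFFFu;                      /* step 3: keep 20 address bits */
}
```

Calling physical_address(0x53C2, 0x107A) yields 0x54C9A, the same physical address worked out by hand in the example that follows the figure.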
┌────────────────────────────────────────────────────────────────────────┐
│ This figure can be found in Section 2.7 of the manual │
└────────────────────────────────────────────────────────────────────────┘

A "segment" consists of a series of addresses that share the same segment address, but different offsets. Segments can be no more than 64K in size. To create large programs, you need to divide your program into multiple segments. Even with smaller programs, it is convenient to have separate code, data, and stack segments. (With tiny-model programs, the linker combines these segments into a single physical segment.)

The following example helps illustrate segmented-address calculations further. The processor calculates the address 53C2:107A by multiplying the segment portion of the address by 16 (10H), and then adding the offset portion, as shown below:

       53C20h    Segment times 10h
     +  107Ah    Offset
       54C9Ah    Physical address

The use of segmented architecture doesn't mean that you have to specify two addresses every time you access memory. The 8086-family processors use four segment registers, which simplify programming in the following ways:

  ■ Normally, you don't specify a segment address when you access data. Every data reference is relative to one of the four segment registers──CS, DS, SS, or ES──so the segment address is implied.

  ■ Most of the time, you don't need to tell the processor which segment register to use. By default, the processor uses CS for code addresses, DS for data addresses, and SS for stack addresses, except where otherwise noted in this section.

  ■ You initialize segment registers at the beginning of your program. Once initialized, you can continue to use the segment addresses stored in those registers. If the program uses medium, large, huge, or compact model, you may need to periodically reload one or more of the segment registers. These memory models let you use more than 64K of code or 64K of data.
    However, if the program uses small or tiny model, you never reload a segment register except in the following situations: to access a special hardware-defined location in memory, such as the video-display area, or to access far memory allocated to the program by DOS function 48H.

Although each memory operand has a default segment register (usually DS, unless the operand uses BP), you can specify another segment register by using the segment override operator (:). The following example loads the variable far_away residing in the segment pointed to by ES:

     mov ax,es:far_away

For more information on this operator, see Section 9.2.3, "Segment-Override Operator."

The CS Register

The processor always uses the CS (Code Segment) register as the segment address of the next instruction to execute; IP (Instruction Pointer) holds the offset address. CS:IP represents the full address of the next instruction.

Near jumps and procedure calls alter the value of IP. Far jumps and procedure calls alter both CS and IP. You never alter CS directly because the far jump and call instructions do so automatically. Furthermore, DOS initializes CS for you at the beginning of the program.

The DS Register

By default, the processor uses the DS (Data Segment) register as the segment address for program data. String instructions and indirect memory operands present some exceptions to this rule. With indirect memory operands, the use of BP anywhere in the operand causes SS to be the default segment register. Otherwise, DS is the default.

All the Microsoft standard memory models place the most frequently used data in an area pointed to by DS. This area is commonly called the "default data area," and it can be no larger than 64K. These memory models use the ES register to access data outside the default data area. Your own programs can either use this technique, or else reload DS whenever you enter a new module.
The standard method has the advantage of providing fast access to the most frequently used data.

The SS Register

When the processor accesses data on the stack, it uses the SS (Stack Segment) register as the segment register. (See the description of SP in Section 2.5.3.2 for more information about the stack.) Thus, SS:SP always points to the current stack position. Indirect memory operands involving BP also use SS as the default segment register.

The Microsoft standard memory models set SS equal to DS. This setting makes some programming tasks easier. In particular, it lets you use either segment register to address items on the stack or in the default data area. If you have to reload DS, you can always access items in the default data area by using an SS override.

The ES Register

The ES (Extra Segment) register is convenient for accessing data outside of the default data area. As demonstrated in Section 3.4, "Decimal Conversion with Far Data Pointers," you access far data by loading ES with the desired segment address, and then giving a segment override. Section 13.3.2, "Loading Far Pointers," provides further information.

ES also plays a role in string instructions. With these instructions, the DI (Destination Index) register is always relative to the segment address in ES.

────────────────────────────────────────────────────────────────────────────
Chapter 3: Writing Assembly Modules for C Programs

As a C programmer, you can take advantage of the superior speed and compactness of assembly-language routines. You can write most of your program in C, then write time-critical routines in assembly language.

This chapter presents QuickAssembler programming techniques for interfacing to C. You can use similar techniques to interface with other languages. By using C with assembly language, however, you gain the advantage of being able to develop the entire program from within the integrated environment.
If you've read Chapter 2, read this chapter to see how to use assembly language in a complete example module. If you skipped over Chapter 2, you may want to refer to it occasionally for basic concepts, such as instructions and registers.

3.1 A Skeleton for Procedure Modules

Let's start by looking at the skeleton of a module with one procedure. The "skeleton" consists of statements that give basic structure to the module. Within this structure, you can supply almost any instructions you want. Later sections of this chapter flesh out the skeleton by supplying useful code.

The following skeleton assumes that the module is called by a small-model C program, and consists of one procedure that takes a single parameter, a pointer to a byte:

     .MODEL  small,c
     .CODE
     dectoint PROC Array:PTR BYTE
     ;
     ; (supply executable code here)
     ;
     dectoint ENDP
              END

Some features of the skeleton change when you write different procedures. Other parts may remain the same. In particular, you'll need to add a PROC and ENDP statement each time you add another procedure to the module. Before looking at a full program example, let's examine each part of the skeleton.

3.1.1 The .MODEL Directive

The .MODEL directive gives general information about the module. It uses the following syntax:

     .MODEL memorymodel [[,langtype [[,stacktype]]]]

The last two fields are optional. Commas are field separators and are only required if you use more than one field. Usually, you'll want to enter values in the first two fields.

The memorymodel and langtype fields correspond to the memory model and language, respectively, of the calling module. If your C program declares your procedure to be of type pascal or fortran, use Pascal, BASIC, or FORTRAN in the langtype field. These keywords specify the use of the non-C calling and naming conventions. Otherwise, specify C as the langtype. Although the langtype field is optional, you should supply it, since the PROC features described later in this chapter require it.
Don't use the stacktype field unless the calling C program is compiled with SS not equal to DS, in which case you should type in farStack. (QuickC does not generate code that sets SS not equal to DS, but other versions of Microsoft C do support this option.) The default is nearStack, which assumes SS is equal to DS.

3.1.2 The .CODE Directive

The .CODE directive marks the beginning of the code segment, which is the section of your program that contains the actual steps to execute:

     .CODE

Statements that follow this directive are considered part of the code segment. The segment continues to the end of the module or the next segment directive. Typically, the code segment consists of instructions and procedure definitions. It can also contain macro calls.

Some procedures work with static data. In Chapter 4, "Writing Stand-Alone Assembly Programs," you'll see how to declare a data segment in which you can place data declarations.

3.1.3 The PROC Directive

Use the PROC directive to define a procedure. The name of the procedure appears in the first column:

     dectoint PROC Array:PTR BYTE

Because the .MODEL statement specified C-language conventions, the assembler prefixes the name dectoint with an underscore (_), and places the name into object code as a public code label.

If your procedure alters registers that should be preserved, the optional USES keyword automatically generates code to push the value of these registers on the stack and pop them when the procedure returns. Procedures called by C should not corrupt the value of SI, DI, or the segment registers CS, DS, or SS. (The value of BP is automatically preserved.) The following example shows how to preserve SI and DI for a procedure that changes these registers:

     dectoint PROC USES si di, Array:PTR BYTE

The last part of the statement declares one or more parameters. In this case, the procedure declares a single parameter, Array, as a pointer to a byte.
The most common parameter types you can declare are listed below:

Declaration     Meaning
──────────────────────────────────────────────────────────────────────────
WORD            Word (two bytes)
DWORD           Doubleword (four bytes)
PTR BYTE        Pointer to a byte; most commonly, a pointer to a
                character string
PTR WORD        Pointer to a word; typically, the address of an array
                of integers
PTR DWORD       Pointer to a doubleword

For example, the following procedure definition declares a procedure named MidStr, which takes as parameters two pointers to character strings and one integer:

     MidStr PROC Str1:PTR BYTE, Str2:PTR BYTE, Index:WORD

References to parameters are really references to locations on the stack. C modules pass parameters by pushing them on the stack just before calling the procedure. The BP register serves as a framepointer (a pointer to the procedure's stack area), and each parameter is an offset from BP. The exact offset of each parameter depends on the memory model and calling convention, both established by the .MODEL directive.

When you use QuickAssembler procedure definitions, the assembler automates the work of referring to parameters. Instead of setting up the framepointer or calculating parameter offsets, you simply refer to parameters by name. You can also use these names with debugging commands.

Appendix A, "Mixed-Language Mechanics," shows the actual code that establishes BP as the framepointer. It also shows how to calculate parameter offsets. Section 6.4.3, "Procedure Labels," gives the complete syntax and rules for using the PROC statement.

3.1.4 The ENDP and END Statements

The module ends with two statements: ENDP, which declares the end of a procedure, and END, which declares the end of the module:

     dectoint ENDP
              END

You can place any number of procedures in the same module. Each time you end a procedure, use ENDP. However, END should only occur once, at the end of the module.
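The parameter-offset calculation mentioned in Section 3.1.3 can be sketched in C. This is a hypothetical helper, not part of QuickAssembler; Appendix A remains the authoritative description. The sketch assumes the usual frame set up for a PROC with parameters (PUSH BP followed by MOV BP,SP), a 2-byte return address for near code (small and compact models) or a 4-byte one for far code, that every parameter occupies an even number of bytes, and that C pushes arguments right to left while the Pascal convention pushes them left to right.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: which order the caller pushed the arguments in. */
typedef enum { CONV_C, CONV_PASCAL } Convention;

/* sizes[i]   size in bytes of parameter i, in declaration order
   far_code   nonzero if the procedure is far (4-byte return address)
   offsets[i] receives the BP-relative offset of parameter i */
void param_offsets(const int *sizes, size_t n, int far_code,
                   Convention conv, int *offsets) {
    /* [BP] holds the saved BP (2 bytes); the return address lies above it. */
    int next = 2 + (far_code ? 4 : 2);

    if (conv == CONV_C) {
        /* Pushed right to left: the first parameter is nearest BP. */
        for (size_t i = 0; i < n; i++) {
            offsets[i] = next;
            next += sizes[i];
        }
    } else {
        /* Pushed left to right: the last parameter is nearest BP. */
        for (size_t i = n; i-- > 0;) {
            offsets[i] = next;
            next += sizes[i];
        }
    }
}
```

Under these assumptions, the small-model MidStr procedure shown earlier (two 2-byte near pointers and a WORD) would find Str1, Str2, and Index at [BP+4], [BP+6], and [BP+8].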
3.2 Instructions Used in This Chapter

The instructions below were introduced in Chapter 2, "Introducing 8086 Assembly Language." They are summarized here briefly for review. The first group of instructions manipulates data:

Instruction               Description
──────────────────────────────────────────────────────────────────────────
MOV destination,source    Copies value of source to destination
ADD destination,source    Adds source to destination, storing result in
                          destination
SUB destination,source    Subtracts source from destination, storing
                          result in destination
INC destination           Increment──adds 1 to destination
DEC destination           Decrement──subtracts 1 from destination
MUL source                Multiplies source by AX (if operand is 16 bits),
                          storing high 16 bits in DX and low 16 bits in AX

The second group of instructions controls the flow of program execution:

Instruction               Description
──────────────────────────────────────────────────────────────────────────
CMP destination,source    Compare──subtracts source from destination,
                          ignoring result but setting processor flags
                          appropriately
JE label                  Jumps to label if the last comparison found the
                          operands equal (that is, if the result of the
                          last operation was zero)
JAE label                 Jumps to label if, in the last comparison, the
                          destination was above or equal to the source
                          (unsigned comparison)
JMP label                 Jumps unconditionally to label

3.3 Decimal Conversion Example

This section uses a decimal-conversion example to illustrate the use of some basic instructions and directives. It features an assembly module that takes a pointer to a null-terminated string of decimal digits as input and returns its value as an unsigned integer.

You can compute the value of a decimal string by multiplying each digit by a power of 10:

     2035 = 2 x 10 cubed + 0 x 10 squared + 3 x 10 + 5

One way to calculate the value of the number is to calculate each power of 10 separately, then multiply each digit by the corresponding power. For example, you can calculate 10 cubed, and then multiply by 2.
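For comparison, the straightforward method just described can be sketched in C. The function name dectoint_naive is hypothetical, and, like the rest of this chapter, the sketch assumes the string holds only the characters 0 through 9. It recomputes every power of 10 from scratch, which is the repeated work the more efficient algorithm avoids.

```c
#include <assert.h>
#include <string.h>

/* Naive conversion: weight each digit by its own power of 10. */
unsigned int dectoint_naive(const char *digits) {
    size_t len = strlen(digits);
    unsigned int total = 0;

    for (size_t i = 0; i < len; i++) {
        /* Digit i (counting from the left) is weighted by 10^(len - 1 - i). */
        unsigned int power = 1;
        for (size_t k = 0; k < len - 1 - i; k++)
            power *= 10;
        total += (unsigned int)(digits[i] - '0') * power;
    }
    return total;
}
```

For a string of n digits this performs on the order of n squared multiplications, compared with about n for the running-total approach.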
A much more efficient algorithm combines the calculations for powers of 10. The algorithm adds each digit to a running total, then multiplies the total by 10 after every digit but the last. The following pseudo-code represents this algorithm, and assumes that the first character in the string is the most significant digit:

     initialize total to 0
     while there's another digit
         add value of digit to total
         advance to next digit
         if no more digits
             we're done
         else
             multiply total by 10

A simple C program that calls the procedure might look like this:

     #include <stdio.h>
     #include <string.h>

     extern unsigned int dectoint( char * );

     int main( void )
     {
         char digits[81];

         if( fgets( digits, sizeof digits, stdin ) == NULL )
             return 1;
         digits[strcspn( digits, "\n" )] = '\0';    /* Strip newline */
         printf( "Numeric value is: %u\n", dectoint( digits ) );
         return 0;
     }

The procedure itself could be written in C as:

     unsigned int dectoint( char *Array )
     {
         unsigned int total = 0;          /* Initialize total            */

         while( *Array != '\0' )          /* While there's another digit */
         {
             total += *Array - '0';       /* Add value to total          */
             Array++;                     /* Advance to next digit       */
             if( *Array == '\0' )         /* If no more digits,          */
                 break;                   /*   we're done                */
             total *= 10;                 /* Else, multiply by 10        */
         }
         return( total );
     }

This chapter shows how to write the same procedure in assembly language. The assembly-language version will be faster because it can make strategic use of registers and choose optimal instructions.

You can write a main module with C code, place the assembly routine in a separate module with a .ASM extension, then link them together by creating a program list.

──────────────────────────────────────────────────────────────────────────
NOTE
You can build mixed-language programs by placing both .C and .ASM files in a program list. Place the main module first. In the Assembler Flags dialog box, make sure that you select either Preserve Case or Preserve Extrn (the default). From the QCL command line, use the /Cl (preserve case) or /Cx (preserve case of external symbols) option. QC calls the linker with case sensitivity on, so C and assembler symbols must match exactly.
──────────────────────────────────────────────────────────────────────────

Before writing the assembly procedure, we first need to develop a strategy for using registers.

The AX (Accumulator) register is ideal for keeping the running total. The algorithm changes this total through both addition and multiplication, and the MUL instruction requires the use of AX. By keeping the total in AX at all times, the procedure avoids having to reload this register constantly.

The BX register should be used to access the individual digits. The procedure receives the address of the digit string, and then retrieves each ASCII byte through pointer indirection. BX is one of the few registers that supports this operation. SI and DI could also be used this way, but C-generated code requires that SI and DI be preserved. BX can be freely altered.

The procedure needs two more registers: one for holding the multiplication factor (10), and another for adjusting the binary value of the digit. The procedure uses CX and DX for these purposes. The factor must go in CX rather than DX, because MUL with a 16-bit operand writes the high word of the product to DX, which would destroy a factor stored there. CX is also the register needed in the hex conversion example, for a special kind of multiplication: shifting bits. We use DX as an intermediate location to receive a byte and then add a word to AX. Each MUL leaves DX equal to 0 as long as the total fits in 16 bits, so DH stays clear when the next digit is loaded into DL.

The complete assembly-language module is shown below:

        .MODEL  small,c
        .CODE

dectoint PROC   Array:PTR BYTE

        sub     ax,ax               ; ax = 0
        mov     bx,Array            ; bx = Array
        mov     cx,10               ; factor = CX = 10
        sub     dx,dx               ; dx = 0
        cmp     BYTE PTR [bx],0     ; Compare byte to NULL
        je      done                ; If byte=0 we're done

top:    mov     dl,BYTE PTR [bx]    ; Get next digit
        sub     dl,'0'              ; Convert numeral
        add     ax,dx               ; Add to total
        inc     bx                  ; Point to next byte
        cmp     BYTE PTR [bx],0     ; Compare byte to NULL
        je      done                ; If byte=0 we're done
        mul     cx                  ; AX = AX * 10
        jmp     SHORT top           ; Goto top of loop

done:   ret                         ; Exit procedure
dectoint ENDP
        END

We'll examine each section of the module in turn.
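Before the walkthrough, the register strategy can be cross-checked with a C model of the loop. This is a hypothetical illustration, not code from this guide: the name dectoint_model is invented, uint16_t variables stand in for the 16-bit registers, and a char pointer stands in for BX. The point of interest is how the MUL step overwrites DX:

```c
#include <stdint.h>

/* Hypothetical register-level model of dectoint (small model).
   Each uint16_t plays the role of one 16-bit 8086 register. */
uint16_t dectoint_model(const char *array)
{
    uint16_t ax = 0;              /* sub ax,ax   : running total          */
    const char *bx = array;       /* mov bx,Array: pointer to next digit  */
    const uint16_t cx = 10;       /* mov cx,10   : multiplication factor  */
    uint16_t dx = 0;              /* sub dx,dx   : clears DH once         */

    if (*bx == '\0')              /* cmp BYTE PTR [bx],0 / je done        */
        return ax;

    for (;;) {
        /* mov dl,[bx] and sub dl,'0' change only DL; DH is untouched. */
        uint8_t dl = (uint8_t)((uint8_t)*bx - '0');
        dx = (uint16_t)((dx & 0xFF00u) | dl);
        ax = (uint16_t)(ax + dx); /* add ax,dx   : add digit to total     */
        bx++;                     /* inc bx      : point to next byte     */
        if (*bx == '\0')          /* cmp BYTE PTR [bx],0 / je done        */
            return ax;
        /* mul cx: DX:AX = AX * CX. DX receives the high word of the
           product, which is 0 whenever the product fits in 16 bits. */
        uint32_t product = (uint32_t)ax * cx;
        ax = (uint16_t)(product & 0xFFFFu);
        dx = (uint16_t)(product >> 16);
    }
}
```

The model shows why DX needs to be cleared only once before the loop: MOV DL writes just the low byte, and on later iterations MUL itself leaves DX (and therefore DH) at 0 as long as the product fits in 16 bits. If the total overflows, DH can pick up stray bits, but by then the 16-bit result is already incorrect.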
The first three statements are directives that form part of the module's skeleton. The PROC directive, when used with one or more parameters as it is here, generates code to set the frame pointer (BP) properly so that you can access parameters.

        .MODEL  small,c
        .CODE
dectoint PROC   Array:PTR BYTE

The rest of the module consists of instructions──the actual core of the program. The first four instructions initialize the registers AX, BX, CX, and DX. Note that when initializing a register to 0, the procedure uses SUB in preference to MOV. Any value subtracted from itself leaves zero in the destination operand. Although the result is the same, the SUB instruction is smaller and faster because it involves no immediate data.

        sub     ax,ax               ; ax = 0
        mov     bx,Array            ; bx = Array
        mov     cx,10               ; factor = CX = 10
        sub     dx,dx               ; dx = 0

The next two instructions handle a special case: a string containing no digits at all. Recall that the procedure is passed a null-terminated string. The operand BYTE PTR [bx] is a memory operand referring to the byte pointed to by BX. If the string is empty, Array points to a null byte. The two instructions test for a 0 (null) value and jump to the end of the procedure if 0 is detected:

        cmp     BYTE PTR [bx],0     ; Compare byte to NULL
        je      done                ; If byte=0 we're done

In the CMP instruction above, the BYTE PTR operator is strictly required, because otherwise the assembler would have no way of knowing whether to compare 0 to the byte or to the word pointed to by BX. However, when one of the operands is a register (as is the case with the MOV instruction below), the BYTE PTR operator is optional, since the size of the register determines the size of the operation.

The next eight instructions form a loop executed once for every digit character in the string. The label top marks the top of the loop, and the first three instructions add the value of the digit to AX:

top:    mov     dl,BYTE PTR [bx]    ; Get next digit
        sub     dl,'0'              ; Convert numeral
        add     ax,dx               ; Add to total

The first instruction above retrieves the digit.
The next instruction converts the digit's ASCII value to its numeric value by subtracting the value of the character '0' (48 decimal). This works because the ASCII character set places the digit characters 0 through 9 in consecutive order. Finally, the procedure adds the resulting value to the running total stored in AX. Note that the operands in each case are the same size: the first two instructions above access DL, the low byte of DX, and the ADD instruction combines two words.

The next three instructions advance to the next byte in the string and test it for equality to zero. Getting the next byte is just a matter of adding 1 to BX (with the INC instruction), so that BX points to the next byte. The other two instructions are identical to the earlier instructions that tested for a zero value.

        inc     bx                  ; Point to next byte
        cmp     BYTE PTR [bx],0     ; Compare byte to NULL
        je      done                ; If byte=0 we're done

If the next byte is a null byte, the processor jumps to the end of the procedure. Otherwise, the processor continues executing the bottom of the loop, which multiplies the current total by 10 (stored in CX), and then jumps to the top:

        mul     cx                  ; AX = AX * 10
        jmp     SHORT top           ; Goto top of loop

Notice the operator SHORT used with the JMP instruction. This optional operator makes the encoded instruction smaller and faster, but it can be used only if the destination of the jump lies within -128 to +127 bytes of the instruction following the jump. SHORT is explained in more detail in Section 9.2.4.2.

The loop is now complete. The rest of the module exits and marks the end of the procedure and the module. The RET statement causes the assembler to generate instructions to do the following: restore the stack, restore the frame pointer (BP), and return properly for the memory model (small) and calling convention (C).

done:   ret                         ; Exit procedure
dectoint ENDP
        END

Microsoft high-level languages always look for function return values in AX, if two bytes long, or in DX and AX, if four bytes long.
If the return value is longer than four bytes, DX:AX points to the value returned. If the return value is one byte, AL contains the value. The C module that calls this procedure looks in AX for the return value, as does all high-level-language code that calls a function returning a two-byte value. In this case, AX already contains the result of the calculation, so no further action is required.

3.4 Decimal Conversion with Far Data Pointers

This section uses the same basic algorithm introduced in the last section, but presents coding techniques for different memory models. The .MODEL directive resolves all differences in the size of code addresses. However, when you use memory models that use far data pointers (compact, large, and huge), you must make some additional adjustments.

The program below shows the module rewritten for large memory model. This example also works for compact model if large in the first line is replaced with compact.

        .MODEL  large,c
        .CODE

dectoint PROC   USES ds, Array:PTR BYTE

        sub     ax,ax               ; ax = 0
        lds     bx,Array            ; ds:bx = Array
        mov     cx,10               ; factor = CX = 10
        sub     dx,dx               ; dx = 0
        cmp     BYTE PTR [bx],0     ; Compare byte to NULL
        je      done                ; If byte=0 we're done

top:    mov     dl,BYTE PTR [bx]    ; Get next digit
        sub     dl,'0'              ; Convert numeral
        add     ax,dx               ; Add to total
        inc     bx                  ; Point to next byte
        cmp     BYTE PTR [bx],0     ; Compare byte to NULL
        je      done                ; If byte=0 we're done
        mul     cx                  ; AX = AX * 10
        jmp     SHORT top           ; Goto top of loop

done:   ret                         ; Exit procedure
dectoint ENDP
        END

This procedure is the same as the one in the last section, except for two lines. The PROC directive now includes a USES clause, and the LDS instruction replaces the first MOV instruction.

The procedure loads the DS register with the segment address of Array, thus causing subsequent data references to be relative to the new segment address. However, procedures called from C must preserve DS. The PROC statement, therefore, includes USES ds, which generates code to save DS on the stack on entry and restore it before the procedure returns.
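The split that the procedure relies on, one far pointer becoming a DS value and a BX value, can be modeled in C. This is a hypothetical sketch: the names lds_model, lds_result, and physical_address are invented for illustration, and it assumes the far pointer is held as a 32-bit value with the segment in the upper word and the offset in the lower word. It also shows how the 8086 combines segment and offset into a 20-bit physical address:

```c
#include <stdint.h>

/* Hypothetical result of "lds bx,Array": the two registers it loads. */
typedef struct {
    uint16_t ds;   /* segment: taken from the upper word of the pointer */
    uint16_t bx;   /* offset:  taken from the lower word of the pointer */
} lds_result;

/* Models LDS on a 4-byte far pointer stored as a 32-bit value,
   with the segment in the upper word and the offset in the lower word. */
lds_result lds_model(uint32_t far_ptr)
{
    lds_result r;
    r.bx = (uint16_t)(far_ptr & 0xFFFFu);
    r.ds = (uint16_t)(far_ptr >> 16);
    return r;
}

/* Physical address formed by the 8086 from segment:offset
   (segment * 16 + offset). The 8086 has 20 address lines,
   so the sum wraps around at 1 MB. */
uint32_t physical_address(uint16_t seg, uint16_t off)
{
    return (((uint32_t)seg << 4) + off) & 0xFFFFFu;
}
```

For example, a far pointer value of 12345678H yields DS = 1234H and BX = 5678H, which together address physical location 179B8H.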
The LDS instruction (Load Data Segment) does the actual loading of the DS register. This instruction is similar to the MOV instruction:

        mov     bx,Array            ; bx = Array     ; 2-byte data pointer
        lds     bx,Array            ; ds:bx = Array  ; 4-byte data pointer

The LDS instruction accomplishes two moves. First, it loads the offset portion of the pointer into the specified register (BX). Second, it loads the segment portion of the pointer into DS.

──────────────────────────────────────────────────────────────────────────
NOTE  For the LDS and LES instructions to work properly, the segment portion must be stored in the upper word of the four-byte (far) pointer. C meets this requirement by always pushing the segment portion of the pointer on the stack first. (The stack grows downward, so the segment, pushed first, ends up at the higher address.)

In your own programs, you declare far pointers with the DD directive. You initialize them by loading a segment address into the upper word of the pointer variable and an offset address into the lower word.
──────────────────────────────────────────────────────────────────────────

3.4.1 Writing a Model-Independent Procedure

In the case of this procedure, the LDS instruction is the most convenient technique. Once DS is loaded with the new segment address, all subsequent memory references are automatically correct. No further adjustments are needed.

The simplicity of this technique makes it easy to write a module that is completely independent of memory models. This module can then be linked with any C program. To change the memory model, you simply change the .MODEL directive and reassemble. In fact, the memory model itself can even be specified with an assembler option, so that the source code never needs to change.

The model-independent version contains only a few lines different from the previous example:

%       .MODEL  mem,c
        .CODE

dectoint PROC   USES ds, Array:PTR BYTE

        sub     ax,ax               ; ax = 0
        IF      @DataSize
        lds     bx,Array            ; ds:bx = Array
        ELSE
        mov     bx,Array            ; bx = Array
        ENDIF

The .MODEL directive operates on a symbol, mem, that is not defined in the source file.
You define this symbol on the QCL command line or in the Assembler Flags dialog box. For example, to assemble in compact model, enter the following text in the Defines text box (or give it as an option on the QCL command line):

        /Dmem=compact

The percent sign (%) at the beginning of the .MODEL line causes the assembler to expand mem to its defined value before processing the directive.

The IF, ELSE, and ENDIF directives cause conditional assembly. The @DataSize predefined macro is nonzero (true) if the memory model uses far data pointers, and 0 (false) otherwise. The statement IF @DataSize begins a conditional-assembly block that assembles the LDS instruction if the memory model uses far data pointers; it assembles the MOV instruction otherwise. For more information on conditional assembly, see Chapter 10, "Assembling Conditionally."

The USES clause is retained for all memory models, since even with small model it does no harm. However, to increase efficiency, you may wish to place two versions of the PROC statement in conditional-assembly blocks, so that DS is saved and restored only when the model uses far data.

3.4.2 Accessing Far Data through ES

The LDS instruction is inconvenient if you need to access items in the default data segment, because once DS is reloaded it no longer points to that area of memory. Therefore, it's sometimes more efficient to leave DS alone and use the ES register to access far data.

In the standard C memory models that use far data pointers, compiled code uses the LES instruction to access far data. You can also use this method, but it is not required, since it has no effect on the interface between modules.

Use the LES instruction to load a far data pointer; this loads the ES register with the new segment address. Then use the ES override whenever you refer to data in the far segment. This method requires alteration of all instructions that access the string data:

        les     bx,Array            ; es:bx = Array
        .
        .
        .
        cmp     es:BYTE PTR [bx],0  ; Compare byte to NULL

Once ES is loaded with the segment address of far data, access objects in the default data area (the segment containing near data) as you normally would. Use the ES override to access the far data.
3.5 Hexadecimal Conversion Example

The following example builds on the decimal example in Section 3.3, adding the logic needed to convert hexadecimal rather than decimal strings. Hexadecimal conversion can use an algorithm similar to the one used earlier for decimal conversion, with these adjustments:

  ■ The procedure multiplies the running total by 16, not 10.

  ■ The procedure converts the letters A-F to numeric values, in addition to converting the numerals 0-9.

You could make the first adjustment by loading CX with 16 instead of 10. A much more efficient method is to use the SHL (Shift Left) instruction to shift an object's bits left by four places. This effectively multiplies the object by 16.

The second adjustment requires more complex logic. Hexadecimal digits can consist of either letters or numerals. The procedure must consider three ranges of characters (0-9, A-F, and a-f), but once letters are converted to uppercase, the two letter ranges can be handled as a single case:

Range of Characters   Conversion Required
──────────────────────────────────────────────────────────────────────────
0-9                   Convert to face value. Subtract ASCII value of '0'.

A-F and a-f           Convert to values 10-15. Convert all letters to
                      uppercase, then subtract ASCII value of 'A' and
                      add 10.

We convert all letters to uppercase in an optimized fashion by taking advantage of the ASCII coding sequence. Uppercase letters are coded as 41H onward. Lowercase letters are coded as 61H onward. Consequently, each lowercase letter differs from the corresponding uppercase letter by exactly one bit: bit 5, which has the value 20H. We use the AND instruction, with the immediate operand 0DFH, to mask out this bit. This operation has the effect of setting the third highest bit to 0.
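The masking step and the two conversion cases can be checked with a short C sketch. The function names fold_upper and hextoint_model are hypothetical, and, like the assembly procedure, hextoint_model does not reject characters that are not hexadecimal digits:

```c
#include <stdint.h>

/* AND with 0DFh clears bit 5 (20H): 'a'-'z' become 'A'-'Z',
   and uppercase letters pass through unchanged. */
uint8_t fold_upper(uint8_t c)
{
    return (uint8_t)(c & 0xDFu);
}

/* Hypothetical C model of hextoint. Like the assembly version, it does
   not check that letters lie in the range A-F. */
uint16_t hextoint_model(const char *s)
{
    uint16_t total = 0;                    /* sub ax,ax */

    while (*s != '\0') {
        uint8_t c = (uint8_t)*s;
        uint8_t digit;

        if (c >= 'A')                      /* cmp dl,'A' / jae isletter */
            digit = (uint8_t)(fold_upper(c) - ('A' - 10));
        else
            digit = (uint8_t)(c - '0');    /* sub dl,'0' */

        total = (uint16_t)(total + digit); /* add ax,dx */
        s++;                               /* inc bx */
        if (*s == '\0')
            break;                         /* je done */
        total = (uint16_t)(total << 4);    /* shl ax,cl with CL = 4 */
    }
    return total;
}
```

For instance, hextoint_model("1aF") returns 0x1AF (431): the lowercase a is folded to A before 'A' - 10 is subtracted.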
             0110 0001   61h = 'a'              0100 0001   41h = 'A'
      AND    1101 1111   DFh             AND    1101 1111   DFh
             ======================             ======================
      result 0100 0001   41h = 'A'       result 0100 0001   41h = 'A'

             0110 0010   62h = 'b'              0100 0010   42h = 'B'
      AND    1101 1111   DFh             AND    1101 1111   DFh
             ======================             ======================
      result 0100 0010   42h = 'B'       result 0100 0010   42h = 'B'

The beauty of the operation is that it converts lowercase letters to uppercase, but leaves uppercase letters alone. If the third highest bit is already 0 (as is the case with uppercase letters), doing an AND operation with 0DFH has no effect. This operation removes the need to handle lowercase letters as a separate case.

The revised algorithm does the following:

     initialize total to zero
     while there's another digit
         move byte to temporary location
         if ascii value < 'A'
             subtract '0'
         else
             convert lowercase to uppercase
             subtract 'A'-10
         add byte value to total
         advance to next digit
         if no more digits
             we're done
         else
             shift total left by four bits

The assembly-language code below implements this algorithm. The code tests for each range, performing a different conversion for each case. Note the use of JAE (Jump If Above or Equal), which jumps to the specified label if, in the previous comparison, the first operand was greater than or equal to the second (treating both as unsigned values).

        .MODEL  small,c
        .CODE

hextoint PROC   Array:PTR BYTE

        sub     ax,ax               ; ax = 0
        mov     bx,Array            ; bx = Array
        mov     cl,4                ; Prepare to shift left by 4
        sub     dx,dx               ; dx = 0
        cmp     BYTE PTR [bx],0     ; Compare byte to NULL
        je      done                ; If byte=0 we're done

top:    mov     dl,BYTE PTR [bx]    ; Move byte to DL
        cmp     dl,'A'              ; ASCII value >= 'A'?
        jae     isletter            ; If so, goto isletter
        sub     dl,'0'              ; Convert ascii to numeric
        jmp     addbyte             ; Go add value of byte

isletter:
        and     dl,0DFh             ; Convert to uppercase
        sub     dl,'A'-10           ; Convert ascii to numeric

addbyte:
        add     ax,dx               ; Add value to total
        inc     bx                  ; Point to next byte
        cmp     BYTE PTR [bx],0     ; Compare byte to NULL
        je      done                ; If byte=0 we're done
        shl     ax,cl               ; AX = AX * 16
        jmp     SHORT top           ; Goto top of loop

done:   ret
hextoint ENDP
        END

The beginning of the procedure initializes the CL register to 4. This step is necessary because on the 8086 the SHL (Shift Left) instruction can shift in only two ways: by exactly one bit, or by the number of places indicated in CL. Using CL produces more compact code than a sequence of four one-bit shift instructions.

The main loop reads a character, tests it, and makes one basic decision: is the character a letter or not? This test takes advantage of the ASCII coding sequence. If the value of the character is equal to or greater than 'A', it cannot be one of the digits 0-9. The procedure uses the JAE instruction (Jump If Above or Equal) to test for this condition.

top:    mov     dl,BYTE PTR [bx]    ; Move byte to DL
        cmp     dl,'A'              ; ASCII value >= 'A'?
        jae     isletter            ; If so, goto isletter

If the character is a letter, the procedure first converts the letter to uppercase, using an AND instruction that converts lowercase letters but leaves uppercase letters unchanged. The following instruction can then handle all letters the same way, regardless of their original case:

isletter:
        and     dl,0DFh             ; Convert to uppercase
        sub     dl,'A'-10           ; Convert ascii to numeric

For simplicity, the procedure accepts invalid letters (such as G through Z). You could easily enhance it to verify that the letters are hexadecimal.

────────────────────────────────────────────────────────────────────────────
Chapter 4:  Writing Stand-Alone Assembly Programs

With QuickAssembler, you can write stand-alone assembly programs to produce small, efficient utilities.
For example, you might write a utility in assembly language to count the number of lines or paragraphs in a file. These programs start and end with assembly code and generally do not involve any links to high-level languages.

Stand-alone assembly programs can yield remarkably small .EXE files. They require relatively little space, because they do not include the start-up code for a high-level language. And often you can make your assembly program even smaller by converting it to a .COM file as shown in this chapter. Some useful .COM files take up less than 100 bytes of memory.

This chapter first describes the directives you need to write stand-alone assembly programs, reviews instructions used in the chapter's examples, and then presents a simple stand-alone program. Next, Sections 4.4-4.6 look closely at each segment of the program: stack, data, and code. Finally, the chapter describes how to create a program in the .COM format.

4.1 A Skeleton for Stand-Alone Programs

This chapter uses the simplified segment directives described in the previous chapter, and introduces three more directives: .STACK, .DATA, and .STARTUP.

The simplified segment directives produce programs using the Microsoft standard segment format. This format is not required, since your stand-alone program need not be compatible with a high-level-language module. However, the standard format is convenient because you can specify a number of different memory models, and you are freed from having to specify segment names, attributes, and register assumptions.

──────────────────────────────────────────────────────────────────────────
NOTE  Occasionally, you may need a customized segment structure. Linking assembly code to a non-Microsoft language is the most common situation that requires customized segments. QuickAssembler lets you use full segment definitions any time you need to customize segments.
However, you should find that simplified segment directives support the vast majority of assembly-language programming you do, even when you write .COM files.
──────────────────────────────────────────────────────────────────────────

The skeleton for the programs in this chapter includes stack, data, and code segments. Note that one of the directives, .MODEL, will change when you alter the memory model. The other statements remain the same.

        .MODEL  small           ; Use small memory model
        .STACK  100h            ; Declare 256-byte stack
        .DATA
        ;
        ; (place data declarations here)
        ;
        .CODE
        .STARTUP                ; Set up DS, SS, and SP registers
        ;
        ; (place executable code here)
        ;
        END

Sections 4.1.1-4.1.3 examine each of the statements in this skeleton more closely.

4.1.1 The .MODEL Directive

The .MODEL directive performs the same role that it did in the previous chapter: it defines the overall attributes of the module. Note, however, that with a stand-alone program, a language type is not always required. A language type is useful when a module contains one or more procedures. Otherwise, you need only type .MODEL followed by a memory model:

        .MODEL  small           ; Use small memory model

The memory model can be TINY, SMALL, MEDIUM, COMPACT, LARGE, or HUGE. Most of these memory models may be familiar to you if you have used QuickC. For a complete description of each memory model, see Section 5.1.1. The TINY memory model is new; it alone results in the creation of a .COM file rather than a .EXE file. Section 4.8, "Creating .COM Files," gives a complete example featuring the use of tiny memory model.

Generally, to change memory model you change the .MODEL directive. You also change the way you load and use data pointers, as described in Chapter 3, "Writing Assembly Modules for C Programs." With these changes made, many programs can readily be reassembled for a new memory model.
(However, as you'll see in Chapter 5, "Defining Segment Structure," you cannot use .FARDATA segments in tiny, small, or medium model, and this may require further revision of code in some cases.)

4.1.2 The .STACK, .CODE, and .DATA Directives

Each of the segment directives (.STACK, .CODE, and .DATA) declares the beginning of a segment. The code and data segments begin with .CODE and .DATA, respectively. Each of these segments continues to the next segment directive or the end of the program. The data segment contains data and symbolic constant declarations. The code segment contains instructions.

However, the stack segment consists of only one line:

        .STACK  [[size]]

QuickAssembler interprets size according to the current radix, which by default is decimal. You can specify a hexadecimal constant by using the H suffix (for example, 200h). The size argument is optional. If you leave it out, the assembler creates a stack 1024 bytes long.

Unless the program is written in tiny memory model, you should always declare a stack segment in your main module. Section 4.4, "Inside the Stack Segment," explains the purpose of this segment.

4.1.3 The .STARTUP Directive

Unlike C programs, whose start-up code does this work for them, assembly-language programs have to initialize register values themselves. Specifically, the program has to initialize DS, the Data Segment register; CS and IP, which point to the first instruction to execute; and SS and SP, the stack registers.

By far the easiest way to initialize all these registers is to include .STARTUP, a simple directive that takes no arguments:

        .STARTUP                ; Set up DS, SS, and SP registers

When you use this directive, the assembler generates code to initialize your registers the way Microsoft high-level languages do. The generated code is similar to some of the instructions in the C start-up code. The directive takes care of minimal start-up, but many programs will need to do additional start-up tasks, such as releasing unused memory.
──────────────────────────────────────────────────────────────────────────
NOTE  The start-up sequence adjusts SS and SP so that SS is equal to DS. This starting condition gives you some advantages. If you later have to alter the value of DS, you can still access a data object in the default data segment as an indirect operand using BP, or through an SS segment override.

To avoid this starting sequence, so that the stack and data are separate physical segments, use the farStack keyword with the .MODEL directive, as described in Section 5.1.3.
──────────────────────────────────────────────────────────────────────────

4.2 Instructions Used in This Chapter

This section summarizes the instructions used in this chapter. Because the program examples are simple, only a very few of the 80-odd instructions of the 8086 are featured here. This chapter features four instructions:

Instruction               Description
──────────────────────────────────────────────────────────────────────────
MOV destination, source   Moves source to destination
INT number                Generates the indicated interrupt signal, causing
                          the processor to call a memory-resident interrupt
                          routine
DEC destination           Decrement──subtracts 1 from destination
JNZ label                 Jump If Not Zero──jumps to label if result of last
                          operation was not zero

Most of the instructions above were introduced in Chapter 2, "Introducing 8086 Assembly Language." The new instruction is INT.

The INT instruction generates a software interrupt signal, causing the processor to call an interrupt service routine usually residing in a DOS or ROM-BIOS memory area. This call is much like a procedure call; the processor executes a specific function and returns to the program when the routine is complete.

There are two major differences between an interrupt call and a procedure call. First, instead of calling a procedure you have written, an INT instruction calls a DOS system routine or ROM-BIOS service.
These low-level routines carry out a variety of basic operations, such as reading the keyboard, writing to the screen, or using the file system. Most DOS services are accessed through interrupt 21H (33 decimal).

The second major difference is syntactic. You follow the INT keyword with an interrupt number (in the range 0 to 255), rather than a procedure name. In many cases, you further specify the interrupt routine by loading AH with a function number.

4.3 A Program That Says Hello

The following sample program prints Hello, world. and then exits back to DOS. You can use this program as a template and insert your own code and data.

        .MODEL  small                   ; Use small model
        .STACK  100h                    ; Allocate 256-byte stack

        .DATA
message DB      "Hello, world.",13,10   ; Message to print
lmessage EQU    $ - message             ; Determine length of message

        .CODE
        .STARTUP                        ; Use standard startup code

        mov     bx,1                    ; Load 1 - file handle
                                        ;   for standard output
        mov     cx,lmessage             ; Load length of message
        mov     dx,OFFSET message       ; Load address of message
        mov     ah,40h                  ; Load no. of DOS Write function
        int     21h                     ; Call interrupt 21H (DOS)

        mov     ax,4c00h                ; Load no. of DOS Exit function
                                        ;   in AH, and 0 exit code in AL
        int     21h                     ; Call interrupt 21H (DOS)

        END

The first statement determines the memory model of the program:

        .MODEL  small                   ; Use small model

This statement specifies small memory model, which places code and data in two separate segments, each of which cannot exceed 64K.

The next few sections consider the rest of this program: stack, data, and code.

4.4 Inside the Stack Segment

The stack segment is the easiest to create, because with simplified segment directives you enter only one statement:

        .STACK  100h                    ; Allocate 256-byte stack

Each procedure or interrupt call uses up stack space. The stack stores return addresses, parameters, and local variables for each procedure called. When a procedure or interrupt routine returns, the stack space it used is released.
The more procedure calls your program makes without returning, the more stack area it requires. Programs that nest many procedures or use recursion (in which a procedure calls itself repeatedly) may require large stacks. Unfortunately, there is no formula for determining how large a stack is needed. A 256-byte stack (100 hexadecimal) is adequate for most small programs. For this sample program, which makes two interrupt calls (neither nested inside the other) and no procedure calls, 256 bytes provides an ample margin of error.

You can also create a stack by using full segment definitions. See Section 5.2, "Full Segment Definitions," for more information.

4.5 Inside the Data Segment

A single keyword declares the beginning of the segment:

        .DATA

QuickAssembler considers all statements following this line to lie in the data segment, up until the next segment declaration or END directive. The END directive marks the end of the source file.

The next two statements are directives that declare a string of characters and a symbolic constant:

message DB      "Hello, world.",13,10   ; Message to print
lmessage EQU    $ - message             ; Determine length of message

The first statement above declares a series of bytes. The label message is a symbolic name that QuickAssembler associates with the string's starting address. The assembler allocates 15 bytes in the data segment (13 for the characters of Hello, world. and 2 for the values 13 and 10), and initializes these bytes to the ASCII values for H, e, l, l, o, and so forth. The values 13 and 10 indicate a carriage return and line feed, respectively, causing the program to move the cursor to the beginning of the next line when it prints the string.

The second directive in the data segment declares a symbolic constant equal to the length of the string:

lmessage EQU    $ - message             ; Determine length of message

Again, the item in the first column, lmessage, is the label of the statement. The EQU directive equates the label with the value of the operand itself. EQU does not allocate memory.
The operand field contains $ - message, which in this case equals 15. We could just as easily have entered 15 in the operand field. However, the expression $ - message is guaranteed to equal the length of the string, even if you later rewrite the initial string value.

The dollar sign ($) is the "location counter," which represents the current address of the statement. QuickAssembler translates the full expression as "Take the current address ($) and subtract the address of message." The current address is one byte after the end of the string. Thus, $ - message is automatically equal to the length of the string.

4.6 Inside the Code Segment

A single keyword declares the beginning of the code segment:

        .CODE

The code segment consists of all statements between .CODE and the END statement, which marks the end of the source code. In this example, all the statements in the code segment, aside from .STARTUP, are instructions. The program has three basic tasks. Each instruction helps carry out one of these operations:

  1. Initialize registers

  2. Call a DOS function to print the message

  3. Call a DOS function to exit the program gracefully

The .STARTUP directive initializes registers. If you write a main module without this directive, you must explicitly initialize DS, CS, and IP. Furthermore, if you want SS to equal DS (which gives some programming advantages), you must adjust both SS and SP. To see how to initialize registers without the use of .STARTUP, see Chapter 5, "Defining Segment Structure."

After registers are initialized, a series of five instructions makes the call to DOS that prints the message:

        mov     bx,1                    ; Load 1 - file handle for
                                        ;   standard output
        mov     cx,lmessage             ; Load length of message
        mov     dx,OFFSET message       ; Load address of message
        mov     ah,40h                  ; Load no. of DOS Write function
        int     21h                     ; Call interrupt 21H (DOS)

The first four instructions prepare for the DOS call. Interrupt calls generally use registers to receive parameters.
Unlike procedure calls, they do not reference the stack for this information. The DOS Write function uses the following registers to receive data:

Register   Data
──────────────────────────────────────────────────────────────────────────
AH         Selects the DOS function. 40H is the Write function.

BX         File handle to which to write. The number 1 is a reserved file
           handle that always corresponds to standard output. "Standard
           output" is normally synonymous with the computer screen, unless
           you redirect program output. If you were writing to a file, you
           would first open the file and use the file handle returned by
           the DOS open-file function.

CX         Length of the message. The second statement in the data segment
           determined this length.

DS:DX      The beginning address of the actual message text. Remember that
           DS was loaded earlier with the address of the data segment, so
           it does not need to be reloaded now.

The program uses the OFFSET operator to load DX with the address of the message. Although variables are translated to addresses, the processor normally interprets a variable address as a memory operand; that is, the processor operates on the data at the address, not the address itself. The OFFSET operator extracts the offset portion of the address and turns it into an immediate operand. If the OFFSET operator were not used, the DOS routine would not receive the address of message, but would instead receive the contents of memory at that address. The OFFSET operator is similar to the address operator (&) in C. Use it whenever you need to pass an address rather than a value.

After the interrupt service returns, the AX register contains the number of bytes written. The programs in this chapter do not use this return value, but a more sophisticated program might. In particular, if AX (number of bytes written) is less than CX (number of bytes requested to be written), then an error has occurred.

Each DOS function has its own conventions for receiving data in different registers.
Consult the Microsoft MS-DOS Programmer's Reference for a complete description of each function. The Assembler Contents selection from the Help menu also describes the major DOS functions.

──────────────────────────────────────────────────────────────────────────
NOTE
Each DOS function has conventions for getting and returning values in registers and flags. Bear in mind that the values in any of these registers may change during the call. If you need to preserve register values before making a DOS call, use the PUSH and POP instructions. See Section 13.4.1, "Pushing and Popping," for more information on how to preserve register values.
──────────────────────────────────────────────────────────────────────────

The INT instruction makes the actual call to DOS. The interrupt number for the majority of DOS functions is 21H. You use different interrupt numbers to call ROM-BIOS services.

The final two instructions cause the program to terminate and return control to DOS. High-level-language programmers can usually ignore the need to exit a program explicitly, because the language's run-time code does it for them. But when you write a stand-alone assembly program, you don't have this luxury: the program must exit explicitly. Otherwise, the processor continues to execute whatever bytes follow the end of the program as if they were instructions, making the system appear to crash.

The DOS Exit function (service 4CH) is the preferred method for exiting back to DOS. This function uses two register values:

     Register   Data
     ───────────────────────────────────────────────────────────────────
     AH         Selects the DOS function. 4CH is the Exit function.

     AL         Exit code. Batch files can use this exit code as an
                "errorlevel" indicator. An exit code of 0 usually
                indicates no error.

A single instruction loads both registers:

          mov     ax,4c00h          ; Load number of DOS Exit function
                                    ;   in AH, and 0 exit code in AL

Because AH and AL are the high and low halves of AX, this single MOV instruction moves data into two registers at once.
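As a rough mental model, the register conventions of the two DOS calls used so far, Write (40H) and Exit (4CH), can be simulated in C. Everything here is hypothetical: the cpu_regs struct, the int21h function, and the ds_base parameter standing in for the data segment are illustration only. AX is stored as one 16-bit value so that AH and AL are literally its high and low bytes, which is why mov ax,4c00h yields AH = 4CH and AL = 0. Only standard output (handle 1) is modeled; where real DOS would report an error through the carry flag and AX, this sketch simply reports 0 bytes written.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical register model: AX is one 16-bit value, and AH and AL are
   its high and low bytes, as on the 8086. */
struct cpu_regs {
    uint16_t ax, bx, cx, dx;
};

static uint8_t get_ah(const struct cpu_regs *r) { return (uint8_t)(r->ax >> 8); }
static uint8_t get_al(const struct cpu_regs *r) { return (uint8_t)(r->ax & 0xFFu); }

/* Like "mov ah,value": replaces the high byte and leaves AL unchanged. */
static void set_ah(struct cpu_regs *r, uint8_t value)
{
    r->ax = (uint16_t)((r->ax & 0x00FFu) | ((unsigned)value << 8));
}

/* Simulated INT 21H that models only function 40H (Write) and 4CH (Exit).
   ds_base stands in for the data segment; DX is an offset into it.
   Returns 1 when the program should terminate, 0 otherwise. */
static int int21h(struct cpu_regs *r, const char *ds_base, int *exit_code)
{
    switch (get_ah(r)) {
    case 0x40:                       /* Write: BX handle, CX count, DS:DX */
        if (r->bx != 1) {            /* only standard output is modeled   */
            r->ax = 0;
            return 0;
        }
        r->ax = (uint16_t)fwrite(ds_base + r->dx, 1, r->cx, stdout);
        return 0;                    /* AX = number of bytes written      */
    case 0x4C:                       /* Exit: AL holds the exit code      */
        *exit_code = get_al(r);
        return 1;
    default:
        assert(!"DOS function not modeled in this sketch");
        return 0;
    }
}
```

After the Write case, a result where AX is less than CX corresponds to the error condition described earlier in this section.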
AH is loaded with 4CH, the function number for the DOS Exit function, and AL is loaded with 0, an exit code indicating no error. Finally, another INT instruction calls DOS:

          int     21h               ; Call interrupt 21H (DOS)

4.7 Making the Program Repeat Itself

Once you understand the template for writing stand-alone programs, you can alter the sample program given above and generate your own code. This section alters the sample program so that it prints a different message, and prints it ten times. The new sample program is listed below:

          .MODEL  small             ; Use small model
          .STACK  100h              ; Allocate 256-byte stack
          .DATA
message   DB      "Hello, ten times.",13,10 ; Message to print
lmessage  EQU     $ - message       ; Determine length of message
count     DW      10
          .CODE
          .STARTUP                  ; Use standard startup code
          mov     bx,1              ; Load 1 - file handle for
                                    ;   standard output
          mov     cx,lmessage       ; Load length of message
          mov     dx,OFFSET message ; Load address of message
printit:  mov     ah,40h            ; Load no. of DOS Write function
          int     21h               ; Call interrupt 21H (DOS)
          dec     count             ; count = count-1
          jnz     printit           ; if count > 0, print again
          mov     ax,4c00h          ; Load DOS 4C function number
                                    ;   in AH, and 0 exit code in AL
          int     21h               ; Call interrupt 21H (DOS)
          END

Note the following changes:

     ■ The string data is different.

     ■ The data segment includes a new variable, count.

     ■ One of the instructions is now labeled printit.

     ■ Two additional instructions decrement count, then loop back to the label printit if the decremented count is not yet zero.

The string data is longer than before, so QuickAssembler must allocate more bytes than in the previous version of the program. However, the EQU statement that follows guarantees that the assembler still calculates the string length correctly:

message   DB      "Hello, ten times.",13,10 ; Message to print
lmessage  EQU     $ - message       ; Determine length of message

The new variable is actually a memory location of word size (two bytes).
QuickAssembler allocates another two bytes in the data segment and initializes them:

count     DW      10

The label count becomes associated with the address of the data, and the number 10 is the initial value placed at this memory location. However, the value can change.

The instruction mov ah,40h now has a label, because the program needs to return here to repeat the print operation. Not all instructions need a label──only those that the program may need to jump to directly.

The two new instructions cause the program to repeat the print operation ten times:

          dec     count             ; count = count-1
          jnz     printit           ; if count > 0, print again

The DEC instruction subtracts 1 from the memory location count and sets processor flags according to the result of the operation. JNZ then jumps to the specified label if the result was not zero. The combined effect of these two instructions is to repeat the previous instructions (from printit onward) ten times. To change the number of repetitions, initialize count with a different value.

Note that the DOS Write function returns a value in the register AX──specifically, the number of bytes written. Because that return value overwrites AH, the program jumps back to printit so that AH is reloaded with 40H before each call to DOS.

You can optimize this program further by using a register instead of the memory location count. For example, to use the register SI as the counter, follow these steps:

     ■ Remove the declaration of count.

     ■ Initialize SI to 10 at the beginning of the program with the instruction mov si,10.

     ■ Decrement SI instead of count near the bottom of the loop.

With this program, it's safe to use SI as the counter, since SI is not needed for any other purpose. However, some programs make special use of SI. In these cases, it may be more efficient to place the count in a variable.

4.8 Creating .COM Files

You can use QuickAssembler to produce .COM files as well as .EXE files. (However, these programs cannot contain any C modules.)
All memory models other than tiny, ranging from small to huge, produce a .EXE file. The tiny memory model is special because it alone supports creation of a .COM file.

──────────────────────────────────────────────────────────────────────────
NOTE
To produce a .COM file, you must not only use tiny memory model, but also select Generate COM File from the Linker Flags dialog box (choose Make from the Options menu), or else give the /TINY linker option on the QCL command line.
──────────────────────────────────────────────────────────────────────────

Each .COM file has only one physical segment and is limited in size to a total of 64K. A .COM file has no executable-file header or relocation-table entries. Because DOS doesn't have to examine a file header or adjust relocatable segment addresses, it loads the .COM file slightly faster.

DOS initializes all segment registers (including DS) to point to the first available memory address. The Stack Pointer, SP, is set near the top of the 64K segment. Unlike .EXE files, .COM files have no definite stack area. Instead, the stack starts at offset address FFFE hexadecimal and grows downward until it overlaps code and data areas. At that point, program failure is likely.

Simplified segment directives in QuickAssembler now provide direct support for .COM files. The template is, in fact, smaller than the template for a .EXE file. The code below shows the example in Section 4.3, "A Program That Says Hello," revised to produce a .COM file:

          .MODEL  tiny              ; Produce a .COM file
          .DATA
message   DB      "Hello, world.",13,10 ; Message to print
lmessage  EQU     $ - message       ; Determine length of message
          .CODE
          .STARTUP
          mov     bx,1              ; Load 1 - file handle for
                                    ;   standard output
          mov     cx,lmessage       ; Load length of message
          mov     dx,OFFSET message ; Load address of message
          mov     ah,40h            ; Load no. of DOS Write function
          int     21h               ; Call interrupt 21H (DOS)
          mov     ax,4c00h          ; Load no. of DOS Exit function
                                    ;   in AH, and 0 exit code in AL
          int     21h               ; Call interrupt 21H (DOS)
          END

A tiny-model program could be produced by simply taking the small-model version from earlier in the chapter and changing the first line to the following:

          .MODEL  tiny

The code would then run correctly. However, the sample code in this section takes advantage of tiny model by eliminating the stack segment. DOS initializes the SS (Stack Segment) register and SP (Stack Pointer) register for you, so you need not declare a stack. The assembler ignores stack segments in tiny model.

The program still includes the .STARTUP directive. With tiny model, all this directive does is generate the statement ORG 100h.

──────────────────────────────────────────────────────────────────────────
NOTE
The statement ORG 100h is necessary for programs in the .COM format, and must appear just before the first line of executable code. ORG 100h starts the location counter at 100 hexadecimal, reflecting the way that DOS loads .COM files into memory. (DOS reserves the first 256 bytes for the Program Segment Prefix, or PSP.) See Section 6.6, "Setting the Location Counter," for more information on the ORG directive.
──────────────────────────────────────────────────────────────────────────

With tiny-model programs, QuickAssembler lets you define separate code and data segments, but combines these segments into a single physical segment, called a "group." QuickAssembler places the code segment first regardless of how you write your source code. The resulting .COM file assumes a single segment address for the whole program (as required by the structure of a .COM file), and execution automatically begins at the proper address. Finally, QuickAssembler directs the linker to output a file in the .COM format rather than the .EXE format.

──────────────────────────────────────────────────────────────────────────
NOTE
"Groups" are a standard concept in 8086 assembly language.
You can place a series of segments into a group. The total size must not exceed 64K. The linker responds by combining all the segments into a single physical segment in which all addresses share the same segment address. For a fuller explanation of groups and segments, see Chapter 5.
──────────────────────────────────────────────────────────────────────────

When you write .COM files, you must observe some important restrictions. The principal restriction is that you cannot refer to program-defined segment addresses; similarly, you have no access to predefined segment equates such as @data and @code. Because .COM files lack relocation-table entries, DOS cannot adjust segment addresses at load time. The program must use absolute segment addresses or else assume the loading segment address that DOS assigns. Therefore, memory references can be of three kinds:

     1. Any memory location within the 64K program area. For these memory references, you do not load a new value into any of the segment registers.

     2. Hard-coded locations in memory that have special meaning at the system or hardware level. A video-page address, such as B800:0000, is such a special segment address.

     3. An address returned to you by a DOS or ROM-BIOS function. For example, DOS function 48H, Allocate Memory, returns the segment address of a block of dynamically allocated memory.

4.9 Creating .COM Files with Full Segment Definitions

You don't generally need to use full segment definitions to create .COM files. However, when you do use these directives with programs written in .COM format, you need to follow certain rules. The assembler automatically follows most of these rules when you use simplified segment directives. The guidelines for .COM format are listed below:

     ■ Place the entire program into one physical segment. It's possible to divide your program into separate logical segments, then group them into one physical segment with the GROUP directive.
       Simplified segment directives, in fact, use this technique with tiny model. However, you must ensure that code, not data, appears at the beginning of the .COM file. A number of different factors affect segment ordering, so it may be hard to ensure that the code segment appears first. Thus, creating just one segment is the more reliable method. In contrast, when you use simplified segment directives with tiny model, the assembler always places the code segment at the beginning of the .COM file.

     ■ Use the ASSUME directive to inform the assembler that all segment registers will point to the beginning of the segment. At load time, DOS sets all segment registers to this address. The ASSUME directive informs the assembler of this fact so that it can correctly calculate offset addresses. This directive is not necessary when you use simplified segment directives.

     ■ Use the ORG directive to set the location counter. At load time, DOS sets the starting address to 100H, because the first 100H bytes are reserved for the Program Segment Prefix (PSP). The statement ORG 100h is necessary for the assembler to assign addresses in a way consistent with run-time conditions. Otherwise, every address the assembler computes will be 100H too low, so data references (and any jumps through stored addresses) will be wrong. When you use simplified segment directives with tiny model, the assembler automatically sets the location counter to 100H.

     ■ Give the END statement one argument: the starting address. This argument is not necessary if you use the .STARTUP simplified segment directive, because the program automatically begins execution wherever you place .STARTUP.
The modified program is shown below:

_TEXT     SEGMENT 'CODE'            ; Define code segment
          ASSUME  cs:_TEXT,ds:_TEXT,ss:_TEXT
          ORG     100h
start:    jmp     begin
message   DB      "Hello, world.",13,10 ; Message to print
lmessage  EQU     $ - message       ; Determine length of message
begin:    mov     bx,1              ; Load 1 - file handle
                                    ;   for standard output
          mov     cx,lmessage       ; Load length of message
          mov     dx,OFFSET message ; Load address of message
          mov     ah,40h            ; Load no. of DOS Write function
          int     21h               ; Call interrupt 21H (DOS)
          mov     ax,4c00h          ; Load no. of DOS Exit function
                                    ;   in AH, and 0 exit code in AL
          int     21h               ; Call interrupt 21H (DOS)
_TEXT     ENDS
          END     start

The first three statements are new. The SEGMENT statement defines the beginning of a segment named _TEXT. (Instead of using the name _TEXT, you can choose any other valid symbolic name.) The ASSUME statement then informs the assembler that the CS, DS, and SS segment registers will all point to the beginning of this segment at run time. Finally, the ORG statement informs the assembler that the instruction pointer will be set to 100H.

_TEXT     SEGMENT 'CODE'            ; Define code segment
          ASSUME  cs:_TEXT,ds:_TEXT,ss:_TEXT
          ORG     100h

The body of the program now includes code and data together in the same segment. The first item in the segment must be an instruction, because .COM files always begin execution at the start of the file. Attempting to execute data would almost certainly cause program failure. Since there is no separate data segment, the first instruction jumps around the data declarations.

start:    jmp     begin
message   DB      "Hello, world.",13,10 ; Message to print
lmessage  EQU     $ - message       ; Determine length of message
begin:    mov     bx,1              ; Load 1 - file handle for
                                    ;   standard output

Another way to write a program for .COM format is to place data declarations after the end of the instructions. However, the assembler often produces better results if you place data declarations early in the source file. That way, you avoid forward references to data.
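The interplay of ORG 100h and the location counter ($) in this example can be traced with a small C model. The assembler struct and the org, emit, and layout_hello functions are hypothetical and only count bytes. The size of the opening JMP is passed in as an assumption (2 bytes for a short jump, 3 for a near jump) rather than taken from QuickAssembler, because this sketch does not model which form the assembler emits for the forward reference to begin. The model shows that message lands just past the jump, that $ - message still evaluates to 15, and that every label sits 100H above its position in the file.

```c
#include <assert.h>

/* Hypothetical model of the assembler's location counter ($). */
struct assembler {
    unsigned lc;                      /* current value of $ */
};

/* Like ORG: sets the location counter. */
static void org(struct assembler *a, unsigned address) { a->lc = address; }

/* Reserves size bytes (an instruction or DB data) and returns the offset a
   label on that statement receives: the value of $ before allocation. */
static unsigned emit(struct assembler *a, unsigned size)
{
    unsigned label = a->lc;
    a->lc += size;
    return label;
}

struct com_layout {
    unsigned start, message, lmessage, begin;
};

/* jmp_size is an assumption (2 = short JMP, 3 = near JMP). */
static struct com_layout layout_hello(unsigned jmp_size)
{
    struct assembler a;
    struct com_layout l;
    const unsigned text_len = 13;              /* "Hello, world."          */

    org(&a, 0x100);                            /* ORG 100h                 */
    l.start    = emit(&a, jmp_size);           /* start:  jmp begin        */
    l.message  = emit(&a, text_len + 2);       /* message DB "...",13,10   */
    l.lmessage = a.lc - l.message;             /* lmessage EQU $ - message */
    l.begin    = a.lc;                         /* begin:  mov bx,1         */
    return l;
}
```

With a 3-byte jump, for example, message is at offset 103H and begin at 112H; with a 2-byte jump, each moves down by one byte, while lmessage stays 15 either way.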
The source file ends by giving an argument to the END statement. This argument is necessary because the program does not use the .STARTUP directive. The argument to END must be the label of the first instruction executed:

          END     start

────────────────────────────────────────────────────────────────────────────
PART 2: Using Directives

Part 2 of the Programmer's Guide (comprising Chapters 5-12) describes the directives and operators recognized by the Microsoft QuickAssembler. Directives are nonexecutable statements that give general information to the assembler. Some of the more important directives declare program structure, define data, and create macros. Operators indicate calculations to be performed at assembly time.

Chapters 5-8 present the basic directives you need to write a program, including segment, data, multimodule, and structure directives. Chapter 9 deals specifically with operators. Chapter 10 describes conditional assembly, and Chapter 11 presents macros, a technique for replacing a series of frequently used instructions with a single statement. The directives that control your output are covered in Chapter 12.

────────────────────────────────────────────────────────────────────────────
Chapter 5: Defining Segment Structure

A segment is an area in memory up to 64K in size, in which all locations share the same segment address. 8086 assembly-language modules use segments for two reasons:

     ■ Segments provide a convenient means for dividing a program into its major divisions──code, data, constant data, and stack.

     ■ The architecture of the 8086 requires some use of segments. Every reference to memory must be relative to one of the four segment registers, as described in Section 2.7, "Segmented Addressing and Segment Registers."

Segment definitions make it possible for QuickAssembler to assume the use of the same segment register for a large number of different addresses.
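The relationship between a segment address, an offset, and a physical memory location can be sketched in C. The function names are hypothetical; the arithmetic (segment value times 16 plus offset, wrapping at 1 MB on the 8086 itself) is the standard real-mode translation covered in Section 2.7. The checks include the video-page address B800:0000 mentioned in Chapter 4 and show that two different segment:offset pairs can name the same byte, and that one segment address reaches at most 64K.

```c
#include <assert.h>
#include <stdint.h>

/* 8086 real-mode address translation: the 16-bit segment value is
   multiplied by 16 (shifted left four bits) and added to the 16-bit
   offset. The 8086 has 20 address lines, so the sum wraps at 1 MB. */
static uint32_t physical_address(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}

/* Nonzero if address lies inside the 64K window that begins at base,
   that is, if one segment register value can reach it. */
static int within_one_segment(uint32_t base, uint32_t address)
{
    return address >= base && address - base <= 0xFFFFu;
}
```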
You can define segments by using simplified segment directives or full segment definitions. In most cases, simplified segment directives are a better choice. They are easier to use and more consistent, yet you seldom sacrifice any functionality by using them. Simplified segment directives automatically define the segment structure required when combining assembler modules with modules prepared with Microsoft high-level languages.

Although more difficult to use, full segment definitions give more complete control over segments. A few complex programs may require full segment definitions in order to get unusual segment orders and types.

This chapter describes both methods. If you choose to use simplified segment directives, you will probably not need to read about full segment definitions.

5.1 Simplified Segment Directives

Simplified segment directives provide an easy way to write assembly-language programs. They handle some of the difficult aspects of segment definition automatically, and assume the same conventions adopted by Microsoft high-level languages.

When you write stand-alone assembler programs, the simplified segment directives make programming easier. The Microsoft conventions are flexible enough to work for most kinds of programs.

When you write assembler routines to be linked with Microsoft high-level languages, the simplified segment directives guard against mistakes that would make your modules incompatible. The segment names are automatically defined consistently and correctly.

The simplified segment directives automatically generate the same ASSUME and GROUP statements used by Microsoft high-level languages. You can learn more about the ASSUME and GROUP directives in Sections 5.3 and 5.4. However, for most programs you do not need to understand these directives. Simply use the simplified segment directives in the format shown in the examples.
5.1.1 Understanding Memory Models

To use simplified segment directives, you must declare a memory model for your program. The memory model specifies the default size of data and code used in a program.

Microsoft high-level languages require that each program have a default size (or memory model). Any assembly-language routine called from a high-level-language program should have the same memory model as the calling program. The C compiler provided with QuickAssembler supports all models except tiny. If you use assembly modules with a different compiler, the compiler documentation should tell what memory models are supported.

The most commonly used memory models are described below:

     Model      Description
     ───────────────────────────────────────────────────────────────────
     Tiny       All data and code fit in a single physical segment
                (group). Tiny-model programs can be converted to
                .COM-file format with the Generate COM File option in
                the Linker Flags dialog box (or the linker /TINY option
                used with QCL). Tiny-model programs have restrictions
                described in Chapter 4, "Writing Stand-Alone Assembly
                Programs."

     Small      All data fits within a single 64K segment, and all code
                fits within a 64K segment. Therefore, all code and data
                can be accessed as near. This is the most common model
                for stand-alone assembler programs. C is the only
                Microsoft language that supports this model.

     Medium     All data fits within a single 64K segment, but code may
                be greater than 64K. Therefore, data is near, but code
                is far. Most recent versions of Microsoft high-level
                languages support this model.

     Compact    All code fits within a single 64K segment, but the total
                amount of data may be greater than 64K (although no
                array can be larger than 64K). Therefore, code is near,
                but data is far. C is the only Microsoft high-level
                language that supports this model.

     Large      Both code and data may be greater than 64K (although no
                array can be larger than 64K). Therefore, both code and
                data are far.
                All Microsoft high-level languages support this model.

     Huge       Both code and data may be greater than 64K. In addition,
                any individual data array can be larger than 64K. From
                the standpoint of QuickAssembler, this memory model is
                almost equivalent to large model (the only exception is
                the meaning of the predefined equate @DataSize). If you
                want to support arrays larger than 64K, you must provide
                the program logic to support these arrays.

Stand-alone assembler programs can have any model. Tiny and small model are adequate for most programs written entirely in assembly language. Since near data or code can be accessed more quickly, the smallest memory model that can accommodate your code and data is usually the most efficient.

Mixed-model programs use the default size for most code and data but override the default for particular code or data items. Stand-alone assembler programs can be written as mixed-model programs by making specific procedures or variables near or far. Some Microsoft high-level languages have NEAR, FAR, and HUGE keywords that enable you to override the default size of individual data or code items.

5.1.2 Specifying DOS Segment Order

The DOSSEG directive specifies that segments be ordered according to the DOS segment-order convention. This is the convention used by Microsoft high-level-language compilers.

Syntax

          DOSSEG

Using the DOSSEG directive enables you to maintain a consistent, logical segment order without actually defining segments in that order in your source file. Without this directive, the final segment order of the executable file depends on a variety of factors, such as the order in which segments appear in the source, their class names, and the order in which modules are linked. These factors are described in Section 5.2, "Full Segment Definitions."

Since segment order is not crucial to the proper functioning of most stand-alone assembler programs, you can simply use the DOSSEG directive and ignore the whole issue of segment order.
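The ordering that DOSSEG requests can be expressed as a ranking function. The following C sketch is a hypothetical illustration of the DOS segment-order rules listed later in this section (class 'CODE' first, then segments outside DGROUP, then the DGROUP segments by class); it is not the linker's actual algorithm. The segment struct and both functions are invented for illustration, and the sketch assumes that segments of equal rank keep their original relative order, which it enforces with a stable insertion sort.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of a segment as the linker sees it. */
struct segment {
    const char *name;
    const char *class_name;   /* e.g. "CODE", "DATA", "BSS", "STACK" */
    int in_dgroup;            /* nonzero if the segment is in DGROUP  */
};

/* Rank under the DOS segment-order convention (lower comes first). */
static int dosseg_rank(const struct segment *s)
{
    if (strcmp(s->class_name, "CODE") == 0)
        return 0;                               /* 1.  class 'CODE'     */
    if (!s->in_dgroup)
        return 1;                               /* 2.  other non-DGROUP */
    if (strcmp(s->class_name, "BEGDATA") == 0)
        return 2;                               /* 3a. BEGDATA          */
    if (strcmp(s->class_name, "BSS") == 0)
        return 4;                               /* 3c. BSS              */
    if (strcmp(s->class_name, "STACK") == 0)
        return 5;                               /* 3d. STACK            */
    return 3;                                   /* 3b. all other DGROUP */
}

/* Stable insertion sort by rank: equal ranks keep their input order
   (an assumption of this sketch). */
static void dosseg_order(struct segment *segs, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        struct segment key = segs[i];
        size_t j = i;
        while (j > 0 && dosseg_rank(&segs[j - 1]) > dosseg_rank(&key)) {
            segs[j] = segs[j - 1];
            j--;
        }
        segs[j] = key;
    }
}
```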
──────────────────────────────────────────────────────────────────────────
NOTE
Using the DOSSEG directive (or the /DOSSEG linker option) has two side effects. The linker generates symbols called _end and _edata; you should not use these names in programs that contain the DOSSEG directive. Also, the linker increases the offset of the first byte of the code segment by 16 bytes in small and compact models. This is to give proper alignment to executable files created with Microsoft compilers.
──────────────────────────────────────────────────────────────────────────

If you want to use the DOS segment-order convention in stand-alone assembler programs, you should use the DOSSEG directive in the main module. Modules called from the main module need not use the DOSSEG directive.

You do not need to use the DOSSEG directive for modules called from Microsoft high-level languages, since the compiler already defines DOS segment order.

Under the DOS segment-order convention, segments have the following order:

     1. All segments having the class name 'CODE'

     2. Any segments that do not have class name 'CODE' and are not part of the group DGROUP

     3. Segments that are part of DGROUP, in the following order:

          a. Any segments of class BEGDATA (this class name is reserved for Microsoft use)

          b. Any segments not of class BEGDATA, BSS, or STACK

          c. Segments of class BSS

          d. Segments of class STACK

Using the DOSSEG directive has the same effect as using the /DOSSEG linker option. The directive works by writing to the comment record of the object file. The Intel(R) title for this record is COMENT. If the linker detects a certain sequence of bytes in this record, it automatically puts segments in the DOS order.

5.1.3 Defining Basic Attributes of the Module

The .MODEL directive defines attributes that affect the entire module: memory model, default calling and naming conventions, and stack type. This directive should appear before any other simplified segment directive.
Syntax

          .MODEL memorymodel[[[[,language]],stacktype]]

Each of the three fields defines a basic attribute. The memorymodel field defines the segment structure of the module. The language field defines the default calling and naming conventions assumed by PROC statements. These conventions correspond to the high-level language you specify. The stacktype field determines whether or not the assembler assumes that the SS register is equal to the DS register.

The memorymodel field can be TINY, SMALL, MEDIUM, COMPACT, LARGE, or HUGE. The assembler defines segments the same way for large and huge models, but the @DataSize equate (explained in Section 5.1.5, "Using Predefined Segment Equates") gives a different value for these two models. If you write an assembler routine for a high-level language, the memorymodel field should match the memory model used by the compiler or interpreter. If you write a stand-alone assembler program, you can use any model. Section 5.1.1 describes each memory model.

The optional language field tells the assembler to follow the naming, calling, and return conventions appropriate to the indicated language. In addition, if you use the language argument, the assembler automatically makes all procedure names public. You can use C, Pascal, FORTRAN, or BASIC as the language argument. The last three are equivalent, since these languages share the same naming and calling conventions. Note that although the language field is optional, you will not be able to use the high-level-language features of the PROC directive if you do not give it. Normally, you should specify a language with .MODEL.

If you use C for the language argument, all public and external names are by default prefixed with an underscore (_) in the .OBJ file. Specifying any other language has no effect on the names.
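The underscore rule for the C language type can be expressed as a small name-decoration function. This C sketch is hypothetical (same_word and object_name are invented names, not part of QuickAssembler); it covers only the prefix described above and deliberately ignores case handling, which, as the following note explains, depends on the Preserve Case and Preserve Extrn settings rather than on the language type.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Case-insensitive comparison, since .MODEL accepts "c" as well as "C". */
static int same_word(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (toupper((unsigned char)*a) != toupper((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Hypothetical helper: the name written to the .OBJ file for a public or
   external symbol. The C language type prefixes an underscore; PASCAL,
   FORTRAN, and BASIC leave the name as written. Case is not modeled.
   Returns 0 on success, -1 if out is too small. */
static int object_name(const char *language, const char *name,
                       char *out, size_t out_size)
{
    const char *prefix = same_word(language, "C") ? "_" : "";
    size_t needed = strlen(prefix) + strlen(name) + 1;

    if (needed > out_size)
        return -1;
    strcpy(out, prefix);
    strcat(out, name);
    return 0;
}
```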
──────────────────────────────────────────────────────────────────────────
NOTE
The assembler does not truncate names in order to match the conventions of specific languages, such as FORTRAN or Pascal. Moreover, using the C type specifier does not cause the assembler to preserve case. To preserve lowercase names in public symbols, choose one of the assembler flags that preserves case (Preserve Extrn or Preserve Case), or assemble with /Cx or /Cl on the QCL command line. Within the environment, the Preserve Extrn flag is on by default.
──────────────────────────────────────────────────────────────────────────

See Appendix A for an explanation of how the different calling conventions are implemented. You should also note that each language has different defaults for passing parameters by value or by reference. Depending on which method is used, a high-level language passes a parameter either as a value or as a pointer to the value.

The optional stacktype field determines whether or not the assembler assumes that SS is equal to DS. The default value is nearStack, which assumes that the stack is part of the default data area, so that SS is equal to DS and SP is set to the top of the data area. You can also use farStack, which assumes that the stack segment is in a separate physical segment from the default data area.

If you write a module called from QuickC, you should always use the default (in other words, just leave the field blank), since QuickC always assumes DS equals SS. If you write modules for a compiler (such as the Microsoft Optimized C Compiler) that supports customized memory models, use farStack for models in which SS does not equal DS.

If you write a stand-alone assembler program, you can choose either setting. If you use the .STARTUP directive, the assembler automatically generates the proper code for setting up the indicated stack type. If you write a stand-alone module without using .STARTUP, you should exercise caution.
If you initialize DS but do not adjust SS and SP (as described in Section 5.5.3, "Initializing the SS and SP Registers"), use the farStack keyword. If you do adjust SS and SP as described in Section 5.5.3, you can use the default value, nearStack.

Example 1

          DOSSEG
          .MODEL  small,c

These statements define default segments for small-model programs and create the ASSUME and GROUP statements used by small-model programs. Because of the DOSSEG directive, the segments are automatically ordered according to the Microsoft convention. The example statements might be used at the start of the main (or only) module of a stand-alone assembler program.

Example 2

          .MODEL  large,pascal

This statement defines default segments for large-model programs and creates the ASSUME and GROUP statements used by large-model programs. It does not automatically order segments according to the Microsoft convention. The example statement might be used at the start of an assembly module that would be called from a large-model Pascal program or a C program in which the Pascal calling convention was specified.

Example 3

          .MODEL  small,c,farStack

This statement defines default segments for a small-model program and creates the appropriate ASSUME and GROUP statements. In addition, this statement makes all procedures public, and directs the assembler to prefix an underscore to the beginning of each public name, so that the naming convention is compatible with C. If you later use the PROC statement to declare parameters, the assembler will assume that the parameters are placed on the stack in the order specified by the C calling convention. Finally, the statement uses farStack, indicating that SS is not equal to DS.

The last example would be appropriate for a module called by a C module with a customized memory model, compiled with a setting that did not assume SS equal to DS. Note that QuickC does not support customized memory models.
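The three .MODEL fields and their defaults can be summarized with a small decoder. This C sketch is hypothetical: the model_attrs struct, next_field, parse_model, and the uppercase normalization are illustration only, not how QuickAssembler parses source. It maps each memory model to near or far defaults for code and data as described in Section 5.1.1, and treats an omitted stacktype as nearStack, meaning SS equals DS.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical decoded form of ".MODEL memorymodel,language,stacktype". */
struct model_attrs {
    char model[16];
    char language[16];     /* empty if the field was omitted          */
    int  far_code;         /* nonzero: code defaults to far           */
    int  far_data;         /* nonzero: data defaults to far addresses */
    int  ss_equals_ds;     /* 1 for nearStack (default), 0 farStack   */
};

/* Copies the next comma-separated field of *p into out (uppercased,
   surrounding blanks removed) and advances *p past the comma. */
static void next_field(const char **p, char *out, size_t out_size)
{
    size_t n = 0;
    while (**p == ' ')
        (*p)++;
    while (**p != '\0' && **p != ',') {
        if (n + 1 < out_size)
            out[n++] = (char)toupper((unsigned char)**p);
        (*p)++;
    }
    while (n > 0 && out[n - 1] == ' ')
        n--;
    out[n] = '\0';
    if (**p == ',')
        (*p)++;
}

/* Returns 0 on success, -1 for an unknown memory model or stack type. */
static int parse_model(const char *operands, struct model_attrs *m)
{
    static const struct { const char *name; int far_code, far_data; } table[] = {
        {"TINY", 0, 0}, {"SMALL", 0, 0}, {"MEDIUM", 1, 0},
        {"COMPACT", 0, 1}, {"LARGE", 1, 1}, {"HUGE", 1, 1},
    };
    const size_t count = sizeof table / sizeof table[0];
    char stack[16];
    const char *p = operands;
    size_t i;

    next_field(&p, m->model, sizeof m->model);
    next_field(&p, m->language, sizeof m->language);
    next_field(&p, stack, sizeof stack);

    for (i = 0; i < count; i++)
        if (strcmp(m->model, table[i].name) == 0)
            break;
    if (i == count)
        return -1;
    m->far_code = table[i].far_code;
    m->far_data = table[i].far_data;

    if (stack[0] == '\0' || strcmp(stack, "NEARSTACK") == 0)
        m->ss_equals_ds = 1;
    else if (strcmp(stack, "FARSTACK") == 0)
        m->ss_equals_ds = 0;
    else
        return -1;
    return 0;
}
```

Decoding the operands of Example 3, small,c,farStack, yields near code, near data, the C conventions, and SS not equal to DS.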
──────────────────────────────────────────────────────────────────────────
NOTE
The assembler does not normally display the code generated by the high-level-language support features. You can see the code produced by these features by using the .LALL directive or the /LA command-line option.
──────────────────────────────────────────────────────────────────────────

To write procedures for use with more than one language and memory model, you can use text macros for the memory model and language arguments, and define the values from the command line or in the Assembler Flags dialog box. For example, the following .MODEL directive uses text macros for the memorymodel and language arguments:

     % .MODEL memmodel,lang         ; Use % to evaluate memmodel, lang

The values of the two text macros can be defined from the command line using the /D switch:

     QCL /Dmemmodel=MEDIUM /Dlang=C /AM /Cx main.c proc.asm

5.1.4 Defining Simplified Segments

Each of the directives .CODE, .STACK, .DATA, .DATA?, .CONST, .FARDATA, .FARDATA?, and .STARTUP indicates the start of a segment. Each also ends the immediately preceding segment definition.

Syntax

     .CODE [[name]]          Code segment
     .STACK [[size]]         Stack segment
     .DATA                   Initialized near-data segment
     .DATA?                  Uninitialized near-data segment
     .CONST                  Constant-data segment
     .FARDATA [[name]]       Initialized far-data segment
     .FARDATA? [[name]]      Uninitialized far-data segment
     .STARTUP                Code to initialize segment registers

For segments that take an optional name, the base file name of the source module is used if you do not specify a value yourself. Each new segment directive ends the previous segment. The END directive closes the last segment in the source file.

5.1.4.1 How to Use Simplified Segments

The .CODE, .DATA, and .STACK directives create the three basic segments that programs generally need to have. Chapter 4, "Writing Stand-Alone Assembly Programs," demonstrates how to use these directives to write code, data, and stack segments.
Chapter 4 also explains the purpose of each of these segments. The .STARTUP directive initializes segment registers to the appropriate segment values. Chapter 4 describes the use of .STARTUP, and Section 5.5 tells more about how .STARTUP works and what code it generates.

When you write a mixed-language program, you generally don't need to declare a stack segment, because the start-up code in the C main module creates a stack for you. When you write a stand-alone program, you should declare a stack segment in the main module only.

Your programs can also use the .DATA? and .CONST directives to create segments for uninitialized and constant data, respectively. With stand-alone assembler programs, these directives are optional, because you can place all data in the segment defined by .DATA if you want. With mixed-language programs, use .DATA? and .CONST to ensure compatibility with the way C handles uninitialized and constant data. Once you define these segments, it is up to you to place the appropriate data in each segment.

If your program is written in compact, large, or huge model, you can use the .FARDATA and .FARDATA? directives to define additional data segments. The data in the other data segments (defined by .DATA, .DATA?, and .CONST) must not exceed a total of 64K across all modules. In addition, the stack segment is also placed in this 64K area unless you specify farStack with the .MODEL directive.

Data in the .FARDATA and .FARDATA? segments takes slightly longer to access, because a segment register other than DS must be loaded (or a segment override used) to reach it. However, there is generally much more room in these segments for data definitions. For each module, the .FARDATA and .FARDATA? directives each create a separate physical segment that can be up to 64K in size. The recommended practice is to use .FARDATA for initialized data and .FARDATA? for uninitialized data, although this is optional.

With medium, large, and huge models, you can use the name attribute to create multiple code segments within a source module.
With compact, large, and huge models, you can also use the name attribute to create multiple far-data segments.

Example 1

            DOSSEG
            .MODEL  small,c
            .STACK  100h
            .DATA
ivariable   DB      5
iarray      DW      50 DUP (5)
string      DB      "This is a string"
uarray      DW      50 DUP (?)
            EXTRN   xvariable:WORD
            .CODE
            .STARTUP
            EXTRN   xprocedure:NEAR
            call    xprocedure
            .
            .
            .
            END

This code uses simplified segment directives for a small-model, stand-alone assembler program. Notice that initialized data, uninitialized data, and a string constant are all defined in the same data segment. See Section 5.1.7, "Default Segment Names," for an equivalent version that uses full segment definitions.

Example 2

            .MODEL  large,c
            .FARDATA?
fuarray     DW      10 DUP (?)          ; Far uninitialized data
            .CONST
string      DB      "This is a string"  ; String constant
            .DATA
niarray     DB      100 DUP (5)         ; Near initialized data
            .FARDATA
            EXTRN   xvariable:FAR
fiarray     DW      100 DUP (10)        ; Far initialized data
            .CODE   TASK
            EXTRN   xprocedure:PROC
task        PROC
            .
            .
            .
            ret
task        ENDP
            END

This example uses simplified segment directives to create a module that might be called from a large-model, high-level-language program. Notice that different types of data are put in different segments to conform to Microsoft compiler conventions. See Section 5.1.7, "Default Segment Names," for an equivalent version using full segment definitions.

5.1.4.2 How Simplified Segments Are Implemented

When you use the simplified segment directives described above, the assembler defines segments in a way compatible with Microsoft high-level languages. This section makes a number of references to groups and ASSUME statements. Both of these concepts arise from the need to deal with the 8086 segmented architecture.

A "group" consists of one or more segments, totaling no more than 64K. When multiple segments are placed into a group, the linker combines these segments into a single physical segment. All addresses in the physical segment are adjusted so that they share the same segment address.
Groups are convenient because they remove the need to constantly reload the DS register.

The ASSUME directive is described at greater length in Section 5.4, "Associating Segments with Registers." This directive tells the assembler where a segment register will point at run time, so that the assembler can correctly calculate offset addresses relative to the value in that segment register.

Unless you use tiny model, the code segment (defined with .CODE) is placed in its own physical segment, separate from all the data and stack segments. With medium, large, or huge model, you can define multiple code segments within one source module by using .CODE repeatedly, each time with a different name attribute. When you use this technique, each .CODE directive generates a new ASSUME statement so that the assembler knows where CS points at run time.

Segments defined with the .STACK, .CONST, .DATA, or .DATA? directives are placed in a group called DGROUP. Segments defined with the .FARDATA or .FARDATA? directives are not placed in any group. See Section 5.3 for more information on segment groups. When initializing the DS register to access data in a group-associated segment, load the value of DGROUP into DS. The .STARTUP directive does this initialization automatically.

The .MODEL directive generates ASSUME statements to inform the assembler that at run time, DS, SS, and ES will all point to the beginning of DGROUP. You don't need to write these ASSUME statements yourself. If you specify farStack with the .MODEL directive, the stack is placed in a separate physical segment, and the .MODEL directive generates an ASSUME statement to inform the assembler that SS does not point to the same segment address that DS does.

5.1.5 Using Predefined Segment Equates

Several equates are predefined for you. You can use the equate names at any point in your code to represent the equate values. You should not assign equates having these names.
The predefined equates are listed below:

@CodeSize and @DataSize
     If the .MODEL directive has been used, the value of @CodeSize is 0 for the models that use near-code labels (tiny, small, and compact) or 1 for models that use far-code labels (medium, large, and huge). The value of @DataSize is 0 for models that use near-data labels (tiny, small, and medium), 1 for compact and large models, and 2 for huge models. These values can be used in conditional-assembly statements:

            IF      @DataSize
            les     bx,pointer              ; Load far pointer
            mov     ax,es:WORD PTR [bx]
            ELSE
            mov     bx,WORD PTR pointer     ; Load near pointer
            mov     ax,WORD PTR [bx]
            ENDIF

@CurSeg
     This equate holds the name of the current segment. It may be convenient for ASSUME statements, segment overrides, or other cases in which you need to access the current segment. It can also be used to end a segment.

@FileName
     This equate represents the base name of the current source file. For example, if the current source file is TASK.ASM, the value of @FileName is TASK. It can be used in any name that should change if the file name changes. For example, it can be used as a procedure name:

@FileName   PROC
            .
            .
            .
@FileName   ENDP

@Model
     As with the @CodeSize and @DataSize predefined equates, you must first use the .MODEL directive before using the @Model equate. The value of @Model is 1 for tiny model, 2 for small, 3 for compact, 4 for medium, 5 for large, and 6 for huge. @Model can be used in conditional-assembly statements.

Segment equates
     For each of the primary segment directives, there is a corresponding equate with the same name, except that the equate starts with an "at sign" (@) instead of a period. For example, the @code equate represents the segment name defined by the .CODE directive. Similarly, @fardata represents the .FARDATA segment name and @fardata? represents the .FARDATA? segment name.
     The @data equate represents the group name shared by all the near-data segments. It can be used to access the segments created by the .DATA, .DATA?, .CONST, and .STACK directives.

     These equates can be used in ASSUME statements and at any other time a segment must be referred to by name.

──────────────────────────────────────────────────────────────────────────
NOTE
     Although predefined equates are part of the simplified segment system, the @CurSeg and @FileName equates are also available when using full segment definitions. If you use the /Cl option or set Preserve Case in the Assembler Flags dialog box, predefined equates will be case sensitive, with the exact names shown above.
──────────────────────────────────────────────────────────────────────────

5.1.6 Simplified Segment Defaults

Although your program can combine full segment definitions and simplified segment directives, using the .MODEL directive changes some defaults. The defaults that change are listed below:

  ■ If you do not use the .MODEL directive, the default size for the PROC directive is always NEAR. If you use the .MODEL directive, the PROC directive is associated with the specified memory model: NEAR for tiny, small, and compact models and FAR for medium, large, and huge models. See Section 6.4.3, "Procedure Labels," for further discussion of the PROC directive.

  ■ If you use the .MODEL directive, the OFFSET operator returns an offset relative to the beginning of a group whenever a data item is defined within a group. If you do not use the .MODEL directive, the OFFSET operator always returns an offset relative to the beginning of the segment. The simplified segment directives .DATA, .DATA?, .CONST, and .STACK all create segments that are part of the group DGROUP.

    For example, assume the variable test1 was declared in a segment defined with the .DATA directive and test2 was declared in a segment defined with the .FARDATA directive.
    The statement

            mov     ax,OFFSET test1

    loads the address of test1 relative to DGROUP. The statement

            mov     ax,OFFSET test2

    loads the address of test2 relative to the segment defined by the .FARDATA directive. See Section 5.3 for more information on groups.

5.1.7 Default Segment Names

If you use the simplified segment directives by themselves, you do not need to know the names assigned for each segment. However, it is possible to mix full segment definitions with simplified segment directives, so some programmers may wish to know the actual names assigned to all segments. Table 5.1 shows the default segment names created by each directive.

Table 5.1  Default Segments and Types for Standard Memory Models

Model     Directive   Name        Align   Combine   Class        Group
──────────────────────────────────────────────────────────────────────────
Tiny      .CODE       _TEXT       WORD    PUBLIC    'CODE'       DGROUP
          .DATA       _DATA       WORD    PUBLIC    'DATA'       DGROUP
          .CONST      CONST       WORD    PUBLIC    'CONST'      DGROUP
          .DATA?      _BSS        WORD    PUBLIC    'BSS'        DGROUP
──────────────────────────────────────────────────────────────────────────
Small     .CODE       _TEXT       WORD    PUBLIC    'CODE'
          .DATA       _DATA       WORD    PUBLIC    'DATA'       DGROUP
          .CONST      CONST       WORD    PUBLIC    'CONST'      DGROUP
          .DATA?      _BSS        WORD    PUBLIC    'BSS'        DGROUP
          .STACK      STACK       PARA    STACK     'STACK'      DGROUP
──────────────────────────────────────────────────────────────────────────
Medium    .CODE       name_TEXT   WORD    PUBLIC    'CODE'
          .DATA       _DATA       WORD    PUBLIC    'DATA'       DGROUP
          .CONST      CONST       WORD    PUBLIC    'CONST'      DGROUP
          .DATA?      _BSS        WORD    PUBLIC    'BSS'        DGROUP
          .STACK      STACK       PARA    STACK     'STACK'      DGROUP
──────────────────────────────────────────────────────────────────────────
Compact   .CODE       _TEXT       WORD    PUBLIC    'CODE'
          .FARDATA    FAR_DATA    PARA    private   'FAR_DATA'
          .FARDATA?   FAR_BSS     PARA    private   'FAR_BSS'
          .DATA       _DATA       WORD    PUBLIC    'DATA'       DGROUP
          .CONST      CONST       WORD    PUBLIC    'CONST'      DGROUP
          .DATA?      _BSS        WORD    PUBLIC    'BSS'        DGROUP
          .STACK      STACK       PARA    STACK     'STACK'      DGROUP
──────────────────────────────────────────────────────────────────────────
Large or  .CODE       name_TEXT   WORD    PUBLIC    'CODE'
huge      .FARDATA    FAR_DATA    PARA    private   'FAR_DATA'
          .FARDATA?   FAR_BSS     PARA    private   'FAR_BSS'
          .DATA       _DATA       WORD    PUBLIC    'DATA'       DGROUP
          .CONST      CONST       WORD    PUBLIC    'CONST'      DGROUP
          .DATA?      _BSS        WORD    PUBLIC    'BSS'        DGROUP
          .STACK      STACK       PARA    STACK     'STACK'      DGROUP
──────────────────────────────────────────────────────────────────────────

The name used as part of far-code segment names is the file name of the module. The default name associated with the .CODE directive can be overridden in medium, large, and huge models. The default names for the .FARDATA and .FARDATA? directives can always be overridden.

The segment and group table at the end of listings always shows the actual segment names. However, the GROUP and ASSUME statements generated by the .MODEL directive are not shown in listing files. For a program that uses all possible segments, group statements equivalent to the following would be generated:

            DGROUP GROUP _DATA,CONST,_BSS,STACK

For tiny model, the following would be generated:

            ASSUME cs:DGROUP,ds:DGROUP,ss:DGROUP

For small and compact models, the following would be generated:

            ASSUME cs:_TEXT,ds:DGROUP,ss:DGROUP

For medium, large, and huge models, the following statement is generated:

            ASSUME cs:name_TEXT,ds:DGROUP,ss:DGROUP

Example 1

            EXTRN   xvariable:WORD
            EXTRN   xprocedure:NEAR
DGROUP      GROUP   _DATA,STACK
            ASSUME  cs:_TEXT,ds:DGROUP,ss:DGROUP
_TEXT       SEGMENT WORD PUBLIC 'CODE'
start:      mov     ax,DGROUP           ; Initialize data segment
            mov     ds,ax
            cli
            mov     ss,ax               ; Move DGROUP into SS
            add     sp,OFFSET STACK     ; Adjust SP to top of stack
            sti
            .
            .
            .
_TEXT       ENDS
_DATA       SEGMENT WORD PUBLIC 'DATA'
ivariable   DB      5
iarray      DW      50 DUP (5)
string      DB      "This is a string"
uarray      DW      50 DUP (?)
_DATA       ENDS
STACK       SEGMENT PARA STACK 'STACK'
            DB      100h DUP (?)
STACK       ENDS
            END     start

This example is equivalent to Example 1 in Section 5.1.4, "Defining Simplified Segments." Notice that the segment order must be different in this version to achieve the segment order that the DOSSEG directive produces in that example. The external variables are declared at the start of the source code in this example; with simplified segment directives, external variables can be declared in the segment in which they are used. The code generated by .STARTUP is discussed in more detail in Section 5.5.3.

Example 2

DGROUP      GROUP   _DATA,CONST,STACK
            ASSUME  cs:TASK_TEXT,ds:FAR_DATA,ss:STACK
            EXTRN   xprocedure:FAR
            EXTRN   xvariable:FAR
FAR_BSS     SEGMENT PARA 'FAR_BSS'
fuarray     DW      10 DUP (?)          ; Far uninitialized data
FAR_BSS     ENDS
CONST       SEGMENT WORD PUBLIC 'CONST'
string      DB      "This is a string"  ; String constant
CONST       ENDS
_DATA       SEGMENT WORD PUBLIC 'DATA'
niarray     DB      100 DUP (5)         ; Near initialized data
_DATA       ENDS
FAR_DATA    SEGMENT PARA 'FAR_DATA'
fiarray     DW      100 DUP (10)        ; Far initialized data
FAR_DATA    ENDS
TASK_TEXT   SEGMENT WORD PUBLIC 'CODE'
task        PROC    FAR
            .
            .
            .
            ret
task        ENDP
TASK_TEXT   ENDS
            END

This example is equivalent to Example 2 in Section 5.1.4, "Defining Simplified Segments." Notice that the segment order is the same in both versions. The segment order shown here is written to the object file, but it is different in the executable file: the segment order specified by the compiler (the DOS segment order) overrides the segment order in the module's object file.

5.2 Full Segment Definitions

If you need complete control over segments, you can give complete segment definitions. The sections below explain all aspects of segment definitions, including how to order segments and how to define all the segment types.

5.2.1 Setting the Segment-Order Method

The order in which QuickAssembler writes segments to the object file can be either sequential or alphabetical.
If the sequential method is specified, segments are written in the order in which they appear in the source code. If the alphabetical method is specified, segments are written in the alphabetical order of their segment names. The default is sequential: if no segment-order directive or option is given, segments are ordered sequentially.

The segment-order method is only one factor in determining the final order of segments in memory. The DOSSEG directive (see Section 5.1.2, "Specifying DOS Segment Order") and class type (see Section 5.2.2.3, "Controlling Segment Structure with Class Type") can also affect segment order.

The ordering method can be set by using the .ALPHA or .SEQ directive in the source code. The method can also be set using the /s (sequential) or /a (alphabetical) assembler options (see Appendix B, Section B.1, "Specifying the Segment-Order Method"). The directives have precedence over the options. For example, if the source code contains the .ALPHA directive but the /s option is given on the command line, the segments are ordered alphabetically.

Changing the segment order is an advanced technique. In most cases, you can simply leave the default sequential order in effect. If you are linking with high-level-language modules, the compiler automatically sets the segment order. The DOSSEG directive also overrides any segment-order directives or options.

──────────────────────────────────────────────────────────────────────────
NOTE
     Some previous versions of the IBM Macro Assembler ordered segments alphabetically by default. If you have trouble assembling and linking source-code listings from books or magazines, try using the /a option. Listings written for previous IBM versions of the assembler may not work without this option.

     The distinction between ENDS as the end of a segment and ENDS as the end of a structure is made by the context of the program.
──────────────────────────────────────────────────────────────────────────

Example 1

            .SEQ
DATA        SEGMENT WORD PUBLIC 'DATA'
DATA        ENDS
CODE        SEGMENT WORD PUBLIC 'CODE'
CODE        ENDS

Example 2

            .ALPHA
DATA        SEGMENT WORD PUBLIC 'DATA'
DATA        ENDS
CODE        SEGMENT WORD PUBLIC 'CODE'
CODE        ENDS

In Example 1, the DATA segment is written to the object file first because it appears first in the source code. In Example 2, the CODE segment is written to the object file first because its name comes first alphabetically.

5.2.2 Defining Full Segments

The beginning of a program segment is defined with the SEGMENT directive, and the end of the segment is defined with the ENDS directive.

Syntax

     name SEGMENT [[align]] [[combine]] [[use]] [['class']]
     statements
     name ENDS

The name defines the name of the segment. This name can be unique, or it can be the same name given to other segments in the program. Segments with identical names are treated as the same segment. For example, if it is convenient to put different portions of a single segment in different source modules, the segment is given the same name in both modules.

The optional align, combine, use, and 'class' types give the linker and the assembler instructions on how to set up and combine segments. Types can be specified in any order; it is not necessary to enter all types, or any type, for a given segment.

Defining segment types is an advanced technique. Beginning assembly-language programmers might try using the simplified segment directives discussed in Section 5.1.

──────────────────────────────────────────────────────────────────────────
NOTE
     Don't confuse the PAGE align type and the PUBLIC combine type with the PAGE and PUBLIC directives. The distinction should be clear from context, since the align and combine types are only used on the same line as the SEGMENT directive.
──────────────────────────────────────────────────────────────────────────

5.2.2.1 Controlling Alignment with Align Type

The optional align type defines the range of memory addresses from which a starting address for the segment can be selected. The align type can be any one of the following:

Align Type    Meaning
──────────────────────────────────────────────────────────────────────────
BYTE          Uses the next available byte address
WORD          Uses the next available word address (2 bytes per word)
DWORD         Uses the next available doubleword address (4 bytes per doubleword)
PARA          Uses the next available paragraph address (16 bytes per paragraph)
PAGE          Uses the next available page address (256 bytes per page)

If no align type is given, PARA is used by default. The linker uses the alignment information to determine the relative start address for each segment. DOS uses the information to calculate the actual start address when the program is loaded. Align types are illustrated in Figure 5.1 in the next section.

5.2.2.2 Defining Segment Combinations with Combine Type

The optional combine type defines how to combine segments having the same name. The combine type can be any one of the following:

PUBLIC
     Concatenates all segments having the same name to form a single, contiguous segment. The total size of the resulting segment is equal to the sum of all contributing segments. All instruction and data addresses in the new segment are relative to a single segment register, and all offsets are adjusted to represent the distance from the beginning of the segment.

STACK
     Concatenates all segments having the same name to form a single, contiguous segment. This combine type is the same as the PUBLIC combine type, except that all addresses in the new segment are relative to the SS segment register. The total size of the resulting segment is equal to the sum of all contributing segments.
     The Stack Pointer (SP) register is initialized to the length of the segment. The stack segment of your program should normally use the STACK type, since this automatically initializes the SS register, as described in Section 5.5.3. If you create a stack segment and do not use the STACK type, you must give instructions to initialize the SS and SP registers.

     For each individual segment, all initialized data is placed at the high end of the resulting stack segment. Consequently, if more than one stack segment contains initialized data, the linker overwrites this data as it links in each segment. Note that stack data cannot be initialized with simplified segment directives.

COMMON
     Creates overlapping segments by placing the start of all segments having the same name at the same address. The length of the resulting area is the length of the longest segment. All addresses in the segments are relative to the same base address. If variables are initialized in more than one segment having the same name and COMMON type, the most recently initialized data replaces any previously initialized data.

MEMORY
     Concatenates all segments having the same name to form a single, contiguous segment. The Microsoft Overlay Linker treats MEMORY segments exactly the same as PUBLIC segments. QuickAssembler allows you to use MEMORY type even though LINK does not recognize a separate MEMORY type. This feature is compatible with other linkers that may support a combine type conforming to the Intel definition of MEMORY type.

AT address
     Causes all label and variable addresses defined in the segment to be relative to address. The address can be any valid expression but must not contain a forward reference (that is, a reference to a symbol defined later in the source file). An AT segment typically contains no code or initialized data.
     Instead, it represents an address template that can be placed over code or data already in memory, such as a screen buffer or other absolute memory locations defined by hardware. The linker will not generate any code or data for AT segments, but existing code or data can be accessed by name if it is given a label in an AT segment. Section 6.6, "Setting the Location Counter," shows an example of a segment with AT combine type.

If no combine type is given, the segment has private type. Segments having the same name are not combined. Instead, each segment receives its own physical segment when loaded into memory.

──────────────────────────────────────────────────────────────────────────
NOTE
     Although a given segment name can be used more than once in a source file, each segment definition using that name must have either exactly the same attributes or attributes that do not conflict. If types are given for an initial segment definition, subsequent definitions for that segment need not specify any types.

     Normally, you should provide at least one stack segment (having STACK combine type) in a program. If no stack segment is declared, LINK displays a warning message. You can ignore this message if you have a specific reason for not declaring a stack segment. For example, you would not have a separate stack segment in a program in the .COM format.
──────────────────────────────────────────────────────────────────────────

Example

The following source-code shell illustrates one way in which the combine and align types can be used. Figure 5.1 shows the way LINK would load the sample program into memory.

            NAME    module_1
ASEG        SEGMENT BYTE PUBLIC 'CODE'
start:
            .
            .
            .
ASEG        ENDS
BSEG        SEGMENT WORD COMMON 'DATA'
            .
            .
            .
BSEG        ENDS
CSEG        SEGMENT PARA STACK 'STACK'
            .
            .
            .
CSEG        ENDS
DSEG        SEGMENT AT 0B800H
            .
            .
            .
DSEG        ENDS
            END     start

            NAME    module_2
ASEG        SEGMENT BYTE PUBLIC 'CODE'
            .
            .
            .
ASEG        ENDS
BSEG        SEGMENT WORD COMMON 'DATA'
            .
            .
            .
BSEG        ENDS
            END

┌────────────────────────────────────────────────────────────────────────┐
│ This figure can be found in Section 5.2.2.2 of the manual              │
└────────────────────────────────────────────────────────────────────────┘

5.2.2.3 Controlling Segment Structure with Class Type

Class type is a means of associating segments that have different names but similar purposes. It can be used to control segment order and to identify the code segment.

The class name must be enclosed in single quotation marks ('). Class names are not case sensitive unless the /Cl or /Cx option is used during assembly.

All segments belong to a class. Segments for which no class name is explicitly stated have the null class name. LINK imposes no restriction on the number or size of segments in a class. The total size of all segments in a class can exceed 64K.

──────────────────────────────────────────────────────────────────────────
NOTE
     The names assigned for class types of segments should not be used for other symbol definitions in the source file. For example, if you give a segment the class name 'CONSTANT', you should not give the name constant to variables or labels in the source file.
──────────────────────────────────────────────────────────────────────────

The linker expects segments having the class name CODE, or a class name with the suffix CODE, to contain program code. You should always assign this class name to segments containing code.

Class type is one of two factors that control the final order of segments in an executable file. The other factor is the order of the segments in the source file (with the /s option or the .SEQ directive) or the alphabetical order of segments (with the /a option or the .ALPHA directive).

These factors control different internal behavior, but both affect the final order of segments in the executable file. The sequential or alphabetical order of segments in the source file determines the order in which the assembler writes segments to the object file.
The class type can affect the order in which the linker writes segments from object files to the executable file. Segments having the same class type are loaded into memory together, regardless of their sequential or alphabetical order in the source file.

──────────────────────────────────────────────────────────────────────────
NOTE
     The DOSSEG directive (see Section 5.1.2, "Specifying DOS Segment Order") overrides all other factors in determining segment order.
──────────────────────────────────────────────────────────────────────────

Example

A_SEG       SEGMENT 'SEG_1'
A_SEG       ENDS
B_SEG       SEGMENT 'SEG_2'
B_SEG       ENDS
C_SEG       SEGMENT 'SEG_1'
C_SEG       ENDS

When QuickAssembler assembles the preceding program fragment, it writes the segments to the object file in sequential or alphabetical order, depending on whether the /a option or the .ALPHA directive was used. In the example above, the sequential and alphabetical orders are the same, so the order will be A_SEG, B_SEG, C_SEG in either case.

When the linker writes the segments to the executable file, it first checks to see if any segments have the same class type. If they do, it writes them to the executable file together. Thus, A_SEG and C_SEG are placed together because they both have class type 'SEG_1'. The final order in memory is A_SEG, C_SEG, B_SEG.

Since LINK processes modules in the order it receives them on the command line, you may not always be able to easily specify the order in which you want segments to be loaded. For example, assume your program has four segments that you want loaded in the following order: _TEXT, _DATA, CONST, and STACK. The _TEXT, CONST, and STACK segments are defined in the first module of your program, but the _DATA segment is defined in the second module. LINK will not put the segments in the proper order, because it loads the segments encountered in the first module before those in the second.
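The ordering behavior described above can be approximated with a short C sketch. It is a deliberately simplified model, not LINK's actual algorithm: it assumes that segments with the same name are combined at the position where that name is first encountered, that segments of the same class are kept together, and that classes are placed in the order in which they are first encountered. The Segment type and the link_order and contains functions are hypothetical, and class names are compared case-sensitively here for simplicity.

```c
#include <stddef.h>
#include <string.h>

/* One segment definition as LINK reads it: its name and its class type. */
typedef struct {
    const char *name;        /* segment name, e.g. "_TEXT" */
    const char *class_name;  /* class type, e.g. "CODE"    */
} Segment;

/* Return nonzero if s already occurs among the first count entries of list. */
static int contains(const char **list, size_t count, const char *s)
{
    for (size_t i = 0; i < count; i++)
        if (strcmp(list[i], s) == 0)
            return 1;
    return 0;
}

/*
 * Simplified model of segment ordering (assumption, not LINK's code):
 *  1. Classes are emitted in the order in which they are first seen.
 *  2. Within a class, distinct segment names keep first-seen order;
 *     repeated names (combined segments) are emitted only once.
 * in[] lists segment definitions in the order LINK reads them, module by
 * module. out[] must have room for n names. Handles up to 64 classes.
 * Returns the number of distinct segment names written to out[].
 */
size_t link_order(const Segment *in, size_t n, const char **out)
{
    const char *classes[64];
    size_t nclasses = 0, k = 0;

    for (size_t i = 0; i < n && nclasses < 64; i++)
        if (!contains(classes, nclasses, in[i].class_name))
            classes[nclasses++] = in[i].class_name;

    for (size_t c = 0; c < nclasses; c++)
        for (size_t i = 0; i < n; i++)
            if (strcmp(in[i].class_name, classes[c]) == 0 &&
                !contains(out, k, in[i].name))
                out[k++] = in[i].name;

    return k;
}
```

Feeding it the A_SEG, B_SEG, C_SEG fragment yields A_SEG, C_SEG, B_SEG, matching the order described above. Feeding it the two-module case (_TEXT, CONST, and STACK from the first module, then _DATA from the second) places _DATA last, which is exactly the problem just described; placing definitions in the desired order ahead of the real ones changes which names and classes are seen first.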
You can avoid this problem by starting your program with dummy segment definitions in the order in which you want your real segments loaded. The dummy segments can either go at the start of the first module, or they can be placed in a separate include file that is included at the start of the first module. You can then put the actual segment definitions in any order or any module you find convenient. For example, you might include the following file at the start of the first module of your program:

_TEXT       SEGMENT WORD PUBLIC 'CODE'
_TEXT       ENDS
_DATA       SEGMENT WORD PUBLIC 'DATA'
_DATA       ENDS
CONST       SEGMENT WORD PUBLIC 'CONST'
CONST       ENDS
STACK       SEGMENT PARA STACK 'STACK'
STACK       ENDS

The DOSSEG directive may be more convenient for defining segment order if you are willing to accept the DOS segment-order conventions.

Once a segment has been defined, you do not need to specify the align, combine, use, and class types on subsequent definitions. For example, if your code defined dummy segments as shown above, you could define an actual data segment with the following statements:

_DATA       SEGMENT
            .
            .
            .
_DATA       ENDS

5.3 Defining Segment Groups

A group is a collection of segments associated with the same starting address. You may wish to use a group if you want several types of data to be organized in separate segments in your source code, but want them all to be accessible from a single, common segment register at run time.

Syntax

     name GROUP segment [[,segment]]...

The name is the symbol assigned to the starting address of the group. All labels and variables defined within the segments of the group are relative to the start of the group, rather than to the start of the segments in which they are defined.

The segment can be any previously defined segment or a SEG expression (see Section 9.2.4.5). Segments can be added to a group one at a time: for example, you can define a segment and add it to the group, then later define another segment and add it to the same group.
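The rule that labels in a group are relative to the start of the group comes down to 8086 address arithmetic: a physical address is segment * 16 + offset, so a symbol's group-relative offset is its distance in bytes from the group's first paragraph. The C sketch below illustrates this under stated assumptions; the function names and the load addresses used with it are hypothetical, chosen only for illustration.

```c
#include <stdint.h>

/* Physical 20-bit 8086 address of segment:offset (segment * 16 + offset). */
uint32_t physical_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/*
 * Offset of a symbol relative to the start of its group.
 * group_base and seg_base are hypothetical load addresses expressed as
 * paragraph (segment) numbers; sym_offset is the symbol's offset within
 * its own segment. Returns -1 if the result does not fit in 16 bits,
 * that is, if the symbol lies outside the 64K addressable from the
 * group's segment address.
 */
long group_offset(uint16_t group_base, uint16_t seg_base, uint16_t sym_offset)
{
    long off = ((long)seg_base - (long)group_base) * 16L + sym_offset;
    return (off < 0 || off > 0xFFFFL) ? -1 : off;
}
```

For instance, if a group starts at paragraph 1000h and one of its segments is loaded at paragraph 1002h, a symbol at offset 4 in that segment has group offset 24h. A result of -1 signals a symbol beyond the 64K that one segment register can reach from the group base, which is why the segments of a group must fit within 64K.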
The GROUP directive does not affect the order in which segments of a group are loaded. Loading order depends on each segment's class, or on the order in which object modules are given to the linker.

Segments in a group need not be contiguous. Segments that do not belong to the group can be loaded between segments that do. The only restriction is that the distance (in bytes) between the first byte in the first segment of the group and the last byte in the last segment must not exceed 65,535 bytes.

──────────────────────────────────────────────────────────────────────────
NOTE
     When the .MODEL directive is used, the offset of a group-relative segment refers to the ending address of the segment, not the beginning. For example, the expression OFFSET STACK evaluates to the end of the stack segment.
──────────────────────────────────────────────────────────────────────────

Group names can be used with the ASSUME directive (discussed in Section 5.4, "Associating Segments with Registers") and as an operand prefix with the segment-override operator (discussed in Section 9.2.3).

Example

DGROUP      GROUP   ASEG,CSEG
            ASSUME  ds:DGROUP
ASEG        SEGMENT WORD PUBLIC 'DATA'
            .
asym        .
            .
ASEG        ENDS
BSEG        SEGMENT WORD PUBLIC 'DATA'
            .
bsym        .
            .
BSEG        ENDS
CSEG        SEGMENT WORD PUBLIC 'DATA'
            .
csym        .
            .
CSEG        ENDS
            END

Figure 5.2 shows the order of the example segments in memory. They are loaded in the order in which they appear in the source code (or in alphabetical order if the .ALPHA directive or /a option is specified).

┌────────────────────────────────────────────────────────────────────────┐
│ This figure can be found in Section 5.3 of the manual                  │
└────────────────────────────────────────────────────────────────────────┘

Since ASEG and CSEG are declared part of the same group, they have the same base despite their separation in memory. This means that the symbols asym and csym have offsets from the beginning of the group, which is also the beginning of ASEG.
The offset of bsym is from the beginning of BSEG, since it is not part of the group. This sample illustrates the way LINK organizes segments in a group. It is not intended as a typical use of a group.

5.4 Associating Segments with Registers

Many of the assembler instructions assume a default segment. For example, JMP instructions assume the segment associated with the CS register; PUSH and POP instructions assume the segment associated with the SS register; MOV instructions assume the segment associated with the DS register.

When the assembler needs to reference an address, it must know what segment the address is in. It determines this from the default segment or group that the ASSUME directive has assigned to each segment register.

──────────────────────────────────────────────────────────────────────────
NOTE
Using the ASSUME directive to tell the assembler which segment to associate with a segment register is not the same as telling the processor. The ASSUME directive only affects assembly-time assumptions. You may need to use instructions to change run-time assumptions. Initializing segment registers at run time is discussed in Section 5.5.
──────────────────────────────────────────────────────────────────────────

Syntax

    ASSUME segmentregister:name [[,segmentregister:name]]...
    ASSUME segmentregister:NOTHING
    ASSUME NOTHING

The name must be the name of the segment or group that is to be associated with segmentregister. After this statement, whenever an instruction references a label or variable through segmentregister by default, the assembler assumes that the label or variable is in the name segment or group.

The ASSUME directive can define a segment for each of the segment registers. The segmentregister can be CS, DS, ES, or SS.
The name must be one of the following:

  ■ The name of a segment defined in the source file with the SEGMENT directive
  ■ The name of a group defined in the source file with the GROUP directive
  ■ The keyword NOTHING
  ■ A SEG expression (see Section 9.2.4.5, "SEG Operator")
  ■ A string equate that evaluates to a segment or group name (but not a string equate that evaluates to a SEG expression)

The keyword NOTHING cancels the current segment selection. For example, the statement ASSUME NOTHING cancels all register selections made by previous ASSUME statements.

Usually, a single ASSUME statement defines all four segment registers at the start of the source file. However, you can use the ASSUME directive at any point to change segment assumptions.

Using the ASSUME directive to change segment assumptions is often equivalent to changing assumptions with the segment-override operator (:) (see Section 9.2.3). The segment-override operator is more convenient for one-time overrides, whereas the ASSUME directive may be more convenient if previous assumptions must be overridden for a sequence of instructions.

Example

            DOSSEG
            .MODEL  large           ; DS automatically assumed to @data
            .STACK  100h
            .DATA
    d1      DW      7
            .FARDATA
    d2      DW      9
            .CODE
    start:  mov     ax,@data        ; Initialize near data
            mov     ds,ax
            mov     ax,@fardata     ; Initialize far data
            mov     es,ax
            .
            .
            .
    ; Method 1 for series of instructions that need override
    ; Use segment override for each statement

            mov     ax,es:d2
            .
            .
            .
            mov     es:d2,bx

    ; Method 2 for series of instructions that need override
    ; Use ASSUME at beginning of series of instructions

            ASSUME  es:@fardata
            mov     cx,d2
            .
            .
            .
            mov     d2,dx

5.5 Initializing Segment Registers

Assembly-language programs must initialize segment values for each segment register before instructions that reference the segment register can be used in the source program.

Initializing segment registers is different from assigning default values for segment registers with the ASSUME statement.
The ASSUME directive tells the assembler what segments to use at assembly time. Initializing segments gives them an initial value that will be used at run time.

The .STARTUP directive generates all the initialization code described in this section. This directive must be preceded by the .MODEL directive. If the .MODEL directive was followed by the farStack attribute, .STARTUP does not adjust SS and SP. Otherwise, it assumes the nearStack default, which sets SS equal to DS as described in Section 5.5.3, "Initializing the SS and SP Registers." When you use this default, the combined stack and near data must not exceed 64K.

If you use .STARTUP, you don't need to enter any of the code in this section. You still need the END statement, but you don't need to give it a starting address. Make sure that you place the .STARTUP directive at the point where you want your program to start executing, because the assembler automatically initializes CS:IP to point to the beginning of the code generated by .STARTUP.

5.5.1 Initializing the CS and IP Registers

The CS and IP registers are initialized by specifying a starting address with the END directive.

Syntax

    END [[startaddress]]

The startaddress is a label or expression identifying the address where you want execution to begin when the program is loaded. Normally, a label for the start address should be placed at the address of the first instruction in the code segment.

CS is initialized to the segment of startaddress, and IP to its offset within that segment. When the start label marks the first instruction of the code segment, IP is therefore normally initialized to 0. You can change the initial value of the IP register by using the ORG directive (see Section 6.6, "Setting the Location Counter") just before the startaddress label. For example, programs in the .COM format use ORG 100h to initialize the IP register to 256 (100 hexadecimal).

If a program consists of a single source module, the start address is required for that module.
If a program has several modules, all modules must terminate with an END directive, but only one of them can define a start address.

──────────────────────────────────────────────────────────────────────────
WARNING
One, and only one, module must define a start address. If you do not specify a start address, none is assumed. Neither QuickAssembler nor LINK will generate an error message, but your program will probably start execution at the wrong address.
──────────────────────────────────────────────────────────────────────────

Example

    ; Module 1
            .CODE
    start:  .                       ; First executable instruction
            .
            .
            EXTRN   task:NEAR
            call    task
            .
            .
            .
            END     start           ; Starting address defined in main module

    ; Module 2
            PUBLIC  task
            .CODE
    task    PROC
            .
            .
            .
    task    ENDP
            END                     ; No starting address in secondary module

If Module 1 and Module 2 are linked into a single program, it is essential that only the calling module (Module 1) define a starting address.

5.5.2 Initializing the DS Register

The DS register must be initialized to the address of the segment that will be used for data. The address of the segment or group for the initial data segment must be loaded into the DS register. This takes two statements because an immediate value, such as a segment or group address, cannot be loaded directly into a segment register. It must first be moved into a general-purpose register such as AX. The segment-setup lines typically appear at the start or very near the start of the code segment.

Example 1

    _DATA   SEGMENT WORD PUBLIC 'DATA'
            .
            .
            .
    _DATA   ENDS
    _TEXT   SEGMENT BYTE PUBLIC 'CODE'
            ASSUME  cs:_TEXT,ds:_DATA
    start:  mov     ax,_DATA        ; Load start of data segment
            mov     ds,ax           ; Transfer to DS register
            .
            .
            .
    _TEXT   ENDS
            END     start

If you are using the Microsoft naming convention and segment order, the address loaded into the DS register is not a segment address but the address of DGROUP, as shown in Example 2. With simplified segment directives, the address of DGROUP is represented by the predefined equate @data.

Example 2

            DOSSEG
            .MODEL  SMALL
            .DATA
            .
            .
            .
            .CODE
    start:  mov     ax,@data        ; Load start of DGROUP (@data)
            mov     ds,ax           ; Transfer to DS register
            .
            .
            .
            END     start

5.5.3 Initializing the SS and SP Registers

At load time, DOS sets SS to the segment address of the last segment having combine type STACK, and sets SP to the size of the stack. (The linker actually determines the value of SS:SP and places this value in the executable-file header. DOS sets SS and SP as indicated in the file header.)

If you use a stack segment with combine type STACK or use the .STACK directive, the program automatically loads with SS and SP initialized, as described above. However, this basic initialization does not set SS equal to DS. A program that contains the statement ASSUME SS:DGROUP without actually making SS equal to DS at run time will be prone to errors.

The following code resets SS and SP so that SS has the same value as DS. The code then adjusts SP upward so that SS:SP points to the same physical address it did before. Since hardware interrupts use the same stack as the program, you should turn off interrupts while changing the stack. Most 8086-family processors automatically hold off interrupts for one instruction after you load SS, but early versions of the 8088 do not.

Example 1

            .MODEL  small
            .STACK  100h            ; Initialize "STACK"
            .DATA
            .
            .
            .
            .CODE
    start:  mov     ax,@data        ; Load segment location
            mov     ds,ax           ;   into DS register
            cli                     ; Turn off interrupts
            mov     ss,ax           ; Load same value as DS into SS
            mov     sp,OFFSET STACK ; Give SP new stack size
            sti                     ; Turn interrupts back on
            .
            .
            .

This example reinitializes SS so that it has the same value as DS, and it adjusts SP to reflect the new stack offset. Microsoft high-level-language compilers do this so that stack variables in near procedures can be accessed relative to either SS or DS.

However, this code only works correctly if you use .MODEL and you declare a stack segment in just one module. The following code handles the more general case.
The .STARTUP directive generates this code:

Example 2

    start_label:
            mov     dx,DGROUP       ; Move DGROUP into DS and DX
            mov     ds,dx
            mov     bx,ss           ; BX = STACK - DGROUP
            sub     bx,dx
            shl     bx,1            ; Multiply difference by 16
            shl     bx,1            ;   and leave result in BX
            shl     bx,1
            shl     bx,1
            cli
            mov     ss,dx           ; Move DGROUP into SS
            add     sp,bx           ; Adjust SP upward by
            sti                     ;   (STACK - DGROUP) * 16

The code above sets SS and SP so that SS equals DS. This code works correctly no matter how many modules declare a stack segment.

5.5.4 Initializing the ES Register

The ES register is not automatically initialized. If your program uses the ES register, you must initialize it by moving the appropriate segment value into the register.

Example

            ASSUME  es:@fardata     ; Tell the assembler
            mov     ax,@fardata     ; Tell the processor
            mov     es,ax

5.6 Nesting Segments

Segments can be nested. When QuickAssembler encounters a nested segment, it temporarily suspends assembly of the enclosing segment and begins assembly of the nested segment. When the nested segment has been assembled, QuickAssembler continues assembly of the enclosing segment.

Nesting of segments makes it possible to mix segment definitions in programs that use simplified segment directives for most segment definitions. When a full segment definition is given, the new segment is nested in the simplified segment in which it is defined.

Example 1

    ; Macro to print message on the screen
    ; Uses full segment definitions - segments nested

    message MACRO   text
            LOCAL   symbol
    _DATA   SEGMENT WORD PUBLIC 'DATA'
    symbol  DB      &text
            DB      13,10,"$"
    _DATA   ENDS
            mov     ah,09h
            mov     dx,OFFSET symbol
            int     21h
            ENDM

    _TEXT   SEGMENT BYTE PUBLIC 'CODE'
            .
            .
            .
            message "Please insert disk"

In the example above, a macro called from inside the code segment (_TEXT) allocates a variable within a nested data segment (_DATA). This has the effect of allocating more data space on the end of the data segment each time the macro is called. The macro can be used for messages appearing only once in the source code.
Example 2

    ; Macro to print message on the screen
    ; Uses simplified segment directives - segments not nested

    message MACRO   text
            LOCAL   symbol
            .DATA
    symbol  DB      &text
            DB      13,10,"$"
            .CODE
            mov     ah,09h
            mov     dx,OFFSET symbol
            int     21h
            ENDM

            .CODE
            .
            .
            .
            message "Please insert disk"

Although Example 2 has the same practical effect as Example 1, QuickAssembler handles the two macros differently. In Example 1, assembly of the outer (code) segment is suspended rather than terminated. In Example 2, assembly of the code segment terminates, assembly of the data segment starts and terminates, and then assembly of the code segment is restarted.

────────────────────────────────────────────────────────────────────────────
Chapter 6: Defining Constants, Labels, and Variables

This chapter explains how to define constants, labels, variables, and other symbols that refer to instruction and data locations within segments.

Constants are important in QuickAssembler, just as they are in other languages. You can use constants as immediate operands in instructions and as initial values in data declarations. QuickAssembler supports a number of useful radixes (including binary and hexadecimal), as described in Section 6.1.

QuickAssembler lets you use symbols as well as constants. Sections 6.2, "Assigning Names to Symbols," and 6.3, "Using Type Specifiers," present the basic principles of generating symbolic names. Most symbols are either code labels or variable names. Section 6.4, "Defining Code Labels," and Section 6.5, "Defining and Initializing Data," describe how to define these symbols.

This chapter tells you how to assign labels and most kinds of variables. (Multifield variables, such as structures and records, are discussed in Chapter 7, "Using Structures and Records.") Chapter 6 also discusses related directives, including those that control the location counter directly. The assembler uses the location counter to assign addresses to symbols.
6.1 Constants

Constants can be used in source files to specify numbers or strings that are set or initialized at assembly time. The assembler recognizes four types of constant values:

  1. Integers
  2. Packed binary coded decimals
  3. Real numbers
  4. Strings

6.1.1 Integer Constants

Integer constants represent integer values. They can be used in a variety of contexts in assembly-language source code. For example, they can be used in data declarations and equates, or as immediate operands.

Packed decimal integers are a special kind of integer constant that can only be used to initialize binary coded decimal (BCD) variables. They are described in Sections 6.1.2, "Packed Binary Coded Decimal Constants," and 6.5.1.2, "Binary Coded Decimal Variables."

Integer constants can be specified in binary, octal, decimal, or hexadecimal values. Table 6.1 shows the legal digits for each of these radixes. For hexadecimal radix, the digits can be either uppercase or lowercase letters.

Table 6.1 Digits Used with Each Radix

    Radix         Base   Digits
    ──────────────────────────────────────────────────────────────────────────
    Binary          2    0 1
    Octal           8    0 1 2 3 4 5 6 7
    Decimal        10    0 1 2 3 4 5 6 7 8 9
    Hexadecimal    16    0 1 2 3 4 5 6 7 8 9 A B C D E F
    ──────────────────────────────────────────────────────────────────────────

The radix for an integer can be defined for a specific integer by using radix specifiers, or a default radix can be defined globally with the .RADIX directive.

6.1.1.1 Specifying Integers with Radix Specifiers

The radix for an integer constant can be given by putting one of the following radix specifiers after the last digit of the number:

    Radix         Specifier
    ──────────────────────────────────────────────────────────────────────────
    Binary        B
    Octal         Q or O
    Decimal       D
    Hexadecimal   H

Radix specifiers can be given in either uppercase or lowercase letters; sample code in this manual uses lowercase letters.

Hexadecimal numbers must always start with a decimal digit (0-9).
If necessary, put a leading 0 at the left of the number to distinguish between symbols and hexadecimal numbers that start with a letter. For example, 0ABCh is interpreted as a hexadecimal number, but ABCh is interpreted as a symbol. The hexadecimal digits A through F can be either uppercase or lowercase letters. Sample code in this manual uses uppercase letters.

If no radix is given, the assembler interprets the integer by using the current default radix. The initial default radix is decimal, but you can change the default with the .RADIX directive.

Examples

    n360    EQU     01011010b + 132q + 5Ah + 90d    ; 4 * 90
    n60     EQU     00001111b + 17o + 0Fh + 15d     ; 4 * 15

6.1.1.2 Setting the Default Radix

The .RADIX directive sets the default radix for integer constants in the source file.

Syntax

    .RADIX expression

The expression must evaluate to a number in the range 2-16. It defines whether the numbers are binary, octal, decimal, hexadecimal, or numbers of some other base.

Numbers given in expression are always considered decimal, regardless of the current default radix. The initial default radix is decimal.

Note that the .RADIX directive does not affect real numbers initialized as variables with the DD, DQ, or DT directive. Initial values for real-number variables declared with these directives are always evaluated as decimal unless a radix specifier is appended.

Also, the .RADIX directive does not affect the optional radix specifiers, B and D, used with integer numbers. When the letters B or D appear at the end of any integer, they are always considered to be a radix specifier, even if the current radix is 16. For example, if the input radix is 16, the number 0ABCD will be interpreted as 0ABC decimal, an illegal number, instead of as 0ABCD hexadecimal, as intended. Type 0ABCDh to specify 0ABCD in hexadecimal. Similarly, the number 11B will be treated as 11 binary, a legal number, rather than as 11B hexadecimal, as intended. Type 11Bh to specify 11B in hexadecimal.
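Taken together, the rules in Sections 6.1.1.1 and 6.1.1.2 form a small decision procedure for reading an integer constant. The Python sketch below is a hypothetical model of that procedure. parse_integer is not part of QuickAssembler. It handles only plain integer constants, not expressions, real numbers, or the .RADIX operand itself, which is always decimal.

```python
RADIX_SPECIFIERS = {"b": 2, "q": 8, "o": 8, "d": 10, "h": 16}
DIGITS = "0123456789abcdef"


def parse_integer(text: str, default_radix: int = 10) -> int:
    """Hypothetical model of how an integer constant's radix is chosen.

    A trailing B, Q, O, D, or H is always read as a radix specifier, even when
    the default radix is 16; otherwise default_radix (set by .RADIX) applies.
    """
    if not 2 <= default_radix <= 16:
        raise ValueError(".RADIX must be in the range 2-16")
    if not text or not text[0].isdigit():
        raise ValueError(f"{text!r} starts with a letter, so it is a symbol")
    radix, digits = default_radix, text.lower()
    if digits[-1] in RADIX_SPECIFIERS:
        radix, digits = RADIX_SPECIFIERS[digits[-1]], digits[:-1]
    # Reject any digit that is illegal for the chosen radix (see Table 6.1).
    if not digits or any(c not in DIGITS[:radix] for c in digits):
        raise ValueError(f"{text!r} is not a legal radix-{radix} number")
    return int(digits, radix)


# The n360 example: four spellings of 90
print(sum(parse_integer(t) for t in ("01011010b", "132q", "5Ah", "90d")))  # 360
print(parse_integer("11B", 16))      # 3: B is a specifier, not a hex digit
print(parse_integer("0ABCDh", 16))   # 43981
```

With a default radix of 16, parse_integer("0ABCD", 16) raises ValueError. The trailing D selects decimal, and A, B, and C are not decimal digits. This matches the pitfall described above.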
Examples

            .RADIX  16              ; Set default radix to hexadecimal
            .RADIX  2               ; Set default radix to binary

6.1.2 Packed Binary Coded Decimal Constants

When an integer constant is used with the DT directive, the number is interpreted by default as a packed binary coded decimal (BCD) number. You can use the D radix specifier to override the default and initialize 10-byte integers as binary-format integers.

The syntax for specifying binary coded decimals is exactly the same as for other integers. However, the assembler encodes binary coded decimals in a completely different way. See Section 6.5.1.2, "Binary Coded Decimal Variables," for complete information on storage of binary coded decimals.

Examples

    positive  DT  1234567890        ; Encoded as 00000000001234567890h
    negative  DT  -1234567890       ; Encoded as 80000000001234567890h

6.1.3 Real-Number Constants

A real number is a number consisting of an integer part, a fractional part, and an exponent. Real numbers are usually represented in decimal format.

Syntax

    [[+ | -]] integer.fraction[[E[[+ | -]]exponent]]

The integer and fraction parts combine to form the value of the number. This value is stored internally as a unit and is called the mantissa. It may be signed. The optional exponent follows the exponent indicator (E). It is the power of 10 by which the mantissa is scaled, and it is stored internally as a unit. If no exponent is given, an exponent of 0 is assumed (the mantissa is multiplied by 1). If an exponent is given, it may be signed.

During assembly, the assembler converts real-number constants given in decimal format to a binary format. The sign, exponent, and mantissa of the real number are encoded as bit fields within the number. See Section 6.5.1.4, "Real-Number Variables," for an explanation of how real numbers are encoded.

You can specify the encoded format directly using hexadecimal digits (0-9 or A-F). The number must begin with a decimal digit (0-9) and cannot be signed. It must be followed by the real-number designator (R).
This designator is used in the same way as a radix specifier, except that it specifies that the given hexadecimal number should be interpreted as a real number.

Real numbers can only be used to initialize variables with the DD, DQ, and DT directives. They cannot be used in expressions. The maximum number of digits in the number and the maximum range of exponent values depend on the directive. The number of digits for encoded numbers used with DD, DQ, and DT must be 8, 16, and 20 digits, respectively. (If a leading 0 is supplied, the number must be 9, 17, or 21 digits.) See Section 6.5.1.4, "Real-Number Variables," for an explanation of how real numbers are encoded.

──────────────────────────────────────────────────────────────────────────
NOTE
Real numbers will be encoded differently depending upon whether you use the .MSFLOAT directive. By default, real numbers are encoded in the IEEE format. The .MSFLOAT directive overrides the default and specifies Microsoft Binary format. See Section 6.5.1.4, "Real-Number Variables," for a description of these formats.
──────────────────────────────────────────────────────────────────────────

Example

    ; Real numbers
    shrt       DD   25.23
    long       DQ   2.523E1
    ten_byte   DT   2523.0E-2

    ; Assumes .MSFLOAT
    mbshort    DD   81000000r              ; 1.0 as Microsoft Binary short
    mblong     DQ   8100000000000000r      ; 1.0 as Microsoft Binary long

    ; Assumes default IEEE format
    ieeeshort  DD   3F800000r              ; 1.0 as IEEE short
    ieeelong   DQ   3FF0000000000000r      ; 1.0 as IEEE long

    ; The same regardless of processor directives
    temporary  DT   3FFF8000000000000000r  ; 1.0 as 10-byte temporary real

6.1.4 String Constants

A string constant consists of one or more ASCII characters enclosed in single or double quotation marks. Strings are interpreted as lists of characters having the ASCII values of the characters in the string.

Syntax

    'characters'
    "characters"

String constants are case sensitive. A string constant consisting of a single character is sometimes called a character constant.
Single quotation marks must be encoded twice when used literally within string constants that are also enclosed by single quotation marks. Similarly, double quotation marks must be encoded twice when used in string constants that are also enclosed by double quotation marks.

Examples

    char     DB  'a'
    char2    DB  "a"
    message  DB  "This is a message."
    warn     DB  'Can''t find file.'            ; Can't find file.
    warn2    DB  "Can't find file."             ; Can't find file.
    string   DB  "This ""value"" not found."    ; This "value" not found.
    string2  DB  'This "value" not found.'      ; This "value" not found.

6.1.5 Determining Floating-Point Format

The .MSFLOAT directive disables all coprocessor instructions and specifies that initialized real-number variables be encoded in the Microsoft Binary format. Without this directive, initialized real-number variables are encoded in the IEEE format. This is a change from Versions 4.0 and earlier of the Microsoft Macro Assembler, which used Microsoft Binary format by default and required a coprocessor directive or the /R option to specify IEEE format.

.MSFLOAT must be used for programs that require real-number data in the Microsoft Binary format. Section 6.5.1.4, "Real-Number Variables," describes real-number data formats and the factors to consider in choosing a format.

6.2 Assigning Names to Symbols

A symbol is a name that represents a value. Symbols are one of the most important elements of assembly-language programs. Elements that must be represented symbolically in assembly-language source code include variables, address labels, macros, segments, procedures, records, and structures. Constants, expressions, and strings can also be represented symbolically.

Symbol names are combinations of letters (both uppercase and lowercase), digits, and special characters. QuickAssembler recognizes the following character set:

    A-Z  a-z  0-9
    ? @ _ $ : . [ ] ( ) < > { } + - / * & % !
    ' ~ | \ = # ^ ; , ` "

Letters, digits, and some characters can be used in symbol names, but some restrictions on how certain characters can be used or combined are listed below:

  ■ A name can have any combination of uppercase and lowercase letters. Within the QC integrated environment, the default behavior (Preserve Extrn) is for the assembler to convert all symbol names to uppercase unless they are public or external. When you use simplified segment directives, all procedure labels declared with PROC are automatically public.

    When you use QCL, all lowercase letters are converted to uppercase by the assembler, unless you give the /Cl assembly option, or you declare the name with a PROC, PUBLIC, or EXTRN directive and you give the /Cx option. The /Cl and /Cx options correspond to the assembler flags Preserve Case and Preserve Extrn, respectively, within the QC environment.

  ■ Digits may be used within a name, but not as the first character.

  ■ A name can be given any number of characters, but only the first 31 are used. All other characters are ignored.

  ■ The following characters may be used at the beginning of a name or within a name: underscore (_), question mark (?), dollar sign ($), and at sign (@).

  ■ The period (.) is an operator and cannot be used within a name, but it can be used as the first character of a name.

  ■ A name may not be the same as any reserved name. Note that two special characters, the question mark (?) and the dollar sign ($), are reserved names and therefore can't stand alone as symbol names.

A reserved name is any name with a special, predefined meaning to the assembler. Reserved names include instruction and directive mnemonics, register names, and operator names. All uppercase and lowercase letter combinations of these names are treated as the same name.

The following is a list of names that are always reserved by the assembler. Using any of these names for a symbol results in an error.
    $          DWORD         GE            %OUT
    *          ELSE          GROUP         PAGE
    +          ELSEIF        GT            PROC
    -          ELSEIF1       HIGH          PTR
    .          ELSEIF2       IF            PUBLIC
    /          ELSEIFB       IF1           PURGE
    =          ELSEIFDEF     IF2           QWORD
    ?          ELSEIFDIF     IFB           .RADIX
    []         ELSEIFDIFI    IFDEF         RECORD
    .186       ELSEIFE       IFDIF         REPT
    .286       ELSEIFIDN     IFE           .SALL
    .286P      ELSEIFIDNI    IFIDN         SEG
    .287       ELSEIFNB      IFNB          SEGMENT
    .386       ELSEIFNDEF    IFNDEF        .SEQ
    .386P      END           INCLUDE       .SFCOND
    .387       ENDIF         INCLUDELIB    SHL
    .8086      ENDM          IRP           SHORT
    .8087      ENDP          IRPC          SHR
    ALIGN      ENDS          LABEL         SIZE
    .ALPHA     EQ            .LALL         SIZESTR
    AND        EQU           LE            .STACK
    ASSUME     .ERR          LENGTH        .STARTUP
    BYTE       .ERR1         .LFCOND       STRUC
    CATSTR     .ERR2         .LIST         SUBSTR
    .CODE      .ERRB         LOCAL         SUBTTL
    COMM       .ERRDEF       LOW           TBYTE
    COMMENT    .ERRDIF       LT            .TFCOND
    .CONST     .ERRE         MACRO         THIS
    .CREF      .ERRIDN       MASK          TITLE
    .DATA      .ERRNB        MOD           TYPE
    .DATA?     .ERRNDEF      .MODEL        .TYPE
    DB         .ERRNZ        NAME          WIDTH
    DD         EVEN          NE            WORD
    DOSSEG     EXITM         NEAR          .XALL
    DQ         EXTRN         NOT           .XCREF
    DS         FAR           OFFSET        .XLIST
    DT         .FARDATA      OR            XOR

In addition to the names listed above, instruction mnemonics and register names are considered reserved names. Instructions can vary depending on the processor directives given in the source file. For example, ENTER is recognized as a reserved word if you have enabled 286 instructions with the .286 directive. Section 18.3 describes processor directives. Instruction mnemonics for each processor are listed in the on-line Help system. Register names are listed in Section 2.6.2, "Register Operands."

6.3 Using Type Specifiers

Some statements require type specifiers to give the size or type of an operand. There are two kinds of type specifiers: those that specify the size of a variable or other memory operand, and those that specify the distance of a label.

The type specifiers that give the size of a memory operand are listed below with the number of bytes specified by each:

    Specifier     Number of Bytes
    ──────────────────────────────────────────────────────────────────────────
    BYTE          1
    WORD          2
    DWORD         4
    QWORD         8
    TBYTE         10

In some contexts, ABS can also be used as a type specifier that indicates an operand is a constant rather than a memory operand.
The type specifiers that give the distance of a label are listed below:

    Specifier     Description
    ──────────────────────────────────────────────────────────────────────────
    FAR           The label references both the segment and offset of the label.
    NEAR          The label references only the offset of the label.
    PROC          The label has the default type (NEAR or FAR) of the current
                  memory model.

The default type is always NEAR if you use full segment definitions. If you use simplified segment directives (see Section 5.1), the default type is NEAR for small and compact models or FAR for medium, large, and huge models.

Directives that use type specifiers include LABEL, PROC, EXTRN, and COMM. Operators that use type specifiers include PTR and THIS.

6.4 Defining Code Labels

Code labels give symbolic names to the addresses of instructions in the code segment. These labels can be used as the operands to jump, call, and loop instructions to transfer program control to a new instruction.

6.4.1 Near-Code Labels

Near-label definitions create instruction labels that have NEAR type. These instruction labels can be used to access the address of the label from other statements.

Syntax

    name:

The name must be followed by a colon (:). The segment containing the definition must be the one that the assembler currently associates with the CS register. The ASSUME directive is used to associate a segment with a segment register (see Section 5.4, "Associating Segments with Registers"). A near label can appear on a line by itself or on a line with an instruction.

Near-code labels behave differently depending on whether they are used in a procedure with the extended PROC syntax. When the extended PROC feature is used (which requires that .MODEL and a language be specified), near labels are local to the procedure. This functionality is explained in Section 15.3.7, "Variable Scope."
If full segment definitions are used, or if the language argument is not supplied to the .MODEL directive, near labels are known throughout the module in which they occur. The same label name can be used in different modules as long as each label is only referenced by instructions in its own module. If a label must be referenced by instructions in another module, it must be given a unique name and declared with the PUBLIC and EXTRN directives, as described in Chapter 8, "Creating Programs from Multiple Modules."

Examples

            cmp     ax,5            ; Compare with 5
            ja      bigger
            jb      smaller
            .                       ; Instructions if AX = 5
            .
            .
            jmp     done
    bigger: .                       ; Instructions if AX > 5
            .
            .
            jmp     done
    smaller: .                      ; Instructions if AX < 5
            .
            .
    done:

6.4.2 Anonymous Labels

The assembler provides a way to generate automatic labels for jump instructions. To define a label, use two at signs (@@) followed by a colon (:). To jump to the nearest preceding anonymous label, use @B (back) in the jump instruction's operand field; to jump to the nearest following anonymous label, use @F (forward) in the operand field.

You can use two at signs (@@) to define any number of anonymous labels in your program. The items @B and @F always refer to the nearest occurrences of @@, so there is never any conflict between different anonymous labels.

Anonymous labels are best used for conditionally executing a few lines of code. The advantage is that you do not need to continually think up new label names. The disadvantage is that they do not provide a meaningful name. You should mark major divisions of a program with actual named labels.
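The @B and @F rules amount to a nearest-neighbor search over the positions of the @@ labels. The following Python sketch models that search on a list of source lines. It illustrates the rule only and does not show how QuickAssembler implements it. resolve_anonymous is a hypothetical helper. It assumes that each @@: label sits on a line of its own and that @B or @F is the last operand on its line.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List


def resolve_anonymous(lines: List[str]) -> Dict[int, int]:
    """Map each line whose operand is @B or @F to the index of its target @@ line.

    Simplifying assumptions for this sketch: each anonymous label is written
    as "@@:" on a line of its own, and @B/@F is the last operand on its line.
    """
    labels = [i for i, line in enumerate(lines) if line.strip() == "@@:"]
    targets: Dict[int, int] = {}
    for i, line in enumerate(lines):
        code = line.split(";", 1)[0].split()      # drop any comment
        operand = code[-1].upper() if code else ""
        if operand == "@B":
            k = bisect_left(labels, i) - 1        # nearest @@ before this line
            if k < 0:
                raise ValueError(f"line {i}: no preceding @@ label")
            targets[i] = labels[k]
        elif operand == "@F":
            k = bisect_right(labels, i)           # nearest @@ after this line
            if k == len(labels):
                raise ValueError(f"line {i}: no following @@ label")
            targets[i] = labels[k]
    return targets


source = [
    "        mov     dx,20",
    "        cmp     cx,-20",
    "        jge     @F",
    "        mov     dx,30",
    "@@:",
]
print(resolve_anonymous(source))   # {2: 4}
```

Because each reference resolves to the nearest label in one direction, any number of @@ labels can coexist without ambiguity.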
The following example shows a typical sequence of code with a jump-to-label instruction:

    ; DX is 20, unless CX is less than -20, then make DX 30
            mov     dx,20
            cmp     cx,-20
            jge     greatequ
            mov     dx,30
    greatequ:

Here are the same lines rewritten to use an anonymous label:

    ; DX is 20, unless CX is less than -20, then make DX 30
            mov     dx,20
            cmp     cx,-20
            jge     @F
            mov     dx,30
    @@:

6.4.3 Procedure Labels

The easiest way to declare a procedure is to use the PROC and ENDP directives. The former declares the beginning of the procedure, and the latter declares the end.

The PROC directive has the following syntax:

    label PROC [[NEAR|FAR]]
    statements
    RET [[constant]]
    label ENDP

The label assigns a symbol to the procedure. The distance can be NEAR or FAR. Any RET instructions within the procedure automatically have the same distance (NEAR or FAR) as the procedure.

The syntax shown here is always available. In addition, there is an extended PROC syntax available if you use .MODEL and specify a language. The extended PROC syntax is explained in Section 15.3.4, "Declaring Parameters with the PROC Directive."

The ENDP directive labels the address where the procedure ends. Every procedure label must have a matching ENDP label to mark the end of the procedure. QuickAssembler generates an error message if it does not find an ENDP directive to match each PROC directive.

When the PROC label definition is encountered, the assembler sets the label's value to the current value of the location counter and sets its type to NEAR or FAR. If the label has FAR type, the assembler also sets its segment value to that of the enclosing segment. If you have specified full segment definitions, the default distance is NEAR. If you are using simplified segment directives, the default distance is the distance associated with the declared memory model: NEAR for small and compact models or FAR for medium, large, and huge models.
The procedure label can be used in a CALL instruction to direct execution control to the first instruction of the procedure. Control can be transferred to a NEAR procedure label from any address in the same segment as the label. Control can be transferred to a FAR procedure label from an address in any segment. Procedure labels must be declared with the PUBLIC and EXTRN directives if they are located in one module but called from another module, as described in Chapter 8, "Creating Programs from Multiple Modules."

Example

              call    task            ; Call procedure
              .
              .
              .
    task      PROC    NEAR            ; Start of procedure
              .
              .
              .
              ret
    task      ENDP                    ; End of procedure

6.4.4 Code Labels Defined with the LABEL Directive

The LABEL directive provides an alternative method of defining code labels.

Syntax

    name LABEL distance

The name is the symbol name assigned to the label. The distance can be a type specifier, such as NEAR, FAR, or PROC. PROC means NEAR or FAR, depending on the default memory model, as described in Section 5.1.3, "Defining Basic Attributes of the Module."

You can use the LABEL directive to define a second entry point into a procedure. FAR code labels can also be the destination of far jumps, or of far calls to code that returns with the RETF instruction (see Section 15.3.2, "Defining Procedures").

Example

    task      PROC    FAR             ; Main entry point
              .
              .
              .
    task1     LABEL   FAR             ; Secondary entry point
              .
              .
              .
              ret
    task      ENDP                    ; End of procedure

6.5 Defining and Initializing Data

The data-definition directives enable you to allocate memory for data. At the same time, you can specify the initial values for the allocated data. Data can be specified as numbers, strings, or expressions that evaluate to constants. The assembler translates these constant values into binary bytes, words, or other units of data. The encoded data is written to the object file at assembly time.

6.5.1 Variables

Variables consist of one or more named data objects of a specified size.

Syntax

    [[name]] directive initializer [[,initializer]]...
The name is the symbol name assigned to the variable. If no name is assigned, the data is allocated, but the starting address of the variable has no symbolic name.

The size of the variable is determined by directive. The directives that can be used to define single-item data objects are listed below:

    Directive   Meaning
    ──────────────────────────────────────────
    DB          Defines byte
    DW          Defines word (2 bytes)
    DD          Defines doubleword (4 bytes)
    DQ          Defines quadword (8 bytes)
    DT          Defines 10-byte variable

The optional initializer can be a constant, an expression that evaluates to a constant, or a question mark (?). The question mark indicates that the value of the variable is undefined. You can define multiple values by using multiple initializers separated by commas, or by using the DUP operator, as explained in Section 6.5.2, "Arrays and Buffers."

Simple data types can allocate memory for integers, strings, addresses, or real numbers.

6.5.1.1 Integer Variables

When defining an integer variable, you can specify an initial value as an integer constant or as a constant expression. QuickAssembler generates an error if you specify an initial value too large for the specified variable.

Integer values for all sizes except 10-byte variables are stored in binary form. They can be interpreted as either signed or unsigned numbers. For instance, the hexadecimal value 0FFCD can be interpreted either as the signed number -51 or as the unsigned number 65,485.

The processor cannot tell the difference between signed and unsigned numbers. Some instructions are designed specifically for signed numbers. It is the programmer's responsibility to decide whether a value is to be interpreted as signed or unsigned, and then to use the appropriate instructions to handle the value correctly.

Figure 6.1 shows various integer storage formats.
[Figure 6.1: Integer storage formats. This figure can be found in Section 6.5.1.1 of the manual.]

The directives for defining integer variables are listed below with the range of integers each can define:

DB (bytes)
    Allocates unsigned numbers from 0 to 255 or signed numbers from -128 to 127.

DW (words)
    Allocates unsigned numbers from 0 to 65,535 or signed numbers from -32,768 to 32,767. The bytes of a word integer are stored in the format shown in Figure 6.1. Note that in assembler listings and in most debuggers the bytes of a word are shown in the opposite order (high byte first), since this is the way most people think of numbers. Word values can be used directly in 8086-family instructions. They can also be loaded, used in calculations, and stored with 8087-family instructions.

DD (doublewords)
    Allocates unsigned numbers from 0 to 4,294,967,295 or signed numbers from -2,147,483,648 to 2,147,483,647. The words of a doubleword integer are stored in the format shown in Figure 6.1. These 32-bit values (called long integers) can be loaded, used in calculations, and stored with 8087-family instructions. Some calculations can be done on these numbers directly with 16-bit 8086-family processors; others involve an indirect method of doing calculations on each word separately (see Chapter 14, "Doing Arithmetic and Bit Calculations").

DQ (quadwords)
    Allocates 64-bit integers. The doublewords of a quadword integer are stored in the format shown in Figure 6.1. These values can be loaded, used in calculations, and stored with 8087-family instructions. You must write your own routines to use them with 16-bit 8086-family processors.

DT
    Allocates 10-byte (80-bit) integers if the D radix specifier is used.
    By default, DT allocates packed binary coded decimal (BCD) numbers, as described in Section 6.5.1.2, "Binary Coded Decimal Variables." If you define binary 10-byte integers, you must write your own routines to use them in calculations.

Example

    integer     DB  16              ; Initialize byte to 16
    expression  DW  4*3             ; Initialize word to 12
    empty       DQ  ?               ; Allocate uninitialized 64-bit integer
                DB  1,2,3,4,5,6     ; Initialize six unnamed bytes
    long        DD  4294967295      ; Initialize doubleword to 4,294,967,295
    tb          DT  2345d           ; Initialize 10-byte binary integer

6.5.1.2 Binary Coded Decimal Variables

Binary coded decimal (BCD) numbers provide a method of doing calculations on large numbers without rounding errors. They are sometimes used in financial applications. There are two kinds: packed and unpacked.

Unpacked BCD numbers are stored one digit to a byte, with the value in the lower four bits. They can be defined with the DB directive. For example, an unpacked BCD number could be defined and initialized as shown below:

    unpackedr   DB  1,5,8,2,5,2,9   ; Initialized to 9,252,851
    unpackedf   DB  9,2,5,2,8,5,1   ; Initialized to 9,252,851

The least-significant digit can come either first or last, depending on how you write the calculation routines that handle the numbers. Calculations with unpacked BCD numbers are discussed in Section 14.5.1.

Packed BCD numbers are stored two digits to a byte, with one digit in the lower four bits and one in the upper four bits. The leftmost bit holds the sign (0 for positive or 1 for negative). Packed BCD variables can be defined with the DT directive as shown below:

    packed      DT  9252851         ; Allocate 9,252,851

The 8087-family processors can do fast calculations with packed BCD numbers, as described in Chapter 17, "Calculating with a Math Coprocessor." The 8086-family processors can also do some calculations with packed BCD numbers, but the process is slower and more complicated. See Section 14.5.2 for details.
6.5.1.3 String Variables

Strings are normally initialized with the DB directive. The initializing value is specified as a string constant. Strings can also be initialized by specifying each value in the string. For example, the following definitions are equivalent:

    version1    DB  97,98,99        ; As ASCII values
    version2    DB  'a','b','c'     ; As characters
    version3    DB  "abc"           ; As a string

One- and two-character strings can also be initialized with any of the other data-definition directives. The last (or only) character in the string is placed in the byte with the lowest address. Either 0 or the first character is placed in the next byte. The unused portion of such variables is filled with zeros.

Examples

    function9   DB  'Hello',13,10,'$'       ; Use with DOS INT 21h
                                            ;   function 9
    asciiz      DB  "\ASM\TEST.ASM",0       ; Use as ASCIIZ string

    message     DB  "Enter file name: "     ; Use with DOS INT 21h
    l_message   EQU $-message               ;   function 40h
    a_message   EQU OFFSET message

    str1        DB  "ab"            ; Stored as 61 62
    str2        DD  "ab"            ; Stored as 62 61 00 00
    str3        DD  "a"             ; Stored as 61 00 00 00

6.5.1.4 Real-Number Variables

Real numbers must be stored in binary format. However, when initializing variables, you can specify decimal or hexadecimal constants and let the assembler automatically encode them into their binary equivalents.

QuickAssembler can use two different binary formats for real numbers: IEEE or Microsoft Binary. You can specify the format by using directives (IEEE is the default).

This section tells you how to initialize real-number variables, describes the two binary formats, and explains real-number encoding.

Initializing and Allocating Real-Number Variables

Real numbers can be defined by initializing them either with real-number constants or with encoded hexadecimal constants. The real-number designator (R) must follow numbers specified in encoded format.
The directives for defining real numbers are listed below with the sizes of the numbers they can allocate:

DD
    Allocates short (32-bit) real numbers in either the IEEE or Microsoft Binary format.

DQ
    Allocates long (64-bit) real numbers in either the IEEE or Microsoft Binary format.

DT
    Allocates temporary or 10-byte (80-bit) real numbers. The format of these numbers is similar to the IEEE format. They are always encoded the same way regardless of the real-number format. Their size is nonstandard and incompatible with most Microsoft high-level languages. Temporary-real format is provided for those who want to initialize real numbers in the format used internally by 8087-family processors.

The 8086-family microprocessors do not have any instructions for handling real numbers. You must write your own routines, use a library that includes real-number calculation routines, or use a coprocessor. The 8087-family coprocessors can load real numbers in the IEEE format; they can also use the values in calculations and store the results back to memory, as explained in Chapter 17, "Calculating with a Math Coprocessor."

Examples

    shrt        DD  98.6                ; QuickAsm automatically encodes
    long        DQ  5.391E-4            ;   in current format
    ten_byte    DT  -7.31E7

    eshrt       DD  87453333r           ; 98.6 encoded in Microsoft
                                        ;   Binary format
    elong       DQ  3F41AA4C6F445B7Ar   ; 5.391E-4 encoded in IEEE format

The real-number designator (R) used to specify encoded numbers is explained in Section 6.1.3, "Real-Number Constants."

Selecting a Real-Number Format

QuickAssembler can encode four-byte and eight-byte real numbers in two different formats: IEEE and Microsoft Binary. Your choice depends on the type of program you are writing. The four primary alternatives are listed below:

  1. If your program requires a coprocessor for calculations, you must use the IEEE format.

  2. Most high-level languages use the IEEE format.
     If you are writing modules that will be called from such a language, your program should use the IEEE format. All versions of the C, FORTRAN, and Pascal compilers sold by Microsoft and IBM use the IEEE format.

  3. If you are writing a module that will be called from early versions of Microsoft or IBM BASIC, your program should use the Microsoft Binary format. Versions that support only the Microsoft Binary format include:

       ■ Microsoft QuickBASIC through Version 2.01
       ■ Microsoft BASIC Compiler through Version 5.3
       ■ IBM BASIC Compiler through Version 2.0
       ■ Microsoft GW-BASIC(R) interpreter (all versions)
       ■ IBM BASICA interpreter (all versions)

     Microsoft QuickBASIC Version 3.0 supported both the Microsoft Binary and IEEE formats as options. Current and future versions of Microsoft QuickBASIC and the Microsoft and IBM BASIC compilers support only the IEEE format.

  4. If you are creating a stand-alone program that does not use a coprocessor, you can choose either format. The IEEE format is better for overall compatibility with high-level languages. The Microsoft Binary format may be necessary for compatibility with existing source code.

──────────────────────────────────────────────────────────────────────────
NOTE
When you interface assembly-language modules with high-level languages, the real-number format matters only if you initialize real-number variables in the assembly module. If your assembly module does not use real numbers, or if all real numbers are initialized in the high-level-language module, the real-number format does not make any difference.
──────────────────────────────────────────────────────────────────────────

By default, QuickAssembler assembles real-number data in the IEEE format. If you wish to use the Microsoft Binary format, you must put the .MSFLOAT directive at the start of your source file before initializing any real-number variables.
Real-Number Encoding

The IEEE format for encoding four- and eight-byte real numbers is illustrated in Figure 6.2. Although this figure places the most-significant bit first for illustration, low bytes actually appear first in memory.

[Figure 6.2: IEEE real-number format. This figure can be found in Section 6.5.1.4 of the manual.]

The parts of the real numbers are described below:

  1. Sign bit (0 for positive or 1 for negative) in the upper bit of the first byte.

  2. Exponent in the next bits in sequence (8 bits, biased by 7FH, for short real numbers or 11 bits, biased by 3FFH, for long real numbers).

  3. All except the first set bit of the mantissa in the remaining bits of the variable. Since the first significant bit is known to be set, it need not actually be stored. The length is 23 bits for short real numbers and 52 bits for long real numbers.

The Microsoft Binary format for encoding real numbers is illustrated in Figure 6.3.

[Figure 6.3: Microsoft Binary real-number format. This figure can be found in Section 6.5.1.4 of the manual.]

The three parts of real numbers are described below:

  1. Biased exponent (8 bits) in the high-address byte. The bias is 81H for both short and long real numbers.

  2. Sign bit (0 for positive or 1 for negative) in the upper bit of the second-highest byte.

  3. All except the first set bit of the mantissa in the remaining 7 bits of the second-highest byte and in the remaining bytes of the variable. Since the first significant bit is known to be set, it need not actually be stored. The length is 23 bits for short real numbers and 55 bits for long real numbers.

QuickAssembler also supports the 10-byte temporary-real format used internally by 8087-family coprocessors. This format is similar to the IEEE format.
The size is nonstandard and is not used by Microsoft compilers or interpreters. Since the coprocessors can load and automatically convert numbers in the more standard 4-byte and 8-byte formats, the 10-byte format is seldom used in assembly-language programming.

The temporary-real format for encoding real numbers is illustrated in Figure 6.4.

[Figure 6.4: Temporary-real format. This figure can be found in Section 6.5.1.4 of the manual.]

The four parts of the real numbers are described below:

  1. Sign bit (0 for positive or 1 for negative) in the upper bit of the first byte.

  2. Exponent in the next bits in sequence (15 bits for 10-byte real numbers).

  3. The integer part of the mantissa in the next bit in sequence (bit 63).

  4. Remaining bits of the mantissa in the remaining bits of the variable. The length is 63 bits.

Notice that the 10-byte temporary-real format stores the integer part of the mantissa. This differs from the 4-byte and 8-byte formats, in which the integer part is implicit.

6.5.2 Arrays and Buffers

Arrays, buffers, and other data structures consisting of multiple data objects of the same size can be defined with the DUP operator. This operator can be used with any of the data-definition directives described in this chapter.

Syntax

    count DUP (initialvalue [[,initialvalue]]...)

The count sets the number of times to define initialvalue. The initial value can be any expression that evaluates to an integer value, a character constant, or another DUP operator. It can also be the undefined symbol (?) if there is no initial value. Multiple initial values must be separated by commas. If multiple values are specified within the parentheses, the sequence of values is allocated count times. For example, the statement

    DB 5 DUP ("Text ")

allocates the string "Text " five times for a total of 25 bytes.

DUP operators can be nested up to 17 levels.
The initial value (or values) must always be placed within parentheses.

Examples

    array       DD  10 DUP (1)                  ; 10 doublewords
                                                ;   initialized to 1
    buffer      DB  256 DUP (?)                 ; 256-byte buffer

    masks       DB  20 DUP (040h,020h,04h,02h)  ; 80-byte buffer
                                                ;   with bit masks
                DB  32 DUP ("I am here ")       ; 320-byte buffer with
                                                ;   signature for debugging
    three_d     DD  5 DUP (5 DUP (5 DUP (0)))   ; 125 doublewords
                                                ;   initialized to 0

Note that QuickAssembler sometimes generates different object code when the DUP operator is used rather than when multiple values are given. For example, the statement

    test1       DB  ?,?,?,?,?       ; Indeterminate

is "indeterminate." It causes QuickAssembler to write five zero-value bytes to the object file. The statement

    test2       DB  5 DUP (?)       ; Undefined

is "undefined." It causes QuickAssembler to increase the offset of the next record in the object file by five bytes. Therefore, an object file created with the first statement will be larger than one created with the second statement.

In most cases, the distinction between indeterminate and undefined definitions is trivial. The linker adjusts the offsets so that the same executable file is generated in either case. However, the difference is significant in segments with the COMMON combine type. If COMMON segments in two modules contain definitions for the same variable, one with an indeterminate value and one with an explicit value, the actual value in the executable file varies depending on link order. If the module with the indeterminate value is linked last, the 0 initialized for it overrides the explicit value. You can prevent this by always using undefined rather than indeterminate values in COMMON segments. For example, use the first of the following statements:

    test3       DB  1 DUP (?)       ; Undefined: doesn't initialize
    test4       DB  ?               ; Indeterminate: initializes 0

If you use the undefined definition, the explicit value is always used in the executable file regardless of link order.
6.5.3 Labeling Variables

The LABEL directive can be used to define a variable of a given size at a specified location. It is useful if you want to refer to the same data as variables of different sizes.

Syntax

    name LABEL type

The name is the symbol assigned to the variable, and type is the variable size. The type can be any one of the following type specifiers: BYTE, WORD, DWORD, QWORD, or TBYTE. It can also be the name of a previously defined structure.

Examples

    warray      LABEL   WORD            ; Access array as 50 words
    darray      LABEL   DWORD           ; Access same array as 25 doublewords
    barray      DB      100 DUP(?)      ; Access same array as 100 bytes

6.5.4 Pointer Variables

The assembler lets you explicitly allocate pointers. A pointer (address) variable is either two or four bytes in size; consequently, any word or doubleword data definition can create a pointer variable. However, declaring pointer variables explicitly gives more accurate debugging information to the environment.

Pointer-variable definitions have the following form:

    symbol [[DW | DD]] type PTR initialvalue

The type can consist of the name of a record, structure, or one of the standard types described in Section 6.3, "Using Type Specifiers."

Example

    string      DB  "from swerve of shore to bend of bay", 0
    pstring     DW  BYTE PTR string     ; Declares a near pointer to string
    fpstring    DD  BYTE PTR string     ; Declares a far pointer to string

In this example, near (two-byte) and far (four-byte) pointers are declared and initialized to the address of a null-terminated string. This is the format used in the C language, and the pointer variables in the example could be used in C functions that process strings.

Using an explicit pointer declaration generates debugging information, causing the variable to be viewed as a pointer during debugging. Consequently, the environment properly interprets the variable when you enter it as a Watch expression. No special syntax is required.
This use of PTR is distinct from the use of PTR to alter the type of a variable during an instruction. The assembler uses the context of the program to determine which way you are using the PTR keyword.

6.6 Setting the Location Counter

As the assembler encounters labels and data declarations, it needs to know what address to assign. This function is fulfilled by the location counter, which indicates the offset address corresponding to the current line of source code. In a code segment, this value is generally equal to the value that IP will have at run time.

The assembler increments the location counter as it processes each statement. However, you can set the location counter directly by using the ORG directive.

Syntax

    ORG expression

Subsequent code and data offsets begin at the new offset specified by expression. The expression must resolve to a constant number. In other words, all symbols used in the expression must be known on the first pass of the assembler.

──────────────────────────────────────────────────────────────────────────
NOTE
The value of the location counter, represented by the dollar sign ($), can be used in the expression, as described in Section 9.3, "Using the Location Counter."
──────────────────────────────────────────────────────────────────────────

Example 1

    ; Labeling absolute addresses

    STUFF       SEGMENT AT 0        ; Segment has constant value 0
                ORG     410h        ; Offset has constant value 410h
    equipment   LABEL   WORD        ; Value at 0000:0410 labeled "equipment"
                ORG     417h        ; Offset has constant value 417h
    keyboard    LABEL   WORD        ; Value at 0000:0417 labeled "keyboard"
    STUFF       ENDS

                .CODE
                .
                .
                .
                ASSUME  ds:STUFF    ; Tell the assembler
                mov     ax,STUFF    ; Tell the processor
                mov     ds,ax

                mov     dx,equipment
                mov     keyboard,ax

Example 1 illustrates one way of assigning symbolic names to absolute addresses.

Example 2

    ; Format for .COM files

    _TEXT       SEGMENT
                ASSUME  cs:_TEXT,ds:_TEXT,ss:_TEXT,es:_TEXT
                ORG     100h        ; Skip 100h bytes of DOS header

    entry:      jmp     begin       ; Jump over data
    variable    DW      ?           ; Put more data here
                .
                .
    begin:      .                   ; First line of code
                .                   ; Put more code here
    _TEXT       ENDS
                END     entry

Example 2 illustrates how the ORG directive is used to set the starting execution point in .COM files.

6.7 Aligning Data

Some operations are more efficient when the variable used in the operation is lined up on a boundary of a particular size. The EVEN and ALIGN directives can be used to pad the object file so that the next variable is aligned on a specified boundary.

Syntax 1

    EVEN

Syntax 2

    ALIGN number

The EVEN directive always aligns on the next even byte. The ALIGN directive aligns on the next byte that is a multiple of number. The number must be a power of 2. For example, use ALIGN 2 or EVEN to align on word boundaries, or use ALIGN 4 to align on doubleword boundaries.

If the value of the location counter is not on the specified boundary when an ALIGN directive is encountered, the location counter is incremented to a value on the boundary. If the location counter is already on the boundary, the directive has no effect.

When the assembler increments the location counter, it also pads the skipped memory locations by inserting certain values. If the segment is a data segment, the assembler always pads these locations with zeros. If the segment is a code segment, the assembler pads skipped memory locations with a two-byte no-op instruction (an instruction that performs no operation) where possible:

    xchg    bx,bx

This instruction, which assembles as 87DB hexadecimal, does nothing, but it executes faster than two NOP instructions. Where there is no room for the two-byte no-op, the assembler inserts the one-byte NOP instruction.

The ALIGN and EVEN directives give no efficiency improvements on processors that have an 8-bit data bus (such as the 8088). These processors always fetch data one byte at a time, regardless of the alignment.
On processors that have a 16-bit data bus (such as the 8086), however, using EVEN can speed certain operations, since the processor can fetch a word in one operation if the data is word aligned, but must do two memory fetches if it is not.

──────────────────────────────────────────────────────────────────────────
NOTE
The ALIGN directive is a new feature of recent versions of Microsoft assemblers, starting with 5.0. In previous versions, data could be word aligned by using the EVEN directive, but other alignments could not be specified.

The EVEN directive should not be used in segments with BYTE align type. Similarly, the number specified with the ALIGN directive should be no greater than the size of the align type of the current segment (thus ensuring that number is a divisor of the align type of the segment).
──────────────────────────────────────────────────────────────────────────

Example

                DOSSEG
                .MODEL  small,c
                .STACK  100h
                .DATA
                .
                .
                .
                ALIGN   2                   ; For faster data access
    stuff       DW      66,124,573,99,75
                .
                .
                .
                ALIGN   2                   ; For faster data access
    evenstuff   DW      ?,?,?,?,?
                .CODE
    start:      mov     ax,@data            ; Load segment location
                mov     ds,ax               ;   into DS
                mov     es,ax               ;   and ES registers

                mov     cx,5                ; Load count
                mov     si,OFFSET stuff     ; Point to source
                mov     di,OFFSET evenstuff ;   and destination

                ALIGN   2                   ; Align for faster loop access
    mloop:      lodsw                       ; Load a word
                inc     ax                  ; Make it even by incrementing
                and     ax,NOT 1            ;   and turning off the low bit
                stosw                       ; Store
                loop    mloop               ; Again
                .
                .
                .

In this example, the words at stuff and evenstuff are forced to word boundaries. This makes access to the data faster with processors that have a 16-bit data bus. Without this alignment, the initial data might start on an odd boundary, and the processor would have to fetch each word in two halves.

Similarly, the alignment in the code segment speeds up repeated access to the code at the start of the loop. The sample code sacrifices program size in order to achieve moderate improvements on the 8086 and 80286.
There is no speed advantage on the 8088.

────────────────────────────────────────────────────────────────────────────

Chapter 7: Using Structures and Records

QuickAssembler can define and use two kinds of multifield variables: structures and records.

"Structures" are templates for data objects made up of smaller data objects. A structure can be used to define structure variables, which are made up of smaller variables called fields. Fields within a structure can be different sizes, and each can be accessed individually.

"Records" are templates for data objects whose bits can be described as groups of bits called fields. A record can be used to define record variables. Each bit field in a record variable can be used separately in constant operands or expressions. The processor cannot access bits individually at run time, but bit fields can be used with logical bit instructions to change bits indirectly.

This chapter describes structures and records and tells how to use them.

7.1 Structures

A structure variable is a collection of data objects that can be accessed symbolically as a single data object. Objects within the structure can have different sizes and can be accessed symbolically.

There are two steps in using structure variables:

  1. Declare a structure type. A structure type is a template for data. It declares the sizes and, optionally, the initial values for objects in the structure. By itself the structure type does not define any data. The structure type is used by QuickAssembler during assembly but is not saved as part of the object file.

  2. Define one or more variables having the structure type. For each variable defined, memory is allocated in the object file in the format declared by the structure type.

The structure variable can then be used as an operand in assembler statements. The structure variable can be accessed as a whole by using the variable name, or individual fields can be accessed by using the variable and field names.
7.1.1 Declaring Structure Types

The STRUC and ENDS directives mark the beginning and end of a type declaration for a structure.

Syntax

    name STRUC
    fielddeclarations
    name ENDS

The name declares the name of the structure type. It must be unique. The fielddeclarations declare the fields of the structure. Any number of field declarations may be given. They must follow the form of data definitions described in Section 6.5, "Defining and Initializing Data." Default initial values may be declared individually or with the DUP operator.

The names given to fields must be unique within the source file where they are declared. When variables are defined, the field names will represent the offset from the beginning of the structure to the corresponding field.

When declaring strings in a structure type, make sure the initial values are long enough to accommodate the largest possible string. Strings smaller than the field size can be placed in the structure variable, but larger strings will be truncated.

A structure declaration can contain both field declarations and comments. Conditional-assembly statements are allowed in structure declarations; no other kinds of statements are allowed. Since the STRUC directive is not allowed inside structure declarations, structures cannot be nested.

──────────────────────────────────────────────────────────────────────────
NOTE
The ENDS directive that marks the end of a structure has the same mnemonic as the ENDS directive that marks the end of a segment. The assembler recognizes the meaning of the directive from context. Make sure each SEGMENT directive and each STRUC directive has its own ENDS directive.
──────────────────────────────────────────────────────────────────────────

Example

    student     STRUC                       ; Structure for student records
    id          DW      ?                   ; Field for identification #
    sname       DB      "Last, First Middle    "
    scores      DB      10 DUP (100)        ; Field for 10 scores
    student     ENDS

Within the sample structure student, the fields id, sname, and scores have the offset values 0, 2, and 24, respectively.

7.1.2 Defining Structure Variables

A structure variable is a variable with one or more fields of different sizes. The sizes and initial values of the fields are determined by the structure type with which the variable is defined.

Syntax

    [[name]] structurename <[[initialvalue [[,initialvalue]]...]]>

The name is the name assigned to the variable. If no name is given, the assembler allocates space for the variable but does not give it a symbolic name. The structurename is the name of a structure type previously declared by using the STRUC and ENDS directives.

An initialvalue can be given for each field in the structure. Its type must be compatible with the type of the corresponding field. The angle brackets (< >) are required even if no initial value is given. If initial values are given for more than one field, the values must be separated by commas.

If the DUP operator (see Section 6.5.2, "Arrays and Buffers") is used to initialize multiple structure variables, only the angle brackets and initial values, if given, need to be enclosed in parentheses. For example, you can define an array of structure variables as shown below:

    war         date    365 DUP (<,,1940>)

You need not initialize all fields in a structure. If an initial value is left blank, the assembler automatically uses the default initial value of the field, which was originally determined by the structure type. If there is no default value, the field is undefined.
Examples

The following examples use the student type declared in the example in Section 7.1.1, "Declaring Structure Types":

    s1      student <>                          ; Uses default values of type
    s2      student <1467,"White, Robert D.",>  ; Override default values of first two
                                                ;   fields; use default value of third
    sarray  student 100 DUP (<>)                ; Declare 100 student variables
                                                ;   with default initial values

Note that you cannot initialize any structure field that has multiple values if this field was given a default initial value when the structure was declared. For example, assume the following structure declaration:

    stuff   STRUC
    buffer  DB   100 DUP (?)    ; Can't override
    crlf    DB   13,10          ; Can't override
    query   DB   'Filename: '   ; String <= can override
    endmark DB   36             ; Can override
    stuff   ENDS

The buffer and crlf fields cannot be overridden by initial values in a structure variable definition because they have multiple values. The query field can be overridden as long as the overriding string is no longer than query (10 bytes). A longer string would generate an error. The endmark field can be overridden by any byte value.

7.1.3 Using Structure Operands

Like other variables, structure variables can be accessed by name. Fields within structure variables can also be accessed by using the syntax shown below:

Syntax

    variable.field

The variable must be the name of a structure variable (or an operand that resolves to the address of a structure). The field must be the name of a field within that structure. The variable is separated from the field by a period. The period is discussed as a structure-field-name operator in Section 9.2.1.2.

The address of a structure operand is the sum of the offsets of variable and field. The address is relative to the segment or group in which the variable is declared.

Examples

    date      STRUC                 ; Declare structure
    month     DB   ?
    day       DB   ?
    year      DW   ?
    date      ENDS

              .DATA
    yesterday date <9,30,1987>      ; Declare structure
    today     date <10,1,1987>      ;   variables
    tomorrow  date <10,2,1987>

              .CODE
              .
              .
              .
              mov  al,yesterday.day     ; Use structure variables
              mov  ah,today.month       ;   as operands
              mov  tomorrow.year,dx
              mov  bx,OFFSET yesterday  ; Load structure address
              mov  al,[bx].month        ; Use as indirect operand
              .
              .
              .

7.2 Records

A record variable is a byte or word variable in which specific bit fields can be accessed symbolically. Bit fields within the record can have different sizes.

There are two steps in declaring record variables:

 1. Declare a record type. A record type is a template for data. It declares the sizes and, optionally, the initial values for bit fields in the record. By itself, the record type does not define any data. The record type is used by QuickAssembler during assembly but is not saved as part of the object file.

 2. Define one or more variables having the record type. For each variable defined, memory is allocated to the object file in the format declared by the type.

The record variable can then be used as an operand in assembler statements. The record variable can be accessed as a whole by using the variable name, or individual fields can be manipulated by using the field names together with the MASK and WIDTH operators (see Sections 7.2.4 and 7.2.5). A record type can also be used as a constant (immediate data).

7.2.1 Declaring Record Types

The RECORD directive declares a record type for an 8-bit or 16-bit record that contains one or more bit fields.

Syntax

    recordname RECORD field [[,field]]...

The recordname is the name of the record type to be used when creating the record. The field declares the name, width, and initial value for the field. The syntax for each field is shown below:

Syntax

    fieldname:width[[=expression]]

The fieldname is the name of a field in the record, width is the number of bits in the field, and expression is the initial (or default) value for the field.

Any number of field combinations can be given for a record, as long as each is separated from its predecessor by a comma. The sum of the widths for all fields must not exceed 16 bits.
The width must be a constant. If the total width of all declared fields is larger than eight bits, the assembler uses two bytes. Otherwise, only one byte is used.

If expression is given, it declares the initial value for the field. An error message is generated if an initial value is too large for the width of its field. If the field is at least seven bits wide, you can use an ASCII character for expression. The expression must not contain a forward reference to any symbol.

In all cases, the first field you declare goes into the most significant bits of the record. Successively declared fields are placed in the succeeding bits to the right. If the fields you declare do not total exactly 8 bits or exactly 16 bits, the entire record is shifted right so that the last bit of the last field is the lowest bit of the record. Unused bits in the high end of the record are initialized to 0.

Example 1

    color RECORD blink:1,back:3,intense:1,fore:3

Example 1 creates a byte record type color having four fields: blink, back, intense, and fore. The contents of the record type are shown below:

    [Figure (Section 7.2.1 of the manual): bit layout of color:
     blink = bit 7, back = bits 6-4, intense = bit 3, fore = bits 2-0]

Since no initial values are given, all bits are set to 0. Note that this is only a template maintained by the assembler. No data is created.

Example 2

    cw RECORD r1:3=0,ic:1=0,rc:2=0,pc:2=3,r2:2=1,masks:6=63

Example 2 creates a record type cw having six fields. Each record declared by using this type occupies 16 bits of memory. The bit diagram below shows the contents of the record type:

    [Figure (Section 7.2.1 of the manual): bit layout of cw:
     r1 = bits 15-13 (0), ic = bit 12 (0), rc = bits 11-10 (0),
     pc = bits 9-8 (3), r2 = bits 7-6 (1), masks = bits 5-0 (63);
     default value 037Fh]

Default values are given for each field. They can be used when data is declared for the record.
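The placement rules above (first field in the most significant bits, record right-justified, defaults packed into place) amount to a short computation. The C sketch below is a hypothetical model written for this explanation, not part of QuickAssembler; the names field_decl, field_layout, and layout_record are invented. Applied to Example 2 it yields the default value 037Fh for cw, and for color it gives the bit positions 7, 4, 3, and 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one "fieldname:width=expression" entry. */
typedef struct {
    const char *name;
    unsigned width;     /* bits in the field */
    unsigned initial;   /* default value (0 if none was given) */
} field_decl;

/* Where a field ends up inside the record. */
typedef struct {
    unsigned shift;     /* position of the field's low-order bit */
    uint16_t mask;      /* 1 in every bit the field occupies */
} field_layout;

/* Lays out the fields in declaration order, first field highest, with the
   last field ending at bit 0. Fills out[] and returns the default value. */
static uint16_t layout_record(const field_decl *fields, int count,
                              field_layout *out) {
    unsigned total = 0;
    for (int i = 0; i < count; i++)
        total += fields[i].width;
    assert(total <= 16);                      /* record is at most a word */

    uint16_t value = 0;
    unsigned pos = total;                     /* just above the first field */
    for (int i = 0; i < count; i++) {
        pos -= fields[i].width;               /* next field sits to the right */
        assert(fields[i].initial < (1u << fields[i].width)); /* must fit */
        out[i].shift = pos;
        out[i].mask = (uint16_t)(((1u << fields[i].width) - 1u) << pos);
        value |= (uint16_t)(fields[i].initial << pos);
    }
    return value;
}
```

For cw, the rc field comes out with shift 10 and mask 0C00h; those are the values the field name rc and the expression MASK rc stand for (Sections 7.2.4 and 7.2.5).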
7.2.2 Defining Record Variables

A record variable is an 8-bit or 16-bit variable whose bits are divided into one or more fields.

Syntax

    [[name]] recordname <[[initialvalue [[,initialvalue]]...]]>

The name is the symbolic name of the variable. If no name is given, the assembler allocates space for the variable but does not give it a symbolic name. The recordname is the name of a record type that was previously declared by using the RECORD directive.

An initialvalue for each field in the record can be given as an integer, a character constant, or an expression that resolves to a value compatible with the size of the field. Angle brackets (< >) are required even if no initial value is given. If initial values for more than one field are given, the values must be separated by commas.

If the DUP operator (see Section 6.5.2, "Arrays and Buffers") is used to initialize multiple record variables, only the angle brackets and initial values, if given, need to be enclosed in parentheses. For example, you can define an array of record variables as shown below:

    xmas color 50 DUP (<1,2,0,4>)

You do not have to initialize all fields in a record. If an initial value is left blank, the assembler automatically uses the default initial value of the field, as declared by the record type. If there is no default value, each bit in the field is cleared.

Sections 7.2.3, "Using Record Operands and Record Variables," and 7.2.4, "Record Operators," illustrate ways to use record data after it has been declared.

Example 1

    color   RECORD blink:1,back:3,intense:1,fore:3  ; Record declaration
    warning color  <1,0,1,4>                        ; Record definition

The definition above creates a variable named warning whose type is given by the record type color. The initial values of the fields in the variable are set to the values given in the record definition. The initial values would override the default record values, had any been given in the declaration.
The contents of the record variable are shown below:

    [Figure (Section 7.2.2 of the manual): warning = 1000 1100b (8Ch):
     blink = 1, back = 000, intense = 1, fore = 100]

Example 2

    color  RECORD blink:1,back:3,intense:1,fore:3  ; Record declaration
    colors color  16 DUP (<>)                       ; Record definition

Example 2 creates an array named colors containing 16 variables of type color. Since no initial values are given in either the declaration or the definition, every bit of the variables is cleared to 0.

Example 3

    cw    RECORD r1:3=0,ic:1=0,rc:2=0,pc:2=3,r2:2=1,masks:6=63
    newcw cw     <,,2,,,>

Example 3 creates a variable named newcw with type cw. The default values set in the type declaration are used for all fields except the rc field. This field is set to 2. The contents of the variable are shown below:

    [Figure (Section 7.2.2 of the manual): newcw = 0000 1011 0111 1111b (0B7Fh):
     rc = 10b, all other fields keep their default values]

7.2.3 Using Record Operands and Record Variables

A record operand refers to the value of a record type. It should not be confused with a record variable. A record operand is a constant; a record variable is a value stored in memory. A record operand can be used with the following syntax:

Syntax

    recordname <[[value[[,value]]...]]>

The recordname must be the name of a record type declared in the source file. The optional value is the value of a field in the record. If more than one value is given, each value must be separated by a comma. Values can include expressions or symbols that evaluate to constants. The enclosing angle brackets (< >) are required, even if no value is given. If no value for a field is given, the default value for that field is used.

Example

           .DATA
    color  RECORD blink:1,back:3,intense:1,fore:3  ; Record declaration
    window color  <0,6,1,6>                        ; Record definition
           .CODE
           .
           .
           .
           mov  ah,color <0,3,0,2>  ; Load record operand
                                    ;   (constant value 32h)
           mov  bh,window           ; Load record variable
                                    ;   (memory value 6Eh)

In this example, the record operand color <0,3,0,2> and the record variable window are loaded into registers. The contents of the values are shown below:

    [Figure (Section 7.2.3 of the manual): color <0,3,0,2> = 0011 0010b (32h);
     window = 0110 1110b (6Eh)]

7.2.4 Record Operators

The WIDTH and MASK operators are used exclusively with records to return constant values representing different aspects of previously declared records.

7.2.4.1 The MASK Operator

The MASK operator returns a bit mask for the bit positions in a record occupied by the given record field. A bit in the mask contains a 1 if that bit corresponds to a field bit. All other bits contain 0.

Syntax

    MASK {recordfieldname | record}

The recordfieldname may be the name of any field in a previously defined record. The record may be the name of any previously defined record. The NOT operator is sometimes used with the MASK operator to reverse the bits of a mask.

Example

            .DATA
    color   RECORD blink:1,back:3,intense:1,fore:3
    message color  <0,5,1,1>
            .CODE
            .
            .
            .
            mov  ah,message        ; Load initial         0101 1001
            and  ah,NOT MASK back  ; Turn off "back"  AND 1000 1111
                                   ;                      ---------
                                   ;                      0000 1001
            or   ah,MASK blink     ; Turn on "blink"   OR 1000 0000
                                   ;                      ---------
                                   ;                      1000 1001
            xor  ah,MASK intense   ; Toggle "intense" XOR 0000 1000
                                   ;                      ---------
                                   ;                      1000 0001

7.2.4.2 The WIDTH Operator

The WIDTH operator returns the width (in bits) of a record or record field.

Syntax

    WIDTH {recordfieldname | record}

The recordfieldname may be the name of any field defined in any record. The record may be the name of any defined record.

Note that the width of a field is the number of bits assigned for that field; the value of the field name is the starting position (from the right) of the field.
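The record-operand values quoted in Section 7.2.3 and the bit patterns in the MASK example can be reproduced with ordinary shifts and masks. In the C sketch below, the constants BLINK, BACK, INTENSE, and FORE play the role of the record-field names (bit positions), the W_ constants play the role of WIDTH, and the MASK macro builds the mask that the MASK operator returns for the color record. All of these names are illustrative assumptions, not QuickAssembler features. The function bump_back replays, step by step, the field-update sequence shown in Section 7.2.5.

```c
#include <assert.h>
#include <stdint.h>

/* Model of: color RECORD blink:1,back:3,intense:1,fore:3 */
enum { BLINK = 7, BACK = 4, INTENSE = 3, FORE = 0 };          /* field names */
enum { W_BLINK = 1, W_BACK = 3, W_INTENSE = 1, W_FORE = 3 };  /* WIDTH field */

/* MASK field: WIDTH ones shifted up to the field's position. */
#define MASK(pos, width) ((uint8_t)(((1u << (width)) - 1u) << (pos)))

/* Value of a record operand such as color <0,3,0,2>. */
static uint8_t color(unsigned blink, unsigned back, unsigned intense,
                     unsigned fore) {
    assert(blink < 2 && back < 8 && intense < 2 && fore < 8); /* values fit */
    return (uint8_t)((blink << BLINK) | (back << BACK) |
                     (intense << INTENSE) | (fore << FORE));
}

/* The and/or/xor sequence from the MASK example. */
static uint8_t mask_demo(uint8_t ah) {
    ah &= (uint8_t)~MASK(BACK, W_BACK);   /* and ah,NOT MASK back */
    ah |= MASK(BLINK, W_BLINK);           /* or  ah,MASK blink    */
    ah ^= MASK(INTENSE, W_INTENSE);       /* xor ah,MASK intense  */
    return ah;
}

/* Add 1 to the "back" field, wrapping within its 3 bits, and leave the
   other fields unchanged (the sequence from Section 7.2.5). */
static uint8_t bump_back(uint8_t al) {
    uint8_t ah = al;                      /* mov ah,al            */
    al &= (uint8_t)~MASK(BACK, W_BACK);   /* and al,NOT MASK back */
    ah >>= BACK;                          /* shr ah,cl            */
    ah++;                                 /* inc ah               */
    ah <<= BACK;                          /* shl ah,cl (8 bits)   */
    ah &= MASK(BACK, W_BACK);             /* and ah,MASK back     */
    return (uint8_t)(ah | al);            /* or  ah,al            */
}
```

Because ah is an 8-bit variable, bits shifted past bit 7 are lost, just as they are in the AH register, so a back value of 7 wraps to 0.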
Examples

             .DATA
    color    RECORD blink:1,back:3,intense:1,fore:3
    wblink   EQU  WIDTH blink     ; "wblink"   = 1   "blink"   = 7
    wback    EQU  WIDTH back      ; "wback"    = 3   "back"    = 4
    wintense EQU  WIDTH intense   ; "wintense" = 1   "intense" = 3
    wfore    EQU  WIDTH fore      ; "wfore"    = 3   "fore"    = 0
    wcolor   EQU  WIDTH color     ; "wcolor"   = 8
    prompt   color <1,5,1,1>
             .CODE
             .
             .
             .
             IF   (WIDTH color) GT 8  ; If color is 16 bit, load
             mov  ax,prompt           ;   into 16-bit register
             ELSE                     ; else
             mov  al,prompt           ;   load into low 8-bit register
             xor  ah,ah               ;   and clear high 8-bit register
             ENDIF

7.2.5 Using Record-Field Operands

Record-field operands represent the location of a field in its corresponding record. The operand evaluates to the bit position of the low-order bit in the field and can be used as a constant operand. The field name must be from a previously declared record.

Record-field operands are often used with the WIDTH and MASK operators, as described in Sections 7.2.4.1 and 7.2.4.2.

Example

           .DATA
    color  RECORD blink:1,back:3,intense:1,fore:3  ; Record declaration
    cursor color  <1,5,1,1>                        ; Record definition
           .CODE
           .
           .
           .
    ; Increment "back" of "cursor" (wrapping within the field)
    ; without changing other values

           mov  al,cursor        ; Load value from memory
           mov  ah,al            ; Save a copy for work      1101 1001
           and  al,NOT MASK back ; Mask out old bits     AND 1000 1111
                                 ;   to save old cursor      ---------
                                 ;                           1000 1001
           mov  cl,back          ; Load bit position
           shr  ah,cl            ; Shift to right            0000 1101
           inc  ah               ; Increment                 0000 1110
           shl  ah,cl            ; Shift left again          1110 0000
           and  ah,MASK back     ; Mask off extra bits   AND 0111 0000
                                 ;   to get new cursor       ---------
                                 ;                           0110 0000
           or   ah,al            ; Combine old and new    OR 1000 1001
                                 ;                           ---------
           mov  cursor,ah        ; Write back to memory      1110 1001

This example illustrates several ways in which record fields can be used as operands and in expressions.

────────────────────────────────────────────────────────────────────────────
Chapter 8: Creating Programs from Multiple Modules

Most medium and large assembly-language programs are created from several source files or modules.
When several modules are used, the scope of symbols becomes important. This chapter discusses the scope of symbols and explains how to declare global symbols that can be accessed from any module. It also tells you how to specify a library file that the linker should use.

Symbols, such as labels and variable names, can be either local or global in scope. By default, all symbols are local; they are specific to the source file in which they are defined. Symbols must be declared global if they must be accessed from modules other than the one in which they are defined.

To make a symbol global, declare it public in the source module in which it is defined, and declare it external in every other module that accesses it. If the symbol represents uninitialized data, it can instead be declared communal, meaning that the symbol is both public and external. The PUBLIC, EXTRN, and COMM directives are used to declare symbols public, external, and communal, respectively.

──────────────────────────────────────────────────────────────────────────
NOTE
The term "local" often has a different meaning in assembly language than in many high-level languages. Local symbols in compiled languages are symbols that are known only within a procedure (called a function, routine, subprogram, or subroutine, depending on the language). You can use QuickAssembler to generate these kinds of variables, as explained in Section 15.3.6, "Creating Locals Automatically."

By default, the assembler converts all lowercase letters in names declared with the PUBLIC, EXTRN, and COMM directives to uppercase letters before copying the name to the object file. To preserve lowercase names in public symbols, choose Preserve Case or Preserve Extrn from the Assembler Flags dialog box, or assemble with /Cx or /Cl on the QCL command line. This should be done when preparing assembler modules to be linked with modules from C and other case-sensitive languages.
──────────────────────────────────────────────────────────────────────────

8.1 Declaring Symbols Public

The PUBLIC directive is used to declare symbols public so that they can be accessed from other modules. If a symbol is not declared public, the symbol name is not written to the object file. The symbol has the value of its offset address during assembly, but the name and address are not available to the linker.

If the symbol is declared public, its name is associated with its offset address in the object file. During linking, symbols in different modules that have the same name are resolved to a single address.

Public symbol names are also used by some symbolic debuggers (such as SYMDEB) to associate addresses with symbols.

Syntax

    PUBLIC declaration [[,declaration]]...

Each declaration has the following syntax:

    [[lang]] name

The optional lang field contains a language specifier that overrides the language specified by the .MODEL directive. With this statement, the language specifier determines naming conventions for the variable that it precedes. The specifier can be C, FORTRAN, Pascal, or BASIC. The C naming convention prefixes each variable with an underscore (_); the other conventions do not.

If you specify lang with the .MODEL directive, all procedures are automatically public. However, you must use the PUBLIC directive for any data that you want to access from other modules.

Note that using the C type specifier does not preserve case. You must choose one of the assembler flags or options that preserve case.

The name must be the name of a variable, label, or numeric equate defined within the current source file. PUBLIC declarations can be placed anywhere in the source file. Equate names, if given, can only represent one-byte or two-byte integer or string values. Text macros (or text equates) cannot be declared public.

Note that although absolute symbols can be declared public, aliases for public symbols may cause errors. For example, the following statements are illegal:

            PUBLIC lines     ; Declare absolute symbol public
    lines   EQU    rows      ; Declare alias for lines
    rows    EQU    25        ; Illegal - Assign value to alias

Example

            .MODEL small,c
            PUBLIC true,status,first,clear
    true    EQU    -1        ; Public constant
            .DATA
    status  DB     1         ; Public variable
            .CODE
            .
            .
            .
    first   LABEL  FAR       ; Public label
    clear   PROC             ; Procedure names are automatically public
            .                ;   with .MODEL model,lang
            .
            .
    clear   ENDP

8.2 Declaring Symbols External

If a symbol that is not defined in a module must be accessed by instructions in that module, it must be declared with the EXTRN directive.

This directive tells the assembler not to generate an error, even though the symbol is not in the current module. The assembler assumes that the symbol occurs in another module. However, the symbol must actually exist and must be declared public in some module. Otherwise, the linker generates an error.

Syntax

    EXTRN declaration [[,declaration]]...

Each declaration has the following syntax:

    [[lang]] name:type

The optional lang field contains a language specifier that overrides the language specified by the .MODEL directive. With this statement, the language specifier determines naming conventions for the variable that it precedes. The specifier can be C, FORTRAN, Pascal, or BASIC. The C naming convention prefixes each variable with an underscore (_); the other conventions do not.

Note that using the C type specifier does not preserve case. You must choose one of the assembler flags or options that preserve case.

The EXTRN directive defines an external variable, label, or symbol of the specified name and type. The type must match the type given to the item in its actual definition in some other module.
It can be any one of the following:

    Description           Types
    ──────────────────────────────────────────────────────────────────────────
    Distance specifier    NEAR, FAR, or PROC
    Size specifier        BYTE, WORD, DWORD, QWORD, or TBYTE
    Absolute              ABS

The ABS type is for symbols that represent constant numbers, such as equates declared with the EQU and = directives (see Section 11.1, "Using Equates").

The PROC type represents the default type for a procedure. For programs that use simplified segment directives, the type of an external symbol declared with PROC will be NEAR for small or compact model, or FAR for medium, large, or huge model. Section 5.1.3, "Defining Basic Attributes of the Model," tells you how to declare the memory model using the .MODEL directive. If full segment definitions are used, the default type represented by PROC is always NEAR.

Although the actual address of an external symbol is not determined until link time, the assembler assumes a default segment for the item, based on where the EXTRN directive is placed in the source code. Placement of EXTRN directives should follow these rules:

  ■ NEAR code labels (such as procedures) must be declared in the code segment from which they are accessed.

  ■ FAR code labels can be declared anywhere in the source code. It may be convenient to declare them in the code segment from which they are accessed if the label may be FAR in one context or NEAR in another.

  ■ Data must be declared in the segment in which it occurs. This may require that you define a dummy data segment for the external declaration.

  ■ Absolute symbols can be declared anywhere in the source code.
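The naming and distance rules for PUBLIC and EXTRN can be summarized as two small functions. This C sketch is a hypothetical model of the rules as stated in Sections 8.1 and 8.2, not QuickAssembler's implementation; all type and function names are invented. object_name applies the underscore prefix of the C convention and the default conversion to uppercase, and proc_distance gives the distance that an EXTRN of type PROC resolves to. It assumes lang is the effective language (from the declaration's lang field, otherwise from .MODEL) and treats the preserve-case choices as a single flag.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { LANG_NONE, LANG_C, LANG_FORTRAN, LANG_PASCAL, LANG_BASIC } lang_t;
typedef enum { MODEL_SMALL, MODEL_MEDIUM, MODEL_COMPACT,
               MODEL_LARGE, MODEL_HUGE } model_t;
typedef enum { DIST_NEAR, DIST_FAR } distance_t;

/* Hypothetical model of the name written to the object file:
   - the C convention prefixes an underscore; the others do not;
   - letters are uppercased unless a preserve-case option is chosen. */
static void object_name(const char *name, lang_t lang, bool preserve_case,
                        char *out, size_t outsize) {
    assert(outsize > 0);
    size_t n = 0;
    if (lang == LANG_C && n + 1 < outsize)
        out[n++] = '_';
    for (const char *p = name; *p != '\0' && n + 1 < outsize; p++)
        out[n++] = preserve_case ? *p : (char)toupper((unsigned char)*p);
    out[n] = '\0';                      /* always NUL-terminated */
}

/* Distance of "EXTRN name:PROC". With full segment definitions
   (simplified == false), PROC always means NEAR. */
static distance_t proc_distance(model_t model, bool simplified) {
    if (!simplified)
        return DIST_NEAR;
    return (model == MODEL_SMALL || model == MODEL_COMPACT) ? DIST_NEAR
                                                            : DIST_FAR;
}
```

For example, under .MODEL small,c with no case option, a symbol declared as display is written to the object file as _DISPLAY; a module that declares the same name without the C convention refers to DISPLAY instead, and the linker treats the two as different symbols.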
Example 1

            EXTRN  max:ABS,act:FAR  ; Constant or FAR label anywhere
            DOSSEG
            .MODEL small,c
            .STACK 100h
            .DATA
            EXTRN  nvar:BYTE        ; NEAR variable in near data
            .FARDATA
            EXTRN  fvar:WORD        ; FAR variable in far data
            .CODE
            .STARTUP
            EXTRN  task:PROC        ; PROC or NEAR in near code
            ASSUME es:SEG fvar      ; Tell assembler
            mov    ax,SEG fvar      ; Tell processor that ES
            mov    es,ax            ;   has far data segment
            .
            .
            .
            mov    ah,nvar          ; Load external NEAR variable
            mov    bx,fvar          ; Load external FAR variable
            mov    cx,max           ; Load external constant
            call   task             ; Call procedure (NEAR or FAR)
            jmp    act              ; Jump to FAR label
            END

The example above shows how each type of external symbol could be declared and used in a small-model program that uses simplified segment directives. Notice the use of the PROC type specifier to make the external procedure declaration independent of the memory model. The jump and its external declaration are written so that they will be FAR regardless of the memory model. Using these techniques, you can change the memory model without breaking code.

Example 2

             EXTRN  max:ABS,act:FAR  ; Constant or FAR label anywhere
    STACK    SEGMENT PARA STACK 'STACK'
             DB     100h DUP (?)
    STACK    ENDS
    _DATA    SEGMENT WORD PUBLIC 'DATA'
             EXTRN  nvar:BYTE        ; NEAR variable in near data
    _DATA    ENDS
    FAR_DATA SEGMENT PARA 'FAR_DATA'
             EXTRN  fvar:WORD        ; FAR variable in far data
    FAR_DATA ENDS
    DGROUP   GROUP  _DATA,STACK
    _TEXT    SEGMENT BYTE PUBLIC 'CODE'
             EXTRN  task:NEAR        ; NEAR procedure in near code
             ASSUME cs:_TEXT,ds:DGROUP,ss:STACK
    start:   mov    ax,DGROUP        ; Load segment
             mov    ds,ax            ;   into DS
             ASSUME es:SEG fvar      ; Tell assembler
             mov    ax,SEG fvar      ; Tell processor that ES
             mov    es,ax            ;   has far data segment
             .
             .
             .
             mov    ah,nvar          ; Load external NEAR variable
             mov    bx,fvar          ; Load external FAR variable
             mov    cx,max           ; Load external constant
             call   task             ; Call NEAR procedure
             jmp    act              ; Jump to FAR label
    _TEXT    ENDS
             END    start

Example 2 shows a fragment similar to the one in Example 1, but with full segment definitions. Notice that the types of code labels must be declared explicitly.
If you wanted to change the memory model, you would have to change each external declaration and each call or jump by hand.

8.3 Using Multiple Modules

The following source files illustrate a program that uses public and external declarations to access instruction labels. The program consists of two modules called hello and display.

The hello module is the program's initializing module. Execution starts at the instruction labeled start in the hello module. After initializing the data segment, the program calls the procedure display in the display module, where a DOS call is used to display a message on the screen. Execution then returns to the address after the call in the hello module.

The hello module is shown below:

             TITLE  hello
             DOSSEG
             .MODEL small,c
             .STACK 256
             .DATA
             PUBLIC message,lmessage
    message  DB     "Hello, world.",13,10
    lmessage EQU    $ - message
             .CODE
             EXTRN  display:PROC      ; Declare in near code segment
             .STARTUP
             call   display           ; Call other module
             mov    ax,04C00h         ; Terminate with exit code 0
             int    21h               ; Call DOS
             END

The display module is shown below. It specifies the same language (C) in its .MODEL directive as the hello module, so both modules apply the same naming convention to the shared names:

             TITLE  display
             .MODEL small,c
             EXTRN  lmessage:ABS      ; Declare anywhere
             .DATA
             EXTRN  message:BYTE      ; Declare in near data segment
             .CODE
    display  PROC
             mov    bx,1              ; File handle for standard output
             mov    cx,lmessage       ; Message length
             mov    dx,OFFSET message ; Message address
             mov    ah,40h            ; Write function
             int    21h               ; Call DOS
             ret
    display  ENDP
             END

The sample program is a variation of the HELLO.ASM program used in the examples in Chapter 4, "Writing Stand-Alone Assembly Programs," except that it uses an external procedure to display to the screen.

Notice that all symbols defined in one module but used in another are declared PUBLIC in the defining module and declared EXTRN in the using module. For instance, message and lmessage are declared PUBLIC in HELLO.ASM and declared EXTRN in DISPLAY.ASM. The procedure display is declared EXTRN in HELLO.ASM.
The symbol display is automatically public in the simplified segment version, but you would have to declare it PUBLIC explicitly if you used full segments.

To create an executable file for these modules, you can add both files to the environment's Program List dialog box. You can also assemble the modules with the following command line:

    QCL hello.asm display.asm

The output is placed in the executable file HELLO.EXE.

For each source module, QuickAssembler writes a module name to the object file. The module name is used by some debuggers and by the linker when it displays error messages. With QuickAssembler, the module name is always the base name of the source module file. For compatibility, QuickAssembler recognizes the NAME directive. However, NAME has no effect; arguments to the directive are ignored.

8.4 Declaring Symbols Communal

Communal variables are uninitialized variables that are both public and external. They are often declared in include files.

If a variable must be used by several assembly routines, you can declare the variable communal in an include file, and then include the file in each of the assembly routines. Although the variable is declared in each source module, it exists at only one address. Using a communal variable in an include file and including it in several source modules is an alternative to defining the variable and declaring it public in one source module and then declaring it external in other modules.

If a variable is declared communal in one module and public in another, the public declaration takes precedence and the communal declaration has the same effect as an external declaration.

Syntax

    COMM definition[[,definition]]...

Each definition has the following syntax:

    [[NEAR | FAR]] [[lang]] label:size[[:count]]

A communal variable can be NEAR or FAR. If neither is specified, the type will be that of the default memory model.
If you use simplified segment directives, the default type is NEAR for small and medium models, or FAR for compact, large, and huge models. If you use full segment definitions, the default type is NEAR.

The optional lang field can be C, BASIC, FORTRAN, or Pascal. The use of the C keyword turns on the C naming convention: the assembler prefixes the name of the variable with an underscore (_). The use of any of the other language types turns off the C naming convention, even if you specified C with the .MODEL directive. Note that the use of C does not preserve case. You must choose one of the assembler flags or options that preserve case.

The label is the name of the variable. The size can be BYTE, WORD, DWORD, QWORD, TBYTE, or the name of a structure. The count is the number of elements. If no count is given, one element is assumed. Multiple variables can be defined with one COMM statement by separating each definition with a comma.

──────────────────────────────────────────────────────────────────────────
NOTE
C variables declared outside functions (except static variables) are communal unless explicitly initialized; they are the same as assembly-language communal variables. If you are writing assembly-language modules for C, you can declare the same communal variables in C include files and in QuickAssembler include files.
──────────────────────────────────────────────────────────────────────────

QuickAssembler cannot tell whether a communal variable has been used in another module. Allocation of communal variables is handled by LINK. As a result, communal variables have the following limitations that other variables declared in assembly language do not have:

  ■ Communal variables cannot be initialized. Under DOS, initial values are not guaranteed to be 0 or any other value. The variables can be used for data, such as file buffers, that is not given a value until run time.

  ■ Communal variables are not guaranteed to be allocated in the sequence in which they are declared. Assembly-language techniques that depend on the sequence and position in which data is defined should not be used with communal variables. For example, the following statements do not work:

            COMM   wbuffer:WORD:128
    lbuffer EQU    $ - wbuffer     ; "lbuffer" won't have desired value
    bbuffer LABEL  BYTE            ; "bbuffer" won't have desired address
            COMM   wbuffer:WORD:40

  ■ If a communal variable references a variable that is allocated and declared public inside a module, the variable has the segment of the allocated instance. If all references to the variable are communal, the variable will be placed in one of the segments described below.

Near communal variables are placed in a segment called c_common, which is part of DGROUP. This group is created and initialized automatically if you use simplified segment directives. If you use full segment definitions, you must create a group called DGROUP and use the ASSUME directive to associate it with the DS register.

Far communal variables are placed in a segment called FAR_BSS. This segment has combine type private and class type 'FAR_BSS'. This means that multiple segments with the same name can be created. Such segments cannot be accessed by name. They must be initialized indirectly using the SEG operator.
For example, if a far communal variable (with word size) is called comvar, its segment can be initialized with the following lines:

            ASSUME ds:SEG comvar   ; Tell the assembler
            mov    ax,SEG comvar   ; Tell the processor
            mov    ds,ax
            mov    bx,comvar       ; Use the variable

Example 1

            .DATA
            COMM   temp:BYTE:128

    ASCIIZ  MACRO  address         ;; Name of address for string
            mov    temp,126        ;; Insert maximum length (buffer size - 2)
            mov    dx,OFFSET temp  ;; Address of string buffer
            mov    ah,0Ah          ;; Get string
            int    21h
            mov    dl,temp[1]      ;; Get length of string
            xor    dh,dh
            mov    bx,dx
            mov    temp[bx+2],0    ;; Overwrite CR with null
    address EQU    OFFSET temp+2
            ENDM

Example 1 shows an include file that declares a buffer for temporary data. The buffer is then used in a macro in the same include file. DOS function 0Ah stores the maximum-length byte and the character count in the first two bytes of the buffer, so the maximum length (which includes the terminating CR) must be 2 less than the buffer size.

An example of how the macro could be used in a source file is shown below:

            DOSSEG
            .MODEL small,c
            INCLUDE communal.inc
            .STACK
            .DATA
    message DB     "Enter file name: $"
            .CODE
            .STARTUP
            .
            .
            .
            mov    dx,OFFSET message  ; Load offset of file prompt
            mov    ah,09h             ; Display prompt
            int    21h
            ASCIIZ place              ; Get file name and
                                      ;   return address as "place"
            mov    al,00000010b       ; Load access code
            mov    dx,place           ; Load address of ASCIIZ string
            mov    ah,3Dh             ; Open the file
            int    21h
            .
            .
            .

Note that once the macro is written, the user does not need to know the name of the temporary buffer or how it is used in the macro.

Example 2

    date    STRUC
    month   DB     ?
    day     DB     ?
    year    DB     ?
    date    ENDS

            .DATA
            COMM   today:date
            .
            .
            .

The example above uses the COMM directive to make the structure variable today a communal variable.

8.5 Specifying Library Files

The INCLUDELIB directive instructs the linker to link with a specified library file. If you are writing a program that calls library routines, you can use this directive to specify the library file in the assembly source file rather than in the LINK command line.

Syntax

    INCLUDELIB libraryname

The libraryname is written to the comment record of the object file. The Intel title for this record is COMENT.
At link time, the linker reads this record and links with the specified library file.

The libraryname must be a file name rather than a complete file specification. If you do not specify an extension, the default extension .LIB is assumed. LINK searches directories for the library file in the following order:

  1. The current directory

  2. Any directories given in the library field of the LINK command line

  3. Any directories listed in the LIB environment variable

Example

                INCLUDELIB graphics

This statement passes a message from QuickAssembler telling LINK to use library routines from the file GRAPHICS.LIB. If this statement is included in a source file called DRAW.ASM, the program might be linked with the following command line:

        LINK draw;

Without the INCLUDELIB directive, the program would have to be linked with the following command line:

        LINK draw,,,graphics;

────────────────────────────────────────────────────────────────────────────
Chapter 9: Using Operands and Expressions

Operands are the arguments that define values to be acted on by instructions or directives. Operands can be constants, variables, expressions, or keywords, depending on the instruction or directive and the context of the statement.

A common type of operand is an expression. An expression consists of several operands that are combined to describe a value or memory location. Operators indicate the operations to be performed when combining the operands of an expression.

Expressions are evaluated at assembly time. By using expressions, you can instruct the assembler to calculate values that would be difficult or inconvenient to calculate when you are writing source code.

This chapter discusses operands, expressions, and operators as they are evaluated at assembly time. See Section 2.6, "Addressing Modes," for a discussion of the addressing modes that can be used to calculate operand values at run time.
This chapter also discusses the location-counter operand, forward references, and strong typing of operands.

9.1 Using Operands with Directives

Each directive requires a specific type of operand. Most directives take string or numeric constants, or symbols or expressions that evaluate to such constants. The type of operand varies for each directive, but the operand must always evaluate to a value that is known at assembly time. This differs from instructions, whose operands may not be known at assembly time and may vary at run time. Operands used with instructions are discussed in Section 2.6, "Addressing Modes."

Some directives, such as those used in data declarations, accept labels or variables as operands. When a symbol that refers to a memory location is used as an operand to a directive, the symbol represents the address of the symbol rather than its contents. This is because the contents may change at run time and are therefore not known at assembly time.

Example 1

                ORG     100h            ; Set address to 100h
        var     DB      10h             ; Address of "var" is 100h
                                        ; Value of "var" is 10h
        pvar    DW      var             ; Address of "pvar" is 101h
                                        ; Value of "pvar" is
                                        ;   address of "var" (100h)

In Example 1, the operand of the DW directive in the third statement represents the address of var (100h) rather than its contents (10h). The address is relative to the start of the segment in which var is defined.

Example 2

                TITLE   doit                    ; String
        _TEXT   SEGMENT BYTE PUBLIC 'CODE'      ; Key words
                INCLUDE \include\bios.inc       ; Pathname
                .RADIX  16                      ; Numeric constant
        tst     DW      a / b                   ; Numeric expression
                PAGE    +                       ; Special character
        sum     EQU     x * y                   ; Numeric expression
        here    LABEL   WORD                    ; Type specifier

Example 2 illustrates the different kinds of values that can be used as directive operands.

9.2 Using Operators

The assembler provides a variety of operators for combining, comparing, changing, or analyzing operands. Some operators work with integer constants, some with memory values, and some with both.
Operators cannot be used with floating-point constants since QuickAssembler does not recognize real numbers in expressions.

It is important to understand the difference between operators and instructions. Operators handle calculations of constant values that are known at assembly time. Instructions handle calculations of values that may not be known until run time. For example, the addition operator (+) handles assembly-time addition, while the ADD and ADC instructions handle run-time addition.

This section describes the different kinds of operators used in assembly-language statements and gives examples of expressions formed with them. In addition to the operators described in this chapter, you can use the DUP operator (Section 6.5.2, "Arrays and Buffers"), the record operators (Section 7.2.4, "Record Operators"), and the macro operators (Section 11.4, "Using Macro Operators").

9.2.1 Calculation Operators

QuickAssembler provides the common arithmetic operators as well as several other operators for adding, shifting, or doing bit manipulations. The sections below describe operators that can be used for doing numeric calculations.

9.2.1.1 Arithmetic Operators

QuickAssembler recognizes a variety of arithmetic operators for common mathematical operations. Table 9.1 lists the arithmetic operators.

Table 9.1 Arithmetic Operators

Operator  Syntax                        Meaning
──────────────────────────────────────────────────────────────────────────
+         +expression                   Positive (unary)
-         -expression                   Negative (unary)
*         expression1 * expression2     Multiplication
/         expression1 / expression2     Integer division
MOD       expression1 MOD expression2   Remainder (modulus)
+         expression1 + expression2     Addition
-         expression1 - expression2     Subtraction
──────────────────────────────────────────────────────────────────────────

For all arithmetic operators except the addition operator (+) and the subtraction operator (-), the expressions operated on must be integer constants.
The addition and subtraction operators can be used to add or subtract an integer constant and a memory operand. The result can be used as a memory operand.

The subtraction operator can also be used to subtract one memory operand from another, but only if the operands refer to locations within the same segment. The result will be a constant, not a memory operand.

──────────────────────────────────────────────────────────────────────────
NOTE

The unary plus and minus operators (used to designate positive or negative numbers) are not the same as the binary plus and minus operators (used to designate addition or subtraction). The unary plus and minus operators have a higher level of precedence, as described in Section 9.2.5, "Operator Precedence."
──────────────────────────────────────────────────────────────────────────

Example 1

        intgr   =       14 * 3          ; = 42
        intgr   =       intgr / 4       ; 42 / 4 = 10
        intgr   =       intgr MOD 4     ; 10 mod 4 = 2
        intgr   =       intgr + 4       ; 2 + 4 = 6
        intgr   =       intgr - 3       ; 6 - 3 = 3
        intgr   =       -intgr - 8      ; -3 - 8 = -11
        intgr   =       -intgr - intgr  ; 11 - -11 = 22

Example 1 illustrates arithmetic operators used in integer expressions.

Example 2

                ORG     100h
        a       DB      ?               ; Address is 100h
        b       DB      ?               ; Address is 101h
        mem1    EQU     a + 5           ; mem1 = 100h + 5 = 105h
        mem2    EQU     a - 5           ; mem2 = 100h - 5 = 0FBh
        const   EQU     b - a           ; const = 101h - 100h = 1

Example 2 illustrates arithmetic operators used in memory expressions.

9.2.1.2 Structure-Field-Name Operator

The structure-field-name operator (.) indicates addition. It is used to designate a field within a structure.

Syntax

        variable.field

The variable is a memory operand (usually a previously declared structure variable), and field is the name of a field within the structure. See Section 7.1, "Structures," for more information.

Example

                .DATA
        date    STRUC                   ; Declare structure
        month   DB      ?
        day     DB      ?
        year    DW      ?
        date    ENDS

        yesterday date  <12,31,1987>    ; Define structure variables
        today   date    <1,1,1988>

                .CODE
                .
                .
                .
                mov     bh,yesterday.day ; Load structure variable
                mov     bx,OFFSET today  ; Load structure variable address
                inc     [bx].year        ; Use in indirect memory operand

9.2.1.3 Index Operator

The index operator ([ ]) indicates addition. It is similar to the addition (+) operator. When used with a register, the index operator also indicates that the operand is an indirect memory operand rather than a register-direct operand.

Syntax

        [[expression1]][expression2]

In most cases expression1 is simply added to expression2. The limitations of the addition operator for adding memory operands also apply to the index operator. For example, two direct memory operands cannot be added, so the expression label1[label2] is illegal if both are memory operands.

The index operator has an extended function in specifying indirect memory operands. Section 2.6.4 explains the use of indirect memory operands. The index brackets must be outside the register or registers that specify the indirect displacement. However, any of the three operators that indicate addition (the addition operator, the index operator, or the structure-field-name operator) may be used for multiple additions within the expression. For example, the following statements are equivalent:

                mov     ax,table[bx][di]
                mov     ax,table[bx+di]
                mov     ax,[table+bx+di]
                mov     ax,[table][bx][di]

The following statements are illegal because the index operator does not enclose the registers that specify indirect displacement:

                mov     ax,table+bx+di   ; Illegal - no index operator
                mov     ax,[table]+bx+di ; Illegal - registers not
                                         ;   inside index operator

The index operator is typically used to index elements of a data object, such as variables in an array or characters in a string. The value inside the brackets is a displacement in bytes, not an element number.

Example 1

                mov     al,string[3]     ; Get 4th element of string
                add     ax,array[4]      ; Add word at byte offset 4
                                         ;   (3rd element of a word array)
                mov     string[7],al     ; Load into 8th element of string

Example 1 illustrates the index operator used with direct memory operands.
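The index operator adds a displacement measured in bytes. The number in brackets therefore equals the element number only when the elements are one byte long. The C sketch below models that address arithmetic. The helpers index_offset and element_number are hypothetical, and element sizes of 1 and 2 stand for BYTE and WORD data. It illustrates the calculation only and is not part of QuickAssembler.

```c
#include <assert.h>

/* Hypothetical helper: label[disp] addresses the byte at the label's
   offset plus disp, whatever the size of the elements stored there. */
unsigned index_offset(unsigned label_offset, unsigned disp)
{
    return label_offset + disp;
}

/* Hypothetical helper: zero-based number of the element that begins
   at byte displacement disp (element_size 1 = BYTE, 2 = WORD). */
unsigned element_number(unsigned disp, unsigned element_size)
{
    assert(element_size > 0);
    return disp / element_size;
}
```

Take a WORD array at offset 100h as an example. array[4] refers to offset 104h, which is zero-based element 2 (the third word). A displacement of 4 reaches the fifth element only when the array holds bytes.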
Example 2

                mov     ax,[bx]          ; Get element BX points to
                add     ax,array[si]     ; Add element SI points to
                mov     string[di],al    ; Load element DI points to
                cmp     cx,table[bx][di] ; Compare to element BX and DI point to

Example 2 illustrates the index operator used with indirect memory operands.

9.2.1.4 Shift Operators

The SHR and SHL operators can be used to shift bits in constant values. Both perform logical shifts: the bit positions vacated on the right by SHL, or on the left by SHR, are filled with zeros.

Syntax

        expression SHR count
        expression SHL count

The expression is shifted right or left by count number of bits. Bits shifted off either end of the expression are lost. If count is greater than or equal to 16, the result is 0.

Do not confuse the SHR and SHL operators with the processor instructions having the same names. The operators work on integer constants only at assembly time. The processor instructions work on register or memory values at run time. The assembler can tell the difference between instructions and operators from context.

Examples

                mov     ax,01110111b SHL 3 ; Load 01110111000b
                mov     ah,01110111b SHR 3 ; Load 01110b

9.2.1.5 Bitwise Logical Operators

The bitwise operators perform logical operations on each bit of an expression. The expressions must resolve to constant values. Table 9.2 lists the logical operators and their meanings.

Table 9.2 Logical Operators

Operator  Syntax                        Meaning
──────────────────────────────────────────────────────────────────────────
NOT       NOT expression                Bitwise complement
AND       expression1 AND expression2   Bitwise AND
OR        expression1 OR expression2    Bitwise inclusive OR
XOR       expression1 XOR expression2   Bitwise exclusive OR
──────────────────────────────────────────────────────────────────────────

Do not confuse the NOT, AND, OR, and XOR operators with the processor instructions having the same names. The operators work on integer constants only at assembly time. The processor instructions work on register or memory values at run time.
The assembler can tell the difference between instructions and operators from context.

──────────────────────────────────────────────────────────────────────────
NOTE

Although calculations on expressions using the AND, OR, and XOR operators are done using 17-bit numbers, the results are truncated to 16 bits.
──────────────────────────────────────────────────────────────────────────

Examples

                mov     ax,NOT 11110000b           ; Load 1111111100001111b
                mov     ah,NOT 11110000b           ; Load 00001111b
                mov     ah,01010101b AND 11110000b ; Load 01010000b
                mov     ah,01010101b OR 11110000b  ; Load 11110101b
                mov     ah,01010101b XOR 11110000b ; Load 10100101b

9.2.2 Relational Operators

The relational operators compare two expressions and return true (-1) if the condition specified by the operator is satisfied, or false (0) if it is not. The expressions must resolve to constant values. Relational operators are typically used with conditional directives. Table 9.3 lists the operators and the values they return if the specified condition is satisfied.

Table 9.3 Relational Operators

Operator  Syntax                      Returned Value
──────────────────────────────────────────────────────────────────────────
EQ        expression1 EQ expression2  True if expressions are equal
NE        expression1 NE expression2  True if expressions are not equal
LT        expression1 LT expression2  True if left expression is less
                                      than right
LE        expression1 LE expression2  True if left expression is less
                                      than or equal to right
GT        expression1 GT expression2  True if left expression is greater
                                      than right
GE        expression1 GE expression2  True if left expression is greater
                                      than or equal to right
──────────────────────────────────────────────────────────────────────────

Note that the EQ and NE operators treat their arguments as 16-bit numbers. Numbers specified with the 16th bit set are considered negative. For example, the expression -1 EQ 0FFFFh is true, but the expression -1 NE 0FFFFh is false.
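You can mimic the 16-bit rule for EQ and NE in C by reducing both operands to 16 bits before comparing them. In this sketch, asm_eq, asm_ne and check_range are hypothetical names. The -1/0 results follow the true/false convention stated above for the relational operators. The range check is an assumption: it takes the operand range to be the 17-bit signed range (-0FFFFh to 0FFFFh) that this chapter describes.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed operand range: a 16-bit magnitude plus a separate sign bit. */
static void check_range(long v)
{
    assert(v >= -65535L && v <= 65535L);
}

/* EQ: compare only the low 16 bits; return -1 (true) or 0 (false). */
int asm_eq(long a, long b)
{
    check_range(a);
    check_range(b);
    return ((uint16_t)a == (uint16_t)b) ? -1 : 0;
}

/* NE: the complement of EQ under the same 16-bit reduction. */
int asm_ne(long a, long b)
{
    return asm_eq(a, b) ? 0 : -1;
}
```

Under this model, -1 and 0FFFFh both reduce to the 16-bit pattern FFFFh. As a result, asm_eq(-1, 0xFFFF) is true (-1) and asm_ne(-1, 0xFFFF) is false (0), matching the two expressions above.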
The LT, LE, GT, and GE operators treat their arguments as 17-bit numbers, in which the 17th bit specifies the sign. For example, 0FFFFh is 65,535, not -1. The expression 1 GT -1 is true, but the expression 1 GT 0FFFFh is false.

Examples

                mov     ax,4 EQ 3       ; Load false (0)
                mov     ax,4 NE 3       ; Load true (-1)
                mov     ax,4 LT 3       ; Load false (0)
                mov     ax,4 LE 3       ; Load false (0)
                mov     ax,4 GT 3       ; Load true (-1)
                mov     ax,4 GE 3       ; Load true (-1)

9.2.3 Segment-Override Operator

The segment-override operator (:) forces the address of a variable or label to be computed relative to a specific segment.

Syntax

        segment:expression

The segment can be specified in several ways. It can be one of the segment registers: CS, DS, SS, or ES. It can also be a segment or group name. In this case, the name must have been previously defined with a SEGMENT or GROUP directive and assigned to a segment register with an ASSUME directive. The expression can be a constant, an expression, or a SEG expression. See Section 9.2.4.5 for more information on the SEG operator.

Note that when a segment override is given with an indexed operand, the segment must be specified outside the index operators. For example, es:[di] is correct, but [es:di] generates an error.

Examples

                mov     ax,ss:[bx+4]      ; Override default assume (DS)
                mov     al,es:082h        ; Load from ES

                ASSUME  ds:FAR_DATA       ; Tell the assembler and
                mov     bx,FAR_DATA:count ;   load from a far segment

As shown in the last two statements, a segment override with a segment name is not enough if no segment register is assumed for the segment name. You must use the ASSUME directive to assign a segment register, as explained in Section 5.4, "Associating Segments with Registers."

9.2.4 Type Operators

This section describes the assembler operators that specify or analyze the types of memory operands and other expressions.

9.2.4.1 PTR Operator

The PTR operator specifies the type for a variable or label.

Syntax

        type PTR expression

The operator forces expression to be treated as having the specified type.
The expression can be any operand. The type can be BYTE, WORD, DWORD, QWORD, or TBYTE for memory operands. It can be NEAR, FAR, or PROC for labels.

The PTR operator is typically used with forward references to define explicitly what size or distance a reference has. If it is not used, the assembler assumes a default size or distance for the reference. See Section 9.4 for more information on forward references.

The PTR operator is also used to enable instructions to access variables in ways that would otherwise generate errors. For example, you could use the PTR operator to access the high-order byte of a WORD size variable.

The PTR operator is required for FAR calls and jumps to forward-referenced labels.

Example 1

                .DATA
        stuff   DD      ?
        buffer  DB      20 DUP (?)
                .CODE
                .
                .
                .
                call    FAR PTR task           ; Call a far procedure
                jmp     FAR PTR place          ; Jump far
                mov     bx,WORD PTR stuff[0]   ; Load a word from a
                                               ;   doubleword variable
                add     ax,WORD PTR buffer[bx] ; Add a word from a
                                               ;   byte variable

The PTR operator can be used to specify the size of an indirect register operand for a CALL or JMP instruction. However, the size cannot be specified with NEAR or FAR. Use WORD or DWORD instead. Examples are shown below:

Example 2

                jmp     WORD PTR [bx]   ; Legal near jump
                call    NEAR PTR [bx]   ; Illegal near call
                call    DWORD PTR [bx]  ; Legal far call
                jmp     FAR PTR [bx]    ; Illegal far jump

This limitation applies only to indirect register operands. NEAR or FAR can be applied to operands associated with labels, as shown in Example 1. NEAR or FAR can also be used with an indirect operand that combines a register with a label.

9.2.4.2 SHORT Operator

The SHORT operator sets the type of a specified label to SHORT. Short labels can be used in JMP instructions whenever the distance from the label to the instruction is less than 128 bytes.

Syntax

        SHORT label

Instructions using short labels are a byte smaller than identical instructions using the default near labels.
See Section 9.4.1, "Forward References to Labels," for information on using the SHORT operator with jump instructions.

Example

                jmp     again           ; Jump 128 bytes or more
                .
                .
                .
                jmp     SHORT again     ; Jump less than 128 bytes
                .
                .
                .
        again:

9.2.4.3 THIS Operator

The THIS operator creates an operand whose offset and segment values are equal to the current location-counter value and whose type is specified by the operator.

Syntax

        THIS type

The type can be BYTE, WORD, DWORD, QWORD, or TBYTE for memory operands. It can be NEAR, FAR, or PROC for labels.

The THIS operator is typically used with the EQU or equal-sign (=) directive to create labels and variables. The result is similar to using the LABEL directive.

Example

        tag1    EQU     THIS BYTE       ; Both represent the same variable
        tag2    LABEL   BYTE

        check1  EQU     THIS NEAR       ; All represent the same address
        check2  LABEL   NEAR
        check3:
        check4  PROC    NEAR
        check4  ENDP

9.2.4.4 HIGH and LOW Operators

The HIGH and LOW operators return the high and low bytes, respectively, of an expression.

Syntax

        HIGH expression
        LOW expression

The HIGH operator returns the high-order eight bits of expression; the LOW operator returns the low-order eight bits. The expression must evaluate to a constant. You cannot use the HIGH and LOW operators on the contents of a memory operand, since the contents may change at run time.

Examples

        stuff   EQU     0ABCDh
                mov     ah,HIGH stuff   ; Load 0ABh
                mov     al,LOW stuff    ; Load 0CDh

The HIGH and LOW operators work reliably only with constants and with offsets to external symbols. HIGH and LOW operations are not supported for offsets to local symbols.

9.2.4.5 SEG Operator

The SEG operator returns the segment address of an expression.

Syntax

        SEG expression

The expression can be any label, variable, segment name, group name, or other memory operand. The SEG operator cannot be used with constant expressions. The returned value can be used as a memory operand.

Example

                .DATA
        var     DB      ?
                .CODE
                .
                .
                .
                ASSUME  ds:SEG var      ; Assume segment of variable
                mov     ax,SEG var      ; Get address of segment
                mov     ds,ax           ;   where variable is declared

9.2.4.6 OFFSET Operator

The OFFSET operator returns the offset address of an expression.

Syntax

        OFFSET expression

The expression can be any label, variable, or other direct memory operand. Constant expressions return meaningless values. The value returned by the OFFSET operator is an immediate (constant) operand.

If the .MODEL directive is used, the value returned by the OFFSET operator is relative to a group whenever the data item is declared in a segment that is part of a group. In that case, the OFFSET operator returns the number of bytes between the beginning of the group and the data object. If the object is declared in a segment that is not part of a group, OFFSET returns the number of bytes between the beginning of the segment and the data object.

If the .MODEL directive is not used, OFFSET returns a value relative to the beginning of the segment, regardless of whether the segment is part of a group. If full segment definitions are given, the returned value is the number of bytes between the item and the beginning of the segment in which it is defined.

The segment-override operator (:) can be used to force OFFSET to return the number of bytes between the item in expression and the beginning of a named segment or group. This is the method used to generate valid offsets for items in a group when full segment definitions are used. For example, the statement

                mov     bx,OFFSET DGROUP:array

is not the same as

                mov     bx,OFFSET array

if array is not in the first segment in DGROUP.

Example

                .DATA
        string  DB      "This is it."
                .CODE
                .
                .
                .
                mov     dx,OFFSET string ; Load offset of variable

9.2.4.7 .TYPE Operator

The .TYPE operator returns a byte that defines the mode and scope of an expression.

Syntax

        .TYPE expression

If expression is not valid, .TYPE returns 0. Otherwise, .TYPE returns a byte having the bit settings shown in Table 9.4.
The .TYPE operator sets all bits except bit 6. Future versions of the assembler may reserve a use for this bit.

Table 9.4 .TYPE Operator and Variable Attributes

Bit Position  If Bit = 0                     If Bit = 1
──────────────────────────────────────────────────────────────────────────
0             Not program related            Program related
1             Not data related               Data related
2             Not a constant value           Constant value
3             Addressing mode is not direct  Addressing mode is direct
4             Not a register                 Expression is a register
5             Not defined                    Defined
7             Local or public scope          External scope
──────────────────────────────────────────────────────────────────────────

The .TYPE operator is typically used in macros in which different kinds of arguments may need to be handled differently.

Example

        display MACRO   string
                IF      ((.TYPE string) SHL 14) NE 8000h
                IF2
                %OUT    Argument must be a variable
                ENDIF
                ENDIF
                mov     dx,OFFSET string
                mov     ah,09h
                int     21h
                ENDM

This macro checks to see if the argument passed to it is data related (a variable). It does this by shifting the .TYPE value 14 bits to the left. This moves the relevant bits (1 and 0) into the two highest bit positions and shifts all other bits out, so that they can be checked. The comparison with 8000h requires bit 1 (data related) to be set and bit 0 (program related) to be clear. If the result is anything else, an error message is generated.

9.2.4.8 TYPE Operator

The TYPE operator returns a number that represents the type of an expression.

Syntax

        TYPE expression

If expression evaluates to a variable, the operator returns the number of bytes in each data object in the variable. Each byte in a string is considered a separate data object, so the TYPE operator returns 1 for strings.

If expression evaluates to a structure or structure variable, the operator returns the number of bytes in the structure. If expression is a label, the operator returns 0FFFFh for NEAR labels and 0FFFEh for FAR labels. If expression is a constant, the operator returns 0.

9.2.4.9 LENGTH Operator

The LENGTH operator returns the number of data elements in an array or other variable defined with the DUP operator.
Syntax

        LENGTH variable

The returned value is the number of elements of the declared size in variable. If the variable was declared with nested DUP operators, only the value given for the outer DUP operator is returned. If the variable was not declared with the DUP operator, the value returned is always 1.

Example

        array   DD      100 DUP(0FFFFFFh)
        table   DW      100 DUP(1,10 DUP(?))
        string  DB      'This is a string'
        var     DT      ?
        larray  EQU     LENGTH array    ; 100 - number of elements
        ltable  EQU     LENGTH table    ; 100 - inner DUP not counted
        lstring EQU     LENGTH string   ; 1 - string is one element
        lvar    EQU     LENGTH var      ; 1
                .
                .
                .
                mov     cx,LENGTH array ; Load number of elements
        again:  .                       ; Perform some operation on
                .                       ;   each element
                .
                loop    again

9.2.4.10 SIZE Operator

The SIZE operator returns the total number of bytes allocated for an array or other variable defined with the DUP operator.

Syntax

        SIZE variable

The returned value is equal to the value of LENGTH variable times the value of TYPE variable. If the variable was declared with nested DUP operators, only the value given for the outside DUP operator is considered. If the variable was not declared with the DUP operator, the value returned is always TYPE variable.

Example

        array   DD      100 DUP(1)
        table   DW      100 DUP(1,10 DUP(?))
        string  DB      'This is a string'
        var     DT      ?
        sarray  EQU     SIZE array      ; 400 - elements times size
        stable  EQU     SIZE table      ; 200 - inner DUP ignored
        sstring EQU     SIZE string     ; 1 - string is one element
        svar    EQU     SIZE var        ; 10 - bytes in variable
                .
                .
                .
                mov     cx,SIZE array   ; Load number of bytes
        again:  .                       ; Perform some operation on
                .                       ;   each byte
                .
                loop    again

9.2.5 Operator Precedence

Expressions are evaluated according to the following rules:

  ■ Operations of highest precedence are performed first.

  ■ Operations of equal precedence are performed from left to right.

  ■ The order of evaluation can be overridden by using parentheses. Operations in parentheses are always performed before any adjacent operations.
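You can check the three rules with ordinary C constant expressions. For the operators used in the examples that accompany Table 9.5 (*, /, binary +, AND and OR), C's precedence and left-to-right grouping happen to agree with QuickAssembler's, with & and | standing in for AND and OR. The agreement is only partial. C ranks its shift operators below binary +, whereas Table 9.5 places SHL and SHR alongside *, /, and MOD, so this sketch is not a general translation. The EX_ names are hypothetical.

```c
#include <assert.h>

/* Hypothetical names for the six example expressions; & and | stand
   in for AND and OR. */
enum {
    EX_A = 8 / 4 * 2,    /* equal precedence, left to right: (8 / 4) * 2 */
    EX_B = 8 / (4 * 2),  /* parentheses evaluated first */
    EX_C = 8 + 4 * 2,    /* higher precedence first: 8 + (4 * 2) */
    EX_D = (8 + 4) * 2,  /* parentheses override precedence */
    EX_E = 8 | (4 & 2),  /* & already binds tighter than |; the parentheses
                            only silence compiler style warnings */
    EX_F = (8 | 4) & 3   /* parentheses force the OR first */
};

/* Compile-time checks against the values given in the text (C11). */
static_assert(EX_A == 4, "8 / 4 * 2");
static_assert(EX_B == 1, "8 / (4 * 2)");
static_assert(EX_C == 16, "8 + 4 * 2");
static_assert(EX_D == 24, "(8 + 4) * 2");
static_assert(EX_E == 8, "8 OR 4 AND 2");
static_assert(EX_F == 0, "(8 OR 4) AND 3");
```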
The order of precedence for all operators is listed in Table 9.5. Operators on the same line have equal precedence.

Table 9.5 Operator Precedence

Precedence   Operators
──────────────────────────────────────────────────────────────────────────
(Highest)
 1           LENGTH, SIZE, WIDTH, MASK, (), [], <>
 2           . (structure-field-name operator)
 3           :
 4           PTR, OFFSET, SEG, TYPE, THIS
 5           HIGH, LOW
 6           +, - (unary)
 7           *, /, MOD, SHL, SHR
 8           +, - (binary)
 9           EQ, NE, LT, LE, GT, GE
10           NOT
11           AND
12           OR, XOR
13           SHORT, .TYPE
(Lowest)
──────────────────────────────────────────────────────────────────────────

Examples

        a       EQU     8 / 4 * 2       ; Equals 4
        b       EQU     8 / (4 * 2)     ; Equals 1
        c       EQU     8 + 4 * 2       ; Equals 16
        d       EQU     (8 + 4) * 2     ; Equals 24
        e       EQU     8 OR 4 AND 2    ; Equals 8
        f       EQU     (8 OR 4) AND 3  ; Equals 0

9.3 Using the Location Counter

The location counter is a special operand that, during assembly, represents the address of the statement currently being assembled. At assembly time, the location counter keeps changing, but when used in source code, it resolves to a constant representing an address.

The location counter has the same attributes as a near label. It represents an offset that is relative to the current segment and is equal to the number of bytes generated for the segment to that point.

Example 1

        string  DB      "Who wants to count every byte in a string, "
                DB      "especially if you might change it later."
        lstring EQU     $-string        ; Let the assembler do it

Example 1 shows one way of using the location-counter operand in expressions relating to data.

Example 2

                cmp     ax,bx
                jl      shortjump       ; If ax < bx, go to "shortjump"
                .                       ; else if ax >= bx, continue
                .
        shortjump:
                .

                cmp     ax,bx
                jge     $+5             ; If ax >= bx, continue
                jmp     longjump        ; else if ax < bx, go to "longjump"
                .                       ; This is "$+5"
                .
        longjump:
                .

Example 2 illustrates how you can use the location counter to do conditional jumps of more than 128 bytes.
The first part shows the normal way of coding jumps of less than 128 bytes. The second part shows how to code the same jump when the label is more than 128 bytes away. In the second part, $ is the address of the JGE instruction. Because JGE occupies two bytes and the near JMP three, $+5 is the address of the statement that follows the JMP.

9.4 Using Forward References

The assembler permits you to refer to labels, variable names, segment names, and other symbols before they are declared in the source code. Such references are called forward references.

The assembler handles forward references by making assumptions about them on the first pass and then attempting to correct the assumptions, if necessary, on the second pass. Checking and correcting assumptions on the second pass takes processing time, so source code with forward references assembles more slowly than source code with no forward references. In addition, the assembler may make incorrect assumptions that it cannot correct, or that it can correct only at a cost in program efficiency.

9.4.1 Forward References to Labels

Forward references to labels may result in incorrect or inefficient code. In the statement below, the label target is a forward reference:

                jmp     target          ; Generates 3 bytes in 16-bit segment
                .
                .
                .
        target:

Since the assembler processes source files sequentially, target is unknown when it is first encountered. It could be one of three types: short (-128 to 127 bytes from the jump), near (-32,768 to 32,767 bytes from the jump), or far (in a different segment than the jump). QuickAssembler assumes that target is a near label and assembles the number of bytes necessary to specify a near label: one byte for the instruction and two bytes for the operand.

If, on the second pass, the assembler learns that target is a short label, it will need only two bytes: one for the instruction and one for the operand. However, it will not be able to change its previous assembly, and the three-byte version of the assembly will stand. If the assembler learns that target is a far label, it will need five bytes. Since it can't make this adjustment, it will generate a phase error.
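A small C model can summarize the size rules in this section. The helpers jmp_size and forward_jmp_size are hypothetical, and they encode only what the text states:

■ 2 bytes for a short jump, 3 for near, and 5 for far.

■ A near assumption on the first pass for a label that has not yet been seen.

■ A phase error if that label turns out to be far.

The displacement argument is the signed distance to the target. The model uses the -128 to 127 and -32,768 to 32,767 ranges given above.

```c
#include <assert.h>

/* Encoded JMP sizes stated in the text. */
enum { JMP_SHORT = 2, JMP_NEAR = 3, JMP_FAR = 5 };

/* Smallest JMP form that reaches a target, given the signed
   displacement to it and whether it lies in the same segment. */
int jmp_size(long disp, int same_segment)
{
    if (!same_segment)
        return JMP_FAR;
    /* Within one 16-bit segment the displacement fits in 16 bits. */
    assert(disp >= -32768L && disp <= 32767L);
    return (disp >= -128L && disp <= 127L) ? JMP_SHORT : JMP_NEAR;
}

/* Size committed on pass 1 for a forward-referenced label with no
   SHORT or FAR PTR override: the assembler assumes near. Returns -1
   where a phase error results (the label turns out to be far). */
int forward_jmp_size(long disp, int same_segment)
{
    int needed = jmp_size(disp, same_segment);
    return (needed > JMP_NEAR) ? -1 : JMP_NEAR;
}
```

Consider a label 10 bytes ahead in the same segment. jmp_size reports 2 bytes, but forward_jmp_size reports the 3 bytes the assembler actually commits to. That extra byte is what the SHORT operator recovers.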
You can override the assembler's assumptions by specifying the exact size of the jump. For example, if you know that a JMP instruction refers to a label less than 128 bytes from the jump, you can use the SHORT operator, as shown below:

                jmp     SHORT target    ; Generates 2 bytes
                .
                .
                .
        target:

Using the SHORT operator makes the code smaller and slightly faster. If the assembler has to use the three-byte form when the two-byte form would be acceptable, it will generate a warning message if the warning level is 2. (The warning level can be set with the /W option, as described in Appendix B, Section B.16.) You can ignore the warning, or you can go back to the source code and change the code to eliminate the forward references.

──────────────────────────────────────────────────────────────────────────
NOTE

The SHORT operator in the example above would not be needed if target were located before the jump. The assembler would have already processed target and would be able to make adjustments based on its distance.
──────────────────────────────────────────────────────────────────────────

If you use the SHORT operator when the label being jumped to is more than 128 bytes away, QuickAssembler generates an error message. You can either remove the SHORT operator, or try to reorganize your program to reduce the distance.

If a far jump to a forward-referenced label is required, you must override the assembler's assumptions with the FAR and PTR operators, as shown below:

                jmp     FAR PTR target  ; Generates 5 bytes
                .
                .
                .
        target:                         ; In different segment

If the type of a label has been established earlier in the source code with an EXTRN directive, the type does not need to be specified in the jump statement.

9.4.2 Forward References to Variables

When QuickAssembler encounters code referencing variables that have not yet been defined in pass 1, it makes assumptions about the segment where the variable will be defined. If on pass 2 the assumptions turn out to be wrong, an error will occur.
These problems usually occur with complex segment structures that do not follow the Microsoft segment conventions. The problems never appear if simplified segment directives are used.

By default, QuickAssembler assumes that variables are referenced to the DS register. If a statement must access a variable in a segment not associated with the DS register, and if the variable has not been defined earlier in the source code, you must use the segment-override operator to specify the segment.

The situation is different if neither the variable nor the segment in which it is defined has been defined earlier in the source code. In this case, you must assign the segment to a group earlier in the source code. QuickAssembler will then know about the existence of the segment even though it has not yet been defined.

9.5 Strong Typing for Memory Operands

The assembler carries out strict syntax checks for all instruction statements, including strong typing for operands that refer to memory locations. This means that when an instruction uses two operands with implied data types, the operand types must match. Warning messages are generated for nonmatching types.

For example, in the following fragment, the variable string is incorrectly used in a move instruction:

                .DATA
        string  DB      "A message."
                .CODE
                .
                .
                .
                mov     ax,string[1]

The AX register has WORD type, but string has BYTE type. Therefore, the statement generates the following warning message:

        Operand types must match

To avoid all ambiguity and prevent the warning, use the PTR operator to override the variable's type, as shown below:

                mov     ax,WORD PTR string[1]

You can ignore the warnings if you are willing to trust the assembler's assumptions. When a register and memory operand are mixed, the assembler assumes that the register operand is always the correct size. For example, in the statement

                mov     ax,string[1]

the assembler assumes that the programmer wishes the word size of the register to override the byte size of the variable.
A word starting at string[1] will be moved into AX. In the statement

            mov     string[1],ax

the assembler assumes that the programmer wishes to move the word value in AX into the word starting at string[1]. However, the assembler's assumptions are not always as clear as in these examples. You should not ignore warnings about type mismatches unless you are sure you understand how your code will be assembled.

──────────────────────────────────────────────────────────────────────────
NOTE
Some assemblers (including early versions of the IBM Macro Assembler) do not do strict type checking. For compatibility with these assemblers, type errors are warnings rather than severe errors. Many assembly-language program listings in books and magazines are written for assemblers with weak type checking. Such programs may produce warning messages but still assemble correctly. You can use the /W option to turn off type warnings if you are sure the code is correct.
──────────────────────────────────────────────────────────────────────────

────────────────────────────────────────────────────────────────────────────
Chapter 10: Assembling Conditionally

QuickAssembler provides two types of conditional directives: conditional-assembly directives and conditional-error directives. Conditional-assembly directives test for a specified condition and assemble a block of statements if the condition is true. Conditional-error directives test for a specified condition and generate an assembly error if the condition is true.

Both kinds of conditional directives test assembly-time conditions; they cannot test run-time conditions. Only expressions that evaluate to constants during assembly can be compared or tested.

Since macros and conditional-assembly directives are often used together, you may need to refer to Chapter 11, "Using Equates, Macros, and Repeat Blocks," to understand some of the examples in this chapter.
In particular, conditional directives are frequently used with the operators described in Section 11.5, "Using Macro Operators."

10.1 Using Conditional-Assembly Directives

The conditional-assembly directives include the following:

    ELSE        IFB         IFIDN
    ENDIF       IFDEF       IFIDNI
    IF          IFDIF       IFNB
    IF1         IFDIFI      IFNDEF
    IF2         IFE

The IF directives and the ENDIF and ELSE directives can be used to enclose the statements to be considered for conditional assembly.

Syntax

    IFcondition
    statements
    [[ELSEIFcondition
    statements]]
    .
    .
    .
    [[ELSE
    statements]]
    ENDIF

The statements following the IF directive can be any valid statements, including other conditional blocks. The ELSEIF and ELSE blocks are optional, and the conditional block can contain any number of ELSEIF blocks. (The ELSEIF directives are listed in Section 10.1.6.) ENDIF ends the block.

The statements following the IF directive are assembled only if the corresponding condition is true. If the condition is not true and an ELSEIF directive is used, the assembler checks whether the ELSEIF condition is true; if so, it assembles the statements following that ELSEIF directive. If neither the IF condition nor any ELSEIF condition is satisfied, the statements following the ELSE directive are assembled.

IF statements can be nested up to 20 levels. A nested ELSE or ELSEIF directive always belongs to the nearest preceding IF statement that does not have its own ELSE directive.

10.1.1 Testing Expressions with IF and IFE Directives

The IF and IFE directives test the value of an expression and grant assembly based on the result.

Syntax

    IF expression
    IFE expression

The IF directive grants assembly if the value of expression is true (nonzero). The IFE directive grants assembly if the value of expression is false (0). The expression must evaluate to a constant value and must not contain forward references.
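The IF/ELSEIF/ELSE chain closely parallels the C preprocessor's #if/#elif/#else/#endif, which also chooses among blocks before the program ever runs and accepts only constant expressions. The following sketch is a C analogue, not QuickAssembler code, of the debug-level selection in the example below. DEBUG_LEVEL and selected_debug_routine are hypothetical names, and the default of 15 is an arbitrary assumption standing in for a value that would normally be supplied on the command line.

```c
/* C preprocessor analogue of IF / ELSEIF / ELSE / ENDIF.
   DEBUG_LEVEL is a hypothetical symbol that can be overridden
   when compiling, for example with -DDEBUG_LEVEL=25. */
#ifndef DEBUG_LEVEL
#define DEBUG_LEVEL 15  /* assumed default for this sketch */
#endif

/* Returns the name of the routine chosen at compile time. Only one
   return statement survives preprocessing, just as only one block of
   an IF/ELSEIF/ELSE chain is assembled. */
const char *selected_debug_routine(void)
{
#if DEBUG_LEVEL > 20
    return "adebug";
#elif DEBUG_LEVEL > 10
    return "bdebug";
#else
    return "cdebug";
#endif
}
```

With most C compilers, a command-line definition such as -DDEBUG_LEVEL=25 (or /DDEBUG_LEVEL=25 with Microsoft compilers) would select "adebug" instead, much as the /Dsymbol option described in Section 10.1.3 changes which block QuickAssembler assembles.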
Example

            IF      debug GT 20
            push    debug
            call    adebug
            ELSEIF  debug GT 10
            call    bdebug
            ELSE
            call    cdebug
            ENDIF

In this example, a different debug routine is called depending on the value of debug.

10.1.2 Testing the Pass with IF1 and IF2 Directives

The IF1 and IF2 directives test the current assembly pass and grant assembly only on the pass specified by the directive. Multiple passes of the assembler are discussed in Appendix C, Section C.7, "Reading a Pass 1 Listing."

Syntax

    IF1
    IF2

The IF1 directive grants assembly only on pass 1. The IF2 directive grants assembly only on pass 2. The directives take no arguments. If you turn on the One-Pass Assembly option, the IF2 directive produces an error.

Macros usually need to be processed only once. You can enclose blocks of macros in IF1 blocks to prevent them from being reprocessed on the second pass.

Example

            IF1                     ; Define on first pass only
    dostuff MACRO   argument
            .
            .
            .
            ENDM
            ENDIF

10.1.3 Testing Symbol Definition with IFDEF and IFNDEF Directives

The IFDEF and IFNDEF directives test whether a symbol has been defined and grant assembly based on the result.

Syntax

    IFDEF name
    IFNDEF name

The IFDEF directive grants assembly only if name is a defined label, variable, or symbol. The IFNDEF directive grants assembly if name has not yet been defined. The name can be any valid name. Note that if name is a forward reference, it is considered undefined on pass 1 but defined on pass 2.

Example

            IFDEF   buffer
    buff    DB      buffer DUP(?)
            ENDIF

In this example, buff is allocated only if buffer has been previously defined. One way to use this conditional block is to leave buffer undefined in the source file and define it if needed by using the /Dsymbol option (see Appendix B, Section B.4, "Defining Assembler Symbols") when you start QuickAssembler.
For example, if the conditional block is in TEST.ASM, you could start the assembler with the following command line:

    QCL /Dbuffer=1024 test.asm

This command line defines the symbol buffer; as a result, the conditional-assembly block allocates buff. You could also define the symbol by entering buffer=1024 in the Defines field of the Assembler Flags dialog box. If you didn't need buff, you could use the following command line instead:

    QCL test.asm

10.1.4 Verifying Macro Parameters with IFB and IFNB Directives

The IFB and IFNB directives test whether a specified argument was passed to a macro and grant assembly based on the result.

Syntax

    IFB <argument>
    IFNB <argument>

These directives are always used inside macros, and they always test whether a real argument was passed for a specified dummy parameter. The IFB directive grants assembly if argument is blank. The IFNB directive grants assembly if argument is not blank. The arguments can be any name, number, or expression. Angle brackets (< >) are required.

Example

    Write   MACRO   buffer,bytes,handle
            IFNB    <handle>
            mov     bx,handle       ; (1=stdout,2=stderr,3=aux,4=printer)
            ELSE
            mov     bx,1            ; Default standard out
            ENDIF
            mov     dx,OFFSET buffer ; Address of data to write
            mov     cx,bytes        ; Number of bytes to write
            mov     ah,40h          ; DOS function 40h: write to handle
            int     21h
            ENDM

In this example, a default value (standard output) is used if no value is specified for the third macro argument, handle.

10.1.5 Comparing Macro Arguments with IFIDN and IFDIF Directives

The IFIDN and IFDIF directives compare two macro arguments and grant assembly based on the result.

Syntax

    IFIDN[[I]] <argument1>,<argument2>
    IFDIF[[I]] <argument1>,<argument2>

These directives are always used inside macros, and they always test whether the real arguments passed for two specified parameters are the same. The IFIDN directive grants assembly if argument1 and argument2 are identical. The IFDIF directive grants assembly if argument1 and argument2 are different. The arguments can be names, numbers, or expressions.
They must be enclosed in angle brackets and separated by a comma.

The optional I at the end of the directive name specifies that the directive is case insensitive: arguments that are spelled the same are evaluated the same, regardless of case. If the I is not given, the directive is case sensitive.

Example

    divide8 MACRO   numerator,denominator
            IFDIFI  <numerator>,<al>    ;; If numerator isn't AL
            mov     al,numerator        ;;   make it AL
            ENDIF
            xor     ah,ah
            div     denominator
            ENDM

In this example, a macro uses the IFDIFI directive to check one of the arguments and take a different action depending on the argument's text. The sample macro could be enhanced further by checking for other values that would require adjustment (such as a denominator passed in AL or AH).

10.1.6 ELSEIF Directives

The assembler includes an ELSEIF conditional-assembly directive corresponding to each of the IF directives. The ELSEIF directives provide a more compact and better-structured way of writing some sequences of ELSE and IF directives. QuickAssembler supports the following directives:

    ELSEIF      ELSEIFDEF   ELSEIFIDN
    ELSEIF1     ELSEIFDIF   ELSEIFIDNI
    ELSEIF2     ELSEIFDIFI  ELSEIFNB
    ELSEIFB     ELSEIFE     ELSEIFNDEF

The following macro contains nested IF and ELSE blocks:

    ; Macro to load register for high-level-language return
    FuncRet MACRO   arg,length
            LOCAL   tmploc
            IF      length EQ 1
            mov     al,arg
            ELSE
            IF      length EQ 2
            mov     ax,arg
            ELSE
            IF      length EQ 4
            .DATA
    tmploc  DW      ?
            DW      ?
            .CODE
            mov     ax,WORD PTR arg
            mov     tmploc,ax
            mov     ax,WORD PTR arg+2
            mov     tmploc+2,ax
            mov     dx,SEG tmploc
            mov     ax,OFFSET tmploc
            ELSE
            %OUT    Error in FuncRet expansion
            .ERR
            ENDIF
            ENDIF
            ENDIF
            ENDM

The macro can be rewritten as follows, using the ELSEIF directives:

    FuncRet MACRO   arg,length
            LOCAL   tmploc
            IF      length EQ 1
            mov     al,arg
            ELSEIF  length EQ 2
            mov     ax,arg
            ELSEIF  length EQ 4
            .DATA
    tmploc  DW      ?
            DW      ?
            .CODE
            mov     ax,WORD PTR arg
            mov     tmploc,ax
            mov     ax,WORD PTR arg+2
            mov     tmploc+2,ax
            mov     dx,SEG tmploc
            mov     ax,OFFSET tmploc
            ELSE
            %OUT    Error in FuncRet expansion
            .ERR
            ENDIF
            ENDM

10.2 Using Conditional-Error Directives

Conditional-error directives can be used to debug programs and check for assembly-time errors. By inserting a conditional-error directive at a key point in your code, you can test assembly-time conditions at that point. You can also use conditional-error directives to test for boundary conditions in macros.

The conditional-error directives and the error messages they produce are listed in Table 10.1.

Table 10.1 Conditional-Error Directives

Directive        Number   Message
──────────────────────────────────────────────────────────────────────────
.ERR1            2087     Forced error - pass1
.ERR2            2088     Forced error - pass2
.ERR             2089     Forced error
.ERRE            2090     Forced error - expression equals 0
.ERRNZ           2091     Forced error - expression not equal 0
.ERRNDEF         2092     Forced error - symbol not defined
.ERRDEF          2093     Forced error - symbol defined
.ERRB            2094     Forced error - string blank
.ERRNB           2095     Forced error - string not blank
.ERRIDN[[I]]     2096     Forced error - strings identical
.ERRDIF[[I]]     2097     Forced error - strings different
──────────────────────────────────────────────────────────────────────────

All conditional-error directives except .ERR1 generate severe errors. Like other severe errors, those generated by conditional-error directives cause the assembler to return exit code 7. If a severe error is encountered during assembly, QuickAssembler deletes the object module.

10.2.1 Generating Unconditional Errors with .ERR, .ERR1, and .ERR2 Directives

The .ERR, .ERR1, and .ERR2 directives force an error where the directives occur in the source file. The error is generated unconditionally when the directive is encountered, but the directives can be placed within conditional-assembly blocks to limit the errors to certain situations.
Syntax

    .ERR
    .ERR1
    .ERR2

The .ERR directive forces an error regardless of the pass. The .ERR1 and .ERR2 directives force the error only on their respective passes. An error forced by .ERR1 appears on the screen or in the listing file only if you use the /D option to request a pass 1 listing.

You can place these directives within conditional-assembly blocks or macros to see which blocks are being expanded.

Example

            IFDEF     dos
            .
            .
            .
            ELSEIFDEF xenix
            .
            .
            .
            ELSE
            .ERR
            %OUT      dos or xenix must be defined
            ENDIF

This example makes sure that either the symbol dos or the symbol xenix is defined. If neither is defined, the ELSE block is assembled and an error message is generated. Because the .ERR directive is used, an error is generated on each pass. If you want the error to be generated on only one pass, use .ERR1 or .ERR2 instead.

10.2.2 Testing Expressions with .ERRE or .ERRNZ Directives

The .ERRE and .ERRNZ directives test the value of an expression and conditionally generate an error based on the result.

Syntax

    .ERRE expression
    .ERRNZ expression

The .ERRE directive generates an error if expression is false (0). The .ERRNZ directive generates an error if expression is true (nonzero). The expression must evaluate to a constant value and must not contain forward references.

Example

    buffer  MACRO   count,bname
            .ERRE   count LE 128    ;; Allocate memory, but
    bname   DB      count DUP(0)    ;; no more than 128 bytes
            ENDM
            .
            .
            .
            buffer  128,buf1        ; Data allocated - no error
            buffer  129,buf2        ; Error generated

In this example, the .ERRE directive is used to check the boundaries of a parameter passed to the macro buffer. If count is less than or equal to 128, the expression being tested by the error directive is true (nonzero) and no error is generated. If count is greater than 128, the expression is false (0) and an error is generated.
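The buffer macro's use of .ERRE is a build-time range check. As a rough C11 analogue (not QuickAssembler code), _Static_assert plays the role of .ERRE: compilation stops when its constant expression evaluates to 0. DEFINE_BUFFER and buf1_size are hypothetical names chosen to mirror the macro above.

```c
#include <stddef.h>

/* C11 sketch of the buffer macro: _Static_assert stops compilation
   when count exceeds 128 (like .ERRE count LE 128); otherwise a
   zero-filled array named bname is defined (static storage is
   zero-initialized, matching DUP(0)). */
#define DEFINE_BUFFER(count, bname)                                        \
    _Static_assert((count) <= 128, "buffer " #bname " exceeds 128 bytes"); \
    static unsigned char bname[(count)]

DEFINE_BUFFER(128, buf1);           /* Data allocated - no error */
/* DEFINE_BUFFER(129, buf2); */     /* Would stop compilation */

/* Reports the allocated size so callers can confirm the allocation. */
size_t buf1_size(void)
{
    return sizeof buf1;
}
```

An .ERRNZ-style check would correspond to asserting the negated expression, for example _Static_assert(!(expr), "message").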
10.2.3 Verifying Symbol Definition with .ERRDEF and .ERRNDEF Directives

The .ERRDEF and .ERRNDEF directives test whether a symbol is defined and conditionally generate an error based on the result.

Syntax

    .ERRDEF name
    .ERRNDEF name

The .ERRDEF directive produces an error if name is defined as a label, variable, or symbol. The .ERRNDEF directive produces an error if name has not yet been defined. If name is a forward reference, it is considered undefined on pass 1 but defined on pass 2.

Example

            .ERRNDEF publevel
            IF      publevel LE 2
            PUBLIC  var1, var2
            ELSE
            PUBLIC  var1, var2, var3
            ENDIF

In this example, the .ERRNDEF directive at the beginning of the conditional block makes sure that the symbol being tested in the block actually exists.

10.2.4 Testing for Macro Parameters with .ERRB and .ERRNB Directives

The .ERRB and .ERRNB directives test whether a specified argument was passed to a macro and conditionally generate an error based on the result.

Syntax

    .ERRB <argument>
    .ERRNB <argument>

These directives are always used inside macros, and they always test whether a real argument was passed for a specified dummy parameter. The .ERRB directive generates an error if argument is blank. The .ERRNB directive generates an error if argument is not blank. The argument can be any name, number, or expression. Angle brackets (<>) are required.

Example

    work    MACRO   realarg,testarg
            .ERRB   <realarg>       ;; Error if no arguments
            .ERRNB  <testarg>       ;; Error if more than one argument
            .
            .
            .
            ENDM

In this example, error directives are used to make sure that one, and only one, argument is passed to the macro. The .ERRB directive generates an error if no argument is passed to the macro. The .ERRNB directive generates an error if more than one argument is passed to the macro.

10.2.5 Comparing Macro Arguments with .ERRIDN and .ERRDIF Directives

The .ERRIDN and .ERRDIF directives compare two macro arguments and conditionally generate an error based on the result.
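Because these directives, like IFIDN and IFDIF in Section 10.1.5, compare the text of their arguments rather than any value that text might stand for, the comparison can be modeled as a string comparison. The C function below is such a model under stated assumptions: it compares character by character and ignores case only when asked, mimicking the optional I suffix. args_identical is a hypothetical name, and the sketch does not model how the assembler treats spaces inside the angle brackets.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical model of the IFIDN/IFDIF and .ERRIDN/.ERRDIF test:
   arguments are compared as text. ignore_case = true models the
   I suffix (IFIDNI, .ERRIDNI). */
bool args_identical(const char *a, const char *b, bool ignore_case)
{
    for (; *a != '\0' && *b != '\0'; ++a, ++b) {
        int ca = (unsigned char)*a;
        int cb = (unsigned char)*b;
        if (ignore_case) {
            ca = tolower(ca);
            cb = tolower(cb);
        }
        if (ca != cb)
            return false;
    }
    /* Identical only if both strings end at the same point. */
    return *a == *b;
}
```

Under this model, <AX> and <ax> match only when the I suffix is used, so .ERRIDNI <ax>,<ad2> in the addem example below catches AX written in either case, while <5> and <2+3> never match, because their text differs even though their values are equal.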
Syntax

    .ERRIDN[[I]] <argument1>,<argument2>
    .ERRDIF[[I]] <argument1>,<argument2>

These directives are always used inside macros, and they always compare the real arguments specified for two parameters. The .ERRIDN directive generates an error if the arguments are identical. The .ERRDIF directive generates an error if the arguments are different. The arguments can be names, numbers, or expressions. They must be enclosed in angle brackets and separated by a comma.

The optional I at the end of the directive name specifies that the directive is case insensitive: arguments that are spelled the same are evaluated the same, regardless of case. If the I is not given, the directive is case sensitive.

Example

    addem   MACRO   ad1,ad2,sum
            .ERRIDNI <ax>,<ad2>     ;; Error if ad2 is "ax"
            mov     ax,ad1          ;; Would overwrite if ad2 were AX
            add     ax,ad2
            mov     sum,ax          ;; Sum must be register or memory
            ENDM

In this example, the .ERRIDNI directive is used to protect against passing the AX register as the second argument, since this would cause the macro to fail.

────────────────────────────────────────────────────────────────────────────
Chapter 11: Using Equates, Macros, and Repeat Blocks

This chapter explains how to use equates, macros, and repeat blocks. "Equates" are constant values assigned to symbols so that the symbol can be used in place of the value. "Macros" are a series of statements that are assigned a symbolic name (and, optionally, parameters) so that the symbol can be used in place of the statements. "Repeat blocks" are a special form of macro used to repeat a block of statements.

Both equates and macros are processed at assembly time. They can simplify writing source code by allowing the user to substitute mnemonic names for constants and repetitive code. By changing a macro or equate, a programmer can change the effect of statements throughout the source code.

In exchange for these conveniences, the programmer loses some assembly-time efficiency.
Assembly may be slightly slower for a program that uses macros and equates extensively than for the same program written without them. However, the program without macros and equates usually takes longer to write and is more difficult to maintain.

11.1 Using Equates

The equate directives enable you to use symbols that represent numeric or string constants. QuickAssembler recognizes three kinds of equates:

1. Redefinable numeric equates
2. Nonredefinable numeric equates
3. String equates (also called text macros)

11.1.1 Redefinable Numeric Equates

Redefinable numeric equates are used to assign a numeric constant to a symbol. The value of the symbol can be redefined at any point during assembly. Although the value of a redefinable equate may be different at different points in the source code, a constant value is assigned for each use, and that value does not change at run time.

Redefinable equates are often used for assembly-time calculations in macros and repeat blocks.

Syntax

    name=expression

The equal-sign (=) directive creates or redefines a constant symbol by assigning the numeric value of expression to name. No storage is allocated for the symbol. The symbol can be used in subsequent statements as an immediate operand having the assigned value. It can be redefined at any time.

The expression can be an integer, a constant expression, a one- or two-character string constant, or an expression that evaluates to an address. The name must be either a unique name or a name previously defined by using the equal-sign (=) directive.

──────────────────────────────────────────────────────────────────────────
NOTE
Redefinable equates must be assigned numeric values. String constants longer than two characters cannot be used.
──────────────────────────────────────────────────────────────────────────

Example

    counter =       0               ; Initialize counter
    array   LABEL   BYTE            ; Label array of increasing numbers
            REPT    100             ; Repeat 100 times
            DB      counter         ; Initialize number
    counter =       counter + 1     ; Increment counter
            ENDM

This example redefines an equate inside a repeat block to declare a 100-byte array initialized to increasing values from 0 to 99. The equal-sign directive increments the counter symbol on each repetition. See Section 11.4 for more information on repeat blocks.

11.1.2 Nonredefinable Numeric Equates

Nonredefinable numeric equates are used to assign a numeric constant to a symbol. The value of the symbol cannot be redefined.

Nonredefinable numeric equates are often used for assigning mnemonic names to constant values. This can make the code more readable and easier to maintain. If a constant value used in numerous places in the source code needs to be changed, the equate can be changed in one place rather than throughout the source code.

Syntax

    name EQU expression

The EQU directive creates constant symbols by assigning expression to name. The assembler replaces each subsequent occurrence of name with the value of expression. Once a numeric equate has been defined with the EQU directive, it cannot be redefined. Attempting to do so generates an error.

──────────────────────────────────────────────────────────────────────────
NOTE
String constants can also be defined with the EQU directive, but the syntax is different, as described in Section 11.1.3, "String Equates."
──────────────────────────────────────────────────────────────────────────

No storage is allocated for the symbol. Symbols defined with numeric values can be used in subsequent statements as immediate operands having the assigned value.

Examples

    column  EQU     80              ; Numeric constant 80
    row     EQU     25              ; Numeric constant 25
    screenful EQU   column * row    ; Numeric constant 2000
    line    EQU     row             ; Alias for "row"

            .DATA
    buffer  DW      screenful

            .CODE
            .
            .
            .
            mov     cx,column
            mov     bx,line

11.1.3 String Equates

String equates (or text macros) are used to assign a string constant to a symbol. String equates can be used in a variety of contexts, including defining aliases and string constants.

Syntax

    name EQU {string | <string>}

The EQU directive creates constant symbols by assigning string to name. The assembler replaces each subsequent occurrence of name with string. Symbols defined to represent strings with the EQU directive can be redefined to new strings. Symbols cannot be defined to represent strings with the equal-sign (=) directive.

An alias is a special kind of string equate. It is a symbol that is equated to another symbol or keyword.

If you want an equate to be a string equate, you should use angle brackets to force the assembler to evaluate it as a string. If you do not use angle brackets, the assembler will try to guess from context whether a numeric or string equate is appropriate. This can lead to unexpected results. For example, the statement

    rt      EQU     run-time

would be evaluated as run minus time, even though the user might intend to define the string run-time. If run and time were not already defined as numeric equates, the statement would generate an error. Using angle brackets solves this problem. The statement

    rt      EQU     <run-time>

is evaluated as the string run-time.

Examples

    ; String equate definitions
    pi      EQU     <3.1415>        ; String constant "3.1415"
    prompt  EQU     <'Type Name: '> ; String constant "'Type Name: '"
    WPT     EQU     <WORD PTR>      ; String constant for "WORD PTR"
    argl    EQU     <[bp+4]>        ; String constant for "[bp+4]"

    ; Use of string equates
            .DATA
    message DB      prompt          ; Allocate string "Type Name: "
    pie     DQ      pi              ; Allocate real number 3.1415
            .CODE
            .
            .
            .
            inc     WPT argl        ; Increment word value of
                                    ;   argument passed on stack

Section 11.3, "Text-Macro String Directives," describes directives that enable you to manipulate strings. They are particularly powerful when you use them from within macros and repeat blocks, described later.
11.1.4 Predefined Equates

The assembler includes several predefined equates. The ones related to segments are described in Section 5.1.5, "Using Predefined Segment Equates." In addition, the following equates are available: @WordSize, @Cpu, and @Version.

The @WordSize equate returns the size of a word for the current segment. With QuickAssembler, this value is always equal to 2. However, other versions of the assembler can assign a different value to @WordSize when working with 80386 extended features.

──────────────────────────────────────────────────────────────────────────
NOTE
If you set the Preserve Case assembler flag or use the /Cl option, QuickAssembler considers predefined equates to be case-sensitive. The case-sensitive names of predefined equates are @WordSize, @Cpu, @Version, @CurSeg, @FileName, @CodeSize, @DataSize, @Model, @data, @data?, @fardata, @fardata?, and @code.
──────────────────────────────────────────────────────────────────────────

The @Cpu equate returns a 16-bit value containing information about the selected processor. You select a processor by using one of the processor directives, such as the .286 directive. You can use the @Cpu text macro to control assembly of processor-specific code. Individual bits in the value returned by @Cpu indicate information about the selected processor.

Bit    If Bit = 1
──────────────────────────────────────────────────────────────────────────
0      8086 processor
1      80186 processor
2      80286 processor
8      8087 coprocessor instructions enabled
10     80287 coprocessor instructions enabled

Because the processors are upwardly compatible, selecting a higher-numbered processor automatically sets the bits indicating lower-numbered processors. For example, selecting an 80286 processor automatically sets the 80186 and 8086 bits.

Bits 4 through 6, 9, and 12 through 15 are reserved for future use and should be masked off when testing.
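A program cannot read @Cpu at run time, but the bit arithmetic behind tests such as the P186 equate in the example that follows can be sketched in C. The macro and function names below (CPU_80186, cpu_known_bits, is_186_or_later) are hypothetical; only the bit positions and the reserved-bit list come from the table and text above. Bits 3, 7, and 11, discussed next, are deliberately left unmasked.

```c
#include <stdbool.h>
#include <stdint.h>

/* Documented @Cpu bit positions (the macro names are hypothetical). */
#define CPU_8086   0x0001u  /* bit 0  */
#define CPU_80186  0x0002u  /* bit 1  */
#define CPU_80286  0x0004u  /* bit 2  */
#define CPU_8087   0x0100u  /* bit 8  */
#define CPU_80287  0x0400u  /* bit 10 */

/* Reserved bits 4-6, 9, and 12-15, which should be masked off. */
#define CPU_RESERVED_MASK 0xF270u

/* Clears the reserved bits before any test is made. */
uint16_t cpu_known_bits(uint16_t cpu)
{
    return (uint16_t)(cpu & ~CPU_RESERVED_MASK);
}

/* Same test as P186 EQU (@Cpu AND 0002h): true for an 80186 or any
   later processor, because selecting a later processor also sets
   the 80186 bit. */
bool is_186_or_later(uint16_t cpu)
{
    return (cpu_known_bits(cpu) & CPU_80186) != 0;
}
```

A value such as CPU_8086 | CPU_80186 | CPU_80286 models the bits the text describes for an 80286 selection; the exact value QuickAssembler reports for a particular combination of processor and coprocessor directives is not asserted here.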
Bits 3, 7, and 11 have special meaning to Versions 5.1 and later of the Microsoft Macro Assembler: bit 3 indicates an 80386 processor, bit 7 indicates that privileged mode is enabled, and bit 11 indicates that 80387 coprocessor instructions are enabled.

──────────────────────────────────────────────────────────────────────────
NOTE
The @Cpu equate only provides information about the processor selected during assembly by one of the processor directives. It does not provide information about the processor actually used when a program is run.
──────────────────────────────────────────────────────────────────────────

The following example uses the @Cpu text macro to select more efficient instructions available only on the 80186 processor and above:

    ; Use the 186/286/386 pusha instruction if possible
    P186    EQU     (@Cpu AND 0002h)    ; Only test 186 bit--286 and
                                        ;   386 set 186 bit as well
            .
            .
            .
            IF      P186                ; Non-zero if 186 processor
            pusha                       ;   or above
            ELSE
            push    ax                  ; Do what the single
            push    cx                  ;   pusha instruction does
            push    dx
            push    bx
            push    sp
            push    bp
            push    si
            push    di
            ENDIF

The @Version equate returns the version of the assembler in use. With the @Version equate, you can write macros that take appropriate actions for different versions of the assembler. Currently, the @Version equate returns 520 as a string of three characters.

──────────────────────────────────────────────────────────────────────────
NOTE
Although the version number of QuickAssembler is 2.01, the @Version equate returns 520 rather than 201. The number 520 was selected because QuickAssembler is an enhancement of Version 5.1 of the Microsoft Macro Assembler. The @Version equate was first implemented in Version 5.1.
──────────────────────────────────────────────────────────────────────────

You can use conditional-assembly directives such as IF, IFE, and IFNDEF to test for different versions of the assembler and to assemble different code depending on the version.
            IFNDEF  @Version
            %OUT    MASM 5.0 or earlier has no extended PROC or .STARTUP
            ELSEIF  @Version EQ 510
            %OUT    MASM 5.1 has extended PROC, but not .STARTUP
            ELSEIF  @Version EQ 520
            %OUT    QuickAssembler 2.01 has extended PROC and .STARTUP
            ELSE
            %OUT    Future assembler
            ENDIF

11.2 Using Macros

Macros enable you to assign a symbolic name to a block of source statements and then to use that name in your source file to represent the statements. Parameters can also be defined to represent arguments passed to the macro.

Macro expansion is a text-processing function that occurs at assembly time. Each time QuickAssembler encounters the text associated with a macro name, it replaces that text with the text of the statements in the macro definition. Similarly, the text of parameter names is replaced with the text of the corresponding actual arguments.

A macro can be defined any place in the source file as long as the definition precedes the first source line that calls the macro. Macros and equates are often kept in a separate file and made available to the program through an INCLUDE directive (see Section 11.7.1, "Using Include Files") at the start of the source code.

Often a task can be done by using either a macro or a procedure. For example, the addup procedure shown in Section 15.3.3, "Passing Arguments on the Stack," does the same thing as the addup macro in Section 11.2.1, "Defining Macros." Macros are expanded on every occurrence of the macro name, so they can increase the length of the executable file if called repeatedly. Procedures are coded only once in the executable file, but the increased overhead of saving and restoring addresses and parameters can make them slower.

The sections below tell how to define and call macros. Repeat blocks, a special form of macro for doing repeated operations, are discussed separately in Section 11.4.

11.2.1 Defining Macros

The MACRO and ENDM directives are used to define macros.
MACRO designates the beginning of the macro block, and ENDM designates the end of the macro block.

Syntax

    name MACRO [[parameter [[,parameter]]...]]
    statements
    ENDM

The name must be unique and a valid symbol name. It can be used later in the source file to invoke the macro.

The parameters (sometimes called dummy parameters) are names that act as placeholders for values to be passed as arguments to the macro when it is called. Any number of parameters can be specified, but they must all fit on one line. If you give more than one parameter, you must separate them with commas, spaces, or tabs. Commas can always be used as separators; spaces and tabs may cause ambiguity if the arguments are expressions.

──────────────────────────────────────────────────────────────────────────
NOTE
This manual uses the term "parameter" to refer to a placeholder for a value that will be passed to a macro or procedure. Parameters appear in macro or procedure definitions. The term "argument" is used to refer to an actual value passed to the macro or procedure when it is called.
──────────────────────────────────────────────────────────────────────────

Any valid assembler statements may be placed within a macro, including statements that call or define other macros. Any number of statements can be used. The parameters can be used any number of times in the statements. Macros can be nested, redefined, or used recursively, as explained in Section 11.6, "Using Recursive, Nested, and Redefined Macros."

QuickAssembler assembles the statements in a macro only if the macro is called and only at the point in the source file from which it is called. The macro definition itself is never assembled.

A macro definition can include the LOCAL directive, which lets you define labels used only within a macro, or the EXITM directive, which allows you to exit from a macro before all the statements in the block are expanded.
These directives are discussed in Sections 11.2.3, "Using Local Symbols," and 11.2.4, "Exiting from a Macro." Macro operators can also be used in macro definitions, as described in Section 11.5, "Using Macro Operators."

Example

    addup   MACRO   ad1,ad2,ad3
            mov     ax,ad1          ;; First parameter in AX
            add     ax,ad2          ;; Add next two parameters
            add     ax,ad3          ;;   and leave sum in AX
            ENDM

This example defines a macro named addup, which uses three parameters to add three values and leave their sum in the AX register. The three parameters will be replaced with arguments when the macro is called.

11.2.2 Calling Macros

A macro call directs QuickAssembler to copy the statements of the macro to the point of the call and to replace any parameters in the macro statements with the corresponding actual arguments.

Syntax

    name [[argument [[,argument]]...]]

The name must be the name of a macro defined earlier in the source file. The arguments can be any text. For example, symbols, constants, and registers are often given as arguments. Any number of arguments can be given, but they must all fit on one logical line. You can use the continuation character (\) to continue long macro calls on multiple physical lines. Multiple arguments must be separated by commas, spaces, or tabs.

QuickAssembler replaces the first parameter with the first argument, the second parameter with the second argument, and so on. If a macro call has more arguments than the macro has parameters, the extra arguments are ignored. If a call has fewer arguments than the macro has parameters, any remaining parameters are replaced with a null (empty) string.

You can use conditional statements to enable macros to check for null strings or other types of arguments. The macro can then take appropriate action to adjust to different kinds of arguments. See Chapter 10, "Assembling Conditionally," for more information on using conditional-assembly and conditional-error directives to test macro arguments.
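The binding rules just described (first parameter to first argument, extra arguments ignored, missing arguments replaced by a null string) can be modeled in a few lines of C. The sketch below is a hypothetical model, not a description of QuickAssembler internals: bind_arg and expand_addup are illustrative names, and the code simply formats the three lines of the addup macro used in the example that follows.

```c
#include <stdio.h>

/* Hypothetical model of positional binding: parameter i receives
   argument i; a missing argument becomes an empty (null) string, and
   arguments beyond the last parameter are never looked at. */
static const char *bind_arg(const char *const args[], int nargs, int i)
{
    return (i < nargs) ? args[i] : "";
}

/* Expands the three-line body of the addup macro into out. */
void expand_addup(const char *const args[], int nargs, char out[3][64])
{
    const char *ad1 = bind_arg(args, nargs, 0);
    const char *ad2 = bind_arg(args, nargs, 1);
    const char *ad3 = bind_arg(args, nargs, 2);

    snprintf(out[0], 64, "mov ax,%s", ad1);
    snprintf(out[1], 64, "add ax,%s", ad2);
    snprintf(out[2], 64, "add ax,%s", ad3);
}
```

Calling expand_addup with only "bx" produces "add ax," for the second and third lines, which is exactly the kind of incomplete expansion that an IFB or .ERRB test inside the macro can catch.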
Example

    addup   MACRO   ad1,ad2,ad3     ; Macro definition
            mov     ax,ad1          ;; First parameter in AX
            add     ax,ad2          ;; Add next two parameters
            add     ax,ad3          ;;   and leave sum in AX
            ENDM
            .
            .
            .
            addup   bx,2,count      ; Macro call

When the addup macro is called, QuickAssembler replaces the parameters with the arguments given in the macro call. In the example above, the assembler would expand the macro call to the following code:

            mov     ax,bx
            add     ax,2
            add     ax,count

Whether this code appears in an assembler listing depends on which of the .LALL, .XALL, or .SALL directives is in effect (see Section 12.3, "Controlling the Contents of Listings").

11.2.3 Using Local Symbols

The LOCAL directive can be used within a macro to define symbols that are available only within the defined macro.

──────────────────────────────────────────────────────────────────────────
NOTE
In this context, the term "local" is not related to the public availability of a symbol, as described in Chapter 8, "Creating Programs from Multiple Modules," or to variables that are defined to be local to a procedure, as described in Section 15.3.5, "Using Local Variables." Local simply means that the symbol is not known outside the macro where it is defined.
──────────────────────────────────────────────────────────────────────────

Syntax

    LOCAL localname [[,localname]]...

The localname is a temporary symbol name that is to be replaced by a unique symbol name when the macro is expanded. At least one local name is required for each LOCAL directive. If more than one local symbol is given, the names must be separated with commas. Once declared, a local name can be used in any statement within the macro definition.

QuickAssembler creates a new actual name for localname each time the macro is expanded. The actual name has the following form:

    ??number

The number is a hexadecimal number in the range 0000 to 0FFFF. You should not give other symbols names in this format, since doing so may produce a symbol with multiple definitions.
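The renaming scheme can be sketched in Python. LocalNamer is a hypothetical class, and the model assumes that one counter is shared by every expansion in a file and that names are numbered in the order they appear on the LOCAL line, which matches the numbering given for the power example later in this section. The 0FFFF upper limit is not enforced.

```python
import itertools
import re


class LocalNamer:
    """Model LOCAL renaming: every expansion gets fresh ??nnnn names (hypothetical)."""

    def __init__(self):
        # One counter for the whole source file, so names never repeat.
        self._counter = itertools.count()

    def expand(self, local_names, body):
        # Number the names in the order they appear on the LOCAL line.
        mapping = {name: "??%04X" % next(self._counter) for name in local_names}
        pattern = re.compile(r"\b(" + "|".join(map(re.escape, local_names)) + r")\b")
        return [pattern.sub(lambda m: mapping[m.group(1)], line) for line in body]


namer = LocalNamer()
power_body = ["jcxz gotzero", "again: mul bx", "loop again", "gotzero:"]
print(namer.expand(["again", "gotzero"], power_body))  # again -> ??0000, gotzero -> ??0001
print(namer.expand(["again", "gotzero"], power_body))  # again -> ??0002, gotzero -> ??0003
```

Because each expansion draws new numbers, two calls to the same macro never define the same label twice.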
In listings, the local name is shown in the macro definition, but the actual name is shown in expansions of macro calls.

Nonlocal labels may be used in a macro, but if the macro is used more than once, the same label will appear in both expansions, and QuickAssembler will display an error message indicating that the file contains a symbol with multiple definitions. To avoid this problem, use only local labels (or redefinable equates) in macros.

──────────────────────────────────────────────────────────────────────────
NOTE
The LOCAL directive in macro definitions must precede all other statements in the definition. If you place another statement (such as a comment directive) before the LOCAL directive, an error is generated.
──────────────────────────────────────────────────────────────────────────

Example

    power   MACRO   factor,exponent ;; Use for unsigned only
            LOCAL   again,gotzero   ;; Declare symbols for macro
            xor     dx,dx           ;; Clear DX
            mov     cx,exponent     ;; Exponent is count for loop
            mov     ax,1            ;; Multiply by 1 first time
            jcxz    gotzero         ;; Get out if exponent is zero
            mov     bx,factor
    again:  mul     bx              ;; Multiply until done
            loop    again
    gotzero:
            ENDM

In this example, the LOCAL directive defines the local names again and gotzero as labels to be used within the power macro. These local names will be replaced with unique names each time the macro is expanded. For example, the first time the macro is called, again will be assigned the name ??0000 and gotzero will be assigned ??0001. The second time, again will be assigned ??0002 and gotzero will be assigned ??0003, and so on.

11.2.4 Exiting from a Macro

Normally, QuickAssembler processes all the statements in a macro definition and then continues with the next statement after the macro call. However, you can use the EXITM directive to tell the assembler to terminate macro expansion before all the statements in the macro have been assembled.
When the EXITM directive is encountered, the assembler exits the macro or repeat block immediately. Any remaining statements in the macro or repeat block are not processed. If EXITM is encountered in a nested macro or repeat block, QuickAssembler returns to expanding the outer block.

The EXITM directive is typically used with conditional directives to skip the last statements in a macro under specified conditions. Often macros using the EXITM directive contain repeat blocks or are called recursively.

Example

    allocate MACRO  times           ; Macro definition
    x       =       0
            REPT    times           ;; Repeat up to 256 times
            IF      x GT 0FFh       ;; Is x > 255 yet?
            EXITM                   ;; If so, quit
            ELSE
            DB      x               ;; Else allocate x
            ENDIF
    x       =       x + 1           ;; Increment x
            ENDM
            ENDM

This example defines a macro that allocates a variable amount of data, but no more than 256 bytes (the values 0 through 255). The IF directive tests the expression x GT 0FFh. When x reaches 256, the expression is true, the EXITM directive ends the repeat block, and no further data is allocated.

11.3 Text-Macro String Directives

The assembler includes four text-macro string directives that let you manipulate literal strings or text-macro values. You use the four directives in much the same way you use the equal-sign (=) directive. For example, the following line assigns the first three characters (abc) of the literal string to the label three by using the SUBSTR directive:

    three   SUBSTR  <abcdefghijklmnopqrstuvwxyz>,1,3

Depending on the directive, each one assigns its value to either a numeric label or a text macro. The following list summarizes the four directives and the type of label each should be used with:

    Directive   Description
    ──────────────────────────────────────────────────────────────────────
    SUBSTR      Returns a substring of its text-macro or literal-string
                argument. SUBSTR requires a text-macro label.

    CATSTR      Concatenates a variable number of strings (text macros or
                literal strings) to form a single string. CATSTR requires
                a text-macro label.

    SIZESTR     Returns the length, in characters, of its argument string.
                SIZESTR requires a numeric label.

    INSTR       Returns an index indicating the starting position of a
                substring within another string. INSTR requires a numeric
                label.

Strings used as arguments in the directives must be text enclosed in angle brackets (< >), previously defined text macros, or expressions starting with a percent sign (%). Numeric arguments can be numeric constants or expressions that evaluate to constants during assembly.

The next four sections describe the directives in more detail.

11.3.1 The SUBSTR Directive

The SUBSTR directive returns a substring from a given string.

Syntax

    textlabel SUBSTR string,start[[,length]]

The SUBSTR directive takes the following arguments:

    Argument    Description
    ──────────────────────────────────────────────────────────────────────
    textlabel   The text label the result is assigned to.

    string      The string the substring is extracted from.

    start       The starting position of the substring. The first
                character in the string has a position of one.

    length      The number of characters to extract. If omitted, SUBSTR
                returns all characters from position start to the end of
                the string, including the character at position start.

In the following lines, the text macro freg is assigned the first two characters of the text macro reglist:

    reglist EQU     <ax,bx,cx,dx>
            .
            .
            .
    freg    SUBSTR  reglist,1,2     ; freg = ax

11.3.2 The CATSTR Directive

The CATSTR directive concatenates a series of strings.

Syntax

    textlabel CATSTR string[[,string]]...
The CATSTR directive takes the following arguments:

    Argument    Description
    ──────────────────────────────────────────────────────────────────────
    textlabel   The text label the result is assigned to.

    string      The string or strings concatenated and assigned to
                textlabel.

The following line concatenates the two literal strings and assigns the result to the text macro lstring. CATSTR inserts no separators between the strings it joins; to separate them, pass the separator as a string of its own, as the SaveRegs macro in Section 11.3.5 does with <,>:

    lstring CATSTR  <a b c>,<d e f> ; lstring = a b cd e f

11.3.3 The SIZESTR Directive

The SIZESTR directive assigns the length of its argument string to a numeric label.

Syntax

    numericlabel SIZESTR string

The SIZESTR directive takes the following arguments:

    Argument      Description
    ──────────────────────────────────────────────────────────────────────
    numericlabel  The numeric label that the assembler assigns the string
                  length to.

    string        The string whose length is returned.

The following lines set slength to 8, the length of the text macro tstring:

    tstring EQU     <ax bx cx>
            .
            .
            .
    slength SIZESTR tstring         ; slength = 8

A null string has a length of zero.

11.3.4 The INSTR Directive

The INSTR directive returns the position of a string within another string. The directive returns 0 if the string is not found. The first character in a string has a position of one.

Syntax

    numericlabel INSTR [[start,]]string1,string2

The INSTR directive takes the following arguments:

    Argument      Description
    ──────────────────────────────────────────────────────────────────────
    numericlabel  The numeric label the substring's position is assigned
                  to.

    start         The starting position for the search. When omitted, the
                  INSTR directive starts searching at the first character.
                  The first character in the string has a position of one.

    string1       The string being searched.

    string2       The string to look for.

The following lines set colpos to the character position of the first colon in segarg:

    segarg  EQU     <ES:AX>
            .
            .
            .
    colpos  INSTR   segarg,<:>      ; colpos = 3

11.3.5 Using String Directives Inside Macros

The following example uses the text-macro string directives CATSTR, INSTR, SIZESTR, and SUBSTR. It defines two macros, SaveRegs and RestRegs, that save and restore registers on the stack. The macros are written so that RestRegs restores only the most recently saved group of registers.

The SaveRegs macro uses a text macro, regpushed, to keep track of the registers pushed onto the stack. The RestRegs macro uses this string to restore the proper registers. Each time the SaveRegs macro is invoked, it adds a pound sign (#) to the string to mark the start of a new group of registers. The RestRegs macro restores the most recently saved group by finding the first pound sign in the string, creating a substring containing the saved register names, and then looping and generating POP instructions. Because SaveRegs adds each register name to the front of regpushed, the names are already in reverse order, so the registers are popped in the opposite order from which they were pushed.

    ; Initialize regpushed to the null string

    regpushed EQU   <>

    ; SaveRegs
    ; Loops and generates a push for each argument register.
    ; Saves each register name in regpushed.

    SaveRegs MACRO  r1,r2,r3,r4,r5,r6,r7,r8,r9
    regpushed CATSTR <#>,regpushed         ;; Mark a new group of regs
             IRP    reg,<r1,r2,r3,r4,r5,r6,r7,r8,r9>
             IFNB   <reg>
             push   reg                    ;; Push and record a register
    regpushed CATSTR <reg>,<,>,regpushed
             ELSE
             EXITM                         ;; Quit on blank argument
             ENDIF
             ENDM
             ENDM

    ; RestRegs
    ; Generates a pop for each register in the most recently saved group.

    RestRegs MACRO
    numloc   INSTR  regpushed,<#>          ;; Find location of #
    reglist  SUBSTR regpushed,1,numloc-1   ;; Get list of registers to pop
    reglen   SIZESTR regpushed             ;; Adjust numloc if # is not last
             IF     reglen GT numloc       ;;   item in the string
    numloc   =      numloc + 1
             ENDIF
    regpushed SUBSTR regpushed,numloc      ;; Remove list from regpushed
    %        IRP    reg,<reglist>          ;; Generate pop for each register
             IFNB   <reg>
             pop    reg
             ENDIF
             ENDM
             ENDM

The following lines from a listing file show the code that the macros would generate (a 2 marks lines generated by the macros):

                     SaveRegs ax,bx
    2                push ax ;
    2                push bx ;
                     SaveRegs cx
    2                push cx ;
                     SaveRegs dx
    2                push dx ;
                     RestRegs
    2                pop dx
                     RestRegs
    2                pop cx
                     RestRegs
    2                pop bx
    2                pop ax

11.4 Defining Repeat Blocks

Repeat blocks are a special form of macro that allows you to create blocks of repeated statements. They differ from macros in that they are not named and thus cannot be called. However, like macros, they can have parameters that are replaced by actual arguments during assembly. Macro operators, symbols declared with the LOCAL directive, and the EXITM directive can be used in repeat blocks. Like macros, repeat blocks are always terminated by an ENDM directive.

Repeat blocks are frequently placed in macros in order to repeat some of the statements in the macro. They can also be used independently, usually for declaring arrays with repeated data elements.

Repeat blocks are processed at assembly time and should not be confused with the REP instruction, which causes string instructions to be repeated at run time, as explained in Chapter 16, "Processing Strings."
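The regpushed bookkeeping performed by SaveRegs and RestRegs in Section 11.3.5 can be traced with the Python sketch below. The functions substr, instr, sizestr, and catstr are hypothetical stand-ins that model the documented rules (positions start at one, INSTR returns 0 when the string is not found, CATSTR adds no separators), and RegTracker mirrors the two macros. It is a model for following the string manipulation, not assembler code.

```python
def substr(s, start, length=None):
    """SUBSTR model: start is 1-based; omitting length means 'to the end'."""
    begin = start - 1
    return s[begin:] if length is None else s[begin:begin + length]


def instr(s, sub, start=1):
    """INSTR model: 1-based position of sub in s, or 0 if not found."""
    return s.find(sub, start - 1) + 1


def sizestr(s):
    """SIZESTR model: length in characters (0 for the null string)."""
    return len(s)


def catstr(*parts):
    """CATSTR model: plain concatenation; no separators are inserted."""
    return "".join(parts)


class RegTracker:
    """Mirror of the SaveRegs/RestRegs macros, tracking the regpushed text macro."""

    def __init__(self):
        self.regpushed = ""  # regpushed EQU <>

    def save_regs(self, *regs):
        code = []
        self.regpushed = catstr("#", self.regpushed)  # mark a new group
        for reg in regs:
            if not reg:  # IFNB ... ELSE EXITM: stop at a blank argument
                break
            code.append("push " + reg)
            self.regpushed = catstr(reg, ",", self.regpushed)
        return code

    def rest_regs(self):
        numloc = instr(self.regpushed, "#")
        reglist = substr(self.regpushed, 1, numloc - 1)
        if sizestr(self.regpushed) > numloc:  # '#' is not the last item
            numloc += 1
        self.regpushed = substr(self.regpushed, numloc)
        # IRP over the list, skipping the blank item left by the trailing comma
        return ["pop " + reg for reg in reglist.split(",") if reg]


t = RegTracker()
for regs in (("ax", "bx"), ("cx",), ("dx",)):
    print(t.save_regs(*regs), t.regpushed)  # regpushed ends as dx,#cx,#bx,ax,#
for _ in range(3):
    print(t.rest_regs())  # ['pop dx'], then ['pop cx'], then ['pop bx', 'pop ax']
```

The trace shows why RestRegs skips one extra position when the pound sign is not last: that character is the marker of the group just restored. After the final RestRegs, a single "#" remains and acts as a sentinel for later calls.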
Three different kinds of repeat blocks can be defined by using the REPT, IRP, and IRPC directives. The difference between them is in how the number of repetitions is specified.

11.4.1 The REPT Directive

The REPT directive is used to create repeat blocks in which the number of repetitions is specified with a numeric argument.

Syntax

    REPT expression
    statements
    ENDM

The expression must evaluate to a numeric constant (a 16-bit unsigned number). It specifies the number of repetitions. Any valid assembler statements may be placed within the repeat block.

Example

    alphabet LABEL  BYTE
    x        =      0               ;; Initialize
             REPT   26              ;; Specify 26 repetitions
             DB     'A' + x         ;; Allocate ASCII code for letter
    x        =      x + 1           ;; Increment
             ENDM

This example repeats the equal-sign (=) and DB directives to initialize ASCII values for each uppercase letter of the alphabet.

11.4.2 The IRP Directive

The IRP directive is used to create repeat blocks in which the number of repetitions, as well as the argument for each repetition, is specified in a list of arguments.

Syntax

    IRP parameter,<argument[[,argument]]...>
    statements
    ENDM

The assembler statements inside the block are repeated once for each argument in the list enclosed by angle brackets (< >). The parameter is a name for a placeholder to be replaced by the current argument. Each argument can be text, such as a symbol, string, or numeric constant. Any number of arguments can be given. If multiple arguments are given, they must be separated by commas. The angle brackets (< >) around the argument list are required. The parameter can be used any number of times in the statements.

When QuickAssembler encounters an IRP directive, it makes one copy of the statements for each argument in the enclosed list. While copying the statements, it substitutes the current argument for all occurrences of parameter in these statements. If a null argument (< >) is found in the list, the parameter is replaced with a blank value.
If the argument list is empty, the IRP directive is ignored and no statements are copied.

Example

    numbers LABEL   BYTE
            IRP     x,<0,1,2,3,4,5,6,7,8,9>
            DB      10 DUP(x)
            ENDM

This example repeats the DB directive 10 times, allocating 10 bytes for each number in the list. The resulting statements create 100 bytes of data, starting with 10 zeros, followed by 10 ones, and so on.

11.4.3 The IRPC Directive

The IRPC directive is used to create repeat blocks in which the number of repetitions, as well as the argument for each repetition, is specified in a string.

Syntax

    IRPC parameter,string
    statements
    ENDM

The assembler statements inside the block are repeated as many times as there are characters in string. The parameter is a name for a placeholder to be replaced by the current character in the string. The string can be any combination of letters, digits, and other characters. It should be enclosed with angle brackets (< >) if it contains spaces, commas, or other separating characters. The parameter can be used any number of times in these statements.

When QuickAssembler encounters an IRPC directive, it makes one copy of the statements for each character in the string. While copying the statements, it substitutes the current character for all occurrences of parameter in these statements.

Example 1

    ten     LABEL   BYTE
            IRPC    x,0123456789
            DB      x
            ENDM

Example 1 repeats the DB directive 10 times, once for each character in the string 0123456789. The resulting statements create 10 bytes of data having the values 0-9.

Example 2

            IRPC    letter,ABCDEFGHIJKLMNOPQRSTUVWXYZ
            DB      '&letter'       ; Allocate uppercase letter
            DB      '&letter'+20h   ; Allocate lowercase letter
            DB      '&letter'-40h   ; Allocate number of letter
            ENDM

Example 2 allocates the ASCII codes for the uppercase and lowercase versions of each letter in the string, followed by the letter's number (its position in the alphabet, 1 for A). Notice that the substitute operator (&) is required so that letter, which appears inside quotation marks, is treated as a parameter rather than as literal string text. See Section 11.5.1, "Substitute Operator," for more information.
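The copy-and-substitute behavior of IRP and IRPC can be sketched in Python. The functions irp and irpc are hypothetical; they match the parameter as a whole word and do not model the substitute operator inside quotation marks, so Example 2's '&letter' form is outside the scope of this sketch.

```python
import re


def irp(param, args, body):
    """IRP model: one copy of body per argument, with param replaced each time."""
    pattern = re.compile(r"\b%s\b" % re.escape(param))
    copies = []
    for arg in args:  # an empty list produces no copies at all
        for line in body:
            copies.append(pattern.sub(lambda m, a=arg: a, line))
    return copies


def irpc(param, string, body):
    """IRPC model: like IRP, but each character of string is one argument."""
    return irp(param, list(string), body)


numbers = irp("x", [str(d) for d in range(10)], ["DB 10 DUP(x)"])
print(numbers[0], numbers[-1], len(numbers) * 10, "bytes")  # DB 10 DUP(0) DB 10 DUP(9) 100 bytes
print(irpc("x", "0123456789", ["DB x"]))
```

The only difference between the two directives in this model is where the argument list comes from: IRP takes a comma-separated list, IRPC takes the characters of a single string.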
11.5 Using Macro Operators

Macro and conditional directives use the following special set of macro operators:

    Operator    Definition
    ──────────────────────────────────────────────────────────────────────
    &           Substitute operator
    <>          Literal-text operator
    !           Literal-character operator
    %           Expression operator
    ;;          Macro comment

When used in a macro definition, a macro call, a repeat block, or as the argument of a conditional-assembly directive, these operators carry out special control operations, such as text substitution.

11.5.1 Substitute Operator

The substitute operator (&) enables substitution of macro parameters to take place, even when the parameter occurs within a larger word or within a quoted string.

Syntax

You can use the substitute operator in any one of three different ways:

    name1&name2
    name&
    &name

The assembler responds by analyzing name1 and name2 separately, then joining them together. If either name1 or name2 is a parameter, the assembler replaces each parameter by an actual argument before joining the names. You can join any number of names with the substitute operator, so that items such as a&b&c are valid.

The last two forms are useful when a parameter name appears within a quoted string. The assembler responds by substituting the actual argument for the parameter; when not next to an ampersand (&), the assembler considers the parameter name just part of the string data.

Example

    declare MACRO   x,y
    xy      DW      0
    x&y     DW      0
    x&str   DB      'x and y params are &x and &y'
            ENDM

The example above demonstrates how the presence or absence of the substitute operator affects macro substitution. Given the macro definition above, the statement

            declare foot,ball

is expanded to

    xy      DW      0
    football DW     0
    footstr DB      'x and y params are foot and ball'

In the first statement of the macro, xy is not identified with either of the parameters x or y; instead, xy forms a distinct name. No substitution takes place.
In the second statement, the assembler interprets x&y by analyzing x and y as separate names, performing the appropriate parameter substitution in each case, and then joining the resulting names together.

When you use the substitute operator with nested macros and repeat blocks, the assembler applies the following rules to expressions outside of quotes:

1. Perform parameter substitution as described above.

2. Remove exactly one of the substitute operators (&) from the group. The assembler strips off an operator whether or not a parameter was substituted.

The number of ampersands (&) you use should never be greater than the number of levels of nesting. If you use too few ampersands, proper substitution will not take place. If you use too many ampersands, text such as x&y will remain after macro expansion is complete. Such expressions are invalid.

When an ampersand appears inside quotes, the assembler removes ampersands on either side of a macro parameter when substitution is possible. When substitution is not possible (because the parameter name is not defined at the current level of nesting), the assembler leaves the ampersand as it is. With this method, parameter substitution automatically takes place at the appropriate level of nesting.

11.5.2 Literal-Text Operator

The literal-text operator (< >) directs QuickAssembler to treat a list as a single string rather than as separate arguments.

Syntax

    <text>

The text is considered a single literal element even if it contains commas, spaces, or tabs. The literal-text operator is most often used in macro calls and with the IRP directive to ensure that the values in a list are treated as a single argument.

The literal-text operator can also be used to force QuickAssembler to treat special characters, such as the semicolon or the ampersand, literally. For example, the semicolon inside angle brackets <;> becomes a semicolon, not a comment indicator.
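How angle brackets group a call's arguments can be modeled with a small splitter. split_args is a hypothetical helper: it splits only at commas outside angle brackets and removes one level of brackets per pass, and it does not model the ! operator, quoted strings, or spaces and tabs used as separators.

```python
def split_args(text):
    """Split a macro-call argument list at top-level commas (hypothetical model).

    Text inside <...> stays together as one argument, and one level of angle
    brackets is removed, as the literal-text operator does.
    """
    args, current, depth = [], [], 0
    for ch in text:
        if ch == "<":
            if depth > 0:
                current.append(ch)  # inner brackets survive for nested macros
            depth += 1
        elif ch == ">" and depth > 0:
            depth -= 1
            if depth > 0:
                current.append(ch)
        elif ch == "," and depth == 0:
            args.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    args.append("".join(current).strip())
    return args


print(split_args("1,2,3,4,5"))    # five arguments
print(split_args("<1,2,3,4,5>"))  # one argument: '1,2,3,4,5'
print(split_args("<<a,b>>,c"))    # one bracket level removed: ['<a,b>', 'c']
```

The last call shows why nested macros need one set of angle brackets per level: each pass strips only the outermost pair.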
QuickAssembler removes one set of angle brackets each time the parameter is used in a macro. When using nested macros, you will need to supply as many sets of angle brackets as there are levels of nesting.

Example

            work    1,2,3,4,5       ; Passes five arguments to "work"
            work    <1,2,3,4,5>     ; Passes one five-element
                                    ;   argument to "work"

When the IRP directive is used inside a macro definition and the argument list of the IRP directive is also a parameter of the macro, you must use the literal-text operator (< >) to enclose the macro parameter.

For example, in the following macro definition, the parameter x is used as the argument list for the IRP directive:

    init    MACRO   x
            IRP     y,<x>
            DB      y
            ENDM
            ENDM

If this macro is called with

            init    <0,1,2,3,4,5,6,7,8,9>

the macro removes the angle brackets from the argument so that it is expanded as 0,1,2,3,4,5,6,7,8,9. The brackets inside the repeat block are necessary to put the angle brackets back on. The repeat block is then expanded as shown below:

            IRP     y,<0,1,2,3,4,5,6,7,8,9>
            DB      y
            ENDM

11.5.3 Literal-Character Operator

The literal-character operator (!) forces the assembler to treat a specified character literally rather than as a symbol.

Syntax

    !character

The literal-character operator is used with special characters, such as the semicolon or ampersand, when the meaning of the special character must be suppressed. Using the literal-character operator is the same as enclosing a single character in angle brackets. For example, !! is the same as <!>.

Example

    errgen  MACRO   y,x
            PUBLIC  err&y
    err&y   DB      'Error &y: &x'
            ENDM
            .
            .
            .
            errgen  103,<Expression !> 255>

The example macro call is expanded to allocate the string Error 103: Expression > 255. Without the literal-character operator, the greater-than symbol would be interpreted as the end of the argument and an error would result.

11.5.4 Expression Operator

The expression operator (%) causes the assembler to treat the argument following the operator as an expression.
Syntax

    %text

QuickAssembler computes the expression's value and replaces text with the result. The expression can be either a numeric expression or a text equate. Additional arguments after an argument that uses the expression operator must be preceded by a comma, not a space or tab.

The expression operator is typically used in macro calls when the programmer needs to pass the result of an expression rather than the actual expression to a macro.

Example

    printe  MACRO   exp,val
            IF1                     ;; On pass 1 only
            %OUT    exp = val       ;; Display expression and result
            ENDIF                   ;;   to screen
            ENDM

    sym1    EQU     100
    sym2    EQU     200
    msg     EQU     <"Hello, World.">

            printe  <sym1 + sym2>,%(sym1 + sym2)
            printe  msg,%msg

In the first macro call, the text literal sym1 + sym2 is passed to the parameter exp, and the result of the expression is passed to the parameter val. In the second macro call, the equate name msg is passed to the parameter exp, and the text of the equate is passed to the parameter val. As a result, QuickAssembler displays the following messages:

    sym1 + sym2 = 300
    msg = "Hello, World."

The %OUT directive, which sends a message to the screen, is described in Section 12.1, "Sending Messages to the Standard Output Device"; the IF1 directive is described in Section 10.1.2, "Testing the Pass with IF1 and IF2 Directives."

You can also use the expression operator (%) to substitute the values of text macros for the macro names anywhere a text-macro name appears. When the expression operator is the first item on a line and is followed by one or more blanks or tabs, the line is scanned for text macros and the values of the macros are substituted. Using the expression operator, you can force substitution of text macros wherever they appear in a line. The assembler rescans the line until all substitutions have been made.

──────────────────────────────────────────────────────────────────────────
NOTE
Text macros are always evaluated when they appear in the name or operation fields. The expression operator is required to evaluate a text macro only when the macro appears in the operand field.
──────────────────────────────────────────────────────────────────────────

This use of the expression operator eliminates the need to do a macro call in order to evaluate a text macro. For example, the following macro uses a separate macro, popregs, to evaluate the text macro regpushed:

    regpushed EQU   <ax,bx,cx>
            .
            .
            .
    RestRegs MACRO
            popregs %regpushed
            ENDM

    popregs MACRO   reglist
            IRP     reg,<reglist>
            pop     reg
            ENDM
            ENDM

The use of the expression operator to evaluate text macros in a line makes the popregs macro unnecessary:

    regpushed EQU   <ax,bx,cx>
            .
            .
            .
    RestRegs MACRO
    %       IRP     reg,<regpushed> ;; % operator makes
                                    ;;   separate macro unnecessary
            pop     reg
            ENDM
            ENDM

You cannot use the EQU directive to assign a value to a text macro in a line evaluated with the expression operator. For example, the following lines generate an error:

    strpos  EQU     <[si]+12>
            .
            .
            .
    %       wpstrpos EQU <WORD PTR strpos>

On pass 1, wpstrpos is defined as a text macro that is expanded on pass 2. Thus, on pass 2 the second EQU directive becomes

    WORD PTR [si]+12 EQU <WORD PTR [si]+12>

and generates an error. Instead, use the CATSTR directive to assign values to text macros (see Section 11.3, "Text-Macro String Directives," for more information about CATSTR and other text-macro string directives). The previous example should be rewritten as follows:

    strpos  EQU     <[si]+12>
            .
            .
            .
    wpstrpos CATSTR <WORD PTR >,strpos

If the text macro evaluates to a valid name, there is no error when you use EQU. The following lines do not generate an error, but define two names, one (numlabel) with the value 5, the other (tmacro) with the value <numlabel>:

    tmacro  EQU     <numlabel>
    %       tmacro  EQU     5

You can also use the substitute operator (&) with text macros just as you would inside a macro:

    SegName EQU     <MySeg>
            .
            .
            .
    %       SegName&_text SEGMENT PUBLIC 'CODE'

The final line, after expanding the text macro, becomes:

    MySeg_text SEGMENT PUBLIC 'CODE'

The substitute operator separates the text-macro name from the text that immediately follows it. Without the substitute operator, the name appears to the assembler as SegName_text, and the assembler fails to recognize the text macro.

11.5.5 Macro Comments

A macro comment is any text in a macro definition that does not need to be copied in the macro expansion. A double semicolon (;;) is used to start a macro comment.

Syntax

    ;;text

All text following the double semicolon (;;) is ignored by the assembler and will appear only in the macro definition when the source listing is created.

The regular comment operator (;) can also be used in macros. However, regular comments may appear in listings when the macro is expanded. Macro comments will appear in the macro definition, but not in macro expansions. Whether or not regular comments are listed in macro expansions depends on the use of the .LALL, .XALL, and .SALL directives, as described in Section 12.3, "Controlling the Contents of Listings."

11.6 Using Recursive, Nested, and Redefined Macros

The concept of replacing macro names with predefined macro text is simple, but in practice it has many implications and potentially unexpected side effects. The following sections discuss advanced macro features (such as nesting, recursion, and redefinition) and point out some side effects of macros.

11.6.1 Using Recursion

Macro definitions can be recursive; that is, they can call themselves. Using recursive macros is one way of doing repeated operations. The macro does a task, and then calls itself to do the task again. The recursion is repeated until a specified condition is met.

Example

    pushall MACRO   reg1,reg2,reg3,reg4,reg5,reg6
            IFNB    <reg1>          ;; If parameter not blank
            push    reg1            ;;   push one register and repeat
            pushall reg2,reg3,reg4,reg5,reg6
            ENDIF
            ENDM
            .
            .
            .
            pushall ax,bx,si,ds
            pushall cs,es

In this example, the pushall macro repeatedly calls itself to push a register given in a parameter until no parameters are left to push. A variable number of arguments (up to six) can be given.

11.6.2 Nesting Macro Definitions

One macro can define another. QuickAssembler does not process nested definitions until the outer macro has been called. Therefore, nested macros cannot be called until the outer macro has been called at least once. Macro definitions can be nested to any depth. Nesting is limited only by the amount of memory available when the source file is assembled.

Using a macro to create similar macros can make maintenance easier. If you want to change all the macros, change the outer macro and it automatically changes the others.

Example

    shifts  MACRO   opname          ; Define macro that defines macros
    opname&s MACRO  operand,rotates
            IF      rotates LE 4
            REPT    rotates
            opname  operand,1       ;; One at a time is faster
            ENDM                    ;;   for 4 or less on 8088/8086
            ELSE
            mov     cl,rotates      ;; Using CL is faster
            opname  operand,cl      ;;   for more than 4 on 8088/8086
            ENDIF
            ENDM
            ENDM

            shifts  ror             ; Call macro to define new macros
            shifts  rol
            shifts  shr
            shifts  shl
            shifts  rcl
            shifts  rcr
            shifts  sal
            shifts  sar
            .
            .
            .
            shrs    ax,5            ; Call defined macros
            rols    bx,3

This macro, when called as shown, creates macros for multiple shifts with each of the shift and rotate instructions. The names of the created macros differ only in the instruction: each is the instruction name followed by s. For example, the macro for the SHR instruction is called shrs; the macro for the ROL instruction is called rols. If you want to enhance the macros by doing more parameter checking, you can modify the original macro. Doing so will change the created macros automatically. This macro uses the substitute operator, as described in Section 11.5.1.

11.6.3 Nesting Macro Calls

Macro definitions can contain calls to other macros. Nested macro calls are expanded like any other macro call, but only when the outer macro is called.
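That rule, together with the recursion shown in Section 11.6.1, can be modeled by an expander that rescans every line a macro generates for further macro calls. In this Python sketch the function pushall stands in for the pushall macro body, and MACROS, expand, and the max_depth guard are hypothetical additions of the model; this section does not state QuickAssembler's own nesting limit.

```python
def pushall(*regs):
    """Body of the recursive pushall macro: six parameters, IFNB <reg1> test."""
    regs = regs[:6]  # arguments beyond the six parameters are ignored
    if regs and regs[0]:  # IFNB <reg1>
        return ["push " + regs[0], "pushall " + ",".join(regs[1:])]
    return []  # blank first argument: the IFNB block generates nothing


MACROS = {"pushall": pushall}  # hypothetical macro table


def expand(line, depth=0, max_depth=100):
    """Expand one line; every macro call a body generates is expanded in turn."""
    if depth > max_depth:
        raise RecursionError("macro nesting too deep")
    name, _, rest = line.partition(" ")
    macro = MACROS.get(name)
    if macro is None:
        return [line]  # ordinary statement: copied unchanged
    args = [a.strip() for a in rest.split(",")] if rest.strip() else []
    out = []
    for generated in macro(*args):
        out.extend(expand(generated, depth + 1, max_depth))
    return out


print(expand("pushall ax,bx,si,ds"))  # ['push ax', 'push bx', 'push si', 'push ds']
print(expand("pushall cs,es"))        # ['push cs', 'push es']
```

The inner call to pushall exists only as generated text until the outer expansion produces it, which is the sense in which nested calls are expanded only when the outer macro is called.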
Example

    ex      MACRO   text,val        ; Inner macro definition
            IF2
            %OUT    The expression (text) has the value: &val
            ENDIF
            ENDM

    express MACRO   expression      ; Outer macro definition
            ex      <expression>,%(expression)
            ENDM
            .
            .
            .
            express <4 + 2 * 7 - 3 MOD 4>

The two sample macros enable you to print the result of a complex expression to the screen by using the %OUT directive, even though that directive expects text rather than an expression (see Section 12.1, "Sending Messages to the Standard Output Device"). Being able to see the value of an expression is convenient during debugging.

Both macros are necessary. The express macro calls the ex macro, using operators to pass the expression both as text and as the value of the expression. With the call in the example, the assembler sends the following line to the standard output:

    The expression (4 + 2 * 7 - 3 MOD 4) has the value: 15

You could get the same output by using only the ex macro, but you would have to type the expression twice and supply the macro operators in the correct places yourself. The express macro does this for you automatically. Notice that expressions containing spaces must still be enclosed in angle brackets. Section 11.5.2, "Literal-Text Operator," explains why.

11.6.4 Redefining Macros

Macros can be redefined. You do not need to purge the macro before redefining it. The new definition automatically replaces the old definition. If you redefine a macro from within the macro itself, make sure there are no statements or comments between the ENDM directive of the nested redefinition and the ENDM directive of the original macro.

Example

    getasciiz MACRO
            .DATA
    max     DB      80
    actual  DB      ?
    tmpstr  DB      80 DUP (?)
            .CODE
            mov     ah,0Ah          ;; DOS function 0Ah: buffered input
            mov     dx,OFFSET max   ;; DS:DX points to input buffer
            int     21h
            mov     bl,actual       ;; Number of characters read
            xor     bh,bh
            mov     tmpstr[bx],0    ;; Replace CR with null terminator
    getasciiz MACRO
            mov     ah,0Ah
            mov     dx,OFFSET max
            int     21h
            mov     bl,actual
            xor     bh,bh
            mov     tmpstr[bx],0
            ENDM
            ENDM

This macro allocates data space the first time it is called, and then redefines itself so that it doesn't try to reallocate the data on subsequent calls.

11.6.5 Avoiding Inadvertent Substitutions

QuickAssembler replaces every occurrence of a parameter with the corresponding argument, even if the substitution is inappropriate. For example, if you use a register name, such as AX or BH, as a parameter, QuickAssembler replaces all occurrences of that name when it expands the macro. If the macro definition contains statements that use the register, not the parameter, the macro will be incorrectly expanded. QuickAssembler will not warn you about using reserved names as macro parameters.

QuickAssembler does give a warning if you use a reserved name as a macro name. You can ignore the warning, but be aware that the reserved name will no longer have its original meaning. For example, if you define a macro called ADD, the ADD instruction will no longer be available. Your ADD macro takes its place.

11.7 Managing Macros and Equates

Macros and equates are often kept in a separate file and read into the assembler source file at assembly time. In this way, libraries of related macros and equates can be used by many different source files.

The INCLUDE directive is used to read an include file into a source file. Memory can be saved by using the PURGE directive to delete unneeded macros from memory.

11.7.1 Using Include Files

The INCLUDE directive inserts source code from a specified file into the source file from which the directive is given.

Syntax

    INCLUDE filespec

The filespec must specify an existing file containing valid assembler statements. When the assembler encounters an INCLUDE directive, it opens the specified source file and begins processing its statements.
When all statements have been read, QuickAssembler continues with the statement immediately following the INCLUDE directive.

The filespec can be given either as a file name, or as a complete or relative file specification, including drive or directory name. If a complete or relative file specification is given, QuickAssembler looks for the include file only in the specified directory. If a file name is given without a directory or drive name, QuickAssembler looks for the file in the following order:

    1. If paths are specified with the /I option, QuickAssembler looks for
       the include file in the specified directory or directories.

    2. QuickAssembler looks for the include file in the current directory.

    3. If an INCLUDE environment variable is defined, QuickAssembler looks
       for the include file in the directory or directories specified in
       the environment variable.

Nested INCLUDE directives are allowed. QuickAssembler marks included statements with the letter "C" in assembly listings.

Directories can be specified in INCLUDE path names with either the backslash (\) or the forward slash (/). This is for XENIX(R) compatibility.

──────────────────────────────────────────────────────────────────────────
NOTE
    Any standard code can be placed in an include file. However, include
    files are usually used only for macros, equates, and standard segment
    definitions. Standard procedures are usually assembled into separate
    object files and linked with the main source modules.
──────────────────────────────────────────────────────────────────────────

Examples

    INCLUDE fileio.mac                  ; File name only; use with
                                        ;   /I or environment
    INCLUDE b:\include\keybd.inc        ; Complete file specification
    INCLUDE /usr/jons/include/stdio.mac ; Path name in XENIX format
    INCLUDE masm_inc\define.inc         ; Partial path name in DOS format
                                        ;   (relative to current directory)

11.7.2 Purging Macros from Memory

The PURGE directive can be used to delete a currently defined macro from memory.
Syntax

    PURGE macroname[[,macroname]]...

Each macroname is deleted from memory when the directive is encountered at assembly time. Any subsequent call to that macro causes the assembler to generate an error.

The PURGE directive is intended to clear memory space no longer needed by a macro. If a macro has been used to redefine a reserved name, the reserved name is restored to its previous meaning.

The PURGE directive can be used to clear memory if a macro or group of macros is needed only for part of a source file.

It is not necessary to purge a macro before redefining it. Any redefinition of a macro automatically purges the previous definition. Also, a macro can purge itself as long as the PURGE directive is on the last line of the macro.

The PURGE directive works by redefining the macro to a null string. Therefore, calling a purged macro does not cause an error. The macro name is simply ignored.

Examples

    GetStuff
    PURGE   GetStuff

This example calls a macro and then purges it. You might need to purge macros in this way if your system does not have enough memory to hold all of the macros needed for a source file at the same time.

────────────────────────────────────────────────────────────────────────────
Chapter 12: Controlling Assembly Output

QuickAssembler has two ways of communicating results. It can write information to a listing or object file, or it can display messages on the standard output device (normally the screen). Both kinds of output can be controlled by menu commands and by source-file statements. This chapter explains the directives that control output through source-file statements.

12.1 Sending Messages to the Standard Output Device

The %OUT directive instructs the assembler to display text in the program output window.

Syntax

    %OUT text

The text can be any line of ASCII characters. To display multiple lines, use a separate %OUT directive for each line.
The directive is useful for displaying messages at specific points of a long assembly. It can be used inside conditional-assembly blocks to display messages when certain conditions are met.

The %OUT directive generates output on both assembly passes. The IF1 and IF2 directives can be used to control the pass on which the directive is processed. Macros that let you output the value of expressions are shown in Section 11.6.3, "Nesting Macro Calls."

Example

    IF1
    %OUT First Pass - OK
    ENDIF

This sample block could be placed at the end of a source file so that the message First Pass - OK is displayed at the end of the first pass but ignored on the second pass.

12.2 Controlling Page Format in Listings

QuickAssembler provides several directives for controlling the page format of listings. These directives include the following:

    Directive   Action
    ──────────────────────────────────────────────────────────────────────
    TITLE       Sets title for listings
    SUBTTL      Sets title for sections in listings
    PAGE        Sets page length and width, and controls page and section
                breaks

12.2.1 Setting the Listing Title

The TITLE directive specifies a title to be used on each page of assembly listings. In editor-based listing files (the default inside the QC environment), the title is printed only once, at the top of the file.

Syntax

    TITLE text

The text can be any combination of characters up to 60 characters in length. The title is printed flush left on the second line of each page of the listing. If no TITLE directive is given, the title is blank. No more than one TITLE directive per module is allowed.

Example

    TITLE   Graphics Routines

This example sets the listing title. A page heading that reflects this title is shown below:

    Microsoft (R) QuickC with QuickAssembler Version 2.01    9/25/89 12:00:00
    Graphics Routines                                         Page     1-2

12.2.2 Setting the Listing Subtitle

The SUBTTL directive specifies the subtitle used on each page of assembly listings.
In editor-based listing files (the default inside the QC environment), the subtitle is ignored.

Syntax

    SUBTTL text

The text can be any combination of characters up to 60 characters in length. The subtitle is printed flush left on the third line of the listing pages. If no SUBTTL directive is used, or if no text is given for a SUBTTL directive, the subtitle line is left blank.

Any number of SUBTTL directives can be given in a program. Each new directive replaces the current subtitle with the new text. SUBTTL directives are often used just before a PAGE + statement, which creates a new section (see Section 12.2.3, "Controlling Page Breaks").

Example

    SUBTTL  Point Plotting Procedure
    PAGE    +

The example above sets a section title, then creates a page break and a new section. A page heading that reflects this title is shown below:

    Microsoft (R) QuickC with QuickAssembler Version 2.01    9/25/89 12:00:00
    Graphics Routines                                         Page     3-1
    Point Plotting Procedure

12.2.3 Controlling Page Breaks

The PAGE directive can be used to set the page length and line width of the program listing, to start a new section (incrementing the section number and resetting the page number), or to generate a page break in the listing. In editor-based listing files (the default inside the QC environment), page-break directives are ignored, except for the page width specifier.

Syntax

    PAGE [[[[length]],width]]
    PAGE +

If length and width are specified, the PAGE directive sets the maximum number of lines per page to length and the maximum number of characters per line to width. The length must be in the range of 10-255 lines. The default page length is 50 lines. The width must be in the range of 60-132 characters. The default page width is 80 characters. With editor-based listing files, the default page width is 255. To specify the width without changing the current length, put a comma before width.
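The argument rules for PAGE length,width can be sketched as a small C model. This is a hypothetical illustration, not QuickAssembler's implementation: the `listing_page` struct and the `default_listing_page` and `apply_page_args` functions are invented names, an omitted argument is represented by 0, and an out-of-range value is simply rejected. The real assembler's diagnostics are not modeled. Only printer-style listings are covered (defaults of 50 lines and 80 characters); the editor-based default width of 255 is not.

```c
#include <stdbool.h>

/* Hypothetical model of the two PAGE settings; not QuickAssembler code. */
struct listing_page {
    int length; /* maximum lines per page, 10-255 */
    int width;  /* maximum characters per line, 60-132 */
};

/* Starting values for a printer-style listing (defaults 50 and 80). */
struct listing_page default_listing_page(void)
{
    struct listing_page page = { 50, 80 };
    return page;
}

/* Models "PAGE length,width". An argument of 0 stands for an omitted
   value, so "PAGE ,132" corresponds to apply_page_args(&page, 0, 132).
   Out-of-range values are rejected and *page is left unchanged. */
bool apply_page_args(struct listing_page *page, int length, int width)
{
    if (length != 0 && (length < 10 || length > 255))
        return false;
    if (width != 0 && (width < 60 || width > 132))
        return false;
    if (length != 0)
        page->length = length;
    if (width != 0)
        page->width = width;
    return true;
}
```

Starting from the defaults, apply_page_args(&page, 0, 132) leaves the length at 50 and sets the width to 132, which is the behavior described for PAGE ,132.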
If no argument is given, PAGE starts a new page in the program listing by copying a form-feed character to the file and generating new title and subtitle lines.

If a plus sign follows PAGE, a page break occurs, the section number is incremented, and the page number is reset to 1. Program-listing page numbers have the following format:

    section-page

The section is the section number within the module, and page is the page number within the section. By default, section and page numbers begin with 1-1. The SUBTTL directive and the PAGE directive can be used together to start a new section with a new subtitle. See Section 12.2.2, "Setting the Listing Subtitle," for an example.

Example 1

    PAGE

Example 1 creates a page break.

Example 2

    PAGE    58,90

Example 2 sets the maximum page length to 58 lines and the maximum width to 90 characters.

Example 3

    PAGE    ,132

Example 3 sets the maximum width to 132 characters. The current page length (either the default of 50 or a previously set value) remains unchanged.

Example 4

    PAGE    +

Example 4 creates a page break, increments the current section number, and sets the page number to 1. For example, if the preceding page was 3-6, the new page would be 4-1.

12.2.4 Naming the Module

The assembler automatically uses the base file name of the source file as the name of the module. This name is used to identify error messages when you run the assembler from the command line.

12.3 Controlling the Contents of Listings

QuickAssembler provides several directives for controlling what text will be shown in listings.
The directives that control the contents of listings are shown below:

    Directive   Action
    ──────────────────────────────────────────────────────────────────────
    .LIST       Lists statements in program listing
    .XLIST      Suppresses listing of statements
    .LFCOND     Lists false-conditional blocks in program listing
    .SFCOND     Suppresses false-conditional listing
    .TFCOND     Toggles false-conditional listing
    .LALL       Includes macro expansions in program listing
    .SALL       Suppresses listing of macro expansions
    .XALL       Lists only macro-expansion statements that generate code
                or data

12.3.1 Suppressing and Restoring Listing Output

The .LIST and .XLIST directives specify which source lines are included in the program listing.

Syntax

    .LIST
    .XLIST

The .XLIST directive suppresses copying of subsequent source lines to the program listing. The .LIST directive restores copying. The directives are typically used in pairs to keep a particular section of a source file out of the program listing.

The .XLIST directive overrides other listing directives, such as .SFCOND or .LALL.

Example

    .XLIST          ; Listing suspended here
    .
    .
    .
    .LIST           ; Listing resumes here
    .
    .
    .

12.3.2 Controlling Listing of Conditional Blocks

The .SFCOND, .LFCOND, and .TFCOND directives control whether false-conditional blocks are included in assembly listings.

Syntax

    .SFCOND
    .LFCOND
    .TFCOND

The .SFCOND directive suppresses the listing of any subsequent conditional blocks whose condition is false. The .LFCOND directive restores the listing of these blocks. Like .LIST and .XLIST, the conditional-listing directives can be used to suppress listing of conditional blocks in selected sections of a program.

The .TFCOND directive toggles the current status of listing of conditional blocks. This directive can be used in conjunction with the /Sx option of the assembler. By default, false-conditional blocks are not listed when assembly starts. However, they are listed from the start if the /Sx option is given.
This means that using /Sx reverses the meaning of the first .TFCOND directive in the source file. The /Sx option is discussed in Appendix B, Section B.14, "Listing False Conditionals."

Example

    test1   EQU     0               ; Defined to make all conditionals false

                                    ; /Sx not used    /Sx used
            .TFCOND
            IFNDEF  test1           ; Listed          Not listed
    test2   DB      128
            ENDIF

            .TFCOND
            IFNDEF  test1           ; Not listed      Listed
    test3   DB      128
            ENDIF

            .SFCOND
            IFNDEF  test1           ; Not listed      Not listed
    test4   DB      128
            ENDIF

            .LFCOND
            IFNDEF  test1           ; Listed          Listed
    test5   DB      128
            ENDIF

In the example above, the listing status of the first two conditional blocks depends on whether the /Sx option was used. The blocks after .SFCOND and .LFCOND are not affected by the /Sx option.

12.3.3 Controlling Listing of Macros

The .LALL, .XALL, and .SALL directives control the listing of expanded macro calls. The assembler always lists the full macro definition; the directives affect only the expansion of macro calls.

Syntax

    .LALL
    .XALL
    .SALL

The .LALL directive causes QuickAssembler to list all the source statements in a macro expansion, including normal comments (preceded by a single semicolon) but not macro comments (preceded by a double semicolon).

The .XALL directive causes QuickAssembler to list only those source statements in a macro expansion that generate code or data. For instance, comments, equates, and segment definitions are omitted.

The .SALL directive causes QuickAssembler to suppress listing of all macro expansions. The listing shows the macro call, but not the source lines generated by the call.

The .XALL directive is in effect when QuickAssembler first begins execution.

Example

    tryout  MACRO   param
            ;; Macro comment
            ; Normal comment
    it      EQU     3               ; No code or data
            ASSUME  es:_DATA        ; No code or data
            DW      param           ; Generates data
            mov     ax,it           ; Generates code
            ENDM
            .
            .
            .
            .LALL
            tryout  6               ; Call with .LALL
            .XALL
            tryout  6               ; Call with .XALL
            .SALL
            tryout  6               ; Call with .SALL

The macro calls in the example generate the following listing lines:

                                   .LALL
                                   tryout  6        ; Call with .LALL
                             1     ; Normal comment
     = 0003                  1  it EQU     3        ; No code or data
                             1     ASSUME  es:_DATA ; No code or data
     0015  0006              1     DW      6        ; Generates data
     0017  B8 0003           1     mov     ax,it    ; Generates code
                                   .XALL
                                   tryout  6        ; Call with .XALL
     001A  0006              1     DW      6        ; Generates data
     001C  B8 0003           1     mov     ax,it    ; Generates code
                                   .SALL
                                   tryout  6        ; Call with .SALL

Notice that the macro comment is never listed in macro expansions. Normal comments are listed only with the .LALL directive.

────────────────────────────────────────────────────────────────────────────
PART 3: Using Instructions

Part 3 of the Programmer's Guide (Chapters 13-18) explains how to use instructions in assembly-language source code. Instructions are the actual steps of your program; they are translated into the machine-code instructions that the processor executes at run time.

Part 3 is organized by topic, with related instructions discussed together. Chapter 13 explains the instructions for moving data from one location to another. The instructions for performing calculations on numbers and bits are covered in Chapter 14. The 8086-family processors provide four major types of instructions for controlling program flow, as described in Chapter 15. Chapter 16 explains the instructions and techniques for processing strings. The 8087-family coprocessors and their instructions are explained in Chapter 17. Finally, Chapter 18 summarizes the instructions available for processor control.

────────────────────────────────────────────────────────────────────────────
Chapter 13: Loading, Storing, and Moving Data

The 8086-family processors provide several instructions for loading, storing, or moving various kinds of data. Among the types of transferable data are variables, pointers, and flags.
Data can be moved to and from registers, memory, ports, and the stack. This chapter explains the instructions for moving data from one location to another.

13.1 Transferring Data

Moving data is one of the most common tasks in assembly-language programming. Data can be moved between registers or between memory and registers. Immediate data can be loaded into registers or into memory. However, memory-to-memory operations are never allowed. To move data from one memory location to another, you must first move the data to an intermediate register.

13.1.1 Copying Data

The MOV instruction is the most common method of moving data. It can be thought of as a copy instruction, since it always copies the source operand to the destination operand. Immediately after a MOV instruction, the source and destination operands both contain the same value. The old value in the destination operand is destroyed.

Syntax

    MOV {register | memory},{register | memory | immediate}

Example 1

    mov     ax,7            ; Immediate to register
    mov     mem,7           ; Immediate to memory direct
    mov     mem[bx],7       ; Immediate to memory indirect
    mov     mem,ds          ; Segment register to memory
    mov     mem,ax          ; Register to memory direct
    mov     mem[bx],ax      ; Register to memory indirect
    mov     ax,mem          ; Memory direct to register
    mov     ax,mem[bx]      ; Memory indirect to register
    mov     ds,mem          ; Memory to segment register
    mov     ax,bx           ; Register to register
    mov     ds,ax           ; General register to segment register
    mov     ax,ds           ; Segment register to general register

The statements in Example 1 illustrate each type of move that can be done with a single instruction. Example 2 illustrates several common types of moves that require two instructions.
Example 2

    ; Move immediate to segment register
    mov     ax,DGROUP       ; Load immediate to general register
    mov     ds,ax           ; Store general register to segment register

    ; Move memory to memory
    mov     ax,mem1         ; Load memory to general register
    mov     mem2,ax         ; Store general register to memory

    ; Move segment register to segment register
    mov     ax,ds           ; Load segment register to general register
    mov     es,ax           ; Store general register to segment register

13.1.2 Exchanging Data

The XCHG (Exchange) instruction exchanges the data in the source and destination operands. Data can be exchanged between registers or between registers and memory.

Syntax

    XCHG {register | memory},{register | memory}

Examples

    xchg    ax,bx           ; Put AX in BX and BX in AX
    xchg    memory,ax       ; Put "memory" in AX and AX in "memory"

13.1.3 Looking Up Data

The XLAT (Translate) instruction is used to load data from a table in memory. The instruction is useful for translating bytes from one coding system to another.

Syntax

    XLAT[[B]] [[[[segment:]]memory]]

The BX register must contain the address of the start of the table. By default, the DS register contains the segment of the table, but a segment override can be used to specify a different segment. The operand need not be given except when specifying a segment override.

Before the XLAT instruction is executed, the AL register should contain an index into the table (the first byte of the table has index 0). After the instruction is executed, AL contains the table byte at that index. For example, if AL contains 7, the eighth byte of the table is placed in the AL register.

──────────────────────────────────────────────────────────────────────────
NOTE
    For compatibility with Intel 80386 mnemonics, QuickAssembler recognizes
    XLATB as a synonym for XLAT. In the Intel syntax, XLAT requires an
    operand; XLATB does not allow one. QuickAssembler never requires an
    operand, but always allows one.
──────────────────────────────────────────────────────────────────────────

Example

            .DATA
    hex     DB      "0123456789ABCDEF"      ; Table of hexadecimal digits
    convert DB      "You pressed the key with ASCII code "
    key     DB      ?,?,"h",13,10,"$"
            .CODE
            .
            .
            .
            mov     ah,8                    ; Get a key in AL
            int     21h                     ; Call DOS
            mov     bx,OFFSET hex           ; Load table address
            mov     ah,al                   ; Save a copy in high byte
            and     al,00001111b            ; Mask out high digit
            xlat                            ; Translate
            mov     key[1],al               ; Store the character
            mov     cl,12                   ; Load shift count
            shr     ax,cl                   ; Shift high digit into position
            xlat                            ; Translate
            mov     key,al                  ; Store the character
            mov     dx,OFFSET convert       ; Load message
            mov     ah,9                    ; Display it
            int     21h                     ; Call DOS

This example looks up hexadecimal characters in a table in order to convert an eight-bit binary number to a string representing a hexadecimal number.

13.1.4 Transferring Flags

The 8086-family processors provide instructions for loading and storing flags in the AH register.

Syntax

    LAHF
    SAHF

The status of the lower byte of the flags register can be saved to the AH register with LAHF and then later restored with SAHF. If you need to save and restore the entire flags register, use PUSHF and POPF, as described in Section 13.4.3, "Saving Flags on the Stack."

SAHF is often used with a coprocessor to transfer the coprocessor's condition codes to the processor's status flags. Section 17.7, "Controlling Program Flow," explains and illustrates this technique.

13.2 Converting between Data Sizes

Since moving data between registers of different sizes is illegal, you must take special steps if you need to extend a register value to a larger register or register pair. The procedure is different for signed and unsigned values.

The processor cannot tell the difference between signed and unsigned numbers; the programmer has to understand this difference and program accordingly.

13.2.1 Extending Signed Values

The CBW (Convert Byte to Word) and CWD (Convert Word to Doubleword) instructions are provided to sign-extend values.
Sign-extending means copying the sign bit of the original operand into every bit of the added upper portion of the extended operand.

Syntax

    CBW
    CWD

The CBW instruction converts an 8-bit signed value in AL to a 16-bit signed value in AX. The CWD instruction is similar except that it sign-extends a 16-bit value in AX to a 32-bit value in the DX:AX register pair. Both instructions work only on values in the accumulator register.

Example

            .DATA
    mem8    DB      -5
    mem16   DW      -5
            .CODE
            .
            .
            .
            mov     al,mem8         ; Load 8-bit -5 (FBh)
            cbw                     ; Convert to 16-bit -5 (FFFBh) in AX

            mov     ax,mem16        ; Load 16-bit -5 (FFFBh)
            cwd                     ; Convert to 32-bit -5 (FFFF:FFFBh)
                                    ;   in DX:AX

13.2.2 Extending Unsigned Values

To extend unsigned numbers, set the upper register of the extended value to 0.

Example

            .DATA
    mem8    DB      251
    mem16   DW      251
            .CODE
            .
            .
            .
            mov     al,mem8         ; Load 251 (FBh) from 8-bit memory
            xor     ah,ah           ; Zero upper half (AH)

            mov     ax,mem16        ; Load 251 (FBh) from 16-bit memory
            xor     dx,dx           ; Zero upper half (DX)

13.3 Loading Pointers

The 8086-family processors provide several instructions for loading pointer values into registers or register pairs. They can be used to load either near or far pointers.

13.3.1 Loading Near Pointers

The LEA instruction loads a near pointer into a specified register.

Syntax

    LEA register,memory

The destination register may be any general-purpose register. The source operand may be any memory operand. The effective address of the source operand is placed in the destination register.

The LEA instruction can be used to calculate the effective address of a direct memory operand, but this is usually inefficient, since the address of a direct memory operand is a constant known at assembly time.
For example, the following statements have the same effect, but the second version is faster:

    lea     dx,string               ; Load effective address - slow
    mov     dx,OFFSET string        ; Load offset - fast

The LEA instruction is more useful for calculating the address of indirect memory operands:

    lea     dx,string[si]           ; Load effective address

13.3.2 Loading Far Pointers

The LDS and LES instructions load far pointers.

Syntax

    LDS register,memory
    LES register,memory

The source operand is the memory doubleword that holds the far pointer, and the destination operand is the register that receives the offset. The far pointer must be stored in memory with the segment in the upper word and the offset in the lower word. The segment register that receives the segment is specified by the instruction name: LDS puts the segment in DS, and LES puts the segment in ES. The instructions are often used with string instructions, as explained in Chapter 16, "Processing Strings."

Example

             .DATA
    string   DB      "This is a string."
    fpstring DD      string          ; Far pointer to string
    pointers DD      100 DUP (?)
             .CODE
             .
             .
             .
             les     di,fpstring     ; Put address in ES:DI pair
             lds     si,pointers[bx] ; Put address in DS:SI pair

13.4 Transferring Data to and from the Stack

A "stack" is an area of memory for storing temporary data. Unlike other segments, in which data is stored starting from low memory, data on the stack is stored in reverse order starting from high memory.

Initially, the stack is an uninitialized segment of a finite size. As data is added to the stack at run time, the stack grows downward from high memory to low memory. When items are removed from the stack, it shrinks upward from low memory to high memory.

The stack has several purposes in the 8086-family processors. The CALL and INT instructions automatically push return addresses onto the stack (INT also pushes the flags), and the RET and IRET instructions pop them off again (see Sections 15.3, "Using Procedures," and 15.4, "Using Interrupts").
You can also use the PUSH and POP instructions and their variations to store values on the stack.

13.4.1 Pushing and Popping

In 8086-family processors, the SP (Stack Pointer) register always points to the top of the stack, the word most recently pushed. The PUSH and POP instructions use the SP register to keep track of the current position in the stack.

The values pointed to by the BP and SP registers are relative to the SS (Stack Segment) register. The BP register is often used to point to the base of a frame of reference (a stack frame) within the stack.

Syntax

    PUSH {register | memory}
    POP  {register | memory}
    PUSH immediate              (80186-80386 only)

The PUSH instruction stores a two-byte operand on the stack. The POP instruction retrieves a previously pushed value. When a value is pushed onto the stack, the SP register is decreased by 2. When a value is popped off the stack, the SP register is increased by 2. Although the stack always contains word values, the SP register points to bytes. Thus, SP changes in multiples of 2.

──────────────────────────────────────────────────────────────────────────
NOTE
    The 8088 and 8086 processors differ from later Intel processors in how
    they push and pop the SP register. If you give the statement push sp
    with the 8088 or 8086, the word pushed is the value in SP after the push
    operation has decremented it. Later processors push the value SP had
    before the push.
──────────────────────────────────────────────────────────────────────────

Figure 13.1 illustrates how pushes and pops change the SP register.

┌────────────────────────────────────────────────────────────────────────┐
│ This figure can be found in Section 13.4.1 of the manual               │
└────────────────────────────────────────────────────────────────────────┘

The PUSH and POP instructions are almost always used in pairs. Words are popped off the stack in the reverse of the order in which they were pushed. You should normally do the same number of pops as pushes to return the stack to its original status.
However, it is also possible to return the stack to its original status by adding the correct number of bytes (two for each word pushed) to the SP register.

Values on the stack can be accessed by using indirect memory operands with BP as the base register.

Example

            mov     bp,sp           ; Set stack frame
            push    ax              ; Push first;  SP = BP - 2
            push    bx              ; Push second; SP = BP - 4
            push    cx              ; Push third;  SP = BP - 6
            .
            .
            .
            mov     ax,[bp-6]       ; Put third in AX
            mov     bx,[bp-4]       ; Put second in BX
            mov     cx,[bp-2]       ; Put first in CX
            .
            .
            .
            add     sp,6            ; Restore stack pointer
                                    ;   (two bytes per push)

80186/286/386 Only

Starting with the 80186 processor, the PUSH instruction can be given with an immediate operand. For example, the following statement is legal on the 80186, 80286, and 80386 processors:

    push    7               ; 3 clocks on 80286

This statement is faster than the following statements, which are required on the 8088 or 8086 and which have the same effect except that they also change AX:

    mov     ax,7            ; 2 clocks on 80286
    push    ax              ; 3 clocks on 80286

13.4.2 Using the Stack

The stack can be used to store temporary data. For example, in the Microsoft calling convention, the stack is used to pass arguments to a procedure. The arguments are pushed onto the stack before the call, the procedure retrieves and uses them, and the stack is then restored to its original position at the end of the procedure. The stack can also be used to store variables that are local to a procedure. Both of these techniques are discussed in Section 15.3, "Using Procedures."

Another common use of the stack is to store temporary data when no registers are free or when a particular register must hold more than one value. For example, the CX register usually holds the count for loops. If two loops are nested, the outer count is loaded into CX at the start. When the inner loop starts, the outer count is pushed onto the stack and the inner count is loaded into CX. When the inner loop finishes, the original count is popped back into CX.

Example

            mov     cx,10           ; Load outer loop counter
    outer:  .
            .                       ; Start outer loop task
            .
            push    cx              ; Save outer loop value
            mov     cx,20           ; Load inner loop counter
    inner:  .
            .                       ; Do inner loop task
            .
            loop    inner
            pop     cx              ; Restore outer loop counter
            .
            .                       ; Continue outer loop task
            .
            loop    outer

13.4.3 Saving Flags on the Stack

Flags can be pushed onto and popped off the stack with the PUSHF and POPF instructions.

Syntax

    PUSHF
    POPF

These instructions are sometimes used to save the status of the flags before a procedure call and then to restore the same status after the procedure. They can also be used within a procedure to save and restore the flag status of the caller.

Example

            pushf
            call    systask
            popf

13.4.4 Saving All Registers on the Stack

80186/286/386 Only

Starting with the 80186 processor, the PUSHA and POPA instructions push or pop all the general-purpose registers with one instruction.

Syntax

    PUSHA
    POPA

These instructions can be used to save the status of all registers before a procedure call and then to restore them after the return. Using PUSHA and POPA is significantly faster and takes fewer bytes of code than pushing and popping each register individually.

The registers are pushed in the following order: AX, CX, DX, BX, SP, BP, SI, and DI. The SP word pushed is the value before the first register is pushed. The registers are popped in the opposite order, except that the saved SP value is discarded rather than loaded into SP.

Example

            pusha
            call    systask
            popa

13.5 Transferring Data to and from Ports

"Ports" are the gateways between hardware devices and the processor. Each port has a unique number through which it can be accessed. Ports can be used for low-level communication with devices, such as disks, the video display, or the keyboard. The OUT instruction sends data to a port; the IN instruction receives data from a port.

Syntax

    IN  accumulator,{portnumber | DX}
    OUT {portnumber | DX},accumulator

When using the IN and OUT instructions, the port number can be given either as an eight-bit immediate value or in the DX register. You must use DX for port numbers higher than 255.
The value sent to or received from the port must be in the accumulator register (AX for word values or AL for byte values). With the IN instruction, the port number is the source operand and the accumulator that receives the value from the port is the destination operand. With the OUT instruction, the port number is the destination operand and the accumulator holding the value to be sent is the source operand.

In applications programming, most communication with hardware is done with DOS or ROM-BIOS calls. Ports are more often used in systems programming. Since systems programming is beyond the scope of this manual and since ports differ depending on hardware, the IN and OUT instructions are not explained in detail here.

Example

    ; Actual values are hardware dependent

    sound   EQU     61h             ; Port to chip that controls speaker
    timer   EQU     42h             ; Port to chip that pulses speaker
    on      EQU     00000011b       ; Bits 0 and 1 turn on speaker

            in      al,sound        ; Get current port setting
            or      al,on           ; Turn on speaker and connect timer
            out     sound,al        ; Put value back in port

            mov     al,50           ; Start at 50
    sounder: out    timer,al        ; Send byte to timer port...
            mov     cx,2000         ; Loop 2000 times to delay
    hold:   loop    hold
            dec     al              ; Go down one step
            jnz     sounder         ; Repeat for each step

            in      al,sound        ; Get port value
            and     al,NOT on       ; Turn it back off
            out     sound,al        ; Put it back in port

This example creates a sound of ascending frequency on IBM PC and IBM-compatible computers. The technique for making sound and the port values used may be different on other hardware.

80186/286/386 Only

Starting with the 80186 processor, instructions are available to send strings of data to and from ports. The instructions are INS, INSB, INSW, OUTS, OUTSB, and OUTSW. They operate much like the other string instructions and are discussed in Section 16.7, "Transferring Strings to and from Ports."
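The rule for choosing the port operand of IN can be expressed as a tiny C routine that writes QuickAssembler statements as text. This is a hypothetical sketch, not part of QuickAssembler or any Microsoft library: `format_in` and `format_port` are invented names. Ports 0-255 get the eight-bit immediate form; anything higher is loaded into DX first. The same choice applies to OUT, which is not modeled here. Any port, including one below 256, can also be addressed through DX.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Formats a port number as a MASM hexadecimal constant such as 61h.
   A leading 0 is added when the first digit is A-F (0E0h, not E0h),
   because such a constant must start with a decimal digit. */
static void format_port(char *out, size_t size, unsigned port)
{
    char digits[16];

    snprintf(digits, sizeof digits, "%X", port);
    snprintf(out, size, "%s%sh",
             strchr("ABCDEF", digits[0]) != NULL ? "0" : "", digits);
}

/* Hypothetical helper: writes the statement(s) that read a byte (AL)
   or a word (AX) from a port. Port numbers 0-255 fit in the eight-bit
   immediate operand; higher numbers must first be loaded into DX. */
void format_in(char *out, size_t size, unsigned port, bool word)
{
    const char *acc = word ? "ax" : "al";
    char num[20];

    format_port(num, sizeof num, port);
    if (port <= 0xFF)
        snprintf(out, size, "in %s,%s", acc, num);
    else
        snprintf(out, size, "mov dx,%s\nin %s,dx", num, acc);
}
```

For port 61h, the speaker port in the example above, format_in produces the single statement in al,61h. For a port such as 3F8h, it produces the two-statement form mov dx,3F8h followed by in al,dx.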
────────────────────────────────────────────────────────────────────────────
Chapter 14: Doing Arithmetic and Bit Manipulations

The 8086-family processors provide instructions for doing calculations on byte, word, and doubleword values. Operations include addition, subtraction, multiplication, and division. You can also do calculations at the bit level, including the AND, OR, XOR, and NOT logical operations. Bits can also be shifted or rotated to the right or left.

This chapter tells you how to use the instructions that do calculations on numbers and bits.

14.1 Adding

The ADD, ADC, and INC instructions are used for adding and incrementing values.

Syntax

    ADD {register | memory},{register | memory | immediate}
    ADC {register | memory},{register | memory | immediate}
    INC {register | memory}

These instructions can work directly on 8-bit or 16-bit values. They can also be used in combination to do calculations on values that are too large to be held in a single register (such as 32-bit values). When used with AAA and DAA, they can be used to do calculations on binary coded decimal (BCD) numbers, as described in Section 14.5, "Calculating with Binary Coded Decimals."

14.1.1 Adding Values Directly

The ADD and INC instructions are used for adding to values in registers or memory.

The INC instruction takes a single register or memory operand and increments its value. INC does not change the carry flag, even when the value wraps around to 0. The other arithmetic flags, including the overflow flag, are updated as they would be for an addition of 1.

The ADD instruction adds the values given in the source and destination operands. The destination can be either a register or a memory operand, and its previous contents are destroyed by the operation. The source operand can be an immediate, memory, or register operand. Since memory-to-memory operations are never allowed, the source and destination operands can never both be memory operands. The result of the operation is stored in the destination operand. The operands can be either 8-bit or 16-bit, but both must be the same size.
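The effect of an 8-bit ADD on the carry and overflow flags, which the next paragraphs explain, can be modeled in C. This is an illustrative sketch, not a description of processor internals: `add8` and `struct add8_result` are hypothetical names, and only CF and OF are modeled. CF is set when the unsigned sum does not fit in 8 bits. OF is set when both operands have the same sign bit and the result's sign bit differs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Result of an 8-bit ADD: the value left in the destination plus the
   two flags discussed in this section. */
struct add8_result {
    uint8_t sum;
    bool carry;    /* CF: unsigned result exceeded 255 */
    bool overflow; /* OF: signed result outside -128..127 */
};

struct add8_result add8(uint8_t dest, uint8_t src)
{
    struct add8_result r;
    unsigned wide = (unsigned)dest + src;

    r.sum = (uint8_t)wide;       /* destination keeps the low 8 bits */
    r.carry = wide > 0xFF;
    /* Signed overflow: operands share a sign bit that the result lacks. */
    r.overflow = ((~(dest ^ src) & (dest ^ r.sum)) & 0x80) != 0;
    return r;
}
```

With the values from the example that follows, add8(103, 39) gives 142 with OF set and CF clear, matching the -114+overflow column. add8(142, 142) gives 28 with CF set. OF is also set in that second case, because two negative signed values produced a positive result.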
An addition operation can be interpreted as addition of either signed numbers or unsigned numbers. It is the programmer's responsibility to decide how the addition should be interpreted and to take appropriate action if the sum is too large for the destination operand. When an addition overflows the possible range for signed numbers, the overflow flag is set. When an addition overflows the range for unsigned numbers, the carry flag is set.

There are two ways to take action on an overflow: you can use the JO or JNO instruction to direct program flow to or around instructions that handle the overflow (see Section 15.1.2.3, "Testing Bits and Jumping"). You can also use the INTO instruction to trigger the overflow interrupt (interrupt 4) if the overflow flag is set. This requires writing an interrupt handler for interrupt 4, since the DOS overflow routine simply returns without taking any action. Section 15.4.2, "Defining and Redefining Interrupt Routines," gives a sample of an overflow interrupt handler.

Example

            .DATA
    mem8    DB      39
            .CODE
            .
            .
            .                       ;                    unsigned   signed
            mov     al,26           ; Start with register      26       26
            inc     al              ; Increment                 1        1
            add     al,76           ; Add immediate          + 76     + 76
                                    ;                        ----     ----
                                    ;                         103      103
            add     al,mem8         ; Add memory             + 39     + 39
                                    ;                        ----     ----
            mov     ah,al           ; Copy to AH              142     -114+overflow
            add     al,ah           ; Add register           +142
                                    ;                        ----
                                    ;                    28+carry

This example shows 8-bit addition. When the sum exceeds 127, the overflow flag is set. A JO (Jump On Overflow) or INTO (Interrupt On Overflow) instruction at this point could transfer control to error-recovery statements. When the sum exceeds 255, the carry flag is set. A JC (Jump On Carry) instruction at this point could transfer control to error-recovery statements.

14.1.2 Adding Values in Multiple Registers

The ADC (Add with Carry) instruction makes it possible to add numbers larger than can be held in a single register.
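As a sketch of this technique, the C fragment below adds numbers stored as arrays of 16-bit words, least-significant word first: the first step ignores the carry, as ADD does, and each later step adds it in, as ADC does. The function names add16_step and add_words are hypothetical, and C arithmetic on a wider type stands in for the carry flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One 16-bit addition step. With carry_in false it behaves like ADD;
   with the previous carry passed in it behaves like ADC. */
static uint16_t add16_step(uint16_t a, uint16_t b, bool carry_in, bool *carry_out)
{
    uint32_t wide = (uint32_t)a + b + (carry_in ? 1u : 0u);
    *carry_out = wide > 0xFFFFu;
    return (uint16_t)wide;
}

/* Adds src to dest, both stored least-significant word first.
   Starting with carry = false plays the role of CLC before the loop.
   Returns the carry out of the most-significant word. */
static bool add_words(uint16_t *dest, const uint16_t *src, size_t nwords)
{
    bool carry = false;

    assert(dest != NULL && src != NULL);
    for (size_t i = 0; i < nwords; i++)
        dest[i] = add16_step(dest[i], src[i], carry, &carry);
    return carry;
}
```

Called with two-word arrays holding 43981 and 316423, add_words leaves 360404 in the destination with no final carry, the same result the DX:AX example below produces.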
The ADC instruction adds two numbers in the same fashion as the ADD instruction, except that the value of the carry flag is included in the addition. If a previous calculation has set the carry flag, then 1 will be added to the sum of the numbers. If the carry flag is not set, the ADC instruction has the same effect as the ADD instruction.

When adding numbers in multiple registers, the carry flag should be ignored for the least-significant portion, but taken into account for the more-significant portions. This can be done by using the ADD instruction for the least-significant portion and the ADC instruction for the more-significant portions.

You can add and carry repeatedly inside a loop for calculations that require more than two registers. Use the ADC instruction in each iteration, but turn off the carry flag with the CLC (Clear Carry Flag) instruction before entering the loop so that it will not be used for the first iteration. You could also do the first add outside the loop.

Example

            .DATA
    mem32   DD      316423
            .CODE
            .
            .
            .
            mov     ax,43981              ; Load immediate          43981
            sub     dx,dx                 ;   into DX:AX
            add     ax,WORD PTR mem32[0]  ; Add to both          + 316423
            adc     dx,WORD PTR mem32[2]  ;   memory words         ------
                                          ; Result in DX:AX        360404

14.2 Subtracting

The SUB, SBB, DEC, and NEG instructions are used for subtracting and decrementing values.

Syntax

    SUB {register | memory},{register | memory | immediate}
    SBB {register | memory},{register | memory | immediate}
    DEC {register | memory}
    NEG {register | memory}

These instructions can work directly on 8-bit or 16-bit values. They can also be used in combination to do calculations on values too large to be held in a single register. When used with AAS and DAS, they can be used to do calculations on BCD numbers, as described in Section 14.5, "Calculating with Binary Coded Decimals."

14.2.1 Subtracting Values Directly

The SUB and DEC instructions are used for subtracting from values in registers or memory.
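The flag behavior of SUB can be sketched in C the same way. In the model below (hypothetical names sub8, sub8_result, and replay_sub_example; only the carry and overflow flags are modeled), the carry flag records an unsigned borrow and the overflow flag records a signed result outside -128 to 127. replay_sub_example walks through the values in the 8-bit example later in this section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an 8-bit SUB destination,source.
   Only CF (borrow) and OF are modeled. */
typedef struct {
    uint8_t result; /* value written back to the destination operand */
    bool cf;        /* borrow: unsigned destination was smaller than source */
    bool of;        /* overflow: signed difference outside -128..127 */
} sub8_result;

static sub8_result sub8(uint8_t dest, uint8_t src)
{
    sub8_result r;

    r.result = (uint8_t)(dest - src);     /* wraps modulo 256, as AL would */
    r.cf = dest < src;
    /* Signed overflow occurs only when the operands have different sign
       bits and the result's sign bit differs from the destination's. */
    r.of = (((dest ^ src) & (dest ^ r.result)) & 0x80) != 0;
    return r;
}

/* Replays the example, starting after DEC and the immediate
   subtraction have left 71 in AL. */
static void replay_sub_example(void)
{
    sub8_result r = sub8(71, 122);        /* -51 signed, 205 unsigned */
    assert(r.result == 205 && r.cf && !r.of);

    r = sub8(r.result, 119);              /* -170 signed wraps to 86 */
    assert(r.result == 86 && !r.cf && r.of);
}
```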
A related instruction, NEG (Negate), reverses the sign of a number.

The DEC instruction takes a single register or memory operand. The value of the operand is decremented by 1. DEC does not change the carry flag, so it cannot report an unsigned borrow; it does set the overflow flag on a signed overflow, as SUB does.

The NEG instruction takes a single register or memory operand. The sign of the value of the operand is reversed. The NEG instruction should only be used on signed numbers.

The SUB instruction subtracts the value given in the source operand from the value of the destination operand. The destination can be either a register or a memory operand. It will be destroyed by the operation. The source operand can be an immediate, memory, or register operand. It will not be destroyed by the operation. Since memory-to-memory operations are never allowed, the source and destination operands cannot both be memory operands. The result of the operation is stored in the destination operand. The operands can be either 8-bit or 16-bit, but both must be the same size.

A subtraction operation can be interpreted as subtraction of either signed numbers or unsigned numbers. It is the programmer's responsibility to decide how the subtraction should be interpreted and to take appropriate action if the result is too small for the destination operand. When a subtraction overflows the possible range for signed numbers, the overflow flag is set. When a subtraction underflows the range for unsigned numbers (becomes negative), the carry flag is set to show that a borrow occurred.

Example

            .DATA
    mem8    DB      122
            .CODE
            .
            .
            .                       ;                  signed   unsigned
            mov     al,95           ; Load register        95         95
            dec     al              ; Decrement           - 1        - 1
            sub     al,23           ; Subtract immediate - 23       - 23
                                    ;                    ----       ----
                                    ;                      71         71
            sub     al,mem8         ; Subtract memory   - 122      - 122
                                    ;                    ----       ----
                                    ;                    - 51  205+carry
            mov     ah,119          ; Load register
            sub     al,ah           ;   and subtract    - 119
                                    ;                    ----
                                    ;               86+overflow

This example shows 8-bit subtraction. When the unsigned result goes below 0, the carry flag is set. A JC (Jump On Carry) instruction at this point could transfer control to error-recovery statements. When the signed result goes below -128, the overflow flag is set. A JO (Jump On Overflow) instruction at this point could transfer control to error-recovery statements.

14.2.2 Subtracting with Values in Multiple Registers

The SBB (Subtract with Borrow) instruction makes it possible to subtract from numbers larger than can be held in a single register.

The SBB instruction subtracts two numbers in the same fashion as the SUB instruction except that the value of the carry flag is included in the subtraction. If a previous calculation has set the carry flag, then 1 will be subtracted from the result. If the carry flag is not set, the SBB instruction has the same effect as the SUB instruction.

When subtracting numbers in multiple registers, the carry flag should be ignored for the least-significant portion, but taken into account for the more-significant portions. This can be done by using the SUB instruction for the least-significant portion and the SBB instruction for the more-significant portions.

You can subtract and borrow repeatedly inside a loop for calculations that require more than two registers. Use the SBB instruction in each iteration, but turn off the carry flag with the CLC (Clear Carry Flag) instruction before entering the loop so that it will not be used for the first iteration. You could also do the first subtraction outside the loop.

Example

            .DATA
    mem32a  DD      316423
    mem32b  DD      156739
            .CODE
            .
            .
            .
            mov     ax,WORD PTR mem32a[0] ; Load mem32a            316423
            mov     dx,WORD PTR mem32a[2] ;   into DX:AX
            sub     ax,WORD PTR mem32b[0] ; Subtract low         - 156739
            sbb     dx,WORD PTR mem32b[2] ;   then high            ------
                                          ; Result in DX:AX        159684

14.3 Multiplying

The MUL and IMUL instructions are used to multiply numbers. The MUL instruction should be used for unsigned numbers; the IMUL instruction should be used for signed numbers. This is the only difference between the two.
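The difference is easiest to see by multiplying the same bit pattern both ways. The C sketch below is illustrative only: mul8, imul8, imul16, and compare_mul_imul are hypothetical names, C integer arithmetic stands in for the hardware, and converting a byte above 7Fh to int8_t assumes a two's-complement compiler. It also models the rule, described below, that the carry and overflow flags are set together when the product does not fit in the lower half of the result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical models of the one-operand 8-bit forms: AL times an 8-bit
   operand, returning the 16-bit product as AX would hold it. *cf_of
   stands for the carry and overflow flags, which are set together. */
static uint16_t mul8(uint8_t al, uint8_t src, bool *cf_of)
{
    uint16_t ax = (uint16_t)(al * src);       /* 0..65025, always fits */
    *cf_of = (ax >> 8) != 0;                  /* AH holds significant bits */
    return ax;
}

static uint16_t imul8(uint8_t al, uint8_t src, bool *cf_of)
{
    /* Reinterpret the same bit patterns as signed (two's complement). */
    int product = (int8_t)al * (int8_t)src;   /* -16256..16384 */
    *cf_of = product < INT8_MIN || product > INT8_MAX; /* AH is not a sign extension */
    return (uint16_t)product;
}

/* 16-bit signed form: the DX:AX product returned as one 32-bit value. */
static int32_t imul16(int16_t ax, int16_t src, bool *cf_of)
{
    int32_t product = (int32_t)ax * src;
    *cf_of = product < INT16_MIN || product > INT16_MAX;
    return product;
}

/* The same bit pattern gives different products under MUL and IMUL. */
static void compare_mul_imul(void)
{
    bool flags;
    assert(mul8(0xFF, 2, &flags) == 0x01FE && flags);   /* 255 * 2 = 510 */
    assert(imul8(0xFF, 2, &flags) == 0xFFFE && !flags); /* -1 * 2 = -2 */
}
```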
Syntax

    MUL {register | memory}
    IMUL {register | memory}

The multiply instructions require that one of the factors be in the accumulator register (AL for 8-bit numbers or AX for 16-bit numbers). This register is implied; it should not be specified in the source code. Its contents will be destroyed by the operation.

The other factor to be multiplied must be specified in a single register or memory operand. The operand will not be destroyed by the operation, unless it is DX, AX, AH, or AL.

A number may be squared by loading it into the accumulator, and then executing a multiplication instruction with the accumulator as the operand.

Multiplying two 8-bit numbers produces a 16-bit product, which is always placed in AX. If the product does not fit in the lower half (for MUL, if AH is not 0; for IMUL, if AH is not just the sign extension of AL), the overflow and carry flags are set.

Similarly, multiplying two 16-bit numbers produces a 32-bit product in the DX:AX register pair: the least-significant bits are in AX and the most-significant bits are in DX. If the product does not fit in AX alone (by the same test applied to DX), the overflow and carry flags are set.

──────────────────────────────────────────────────────────────────────────
NOTE  Multiplication is one of the slower operations on 8086-family processors (especially the 8086 and 8088). Multiplying by certain common constants is often faster when done by shifting bits (see Section 14.7.1, "Multiplying and Dividing by Constants").
──────────────────────────────────────────────────────────────────────────

Examples

            .DATA
    mem16   DW      -30000
            .CODE
            .
            .
            .
    ; 8-bit unsigned multiply

            mov     al,23           ; Load AL                   23
            mov     bl,24           ; Load BL                 * 24
            mul     bl              ; Multiply BL            -----
                                    ; Product in AX            552
                                    ;   overflow and carry set

    ; 16-bit signed multiply

            mov     ax,50           ; Load AX                   50
                                    ;                     * -30000
            imul    mem16           ; Multiply memory        -----
                                    ; Product in DX:AX    -1500000
                                    ;   overflow and carry set

80186/286/386 Only

Starting with the 80186 processor, the IMUL instruction has two additional syntaxes that allow 16-bit multiplications that produce a 16-bit product.
Syntax

    IMUL register16, immediate
    IMUL register16, memory16, immediate

You can specify a 16-bit immediate value as the source operand and a word register as the destination operand. The 16-bit product is placed in the destination operand.

You can also specify three operands for IMUL. The first operand must be a 16-bit register operand, the second a 16-bit memory operand, and the third a 16-bit immediate operand. The second and third operands are multiplied and the product is stored in the first operand.

With both of these syntaxes, the carry and overflow flags will be set if the product is too large to fit in 16 bits. The IMUL instruction with multiple operands can be used for either signed or unsigned multiplication, since the low 16 bits of the product are the same in either case. If you need a 32-bit result, you must use the single-operand version of MUL or IMUL.

Examples

            imul    dx,456          ; Multiply DX times 456
            imul    ax,[bx],6       ; Multiply the value pointed to by BX
                                    ;   times 6 and put the result in AX

14.4 Dividing

The DIV and IDIV instructions are used to divide integers. Both a quotient and a remainder are returned. The DIV instruction should be used for unsigned integers; the IDIV instruction should be used for signed integers. This is the only difference between the two.

Syntax

    DIV {register | memory}
    IDIV {register | memory}

To divide a 16-bit number by an 8-bit number, put the number to be divided (the dividend) in the AX register. The contents of this register will be destroyed by the operation. Specify the dividing number (the divisor) in any 8-bit memory or register operand (except AL or AH). This operand will not be changed by the operation.
After the division, the quotient will be in AL and the remainder will be in AH.

To divide a 32-bit number by a 16-bit number, put the dividend in the DX:AX register pair. The least-significant bits go in AX. The contents of these registers will be destroyed by the operation. Specify the divisor in any 16-bit memory or register operand (except AX or DX). This operand will not be changed by the operation. After the division, the quotient will be in AX and the remainder will be in DX.

To divide a 16-bit number by a 16-bit number, you must first sign-extend or zero-extend (see Section 13.2, "Converting between Data Sizes") the dividend to 32 bits; then divide as described above. You cannot divide a 32-bit number by another 32-bit number with a single DIV or IDIV instruction.

If division by zero is specified, or if the quotient exceeds the capacity of its register (AL or AX), the processor automatically generates an interrupt 0. By default, the program terminates and returns to DOS. This problem can be handled in two ways: check the divisor before division and go to an error routine if you can determine it to be invalid, or write your own interrupt routine to replace the processor's interrupt 0 routine. See Section 15.4 for more information on interrupts.

──────────────────────────────────────────────────────────────────────────
NOTE  Division is one of the slower operations on 8086-family processors (especially the 8086 and 8088). Dividing by common constants that are powers of two is often faster when done by shifting bits, as described in Section 14.7.1, "Multiplying and Dividing by Constants."
──────────────────────────────────────────────────────────────────────────

Examples

            .DATA
    mem16   DW      -2000
    mem32   DD      500000
            .CODE
            .
            .
            .
    ; Divide 16-bit unsigned by 8-bit

            mov     ax,700                ; Load dividend            700
            mov     bl,36                 ; Load divisor          DIV 36
            div     bl                    ; Divide BL              -----
                                          ; Quotient in AL            19
                                          ; Remainder in AH           16

    ; Divide 32-bit signed by 16-bit

            mov     ax,WORD PTR mem32[0]  ; Load into DX:AX
            mov     dx,WORD PTR mem32[2]  ;                       500000
            idiv    mem16                 ;                    DIV -2000
                                          ; Divide memory         ------
                                          ; Quotient in AX          -250
                                          ; Remainder in DX            0

    ; Divide 16-bit signed by 16-bit

            mov     ax,WORD PTR mem16     ; Load into AX           -2000
            cwd                           ; Extend to DX:AX
            mov     bx,-421               ;                     DIV -421
            idiv    bx                    ; Divide by BX           -----
                                          ; Quotient in AX             4
                                          ; Remainder in DX         -316

14.5 Calculating with Binary Coded Decimals

The 8086-family processors provide several instructions for adjusting BCD numbers. The BCD format is seldom used for applications programming in assembly language. Programmers who wish to use BCD numbers usually use a high-level language. However, BCD instructions are used to develop compilers, function libraries, and other systems tools. Since systems programming is beyond the scope of this manual, this section provides only a brief overview of calculations on the two kinds of BCD numbers, unpacked and packed.

──────────────────────────────────────────────────────────────────────────
NOTE  Intel mnemonics use the term "ASCII" to refer to unpacked BCD numbers and "decimal" to refer to packed BCD numbers. Thus AAA (ASCII Adjust After Addition) adjusts unpacked numbers, while DAA (Decimal Adjust After Addition) adjusts packed numbers.
──────────────────────────────────────────────────────────────────────────

14.5.1 Unpacked BCD Numbers

Unpacked BCD numbers are made up of bytes containing a single decimal digit in the lower four bits of each byte. The 8086-family processors provide instructions for adjusting unpacked values with the four arithmetic operations: addition, subtraction, multiplication, and division.

To do arithmetic on unpacked BCD numbers, you must do the eight-bit arithmetic calculations on each digit separately. The result should always be in the AL register.
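Before the instruction-by-instruction descriptions, here is a C sketch of what three of the ASCII-adjust instructions compute, following the 8086 behavior described in this section. The names add_aaa, aam, aad, reg_ah, reg_al, and make_ax are hypothetical; AX is modeled as a 16-bit value whose high byte is AH, and of the flags only the carry produced by AAA is reported.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* AX is modeled as a 16-bit value; AH is the high byte, AL the low byte. */
static uint8_t reg_ah(uint16_t ax) { return (uint8_t)(ax >> 8); }
static uint8_t reg_al(uint16_t ax) { return (uint8_t)ax; }
static uint16_t make_ax(uint8_t ah, uint8_t al) { return (uint16_t)((ah << 8) | al); }

/* ADD AL,src followed by AAA. *cf reports the decimal carry into AH. */
static uint16_t add_aaa(uint16_t ax, uint8_t src, bool *cf)
{
    uint8_t al = reg_al(ax), ah = reg_ah(ax);
    bool af = (al & 0x0F) + (src & 0x0F) > 0x0F; /* carry out of bit 3 */

    al = (uint8_t)(al + src);                      /* the ADD */
    *cf = (al & 0x0F) > 9 || af;                   /* the AAA test */
    if (*cf) {
        al = (uint8_t)(al + 6);
        ah = (uint8_t)(ah + 1);
    }
    return make_ax(ah, (uint8_t)(al & 0x0F));      /* AAA clears the high nibble */
}

/* AAM: AH = AL / 10 and AL = AL % 10. Both results are single
   digits only when AL holds 0..99. */
static uint16_t aam(uint16_t ax)
{
    uint8_t al = reg_al(ax);
    return make_ax((uint8_t)(al / 10), (uint8_t)(al % 10));
}

/* AAD: combine two unpacked digits into a binary value in AL before DIV. */
static uint16_t aad(uint16_t ax)
{
    assert(reg_ah(ax) <= 9 && reg_al(ax) <= 9);   /* expects unpacked digits */
    return make_ax(0, (uint8_t)(reg_ah(ax) * 10 + reg_al(ax)));
}
```

For example, add_aaa(0x0009, 3, &cf) returns 0x0102 with cf set, like the AAA example, and aam(aad(0x0205) / 2) returns 0x0102, the quotient 12 from the AAD example.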
After each operation, use the corresponding BCD instruction to adjust the result. The ASCII-adjust instructions do not take an operand. They always work on the value in the AL register.

When a calculation using two one-digit values produces a two-digit result, the ASCII-adjust instructions leave the least-significant digit in AL and the most-significant digit in AH. If the digit in AL needs to carry to or borrow from the digit in AH, the carry and auxiliary carry flags are set.

The four ASCII-adjust instructions are described below:

Instruction   Description
──────────────────────────────────────────────────────────────────────────
AAA           Adjusts after an addition operation. For example, to add 9 and 3, use the following lines:

                  mov     ax,9            ; Load 9
                  mov     bx,3            ;   and 3 as unpacked BCD
                  add     al,bl           ; Add 09h and 03h to get 0Ch
                  aaa                     ; Adjust 0Ch in AL to 02h,
                                          ;   increment AH to 01h, set carry
                                          ; Result 12 unpacked BCD in AX

AAS           Adjusts after a subtraction operation. For example, to subtract 4 from 13, use the following lines:

                  mov     ax,103h         ; Load 13
                  mov     bx,4            ;   and 4 as unpacked BCD
                  sub     al,bl           ; Subtract 4 from 3 to get FFh (-1)
                  aas                     ; Adjust 0FFh in AL to 9,
                                          ;   decrement AH to 0, set carry
                                          ; Result 9 unpacked BCD in AX

AAM           Adjusts after a multiplication operation. Always use MUL, not IMUL. For example, to multiply 9 times 3, use the following lines:

                  mov     ax,903h         ; Load 9 and 3 as unpacked BCD
                  mul     ah              ; Multiply 9 and 3 to get 1Bh
                  aam                     ; Adjust 1Bh in AL
                                          ;   to get 27 unpacked BCD in AX

AAD           Adjusts before a division operation. Unlike other BCD instructions, this one converts a BCD value to a binary value before the operation. After the operation, the quotient must still be adjusted by using AAM. For example, to divide 25 by 2, use the following lines:

                  mov     ax,205h         ; Load 25
                  mov     bl,2            ;   and 2 as unpacked BCD
                  aad                     ; Adjust 0205h in AX
                                          ;   to get 19h in AX
                  div     bl              ; Divide by 2 to get
                                          ;   quotient 0Ch in AL
                                          ;   remainder 1 in AH
                  aam                     ; Adjust 0Ch in AL
                                          ;   to 12 unpacked BCD in AX
                                          ;   (remainder destroyed)

Notice that the remainder is lost.
If you need the remainder, save it in another register before adjusting the quotient. Then move it back to AL and adjust if necessary.

Multidigit BCD numbers are usually processed in loops. Each digit is processed and adjusted in turn.

In addition to their use for processing unpacked BCD numbers, the ASCII-adjust instructions can be used in routines that convert between different number bases.

Example

            mov     al,79           ; Load 79 (04Fh)
            aam                     ; Adjust to BCD (0709h)
            add     ah,'0'          ; Adjust to ASCII characters
            add     al,'0'          ;   (3739h)
            mov     dx,ax           ; Copy to DX
            xchg    dl,dh           ; Trade for most significant digit
            mov     ah,2            ; DOS display character function
            int     21h             ; Call DOS
            mov     dl,dh           ; Load least significant digit
            int     21h             ; Call DOS

The example converts an eight-bit binary number to decimal and displays it on the screen. As written, it handles only values from 0 through 99, since AAM splits AL into just two decimal digits; the routine could be enhanced to handle larger numbers.

14.5.2 Packed BCD Numbers

Packed BCD numbers are made up of bytes containing two decimal digits: one in the upper four bits and one in the lower four bits. The 8086-family processors provide instructions for adjusting packed BCD numbers after addition and subtraction. You must write your own routines to adjust for multiplication and division.

To do arithmetic on packed BCD numbers, you must do the eight-bit arithmetic calculations on each byte separately. The result should always be in the AL register. After each operation, use the corresponding BCD instruction to adjust the result. The decimal-adjust instructions do not take an operand. They always work on the value in the AL register.

Unlike the ASCII-adjust instructions, the decimal-adjust instructions never affect AH. The auxiliary carry flag is set if the digit in the lower four bits carries to or borrows from the digit in the upper four bits. The carry flag is set if the digit in the upper four bits needs to carry to or borrow from another byte.
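The two decimal-adjust instructions can be modeled in C as well. The sketch below follows Intel's published description of DAA and DAS; the names add_daa, sub_das, and is_packed_bcd are hypothetical, and each function performs the ADD or SUB itself so that it can derive the auxiliary carry and carry flags the adjustment depends on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if both nibbles of b hold decimal digits 0..9. */
static bool is_packed_bcd(uint8_t b)
{
    return (b >> 4) <= 9 && (b & 0x0F) <= 9;
}

/* ADD AL,src followed by DAA. *cf is the decimal carry out of the byte. */
static uint8_t add_daa(uint8_t al, uint8_t src, bool *cf)
{
    assert(is_packed_bcd(al) && is_packed_bcd(src));
    unsigned sum = (unsigned)al + src;            /* the ADD */
    bool af = (al & 0x0F) + (src & 0x0F) > 0x0F;  /* carry out of the low digit */
    bool carry = sum > 0xFF;
    uint8_t r = (uint8_t)sum;
    uint8_t before = r;

    if ((r & 0x0F) > 9 || af)
        r = (uint8_t)(r + 0x06);                  /* correct the low digit */
    if (before > 0x99 || carry) {
        r = (uint8_t)(r + 0x60);                  /* correct the high digit */
        carry = true;
    }
    *cf = carry;
    return r;
}

/* SUB AL,src followed by DAS. *cf is the decimal borrow. */
static uint8_t sub_das(uint8_t al, uint8_t src, bool *cf)
{
    assert(is_packed_bcd(al) && is_packed_bcd(src));
    bool af = (al & 0x0F) < (src & 0x0F);         /* low digit needs a borrow */
    bool borrow = al < src;
    uint8_t r = (uint8_t)(al - src);              /* the SUB */
    uint8_t before = r;

    if ((r & 0x0F) > 9 || af)
        r = (uint8_t)(r - 0x06);
    if (before > 0x99 || borrow) {
        r = (uint8_t)(r - 0x60);
        borrow = true;
    }
    *cf = borrow;
    return r;
}
```

With the operands from the examples that follow, add_daa(0x88, 0x33, &cf) returns 0x21 with cf set, and sub_das(0x83, 0x38, &cf) returns 0x45 with cf clear.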
The decimal-adjust instructions are described below:

Instruction   Description
──────────────────────────────────────────────────────────────────────────
DAA           Adjusts after an addition operation. For example, to add 88 and 33, use the following lines:

                  mov     ax,8833h        ; Load 88 and 33 as packed BCD
                  add     al,ah           ; Add 88 and 33 to get 0BBh
                  daa                     ; Adjust 0BBh to 121 packed BCD:
                                          ;   1 in carry and 21 in AL

DAS           Adjusts after a subtraction operation. For example, to subtract 38 from 83, put 83 in AL and 38 in AH in packed BCD format. Then use the following lines to subtract them:

                  mov     ax,3883h        ; Load 83 and 38 as packed BCD
                  sub     al,ah           ; Subtract 38 from 83 to get 04Bh
                  das                     ; Adjust 04Bh to 45 packed BCD:
                                          ;   0 in carry and 45 in AL

Multidigit BCD numbers are usually processed in loops. Each byte is processed and adjusted in turn.

14.6 Doing Logical Bit Manipulations

The logical instructions do Boolean operations on individual bits. The AND, OR, XOR, and NOT operations are supported by the 8086-family instructions.

AND compares two bits and sets the result if both bits are set. OR compares two bits and sets the result if either bit is set. XOR compares two bits and sets the result if the bits are different. NOT reverses a single bit. Table 14.1 shows a truth table for the logical operations.

Table 14.1 Values Returned by Logical Operations

    X   Y   NOT X   X AND Y   X OR Y   X XOR Y
──────────────────────────────────────────────────────────────────────────
    1   1     0        1         1        0
    1   0     0        0         1        1
    0   1     1        0         1        1
    0   0     1        0         0        0
──────────────────────────────────────────────────────────────────────────

The syntax of the AND, OR, and XOR instructions is the same. The only difference is the operation performed. For all instructions, the target value to be changed by the operation is placed in one operand. A mask showing the positions of bits to be changed is placed in the other operand. The format of the mask differs for each logical instruction. The destination operand can be register or memory.
The source operand can be register, memory, or immediate. However, the source and destination operands cannot both be memory operands.

Either of the values can be in either operand. However, the source operand will be unchanged by the operation, while the destination operand will be destroyed by it. Your choice of operands depends on whether you want to save a copy of the mask or of the target value.

──────────────────────────────────────────────────────────────────────────
NOTE  The logical instructions should not be confused with the logical operators. They specify completely different behavior. The instructions control run-time bit calculations. The operators control assembly-time bit calculations. Although the instructions and operators have the same name, the assembler can distinguish them from context.
──────────────────────────────────────────────────────────────────────────

14.6.1 AND Operations

The AND instruction does an AND operation on the bits of the source and destination operands. The original destination operand is replaced by the resulting bits.

Syntax

    AND {register | memory},{register | memory | immediate}

The AND instruction can be used to clear the value of specific bits regardless of their current settings. To do this, put the target value in one operand and a mask of the bits you want to clear in the other. The bits of the mask should be 0 for any bit positions you want to clear and 1 for any bit positions you want to remain unchanged.

Example 1

            mov     ax,035h         ; Load value                 00110101
            and     ax,0FBh         ; Mask off bit 2         AND 11111011
                                    ;                            --------
                                    ; Value is now 31h           00110001
            and     ax,0F8h         ; Mask off bits 2,1,0    AND 11111000
                                    ;                            --------
                                    ; Value is now 30h           00110000

Example 2

            mov     ah,7            ; Get character without echo
            int     21h
            and     al,11011111b    ; Convert to uppercase by clearing bit 5
            cmp     al,'Y'          ; Is it Y?
            je      yes             ; If so, do Yes stuff
            .                       ; else do No stuff
            .
    yes:    .

Example 2 illustrates how to use the AND instruction to convert a character to uppercase.
If the character is already uppercase, the AND instruction has no effect, since bit 5 is always clear in uppercase letters. If the character is lowercase, clearing bit 5 converts it to uppercase.

14.6.2 OR Operations

The OR instruction does an OR operation on the bits of the source and destination operands. The original destination operand is replaced by the resulting bits.

Syntax

    OR {register | memory},{register | memory | immediate}

The OR instruction can be used to set the value of specific bits regardless of their current settings. To do this, put the target value in one operand and a mask of the bits you want to set in the other. The bits of the mask should be 1 for any bit positions you want to set and 0 for any bit positions you want to remain unchanged.

Example

            mov     ax,035h         ; Move value to register     00110101
            or      ax,08h          ; Mask on bit 3           OR 00001000
                                    ;                            --------
                                    ; Value is now 3Dh           00111101
            or      ax,07h          ; Mask on bits 2,1,0      OR 00000111
                                    ;                            --------
                                    ; Value is now 3Fh           00111111

Another common use for OR is to compare an operand to 0:

            or      bx,bx           ; Compare to 0
                                    ;   2 bytes, 3 clocks on 8088
            jg      positive        ; BX is positive
            jl      negative        ; BX is negative
                                    ; BX is zero

The first statement has the same effect as the following statement, but is faster and smaller:

            cmp     bx,0            ; 3 bytes, 4 clocks on 8088

14.6.3 XOR Operations

The XOR (Exclusive OR) instruction does an XOR operation on the bits of the source and destination operands. The original destination operand is replaced by the resulting bits.

Syntax

    XOR {register | memory},{register | memory | immediate}

The XOR instruction can be used to toggle the value of specific bits (reverse them from their current settings). To do this, put the target value in one operand and a mask of the bits you want to toggle in the other. The bits of the mask should be 1 for any bit positions you want to toggle and 0 for any bit positions you want to remain unchanged.
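The three masking idioms in this section each reduce to one C operator. The helpers below are hypothetical names used only for illustration, and ascii_upper mirrors the uppercase conversion in the AND example.

```c
#include <assert.h>
#include <stdint.h>

/* AND: mask bits that are 0 clear the target; bits that are 1 keep it. */
static uint16_t clear_bits(uint16_t value, uint16_t keep_mask)
{
    return value & keep_mask;
}

/* OR: mask bits that are 1 set the target; bits that are 0 keep it. */
static uint16_t set_bits(uint16_t value, uint16_t mask)
{
    return value | mask;
}

/* XOR: mask bits that are 1 toggle the target; bits that are 0 keep it. */
static uint16_t toggle_bits(uint16_t value, uint16_t mask)
{
    return value ^ mask;
}

/* Clearing bit 5 (20h) turns an ASCII lowercase letter into uppercase
   and leaves an uppercase letter unchanged. */
static char ascii_upper(char c)
{
    assert((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'));
    return (char)(c & ~0x20);
}
```

For example, clear_bits(0x35, 0xFB) gives 0x31, set_bits(0x35, 0x08) gives 0x3D, and toggle_bits(0x3D, 0x07) gives 0x3A, matching the values in the examples. ascii_upper asserts that its argument is a letter because clearing bit 5 also changes other characters.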
Example

            mov     ax,035h         ; Move value to register     00110101
            xor     ax,08h          ; Toggle bit 3           XOR 00001000
                                    ;                            --------
                                    ; Value is now 3Dh           00111101
            xor     ax,07h          ; Toggle bits 2,1,0      XOR 00000111
                                    ;                            --------
                                    ; Value is now 3Ah           00111010

Another common use for the XOR instruction is to set a register to 0:

            xor     cx,cx           ; 2 bytes, 3 clocks on 8088

This sets the CX register to 0. When the XOR instruction takes identical operands, each bit cancels itself, producing 0. The statement

            mov     cx,0            ; 3 bytes, 4 clocks on 8088

is the obvious way of doing this, but it is larger and slower. The statement

            sub     cx,cx           ; 2 bytes, 3 clocks on 8088

is also smaller and faster than the MOV version. The only advantage of using MOV is that it does not affect any flags.

14.6.4 NOT Operations

The NOT instruction does a NOT operation on the bits of a single operand. It is used to toggle the value of all bits at once.

Syntax

    NOT {register | memory}

The NOT instruction is often used to reverse the sense of a bit mask from masking certain bits on to masking them off. Use the NOT instruction if the value of the mask is not known until run time; use the NOT operator (see Section 9.2.1.5, "Bitwise Logical Operators") if the mask is a constant.

Example

            .DATA
    masker  DB      00010000b       ; Value may change at run time
            .CODE
            .
            .
            .
            mov     ax,0D743h       ; Load 0D7h to AH, 43h to AL 01000011
            or      al,masker       ; Turn on bit 4 in AL     OR 00010000
                                    ;                            --------
                                    ; Result is 53h              01010011

            not     masker          ; Reverse sense of mask      11101111
            and     ah,masker       ; Turn off bit 4 in AH   AND 11010111
                                    ;                            --------
                                    ; Result is 0C7h             11000111

14.7 Shifting and Rotating Bits

The 8086-family processors provide a complete set of instructions for shifting and rotating bits. Bits can be moved right (toward the 0 bit) or left (toward the most-significant bits). Values shifted off the end of the operand go into the carry flag.

Shift instructions move bits a specified number of places to the right or left.
The last bit in the direction of the shift goes into the carry flag, and the bit position vacated at the other end is filled with 0 or, for SAR, with the previous value of the sign bit.

Rotate instructions move bits a specified number of places to the right or left. For each bit rotated, the last bit in the direction of the rotate operation is moved into the first bit position at the other end of the operand. With some variations, the carry bit is used as an additional bit of the operand.

Figure 14.1 illustrates the eight variations of shift and rotate instructions for eight-bit operands. Notice that SHL and SAL are identical.

(Figure 14.1 can be found in Section 14.7 of the manual.)

Syntax

    SHL {register | memory},{CL | 1}
    SHR {register | memory},{CL | 1}
    SAL {register | memory},{CL | 1}
    SAR {register | memory},{CL | 1}
    ROL {register | memory},{CL | 1}
    ROR {register | memory},{CL | 1}
    RCL {register | memory},{CL | 1}
    RCR {register | memory},{CL | 1}

The format of all the shift and rotate instructions is the same. The destination operand should contain the value to be shifted. It will contain the shifted operand after the instruction. The source operand should contain the number of bits to shift or rotate. It can be the immediate value 1 or the CL register. No other value or register is accepted on the 8088 and 8086 processors.
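Because Figure 14.1 is not reproduced here, the following C sketch models the shift and rotate variations on an 8-bit operand, one bit at a time (SAL is the same as SHL, so it has no separate function). The function names are hypothetical; each returns the new operand value and stores the bit moved out in *cf, and rcl1 and rcr1 also move the old carry in, which is how the carry acts as a ninth bit. repeat_op applies an operation count times, as a count in CL does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One-bit models on an 8-bit operand; *cf receives the bit moved out. */
static uint8_t shl1(uint8_t v, bool *cf) { *cf = (v & 0x80) != 0; return (uint8_t)(v << 1); }
static uint8_t shr1(uint8_t v, bool *cf) { *cf = (v & 0x01) != 0; return (uint8_t)(v >> 1); }
static uint8_t sar1(uint8_t v, bool *cf) { *cf = (v & 0x01) != 0; return (uint8_t)((v >> 1) | (v & 0x80)); }
static uint8_t rol1(uint8_t v, bool *cf) { *cf = (v & 0x80) != 0; return (uint8_t)((v << 1) | (v >> 7)); }
static uint8_t ror1(uint8_t v, bool *cf) { *cf = (v & 0x01) != 0; return (uint8_t)((v >> 1) | (v << 7)); }

/* Rotates through carry: the old carry enters at the other end. */
static uint8_t rcl1(uint8_t v, bool *cf)
{
    bool old = *cf;
    *cf = (v & 0x80) != 0;
    return (uint8_t)((v << 1) | (old ? 0x01 : 0));
}

static uint8_t rcr1(uint8_t v, bool *cf)
{
    bool old = *cf;
    *cf = (v & 0x01) != 0;
    return (uint8_t)((v >> 1) | (old ? 0x80 : 0));
}

/* Apply a one-bit operation count times, as a count in CL does. */
static uint8_t repeat_op(uint8_t (*op)(uint8_t, bool *), uint8_t v,
                         unsigned count, bool *cf)
{
    assert(op != NULL);
    while (count-- > 0)
        v = op(v, cf);
    return v;
}
```

For example, repeat_op(rol1, 0x02, 2, &cf) returns 0x08, the same movement as rotating the mask 00000010b left by two in Section 14.7.3.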
80186/286/386 Only

Starting with the 80186 processor, eight-bit immediate values larger than 1 can be given as the source operand for shift or rotate instructions, as shown below:

            shr     bx,4            ; 9 clocks, 3 bytes on 80286

The following statements are equivalent if the program must run on the 8088 or 8086:

            mov     cl,4            ;  2 clocks, 2 bytes on 80286
            shr     bx,cl           ;  9 clocks, 2 bytes on 80286
                                    ; 11 clocks, 4 bytes

14.7.1 Multiplying and Dividing by Constants

Shifting right by one has the effect of dividing by two; shifting left by one has the effect of multiplying by two. You can take advantage of this to do fast multiplication and division by common constants. The easiest constants are the powers of two. Shifting left twice multiplies by four, shifting left three times multiplies by eight, and so on.

SHR is used to divide unsigned numbers. SAR can be used to divide signed numbers, but SAR rounds negative quotients down (toward negative infinity), while IDIV truncates toward zero, which rounds negative quotients up. For example, shifting -7 right by one with SAR gives -4, but dividing -7 by 2 with IDIV gives -3. Code that divides by using SAR must adjust for this difference.

Multiplication by shifting is the same for signed and unsigned numbers, so either SAL or SHL can be used. Both instructions do the same operation.

Since the multiply and divide instructions are the slowest on the 8088 and 8086 processors, using shifts instead can often speed operations by a factor of 10 or more. For example, on the 8088 or 8086 processor, the following statements take five clocks:

            xor     ah,ah           ; Clear AH
            shl     ax,1            ; Multiply byte in AL by 2

The following statements have the same effect, but take between 74 and 81 clocks on the 8088 or 8086:

            mov     bl,2            ; Multiply byte in AL by 2
            mul     bl

The same statements take 15 clocks on the 80286. See the on-line Help system for complete information on timing of instructions.

Shift instructions can be combined with add or subtract instructions to do multiplication by common constants. These operations are best put in macros so that they can be changed if the constants in a program change.
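The arithmetic behind both points can be checked in C. In the sketch below, mul10_by_shifts (a hypothetical name) uses the same decomposition as the mul_10 macro that follows, and sar16 (also hypothetical) models SAR on a 16-bit pattern without relying on how a C compiler shifts negative numbers. C's / operator on signed integers truncates toward zero, as IDIV does.

```c
#include <assert.h>
#include <stdint.h>

/* x * 10 == (x << 3) + (x << 1), computed modulo 2^16 like a 16-bit register. */
static uint16_t mul10_by_shifts(uint16_t x)
{
    uint16_t times2 = (uint16_t)(x << 1);
    uint16_t times8 = (uint16_t)(x << 3);
    return (uint16_t)(times8 + times2);
}

/* SAR on a 16-bit pattern: shift right n places and copy the sign bit into
   the vacated positions. Working on unsigned values avoids depending on
   how a C compiler shifts negative numbers. */
static uint16_t sar16(uint16_t v, unsigned n)
{
    assert(n < 16);
    uint16_t fill = (v & 0x8000u) ? (uint16_t)~(0xFFFFu >> n) : 0;
    return (uint16_t)((v >> n) | fill);
}
```

For instance, sar16 applied to 0xFFF9 (the 16-bit pattern for -7) with a count of 1 returns 0xFFFC, the pattern for -4, while -7 / 2 in C evaluates to -3.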
Example 1

    mul_10  MACRO   factor          ; Factor must be unsigned
            mov     ax,factor       ; Load into AX
            shl     ax,1            ; AX = factor * 2
            mov     bx,ax           ; Save copy in BX
            shl     ax,1            ; AX = factor * 4
            shl     ax,1            ; AX = factor * 8
            add     ax,bx           ; AX = (factor * 8) + (factor * 2)
            ENDM                    ; AX = factor * 10

Example 2

    div_u512 MACRO  dividend        ; Dividend must be unsigned
            mov     ax,dividend     ; Load into AX
            shr     ax,1            ; AX = dividend / 2 (unsigned)
            xchg    al,ah           ; XCHG is like rotate right 8
                                    ; AL = (dividend / 2) / 256
            cbw                     ; Clear upper byte (AL is at most 7Fh,
                                    ;   so CBW fills AH with 0)
            ENDM                    ; AX = dividend / 512

14.7.2 Moving Bits to the Least-Significant Position

Sometimes a group of bits within an operand needs to be treated as a single unit (for example, to do an arithmetic operation on those bits without affecting other bits). This can be done by masking off the bits and then shifting them into the least-significant positions. After the arithmetic operation is done, the bits are shifted back to the original position and merged with the original bits by using OR. See Section 7.2.5, "Using Record-Field Operands," for an example of this operation.

14.7.3 Adjusting Masks

Masks for logical instructions can be shifted to new bit positions. For example, an operand that masks off a bit or group of bits can be shifted to move the mask to a different position.

Example

            .DATA
    masker  DB      00000010b       ; Mask that may change at run time
            .CODE
            .
            .
            .
            mov     cl,2            ; Rotate two at a time
            mov     bl,57h          ; Load value to be changed   01010111b
            rol     masker,cl       ; Rotate two to left         00001000b
            or      bl,masker       ; Turn on masked values      ---------
                                    ; New value is 05Fh          01011111b

            rol     masker,cl       ; Rotate two more            00100000b
            or      bl,masker       ; Turn on masked values      ---------
                                    ; New value is 07Fh          01111111b

This technique is useful only if the mask value is unknown until run time.

14.7.4 Shifting Multiword Values

Sometimes it is necessary to shift a value that is too large to fit in a register. In this case, you can shift each part separately, passing the shifted bits through the carry flag.
The RCR or RCL instructions must be used to move the carry value from the first register to the second.

RCR and RCL can also be used to initialize the high or low bit of an operand. The carry flag is treated as part of the operand, as if a byte operand had nine bits or a word operand had 17 bits, so the flag value before the operation is crucial. The carry flag may be set by a previous instruction, or you can set it directly using the CLC (Clear Carry Flag), CMC (Complement Carry Flag), and STC (Set Carry Flag) instructions.

Example

            .DATA
    mem32   DD      500000
            .CODE
            .
            .                               ; Divide 32-bit unsigned by 16
            .
            mov     cx,4                    ; Shift right 4         500000
    again:  shr     WORD PTR mem32[2],1     ; Shift into carry      DIV 16
            rcr     WORD PTR mem32[0],1     ; Rotate carry in       ------
            loop    again                   ;                        31250

────────────────────────────────────────────────────────────────────────────
Chapter 15: Controlling Program Flow

The 8086-family processors provide a variety of instructions for controlling the flow of a program. The four major types of program-flow instructions are jumps, loops, procedure calls, and interrupts.

This chapter tells you how to use these instructions and how to test conditions for the instructions that change program flow conditionally.

15.1 Jumping

Jumps are the most direct method of changing program control from one location to another. Internally, jumps work by changing the value of the IP (Instruction Pointer) register from the address of the current instruction to a target address.

Jumps can be short, near, or far. QuickAssembler automatically handles near and short jumps, although it may not always generate the most efficient code if the label being jumped to is a forward reference. The size and control of jumps are discussed in Section 9.4.1, "Forward References to Labels."

15.1.1 Jumping Unconditionally

The JMP instruction is used to jump unconditionally to a specified address.

Syntax

    JMP {register | memory}

The operand should contain the address to be jumped to.
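An indirect JMP through a table of addresses, as in Example 2 of this section, behaves much like dispatching through an array of function pointers in C. C has no standard indirect goto, so the sketch below makes an indirect call instead of a jump. That makes it closer to the call tables mentioned in Section 15.3.1. The handler names are hypothetical stand-ins for the extended, ctrla, and ctrlb routines.

```c
#include <assert.h>

/* Illustrative model only; handler names are hypothetical stand-ins
   for the extended, ctrla, and ctrlb routines. Each returns a code so
   the dispatch can be checked. */
static int on_extended(void) { return 0; }
static int on_ctrla(void)    { return 1; }
static int on_ctrlb(void)    { return 2; }

/* Table of routine addresses indexed by key code, like ctl_tbl. */
static int (*const ctl_tbl[])(void) = { on_extended, on_ctrla, on_ctrlb };

/* Dispatch on a key code. The array subscript does the scaling that
   "shl bx,1" does in the assembly version (two bytes per entry).
   Unlike the assembly example, this checks the code against the
   table size and returns -1 for keys without an entry. */
static int dispatch(unsigned key)
{
    if (key >= sizeof ctl_tbl / sizeof ctl_tbl[0])
        return -1;
    return ctl_tbl[key]();              /* indirect call through table */
}
```

The bounds check is an addition for the sketch: the assembly example assumes that every key code it reads has an entry in the table.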
Unlike conditional jumps, whose target address must be short (within 128 bytes), the target address for unconditional jumps can be short, near, or far. See Section 9.4.1 for more information on specifying the distance of jumps.

If a conditional jump must reach a target more than 128 bytes away, the construction must be reorganized. This can be done by reversing the sense of the conditional jump and adding an unconditional jump, as shown in Example 1.

Example 1

            cmp     ax,7            ; If AX is 7 and jump is short
            je      close           ;   then jump close
            cmp     ax,6            ; If AX is 6 and jump is near
            jne     close           ;   then test opposite and skip over
            jmp     distant         ; Now jump
            .
            .
            .
    close:                          ; Less than 128 bytes from jump
            .
            .
            .
    distant:                        ; More than 128 bytes from jump

An unconditional jump can be used as a form of conditional jump by specifying the address in a register or indirect memory operand. The value of the operand can be calculated at run time, based on user interaction or other factors. You can use indirect memory operands to construct jump tables that work like C switch statements, BASIC ON GOTO statements, or Pascal case statements.

Example 2

            .CODE
            .
            .
            .
            jmp     process         ; Jump over data
    ctl_tbl LABEL   WORD            ; (required in overlay procedures)
            DW      extended        ; Null key (extended code)
            DW      ctrla           ; Address of CONTROL-A key routine
            DW      ctrlb           ; Address of CONTROL-B key routine

    process:
            mov     ah,8h           ; Get a key
            int     21h
            cbw                     ; Convert AL to AX
            mov     bx,ax           ; Copy
            shl     bx,1            ; Convert key code to table offset
            jmp     ctl_tbl[bx]     ; Jump to key routine

    extended:
            mov     ah,8h           ; Get second key of extended
            int     21h
            .                       ; Use another jump table
            .                       ;   for extended keys
            .
    ctrla:  .                       ; CONTROL-A routine here
            .
            .
            jmp     next
    ctrlb:  .                       ; CONTROL-B routine here
            .
            .
            jmp     next
            .
            .
    next:   .                       ; Continue

In Example 2, an indirect memory operand points to addresses of routines for handling different keystrokes. Notice that the jump table is placed in the code segment.
Placing the table in the code segment is optional in stand-alone assembler programs, but it may be required for procedures called from some languages.

15.1.2 Jumping Conditionally

The most common way of transferring control in assembly language is with conditional jumps. This is a two-step process: first test the condition, and then jump if the condition is true or continue if it is false.

Syntax

    Jcondition label

Conditional-jump instructions take a single operand containing the address to be jumped to. The distance from the jump instruction to the specified address must be short (less than 128 bytes). If a longer distance is specified, an error will be generated telling the distance of the jump in bytes. See Section 15.1.1, "Jumping Unconditionally," for information on arranging longer conditional jumps.

Conditional-jump instructions (except JCXZ) use the status of one or more flags as their condition. Thus, any statement that sets a flag under specified conditions can be the test statement. The most common test statements use the CMP or TEST instructions. The jump statement can be any one of 31 conditional-jump instructions.

Because conditional jumps cannot refer to labels more than 128 bytes away, they are often used in combination with unconditional jumps, which have no such limitation. For example, the following statement is valid as long as target is not far away:

            jz      target          ; If previous operation resulted in
                                    ;   zero, jump to target

Once target becomes too distant, the following sequence must be used to enable a longer jump. Note that this sequence is logically equivalent to the example above:

            jnz     skip            ; If previous operation resulted in
                                    ;   NOT zero, jump to "skip"
            jmp     target          ; Otherwise, jump to target
    skip:

The instructions above first test for the logical inverse of the desired condition. If the desired condition (a zero result) is not true, JNZ jumps to skip, so the jump to target is avoided.
If the result was zero, the program falls through to the instruction jmp target, which can jump any distance. The effect, of course, is to jump to target if the previous operation resulted in zero.

The problem with this technique is that, if you use it often, you may have to think up a label name just to jump around one instruction. Anonymous labels, described in Section 6.4.2, let you avoid having to invent so many label names. For example, you could use an anonymous label to rewrite the example above:

            jnz     @F              ; If previous operation resulted in NOT
                                    ;   zero, jump forward to next @@ label
            jmp     target          ; Otherwise, jump to target
    @@:

15.1.2.1 Comparing and Jumping

The CMP instruction is specifically designed to test for conditional jumps. It does not change the destination operand, so it can be used to compare two values without changing either of them. Instructions that change operands (such as SUB or AND) can also be used to test conditions.

The CMP instruction compares two operands and sets flags based on the result. It is used to test the following relationships: equal; not equal; greater than; less than; greater than or equal; or less than or equal.

Syntax

    CMP {register | memory},{register | memory | immediate}

The destination operand can be memory or register. The source operand can be immediate, memory, or register. However, they cannot both be memory operands.

The jump instructions that can be used with CMP are made up of mnemonic letters combined to indicate the type of jump. The letters are shown below:

    Letter      Meaning
    ──────────────────────────────────────────────────
    J           Jump
    G           Greater than (for signed comparisons)
    L           Less than (for signed comparisons)
    A           Above (for unsigned comparisons)
    B           Below (for unsigned comparisons)
    E           Equal
    N           Not

The mnemonic names always refer to the relationship that the first operand of the CMP instruction has to the second operand of the CMP instruction.
For instance, JG tests whether the first operand is greater than the second. Several conditional instructions have two names. You can use whichever name seems more mnemonic in context.

Comparisons and conditional jumps can be thought of as statements in the following format:

    IF (value1 relationship value2) THEN GOTO truelabel:

Statements of this type can be coded in assembly language by using the following syntax:

            CMP     value1,value2
            Jrelationship truelabel
            .
            .
            .
    truelabel:

Table 15.1 lists conditional-jump instructions for each relationship and shows the flags that are tested in order to see if the relationship is true.

Table 15.1 Conditional-Jump Instructions Used after Compare

    Jump Condition        Signed       Jump if:             Unsigned     Jump if:
                          Compare                           Compare
    ─────────────────────────────────────────────────────────────────────────────────────
    =   Equal             JE           ZF = 1               JE           ZF = 1
    ≠   Not equal         JNE          ZF = 0               JNE          ZF = 0
    >   Greater than      JG or JNLE   ZF = 0 and SF = OF   JA or JNBE   CF = 0 and ZF = 0
    <=  Less than or      JLE or JNG   ZF = 1 or SF ≠ OF    JBE or JNA   CF = 1 or ZF = 1
        equal
    <   Less than         JL or JNGE   SF ≠ OF              JB or JNAE   CF = 1
    >=  Greater than      JGE or JNL   SF = OF              JAE or JNB   CF = 0
        or equal
    ─────────────────────────────────────────────────────────────────────────────────────

Internally, the CMP instruction is exactly the same as the SUB instruction, except that the destination operand is not changed. The flags are set according to the result that would have been generated by a subtraction.

Example 1

    ; If CX is less than -20, then make DX 30, else make DX 20

            cmp     cx,-20          ; If signed CX is smaller than -20
            jl      less            ;   then do stuff at "less"
            mov     dx,20           ; Else set DX to 20
            jmp     skip            ; Finished
    less:   mov     dx,30           ; Then set DX to 30
    skip:

Example 1 shows the basic form of conditional jumps. Notice that in assembly language, if-then-else constructions are usually written in the form if-else-then.

This theme has many variations. For example, you may find it more mnemonic to code in the if-then-else format.
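The flag conditions in Table 15.1 can be checked with a small C model of CMP on 16-bit operands. In this sketch, the struct and function names are hypothetical:

- cmp16 computes the subtraction and the four flags that the table uses.
- The jump functions restate the table's conditions in terms of those flags.

Comparing the jump functions with C's signed and unsigned comparisons shows why G and L are used for signed values, and A and B for unsigned ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model only; names are hypothetical. CMP a,b computes
   a - b like SUB and keeps only the flags. */
struct flags { bool zf, sf, cf, of; };

static struct flags cmp16(uint16_t a, uint16_t b)
{
    uint16_t r = (uint16_t)(a - b);
    struct flags f;
    f.zf = (r == 0);
    f.sf = (r & 0x8000u) != 0;
    f.cf = a < b;                                 /* unsigned borrow */
    f.of = ((a ^ b) & (a ^ r) & 0x8000u) != 0;    /* signed overflow */
    return f;
}

/* Conditions from Table 15.1, expressed in flags only. */
static bool jne(struct flags f) { return !f.zf; }
static bool jg(struct flags f)  { return !f.zf && f.sf == f.of; }
static bool jle(struct flags f) { return f.zf || f.sf != f.of; }
static bool jl(struct flags f)  { return f.sf != f.of; }
static bool jge(struct flags f) { return f.sf == f.of; }
static bool ja(struct flags f)  { return !f.cf && !f.zf; }
static bool jbe(struct flags f) { return f.cf || f.zf; }
static bool jb(struct flags f)  { return f.cf; }
static bool jae(struct flags f) { return !f.cf; }
```

For example, comparing 1 with 0FFFFh sets CF, because 1 is below 0FFFFh when both are unsigned. It leaves SF equal to OF, because 1 is greater than -1 when both are signed. So JB jumps but JL does not.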
When you code in the if-then-else format, however, you must use the opposite jump condition, as shown in Example 2.

Example 2

    ; If CX is less than -20, then make DX 30, else make DX 20

            cmp     cx,-20          ; If signed CX is smaller than -20
            jnl     notless         ;   else do stuff at "notless"
            mov     dx,30           ; Then set DX to 30
            jmp     continue        ; Finished
    notless:
            mov     dx,20           ; Else set DX to 20
    continue:

The then-if-else format shown in Example 3 is often more efficient. Do the work for the most likely case, and then compare for the opposite condition. If the condition is true, you are finished.

Example 3

    ; DX is 20, unless CX is less than -20, then make DX 30

            mov     dx,20           ; DX is 20
            cmp     cx,-20          ; If signed CX is greater than
            jge     greatequ        ;   or equal to -20, then done
            mov     dx,30           ; Else set DX to 30
    greatequ:

This example avoids the unconditional jump used in Examples 1 and 2 and thus is faster even if the less likely condition is true.

15.1.2.2 Jumping Based on Flag Status

The CMP instruction is the most mnemonic way to set the flags for conditional jumps, but any instruction that changes flags can be used as the test condition. The conditional-jump instructions listed below enable you to jump based on the condition of flags rather than on relationships of operands. Some of these instructions have the same effect as instructions listed in Table 15.1.
    Instruction     Action
    ──────────────────────────────────────────────────────────────
    JO              Jumps if the overflow flag is set
    JNO             Jumps if the overflow flag is clear
    JC              Jumps if the carry flag is set (same as JB)
    JNC             Jumps if the carry flag is clear (same as JAE)
    JZ              Jumps if the zero flag is set (same as JE)
    JNZ             Jumps if the zero flag is clear (same as JNE)
    JS              Jumps if the sign flag is set
    JNS             Jumps if the sign flag is clear
    JP              Jumps if the parity flag is set
    JNP             Jumps if the parity flag is clear
    JPE             Jumps if parity is even (parity flag set)
    JPO             Jumps if parity is odd (parity flag clear)
    JCXZ            Jumps if CX is 0

Notice that JCXZ is the only conditional jump based on the condition of a register (CX) rather than flags. Since JCXZ is usually used with loop instructions, it is discussed in more detail in Section 15.2, "Looping."

Example 1

            add     ax,bx           ; Add two values
            jo      overflow        ; If signed result overflows, adjust
            .
            .
            .
    overflow:                       ; Adjustment routine here

Example 2

            sub     ax,dx           ; Subtract
            jnz     skip            ; If the result is not zero, continue
            call    zhandler        ;   else do special case
    skip:

15.1.2.3 Testing Bits and Jumping

Like the CMP instruction, the TEST instruction is designed to test for conditional jumps. However, specific bits are compared rather than entire operands.

Syntax

    TEST {register | memory},{register | memory | immediate}

The destination operand can be memory or register. The source operand can be immediate, memory, or register. However, they cannot both be memory operands.

Normally, one of the operands is a mask in which the bits to be tested are the only bits set. The other operand contains the value to be tested. If all the bits set in the mask are clear in the operand being tested, the zero flag will be set. If any of the bits set in the mask are also set in the operand, the zero flag will be cleared.

The TEST instruction is actually the same as the AND instruction, except that neither operand is changed.
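The following C sketch models the TEST instruction and the parity flag tested by JP/JPE and JNP/JPO. It is an illustration with hypothetical function names:

- test_zf returns the zero flag that TEST would set.
- parity_flag counts the 1 bits in the low byte of a result. The low byte is the only part of the result that the 8086 parity flag reflects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model only; function names are hypothetical. */

/* TEST value,mask: AND the operands, set ZF if the result is 0,
   and store nothing. Returns the resulting ZF. */
static bool test_zf(uint8_t value, uint8_t mask)
{
    return (value & mask) == 0;
}

/* PF after an operation: set when the low byte of the result has an
   even number of 1 bits (the condition that JP/JPE jumps on). */
static bool parity_flag(uint16_t result)
{
    unsigned ones = 0;
    for (uint8_t b = (uint8_t)result; b != 0; b = (uint8_t)(b & (b - 1)))
        ones++;                         /* each pass clears the lowest 1 bit */
    return ones % 2 == 0;
}
```

The example that follows uses two values:

- test_zf(0D3h, 14h) is false. Bit 4 is set, so JZ is not taken and taska is called.
- test_zf(0E9h, 14h) is true. Both bits are clear, so JNZ is not taken and taskb is called.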
If the result of the AND operation is 0, the zero flag is set, but the 0 is not actually written to the destination operand. You can use the JZ and JNZ instructions to jump after the test. JE and JNE are the same and can be used if you find them more mnemonic.

Example

            .DATA
    bits    DB      ?
            .CODE
            .
            .
            .
    ; If bit 2 or bit 4 is set, then call taska
    ; Assume "bits" is 0D3h                               11010011
            test    bits,10100b ; If 2 or 4 is set    AND 00010100
            jz      skip1       ; Else continue           --------
            call    taska       ; Then call taska         00010000
    skip1:                      ; Jump not taken
            .
            .
            .
    ; If bits 2 and 4 are clear, then call taskb
    ; Assume "bits" is 0E9h                               11101001
            test    bits,10100b ; If 2 and 4 are clear AND 00010100
            jnz     skip2       ; Else continue           --------
            call    taskb       ; Then call taskb         00000000
    skip2:                      ; Jump not taken

15.2 Looping

The 8086-family processors have several instructions specifically designed for creating loops of repeated instructions. In addition, you can create loops using conditional jumps.

Syntax

    LOOP label
    LOOPE label
    LOOPZ label
    LOOPNE label
    LOOPNZ label
    JCXZ label

The LOOP instruction is used for loops with a set number of iterations. For example, it can be used in constructions similar to the "for" loops of BASIC, C, and Pascal, and the "do" loops of FORTRAN. A single operand specifies the address to jump to each time through the loop. The CX register is used as a counter for the number of times to loop. Each time LOOP executes, it decrements CX. If CX is not 0, LOOP jumps back to the label; when CX reaches 0, control passes to the instruction after the loop.

The LOOPE, LOOPZ, LOOPNE, and LOOPNZ instructions are used in loops that check for a condition. For example, they can be used in constructions similar to the "while" loops of BASIC, C, and Pascal; the "repeat" loops of Pascal; and the "do" loops of C. The LOOPE (also called LOOPZ) instruction can be thought of as meaning "loop while equal." Similarly, the LOOPNE (also called LOOPNZ) instruction can be thought of as meaning "loop while not equal."
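How CX behaves in these loops can be sketched in C with a 16-bit counter. The functions below are hypothetical models, not QuickAssembler code:

- loop_iterations counts how often a LOOP body runs for a starting CX, showing the wraparound when CX is 0.
- guarded_iterations adds a JCXZ-style check in front of the loop.
- loope_compares stops like LOOPE when either CX reaches 0 or a comparison finds a mismatch.

As with the real instructions, decrementing CX does not affect the zero flag; only the comparison sets it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model only; function names are hypothetical. */

/* How many times a LOOP body runs when CX starts at cx. The body runs
   first; LOOP then decrements CX and jumps back unless CX is 0.
   With CX = 0 the first decrement wraps to 0FFFFh. */
static uint32_t loop_iterations(uint16_t cx)
{
    uint32_t runs = 0;
    do {
        runs++;                         /* loop body                 */
        cx = (uint16_t)(cx - 1);        /* LOOP: decrement CX ...    */
    } while (cx != 0);                  /* ... jump back if CX != 0  */
    return runs;
}

/* The same loop preceded by JCXZ, which skips it when CX is 0. */
static uint32_t guarded_iterations(uint16_t cx)
{
    if (cx == 0)                        /* jcxz done */
        return 0;
    return loop_iterations(cx);
}

/* LOOPE-style scan (assumes cx >= 1): compare a[i] with b[i]; LOOPE
   decrements CX and jumps back only if CX != 0 and ZF is set (equal).
   Returns the number of elements compared. */
static uint16_t loope_compares(const uint8_t *a, const uint8_t *b, uint16_t cx)
{
    uint16_t i = 0;
    bool zf;
    do {
        zf = (a[i] == b[i]);            /* cmp: ZF set if equal          */
        i++;
        cx = (uint16_t)(cx - 1);        /* LOOPE: decrement, flags kept  */
    } while (cx != 0 && zf);
    return i;
}
```

loop_iterations(0) returns 65,536. That is why a count that might be 0 should be checked with JCXZ before the loop is entered.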
For LOOPE, LOOPZ, LOOPNE, and LOOPNZ, a single short label operand specifies the address to loop to each time through. The CX register can specify a maximum number of times to go through the loop: these instructions decrement CX and stop looping when CX reaches 0 or when the zero-flag condition fails. If you do not want a maximum count, set CX to a value larger than the loop can ever need.

The JCXZ instruction is often used in loop structures. For example, it may be used in loops that check a condition at the start of the loop rather than at the end. Unlike the LOOP instruction, JCXZ does not decrement CX, so the programmer must use another statement to decrement the count. You can also use JCXZ with string instructions, as described in Chapter 16, "Processing Strings."

Example 1

    ; Do task 200 times

            mov     cx,200          ; Set counter
    next:   .                       ; Do the task here
            .
            .
            loop    next            ; Do again
                                    ; Continue after loop

This loop has the same effect as the following statements:

    ; Do task 200 times

            mov     cx,200          ; Set counter
    next:   .
            .                       ; Do the task here
            .
            dec     cx
            cmp     cx,0
            jne     next            ; Do again
                                    ; Continue after loop

The first version is more efficient as well as easier to understand. However, there are situations in which you must use conditional-jump instructions rather than loop instructions. For example, conditional jumps are often required for loops that test several conditions.

If the counter in CX is variable because of previous instructions, you should use the JCXZ instruction to check for 0, as shown in Example 2. Otherwise, if CX is 0, it will be decremented to -1 (0FFFFh) in the first iteration, and the loop will continue through 65,535 more iterations before CX reaches 0 again.

Example 2

    ; Do task CX times (CX counter set previously)

            jcxz    done            ; Check for 0
    next:   .                       ; Do the task here
            .
            .
            loop    next            ; Do again
    done:                           ; Continue after loop

15.3 Using Procedures

A "procedure" is a program subdivision that typically executes a specific task. Once you write a procedure, you can execute it from anywhere in the program. This technique lets you avoid writing the same block of code over and over, thus saving space.
Even if you execute it only once, writing a procedure can be a useful way of dividing a large program into manageable units. You can place a procedure in its own source module and test it separately.

Assembly-language procedures are comparable to functions in C; subprograms, functions, and subroutines in BASIC; procedures and functions in Pascal; or subroutines and functions in FORTRAN.

Two instructions control the use of assembly-language procedures. The CALL instruction can appear anywhere in a program. It temporarily transfers program control to a specified procedure. The RET instruction appears at the end of a procedure. It returns control to the location that issued the call.

These instructions use the stack to properly return from each call. The address of the instruction immediately following the CALL instruction is called the "return address," and the procedure should return to this location when done. CALL pushes the return address onto the stack; RET pops this address off the stack and transfers program control there.

Along with the RET instruction (which terminates a procedure), two directives help define a procedure. The PROC and ENDP directives normally mark the beginning and end of a procedure definition, as described in Section 15.3.2, "Defining Procedures." In addition, the PROC directive can save you time and effort by automating the following tasks:

  ■ Preserving register values that should not change, but that the procedure might otherwise alter

  ■ Setting up a frame pointer, so that you can access parameters placed on the stack

  ■ Creating text macros, so that your source code can refer to each parameter by a meaningful name

Section 15.3.4, "Declaring Parameters with the PROC Directive," describes how to use these features. Section 15.3.3, "Passing Arguments on the Stack," gives background information on the technique for accessing parameters.

When you write procedures, you can create local variables, which exist only during execution of the procedure.
The advantage of these variables is that they use memory dynamically, taking up stack space only while the procedure that uses them is running. Section 15.3.5, "Using Local Variables," describes the basic technique for allocating and accessing local variables. Section 15.3.6, "Creating Locals Automatically," describes how to make the assembler generate the necessary code for you.

15.3.1 Calling Procedures

The CALL instruction saves the address following the instruction on the stack and passes control to a specified address.

Syntax

    CALL {register | memory}

The address is usually specified as a direct memory operand. However, the operand can also be a register or indirect memory operand containing a value calculated at run time. This enables you to write call tables similar to the jump table illustrated in Section 15.1.1, "Jumping Unconditionally."

Calls can be near or far. Near calls push only the offset of the return address. Far calls push both the segment and the offset. You must give the type of far calls to forward-referenced labels using the FAR type specifier and the PTR operator. For example, use the following statement to make a far call to a label that has not yet been defined or declared external in the source code:

            call    FAR PTR task

15.3.2 Defining Procedures

Procedures are defined by labeling the start of the procedure and placing an ENDP directive at the end. The code should not fall through past the end of the procedure. Exit the procedure with a RET, RETF, RETN, or IRET instruction. There are several variations of this syntax.

Syntax 1

    label PROC [[NEAR|FAR]]
    RET [[constant]]
    label ENDP

Procedures are normally defined by using the PROC directive at the start of the procedure and the ENDP directive at the end. The RET instruction is normally placed immediately before the ENDP directive. The size of the RET instruction automatically matches the size defined by the PROC directive. The syntax shown is always available.
In addition, there is an extended PROC syntax available if you use .MODEL and specify a language. The extended PROC syntax is explained in Section 15.3.4, "Declaring Parameters with the PROC Directive." These language features automate many of the details of accessing parameters and saving registers.

Syntax 2

    label:
    statements
    RETN [[constant]]

Syntax 3

    label LABEL FAR
    statements
    RETF [[constant]]

The RET instruction can be extended to RETN (Return Near) or RETF (Return Far) to override the default size. This enables you to define and use procedures without the PROC and ENDP directives, as shown in Syntax 2 and Syntax 3, above. However, with this method, the programmer is responsible for making sure the size of the CALL matches the size of the RET.

The RET instruction (and its RETF and RETN variations) allows a constant operand that specifies a number of bytes to be added to the value of the SP register after the return. This operand can be used to adjust for arguments passed to the procedure before the call, as shown in the example in Section 15.3.5, "Using Local Variables."

Example 1

            call    task            ; Call is near because procedure is near
            .                       ; Return comes to here
            .
            .
    task    PROC    NEAR            ; Define "task" to be near
            .
            .                       ; Instructions of "task" go here
            .
            ret                     ; Return to instruction after call
    task    ENDP                    ; End "task" definition

Example 1 shows the recommended way of making calls with QuickAssembler. Example 2 shows another method that programmers who are used to other assemblers may find more familiar.

Example 2

            call    NEAR PTR task   ; Call is declared near
            .                       ; Return comes to here
            .
            .
    task:                           ; Procedure begins with near label
            .
            .                       ; Instructions go here
            .
            retn                    ; Return declared near

This method gives more direct control over procedures, but the programmer must make sure that calls have the same size as corresponding returns.

For example, if a call is made with the statement

            call    NEAR PTR task

the assembler does a near call.
This means that one word (the offset of the return address) is pushed onto the stack. If the return is made with the statement

            retf

two words are popped off the stack. The first will be the offset, but the second, which RETF uses as the segment, will be whatever happened to be on the stack before the call. Not only will the popped value be meaningless, but the stack status will be incorrect, causing the program to fail.

15.3.3 Passing Arguments on the Stack

Procedure arguments can be passed in various ways. For example, values can be passed to a procedure in registers or in variables. However, the most common method of passing arguments is to use the stack. Microsoft languages have a specific convention for doing this.

This section describes how a procedure accesses the parameters passed to it on the stack. Each parameter is accessed as an offset from BP, and you must calculate this offset. However, if you use the PROC directive to declare parameters, the assembler calculates these offsets for you and lets you refer to parameters by name. The next section explains how to use PROC this way.

The arguments are pushed onto the stack before the call. After the call, the procedure retrieves and processes them. At the end of the procedure, the stack is adjusted to account for the arguments.

Although the same basic method is used for all Microsoft high-level languages, the details vary. For instance, in some languages, pointers to the arguments are passed to the procedure; in others, the arguments themselves are passed. The order in which arguments are passed (whether the first argument is pushed first or last) also varies according to the language. Finally, in some languages, the stack is adjusted by the RET instruction in the called procedure; in others, the code immediately following the CALL instruction adjusts the stack. See Appendix A, "Mixed-Language Mechanics," for details on calling conventions.
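The [bp+4], [bp+6], and [bp+8] offsets in the following example follow from what is on the stack when the procedure runs. The C sketch below simulates a small 16-bit stack to show this. The following details are arbitrary choices for illustration:

- the array size and the starting SP;
- the dummy return address and saved-BP values;
- the function names, which are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model only; sizes, dummy values, and names are
   arbitrary. Words live in an array; sp is a byte address that
   grows downward by 2 for each push. */
static uint16_t stack_mem[0x100];
static uint16_t sp = 0x200;

static void push(uint16_t v) { sp = (uint16_t)(sp - 2); stack_mem[sp / 2] = v; }
static uint16_t pop(void)    { uint16_t v = stack_mem[sp / 2]; sp = (uint16_t)(sp + 2); return v; }
static uint16_t word_at(uint16_t addr) { return stack_mem[addr / 2]; }

/* Model of the C-style call: the caller pushes the arguments last to
   first, the near CALL pushes a one-word return address, and addup
   saves BP, so the first argument sits at [BP+4]. The caller then
   removes the arguments with ADD SP,6. */
static uint16_t call_addup(uint16_t first, uint16_t second, uint16_t third)
{
    push(third);                        /* mov ax,10 / push ax        */
    push(second);                       /* push arg2                  */
    push(first);                        /* push cx                    */
    push(0xCAFE);                       /* call: dummy return address */
    push(0);                            /* push bp (dummy old BP)     */
    uint16_t bp = sp;                   /* mov bp,sp                  */
    uint16_t ax = word_at((uint16_t)(bp + 4));           /* first     */
    ax = (uint16_t)(ax + word_at((uint16_t)(bp + 6)));   /* second    */
    ax = (uint16_t)(ax + word_at((uint16_t)(bp + 8)));   /* third     */
    (void)pop();                        /* pop bp                     */
    (void)pop();                        /* ret: pop return address    */
    sp = (uint16_t)(sp + 6);            /* add sp,6 in the caller     */
    return ax;                          /* result in AX               */
}
```

After call_addup returns, sp is back at its starting value, because the ADD SP,6 after the call discards the three argument words.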
Example

    ; C-style procedure call and definition

            mov     ax,10           ; Load and
            push    ax              ;   push constant as third argument
            push    arg2            ; Push memory as second argument
            push    cx              ; Push register as first argument
            call    addup           ; Call the procedure
            add     sp,6            ; Remove the pushed arguments
            .                       ;   (equivalent to three pops)
            .
            .
    addup   PROC    NEAR            ; Return address for near call
                                    ;   takes two bytes
            push    bp              ; Save base pointer (two more bytes),
                                    ;   so arguments start at [bp+4]
            mov     bp,sp           ; Load stack into base pointer
            mov     ax,[bp+4]       ; Get first argument
                                    ;   (4 bytes above BP)
            add     ax,[bp+6]       ; Add second argument
                                    ;   (6 bytes above BP)
            add     ax,[bp+8]       ; Add third argument
                                    ;   (8 bytes above BP)
            pop     bp              ; Restore BP
            ret                     ; Return result in AX
    addup   ENDP

The example shows one method of passing arguments to a procedure. This method is similar to the way procedures are called in the C language. Figure 15.1 shows the stack condition at key points in the process.

┌────────────────────────────────────────────────────────────────────────┐
│ This figure can be found in Section 15.3.3 of the manual               │
└────────────────────────────────────────────────────────────────────────┘

──────────────────────────────────────────────────────────────────────────
NOTE

Arguments passed on the stack in assembler routines cannot be accessed by name in debugging commands, unless you declare parameters with the PROC directive, as explained in the next section.
──────────────────────────────────────────────────────────────────────────

15.3.4 Declaring Parameters with the PROC Directive

This section describes how to use the PROC directive in order to automate the parameter-accessing techniques described in the last section.

The PROC directive lets you specify registers to be saved, define arguments to the procedure, and set up text macros so that you can refer to parameters by name (rather than as an offset to BP).
For example, the following PROC directive could be placed at the beginning of a procedure called from BASIC that takes a single argument passed by value and that uses (and must save) the DI and SI registers:

    myproc  PROC    FAR BASIC USES DI SI, arg1:WORD

Note that you must use the .MODEL directive and specify a language in order to use the extended features of PROC, including the lang type, reglist, and arguments.

Syntax

    label PROC [[NEAR|FAR]] [[lang]] [[USES reglist]] [[arguments]]

The NEAR and FAR keywords indicate whether you invoke the procedure with a near call or a far call, as described in Section 15.3.2, "Defining Procedures." The following list describes the other parts of the PROC directive:

    Argument    Description
    ──────────────────────────────────────────────────────────────────────
    label       The name of the procedure. The assembler automatically adds
                an underscore to the beginning of the name if you specify C
                as the language in the .MODEL directive or if you specify C
                as the lang.

    lang        An optional language specifier that overrides language
                conventions specified by the .MODEL directive. The language
                type may be C, Pascal, FORTRAN, or BASIC. The language type
                determines the calling convention used to access parameters
                and restore the stack. It also determines whether an
                underscore is prefixed to the procedure name, as required by
                the C naming convention.

                Note that use of the C specifier does not preserve lowercase
                letters in the procedure name. To guarantee compatibility
                with C naming conventions, choose Preserve Case or Preserve
                Extrn from the Assembler Flags dialog box, or assemble with
                /Cl or /Cx from the QCL command line.

    reglist     A list of registers that the procedure uses and that should
                be saved on entry. Registers in the list must be separated
                by blanks or tabs. The assembler generates code to push
                these registers on the stack. When you exit, the assembler
                generates code to pop the saved register values off the
                stack.
    arguments   The list of arguments passed to the procedure on the stack.
                See the discussion below for the syntax of each argument.
                If there is a list of registers, the arguments are
                separated from the reglist by a comma.

Each argument has the following syntax:

    argname [[ :[[[[NEAR|FAR]]PTR]]type]]

If you have more than one argument, separate each by a comma.

The argname is the name of the argument. The type is the type of the argument and may be WORD, DWORD, QWORD, TBYTE, or the name of a structure defined by a STRUC structure declaration (see Chapter 6, "Defining Labels, Constants, and Variables," for more information about types).

The FAR, NEAR, PTR, and type arguments are all optional. If you omit all of them, the assembler assumes the variable is a WORD type. If you use only the type argument, the assembler assumes the variable has the indicated type.

──────────────────────────────────────────────────────────────────────────
Note

If you are writing a routine to be called from BASIC, FORTRAN, or Pascal, and the routine returns a function value, you must declare an additional parameter if you return anything other than a two- or four-byte integer. See Appendix A, "Mixed-Language Mechanics," for more information.
──────────────────────────────────────────────────────────────────────────

The PTR type generates debugging information so that the variable is treated as a pointer during debugging. The assembler assumes specific sizes for the variable, depending on the combination of NEAR, FAR, and PTR arguments you specify. The lines below show some example combinations of NEAR, FAR, PTR, and type:

    myproc  PROC    var1:PTR WORD, var2:PTR DWORD
            .
            .
            .
    myproc  ENDP

    proc2   PROC    var3:FAR PTR WORD, var4:NEAR PTR BYTE
            .
            .
            .
    proc2   ENDP

If you omit NEAR or FAR, the default data size established by .MODEL is used.
All PTR declarations are translated into a word-size variable if the data size is near or a doubleword variable if the data size is far. For example, the following declarations of procvar produce the same code for the variable name, although they generate different debugging information:

    aproc   PROC    procvar:PTR WORD
    aproc   PROC    procvar:PTR DWORD
    aproc   PROC    procvar:PTR BYTE

Specifying a particular type changes only the debugging information, not the code produced for accessing the argument.

If you specify a NEAR PTR or FAR PTR argument, as in the declarations of var3 and var4, the assembler ignores the memory model you selected and assigns a WORD type for a NEAR PTR argument and a DWORD type for a FAR PTR argument.

The assembler does not generate any code to get the value or values the pointer references; your program must still explicitly treat the argument as a pointer. For example, the procedure in Section 5.1 can be rewritten for use with BASIC so that it gets its arguments by near reference (the BASIC default):

    ; Call from BASIC as a FUNCTION returning an integer

            .MODEL  medium, basic
            .CODE
    myadd   PROC    arg1:NEAR PTR WORD, arg2:NEAR PTR WORD

            mov     bx,arg1         ; Load first argument
            mov     ax,[bx]
            mov     bx,arg2         ; Add second argument
            add     ax,[bx]

            ret

    myadd   ENDP
            END

In the example above, even though the arguments are declared as near pointers, you still need two instructions for each argument. The first loads the address of the argument into BX; the second uses BX to get the argument's value.

You can use conditional-assembly directives to make sure that your pointer arguments are loaded correctly for the memory model.
For example, the following version of myadd treats the arguments as far arguments if necessary:

            .MODEL  medium,c        ; Could be any model
            .CODE
    myadd   PROC    arg1:PTR WORD, arg2:PTR WORD

            IF      @DataSize
            les     bx,arg1         ; Far arguments
            mov     ax,es:[bx]
            les     bx,arg2
            add     ax,es:[bx]
            ELSE
            mov     bx,arg1         ; Near arguments
            mov     ax,[bx]
            mov     bx,arg2
            add     ax,[bx]
            ENDIF

            ret
    myadd   ENDP
            END

──────────────────────────────────────────────────────────────────────────
Note

When you use the high-level-language features and the assembler encounters a RET instruction, it automatically generates instructions to pop saved registers, remove local variables from the stack, and, if necessary, remove arguments. The assembler does not generate this code if you use a RETF or RETN instruction. It generates this code for each RET instruction it encounters. You can save code by having only one exit and jumping to it from various points.
──────────────────────────────────────────────────────────────────────────

15.3.5 Using Local Variables

In high-level languages, local variables are known only within a procedure. In Microsoft languages, these variables are usually stored on the stack. Assembly-language programs can use the same technique.

These variables should not be confused with labels or variable names that are local to a module, as described in Chapter 8, "Creating Programs from Multiple Modules."

──────────────────────────────────────────────────────────────────────────
NOTE

If your procedure has relatively few variables, you can usually write the most efficient code by placing these values in registers. Local (stack) data is efficient when you have a large amount of local data for the procedure.
──────────────────────────────────────────────────────────────────────────

This section outlines the standard methods for creating local variables. The next section shows how to use the LOCAL directive to make the assembler generate local variables for you automatically.
When you use this directive, the assembler generates the same instructions as those used in this section, but hides some of the details from you. If you want to use LOCAL right away, you may want to skip directly to the next section. However, this section gives useful background.

Local variables are created by reserving stack space for the variable at the start of the procedure. The variable can then be accessed by its position in the stack. At the end of the procedure, the stack pointer is restored, which releases the memory used by local variables.

Example

            push    ax              ; Push one argument
            call    task            ; Call
            .
            .
            .
    arg     EQU     <[bp+4]>        ; Name for argument
    loc     EQU     <WORD PTR [bp-2]> ; Name for local variable

    task    PROC    NEAR
            push    bp              ; Save base pointer
            mov     bp,sp           ; Load stack into base pointer
            sub     sp,2            ; Reserve two bytes for local variable
            .
            .
            .
            mov     loc,3           ; Initialize local variable
            add     ax,loc          ; Add local variable to AX
            sub     arg,ax          ; Subtract AX from argument
            .                       ; Use "loc" and "arg" in other operations
            .
            .
            mov     sp,bp           ; Adjust for stack variable
            pop     bp              ; Restore base
            ret     2               ; Return result in AX and pop
    task    ENDP                    ;   two bytes to adjust stack

In this example, two bytes are subtracted from the SP register to make room for a local word variable. This variable can then be accessed as [bp-2]. In the example, this location is given the name loc with a text equate. The equate includes WORD PTR so that an instruction such as mov loc,3, whose other operand is a constant, has a defined operand size.

Notice that the instruction mov sp,bp is given at the end to restore the original value of SP. The statement is only required if the value of SP is changed inside the procedure (usually by allocating local variables).

The RET 2 instruction removes the two-byte argument from the stack as the procedure returns. Contrast this to the example in Section 15.3.3, "Passing Arguments on the Stack," in which the calling code adjusts the stack for the argument.

Figure 15.2 shows the state of the stack at key points in the process.
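The C sketch below is a rough companion to Figure 15.2. It replays the same push, call, frame setup, and RET 2 sequence on a toy 16-bit stack, so you can check the offsets [bp+4] and [bp-2] and confirm that SP ends up where it started. It is a model written for illustration, not code the assembler produces. The names run_task, push, and pop, and the return address 1234H, are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the stack segment: 64 KB addressed by 16-bit offsets.
   SP grows downward and words are stored low byte first, as on the 8086. */
static uint8_t stack_mem[65536];
static uint16_t sp, bp;

static void write_word(uint16_t addr, uint16_t value) {
    stack_mem[addr] = (uint8_t)(value & 0xFF);
    stack_mem[(uint16_t)(addr + 1)] = (uint8_t)(value >> 8);
}

static uint16_t read_word(uint16_t addr) {
    return (uint16_t)(stack_mem[addr] | (stack_mem[(uint16_t)(addr + 1)] << 8));
}

static void push(uint16_t value) { sp -= 2; write_word(sp, value); }
static uint16_t pop(void) { uint16_t v = read_word(sp); sp += 2; return v; }

typedef struct {
    uint16_t ax;        /* AX when task returns                */
    uint16_t arg;       /* final contents of the argument slot */
    uint16_t sp_after;  /* SP after RET 2                      */
} TaskResult;

/* Replays the example. ax is AX when "mov loc,3" runs; arg is the value
   the caller pushed. 0x1234 is a made-up return address (hypothetical). */
TaskResult run_task(uint16_t initial_sp, uint16_t ax, uint16_t arg) {
    TaskResult r;
    assert(initial_sp >= 8);     /* room for arg, IP, BP and one local */
    sp = initial_sp;
    bp = 0;

    push(arg);                   /* push ax   : the argument           */
    push(0x1234);                /* call task : pushes the return IP   */

    push(bp);                    /* push bp                            */
    bp = sp;                     /* mov bp,sp                          */
    sp -= 2;                     /* sub sp,2  : one local word at bp-2 */

    write_word((uint16_t)(bp - 2), 3);                          /* mov loc,3  */
    ax = (uint16_t)(ax + read_word((uint16_t)(bp - 2)));        /* add ax,loc */
    write_word((uint16_t)(bp + 4),
               (uint16_t)(read_word((uint16_t)(bp + 4)) - ax)); /* sub arg,ax */
    r.arg = read_word((uint16_t)(bp + 4));

    sp = bp;                     /* mov sp,bp                          */
    bp = pop();                  /* pop bp                             */
    (void)pop();                 /* ret 2: pop the return IP ...       */
    sp += 2;                     /* ... then discard the argument      */

    r.ax = ax;
    r.sp_after = sp;
    return r;
}
```

For example, run_task(0x1000, 10, 100) returns AX = 13, leaves 87 in the argument slot, and restores SP to 1000H.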
(Figure 15.2: this figure can be found in Section 15.3.5 of the manual.)

──────────────────────────────────────────────────────────────────────────
NOTE
Local variables created in assembler routines cannot be accessed by name with debugging commands unless you declare them with the LOCAL directive, as explained in the next section.
──────────────────────────────────────────────────────────────────────────

15.3.6 Creating Locals Automatically

This section describes how to automate the techniques for local-variable creation described in the last section.

You can use the LOCAL directive to save time and effort when working with local variables. When you use this directive, simply list the variables you want to create, giving a type for each one. The assembler calculates how much space is required on the stack. It also generates instructions to properly decrement SP (as described in the previous section) and to reset SP later when you return from the procedure.

The LOCAL directive can only be used inside procedures created with the extended PROC directive. This means that you must first use .MODEL and specify a language.

When you create local variables this way, your source code can refer to each local variable by name rather than as an offset. Moreover, the assembler generates debugging information for each local variable, so that you can enter the name of the local variable as part of a Watch expression.

The procedure in Section 15.3.5 can be generated more simply with the following code:

    task    PROC    NEAR arg:WORD
            LOCAL   loc:WORD
            .
            .
            .
            mov     loc,3           ; Initialize local variable
            add     ax,loc          ; Add local variable to AX
            sub     arg,ax          ; Subtract AX from argument
            .                       ; Use "loc" and "arg" in other operations
            .
            .
            ret
    task    ENDP

The LOCAL directive has the following syntax:

    LOCAL vardef [[,vardef]]...
Each vardef has the form:

    label[[ [count] ]][[:[[ [[NEAR | FAR]] PTR]] type]]

The LOCAL directive arguments are as follows:

Argument      Description
──────────────────────────────────────────────────────────────────────────
label         The name given to the local variable. The assembler
              automatically defines a text macro you may use to access
              the variable.

count         The number of elements of this name and type to allocate
              on the stack. Using count allows you to allocate a simple
              array on the stack. The brackets around count are required.
              If this field is omitted, one data object is assumed.

type          The type of variable to allocate. The type argument may be
              one of the following: WORD, DWORD, QWORD, TBYTE, or the
              name of a structure defined by a STRUC structure
              declaration.

The assembler sets aside space on the stack, following the same rules as for procedure arguments. The assembler does not initialize local variables; your program must include code to perform any necessary initializations. For example, the following code fragment sets up a local array and initializes it to zero:

    arraysz EQU     20

    aproc   PROC
            LOCAL   var1[arraysz]:WORD, var2:WORD
            .
            .
            .
    ; Initialize local array to zero
            mov     cx,arraysz      ; Number of words to clear
            xor     ax,ax           ; Value to store
            xor     di,di           ; Use DI as byte index into array
    repeat: mov     var1[di],ax
            inc     di              ; Advance index by one word
            inc     di
            loop    repeat

    ; Use the array...
            .
            .
            .
            ret
    aproc   ENDP

15.3.7 Variable Scope

When you use the extended form of the .MODEL directive, the assembler makes all identifiers inside a procedure local to the procedure. Labels ending with a colon (:), procedure arguments, and local variables declared in a LOCAL directive are undefined outside of the procedure. Variables defined outside of any procedure are available inside a procedure. For example, in the following fragment, var1 can be used in proc1 and proc2, while var2 (because it is defined in proc2) is not available to proc1:

            .MODEL  medium,c
            .DATA
    var1    DW      256             ; Available to proc1 and proc2
            .CODE

    proc1   PROC
            .
            .
            .
    exit:   ret
    proc1   ENDP

    proc2   PROC
            LOCAL   var2:WORD       ; This var2 only available in proc2
            .
            .
            .
    exit:   ret
    proc2   ENDP

If proc1 contained a LOCAL directive defining var2, that var2 would be a completely different variable than the var2 in proc2.

Notice that both procedures contain the label exit. Because labels are local when you use the language option on the .MODEL directive, you may use the same labels in different procedures.

You can make a label in a procedure global (make it available outside the procedure) by ending it with two colons:

    proc3   PROC
            .
            .
            .
    label1::
            .
            .
            .
    proc3   ENDP

In the preceding example, label1 is available throughout the file containing proc3.

15.3.8 Setting Up Stack Frames

80186/286/386 Only

Starting with the 80186 processor, the ENTER and LEAVE instructions are provided for setting up a stack frame. These instructions do the same thing as the multiple instructions at the start and end of procedures in the Microsoft calling conventions (see the examples in Section 15.3.3, "Passing Arguments on the Stack"). The PROC statement takes advantage of these instructions if you enable the extended instruction set with the .186 or .286 directive.

Syntax

    ENTER framesize, nestinglevel
    statements
    LEAVE

The ENTER instruction takes two constant operands. The framesize (a 16-bit constant) specifies the number of bytes to reserve for local variables. The nestinglevel (an 8-bit constant) specifies the level at which the procedure is nested. This operand should always be 0 when writing procedures for BASIC, C, and FORTRAN. The nestinglevel can be greater than 0 with Pascal and other languages that enable procedures to access the local variables of calling procedures.

The LEAVE instruction reverses the effect of the last ENTER instruction by restoring BP and SP to the values they had before ENTER was executed.

Example 1

    task    PROC    NEAR
            enter   6,0             ; Set stack frame and reserve 6
            .                       ;   bytes for local variables
            .                       ; Do task here
            .
            leave                   ; Restore stack frame
            ret                     ; Return
    task    ENDP

Example 1 has the same effect as the code in Example 2.

Example 2

    task    PROC    NEAR
            push    bp              ; Save base pointer
            mov     bp,sp           ; Load stack into base pointer
            sub     sp,6            ; Reserve 6 bytes for local variables
            .
            .                       ; Do task here
            .
            mov     sp,bp           ; Restore stack pointer
            pop     bp              ; Restore base
            ret                     ; Return
    task    ENDP

The code in Example 1 takes fewer bytes, but is slightly slower. See on-line Help on instructions for exact comparisons of size and timing.

15.4 Using Interrupts

"Interrupts" are a special form of routine that is called by number instead of by address. They can be initiated by hardware devices as well as by software. Hardware interrupts are called automatically whenever certain events occur in the hardware.

Interrupts can have any number from 0 to 255. Most of the interrupts with lower numbers are reserved for use by the processor, DOS, or the ROM BIOS.

The programmer can call existing interrupts with the INT instruction. Interrupt routines can also be defined or redefined to be called later. For example, an interrupt routine that is called automatically by a hardware device can be redefined so that its action is different.

The processor defines several interrupts. Two that are sometimes used by applications programmers are listed below:

Interrupt   Description
──────────────────────────────────────────────────────────────────────────
0           Divide overflow. Called automatically when the quotient of a
            divide operation is too large for the destination register
            or when a divide by zero is attempted.

4           Overflow. Called by the INTO instruction if the overflow flag
            is set.

Interrupt 21H is the normal method of using DOS functions. To call a function, place the function number in AH, put arguments in registers as appropriate, then call the interrupt. For complete documentation of DOS functions, see the Microsoft MS-DOS Programmer's Reference, one of the many other books on DOS functions, or the on-line Help system.
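To make the divide-overflow condition for interrupt 0 concrete, here is a small C model of the unsigned word form of DIV. In that form the dividend is DX:AX and the quotient must fit in AX. It is an illustrative sketch: the function name div16_overflows is hypothetical, and the byte form and the signed IDIV forms, which have their own limits, are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true when an unsigned word division (DIV src, dividend in DX:AX)
   would raise interrupt 0 instead of producing a result: either the divisor
   is zero or the quotient does not fit in the 16-bit AX register. */
bool div16_overflows(uint16_t dx, uint16_t ax, uint16_t divisor) {
    if (divisor == 0)
        return true;                                 /* divide by zero     */
    uint32_t dividend = ((uint32_t)dx << 16) | ax;  /* DX:AX              */
    return dividend / divisor > 0xFFFFu;             /* quotient > 16 bits */
}
```

AX is never larger than FFFFH, so the quotient overflows exactly when DX is greater than or equal to the divisor. A program can make that comparison before dividing.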
DOS has several other interrupts, but they should not normally be called. Some (such as 20H and 27H) have been replaced by DOS functions. Others are used internally by DOS.

You can also access ROM-BIOS services through interrupt calls. See the on-line Help system for a description of all these services.

15.4.1 Calling Interrupts

Interrupts are called with the INT instruction.

Syntax

    INT interruptnumber
    INTO

The INT instruction takes an immediate operand with a value between 0 and 255. When calling DOS and ROM-BIOS interrupts, a function number is usually placed in the AH register. Other registers may be used to pass arguments to functions. Some interrupts and functions return values in certain registers. Register use varies for each interrupt.

When the instruction is executed, the processor takes the following six steps:

1. Looks up the address of the interrupt routine in the interrupt descriptor table. In real mode, this table starts at the lowest point in memory (segment 0, offset 0) and consists of four bytes for each interrupt: a two-byte offset followed by a two-byte segment. Thus, the table entry for an interrupt is located at the interrupt number multiplied by 4.

2. Pushes the flags register, the current code segment (CS), and the current instruction pointer (IP).

3. Clears the trap (TF) and interrupt enable (IF) flags.

4. Jumps to the address of the interrupt routine, as specified in the interrupt descriptor table.

5. Executes the code of the interrupt routine until it encounters an IRET instruction.

6. Pops the instruction pointer, code segment, and flags.

Figure 15.3 illustrates how interrupts work.

(Figure 15.3: this figure can be found in Section 15.4.1 of the manual.)

The INTO (Interrupt On Overflow) instruction is a variation of the INT instruction. It calls interrupt 04H if it is executed when the overflow flag is set.
By default, the routine for interrupt 4 simply consists of an IRET, so it returns without doing anything. However, you can write your own overflow interrupt routine. Using INTO is an alternative to using JO (Jump On Overflow) to jump to an overflow routine. Section 15.4.2, "Defining and Redefining Interrupt Routines," gives an example of this.

The CLI (Clear Interrupt Flag) and STI (Set Interrupt Flag) instructions can be used to turn hardware interrupts off or on. You can use CLI to turn interrupt processing off so that an important routine cannot be stopped by a hardware interrupt. After the routine has finished, use STI to turn interrupt processing back on. A hardware interrupt requested while interrupt processing is turned off is normally held pending and serviced after STI turns interrupts back on. CLI does not prevent software interrupts issued with the INT instruction.

Example 1

    ; DOS call (Display String)
            mov     ah,09h          ; Load function number
            mov     dx,OFFSET string ; Load argument
            int     21h             ; Call DOS

Example 2

    ; BIOS call (Read Character from Keyboard)
            xor     ah,ah           ; Load function number 0 in AH
            int     16h             ; Call BIOS
                                    ; Return scan code in AH
                                    ; Return ASCII code in AL

Example 1 is a call to a DOS function. Example 2 is a ROM-BIOS call that works on IBM Personal Computers and IBM-compatible computers. See the on-line Help system for complete information on DOS and BIOS calls.

15.4.2 Defining and Redefining Interrupt Routines

You can write your own interrupt routines, either to replace an existing routine or to use an undefined interrupt number.

Syntax

    label PROC FAR [[USES reglist]]
    statements
    IRET
    label ENDP

An interrupt routine can be written like a procedure by using the PROC and ENDP directives. There are only two differences: the routine should always be defined as far, and it should be terminated by an IRET instruction instead of a RET instruction.
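The reason an interrupt routine must be far and must end with IRET shows up clearly in a model of the steps listed in Section 15.4.1. The C sketch below simulates INT n and IRET on a toy real-mode machine. It is an illustration under simplifying assumptions (no hardware interrupts, no instruction decoding), and the names Cpu, do_int, and do_iret are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FLAG_TF 0x0100u   /* trap flag              */
#define FLAG_IF 0x0200u   /* interrupt-enable flag  */

/* Toy real-mode machine: 1 MB of memory plus the registers INT and IRET use.
   Allocate it statically; it is too large for most call stacks. */
typedef struct {
    uint8_t mem[1u << 20];
    uint16_t cs, ip, ss, sp, flags;
} Cpu;

/* segment:offset -> 20-bit physical address (wraps at 1 MB like an 8086) */
static uint32_t phys(uint16_t seg, uint16_t off) {
    return (((uint32_t)seg << 4) + off) & 0xFFFFFu;
}

static uint16_t rd16(const Cpu *c, uint32_t a) {
    return (uint16_t)(c->mem[a] | (c->mem[(a + 1) & 0xFFFFFu] << 8));
}

static void wr16(Cpu *c, uint32_t a, uint16_t v) {
    c->mem[a] = (uint8_t)(v & 0xFF);
    c->mem[(a + 1) & 0xFFFFFu] = (uint8_t)(v >> 8);
}

static void push16(Cpu *c, uint16_t v) {
    c->sp = (uint16_t)(c->sp - 2);
    wr16(c, phys(c->ss, c->sp), v);
}

static uint16_t pop16(Cpu *c) {
    uint16_t v = rd16(c, phys(c->ss, c->sp));
    c->sp = (uint16_t)(c->sp + 2);
    return v;
}

/* INT n. c->ip must already hold the address of the next instruction. */
void do_int(Cpu *c, uint8_t n) {
    uint32_t entry = (uint32_t)n * 4;       /* table starts at 0000:0000    */
    push16(c, c->flags);                    /* step 2: flags, CS, then IP   */
    push16(c, c->cs);
    push16(c, c->ip);
    c->flags &= (uint16_t)~(FLAG_TF | FLAG_IF);  /* step 3                  */
    c->ip = rd16(c, entry);                 /* step 4: offset word first... */
    c->cs = rd16(c, entry + 2);             /* ...then the segment word     */
}

/* IRET pops IP, CS, and the flags (step 6). */
void do_iret(Cpu *c) {
    c->ip = pop16(c);
    c->cs = pop16(c);
    c->flags = pop16(c);
}
```

If the handler ended with a far RET instead, only IP and CS would be popped. The saved flags would stay on the stack, so the interrupted code would resume with SP two bytes lower than before and with interrupts still disabled.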
──────────────────────────────────────────────────────────────────────────
NOTE
Since the assembler doesn't know whether you are going to terminate with a RET or an IRET, it is possible to use the full extended PROC syntax (described in Section 15.3.4) for interrupt procedures. However, making interrupt procedures NEAR or specifying arguments for them makes no sense. The USES keyword does correctly generate code to save and restore a register list in interrupt procedures.
──────────────────────────────────────────────────────────────────────────

Your program should replace the address in the interrupt descriptor table with the address of your routine. DOS calls are provided for this task. A common technique is to have your routine finish by jumping to the old interrupt routine and letting that routine execute the IRET instruction.

It is usually a good idea to save the old address and restore it before your program ends.

Interrupt routines you may want to replace include the processor's divide-overflow (0H) and overflow (04H) interrupts. You can also replace DOS interrupts, such as the critical-error (24H) and CONTROL+C (23H) handlers. Interrupt routines can be part of device drivers. Writing interrupt routines is usually a systems task. The example below illustrates a simple routine. For complete information, see the Microsoft MS-DOS Programmer's Reference or one of the other reference books on DOS.

Example

            .DATA
    message DB      "Overflow - result set to 0",13,10,"$"
    vector  DD      ?

            .CODE
            .STARTUP

            mov     ax,3504h        ; Load interrupt 4 and call DOS
            int     21h             ;   get interrupt vector function
            mov     WORD PTR vector[2],es ; Save segment
            mov     WORD PTR vector[0],bx ;   and offset

            push    ds              ; Save DS
            mov     ax,cs           ; Load segment of new routine
            mov     ds,ax
            mov     dx,OFFSET overflow ; Load offset of new routine
            mov     ax,2504h        ; Load interrupt 4 and call DOS
            int     21h             ;   set interrupt vector function
            pop     ds              ; Restore
            .
            .
            .
            add     ax,bx           ; Do addition (or multiplication)
            into                    ; Call interrupt 4 if overflow
            .
            .
            .
            lds     dx,vector       ; Load original interrupt address
            mov     ax,2504h        ; Restore interrupt number 4
            int     21h             ;   with DOS set vector function
            mov     ax,4C00h        ; Terminate function
            int     21h

    overflow PROC   FAR
            sti                     ; Enable interrupts
                                    ;   (turned off by INT)
            mov     ah,09h          ; Display string function
            mov     dx,OFFSET message ; Load address
            int     21h             ; Call DOS
            xor     ax,ax           ; Set AX to 0
            xor     dx,dx           ; Set DX to 0
            iret                    ; Return
    overflow ENDP

            END     start

In this example, DOS functions are used to save the address of the initial interrupt routine in a variable and to put the address of the new interrupt routine in the interrupt table. Once the new address has been set, the new routine is called any time the interrupt is called. The sample interrupt handler sets the result of a calculation that causes an overflow (either in AX or in DX:AX) to 0. It is good practice to restore the original interrupt address before terminating the program.

15.5 Checking Memory Ranges

80186/286/386 Only

Starting with the 80186 processor, the BOUND instruction can check to see if a value is within a specified range. This instruction is usually used to check a signed index value to see if it is within the range of an array. BOUND is a conditional interrupt instruction like INTO. If the condition is not met (the index is out of range), an interrupt 5 is executed.

Syntax

    BOUND register16, memory32

To use BOUND for this purpose, store the lowest and highest valid index values as 16-bit values in the low and high words of a doubleword memory operand. This operand is given as the source operand. The index value to be checked is given as the destination operand. If the index value is out of range, the instruction issues interrupt 5. This means that the operating system or the program must provide an interrupt routine for interrupt 5. DOS does not provide such a routine, so you must write your own. See Section 15.4, "Using Interrupts," for more information.
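The range test that BOUND performs can be written out in C as shown below. This is a sketch of the comparison only. The names Bounds, bounds_from_bytes, word_to_signed, and bound_traps are hypothetical. The layout follows the description above: the lower limit is in the low word and the upper limit is in the high word. Both limits are signed and both are inclusive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The doubleword BOUND operand, split into its two signed limits. */
typedef struct {
    int16_t lower;   /* low word  (wbounds[0] in the example) */
    int16_t upper;   /* high word (wbounds[1] in the example) */
} Bounds;

/* Interprets a 16-bit word as a two's-complement signed value. */
static int16_t word_to_signed(uint16_t w) {
    return (int16_t)(w >= 0x8000u ? (int32_t)w - 0x10000 : (int32_t)w);
}

/* Builds the limits from the four bytes of the doubleword as stored in
   memory (little-endian: low word first). */
Bounds bounds_from_bytes(const uint8_t bytes[4]) {
    Bounds b;
    b.lower = word_to_signed((uint16_t)(bytes[0] | (bytes[1] << 8)));
    b.upper = word_to_signed((uint16_t)(bytes[2] | (bytes[3] << 8)));
    return b;
}

/* True when BOUND reg,limits would issue interrupt 5. An index equal to
   either limit passes the check. */
bool bound_traps(int16_t index, Bounds b) {
    return index < b.lower || index > b.upper;
}
```

With the limits 0 and 19 used in the example that follows, indexes 0 through 19 pass the check, and 20 or -1 would trigger interrupt 5.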
Example

            .DATA
    bottom  EQU     0
    top     EQU     19
    dbounds LABEL   DWORD           ; Allocate boundaries
    wbounds DW      bottom,top      ;   initialized to bounds
    array   DB      top+1 DUP (?)   ; Allocate array
            .CODE
            .
            .
            .                       ; Assume index in DI
            bound   di,dbounds      ; Check to see if it is in range
                                    ;   if out of range, interrupt 5
            mov     dl,array[di]    ; If in range, use it

────────────────────────────────────────────────────────────────────────────
Chapter 16: Processing Strings

The 8086-family processors have a full set of instructions for manipulating strings. In the discussion of these instructions, the term "string" refers not only to the common definition of a string (a sequence of bytes containing characters) but to any sequence of bytes or words.

The following instructions are provided for 8086-family string functions:

Instruction   Description
──────────────────────────────────────────────────────────────────────────
MOVS          Moves string from one location to another
SCAS          Scans string for specified values
CMPS          Compares values in one string with values in another
LODS          Loads values from a string to accumulator register
STOS          Stores values from accumulator register to a string
INS           Transfers values from a port to memory
OUTS          Transfers values from memory to a port

All these instructions use registers in the same way and have a similar syntax. Most are used with the repeat instruction prefixes: REP, REPE, REPNE, REPZ, and REPNZ.

This chapter first explains the general format for string instructions and then tells you how to use each instruction.

16.1 Setting Up String Operations

The string instructions all work in a similar way. Once you understand the general procedure, it is easy to adapt the format for a particular string operation. The five steps are listed below:

1. Make sure the direction flag indicates the direction in which you want the string to be processed. If the direction flag is clear, the string will be processed up (from low addresses to high addresses).
   If the direction flag is set, the string will be processed down (from high addresses to low addresses). The CLD instruction clears the flag, while STD sets it. Under DOS, the direction flag will normally be cleared if your program has not changed it.

2. Load the number of iterations for the string instruction into the CX register. For instance, if you want to process a 100-byte string, load 100. If a string instruction will be terminated conditionally, load the maximum number of iterations that can be done without an error.

3. Load the starting address of the source string into DS:SI and the starting address of the destination string into ES:DI. Some string instructions take only a destination or a source (shown in Table 16.1 below). Normally, the segment address of the source string should be DS, but you can use a segment override with the string instruction to specify a different segment. You cannot override the segment address for the destination string, so you may need to change the value of ES.

4. Choose the appropriate repeat-prefix instruction. Table 16.1 shows the repeat prefixes that can be used with each instruction.

5. Put the appropriate string instruction immediately after the repeat prefix (on the same line).

String instructions have two basic forms, as shown below:

Syntax 1

    [[repeatprefix]] stringinstruction [[ [[ES:]]destination,]] [[ [[segmentregister:]]source]]

The string instruction can be given with the source and/or destination as operands. The size of the operand or operands indicates the size of the objects to be processed by the string instruction. Note that the operands only specify the size. The actual values to be worked on are the ones pointed to by DS:SI and/or ES:DI. No error is generated if the operand is not the same as the actual source or destination.

One important advantage of this syntax is that the source operand can have a segment override. The destination operand is always relative to ES and cannot be overridden.
Syntax 2

    [[repeatprefix]] stringinstructionB
    [[repeatprefix]] stringinstructionW

The letter B or W appended to stringinstruction indicates bytes or words. With a letter appended to a string instruction, no operand is allowed. For instance, MOVS can be given with byte operands to move bytes or with word operands to move words. As an alternative, MOVSB can be given with no operands to move bytes, or MOVSW can be given with no operands to move words.

Note that instructions that specify the size in the name never accept operands. Therefore, the following statement is illegal:

            lodsb   es:0            ; Illegal - no operand allowed

Instead, the statement must be coded as shown below:

            lods    BYTE PTR es:0   ; Legal - use type specifier

If a repeat prefix is used, it can be one of the following instructions:

Instruction     Description
──────────────────────────────────────────────────────────────────────────
REP             Repeats for a specified number of iterations. The number
                is given in CX.

REPE or REPZ    Repeats while equal. The maximum number of iterations
                should be specified in CX.

REPNE or REPNZ  Repeats while not equal. The maximum number of iterations
                should be specified in CX.

REPE is the same as REPZ, and REPNE is the same as REPNZ. You can use whichever name you find more mnemonic. The prefixes ending with E are used in syntax listings and tables in the rest of this chapter.

Table 16.1 lists each string instruction with the type of repeat prefix it uses and whether the instruction works on a source, a destination, or both.
Table 16.1  Requirements for String Instructions

Instruction  Repeat Prefix  Source/Destination  Register Pair
──────────────────────────────────────────────────────────────────────────
MOVS         REP            Both                DS:SI, ES:DI
SCAS         REPE/REPNE     Destination         ES:DI
CMPS         REPE/REPNE     Both                ES:DI, DS:SI
LODS         None           Source              DS:SI
STOS         REP            Destination         ES:DI
INS          REP            Destination         ES:DI
OUTS         REP            Source              DS:SI
──────────────────────────────────────────────────────────────────────────

At run time, a string instruction preceded by a repeat prefix causes the processor to take the following steps:

1. Checks the CX register and exits from the string instruction if CX is 0.

2. Performs the string operation once.

3. Increases SI and/or DI if the direction flag is cleared. Decreases SI and/or DI if the direction flag is set. The amount of increase or decrease is 1 for byte operations, 2 for word operations.

4. Decrements CX (no flags are modified).

5. If the string instruction is SCAS or CMPS, checks the zero flag and exits if the repeat condition is false, that is, if the flag is clear with REPE or REPZ or if it is set with REPNE or REPNZ.

6. Goes to the next iteration (step 1).

Although string instructions (except LODS) are most often used with repeat prefixes, they can also be used by themselves. In this case, the SI and/or DI registers are adjusted as specified by the direction flag and the size of operands. However, you must decrement the CX register and set up a loop for the repeated action yourself.

──────────────────────────────────────────────────────────────────────────
NOTE
Although you can use a segment override on the source operand, a segment override combined with a repeat prefix can cause problems in certain situations. If an interrupt occurs during the string operation, the segment override is lost and the rest of the string operation processes incorrectly. Segment overrides can be used safely when interrupts are turned off.
──────────────────────────────────────────────────────────────────────────

16.2 Moving Strings

The MOVS instruction is used to move data from one area of memory to another.

Syntax

    [[REP]] MOVS [[ES:]]destination,[[segmentregister:]]source
    [[REP]] MOVSB
    [[REP]] MOVSW

To move the data, load the count and the source and destination addresses into the appropriate registers. Then use REP with the MOVS instruction.

Example 1

            .MODEL  small
            .DATA
    source  DB      10 DUP ('0123456789')
    destin  DB      100 DUP (?)
            .CODE
            mov     ax,@data        ; Load same segment
            mov     ds,ax           ;   to both DS
            mov     es,ax           ;   and ES
            .
            .
            .
            cld                     ; Work upward
            mov     cx,100          ; Set iteration count to 100
            mov     si,OFFSET source ; Load address of source
            mov     di,OFFSET destin ; Load address of destination
            rep     movsb           ; Move 100 bytes

Example 1 shows how to move a string by using string instructions. For comparison, Example 2 shows a much less efficient way of doing the same operation without string instructions.

Example 2

            .MODEL  small
            .DATA
    source  DB      10 DUP ('0123456789')
    destin  DB      100 DUP (?)
            .CODE
            .                       ; Assume ES = DS
            .
            .
            mov     cx,100          ; Set iteration count to 100
            mov     si,OFFSET source ; Load offset of source
            mov     di,OFFSET destin ; Load offset of destination
    repeat: mov     al,[si]         ; Get a byte from source
            mov     [di],al         ; Put it in destination
            inc     si              ; Increment source pointer
            inc     di              ; Increment destination pointer
            loop    repeat          ; Do it again

Both examples illustrate how to move byte strings in a small-model program in which DS already points to the segment containing the variables. In such programs, ES can be set to the same value as DS.

There are several variations on this. If the source string were not in the current data segment, you could load the starting address of its segment into DS. Another option would be to use the MOVS instruction with operands and give a segment override on the source operand.
For example, you could use the following statement if ES pointed to both the source and the destination strings:

            rep     movs    destin,es:source

It is sometimes faster to move a string of bytes as words. You must adjust for any odd bytes, as shown in Example 3. Assume the source and destination are already loaded.

Example 3

            mov     cx,count        ; Load count
            shr     cx,1            ; Divide by 2 (carry will be set
                                    ;   if count is odd)
            rep     movsw           ; Move words (flags unchanged)
            rcl     cx,1            ; If odd, make CX 1
            rep     movsb           ; Move odd byte if there is one

16.3 Searching Strings

The SCAS instruction is used to scan a string for a specified value.

Syntax

    [[REPE | REPNE]] SCAS [[ES:]]destination
    [[REPE | REPNE]] SCASB
    [[REPE | REPNE]] SCASW

SCAS and its variations work only on a destination string, which must be pointed to by ES:DI. The value to scan for must be in the accumulator register: AL for bytes, AX for words.

The SCAS instruction works by comparing the value pointed to by DI with the value in the accumulator. If the values are the same, the zero flag is set. Thus, the instruction only makes sense when used with one of the repeat prefixes that checks the zero flag.

If you want to search for the first occurrence of a specified value, use the REPNE or REPNZ instruction. If the value is found, ES:DI will point to the position immediately after the first occurrence (when the direction flag is clear). You can decrement DI to make it point to the first matching value.

If you want to search for the first value that does not match a specified value, use REPE or REPZ. If such a value is found, ES:DI will point to the position after the first nonmatching value. You can decrement DI to make it point to the first nonmatching value.

After a REPNE SCAS, the zero flag will be cleared if no match was found. After a REPE SCAS, the zero flag will be set if no nonmatch was found.

Example

            .DATA
    string  DB      "The quick brown fox jumps over the lazy dog"
    lstring EQU     $-string        ; Length of string
    pstring DD      string          ; Far pointer to string
            .CODE
            .
            .
            .
            cld                     ; Work upward
            mov     cx,lstring      ; Load length of string
            les     di,pstring      ; Load address of string
            mov     al,'z'          ; Load character to find
            repne   scasb           ; Search
            jnz     notfound        ; ZF clear (and CX 0) if not found
            .                       ; ES:DI points to character
            .                       ;   after first 'z'
            .
    notfound:                       ; Special case for not found

This example does not assume that ES is the same as DS. Instead, the address of the string is stored in a far pointer variable, and the LES instruction loads that address into ES:DI.

16.4 Comparing Strings

The CMPS instruction is used to compare two strings and point to the address where a match or nonmatch occurs.

Syntax

    [[REPE | REPNE]] CMPS [[segmentregister:]]source,[[ES:]]destination
    [[REPE | REPNE]] CMPSB
    [[REPE | REPNE]] CMPSW

The count and the addresses of the strings are loaded into registers, as described in Section 16.1, "Setting Up String Operations." Either string can be considered the destination or source string unless a segment override is used. Notice that unlike other instructions, CMPS requires that the source be on the left.

The CMPS instruction works by comparing, in turn, each value pointed to by DI with the value pointed to by SI. If the values are the same, the zero flag is set. Thus, the instruction makes sense only when used with one of the repeat prefixes that checks the zero flag.

If you want to search for the first match between the strings, use the REPNE or REPNZ instruction. If a match is found, ES:DI and DS:SI will point to the position after the first match in the respective strings. You can decrement DI or SI to point to the match. (Conversely, you would increment DI or SI if the direction flag was set.)

If you want to search for a nonmatch, use REPE or REPZ. If a nonmatch is found, ES:DI and DS:SI will point to the position after the first nonmatch in the respective strings. You can decrement DI or SI to point to the nonmatch.

After a REPNE CMPS, the zero flag will be cleared if no match was found.
After a REPE CMPS, the zero flag will be set if no nonmatch was found.

Example

            .MODEL  large
            .DATA
    string1 DB      "The quick brown fox jumps over the lazy dog"
            .FARDATA
    string2 DB      "The quick brown dog jumps over the lazy fox"
    lstring EQU     $-string2
            .CODE
            mov     ax,@data        ; Load data segment
            mov     ds,ax           ;   into DS
            mov     ax,@fardata     ; Load far data segment
            mov     es,ax           ;   into ES
            .
            .
            .
            cld                     ; Work upward
            mov     cx,lstring      ; Load length of string
            mov     si,OFFSET string1 ; Load offset of string1
            mov     di,OFFSET string2 ; Load offset of string2
            repe    cmpsb           ; Compare
            jz      allmatch        ; ZF set if no nonmatch
            dec     si              ; Adjust to point to nonmatch
            dec     di              ;   in each string
            .
            .
    allmatch:
            .                       ; Special case for all match

This example assumes that the strings are in different segments. Each segment address must be loaded into the appropriate segment register.

16.5 Filling Strings

The STOS instruction is used to store a specified value in each position of a string.

Syntax

    [[REP]] STOS [[ES:]]destination
    [[REP]] STOSB
    [[REP]] STOSW

The string is considered the destination, so it must be pointed to by ES:DI. The length and address of the string must be loaded into registers, as described in Section 16.1, "Setting Up String Operations." The value to store must be in the accumulator register: AL for bytes, AX for words.

For each iteration specified by the REP instruction prefix, the value in the accumulator is stored into the string.

Example

            .MODEL  small
            .DATA
    destin  DB      100 DUP (?)
            .CODE
            .                       ; Assume ES = DS
            .
            .
            cld                     ; Work upward
            mov     ax,'aa'         ; Load character to fill
            mov     cx,50           ; Load number of words
            mov     di,OFFSET destin ; Load address of destination
            rep     stosw           ; Store 'a' into array

This example fills the 100-byte array with the character 'a'. Notice that this is done by storing 50 words rather than 100 bytes. This makes the code faster by reducing the number of iterations. You would have to adjust for the last byte if you wanted to fill an odd number of bytes.
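The adjustment for an odd byte count can follow the same pattern as Example 3 in Section 16.2: store count/2 words, then one extra byte if the count was odd. The C function below is a sketch of that logic, not a translation of any code in this manual, and fill_bytes is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fills count bytes at dest with ch, two bytes per store where possible,
   then one final byte if count is odd. */
void fill_bytes(uint8_t *dest, uint8_t ch, size_t count) {
    uint16_t word = (uint16_t)(ch | (ch << 8));  /* like mov ax,'aa'         */
    size_t words = count >> 1;                  /* like shr cx,1            */
    int odd = (int)(count & 1);                 /* the bit SHR moves to CF  */

    for (size_t i = 0; i < words; i++) {        /* like rep stosw           */
        memcpy(dest, &word, sizeof word);       /* both bytes are equal, so
                                                   host byte order is moot  */
        dest += sizeof word;
    }
    if (odd)                                    /* like rcl cx,1/rep stosb  */
        *dest = ch;
}
```

Calling fill_bytes(buf, 'a', 5) performs two word stores and then a single byte store.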
16.6 Loading Values from Strings

The LODS instruction is used to load a value from a string into a register.

Syntax

        LODS [[segmentregister:]]source
        LODSB
        LODSW

The string is considered the source, so it must be pointed to by DS:SI. The value is always loaded from the string into the accumulator register: AL for bytes, AX for words.

Unlike other string instructions, LODS is not normally used with a repeat prefix, since there is no reason to move a value repeatedly to a register. However, LODS does adjust the SI register as specified by the direction flag and the size of operands. The programmer must code the instructions that use the value after it is loaded.

Example 1

        .DATA
stuff   DB      0,1,2,3,4,5,6,7,8,9
        .CODE
        .
        .
        .
        cld                         ; Work upward
        mov     cx,10               ; Load length
        mov     si,OFFSET stuff     ; Load offset of source
        mov     ah,2                ; Display character function
get:    lodsb                       ; Get a character
        add     al,'0'              ; Convert to ASCII
        mov     dl,al               ; Move to DL
        int     21h                 ; Call DOS to display character
        loop    get                 ; Repeat

Example 1 loads, processes, and displays each byte in a string of bytes.

Example 2

        .DATA
buffer  DB      80 DUP(?)           ; Create buffer for argument string
        .CODE
start:  mov     ax,@data            ; Initialize DS
        mov     ds,ax
                                    ; On start-up ES points to PSP
        cld                         ; Work upward
        mov     cl,BYTE PTR es:[80h] ; Load length of arguments
        xor     ch,ch
        mov     di,OFFSET buffer    ; Load offset of buffer
        mov     si,82h              ; Load position of argument string
        mov     dx,es               ; Exchange ES and DS
        mov     ax,ds
        mov     es,ax
        mov     ds,dx
        jcxz    done                ; No arguments? Skip the loop
another: lodsb                      ; Get a character
        cmp     al,'a'              ; Is it below 'a'?
        jb      noway               ; Yes? Store it unchanged
        cmp     al,'z'              ; Is it above 'z'?
        ja      noway               ; Yes? Store it unchanged
        sub     al,32               ; Otherwise, convert to uppercase
noway:  stosb                       ; Store the character
        loop    another             ; Repeat
done:   mov     dx,es               ; Restore ES and DS
        mov     ax,ds
        mov     es,ax
        mov     ds,dx

Example 2 copies the command arguments from position 82H in the DOS Program Segment Prefix (PSP) while converting them to uppercase. The JCXZ instruction skips the loop when there are no arguments; without it, LOOP would start with CX equal to 0 and repeat 65,536 times. See the Microsoft MS-DOS Programmer's Reference or one of the many other books on DOS for information about the PSP.
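The load-test-store loop in Example 2 can be traced with a C model that works on a 256-byte image of the PSP. This is a sketch only: copy_args_upper is a hypothetical name, the psp array stands in for the segment that ES points to at start-up, and the bound on si exists only to keep the C model inside its array.

```c
#include <stddef.h>

/* Model of Example 2: copy the command tail from a 256-byte PSP image
   into buffer, converting 'a' through 'z' to uppercase. The count comes
   from offset 80h and copying starts at offset 82h, as in the assembly
   code. Returns the number of bytes stored. */
size_t copy_args_upper(const unsigned char psp[256], unsigned char *buffer)
{
    size_t cx = psp[0x80];          /* mov cl,BYTE PTR es:[80h] / xor ch,ch */
    size_t si = 0x82;               /* mov si,82h */
    size_t di = 0;                  /* mov di,OFFSET buffer */

    /* cx != 0 plays the role of jcxz plus loop */
    while (cx != 0 && si < 256) {
        unsigned char al = psp[si++];           /* lodsb: load, advance SI */
        if (al >= 'a' && al <= 'z')             /* the two cmp/jump pairs */
            al = (unsigned char)(al - 32);      /* sub al,32 */
        buffer[di++] = al;                      /* stosb: store, advance DI */
        cx--;                                   /* loop decrements CX */
    }
    return di;
}
```

Because the count at offset 80H includes the leading space at 81H, starting at 82H means the last byte copied is the carriage return that DOS places after the arguments.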
Notice that both LODSB and STOSB are used without repeat prefixes.

16.7 Transferring Strings to and from Ports

80186/286/386 Only

The INS instruction reads a string from a port to memory, and the OUTS instruction writes a string from memory to a port.

Syntax

        OUTS DX,[[segmentregister:]]source
        OUTSB
        OUTSW

        INS [[ES:]]destination,DX
        INSB
        INSW

The INS and OUTS instructions require that the number of the port be in DX. The port cannot be specified as an immediate value, as it can be with IN and OUT.

To move the data, load the count into CX. The string to be transferred by INS is considered the destination string, so it must be pointed to by ES:DI. The string to be transferred by OUTS is considered the source string, so it must be pointed to by DS:SI.

If you specify the source or destination as an operand, DX must be specified. Otherwise, DX is assumed and should be omitted.

If you need to process the string as it is transferred (for instance, to check for the end of a null-terminated string), you must set up the loop yourself instead of using the REP instruction prefix.

Example

        .DATA
count   EQU     100
buffer  DB      count DUP (?)
inport  DW      ?
        .CODE
        .                           ; Assume ES = DS
        .
        .
        cld                         ; Work upward
        mov     cx,count            ; Load length to transfer
        mov     di,OFFSET buffer    ; Load address of destination
        mov     dx,inport           ; Load port number
        rep     insb                ; Transfer the string
                                    ;   from port to buffer

────────────────────────────────────────────────────────────────────────────
Chapter 17: Calculating with a Math Coprocessor

The 8087-family coprocessors are used to do fast mathematical calculations. When used with real numbers, packed binary coded decimal (BCD) numbers, or long integers, they do calculations many times faster than the same operations done with 8086-family processors.

This chapter explains how to use the 8087-family coprocessors to transfer and process data. The approach taken is from an applications standpoint.
Features that would be used by systems programmers (such as the flags used when writing exception handlers) are not explained. This chapter is intended as a reference, not a tutorial.

──────────────────────────────────────────────────────────────────────────
NOTE  This manual does not attempt to explain the mathematical concepts involved in using certain coprocessor features. It assumes that you will not need to use a feature unless you understand the mathematics involved. For example, you need to understand logarithms to use the FYL2X and FYL2XP1 instructions.
──────────────────────────────────────────────────────────────────────────

17.1 Coprocessor Architecture

The math coprocessor works simultaneously with the main processor. However, since the coprocessor cannot handle device input or output, most data originates in the main processor.

The main processor and the coprocessor each have their own registers, which are completely separate from, and inaccessible to, the other. They exchange data through memory, since memory is available to both.

Ordinarily you follow these three steps when using the coprocessor:

  1. Load data from memory to coprocessor registers.

  2. Process the data.

  3. Store the data from coprocessor registers back to memory.

Step 2, processing the data, can occur while the main processor is handling other tasks. Steps 1 and 3 must be coordinated with the main processor so that the processor and coprocessor do not try to access the same memory at the same time, as explained in Section 17.4, "Coordinating Memory Access."

17.1.1 Coprocessor Data Registers

The 8087-family coprocessors have eight 80-bit data registers. Unlike 8086-family registers, the coprocessor data registers are organized as a stack. As data is pushed into the top register, previous data items move into higher-numbered registers. Register 0 is the top of the stack; register 7 is the bottom.
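The push behavior just described can be pictured with a small C model in which st[0] always holds ST(0). This sketch is for illustration only: FpuStack, fpu_push, and fpu_pop are hypothetical names, double stands in for the 80-bit temporary-real format, and the model physically moves values, whereas the coprocessor itself keeps a top-of-stack pointer and renumbers the registers. The numbering a program sees is the same either way.

```c
#include <stdbool.h>

#define FPU_REGS 8

/* Hypothetical model of the eight-register coprocessor stack. */
typedef struct {
    double st[FPU_REGS];   /* st[0] is ST(0), the top of the stack */
    int depth;             /* number of registers in use */
} FpuStack;

/* Push: every value moves into the next higher-numbered register. */
bool fpu_push(FpuStack *s, double v)
{
    if (s->depth == FPU_REGS)
        return false;                   /* stack overflow: all eight in use */
    for (int i = s->depth; i > 0; i--)
        s->st[i] = s->st[i - 1];
    s->st[0] = v;
    s->depth++;
    return true;
}

/* Pop: remove ST(0); every value moves into the next lower-numbered register. */
bool fpu_pop(FpuStack *s, double *out)
{
    if (s->depth == 0)
        return false;                   /* stack underflow: nothing to pop */
    *out = s->st[0];
    for (int i = 0; i < s->depth - 1; i++)
        s->st[i] = s->st[i + 1];
    s->depth--;
    return true;
}
```

After pushing 1.0, 2.0, and 3.0, ST(0) holds 3.0 and ST(2) holds 1.0; a ninth push without an intervening pop fails, which corresponds to a stack-overflow condition on the coprocessor.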
The syntax for specifying registers is shown below:

        ST[[(number)]]

The number must be a digit between 0 and 7, or a constant expression that evaluates to a number from 0 to 7. If number is omitted, register 0 (top of stack) is assumed.

All coprocessor data is stored in registers in the temporary-real format. This is the 10-byte IEEE format described in Section 6.5.1.4, "Real-Number Variables." The registers and the register format are shown in Figure 17.1.

[This figure can be found in Section 17.1.1 of the manual.]

Internally, all calculations are done on numbers of the same type. Since temporary-real numbers have the greatest precision, values in lower-precision formats lose no precision when they are loaded, and intermediate results are kept at the higher precision. The instructions that transfer values between the main processor and the coprocessor automatically convert numbers to and from the temporary-real format.

17.1.2 Coprocessor Control Registers

The 8087-family coprocessors have seven 16-bit control registers. The most useful control registers are made up of bit fields or flags. Some flags control coprocessor operations, while others maintain the current status of the coprocessor. In this sense, they are much like the 8086-family flags register.

You do not need to understand these registers to do most coprocessor operations. Control flags are set by default to the values appropriate for most programs. Errors and exceptions are reported in the status-word register. However, the coprocessor already has a default system for handling exceptions. Applications programmers can usually accept the defaults. Systems programmers may want to use the status-word and control-word registers when writing exception handlers, but such problems are beyond the scope of this manual.
Figure 17.2 shows the overall layout of the control registers, including the control word, status word, tag word, instruction pointer, and operand pointer. The format of each of the registers is not shown, since these registers are generally of use only to systems programmers. The exception is the condition-code bits of the status-word register. These bits are explained in Section 17.7, "Controlling Program Flow."

[This figure can be found in Section 17.1.2 of the manual.]

The control registers are explained in more detail in the on-line Help system.

17.2 Emulation

Assembly-language procedures called from QuickC can use the emulator library. First write the procedure by using coprocessor instructions, then assemble it using the /E option, and finally link it with your high-level-language modules. When compiling modules, use the compiler options that specify emulation.

Some coprocessor instructions are not emulated by Microsoft emulation libraries. Which instructions are emulated varies depending on the language and version. If you use a coprocessor instruction that is not emulated, the program will generate a run-time error when it tries to execute the unemulated instruction.

You cannot use a Microsoft emulation library with stand-alone assembler programs, since the library depends on the compiler start-up code.

See Appendix B, Section B.6, "Creating Code for a Floating-Point Emulator," for information on the /E option. See Appendix A, "Mixed-Language Mechanics," for information on writing assembly-language procedures for high-level languages.

17.3 Using Coprocessor Instructions

Coprocessor instructions are readily recognizable because, unlike all 8086-family instruction mnemonics, they start with the letter F.
Most coprocessor instructions have two operands, but in many cases, one or both operands are implied. Often, one operand can be a memory operand; in this case, the other operand is always implied as the stack-top register. Coprocessor instructions can never have immediate operands, and with the exception of the FSTSW instruction (see Section 17.5.3, "Transferring Control Data"), they cannot have processor registers as operands. As with 8086-family instructions, memory-to-memory operations are never allowed. One operand must be a coprocessor register.

Instructions usually have a source and a destination operand. The source specifies one of the values to be processed. It is never changed by the operation. The destination specifies the value to be operated on and replaced with the result of the operation. If two operands are specified, the first is the destination and the second is the source.

The stack organization of registers gives the programmer flexibility to think of registers either as elements on a stack or as registers much like 8086-family registers. Table 17.1 lists the variations of coprocessor instructions along with the syntax for each.

Table 17.1  Coprocessor Operand Forms

Instruction Form   Syntax                Implied Operands   Example
──────────────────────────────────────────────────────────────────────────
Classical-stack    Faction               ST(1),ST           fadd
Memory             Faction memory        ST                 fadd memloc
Register           Faction ST(num),ST                       fadd st(5),st
                   Faction ST,ST(num)                       fadd st,st(3)
Register pop       FactionP ST(num),ST                      faddp st(4),st
──────────────────────────────────────────────────────────────────────────

Not all instructions accept all operand variations. For example, load and store instructions always require the memory form. Load-constant instructions always take the classical-stack form. Arithmetic instructions can usually take any form.
Some instructions that accept the memory form can have the letter I (integer) or B (BCD) following the initial F to specify how a memory operand is to be interpreted. For example, FILD interprets its operand as an integer and FBLD interprets its operand as a BCD number. If no type letter is included in the instruction name, the instruction works on real numbers.

17.3.1 Using Implied Operands in the Classical-Stack Form

The classical-stack form treats coprocessor registers like items on a stack. Items are pushed onto or popped off the top elements of the stack. Since only the top item can be accessed on a traditional stack, there is no need to specify operands. The first register (and the second if there are two operands) is always assumed.

In arithmetic operations (see Section 17.6, "Doing Arithmetic Calculations"), the top of the stack (ST) is the source operand, and the second register (ST(1)) is the destination. The result of the operation goes into the destination operand, and the source is popped off the stack. The effect is that both of the values used in the operation are destroyed and the result is left at the top of the stack.

Instructions that load constants always use the stack form (see Section 17.5.2, "Loading Constants"). In this case, the constant created by the instruction is the implied source, and the top of the stack (ST) is the destination. The source is pushed into the destination.

Note that the classical-stack form with its implied operands is similar to the register-pop form, not to the register form. For example, fadd, with the implied operands ST(1),ST, is equivalent to faddp st(1),st, rather than to fadd st(1),st.
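The difference between the classical-stack fadd and the register-form fadd st(1),st can be checked with a short C model of the register stack. As with any such sketch, the names (Stack, stack_push, fadd_reg, faddp_reg, fadd_classical) are hypothetical, and double stands in for temporary-real values.

```c
/* Hypothetical register-stack model: st[0] is ST(0). */
typedef struct {
    double st[8];
    int depth;      /* registers in use */
} Stack;

/* Push a value; older values move to higher-numbered registers. */
int stack_push(Stack *s, double v)
{
    if (s->depth == 8)
        return 0;                           /* no free register */
    for (int i = s->depth; i > 0; i--)
        s->st[i] = s->st[i - 1];
    s->st[0] = v;
    s->depth++;
    return 1;
}

/* Discard ST(0); remaining values move to lower-numbered registers. */
static void stack_pop(Stack *s)
{
    for (int i = 0; i < s->depth - 1; i++)
        s->st[i] = s->st[i + 1];
    s->depth--;
}

/* fadd st(i),st  (register form): ST(i) = ST(i) + ST, no pop */
void fadd_reg(Stack *s, int i)
{
    s->st[i] += s->st[0];
}

/* faddp st(i),st  (register-pop form): ST(i) = ST(i) + ST, then pop,
   so the result ends up one position nearer the top */
void faddp_reg(Stack *s, int i)
{
    s->st[i] += s->st[0];
    stack_pop(s);
}

/* fadd with no operands (classical-stack form) = faddp st(1),st */
void fadd_classical(Stack *s)
{
    faddp_reg(s, 1);
}
```

For example, pushing 1.0 and then 2.5 and applying fadd_classical leaves one register holding 3.5, while fadd_reg(&s, 1) on the same two values leaves two registers: 2.5 still on top and 3.5 below it.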
Example

        fld1                        ; Push 1 into first position
        fldpi                       ; Push pi into first position
        fadd                        ; Add pi and 1 and pop

The status of the register stack after each instruction is shown below:

[This figure can be found in Section 17.3.1 of the manual.]

17.3.2 Using Memory Operands

The memory form treats coprocessor registers like items on a stack. Items are pushed from memory onto the top element of the stack, or popped from the top element to memory. Since only the top item can be accessed on a traditional stack, there is no need to specify the stack operand. The top register (ST) is always assumed. However, the memory operand must be specified.

Memory operands can be used in load and store instructions (see Section 17.5.1, "Transferring Data to and from Registers"). Load instructions push source values from memory to an implied destination register (ST). Store instructions pop source values from an implied source register (ST) to the destination in memory. Some versions of store instructions pop the register stack so that the source is destroyed. Others simply copy the source without changing the stack.

Memory operands can also be used in calculation instructions that operate on two values (see Section 17.6, "Doing Arithmetic Calculations"). The memory operand is always the source. The stack top (ST) is always the implied destination. The result of the operation replaces the destination without changing its stack position.

Example

        .DATA
m1      DD      1.0
m2      DD      2.0
        .CODE
        .
        .
        .
        fld     m1                  ; Push m1 into first position
        fld     m2                  ; Push m2 into first position
        fadd    m1                  ; Add m1 to first position
        fstp    m1                  ; Pop first position into m1
        fst     m2                  ; Copy first position to m2

The status of the register stack and the memory locations used in the instructions is shown below:

[This figure can be found in Section 17.3.2 of the manual.]

17.3.3 Specifying Operands in the Register Form

The register form treats coprocessor registers as traditional registers. Registers are specified the same way as in 8086-family instructions that take two register operands. The only limitation is that one of the two registers must be the stack top (ST).

In the register form, operands are specified by name. The second operand is the source; it is not affected by the operation. The first operand is the destination; its value is replaced with the result of the operation. The stack position of the operands does not change.

The register form can only be used with the FXCH instruction and with arithmetic instructions that do calculations on two values. With the FXCH instruction, the stack top is implied and need not be specified.

Example

        fadd    st(1),st            ; Add first position to second -
                                    ;   result goes in second position
        fadd    st,st(2)            ; Add third position to first -
                                    ;   result goes in first position
        fxch    st(1)               ; Exchange first and second positions

The status of the register stack if the registers were previously initialized to 1.0, 2.0, and 3.0 is shown below:

[This figure can be found in Section 17.3.3 of the manual.]

17.3.4 Specifying Operands in the Register-Pop Form

The register-pop form treats coprocessor registers as a modified stack. This form has some of the aspects of both a stack and registers.
The destination register can be specified by name, but the source register must always be the stack top. The result of the operation will be placed in the destination operand, and the stack top will be popped off the stack. The effect is that both values being operated on will be destroyed and the result of the operation will be saved in the specified destination register. The register-pop form is only used for instructions that do calculations on two values.

Example

        faddp   st(2),st            ; Add first and third positions and pop -
                                    ;   first position destroyed;
                                    ;   third moves to second and holds result

The status of the register stack if the registers were already initialized to 1.0, 2.0, and 3.0 is shown below:

[This figure can be found in Section 17.3.4 of the manual.]

17.4 Coordinating Memory Access

Problems of coordinating memory access can occur when the coprocessor and the main processor both try to access a memory location at the same time. Since the processor and coprocessor work independently, they may not finish working on memory in the order in which you give instructions. There are two separate cases, and they are handled in different ways.

In the first case, if a processor instruction is given and then followed by a coprocessor instruction, the coprocessor must wait until the processor is finished before it can start the next instruction. This is handled automatically by QuickAssembler for the 8088 and 8086 or by the processor for the 80186 and 80286.

──────────────────────────────────────────────────────────────────────────
Coprocessor Differences

To synchronize operations between the 8088 or 8086 processor and the 8087 coprocessor, each 8087 instruction must be preceded by a WAIT instruction. This is not necessary for the 80287.
If you use the .8087 directive, QuickAssembler inserts WAIT instructions automatically. However, if you use the .286 directive, QuickAssembler assumes the instructions are for the 80287 or 80387 and does not insert the WAIT instructions. If your code will never need to run on an 8086 or 8088 processor, you can make your programs shorter and more efficient by using the .286 directive.
──────────────────────────────────────────────────────────────────────────

In the second case, if a coprocessor instruction that accesses memory is followed by a processor instruction attempting to access the same memory location, memory access is not automatically synchronized. For instance, if you store a coprocessor register to a variable and then try to load that variable into a processor register, the coprocessor may not be finished. Thus, the processor gets the value that was in memory before the coprocessor finished, rather than the value stored by the coprocessor. Use the WAIT or FWAIT instruction (they are mnemonics for the same instruction) to ensure that the coprocessor finishes before the processor begins.

Example

        ; Coprocessor instruction first - Wait needed
        fist    mem32               ; Store to memory
        fwait                       ; Wait until coprocessor is done
        mov     ax,WORD PTR mem32   ; Move to register
        mov     dx,WORD PTR mem32[2]

        ; Processor instruction first - No wait needed
        mov     WORD PTR mem32,ax   ; Load memory
        mov     WORD PTR mem32[2],dx
        fild    mem32               ; Load to register

17.5 Transferring Data

The 8087-family coprocessors have separate instructions for each of the following types of transfers:

  1. Transferring data between memory and registers, or between different registers

  2. Loading certain common constants into registers

  3. Transferring control data to and from memory

17.5.1 Transferring Data to and from Registers

Data-transfer instructions transfer data between main memory and the coprocessor registers, or between different coprocessor registers.
Two basic principles govern data transfers:

  ■ The instruction determines whether a value in memory will be considered an integer, a BCD number, or a real number. The value is always considered a temporary-real number once it is transferred to the coprocessor.

  ■ The size of the operand determines the size of a value in memory. Values in the coprocessor always take up 10 bytes.

The adjustments between formats are made automatically. Notice that floating-point numbers must be stored in the IEEE format, not in the Microsoft Binary format. Data is stored in IEEE format by default. If you use the .MSFLOAT directive, real numbers are stored in Microsoft Binary format, which the coprocessor cannot use, and the coprocessor instructions are disabled. Data formats for real numbers are explained in Section 6.5.1.4, "Real-Number Variables."

Data is transferred to stack registers by using load instructions. These instructions push data onto the stack from memory or coprocessor registers. Data is removed by using store instructions. Some store instructions pop data off the register stack into memory or coprocessor registers, whereas others simply copy the data without changing it on the stack.

17.5.1.1 Real Transfers

The following instructions are available for transferring real numbers:

Syntax              Description
──────────────────────────────────────────────────────────────────────────
FLD mem             Pushes a copy of mem into ST. The source must be a 4-, 8-, or 10-byte memory operand. It is automatically converted to the temporary-real format.

FLD ST(num)         Pushes a copy of the specified register into ST.

FST mem             Copies ST to mem without affecting the register stack. The destination can be a 4- or 8-byte memory operand. It is automatically converted from temporary-real format to short real or long real format, depending on the size of the operand. It cannot be stored in the 10-byte-real format.

FST ST(num)         Copies ST to the specified register. The current value of the specified register is replaced.

FSTP mem            Pops a copy of ST into mem.
                    The destination can be a 4-, 8-, or 10-byte memory operand. It is automatically converted from temporary-real format to the appropriate real-number format, depending on the size of the operand.

FSTP ST(num)        Pops ST into the specified register. The current value of the specified register is replaced.

FXCH [[ST(num)]]    Exchanges the value in ST with the value in ST(num). If no operand is specified, ST(0) and ST(1) are exchanged.

17.5.1.2 Integer Transfers

The following instructions are available for transferring binary integers:

Syntax              Description
──────────────────────────────────────────────────────────────────────────
FILD mem            Pushes a copy of mem into ST. The source must be a 2-, 4-, or 8-byte integer memory operand. It is interpreted as an integer and converted to temporary-real format.

FIST mem            Copies ST to mem. The destination must be a 2- or 4-byte memory operand. It is automatically converted from temporary-real format to a word or a doubleword, depending on the size of the operand. It cannot be converted to a quadword integer.

FISTP mem           Pops ST into mem. The destination must be a 2-, 4-, or 8-byte memory operand. It is automatically converted from temporary-real format to a word, doubleword, or quadword integer, depending on the size of the operand.

17.5.1.3 Packed BCD Transfers

The following instructions are available for transferring BCD integers:

Syntax              Description
──────────────────────────────────────────────────────────────────────────
FBLD mem            Pushes a copy of mem into ST. The source must be a 10-byte memory operand. It should contain a packed BCD value, although no check is made to see that the data is valid.

FBSTP mem           Pops ST into mem. The destination must be a 10-byte memory operand. The value is rounded to an integer if necessary and converted to a packed BCD value.
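The 10-byte layout that FBLD reads and FBSTP writes can be illustrated with a C function that packs an integer the same way. This is a sketch, not the coprocessor's own conversion: to_packed_bcd is a hypothetical name, the input is assumed to be already rounded to an integer (FBSTP rounds according to the current rounding mode), and the function writes zeros to the unused low bits of the sign byte. The layout assumed is the standard 8087 one: 18 decimal digits in bytes 0 through 8, two digits per byte with the less significant digit in the low nibble, least significant byte first, and the sign in bit 7 of byte 9.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack value into the 10-byte packed BCD format. Returns false if the
   magnitude needs more than 18 decimal digits. */
bool to_packed_bcd(int64_t value, unsigned char out[10])
{
    bool negative = value < 0;
    /* unsigned negation also handles INT64_MIN without overflow */
    uint64_t mag = negative ? (uint64_t)0 - (uint64_t)value : (uint64_t)value;

    for (int i = 0; i < 9; i++) {               /* bytes 0-8: 18 digits */
        unsigned lo = (unsigned)(mag % 10);     /* less significant digit */
        mag /= 10;
        unsigned hi = (unsigned)(mag % 10);     /* more significant digit */
        mag /= 10;
        out[i] = (unsigned char)((hi << 4) | lo);
    }
    out[9] = negative ? 0x80 : 0x00;            /* sign in bit 7 of byte 9 */
    return mag == 0;                            /* digits left over did not fit */
}
```

For example, -1234 packs to 34h, 12h, seven zero bytes, and a sign byte of 80h, which is the byte sequence FBSTP would leave in memory for that value.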
The following examples illustrate instructions described throughout this section:

Example 1

        fld     m1                  ; Push m1 into first item
        fld     st(2)               ; Push third item into first
        fst     m2                  ; Copy first item to m2
        fxch    st(2)               ; Exchange first and third items
        fstp    m1                  ; Pop first item into m1

Assuming that registers ST and ST(1) were previously initialized to 3.0 and 4.0, the status of the register stack is shown below:

[This figure can be found in Section 17.5.1.3 of the manual.]

Example 2

        .DATA
shortreal DD    100 DUP (?)
longreal  DQ    100 DUP (?)
        .CODE
        .                           ; Assume array shortreal has been
        .                           ;   filled by previous code
        .
        mov     cx,100              ; Initialize loop
        xor     si,si               ; Clear pointer into shortreal
        xor     di,di               ; Clear pointer into longreal
again:  fld     shortreal[si]       ; Push shortreal
        fstp    longreal[di]        ; Pop longreal
        add     si,4                ; Increment source pointer
        add     di,8                ; Increment destination pointer
        loop    again               ; Do it again

Example 2 illustrates one way of doing run-time type conversions.

17.5.2 Loading Constants

Constants cannot be given as operands and loaded directly into coprocessor registers. You must allocate memory and initialize the variable to a constant value. The variable can then be loaded by using one of the load instructions described in Section 17.5.1, "Transferring Data to and from Registers."

However, special instructions are provided for loading certain constants. You can load 0, 1, pi, and several common logarithmic values directly. Using these instructions is faster and often more precise than loading the values from initialized variables.

The instructions that load constants all have the stack top as the implied destination operand. The constant to be loaded is the implied source operand.
The instructions are listed below:

Syntax              Description
──────────────────────────────────────────────────────────────────────────
FLDZ                Pushes 0 into ST
FLD1                Pushes 1 into ST
FLDPI               Pushes the value of pi into ST
FLDL2E              Pushes the value of log2(e) into ST
FLDL2T              Pushes log2(10) into ST
FLDLG2              Pushes log10(2) into ST
FLDLN2              Pushes loge(2) into ST

17.5.3 Transferring Control Data

The coprocessor data area, or parts of it, can be stored to memory and later loaded back. One reason for doing this is to save a snapshot of the coprocessor state before going into a procedure, and restore the same status after the procedure. Another reason is to modify coprocessor behavior by storing certain data to main memory, operating on the data with 8086-family instructions, and then loading it back to the coprocessor data area.

You can choose to transfer the entire coprocessor data area, the control registers, or just the status or control word. Applications programmers seldom need to transfer anything other than the status word.

All the control-transfer instructions take a single memory operand. Load instructions use the memory operand as the source; store instructions use it as the destination. The coprocessor data area is the implied destination for load instructions and the implied source for store instructions.

Each store instruction has two forms. The "wait" form checks for unmasked numeric-error exceptions and waits until they have been handled. The "no-wait" form (which always begins with FN) ignores unmasked exceptions.
The instructions are listed below:

Syntax                  Description
──────────────────────────────────────────────────────────────────────────
FLDCW mem2byte          Loads control word
F[[N]]STCW mem2byte     Stores control word
F[[N]]STSW mem2byte     Stores status word
FLDENV mem14byte        Loads environment
F[[N]]STENV mem14byte   Stores environment
FRSTOR mem94byte        Restores state
F[[N]]SAVE mem94byte    Saves state

80287/80387 Only

Starting with the 80287 coprocessor, the FSTSW and FNSTSW instructions can store data directly to the AX register. This is the only case in which data can be transferred directly between processor and coprocessor registers, as shown below:

        fstsw   ax

17.6 Doing Arithmetic Calculations

The math coprocessors offer a rich set of instructions for doing arithmetic. Most arithmetic instructions accept operands in any of the formats discussed in Section 17.3, "Using Coprocessor Instructions."

When using memory operands with an arithmetic instruction, make sure you indicate in the name whether you want the memory operand to be treated as a real number or an integer. For example, use FADD to add a real number to the stack top or FIADD to add an integer to the stack top. You do not need to specify the operand type in the instruction if both operands are stack registers, since register values are always real numbers. You cannot do arithmetic on BCD numbers in memory. You must use FBLD to load the numbers into stack registers.

The arithmetic instructions are listed below.

Addition

The following instructions add the source and destination and put the result in the destination:

Syntax              Description
──────────────────────────────────────────────────────────────────────────
FADD                Classical-stack form. Adds ST and ST(1) and pops the result into ST. Both operands are destroyed.

FADD ST(num),ST     Register form with stack top as source. Adds the two register values and replaces ST(num) with the result.

FADD ST,ST(num)     Register form with stack top as destination.
                    Adds the two register values and replaces ST with the result.

FADD mem            Real-memory form. Adds a real number in mem to ST. The result replaces ST.

FIADD mem           Integer-memory form. Adds an integer in mem to ST. The result replaces ST.

FADDP ST(num),ST    Register-pop form. Adds the two register values and pops the result into ST(num). Both operands are destroyed.

Normal Subtraction

The following instructions subtract the source from the destination and put the difference in the destination. Thus, the number being subtracted from is replaced by the result.

Syntax              Description
──────────────────────────────────────────────────────────────────────────
FSUB                Classical-stack form. Subtracts ST from ST(1) and pops the result into ST. Both operands are destroyed.

FSUB ST(num),ST     Register form with stack top as source. Subtracts ST from ST(num) and replaces ST(num) with the result.

FSUB ST,ST(num)     Register form with stack top as destination. Subtracts ST(num) from ST and replaces ST with the result.

FSUB mem            Real-memory form. Subtracts the real number in mem from ST. The result replaces ST.

FISUB mem           Integer-memory form. Subtracts the integer in mem from ST. The result replaces ST.

FSUBP ST(num),ST    Register-pop form. Subtracts ST from ST(num) and pops the result into ST(num). Both operands are destroyed.

Reversed Subtraction

The following instructions subtract the destination from the source and put the difference in the destination. Thus, the number subtracted is replaced by the result.

Syntax              Description
──────────────────────────────────────────────────────────────────────────
FSUBR               Classical-stack form. Subtracts ST(1) from ST and pops the result into ST. Both operands are destroyed.

FSUBR ST(num),ST    Register form with stack top as source. Subtracts ST(num) from ST and replaces ST(num) with the result.

FSUBR ST,ST(num)    Register form with stack top as destination. Subtracts ST from ST(num) and replaces ST with the result.

FSUBR mem           Real-memory form. Subtracts ST from the real number in mem.
                    The result replaces ST.

FISUBR mem          Integer-memory form. Subtracts ST from the integer in mem. The result replaces ST.

FSUBRP ST(num),ST   Register-pop form. Subtracts ST(num) from ST and pops the result into ST(num). Both operands are destroyed.

Multiplication

The following instructions multiply the source and destination and put the product in the destination:

Syntax              Description
──────────────────────────────────────────────────────────────────────────
FMUL                Classical-stack form. Multiplies ST by ST(1) and pops the result into ST. Both operands are destroyed.

FMUL ST(num),ST     Register form with stack top as source. Multiplies the two register values and replaces ST(num) with the result.

FMUL ST,ST(num)     Register form with stack top as destination. Multiplies the two register values and replaces ST with the result.

FMUL mem            Real-memory form. Multiplies a real number in mem by ST. The result replaces ST.

FIMUL mem           Integer-memory form. Multiplies an integer in mem by ST. The result replaces ST.

FMULP ST(num),ST    Register-pop form. Multiplies the two register values and pops the result into ST(num). Both operands are destroyed.

Normal Division

The following instructions divide the destination by the source and put the quotient in the destination. Thus, the dividend is replaced by the quotient.

Syntax              Description
──────────────────────────────────────────────────────────────────────────
FDIV                Classical-stack form. Divides ST(1) by ST and pops the result into ST. Both operands are destroyed.

FDIV ST(num),ST     Register form with stack top as source. Divides ST(num) by ST and replaces ST(num) with the result.

FDIV ST,ST(num)     Register form with stack top as destination. Divides ST by ST(num) and replaces ST with the result.

FDIV mem            Real-memory form. Divides ST by the real number in mem. The result replaces ST.

FIDIV mem           Integer-memory form. Divides ST by the integer in mem. The result replaces ST.

FDIVP ST(num),ST    Register-pop form.
                    Divides ST(num) by ST and pops the result into ST(num). Both operands are destroyed.

Reversed Division

The following instructions divide the source by the destination and put the quotient in the destination. Thus, the divisor is replaced by the quotient.

Syntax              Description
──────────────────────────────────────────────────────────────────────────
FDIVR               Classical-stack form. Divides ST by ST(1) and pops the result into ST. Both operands are destroyed.
FDIVR ST(num),ST    Register form with stack top as source. Divides ST by ST(num) and replaces ST(num) with the result.
FDIVR ST,ST(num)    Register form with stack top as destination. Divides ST(num) by ST and replaces ST with the result.
FDIVR mem           Real-memory form. Divides the real number in mem by ST. The result replaces ST.
FIDIVR mem          Integer-memory form. Divides the integer in mem by ST. The result replaces ST.
FDIVRP ST(num),ST   Register-pop form. Divides ST by ST(num) and pops the result into ST(num). Both operands are destroyed.

Other Operations

The following instructions all use the stack top (ST) as an implied destination operand. The result of the operation replaces the value in the stack top. No operand should be given.

Syntax              Description
──────────────────────────────────────────────────────────────────────────
FABS                Sets the sign of ST to positive.
FCHS                Reverses the sign of ST.
FRNDINT             Rounds ST to an integer.
FSQRT               Replaces the contents of ST with its square root.
FSCALE              Scales by powers of 2 by adding the value of ST(1) to the exponent of the value in ST. This effectively multiplies the stack-top value by 2 to the power contained in ST(1). Since the exponent field is an integer, the value in ST(1) should normally be an integer.
FPREM               Calculates the partial remainder by performing modulo division on the top two stack registers. The value in ST is divided by the value in ST(1). The remainder replaces the value in ST. The value in ST(1) is unchanged.
                    Since this instruction works by repeated subtractions, it can take a lot of execution time if the operands are very different in magnitude. FPREM is sometimes used with trigonometric functions.
FXTRACT             Breaks a number down into its exponent and mantissa and pushes the mantissa onto the register stack. Following the operation, ST contains the value of the original mantissa and ST(1) contains the value of the unbiased exponent.

Example

    .DATA
    a       DD      3.0
    b       DD      7.0
    c       DD      2.0
    posx    DD      0.0
    negx    DD      0.0

    .CODE
    .
    .
    .
    ; Solve quadratic equation - no error checking
            fld1                    ; Get constants 2 and 4
            fadd    st,st           ; 2 at bottom
            fld     st              ; Copy it
            fmul    a               ; = 2a
            fmul    st(1),st        ; = 4a
            fxch                    ; Exchange
            fmul    c               ; = 4ac
            fld     b               ; Load b
            fmul    st,st           ; = b^2
            fsubr                   ; = b^2 - 4ac
                                    ; Negative value here produces error
            fsqrt                   ; = square root(b^2 - 4ac)
            fld     b               ; Load b
            fchs                    ; Make it negative
            fxch                    ; Exchange
            fld     st              ; Copy square root
            fadd    st,st(2)        ; Plus version = -b + root(b^2 - 4ac)
            fxch                    ; Exchange
            fsubp   st(2),st        ; Minus version = -b - root(b^2 - 4ac)
            fdiv    st,st(2)        ; Divide plus version
            fstp    posx            ; Store it
            fdivr                   ; Divide minus version
            fstp    negx            ; Store it

This example solves quadratic equations. It does no error checking and fails for some values because it attempts to find the square root of a negative number. You could enhance the code by using the FTST instruction (see Section 17.7.1, "Comparing Operands to Control Program Flow") to check for a negative number or 0 just before the square root is calculated. If b^2 - 4ac is negative or 0, the code can jump to routines that handle the special cases of no real solution or one solution, respectively.

17.7 Controlling Program Flow

The math coprocessors have several instructions that set control flags in the status word. The 8087-family control flags can be used with conditional jumps to direct program flow in the same way that 8086-family flags are used.
Since the coprocessor does not have jump instructions, you must transfer the status word to memory so that the flags can be used by 8086-family instructions.

An easy way to use the status word with conditional jumps is to move its upper byte into the lower byte of the processor flags. For example, use the following statements:

        fstsw   mem16           ; Store status word in memory
        fwait                   ; Make sure coprocessor is done
        mov     ax,mem16        ; Move to AX
        sahf                    ; Store upper byte (AH) in flags

As noted in Section 17.5.3, "Transferring Control Data," you can save several steps by loading the status word directly to AX on the 80287.

Figure 17.3 shows how the coprocessor control flags line up with the processor flags. C3 overwrites the zero flag, C2 overwrites the parity flag, and C0 overwrites the carry flag. C1 overwrites an undefined bit, so it cannot be used directly with conditional jumps, although you can use the TEST instruction to check C1 in memory or in a register. The sign and auxiliary-carry flags are also overwritten, so you cannot count on them being unchanged after the operation.

[Figure 17.3: see Section 17.7 of the printed manual]

See Section 15.1.2 for more information on using conditional-jump instructions based on flag status.

17.7.1 Comparing Operands to Control Program Flow

The 8087-family coprocessors provide several instructions for comparing operands. All these instructions compare the stack top (ST) to a source operand, which may either be specified or implied as ST(1).

The compare instructions affect the C3, C2, and C0 control flags. The C1 flag is not affected. Table 17.2 shows the flags set for each possible result of a comparison or test.

Variations on the compare instructions allow you to pop the stack once or twice, and to compare integers and zero.
For each instruction, the stack top is always the implied destination operand. If you do not give an operand, ST(1) is the implied source. Some compare instructions allow you to specify the source as a memory or register operand.

Table 17.2  Control-Flag Settings after Compare or Test

After FCOM          After FTEST                         C3   C2   C0
──────────────────────────────────────────────────────────────────────────
ST > source         ST is positive                      0    0    0
ST < source         ST is negative                      0    0    1
ST = source         ST is 0                             1    0    0
Not comparable      ST is NAN or projective infinity    1    1    1
──────────────────────────────────────────────────────────────────────────

The compare instructions are listed below.

17.7.1.1 Compare

These instructions compare the stack top to the source. The source and destination are unaffected by the comparison.

Syntax              Description
──────────────────────────────────────────────────────────────────────────
FCOM                Compares ST to ST(1).
FCOM ST(num)        Compares ST to ST(num).
FCOM mem            Compares ST to mem. The memory operand can be a four- or eight-byte real number.
FICOM mem           Compares ST to mem. The memory operand can be a two- or four-byte integer.
FTST                Compares ST to 0. The control flags will be affected as if ST had been compared to 0 in ST(1). Table 17.2 above shows the possible results.

17.7.1.2 Compare and Pop

These instructions compare the stack top to the source and then pop the stack. Thus, the destination is destroyed by the comparison.

Syntax              Description
──────────────────────────────────────────────────────────────────────────
FCOMP               Compares ST to ST(1) and pops ST off the register stack.
FCOMP ST(num)       Compares ST to ST(num) and pops ST off the register stack.
FCOMP mem           Compares ST to mem and pops ST off the register stack. The operand can be a four- or eight-byte real number.
FICOMP mem          Compares ST to mem and pops ST off the register stack. The operand can be a two- or four-byte integer.
FCOMPP              Compares ST to ST(1) and then pops the stack twice.
                    Both the source and destination are destroyed by the comparison.

Example

    IFDEF   c287
    .287
    ENDIF

    .DATA
    down      DD      10.35           ; Sides of a rectangle
    across    DD      13.07
    diameter  DD      12.93           ; Diameter of a circle
    status    DW      ?

    .CODE
    .
    .
    .
    ; Get area of rectangle
            fld     across          ; Load one side
            fmul    down            ; Multiply by the other

    ; Get area of circle
            fld1                    ; Load one and
            fadd    st,st           ;   double it to get constant 2
            fdivr   diameter        ; Divide diameter to get radius
            fmul    st,st           ; Square radius
            fldpi                   ; Load pi
            fmul                    ; Multiply it

    ; Compare area of circle and rectangle
            fcompp                  ; Compare and throw both away
    IFNDEF  c287
            fstsw   status          ; Load from coprocessor to memory
            fwait                   ; Wait for coprocessor
            mov     ax,status       ; Memory to register
    ELSE
            fstsw   ax              ; (for 287+, skip memory)
    ENDIF
            sahf                    ; to flags
            jp      nocomp          ; If parity set, can't compare
            jz      same            ; If zero set, they're the same
            jc      rectangle       ; If carry set, rectangle is bigger
            jmp     circle          ; else circle is bigger

    nocomp:    .                    ; Error handler
               .
    same:      .                    ; Both equal
               .
    rectangle: .                    ; Rectangle bigger
               .
    circle:    .                    ; Circle bigger

Notice how conditional blocks are used to enhance 80287 code. If you define the symbol c287 from the command line by using the /D symbol option (see Section B.4, "Defining Assembler Symbols"), the code is smaller and faster, but does not run on an 8087.

17.7.2 Testing Control Flags after Other Instructions

In addition to the compare instructions, the FXAM and FPREM instructions affect coprocessor control flags.

The FXAM instruction sets the value of the control flags based on the type of the number in the stack top (ST). This instruction is used to identify and handle special values, such as infinity, zero, unnormal numbers, denormal numbers, and NANs (not a number). Certain math operations are capable of producing these special-format numbers. A description of them is beyond the scope of this manual. The possible settings of the flags are shown in the on-line Help system.

FPREM also sets control flags.
Since this instruction must sometimes be repeated to get a correct remainder for large operands, it uses the C2 flag to indicate whether the remainder returned is partial (C2 is set) or complete (C2 is clear). If the bit is set, the operation should be repeated.

FPREM also returns the least-significant three bits of the quotient in C0, C3, and C1. These bits are useful for reducing operands of periodic transcendental functions, such as sine and cosine, to an acceptable range. The technique is not explained here. The possible settings for each flag are shown in the on-line Help system.

17.8 Using Transcendental Instructions

The 8087-family coprocessors provide a variety of instructions for doing transcendental calculations, including exponentiation, logarithmic calculations, and some trigonometric functions. Use of these advanced instructions is beyond the scope of this manual. However, the instructions are listed below for reference.

All transcendental instructions have implied operands: either ST as a single-destination operand, or ST as the destination and ST(1) as the source.

Instruction         Description
──────────────────────────────────────────────────────────────────────────
F2XM1               Calculates 2^x-1, where x is the value of the stack top. The value x must be between 0 and .5, inclusive. Returning 2^x-1 instead of 2^x allows the instruction to return the value with greater accuracy. The programmer can adjust the result to get 2^x.
FYL2X               Calculates Y times log2 X, where X is in ST and Y is in ST(1). The stack is popped, so both X and Y are destroyed, leaving the result in ST. The value of X must be positive.
FYL2XP1             Calculates Y times log2 (X+1), where X is in ST and Y is in ST(1). The stack is popped, so both X and Y are destroyed, leaving the result in ST. The absolute value of X must be less than 1 minus the square root of 2 divided by 2 (about 0.29). This instruction is more accurate than FYL2X when computing the log of a number close to 1.
FPTAN               Calculates the tangent of the value in ST. The result is a ratio Y/X, with Y replacing the value in ST and X pushed onto the stack, so that after the instruction, ST contains X and ST(1) contains Y. The value being calculated must be a positive number less than pi/4. The result of the FPTAN instruction can be used to calculate other trigonometric functions, including sine and cosine.
FPATAN              Calculates the arctangent of the ratio Y/X, where X is in ST and Y is in ST(1). The stack is popped, so both X and Y are destroyed, leaving the result in ST. Both X and Y must be positive numbers less than infinity, and Y must be less than X. The result of the FPATAN instruction can be used to calculate other inverse trigonometric functions, including arcsine and arccosine.

17.9 Controlling the Coprocessor

Additional instructions are available for controlling various aspects of the coprocessor. With the exception of FINIT, these instructions are generally used only by systems programmers. They are summarized below, but not fully explained or illustrated. Some instructions have a wait version and a no-wait version. The no-wait versions have N as the second letter.

Syntax              Description
──────────────────────────────────────────────────────────────────────────
F[[N]]INIT          Resets the coprocessor and restores all the default conditions in the control and status words. It is a good idea to use this instruction at the start and end of your program. Placing it at the start ensures that no register values from previous programs affect your program. Placing it at the end ensures that register values from your program will not affect later programs.
F[[N]]CLEX          Clears all exception flags and the busy flag of the status word. It also clears the error-status flag on the 80287, or the interrupt-request flag on the 8087.
FINCSTP             Adds 1 to the stack pointer in the status word. Do not use to pop the register stack. No tags or registers are altered.
FDECSTP             Subtracts 1 from the stack pointer in the status word. No tags or registers are altered.
FFREE ST(num)       Marks the specified register as empty.
FNOP                Copies the stack top to itself, thus padding the executable file and taking up processing time without having any effect on registers or memory.

8087 Only

The 8087 has the instructions FDISI, FNDISI, FENI, and FNENI. These instructions can be used to enable or disable interrupts. The 80287 coprocessor permits these instructions, but ignores them. Applications programmers will not normally need these instructions. Systems programmers should avoid using them so that their programs are portable to all coprocessors.

────────────────────────────────────────────────────────────────────────────
Chapter 18: Controlling the Processor

The 8086-family processors provide instructions for processor control. These instructions are available on all 8086-family processors.

System-control instructions have limited use in applications programming. They are primarily used by systems programmers who write operating systems and other control software. Since systems programming is beyond the scope of this manual, the systems-control instructions are summarized, but not explained in detail, in the sections below.

This chapter ends with a description of all the directives that enable the instruction sets for the various processors in the 8086 family.

18.1 Controlling Timing and Alignment

The NOP instruction takes up one byte of memory but does not have any effect when executed. The purpose of this instruction is generally to fill up space in the code segment; primarily, it is used to pad executable code for alignment.

Although NOP has no effect, it does take a few clock cycles to execute. In a sense, NOP does do something: it is exactly equivalent to the following instruction:

        xchg    ax,ax           ; Exchange AX with itself

Because NOP does use up some clock time, you can use it in timing loops by executing it many times.
However, when writing a program for use on different machines, avoid using this technique. Timing loops that use NOP take different lengths of time on different machines. A better way to control timing is to use the DOS Get Time function, since it is based on the computer's internal clock rather than on the speed of the processor.

QuickAssembler automatically inserts NOP instructions for padding when you use the ALIGN or EVEN directive (see Section 6.7, "Aligning Data") to align data or code on a given boundary.

18.2 Controlling the Processor

The LOCK, WAIT, ESC, and HLT instructions control different aspects of the processor.

These instructions can be used to control processes handled by external coprocessors. The 8087-family coprocessors are the coprocessors most commonly used with 8086-family processors, but 8086-based machines can work with other coprocessors if they have the proper hardware and control software.

These instructions are summarized below:

Instruction         Description
──────────────────────────────────────────────────────────────────────────
LOCK                Locks out other processors until a specified instruction is finished. This is a prefix that precedes the instruction. It can be used to make sure that a coprocessor does not change data being worked on by the processor.
WAIT                Instructs the processor to do nothing until it receives a signal that a coprocessor has finished with a task being performed at the same time. See Section 17.4, "Coordinating Memory Access," for information on using WAIT or its coprocessor equivalent, FWAIT, with the 8087-family coprocessors.
ESC                 Provides an instruction and possibly a memory operand for use by a coprocessor. QuickAssembler automatically inserts ESC instructions when required for use with 8087-family coprocessors.
HLT                 Stops the processor until an interrupt is received. It can be used in place of an endless loop if a program needs to wait for an interrupt.
18.3 Processor Directives

Processor and coprocessor directives define the instruction set that is recognized by QuickAssembler. They are listed and explained below:

Directive           Description
──────────────────────────────────────────────────────────────────────────
.8086               The .8086 directive enables assembly of instructions for the 8086 and 8088 processors and the 8087 coprocessor. It disables assembly of the instructions unique to the 80186, 80286, and 80386 processors. This is the default mode and is used if no instruction-set directive is specified. Using the default instruction set ensures that your program can be used on all 8086-family processors. However, if you choose this directive, your program will not take advantage of the more powerful instructions available on more advanced processors.
.186                The .186 directive enables assembly of the 8086 processor instructions, 8087 coprocessor instructions, and the additional instructions for the 80186 processor.
.286                The .286 directive enables assembly of the 8086 instructions plus the additional nonprivileged instructions of the 80286 processor. It also enables 80287 coprocessor instructions. If privileged instructions were previously enabled, the .286 directive disables them. This directive should be used for programs that will be executed only by an 80186, 80286, or 80386 processor. For compatibility with early versions of the Macro Assembler, the .286C directive is also available. It is equivalent to the .286 directive.
.8087               The .8087 directive enables assembly of instructions for the 8087 math coprocessor and disables assembly of instructions unique to the 80287 coprocessor. It also specifies the IEEE format for encoding floating-point variables. This is the default mode and is used if no coprocessor directive is specified. This directive should be used for programs that must run with either the 8087, 80287, or 80387 coprocessors.
.287                The .287 directive enables assembly of instructions for the 8087 floating-point coprocessor and the additional instructions for the 80287. It also specifies the IEEE format for encoding floating-point variables. Coprocessor instructions are optimized if you use this directive rather than the .8087 directive. Therefore, you should use it if you know your program will never need to run under an 8087 coprocessor. See Section 17.4, "Coordinating Memory Access," for an explanation.

If you do not specify any processor directives, QuickAssembler uses the following defaults:

  1. 8086/8088 processor instruction set
  2. 8087 coprocessor instruction set
  3. IEEE format for floating-point variables

Normally, the processor and coprocessor directives can be used at the start of the source file to define the instruction sets for the entire assembly. However, it is possible to use different processor directives at different points in the source file to change assumptions for a section of code. For instance, you might have processor-specific code in different parts of the same source file. You can also turn privileged instructions on and off or allow unusual combinations of the processor and coprocessor.

There are two limitations on changing the processor or coprocessor:

  1. The directives must be given outside segments. You must end the current segment, give the processor directive, and then open another segment. See Section 5.1.5, "Using Predefined Segment Equates," for an example of changing the processor directives with simplified segment directives.

  2. You can specify a lower-level coprocessor with a higher-level processor, but an error message will be generated if you try to specify a lower-level processor with a higher-level coprocessor.

The coprocessor directives have the opposite effect of the .MSFLOAT directive. .MSFLOAT turns off coprocessor instruction sets and enables the Microsoft Binary format for floating-point variables.
Any coprocessor instruction turns on the specified coprocessor instruction set and enables IEEE format for floating-point variables.

Examples

    ; .MSFLOAT affects the source file until turned off
    .MSFLOAT
    .8087                   ; Ignored

    ; Illegal - can't use 8086 with 80287
    .8086
    .287

────────────────────────────────────────────────────────────────────────────
Appendix A: Mixed-Language Mechanics

The QuickAssembler PROC statement automates most details of interfacing to QuickC, as well as to other Microsoft high-level languages. When you use PROC with a parameter list or USES clause as described in Section 15.3.4, or when you use the LOCAL directive, the assembler generates code that properly enters and exits the procedure. The assembler also determines the location of each parameter on the stack for you. You refer to each parameter by a meaningful name, and the assembler translates each parameter name into the actual memory reference.

The main purpose of this appendix is to show you what code the assembler generates when you use the LOCAL directive or expanded features of the PROC directive. However, you can write this code yourself rather than letting the assembler generate it. Doing so requires significant extra work, but it does give you complete control over your procedure.

Using simplified segment directives is the easiest way to interface with a Microsoft high-level language (including QuickC). The simplified segment directives generate segment definitions similar to the ones generated by high-level languages and guarantee compatibility in the use of segment names and conventions. If you want to use full segment definitions, see Chapter 5, "Defining Segment Structure," for a description of the segments used in Microsoft languages.

A.1 Writing the Assembly Procedure

The Microsoft BASIC, C, FORTRAN, and Pascal compilers use roughly the same interface for procedure calls.
This section describes the interface, so that you can call assembly procedures using essentially the same methods as Microsoft compiler-generated code. Procedures written with these methods can be called recursively.

The standard assembly-interface method consists of these steps:

  1. Setting up the procedure
  2. Entering the procedure
  3. Allocating local data (optional)
  4. Preserving register values
  5. Accessing parameters
  6. Returning a value (optional)
  7. Exiting the procedure

The PROC statement, when used with a parameter list or USES clause, automates steps 1 and 2, and simplifies step 5 for you. The LOCAL directive automates step 3, and the USES clause automates step 4. Finally, if you use any of these features, the assembler automatically generates all the proper code to exit (step 7) wherever it encounters a RET directive. (However, the RETF and RETN statements never generate automatic code.)

Sections A.1.1-A.1.7 describe these steps.

A.1.1 Setting Up the Procedure

The linker cannot combine the assembly procedure with the calling program unless compatible segments are used and unless the procedure itself is declared properly. The following points may be helpful:

  1. Use the .MODEL directive at the beginning of the source file; this directive automatically causes the appropriate kind of returns to be generated (NEAR for small or compact model, FAR otherwise). Modules called from Pascal should be declared as .MODEL LARGE; modules called from BASIC should be .MODEL MEDIUM.

  2. Use the simplified segment directives: .CODE to declare the code segment and .DATA to declare the data segment. (Having a code segment is sufficient if you do not have data declarations.)

  3. The procedure label must be public. This makes the procedure available to be called by other modules. If you specify a language type with the .MODEL directive, the assembler automatically makes all procedure names public, but you must use the PUBLIC directive if you don't specify the language.
     Also, any data you want to make public to other modules must be declared as PUBLIC.

  4. Global data or procedures accessed by the routine (but defined in other modules) must be declared EXTRN.

A.1.2 Entering the Procedure

Two instructions begin the procedure:

        push    bp
        mov     bp,sp

This sequence establishes BP as the framepointer. The framepointer is used to access parameters and local data, which are located on the stack. SP cannot be used for this purpose because it is not an index or base register. Also, the value of SP may change as more data is pushed onto the stack. However, the value of the base register (BP) will remain constant throughout the procedure, so that each parameter can be addressed as a fixed displacement from the location pointed to by BP.

The instruction sequence above first saves the value of BP, since it will be needed by the calling procedure as soon as the current procedure terminates. Then BP is loaded with the value of SP in order to capture the value of the stack pointer at the time of entry to the procedure.

The PROC statement generates these two lines of code automatically if you use a parameter list, LOCAL directive, or USES clause.

──────────────────────────────────────────────────────────────────────────
NOTE
If you alter the direction flag with the STD instruction, make sure you reset this flag with the CLD instruction before you exit.
──────────────────────────────────────────────────────────────────────────

A.1.3 Allocating Local Data (Optional)

Local variables are also called dynamic, stack, or automatic variables. An assembly procedure can use the same technique for implementing local data used by high-level languages.

To set up local data space, decrease the contents of SP in the third instruction of the procedure. (To ensure correct execution, you should always increase or decrease SP by an even amount.) Decreasing SP reserves space on the stack for the local data. The space must be restored at the end of the procedure.
        push    bp
        mov     bp,sp
        sub     sp,space

In the code above, space is the total size in bytes of the local data. Local variables are then accessed as fixed, negative displacements from the location pointed to by BP.

Example

        push    bp
        mov     bp,sp
        sub     sp,4
        .
        .
        .
        mov     WORD PTR [bp-2],0
        mov     WORD PTR [bp-4],0

The example above uses two local variables, each of which is two bytes in size. SP is decreased by 4, since there are four bytes total of local data. Later, each of the variables is initialized to 0. These variables are never formally declared with any assembler directive; the programmer must keep track of them manually.

The LOCAL directive uses this same method for creating local variables. However, when you use LOCAL, you can refer to a local variable by a symbolic name rather than by a reference, such as WORD PTR [bp-2].

A.1.4 Preserving Register Values

A procedure called from any of the Microsoft high-level languages should preserve the values of SI, DI, SS, and DS (in addition to BP, which is already saved). Therefore, push any of these register values that the procedure alters. If the procedure does not change the value of any of these registers, the registers do not need to be pushed.

The recommended method (used by high-level languages) is to save registers after the framepointer is set and local data (if any) is allocated.

        push    bp              ; Save old framepointer
        mov     bp,sp           ; Establish current framepointer
        sub     sp,4            ; Allocate local data space
        push    si              ; Save SI and DI
        push    di
        .
        .
        .

In the example above, DI and SI (in that order) must be popped from the stack before the end of the procedure.

The USES clause in a PROC statement causes the assembler to generate this same code.

A.1.5 Accessing Parameters

When you use PROC with a parameter list, the assembler calculates the location of each parameter on the stack. This section shows how the assembler determines these locations.
If you do not use a parameter list, you must calculate parameter locations yourself and refer to them explicitly by their offsets from BP. Otherwise, you can refer to each parameter by the name you gave it in the parameter list.

To write instructions that can access parameters, consider the general picture of the stack frame after a procedure call, as illustrated in Figure A.1.

[Figure A.1: see Section A.1.5 of the printed manual]

When determining the order of parameters on the stack, note that the C calling convention (the default for QuickC) specifies that parameters are passed in the reverse order they appear in source code. The non-C calling convention (which you can specify in QuickC with the pascal or fortran keyword) specifies that parameters are passed in the same order they appear in source code.

The stack frame for the procedure is established by the following sequence:

  1. The calling program pushes each of the parameters on the stack, after which SP points to the last parameter pushed.

  2. The calling program issues a CALL instruction, which causes the return address (the place in the calling program to which control will ultimately return) to be placed on the stack. This address may be either two bytes long (for near calls) or four bytes long (for far calls). SP now points to this address.

  3. The first instruction of the called procedure saves the old value of BP, with the instruction push bp. SP now points to the saved copy of BP.

  4. BP is used to capture the current value of SP, with the instruction mov bp,sp. BP therefore now points to the old value of BP.

  5. Whereas BP remains constant throughout the procedure, SP may be decreased to provide room on the stack for local data or saved registers.
In general, the displacement (from the location pointed to by BP) for a parameter X is equal to:

    2 + size of return address + total size of parameters between X and BP

The 2 accounts for the saved copy of BP. For example, consider a FAR procedure that has received one parameter, a two-byte address. The displacement of the parameter would be:

    Argument's displacement = 2 + size of return address
                            = 2 + 4
                            = 6

The argument can thus be loaded into BX with the following instruction:

        mov     bx,[bp+6]

Once you determine the displacement of each parameter, you may want to use string equates or structures so that the parameter can be referenced with a single identifier name in your assembly source code. For example, the parameter above at BP+6 can be conveniently accessed if you put the following statement at the beginning of the assembly source file:

    Arg1    EQU     <[bp+6]>

You could then refer to this parameter as Arg1 in any instruction. Use of this feature is optional.

──────────────────────────────────────────────────────────────────────────
NOTE
Microsoft high-level languages always push segment addresses before pushing offset addresses. Furthermore, when pushing arguments larger than two bytes, high-order words are always pushed before low-order words.

This standard for pushing segment addresses before pushing offset addresses facilitates the use of the LES and LDS instructions, as described in Chapter 3, "Writing Assembly Modules for C Programs."
──────────────────────────────────────────────────────────────────────────

A.1.6 Returning a Value (Optional)

The assembler does not generate code to return a value. If you want your procedure to return a value, you must take care of the details yourself.

Microsoft BASIC, C, FORTRAN, and Pascal share similar conventions for receiving return values. The conventions are the same when the data type to be returned is simple (that is, not an array or structured type) and is no more than four bytes long.
This includes all NEAR and FAR address types (in other words, all pointers and all parameters passed by reference).

     Data Size     Returned in Register
     ──────────────────────────────────────────────────────────────────────
     1 byte        AL
     2 bytes       AX
     4 bytes       High-order portion (or segment address) in DX;
                   low-order portion (or offset address) in AX

When the return value is larger than four bytes, a procedure called by C must allocate space for the return value and then place its address in DX:AX. You can create space for the return value by simply declaring it in a data segment.

If your assembly procedure uses the non-C calling convention, it must use a special convention in order to return floating-point values, records, user-defined types and arrays, and values larger than four bytes. This convention is presented below.

BASIC/FORTRAN/Pascal Long Return Values

To create an interface for long return values, modules using the non-C calling convention take the following actions before they call your procedure:

  1. The calling modules create space, somewhere in the stack segment, to hold the actual return value.

  2. When the call to your procedure is made, an extra parameter is passed containing the offset address of the actual return value. This parameter is placed immediately above the return address. (In other words, this parameter is the last one pushed.)

  3. The segment address of the return value is contained in both SS and DS.

The extra parameter (containing the offset address of the return value) is always located at BP+6 (two bytes for the saved BP plus four bytes for the far return address). Furthermore, its presence automatically increases the displacement of all other parameters by 2, as shown in Figure A.2.

┌────────────────────────────────────────────────────────────────────────┐
│ This figure can be found in Section A.1.6 of the manual                │
└────────────────────────────────────────────────────────────────────────┘

Your assembly procedure will successfully return a long value if you follow these steps:

  1.
Put the data for the return value at the location pointed to by the return-value offset.

  2. Copy the return-value offset (located at BP+6) to AX, and copy SS to DX. This is necessary because the calling module expects DX:AX to point to the return value.

  3. Exit the procedure as described in the next section.

A.1.7 Exiting the Procedure

Several steps may be involved in terminating the procedure:

  1. If any of the registers SS, DS, SI, or DI have been saved, they must be popped off the stack in the reverse of the order in which they were saved.

  2. If local data space was allocated at the beginning of the procedure, SP must be restored with the instruction mov sp,bp.

  3. Restore BP with pop bp. This step is always necessary.

  4. Finally, return to the calling program with ret. If the BASIC, FORTRAN, or Pascal calling convention is in use, use the ret n form of the instruction, where n is the total size of the parameters, to remove the parameters that were pushed by the caller. (If the procedure is called by a C module, the calling module performs this adjustment.)

Examples

     pop     bp
     ret

The example above shows the simplest possible exit sequence. No registers were saved, no local data space was allocated, and the C calling convention is in use.

     pop     di          ; Pop saved regs
     pop     si
     mov     sp,bp       ; Remove local data space
     pop     bp          ; Restore old frame pointer
     ret     6           ; Exit, and restore 6 bytes of args

The example above shows an exit sequence for a procedure that has previously saved SI and DI, allocated local data space, and uses a non-C calling convention. The procedure must therefore use ret 6 (where n is 6) to restore the six bytes of parameters on the stack.

If you use one of the automated features described above (such as a parameter list or the LOCAL directive), the assembler generates all the code needed to exit properly from a procedure whenever it encounters a RET instruction. However, the assembler does not generate any exit code when you use RETN or RETF.
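The stack bookkeeping of the exit sequence can be checked with a short C sketch. It tracks only SP, in bytes, and assumes that the procedure's own pushes and pops (BP, saved registers, local space) balance out, as the exit steps above ensure. The function and parameter names are hypothetical.

```c
#include <assert.h>

/* Hypothetical SP bookkeeping for one call, in bytes.
   sp0:          SP before the caller pushes any parameters
   param_bytes:  total size of the pushed parameters
   return_bytes: 2 for a near call, 4 for a far call
   ret_n:        the n in "ret n" (0 for a plain ret)
   caller_add:   the size in the caller's "add sp,size" (0 if none)
   Assumes the procedure's own pushes and pops balance out. */
static unsigned sp_after_call(unsigned sp0, unsigned param_bytes,
                              unsigned return_bytes, unsigned ret_n,
                              unsigned caller_add) {
    assert(sp0 >= param_bytes + return_bytes);  /* no underflow in the model */
    unsigned sp = sp0 - param_bytes;  /* caller pushes the parameters */
    sp -= return_bytes;               /* CALL pushes the return address */
    sp += return_bytes;               /* RET pops the return address ... */
    sp += ret_n;                      /* ... then adds n to SP */
    sp += caller_add;                 /* caller's add sp,size, if any */
    return sp;
}
```

With the ret 6 example (six bytes of parameters, a four-byte far return address, no caller adjustment), SP returns to its starting value; so does a C caller that follows a plain ret with add sp,4. Mixing the two conventions, ret 4 followed by add sp,4, leaves SP four bytes too high.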
A.2 Calls from Modules Using C Conventions

Most of the details below are automated when you use simplified segment directives and the expanded features of the PROC directive. Make sure to declare both a language type and a memory model with the .MODEL directive. This section reviews all the steps taken when you use the C language type.

In addition to the steps outlined in Section A.1, the assembler observes the following rules to set up an interface to C. Follow these rules if you want to establish this interface manually:

  1. Declare procedures called from C as FAR if the C module is compiled in large, huge, or medium model, and NEAR if the C module is compiled in small or compact model (although the near and far keywords can override these defaults). The correct declaration for the procedure is made implicitly when you use the .MODEL directive. Note that tiny memory model is not supported by QuickC 2.0.

  2. Observe the C calling convention.

     a. Return with a simple ret instruction. Do not restore the stack with ret size, since the calling C routine will restore the stack itself as soon as it resumes control.

     b. Parameters are placed on the stack in the reverse of the order in which they appear in the C source code. The first parameter will be lowest in memory (because it is placed on the stack last and the stack grows downward).

     c. By default, C parameters are passed by value, except for arrays, which are passed by reference. As a rule, do not expect an address to be placed on the stack unless the C code specifically refers to a pointer or array in the function call or prototype.

  3. Observe the C naming convention. Include an underscore (_) in front of any name that will be shared publicly with C. C recognizes only the first eight characters of any name, so do not make names shared with C longer than eight characters. Also, if you plan to link with the /NOIGNORECASE option, remember that C is case sensitive and does not convert names to uppercase.
To preserve lowercase names in public symbols, choose Preserve Case or Preserve Extrn from the Assembler Flags dialog box, or assemble with the /Cl or /Cx option on the QCL command line.

In the example program below, C calls an assembly procedure that calculates "A x 2^B," where A and B are the first and second parameters, respectively. The calculation is performed by shifting the bits in A to the left, B times.

     #include <stdio.h>

     extern int power2( int, int );

     int main( void )
     {
         printf( "3 times 2 to the power of 5 is %d\n", power2( 3, 5 ) );
         return 0;
     }

The C program uses an extern declaration to create an interface with the assembly procedure. No special keywords are required because the assembly procedure will use the C calling convention.

To understand how to write the assembly procedure, consider how the parameters are placed on the stack, as illustrated in Figure A.3.

┌────────────────────────────────────────────────────────────────────────┐
│ This figure can be found in Section A.2 of the manual                  │
└────────────────────────────────────────────────────────────────────────┘

The return address is two bytes long, assuming that the C module is compiled in small or compact model. If the C module is compiled in large, huge, or medium model, the addresses of Arg 1 and Arg 2 are each increased by 2, to BP+6 and BP+8, respectively, because the return address will be four bytes long.

Arg 1 (parameter 1) is lower in memory than Arg 2, because C pushes arguments in the reverse of the order in which they appear. Each argument is passed by value.

The assembly procedure can be written as follows:

             .MODEL  small
             .CODE
             PUBLIC  _power2
     _power2 PROC
             push    bp          ; Entry sequence - save old BP
             mov     bp,sp       ; Set stack frame pointer

             mov     ax,[bp+4]   ; Load Arg1 into AX
             mov     cx,[bp+6]   ; Load Arg2 into CX
             shl     ax,cl       ; AX = AX * (2 to power of CX)
                                 ; Leave return value in AX

             pop     bp          ; Exit sequence - restore old BP
             ret                 ; Return
     _power2 ENDP
             END

The example above assumes that the C module is compiled in small model.
The parameter offsets and the .MODEL directive will change for different models. Note that ret is used without a size operand, since the caller will adjust the stack upon return from the call.

A.3 Calls from Non-C Modules

In your C programs you can specify the pascal or fortran function type. These keywords are equivalent: they both specify use of non-C calling and naming conventions. Furthermore, you may want to interface to languages other than C (which you can do by linking .OBJ files together outside the environment). In all these cases, make sure you specify BASIC, FORTRAN, or Pascal as the language type with the .MODEL directive. Alternatively, you can specify the language as part of the procedure if you are using the extended PROC directive. This section reviews all the steps taken when you use a non-C language type.

In addition to the steps outlined in Section A.1, the assembler observes the following rules to set up an interface to a language using a non-C calling convention. Follow these rules if you want to establish an interface to a high-level language manually:

  1. If the procedure is called from Microsoft BASIC, Pascal, or FORTRAN, make sure to declare the procedure as FAR, or use the .MODEL directive to specify medium or large memory model. BASIC always uses medium memory model; Pascal uses large memory model.

  2. Observe the non-C calling convention.

     a. Upon exit, the procedure must reset SP to the value it had before the parameters were placed on the stack. This is accomplished with the instruction ret size, where size is the total size in bytes of all the parameters.

     b. Parameters are placed on the stack in the same order in which they appear in the high-level language source code. The first parameter will be highest in memory (because it is placed on the stack first and the stack grows downward).

     c. Each language has different defaults for passing parameters by value or reference.
        When a language passes by reference, it places a data pointer on the stack. When it passes by value, it places a complete copy of the parameter on the stack. Consult your language documentation for details on when the language passes by value or by reference. (In C, the default is by value except for arrays.)

  3. Observe the language naming convention. Microsoft BASIC, FORTRAN, and Pascal output symbolic names in uppercase characters, which is also the default behavior of the assembler. Each language recognizes a different number of characters in a name. For example, BASIC recognizes up to 40 characters of a name, whereas the assembler recognizes only the first 31.

In the following example program, QuickBASIC 4.0 calls an assembly procedure that calculates "A x 2^B," where A and B are the first and second parameters, respectively. The calculation is performed by shifting the bits in A to the left, B times. (Note: with earlier versions of BASIC, you need to rewrite the example so that it calls a subprogram, not a function.)

     ' BASIC program
     DEFINT A-Z
     DECLARE FUNCTION Power2 (A, B)
     PRINT "3 times 2 to the power of 5 is ";
     PRINT Power2(3,5)
     END

To understand how to write the assembly procedure, consider how the parameters are placed on the stack, as illustrated in Figure A.4.

┌────────────────────────────────────────────────────────────────────────┐
│ This figure can be found in Section A.3 of the manual                  │
└────────────────────────────────────────────────────────────────────────┘

The return address is four bytes long because procedures called from BASIC must be FAR. Arg 1 (parameter 1) is higher in memory than Arg 2 because BASIC pushes arguments (parameters) in the same order in which they appear. Also, each argument is passed as a two-byte offset address, the BASIC default.
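The difference between the C and BASIC examples is passing by value versus passing by reference. The following C sketch is an analogue only, not code that either compiler generates, and its function names are hypothetical: the first function receives the values directly, as the A.2 procedure does; the second receives addresses and must dereference them, which is what the two-instruction loads in the BASIC version do. It assumes A is non-negative and that the result fits in 16 bits, as it would in AX.

```c
#include <assert.h>

/* By-value analogue of the A.2 procedure: the arguments themselves are on
   the stack. Assumes a >= 0 and that the result fits in 16 bits. */
static int power2_by_value(int a, int b) {
    return a << b;                       /* like shl ax,cl */
}

/* By-reference analogue of the BASIC case: only addresses are on the stack,
   so each value takes two steps, like mov bx,[bp+8] then mov ax,[bx]. */
static int power2_by_reference(const int *a, const int *b) {
    assert(a != 0 && b != 0);            /* the caller must pass valid addresses */
    return *a << *b;
}
```

Both calls compute 3 x 2^5 = 96; only the way the operands reach the procedure differs.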
The assembly procedure can be written as follows:

             .MODEL  medium
             .CODE
             PUBLIC  Power2
     Power2  PROC
             push    bp          ; Entry sequence - save old BP
             mov     bp,sp       ; Set stack frame pointer

             mov     bx,[bp+8]   ; Load address of Arg1
             mov     ax,[bx]     ; Load Arg1 into AX
             mov     bx,[bp+6]   ; Load address of Arg2
             mov     cx,[bx]     ; Load Arg2 into CX
             shl     ax,cl       ; AX = AX * (2 to power of CX)
                                 ; Leave return value in AX

             pop     bp          ; Exit sequence - restore old BP
             ret     4           ; Return, and restore 4 bytes
     Power2  ENDP
             END

Note that each parameter must be loaded in a two-step process because the address of each is passed rather than the value. Also, note that the stack is restored with the instruction ret 4, since the total size of the parameters is four bytes.

A.4 Calling High-Level Languages from Assembly Language

Many high-level-language routines assume that certain initialization code has previously been executed. You can ensure that the proper initialization is performed by starting in a high-level-language module and then calling an assembly procedure. The assembly procedure can then call high-level-language routines as needed, as shown in Figure A.5.

┌────────────────────────────────────────────────────────────────────────┐
│ This figure can be found in Section A.4 of the manual                  │
└────────────────────────────────────────────────────────────────────────┘

To execute an assembly call to a high-level language, you need to observe the following guidelines:

  1. Push each parameter onto the stack, observing the calling convention of the high-level language. Constants, such as offset addresses, must first be loaded into a register before being pushed, because the 8086 has no instruction for pushing an immediate value.

  2. With long parameters, always push the segment or high-order portion of the parameter first, regardless of the calling convention.

  3. If you are using the BASIC/FORTRAN/Pascal calling convention with a function that returns a noninteger value, allocate an additional two-byte parameter.
     This additional parameter should contain the offset of the location where you want the value returned and must be pushed on the stack last.

  4. Execute a call. The call must be far unless the high-level-language routine was compiled in small or compact model.

  5. If the routine uses the C calling convention, you must clear the stack of parameters immediately after the call with the instruction add sp,size, where size is the total size in bytes of all parameters that were pushed.

A.5 Using Full Segment Definitions

If you use the simplified segment directives by themselves, you do not need to know the names assigned to each segment. However, if you choose to use full segment definitions, you should use the SEGMENT, GROUP, ASSUME, and ENDS directives equivalent to the simplified segment directives.

The following example shows the C-assembly program from Section A.2, without the simplified segment directives:

     _TEXT   SEGMENT WORD PUBLIC 'CODE'
             ASSUME  cs:_TEXT
             PUBLIC  _power2
     _power2 PROC    NEAR
             push    bp          ; Entry sequence - save BP
             mov     bp,sp       ; Set stack frame

             mov     ax,[bp+4]   ; Load Arg1 into AX
             mov     cx,[bp+6]   ; Load Arg2 into CX
             shl     ax,cl       ; AX = AX * (2 to power of CX)
                                 ; Leave return value in AX

             pop     bp          ; Exit sequence - restore BP
             ret                 ; Return
     _power2 ENDP
     _TEXT   ENDS
             END

────────────────────────────────────────────────────────────────────────────
Appendix B: Using Assembler Options with QCL

You can use the QCL driver for both compiling and assembling. The driver compiles .C files and assembles .ASM files. Unless the /c option is given, the QCL driver then links together all resulting .OBJ files, as well as any .OBJ files specified on the command line. A file name given without an extension is assumed to have the extension .OBJ.

If you acquired QuickAssembler as an upgrade, make sure you use the version of QCL that came with the QuickAssembler package. This driver program is an updated and expanded version, and it supports assembly options in addition to all the compile options listed in the QuickC Tool Kit.
The following options may affect work with .ASM files, but are not described here because they work exactly as described in the QuickC Tool Kit:

     Option        Action
     ──────────────────────────────────────────────────────────────────────
     /help         Print help listing for QCL
     /link flags   Specify linker flags
     /Fefile       Specify output file
     /Fofile       Name object file
     /Z{d|i}       Generate debugging information

The /c, /D, and /W options are documented in the QuickC Tool Kit, but are also documented here because their meaning and usage change somewhat for assembly-language files.

In addition to the linker options documented in the QuickC Tool Kit, QCL supports one other option, /TINY. This option causes the linker to output a .COM file, if possible. The linker can create a .COM file only if the program is written entirely in assembly language and all the modules observe the rules for the .COM format. (The easiest way to do this is to use tiny memory model as described in Chapter 5.) The following example generates a .COM file:

     QCL /AT TINYPROG.ASM /link /TINY

The /AT option causes the assembler to check the assembly code for adherence to the .COM format. The /TINY linker option causes the linker to generate a .COM file.
The QuickAssembler version of QCL supports the following options in addition to the ones supported for use with C-language modules:

     Option        Action
     ──────────────────────────────────────────────────────────────────────
     /a            Writes segments in alphabetical order
     /AT           Requires program to use tiny memory model; gives error
                   messages for code that violates requirements of .COM
                   format
     /C{l|u|x}     Determines case sensitivity (l=preserve case,
                   u=convert to uppercase, x=preserve case of external
                   and public symbols)
     /D            Defines symbols
     /Ez           Displays error lines on screen
     /Flfile       Generates an assembly-listing file with given file name
     /FPi          Creates code for emulated floating-point instructions
     /l            Generates an assembly-listing file
     /P1           Enables one-pass assembly
     /s            Writes segments in source-code order (reverses effect
                   of /a)
     /Sa           Lists all lines of macro expansions (assumes /Flfile or
                   /l is given)
     /Sd           Adds pass 1 information to listing (assumes /Flfile or
                   /l is given)
     /Se           Creates editor-oriented listing file; the resulting
                   listing has no page breaks or page headings (assumes
                   /Flfile or /l is given)
     /Sn           Suppresses symbol tables in listing (assumes /Flfile or
                   /l is given)
     /Sq           Generates an editor-based listing file with a
                   source-line index at the end (assumes /Flfile or /l is
                   given)
     /Sx           Lists false conditionals (assumes /Flfile or /l is
                   given)
     /t            Suppresses messages if assembly is successful
     /v            Displays extra statistics during assembly
     /w            Equivalent to /W0
     /W{0|1|2}     Sets warning-message level

B.1 Specifying the Segment-Order Method

Syntax

     /s      Default
     /a

The /a option directs QuickAssembler to place the assembled segments in alphabetical order before copying them to the object file. The /s option directs the assembler to write segments in the order in which they appear in the source code. Source-code order is the default: if no option is given, QuickAssembler copies the segments in the order encountered in the source file.
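The two segment-order methods differ only in how the list of segments is arranged before it is written to the object file. The following C sketch, with hypothetical function names and example segment names, models that choice. It uses strcmp byte order for /a, which is an assumption: this section does not specify how the assembler compares names (for instance, whether case is ignored).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator: each element is a const char * (a segment name). */
static int by_name(const void *x, const void *y) {
    return strcmp(*(const char *const *)x, *(const char *const *)y);
}

/* Copies n segment names from src (source-code order, the /s behavior) into
   dst, then sorts dst by name if alphabetical is nonzero (the /a behavior). */
static void order_segments(const char **dst, const char *const *src,
                           size_t n, int alphabetical) {
    assert(dst != 0 && src != 0);
    memcpy(dst, src, n * sizeof *src);
    if (alphabetical)
        qsort(dst, n, sizeof *dst, by_name);
}
```

Given the source order _TEXT, _DATA, STACK, the /s branch leaves that order unchanged, while the /a branch yields STACK, _DATA, _TEXT under byte ordering.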
The /s option is provided for compatibility with the XENIX(R) operating system and for overriding a default option in the QuickAssembler environment variable.

──────────────────────────────────────────────────────────────────────────
NOTE
     Some previous versions of the IBM Macro Assembler ordered segments alphabetically by default. Listings in some books and magazines have been written with these early versions in mind. If you have trouble assembling and linking a listing taken from a book or magazine, try using the /a option.
──────────────────────────────────────────────────────────────────────────

The order in which segments are written to the object file is only one factor in determining the order in which they will appear in the executable file. The significance of segment order and ways to control it are discussed in Sections 5.2.1, "Setting the Segment-Order Method," and 5.2.2.2, "Defining Segment Combinations with Combine Type."

Example

     QCL /a file.asm

The example above creates an object file, FILE.OBJ, whose segments are arranged in alphabetical order. If the /s option were used instead, or if no option were specified, the segments would be arranged in source-code order.

B.2 Checking Code for Tiny Model

Syntax

     /AT

The /AT option causes the assembler to enforce the requirements of .COM format. If the .MODEL directive is used, /AT generates an error unless the directive specifies tiny memory model. If the .MODEL directive is not used, the /AT option generates an error if any program-defined segments are referenced (since these references violate conditions of .COM format).

The use of /AT alone does not generate a .COM file. You must also use the /TINY linker option, as in the following example:

     QCL /AT TINYPROG.ASM /link /TINY

B.3 Selecting Case Sensitivity

Syntax

     /Cu     Default
     /Cl
     /Cx

The /Cl option directs the assembler to make all names case sensitive. The /Cx option directs the assembler to make public and external names case sensitive.
The /Cu option directs the assembler to convert all names to uppercase. By default, QuickAssembler converts all names to uppercase (/Cu).

If case sensitivity is turned on, all names that have the same spelling but use letters of different cases are considered distinct. For example, with the /Cl option, DATA and data are different. They would also be different with the /Cx option if they were declared external or public. Public and external names include any label, variable, or symbol names defined by using the EXTRN, PUBLIC, or COMM directives (see Chapter 8, "Creating Programs from Multiple Modules").

If you use the /Zi or /Zd option (these cause QCL to include debugging information), the /Cx, /Cl, and /Cu options affect the case of the symbolic data that will be available to a symbolic debugger.

The /Cl and /Cx options are typically used when object modules created with QuickAssembler are to be linked with object modules created by a case-sensitive compiler such as the Microsoft C compiler. If case sensitivity is important, you should also use the linker /NOI option.

Example

     QCL /Cx module.asm

This example shows how to use the /Cx option with QuickAssembler to assemble a file with case-sensitive public symbols.

B.4 Defining Assembler Symbols

Syntax

     /Dsymbol[[=value]]

The /D option, when given with a symbol argument, directs QuickAssembler to define a symbol that can be used during the assembly as if it were defined as a text equate in the source file. Multiple symbols can be defined in a single command line. The value can be any text string that does not include a space, comma, or semicolon. If value is not given, the symbol is assigned a null string.

Example

     QCL /Dwide /Dmode=3 file.asm

This example defines the symbol wide and gives it a null value. The symbol could then be used in the following conditional-assembly block:

     IFDEF   wide
     PAGE    50,132
     ENDIF

When the symbol is defined in the command line, the listing file is formatted for a 132-column printer.
When the symbol is not defined in the command line, the listing file is given the default width of 80 (see the description of the PAGE directive in Section 12.2, "Controlling Page Format in Listings").

The example also defines the symbol mode and gives it the value 3. The symbol could then be used in a variety of contexts, as shown below:

             IF      mode LT 15  ; Use in expression
     scrmode DB      mode        ; Initialize to mode
             ELSE
     scrmode DB      15          ; Initialize to 15
             ENDIF

B.5 Displaying Error Lines on the Screen

Syntax

     /Ez

The /Ez option directs QuickAssembler to display lines containing errors on the screen. Normally, when the assembler encounters an error, it displays only an error message describing the problem. When you use the /Ez option in the command line, the assembler displays the source line that produced the error in addition to the error message. QuickAssembler assembles faster without the /Ez option, but you may find the convenience of seeing the incorrect source lines worth the slight cost in processing speed.

Example

     QCL /Ez file.asm

B.6 Creating Code for a Floating-Point Emulator

Syntax

     /FPi87
     /FPi

The /FPi and /FPi87 options control how instructions for a math coprocessor (such as the 8087, 80287, or 80387) are assembled. The /FPi option tells the assembler to generate code for a coprocessor emulator library. The /FPi87 option tells the assembler to generate code for a coprocessor.

These options differ from most other QuickAssembler options in that the default for C files is /FPi, but the default for assembler files is /FPi87. They also differ in that the options must be specified separately for each file.

An emulator library uses the instructions of a coprocessor if one is present; otherwise, the library executes interrupts that emulate coprocessor instructions. Emulator libraries are available for QuickC and other high-level language compilers, including the Microsoft Pascal, BASIC, and FORTRAN compilers.
With QuickAssembler, you should specify /FPi only for assembly modules that will be linked with a main C module, since the emulator code requires the start-up code generated by the C compiler. A stand-alone assembler program generated with /FPi will execute emulator interrupts, but the program will not work because the interrupts will not be initialized.

If you are programming in the QC environment and you want the emulator library to be used with an assembler module, you must specify /FPi in the Global Custom Flags field of the Assembler Flags dialog box (reached from the Options menu). This will affect all assembly modules in the program list.

To the applications programmer, writing code for the emulator is like writing code for a coprocessor. The instruction sets are the same (except as noted in Chapter 17, "Calculating with a Math Coprocessor"). However, at run time the coprocessor instructions are used only if there is a coprocessor available on the machine. If there is no coprocessor, the slower code from the emulator library is used instead.

The /FPi87 option specifies that coprocessor instructions should be generated directly. It does not need to be given explicitly for assembly modules, since it is the default, but it must be specified for C modules. Programs that use this option can be run only on a system that has a coprocessor. If the program contains a main C module, it will fail with a warning if the system has no coprocessor. If the program is a stand-alone assembler program, you should write code that checks for a coprocessor and terminates with an error message if no coprocessor exists.

Example

     QCL calc.c /FPi /Cx math.asm

The example above assembles MATH.ASM with the /FPi option and compiles the C source file CALC.C. The resulting object files are then linked together to produce the file CALC.EXE. The C compiler generates emulator code for floating-point instructions. The FORTRAN, BASIC, and Pascal compilers generate similar code.
B.7 Creating Listing Files

Syntax

     /l
     /Flfile

The /l option directs QuickAssembler to create a listing file. Listing files created with this option always have the base name of the source file plus a .LST extension. You cannot specify any other file name.

The /Fl option has the same purpose as /l, but lets you specify any file name for the listing file. The default file name is the base file name plus a .LST extension.

Example

     QCL /l prog.asm

This example causes the assembler to generate the file PROG.LST during assembly.

B.8 Enabling One-Pass Assembly

Syntax

     /P1

The /P1 option causes the assembler to attempt translation of source code in one pass. If successful, the translation is significantly faster than the default two-pass assembly. Assembly modules cannot be successfully assembled with this option if they contain conditional-assembly directives that make references to pass 1 or pass 2.

──────────────────────────────────────────────────────────────────────────
NOTE
     One-pass assembly is not compatible with the generation of listing files or the /a option for specifying alphabetical segment order.
──────────────────────────────────────────────────────────────────────────

If the assembler generates a message reporting that one-pass assembly is not possible, simply assemble the file again without using this option.

Example

     QCL /P1 file.asm

B.9 Listing All Lines of Macro Expansions

Syntax

     /Sa

The /Sa option causes the listing file to contain all statements generated by the assembler. It overrides directives that limit listings, such as .XLIST, .XALL, and .SFCOND. It forces display of all statements generated automatically by simplified segment directives and the extended PROC syntax. The /Sa option has no effect unless /l or /Fl is also specified.

Example

     QCL /l /Sa file.asm

B.10 Creating a Pass 1 Listing

Syntax

     /Sd

The /Sd option causes the listing file to contain the results of both assembler passes. A pass 1 listing is typically used to locate phase errors.
Phase errors occur when the assembler makes assumptions about the program in pass 1 that are not valid in pass 2. The /Sd option has no effect unless /l or /Fl is also specified.

Example

     QCL /l /Sd file.asm

B.11 Specifying an Editor-Oriented Listing

Syntax

     /Se

The /Se option causes the assembler to generate the listing file in a format suited to text editors. This format does not contain page breaks or page headings. The default behavior, which is designed for files output to a printer, assumes a page break and heading at periodic intervals. The /Se option has no effect unless /l or /Fl is also specified.

Example

     QCL /l /Se file.asm

B.12 Suppressing Tables in the Listing File

Syntax

     /Sn

The /Sn option tells the assembler to omit all tables from the end of the listing file. If this option is not chosen, QuickAssembler includes tables of macros, structures, records, segments and groups, and symbols. The code portion of the listing file is not changed by the /Sn option. The /Sn option has no effect unless /l or /Fl is also specified.

Example

     QCL /l /Sn file.asm

B.13 Adding a Line-Number Index to the Listing

Syntax

     /Sq

The /Sq option generates an editor-based listing file just as the /Se option does, but it also adds a source-line index to the end of the listing file. This index contains pairs of corresponding line numbers for the listing file and the corresponding source files. The QuickC/QuickAssembler environment uses this information to let you move from a source file to the corresponding position in a listing file. When you create a listing file from within the QuickC/QuickAssembler environment, QC.EXE automatically passes this option to the assembler. The /Sq option has no effect unless /l or /Fl is also specified.

Example

     QCL /l /Sq file.asm

B.14 Listing False Conditionals

Syntax

     /Sx

The /Sx option directs QuickAssembler to copy to the assembly listing all statements forming the body of conditional-assembly blocks whose condition is false.
If you do not give the /Sx option in the command line, QuickAssembler suppresses all such statements. The /Sx option lets you display conditionals that do not generate code. Conditional-assembly directives are explained in Chapter 12, "Controlling Assembly Output."

The .LFCOND, .SFCOND, and .TFCOND directives can override the effect of the /Sx option, as described in Section 12.3.2, "Controlling Listing of Conditional Blocks." The /Sx option does not affect the assembly listing unless you direct the assembler to create an assembly-listing file.

Example

     QCL /l /Sx file.asm

Listing of false conditionals is turned on when FILE.ASM is assembled. Directives in the source file can override the /Sx option to change the status of false-conditional listing.

B.15 Controlling Display of Assembly Statistics

Syntax

     /v
     /t

The /v and /t options specify the level of information displayed on the screen at the end of assembly (/v is a mnemonic for verbose; /t is a mnemonic for terse).

If neither option is given, QuickAssembler outputs a line telling the amount of symbol space free and the number of warnings and errors.

If the /v option is given, QuickAssembler also reports the number of lines and symbols processed.

If the /t option is given, QuickAssembler does not output anything to the screen unless errors are encountered. This option may be useful in batch or make files if you do not want the output cluttered with unnecessary messages.

If errors are encountered, they are displayed whether or not these options are given.

B.16 Setting the Warning Level

Syntax

     /W{0 | 1 | 2}
     /w

The /W option sets the assembler warning level. QuickAssembler gives warning messages for assembly statements that are ambiguous or questionable but not necessarily illegal. Some programmers purposely use practices that generate warnings. By setting the appropriate warning level, they can turn off warnings if they are aware of the problem and do not wish to take action to remedy it.
The /w option is equivalent to /W0.

QuickAssembler has three warning levels, as shown in Table B.1.

Table B.1 Warning Levels

Level  Type                Description
──────────────────────────────────────────────────────────────────────────
0      Severe errors       Illegal statements
1      Serious warnings    Ambiguous statements or questionable
                           programming practices
2      Advisory warnings   Statements that may produce inefficient code
──────────────────────────────────────────────────────────────────────────

The default warning level is 1. A higher warning level includes all of the messages reported by a lower level. Level 2 includes severe errors, serious warnings, and advisory warnings. If severe errors are encountered, no object file is produced.

Warning level 0 reports error messages in the range 1000-2999. Warning level 1 reports warning and error messages in the ranges 1000-2999 and 4000-4999. Warning level 2 reports all warning and error messages, including those numbered 5000 and above.

────────────────────────────────────────────────────────────────────────────
Appendix C: Reading Assembly Listings

QuickAssembler creates an assembly listing of your source file whenever you give an assembly-listing option on the QCL command line or select a listing file option in the Assembler Flags dialog box. The assembly listing contains both the statements in the source file and the object code (if any) generated for each statement. The listing also shows the names and values of all labels, variables, and symbols in your source file.

The assembler creates tables for macros, structures, records, segments, groups, and other symbols. These tables are placed at the end of the assembly listing (unless you suppress them with the QCL /Sn option). QuickAssembler lists only the types of symbols encountered in the program.
All symbol names will be shown in uppercase letters unless you choose Preserve Case or Preserve Extrn from the Assembler Flags dialog box or use a QCL option (/Cx or /Cl) that supports case sensitivity.


C.1 Reading Code in a Listing

The assembler lists the code generated from the statements of a source file. Each line has the syntax shown below:

     offset [[code]] statement

The offset is the offset from the beginning of the current segment to the code.

If the statement generates code or data, code shows the numeric value in hexadecimal if the value is known at assembly time. If the value is calculated at run time, the assembler indicates what action is necessary to compute the value.

The statement is the source statement shown exactly as it appears in the source file, or as expanded by a macro.

If any errors occur during assembly, each error message and error number will appear directly below the statement where the error occurred. An example of an error line and message is shown below:

     0012  E8 001C R                  call    doit
     test.ASM(46): error A2071: Forward needs override or FAR

The assembler uses the symbols and abbreviations in Table C.1 to indicate addresses that need to be resolved by the linker or values that were generated in a special way.

Table C.1 Symbols and Abbreviations in Listings

Character  Meaning
──────────────────────────────────────────────────────────────────────────
R          Relocatable address (linker must resolve)
E          External address (linker must resolve)
----       Segment/group address (linker must resolve)
=          EQU or equal-sign (=) directive
nn:        Segment override in statement
nn/        REP or LOCK prefix instruction
nn[xx]     DUP expression: nn copies of the value xx
n          Macro-expansion nesting level (+ if more than nine)
C          Line from include file
──────────────────────────────────────────────────────────────────────────

Example

The sample listing shown in this section is produced using the /Se option, which produces an editor-oriented listing.
The QuickC/QuickAssembler environment always produces this kind of listing. The editor-oriented format has no page headings and is thus ideal for viewing within the environment or another editor. If you are using QCL to generate a listing file that you intend to print, you may want to generate a printer-oriented listing file by giving the /Sp option:

     QCL /l /Sp listdemo.asm

The code portion of the resulting listing is shown below. The tables normally seen at the end of the listing are explained later, in Sections C.2-C.7.

     Microsoft(R) QuickC with QuickAssembler Version 2.01
     Listing features demo

                                          PAGE    65,132
                                          TITLE   Listing features demo
                                  C       INCLUDE dos.mac
                                  C StrAlloc MACRO name,text
                                  C name    DB      &text
                                  C         DB      13d,10d
                                  C l&name  EQU     $-name
                                  C         ENDM
      = 0080                              larg    EQU     80h
                                          DOSSEG
                                          .MODEL  small
      0100                                .STACK  256
                                          color   RECORD  b:1,r:3=1,i:1=1,f:3=7
                                          date    STRUC
      0000  05                            month   DB      5
      0001  07                            day     DB      7
      0002  07C3                          year    DW      1987
      0004                                date    ENDS
      0000                                .DATA
      0000  1F                            text    color   <>
      0001  09                            today   date    <9,22,1987>
      0002  16
      0003  07C3
      0005  0064[                         buffer  DW      100 DUP(?)
               ????
                                          StrAlloc ending,"Finished."
      00CD  46 69 6E 69 73 68 65        1 ending  DB      "Finished."
      00D6  0D 0A                       1         DB      13d,10d
      0000                                .CODE
      0000  B8 ---- R                     start:  mov     ax,@DATA
      0003  8E D8                                 mov     ds,ax
      0005  B8 0063                               mov     ax,'c'
      0008  26: 8B 0E 0080                        mov     cx,es:larg
      000D  BF 0052                               mov     di,82
      0010  F2/ AE                                repne   scasb
      0012  57                                    push    di
                                                  EXTRN   work:NEAR
      0013  E8 0000 E                             call    work
      0016  B8 170C                               mov     ax,4C00
     listdemo.ASM(40): error A2107: Non-digit in number
      0019  CD 21                                 int     21h
      001B                                        END     start


C.2 Reading a Macro Table

A macro table at the end of a listing file gives the names and sizes (in lines) of all macros called or defined in the source file. The macros appear in alphabetical order.

Example

     Macros:

                     N a m e                 Lines

     STRALLOC . . . . . . . . . . . .          3


C.3 Reading a Structure and Record Table

All structures and records declared in the source file are given at the end of the listing file. The names are listed in alphabetical order.
Each name is followed by the fields in the order in which they are declared.

Example

     Structures and Records:

                     N a m e                 Width   # fields
                                             Shift   Width   Mask    Initial

     COLOR . . . . . . . . . . . . .         0008    0004
       B . . . . . . . . . . . . . .         0007    0001    0080    0000
       R . . . . . . . . . . . . . .         0004    0003    0070    0010
       I . . . . . . . . . . . . . .         0003    0001    0008    0008
       F . . . . . . . . . . . . . .         0000    0003    0007    0007
     DATE . . . . . . . . . . . . . .        0004    0003
       MONTH . . . . . . . . . . . .         0000
       DAY . . . . . . . . . . . . .         0001
       YEAR . . . . . . . . . . . . .        0002

The first row of headings applies only to the record or structure itself. For a record, the "Width" column shows the width in bits, while the "# fields" column tells the total number of fields.

The second row of headings applies only to fields of the record or structure. For records, the "Shift" column lists the offset (in bits) from the low-order bit of the record to the low-order bit in the field. The "Width" column lists the number of bits in the field. The "Mask" column lists the maximum value of the field, expressed in hexadecimal. The "Initial" column lists the initial value of the field, if any. For each field, the table shows the mask and initial values as if they were placed in the record and all other fields were set to 0.

For a structure, the "Width" column lists the size of the structure in bytes. The "# fields" column lists the number of fields in the structure. Both values are in hexadecimal.

For structure fields, the "Shift" column lists the offset in bytes from the beginning of the structure to the field. This value is in hexadecimal. The other columns are not used.


C.4 Reading a Segment and Group Table

Segments and groups used in the source file are listed at the end of the listing with their size, align type, combine type, and class. If you used simplified segment directives in the source file, the actual segment names generated by QuickAssembler will be listed in the table.

Example

     Segments and Groups:

                     N a m e                 Size    Align   Combine Class

     DGROUP . . . . . . . . . . . . .        GROUP
       _DATA . . . . . . . . . . . .         00D8    WORD    PUBLIC  'DATA'
       STACK . . . . . . . . . . . .         0800    PARA    STACK   'STACK'
     _TEXT . . . . . . . . . . . . .         0018    BYTE    PUBLIC  'CODE'

The "Name" column lists the names of all segments and groups. Segment and group names are given in alphabetical order, except that the names of segments belonging to a group are placed under the group name in the order in which they were added to the group.

The "Size" column lists the byte size (in hexadecimal) of each segment. The size of groups is not shown.

The "Align" column lists the align type of the segment.

The "Combine" column lists the combine type of the segment. If no explicit combine type is defined for the segment, the listing shows NONE, representing the private combine type. If the "Align" column contains AT, the "Combine" column contains the hexadecimal address of the beginning of the segment.

The "Class" column lists the class name of the segment. For a complete explanation of the align, combine, and class types, see Section 5.2.2, "Defining Full Segments."


C.5 Reading a Symbol Table

All symbols (except names for macros, structures, records, and segments) are listed in a symbol table at the end of the listing.

Example

     Symbols:

                     N a m e                 Type     Value    Attr

     BUFFER . . . . . . . . . . . . .        L WORD   0005     _DATA  Length = 0064
     ENDING . . . . . . . . . . . . .        L BYTE   00CD     _DATA
     LARG . . . . . . . . . . . . . .        NUMBER   0080
     LENDING . . . . . . . . . . . .         NUMBER   000B
     START . . . . . . . . . . . . .         L NEAR   0000     _TEXT
     TEXT . . . . . . . . . . . . . .        L BYTE   0000     _DATA
     TODAY . . . . . . . . . . . . .         L DWORD  0001     _DATA
     WORK . . . . . . . . . . . . . .        L NEAR   0000     _TEXT  External

     @CODE . . . . . . . . . . . . .         TEXT     _TEXT
     @CODESIZE . . . . . . . . . . .         TEXT     0
     @DATA . . . . . . . . . . . . .         TEXT     DGROUP
     @DATASIZE . . . . . . . . . . .         TEXT     0
     @FARDATA . . . . . . . . . . . .        TEXT     FAR_DATA
     @FARDATA? . . . . . . . . . . .         TEXT     FAR_BSS
     @FILENAME . . . . . . . . . . .         TEXT     listdemo
     @MODEL . . . . . . . . . . . . .        TEXT     1
     @VERSION . . . . . . . . . . . .        TEXT     520

The "Name" column lists the names in alphabetical order.

The "Type" column lists each symbol's type. A type is given as one of the following:

Type             Definition
──────────────────────────────────────────────────────────────────────────
L NEAR           A near label
L FAR            A far label
N PROC           A near procedure label
F PROC           A far procedure label
NUMBER           An absolute label
ALIAS            An alias for another symbol
TEXT             A text equate
BYTE             One byte
WORD             One word (two bytes)
DWORD            Doubleword (four bytes)
QWORD            Quadword (eight bytes)
TBYTE            Ten bytes
number           Length in bytes of a structure variable
──────────────────────────────────────────────────────────────────────────

The type shown for a multiple-element variable, such as an array or string, reflects the length of a single element, not the length of the entire variable. For example, string variables are always shown as L BYTE.

If the symbol represents an absolute value defined with an EQU or equal-sign (=) directive, the "Value" column shows the symbol's value. The value may be another symbol, a string, or a constant numeric value (in hexadecimal), depending on whether the type is ALIAS, TEXT, or NUMBER.

If the symbol represents a variable, label, or procedure, the "Value" column shows the symbol's hexadecimal offset from the beginning of the segment in which it is defined.

The "Attr" column shows the attributes of the symbol. The attributes include the name of the segment (if any) in which the symbol is defined, the scope of the symbol, and the code length. A symbol's scope is given only if the symbol is declared with the EXTRN, PUBLIC, or COMM directive. The scope can be external, global, or communal. The code length (in hexadecimal) is given only for procedures. The "Attr" column is blank if the symbol has no attribute.
The text equates shown at the end of the sample table are the ones defined automatically when you use simplified segment directives (see Section 5.1.1, "Understanding Memory Models").


C.6 Reading Assembly Statistics

Data on the assembly, including the number of lines and symbols processed and the errors or warnings encountered, is shown at the end of the listing.

Example

         48 Source  Lines
         52 Total   Lines
         53 Symbols

      45570 + 310654 Bytes symbol space free

          0 Warning Errors
          1 Severe  Errors


C.7 Reading a Pass 1 Listing

When you specify the /Sd option on the QCL command line or select Pass One Information from the Assembler Flags dialog box, the assembler puts a pass 1 listing in the assembly-listing file. The listing file shows the results of both assembler passes. Pass 1 listings are useful in analyzing phase errors.

The following example illustrates a pass 1 listing for a source file that assembled without error on the second pass.

     0017  7E 00                        jle     label1
     PASS_CMP.ASM(20) : error 9 : Symbol not defined LABEL1
     0019  BB 1000                      mov     bx,4096
     001C                       label1:

During pass 1, the JLE instruction to a forward reference produces an error message, and the value 0 is encoded as the operand. QuickAssembler displays this error because it has not yet encountered the symbol label1.

Later in pass 1, label1 is defined. Therefore, the assembler knows about label1 on pass 2 and can fix the pass 1 error. The pass 2 listing is shown below:

     0017  7E 03                        jle     label1
     0019  BB 1000                      mov     bx,4096
     001C                       label1:

The operand for the JLE instruction is now coded as 3 instead of 0 to indicate that the distance of the jump to label1 is three bytes. The distance is measured from the end of the two-byte JLE instruction (offset 0019h) to label1 (offset 001Ch). Since QuickAssembler generated the same number of bytes for both passes, there was no error.

Phase errors occur if the assembler makes an assumption on pass 1 that it cannot change on pass 2. If you get a phase error, you can examine the pass 1 listing to see what assumptions the assembler made.
═══════════════════════════════════════════════════════════════════════════

Index

Symbols [ ] (brackets) with arrays index operator with registers + (plus sign), to separate registers _ (underscore) Numbers 10-byte temporary-real format .186 directive .286 directive .287 directive .8086 directive 8086-family processors assembly language calculating physical addresses (figure) flags (figure) instructions registers (figure) .8087 directive 8087-family registers, modifying A /a option AAA instruction AAD instruction AAM instruction AAS instruction ABS type Absolute segments Accumulator ADC instruction ADD instruction Adding Addressing modes contrasted with C defined direct memory immediate indirect memory listed register Advisor, Quick Advisory warnings AH register AL register Aliases ALIGN directive Align type Alignment, of segments .ALPHA directive & (ampersand), operator Ampersand (&), operator AND instruction AND operator < > (angle brackets), operator Angle brackets (< >), operator Animate command Arguments macros passing on stack repeat blocks Arithmetic operators (table) Arrays accessing elements of boundary checking defining specifying elements ASCII character set converting numerals to face value name for unpacked BCD numbers Assembler display mode Assembler Flags dialog box (figure) Assembler options, summary Assembly calling from C parameters, accessing passing by near reference value warning levels (table) Assembly listing addresses false conditionals macros page breaks page length page width reading subtitle suppressing title Assembly-language books ASSUME directive @ (at sign) symbol names, used in /AT option AT combine type Attributes of variables Auto display mode Automatic variables Auxiliary-carry flag AX register B Base register Based operands Based-indexed operands BASIC calling convention for loops nestinglevel ON GOTO statements stack frame (figure) long return values (figure) BCD (binary coded decimal) numbers calculations with constants defining of
transfer instructions (list) variables initialized BH register Binary integers, transfer instructions (list) Binary radix Binary to decimal conversion BIOS functions calling with interrupts on-line help Bit fields Bit mask Bits, rotating and shifting (figure) Bitwise operators BL register Boolean bit operations BOUND instruction Boundary-checking array BP register Brackets ([ ]) with arrays index operator with registers Buffers Bugs, reporting Building programs within environment QCL driver BY memory operator BYTE align type BYTE type specifier C C compiler C display mode C language assembly call (figure) calling convention if statement interfacing with loops memory models and nestinglevel value passing by value return value stack frame (figure) switch statements C register variables CALL instruction Call tables Calling convention Calls command Carry flag Case sensitivity compilers lack of options preserving CATSTR directive CBW instruction CH register Character constant Character set /Cl option /Cl option CL register defined use in shifting bits Class type Classical-stack operands, coprocessor CLC instruction CLD instruction CLI instruction CMP instruction CMPS instruction .CODE directive CODE class name Code equate Code Segment register Code segments developing programs with initializing : (colon), operator definition : (colon), operator in Debug expressions .COM file, assembling .COM format debugging example with full segment definitions initializing with linker restrictions on tiny memory model Combine type COMENT object record COMM directive Command line, QC COMMENT directive COMMON combine type Communal symbols Compact memory model Compare instructions Comparing register to zero strings Concatenating strings Conditional assembly Conditional directives assembly directives assembly passes error directives macro arguments nesting operators symbol definition value of true and false Conditional-error directives (table) Conditional-jump instructions based on flag status
(table) defined logic used after Compare (table) .CONST directive Constants as direct memory operands integer loading instructions (list) multiplying and dividing by packed binary coded decimal real number string use of Control data, coprocessor Control registers, coprocessor (figure) Control-flag settings (table) Control-flow instructions Controlling program flow Converting binary to decimal data sizes Coprocessors architecture control data control flags (figure) control instructions (list) control registers (figure) data registers (figure) directives (list) emulator loading data loading pi no-wait instructions operand forms (table) storing data Copying data Count register Cpu equate CS register /Cu option Current-line highlighting Custom flags Customer support CWD instruction /Cx option /Cx option CX register D D DAA instruction DAS instruction .DATA directive .DATA? directive Data conversion Data equate Data pointer, using far Data registers, coprocessor (figure) Data segments defining developing programs with initializing registers Data-definition directives Data-manipulation instructions DataSize equate DB directive .DBG files DD directive Debug expression operators Debug flags Debugging commands local variables DEC instruction Decimal conversion example Decimal, packed BCD numbers Decimal radix Declaring far pointers strings Decrementing Defaults file extension QCL driver radix segment names segment registers simplified segment stack size types Defining symbols from command line Dereferencing, pointer Destination operand Destination string DGROUP group name COMM directive, with DOSSEG, with simplified segments, with DH register DI register DIF Direction flag Directives .186 .286 .287 .8086 .8087 ALIGN .ALPHA ASSUME CATSTR .CODE COMM COMMENT .CONST .DATA .DATA? 
data definition (list) data (list) DB DD defined DOSSEG DQ DT DW ELSE ELSEIF EQU equal sign (=) .ERR .ERR1 .ERR2 .ERRB .ERRDEF .ERRDIF .ERRE .ERRIDN .ERRNB .ERRNDEF .ERRNZ EVEN EXITM EXTRN .FARDATA .FARDATA? full segment global GROUP IF IF1 IF2 IFB IFDEF IFDIF IFE IFIDN IFNB IFNDEF INCLUDE INCLUDELIB INSTR instruction set IRP IRPC LABEL .LALL .LFCOND .LIST LOCAL MACRO .MODEL .MSFLOAT NAME on-line help ORG %OUT PAGE PROC PUBLIC PURGE .RADIX RECORD REPT .SALL SEGMENT .SEQ .SFCOND simplified segment SIZESTR .STACK .STARTUP string-manipulation (list) STRUC SUBSTR SUBTTL .TFCOND TITLE .XALL .XLIST Directives Displacement Display dialog box (figure) Display mode DIV instruction Divide overflow interrupt Dividing by constants integers DL register DM Document conventions Documentation feedback card $ (dollar sign) location counter symbol symbol names, used in Dollar sign ($) location counter symbol symbol names, used in DOS returning to version requirements DOS functions calling with interrupts Exit, registers used (list) on-line help segment-order convention Write, registers used (list) /DOSSEG linker option DOSSEG directive DP DQ directive DS DS register /Dsymbol option DT directive Dummy parameters macros repeat blocks Dummy segment definitions DUP operator DW directive DW memory operator DWORD type specifier DX register Dynamic variables E Effective address Elements, array ELSE directive ELSEIF directives Emulator, coprocessor Encoded real numbers END directive ENDIF directive ENDM directive ENDP directive ENDS directive ENTER instruction Environment variable, INCLUDE EQ operator EQU directive = (equal sign), directive Equal sign (=), directive Equates defined nonredefinable predefined. 
See predefined equates redefinable string .ERR directive .ERR1 directive .ERR2 directive .ERRB directive .ERRDEF directive .ERRDIF directive .ERRE directive .ERRIDN directive .ERRNB directive .ERRNDEF directive .ERRNZ directive Error lines, displaying ES register ESC instruction EVEN directive ! (exclamation point), operator Exclamation point (!), operator Execution, tracing Exit function, DOS Exiting a program EXITM directive Exponent, part of real-number constant Exponentiation, with 8087-family coprocessors Expression operator (%) Expressions in Debug commands defined External names External symbols Extra segment EXTRN directive /Ez option F F2XM1 instruction FABS instruction FADD instruction FADDP instruction False conditionals, listing Far data pointers decimal conversion with loading FAR type specifier .FARDATA? directive .FARDATA directive Fardata equate Fardata? equate farStack farStack keyword Fatal errors FBLD instruction FBSTP instruction FCHS instruction FCOM instruction FCOMP instruction FCOMPP instruction FDIV instruction FDIVP instruction FDIVR instruction FDIVRP instruction FIADD instruction FICOM instruction FICOMP instruction FIDIV instruction FIDIVR instruction Fields assembler statements bit records structures File extensions, default File menu, defaults Filename equate Files .COM .DBG include listing specifications FIMUL instruction Finishing execution FINIT instruction First-in-first-out (FIFO) FIST instruction FISTP instruction FISUB instruction FISUBR instruction /Fl option Flags 8086-family processors (figure) altering within environment build control, settings after compare or test (table) coprocessor, processor control (figure) loading and storing register, summary (table) FLD instruction FLD1 instruction FLDCW instruction FLDL2E instruction FLDL2T instruction FLDLG2 instruction FLDLN2 instruction FLDPI instruction FLDZ instruction Floating-point numbers FMUL instruction FMULP instruction For loops, emulating high-level-language statement 
FORTRAN compiler do loops, emulating nestinglevel return value (figure) Forward references defined during a pass labels variables FPATAN instruction /FPi option FPREM instruction FPTAN instruction Fraction Framepointer FRNDINT instruction FSCALE instruction FSQRT instruction FST instruction FSTCW instruction FSTP instruction FSTSW instruction FSUB instruction FSUBP instruction FSUBR instruction FSUBRP instruction FTST instruction Full segment definitions Function return values FXAM instruction FXCH instruction FXTRACT instruction FYL2X instruction FYL2XP1 instruction G GE operator General-purpose registers Global directives defined illustrated Global flags Global scope Global symbols GROUP directive Group-relative segments Groups assembly listing defined illustrated size restriction GT operator H Hardware interrupts Help menu Help on DOS and BIOS functions Help topics Hexadecimal conversion example Hexadecimal radix in Debug expressions specifier HIGH operator High-level languages interfacing with memory model HLT instruction Huge memory model I /I option IDIV instruction IEEE format If blocks, run-time IF directives IF1 directive IF2 directive IFB directive IFDEF directive IFDIF directive IFE directive IFIDN directive IFNB directive IFNDEF directive Immediate operands Implied operands IMUL instruction IN instruction INC instruction INCLUDE directive INCLUDE environment variable Include files assembly listings using View menu command INCLUDELIB directive Incrementing Indeterminate operand Index checking Index, Help menu selection Index operator Index registers Indexed operands Indirect addressing modes (table) Indirection, pointer Initializing segment registers variables INS instruction INSTR directive Instruction-pointer register (IP) Instructions AAA AAD AAM AAS ADC ADD addition (list) AND bit test BOUND CALL CBW CLC CLD CLI CMP CMPS compare conditional-jump control-flow CWD DAA DAS data-manipulation DEC defined DIV ESC F2XM1 FABS FADD FADDP FBLD FBSTP FCHS FCOM 
FCOMP FCOMPP FDIV FDIVP FDIVR FDIVRP FIADD FICOM FICOMP FIDIV FIDIVR FILD FIMUL FINIT FIST FISTP FISUB FISUBR FLD FLD1 FLDCW FLDL2E FLDL2T FLDLG2 FLDLN2 FLDPI FLDZ FMUL FMULP FPATAN FPREM FPTAN FRNDINT FSCALE FSQRT FST FSTCW FSTP FSTSW FSUB FSUBP FSUBR FSUBRP FTST FXAM FXCH FXTRACT FYL2X FYL2XP1 HLT IDIV IMUL IN INC INS INT INTO IRET JC Jcondition JCXZ JMP LAHF LDS LEA LEAVE LES LOCK LODS logical LOOP LOOPNE LOOPNZ MOV MOVS MUL multiplication (list) NEG NOP normal division (list) normal subtraction (list) NOT on-line help OR OUT OUTS POP POPA POPF program-flow PUSH PUSHA PUSHF REP REPE REPNE REPNZ REPZ RET RETF RETN reversed division (list) reversed subtraction (list) SAHF SBB SCAS STD STI STOS SUB TEST timing of WAIT XCHG XLAT XOR Instructions instructions AND LDS LES SHL Instruction-set directives INT instruction Integer formats (figure) Integers Interrupt-enable flag Interrupts defined operation of (figure) using INTO instruction IP register IRET instruction IRP directive IRPC directive J JC instruction Jcondition instruction JCXZ instruction JMP instruction JO instruction Jump tables Jumping conditionally K Keywords, on-line help for L /l option LABEL directive Labels macros, in near-code procedures LAHF instruction .LALL directive Language type COMM directive EXTRN directive .MODEL directive PROC statement PUBLIC directive Large memory model LDS instruction LE operator LEA instruction Learning assembly language LEAVE instruction LENGTH operator LES instruction .LFCOND directive Line-continuation character .LIST directive Listing controlling contents of false conditionals format addresses described EQU directive errors groups include files LOCK directive macro expansions macros Pass 1, reading records REP directive segment override segments structures symbols macros subtitles in suppressing output symbols and abbreviations in (table) Listing files creating editor-oriented example index to source code macro expansion Pass 1 setting title suppressing tables View 
menu viewing Literal-character operator (!) Literal-text operator (< >) Loading constants to coprocessor coprocessor data far pointers values from strings LOCAL directive declaring stack variables symbols declared with using Local symbols defined in macros Local variables in procedures on stack (figure) Location counter ($) LOCK directive, assembly listing LOCK instruction LODS instruction Logarithms Logical bit operations (table of values) Logical instructions Logical operators vs. logical instructions (table) LOOP instruction Looping overview without use of CX LOOPNE instruction LOOPNZ instruction LOW operator LT operator M Machine code Macro comment operator (;;) MACRO directive Macro expansions, assembly listings Macros argument testing arguments assembly listing calling compared to procedures defined efficiency penalty exiting early expansions in listing local symbols nested operators parameters recursive redefining removing from memory string-manipulation directives (list) text viewing listing of Make dialog box (figure) MASK operator Masking bits Masking out a bit Masks, adjusting Math coprocessors Medium memory model Memory access, coordinating MEMORY combine type Memory models assembly procedure with default segments, types (table) described (list) Memory operands coprocessor defined Memory requirements Memory-model-independent procedures Messages to screen suppressing Microsoft Binary format Microsoft Binary Real format Microsoft segment model Mixed-language interface C entry sequences exit sequences local data register considerations return value Mixed-language programs building C and assembler program list (figure) .MODEL directive Model equate Modular programming Modulo division MOV instruction MOVS instruction .MSFLOAT directive MUL instruction Multiple modules Multiplying by 16 by constants instructions Multiword values, shifting N NAME directive Names assigning external public reserved Naming convention NE operator Near pointers Near reference
parameters, assembly NEAR type specifier nearStack nearStack keyword NEG instruction Negating Nesting conditional DUP operators include files macros procedures for Pascal segments Nonredefinable equates NOP instruction NOT instruction NOT operator No-wait coprocessor instructions Null class type Null string O Object records Octal radix OFFSET operator with group-relative segments with .MODEL directive overview ON GOSUB, emulating BASIC statement One-pass assembly option Operands based based indexed classical stack coprocessor defined destination direct memory immediate implied indeterminate indexed indirect memory location counter memory record field records register register indirect relocatable source strong typing structures types (list) undefined Operators AND arithmetic bitwise calculation defined DUP EQ expression (%) GE GT HIGH index LE LENGTH literal character (!) literal text logical (table) LOW LT macro comment (;;) macro (list) MASK NE NOT OFFSET OR precedence (table) PTR relational (table) SEG segment override (:).
See : (segment-override operator) shift SHL SHORT SHR SIZE structure-field name substitute (&) THIS TYPE .TYPE WIDTH XOR Options /a /AT /Cl /Cu /Cx /DOSSEG linker /Dsymbol /Ez /Fl /FPi /I /l /P1 /s /Sa /Sd /Se setting inside environment /Sn /Sq summary /Sx /t /v /W /w Options menu OR instruction OR operator ORG directive %OUT directive OUT instruction Output messages to screen OUTS instruction Overflow flag Overflow interrupt P /P1 option Packed BCD numbers Packed decimal integers Packed decimal numbers PAGE align type Page breaks in assembly listings PAGE directive Page format, in listings PARA align type Parameter list, in PROC statement Parameters assembly, accessing from defining in procedures macros repeat blocks types, common (list) Parity flag Partial remainder Pascal case statements, emulating compiler for loops nestinglevel repeat loops, emulating return value (figure) Pass 1 listing Passing by near reference, assembly value assembly C % (percent sign), expression operator Percent sign (%), expression operator . (period) Period (.)
Phase errors Pi, loading to coprocessor Plus sign (+), to separate registers Pointer indirection Pointer registers Pointers defining loading POP instruction POPA instruction POPF instruction Ports defined getting strings from sending strings to Precedence of operators Predefined equates CodeSize Cpu CurSeg DataSize FileName Model segment Version WordSize Preserving register values PROC directive PROC type specifier Procedure arguments, on stack (figure) Procedures compared to macros defining labels using Processor directives Product assistance report Program list (figure) Program-flow instructions PTR operator declaring parameters with declaring pointers with syntax used with data types PUBLIC combine type PUBLIC directive Public names Public symbols PURGE directive PUSH instruction PUSHA instruction PUSHF instruction Q QC environment building programs debugging commands described on-line help starting QC register variables QCL driver introduction options (list) ? (question mark) Question mark (?) 
Quick Advisor QuickAssembler assembly interfaces, writing environment See QC environment QuickBASIC compiler QuickC, interfacing with QWORD type specifier R .RADIX directive Radixes binary default specifiers (table) Real numbers arithmetic calculations designator (R) directives for defining (list) encoding format transfer instructions (list) RECORD directive Record operands compared to variables (figure) using Record type contents (figure) declaring Record variables compared to operands (figure) contents (figure) defining Records assembly listing declarations defining field operands fields initializing MASK operator object WIDTH operator Recursive macros Redefinable equates Redefining interrupts macros Register stacks classical-stack form (figure) data transfer (figure) memory form (figure) register form (figure) register-pop form (figure) Register variables, in C Registers accumulator AH AL altering within environment AX BH BL BP BX CH CL coprocessor CS CX defined DH DI DL DS DX ES flags general purpose index IP operands operands, coprocessor pointer preserving value of register-pop operands, coprocessor segment setting to zero SI SP SS strategy for using types of uses of watching contents of Registers window Relational operators (table) REP directive, assembly listing REP instruction REPE instruction Repeat blocks arguments defined parameters repeat for each argument repeat for each character of string repeat for specified count Repeating instructions, execution of using 8086-family string functions REPNE instruction REPNZ instruction Reporting problems REPT directive REPZ instruction Reserved names RET instruction RETF instruction RETN instruction Return value, offset (figure) Rotating bits (figure) S /s option /Sa option SAHF instruction .SALL directive SBB instruction Scaling by powers of two Scaling factor SCAS instruction Screen swapping ;; (semicolons), operator /Sd option /Se option Search paths, include files Searching strings Sections in assembly
listings SEG operator SEGMENT directive Segment register Segmented addressing Segment-order method (Segment-override operator) definition OFFSET operator, with string instructions, with XLAT instructions, with (segment-override operator) used with ES Segments absolute align type align type (figure) assembly listing combine type (figure) (list) defined extra group-relative offset groups defining structure (figure) MEMORY nesting ordering override, assembly listings override operator (:). See (segment-override operator) registers types Semicolons (;;), operator .SEQ directive Serious warnings Severe errors .SFCOND directive Shift operators Shifting bits multiword values SHL instruction SHL operator SHORT operator SHR operator SI register Sign flag Signed numbers Simplified segment defaults SIZE operator SIZESTR directive Small memory model defining attributes described setting up procedures Smart help /Sn option Source files, include Source modules Source operand Source string SP register Specifying calling convention /Sq option Square root SS register .STACK directive Stack allocating defined frame local variables on (figure) near and far types operands, coprocessor procedure arguments on (figure) registers segment status, after pushes and pops (figure) trace.
See Calls command type use of variables STACK combine type Stand-alone programs .STARTUP directive Statement fields Statistics STD instruction Step Over command STI instruction STOS instruction String directives (list) Strings comparing constants declaring defined destination strings equates filling instructions, requirements for (table) loading values from moving null ports, transfer from and to searching source structures, in variables Strong typing STRUC directive Structure type Structure-field-name operator Structures assembly listing declarations definitions fields initializing operands overview variables SUB instruction Substitute operator (&) SUBSTR directive Substring directive Subtitles in listings Subtracting values SUBTTL directive /Sx option Symbols assembly listing communal defined defining from command line external global location counter public relocatable operands scope SYMDEB System requirements T /t option TBYTE type specifier Temporary real format (figure) TER Terminating execution TEST instruction Text macros .TFCOND directive THIS operator /TINY linker option Tiny memory model Tiny model option TITLE directive Trace Into command Tracing execution Transcendental calculations Transferring BCD integers binary integers real numbers Trap flag Trigonometric functions Tutorial books, assembly language Type ABS align class combine null class operand matching operators, described PROC record structure TYPE operator Type specifiers (list) .TYPE operator U Undefined operand Underscore (_) Unpacked BCD numbers Unsigned numbers Updates USES, in PROC statement V /v option Value parameters, C Variables automatic communal defined dynamic external floating point initializing integer local public real number record string structure watching value of Version equate View menu Include command W /W option /w option WAIT instruction Warning levels Watch Value command Watch Value dialog box (figure) Watchpoint command Weak typing in other assemblers While, emulating 
high-level-language statement WIDTH operator Width, structures WO memory operator WORD align type WORD PTR, in example WORD type specifier WordSize equate Write function, DOS X .XALL directive XCHG instruction XENIX compatibility pathnames, with / (forward slash) /sI XLAT instruction .XLIST directive XOR instruction XOR operator Z Zero flag