- ────────────────────────────────────────────────────────────────────────────
- Chapter 3 Structure of MS-DOS Application Programs
-
- Programs that run under MS-DOS come in two basic flavors: .COM programs,
- which have a maximum size of approximately 64 KB, and .EXE programs, which
- can be as large as available memory. In Intel 8086 parlance, .COM programs
- fit the tiny model, in which all segment registers contain the same value;
- that is, the code and data are mixed together. In contrast, .EXE programs
- fit the small, medium, or large model, in which the segment registers
- contain different values; that is, the code, data, and stack reside in
- separate segments. .EXE programs can have multiple code and data segments,
- which are respectively addressed by long calls and by manipulation of the
- data segment (DS) register.
-
- A .COM-type program resides on the disk as an absolute memory image, in a
- file with the extension .COM. The file does not have a header or any other
- internal identifying information. A .EXE program, on the other hand,
- resides on the disk in a special type of file with a unique header, a
- relocation map, a checksum, and other information that is (or can be) used
- by MS-DOS.
-
- Both .COM and .EXE programs are brought into memory for execution by the
- same mechanism: the EXEC function, which constitutes the MS-DOS loader.
- EXEC can be called with the filename of a program to be loaded by
- COMMAND.COM (the normal MS-DOS command interpreter), by other shells or
- user interfaces, or by another program that was previously loaded by EXEC.
- If there is sufficient free memory in the transient program area, EXEC
- allocates a block of memory to hold the new program, builds the program
- segment prefix (PSP) at its base, and then reads the program into memory
- immediately above the PSP. Finally, EXEC sets up the segment registers and
- the stack and transfers control to the program.
-
- When it is invoked, EXEC can be given the addresses of additional
- information, such as a command tail, file control blocks, and an
- environment block; if supplied, this information will be passed on to the
- new program. (The exact procedure for using the EXEC function in your own
- programs is discussed, with examples, in Chapter 12.)
-
- .COM and .EXE programs are often referred to as transient programs. A
- transient program "owns" the memory block it has been allocated and has
- nearly total control of the system's resources while it is executing. When
- the program terminates, either because it is aborted by the operating
- system or because it has completed its work and systematically performed a
- final exit back to MS-DOS, the memory block is then freed (hence the term
- transient) and can be used by the next program in line to be loaded.
-
-
- The Program Segment Prefix
-
- A thorough understanding of the program segment prefix is vital to
- successful programming under MS-DOS. It is a reserved area, 256 bytes
- long, that is set up by MS-DOS at the base of the memory block allocated
- to a transient program. The PSP contains some linkages to MS-DOS that can
- be used by the transient program, some information MS-DOS saves for its
- own purposes, and some information MS-DOS passes to the transient
- program──to be used or not, as the program requires (Figure 3-1).
-
- Offset
- 0000H ┌────────────────────────────────────────────────────────┐
- │ Int 20H │
- 0002H ├────────────────────────────────────────────────────────┤
- │ Segment, end of allocation block │
- 0004H ├────────────────────────────────────────────────────────┤
- │ Reserved │
- 0005H ├────────────────────────────────────────────────────────┤
- │ Long call to MS-DOS function dispatcher │
- 000AH ├────────────────────────────────────────────────────────┤
- │ Previous contents of termination handler │
- │ interrupt vector (Int 22H) │
- 000EH ├────────────────────────────────────────────────────────┤
- │ Previous contents of Ctrl-C interrupt vector (Int 23H) │
- 0012H ├────────────────────────────────────────────────────────┤
- │ Previous contents of critical-error handler │
- │ interrupt vector (Int 24H) │
- 0016H ├────────────────────────────────────────────────────────┤
- │ Reserved │
- 002CH ├────────────────────────────────────────────────────────┤
- │ Segment address of environment block │
- 002EH ├────────────────────────────────────────────────────────┤
- │ Reserved │
- 005CH ├────────────────────────────────────────────────────────┤
- │ Default file control block #1 │
- 006CH ├────────────────────────────────────────────────────────┤
- │ Default file control block #2 │
- │ (overlaid if FCB #1 opened) │
- 0080H ├────────────────────────────────────────────────────────┤
- │ Command tail and default disk transfer area (buffer) │
- 00FFH └────────────────────────────────────────────────────────┘
-
- Figure 3-1. The structure of the program segment prefix.
-
- In the first versions of MS-DOS, the PSP was designed to be compatible
- with a control area that was built beneath transient programs under
- Digital Research's venerable CP/M operating system, so that programs could
- be ported to MS-DOS without extensive logical changes. Although MS-DOS has
- evolved considerably since those early days, the structure of the PSP is
- still recognizably similar to its CP/M equivalent. For example, offset
- 0000H in the PSP contains a linkage to the MS-DOS process-termination
- handler, which cleans up after the program has finished its job and
- performs a final exit. Similarly, offset 0005H in the PSP contains a
- linkage to the MS-DOS function dispatcher, which performs disk operations,
- console input/output, and other such services at the request of the
- transient program. Thus, calls to PSP:0000 and PSP:0005 have the same
- effect as CALL 0000 and CALL 0005 under CP/M. (These linkages are not the
- "approved" means of obtaining these services, however.)
-
- The word at offset 0002H in the PSP contains the segment address of the
- top of the transient program's allocated memory block. The program can use
- this value to determine whether it should request more memory to do its
- job or whether it has extra memory that it can release for use by other
- processes.
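-
- As a rough illustration of that calculation, the following Python sketch
- models a PSP as a 256-byte buffer and derives the size of the allocated
- block from the word at offset 0002H. The segment values and the helper
- name block_paragraphs are invented for the example; a real program would
- simply read the word from its own PSP.
-
```python
import struct

PARAGRAPH = 16  # bytes per 8086 paragraph


def block_paragraphs(psp: bytes, psp_segment: int) -> int:
    """Paragraphs between the PSP base and the top of the allocated block."""
    (top_segment,) = struct.unpack_from("<H", psp, 0x0002)
    return top_segment - psp_segment


# Hypothetical values: PSP at segment 1000H, block ends at segment 9FFFH.
psp = bytearray(256)
psp[0:2] = b"\xCD\x20"                       # Int 20H at offset 0000H
struct.pack_into("<H", psp, 0x0002, 0x9FFF)  # segment, end of allocation block

paras = block_paragraphs(bytes(psp), 0x1000)
print(paras, "paragraphs =", paras * PARAGRAPH, "bytes")
```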
-
- Offsets 000AH through 0015H in the PSP contain the previous contents of
- the interrupt vectors for the termination, Ctrl-C, and critical-error
- handlers. If the transient program alters these vectors for its own
- purposes, MS-DOS restores the original values saved in the PSP when the
- program terminates.
-
- The word at PSP offset 002CH holds the segment address of the environment
- block, which contains a series of ASCIIZ strings (sequences of ASCII
- characters terminated by a null, or zero, byte). The environment block is
- inherited from the program that called the EXEC function to load the
- currently executing program. It contains such information as the current
- search path used by COMMAND.COM to find executable programs, the location
- on the disk of COMMAND.COM itself, and the format of the user prompt used
- by COMMAND.COM.
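-
- The layout of the environment block can be sketched in a few lines of
- Python. The sample strings below are made up, and the sketch assumes the
- usual convention that the block ends with an empty string (an extra zero
- byte after the last ASCIIZ string).
-
```python
def parse_environment(block: bytes) -> dict:
    """Split a block of ASCIIZ strings into NAME=value pairs."""
    env = {}
    pos = 0
    while pos < len(block) and block[pos] != 0:
        end = block.index(0, pos)            # find the terminating null byte
        name, _, value = block[pos:end].decode("ascii").partition("=")
        env[name] = value
        pos = end + 1
    return env


# Hypothetical environment block, for illustration only.
sample = (b"COMSPEC=C:\\COMMAND.COM\x00"
          b"PATH=C:\\DOS;C:\\BIN\x00"
          b"PROMPT=$p$g\x00"
          b"\x00")
env = parse_environment(sample)
print(env["PATH"])
```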
-
- The command tail──the remainder of the command line that invoked the
- transient program, after the program's name──is copied into the PSP
- starting at offset 0081H. The length of the command tail, not including
- the carriage return (0DH) at its end, is placed in the byte at offset 0080H.
- Redirection or piping parameters and their associated filenames do not
- appear in the portion of the command line (the command tail) that is
- passed to the transient program, because redirection is transparent to
- applications.
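-
- A minimal Python sketch of how a program might recover the command tail
- from a PSP image follows. The buffer is built by hand here, and the sample
- command line is hypothetical; only the offsets 0080H and 0081H come from
- the description above.
-
```python
def command_tail(psp: bytes) -> str:
    """Return the command tail stored in a 256-byte PSP image."""
    length = psp[0x80]                        # count excludes the final CR
    return psp[0x81:0x81 + length].decode("ascii")


# Build a PSP image as it might look after "PROG /v report.txt" was typed.
psp = bytearray(256)
tail = b" /v report.txt"
psp[0x80] = len(tail)
psp[0x81:0x81 + len(tail)] = tail
psp[0x81 + len(tail)] = 0x0D                  # carriage return after the tail

print(repr(command_tail(bytes(psp))))
print(command_tail(bytes(psp)).split())
```
-
- Note that the tail normally begins with the blank that separated the
- program name from its first parameter.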
-
- To provide compatibility with CP/M, MS-DOS parses the first two parameters
- in the command tail into two default file control blocks (FCBs) at
- PSP:005CH and PSP:006CH, under the assumption that they may be filenames.
- However, if the parameters are filenames that include a path
- specification, only the drive code will be valid in these default FCBs,
- because FCB-type file- and record-access functions do not support
- hierarchical file structures. Although the default FCBs were an aid in
- earlier years, when compatibility with CP/M was more of a concern, they
- are essentially useless in modern MS-DOS application programs that must
- provide full path support. (File control blocks are discussed in detail in
- Chapter 8 and hierarchical file structures are discussed in Chapter 9.)
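-
- For readers who want to inspect the default FCBs anyway, here is a small
- Python sketch that decodes the first 12 bytes of one. It assumes the
- standard FCB layout (a drive byte, where 0 means the default drive and 1
- means A, followed by an 8-character name and a 3-character extension, both
- padded with blanks); the sample contents are invented.
-
```python
def decode_default_fcb(psp: bytes, offset: int = 0x5C):
    """Return (drive, name, extension) from a default FCB in a PSP image."""
    drive = psp[offset]                                   # 0 = default, 1 = A
    name = psp[offset + 1:offset + 9].decode("ascii").rstrip()
    ext = psp[offset + 9:offset + 12].decode("ascii").rstrip()
    return drive, name, ext


# Hypothetical PSP after a command tail beginning with "B:MYFILE.DAT".
psp = bytearray(256)
psp[0x5C] = 2                      # drive B
psp[0x5D:0x65] = b"MYFILE  "       # name, blank-padded to 8 characters
psp[0x65:0x68] = b"DAT"            # extension

print(decode_default_fcb(bytes(psp)))
```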
-
- The 128-byte area from 0080H through 00FFH in the PSP also serves as the
- default disk transfer area (DTA), which is set by MS-DOS before passing
- control to the transient program. If the program does not explicitly
- change the DTA, any file read or write operations requested with the FCB
- group of function calls automatically use this area as a data buffer. This
- is rarely useful and is another facet of MS-DOS's handling of the PSP that
- is present only for compatibility with CP/M.
-
- ──────────────────────────────────────────────────────────────────────────
- WARNING
- Programs must not alter any part of the PSP below offset 005CH.
- ──────────────────────────────────────────────────────────────────────────
-
-
- Introduction to .COM Programs
-
- Programs of the .COM persuasion are stored in disk files that hold an
- absolute image of the machine instructions to be executed. Because the
- files contain no relocation information, they are more compact, and are
- loaded for execution slightly faster, than equivalent .EXE files. Note
- that MS-DOS does not attempt to ascertain whether a .COM file actually
- contains executable code (there is no signature or checksum, as in the
- case of a .EXE file); it simply brings any file with the .COM extension
- into memory and jumps to it.
-
- Because .COM programs are loaded immediately above the program segment
- prefix and do not have a header that can specify another entry point, they
- must always have an origin of 0100H, which is the length of the PSP.
- Location 0100H must contain an executable instruction. The maximum length
- of a .COM program is 65,536 bytes, minus the length of the PSP (256 bytes)
- and a mandatory word of stack (2 bytes).
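-
- The arithmetic behind that limit is easy to check. The constant and
- function names in this Python sketch are chosen only for illustration.
-
```python
SEGMENT_SIZE = 0x10000   # one 8086 segment: 65,536 bytes
PSP_SIZE = 0x100         # program segment prefix: 256 bytes
STACK_WORD = 2           # mandatory word of stack

MAX_COM_SIZE = SEGMENT_SIZE - PSP_SIZE - STACK_WORD


def fits_in_com(image_size: int) -> bool:
    """True if a memory image of this size can be loaded as a .COM file."""
    return image_size <= MAX_COM_SIZE


print(MAX_COM_SIZE)
```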
-
- When control is transferred to the .COM program from MS-DOS, all of the
- segment registers point to the PSP (Figure 3-2). The stack pointer
- register contains 0FFFEH if memory allows; otherwise, it is set as high as
- possible in memory minus 2 bytes. (MS-DOS pushes a zero word on the stack
- before entry.)
-
- SS:SP ┌────────────────────────────────────────────────────────┐
- │ │
- │ Stack grows downward from top of segment │
- │ │ │
- │ │
- │ │
- │ │ │
- │ Program code and data │
- │ │
- CS:0100H ├────────────────────────────────────────────────────────┤
- │ Program segment prefix │
- CS:0000H └────────────────────────────────────────────────────────┘
- DS:0000H
- ES:0000H
- SS:0000H
-
- Figure 3-2. A memory image of a typical .COM-type program after loading.
- The contents of the .COM file are brought into memory just above the
- program segment prefix. Program code and data are mixed together in the
- same segment, and all segment registers contain the same value.
-
- Although the size of an executable .COM file can't exceed 64 KB, the
- current versions of MS-DOS allocate all of the transient program area to
- .COM programs when they are loaded. Because many such programs date from
- the early days of MS-DOS and are not necessarily "well-behaved" in their
- approach to memory management, the operating system simply makes the
- worst-case assumption and gives .COM programs everything that is
- available. If a .COM program wants to use the EXEC function to invoke
- another process, it must first shrink down its memory allocation to the
- minimum memory it needs in order to continue, taking care to protect its
- stack. (This is discussed in more detail in Chapter 12.)
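-
- The size of the block to keep is expressed in 16-byte paragraphs. The
- Python sketch below shows one way to work out that figure, which is the
- kind of value a program would hand to the MS-DOS memory-resize call
- (Int 21H Function 4AH, not covered in this chapter). The program and stack
- sizes are hypothetical, and the sketch assumes the program moves its stack
- to the top of the block it keeps before shrinking.
-
```python
PARAGRAPH = 16   # bytes per paragraph
PSP_SIZE = 256   # bytes reserved at the base of the block


def paragraphs_to_keep(image_bytes: int, stack_bytes: int) -> int:
    """Paragraphs needed for the PSP, the program image, and a new stack."""
    total = PSP_SIZE + image_bytes + stack_bytes
    return (total + PARAGRAPH - 1) // PARAGRAPH   # round up


# Hypothetical program: 3,000 bytes of code and data, 512-byte stack.
keep = paragraphs_to_keep(3000, 512)
new_sp = keep * PARAGRAPH    # stack pointer at the top of the kept block
print(keep, hex(new_sp))
```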
-
- When a .COM program finishes executing, it can return control to MS-DOS by
- several means. The preferred method is Int 21H Function 4CH, which allows
- the program to pass a return code back to the program, shell, or batch
- file that invoked it. However, if the program is running under MS-DOS
- version 1, it must exit by means of Int 20H, Int 21H Function 0, or a
- NEAR RETURN. (Because a word of zero was pushed onto the stack at entry, a
- NEAR RETURN causes a transfer to PSP:0000, which contains an Int 20H
- instruction.)
-
- A .COM-type application can be linked together from many separate object
- modules. All of the modules must use the same code-segment name and class
- name, and the module with the entry point at offset 0100H within the
- segment must be linked first. In addition, all of the procedures within a
- .COM program should have the NEAR attribute, because all executable code
- resides in one segment.
-
- When linking a .COM program, the linker will display the message
-
- Warning: no stack segment
-
- This message can be ignored. The linker output is a .EXE file, which must
- be converted into a .COM file with the MS-DOS EXE2BIN utility before
- execution. You can then delete the .EXE file. (An example of this process
- is provided in Chapter 4.)
-
- An Example .COM Program
-
- The HELLO.COM program listed in Figure 3-3 demonstrates the structure of
- a simple assembly-language program that is destined to become a .COM file.
- (You may find it helpful to compare this listing with the HELLO.EXE
- program later in this chapter.) Because this program is so short and
- simple, a relatively high proportion of the source code is actually
- assembler directives that do not result in any executable code.
-
- The NAME statement simply provides a module name for use during the
- linkage process. This aids understanding of the map that the linker
- produces. In MASM versions 5.0 and later, the module name is always the
- same as the filename, and the NAME statement is ignored.
-
- The PAGE command, when used with two operands, as in line 2, defines the
- length and width of the page. These default respectively to 66 lines and
- 80 characters. If you use the PAGE command without any operands, a
- formfeed is sent to the printer and a heading is printed. In larger
- programs, use the PAGE command liberally to place each of your subroutines
- on separate pages for easy reading.
-
- The TITLE command, in line 3, specifies the text string (limited to 60
- characters) that is to be printed at the upper left corner of each page.
- The TITLE command is optional and cannot be used more than once in each
- assembly-language source file.
-
- ──────────────────────────────────────────────────────────────────────────
- 1: name hello
- 2: page 55,132
- 3: title HELLO.COM--print hello on terminal
- 4:
- 5: ;
- 6: ; HELLO.COM: demonstrates various components
- 7: ; of a functional .COM-type assembly-
- 8: ; language program, and an MS-DOS
- 9: ; function call.
- 10: ;
- 11: ; Ray Duncan, May 1988
- 12: ;
- 13:
- 14: stdin equ 0 ; standard input handle
- 15: stdout equ 1 ; standard output handle
- 16: stderr equ 2 ; standard error handle
- 17:
- 18: cr equ 0dh ; ASCII carriage return
- 19: lf equ 0ah ; ASCII linefeed
- 20:
- 21:
- 22: _TEXT segment word public 'CODE'
- 23:
- 24: org 100h ; .COM files always have
- 25: ; an origin of 100h
- 26:
- 27: assume cs:_TEXT,ds:_TEXT,es:_TEXT,ss:_TEXT
- 28:
- 29: print proc near ; entry point from MS-DOS
- 30:
- 31: mov ah,40h ; function 40h = write
- 32: mov bx,stdout ; handle for standard output
- 33: mov cx,msg_len ; length of message
- 34: mov dx,offset msg ; address of message
- 35: int 21h ; transfer to MS-DOS
- 36:
- 37: mov ax,4c00h ; exit, return code = 0
- 38: int 21h ; transfer to MS-DOS
- 39:
- 40: print endp
- 41:
- 42:
- 43: msg db cr,lf ; message to display
- 44: db 'Hello World!',cr,lf
- 45:
- 46: msg_len equ $-msg ; length of message
- 47:
- 48:
- 49: _TEXT ends
- 50:
- 51: end print ; defines entry point
- ──────────────────────────────────────────────────────────────────────────
-
- Figure 3-3. The HELLO.COM program listing.
-
- Dropping down past a few comments and EQU statements, we come to a
- declaration of a code segment that begins in line 22 with a SEGMENT
- command and ends in line 49 with an ENDS command. The label in the
- leftmost field of line 22 gives the code segment the name _TEXT. The
- operand fields at the right end of the line give the segment the
- attributes WORD, PUBLIC, and 'CODE'. (You might find it helpful to read
- the Microsoft Macro Assembler manual for detailed explanations of each
- possible segment attribute.)
-
- Because this program is going to be converted into a .COM file, all of its
- executable code and data areas must lie within one code segment. The
- program must also have its origin at offset 0100H (immediately above the
- program segment prefix), which is taken care of by the ORG statement
- in line 24.
-
- Following the ORG statement, we encounter an ASSUME statement on line
- 27. The concept of ASSUME often baffles new assembly-language programmers.
- In a way, ASSUME doesn't "do" anything; it simply tells the assembler
- which segment registers you are going to use to point to the various
- segments of your program, so that the assembler can provide segment
- overrides when they are necessary. It's important to notice that the
- ASSUME statement doesn't take care of loading the segment registers with
- the proper values; it merely notifies the assembler of your intent to do
- that within the program. (Remember that, in the case of a .COM program,
- MS-DOS initializes all the segment registers before entry to point to the
- PSP.)
-
- Within the code segment, we come to another type of block declaration that
- begins with the PROC command on line 29 and closes with ENDP on line 40.
- These two directives declare the beginning and end of a procedure, a
- block of executable code that performs a single distinct function. The
- label in the leftmost field of the PROC statement (in this case, print)
- gives the procedure a name. The operand field gives it an attribute. If
- the procedure carries the NEAR attribute, only other code in the same
- segment can call it, whereas if it carries the FAR attribute, code located
- anywhere in the CPU's memory-addressing space can call it. In .COM
- programs, all procedures carry the NEAR attribute.
-
- For the purposes of this example program, I have kept the print procedure
- ridiculously simple. It calls MS-DOS Int 21H Function 40H to write the
- message Hello World! to standard output (normally the video screen), and
- calls Int 21H Function 4CH to terminate the program.
-
- The END statement in line 51 tells the assembler that it has reached the
- end of the source file and also specifies the entry point for the program.
- If the entry point is not a label located at offset 0100H, the .EXE file
- resulting from the assembly and linkage of this source program cannot be
- converted into a .COM file.
-
-
- Introduction to .EXE Programs
-
- We have just discussed a program that was written in such a way that it
- could be assembled into a .COM file. Such a program is simple in
- structure, so a programmer who needs to put together this kind of quick
- utility can concentrate on the program logic and do a minimum amount of
- worrying about control of the assembler. However, .COM-type programs have
- some definite disadvantages, and so most serious assembly-language efforts
- for MS-DOS are written to be converted into .EXE files.
-
- Although .COM programs are effectively restricted to a total size of 64 KB
- for machine code, data, and stack combined, .EXE programs can be
- practically unlimited in size (up to the limit of the computer's available
- memory). .EXE programs also place the code, data, and stack in separate
- parts of the file. Although the normal MS-DOS program loader does not take
- advantage of this feature of .EXE files, the ability to load different
- parts of large programs into several separate memory fragments, as well as
- the opportunity to designate a "pure" code portion of your program that
- can be shared by several tasks, is very significant in multitasking
- environments such as Microsoft Windows.
-
- The MS-DOS loader always brings a .EXE program into memory immediately
- above the program segment prefix, although the order of the code, data,
- and stack segments may vary (Figure 3-4). The .EXE file has a header, or
- block of control information, with a characteristic format (Figures 3-5
- and 3-6). The size of this header varies according to the number of
- program instructions that need to be relocated at load time, but it is
- always a multiple of 512 bytes.
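-
- To make the idea of load-time relocation concrete, here is a Python sketch
- of the fix-up step. It assumes the conventional form of a relocation item
- (an offset and a segment displacement that together locate a word in the
- load image) and a hypothetical load segment of 1010H. The code bytes and
- the single relocation item are taken from the HELLO.EXE dump in Figure
- 3-6.
-
```python
import struct


def relocate(image: bytearray, items, start_segment: int) -> None:
    """Add start_segment to each word named by an (offset, segment) item."""
    for offset, segment in items:
        pos = segment * 16 + offset           # position within the load image
        (value,) = struct.unpack_from("<H", image, pos)
        struct.pack_into("<H", image, pos, (value + start_segment) & 0xFFFF)


# First bytes of HELLO.EXE's code: B8 01 00 is "mov ax,_DATA" before fix-up.
image = bytearray.fromhex("B8 01 00 8E D8 B4 40 BB 01 00")
relocate(image, [(0x0001, 0x0000)], 0x1010)   # hypothetical load segment
print(image[:3].hex())
```
-
- After the fix-up, the operand of the first instruction holds 1011H, the
- actual segment of _DATA for this load address.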
-
- Before MS-DOS transfers control to the program, the initial values of the
- code segment (CS) register and instruction pointer (IP) register are
- calculated from the entry-point information in the .EXE file header and
- the program's load address. This information derives from an END statement
- in the source code for one of the program's modules. The data segment (DS)
- and extra segment (ES) registers are made to point to the PSP so that the
- program can access the environment-block pointer, command tail, and other
- useful information contained there.
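-
- The calculation can be sketched as follows in Python. The sketch assumes
- that the program image starts 10H paragraphs (256 bytes) above the PSP
- segment, as described earlier, and uses a made-up PSP segment of 1000H;
- the function names are illustrative only.
-
```python
PSP_PARAGRAPHS = 0x10   # the 256-byte PSP occupies 16 paragraphs


def entry_point(psp_segment: int, cs_displacement: int, initial_ip: int):
    """Return the initial CS:IP for a .EXE program loaded above this PSP."""
    start_segment = psp_segment + PSP_PARAGRAPHS
    cs = (start_segment + cs_displacement) & 0xFFFF
    return cs, initial_ip


def linear(segment: int, offset: int) -> int:
    """Convert a segment:offset pair to a 20-bit physical address."""
    return (segment << 4) + offset


# Header values of zero for both fields, PSP at hypothetical segment 1000H.
cs, ip = entry_point(0x1000, 0x0000, 0x0000)
print(f"{cs:04X}:{ip:04X} -> {linear(cs, ip):05X}")
```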
-
- SS:SP ┌────────────────────────────────────────────────────────┐
- │ │
- │ Stack segment: │
- │ stack grows downward from top of segment │
- │ │ │
- │ │
- SS:0000H ├────────────────────────────────────────────────────────┤
- │ Data segment │
- ├────────────────────────────────────────────────────────┤
- │ Program code │
- CS:0000H ├────────────────────────────────────────────────────────┤
- │ Program segment prefix │
- DS:0000H └────────────────────────────────────────────────────────┘
- ES:0000H
-
- Figure 3-4. A memory image of a typical .EXE-type program immediately
- after loading. The contents of the .EXE file are relocated and brought
- into memory above the program segment prefix. Code, data, and stack reside
- in separate segments and need not be in the order shown here. The entry
- point can be anywhere in the code segment and is specified by the END
- statement in the main module of the program. When the program receives
- control, the DS (data segment) and ES (extra segment) registers point to
- the program segment prefix; the program usually saves this value and then
- resets the DS and ES registers to point to its data area.
-
- The initial contents of the stack segment (SS) and stack pointer (SP)
- registers come from the header. This information derives from the
- declaration of a segment with the attribute STACK somewhere in the
- program's source code. The memory space allocated for the stack may be
- initialized or uninitialized, depending on the stack-segment definition;
- many programmers like to initialize the stack memory with a recognizable
- data pattern so that they can inspect memory dumps and determine how much
- stack space is actually used by the program.
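-
- The fill-pattern technique can be modeled in Python. The fill byte 0CCH
- and the pushed values are arbitrary choices for this sketch. The count is
- only an estimate, because a pushed value that happens to match the fill
- byte looks like untouched memory.
-
```python
FILL = 0xCC   # hypothetical fill byte used to initialize the stack


def stack_bytes_used(stack: bytes, fill: int = FILL) -> int:
    """Estimate stack usage by counting untouched fill bytes at the bottom."""
    untouched = 0
    for value in stack:          # low addresses are the last to be used
        if value != fill:
            break
        untouched += 1
    return len(stack) - untouched


# Simulate a 128-byte stack and three word pushes from the top downward.
stack = bytearray([FILL]) * 128
sp = len(stack)
for word in (0x1234, 0x0000, 0xBEEF):
    sp -= 2
    stack[sp:sp + 2] = word.to_bytes(2, "little")

print(stack_bytes_used(bytes(stack)))
```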
-
- When a .EXE program finishes processing, it should return control to
- MS-DOS through Int 21H Function 4CH. Other methods are available, but
- they offer no advantages and are considerably less convenient (because
- they usually require the CS register to point to the PSP).
-
- Byte
- offset
- 0000H ┌────────────────────────────────────────────────────────┐
- │ First part of .EXE file signature (4DH) │
- 0001H ├────────────────────────────────────────────────────────┤
- │ Second part of .EXE file signature (5AH) │
- 0002H ├────────────────────────────────────────────────────────┤
- │ Length of file MOD 512 │
- 0004H ├────────────────────────────────────────────────────────┤
- │ Size of file in 512-byte pages, including header │
- 0006H ├────────────────────────────────────────────────────────┤
- │ Number of relocation-table items │
- 0008H ├────────────────────────────────────────────────────────┤
- │ Size of header in paragraphs (16-byte units) │
- 000AH ├────────────────────────────────────────────────────────┤
- │ Minimum number of paragraphs needed above program │
- 000CH ├────────────────────────────────────────────────────────┤
- │ Maximum number of paragraphs desired above program │
- 000EH ├────────────────────────────────────────────────────────┤
- │ Segment displacement of stack module │
- 0010H ├────────────────────────────────────────────────────────┤
- │ Contents of SP register at entry │
- 0012H ├────────────────────────────────────────────────────────┤
- │ Word checksum │
- 0014H ├────────────────────────────────────────────────────────┤
- │ Contents of IP register at entry │
- 0016H ├────────────────────────────────────────────────────────┤
- │ Segment displacement of code module │
- 0018H ├────────────────────────────────────────────────────────┤
- │ Offset of first relocation item in file │
- 001AH ├────────────────────────────────────────────────────────┤
- │ Overlay number (0 for resident part of program) │
- 001CH ├────────────────────────────────────────────────────────┤
- │ Variable reserved space │
- ├────────────────────────────────────────────────────────┤
- │ Relocation table │
- ├────────────────────────────────────────────────────────┤
- │ Variable reserved space │
- ├────────────────────────────────────────────────────────┤
- │ Program and data segments │
- ├────────────────────────────────────────────────────────┤
- │ Stack segment │
- └────────────────────────────────────────────────────────┘
-
- Figure 3-5. The format of a .EXE load module.
-
- The input to the linker for a .EXE-type program can be many separate
- object modules. Each module can use a unique code-segment name, and the
- procedures can carry either the NEAR or the FAR attribute, depending on
- naming conventions and the size of the executable code. The programmer
- must take care that the modules linked together contain only one segment
- with the STACK attribute and only one entry point defined with an END
- assembler directive. The output from the linker is a file with a .EXE
- extension. This file can be executed immediately.
-
- ──────────────────────────────────────────────────────────────────────────
- C>DUMP HELLO.EXE
- 0 1 2 3 4 5 6 7 8 9 A B C D E F
- 0000 4D 5A 28 00 02 00 01 00 20 00 09 00 FF FF 03 00 MZ(..... .......
- 0010 80 00 20 05 00 00 00 00 1E 00 00 00 01 00 01 00 .. .............
- 0020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- .
- .
- .
- 0200 B8 01 00 8E D8 B4 40 BB 01 00 B9 10 00 90 BA 08 ...............
- 0210 00 CD 21 B8 00 4C CD 21 0D 0A 48 65 6C 6C 6F 20 ..!..L.!..Hello
- 0220 57 6F 72 6C 64 21 0D 0A World!..
- ──────────────────────────────────────────────────────────────────────────
-
- Figure 3-6. A hex dump of the HELLO.EXE program, demonstrating the
- contents of a simple .EXE load module. Note the following interesting
- values: the .EXE signature in bytes 0000H and 0001H, the number of
- relocation-table items in bytes 0006H and 0007H, the minimum extra memory
- allocation (MIN_ALLOC) in bytes 000AH and 000BH, the maximum extra memory
- allocation (MAX_ALLOC) in bytes 000CH and 000DH, and the initial IP
- (instruction pointer) register value in bytes 0014H and 0015H. See also
- Figure 3-5.
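-
- As a cross-check of Figures 3-5 and 3-6, the following Python sketch
- unpacks the first 28 bytes of the dump into named fields. The field names
- are informal labels chosen for this example, not official identifiers.
-
```python
import struct
from collections import namedtuple

ExeHeader = namedtuple("ExeHeader", [
    "signature", "last_page_bytes", "pages", "reloc_items",
    "header_paragraphs", "min_alloc", "max_alloc", "initial_ss",
    "initial_sp", "checksum", "initial_ip", "initial_cs",
    "reloc_offset", "overlay",
])


def parse_exe_header(data: bytes) -> ExeHeader:
    """Decode the fixed part of a .EXE header (offsets 0000H-001BH)."""
    header = ExeHeader(*struct.unpack_from("<2s13H", data, 0))
    if header.signature != b"MZ":            # bytes 4DH, 5AH
        raise ValueError("not a .EXE load module")
    return header


def file_size(header: ExeHeader) -> int:
    """Total file length implied by the page count and last-page length."""
    if header.last_page_bytes:
        return (header.pages - 1) * 512 + header.last_page_bytes
    return header.pages * 512


# First 28 bytes of HELLO.EXE, copied from the dump in Figure 3-6.
dump = bytes.fromhex(
    "4D 5A 28 00 02 00 01 00 20 00 09 00 FF FF 03 00 "
    "80 00 20 05 00 00 00 00 1E 00 00 00"
)
hdr = parse_exe_header(dump)
print(hdr.reloc_items, hdr.header_paragraphs * 16, hex(file_size(hdr)))
```
-
- The computed file length of 228H agrees with the dump, whose last byte
- is at offset 0227H.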
-
- An Example .EXE Program
-
- The HELLO.EXE program in Figure 3-7 demonstrates the fundamental
- structure of an assembly-language program that is destined to become a
- .EXE file. At minimum, it should have a module name, a code segment, a
- stack segment, and a primary procedure that receives control of the
- computer from MS-DOS after the program is loaded. The HELLO.EXE program
- also contains a data segment to provide a more complete example.
-
- The NAME, TITLE, and PAGE directives were covered in the HELLO.COM example
- program and are used in the same manner here, so we'll move to the first
- new item of interest. After a few comments and EQU statements, we come to
- a declaration of a code segment that begins on line 21 with a SEGMENT
- command and ends on line 41 with an ENDS command. As in the HELLO.COM
- example program, the label in the leftmost field of the line gives the
- code segment the name _TEXT. The operand fields at the right end of the
- line give the attributes WORD, PUBLIC, and 'CODE'.
-
- Following the code-segment declaration, we find an ASSUME statement on
- line 23. Notice that, unlike the equivalent statement in the HELLO.COM
- program, the ASSUME statement in this program specifies several different
- segment names. Again, remember that this statement has no direct effect on
- the contents of the segment registers but affects only the operation of
- the assembler itself.
-
- ──────────────────────────────────────────────────────────────────────────
- 1: name hello
- 2: page 55,132
- 3: title HELLO.EXE--print Hello on terminal
- 4: ;
- 5: ; HELLO.EXE: demonstrates various components
- 6: ; of a functional .EXE-type assembly-
- 7: ; language program, use of segments,
- 8: ; and an MS-DOS function call.
- 9: ;
- 10: ; Ray Duncan, May 1988
- 11: ;
- 12:
- 13: stdin equ 0 ; standard input handle
- 14: stdout equ 1 ; standard output handle
- 15: stderr equ 2 ; standard error handle
- 16:
- 17: cr equ 0dh ; ASCII carriage return
- 18: lf equ 0ah ; ASCII linefeed
- 19:
- 20:
- 21: _TEXT segment word public 'CODE'
- 22:
- 23: assume cs:_TEXT,ds:_DATA,ss:STACK
- 24:
- 25: print proc far ; entry point from MS-DOS
- 26:
- 27: mov ax,_DATA ; make our data segment
- 28: mov ds,ax ; addressable...
- 29:
- 30: mov ah,40h ; function 40h = write
- 31: mov bx,stdout ; standard output handle
- 32: mov cx,msg_len ; length of message
- 33: mov dx,offset msg ; address of message
- 34: int 21h ; transfer to MS-DOS
- 35:
- 36: mov ax,4c00h ; exit, return code = 0
- 37: int 21h ; transfer to MS-DOS
- 38:
- 39: print endp
- 40:
- 41: _TEXT ends
- 42:
- 43:
- 44: _DATA segment word public 'DATA'
- 45:
- 46: msg db cr,lf ; message to display
- 47: db 'Hello World!',cr,lf
- 48:
- 49: msg_len equ $-msg ; length of message
- 50:
- 51: _DATA ends
- 52:
- 53:
- 54: STACK segment para stack 'STACK'
- 55:
- 56: db 128 dup (?)
- 57:
- 58: STACK ends
- 59:
- 60: end print ; defines entry point
- ──────────────────────────────────────────────────────────────────────────
-
- Figure 3-7. The HELLO.EXE program listing.
-
- Within the code segment, the main print procedure is declared by the PROC
- command on line 25 and closed with ENDP on line 39. Because the procedure
- resides in a .EXE file, we have given it the FAR attribute as an example,
- but the attribute is really irrelevant because the program is so small and
- the procedure is not called by anything else in the same program.
-
- The print procedure first initializes the DS register, as indicated in the
- earlier ASSUME statement, loading it with a value that causes it to point
- to the base of the data area. (MS-DOS automatically sets up the CS and SS
- registers.) Next, the procedure uses MS-DOS Int 21H Function 40H to
- display the message Hello World! on the screen, just as in the HELLO.COM
- program. Finally, the procedure exits back to MS-DOS with an Int 21H
- Function 4CH on lines 36 and 37, passing a return code of zero (which by
- convention indicates success).
-
- Lines 44 through 51 declare a data segment named _DATA, which contains the
- variables and constants the program will use. If the various modules of a
- program contain multiple data segments with the same name, the linker will
- collect them and place them in the same physical memory segment.
-
- Lines 54 through 58 establish a stack segment; PUSH and POP instructions
- will access this area of scratch memory. Before MS-DOS transfers control
- to a .EXE program, it sets up the SS and SP registers according to the
- declared size and location of the stack segment. Be sure to allow enough
- room for the maximum stack depth that can occur at runtime, plus a safe
- number of extra words for registers pushed onto the stack during an MS-DOS
- service call. If the stack overflows, it may damage your other code and
- data segments and cause your program to behave strangely or even to crash
- altogether!
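- 
- As a rough sketch of the sizing advice above, the stack segment of
- HELLO.EXE could be enlarged as shown here. The figure of 256 words is an
- assumption chosen purely for illustration; base the real size on your own
- program's deepest nesting of calls and pushes.

```asm
STACK   segment para stack 'STACK'

        dw      256 dup (?)             ; 256 words = 512 bytes
                                        ; (illustrative size only)
STACK   ends
```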
-
- The END statement on line 60 winds up our brief HELLO.EXE program, telling
- the assembler that it has reached the end of the source file and providing
- the label of the program's point of entry from MS-DOS.
-
- The differences between .COM and .EXE programs are summarized in Figure
- 3-8.
-
-
-                   .COM program               .EXE program
- ──────────────────────────────────────────────────────────────────────────
- Maximum size      65,536 bytes minus 256     No limit
-                   bytes for PSP and 2 bytes
-                   for stack
- 
- Entry point       PSP:0100H                  Defined by END statement
- 
- AL at entry       00H if default FCB #1 has  Same
-                   valid drive, 0FFH if
-                   invalid drive
- 
- AH at entry       00H if default FCB #2 has  Same
-                   valid drive, 0FFH if
-                   invalid drive
- 
- CS at entry       PSP                        Segment containing module
-                                              with entry point
- 
- IP at entry       0100H                      Offset of entry point within
-                                              its segment
- 
- DS at entry       PSP                        PSP
- 
- ES at entry       PSP                        PSP
- 
- SS at entry       PSP                        Segment with STACK attribute
- 
- SP at entry       0FFFEH or top word in      Size of segment defined with
-                   available memory,          STACK attribute
-                   whichever is lower
- 
- Stack at entry    Zero word                  Initialized or uninitialized
- 
- Stack size        65,536 bytes minus 256     Defined in segment with
-                   bytes for PSP and size of  STACK attribute
-                   executable code and data
- 
- Subroutine calls  Usually NEAR               NEAR or FAR
- 
- Exit method       Int 21H Function 4CH       Int 21H Function 4CH
-                   preferred, NEAR RET if     preferred
-                   MS-DOS version 1
- 
- Size of file      Exact size of program      Size of program plus header
-                                              (multiple of 512 bytes)
- ──────────────────────────────────────────────────────────────────────────
-
-
- Figure 3-8. Summary of the differences between .COM and .EXE programs,
- including their entry conditions.
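- 
- One practical consequence of these entry conditions is that a .EXE program
- loses its only pointer to the PSP as soon as it reloads DS and ES. The
- following minimal sketch saves the PSP segment first. The variable name
- psp_seg is hypothetical, and the program does nothing else but exit.

```asm
_TEXT   segment word public 'CODE'

        assume  cs:_TEXT,ds:_DATA,ss:STACK

main    proc    far                     ; entry point from MS-DOS

        mov     ax,_DATA                ; make our data segment
        mov     ds,ax                   ; addressable through DS
        mov     psp_seg,es              ; ES still holds the PSP segment

        mov     ax,4c00h                ; exit, return code = 0
        int     21h                     ; transfer to MS-DOS

main    endp

_TEXT   ends


_DATA   segment word public 'DATA'

psp_seg dw      0                       ; segment of PSP (hypothetical name)

_DATA   ends


STACK   segment para stack 'STACK'

        db      128 dup (?)

STACK   ends

        end     main                    ; defines entry point
```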
-
-
- More About Assembly-Language Programs
-
- Now that we've looked at working examples of .COM and .EXE
- assembly-language programs, let's backtrack and discuss their elements a
- little more formally. The following discussion is based on the Microsoft
- Macro Assembler, hereafter referred to as MASM. If you are familiar with
- MASM and are an experienced assembly-language programmer, you may want to
- skip this section.
-
- MASM programs can be thought of as having three structural levels:
-
- ■ The module level
-
- ■ The segment level
-
- ■ The procedure level
-
- Modules are simply chunks of source code that can be independently
- maintained and assembled. Segments are physical groupings of like items
- (machine code or data) within a program and a corresponding segregation of
- dissimilar items. Procedures are functional subdivisions of an executable
- program──routines that carry out a particular task.
-
- Program Modules
-
- Under MS-DOS, the module-level structure consists of files containing the
- source code for individual routines. Each source file is translated by the
- assembler into a relocatable object module. An object module can reside
- alone in an individual file or with many other object modules in an
- object-module library of frequently used or related routines. The
- Microsoft Object Linker (LINK) combines object-module files, often with
- additional object modules extracted from libraries, into an executable
- program file.
-
- Using modules and object-module libraries reduces the size of your
- application source files (and vastly increases your productivity), because
- these files need not contain the source code for routines they have in
- common with other programs. This technique also allows you to maintain the
- routines more easily, because you need to alter only one copy of their
- source code stored in one place, instead of many copies stored in
- different applications. When you improve (or fix) one of these routines,
- you can simply reassemble it, put its object module back into the library,
- relink all of the programs that use the routine, and voilà: instant
- upgrade.
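- 
- To make the idea of separately assembled modules concrete, here is a
- minimal sketch of a main module that calls a routine kept in a second
- source file. The filenames MAIN.ASM and CRLF.ASM and the routine name crlf
- are hypothetical. The two files are shown in one listing for convenience
- but must be stored and assembled separately.

```asm
;----- file 1: MAIN.ASM (hypothetical name) ---------------------

_TEXT   segment word public 'CODE'

        extrn   crlf:near               ; defined in CRLF.ASM

        assume  cs:_TEXT,ss:STACK

main    proc    far                     ; entry point from MS-DOS

        call    crlf                    ; routine from the other module

        mov     ax,4c00h                ; exit, return code = 0
        int     21h                     ; transfer to MS-DOS

main    endp

_TEXT   ends

STACK   segment para stack 'STACK'
        db      128 dup (?)
STACK   ends

        end     main                    ; defines entry point


;----- file 2: CRLF.ASM (hypothetical name) ---------------------

_TEXT   segment word public 'CODE'

        public  crlf                    ; make name visible to LINK

        assume  cs:_TEXT

crlf    proc    near                    ; send CR-LF to standard output

        mov     ah,2                    ; Function 02H = display character
        mov     dl,0dh                  ; carriage return
        int     21h
        mov     ah,2                    ; reload AH; do not rely on
        mov     dl,0ah                  ; registers surviving the call
        int     21h
        ret

crlf    endp

_TEXT   ends

        end                             ; no entry point in this module
```

- Because both modules give their code segment the same name, combine type,
- and class, the linker merges them into one physical segment and the NEAR
- call is valid. Assuming the filenames above, the program would be built
- with MASM MAIN; and MASM CRLF; followed by LINK MAIN+CRLF;.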
-
- Program Segments
-
- The term segments refers to two discrete programming concepts: physical
- segments and logical segments.
-
- Physical segments are 64 KB blocks of memory. The Intel 8086/8088 and
- 80286 microprocessors have four segment registers, which are essentially
- used as pointers to these blocks. (The 80386 has six segment registers,
- which are a superset of those found on the 8086/8088 and 80286.) Each
- segment register can point to the bottom of a different 64 KB area of
- memory. Thus, a program can address any location in memory by appropriate
- manipulation of the segment registers, but the maximum amount of memory
- that it can address simultaneously is 256 KB.
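- 
- The 256 KB figure is simply four segment registers times 64 KB each. The
- arithmetic behind a segment register "pointing" to a block is that the
- processor multiplies the segment value by 16 and adds the offset; for
- example, 1234H:0010H refers to physical address 12340H + 0010H = 12350H.
- The routine below is a sketch of that calculation in software. The name
- linaddr is hypothetical, and the module contains no entry point of its own.

```asm
_TEXT   segment word public 'CODE'

        assume  cs:_TEXT

; linaddr (hypothetical name): convert segment:offset
; to a physical address.
;
; Call with:    DX = segment, AX = offset
; Returns:      DX:AX = segment * 16 + offset
; Destroys:     BX, CL, flags

linaddr proc    near

        mov     bx,dx                   ; copy segment
        mov     cl,4
        rol     bx,cl                   ; rotate segment left 4 bits
        mov     dx,bx
        and     dx,000fh                ; DX = upper 4 bits of segment * 16
        and     bx,0fff0h               ; BX = lower 16 bits of segment * 16
        add     ax,bx                   ; add the offset
        adc     dx,0                    ; propagate carry to upper word
        ret

linaddr endp

_TEXT   ends

        end
```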
-
- As we discussed earlier in the chapter, .COM programs assume that all four
- segment registers always point to the same place──the bottom of the
- program. Thus, they are limited to a maximum size of 64 KB. .EXE programs,
- on the other hand, can address many different physical segments and can
- reset the segment registers to point to each segment as it is needed.
- Consequently, the only practical limit on the size of a .EXE program is
- the amount of available memory. The example programs throughout the
- remainder of this book focus on .EXE programs.
-
- Logical segments are the program components. A minimum of three logical
- segments must be declared in any .EXE program: a code segment, a data
- segment, and a stack segment. Programs with more than 64 KB of code or
- data have more than one code or data segment. The routines or data that
- are used most frequently are put into the primary code and data segments
- for speed, and routines or data that are used less frequently are put into
- secondary code and data segments.
-
- Segments are declared with the SEGMENT and ENDS directives in the
- following form:
-
- name    SEGMENT attributes
-         .
-         .
-         .
- name    ENDS
-
- The attributes of a segment include its align type (BYTE, WORD, or PARA),
- combine type (PUBLIC, PRIVATE, COMMON, or STACK), and class type. The
- segment attributes are used by the linker when it is combining logical
- segments to create the physical segments of an executable program. Most of
- the time, you can get by just fine using a small selection of attributes
- in a rather stereotypical way. However, if you want to use the full range
- of attributes, you might want to read the detailed explanation in the MASM
- manual.
-
- Programs are classified into one memory model or another based on the
- number of their code and data segments. The most commonly used memory
- model for assembly-language programs is the small model, which has one
- code and one data segment, but you can also use the medium, compact, and
- large models (Figure 3-9). (Two additional models exist that we will not
- discuss further: the tiny model, which consists of intermixed code and
- data in a single segment, as in a .COM file under MS-DOS; and the huge
- model, which is supported by the Microsoft C Optimizing Compiler and which
- allows use of data structures larger than 64 KB.)
-
- Model          Code segments      Data segments
- ──────────────────────────────────────────────────────────────────────────
- Small          One                One
- Medium         Multiple           One
- Compact        One                Multiple
- Large          Multiple           Multiple
- ──────────────────────────────────────────────────────────────────────────
-
- Figure 3-9. Memory models commonly used in assembly-language and C
- programs.
-
- For each memory model, Microsoft has established certain segment and class
- names that are used by all its high-level-language compilers (Figure
- 3-10). Because segment names are arbitrary, you may as well adopt the
- Microsoft conventions. Their use will make it easier for you to integrate
- your assembly-language routines into programs written in languages such as
- C, or to use routines from high-level-language libraries in your
- assembly-language programs.
-
- Another important Microsoft high-level-language convention is to use the
- GROUP directive to name the near data segment (the segment the program
- expects to address with offsets from the DS register) and the stack
- segment as members of DGROUP (the automatic data group), a special name
- recognized by the linker and also by the program loaders in Microsoft
- Windows and Microsoft OS/2. The GROUP directive causes logical segments
- with different names to be combined into a single physical segment so that
- they can be addressed using the same segment base address. In C programs,
- DGROUP also contains the local heap, which is used by the C runtime
- library for dynamic allocation of small amounts of memory.
-
-
- Memory    Segment       Align     Combine   Class       Group
- model     name          type      type      type
- ──────────────────────────────────────────────────────────────────────────
- Small     _TEXT         WORD      PUBLIC    CODE
-           _DATA         WORD      PUBLIC    DATA        DGROUP
-           STACK         PARA      STACK     STACK       DGROUP
- 
- Medium    module_TEXT   WORD      PUBLIC    CODE
-           .
-           .
-           .
-           _DATA         WORD      PUBLIC    DATA        DGROUP
-           STACK         PARA      STACK     STACK       DGROUP
- 
- Compact   _TEXT         WORD      PUBLIC    CODE
-           data          PARA      PRIVATE   FAR_DATA
-           .
-           .
-           .
-           _DATA         WORD      PUBLIC    DATA        DGROUP
-           STACK         PARA      STACK     STACK       DGROUP
- 
- Large     module_TEXT   WORD      PUBLIC    CODE
-           .
-           .
-           .
-           data          PARA      PRIVATE   FAR_DATA
-           .
-           .
-           .
-           _DATA         WORD      PUBLIC    DATA        DGROUP
-           STACK         PARA      STACK     STACK       DGROUP
- ──────────────────────────────────────────────────────────────────────────
-
-
- Figure 3-10. Segments, groups, and classes for the standard memory models
- as used with assembly-language programs. The Microsoft C Optimizing
- Compiler and other high-level-language compilers use a superset of these
- segments and classes.
-
- For pure assembly-language programs that will run under MS-DOS, you can
- ignore DGROUP. However, if you plan to integrate assembly-language
- routines and programs written in high-level languages, you'll want to
- follow the Microsoft DGROUP convention. For example, if you are planning
- to link routines from a C library into an assembly-language program, you
- should include the line
-
- DGROUP group _DATA,STACK
-
- near the beginning of the program.
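- 
- The following sketch shows one way the HELLO.EXE skeleton might look once
- DGROUP is in use. It is an illustration under stated assumptions rather
- than a required form. The points to notice are that DS is loaded with the
- base of the group instead of the base of _DATA, and that the offset of msg
- is requested relative to DGROUP, because _DATA does not necessarily begin
- at offset zero within the group.

```asm
DGROUP  group   _DATA,STACK             ; near data and stack in one group

_TEXT   segment word public 'CODE'

        assume  cs:_TEXT,ds:DGROUP

main    proc    far                     ; entry point from MS-DOS

        mov     ax,DGROUP               ; DS = base of the group,
        mov     ds,ax                   ; not of _DATA alone

        mov     ah,40h                  ; Function 40H = write
        mov     bx,1                    ; handle for standard output
        mov     cx,msg_len              ; length of message
        mov     dx,offset DGROUP:msg    ; offset relative to DGROUP
        int     21h                     ; transfer to MS-DOS

        mov     ax,4c00h                ; exit, return code = 0
        int     21h                     ; transfer to MS-DOS

main    endp

_TEXT   ends


_DATA   segment word public 'DATA'

msg     db      'Hello World!',0dh,0ah  ; message to display
msg_len equ     $-msg                   ; length of message

_DATA   ends


STACK   segment para stack 'STACK'

        db      128 dup (?)

STACK   ends

        end     main                    ; defines entry point
```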
-
- The final Microsoft convention of interest in creating .EXE programs is
- segment order. The high-level compilers assume that code segments always
- come first, followed by far data segments, followed by the near data
- segment, with the stack and heap last. This order won't concern you much
- until you begin integrating assembly-language code with routines from
- high-level-language libraries, but it is easiest to learn to use the
- convention right from the start.
-
- Program Procedures
-
- The procedure level of program structure is partly real and partly
- conceptual. Procedures are basically just a fancy guise for subroutines.
-
- Procedures within a program are declared with the PROC and ENDP directives
- in the following form:
-
- name    PROC    attribute
-         .
-         .
-         .
-         RET
- name    ENDP
-
- The attribute carried by a PROC declaration, which is either NEAR or FAR,
- tells the assembler what type of call you expect to use to enter the
- procedure──that is, whether the procedure will be called from other
- routines in the same segment or from routines in other segments. When the
- assembler encounters a RET instruction within the procedure, it uses the
- attribute information to generate the correct opcode for either a near
- (intra-segment) or far (inter-segment) return.
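- 
- A minimal sketch of the two cases follows. The names local_rtn, other_rtn,
- and OTHER_TEXT are hypothetical. Both subroutines contain nothing but a
- RET, yet the assembler generates a different return opcode for each one
- because of the attribute on its PROC directive. The FAR PTR override on
- the second call is needed here only because other_rtn is defined later in
- the file than the call that refers to it.

```asm
_TEXT   segment word public 'CODE'

        assume  cs:_TEXT,ss:STACK

main    proc    far                     ; entry point from MS-DOS

        call    local_rtn               ; near call: pushes IP only
        call    far ptr other_rtn       ; far call: pushes CS and IP

        mov     ax,4c00h                ; exit, return code = 0
        int     21h                     ; transfer to MS-DOS

main    endp

local_rtn proc  near                    ; same segment as its caller

        ret                             ; assembled as a near return

local_rtn endp

_TEXT   ends


OTHER_TEXT segment word public 'CODE'

        assume  cs:OTHER_TEXT

other_rtn proc  far                     ; different segment from caller

        ret                             ; assembled as a far return

other_rtn endp

OTHER_TEXT ends


STACK   segment para stack 'STACK'

        db      128 dup (?)

STACK   ends

        end     main                    ; defines entry point
```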
-
- Each program should have a main procedure that receives control from
- MS-DOS. You specify the entry point for the program by including the name
- of the main procedure in the END statement in one of the program's source
- files. The main procedure's attribute (NEAR or FAR) is really not too
- important, because the program returns control to MS-DOS with a function
- call rather than a RET instruction. However, by convention, most
- programmers assign the main procedure the FAR attribute anyway.
-
- You should break the remainder of the program into procedures in an
- orderly way, with each procedure performing a well-defined single
- function, returning its results to its caller, and avoiding actions that
- have global effects within the program. Ideally, procedures invoke each
- other only by CALL instructions, have only one entry point and one exit
- point, and always exit by means of a RET instruction, never by jumping to
- some other location within the program.
-
- For ease of understanding and maintenance, a procedure should not exceed
- one page (about 60 lines); if it is longer than a page, it is probably too
- complex and you should delegate some of its function to one or more
- subsidiary procedures. You should preface the source code for each
- procedure with a detailed comment that states the procedure's calling
- sequence, results returned, registers affected, and any data items
- accessed or modified. The effort invested in making your procedures
- compact, clean, flexible, and well-documented will be repaid many times
- over when you reuse the procedures in other programs.
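- 
- As one possible illustration of these guidelines, the short routine below
- carries a header comment of the kind just described and has a single entry
- point and a single exit point. The routine name upcase and the layout of
- the header are assumptions made for this example, not a fixed standard.

```asm
_TEXT   segment word public 'CODE'

        assume  cs:_TEXT

;
; Name:         upcase  (hypothetical example routine)
;
; Function:     Convert a lowercase ASCII letter to uppercase.
;
; Call with:    AL = character
;
; Returns:      AL = uppercase letter if input was 'a' through 'z',
;               otherwise unchanged
;
; Registers:    AL and flags affected
;
; Data:         none accessed or modified
;

upcase  proc    near

        cmp     al,'a'
        jb      upc1                    ; below 'a', leave as is
        cmp     al,'z'
        ja      upc1                    ; above 'z', leave as is
        sub     al,'a'-'A'              ; fold to uppercase

upc1:   ret                             ; single exit point

upcase  endp

_TEXT   ends

        end
```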
-
-
-
-