Understanding the "New Executable" file format, which is required for OS/2, will allow you to appreciate how OS/2 operates.
DOS supports two binary file formats for executable files: .COM and .EXE. The .COM format was inherited from CP/M. The format is adequate for programs up to 64K in length that do not require relocation information for adjusting explicit references to segment addresses within the program. DOS loads a .COM file into memory at offset 100h in a segment and sets all four segment registers to the same segment address. Many small assembly language programs (such as those that appear regularly in the Productivity section of PC Magazine) use the .COM file format because it is compact and simple.
The .EXE format was designed specifically for DOS. It supports programs over 64K in length and those that make references to explicit segment addresses. A relocation table in the header section of the .EXE file allows DOS to adjust these references based on where the program is loaded into memory.
NOT ADEQUATE FOR OS/2 Neither the .COM format nor the .EXE format is suitable for a protected mode operating system such as OS/2. The .COM format is clearly inadequate because code and data share the same segment. Under OS/2, code and data must be located in separate segments.
The .EXE format is also inadequate because there is no information in the file that defines how the binary image of the executable breaks down into code and data segments. Although .EXE format programs often use separate segments for code and data, the operating system cannot determine where these segments begin and end in the file.
For this reason, a new format for executable files was required for OS/2. It's called (appropriately enough) the "New Executable" format, or sometimes the "Segmented Executable" format. The latter name indicates that the various code and data segments that make up the program are identified in the file.
Actually, the New Executable file format isn't all that new. It was first used with Microsoft Windows, which was introduced in November 1985. The format is documented in the Windows 2.0 Software Development Kit, in Appendix K of the all-new MS-DOS Encyclopedia (Microsoft Press, 1988), and in Microsoft's mammoth $3000 OS/2 Software Development Kit.
New Executable files have an .EXE extension because they incorporate the old MS-DOS executable format within the new format. Under OS/2, the New Executable format is used for all programs that run in protected mode (these files can have a .EXE or a .COM extension), all dynamic link libraries (which have the extension .DLL), and all protected-mode device drivers (which have a .SYS extension).
Although you don't need to understand its format to program in OS/2, examining a typical New Executable file provides some valuable insights into the workings of OS/2. So, let's use a hex dump program on an OS/2 .EXE file and take a look. Since you probably don't yet have an OS/2 hex dump program, let's solve that problem right away.
AN OS/2 HEX DUMP Figure 1 shows an OS/2 assembly language program called DUMP.ASM, which is a simple hex dump program. You can assemble this program using Microsoft's MASM 5.0 under the OS/2 DOS compatibility mode simply by entering
MASM DUMP;
(You could also assemble it with an OS/2 version of MASM, of course.) Then, using the LINK.EXE program included with IBM's OS/2 1.0, run
LINK DUMP, /ALIGN:16,, DOSCALLS;
The DOSCALLS.LIB file is also included with OS/2 1.0. Before you run LINK, use the SET command to set the LIB environment variable to the directory where you store DOSCALLS.LIB.
SET LIB=directory
If you don't have the Microsoft Macro Assembler 5.0, you can create DUMP.EXE by typing the BASIC program shown in Figure 2 into BASICA and running it.
DUMPING DUMP.EXE You can use DUMP to dump itself by typing
DUMP DUMP.EXE
Figure 3 shows what you'll see, and it identifies the five major sections of the file. (If you created DUMP.EXE from the BASIC program, you'll see two more bytes at the end of the file.)
DUMP is a "small model" program--that is, it has one code segment and one data segment. Medium model programs (which have multiple code segments), compact model programs (multiple data segments), and large model programs (multiple code and data segments) have additional code and data segments. Windows programs and OS/2 Presentation Manager programs have additional segments following the code and data segments. These additional segments are for "resources," which include icons, cursors, and bitmaps, and templates for menus and dialog boxes.
You'll notice that the code and data segments begin on a nice even 16-byte boundary. That's the result of using the /ALIGN:16 switch when running LINK.
A TALE OF TWO HEADERS An OS/2 .EXE file has two header sections. The first is the familiar DOS .EXE header; the second is the New Executable header. Each header begins with a two-byte "signature" word. The DOS signature word is "MZ," the initials of Mark Zbikowski, one of the key designers of DOS at Microsoft. The more anonymous "NE" signature that begins the New Executable header stands for "New Executable."
When you run DUMP.EXE under DOS or under the OS/2 DOS Compatibility Mode, the operating system recognizes the DOS .EXE header and assumes the file is a normal DOS executable. It therefore runs the DOS program that is embedded in the file. This program simply displays the message, "This program cannot be run in DOS mode," and terminates.
This tiny DOS program is automatically inserted into the .EXE file by LINK. (Earlier versions of LINK distributed with the Microsoft OS/2 Software Development Kit did not do this.) It is also possible to specify a different program to be inserted into the OS/2 .EXE file by indicating its name in a "module definition file" and specifying the module definition file as the fifth parameter to LINK.
When you create a "dual-mode" (or Family API) program that runs under both OS/2 and DOS, this tiny DOS program is replaced with a much larger DOS loader program. This loader program patches the OS/2 program so that the calls it makes to OS/2 functions are replaced with calls to DOS functions that approximate the OS/2 functions.
When you execute a dual-mode .EXE file under OS/2, OS/2 looks at the double word stored at offset 3Ch in the DOS header section (see Figure 4). This double word is the offset in the .EXE file where the new header section begins. In the discussion below, I'll refer to this offset as NEWHDR ("new header").
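If you'd rather let a program find NEWHDR than read it off a hex dump, the lookup just described can be sketched in a few lines of Python. This is only an illustration: the function name find_new_header is my own, and the demo bytes are a synthetic stand-in for the start of an OS/2 .EXE file, not the real DUMP.EXE.

```python
import struct

def find_new_header(image: bytes) -> int:
    """Return NEWHDR, the file offset of the New Executable header."""
    if image[0:2] != b"MZ":
        raise ValueError("missing DOS 'MZ' signature")
    # The double word at offset 3Ch of the DOS header holds NEWHDR.
    (newhdr,) = struct.unpack_from("<L", image, 0x3C)
    if image[newhdr:newhdr + 2] != b"NE":
        raise ValueError("missing 'NE' signature at NEWHDR")
    return newhdr

# Synthetic stand-in for the first bytes of an OS/2 .EXE file.
demo = bytearray(0x90)
demo[0:2] = b"MZ"
struct.pack_into("<L", demo, 0x3C, 0x80)   # NEWHDR = 80h, as in DUMP.EXE
demo[0x80:0x82] = b"NE"

print(hex(find_new_header(bytes(demo))))   # prints 0x80
```

To try it on a real file, read the file in binary mode and pass the bytes to the function.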
IDENTIFYING THE SEGMENTS When OS/2 loads a program into memory to be executed, it must be able to identify the code and data segments in that program. OS/2 can then allocate segments for the program and set up the program's "local descriptor table," which is required in protected mode.
The header section contains a "segment table" that identifies all the code and data segments in the program. Figure 5 shows the new executable header and the information that OS/2 uses to locate and use this segment table.
The word at (NEWHDR + 1Ch)--which is the word at offset 9Ch in DUMP.EXE--is the number of code and data segments in the file. This is simply 2.
The word stored at offset (NEWHDR + 22h)--which is offset A2h in the DUMP.EXE file--is 0040h. This indicates that the segment table begins at offset (NEWHDR + 40h) or C0h.
The word at (NEWHDR + 32h)--offset B2h in DUMP.EXE--is the "alignment shift count," which is set to 4. (Recall that when linking we specified an ALIGN value of 16, which is 1 shifted left by 4 bits.)
Each segment requires 4 words (8 bytes) in the segment table. These four words are shown in the table "DUMP.EXE's Segment Table."
The word in the segment table that indicates the location of the segment in the file must be adjusted. You take the number in the table (0010h or 0028h) and shift it left by the alignment shift count. Thus, segment 1 begins at offset 0100h from the beginning of the .EXE file and segment 2 begins at offset 0280h from the beginning of the file, which agrees with what I showed in Figure 3.
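The arithmetic is easy to get wrong by hand, so here is a hedged Python sketch of the same walk through the header. The function name and the dictionary keys are mine. I'm assuming the four words of each entry appear in the order location, size in file, flags, size in memory, which is how I understand the format. The flags values in the demo (0D00h and 0C01h) are illustrative only: they set just the bits this column discusses, and a real file may set others.

```python
import struct

def read_segment_table(image: bytes, newhdr: int) -> list:
    """Decode the segment table of a New Executable image."""
    (count,) = struct.unpack_from("<H", image, newhdr + 0x1C)      # number of segments
    (table_off,) = struct.unpack_from("<H", image, newhdr + 0x22)  # relative to NEWHDR
    (shift,) = struct.unpack_from("<H", image, newhdr + 0x32)      # alignment shift count
    segments = []
    for i in range(count):
        location, file_size, flags, mem_size = struct.unpack_from(
            "<4H", image, newhdr + table_off + 8 * i)
        segments.append({
            "file_offset": location << shift,  # adjust by the alignment shift
            "file_size": file_size,
            "flags": flags,
            "mem_size": mem_size,
        })
    return segments

# Synthetic header built from the values quoted in this column.
NEWHDR = 0x80
image = bytearray(0x100)
struct.pack_into("<H", image, NEWHDR + 0x1C, 2)
struct.pack_into("<H", image, NEWHDR + 0x22, 0x40)
struct.pack_into("<H", image, NEWHDR + 0x32, 4)
struct.pack_into("<4H", image, 0xC0, 0x0010, 0x014E, 0x0D00, 0x014E)  # flags assumed
struct.pack_into("<4H", image, 0xC8, 0x0028, 0x0096, 0x0C01, 0x0310)  # flags assumed

segments = read_segment_table(bytes(image), NEWHDR)
for number, seg in enumerate(segments, start=1):
    print(number, hex(seg["file_offset"]), hex(seg["file_size"]), hex(seg["mem_size"]))
```

Segment numbers start at 1, which is why the loop above counts from 1 when printing.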
The flags word in the segment table describes the segment. The least significant bit (bit 0) is 0 for a code segment and 1 for a data segment. So, segment 1 is a code segment and segment 2 is a data segment. Bits 10 and 11 indicate the I/O privilege level of the segment. Both these segments run under ring 3, which under protected mode is the least privileged level. This is normal for a simple program like DUMP.
Bit 8 is set to 1 when the segment includes relocation information. This bit is set for segment 1 but not for segment 2. I'll describe this relocation information towards the end of this column.
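The three fields of the flags word mentioned so far can be picked apart with a few masks. The sketch below is a minimal illustration, not a complete decoder: describe_segment_flags is a name I made up, it ignores every bit not discussed here, and the two sample values are constructed to match the description of DUMP.EXE's segments rather than copied from the file.

```python
def describe_segment_flags(flags: int) -> dict:
    """Decode only the flag bits discussed in this column."""
    return {
        "kind": "data" if flags & 0x0001 else "code",   # bit 0
        "has_relocations": bool(flags & 0x0100),        # bit 8
        "privilege_level": (flags >> 10) & 0x3,         # bits 10 and 11
    }

# Illustrative values: a ring 3 code segment with relocation
# information, and a ring 3 data segment without it.
print(describe_segment_flags(0x0D00))
print(describe_segment_flags(0x0C01))
```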
EXPANDING THE SEGMENTS Notice that the segment table contains two sizes--the first is the size of the segment in the file; the second is the size of the segment in memory. For the code segment, these two values are the same, but for the data segment they are not.
When OS/2 loads DUMP.EXE's data segment into memory, it must first allocate a segment that is 310h bytes long. The data segment in the DUMP.EXE file is loaded into the first 96h bytes of this memory segment. The rest of the segment is initialized to 0.
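In effect the loader does something like the following. This Python sketch only models the idea with byte arrays. It is not how OS/2 actually allocates memory, and the name load_segment and the 0AAh filler bytes are my own inventions for the demonstration.

```python
def load_segment(image: bytes, file_offset: int,
                 file_size: int, mem_size: int) -> bytearray:
    """Model loading a segment: copy the file bytes, zero the rest."""
    data = image[file_offset:file_offset + file_size]
    if len(data) != file_size:
        raise ValueError("segment extends past the end of the file")
    memory = bytearray(mem_size)     # a new bytearray is already all zeros
    memory[:file_size] = data        # the initialized part comes from the file
    return memory

# Synthetic file: 96h bytes of filler where DUMP.EXE's data segment lives.
image = bytearray(0x280 + 0x96)
image[0x280:] = b"\xAA" * 0x96

segment = load_segment(bytes(image), 0x280, 0x96, 0x310)
print(hex(len(segment)), hex(segment[0x95]), hex(segment[0x96]))  # prints 0x310 0xaa 0x0
```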
In DUMP.ASM (Figure 1), the segments identified with the .STACK, .DATA, and .DATA? keywords are consolidated into one segment group, often called DGROUP. If you compare the .DATA and .DATA? segments in DUMP.ASM (Figure 1) with the data segment in DUMP.EXE (Figure 3), you'll notice that the data segment in the file encompasses only the data initialized to something other than 0. Everything in .DATA? (plus the last few bytes in .DATA) does not require space in the .EXE file because OS/2 initializes the variables to zero when the program is loaded into memory.
When programming for OS/2 in assembly language, keep this in mind. Any variable you want initialized to zero can go in the .DATA? segment or at the end of the .DATA segment. This prevents these variables from taking up space in the .EXE file. Do not do something like this:
.DATA
Data1 db "Some data"
Data2 db 256 dup (0)
Data3 db "More data"
Because the assembler will not reorder variables, this code will require 256 bytes of zeros in the .EXE file's data segment.
ENTRY CONDITIONS Other information in the header section of the file indicates how OS/2 is to set registers on entry to the program. This is shown in Figure 6.
The word at offset (NEWHDR + 0Eh) is the segment number of the "automatic data segment." In DUMP.EXE this word is set to 2, which indicates segment 2, or the data segment. The automatic data segment is often called DGROUP and contains the program's stack. On entry to the program, DS is set to the address of this segment and CX is set to the segment's size (310h).
The double word at offset (NEWHDR + 14h) is the initial value of the instruction pointer on entry to the program. The first word is the offset (10h in DUMP.EXE) and the second word is the segment number--segment 1, or the code segment.
The double word at offset (NEWHDR + 18h) is the initial value of the stack pointer. In DUMP.EXE this indicates segment 2 (the data segment) and offset 310h, which is the very top of segment 2 after it has been expanded in memory. In DUMP.ASM, a stack size of 200h was specified. The size of the data segment (310h) is enough to accommodate the total size of the initialized data, the uninitialized data, and the 200h byte stack.
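These three header fields can be pulled out in one small routine. The Python sketch below assumes, as described above, that each double word stores the offset first and the segment number second. The function name and the key names are hypothetical, and the demo header is filled in by hand with the values quoted for DUMP.EXE.

```python
import struct

def read_entry_conditions(image: bytes, newhdr: int) -> dict:
    """Return the segment numbers and offsets OS/2 uses on entry."""
    (auto_data,) = struct.unpack_from("<H", image, newhdr + 0x0E)
    ip, cs = struct.unpack_from("<2H", image, newhdr + 0x14)  # offset, then segment
    sp, ss = struct.unpack_from("<2H", image, newhdr + 0x18)  # offset, then segment
    return {"auto_data_segment": auto_data,
            "cs": cs, "ip": ip,
            "ss": ss, "sp": sp}

# Synthetic header using the values quoted for DUMP.EXE.
NEWHDR = 0x80
image = bytearray(0x100)
struct.pack_into("<H", image, NEWHDR + 0x0E, 2)
struct.pack_into("<2H", image, NEWHDR + 0x14, 0x0010, 1)
struct.pack_into("<2H", image, NEWHDR + 0x18, 0x0310, 2)

entry = read_entry_conditions(bytes(image), NEWHDR)
print(entry)
```

Note that the values for cs, ss, and the automatic data segment are segment numbers from the segment table, not addresses. OS/2 substitutes the real selectors when it loads the program.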
DYNAMIC LINKING The New Executable format must also support dynamic linking. Code segments that contain calls to OS/2 functions are followed by a "relocation table" in the .EXE file. This table is used with information in the header to resolve calls to OS/2 functions when the program is loaded into memory.
I've described how dynamic linking works in previous Environments columns. Basically, OS/2 inserts the addresses of OS/2 functions into a program's code segment when the program is executed. However, it is instructive to see how this information is encoded into the .EXE file.
Figure 7 shows DUMP.EXE's code segment. The calls to OS/2 functions in DUMP.EXE are coded as far calls (whose op-code is 9Ah), followed by an offset and segment address. Figure 7 shows these addresses for each of the seven calls to OS/2 functions that DUMP.EXE makes. You'll notice that most of these addresses are coded as 0000:FFFF, which is an impossible address for a far call. It's a dummy address because the addresses of these functions are not known until the program is run.
The segment table indicates that the code segment is 14Eh bytes long. This means that the information that starts at offset 24Eh in the DUMP.EXE file is not part of the code segment. In fact, it's the relocation table.
In medium or large model programs, this relocation table could also contain information necessary to adjust far calls between code segments. However, in DUMP.EXE, the relocation table contains only dynamic link information.
The table begins with a word that indicates the number of entries in the relocation table. In DUMP.EXE this is 5. Not coincidentally, this is the number of unique OS/2 functions called by the program (DosWrite, DosExit, DosOpen, DosRead, and DosClose).
Each entry in the relocation table is 8 bytes long. Let's take a look at the first entry. The first byte is a 3. This indicates that the adjustment to be made to the code segment is to a far address (both segment and offset). The second byte is a 1, which means that the far address in the code segment must be set to an address of a routine in a dynamic link library that is identified by an "ordinal number."
This is followed by the word 131h. This is an offset relative to the beginning of the code segment where the adjustment is to be made. This is actually offset 231h in the DUMP.EXE file, which corresponds to the last DosExit call. The entry in the relocation table continues with two words--a 1 and a 5. The 1 indicates a particular dynamic link library. The 5 indicates the ordinal number of the function in that library.
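Here is how that 8-byte layout might be decoded in Python. Treat it as a sketch that handles only the kind of entry just described (a far address resolved by ordinal number). Other entry types use the last two words differently, and the field names are my own. The demo table holds just the one entry discussed above, so its count is 1 rather than the 5 found in DUMP.EXE.

```python
import struct

def read_relocation_table(image: bytes, table_offset: int) -> list:
    """Decode a relocation table of import-by-ordinal entries."""
    (count,) = struct.unpack_from("<H", image, table_offset)
    entries = []
    for i in range(count):
        kind, target, seg_offset, module, ordinal = struct.unpack_from(
            "<BBHHH", image, table_offset + 2 + 8 * i)
        entries.append({
            "address_kind": kind,          # 3 = far address (segment and offset)
            "target_kind": target,         # 1 = routine identified by ordinal number
            "segment_offset": seg_offset,  # where in the segment to make the adjustment
            "module_index": module,        # index into the module reference table
            "ordinal": ordinal,            # function number within that library
        })
    return entries

# Synthetic table holding only the first entry described in the text.
table = bytearray(2 + 8)
struct.pack_into("<H", table, 0, 1)
struct.pack_into("<BBHHH", table, 2, 3, 1, 0x0131, 1, 5)

entries = read_relocation_table(bytes(table), 0)
print(entries[0])
```

For the real DUMP.EXE you would pass the whole file image and a table offset of 24Eh, the first byte after the code segment.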
Hold onto your hats. This gets insane: The 1 in the relocation table that indicates the particular dynamic link library is actually an index into the "module reference table," which is another table in the header section. The offset of the module reference table relative to NEWHDR is stored at (NEWHDR + 28h), which contains the word 58h. Thus, the module reference table begins at (NEWHDR + 58h) or D8h. The first word found there (since we're dealing with an index of 1) is another 1.
This number represents an offset into the "imported names table." The offset of the imported names table relative to NEWHDR is stored at (NEWHDR + 2Ah), which contains 5Ah. The imported names table thus begins at (NEWHDR + 5Ah), or DAh. One byte into the imported names table is the number 8, followed by the string "DOSCALLS." (The 8 is the number of characters in "DOSCALLS.") These tables are shown in Figure 8.
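If the double indirection makes your head spin, it may help to see it as code. The Python sketch below follows the chain from a module index to a library name. The function name module_name is hypothetical, and the demo header is hand-built from the offsets quoted above. I have assumed the very first byte of the imported names table is zero, since all the text tells us is that the length byte sits one byte in.

```python
import struct

def module_name(image: bytes, newhdr: int, module_index: int) -> str:
    """Map a 1-based module index to a dynamic link library name."""
    (modref_off,) = struct.unpack_from("<H", image, newhdr + 0x28)
    (names_off,) = struct.unpack_from("<H", image, newhdr + 0x2A)
    # The module reference table is an array of words, indexed from 1.
    (name_rel,) = struct.unpack_from(
        "<H", image, newhdr + modref_off + 2 * (module_index - 1))
    # Each imported name is a length byte followed by the characters.
    pos = newhdr + names_off + name_rel
    length = image[pos]
    return image[pos + 1:pos + 1 + length].decode("ascii")

# Synthetic header using the offsets quoted for DUMP.EXE.
NEWHDR = 0x80
image = bytearray(0x100)
struct.pack_into("<H", image, NEWHDR + 0x28, 0x58)  # module reference table at D8h
struct.pack_into("<H", image, NEWHDR + 0x2A, 0x5A)  # imported names table at DAh
struct.pack_into("<H", image, 0xD8, 1)              # module 1 -> name at offset 1
image[0xDB] = 8                                     # length of "DOSCALLS"
image[0xDC:0xE4] = b"DOSCALLS"

print(module_name(bytes(image), NEWHDR, 1))         # prints DOSCALLS
```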
What this means is that the far address at offset 131h in DUMP.EXE's code segment must be set by OS/2 to the address of function 5 in the DOSCALLS dynamic link library. This function happens to be DosExit. (OS/2 does not really have a DOSCALLS.DLL dynamic link library. The DOSCALLS functions are actually part of IBMDOS.COM.)
The far address at offset 131h in DUMP.EXE's code segment is not 0000:FFFF (like most of the others), but 0000:0059. This means that there is another call to DosExit in DUMP.EXE. It is located at 59h bytes from the beginning of the code segment, i.e. at offset 159h in DUMP.EXE. The far address at 159h is 0000:FFFF. This marks the end of a chain of DosExit calls in the segment.
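Following such a chain is a short loop. In this Python sketch the function names are my own, the code segment is a blank buffer with only the two DosExit calls filled in, and the address 5678:1234 written at the end is an arbitrary placeholder for whatever address OS/2 would actually supply at load time.

```python
import struct

def fixup_chain(segment: bytes, first_offset: int) -> list:
    """List every offset in the chain that starts at first_offset."""
    offsets = []
    offset = first_offset
    while offset != 0xFFFF:                  # FFFFh marks the end of the chain
        if offset in offsets:
            raise ValueError("fix-up chain loops back on itself")
        offsets.append(offset)
        (offset,) = struct.unpack_from("<H", segment, offset)
    return offsets

def apply_fixup(segment: bytearray, first_offset: int,
                target_offset: int, target_segment: int) -> None:
    """Store the resolved far address at every link in the chain."""
    for offset in fixup_chain(segment, first_offset):
        struct.pack_into("<HH", segment, offset, target_offset, target_segment)

# Synthetic code segment containing only the two DosExit calls.
code = bytearray(0x14E)
code[0x58] = 0x9A
struct.pack_into("<HH", code, 0x59, 0xFFFF, 0x0000)   # 0000:FFFF, end of chain
code[0x130] = 0x9A
struct.pack_into("<HH", code, 0x131, 0x0059, 0x0000)  # 0000:0059, next link at 59h

chain = fixup_chain(bytes(code), 0x131)
print([hex(offset) for offset in chain])              # prints ['0x131', '0x59']

apply_fixup(code, 0x131, 0x1234, 0x5678)              # placeholder address
```

The chain is collected before anything is patched, because writing the real address over a link destroys the pointer to the next one.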
OS/2 uses the other four entries in the relocation table similarly. This is shown in the table "DUMP.EXE's OS/2 Function Calls." When LINK is creating the DUMP.EXE file, how does it know that DosExit is function 5 in the DOSCALLS dynamic link library? That information is in the import library DOSCALLS.LIB. The sole purpose of DOSCALLS.LIB is to provide this information to LINK so LINK can construct this relocation table.
JUST A LITTLE MORE COMPLEX Of course, we haven't looked at all the information in the header section or at other ways the relocation table can be coded. For example, in OS/2 programs that contain calls to the VIOCALLS, KBDCALLS, and other dynamic link libraries, the functions are identified by name rather than ordinal number. Without a doubt, the OS/2 .EXE format is necessarily a bit more complex than the DOS .EXE format. Fortunately, as I said at the outset, you don't have to understand the new .EXE format to program for OS/2.