Text File | 1994-09-03 | 130KB | 2,382 lines

CHAPTER 5

COMPILING AND LINKING

The final step in the creation of any program is compiling and linking, to produce a stand-alone .EXE file. Although you can run a program in the BASIC editing environment, it cannot be used by others unless they also have their own copy of BASIC.

In preceding chapters I explained the fundamental role of the BASIC compiler, and how it translates BASIC source statements to assembly language. However, that is only an intermediate action. Before a final executable program can be created, the compiled code in the object file must be joined to routines in the BASIC language library. This process is called linking, and it is performed by the LINK program that comes with BASIC.

In this chapter you will learn about the many options and features available with the BASIC compiler and LINK. By thoroughly understanding all of the capabilities these programs offer, you will be able to create applications that are as small and fast as possible.

Many programmers are content to let the BASIC editor create the final program using the pulldown menu selections. And indeed, it is possible to create a program without invoking BC and LINK manually--many programmers never advance beyond BASIC's "Make .EXE" menu. But only by understanding fully the many options that are available will you achieve the highest performance possible from your programs.

I'll begin with a brief summary of the compiling and linking process, and explain how the two processes interact. I will then move on to more advanced aspects of compiling and linking. BC and LINK are very complex programs which possess many features and capabilities, and all of their many options will be described throughout this chapter. You may also refer back to Chapter 1, which describes compiling in more detail.

AN OVERVIEW OF COMPILING AND LINKING
====================================

When you run the BC.EXE compiler, it reads your BASIC source code and translates some statements directly into the equivalent assembly language commands. In particular, integer math and comparisons are converted directly, as well as integer-controlled DO, WHILE, and FOR loops. Floating point arithmetic and comparisons, and string operations and comparisons are instead translated to calls to existing routines written by the programmers at Microsoft. These routines are in the BCOM and BRUN libraries that come with BASIC.

As BC compiles your program, it creates an object file (having an .OBJ extension) that contains both the translated code as well as header information that LINK needs to create a final executable program. Some examples of the information in an object file header are the name of the original source file, copyright notices, offsets within the file that specify external procedures whose addresses are not known at compile time, and code and data segment names. In truth, most of this header information is of little or no relevance to the BASIC programmer; however, it is useful to know that it exists. All Microsoft-compatible object files use the same header structure, regardless of the original source language they were written in.

The LINK program is responsible for combining the object code that BC produces with the routines in the BASIC libraries. A library (any file with a .LIB extension) is merely a collection of individual object files, combined one after the other in an organized manner. A header portion of the .LIB file holds the name of each object file and the procedure names contained therein, as well as the offset within the library where each object module is located. Therefore, LINK identifies which routines are being accessed by the BASIC program, and searches the library file for the procedures with those names.
Once found, a copy of that portion of the library is then appended to the .EXE file being created.

LINK can also join multiple object files compiled by BC to create a single executable program, and it can produce a Quick Library comprised of one or more object files. Quick Libraries are used only in the editing environment, primarily to let BASIC access non-BASIC procedures. Because the BASIC editor is really an interpreter and not a true compiler, Quick Libraries were devised as a way to let you call compiled (or assembled) subroutines during the development of a program.

When LINK is invoked it reads the header information in each object file compiled by BC, and uses that to know which routines in the specified library or libraries must be added to your program. Since every external routine is listed by name, LINK simply examines the library header for the same name. It is worth mentioning that BASIC places the name of the default library in the object file, so you don't have to specify it when linking. For example, when you compile a stand-alone program (with the /o switch) using BC version 4.5, it places the name BCOM45.LIB in the header.

BASIC is not responsible for determining where external routines are located. If your program uses a PRINT statement, the compiler generates the instruction CALL 0000:0000, and identifies where in the object file that instruction is located. BASIC knows that the print routine will be located in another segment, and so leaves room for both a segment and address in the Call instruction. But it doesn't know where in the final executable file the print routine will end up. The absolute address depends on how many other modules will be linked with the current object file, and the size of the main program. In fact, LINK does not even know in which segment a given routine will ultimately reside.
While it can resolve all of the code and data addresses among modules, the absolute segment in which the program will be loaded depends on whether there are TSR programs in memory, the version of DOS (and thus its size), and the number of buffers specified in the host PC's CONFIG.SYS file, among other factors.

Therefore, all .EXE files also have a header portion to identify segment references. DOS actually modifies the program, assigning the final segment values as it loads the program into memory. Figure 5-1 shows how DOS, file buffers, and device drivers are loaded in memory, before any executable programs.

    ┌─────────────────────┐
    │  ROM BIOS routines  │
    ├─────────────────────┤
    │    Video memory     │
    ╞═════════════════════╡ <-- top of DOS memory (640K boundary)
    │                     │
    │  Far heap storage   │ for dynamic arrays
    │                     │
    ├─────────────────────┤
    │    String memory    │
    ├─────────────────────┤
    │      The stack      │
    ├─────────────────────┤
    │    Variable data    │
    ├─────────────────────┤
    │ Compiled BASIC code │
    ╞═════════════════════╡ <-- this address is changeable
    │    TSR programs     │
    ├─────────────────────┤
    │   Device drivers    │
    ├─────────────────────┤
    │ File control blocks │
    ├─────────────────────┤
    │    File buffers     │
    ├─────────────────────┤
    │     DOS program     │
    ├─────────────────────┤ <-- address 0000:0600
    │   BIOS work area    │
    ├─────────────────────┤ <-- address 0000:0400
    │  Interrupt vectors  │
    └─────────────────────┘ <-- bottom of memory

Figure 5-1: DOS and BASIC memory organization.

It is important to understand that library routines are added to your program only once, regardless of how many times they are called. Even if you use PRINT three hundred times in a program, only one instance of the PRINT routine is included in the final .EXE file. LINK simply modifies each use of PRINT to call the same memory address.

Further, LINK is generally smart enough to not add all of the routines in the library. Rather, it just includes those that are actually called.
However, LINK can extract only entire object files from a library. If a single object module contains, say, four routines, all of them will be added, even if only one is called. For BASIC modules that you write, you can control which procedures are in which object files, and thus how they are combined. But you have no control over how the object modules provided with BASIC were written. If the routines that handle POS(0), CSRLIN, and SCREEN are contained in a single assembly language source file (and they are), all of them are added to your program even if you use only one of those BASIC statements.

Now that you understand what compiling and linking are all about, you may wonder why it is necessary to know this, or why you would ever want to compile manually from the DOS command line. The most important reason is to control fully the many available compile and link options. For example, when you let the BASIC editor compile for you, there is no way to override BC's default size for the communications receive buffer. Likewise, the QuickBASIC editor does not let you specify the /s (string) option that in many cases will reduce the size of your programs.

LINK offers many powerful options as well, such as the ability to combine code segments to achieve faster performance during procedure calls. Another important LINK option lets you create an .EXE file that can be run under CodeView. Again, these options are not selectable from within the QuickBASIC environment [but PDS and VB/DOS Pro Edition let you select more options than QuickBASIC], and they can be specified only by compiling and linking manually. All of these options are established via command line switches, and each will be discussed in turn momentarily.

Finally, BASIC PDS includes a number of *stub files* which reduce the size of your programs, although at the expense of decreased functionality.
For example, if your program does not use the SCREEN statement to enable graphics mode, a stub file is provided to eliminate graphics support for the PRINT statement. BASIC PDS [and the VB/DOS Pro Edition] also support program overlays, and to use those requires linking manually from DOS.


COMPILING
=========

To compile a program you run BC.EXE specifying the name of the BASIC program source file. BC accepts several optional parameters, as well as many optional command line switches. The general syntax for BC is as follows, with brackets used to indicate optional information.

    bc program [/options] [, object] [, listfile] [;]

In most cases you will simply give the name of the BASIC source file, any option switches, and a terminating semicolon. A typical BC command is as follows:

    bc program /o;

Here, a BASIC source file named PROGRAM.BAS is being compiled, and the output object file will be called PROGRAM.OBJ. The /o option indicates that the program will be a stand-alone .EXE file that does not require the BRUN library to be present at runtime. If the semicolon is omitted, the compiler will prompt for each of the file name parameters it needs. For example, entering

    bc program /o

invokes the compiler, which then prompts you for the output and listing file names. Pressing Enter in response to any prompt tells BC to use the source file's first name. You may also start BC with no source file name, and let it prompt for that as well.

In most cases the default file names are acceptable; however, it is not uncommon to want the output file placed into a different directory. This is done as follows:

    bc program, \objdir\ /o;

[Note that if the trailing backslash were omitted from \objdir\ above, BC would create an output file named OBJDIR.OBJ in the root directory. Of course, that is not what is intended. Therefore, a trailing backslash is added to tell BC to use the default name of PROGRAM.OBJ, and to place that file in the directory named \OBJDIR.]

If you are letting BC prompt you for the file names, you would enter the output path name at that prompt position. You may also include a drive letter as part of the path, or a drive letter only to use the default directory on the specified drive. The listing that follows shows a typical BC session that uses prompting.

    C>bc program /o

    Microsoft (R) QuickBASIC Compiler Version 4.50
    (C) Copyright Microsoft Corporation 1982-1988.
    All rights reserved.
    Simultaneously published in the U.S. and Canada.

    Object Filename [PROGRAM.OBJ]: d:\objects\ <Enter>
    Source Listing [NUL.LST]: <Enter>

    43965 Bytes Free
    43751 Bytes Available

        0 Warning Error(s)
        0 Severe Error(s)

    C>

Although you can override the default file extensions, this is not common and you shouldn't do that unless you have a good reason to. For example, the command

    BC source.txt , output.out;

will compile a BASIC source file named SOURCE.TXT and create an object module named OUTPUT.OUT. Since there are already standard default file extension conventions, I recommend against using any others you devise.

The optional list file contains a source listing of the BASIC program showing the addresses of each program statement, and uses a .LST extension by default. There are a number of undocumented options you can specify to control how the list file is formatted, and these are described later in this chapter in the section *Compiler Metacommands*. A list file may also include the compiler-generated assembly language instructions, and you specify that with the /a option switch. All of the various command options will be discussed in the section following.

Notice that the positioning of the file name delimiting commas must be maintained when the object file name is omitted.
If you plan to accept the default file name but also want to specify a listing file, you must use two commas like this:

    bc source , , listfile;

The Bytes Available and Bytes Free messages indicate how much working memory the compiler has at its disposal, and how much of it remained free while compiling your program. BC must keep track of many different kinds of information as it processes your source code, and it uses its own internal DGROUP memory for that. For example, every variable that you use must be remembered, as well as its address. When BASIC sees a statement such as X = 100, it must look in its *symbol table* to see if it has already encountered that variable. If so, it creates an assembly language instruction to store the value 100 at the corresponding address. Otherwise, it adds the variable X to the table, assigns a new address for it, and then adds code to assign the value 100 to that address. When you use PRINT X later on, BASIC will again search its table, find the address, and use that when it creates the code that calls the PRINT routine.

Other data that BASIC must remember as it works includes the number and type of arguments for each SUB or FUNCTION that is declared, line label names and their corresponding addresses, and quoted string constants. As you may recall, in Chapter 2 I explained that BC maintains a table of string constants, and stores each in the final program only once. Even when the same quoted string is used in different places in a program, BC remembers that they are the same and stores only a single copy. Therefore, an array is used by BC to store these strings while your program is being compiled.

In most cases you can simply ignore the Bytes Available and Bytes Free messages, since how much memory BASIC used or had available is of no consequence. The only exception, of course, is when your program is so large that BC needed more than was available. But again, you will receive an error message when that occurs.
However, if you notice that the Bytes Free value is approaching zero, you should consider splitting your program into separate modules.

The error message display indicates any errors that occurred during compilation, and if so how many. This display is mostly a throw-back to the earlier versions of the BASIC compiler, because they had no development environment. These days, most people get their program working correctly in the BASIC editor, before attempting to compile it. Of course, there must still be a facility for reporting errors. In most cases, any errors that BC reports will be severe errors. These include a mismatched number of parentheses, using a reserved word as a variable name (for example, PRINT = 12), and so forth. One example of a warning error is referencing an array that has not been dimensioned. When this happens, BASIC creates the array with a default 11 elements (0 through 10), and then reports that it did this as a warning.

One interesting quirk worth mentioning is that BASIC will not let you compile a program named USER.BAS. If you enter BC USER, BC assumes that you intend to enter the entire program manually, statement by statement! This too must be a holdover from earlier versions of the compiler; however, when USER.BAS is specified it will appear that the compiler has crashed, because nothing happens and no prompt is displayed. In my testing with BASIC 7.1, any statements I entered were also ignored, and no object file was created.


COMPILER OPTIONS

All of the options available for use with the BASIC compiler are described in this section in alphabetical order. Some options pertain only to BASIC 7 PDS, and these are noted in the accompanying discussion. Each option is specified by listing it on the BC command line, along with a preceding forward slash (/). Also, these options apply to the BC compiler only, and not necessarily to the QB and QBX editing environments.

/A

The /a (assembly) switch tells BC to include the assembly language source code it creates in the listing file. The format of the file was described in detail in Chapter 4, so I won't belabor that here. Note, however, that a file name must be given in the list file position of the BC command line. Otherwise, a list file will not be written.

/Ah

Using /ah (array huge) tells BASIC that you plan to create dynamic arrays that may exceed 64K in total data size. This option affects numeric, TYPE, and fixed-length string arrays only, and not conventional string arrays.

Normally, BASIC calculates the element addresses for array references directly, based on the segment and other information in the array descriptor. This is the most direct method, and thus provides the fastest performance and smallest code. When /ah is used, all access to non-string dynamic arrays is instead made through a called routine. This called routine calculates the segment and address of a single array element, and because it must also manipulate segment values, increases the size of your programs. Therefore, /ah should be avoided unless you truly need the ability to create huge arrays. Even if a particular array does not currently exceed the 64K segment limit, BASIC has no way to know that when it compiles your program.

To minimize the size and speed penalty /ah imposes, it may be used selectively on only some of the source modules in a program. If you have one subprogram that needs to manipulate huge arrays but the rest of the program does not, you should create a separate file containing only that subprogram and compile it using /ah. When the program is linked, only that module's array accesses will be slower.

Note that the /ah switch is also needed if you plan to create huge arrays when running programs in the BASIC editor. However, with the BASIC editor, using /ah does not impinge on available memory or make the program run slower.
Rather, it merely tells BASIC not to display an error message when an array is dimensioned to a size greater than 64K. [The BASIC editor always uses the slower code that checks for illegal array elements anyway, so it can report an error rather than lock up your computer.]

One limitation that /ah will not overcome is BASIC's limit of 32,767 elements in a single dimension. That is, the statement REDIM Array%(1 to 32768) will fail, regardless of whether /ah is used. There are two ways to exceed this limit: one is to create a TYPE array in which each element is comprised of two or more variables. The other is to create an array that has more than one dimension. The brief program below shows how to access a 2-dimensional array as if it had only a single dimension.

    DEFINT A-Z

    '----- pick an arbitrary group size, and number of groups (in this
    '      case 100,000 elements)
    GroupSize = 1000: NumGroups = 100

    '----- dimension the array
    REDIM Array(1 TO GroupSize, 1 TO NumGroups)

    '----- pick an element number to assign (note use of a long integer)
    Element& = 50000

    '----- calculate the first and second subscripts
    First = ((Element& - 1) MOD GroupSize) + 1
    Second = (Element& - 1) \ GroupSize + 1

    '----- assign the appropriate array element
    Array(First, Second) = 123

    '----- show how to derive the original element based on First and
    '      Second (CLNG is needed to prevent an Overflow error)
    CalcEl& = First + (Second - 1) * CLNG(GroupSize)

/C

The /c (communications) option lets you specify the size of the receive buffer when writing programs that open the COM port. The value specified represents the total buffer size in bytes, and is shared when two ports are open at once. For example, if two ports are open and the total buffer size is 4096 bytes, then each port has 2048 bytes available for itself.

A receive buffer is needed when performing communications, and it accumulates the incoming characters as they are received.
Each time a character is accepted by the serial port, it is placed into the receive buffer automatically. When your program subsequently uses INPUT or INPUT$ or GET to read the data, it is actually reading the characters from the buffer and not from the hardware port. Without this buffering, your program would have to wait in a loop constantly looking for each character, which would preclude it from doing anything else!

Communications data is received in a continuous stream, and each byte must be processed before the next one arrives, otherwise the data will be lost. The communications port hardware generates an interrupt as each character is received, and the communications routines within BASIC act on that interrupt. The byte is retrieved from the hardware port using an assembly language IN instruction, which is equivalent to BASIC's INP function. This allows the characters to accumulate in the background, without any additional effort on your part.

As each byte is received it is placed into the buffer, and a pointer is updated showing the current ending address within the buffer. As your program reads those bytes, another pointer is updated to show the new starting address within the buffer. This type of buffer is called a *circular buffer*, because the starting and ending buffer addresses are constantly changing. That is, the buffer's end point "wraps" around to the beginning when it becomes full.

The receive buffer whose size is specified with /c is located in far memory. However, BASIC also maintains a second buffer in near memory, and its size is dictated by the optional LEN= argument used with the OPEN statement. Because near memory can be accessed more quickly than far memory, it is sensible for BASIC to copy a group of characters from the far receive buffer to the near buffer all at once, rather than individually each time you use GET or INPUT$.

When /c is not specified, the buffer size defaults to 512 bytes.
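
The circular buffer idea described above can be modeled in a few lines of BASIC. This is only a sketch for illustration, not the code Microsoft uses: the real buffer lives in far memory and is filled by an interrupt handler, while here an ordinary integer array stands in for it. The names Buffer, Head, Tail, Count, StoreByte, and FetchByte are all made up for this example, and the buffer is kept to eight elements so the wrap-around is easy to follow.

```basic
DEFINT A-Z
CONST BufSize = 8                 'deliberately tiny so wrapping is easy to see
DIM Buffer(0 TO BufSize - 1)

Head = 0                          'index where the next received byte is stored
Tail = 0                          'index of the next byte to be read
Count = 0                         'number of bytes waiting in the buffer

'----- "receive" six bytes, the letters A through F
FOR Char = 65 TO 70
  Incoming = Char: GOSUB StoreByte
NEXT

'----- read two of them, freeing space at the start of the buffer
GOSUB FetchByte: PRINT CHR$(Outgoing);
GOSUB FetchByte: PRINT CHR$(Outgoing);

'----- receive four more (G through J); the last two wrap around to
'      elements 0 and 1
FOR Char = 71 TO 74
  Incoming = Char: GOSUB StoreByte
NEXT

'----- read whatever remains
WHILE Count > 0
  GOSUB FetchByte: PRINT CHR$(Outgoing);
WEND
PRINT
END

StoreByte:
  IF Count = BufSize THEN RETURN  'buffer full, so this byte is lost
  Buffer(Head) = Incoming
  Head = (Head + 1) MOD BufSize   'wrap to 0 after the last element
  Count = Count + 1
RETURN

FetchByte:
  IF Count = 0 THEN RETURN        'nothing waiting
  Outgoing = Buffer(Tail)
  Tail = (Tail + 1) MOD BufSize
  Count = Count - 1
RETURN
```

The program prints ABCDEFGHIJ. The bytes come out in the order they arrived even though G and H were stored in the last two elements while I and J were stored in elements 0 and 1. Note also what StoreByte does when the buffer is full: the incoming byte is simply discarded, which mirrors the data loss that occurs when a program does not read its receive buffer quickly enough.
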
With the default buffer, up to 512 characters can be received with no intervention on your part. If more than 512 bytes arrive and your program still hasn't removed them using INPUT$ or GET, new characters that come later will be lost. It is also possible to stipulate hardware handshaking when you open the communications port. This means that the sender and receiver use physical control wires to indicate when the buffer is full, and when it is okay to resume transmitting.

In many programming situations, the 512 byte default will be more than adequate. However, if many characters are being received at a high baud rate (9600 or greater) and your program is unable to accept and process those characters quickly enough, you should consider using a larger buffer. Fortunately, the buffer is located in far memory, so increasing its size will not impinge on the string and data memory available in DGROUP.

/D

The /d (debug) option switch is intended solely to help you find problems in a program while it is being developed. Because /d causes BC to generate additional code and thus bloat your executable program, it should be used only during development. When /d is specified, four different types of tests are added to your program.

The first is a call to a routine that checks if Ctrl-Break has been pressed. One call is added for every BASIC source statement, and each adds five bytes of code to your final executable program. The second addition is a one-byte assembly language INTO instruction following each integer and long integer math operation, to detect overflow errors.

The third is a call to a routine that calculates array element addresses, to ensure that the element number is in fact legal. Normally, element addresses are computed directly without checking the upper and lower bounds, unless you are using huge (greater than 64K) arrays. Without /d, it is therefore possible to corrupt memory by assigning an element that doesn't exist.
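
The overflow and array checks described above are easiest to appreciate with a small test program. The fragment below is a sketch only; the names Scores, Index, and Total are invented for the example, and the file name DEBUGTST.BAS used in the commands afterward is likewise hypothetical.

```basic
DEFINT A-Z
DIM Scores(1 TO 10)

'----- an out-of-range subscript held in a variable, so the mistake
'      cannot be spotted until the program runs
Index = 11
Scores(Index) = 99      'with /d this halts with "Subscript out of range"

'----- an integer calculation that exceeds 32767 (to try this check,
'      first remove the Scores(Index) assignment above)
Total = 32767
Total = Total + 1       'with /d this halts with "Overflow"
PRINT Total
```

Assuming the program is saved as DEBUGTST.BAS, compiling it with bc debugtst /o /d; should produce a program that stops at the first faulty statement and reports the error. Compiled with bc debugtst /o; instead, neither test is present: the assignment to Scores(Index) writes to memory beyond the end of the array, and the addition can be expected to wrap around to a negative number rather than being reported.
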
The final code addition implements GOSUB and RETURN statements using a library routine, rather than calling and returning from the target line directly. Normally, a GOSUB statement is translated into a three-byte assembly language *near call* instruction, and a RETURN is implemented using a one-byte *near return*. But when /d is used, the library routines ensure that each RETURN did in fact result from a corresponding GOSUB, to detect RETURN without GOSUB errors. This is accomplished by incrementing an internal variable each time GOSUB is used, and decrementing it at each RETURN. If that variable is decremented below 0 during a RETURN statement, then BASIC knows that there was no corresponding GOSUB. These library routines are added to your program only once by LINK, and comprise only a few bytes of code. However, a separate five-byte call is generated for each GOSUB and RETURN statement.

Many aspects of the /d option were described in detail in Chapters 1 and 4, and there is no need to repeat that information here. But it is important to remember that /d always makes your programs larger and run more slowly. Therefore, it should be avoided once a program is running correctly.

/E

The /e (error) option is necessary for any program that uses ON ERROR or RESUME with a line label or number. In most cases using /e adds little or no extra code to your final .EXE program, unless ON ERROR and RESUME are actually used, or unless you are using line numbers. For each line number, four bytes are added to remember the number itself as well as its position in the file [two bytes each].

As with /d, every GOSUB and RETURN statement is implemented through a far call to a library routine, rather than by calling the target line directly. Without this added protection it would not be possible to trap "RETURN without GOSUB" errors correctly, or recover from them in an ON ERROR handler.

Also see the /x option which is needed when RESUME is used alone, or with a 0 or NEXT argument.
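
As a brief illustration of when each switch applies, consider the sketch below. The file name MISSING.DAT and the labels Trap and Done are hypothetical, chosen only for this example.

```basic
ON ERROR GOTO Trap                  'ON ERROR requires /e
OPEN "MISSING.DAT" FOR INPUT AS #1  'fails if the file does not exist
PRINT "The file was opened."
CLOSE #1

Done:
END

Trap:
  PRINT "Error"; ERR; "occurred."
  RESUME Done                       'RESUME with a label needs only /e
  'RESUME NEXT                      'this form would also need /x
```

As written, the program resumes at a named label, so compiling with /e alone is sufficient. If the RESUME Done statement were replaced with the commented-out RESUME NEXT, the program would have to be compiled with /x as well.
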
The /x switch is closely related to /e, and is described separately below.

/Fpa and /Fpi (BASIC PDS and later)

When Microsoft introduced their BASIC compiler version 6.0, they included an alternate method for performing floating point math. This Floating Point Alternate library (hence the /fpa) offered a meaningful speed improvement over the IEEE standard, though at a cost of slightly reduced accuracy. This optional math library has been continued with BASIC 7 PDS, and is specified using the /fpa command switch.

By default, two parallel sets of floating point math routines are added to every program. When the program runs, code in BASIC's runtime startup module detects the presence of a math coprocessor chip, and selects which set of math routines will be used. The coprocessor version is called the Inline Library, and it merely serves as an interface to the 80x87 math coprocessor that does the real work in its hardware. (Note that inline is really a misnomer, because that term implies that the compiler generates coprocessor instructions directly. It doesn't.) The second version is called the Emulator Library, because it imitates the behavior of the coprocessor using assembly language subroutines.

Although the ability to take advantage of a coprocessor automatically is certainly beneficial, there are two problems with this dual approach: code size and execution speed. The coprocessor version is much smaller than the routines that perform the calculations manually, since it serves only as an interface to the coprocessor chip itself. When a coprocessor is in fact present, the entire emulator library is still loaded into memory. And when a coprocessor is not installed in the host PC, the library code to support it is still loaded.

The real issue, however, is that each BASIC math operation requires additional time to route execution to the appropriate routines.
Since BC has no way to know if a coprocessor will be present when the program eventually runs, it cannot know which routine names to call. Therefore, BASIC uses a system of software interrupts that route execution to one library or the other. That is, instead of using, say, CALL MultSingle, it instead creates code such as INT 39h. The Interrupt 39h vector is set when the program starts to point to the correct library routine. Unfortunately, the extra level of indirection to first read the interrupt address and then call that address impacts the program's speed.

Recall that Chapter 1 explained how the library routines in a BRUN-style program modify the caller's code the first time they are invoked. The compiler creates code that uses an interrupt to access the library routines, and those routines actually rewrite that code to produce a direct call. Although this code modification increases the time needed to call a library routine initially, subsequent calls will be noticeably faster. BASIC statements executed many times within a FOR or DO loop will show the greatest improvement, but statements executed only once will be much slower than usual.

In a similar fashion, the coprocessor routines that are in BASIC's runtime library alter the caller's code, replacing the interrupt commands with equivalent coprocessor instructions. Each floating point interrupt that BC generates includes the necessary variable addresses and other arguments within the caller's code. These arguments are in the same format as a coprocessor instruction. The first time an interrupt is invoked, it subtracts the "magic value" &H5C32 from the bytes that comprise the interrupt instruction, thus converting the instruction into a coprocessor command. This will be covered in Chapter 12 and I won't belabor it here.

Since the alternate floating point math routines do not use a coprocessor even if one is present, the interrupt method is not necessary.
BC simply hardcodes the library subroutine names into the generated code, and the program is linked with the alternate math library. Besides the speed improvement achieved by avoiding the indirection of interrupts, the alternate math library is also inherently faster than the emulator library when a coprocessor is not present.

The /fpi switch tells BASIC to use its normal method of including both the coprocessor and emulator math libraries in the program, and determining which to use at runtime. (See the discussion of /fpa above.) Using /fpi is actually redundant and unnecessary, because this is the default that is used if no math option is specified.

/Fs (BASIC PDS only)

BASIC PDS offers an option to use far strings, and this is specified with the /fs (far strings) switch. Without /fs, all conventional (not fixed-length) string variables and string arrays are stored in the same 64K DGROUP memory that holds numeric variables, DATA items, file buffers, and static numeric and TYPE arrays. Using the /fs option tells BASIC to instead store strings and file buffers in a separate segment in far memory.

Although a program using far strings can subsequently hold more data, the capability comes at the expense of speed and code size. Obviously, more code is required to access strings that are stored in a separate data segment. Furthermore, the string descriptors are more complex than when near strings are used, and the code that acts on those descriptors requires more steps. Therefore, you should use /fs only when truly necessary, for example when BASIC reports an Out of string space error. Far versus near strings were discussed in depth in Chapter 2, and you should refer to that chapter for additional information.

[One very unfortunate limitation of VB/DOS is that only far strings are supported. The decision makers at Microsoft apparently decided it was too much work to also write a near-strings version of the forms library.
So users of VB/DOS are stuck with the additional size and speed overhead of far strings, even for small programs that would have been better served with near strings.]

/G2 (BASIC PDS and later)

The /g2 option tells BASIC to create code that takes advantage of an 80286 or later CPU. Each new generation of Intel microprocessors has offered additional instructions, as well as performance optimizations to the internal microcode that interprets and executes the original instructions. When an existing instruction is recoded and improved within the CPU, anyone who owns a PC using the newer CPU will benefit from the performance increase.

For example, the original 8086/8088 had several instructions that performed poorly. These include Push and Pop, and Mul and Div. When Intel released the 80186, they rewrote the microcode that performs those instructions, increasing their speed noticeably. The 80286 is an offshoot of the 80186, and of course includes the same optimizations. The 80386 and 80486 offer even more improvements and additions to the original 8086 instruction set.

Besides the enhancements to existing instructions, newer CPU types also include additional instructions not present in the original 8086. For example, the 80286 offers the Enter and Leave commands, each of which can replace a lengthy sequence of instructions on the earlier microprocessors. Another useful enhancement offered in the 80286 is the ability to push numbers directly onto the stack. Where the 8086 can use only registers as arguments to Push, the instructions Push 1234 and Push Offset Variable are legal with 80186 and later CPUs.

Likewise, the 80386 offers several new commands to directly perform long integer operations. For example, adding two long integer values using the 8086 instruction set requires a number of separate steps. The 80386 and later CPUs can do this using only one instruction.
If you are absolutely certain that your program will be run only on PCs with an 80286 or later microprocessor, the /g2 option can provide a modest improvement in code size and performance. In particular, programs that use /g2 can save one byte each time a variable address is passed to a routine. When /g2 is not used, the command PRINT Work$ results in the code shown below.

  PRINT Work$
    Mov  AX,Offset Work$    'this requires 3 bytes
    Push AX                 'this requires 1 byte
    Call B$PESD             'a far call is 5 bytes

When /g2 is used, the address is pushed directly rather than first being loaded into AX, as shown following.

  PRINT Work$
    Push Offset Work$       'this requires 3 bytes
    Call B$PESD             'this call is 5 bytes

With the rapid proliferation of 80386 and 80486 [and Pentium] computers, Microsoft should certainly consider adding a /g3 switch. Taking advantage of 80386 instructions could provide substantially more improvement over 80286 instructions than the 80286 provides beyond the 8086.

[In fact, Microsoft has added a /g3 switch to VB/DOS. Unfortunately, it does little more than the /g2 switch. Most of a program's execution is spent running code inside the Microsoft-supplied runtime libraries. But those libraries contain only 8088 code! Using /g2 or /g3 affects only the compiler-generated code, which has little impact on a program's overall performance. Until Microsoft writes additional versions of their runtime libraries using 80386 instructions (yeah, right), using /g2 or /g3 will offer very little practical improvement.]

/Ix (BASIC PDS and later)

Another important addition to BASIC 7 PDS is its integral ISAM data file handler. Microsoft's ISAM (Indexed Sequential Access Method) offers three key features.

The first is indexing, which lets you search a data file very quickly. A simple sequential search reads each record from the disk in order until the desired information is found.
That is, to find the record for customer David Eagle you would start at the beginning of the file, and read each record until you found the one containing that name. An index system, on the other hand, keeps as many names in memory as will fit, and searches memory instead of the disk. This is many times faster than reading the disk repeatedly. If Mr. Eagle is found in, say, the 1200th position, the index manager can go directly to the corresponding record on disk and return the data it contains.

The second ISAM feature is its ability to maintain the data file in sorted order. In most situations, records are stored in a data file in the order they were originally entered. For example, with a sales database, each time a customer purchases a product a new record is added holding the item and its price. When you subsequently step through the data file, the entries will most likely be ordered by the date and time they were entered. ISAM lets you access records in sorted order--for example, alphabetically by the customer's last name--regardless of the order in which the data was actually entered.

The last important ISAM feature is its ability to establish relationships between files, based on the information they contain. Many business applications require at least two data files: one to hold the name and address of each customer, which rarely change, and another to hold the products or other items that are ordered periodically. It would be impractical and wasteful to duplicate the name and address information repeatedly in each product detail record. Instead, many database programs store a unique customer number in each record. Then, it is possible to determine which sales record goes with which customer based on the matching numbers in both files. A program that uses this technique is called a *relational database*.

To help the BASIC ISAM routines operate efficiently, you are required to provide some information when compiling your program.
Each of the /i switches requires a letter indicating which option is being specified, and a numeric value. For each field in the file that requires fast (indexed) access, ISAM must reserve a block of memory for file buffers. This is the purpose of the /ii: switch. Notice that /ii: is needed only if more than 30 indexes will be active at one time.

The /ie: option tells ISAM how much EMS memory to reserve for buffers, and is specified in kilobytes. This allows other applications to use the remaining EMS for their own use.

The /ib: option switch tells ISAM how many 2K (2048-byte) *page buffers* to create in memory. In general, the more memory that is reserved for buffers, the faster the ISAM program can work. Of course, each buffer that you specify reduces the amount of memory that is available for other uses in your program.

An entire chapter in the BASIC PDS manual is devoted to explaining the ISAM file system, and there is little point in duplicating that information here. Please refer to your BASIC documentation for more examples and tutorial information on using ISAM. In particular, advice and formulas are given that show how to calculate the numeric values these options require. In Chapter 6 I will cover file handling and indexing techniques in detail, with accompanying code examples showing how you can create your own indexing methods.

/Lp And /Lr (BASIC PDS only)

BASIC 7 PDS includes an option to write programs that operate under OS/2, as well as MS-DOS. Although OS/2 has yet to be accepted by most PC users, many programmers agree that it offers a number of interesting and powerful capabilities.

By default, BC compiles a program for the operating system that is currently running. If you are using DOS when the program is compiled and linked, the resultant program will also be for use with DOS. Similarly, if you are currently running OS/2, then the program will be compiled and linked for use with that operating system.
The /lp (protected) switch lets you override the assumption that BC makes, and tell it to create OS/2 instructions that will run in protected mode. The /lr (real) option tells BC that even though you are currently running under OS/2, the program will really be run with DOS. Again, these switches are needed only when you need to compile for the operating system that is not currently in use.

/Mbf

With the introduction of QuickBASIC 4.0, Microsoft standardized on the IEEE format for floating point data storage. Earlier versions of QuickBASIC and GW-BASIC used a faster, but non-standard proprietary numeric format that is incompatible with other compilers and languages.

In many cases, the internal numeric format a compiler uses is of little consequence to the programmer. After all, the whole point of a high-level language is to shield the programmer from machine-specific details. One important exception is when numeric data is stored in a disk file. While it is certainly possible to store numbers as a string of ASCII characters, this is not efficient. As I described in Chapter 2, converting between binary and decimal formats is time consuming, and also wastes disk space. Therefore, BASIC (and most other languages) write numeric data to a file using its native fixed-length format. That is, integers are stored in two bytes, and double-precision data in eight.

Although QuickBASIC 4 and later compilers use the IEEE format for numeric data storage, earlier versions of the compiler do not. This means that values written to disk by programs compiled using earlier versions of QuickBASIC or even GW-BASIC cannot be read correctly by programs built using the newer compilers. The /mbf option tells BASIC that it is to convert to the original Microsoft Binary Format (hence the MBF) prior to writing those values to disk. Likewise, floating point numbers read from disk will be converted from MBF to IEEE before being stored in memory.
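As a minimal sketch of what this means in practice, the fragment below reads one double-precision value from a random access file written by an older, MBF-based program. The file name OLDDATA.DAT and the single-field record layout are assumptions made only for illustration, and the fragment assumes it is compiled with /mbf.

```basic
'Assumes OLDDATA.DAT (a hypothetical file) holds 8-byte MBF doubles,
'and that this program is compiled with the /mbf switch.
OPEN "OLDDATA.DAT" FOR RANDOM AS #1 LEN = 8
FIELD #1, 8 AS Fielded$          'one 8-byte field per record
GET #1, 1                        'read the first record
Value# = CVD(Fielded$)           'with /mbf, CVD converts MBF to IEEE
PRINT Value#
CLOSE #1
```

Without /mbf the same statements would treat the eight bytes as an IEEE number, so a value written by an older program would most likely print incorrectly.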
[Even when /mbf is used, all floating point numbers are still stored in memory and manipulated using the IEEE method. It is only when numbers are read from or written to disk that a conversion between MBF and IEEE format is performed.]

Notice that current versions of Microsoft BASIC also include functions to convert between the MBF and IEEE formats manually. For example, the statement Value# = CVDMBF(Fielded$) converts the MBF-format number held in Fielded$, and assigns an IEEE-format result to Value#. When /mbf is used, however, you do not have to perform this conversion explicitly, and using Value# = CVD(Fielded$) provides the identical result. Also see the data format discussion in Chapter 2, which compares the IEEE and MBF storage methods in detail.

/O

BASIC can create two fundamentally different types of .EXE programs: One type is a stand-alone program that is completely self-contained. The other type requires the presence of a special runtime .EXE library file when it runs, which contains the routines that handle all of BASIC's commands. By default, BASIC creates a program that requires the runtime .EXE library, which produces smaller program files. However, the runtime library is also needed, and is loaded along with the program into memory. The differences between the BRUN and BCOM programs were described in detail in Chapter 1.

The /o switch tells BASIC to create a stand-alone program that does not require the BRUN library to be present. Notice that when /o is used, the CHAIN command is treated as if you had used RUN, and COMMON variables may not be passed to a subsequently executed program.

/Ot (BASIC PDS and later)

Each time you invoke a BASIC subprogram, function, or DEF FN function, code that BC adds to the subprogram or function creates a stack frame that remembers the caller's segment and address. Normally, Call and Return statements in assembly language are handled directly by the microprocessor.
DEF FN functions and GOSUB statements are translated by the compiler into near calls, which means that the target address is located in the same segment. Invoking a formal function or subprogram is instead treated as a far call, to support multiple segments and thus larger programs. Therefore, a RETURN or EXIT DEF statement assumes that a single address word is on the stack, where EXIT SUB or EXIT FUNCTION expect both a segment and address to be present (two words).

A problem can arise if you invoke a GOSUB routine within a SUB or FUNCTION procedure, and then attempt to exit the procedure from inside that subroutine with EXIT SUB or EXIT FUNCTION. If a GOSUB is active, EXIT SUB will incorrectly return to the segment and address that are currently on the stack. Unfortunately, the address is that of the statement following the GOSUB, and the "segment" is in fact the address portion of the original caller's return location. This is shown in Figure 5-2.

  ┌── This is the original caller's segment and address to return to.
  │
  │   │                         │
  │   │                         │
  │   ├─────────────────────────┤
  ├─> │ Caller's return segment │
  │   ├─────────────────────────┤
  └─> │ Caller's return address │ <─┐
      ├─────────────────────────┤   │
      │ GOSUB's return address  │ <─┤
      ├─────────────────────────┤   │
      │(next available location)│   │
      ├─────────────────────────┤   │
      │                         │   │
      │                         │   │
       These addresses will incorrectly ─┘
       be used as a segment and address.

Figure 5-2: The stack frame within a procedure while a GOSUB is pending.

To avoid this potential problem, the original caller's segment and address are saved when a subprogram or function is first invoked. The current stack pointer is also saved, so it can be restored to the correct value, no matter how deeply nested GOSUB calls may become. Then when the procedure is exited, another library routine is called that forces the originally saved segment and address to be on the stack in the correct position.
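The situation described above can be sketched with a short program. The procedure name Demo and the label Inner are hypothetical; the point is only that EXIT SUB executes while a GOSUB return address is still on the stack.

```basic
DECLARE SUB Demo ()

CALL Demo
PRINT "Back in the main program"
END

SUB Demo
  GOSUB Inner                  'pushes a one-word near return address
  PRINT "This line is never reached"
  EXIT SUB

Inner:
  EXIT SUB                     'leaves Demo while the GOSUB is pending
END SUB
```

Because BASIC saves the caller's segment and address when Demo is entered, the far return should still go back to the main program even though the GOSUB's near address was never removed by a RETURN.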
Because this process reduces the speed of procedure calls and adds to the resultant code size, the /ot option was introduced with BASIC 7 PDS. Using /ot tells BASIC not to employ the larger and slower method, unless you are in fact using a GOSUB statement within a procedure. Since this optimization is disabled automatically anyway in that case, it is curious that Microsoft requires a switch at all. That is, BC should simply optimize procedure calls where it can, and use the older method only when it has to.

/R

The /r switch tells BASIC to store multi-dimensioned arrays in row, rather than column order. All arrays, regardless of their type, are stored in a contiguous block of memory. Even though string data can be scattered in different places, the table of descriptors that comprise a string array is contiguous. When you dimension an array using two or more subscripts, each group of rows and columns is placed immediately after the preceding one. By default, BASIC stores multi-dimensioned arrays in column order, as shown in Figure 5-3.

  ┌─────────────┐
  │ Array(5, 2) │     ^
  ├─────────────┤     │
  │ Array(4, 2) │     │
  ├─────────────┤     │
  │ Array(3, 2) │     └── toward higher addresses
  ├─────────────┤
  │ Array(2, 2) │
  ├─────────────┤
  │ Array(1, 2) │
  ├─────────────┤
  │ Array(5, 1) │
  ├─────────────┤
  │ Array(4, 1) │
  ├─────────────┤
  │ Array(3, 1) │
  ├─────────────┤
  │ Array(2, 1) │
  ├─────────────┤
  │ Array(1, 1) │
  └─────────────┘

Figure 5-3: How BASIC stores a 2-dimensional array created using DIM Array(1 TO 5, 1 TO 2).

As you can see, the first subscript varies fastest: the elements Array(1, 1) through Array(5, 1) are stored in successive memory locations, followed by Array(1, 2) through Array(5, 2). In some situations it may be necessary to maintain arrays in row order, for example when interfacing with another language that expects array data to be organized that way [notably C or Pascal; FORTRAN stores arrays in column order, as BASIC does].
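The default column-order layout can be checked with a short sketch such as the one below. The variable names are arbitrary, and the formula assumes a static integer array (two bytes per element) dimensioned as in the figure.

```basic
DEFINT A-Z
DIM Array(1 TO 5, 1 TO 2)        'static integer array, 2 bytes each

FirstAddr& = VARPTR(Array(1, 1))
FOR Col = 1 TO 2
  FOR Row = 1 TO 5
    'offset predicted for column order: the first subscript varies fastest
    Expected = ((Col - 1) * 5 + (Row - 1)) * 2
    PRINT Row; Col, VARPTR(Array(Row, Col)) - FirstAddr&, Expected
  NEXT
NEXT
```

If the program were compiled with /r, the predicted offset would instead be ((Row - 1) * 2 + (Col - 1)) * 2.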
When an array is stored in row order, the elements are arranged such that Array(1, 1) is followed by Array(1, 2), which is then followed by Array(2, 1), Array(2, 2), Array(3, 1), and so forth. Although many of the BC option switches described here are also available for use with the QB editing environment, /r is not one of them.

/S

The /s switch has been included with BASIC since the first BASCOM 1.0 compiler, and it remains perhaps the least understood of all the BC options. Using /s affects your programs in two ways. The first is partially described in the BASIC manuals, which is to tell BC not to combine like string constants as it compiles your program. As you learned in Chapter 2, BASIC makes available as much string memory as possible in your programs, by consolidating identical constant string data. For example, if you have the statement PRINT "Insert disk in drive A" seven times in your program, the message is stored only once, and used for each instance of PRINT.

In order to combine like data the BC compiler examines each string as it is encountered, and then searches its own memory to see if that string is already present. Having to store all of the strings your program uses just to check for duplicates impinges on BC's own working memory. At some point it will run out of memory, since it also has to remember variable and procedure names, line labels and their corresponding addresses, and so on. When this happens, BC has no recourse but to give up and display an "Out of memory" error message.

The /s switch is intended to overcome this problem, because it tells the compiler not to store your program's string constants. Instead of retaining the strings in memory for comparison, each is simply added to the object file as it is encountered. However, strings four characters long or shorter are always combined, since short strings are very common and doing that does not require much of BC's memory.
The second [undocumented] thing /s does is to add two short (eight bytes each) assembly language subroutines to the very beginning of your program. Two of the most common string operations are assignments and concatenations, which are handled by routines in the runtime library. Normally, a call to either of these routines generates thirteen bytes of code, including the statements that pass the appropriate string addresses.

The subroutines that /s adds are accessed using a near rather than a far call, and they receive the string addresses in CPU registers rather than through the stack. Therefore, they can be called using between three and nine bytes, depending on whether the necessary addresses are already in the correct registers at the time. The inevitable trade-off, however, is that calling one subroutine that in turn calls another reduces the speed of your programs slightly.

In many cases--especially when there are few or no duplicated string constants--using /s will reduce the size of your programs. This is contrary to the Microsoft documentation which implies that /s will make your programs larger because the duplicate strings are not combined. I would like to see Microsoft include this second feature of /s as a separate option, perhaps using /ss (string subroutine) as a designator.

/T

The /t (terse) switch tells BC not to display its copyright notice or any warning (non-fatal) error messages. This option was not documented until BASIC PDS, even though it has been available since at least QuickBASIC 4.0. The only practical use I can see for /t is to reduce screen clutter, which is probably why QB and QBX use it when they shell to DOS to create an .EXE program.

/V and /W

Any programs that use event handling such as ON KEY, ON COM, ON PLAY, or the like [but not ON GOTO or ON GOSUB] require that you compile using either the /v or /w option switches.
These options do similar things, adding extra code to call a central handler that determines if action is needed to process an event. However, the /v switch checks for events at every program statement while /w checks only at numbered or labeled lines.

In Chapter 1 I described how event handling works in BASIC, using polling rather than true interrupt handling. There you saw how a five-byte call is required each time BASIC needs to see if an event has occurred. Because of this added overhead, many programmers prefer to avoid BASIC's event trapping statements in favor of manually polling when needed. However, it is important to point out that by using line numbers and labels sparingly in conjunction with /w, you can reduce the amount of extra code BASIC creates, thus controlling where such checking is performed.

/X

Like the /e switch, /x is used with ON ERROR and RESUME; however, /x increases substantially the size of your final .EXE program file. When RESUME, RESUME 0, or RESUME NEXT are used, BASIC needs a way to find where execution is to resume in your program. Unfortunately, this is not a simple task. Since a single BASIC source statement can create a long series of assembly language commands, there is no direct correlation between the two. When an error occurs and you use RESUME with no argument telling BASIC to execute the same statement again, it can't know directly how many bytes earlier that statement begins.

Therefore, when /x is specified, a numbered line marker is added in the object code to identify the start of every BASIC source statement. These markers comprise a linked list of statement addresses, and the RESUME statement walks through this list looking for the address that most closely precedes the offending BASIC statement. Because of the overhead to store these addresses--four bytes for each BASIC source statement--many professional programmers avoid using /x unless absolutely necessary.
However, the table of addresses is stored within the code segment, and does not take away from DGROUP memory.

/Z (BASIC PDS and later)

The /z switch is meant to be used in conjunction with the Microsoft editor. This editor is included with BASIC PDS, and allows editing programs that are too large to be contained within the QB and QBX editing environments. When a program is compiled with /z, BASIC includes line number information in the object file. The Microsoft editor can then read these numbers after an unsuccessful compile, to help you identify which lines were in error. Because the addition of these line number identifiers increases a program's size, /z should be used only for debugging and not in a final production.

In general, the Microsoft editor has not been widely accepted by BASIC programmers, primarily because it is large, slow, and complicated to use. Microsoft also includes a newer editing environment called the Programmer's Workbench with BASIC PDS; however, that too is generally shunned by serious developers for the same reasons.

/Zd

Like /z, the /zd switch tells BC to include line number information in the object file it creates. Unlike /zi which works with CodeView (see the /zi switch below), /zd is intended for use with the earlier SYMDEB debugger included with MASM 4.0. It is extremely unlikely that you will ever need to use /zd in your programming.

/Zi

The /zi option is used when you will execute your program in the Microsoft CodeView debugger. CodeView was described in Chapter 4, and there is no reason to repeat that information here. Like /z and /zd, /zi tells BC to include additional information about your program in the object file. Besides indicating which assembler statements correspond to which BASIC source lines, /zi also adds variable and procedure names and addresses to the file. This allows CodeView to display meaningful names as you step through the assembly language compiled code, instead of addresses only.
In order to create a CodeView-compatible program, you must also link with the /co LINK option. All of the options that LINK supports are listed elsewhere in this chapter, along with a complete explanation of what each does.

Note that CodeView cannot process a BASIC source file that has been saved in the Fast Load format. This type of file is created by default in QuickBASIC, when you save a newly created program. Therefore, you must be sure to select the ASCII option button manually from the Save File dialog box. In fact, there are so many bugs in the Fast Load method that you should never use it. Problems range from QuickBASIC hanging during the loading process to completely destroying your source file!

If a program that has been saved as ASCII is accidentally damaged, it is at least possible to reconstruct it or salvage most of it using a DOS tool such as the Norton Utilities. But a Fast Load file is compressed and encrypted; if even a single byte is corrupted, QB will refuse to load it. Since a Fast Load file doesn't really load that much faster than a plain ASCII file anyway, there is no compelling reason to use it. [Rather than fix the Fast Load bug, which Microsoft claims they cannot reproduce, beginning with PDS version 7 BASIC now defaults to storing programs as plain ASCII files.]

COMPILER METACOMMANDS

There are a number of compiler metacommands that you can use to control how your program is formatted in the listing file that BC optionally creates. Although these list file formatting options have been available since the original IBM BASCOM 1.0 compiler [which Microsoft wrote], they are not documented in the current versions.

As with '$INCLUDE and '$DYNAMIC and the other documented metacommands, each list formatting option is preceded by a REM or apostrophe, and a dollar sign. The requirement to imbed metacommands within remarks was originally to let programs run under the GW-BASIC interpreter without error.
Each of the available options is listed below, along with an explanation and range of acceptable values. Many options require a numeric parameter as well; in those cases the number is preceded by a colon. For example, a line width of 132 columns is specified using '$LINESIZE: 132. Other options such as '$PAGE do not require or accept parameters. Notice that variables may not be used for metacommand parameters, and you must use numbers. CONST values are also not allowed.

Understand that the list file that BASIC creates is of dubious value, except when debugging a program to determine the address at which a runtime error occurred. While a list file could be considered as part of the documentation for a finished program, it conveys no useful information. These formatting options are given here in the interest of completeness, and because they are not documented anywhere else. [In order to use any of these list options you must specify a list file name when compiling.]

'$LINESIZE

The '$LINESIZE option lets you control the width of the list file, to prevent or force line wrapping at a given column. The default list width is 80 columns, and any text that would have extended beyond that is instead continued on the next line. Many printers offer a 132-column mode, which you can take advantage of by using '$LINESIZE: 132. [Of course, it's up to you to send the correct codes to your printer before printing such a wide listing.] Note that the minimum legal width is 40, and the maximum is 255.

'$LIST

The '$LIST metacommand accepts either a minus (-) or plus (+) argument, to indicate that the listing should be turned off and on respectively. That is, using '$LIST - suspends the listing at that point in the program, and '$LIST + turns it back on. This option is useful to reduce the size of the list file and to save paper when a listing is not needed for the entire program.
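As a brief illustration of the two metacommands just described, the fragment below sets a 132-column listing and suppresses part of the program from the list file. The statements themselves are arbitrary filler; only the metacommand syntax comes from the descriptions above, and the effect is visible only when a list file name is given to BC.

```basic
'$LINESIZE: 132
DEFINT A-Z

'$LIST -
'These statements are compiled, but should not appear in the list file
FOR X = 1 TO 10
  Total = Total + X
NEXT
'$LIST +

PRINT Total            'listing resumes with the statements after '$LIST +
```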
'$PAGE

To afford control over the list file format, the '$PAGE metacommand forces subsequent printing to begin on the next page. Typically '$PAGE would be used prior to the start of a new section of code; for example, just before each new SUB or FUNCTION procedure. This tells BC to begin the procedure listing on a new page, to avoid starting it near the bottom of a page.

'$PAGEIF

'$PAGEIF is related to '$PAGE, except it lets you specify that a new page is to be started only if a certain minimum number of lines remain on the current page. For example, '$PAGEIF: 6 tells BC to advance to the next page only if there are six or fewer printable lines remaining.

'$PAGESIZE

You can specify the length of each page with the '$PAGESIZE metacommand, to override the 66-line default. This would be useful with laser printers, if you are using a small font that supports more than that many lines on each page. Notice that a 6-line bottom margin is added automatically, so specifying a page size of 66 results in only 60 actual lines of text on each page. The largest value that can be used with '$PAGESIZE is 255, and the smallest is 15. To set the page length to 100 lines you would use '$PAGESIZE: 100. There is no way to disable the page numbering altogether, and using a value outside this range results in a warning error message.

'$OCODE

Using '$OCODE (object code) allows you to turn the assembly language source listing on or off, using "+" or "-" arguments. Normally, the /a switch is needed to tell BC to include the assembly language code in the list file. But you can optionally begin a listing at any place in the program with '$OCODE +, and then turn it off again using '$OCODE -.

'$SKIP

Like '$PAGE and '$PAGEIF, the '$SKIP option lets you control the appearance of the source listing. '$SKIP accepts a colon and a numeric argument that tells BC to print that many blank lines in the list file or skip to the end of the page, whichever comes first.
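The page-control options above might be combined as in the following sketch. The program logic and the name ShowTotal are made up for the example; the metacommands use only the forms described in this section, and their effect is seen only in the list file.

```basic
'$PAGESIZE: 100
DEFINT A-Z
DECLARE SUB ShowTotal (Total)

'$OCODE +
Total = 2 + 3            'assembly code is listed for this statement
'$OCODE -
CALL ShowTotal(Total)
'$SKIP: 3
END

'$PAGEIF: 6
'A short block that should not begin within six lines of the page bottom
'$PAGE
SUB ShowTotal (Total)
  PRINT Total
END SUB
```

Placing '$PAGE immediately before the SUB follows the suggestion above of starting each procedure on a fresh page.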
'$TITLE and '$SUBTITLE

By default, each page of the list file has a header that shows the current page number, and date and time of compilation. The '$TITLE and '$SUBTITLE metacommands let you also specify one or two additional strings, which are listed at the start of each page. Using '$TITLE: 'My program' tells BASIC to print the text between the single quotes on the first line of each page. If a subtitle is also specified, it will be printed on the second line. Note that the title will be printed on the first page of the list file only if the '$TITLE metacommand is the very first line in the BASIC source file.

LINKING
=======

Once a program has been compiled to an object file, it must be linked with the routines in the BASIC library before it can be run. LINK combines one or more object files with routines in a library, and produces an executable program file having an .EXE extension. LINK is also used to create Quick Libraries for use in the QB editing environment, and that is discussed later in this chapter.

LINK can combine multiple BASIC object files, as well as object files created with other Microsoft-compatible languages. In the section that follows you will learn how the LINK command line is structured, what each parameter is for, and how the many available options may be used. Using the various LINK options can reduce the size of your programs, and help them run faster as well.

I should mention here it is imperative that you use the correct version of LINK. DOS comes with an old version of LINK.EXE that is not suitable for use with QuickBASIC or BASIC PDS. Therefore, you should always use the LINK.EXE program that came with your compiler. I also suggest that you remove or rename the copy of LINK that came with DOS if it is still on your hard disk. More than once I have seen programmers receive inexplicable LINK error messages because their PATH setting included the \DOS directory.
In particular, many of the switches that current versions of LINK support cause an "Unrecognized option" message from older versions. If the correct version of LINK is not in the current directory, then DOS will use its PATH variable to see where else to look, possibly running an older version.

The LINK command line is structured as follows, using brackets to indicate optional information. The example below is intended to be entered all on one line.

  link [/options] objfile [objfile] [libfile.lib], [exefile], [mapfile], [libfile] [libfile] [;]

As with the BC compiler, you may either enter all of the information on a single command, let LINK prompt you for the file names, or use a combination of the two. That is, you could enter LINK [filename] and let LINK prompt you for the remaining information. Default choices are displayed by LINK, and these are used if Enter alone is pressed. Typing a semicolon on a prompt line by itself or after a file name tells LINK to assume the default responses for the remaining fields.

LINK also lets you use a *response file* to hold the file names and options. When there are dozens or even hundreds of files being specified, this is the only practical method. Response files are described later in this section.

Also like BC, the separating commas are required as place holders when successive fields are omitted. For example, the command:

  link program , , mapfile;

links PROGRAM.OBJ to produce PROGRAM.EXE, and creates a map file with the name MAPFILE.MAP. If the second comma had not been included, the output file would be named MAPFILE.EXE and a map file would not be written at all.

The first LINK argument is one or more optional command switches, which let you control some of the ways in which LINK works. For example, the /co switch tells LINK to add line number and other information needed when debugging the resultant EXE program with CodeView.
Another option, /ex, tells LINK to reduce the size of the program using a primitive form of data compression. Each LINK option will be discussed in the section that follows, and we won't belabor them here. The second argument is the name of the main program object module, which contains the code that will be executed when the program is run from the DOS command line. Many programs use only a single object file; however, in a multi-module program you must list the main module first. That is then followed by the other modules that contain additional subprograms and functions. Of course, you can precede any file name with a drive letter and/or directory name as necessary. You may also specify that all of the object modules in an entire library be included in the executable program by entering the library name where the object name would be given. Since LINK assumes an .OBJ file extension, you must explicitly include the .LIB extension when linking an entire library. For example, the command link mainprog subs.lib; creates a program named MAINPROG.EXE which is comprised of the code in MAINPROG.OBJ and all of the routines in SUBS.LIB. Normally, a library is specified at the end of the LINK command line. However, in that case only the routines that are actually called will be added to the program. Placing a library name in the object name field tells LINK to add all of the routines it contains, regardless of whether they are actually needed. Normally you do not want LINK to include unused routines, but that is often needed when creating Quick Libraries which will be discussed in a moment. Notice that when more than one object file is given, the first listed is the one that is run initially. Its name is also used for the executable file name if an output file name is not otherwise given. Like the BC compiler, LINK assumes that you are using certain file naming conventions but lets you override those assumptions with explicit extensions. 
I recommend that you use the standard extensions, and avoid any unnecessary heartache and confusion. In particular, using non-standard names is a poor practice when more than one programmer is working on a project.

Also notice that either spaces or plus signs (+) may be used to separate each object and library file name. Which you use is a matter of personal preference.

The third LINK field is the optional executable output file name. If omitted, the program will use the base name of the first object file listed. Otherwise, the specified name will be used, and given an .EXE extension. Again, you can override the .EXE extension, but this is not recommended.

Following the output file name field is the map file entry. A map file contains information about the executable program, such as segment names and sizes, the size of the stack, and so forth. The /map option, which is described later, tells LINK to include additional information in the map file. In general, a map file is not useful in high-level language programming.

One interesting LINK quirk is that it will create a map file if empty commas are used, but not if a semicolon is used prior to that field. You can specify the reserved DOS device name nul to avoid creating a map file. For example, the command

  link program, , nul, library;

links PROGRAM.OBJ to create PROGRAM.EXE, but does not create the file PROGRAM.MAP. I use a similar line in the batch files I use for compiling and linking, to avoid cluttering my hard disk with these useless files.

The last field specifies one or more libraries that hold additional routines needed for the program. In purely BASIC programming you do not need to specify a library name, because the compiler specifies a default library in the object file header. If you are linking with assembly or other language subroutines that are in a library, you would list the library names here.
You can list any number of library names, and LINK will search each of them in turn looking for any routines it does not find in the object files.

The version of LINK that comes with BASIC 7 also accepts a definitions file as an optional last argument. But that is used only for OS/2 and Windows programming, and is not otherwise needed with BASIC.

LINK OPTIONS

All of the available LINK options that are useful with BASIC running under DOS are described below in alphabetical order. As with the switches supported by BC, each is specified on the LINK command line by preceding it with a forward slash (/). Many of the options may be abbreviated by entering just the first few letters of their name. For example, what I refer to as the /co option is actually named /codeview; however, the first two letters are sufficient for LINK to know what you mean.

Each option is described using only enough letters to understand the meaning of its name. You can see the full name for those options in the section headers below, or run LINK with the /help switch. Any switch may be specified using only as many characters as needed to distinguish it from other options. That is, /e is sufficient to indicate /exepack because it is the only one that starts with that letter. But you must use at least the first three characters of the /nologo switch, since /no could mean either /nologo or /nodefaultlibrary. The details section for each option shows the minimum letters that are actually needed.

/BATCH

Using /ba tells LINK that you are running it from a batch file, and that it is not to pause and prompt for library names it is unable to find. When /ba is used and external routines are not found, a warning message is issued rather than the usual prompt. The /ba option is not generally very useful--even if you are linking with a batch file--since it offers no chance to fix an incorrect file or directory name.
One interesting LINK quirk worth noting is when it is unable to find a library you must include a trailing backslash (\) after the path name when reentering it manually. If LINK displays the prompt "Enter new file spec:" and you type \pathname, you are telling LINK to use the library named PATHNAME.LIB and look for it in the root directory. What is really needed is to enter \pathname\, which tells it to look in that directory for the library. Furthermore, if you initially enter the directory incorrectly, you must then specify both the directory and library name. If you are not sure of the default library name it is often easier to simply press Ctrl-C and start again. /CODEVIEW The /co switch is necessary when preparing a program for debugging with CodeView. Because of the extra information that LINK adds to the resultant executable file, /co should be used only for debugging purposes. However, the added data is stored at the end of the file, and is not actually loaded into memory if the program is run from the DOS command line. The program will therefore have the same amount of memory available to it as if /co had not been used. /EXEPACK When /e is used, LINK compresses repeated character strings to reduce the executable file size. Because variables and static arrays are initialized to zero by the compiler, they are normally stored in the file as a group of CHR$(0) zero bytes. The /e switch tells LINK to replace these groups of zero bytes with a group count. Then when the program is run, the first code that actually executes is the unpacking code that LINK adds to your program. This is not unlike the various self-extracting archive utilities that are available commercially and as shareware. Notice that the compression algorithm LINK employs is not particularly sophisticated. For example, SLR System's OptLink is an alternate linker that reduces a program to a much smaller file size than Microsoft's LINK. 
PKWare and SEA Associates are two other third-party companies that produce utilities to create smaller executable files that unpack and run themselves automatically.

/FARCALLTRANSLATE

By default, all calls from BASIC to its runtime library routines are far calls, which means that both a segment and address are needed to specify the location of the routine being accessed. Assembly language and C routines meant to be used with BASIC are also designed as far calls, as are BASIC subprograms and functions. This affords the most flexibility, and also lets you create programs larger than could fit into a single 64K segment.

Within the BASIC runtime library there are both near and far calls to other library routines. Which is used depends on the routines involved, and how the various segments were named by the programmers at Microsoft. Because a far call is a five-byte instruction compared to a near call which is only three, a near call requires less code and can execute more quickly.

In many cases, separate code segments that are less than 64K in size can be combined by LINK to form a single segment. The routines in those segments could then be accessed using near calls. However, BASIC always generates far calls as it compiles your programs.

The /f option tells LINK to replace the far calls it encounters with near calls, if the target address is indeed close enough to be accessed with a near call. The improvement /f affords is further increased by also using the /packcode switch (see below).

Although the far call is replaced with a near call, LINK can't actually reduce the size of the original instruction. Instead it inserts a Nop (no operation) assembly language command where part of the far call had been. But since a near call does not require segment relocation information in the .EXE file header, the file size may be reduced slightly. See the text that accompanies Figure 5-1 earlier in this chapter for an explanation of DOS' loading and relocation process.
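The substitution described above can be sketched in a few lines of Python. This is an illustration only, not LINK's actual code. The function name translate_far_call is hypothetical, and the exact replacement sequence (Nop, then PUSH CS, then a near call) is an assumption based on how this optimization is usually documented; the text above says only that a Nop fills part of the space. The sketch rewrites one five-byte far call in place when its target lies in the same segment as the call itself.

```python
FAR_CALL = 0x9A   # opcode for CALL FAR seg:offset (5 bytes in all)
NOP = 0x90
PUSH_CS = 0x0E    # assumed: keeps the stack as a far-returning routine expects
NEAR_CALL = 0xE8  # opcode for CALL NEAR rel16 (3 bytes in all)


def translate_far_call(code: bytearray, pos: int, current_seg: int) -> bool:
    """Rewrite the far call at code[pos] as a near call, if possible.

    Returns True when the call was rewritten. The segment value is assumed
    to be resolved already; a real linker would take it from its own tables.
    """
    if code[pos] != FAR_CALL:
        return False
    target_off = code[pos + 1] | (code[pos + 2] << 8)
    target_seg = code[pos + 3] | (code[pos + 4] << 8)
    if target_seg != current_seg:
        return False              # too far away; leave the far call alone
    # A near call is relative to the address of the next instruction.
    rel = (target_off - (pos + 5)) & 0xFFFF
    code[pos:pos + 5] = bytes([NOP, PUSH_CS, NEAR_CALL, rel & 0xFF, rel >> 8])
    return True


code = bytearray([0x9A, 0x00, 0x02, 0x34, 0x12])  # CALL FAR 1234:0200
changed = translate_far_call(code, 0, current_seg=0x1234)
print(changed, code.hex())
```

Notice that the rewritten sequence still occupies five bytes, which is why the code itself does not shrink; any saving comes from the relocation entry that is no longer needed.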
There is one condition under which the /f option can cause your program to fail. The machine code for a far call is a byte with the value of &H9A, which is what LINK searches for as it converts the far calls to near ones. Most high-level languages store all data in a separate segment, which is ignored by LINK when servicing /f. BASIC, however, stores line label addresses in the program's code segment when ON GOTO and the other ON commands are used. If one of those addresses happens to be &H9A, then LINK may incorrectly change it. In my personal experience, I have never seen this happen. I recommend that you try /f in conjunction with /packc, and then test your program thoroughly. You could also examine any ON statements with CodeView if you are using them, to determine if an address happens to contain the byte &H9A.

/HELP

Starting LINK with the /he option tells it to display a list of all the command options it recognizes. This is useful both as a reminder, and to see what new features may have been added when upgrading to a newer compiler. In many cases, new compilers also include a new version of LINK.

/INFO

The /inf switch tells LINK to display a log of its activity on the screen as it processes your file. The name of each object file being linked is displayed, as are the routines being read from the libraries. It is extremely unlikely that you will find /inf very informative.

/LINENUM

If you have compiled with the /zd switch to create SYMDEB information, you will also need to specify the /li LINK switch. This tells LINK to read the line number information in the object file, and include it in the resultant executable program. SYMDEB is an awkward predecessor to CodeView that is also hard to use, and you are not likely to find /li useful.

/MAP

If you give a map file name when linking, LINK creates a file showing the names of every segment in your program. The /m switch tells LINK to also include all of the public symbol names.
A public symbol is any procedure or data in the object file whose address must be determined by LINK. This information is not particularly useful in purely BASIC programming, but it is occasionally helpful when writing subroutines in assembly language. Segment naming and grouping will be discussed in Chapter 13.

/NODEFAULTLIB

When BC compiles your program, it places the default runtime library name into the created object file's header. This way you can simply run LINK, without having to specify the correct library manually. Before BASIC PDS there were only two runtime library names you had to deal with--QuickBASIC 4.5 uses BCOM45.LIB and BRUN45.LIB. But PDS version 7.1 comes with 16 different libraries, each intended for a different use. For example, there are BRUN and BCOM libraries for every combination of near and far strings, IEEE and /fpa (alternate) math, and DOS and OS/2. That is, BRT71EFR.LIB stands for BASIC Runtime 7.1 Emulator Far strings Real mode. Likewise, BCL71ANP is for use with a BCOM stand-alone program using Alternate math and Near strings under OS/2 Protected mode.

Using /nod tells LINK not to use the library name imbedded within the object file, which of course means that you must specify a library name manually. The /nod switch also accepts an optional colon and explicit library name to exclude. That is, /nod:libname means use all of the default libraries listed in the object file except libname. In general, /nod is not useful with BASIC, unless you are using an alternate library such as Crescent Software's P.D.Q. Another possible use for /nod is if you have renamed the BASIC libraries.

/NOEXTDICT

As LINK combines the various object files that comprise your program with routines in the runtime library, it maintains a table of all the procedure and data names it encounters. Some of these names are in the object modules, such as the names of your BASIC subprograms and functions. Other procedure names are those in the library.
In some situations the same procedure or data name may be encountered more than once. For example, when you are linking with a stub file it will contain a routine with the same name as the one it replaces in BASIC's library. Usually, LINK will issue an error message when it finds more than one occurrence of a public name. If you use /noe (No Extended Dictionary) LINK knows to use the routine or data item it finds first, and not to issue an error message. The /noe option should be used only when necessary, because it causes LINK to run more slowly. Linking with stub files is described separately later in this chapter. /NOFARCALL The /nof switch is usually not needed, since by default LINK does not translate far calls to near ones (see /farcalltranslate earlier in this section). But since you can set an environment variable to tell LINK to assume /far automatically, /nof would be used to override that behavior. Setting LINK options through the use of environment variables is described later in this chapter. /NOLOGO The /nol switch tells LINK not to display its copyright notice, and, like the /t BC switch may be used to minimize screen clutter. /NOPACKCODE As with the /nof switch, /nop is not necessary unless you have established /packc as the default behavior using an environment variable. /OVERLAYINT When you have written a program that uses overlays, BASIC uses an *overlay manager* to handle loading subprograms and functions in pieces as they are needed. Instead of simply calling the overlay manager directly, it uses an interrupt. This is similar to how the routines in a BRUN library are accessed. BASIC by default uses Interrupt &H3F, which normally will not conflict with the interrupts used by DOS, the BIOS, or network adapter cards. If an interrupt conflict is occurring, you can use the /o switch to specify that a different interrupt number be used to invoke the overlay manager. 
This might be necessary in certain situations, perhaps when data acquisition or other special hardware is installed in the host PC.

/PACKCODE

The /packc switch is meant to be used with /far, and it combines multiple adjacent code segments into as few larger ones as possible. This enables the routines within those segments to call each other using near, rather than far calls. When combined with /f, /packc will make your programs slightly faster and possibly reduce their size.

/PAUSE

Using /pau tells LINK to pause after reading and processing the object and library files, but before writing the final executable program to disk. This is useful only when no hard drive is available, and all of the files will not fit onto a single floppy disk.

/QUICKLIB

The /q switch tells LINK that you are creating a Quick Library having a .QLB extension, rather than an .EXE program file. A Quick Library is a special file comprised of one or more object modules, that is loaded into the QB editing environment. Although BASIC can call routines written in non-BASIC languages, they must already be compiled or assembled. Since the BASIC editor can interpret only BASIC source code, Quick Libraries provide a way to access routines written in other languages. Creating and using Quick Libraries is discussed separately later in this chapter.

/SEGMENTS

The /seg: switch tells LINK to reserve memory for the specified number of segment names. When LINK begins, it allocates enough memory to hold 128 different segment names. This is not unlike using DIM in a BASIC program you might write to create a 128-element string array. If LINK encounters more than 128 names as it processes your program, it will terminate with a "Too many segments" error. When that happens, you must start LINK again using the /seg switch.

All of the segments in an object module that contain code or data are named according to a convention developed by Microsoft.
Segment naming allows routines in separate files to ultimately reside in the same memory segment. Routines in the same segment can access each other using near calls instead of far calls, which results in smaller and faster programs. Also, all data in a BASIC program is combined into a single segment, even when the data is brought in from different modules. LINK knows which segments are to be combined by looking for identical names. The routines in BASIC's runtime library use only a few different names, and it is not likely that you will need to use /seg in most situations. But when writing a large program that also incorporates many non-BASIC routines, it is possible to exceed the 128-name limit. It is also possible to exceed 128 segments when creating a very large Quick Library comprised of many individual routines. The /seg switch requires a trailing colon, followed by a number that indicates the number of segment names to reserve memory for. For example, to specify 250 segments you would use this command line: link /seg:250 program, , nul, library; In most cases, there is no harm in specifying a number that is too large, unless that takes memory LINK needs for other purposes. Besides the segment names, LINK must also remember object file names, procedure names, data variables that are shared among programs, and so forth. But if LINK runs out of memory while it is processing your program, it simply creates a temporary work file to hold the additional information. /STACK The /st: option lets you control the size of BASIC's stack. One situation where you might need to do this is if your program has deeply nested calls to non-static procedures. Likewise, calling a recursive subprogram or function that requires many levels of invocation will quickly consume stack space. You can increase the stack size in a QuickBASIC program by using the CLEAR command: CLEAR , , stacksize where stacksize specifies the number of bytes needed. 
However, CLEAR also clears all of your variables, closes all open files, and erases any arrays. Therefore, CLEAR is suitable only when used at the very beginning of a program. Unfortunately, this precludes you from using it in a chained-to program, since any variables being passed are destroyed. Using /stack: avoids this by letting you specify how much memory is to be set aside for the stack when you link the chained-to program. The /stack: option accepts a numeric argument, and can be used to specify the stack size selectively for each program module. For example, /stack:4096 specifies that a 4K block be set aside in DGROUP for use as a stack. Furthermore, you do not need to use the same value for each module. Since setting aside more stack memory than necessary impinges on available string space, you can override BASIC's default for only those modules that actually need it. Note that this switch is not needed or recommended if you have BASIC PDS, since that version includes the STACK statement for this purpose. STUB FILES (PDS and later) A stub file is an object module that contains an alternate version of a BASIC language statement. A stub file could also be an alternate library containing multiple object files. The primary purpose of a stub file is to let you replace one or more BASIC statements with an alternate version having reduced capability and hence smaller code. Some stub files completely remove a particular feature or language statement. Others offer increased functionality at the expense of additional code. Several stub files are included with BASIC PDS, to reduce the size of your programs. For example, NOCOM.OBJ removes the routines that handle serial communications, replacing them with code that prints the message "Feature stubbed out" in case you attempt to open a communications port. 
When BASIC compiles your program and sees a statement such as OPEN Some$ FOR OUTPUT AS #1, it has no way to know what the contents of Some$ will be when the program runs. That is, Some$ could hold a file name, a device name such as "CON" or "LPT1:", or a communications argument like "COM1:2400,N,8,1,RS,DS". Therefore, BASIC instructs LINK to include code to support all of those possibilities. It does this by placing all of the library routine names in the object file header. When the program runs, the code that handles OPEN examines Some$ and determines which routine to actually call. Within BASIC's runtime library are a number of individual object modules, each of which contains code to handle one or more BASIC statements. In chapter 1 you learned that how finely LINK can extract individual routines from BASIC's libraries depends on how the routines were combined in the original assembly language source files. In BASIC 7.1, using the SCREEN function in a program also causes LINK to add the routines that handle CSRLIN and POS(0), even if those statements are not used. This is because all three routines are in the same object module. The manner in which these routines are combined is called *granularity*, and a library's granularity dictates which routines can be replaced by a stub file. That is, a stub file that eliminated the code to support SCREEN would also remove CSRLIN and POS(0). Some of the stub files included with BASIC 7 PDS are NOGRAPH.OBJ, NOLPT.OBJ, and SMALLERR.OBJ. NOGRAPH.OBJ removes all support for graphics, NOLPT.OBJ eliminates the code needed to send data to a printer, and SMALLERR.OBJ contains a small subset of the many runtime error messages that a BASIC program normally contains. Other stub files selectively eliminate VGA or CGA graphics support, and another, OVLDOS21.OBJ, adds the extra code necessary for the BASIC overlay manager to operate with DOS 2.1. 
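Granularity can be modeled with a small Python sketch. The module names and most of their contents below are invented for illustration; only the grouping of SCREEN, CSRLIN and POS comes from the discussion above. The point is simply that a linker pulls in whole object modules, so a single referenced routine brings its neighbors along with it.

```python
# Hypothetical library layout: each object module holds one or more routines.
LIBRARY = {
    "scrnfunc.obj": ["SCREEN", "CSRLIN", "POS"],   # grouping described in the text
    "printer.obj": ["LPRINT"],                     # invented for this example
    "comm.obj": ["OPENCOM", "COMREAD"],            # invented for this example
}


def modules_needed(called, library=LIBRARY):
    """Return the modules a linker would pull in for the called routines."""
    return sorted(module for module, routines in library.items()
                  if any(name in routines for name in called))


def routines_linked(called, library=LIBRARY):
    """Return every routine that ends up in the program, wanted or not."""
    linked = []
    for module in modules_needed(called, library):
        linked.extend(library[module])
    return linked


print(routines_linked(["SCREEN"]))
```

Calling only SCREEN still links CSRLIN and POS in this model, which is also why a stub that replaces that one module necessarily removes all three routines.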
When linking with a stub file, it is essential that you use the /noe LINK switch, so LINK will not be confused by the presence of two routines with the same name. The general syntax for linking with a stub file is as follows: link /noe basfile stubfile; Of course, you could add other LINK options, such as /ex and /packc, and specify other object and library files that are needed as well. You can also create your own BASIC stub files, perhaps to produce a demo version of a program that has all features except the ability to save data to disk. In order for this to work, you must organize your subprograms and functions such that all of the routines that are to be stubbed out are in separate source files, or combined together in one file. In the example above, you would place the routines that save the data in a separate file. Then, simply create an empty subprogram that has the same name and the same number and type of parameters, and compile that separately. Finally, you would link the BASIC stub file with the rest of the program. Note that such a replacement file is not technically a stub, unless the BASIC routines being replaced have been compiled and placed into a library. But the idea is generally the same. QUICK LIBRARIES For many programmers, one of the most confusing aspects of Microsoft BASIC is creating and managing Quick Libraries. The concept is quite simple, however, and there are only a few rules you must follow. The primary purpose of a Quick Library is to let you access non-BASIC procedures from within the BASIC editor. For example, BASIC comes with a Quick Library that contains the Interrupt routine, to let you call DOS and BIOS system services. A Quick Library can contain routines written in any language, including BASIC. Although the BASIC editor provides a menu option to create a Quick Library, that will not be addressed here. Rather, I will show the steps necessary to invoke LINK manually from the DOS command line. 
There are several problems and limitations imposed by BASIC's automated menus, which can be overcome only by creating the library manually. One limitation is that the automated method adds all of the programs currently loaded into memory into the Quick Library, including the main program. Unfortunately, only subprograms and functions should be included. Code in the main module will never be executed, and its presence merely wastes the memory it occupies. Another, more serious problem is there's no way to specify a /seg parameter, which is needed when many routines are to be included in the library. [Actually, you can set a DOS environment variable that tells LINK to default to a given number of segments. But that too has problems when using VB/DOS, because the VB/DOS editor specifies a /seg: value manually, and incorrectly. Unfortunately, LINK honors the value passed to it by VB/DOS, rather than the value you assigned to the environment variable.] Quick Libraries are built from one or more object files using LINK with the /q switch, and once created may not be altered. Unlike the LIB.EXE library manager that lets you add and remove object files from an existing .LIB library, there is no way to modify a Quick Library. When LINK combines the various components of an executable file, it resolves the data and procedure addresses in each object module header. The header contains relocation information that shows the names of all external routines being called, as well as where in the object file the final address is to be placed. Since the address of an external routine is not known when the source file is compiled or assembled, the actual CALL instruction is left blank. This was described earlier in this chapter in the section *Overview of Compiling and Linking*. Resolving these data and procedure addresses is one of the jobs that LINK performs. 
Because the external names that had been in each object file are removed by LINK and replaced with numeric addresses, there is no way to reconstruct them later. Similarly, when LINK creates a Quick Library it resolves all incomplete addresses, and removes the information that shows where in the object module they were located. Thus, it is impossible to extract an object module from a Quick Library, or to modify it by adding or removing modules. Understand that the names of the procedures within the Quick Library are still present, so QuickBASIC can find them and know the addresses to call. But if a routine in a Quick Library in turn calls another routine in the library, the name of the called routine is lost. Creating a Quick Library Quick Libraries are created using the version of LINK that came with your compiler, and the general syntax is as follows: link /q obj1 [obj2] [library.lib] , , nul , support; The support library file shown above is included with BASIC, and its name will vary depending on your compiler version. The library that comes with QuickBASIC version 4.5 is named BQLB45.LIB; BASIC 7 instead includes QBXQLB.LIB for the same purpose. You must specify the appropriate support library name when creating a Quick Library. Notice that LINK also lets you include all of the routines in one or more conventional (.LIB) libraries. Simply list the library names where the object file names would go. The .LIB extension must be given, because .OBJ is the default extension that LINK assumes. You can also combine object files and multiple libraries in the same Quick Library like this: link /q obj1 obj2 lib1.lib lib2.lib , , nul , support; Although Quick Libraries are necessary for accessing non-BASIC subroutines, you can include compiled BASIC object files. In general, I recommend against doing that; however, there are some advantages. 
One advantage is that a compiled subprogram or function will usually require less memory, because comments are not included in the compiled code and long variable names are replaced with equivalent 2-byte addresses. Another advantage is that compiled code in a Quick Library can be loaded very quickly, thus avoiding the loading and parsing process needed when BASIC source code is loaded.

But there are several disadvantages to storing BASIC procedures in a Quick Library. One problem is that you cannot trace into them to determine the cause of an error. Another is that all of the routines in a Quick Library must be loaded together. If the files are retained in their original BASIC source form, you can selectively load and unload them as necessary.

The last disadvantage affects BASIC 7 [and VB/DOS] users only. The QBX [and VB/DOS] editors place certain subprogram and function procedures into expanded memory if any is available. Understand that not all procedures are placed there; only those whose BASIC source code size is between 1K and 16K. But Quick Libraries are always stored in conventional DOS memory. Therefore, more memory will be available to your programs if the procedures are still in source form, because they can be placed into EMS memory.

Note that when compiling BASIC PDS programs for placement in a Quick Library, it is essential that you compile using the /fs (far strings) option. Near strings are not supported within the QBX editor, and failing to use /fs will cause your program to fail spectacularly.

RESPONSE FILES

A response file contains information that LINK requires, and it can completely or partially replace the commands that would normally be given from the DOS command line. The most common use for a LINK response file is to specify a large number of object files.
If you are creating a Quick Library that contains dozens or even hundreds of separate object files, it is far easier to maintain the names in a file than to enter them each time manually. To tell LINK that it is to read its input from a response file enter an at sign (@) followed by the response file name, as shown below.

  link /q @quicklib.rsp

Since the /q switch was already given, the response file need only contain the remaining information. A typical response file is shown in the listing below.

  object1 +
  object2 +
  object3 +
  object4 +
  object5
  qlbname
  nul
  support

Even though this example lists only five object files, there could be as many as necessary. Each object file name except the last one is followed by a plus sign (+), so LINK will know that another object file name input line follows. The qlbname line indicates the output file name. If it is omitted and replaced with a blank line, the library will assume the name of the first object file but with a .QLB extension. In this case, the name would be OBJECT1.QLB. The nul entry could also be replaced with a blank line, in which case LINK would create a map file named OBJECT1.MAP. As shown in the earlier examples, the support library will actually be named BQLB45 or QBXQLB, depending on which version of BASIC you are using.

LINK recognizes several variations on the structure of a response file. For example, several object names could be placed on each line, up to the 126-character line length limit imposed by DOS. That is, you could have a response file like this:

  object1 object2 object3 +
  object4 object5 object6 +
  ...

I have found that placing only one name on each line makes it easier to maintain a large response file. That also lends itself to keeping the names in alphabetical order. You may also place the various option switches in a response file, by listing them on the first line with the object files:

  /ex /seg:250 object1 +
  object2 +
  ...
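When the list of object files is long, the response file itself can be generated rather than typed. The Python sketch below is one way to do that, following the one-name-per-line layout described above. The function name build_response and its parameters are hypothetical, and the sketch assumes that a blank line accepts LINK's default for a field, as the discussion above describes.

```python
def build_response(objects, output="", mapfile="nul", libraries="", options=""):
    """Return the text of a LINK response file, one object name per line."""
    if not objects:
        raise ValueError("at least one object file is required")
    lines = [name + " +" for name in objects[:-1]]
    lines.append(objects[-1])                 # the last name gets no plus sign
    if options:
        lines[0] = options + " " + lines[0]   # switches share the first line
    lines += [output, mapfile, libraries]     # a blank line accepts the default
    return "\n".join(lines) + "\n"


text = build_response(
    ["object1", "object2", "object3"], output="qlbname", libraries="support")
print(text, end="")
```

Keeping the object names in a sorted list and regenerating the file whenever a module is added avoids the most common mistake, which is a missing plus sign partway through the list.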
Response files can be used for conventional linking, and not just for creating Quick Libraries. This is useful when you are developing a very large project composed of many different modules. Regardless of what you are linking, however, understanding how response files are used is a valuable skill.


LINKING WITH BATCH FILES

Because so many options are needed to fully control the compiling and linking process, many programmers use a batch file to create their programs. The C.BAT batch file below compiles and links a single BASIC program module, and exploits DOS' replaceable batch parameter feature.

  bc /o /s /t %1;
  link /e /packc /far /seg:250 %1, , nul, mylib;

Like many programs, a batch file can also accept command line arguments. The first argument is known within the batch file as %1, the second is %2, and so forth, up to the ninth parameter. Therefore, when this file is started using this command:

  c myprog

the compiler is actually invoked with the command

  bc /o /s /t myprog;

The second line becomes

  link /e /packc /far /seg:250 myprog, , nul, mylib;

That is, every occurrence of the replaceable parameter %1 is replaced by the first (and in this case only) argument: myprog.

I often create a separate batch file for each new project I begin, to avoid having to type even the file name. I generally use the name C.BAT because its purpose is obvious, and it requires typing only one letter! Once the project is complete, I rename the batch file to have the same first name as the main BASIC program. This lets me see exactly how the program was created if I have to come back to it again months later. An example of a batch file that compiles and links three BASIC source files is shown below.

  bc /o /s /t mainprog;
  bc /o /s /t module1;
  bc /o /s /t module2;
  link /e /packc /far mainprog module1 module2, , nul, mylib;

Of course, you'd use the compiler and link switches that are appropriate to your particular project. You could also specify a LINK response file within a batch file.
In the example above you would replace the last line with a command such as this:

  link @mainprog.rsp;


LINKING WITH OVERLAYS (PDS and VB/DOS PRO EDITION ONLY)

At one time or another, most programmers face the problem of having an executable program become too large to fit into memory when run. With QuickBASIC your only recourse is to divide the program into separate .EXE files, and use CHAIN to go back and forth between them. This method requires a lot of planning, and doesn't lend itself to structured programming methods. Each program is a stand-alone main module, rather than a subprogram or function.

Worse, chaining often requires the same subroutine code to be duplicated in each program, since only one program can be loaded into memory at a time. If both PROGRAM1.EXE and PROGRAM2.EXE make calls to the same subprogram, that subprogram will have to be added to each program. Obviously, this wastes disk space. BASIC 6.0 included the BUILDRTM program to create custom runtime program files that combine common subroutine code with the BASIC runtime library. But that program is complicated to use and often buggy in operation.

Therefore, one of the most useful features introduced with BASIC 7 is support for program overlays. An overlay is a module containing one or more subprograms or functions, and it is loaded into memory only when needed. All overlaid modules are contained in a single .EXE file along with the main program, as opposed to the separate files needed when programs use CHAIN. The loading and unloading of modules is handled for you automatically by the overlay manager contained in the BASIC runtime library.

Consider, as an example, a large accounting program composed of three modules. The main module would consist of a menu that controls the remaining modules, and perhaps also contains some ancillary subprograms and functions. The second module would handle data entry, and the third would print all of the reports.
In this case, the data entry and reporting modules are not both required at the same time; only the module currently selected from the menu is necessary. Therefore, you would link those modules as overlays, and let BASIC's overlay manager load and unload them automatically when they are called. The overall structure of an overlaid program is shown in Figure 5-4.

  ┌────────────────────────────┐
  │ '**** MAINPROG.BAS         │
  │ CALL Menu(Choice)          │
  │ IF Choice = 1 THEN         │
  │   CALL EnterData           │
  │ ELSEIF Choice = 2 THEN     │
  │   CALL DoReports           │
  │ END IF                     │
  ├────────────────────────────┤
  │ SUB Menu(Choice)           │
  │   ...                      │
  │   CALL GetChoice(Choice)   │
  │   ...                      │
  │ END SUB                    │
  ├────────────────────────────┤
  │ SUB GetChoice(ChoiceNum)   │
  │   ...                      │
  │   ...                      │
  │ END SUB                    │
  └────────────────────────────┘

  ┌────────────────────────────┐
  │ '*** ENTERDAT.BAS          │
  │ SUB EnterData              │
  │   ...                      │
  │   CALL GetChoice(Choice)   │
  │   ...                      │
  │ END SUB                    │
  └────────────────────────────┘

  ┌────────────────────────────┐
  │ '*** REPORTS.BAS           │
  │ SUB DoReports              │
  │   PRINT "Which report? ";  │
  │   CALL GetChoice(Choice)   │
  │   ...                      │
  │   ...                      │
  │ END SUB                    │
  └────────────────────────────┘

Figure 5-4: The structure of a program that uses overlays.

Here, the main program is loaded into memory when the program is first run. Since the main program also contains the Menu and GetChoice subprograms, they too are initially loaded into memory. Understand that the main program is always present in memory, and only the overlaid modules are swapped in and out. Thus, EnterData and DoReports can both freely call the GetChoice subprogram which is always in memory, without incurring any delay to load it into memory from disk.

If the host computer has expanded memory, BASIC will use that to hold the overlaid modules. Since EMS can be accessed much more quickly than a disk, this reduces the load time to virtually instantaneous. You should be aware, however, that BASIC PDS contains a bug in the EMS portion of its overlay manager.
If EMS is present but less than 64K is available, your program will terminate with the error message "Insufficient EMS to load overlay." If no expanded memory is available, BASIC simply reads the overlaid modules from the original disk file each time they are called. It should also use the disk if it determines that there isn't enough EMS to handle the overlay requirements, but it doesn't. Therefore, it is up to your users to determine how much expanded memory is present, and disable the EMS driver in their PC if there isn't at least 64K.

To specify that a module is to be overlaid, simply surround its name with parentheses when linking. Using the earlier example shown in Figure 5-4, you would link MAINPROG.OBJ with ENTERDAT.OBJ and REPORTS.OBJ as follows:

  link mainprog (enterdat) (reports);

Of course, you may include any link switches that are needed, and also include any non-overlaid object files. Any object file names that are not surrounded by parentheses will be kept in memory at all times. Therefore, you should organize your programs such that subprograms and functions that are common to the entire application are always loaded. Otherwise, the program could become very slow if those procedures are swapped in and out of memory each time they are called.


OTHER LINK DETAILS

The BASIC PDS documentation lists no fewer than 143 different LINK error messages, and at one time or another you are bound to see at least some of those. LINK errors are divided into two general categories: warning errors and fatal errors.

Warning errors can sometimes be ignored. For example, failing to use the /noe switch when linking with a stub file produces the message "Symbol multiply defined", because LINK encountered the same procedure name in the stub file and in the runtime library. In this case LINK simply uses the first procedure it encountered. In general, however, you should not run a program whose linking resulted in any error messages.

Fatal errors are exactly that--an indication that LINK was unable to create the program successfully. Even if an .EXE file is produced, running it is almost certain to cause your PC to lock up. One example of a fatal error is "Unresolved external." This means that your program made a call to a procedure, but LINK wasn't able to find its name in the list of object and library files you gave it. Another fatal error is "Too many segments." You might think that LINK would be smart enough to finish reading the files, count the number of segment names it needs, and then restart itself reserving enough memory. Unfortunately, it isn't.

Regardless of the type of error messages you receive, it is impossible to read all of them if there are so many that they scroll off the screen. Although you can press Ctrl-P to tell DOS to echo the messages to your printer, there is an even better method. You can use the DOS redirection feature to send the messages to a disk file. This lets you load the file into a text editor for later perusal. To send all of LINK's output to a file, simply use the "greater than" symbol (>) followed by a file name, as follows:

  link [/options] [object files]; > error.log

Instead of displaying the messages on the screen, DOS intercepts and routes them to the ERROR.LOG file. It is important to understand that this is a DOS issue, and has nothing to do with LINK. Therefore, you can use this same general technique to redirect the output of most programs to a file. Note that using redirection causes *all* of the program's output to go to the file, not just the error messages. Therefore, nothing will appear to happen on the screen, since the copyright and sign-on notices are also redirected.

Another LINK detail you should be aware of is that numeric arguments may be given in either decimal or hexadecimal form. Any LINK option that expects a number--for example, the /seg: switch--may be given as a hexadecimal value by preceding the digits with 0x.
That is, /seg:0x100 is equivalent to /seg:256. The use of 0x is a C notation convention, and the "x" character is used because it sounds like "hex".

Finally, if you are using QuickBASIC 4.0 there is a nasty bug you should be aware of. All versions of QuickBASIC let you create an executable program from within the editing environment. And if a Quick Library is currently loaded, QB knows to link your program with a parallel .LIB library having the same name. But instead of specifying that library in the proper LINK field, QB 4.0 puts its name in the object file position. This causes LINK to add every routine in the library to your program, rather than only those routines that are actually called. There is no way to avoid this bug from within the editor, so QB 4.0 users must compile and link manually from DOS.


MAINTAINING LIBRARIES
=====================

As you already know, multiple object files may be stored in a single library. A library has a .LIB extension, and LINK can extract from it only those object modules actually needed as it creates an executable file. All current versions of Microsoft compiled BASIC include the LIB.EXE program, which lets you manage a library file. With LIB.EXE you can add and remove objects, extract a copy of a single object without actually deleting it from the library, and create a cross-referenced list of all the procedures contained therein.

It is important to understand that a .LIB library is very different from a Quick Library. A .LIB library is simply a collection of individual object files, with a header portion that tells which objects are present, and where in the library they are located. A Quick Library, on the other hand, contains the raw code and data only. The routines in a Quick Library do not contain any of the relocation and address information that was present in the original object module.

The runtime libraries that Microsoft includes with BASIC are .LIB libraries, as are third-party support libraries you might purchase.
You can also create your own libraries from both compiled BASIC code and assembly language subroutines. The primary purpose of using a library is to avoid having to list every object file needed manually. Another important use is to let LINK add only those routines actually necessary to your final .EXE program.

As with BC and LINK, you can invoke LIB giving all of the necessary parameters on a single command line, or wait for it to prompt you for the information. LIB can also read file names and options from a response file, which avoids having to enter many object names manually. A LIB response file is similar--but not identical--to a LINK response file. Using LIB response files will be described later in this section.

The general syntax of the LIB command line is shown below, with brackets indicating optional information.

  lib [/options] libname [commands] , [listfile] , [newlib] [;]

After any optional switches, the first parameter is the name of the library being manipulated, and that is followed by one or more commands that tell LIB what you want to do. A list file can also be created, and it contains the names of every object file in the library along with the procedure names each object contains. The last argument indicates an optional new library; if present LIB will leave the original library intact, and copy it to a new one applying the changes you have asked for.

There are three commands that can be used with LIB, and each is represented using a punctuation character. However, LIB lets you combine some of these commands, for a total of five separate actions. This is shown in Table 5-1.

  Command   Action
  =======   =========================================
     +      Add an object module or entire library.
     -      Remove an object module from the library.
     *      Extract a copy of an object module.
     -+     Replace an object module with a new one.
     -*     Extract and then remove an object module.

Table 5-1: The LIB commands for managing libraries.

To add the file NEWOBJ.OBJ to the existing library MYLIB.LIB you would use the plus sign (+) as follows:

  lib mylib +newobj;

And to update the library using a newer version of an object already present in the library you would instead use this:

  lib mylib -+d:\newstuff\anyobj;

As you can see, the combination operators use a sensible syntax. Here, you are instructing LIB to first remove ANYOBJ.OBJ from MYLIB.LIB, and then add a newer version in its place. A drive and directory are given just to show that it is possible, and how that would be specified.

To extract a copy of an object file from a library, use the asterisk (*) command. Again, you can specify a directory in which the extracted file is to be placed, as follows:

  lib mylib *\objdir\thisobj;

You should understand that LIB never actually modifies an existing library. Rather, it first renames the original library to have a .BAK extension, and then creates and modifies a new file using the original name. It is up to you to delete the backup copy once you are certain that the new library is correct. [But this backup is made only if you do not specify a new output library name--NEWLIB in the earlier syntax example.]

If the named library does not exist, LIB asks if you want to create it. This gives you a chance to abort the process if you accidentally typed the wrong name. If you really do want to create a new library, simply answer Y (Yes) at the prompt. Of course, the only thing you can do to a non-existent library is add new objects to it with the plus (+) command.

One important LIB feature is its ability to create a list file showing what routines are present in the library. This is particularly valuable if you are managing a library you did not create, such as a library purchased from a third-party vendor. Many vendors use the same name for the object file as the routine it contains when possible, but there are exceptions.
For example, an object file name is limited to eight characters, even though procedure names can be as long as 40. If you want to know which object file contains the procedure ReadDirectories, you will need to create a list file. Also, one object file can hold multiple procedures, and it is not always obvious which procedure is in which file. Individual procedures cannot necessarily be extracted from a library--only entire object files.

To create a library list file you will run LIB giving the name of the library, as well as the name of a list file to create. The example below creates a list file named MYLIST.LST for the library named MYLIB.LIB:

  lib mylib , mylist.lst;

The list file that is created contains two cross-referenced tables; one shows each object name and the procedures it contains, and the other shows the procedure names and which object they are in. A typical list file is shown in Figure 5-5, using the QB.LIB file that comes with QuickBASIC 4.5 as an example.

  ABSOLUTE..........absolute          INT86OLD..........int86old
  INT86XOLD.........int86old          INTERRUPT.........intrpt
  INTERRUPTX........intrpt

  absolute          Offset: 00000010H  Code and data size: cH
    ABSOLUTE

  intrpt            Offset: 000000e0H  Code and data size: 107H
    INTERRUPT         INTERRUPTX

  int86old          Offset: 000002a0H  Code and data size: 11eH
    INT86OLD          INT86XOLD

Figure 5-5: The format of a LIB list file.

In this list file, the absolute module contains a single procedure, while intrpt and int86old each contain two. The first section shows each procedure name in upper case, followed by the object name in lower case. The second section shows each object file name, its offset within the library and size in bytes, and the routine names within that object file.

Just for fun, you should create a list file from one of the libraries that came with your compiler. Besides showing how a large listing is structured, you will also be able to see which statements are combined with others in the same object file. Thus, you can determine the granularity of these libraries.

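For a large library it can be handy to search the list file mechanically rather than by eye. The Python sketch below is an illustration only: it assumes the first section of the list file looks like the sample in Figure 5-5 (a procedure name, a run of periods, then the object name), and the names SAMPLE, ENTRY, and procedure_index are hypothetical. It builds a table that answers the question "which object file holds this procedure?"

```python
import re

# First section of a list file, laid out like the sample in Figure 5-5.
SAMPLE = """\
ABSOLUTE..........absolute          INT86OLD..........int86old
INT86XOLD.........int86old          INTERRUPT.........intrpt
INTERRUPTX........intrpt
"""

# A procedure name, a run of two or more periods, then the object name.
ENTRY = re.compile(r"(\S+?)\.{2,}(\S+)")


def procedure_index(listing):
    """Map each procedure name to the object module that holds it."""
    return {proc: obj for proc, obj in ENTRY.findall(listing)}


index = procedure_index(SAMPLE)
print(index["INTERRUPTX"])  # the object module that holds INTERRUPTX
```

Lines from the second section of the list file contain no run of periods, so the pattern skips them. A real list file may differ in spacing from the sample, which is why the pattern relies only on the periods and not on column positions.
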
In many cases the names of the procedures are similar to the corresponding BASIC keywords. For example, if you create a list file for the BCOM45.LIB library that comes with QuickBASIC 4.5, you will see an object file named STRFCN.OBJ (string function) that contains the procedures B$FASC, B$FLEN, B$FMID, B$INS2, B$INS3, B$LCAS, B$LEFT, and several other string functions. Most of the library routines start with the characters B$, which ensures that the names will not conflict with procedure names you are using. (A dollar sign is illegal in a BASIC procedure name.) Other procedures (and data items) use an embedded underscore (_) which is also illegal in BASIC.

FASC stands for Function ASC, FLEN is for Function LEN, and so forth. INS2 and INS3 contain the code to handle BASIC's INSTR function, with the first being the two-argument version and the second the three-argument version. That is, using INSTR(Work$, Substring$) calls B$INS2, and INSTR(Start, Work$, Substring$) instead calls B$INS3. As you can see, most of the internal procedure names are sensible, albeit somewhat abbreviated.


LIB OPTIONS

Many LIB options are frankly not that useful to purely BASIC programming. However, I will list them here in the interest of completeness. Note that none of these option switches are available in versions of LIB prior to the one that comes with BASIC 7.0.

/HELP
As with the LINK switch of the same name, using /help (or /?) tells LIB to display its command syntax, and a list of all the available options.

/I
Using /i means that LIB should ignore capitalization when searching the library for procedure names. This is the default for LIB, and is not necessary unless you are manipulating an existing library that was created with /noi (see below).

/NOE
The /noe option has a similar meaning to its LINK counterpart, and should be used if LIB reports an Out of memory error. Creating an extended dictionary requires memory, and using /noe will avoid that.

/NOI
The /noi switch tells LIB not to ignore capitalization, and it should not be used with BASIC programs.

/NOLOGO
Like the LINK option, /nologo reduces screen clutter by eliminating the sign-on logo and copyright display.

/PA
The /pa: option lets you change the default library page size of 16 bytes. Larger values waste memory, because each object file always occupies a whole number of pages. For example, with a page size of 256 bytes, a 50 byte object file will require an entire 256-byte page. Since a library can hold no more than 65,536 pages, a larger page size is useful only when you need to create a library larger than 1 megabyte (65,536 pages of 16 bytes each). The /pa: switch requires a colon, followed by an integer value between 16 and 32768 that is a power of 2. For example, using /pa:256 sets a page size of 256 bytes.


USING RESPONSE FILES WITH LIB.EXE

A LIB response file is similar to a LINK response file, in that it lets you specify a large number of operations by entering them on separate lines of a text file. The syntax is similar to a LINK response file, but it is not identical. Since the plus sign continuation character that LINK uses serves as a command character to LIB, an ampersand (&) is used instead. A typical LIB response file is shown below.

  + object1 &
  + \subdir\object2 &
  + c:\subdir2\object3 &
  + object4 ;

As with LINK, you will use an at sign (@) to tell LIB to look in the file for its input, as opposed to reading the names from the command line:

  lib @filename.rsp


USEFUL BC, LINK, AND LIB ENVIRONMENT PARAMETERS
===============================================

Most programmers are familiar with the DOS environment as a way to establish PATH and PROMPT variables. The PATH environment variable tells DOS where to search for executable program files it doesn't find in the current directory. The PROMPT variable specifies a new prompt that DOS displays at the command line. For example, many people use the command SET PROMPT=$P$G to show the current drive and directory.
However, the DOS environment can be used to hold other, more general information as well. The environment is simply an area of memory that DOS maintains to hold variables you have assigned. Some of these variables are used by DOS, such as the PATH and PROMPT settings. Other variables may be defined by you or your programs, to hold any type of information. For example, you could enter SET USERNAME=TAMI in the AUTOEXEC.BAT file, and a program could read that to know the name of the person who is using it. The contents of this variable (TAMI) could then be used as a file or directory name, or for any other purpose.

LINK looks at the DOS environment to see if you have specified LINK= or LIB= or TMP= variables. The first is used to specify default option switches. For example, if you set LINK=/SEG:450 from the DOS command line or a batch file, you do not need to use that option each time LINK is run. Multiple options may be included in a single SET statement, by listing each in succession. The command SET LINK=/NOE/NOD/EX establishes those three options as the default. Additional separating spaces may also be included; however, that is unnecessary and wastes environment memory.

Likewise, setting LIB=D:\LIBDIR\ tells LINK to look in the LIBDIR directory of drive D: for any libraries it cannot find in the current directory. In this case, LIB= acts as a sort of PATH command. Like PATH, the LIB= variable accepts multiple path names with or without drive letters, and each is separated by a semicolon. The command SET LIB=C:\LIBS\;D:\WORKDIR\ sets a library path to both C:\LIBS and D:\WORKDIR, and even more directories could be added if needed. To remove an environment variable simply assign it to a null value; in this case you would use SET LIB=.

The TMP= variable also specifies a path that tells LINK where to write any temporary files. When a very large program or Quick Library is being created, it is possible for LINK to run out of memory.
Rather than abort with an error message, LINK will open a temporary disk file and spool the excess data to that file. If no TMP= variable has been defined, that file is created in the current directory. However, if you have a RAM disk you can specify that as the TMP parameter, to speed up the linking process. For example, SET TMP=F:\ establishes the root directory of drive F as the temporary directory.

The INCLUDE= variable is recognized by both BC and MASM (the Microsoft Macro Assembler program), to specify where they should look for Include files. In my own programming, I prefer to give an explicit directory name as part of the $INCLUDE metacommand. This avoids unpleasant surprises when an obsolete version of a file is accidentally included. But you may also store all $INCLUDE files in a single directory, and then set the INCLUDE variable to show where that directory is. Like LIB and PATH, the INCLUDE variable accepts one or more directory names separated by semicolons.


SUMMARY
=======

In this chapter you have learned about compiling and linking manually from the DOS command line, to avoid the limitations imposed by the automated menus in the BASIC editor. You have also learned how to create and maintain both Quick Libraries and conventional .LIB libraries. Besides accepting information you enter at the DOS command line, LINK and LIB can also process instructions and file names contained in a response file.

All of the commands and option switches available with BC, LINK, and LIB were described in detail, along with a listing of the undocumented BC metacommands for controlling the format of a compiler list file. Library list files were also discussed, and a sample printout was given showing how LIB shows all the procedure and object names in a library cross-referenced alphabetically.

The discussion about stub files explained what they are and how to use them, to reduce the size of your programs.
Overlays were also covered, accompanied by some reasons you will find them useful along with specific linking instructions.

Finally, I explained some of the details of the linking process. Information in each object file header tells LINK the names of external procedures being called, and where in the object file the incomplete addresses are located. Besides the segment and address fixups that LINK performs, DOS also makes some last-minute patches to your program as it is loaded into memory.

In the next chapter I will cover file handling in similar detail, explaining how files are manipulated at a low level, and also offering numerous tips for achieving high performance and small program size.