CHAPTER 5
COMPILING AND LINKING
The final step in the creation of any program is compiling and linking, to
produce a stand-alone .EXE file. Although you can run a program in the
BASIC editing environment, it cannot be used by others unless they also
have their own copy of BASIC. In preceding chapters I explained the
fundamental role of the BASIC compiler, and how it translates BASIC source
statements to assembly language. However, that is only an intermediate
action. Before a final executable program can be created, the compiled
code in the object file must be joined to routines in the BASIC language
library. This process is called linking, and it is performed by the LINK
program that comes with BASIC.
In this chapter you will learn about the many options and features
available with the BASIC compiler and LINK. By thoroughly understanding
all of the capabilities these programs offer, you will be able to create
applications that are as small and fast as possible. Many programmers are
content to let the BASIC editor create the final program using its pulldown
menus, and never advance beyond the "Make .EXE" selection. It is certainly
possible to create a program without invoking BC and LINK manually. But
only by fully understanding the many options that are available will you
achieve the highest performance possible from your programs.
I'll begin with a brief summary of the compiling and linking process,
and explain how the two processes interact. I will then move on to more
advanced aspects of compiling and linking. BC and LINK are very complex
programs which possess many features and capabilities, and all of their
many options will be described throughout this chapter. You may also refer
back to Chapter 1, which describes compiling in more detail.
AN OVERVIEW OF COMPILING AND LINKING
====================================
When you run the BC.EXE compiler, it reads your BASIC source code and
translates some statements directly into the equivalent assembly language
commands. In particular, integer math and comparisons are converted
directly, as well as integer-controlled DO, WHILE, and FOR loops. Floating
point arithmetic and comparisons, and string operations and comparisons are
instead translated to calls to existing routines written by the programmers
at Microsoft. These routines are in the BCOM and BRUN libraries that come
with BASIC.
As BC compiles your program, it creates an object file (having an .OBJ
extension) that contains both the translated code as well as header
information that LINK needs to create a final executable program. Some
examples of the information in an object file header are the name of the
original source file, copyright notices, offsets within the file that
specify external procedures whose addresses are not known at compile time,
and code and data segment names. In truth, most of this header information
is of little or no relevance to the BASIC programmer; however, it is useful
to know that it exists. All Microsoft-compatible object files use the same
header structure, regardless of the original source language they were
written in.
The LINK program is responsible for combining the object code that BC
produces with the routines in the BASIC libraries. A library (any file
with a .LIB extension) is merely a collection of individual object files,
combined one after the other in an organized manner. A header portion of
the .LIB file holds the name of each object file and the procedure names
contained therein, as well as the offset within the library where each
object module is located. Therefore, LINK identifies which routines are
being accessed by the BASIC program, and searches the library file for the
procedures with those names. Once found, a copy of that portion of the
library is then appended to the .EXE file being created.
LINK can also join multiple object files compiled by BC to create a
single executable program, and it can produce a Quick Library comprised of
one or more object files. Quick Libraries are used only in the editing
environment, primarily to let BASIC access non-BASIC procedures. Because
the BASIC editor is really an interpreter and not a true compiler, Quick
Libraries were devised as a way to let you call compiled (or assembled)
subroutines during the development of a program.
When LINK is invoked it reads the header information in each object
file compiled by BC, and uses that to know which routines in the specified
library or libraries must be added to your program. Since every external
routine is listed by name, LINK simply examines the library header for the
same name. It is worth mentioning that BASIC places the name of the
default library in the object file, so you don't have to specify it when
linking. For example, when you compile a stand-alone program (with the /o
switch) using BC version 4.5, it places the name BCOM45.LIB in the header.
BASIC is not responsible for determining where external routines are
located. If your program uses a PRINT statement, the compiler generates
the instruction CALL 0000:0000, and identifies where in the object file
that instruction is located. BASIC knows that the print routine will be
located in another segment, and so leaves room for both a segment and
address in the Call instruction. But it doesn't know where in the final
executable file the print routine will end up. The absolute address
depends on how many other modules will be linked with the current object
file, and the size of the main program.
In fact, LINK does not even know in which segment a given routine will
ultimately reside. While it can resolve all of the code and data addresses
among modules, the absolute segment in which the program will be loaded
depends on whether there are TSR programs in memory, the version of DOS
(and thus its size), and the number of buffers specified in the host PC's
CONFIG.SYS file, among other factors. Therefore, all .EXE files also have
a header portion to identify segment references. DOS actually modifies the
program, assigning the final segment values as it loads the program into
memory. Figure 5-1 shows how DOS, file buffers, and device drivers are
loaded in memory, before any executable programs.
┌─────────────────────┐
│  ROM BIOS routines  │
├─────────────────────┤
│    Video memory     │
╞═════════════════════╡ <-- top of DOS memory (640K boundary)
│                     │
│  Far heap storage   │
│ for dynamic arrays  │
│                     │
├─────────────────────┤
│    String memory    │
├─────────────────────┤
│      The stack      │
├─────────────────────┤
│    Variable data    │
├─────────────────────┤
│ Compiled BASIC code │
╞═════════════════════╡ <-- this address is changeable
│    TSR programs     │
├─────────────────────┤
│   Device drivers    │
├─────────────────────┤
│ File control blocks │
├─────────────────────┤
│    File buffers     │
├─────────────────────┤
│     DOS program     │
├─────────────────────┤ <-- address 0000:0600
│   BIOS work area    │
├─────────────────────┤ <-- address 0000:0400
│  Interrupt vectors  │
└─────────────────────┘ <-- bottom of memory
Figure 5-1: DOS and BASIC memory organization.
It is important to understand that library routines are added to your
program only once, regardless of how many times they are called. Even if
you use PRINT three hundred times in a program, only one instance of the
PRINT routine is included in the final .EXE file. LINK simply modifies
each use of PRINT to call the same memory address. Further, LINK is
generally smart enough to not add all of the routines in the library.
Rather, it just includes those that are actually called.
However, LINK can extract only entire object files from a library. If
a single object module contains, say, four routines, all of them will be
added, even if only one is called. For BASIC modules that you write, you
can control which procedures are in which object files, and thus how they
are combined. But you have no control over how the object modules provided
with BASIC were written. If the routines that handle POS(0), CSRLIN, and
SCREEN are contained in a single assembly language source file (and they
are), all of them are added to your program even if you use only one of
those BASIC statements.
Now that you understand what compiling and linking are all about, you
may wonder why it is necessary to know this, or why you would ever want to
compile manually from the DOS command line. The most important reason is
to control fully the many available compile and link options. For example,
when you let the BASIC editor compile for you, there is no way to override
BC's default size for the communications receive buffer. Likewise, the
QuickBASIC editor does not let you specify the /s (string) option that in
many cases will reduce the size of your programs.
LINK offers many powerful options as well, such as the ability to
combine code segments to achieve faster performance during procedure calls.
Another important LINK option lets you create an .EXE file that can be run
under CodeView. Again, these options are not selectable from within the
QuickBASIC environment [but PDS and VB/DOS Pro Edition let you select more
options than QuickBASIC], and they can be specified only by compiling and
linking manually. All of these options are established via command line
switches, and each will be discussed in turn momentarily.
Finally, BASIC PDS includes a number of *stub files* which reduce the
size of your programs, although at the expense of decreased functionality.
For example, if your program does not use the SCREEN statement to enable
graphics mode, a stub file is provided to eliminate graphics support for
the PRINT statement. BASIC PDS [and the VB/DOS Pro Edition] also support
program overlays, and to use those requires linking manually from DOS.
COMPILING
=========
To compile a program you run BC.EXE specifying the name of the BASIC
program source file. BC accepts several optional parameters, as well as
many optional command line switches. The general syntax for BC is as
follows, with brackets used to indicate optional information.
bc program [/options] [, object] [, listfile] [;]
In most cases you will simply give the name of the BASIC source file, any
option switches, and a terminating semicolon. A typical BC command is as
follows:
bc program /o;
Here, a BASIC source file named PROGRAM.BAS is being compiled, and the
output object file will be called PROGRAM.OBJ. The /o option indicates
that the program will be a stand-alone .EXE file that does not require the
BRUN library to be present at runtime. If the semicolon is omitted, the
compiler will prompt for each of the file name parameters it needs. For
example, entering bc program /o invokes the compiler, which then prompts
you for the output and listing file names. Pressing Enter in response to
any prompt tells BC to use the source file's first name. You may also
start BC with no source file name, and let it prompt for that as well.
In most cases the default file names are acceptable; however, it is
not uncommon to want the output file placed into a different directory.
This is done as follows:
bc program, \objdir\ /o;
[Note that if the trailing backslash were omitted from \objdir\ above, BC
would create an output file named OBJDIR.OBJ in the root directory. Of
course, that is not what is intended. Therefore, a trailing backslash is
added to tell BC to use the default name of PROGRAM.OBJ, and to place that
file in the directory named \OBJDIR.]
If you are letting BC prompt you for the file names, you would enter
the output path name at that prompt position. You may also include a drive
letter as part of the path, or a drive letter only to use the default
directory on the specified drive. The listing that follows shows a typical
BC session that uses prompting.
C>bc program /o
Microsoft (R) QuickBASIC Compiler Version 4.50
(C) Copyright Microsoft Corporation 1982-1988.
All rights reserved.
Simultaneously published in the U.S. and Canada.
Object Filename [PROGRAM.OBJ]: d:\objects\ <Enter>
Source Listing [NUL.LST]: <Enter>
43965 Bytes Free
43751 Bytes Available
0 Warning Error(s)
0 Severe Error(s)
C>
Although you can override the default file extensions, this is not common
and you shouldn't do that unless you have a good reason to. For example,
the command BC source.txt , output.out; will compile a BASIC source file
named SOURCE.TXT and create an object module named OUTPUT.OUT. Since there
are already standard default file extension conventions, I recommend
against using any others you devise.
The optional list file contains a source listing of the BASIC program
showing the addresses of each program statement, and uses a .LST extension
by default. There are a number of undocumented options you can specify to
control how the list file is formatted, and these are described later in
this chapter in the section *Compiler Metacommands*. A list file may also
include the compiler-generated assembly language instructions, and you
specify that with the /a option switch. All of the various command options
will be discussed in the section following.
Notice that the positioning of the file name delimiting commas must be
maintained when the object file name is omitted. If you plan to accept the
default file name but also want to specify a listing file, you must use two
commas like this:
bc source , , listfile;
The Bytes Available and Bytes Free messages indicate how much working
memory the compiler has at its disposal, and how much of it remained free
while compiling your program. BC must keep track of many different kinds of
information as it processes your source code, and it uses its own internal
DGROUP memory for that. For example, every variable that you use must be
remembered, as well as its address.
When BASIC sees a statement such as X = 100, it must look in its
*symbol table* to see if it has already encountered that variable. If so,
it creates an assembly language instruction to store the value 100 at the
corresponding address. Otherwise, it adds the variable X to the table,
assigns a new address for it, and then adds code to assign the value 100 to
that address. When you use PRINT X later on, BASIC will again search its
table, find the address, and use that when it creates the code that calls
the PRINT routine.
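The look-up just described can be modeled in a few lines of BASIC. This is only a sketch of the idea and not the compiler's actual code: the LookUp% function, the array names, the 100-entry table size, and the assumption that every variable occupies two bytes are all invented for illustration.

```basic
DECLARE FUNCTION LookUp% (VarName$)

CONST MaxSyms% = 100                    'hypothetical table size
DIM SHARED SymName$(1 TO MaxSyms%)      'names encountered so far
DIM SHARED SymAddr%(1 TO MaxSyms%)      'address assigned to each name
DIM SHARED NumSyms%, NextAddr%

PRINT LookUp%("X")      'first use: X is added to the table
PRINT LookUp%("Y")      'Y is added at the next free address
PRINT LookUp%("X")      'X is found, and the same address is returned

FUNCTION LookUp% (VarName$)
    '----- search the table for a name that was seen earlier
    FOR N% = 1 TO NumSyms%
        IF SymName$(N%) = VarName$ THEN
            LookUp% = SymAddr%(N%)
            EXIT FUNCTION
        END IF
    NEXT

    '----- not found, so add it and assign a new address (a real
    '      table would also check that it has not become full)
    NumSyms% = NumSyms% + 1
    SymName$(NumSyms%) = VarName$
    SymAddr%(NumSyms%) = NextAddr%
    LookUp% = NextAddr%
    NextAddr% = NextAddr% + 2   'assumes a two-byte integer variable
END FUNCTION
```

The first and third calls return the same address, which is the whole point: no matter how many times a variable is referenced, it is stored only once.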
Other data that BASIC must remember as it works includes the number
and type of arguments for each SUB or FUNCTION that is declared, line label
names and their corresponding addresses, and quoted string constants. As
you may recall, in Chapter 2 I explained that BC maintains a table of
string constants, and stores each in the final program only once. Even
when the same quoted string is used in different places in a program, BC
remembers that they are the same and stores only a single copy. Therefore,
an array is used by BC to store these strings while your program is being
compiled.
In most cases you can simply ignore the Bytes Available and Bytes Free
messages, since how much memory BASIC used or had available is of no
consequence. The only exception, of course, is when your program is so
large that BC needed more than was available. But again, you will receive
an error message when that occurs. However, if you notice that the Bytes
Free value is approaching zero, you should consider splitting your program
into separate modules.
The error message display indicates any errors that occurred during
compilation, and if so how many. This display is mostly a throwback to
the earlier versions of the BASIC compiler, because they had no development
environment. These days, most people get their program working correctly
in the BASIC editor, before attempting to compile it. Of course, there
must still be a facility for reporting errors.
In most cases, any errors that BC reports will be severe errors.
These include a mismatched number of parentheses, using a reserved word as
a variable name (for example, PRINT = 12), and so forth. One example of a
warning error is referencing an array that has not been dimensioned. When
this happens, BASIC creates the array with a default 11 elements (0 through
10), and then reports that it did this as a warning.
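As a minimal illustration of that warning, the short fragment below uses an array without first dimensioning it. The array name is arbitrary, and I am assuming no OPTION BASE statement is in effect.

```basic
'----- Totals% is never dimensioned, so BASIC creates it with the
'      default elements 0 through 10
Totals%(10) = 5
PRINT LBOUND(Totals%), UBOUND(Totals%)
```

The program should report bounds of 0 and 10. Adding an explicit DIM statement is the way to avoid the warning.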
One interesting quirk worth mentioning is that BASIC will not let you
compile a program named USER.BAS. If you enter BC USER, BC assumes that
you intend to enter the entire program manually, statement by statement!
This too must be a holdover from earlier versions of the compiler; however,
when USER.BAS is specified it will appear that the compiler has crashed,
because nothing happens and no prompt is displayed. In my testing with
BASIC 7.1, any statements I entered were also ignored, and no object file
was created.
COMPILER OPTIONS
All of the options available for use with the BASIC compiler are described
in this section in alphabetical order. Some options pertain only to BASIC
7 PDS, and these are noted in the accompanying discussion. Each option is
specified by listing it on the BC command line, along with a preceding
forward slash (/). Also, these options apply to the BC compiler only, and
not necessarily to the QB and QBX editing environments.
/A
The /a (assembly) switch tells BC to include the assembly language source
code it creates in the listing file. The format of the file was described
in detail in Chapter 4, so I won't belabor that here. Note, however, that
a file name must be given in the list file position of the BC command line.
Otherwise, a list file will not be written.
/Ah
Using /ah (array huge) tells BASIC that you plan to create dynamic arrays
that may exceed 64K in total data size. This option affects numeric, TYPE,
and fixed-length string arrays only, and not conventional string arrays.
Normally, BASIC calculates the element addresses for array references
directly, based on the segment and other information in the array
descriptor. This is the most direct method, and thus provides the fastest
performance and smallest code.
When /ah is used, all access to non-string dynamic arrays is instead
made through a called routine. This called routine calculates the segment
and address of a single array element, and because it must also manipulate
segment values, increases the size of your programs. Therefore, /ah should
be avoided unless you truly need the ability to create huge arrays. Even
if a particular array does not currently exceed the 64K segment limit,
BASIC has no way to know that when it compiles your program.
To minimize the size and speed penalty /ah imposes, it may be used
selectively on only some of the source modules in a program. If you have
one subprogram that needs to manipulate huge arrays but the rest of the program
does not, you should create a separate file containing only that subprogram
and compile it using /ah. When the program is linked, only that module's
array accesses will be slower.
Note that the /ah switch is also needed if you plan to create huge
arrays when running programs in the BASIC editor. However, with the BASIC
editor, using /ah does not impinge on available memory or make the program
run slower. Rather, it merely tells BASIC not to display an error message
when an array is dimensioned to a size greater than 64K. [The BASIC editor
always uses the slower code that checks for illegal array elements anyway,
so it can report an error rather than lock up your computer.]
One limitation that /ah will not overcome is BASIC's limit of 32,767
elements in a single dimension. That is, the statement REDIM Array%(1 TO
32768) will fail, regardless of whether /ah is used. There are two ways to
exceed this limit: one is to create a TYPE array in which each element is
comprised of two or more variables. The other is to create an array that
has more than one dimension. The brief program below shows how to access a
2-dimensional array as if it had only a single dimension.
DEFINT A-Z
'----- pick an arbitrary group size, and number of groups (in this
' case 100,000 elements)
GroupSize = 1000: NumGroups = 100
'----- dimension the array
REDIM Array(1 TO GroupSize, 1 TO NumGroups)
'----- pick an element number to assign (note use of a long integer)
Element& = 50000
'----- calculate the first and second subscripts
First = ((Element& - 1) MOD GroupSize) + 1
Second = (Element& - 1) \ GroupSize + 1
'----- assign the appropriate array element
Array(First, Second) = 123
'----- show how to derive the original element based on First and
' Second (CLNG is needed to prevent an Overflow error)
CalcEl& = First + (Second - 1) * CLNG(GroupSize)
/C
The /c (communications) option lets you specify the size of the receive
buffer when writing programs that open the COM port. The value specified
represents the total buffer size in bytes, and is shared when two ports are
open at once. For example, if two ports are open and the total buffer size
is 4096 bytes, then each port has 2048 bytes available for itself.
A receive buffer is needed when performing communications, and it
accumulates the incoming characters as they are received. Each time a
character is accepted by the serial port, it is placed into the receive
buffer automatically. When your program subsequently uses INPUT or INPUT$
or GET to read the data, it is actually reading the characters from the
buffer and not from the hardware port. Without this buffering, your
program would have to wait in a loop constantly looking for each character,
which would preclude it from doing anything else!
Communications data is received in a continuous stream, and each byte
must be processed before the next one arrives, otherwise the data will be
lost. The communications port hardware generates an interrupt as each
character is received, and the communications routines within BASIC act on
that interrupt. The byte is retrieved from the hardware port using an
assembly language IN instruction, which is equivalent to BASIC's INP
function. This allows the characters to accumulate in the background,
without any additional effort on your part.
As each byte is received it is placed into the buffer, and a pointer
is updated showing the current ending address within the buffer. As your
program reads those bytes, another pointer is updated to show the new
starting address within the buffer. This type of buffer is called a
*circular buffer*, because the starting and ending buffer addresses are
constantly changing. That is, the buffer's end point "wraps" around to the
beginning when it becomes full.
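To make the two-pointer scheme concrete, here is a small model of a circular buffer written in BASIC. It is a sketch only. BASIC's real receive buffer is filled by an interrupt handler, and the PutByte and GetByte% names, the eight-byte size, and the rule that a byte arriving at a full buffer is discarded are my own assumptions for this example.

```basic
DECLARE SUB PutByte (Value%)
DECLARE FUNCTION GetByte% ()

CONST BufSize% = 8                      'tiny, to make the wrap easy to see
DIM SHARED Buf%(0 TO BufSize% - 1)
DIM SHARED Head%, Tail%, Count%         'read position, write position, bytes held

FOR N% = 1 TO 6: PutByte 64 + N%: NEXT          '"receive" the letters A-F
FOR N% = 1 TO 4: PRINT CHR$(GetByte%); : NEXT   'read four of them
FOR N% = 7 TO 10: PutByte 64 + N%: NEXT         'G-J arrive; Tail% wraps to 0
DO
    Byte% = GetByte%
    IF Byte% < 0 THEN EXIT DO                   '-1 means the buffer is empty
    PRINT CHR$(Byte%);
LOOP
PRINT

SUB PutByte (Value%)
    IF Count% = BufSize% THEN EXIT SUB          'full: the new byte is lost
    Buf%(Tail%) = Value%
    Tail% = (Tail% + 1) MOD BufSize%            'wrap to the start at the end
    Count% = Count% + 1
END SUB

FUNCTION GetByte%
    IF Count% = 0 THEN
        GetByte% = -1
        EXIT FUNCTION
    END IF
    GetByte% = Buf%(Head%)
    Head% = (Head% + 1) MOD BufSize%
    Count% = Count% - 1
END FUNCTION
```

Even though ten characters pass through an eight-byte buffer, none are lost, because the program removes characters before the buffer fills.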
The receive buffer whose size is specified with /c is located in far
memory. However, BASIC also maintains a second buffer in near memory, and
its size is dictated by the optional LEN= argument used with the OPEN
statement. Because near memory can be accessed more quickly than far
memory, it is sensible for BASIC to copy a group of characters from the far
receive buffer to the near buffer all at once, rather than individually
each time you use GET or INPUT$.
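The fragment below sketches how the two buffers are typically used together. The port settings and the 256-byte LEN= value are arbitrary choices for illustration, and the program assumes a device is attached to COM1 and sending data.

```basic
'----- LEN= sets the size of the near buffer; the far receive buffer
'      is sized separately when the program is compiled
OPEN "COM1:9600,N,8,1" FOR RANDOM AS #1 LEN = 256

DO
    '----- LOC reports how many characters are waiting to be read
    IF LOC(1) > 0 THEN
        Incoming$ = INPUT$(LOC(1), #1)  'read them all in one operation
        PRINT Incoming$;
    END IF
LOOP UNTIL LEN(INKEY$)                  'run until any key is pressed

CLOSE #1
```

Reading everything that LOC reports in a single INPUT$ call, rather than one character at a time, keeps the number of trips to the far buffer to a minimum.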
When /c is not specified, the buffer size defaults to 512 bytes. This
means that up to 512 characters can be received with no intervention on
your part. If more than 512 bytes arrive and your program still hasn't
removed them using INPUT$ or GET, new characters that come later will be
lost. It is also possible to stipulate hardware handshaking when you open
the communications port. This means that the sender and receiver use
physical control wires to indicate when the buffer is full, and when it is
okay to resume transmitting.
In many programming situations, the 512 byte default will be more than
adequate. However, if many characters are being received at a high baud
rate (9600 or greater) and your program is unable to accept and process
those characters quickly enough, you should consider using a larger buffer.
Fortunately, the buffer is located in far memory, so increasing its size
will not impinge on the string and data memory available in DGROUP.
/D
The /d (debug) option switch is intended solely to help you find problems
in a program while it is being developed. Because /d causes BC to generate
additional code and thus bloat your executable program, it should be used
only during development.
When /d is specified, four different types of tests are added to your
program. The first is a call to a routine that checks if Ctrl-Break has
been pressed. One call is added for every BASIC source statement, and each
adds five bytes of code to your final executable program. The second
addition is a one-byte assembly language INTO instruction following each
integer and long integer math operation, to detect overflow errors.
The third is a call to a routine that calculates array element
addresses, to ensure that the element number is in fact legal. Normally,
element addresses are computed directly without checking the upper and
lower bounds, unless you are using huge (greater than 64K) arrays. Without
/d, it is therefore possible to corrupt memory by assigning an element that
doesn't exist.
The final code addition implements GOSUB and RETURN statements using
a library routine, rather than calling and returning from the target line
directly. Normally, a GOSUB statement is translated into a three-byte
assembly language *near call* instruction, and a RETURN is implemented
using a one-byte *near return*. But when /d is used, the library routines
ensure that each RETURN did in fact result from a corresponding GOSUB, to
detect RETURN without GOSUB errors. This is accomplished by incrementing
an internal variable each time GOSUB is used, and decrementing it at each
RETURN. If that variable is decremented below 0 during a RETURN statement,
then BASIC knows that there was no corresponding GOSUB. These library
routines are added to your program only once by LINK, and comprise only a
few bytes of code. However, a separate five-byte call is generated for
each GOSUB and RETURN statement.
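The counting scheme can be modeled in BASIC as shown below. This is a sketch of the logic only and not Microsoft's library code; the NoteGosub and NoteReturn names and the Depth% variable are invented, and the real routines also perform the actual call and return.

```basic
DECLARE SUB NoteGosub ()
DECLARE SUB NoteReturn ()

DIM SHARED Depth%       'how many GOSUBs are currently pending

NoteGosub               'a GOSUB is executed
NoteReturn              'its matching RETURN
NoteReturn              'a stray RETURN, which is reported

SUB NoteGosub
    Depth% = Depth% + 1
END SUB

SUB NoteReturn
    Depth% = Depth% - 1
    IF Depth% < 0 THEN
        PRINT "RETURN without GOSUB"
        Depth% = 0
    END IF
END SUB
```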
Many aspects of the /d option were described in detail in Chapters 1
and 4, and there is no need to repeat that information here. But it is
important to remember that /d always makes your programs larger and run
more slowly. Therefore, it should be avoided once a program is running
correctly.
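A simple way to see the overflow test at work is to compile the following fragment twice, once with /d and once without. The variable name is arbitrary.

```basic
DEFINT A-Z
X = 32767
X = X + 1       'one more than an integer can hold
PRINT X
```

Compiled with /d, the program should end with an Overflow error at the addition. Without /d there is no INTO instruction to catch the problem, so I would expect the value to wrap around silently and print -32768.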
/E
The /e (error) option is necessary for any program that uses ON ERROR or
RESUME with a line label or number. In most cases using /e adds little or
no extra code to your final .EXE program, unless ON ERROR and RESUME are
actually used, or unless you are using line numbers. For each line number,
four bytes are added to remember the number itself as well as its position
in the file [two bytes each]. As with /d, every GOSUB and RETURN statement
is implemented through a far call to a library routine, rather than by
calling the target line directly. Without this added protection it would
not be possible to trap "RETURN without GOSUB" errors correctly, or recover
from them in an ON ERROR handler.
Also see the /x option which is needed when RESUME is used alone, or
with a 0 or NEXT argument. The /x switch is closely related to /e, and is
described separately below.
/Fpa and /Fpi (BASIC PDS and later)
When Microsoft introduced their BASIC compiler version 6.0, they included
an alternate method for performing floating point math. This Floating
Point Alternate library (hence the /fpa) offered a meaningful speed
improvement over the IEEE standard, though at a cost of slightly reduced
accuracy. This optional math library has been continued with BASIC 7 PDS,
and is specified using the /fpa command switch.
By default, two parallel sets of floating point math routines are
added to every program. When the program runs, code in BASIC's runtime
startup module detects the presence of a math coprocessor chip, and selects
which set of math routines will be used. The coprocessor version is called
the Inline Library, and it merely serves as an interface to the 80x87 math
coprocessor that does the real work in its hardware. (Note that inline is
really a misnomer, because that term implies that the compiler generates
coprocessor instructions directly. It doesn't.) The second version is
called the Emulator Library, because it imitates the behavior of the
coprocessor using assembly language subroutines.
Although the ability to take advantage of a coprocessor automatically
is certainly beneficial, there are two problems with this dual approach:
code size and execution speed. The coprocessor version is much smaller
than the routines that perform the calculations manually, since it serves
only as an interface to the coprocessor chip itself. When a coprocessor is
in fact present, the entire emulator library is still loaded into memory.
And when a coprocessor is not installed in the host PC, the library code to
support it is still loaded. The real issue, however, is that each BASIC
math operation requires additional time to route execution to the
appropriate routines.
Since BC has no way to know if a coprocessor will be present when the
program eventually runs, it cannot know which routine names to call.
Therefore, BASIC uses a system of software interrupts that route execution
to one library or the other. That is, instead of using, say, CALL
MultSingle, it instead creates code such as INT 39h. The Interrupt 39h
vector is set when the program starts to point to the correct library
routine. Unfortunately, the extra level of indirection to first read the
interrupt address and then call that address impacts the program's speed.
Recall that Chapter 1 explained how the library routines in a BRUN-
style program modify the caller's code the first time they are invoked.
The compiler creates code that uses an interrupt to access the library
routines, and those routines actually rewrite that code to produce a direct
call. Although this code modification increases the time needed to call a
library routine initially, subsequent calls will be noticeably faster.
BASIC statements executed many times within a FOR or DO loop will show the
greatest improvement, but statements executed only once will be much slower
than usual.
In a similar fashion, the coprocessor routines that are in BASIC's
runtime library alter the caller's code, replacing the interrupt commands
with equivalent coprocessor instructions. Each floating point interrupt
that BC generates includes the necessary variable addresses and other
arguments within the caller's code. These arguments are in the same format
as a coprocessor instruction. The first time an interrupt is invoked, it
subtracts the "magic value" &H5C32 from the bytes that comprise the
interrupt instruction, thus converting the instruction into a coprocessor
command. This will be covered in Chapter 12 and I won't belabor it here.
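The arithmetic is easy to try in BASIC itself. The fragment below assumes the two bytes of an INT 39h instruction (CDh followed by 39h) are read from memory as a single word, which on Intel processors places the second byte in the high half.

```basic
DEFINT A-Z
IntWord = &H39CD                'the bytes CD 39 (INT 39h) taken as one word
Patched = IntWord - &H5C32      'subtract the "magic value"
PRINT HEX$(Patched)             'displays DD9B
```

Stored back in memory, the word DD9B becomes the bytes 9Bh and DDh. The first is the WAIT instruction and the second is one of the coprocessor opcode bytes, so the two bytes that were an interrupt now begin a genuine coprocessor instruction.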
Since the alternate floating point math routines do not use a
coprocessor even if one is present, the interrupt method is not necessary.
BC simply hardcodes the library subroutine names into the generated code,
and the program is linked with the alternate math library. Besides the
speed improvement achieved by avoiding the indirection of interrupts, the
alternate math library is also inherently faster than the emulator library
when a coprocessor is not present.
The /fpi switch tells BASIC to use its normal method of including both
the coprocessor and emulator math libraries in the program, and determining
which to use at runtime. (See the discussion of /fpa above.) Using /fpi
is actually redundant and unnecessary, because this is the default that is
used if no math option is specified.
/Fs (BASIC PDS only)
BASIC PDS offers an option to use far strings, and this is specified with
the /fs (far strings) switch. Without /fs, all conventional (not fixed-
length) string variables and string arrays are stored in the same 64K
DGROUP memory that holds numeric variables, DATA items, file buffers, and
static numeric and TYPE arrays. Using the /fs option tells BASIC to
instead store strings and file buffers in a separate segment in far memory.
Although a program using far strings can subsequently hold more data,
the capability comes at the expense of speed and code size. Obviously,
more code is required to access strings that are stored in a separate data
segment. Furthermore, the string descriptors are more complex than when
near strings are used, and the code that acts on those descriptors requires
more steps. Therefore, you should use /fs only when truly necessary, for
example when BASIC reports an Out of string space error.
Far versus near strings were discussed in depth in Chapter 2, and you
should refer to that chapter for additional information.
[One very unfortunate limitation of VB/DOS is that only far strings
are supported. The decision makers at Microsoft apparently decided it was
too much work to also write a near-strings version of the forms library.
So users of VB/DOS are stuck with the additional size and speed overhead of
far strings, even for small programs that would have been better served
with near strings.]
/G2 (BASIC PDS and later)
The /g2 option tells BASIC to create code that takes advantage of an 80286
or later CPU. Each new generation of Intel microprocessors has offered
additional instructions, as well as performance optimizations to the
internal microcode that interprets and executes the original instructions.
When an existing instruction is recoded and improved within the CPU, anyone
who owns a PC using the newer CPU will benefit from the performance
increase. For example, the original 8086/8088 had several instructions
that performed poorly. These include Push and Pop, and Mul and Div. When
Intel released the 80186, they rewrote the microcode that performs those
instructions, increasing their speed noticeably. The 80286 is an offshoot
of the 80186, and of course includes the same optimizations. The 80386 and
80486 offer even more improvements and additions to the original 8086
instruction set.
Besides the enhancements to existing instructions, newer CPU types
also include additional instructions not present in the original 8086. For
example, the 80286 offers the Enter and Leave commands, each of which can
replace a lengthy sequence of instructions on the earlier microprocessors.
Another useful enhancement offered in the 80286 is the ability to push
numbers directly onto the stack. Where the 8086 can use only registers as
arguments to Push, the instructions Push 1234 and Push Offset Variable are
legal with 80186 and later CPUs. Likewise, the 80386 offers several new
commands to directly perform long integer operations. For example, adding
two long integer values using the 8086 instruction set requires a number of
separate steps. The 80386 and later CPUs can do this using only one
instruction.
If you are absolutely certain that your program will be run only on
PCs with an 80286 or later microprocessor, the /g2 option can provide a
modest improvement in code size and performance. In particular, programs
that use /g2 can save one byte each time a variable address is passed to a
routine. When /g2 is not used, the command PRINT Work$ results in the code
shown below.
PRINT Work$
  Mov  AX,Offset Work$    'this requires 3 bytes
  Push AX                 'this requires 1 byte
  Call B$PESD             'a far call is 5 bytes
When /g2 is used, the address is pushed directly rather than first being
loaded into AX, as shown following.
PRINT Work$
  Push Offset Work$       'this requires 3 bytes
  Call B$PESD             'this call is 5 bytes
With the rapid proliferation of 80386 and 80486 [and Pentium] computers,
Microsoft should certainly consider adding a /g3 switch. Taking advantage
of 80386 instructions could provide substantially more improvement over
80286 instructions than the 80286 provides beyond the 8086.
[In fact, Microsoft has added a /g3 switch to VB/DOS. Unfortunately,
it does little more than the /g2 switch. Most of a program's execution is
spent running code inside the Microsoft-supplied runtime libraries. But
those libraries contain only 8088 code! Using /g2 or /g3 affects only the
compiler-generated code, which has little impact on a program's overall
performance. Until Microsoft writes additional versions of their runtime
libraries using 80386 instructions (yeah, right), using /g2 or /g3 will
offer very little practical improvement.]
/Ix (BASIC PDS and later)
Another important addition to BASIC 7 PDS is its integral ISAM data file
handler. Microsoft's ISAM (Indexed Sequential Access Method) offers three
key features: The first is indexing, which lets you search a data file very
quickly. A simple sequential search reads each record from the disk in
order until the desired information is found. That is, to find the record
for customer David Eagle you would start at the beginning of the file, and
read each record until you found the one containing that name. An index
system, on the other hand, keeps as many names in memory as will fit, and
searches memory instead of the disk. This is many times faster than reading
the disk repeatedly. If Mr. Eagle is found in, say, the 1200th position,
the index manager can go directly to the corresponding record on disk and
return the data it contains.
The second ISAM feature is its ability to maintain the data file in
sorted order. In most situations, records are stored in a data file in the
order they were originally entered. For example, with a sales database,
each time a customer purchases a product a new record is added holding the
item and price for the item. When you subsequently step through the data
file, the entries will most likely be ordered by the date and time they
were entered. ISAM lets you access records in sorted order--for example,
alphabetically by the customer's last name--regardless of the order in
which the data was actually entered.
The last important ISAM feature is its ability to establish
relationships between files, based on the information they contain. Many
business applications require at least two data files: one to hold the name
and address of each customer, which rarely change, and another to hold the
products or other items that are ordered periodically. It would be
impractical and wasteful to duplicate the name and address information
repeatedly in each product detail record. Instead, many database programs
store a unique customer number in each record. Then, it is possible to
determine which sales record goes with which customer based on the matching
numbers in both files. A program that uses this technique is called a
*relational database*.
To help the BASIC ISAM routines operate efficiently, you are required
to provide some information when compiling your program. Each of the /i
switches requires a letter indicating which option is being specified, and
a numeric value. For each field in the file that requires fast (indexed)
access, ISAM must reserve a block of memory for file buffers. This is the
purpose of the /ii: switch. Notice that /ii: is needed only if more than
30 indexes will be active at one time.
The /ie: option tells ISAM how much EMS memory to reserve for buffers,
and is specified in kilobytes. This leaves the remaining EMS available to
other applications.
The /ib: option switch tells ISAM how many 2K (2048-byte) *page
buffers* to create in memory. In general, the more memory that is reserved
for buffers, the faster the ISAM program can work. Of course, each buffer
that you specify reduces the amount of memory that is available for other
uses in your program.
An entire chapter in the BASIC PDS manual is devoted to explaining the
ISAM file system, and there is little point in duplicating that information
here. Please refer to your BASIC documentation for more examples and
tutorial information on using ISAM. In particular, advice and formulas are
given that show how to calculate the numeric values these options require.
In Chapter 6 I will cover file handling and indexing techniques in
detail, with accompanying code examples showing how you can create your own
indexing methods.
/Lp And /Lr (BASIC PDS only)
BASIC 7 PDS includes an option to write programs that operate under OS/2,
as well as MS-DOS. Although OS/2 has yet to be accepted by most PC users,
many programmers agree that it offers a number of interesting and powerful
capabilities. By default, BC compiles a program for the operating system
that is currently running. If you are using DOS when the program is
compiled and linked, the resultant program will also be for use with DOS.
Similarly, if you are currently running OS/2, then the program will be
compiled and linked for use with that operating system.
The /lp (protected) switch lets you override the assumption that BC
makes, and tell it to create OS/2 instructions that will run in protected
mode. The /lr (real) option tells BC that even though you are currently
running under OS/2, the program will really be run with DOS. Again, these
switches are needed only when you need to compile for the operating system
that is not currently in use.
/Mbf
With the introduction of QuickBASIC 4.0, Microsoft standardized on the IEEE
format for floating point data storage. Earlier versions of QuickBASIC and
GW-BASIC used a faster, but non-standard proprietary numeric format that is
incompatible with other compilers and languages. In many cases, the
internal numeric format a compiler uses is of little consequence to the
programmer. After all, the whole point of a high-level language is to
shield the programmer from machine-specific details.
One important exception is when numeric data is stored in a disk file.
While it is certainly possible to store numbers as a string of ASCII
characters, this is not efficient. As I described in Chapter 2, converting
between binary and decimal formats is time consuming, and also wastes disk
space. Therefore, BASIC (and most other languages) write numeric data to a
file using its native fixed-length format. That is, integers are stored in
two bytes, and double-precision data in eight.
Although QuickBASIC 4 and later compilers use the IEEE format for
numeric data storage, earlier versions of the compiler do not. This means
that values written to disk by programs compiled using earlier versions of
QuickBASIC or even GW-BASIC cannot be read correctly by programs built
using the newer compilers. The /mbf option tells BASIC to convert floating
point values to the original Microsoft Binary Format (hence the MBF) prior
to writing them to disk. Likewise, floating point numbers read from disk
will be converted from MBF to IEEE before being stored in memory.
[Even when /mbf is used, all floating point numbers are still stored in
memory and manipulated using the IEEE method. It is only when numbers are
read from or written to disk that a conversion between MBF and IEEE format
is performed.]
Notice that current versions of Microsoft BASIC also include functions
to convert between the MBF and IEEE formats manually. For example, the
statement Value# = CVDMBF(Fielded$) converts the MBF-format number held in
Fielded$, and assigns an IEEE-format result to Value#. When /mbf is used,
however, you do not have to perform this conversion explicitly, and using
Value# = CVD(Fielded$) provides the identical result.
Also see the data format discussion in Chapter 2, which compares the
IEEE and MBF storage methods in detail.
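The short program below is a minimal sketch of the manual conversion just
described, for a program compiled without /mbf. The file name OLDDATA.DAT
and its layout (one 8-byte MBF double precision value per record) are
assumptions made up for this illustration.

```basic
'Hypothetical file: OLDDATA.DAT holds 8-byte MBF doubles that were
'written by an older program, one value per record.
OPEN "OLDDATA.DAT" FOR RANDOM AS #1 LEN = 8
FIELD #1, 8 AS Fielded$

GET #1, 1                        'read the first record
Value# = CVDMBF(Fielded$)        'convert MBF to IEEE explicitly
PRINT Value#

Value# = Value# * 2
LSET Fielded$ = MKDMBF$(Value#)  'convert IEEE back to MBF
PUT #1, 1                        'rewrite the record in the old format
CLOSE #1
```

If the same program were compiled with /mbf, the plain CVD and MKD$
functions would be used in place of CVDMBF and MKDMBF$.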
/O
BASIC can create two fundamentally different types of .EXE programs: One
type is a stand-alone program that is completely self-contained. The other
type requires the presence of a special runtime .EXE library file when it
runs, which contains the routines that handle all of BASIC's commands. By
default, BASIC creates a program that requires the runtime .EXE library,
which produces smaller program files. However, the runtime library is also
needed, and is loaded along with the program into memory. The differences
between the BRUN and BCOM programs were described in detail in Chapter 1.
The /o switch tells BASIC to create a stand-alone program that does
not require the BRUN library to be present. Notice that when /o is used,
the CHAIN command is treated as if you had used RUN, and COMMON variables
may not be passed to a subsequently executed program.
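To make the CHAIN limitation concrete, here is a small sketch that uses two
separate source files. The names FIRST.BAS and SECOND.BAS and the variable
UserName$ are hypothetical, chosen only for this example.

```basic
'FIRST.BAS: hypothetical main program
COMMON UserName$              'must match the COMMON list in SECOND.BAS
UserName$ = "David Eagle"
CHAIN "SECOND"                'transfers control to SECOND.EXE

'SECOND.BAS: hypothetical chained-to program (a separate source file)
COMMON UserName$
PRINT "Hello, "; UserName$    'the name arrives only when both programs
                              'were compiled without /o
```

When both programs are compiled without /o and share the runtime library,
UserName$ should be passed along. With /o the CHAIN behaves like RUN, so
SECOND starts with an empty UserName$.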
/Ot (BASIC PDS and later)
Each time you invoke a BASIC subprogram, function, or DEF FN function, code
that BC adds to the procedure creates a stack frame that remembers the
caller's segment and address. Normally, Call and Return statements in
assembly language are handled directly by the microprocessor. DEF FN
functions and GOSUB statements are translated by the compiler into near
calls, which means that the target address is located in the same segment.
Invoking a formal function or subprogram is instead treated as a far call,
to support multiple segments and thus larger programs. Therefore, a RETURN
or EXIT DEF statement assumes that a single address word is on the stack,
where EXIT SUB or EXIT FUNCTION expect both a segment and address to be
present (two words).
A problem can arise if you invoke a GOSUB routine within a SUB or
FUNCTION procedure, and then attempt to exit the procedure from inside that
subroutine with EXIT SUB or EXIT FUNCTION. If a GOSUB is active, EXIT SUB
will incorrectly return to the segment and address that are currently on
the stack. Unfortunately, the address is that of the statement following
the GOSUB, and the "segment" is in fact the address portion of the original
caller's return location. This is shown in Figure 5-2.
┌── This is the original caller's segment and address to return to.
│
│   │                         │
│   ├─────────────────────────┤
├─> │ Caller's return segment │
│   ├─────────────────────────┤
└─> │ Caller's return address │ <─┐
    ├─────────────────────────┤   │
    │ GOSUB's return address  │ <─┤
    ├─────────────────────────┤   │
    │(next available location)│   │
    ├─────────────────────────┤   │
    │                         │   │
                                  │
These addresses will incorrectly ─┘
be used as a segment and address.
Figure 5.2: The stack frame within a procedure while a GOSUB is pending.
To avoid this potential problem, the original caller's segment and address
are saved when a subprogram or function is first invoked. The current
stack pointer is also saved, so it can be restored to the correct value, no
matter how deeply nested GOSUB calls may become. Then when the procedure
is exited, another library routine is called that forces the originally
saved segment and address to be on the stack in the correct position.
Because this process reduces the speed of procedure calls and adds to
the resultant code size, the /ot option was introduced with BASIC 7 PDS.
Using /ot tells BASIC not to employ the larger and slower method, unless
you are in fact using a GOSUB statement within a procedure. Since this
optimization is disabled automatically anyway in that case, it is curious
that Microsoft requires a switch at all. That is, BC should simply
optimize procedure calls where it can, and use the older method only when
it has to.
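The following sketch shows the kind of code that requires the slower
method: a procedure that exits with EXIT SUB while a GOSUB is still
pending. The names ShowTotal and CheckAmount are hypothetical.

```basic
DECLARE SUB ShowTotal (Amount!)

CALL ShowTotal(12.5)
CALL ShowTotal(-5!)
END

SUB ShowTotal (Amount!)
  GOSUB CheckAmount          'near call: one address word goes on the stack
  PRINT "Total:"; Amount!
  EXIT SUB                   'far return: expects a segment and an address

CheckAmount:
  IF Amount! < 0 THEN
    PRINT "Negative amount"
    EXIT SUB                 'leaves the procedure while the GOSUB is pending
  END IF
  RETURN
END SUB
```

In the second call the EXIT SUB inside CheckAmount executes with the
GOSUB's return address still on the stack, which is exactly the case the
saved segment, address, and stack pointer are meant to handle.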
/R
The /r switch tells BASIC to store multi-dimensioned arrays in row, rather
than column order. All arrays, regardless of their type, are stored in a
contiguous block of memory. Even though string data can be scattered in
different places, the table of descriptors that comprise a string array is
contiguous. When you dimension an array using two or more subscripts, each
group of rows and columns is placed immediately after the preceding one.
By default, BASIC stores multi-dimensioned arrays in column order, as shown
in Figure 5-3.
┌─────────────┐
│ Array(5, 2) │ ^
├─────────────┤ │
│ Array(4, 2) │ │
├─────────────┤ │
│ Array(3, 2) │ └── toward higher addresses
├─────────────┤
│ Array(2, 2) │
├─────────────┤
│ Array(1, 2) │
├─────────────┤
│ Array(5, 1) │
├─────────────┤
│ Array(4, 1) │
├─────────────┤
│ Array(3, 1) │
├─────────────┤
│ Array(2, 1) │
├─────────────┤
│ Array(1, 1) │
└─────────────┘
Figure 5.3: How BASIC stores a 2-dimensional array created using
DIM Array(1 TO 5, 1 TO 2).
As you can see, the first subscript varies fastest: Array(1, 1) through
Array(5, 1) occupy successive memory locations, followed by Array(1, 2)
through Array(5, 2). In some situations it may be necessary to maintain
arrays in row order, for example when interfacing with another language
that expects array data to be organized that way [notably C or Pascal;
FORTRAN uses column order, as BASIC does]. When an array is
stored in row order, the elements are arranged such that Array(1, 1) is
followed by Array(1, 2), which is then followed by Array(2, 1), Array(2,
2), Array(3, 1), and so forth.
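The two orderings can be compared with a little arithmetic. The sketch
below does not examine memory at all. It simply computes, for the same
DIM Array(1 TO 5, 1 TO 2) example, the position (counting from zero) that
each element would occupy under each scheme. The constant and variable
names are my own.

```basic
CONST Rows% = 5, Cols% = 2

PRINT "Element", "Column order", "Row order (/r)"
FOR Row% = 1 TO Rows%
  FOR Col% = 1 TO Cols%
    ColOrder% = (Col% - 1) * Rows% + (Row% - 1)   'default layout
    RowOrder% = (Row% - 1) * Cols% + (Col% - 1)   'layout when /r is used
    PRINT "("; Row%; ","; Col%; ")", ColOrder%, RowOrder%
  NEXT
NEXT
```

Multiplying either position by the element size (two bytes for an integer
array, for instance) gives the byte offset from the start of the array.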
Although many of the BC option switches described here are also
available for use with the QB editing environment, /r is not one of them.
/S
The /s switch has been included with BASIC since the first BASCOM 1.0
compiler, and it remains perhaps the least understood of all the BC
options. Using /s affects your programs in two ways. The first is
partially described in the BASIC manuals, which is to tell BC not to
combine like string constants as it compiles your program. As you learned
in Chapter 2, BASIC makes available as much string memory as possible in
your programs, by consolidating identical constant string data. For
example, if you have the statement PRINT "Insert disk in drive A" seven
times in your program, the message is stored only once, and used for each
instance of PRINT.
In order to combine like data the BC compiler examines each string as
it is encountered, and then searches its own memory to see if that string
is already present. Having to store all of the strings your program uses
just to check for duplicates impinges on BC's own working memory. At some
point it will run out of memory, since it also has to remember variable and
procedure names, line labels and their corresponding addresses, and so on.
When this happens, BC has no recourse but to give up and display an "Out of
memory" error message.
The /s switch is intended to overcome this problem, because it tells
the compiler not to store your program's string constants. Instead of
retaining the strings in memory for comparison, each is simply added to the
object file as it is encountered. However, strings four characters long or
shorter are always combined, since short strings are very common and doing
that does not require much of BC's memory.
The second [undocumented] thing /s does is to add two short (eight
bytes each) assembly language subroutines to the very beginning of your
program. Two of the most common string operations are assignments and
concatenations, which are handled by routines in the runtime library.
Normally, a call to either of these routines generates thirteen bytes of
code, including the statements that pass the appropriate string addresses.
The subroutines that /s adds are accessed using a near rather than a
far call, and they receive the string addresses in CPU registers rather
than through the stack. Therefore, they can be called using between three
and nine bytes, depending on whether the necessary addresses are already in
the correct registers at the time. The inevitable trade-off, however, is
that calling one subroutine that in turn calls another reduces the speed of
your programs slightly.
In many cases--especially when there are few or no duplicated string
constants--using /s will reduce the size of your programs. This is
contrary to the Microsoft documentation which implies that /s will make
your programs larger because the duplicate strings are not combined. I
would like to see Microsoft include this second feature of /s as a separate
option, perhaps using /ss (string subroutine) as a designator.
/T
The /t (terse) switch tells BC not to display its copyright notice or any
warning (non-fatal) error messages. This option was not documented until
BASIC PDS, even though it has been available since at least QuickBASIC 4.0.
The only practical use I can see for /t is to reduce screen clutter, which
is probably why QB and QBX use it when they shell to DOS to create an .EXE
program.
/V and /W
Any programs that use event handling such as ON KEY, ON COM, ON PLAY, or
the like [but not ON GOTO or ON GOSUB] require that you compile using
either the /v or /w option switch. These options do similar things,
adding extra code to call a central handler that determines if action is
needed to process an event. However, the /v switch checks for events at
every program statement while /w checks only at numbered or labeled lines.
In Chapter 1 I described how event handling works in BASIC, using
polling rather than true interrupt handling. There you saw how a five-byte
call is required each time BASIC needs to see if an event has occurred.
Because of this added overhead, many programmers prefer to avoid BASIC's
event trapping statements in favor of manually polling when needed.
However, it is important to point out that by using line numbers and labels
sparingly in conjunction with /w, you can reduce the amount of extra code
BASIC creates, and thus control where such checking is performed.
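As a rough illustration, the program below traps the F1 key and contains
only two labels. The label names and screen positions are arbitrary choices
for this sketch.

```basic
ON KEY(1) GOSUB HelpKey       'trap the F1 key
KEY(1) ON

MainLoop:                     'a labeled line: /w checks for events here
  Count& = Count& + 1
  LOCATE 1, 1: PRINT Count&
  IF INKEY$ <> CHR$(27) THEN GOTO MainLoop   'loop until Escape is pressed
END

HelpKey:                      'the event handler
  LOCATE 3, 1: PRINT "F1 was pressed"
  RETURN
```

Compiled with /v, an event check is added at every statement. Compiled
with /w, the check occurs only at the labeled lines, so in this program
the key is noticed once per pass through the loop.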
/X
Like the /e switch, /x is used with ON ERROR and RESUME; however, /x
increases substantially the size of your final .EXE program file. When
RESUME, RESUME 0, or RESUME NEXT are used, BASIC needs a way to find where
execution is to resume in your program. Unfortunately, this is not a
simple task. Since a single BASIC source statement can create a long
series of assembly language commands, there is no direct correlation
between the two. When an error occurs and you use RESUME with no argument
telling BASIC to execute the same statement again, it can't know directly
how many bytes earlier that statement begins.
Therefore, when /x is specified, a numbered line marker is added in
the object code to identify the start of every BASIC source statement.
These markers comprise a linked list of statement addresses, and the RESUME
statement walks through this list looking for the address that most closely
precedes the offending BASIC statement. Because of the overhead to store
these addresses--four bytes for each BASIC source statement--many
professional programmers avoid using /x unless absolutely necessary.
However, the table of addresses is stored within the code segment, and does
not take away from DGROUP memory.
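Here is a minimal sketch of the kind of error handler that requires /x,
because it uses RESUME with no argument to retry the statement that failed.
The file name is hypothetical.

```basic
ON ERROR GOTO ErrHandler
OPEN "A:\DATA.TXT" FOR INPUT AS #1    'hypothetical file name
LINE INPUT #1, Text$
CLOSE #1
PRINT Text$
END

ErrHandler:
  IF ERR = 71 THEN                    'error 71 is "Disk not ready"
    PRINT "Insert the disk in drive A and press a key"
    DO: LOOP WHILE INKEY$ = ""        'wait for a keypress
    RESUME                            'retry the statement that failed
  ELSE
    PRINT "Error"; ERR
    END
  END IF
```

A handler that instead ends with RESUME and a line label does not need the
statement markers, and can be compiled with /e alone.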
/Z (BASIC PDS and later)
The /z switch is meant to be used in conjunction with the Microsoft editor.
This editor is included with BASIC PDS, and allows editing programs that
are too large to be contained within the QB and QBX editing environments.
When a program is compiled with /z, BASIC includes line number information
in the object file. The Microsoft editor can then read these numbers after
an unsuccessful compile, to help you identify which lines were in error.
Because the addition of these line number identifiers increases a program's
size, /z should be used only for debugging and not in a final production
version.
In general, the Microsoft editor has not been widely accepted by BASIC
programmers, primarily because it is large, slow, and complicated to use.
Microsoft also includes a newer editing environment called the Programmer's
Workbench with BASIC PDS; however, that too is generally shunned by serious
developers for the same reasons.
/Zd
Like /z, the /zd switch tells BC to include line number information in the
object file it creates. Unlike /zi which works with CodeView (see the /zi
switch below), /zd is intended for use with the earlier SYMDEB debugger
included with MASM 4.0. It is extremely unlikely that you will ever need
to use /zd in your programming.
/Zi
The /zi option is used when you will execute your program in the Microsoft
CodeView debugger. CodeView was described in Chapter 4, and there is no
reason to repeat that information here. Like /z and /zd, /zi tells BC to
include additional information about your program in the object file.
Besides indicating which assembler statements correspond to which BASIC
source lines, /zi also adds variable and procedure names and addresses to
the file. This allows CodeView to display meaningful names as you step
through the assembly language compiled code, instead of addresses only.
In order to create a CodeView-compatible program, you must also link
with the /co LINK option. All of the options that LINK supports are listed
elsewhere in this chapter, along with a complete explanation of what each
does.
Note that CodeView cannot process a BASIC source file that has been
saved in the Fast Load format. This type of file is created by default in
QuickBASIC, when you save a newly created program. Therefore, you must be
sure to select the ASCII option button manually from the Save File dialog
box. In fact, there are so many bugs in the Fast Load method that you
should never use it. Problems range from QuickBASIC hanging during the
loading process to completely destroying your source file!
If a program that has been saved as ASCII is accidentally damaged, it
is at least possible to reconstruct it or salvage most of it using a DOS
tool such as the Norton Utilities. But a Fast Load file is compressed and
encrypted; if even a single byte is corrupted, QB will refuse to load it.
Since a Fast Load file doesn't really load that much faster than a plain
ASCII file anyway, there is no compelling reason to use it.
[Rather than fix the Fast Load bug, which Microsoft claims they cannot
reproduce, beginning with PDS version 7 BASIC now defaults to storing
programs as plain ASCII files.]
COMPILER METACOMMANDS
There are a number of compiler metacommands that you can use to control how
your program is formatted in the listing file that BC optionally creates.
Although these list file formatting options have been available since the
original IBM BASCOM 1.0 compiler [which Microsoft wrote], they are not
documented in the current versions. As with '$INCLUDE and '$DYNAMIC and
the other documented metacommands, each list formatting option is preceded
by a REM or apostrophe, and a dollar sign. The requirement to imbed
metacommands within remarks was originally to let programs run under the
GW-BASIC interpreter without error.
Each of the available options is listed below, along with an
explanation and range of acceptable values. Many options require a numeric
parameter as well; in those cases the number is preceded by a colon. For
example, a line width of 132 columns is specified using '$LINESIZE: 132.
Other options such as '$PAGE do not require or accept parameters. Notice
that variables may not be used for metacommand parameters, and you must use
numbers. CONST values are also not allowed.
Understand that the list file that BASIC creates is of dubious value,
except when debugging a program to determine the address at which a runtime
error occurred. While a list file could be considered as part of the
documentation for a finished program, it conveys no useful information.
These formatting options are given here in the interest of completeness,
and because they are not documented anywhere else. [In order to use any of
these list options you must specify a list file name when compiling.]
'$LINESIZE
The '$LINESIZE option lets you control the width of the list file, to
prevent or force line wrapping at a given column. The default list width
is 80 columns, and any text that would have extended beyond that is instead
continued on the next line. Many printers offer a 132-column mode, which
you can take advantage of by using '$LINESIZE: 132. [Of course, it's up to
you to send the correct codes to your printer before printing such a wide
listing.] Note that the minimum legal width is 40, and the maximum is 255.
'$LIST
The '$LIST metacommand accepts either a minus (-) or plus (+) argument, to
indicate that the listing should be turned off and on respectively. That
is, using '$LIST - suspends the listing at that point in the program, and
'$LIST + turns it back on. This option is useful to reduce the size of the
list file and to save paper when a listing is not needed for the entire
program.
'$PAGE
To afford control over the list file format, the '$PAGE metacommand forces
subsequent printing to begin on the next page. Typically '$PAGE would be
used prior to the start of a new section of code; for example, just before
each new SUB or FUNCTION procedure. This tells BC to begin the procedure
listing on a new page, to avoid starting it near the bottom of a page.
'$PAGEIF
'$PAGEIF is related to '$PAGE, except it lets you specify that a new page
is to be started only if no more than a given number of lines remain on the
current page. For example, '$PAGEIF: 6 tells BC to advance to the next
page only if there are six or fewer printable lines remaining.
'$PAGESIZE
You can specify the length of each page with the '$PAGESIZE metacommand, to
override the 66-line default. This would be useful with laser printers, if
you are using a small font that supports more than that many lines on each
page. Notice that a 6-line bottom margin is added automatically, so
specifying a page size of 66 results in only 60 actual lines of text on
each page. The largest value that can be used with '$PAGESIZE is 255, and
the smallest is 15; using values outside this range results in a warning
error message. To set the page length to 100 lines you would use
'$PAGESIZE: 100. There is no way to disable the page numbering altogether.
'$OCODE
Using '$OCODE (object code) allows you to turn the assembly language source
listing on or off, using "+" or "-" arguments. Normally, the /a switch is
needed to tell BC to include the assembly language code in the list file.
But you can optionally begin a listing at any place in the program with
'$OCODE +, and then turn it off again using '$OCODE -.
'$SKIP
Like '$PAGE and '$PAGEIF, the '$SKIP option lets you control the appearance
of the source listing. '$SKIP accepts a colon and a numeric argument that
tells BC to print that many blank lines in the list file or skip to the end
of the page, whichever comes first.
'$TITLE and '$SUBTITLE
By default, each page of the list file has a header that shows the current
page number, and date and time of compilation. The '$TITLE and '$SUBTITLE
metacommands let you also specify one or two additional strings, which are
listed at the start of each page. Using '$TITLE: 'My program' tells BASIC
to print the text between the single quotes on the first line of each page.
If a subtitle is also specified, it will be printed on the second line.
Note that the title will be printed on the first page of the list file only
if the '$TITLE metacommand is the very first line in the BASIC source file.
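To pull these together, here is a sketch of a source file that uses several
of the list metacommands with the syntax described above. The title text
and the numeric values are arbitrary, and since these options are
undocumented you should verify the results with your own compiler version.

```basic
'$TITLE: 'Invoice Program'
'$SUBTITLE: 'Main module'
'$LINESIZE: 132
'$PAGESIZE: 100

DEFINT A-Z
PRINT "Starting"

'$LIST -
PRINT "This statement does not appear in the list file"
'$LIST +

'$PAGEIF: 6
'$OCODE +
Total = Total + 1
'$OCODE -
'$SKIP: 3
PRINT Total
```

Remember that none of these has any effect unless a list file name is given
when the program is compiled.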
LINKING
=======
Once a program has been compiled to an object file, it must be linked with
the routines in the BASIC library before it can be run. LINK combines one
or more object files with routines in a library, and produces an executable
program file having an .EXE extension. LINK is also used to create Quick
Libraries for use in the QB editing environment, and that is discussed
later in this chapter.
LINK can combine multiple BASIC object files, as well as object files
created with other Microsoft-compatible languages. In the section that
follows you will learn how the LINK command line is structured, what each
parameter is for, and how the many available options may be used. Using
the various LINK options can reduce the size of your programs, and help
them run faster as well.
I should mention here that it is imperative that you use the correct
version of LINK. DOS comes with an old version of LINK.EXE that is not
suitable for use with QuickBASIC or BASIC PDS. Therefore, you should
always use the LINK.EXE program that came with your compiler. I also
suggest that you remove or rename the copy of LINK that came with DOS if it
is still on your hard disk. More than once I have seen programmers receive
inexplicable LINK error messages because their PATH setting included the
\DOS directory. In particular, many of the switches that current versions
of LINK support cause an "Unrecognized option" message from older versions.
If the correct version of LINK is not in the current directory, then DOS
will use its PATH variable to see where else to look, possibly running an
older version.
The LINK command line is structured as follows, using brackets to
indicate optional information. The example below is intended to be entered
all on one line.
link [/options] objfile [objfile] [libfile.lib], [exefile], [mapfile],
[libfile] [libfile] [;]
As with the BC compiler, you may either enter all of the information on a
single command, let LINK prompt you for the file names, or use a
combination of the two. That is, you could enter LINK [filename] and let
LINK prompt you for the remaining information. Default choices are
displayed by LINK, and these are used if Enter alone is pressed. Typing a
semicolon on a prompt line by itself or after a file name tells LINK to
assume the default responses for the remaining fields. LINK also lets you
use a *response file* to hold the file names and options. When there are
dozens or even hundreds of files being specified, this is the only
practical method. Response files are described later in this section.
Also like BC, the separating commas are required as place holders when
successive fields are omitted. For example, the command:
link program , , mapfile;
links PROGRAM.OBJ to produce PROGRAM.EXE, and creates a map file with the
name MAPFILE.MAP. If the second comma had not been included, the output
file would be named MAPFILE.EXE and a map file would not be written at all.
The first LINK argument is one or more optional command switches,
which let you control some of the ways in which LINK works. For example,
the /co switch tells LINK to add line number and other information needed
when debugging the resultant EXE program with CodeView. Another option,
/ex, tells LINK to reduce the size of the program using a primitive form of
data compression. Each LINK option will be discussed in the section that
follows, and we won't belabor them here.
The second argument is the name of the main program object module,
which contains the code that will be executed when the program is run from
the DOS command line. Many programs use only a single object file;
however, in a multi-module program you must list the main module first.
That is then followed by the other modules that contain additional
subprograms and functions. Of course, you can precede any file name with a
drive letter and/or directory name as necessary.
You may also specify that all of the object modules in an entire
library be included in the executable program by entering the library name
where the object name would be given. Since LINK assumes an .OBJ file
extension, you must explicitly include the .LIB extension when linking an
entire library. For example, the command
link mainprog subs.lib;
creates a program named MAINPROG.EXE which is comprised of the code in
MAINPROG.OBJ and all of the routines in SUBS.LIB. Normally, a library is
specified at the end of the LINK command line. However, in that case only
the routines that are actually called will be added to the program.
Placing a library name in the object name field tells LINK to add all of
the routines it contains, regardless of whether they are actually needed.
Normally you do not want LINK to include unused routines, but that is often
needed when creating Quick Libraries, which will be discussed in a moment.
Notice that when more than one object file is given, the first listed
is the one that is run initially. Its name is also used for the executable
file name if an output file name is not otherwise given. Like the BC
compiler, LINK assumes that you are using certain file naming conventions
but lets you override those assumptions with explicit extensions. I
recommend that you use the standard extensions, and avoid any unnecessary
heartache and confusion. In particular, using non-standard names is a poor
practice when more than one programmer is working on a project. Also
notice that either spaces or plus signs (+) may be used to separate each
object and library file name. Which you use is a matter of personal
preference.
The third LINK field is the optional executable output file name. If
omitted, the program will use the base name of the first object file
listed. Otherwise, the specified name will be used, and given an .EXE
extension. Again, you can override the .EXE extension, but this is not
recommended.
Following the output file name field is the map file entry. A map
file contains information about the executable program, such as segment
names and sizes, the size of the stack, and so forth. The /map option,
which is described later, tells LINK to include additional information in
the map file. In general, a map file is not useful in high-level language
programming.
One interesting LINK quirk is that it will create a map file if empty
commas are used, but not if a semicolon is used prior to that field. You
can specify the reserved DOS device name nul to avoid creating a map file.
For example, the command
link program, , nul, library;
links PROGRAM.OBJ to create PROGRAM.EXE, but does not create the file
PROGRAM.MAP. I use a similar line in the batch files I use for compiling
and linking, to avoid cluttering my hard disk with these useless files.
The last field specifies one or more libraries that hold additional
routines needed for the program. In purely BASIC programming you do not
need to specify a library name, because the compiler specifies a default
library in the object file header. If you are linking with assembly or
other language subroutines that are in a library, you would list the
library names here. You can list any number of library names, and LINK
will search each of them in turn looking for any routines it does not find
in the object files.
The version of LINK that comes with BASIC 7 also accepts a definitions
file as an optional last argument. But that is used only for OS/2 and
Windows programming, and is not otherwise needed with BASIC.
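The way LINK assigns its comma-separated fields can be summarized with a
short model. The Python sketch below is only an illustration of the rules
described above, not how LINK itself is written. The function name
parse_link_line is hypothetical, and the model assumes that no /options,
explicit extensions, or path names are given.

```python
def parse_link_line(args):
    """Model how LINK assigns the comma-separated fields of its command line.

    Simplified on purpose: no /options, no explicit extensions or paths,
    no prompting, and no response files.
    """
    text = args.strip().rstrip(";")
    fields = [field.strip() for field in text.split(",")]

    # Object names may be separated by spaces or plus signs.
    objects = fields[0].replace("+", " ").split()
    base = objects[0].upper()

    # Field 2: executable name, defaulting to the first object's base name.
    exe_name = fields[1].upper() if len(fields) > 1 and fields[1] else base

    # Field 3: a map file is written only if this field was reached at all.
    map_file = None
    if len(fields) > 2:
        map_name = fields[2].upper() or base
        if map_name != "NUL":
            map_file = map_name + ".MAP"

    # Field 4: libraries searched for routines the object files lack.
    libraries = fields[3].replace("+", " ").split() if len(fields) > 3 else []

    return {"objects": objects, "exe": exe_name + ".EXE",
            "map": map_file, "libraries": libraries}


if __name__ == "__main__":
    for line in ("program , , mapfile;", "program , mapfile;",
                 "program, , nul, library;"):
        print(line, "->", parse_link_line(line))
```

Dropping one comma moves mapfile from the map field into the executable
field, which is the mistake described earlier. An empty map field that is
reached by commas still produces a map file, while a semicolon placed
before that field does not.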
LINK OPTIONS
All of the available LINK options that are useful with BASIC running under
DOS are shown following in alphabetical order. As with the switches
supported by BC, each is specified on the LINK command line by preceding it
with a forward slash (/). Many of the options may be abbreviated by entering just
the first few letters of their name. For example, what I refer to as the
/co option is actually named /codeview; however, the first two letters are
sufficient for LINK to know what you mean.
Each option is described using only enough letters to understand the
meaning of its name. You can see the full name for those options in the
section headers below, or run LINK with the /help switch. Any switch may
be specified using only as many characters as needed to distinguish it from
other options. That is, /e is sufficient to indicate /exepack because it
is the only one that starts with that letter. But you must use at least
the first three characters of the /nologo switch, since /no could mean
either /nologo or /nodefaultlibrary. The details section for each option
shows the minimum letters that are actually needed.
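The abbreviation rule amounts to unique-prefix matching. The Python sketch
below models it using only the option names that appear as section headers
in this chapter. The names LINK_OPTIONS and resolve_option are
hypothetical, and because the real LINK recognizes additional options not
covered here, its actual minimum abbreviations may be longer than this
model suggests.

```python
# Option names taken from the section headers in this chapter only.
LINK_OPTIONS = [
    "BATCH", "CODEVIEW", "EXEPACK", "FARCALLTRANSLATE", "HELP", "INFO",
    "LINENUM", "MAP", "NODEFAULTLIB", "NOEXTDICT", "NOFARCALL", "NOLOGO",
    "NOPACKCODE", "OVERLAYINT", "PACKCODE", "PAUSE", "QUICKLIB",
    "SEGMENTS", "STACK",
]


def resolve_option(switch):
    """Return the full option name for an abbreviated switch such as '/nol'."""
    prefix = switch.lstrip("/").upper()
    matches = [name for name in LINK_OPTIONS if name.startswith(prefix)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError("Unrecognized option: " + switch)
    raise ValueError("Ambiguous option " + switch + ": " + ", ".join(matches))


if __name__ == "__main__":
    for switch in ("/e", "/nol", "/packc", "/seg"):
        print(switch, "->", resolve_option(switch))
```

Passing /no to this function raises an error that lists every option
beginning with those two letters, which is why at least three characters
are needed for /nologo.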
/BATCH
Using /ba tells LINK that you are running it from a batch file, and that it
is not to pause and prompt for library names it is unable to find. When
/ba is used and external routines are not found, a warning message is
issued rather than the usual prompt. The /ba option is not generally very
useful--even if you are linking with a batch file--since it offers no
chance to fix an incorrect file or directory name.
One LINK quirk worth noting is that when it is unable to find a library,
you must include a trailing backslash (\) after the path name when
reentering it manually. If LINK displays the prompt "Enter new file spec:"
and you type \pathname, you are telling LINK to use the library named
PATHNAME.LIB and look for it in the root directory. What is really needed
is to enter \pathname\, which tells it to look in that directory for the
library. Furthermore, if you initially enter the directory incorrectly,
you must then specify both the directory and library name. If you are not
sure of the default library name it is often easier to simply press Ctrl-C
and start again.
/CODEVIEW
The /co switch is necessary when preparing a program for debugging with
CodeView. Because of the extra information that LINK adds to the resultant
executable file, /co should be used only for debugging purposes. However,
the added data is stored at the end of the file, and is not actually loaded
into memory if the program is run from the DOS command line. The program
will therefore have the same amount of memory available to it as if /co had
not been used.
/EXEPACK
When /e is used, LINK compresses repeated character strings to reduce the
executable file size. Because variables and static arrays are initialized
to zero by the compiler, they are normally stored in the file as a group of
CHR$(0) zero bytes. The /e switch tells LINK to replace these groups of
zero bytes with a group count. Then when the program is run, the first
code that actually executes is the unpacking code that LINK adds to your
program. This is not unlike the various self-extracting archive utilities
that are available commercially and as shareware.
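The idea of replacing groups of zero bytes with a count can be sketched in
a few lines. The Python code below is only an illustration of the
technique. It does not reproduce the actual packed format that LINK
writes, and the function names and the min_run threshold are assumptions
made for this example.

```python
def pack_zero_runs(data, min_run=4):
    """Split data into ('lit', bytes) and ('zeros', count) tokens."""
    tokens = []
    literal = bytearray()
    i = 0
    while i < len(data):
        if data[i] == 0:
            # Measure how long this run of zero bytes is.
            j = i
            while j < len(data) and data[j] == 0:
                j += 1
            run = j - i
            if run >= min_run:
                if literal:
                    tokens.append(("lit", bytes(literal)))
                    literal = bytearray()
                tokens.append(("zeros", run))
            else:
                literal.extend(data[i:j])  # too short to be worth packing
            i = j
        else:
            literal.append(data[i])
            i += 1
    if literal:
        tokens.append(("lit", bytes(literal)))
    return tokens


def unpack_zero_runs(tokens):
    """Rebuild the original bytes, as the unpacking code would at startup."""
    out = bytearray()
    for kind, value in tokens:
        out.extend(bytes(value) if kind == "zeros" else value)
    return bytes(out)


if __name__ == "__main__":
    image = b"\xB8\x01" + bytes(1000) + b"\xC3"
    packed = pack_zero_runs(image)
    print(packed)
    print(unpack_zero_runs(packed) == image)
```

A thousand zero bytes collapse into a single count, so a program with
large static arrays benefits the most from this kind of packing.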
Notice that the compression algorithm LINK employs is not particularly
sophisticated. For example, SLR Systems' OptLink is an alternate linker
that reduces a program to a much smaller file size than Microsoft's LINK.
PKWare and SEA Associates are two other third-party companies that produce
utilities to create smaller executable files that unpack and run themselves
automatically.
/FARCALLTRANSLATE
By default, all calls from BASIC to its runtime library routines are far
calls, which means that both a segment and address are needed to specify
the location of the routine being accessed. Assembly language and C
routines meant to be used with BASIC are also designed as far calls, as are
BASIC subprograms and functions. This affords the most flexibility, and
also lets you create programs larger than could fit into a single 64K
segment.
Within the BASIC runtime library there are both near and far calls to
other library routines. Which is used depends on the routines involved,
and how the various segments were named by the programmers at Microsoft.
Because a far call is a five-byte instruction compared to a near call which
is only three, a near call requires less code and can execute more quickly.
In many cases, separate code segments that are less than 64K in size can be
combined by LINK to form a single segment. The routines in those segments
could then be accessed using near calls. However, BASIC always generates
far calls as it compiles your programs.
The /f option tells LINK to replace the far calls it encounters with
near calls, if the target address is indeed close enough to be accessed
with a near call. The improvement /f affords is further increased by also
using the /packcode switch (see below). Although the far call is replaced
with a near call, LINK can't actually reduce the size of the original
instruction. Instead it inserts a Nop (no operation) assembly language
command where part of the far call had been. But since a near call does
not require segment relocation information in the .EXE file header, the
file size may be reduced slightly. See the text that accompanies Figure
5-1 earlier in this chapter for an explanation of DOS' loading and relocation
process.
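The replacement can be pictured at the byte level. The Python sketch below
patches a five-byte far call in place, using the sequence by which this
translation is commonly described: a Nop, a PUSH CS so the called
routine's far return still finds a segment on the stack, and then a
three-byte near call. It is a simplified model and not LINK's own code.
The function name is hypothetical, and it assumes the caller has already
established that the target lies in the same segment as the call.

```python
FAR_CALL = 0x9A   # CALL far: opcode, 2-byte offset, 2-byte segment
NOP = 0x90
PUSH_CS = 0x0E
NEAR_CALL = 0xE8  # CALL near: opcode, 2-byte relative displacement


def translate_far_call(code, position):
    """Rewrite the far call at `position` as NOP / PUSH CS / near CALL.

    Assumes the target is in the same segment as the call, and that
    offsets within `code` start at zero.
    """
    if code[position] != FAR_CALL:
        raise ValueError("no far call opcode at this position")
    target = code[position + 1] | (code[position + 2] << 8)
    next_instruction = position + 5
    # A near call stores the distance from the instruction that follows it.
    displacement = (target - next_instruction) & 0xFFFF
    patched = bytearray(code)
    patched[position:position + 5] = bytes([
        NOP, PUSH_CS, NEAR_CALL, displacement & 0xFF, displacement >> 8,
    ])
    return bytes(patched)


if __name__ == "__main__":
    original = bytes([0x9A, 0x34, 0x12, 0x00, 0x00, 0xC3])
    print(translate_far_call(original, 0).hex(" "))
```

The patched code is exactly as long as the original, which is why the
savings come only from the smaller relocation table and not from the
instruction itself.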
There is one condition under which the /f option can cause your
program to fail. The machine code for a far call is a byte with the value
of &H9A, which is what LINK searches for as it converts the far calls to
near ones. Most high-level languages store all data in a separate
segment, which is ignored by LINK when servicing /f. BASIC, however,
stores line label addresses in the program's code segment when ON GOTO and
the other ON commands are used. If one of those addresses happens to
contain the byte &H9A, then LINK may incorrectly change it. In my personal
experience, I
have never seen this happen. I recommend that you try /f in conjunction
with /packc, and then test your program thoroughly. You could also examine
any ON statements with CodeView if you are using them, to determine if an
address happens to contain the byte &H9A.
/HELP
Starting LINK with the /he option tells it to display a list of all the
command options it recognizes. This is useful both as a reminder, and to
see what new features may have been added when upgrading to a newer
compiler. In many cases, new compilers also include a new version of LINK.
/INFO
The /inf switch tells LINK to display a log of its activity on the screen
as it processes your file. The name of each object file being linked is
displayed, as are the routines being read from the libraries. It is
extremely unlikely that you will find /inf very informative.
/LINENUM
If you have compiled with the /zd switch to create SYMDEB information, you
will also need to specify the /li LINK switch. This tells LINK to read the
line number information in the object file, and include it in the resultant
executable program. SYMDEB is an awkward predecessor to CodeView that is
also hard to use, and you are not likely to find /li useful.
/MAP
If you give a map file name when linking, LINK creates a file showing the
names of every segment in your program. The /m switch tells LINK to also
include all of the public symbol names. A public symbol is any procedure
or data in the object file whose address must be determined by LINK. This
information is not particularly useful in purely BASIC programming, but it
is occasionally helpful when writing subroutines in assembly language.
Segment naming and grouping will be discussed in Chapter 13.
/NODEFAULTLIB
When BC compiles your program, it places the default runtime library name
into the created object file's header. This way you can simply run LINK,
without having to specify the correct library manually. Before BASIC PDS
there were only two runtime library names you had to deal with--QuickBASIC
4.5 uses BCOM45.LIB and BRUN45.LIB. But PDS version 7.1 comes with 16
different libraries, each intended for a different use.
For example, there are BRUN and BCOM libraries for every combination
of near and far strings, IEEE and /fpa (alternate) math, and DOS and OS/2.
That is, BRT71EFR.LIB stands for BASIC Runtime 7.1 Emulator Far strings
Real mode. Likewise, BCL71ANP is for use with a BCOM stand-alone program
using Alternate math and Near strings under OS/2 Protected mode.
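The naming scheme can be spelled out by enumerating every combination.
The Python sketch below infers the letter codes from the two library names
given above, so treat the table as an assumption rather than an official
list. The function and dictionary names are hypothetical.

```python
from itertools import product

# Letter codes inferred from BRT71EFR and BCL71ANP.
KINDS = {"RT": "BRUN runtime", "CL": "BCOM stand-alone"}
MATH = {"E": "emulator (IEEE) math", "A": "alternate math"}
STRINGS = {"N": "near strings", "F": "far strings"}
MODES = {"R": "DOS real mode", "P": "OS/2 protected mode"}


def library_names(version="71"):
    """Return {library file name: description} for every combination."""
    names = {}
    for kind, math, strings, mode in product(KINDS, MATH, STRINGS, MODES):
        name = "B" + kind + version + math + strings + mode + ".LIB"
        names[name] = ", ".join(
            [KINDS[kind], MATH[math], STRINGS[strings], MODES[mode]])
    return names


if __name__ == "__main__":
    for name, meaning in sorted(library_names().items()):
        print(name, "-", meaning)
```

Two choices for each of four properties gives the 16 libraries mentioned
above.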
Using /nod tells LINK not to use the library name imbedded within the
object file, which of course means that you must specify a library name
manually. The /nod switch also accepts an optional colon and explicit
library name to exclude. That is, /nod:libname means use all of the
default libraries listed in the object file except libname.
In general, /nod is not useful with BASIC, unless you are using an
alternate library such as Crescent Software's P.D.Q. Another possible use
for /nod is if you have renamed the BASIC libraries.
/NOEXTDICT
As LINK combines the various object files that comprise your program with
routines in the runtime library, it maintains a table of all the procedure
and data names it encounters. Some of these names are in the object
modules, such as the names of your BASIC subprograms and functions. Other
procedure names are those in the library.
In some situations the same procedure or data name may be encountered
more than once. For example, when you are linking with a stub file it will
contain a routine with the same name as the one it replaces in BASIC's
library. Usually, LINK will issue an error message when it finds more than
one occurrence of a public name. If you use /noe (No Extended Dictionary)
LINK knows to use the routine or data item it finds first, and not to issue
an error message.
The /noe option should be used only when necessary, because it causes
LINK to run more slowly. Linking with stub files is described separately
later in this chapter.
/NOFARCALL
The /nof switch is usually not needed, since by default LINK does not
translate far calls to near ones (see /farcalltranslate earlier in this
section). But since you can set an environment variable to tell LINK to
assume /far automatically, /nof would be used to override that behavior.
Setting LINK options through the use of environment variables is described
later in this chapter.
/NOLOGO
The /nol switch tells LINK not to display its copyright notice and, like
the /t BC switch, may be used to minimize screen clutter.
/NOPACKCODE
As with the /nof switch, /nop is not necessary unless you have established
/packc as the default behavior using an environment variable.
/OVERLAYINT
When you have written a program that uses overlays, BASIC uses an *overlay
manager* to handle loading subprograms and functions in pieces as they are
needed. Instead of simply calling the overlay manager directly, it uses an
interrupt. This is similar to how the routines in a BRUN library are
accessed.
BASIC by default uses Interrupt &H3F, which normally will not conflict
with the interrupts used by DOS, the BIOS, or network adapter cards. If an
interrupt conflict is occurring, you can use the /o switch to specify that
a different interrupt number be used to invoke the overlay manager. This
might be necessary in certain situations, perhaps when data acquisition or
other special hardware is installed in the host PC.
/PACKCODE
The /packc switch is meant to be used with /far, and it combines multiple
adjacent code segments into as few larger ones as possible. This enables
the routines within those segments to call each other using near, rather
than far calls. When combined with /f, /packc will make your programs
slightly faster and possibly reduce their size.
/PAUSE
Using /pau tells LINK to pause after reading and processing the object and
library files, but before writing the final executable program to disk.
This is useful only when no hard drive is available, and not all of the
files will fit onto a single floppy disk.
/QUICKLIB
The /q switch tells LINK that you are creating a Quick Library having a
.QLB extension, rather than an .EXE program file. A Quick Library is a
special file comprised of one or more object modules, that is loaded into
the QB editing environment. Although BASIC can call routines written in
non-BASIC languages, they must already be compiled or assembled. Since the
BASIC editor can interpret only BASIC source code, Quick Libraries provide
a way to access routines written in other languages. Creating and using
Quick Libraries is discussed separately later in this chapter.
/SEGMENTS
The /seg: switch tells LINK to reserve memory for the specified number of
segment names. When LINK begins, it allocates enough memory to hold 128
different segment names. This is not unlike using DIM in a BASIC program
you might write to create a 128-element string array. If LINK encounters
more than 128 names as it processes your program, it will terminate with a
"Too many segments" error. When that happens, you must start LINK again
using the /seg switch.
All of the segments in an object module that contain code or data are
named according to a convention developed by Microsoft. Segment naming
allows routines in separate files to ultimately reside in the same memory
segment. Routines in the same segment can access each other using near
calls instead of far calls, which results in smaller and faster programs.
Also, all data in a BASIC program is combined into a single segment, even
when the data is brought in from different modules. LINK knows which
segments are to be combined by looking for identical names.
The routines in BASIC's runtime library use only a few different
names, and it is not likely that you will need to use /seg in most
situations. But when writing a large program that also incorporates many
non-BASIC routines, it is possible to exceed the 128-name limit. It is
also possible to exceed 128 segments when creating a very large Quick
Library comprised of many individual routines.
The /seg switch requires a trailing colon, followed by a number that
indicates the number of segment names to reserve memory for. For example,
to specify 250 segments you would use this command line:
link /seg:250 program, , nul, library;
In most cases, there is no harm in specifying a number that is too large,
unless that takes memory LINK needs for other purposes. Besides the
segment names, LINK must also remember object file names, procedure names,
data variables that are shared among programs, and so forth. But if LINK
runs out of memory while it is processing your program, it simply creates a
temporary work file to hold the additional information.
/STACK
The /st: option lets you control the size of BASIC's stack. One situation
where you might need to do this is if your program has deeply nested calls
to non-static procedures. Likewise, calling a recursive subprogram or
function that requires many levels of invocation will quickly consume stack
space.
You can increase the stack size in a QuickBASIC program by using the
CLEAR command:
CLEAR , , stacksize
where stacksize specifies the number of bytes needed. However, CLEAR also
clears all of your variables, closes all open files, and erases any arrays.
Therefore, CLEAR is suitable only when used at the very beginning of a
program. Unfortunately, this precludes you from using it in a chained-to
program, since any variables being passed are destroyed. Using /stack:
avoids this by letting you specify how much memory is to be set aside for
the stack when you link the chained-to program.
The /stack: option accepts a numeric argument, and can be used to
specify the stack size selectively for each program module. For example,
/stack:4096 specifies that a 4K block be set aside in DGROUP for use as a
stack. Furthermore, you do not need to use the same value for each module.
Since setting aside more stack memory than necessary impinges on available
string space, you can override BASIC's default for only those modules that
actually need it.
Note that this switch is not needed or recommended if you have BASIC
PDS, since that version includes the STACK statement for this purpose.
STUB FILES (PDS and later)
A stub file is an object module that contains an alternate version of a
BASIC language statement. A stub file could also be an alternate library
containing multiple object files. The primary purpose of a stub file is to
let you replace one or more BASIC statements with an alternate version
having reduced capability and hence smaller code. Some stub files
completely remove a particular feature or language statement. Others offer
increased functionality at the expense of additional code.
Several stub files are included with BASIC PDS, to reduce the size of
your programs. For example, NOCOM.OBJ removes the routines that handle
serial communications, replacing them with code that prints the message
"Feature stubbed out" in case you attempt to open a communications port.
When BASIC compiles your program and sees a statement such as OPEN
Some$ FOR OUTPUT AS #1, it has no way to know what the contents of Some$
will be when the program runs. That is, Some$ could hold a file name, a
device name such as "CON" or "LPT1:", or a communications argument like
"COM1:2400,N,8,1,RS,DS". Therefore, BASIC instructs LINK to include code
to support all of those possibilities. It does this by placing all of the
library routine names in the object file header. When the program runs,
the code that handles OPEN examines Some$ and determines which routine to
actually call.
Within BASIC's runtime library are a number of individual object
modules, each of which contains code to handle one or more BASIC
statements. In Chapter 1 you learned that how finely LINK can extract
individual routines from BASIC's libraries depends on how the routines were
combined in the original assembly language source files. In BASIC 7.1,
using the SCREEN function in a program also causes LINK to add the routines
that handle CSRLIN and POS(0), even if those statements are not used. This
is because all three routines are in the same object module. The manner in
which these routines are combined is called *granularity*, and a library's
granularity dictates which routines can be replaced by a stub file. That
is, a stub file that eliminated the code to support SCREEN would also
remove CSRLIN and POS(0).
Some of the stub files included with BASIC 7 PDS are NOGRAPH.OBJ,
NOLPT.OBJ, and SMALLERR.OBJ. NOGRAPH.OBJ removes all support for graphics,
NOLPT.OBJ eliminates the code needed to send data to a printer, and
SMALLERR.OBJ contains a small subset of the many runtime error messages
that a BASIC program normally contains. Other stub files selectively
eliminate VGA or CGA graphics support, and another, OVLDOS21.OBJ, adds the
extra code necessary for the BASIC overlay manager to operate with DOS 2.1.
When linking with a stub file, it is essential that you use the /noe
LINK switch, so LINK will not be confused by the presence of two routines
with the same name. The general syntax for linking with a stub file is as
follows:
link /noe basfile stubfile;
Of course, you could add other LINK options, such as /ex and /packc, and
specify other object and library files that are needed as well.
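Both granularity and stub replacement follow from how a linker satisfies
external names. The Python sketch below is a simplified model and not
LINK's real algorithm. Names defined in the object files win, and a
library module is pulled in whole only when something it defines is still
missing. All module and symbol names here are hypothetical placeholders,
not the real names used inside BASIC's libraries.

```python
def link(objects, library):
    """Return {symbol: module name} for everything placed in the program.

    Each module is a tuple of (name, defined symbols, needed symbols).
    """
    resolved = {}
    pending = set()
    for name, defines, needs in objects:
        for symbol in defines:
            resolved.setdefault(symbol, name)  # first definition wins
        pending.update(needs)
    while True:
        missing = sorted(s for s in pending if s not in resolved)
        if not missing:
            return resolved
        symbol = missing[0]
        for name, defines, needs in library:
            if symbol in defines:
                for defined in defines:
                    resolved.setdefault(defined, name)  # whole module comes
                pending.update(needs)
                break
        else:
            raise LookupError("unresolved external: " + symbol)


# Hypothetical library: one module holds three routines, another holds one.
LIBRARY = [
    ("scrnmod", ["screen_fn", "csrlin_fn", "pos_fn"], []),
    ("commod", ["com_open"], []),
]
MAIN = ("main", ["main"], ["screen_fn", "com_open"])
STUB = ("nocom_stub", ["com_open"], [])

if __name__ == "__main__":
    print(link([MAIN], LIBRARY))
    print(link([MAIN, STUB], LIBRARY))
```

In the first link the program needs only screen_fn, yet csrlin_fn and
pos_fn arrive with it because they share a module. In the second, the stub
already defines com_open, so the library's communications module is never
added.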
You can also create your own BASIC stub files, perhaps to produce a
demo version of a program that has all features except the ability to save
data to disk. In order for this to work, you must organize your
subprograms and functions such that all of the routines that are to be
stubbed out are in separate source files, or combined together in one file.
In the example above, you would place the routines that save the data
in a separate file. Then, simply create an empty subprogram that has the
same name and the same number and type of parameters, and compile that
separately. Finally, you would link the BASIC stub file with the rest of
the program. Note that such a replacement file is not technically a stub,
unless the BASIC routines being replaced have been compiled and placed into
a library. But the idea is generally the same.
QUICK LIBRARIES
For many programmers, one of the most confusing aspects of Microsoft BASIC
is creating and managing Quick Libraries. The concept is quite simple,
however, and there are only a few rules you must follow.
The primary purpose of a Quick Library is to let you access non-BASIC
procedures from within the BASIC editor. For example, BASIC comes with a
Quick Library that contains the Interrupt routine, to let you call DOS and
BIOS system services. A Quick Library can contain routines written in any
language, including BASIC.
Although the BASIC editor provides a menu option to create a Quick
Library, that will not be addressed here. Rather, I will show the steps
necessary to invoke LINK manually from the DOS command line. There are
several problems and limitations imposed by BASIC's automated menus, which
can be overcome only by creating the library manually.
One limitation is that the automated method adds all of the programs
currently loaded into memory into the Quick Library, including the main
program. Unfortunately, only subprograms and functions should be included.
Code in the main module will never be executed, and its presence merely
wastes the memory it occupies. Another, more serious problem is that there's no
way to specify a /seg parameter, which is needed when many routines are to
be included in the library.
[Actually, you can set a DOS environment variable that tells LINK to
default to a given number of segments. But that too has problems when
using VB/DOS, because the VB/DOS editor specifies a /seg: value manually,
and incorrectly. Unfortunately, LINK honors the value passed to it by
VB/DOS, rather than the value you assigned to the environment variable.]
Quick Libraries are built from one or more object files using LINK
with the /q switch, and once created may not be altered. Unlike the
LIB.EXE library manager that lets you add and remove object files from an
existing .LIB library, there is no way to modify a Quick Library.
When LINK combines the various components of an executable file, it
resolves the data and procedure addresses in each object module header.
The header contains relocation information that shows the names of all
external routines being called, as well as where in the object file the
final address is to be placed. Since the address of an external routine is
not known when the source file is compiled or assembled, the actual CALL
instruction is left blank. This was described earlier in this chapter in
the section *Overview of Compiling and Linking*.
Resolving these data and procedure addresses is one of the jobs that
LINK performs. Because the external names that had been in each object
file are removed by LINK and replaced with numeric addresses, there is no
way to reconstruct them later. Similarly, when LINK creates a Quick
Library it resolves all incomplete addresses, and removes the information
that shows where in the object module they were located. Thus, it is
impossible to extract an object module from a Quick Library, or to modify
it by adding or removing modules.
Understand that the names of the procedures within the Quick Library
are still present, so QuickBASIC can find them and know the addresses to
call. But if a routine in a Quick Library in turn calls another routine in
the library, the name of the called routine is lost.
Creating a Quick Library
Quick Libraries are created using the version of LINK that came with your
compiler, and the general syntax is as follows:
link /q obj1 [obj2] [library.lib] , , nul , support;
The support library file shown above is included with BASIC, and its name
will vary depending on your compiler version. The library that comes with
QuickBASIC version 4.5 is named BQLB45.LIB; BASIC 7 instead includes
QBXQLB.LIB for the same purpose. You must specify the appropriate support
library name when creating a Quick Library.
Notice that LINK also lets you include all of the routines in one or
more conventional (.LIB) libraries. Simply list the library names where
the object file names would go. The .LIB extension must be given, because
.OBJ is the default extension that LINK assumes. You can also combine
object files and multiple libraries in the same Quick Library like this:
link /q obj1 obj2 lib1.lib lib2.lib , , nul , support;
Although Quick Libraries are necessary for accessing non-BASIC subroutines,
you can include compiled BASIC object files. In general, I recommend
against doing that; however, there are some advantages. One advantage is
that a compiled subprogram or function will usually require less memory,
because comments are not included in the compiled code and long variable
names are replaced with equivalent 2-byte addresses. Another advantage is
that compiled code in a Quick Library can be loaded very quickly, thus
avoiding the loading and parsing process needed when BASIC source code is
loaded.
But there are several disadvantages to storing BASIC procedures in a
Quick Library. One problem is that you cannot trace into them to determine
the cause of an error. Another is that all of the routines in a Quick
Library must be loaded together. If the files are retained in their
original BASIC source form, you can selectively load and unload them as
necessary. The last disadvantage affects BASIC 7 [and VB/DOS] users only.
The QBX [and VB/DOS] editors place certain subprogram and function
procedures into expanded memory if any is available. Understand that not
all procedures are placed there; only those whose BASIC source code size is
between 1K and 16K. But Quick Libraries are always stored in conventional
DOS memory. Therefore, more memory will be available to your programs if
the procedures are still in source form, because they can be placed into
EMS memory.
Note that when compiling BASIC PDS programs for placement in a Quick
Library, it is essential that you compile using the /fs (far strings)
option. Near strings are not supported within the QBX editor, and failing
to use /fs will cause your program to fail spectacularly.
RESPONSE FILES
A response file contains information that LINK requires, and it can
completely or partially replace the commands that would normally be given
from the DOS command line. The most common use for a LINK response file is
to specify a large number of object files. If you are creating a Quick
Library that contains dozens or even hundreds of separate object files, it
is far easier to maintain the names in a file than to enter them each time
manually.
To tell LINK that it is to read its input from a response file, enter
an at sign (@) followed by the response file name, as shown below.
link /q @quicklib.rsp
Since the /q switch was already given, the response file need only contain
the remaining information. A typical response file is shown in the listing
below.
object1 +
object2 +
object3 +
object4 +
object5
qlbname
nul
support
Even though this example lists only five object files, there could be as
many as necessary. Each object file name except the last one is followed
by a plus sign (+), so LINK will know that another object file name input
line follows. The qlbname line indicates the output file name. If it is
omitted and replaced with a blank line, the library will assume the name of
the first object file but with a .QLB extension. In this case, the name
would be OBJECT1.QLB. The nul entry could also be replaced with a blank
line, in which case LINK would create a map file named OBJECT1.MAP. As
shown in the earlier examples, the support library will actually be named
BQLB45 or QBXQLB, depending on which version of BASIC you are using.
LINK recognizes several variations on the structure of a response
file. For example, several object names could be placed on each line, up
to the 126-character line length limit imposed by DOS. That is, you could
have a response file like this:
object1 object2 object3 +
object4 object5 object6 +
...
I have found that placing only one name on each line makes it easier to
maintain a large response file. That also lends itself to keeping the
names in alphabetical order.
You may also place the various option switches in a response file, by
listing them on the first line with the object files:
/ex /seg:250 object1 +
object2 +
...
Response files can be used for conventional linking, and not just for
creating Quick Libraries. This is useful when you are developing a very
large project comprised of many different modules. Regardless of what you
are linking, however, understanding how response files are used is a
valuable skill.
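If the list of object names changes often, the response file itself can be
generated by a short BASIC program. The sketch below is only an
illustration: the file name QUICKLIB.RSP matches the earlier example, but
MYQLB and the OBJECTn names are placeholders, and BQLB45 assumes QuickBASIC
4.5 (QBXQLB would be used with BASIC PDS).

```basic
' Sketch: write a LINK response file from a list of object names.
' MYQLB and the OBJECTn names are placeholders for illustration only.
DEFINT A-Z
CONST NumObjects = 5

OPEN "QUICKLIB.RSP" FOR OUTPUT AS #1
FOR X = 1 TO NumObjects
  READ ObjName$
  IF X < NumObjects THEN
    PRINT #1, ObjName$; " +"     'more object names follow
  ELSE
    PRINT #1, ObjName$           'the last name gets no plus sign
  END IF
NEXT
PRINT #1, "MYQLB"                'output (Quick Library) name
PRINT #1, "NUL"                  'no map file
PRINT #1, "BQLB45"               'support library (assumes QB 4.5)
CLOSE #1
END

DATA OBJECT1, OBJECT2, OBJECT3, OBJECT4, OBJECT5
```

The file this writes has the same layout as the five-object listing shown
earlier, so it would be used with the same link /q @quicklib.rsp command.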
LINKING WITH BATCH FILES
Because so many options are needed to fully control the compiling and
linking process, many programmers use a batch file to create their
programs. The C.BAT batch file below compiles and links a single BASIC
program module, and exploits DOS' replaceable batch parameter feature.
bc /o /s /t %1;
link /e /packc /far /seg:250 %1, , nul, mylib;
Like many programs, a batch file can also accept command line arguments.
The first argument is known within the batch file as %1, the second is %2,
and so forth, up to the ninth parameter. Therefore, when this file is
started using this command:
c myprog
the compiler is actually invoked with the command
bc /o /s /t myprog;
The second line becomes
link /e /packc /far /seg:250 myprog, , nul, mylib;
That is, every occurrence of the replaceable parameter %1 is replaced by
the first (and in this case only) argument: myprog.
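The substitution DOS performs is purely textual. The short program below
imitates it in BASIC to make that concrete. ExpandLine$ is a hypothetical
helper written for this illustration, not a DOS or BASIC service, and it
handles only the %1 parameter.

```basic
' Sketch: mimic how DOS expands %1 in one line of a batch file.
' ExpandLine$ is a hypothetical helper, not part of DOS or BASIC.
DEFINT A-Z
DECLARE FUNCTION ExpandLine$ (Template$, Argument$)

PRINT ExpandLine$("bc /o /s /t %1;", "myprog")
PRINT ExpandLine$("link /e /packc /far /seg:250 %1, , nul, mylib;", "myprog")
END

FUNCTION ExpandLine$ (Template$, Argument$)
  Work$ = Template$
  Found = INSTR(Work$, "%1")
  DO WHILE Found
    'splice the argument in place of the two-character marker
    Work$ = LEFT$(Work$, Found - 1) + Argument$ + MID$(Work$, Found + 2)
    'resume the search just past the text that was inserted
    Found = INSTR(Found + LEN(Argument$), Work$, "%1")
  LOOP
  ExpandLine$ = Work$
END FUNCTION
```

Running this displays the same two command lines that DOS would actually
execute when C.BAT is started with the argument myprog.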
I often create a separate batch file for each new project I begin, to
avoid having to type even the file name. I generally use the name C.BAT
because its purpose is obvious, and it requires typing only one letter!
Once the project is complete, I rename the batch file to have the same
first name as the main BASIC program. This lets me see exactly how the
program was created if I have to come back to it again months later. An
example of a batch file that compiles and links three BASIC source files is
shown below.
bc /o /s /t mainprog;
bc /o /s /t module1;
bc /o /s /t module2;
link /e /packc /far mainprog module1 module2, , nul, mylib;
Of course, you'd use the compiler and link switches that are appropriate to
your particular project. You could also specify a LINK response file
within a batch file. In the example above you would replace the last line
with a command such as this:
link @mainprog.rsp;
LINKING WITH OVERLAYS (PDS and VB/DOS PRO EDITION ONLY)
At one time or another, most programmers face the problem of having an
executable program become too large to fit into memory when run. With
QuickBASIC your only recourse is to divide the program into separate .EXE
files, and use CHAIN to go back and forth between them. This method
requires a lot of planning, and doesn't lend itself to structured
programming methods. Each program is a stand-alone main module, rather
than a subprogram or function.
Worse, chaining often requires the same subroutine code to be
duplicated in each program, since only one program can be loaded into
memory at a time. If both PROGRAM1.EXE and PROGRAM2.EXE make calls to the
same subprogram, that subprogram will have to be added to each program.
Obviously, this wastes disk space. BASIC 6.0 included the BUILDRTM program
to create custom runtime program files that combine common subroutine code
with the BASIC runtime library. But that program is complicated to use and
often buggy in operation.
Therefore, one of the most useful features introduced with BASIC 7 is
support for program overlays. An overlay is a module that contains one or
more subprograms or functions, and that is loaded into memory only when
needed.
All overlaid modules are contained in a single .EXE file along with the
main program, as opposed to the separate files needed when programs use
CHAIN. The loading and unloading of modules is handled for you
automatically by the overlay manager contained in the BASIC runtime
library.
Consider, as an example, a large accounting program comprised of three
modules. The main module would consist of a menu that controls the
remaining modules, and perhaps also contains some ancillary subprograms and
functions. The second module would handle data entry, and the third would
print all of the reports. In this case, the data entry and reporting
modules are not both required at the same time; only the module currently
selected from the menu is necessary. Therefore, you would link those
modules as overlays, and let BASIC's overlay manager load and unload them
automatically when they are called.
The overall structure of an overlaid program is shown in Figure 5-4.
┌────────────────────────────┐
│ '**** MAINPROG.BAS         │
│ CALL Menu(Choice)          │
│ IF Choice = 1 THEN         │
│   CALL EnterData           │
│ ELSEIF Choice = 2 THEN     │
│   CALL DoReports           │
│ END IF                     │
├────────────────────────────┤
│ SUB Menu(Choice)           │
│   ...                      │
│   CALL GetChoice(Choice)   │
│   ...                      │
│ END SUB                    │
├────────────────────────────┤
│ SUB GetChoice(ChoiceNum)   │
│   ...                      │
│   ...                      │
│ END SUB                    │
└────────────────────────────┘
┌────────────────────────────┐
│ '*** ENTERDAT.BAS          │
│ SUB EnterData              │
│   ...                      │
│   CALL GetChoice(Choice)   │
│   ...                      │
│ END SUB                    │
└────────────────────────────┘
┌────────────────────────────┐
│ '*** REPORTS.BAS           │
│ SUB DoReports              │
│   PRINT "Which report? ";  │
│   CALL GetChoice(Choice)   │
│   ...                      │
│   ...                      │
│ END SUB                    │
└────────────────────────────┘
Figure 5-4: The structure of a program that uses overlays.
Here, the main program is loaded into memory when the program is first run.
Since the main program also contains the Menu and GetChoice subprograms,
they too are initially loaded into memory. Understand that the main
program is always present in memory, and only the overlaid modules are
swapped in and out. Thus, EnterData and DoReports can both freely call the
GetChoice subprogram which is always in memory, without incurring any delay
to load it into memory from disk.
If the host computer has expanded memory, BASIC will use that to hold
the overlaid modules. Since EMS can be accessed much more quickly than a
disk, this makes loading virtually instantaneous. You should be
aware, however, that BASIC PDS contains a bug in the EMS portion of its
overlay manager. If EMS is present but less than 64K is available, your
program will terminate with the error message "Insufficient EMS to load
overlay."
If no expanded memory is available, BASIC simply reads the overlaid
modules from the original disk file each time they are called. It should
also use the disk if it determines that there isn't enough EMS to handle
the overlay requirements, but it doesn't. Therefore, it is up to your
users to determine how much expanded memory is present, and disable the EMS
driver in their PC if there isn't at least 64K.
To specify that a module is to be overlaid, simply surround its name
with parentheses when linking. Using the earlier example shown in Figure
5-4, you would link MAINPROG.OBJ with ENTERDAT.OBJ and REPORTS.OBJ as
follows:
link mainprog (enterdat) (reports);
Of course, you may include any link switches that are needed, and also
include any non-overlaid object files. Any object file names that are not
surrounded by parentheses will be kept in memory at all times. Therefore,
you should organize your programs such that subprograms and functions that
are common to the entire application are always loaded. Otherwise, the
program could become very slow if those procedures are swapped in and out
of memory each time they are called.
OTHER LINK DETAILS
The BASIC PDS documentation lists no fewer than 143 different LINK error
messages, and at one time or another you are bound to see at least some of
those. LINK errors are divided into two general categories: warning errors
and fatal errors. Warning errors can sometimes be ignored. For example,
failing to use the /noe switch when linking with a stub file produces the
message "Symbol multiply defined", because LINK encountered the same
procedure name in the stub file and in the runtime library. In this case
LINK simply uses the first procedure it encountered. In general, however,
you should not run a program whose linking resulted in any error messages.
Fatal errors are exactly that--an indication that LINK was unable to
create the program successfully. Even if an .EXE file is produced, running
it is almost certain to cause your PC to lock up. One example of a fatal
error is "Unresolved external." This means that your program made a call
to a procedure, but LINK wasn't able to find its name in the list of object
and library files you gave it. Another fatal error is "Too many segments."
You might think that LINK would be smart enough to finish reading the
files, count the number of segment names it needs, and then restart itself
reserving enough memory. Unfortunately, it isn't; you must run LINK again
yourself, giving a larger value with the /seg: switch.
Regardless of the type of error messages you receive, it is impossible
to read all of them if there are so many that they scroll off the screen.
Although you can press Ctrl-P to tell DOS to echo the messages to your
printer, there is an even better method. You can use the DOS redirection
feature to send the messages to a disk file. This lets you load the file
into a text editor for later perusal. To send all of LINK's output to a
file, simply use the "greater than" symbol (>) followed by a file name, as
follows:
link [/options] [object files]; > error.log
Instead of displaying the messages on the screen, DOS intercepts and routes
them to the ERROR.LOG file. It is important to understand that this is a
DOS issue, and has nothing to do with LINK. Therefore, you can use this
same general technique to redirect the output of most programs to a file.
Note that using redirection causes *all* of the program's output to go to
the file, not just the error messages. Therefore, nothing will appear to
happen on the screen, since the copyright and sign-on notices are also
redirected.
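Once the messages are in a file, a few lines of BASIC can pick out the ones
that matter. The following is a minimal sketch, assuming the log was
written to ERROR.LOG as above and that the lines of interest contain the
word "error" or "warning"; that wording is an assumption, so adjust the
search text to match what your version of LINK actually reports.

```basic
' Sketch: show only the lines of a redirected LINK log that mention
' an error or a warning. The search words are an assumption.
DEFINT A-Z
OPEN "ERROR.LOG" FOR INPUT AS #1
DO UNTIL EOF(1)
  LINE INPUT #1, Text$
  Upper$ = UCASE$(Text$)         'compare without regard to case
  IF INSTR(Upper$, "ERROR") OR INSTR(Upper$, "WARNING") THEN
    PRINT Text$
    Count = Count + 1
  END IF
LOOP
CLOSE #1
PRINT Count; "message line(s) found"
```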
Another LINK detail you should be aware of is that numeric arguments
may be given in either decimal or hexadecimal form. Any LINK option that
expects a number--for example, the /seg: switch--may be given as a
hexadecimal value by preceding the digits with 0x. That is, /seg:0x100 is
equivalent to /seg:256. The use of 0x is a C notation convention, and the
"x" character is used because it sounds like "hex".
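BASIC itself writes hexadecimal constants with an &H prefix rather than 0x,
which gives a quick way to check a value before putting it in a LINK
switch. The lines below are just an illustration; the Switch$ string stands
in for whatever value you intend to give LINK.

```basic
' Confirm that hexadecimal 100 and decimal 256 are the same value.
DEFINT A-Z
PRINT &H100                      'displays 256
Switch$ = "0x100"                'the value as it would be given to LINK
Digits$ = MID$(Switch$, 3)       'strip the leading 0x
PRINT VAL("&H" + Digits$)        'also displays 256
```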
Finally, if you are using QuickBASIC 4.0 there is a nasty bug you
should be aware of. All versions of QuickBASIC let you create an
executable program from within the editing environment. And if a Quick
Library is currently loaded, QB knows to link your program with a parallel
.LIB library having the same name. But instead of specifying that library
in the proper LINK field, QB 4.0 puts its name in the object file position.
This causes LINK to add every routine in the library to your program,
rather than only those routines that are actually called. There is no way
to avoid this bug, and QB 4.0 users must compile and link manually from
DOS.
MAINTAINING LIBRARIES
=====================
As you already know, multiple object files may be stored in a single
library. A library has a .LIB extension, and LINK can extract from it only
those object modules actually needed as it creates an executable file. All
current versions of Microsoft compiled BASIC include the LIB.EXE program,
which lets you manage a library file. With LIB.EXE you can add and remove
objects, extract a copy of a single object without actually deleting it
from the library, and create a cross-referenced list of all the procedures
contained therein.
It is important to understand that a .LIB library is very different
from a Quick Library. A .LIB library is simply a collection of individual
object files, with a header portion that tells which objects are present,
and where in the library they are located. A Quick Library, on the other
hand, contains the raw code and data only. The routines in a Quick Library
do not contain any of the relocation and address information that was
present in the original object module.
The runtime libraries that Microsoft includes with BASIC are .LIB
libraries, as are third-party support libraries you might purchase. You
can also create your own libraries from both compiled BASIC code and
assembly language subroutines. The primary purpose of using a library is
to avoid having to list every object file needed manually. Another
important use is to let LINK add only those routines actually necessary to
your final .EXE program.
Like BC and LINK, you can invoke LIB giving all of the necessary
parameters on a single command line, or wait for it to prompt you for the
information. LIB can also read file names and options from a response
file, which avoids having to enter many object names manually. A LIB
response file is similar--but not identical--to a LINK response file.
Using LIB response files will be described later in this section.
The general syntax of the LIB command line is shown below, with
brackets indicating optional information.
lib [/options] libname [commands] , [listfile] , [newlib] [;]
After any optional switches, the first parameter is the name of the library
being manipulated, and that is followed by one or more commands that tell
LIB what you want to do. A list file can also be created, and it contains
the names of every object file in the library along with the procedure
names each object contains. The last argument indicates an optional new
library; if present LIB will leave the original library intact, and copy it
to a new one applying the changes you have asked for.
There are three commands that can be used with LIB, and each is
represented using a punctuation character. However, LIB lets you combine
some of these commands, for a total of five separate actions. This is
shown in Table 5-1.
Command   Action
=======   =========================================
   +      Add an object module or entire library.
   -      Remove an object module from the library.
   *      Extract a copy of an object module.
   -+     Replace an object module with a new one.
   -*     Extract and then remove an object module.
Table 5-1: The LIB commands for managing libraries.
To add the file NEWOBJ.OBJ to the existing library MYLIB.LIB you would use
the plus sign (+) as follows:
lib mylib +newobj;
And to update the library using a newer version of an object already
present in the library you would instead use this:
lib mylib -+d:\newstuff\anyobj;
As you can see, the combination operators use a sensible syntax. Here, you
are instructing LIB to first remove ANYOBJ.OBJ from MYLIB.LIB, and then add
a newer version in its place. A drive and directory are given just to show
that it is possible, and how that would be specified.
To extract a copy of an object file from a library, use the asterisk
(*) command. Again, you can specify a directory in which the extracted
file is to be placed, as follows:
lib mylib *\objdir\thisobj;
You should understand that LIB never actually modifies an existing library.
Rather, it first renames the original library to have a .BAK extension, and
then creates and modifies a new file using the original name. It is up to
you to delete the backup copy once you are certain that the new library is
correct. [But this backup is made only if you do not specify a new output
library name--NEWLIB in the earlier syntax example.]
If the named library does not exist, LIB asks if you want to create
it. This gives you a chance to abort the process if you accidentally typed
the wrong name. If you really do want to create a new library, simply
answer Y (Yes) at the prompt. Of course, the only thing you can do to a
non-existent library is add new objects to it with the plus (+) command.
One important LIB feature is its ability to create a list file showing
what routines are present in the library. This is particularly valuable if
you are managing a library you did not create, such as a library purchased
from a third-party vendor. Many vendors use the same name for the object
file as the routine it contains when possible, but there are exceptions.
For example, an object file name is limited to eight characters, even
though procedure names can be as long as 40. If you want to know which
object file contains the procedure ReadDirectories, you will need to create
a list file. Also, one object file can hold multiple procedures, and it is
not always obvious which procedure is in which file. Individual procedures
cannot necessarily be extracted from a library--only entire object files.
To create a library list file you will run LIB giving the name of the
library, as well as the name of a list file to create. The example below
creates a list file named MYLIST.LST for the library named MYLIB.LIB:
lib mylib , mylist.lst;
The list file that is created contains two cross-referenced tables; one
shows each object name and the procedures it contains, and the other shows
the procedure names and which object they are in. A typical list file is
shown in Figure 5-5, using the QB.LIB file that comes with QuickBASIC 4.5
as an example.
ABSOLUTE..........absolute          INT86OLD..........int86old
INT86XOLD.........int86old          INTERRUPT.........intrpt
INTERRUPTX........intrpt

absolute          Offset: 00000010H  Code and data size: cH
  ABSOLUTE

intrpt            Offset: 000000e0H  Code and data size: 107H
  INTERRUPT         INTERRUPTX

int86old          Offset: 000002a0H  Code and data size: 11eH
  INT86OLD          INT86XOLD
Figure 5-5: The format of a LIB list file.
In this list file, each object module contains only one or two procedures.
The first section shows each procedure name in upper case, followed by the
object name in lower case. The second section shows each object file name,
its offset within the library and size in bytes, and the routine names
within that object file.
Just for fun, you should create a list file from one of the libraries
that came with your compiler. Besides showing how a large listing is
structured, you will also be able to see which statements are combined with
others in the same object file. Thus, you can determine the granularity of
these libraries. In many cases the names of the procedures are similar to
the corresponding BASIC keywords.
For example, if you create a list file for the BCOM45.LIB library that
comes with QuickBASIC 4.5, you will see an object file named STRFCN.OBJ
(string function) that contains the procedures B$FASC, B$FLEN, B$FMID,
B$INS2, B$INS3, B$LCAS, B$LEFT, and several other string functions. Most
of the library routines start with the characters B$, which ensures that
the names will not conflict with procedure names you are using. (A dollar
sign is illegal in a BASIC procedure name.) Other procedures (and data
items) use an embedded underscore (_) which is also illegal in BASIC.
FASC stands for Function ASC, FLEN is for Function LEN, and so forth.
INS2 and INS3 contain the code to handle BASIC's INSTR function, with the
first being the two-argument version and the second the three-argument
version. That is, using INSTR(Work$, Substring$) calls B$INS2, and
INSTR(Start, Work$, Substring$) instead calls B$INS3. As you can see, most
of the internal procedure names are sensible, albeit somewhat abbreviated.
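The two forms of INSTR look like this in source code. According to the
naming just described, the first PRINT statement would result in a call to
B$INS2 and the second in a call to B$INS3; the test string is arbitrary.

```basic
DEFINT A-Z
Work$ = "HELLO WORLD"
PRINT INSTR(Work$, "O")          'two arguments: displays 5
PRINT INSTR(6, Work$, "O")       'three arguments: displays 8
```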
LIB OPTIONS
Many LIB options are frankly not that useful to purely BASIC programming.
However, I will list them here in the interest of completeness. Note that
none of these option switches are available in versions of LIB prior to the
one that comes with BASIC 7.0.
/HELP
As with the LINK switch of the same name, using /help (or /?) tells LIB to
display its command syntax, and a list of all the available options.
/I
Using /i means that LIB should ignore capitalization when searching the
library for procedure names. This is the default for LIB, and is not
necessary unless you are manipulating an existing library that was created
with /noi (see below).
/NOE
The /noe option has a meaning similar to that of its LINK counterpart, and
should be used if LIB reports an Out of memory error. Creating an extended
dictionary requires memory, and using /noe will avoid that.
/NOI
The /noi switch tells LIB not to ignore capitalization, and it should not
be used with BASIC programs.
/NOLOGO
Like the LINK option, /nologo reduces screen clutter by eliminating the
sign-on logo and copyright display.
/PA
The /pa: option lets you change the default library page size of 16 bytes.
Larger values waste space, because each object file always occupies a whole
number of pages. For example, with a page size of 256 bytes, a 50 byte
object file will require an entire 256-byte page. Since a library can hold
no more than 65,536 pages, a larger page size is useful only when you need
to create a library larger than 1 megabyte. The /pa: switch requires a
colon, followed by an integer power of 2 between 16 and 32768. For
example, using /pa:256 sets a page size of 256 bytes.
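The rounding involved is easy to work out in BASIC. The sketch below uses a
hypothetical SpaceUsed& function to round an object size up to the next
whole page, and it ignores any additional overhead LIB may add for each
module, so treat the results as approximate.

```basic
' Sketch: space an object occupies in a library for a given page size.
' SpaceUsed& is a hypothetical helper written for this illustration.
DEFINT A-Z
DECLARE FUNCTION SpaceUsed& (ObjSize&, PageSize&)

PRINT SpaceUsed&(50, 16)         'displays 64 (four 16-byte pages)
PRINT SpaceUsed&(50, 256)        'displays 256 (one 256-byte page)
PRINT 65536 * 16                 'displays 1048576, the 1 megabyte limit
END

FUNCTION SpaceUsed& (ObjSize&, PageSize&)
  'round up to the next whole number of pages
  SpaceUsed& = ((ObjSize& + PageSize& - 1) \ PageSize&) * PageSize&
END FUNCTION
```

The last PRINT statement shows where the 1 megabyte figure comes from:
65,536 pages at the default size of 16 bytes each.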
USING RESPONSE FILES WITH LIB.EXE
A LIB response file is similar to a LINK response file, in that it lets you
specify a large number of operations by entering them on separate lines of
a text file. The syntax is similar to a LINK response file, but it is not
identical. Since the plus sign continuation character that LINK uses
serves as a command character to LIB, an ampersand (&) is used instead. A
typical LIB response file is shown below.
+ object1 &
+ \subdir\object2 &
+ c:\subdir2\object3 &
+ object4 ;
As with LINK, you will use an at sign (@) to tell LIB to look in the file
for its input, as opposed to reading the names from the command line:
lib @filename.rsp
USEFUL BC, LINK, AND LIB ENVIRONMENT PARAMETERS
===============================================
Most programmers are familiar with the DOS environment as a way to
establish PATH and PROMPT variables. The PATH environment variable tells
DOS where to search for executable program files it doesn't find in the
current directory. The PROMPT variable specifies a new prompt that DOS
displays at the command line. For example, many people use the command
SET PROMPT=$P$G
to show the current drive and directory. However, the DOS environment can
be used to hold other, more general information as well.
The environment is simply an area of memory that DOS maintains to hold
variables you have assigned. Some of these variables are used by DOS, such
as the PATH and PROMPT settings. Other variables may be defined by you or
your programs, to hold any type of information. For example, you could
enter SET USERNAME=TAMI in the AUTOEXEC.BAT file, and a program could read
that to know the name of the person who is using it. The contents of this
variable (TAMI) could then be used as a file or directory name, or for any
other purpose.
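A BASIC program reads an environment variable with the ENVIRON$ function.
The following is a minimal sketch that assumes USERNAME was set as
described; the .DAT file name it builds is purely hypothetical.

```basic
' Sketch: read the USERNAME variable and build a file name from it.
UserName$ = ENVIRON$("USERNAME") 'returns a null string if not defined
IF LEN(UserName$) THEN
  PRINT "Hello, "; UserName$
  FileName$ = LEFT$(UserName$, 8) + ".DAT"   'hypothetical per-user file
  PRINT "Settings file: "; FileName$
ELSE
  PRINT "USERNAME is not defined in the environment"
END IF
```

LEFT$ limits the name to eight characters so the result is always a legal
DOS file name, even when the variable holds a longer string.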
LINK looks at the DOS environment to see if you have specified LINK=
or LIB= or TMP= variables. The first is used to specify default option
switches. For example, if you set LINK=/SEG:450 from the DOS command line
or a batch file, you do not need to use that option each time LINK is run.
Multiple options may be included in a single SET statement, by listing each
in succession. The command SET LINK=/NOE/NOD/EX establishes those three
options as the defaults. Additional separating spaces may also be
included; however, that is unnecessary and wastes environment memory.
Likewise, setting LIB=D:\LIBDIR\ tells LINK to look in the LIBDIR
directory of drive D: for any libraries it cannot find in the current
directory. In this case, LIB= acts as a sort of PATH command. Like PATH,
the LIB= variable accepts multiple path names with or without drive
letters, and each is separated by a semicolon. The command
SET LIB=C:\LIBS\;D:\WORKDIR\
sets a library path to both C:\LIBS and D:\WORKDIR, and even more
directories could be added if needed. To remove an environment variable
simply assign it to a null value; in this case you would use SET LIB=.
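To see which directories a LIB= setting actually names, you can take the
string apart at each semicolon. The sketch below only illustrates the
format of the variable; it is not a description of how LINK searches
internally.

```basic
' Sketch: display each directory named in the LIB= variable, in order.
DEFINT A-Z
Path$ = ENVIRON$("LIB")
IF LEN(Path$) = 0 THEN PRINT "LIB is not defined"
DO WHILE LEN(Path$)
  Semi = INSTR(Path$, ";")
  IF Semi = 0 THEN Semi = LEN(Path$) + 1     'last (or only) entry
  OneDir$ = LEFT$(Path$, Semi - 1)
  IF LEN(OneDir$) THEN PRINT OneDir$         'skip empty entries
  Path$ = MID$(Path$, Semi + 1)              'keep what follows the ;
LOOP
```

With the SET LIB= command shown above in effect, this displays C:\LIBS\
and D:\WORKDIR\ on separate lines.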
The TMP= variable also specifies a path that tells LINK where to write
any temporary files. When a very large program or Quick Library is being
created, it is possible for LINK to run out of memory. Rather than abort
with an error message, LINK will open a temporary disk file and spool the
excess data to that file. If no TMP= variable has been defined, that file
is created in the current directory. However, if you have a RAM disk you
can specify that as the TMP parameter, to speed up the linking process.
For example, SET TMP=F:\ establishes the root directory of drive F as the
temporary directory.
The INCLUDE= variable is recognized by both BC and MASM (the Microsoft
Macro Assembler program), to specify where they should look for Include
files. In my own programming, I prefer to give an explicit directory name
as part of the $INCLUDE metacommand. This avoids unpleasant surprises when
an obsolete version of a file is accidentally included. But you may also
store all $INCLUDE files in a single directory, and then set the INCLUDE
variable to show where that directory is. Like LIB and PATH, the INCLUDE
variable accepts one or more directory names separated by semicolons.
SUMMARY
=======
In this chapter you have learned about compiling and linking manually from
the DOS command line, to avoid the limitations imposed by the automated
menus in the BASIC editor. You have also learned how to create and
maintain both Quick Libraries and conventional .LIB libraries. Besides
accepting information you enter at the DOS command line, LINK and LIB can
also process instructions and file names contained in a response file.
All of the commands and option switches available with BC, LINK, and
LIB were described in detail, along with a listing of the undocumented BC
metacommands for controlling the format of a compiler list file. Library
list files were also discussed, and a sample printout was given showing how
LIB shows all the procedure and object names in a library cross-referenced
alphabetically.
The discussion about stub files explained what they are and how to use
them, to reduce the size of your programs. Overlays were also covered,
accompanied by some reasons you will find them useful along with specific
linking instructions.
Finally, I explained some of the details of the linking process.
Information in each object file header tells LINK the names of external
procedures being called, and where in the object file the incomplete
addresses are located. Besides the segment and address fixups that LINK
performs, DOS also makes some last-minute patches to your program as it is
loaded into memory.
In the next chapter I will cover file handling in similar detail,
explaining how files are manipulated at a low level, and also offering
numerous tips for achieving high performance and small program size.