Developer Documentation



Pitfalls and Subtleties of Macros

This section describes some special rules that apply to macros and macro expansion, and points out certain cases in which the rules have counterintuitive consequences that you must watch out for.

Improperly Nested Constructs

Recall that when a macro is invoked with arguments, the arguments are substituted into the macro body and the result is checked, together with the rest of the input file, for more macros.

It's possible to piece together a macro invocation coming partially from the macro body and partially from the arguments. For example,

#define double(x) (2*(x))
#define call_with_1(x) x(1)

would expand call_with_1 (double) into (2*(1)).
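The following is a minimal sketch of the same effect in compilable form. The macro is named twice rather than double (an illustrative renaming, because double is also a C keyword), and nested_call and check_nested_call are hypothetical helpers that do not appear in the text above.

```c
#include <assert.h>

/* Illustrative names: `twice` stands in for the `double` macro in the text. */
#define twice(x) (2*(x))
#define call_with_1(f) f(1)

/* The argument `twice` and the `(1)` from the macro body are joined
   when the expansion is rescanned, forming the invocation twice(1). */
int nested_call(void)
{
    return call_with_1(twice);   /* expands to (2*(1)) */
}

void check_nested_call(void)
{
    assert(nested_call() == 2);
}
```

Note that twice is not expanded while it is still a bare argument, because a function-like macro name is only an invocation when a left parenthesis follows it.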

Macro definitions don't have to have balanced parentheses. By writing an unbalanced left parenthesis in a macro body, it's possible to create a macro invocation that begins inside the macro body but ends outside it. For example:

#define strange(file) fprintf (file, "%s %d",
. . .
strange(stderr) p, 35)

This bizarre example expands to

fprintf (stderr, "%s %d", p, 35)
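As a rough illustration, the sketch below uses a variant of strange that formats into a buffer with snprintf, so the result can be inspected. The two-parameter form of the macro and the helpers format_pair and check_format_pair are assumptions made for this example, not part of the original.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical variant of `strange` that writes into a buffer.
   The macro body opens the call; the text after the invocation closes it. */
#define strange(buf, size) snprintf (buf, size, "%s %d",

int format_pair(char *buf, size_t size, const char *p)
{
    /* Expands to: snprintf (buf, size, "%s %d", p, 35) */
    return strange(buf, size) p, 35);
}

void check_format_pair(void)
{
    char buf[16];
    int written = format_pair(buf, sizeof buf, "ab");

    assert(written == 5);
    assert(strcmp(buf, "ab 35") == 0);
}
```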

Unintended Grouping of Arithmetic

You may have noticed that in most of the macro definition examples shown above, each occurrence of a macro argument name has parentheses around it. In addition, another pair of parentheses usually surrounds the entire macro definition. This section discusses why it's best to write macros that way.

Suppose you define a macro

#define ceil_div(x, y) (x + y - 1) / y

whose purpose is to divide, rounding up. (One use for this operation is to compute how many int objects are needed to hold a certain number of char objects.) Then suppose it's used as follows:

a = ceil_div (b & c, sizeof (int));

This expands into

a = (b & c + sizeof (int) - 1) / sizeof (int);

which doesn't do what's intended. The operator-precedence rules of C make this equivalent to:

a = (b & (c + sizeof (int) - 1)) / sizeof (int);

But what we want is:

a = ((b & c) + sizeof (int) - 1) / sizeof (int);

Defining the macro as follows provides the desired result:

#define ceil_div(x, y) ((x) + (y) - 1) / (y)

However, unintended grouping can happen in another way. Consider sizeof ceil_div(1, 2). This has the appearance of a C expression that would compute the size of the type of ceil_div (1, 2), but in fact it means something very different. Here's what it expands to:

sizeof ((1) + (2) - 1) / (2)

This would take the size of an integer and divide it by 2. The precedence rules have put the division outside the sizeof() when it was intended to be inside.

Parentheses around the entire macro definition can prevent such problems. Here's the recommended way to define ceil_div:

#define ceil_div(x, y) (((x) + (y) - 1) / (y))
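The three definitions can be compared side by side in a small sketch. The suffixed names ceil_div_v1 and ceil_div_v2, the divisor 4, and the helper functions are assumptions introduced so that all variants can coexist in one file. A compiler may warn about the first definition's expansion, which is the point of the example.

```c
#include <assert.h>
#include <stddef.h>

/* The three definitions from the text, renamed so they can coexist. */
#define ceil_div_v1(x, y) (x + y - 1) / y            /* no parentheses  */
#define ceil_div_v2(x, y) ((x) + (y) - 1) / (y)      /* arguments only  */
#define ceil_div(x, y)    (((x) + (y) - 1) / (y))    /* recommended     */

/* Expands to (b & c + 4 - 1) / 4, which groups as (b & (c + 4 - 1)) / 4. */
int grouped_wrong(int b, int c) { return ceil_div_v1(b & c, 4); }

/* Expands to (((b & c) + (4) - 1) / (4)). */
int grouped_right(int b, int c) { return ceil_div(b & c, 4); }

/* Expands to sizeof ((1) + (2) - 1) / (2): the division is outside sizeof. */
size_t size_wrong(void) { return sizeof ceil_div_v2(1, 2); }

/* Expands to sizeof (((1) + (2) - 1) / (2)): the division is inside sizeof. */
size_t size_right(void) { return sizeof ceil_div(1, 2); }

void check_ceil_div(void)
{
    assert(grouped_wrong(3, 3) == 0);   /* (3 & 6) / 4       */
    assert(grouped_right(3, 3) == 1);   /* ((3 & 3) + 3) / 4 */
    assert(size_wrong() == sizeof(int) / 2);
    assert(size_right() == sizeof(int));
}
```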

Swallowing the Semicolon

Often it's desirable to define a macro that expands into a compound statement. Consider, for example, the following macro, which advances a pointer across space characters:

#define SKIP_SPACES(p, limit) \
{ register char *lim = (limit); \
while (p != lim) { \
if (*p++ != ' ') { \
p--; break; }}}

Here backslash-newline is used to split the macro definition, which must be a single line, so that it resembles the way such C code would appear if not part of a macro definition.

An invocation of this macro might be SKIP_SPACES (p, lim). Strictly speaking, the invocation expands to a compound statement, which is a complete statement with no need for a semicolon to end it. But it looks like a function call. So it minimizes confusion if you can use it like a function call, writing a semicolon afterward:

SKIP_SPACES (p, lim);

But this can cause trouble before else statements, because the semicolon is actually a null statement. Suppose you write

if (*p != 0)
SKIP_SPACES (p, lim);
else . . .

The presence of two statements--the compound statement and a null statement--in between the if condition and the else makes invalid C code.

The definition of the macro SKIP_SPACES can be altered to solve this problem, using a do ... while statement:

#define SKIP_SPACES(p, limit) \
do { register char *lim = (limit); \
while (p != lim) { \
if (*p++ != ' ') { \
p--; break; }}} \
while (0)

Now SKIP_SPACES (p, lim); expands into one statement:

do {. . .} while (0);
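A short sketch of the do ... while form used before an else follows. The function skip_leading_spaces, its check function, and the use of const char * are assumptions for illustration. The second argument is named end here so that it cannot collide with the macro's own local variable lim.

```c
#include <assert.h>
#include <stddef.h>

#define SKIP_SPACES(p, limit) \
do { const char *lim = (limit); \
     while (p != lim) { \
       if (*p++ != ' ') { \
         p--; break; }}} \
while (0)

/* Hypothetical helper: returns a pointer to the first non-space character
   of a non-empty string, or NULL when the string is empty.  The macro
   invocation plus its semicolon is one statement, so the else is legal. */
const char *skip_leading_spaces(const char *p, const char *end)
{
    if (*p != 0)
        SKIP_SPACES(p, end);
    else
        return NULL;
    return p;
}

void check_skip_leading_spaces(void)
{
    const char *text = "   abc";
    const char *empty = "";

    assert(skip_leading_spaces(text, text + 6) == text + 3);
    assert(skip_leading_spaces(empty, empty) == NULL);
}
```

With the plain compound-statement definition, the same function would fail to compile because of the null statement between the closing brace and the else.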

Duplication of Side Effects

Many C programs define a macro min (for "minimum"), like this:

#define min(X, Y) ((X) < (Y) ? (X) : (Y))

When you use this macro with an argument containing a side effect (as shown here)

next = min (x + y, foo (z));

it expands as follows:

next = ((x + y) < (foo (z)) ? (x + y) : (foo (z)));

where x + y has been substituted for X and foo (z) for Y.

The function foo is used only once in the statement as it appears in the program, but the expression foo (z) has been substituted twice into the macro expansion. As a result, foo might be called two times when the statement is executed. If it has side effects or if it takes a long time to compute, the results might not be what you intended. Therefore min is an "unsafe" macro.

One way to solve this problem is to define min in a way that computes the value of foo (z) only once. The C language offers no standard way to do this, but it can be done with GNU C extensions as follows:

#define min(X, Y) \
({ typeof (X) __x = (X), __y = (Y); \
(__x < __y) ? __x : __y; })

If you don't wish to use GNU C extensions, the only solution is to be careful when using the macro min. For example, you can calculate the value of foo (z), save it in a variable, and use that variable in min:

#define min(X, Y) ((X) < (Y) ? (X) : (Y))
. . .
{
int tem = foo (z);
next = min (x + y, tem);
}
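The duplicated evaluation can be observed directly by counting calls. In this sketch, which uses only standard C, the counter foo_calls, the body of foo, and the two wrapper functions are assumptions added for illustration.

```c
#include <assert.h>

#define min(X, Y) ((X) < (Y) ? (X) : (Y))

static int foo_calls;                /* counts invocations of foo */

static int foo(int z)
{
    foo_calls++;
    return z;
}

/* foo (z) appears twice in the expansion, so it runs twice
   whenever it turns out to be the smaller operand. */
int calls_when_inlined(int x, int y, int z)
{
    foo_calls = 0;
    int next = min(x + y, foo(z));
    (void)next;
    return foo_calls;
}

/* Saving the value first guarantees a single call. */
int calls_with_temporary(int x, int y, int z)
{
    foo_calls = 0;
    int tem = foo(z);
    int next = min(x + y, tem);
    (void)next;
    return foo_calls;
}

void check_min_side_effects(void)
{
    assert(calls_when_inlined(5, 5, 3) == 2);   /* 10 < 3 is false */
    assert(calls_when_inlined(1, 1, 3) == 1);   /* 2 < 3 is true   */
    assert(calls_with_temporary(5, 5, 3) == 1);
}
```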

Self-Referential Macros

A self-referential macro is one whose name appears in its definition. A special feature of ANSI-standard C is that the self-reference isn't considered a macro invocation. It's passed into the preprocessor output unchanged.

Consider the following example (assume that foo is also a variable in your program):

#define foo (4 + foo)

Following the ordinary rules, each reference to foo will expand into (4 + foo); then this will be rescanned and will expand into (4 + (4 + foo)); and so on until it causes a fatal error (memory full) in the preprocessor.

However, the special rule about self-reference cuts this process short after one step, at (4 + foo). Therefore, this macro definition has the possibly useful effect of causing the program to add 4 to the value of foo wherever foo is referred to.

In most cases, it's a bad idea to take advantage of this feature. A person reading the program who sees that foo is a variable won't expect that it's a macro as well. The reader will come across the identifier foo in the program and think its value should be that of the variable foo, whereas in fact the value is 4 greater.

The special rule for self-reference applies also to indirect self-reference. This is the case where a macro x expands to use a macro y, and y's expansion refers to the macro x. The resulting reference to x comes indirectly from the expansion of x, so it's a self-reference and isn't further expanded. Thus, after

#define x (4 + y)
#define y (2 * x)

x would expand into (4 + (2 * x)).

But suppose y is used elsewhere, not from the definition of x. Then the use of x in the expansion of y isn't a self-reference because x isn't in progress. So it does expand. However, the expansion of x contains a reference to y, and that's an indirect self-reference now because y is in progress. The result is that y expands to (2 * (4 + y)).
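The expansions described above can be checked by giving foo, x, and y concrete values. In this sketch the variables, their initial values, and the use_ functions are assumptions made for illustration. The variables are declared before the macros, so the declarations themselves are not affected.

```c
#include <assert.h>

/* Ordinary variables, declared before the macros of the same names. */
static int foo = 10;
static int x = 1;
static int y = 1;

#define foo (4 + foo)
#define x (4 + y)
#define y (2 * x)

int use_foo(void) { return foo; }   /* (4 + foo)     */
int use_x(void)   { return x; }     /* (4 + (2 * x)) */
int use_y(void)   { return y; }     /* (2 * (4 + y)) */

/* Remove the macros so later code sees only the variables. */
#undef foo
#undef x
#undef y

void check_self_reference(void)
{
    assert(use_foo() == 14);   /* 4 + 10      */
    assert(use_x() == 6);      /* 4 + 2 * 1   */
    assert(use_y() == 10);     /* 2 * (4 + 1) */
}
```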

Separate Expansion of Macro Arguments

We have explained that the expansion of a macro, including the substituted arguments, is scanned over again for macros to be expanded.

What really happens is more subtle: First each argument text is scanned separately for macros. Then the results of this are substituted into the macro body to produce the macro expansion, and the macro expansion is scanned again for macros to expand.

The result is that the arguments are scanned twice to expand macros in them.

Most of the time, this has no effect. If the argument contained any macros, they're expanded during the first scan. The result therefore contains no macros, so the second scan doesn't change it. If the argument were substituted as given, with no prescan, the single remaining scan would find the same macros and produce the same results.

You might expect the double scan to change the results when a self-referential macro is used in an argument of another macro (see the section "Self-Referential Macros" above); the self-referential macro would be expanded once in the first scan, and a second time in the second scan. But this isn't what happens. The self-references that don't expand in the first scan are marked so that they won't expand in the second scan either.

The prescan isn't done when an argument is stringified or concatenated. (More precisely, stringification and concatenation use the argument as written, in unprescanned form. The same argument would be used in prescanned form if it's substituted elsewhere without stringification or concatenation.) Thus,

#define str(s) #s
#define foo 4
str (foo)

expands to "foo". Once more, prescan has been prevented from having any noticeable effect.
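The parenthetical remark above, that one argument can be used both as written and in prescanned form, can be sketched with a macro that does both. The entry macro, the named_value structure, and the accessor functions are hypothetical and serve only to illustrate the point.

```c
#include <assert.h>
#include <string.h>

#define str(s) #s
#define foo 4

/* Hypothetical macro that uses its argument twice: once stringified
   (as written) and once substituted normally (after prescan). */
#define entry(s) { #s, s }

struct named_value {
    const char *name;
    int value;
};

static const struct named_value foo_entry = entry(foo);   /* { "foo", 4 } */

const char *stringified(void) { return str (foo); }       /* "foo" */
const char *entry_name(void)  { return foo_entry.name; }
int entry_value(void)         { return foo_entry.value; }

void check_stringify(void)
{
    assert(strcmp(stringified(), "foo") == 0);
    assert(strcmp(entry_name(), "foo") == 0);
    assert(entry_value() == 4);
}
```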

The prescan does make a difference in three special cases:

Nested invocations of a macro occur when a macro's argument contains an invocation of that very macro. For example, if f is a macro that expects one argument, f (f (1)) is a nested pair of invocations of f. The desired expansion is made by expanding f (1) and substituting that into the definition of f. The prescan causes the expected result to happen. Without the prescan, f (1) itself would be substituted as an argument, and the inner use of f would appear during the main scan as an indirect self-reference and wouldn't be expanded. Here, the prescan cancels an undesirable side effect of the special rule for self-referential macros.

But prescan causes trouble in certain other cases of nested macro calls. For example:

#define foo a,b
#define bar(x) lose(x)
#define lose(x) (1 + (x))

bar(foo)

We would like bar(foo) to turn into (1 + (foo)), which would then turn into (1 + (a,b)). But instead, bar(foo) expands into lose(a,b), and you get an error because lose requires a single argument. In this case, the problem is easily solved by the same parentheses that ought to be used to prevent misnesting of arithmetic operations:

#define foo (a,b)
#define bar(x) lose((x))

The problem is more serious when the operands of the macro aren't expressions (for example, when they are statements). Then parentheses are unacceptable because they would make for invalid C code:

#define foo { int a, b; ... }

In GNU C you can shield the commas using the ({ . . . }) construct, which turns a compound statement into an expression:

#define foo ({ int a, b; ... })

Or you can rewrite the macro definition to avoid such commas:

#define foo { int a; int b; ... }

There's also one case where prescan is useful. It's possible to use prescan to expand an argument and then stringify it--if you use two levels of macros. Let's add a new macro xstr to the example shown above:

#define xstr(s) str(s)
#define str(s) #s
#define foo 4
xstr (foo)

This expands to "4", not "foo". The reason for the difference is that the argument of xstr is expanded at prescan (because xstr doesn't specify stringification or concatenation of the argument). The result of prescan then forms the argument for str. str uses its argument without prescan because it performs stringification; but it can't prevent or undo the prescanning already done by xstr.
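The three prescan cases can be exercised together in one small sketch. The body chosen for f, the name pair (standing in for the comma-containing foo), and the helper functions are assumptions for illustration. A compiler may warn that the left operand of the comma expression has no effect, which is expected here.

```c
#include <assert.h>
#include <string.h>

/* Case 1: nested invocation of the same macro (hypothetical body). */
#define f(x) ((x) * 2)

int nested(void) { return f (f (1)); }     /* ((((1) * 2)) * 2) */

/* Case 2: a comma in an argument's expansion, shielded by parentheses.
   `pair` plays the role of the parenthesized `foo` in the text. */
#define pair (a,b)
#define bar(x) lose((x))
#define lose(x) (1 + (x))

int shielded(int a, int b) { return bar(pair); }   /* (1 + (((a,b)))) */

/* Case 3: two macro levels expand the argument before stringifying it. */
#define xstr(s) str(s)
#define str(s) #s
#define foo 4

const char *as_written(void) { return str (foo); }    /* "foo" */
const char *expanded(void)   { return xstr (foo); }   /* "4"   */

void check_prescan(void)
{
    assert(nested() == 4);
    assert(shielded(5, 7) == 8);           /* comma operator yields b */
    assert(strcmp(as_written(), "foo") == 0);
    assert(strcmp(expanded(), "4") == 0);
}
```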

Cascaded Use of Macros

A cascade of macros occurs when one macro's body contains a reference to another macro (a very common practice). For example:

#define BUFSIZE 1020
#define TABLESIZE BUFSIZE

This isn't at all the same as defining TABLESIZE to be 1020. The #define for TABLESIZE uses exactly the body you specify--in this case, BUFSIZE--and doesn't check to see whether it too is the name of a macro.

It's only when you use TABLESIZE that the result of its expansion is checked for more macro names.

This makes a difference if you change the definition of BUFSIZE at some point in the source file. TABLESIZE, defined as shown, will always expand using the definition of BUFSIZE that's currently in effect:

#define BUFSIZE 1020
#define TABLESIZE BUFSIZE
#undef BUFSIZE
#define BUFSIZE 37

Now TABLESIZE expands in two stages to 37.
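A minimal sketch showing that TABLESIZE follows whichever definition of BUFSIZE is in effect at the point of use. The two functions and the check are hypothetical helpers added for illustration.

```c
#include <assert.h>

#define BUFSIZE 1020
#define TABLESIZE BUFSIZE

/* Here TABLESIZE -> BUFSIZE -> 1020. */
int table_size_before(void) { return TABLESIZE; }

#undef BUFSIZE
#define BUFSIZE 37

/* TABLESIZE was not redefined, yet it now expands to 37. */
int table_size_after(void) { return TABLESIZE; }

void check_cascade(void)
{
    assert(table_size_before() == 1020);
    assert(table_size_after() == 37);
}
```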

Newlines in Macro Arguments

Traditional macro processing carries forward all newlines in macro arguments into the expansion of the macro. This means that, if some of the arguments are substituted more than once, or not at all, or are out of order, newlines can be duplicated, lost, or moved around within the expansion. If the expansion consists of multiple statements, then the line numbers of some of these statements can become distorted. The result can be incorrect line numbers in error messages or as displayed by a debugger.

The C preprocessor operating in ANSI C mode adjusts itself for multiple uses of an argument: the first use expands all the newlines, and subsequent uses of the same argument produce no newlines. But even in this mode, it can produce incorrect line numbering if arguments are used out of order, or are not used at all.

Here is an example illustrating this problem:

#define ignore_second_arg(a,b,c) a; c

ignore_second_arg (foo (),
                   ignored (),
                   syntax error);

The syntax error triggered by the tokens syntax error results in an error message citing line four, even though the statement text comes from line five.

Inability to Define a Macro that Produces a # Character

You can't use the C preprocessor to define macros that produce # characters. For instance, the following has unexpected results:

#define linkmacro(numBytes) link #numBytes,a6

Note that you can use the # character inside a string or character constant, as shown here:

#define PrintSharp() printf("#")

Macro Arguments inside String Constants

The C preprocessor doesn't substitute macro arguments that appear inside string constants. For example, the following macro will produce the output "a" no matter what the argument a is:

#define foo(a) "a"

The -traditional option directs cc to handle such cases (among others) in the traditional non-ANSI way.
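For contrast, the sketch below pairs the macro above with one that uses the stringification operator (#) described earlier in this section, which is the ANSI C way to get an argument's spelling into a string. The quote macro and the helper functions are assumptions for illustration.

```c
#include <assert.h>
#include <string.h>

#define foo(a) "a"       /* from the text: `a` inside the string is untouched */
#define quote(a) #a      /* hypothetical: stringifies the argument instead   */

const char *inside_string(void)   { return foo(hello); }     /* "a"     */
const char *stringified_arg(void) { return quote(hello); }   /* "hello" */

void check_string_constants(void)
{
    assert(strcmp(inside_string(), "a") == 0);
    assert(strcmp(stringified_arg(), "hello") == 0);
}
```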


The GNU C Preprocessor
