home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
rtsi.com
/
2014.01.www.rtsi.com.tar
/
www.rtsi.com
/
OS9
/
TOP
/
USR
/
DOC
/
m4.doc
< prev
next >
Wrap
Text File
|
2009-11-06
|
20KB
|
1,386 lines
The M4 Macro Processor
Brian W. Kernighan
Dennis M. Ritchie
_A_B_S_T_R_A_C_T
M4 is a macro processor available on UNIX|-
and GCOS. Its primary use has been as a front end
for Ratfor for those cases where parameterless
macros are not adequately powerful. It has also
been used for languages as disparate as C and
Cobol. M4 is particularly suited for functional
languages like Fortran, PL/I and C since macros
are specified in a functional notation.
M4 provides features seldom found even in
much larger macro processors, including
o+ arguments
o+ condition testing
o+ arithmetic capabilities
- ii -
o+ string and substring functions
o+ file manipulation
This paper is a user's manual for M4.
July 1, 1977
-------------------------
|- UNIX is a trademark of Bell Laboratories.
The M4 Macro Processor
Brian W. Kernighan
Dennis M. Ritchie
_I_n_t_r_o_d_u_c_t_i_o_n
A macro processor is a useful way to enhance a program-
ming language, to make it more palatable or more readable,
or to tailor it to a particular application. The #define
statement in C and the analogous define in Ratfor are exam-
ples of the basic facility provided by any macro processor -
replacement of text by other text.
The M4 macro processor is an extension of a macro pro-
cessor called M3 which was written by D. M. Ritchie for the
AP-3 minicomputer; M3 was in turn based on a macro processor
implemented for [1]. Readers unfamiliar with the basic
ideas of macro processing may wish to read some of the dis-
cussion there.
M4 is a suitable front end for Ratfor and C, and has
also been used successfully with Cobol. Besides the
straightforward replacement of one string of text by
- 2 -
another, it provides macros with arguments, conditional
macro expansion, arithmetic, file manipulation, and some
specialized string processing functions.
The basic operation of M4 is to copy its input to its
output. As the input is read, however, each alphanumeric
``token'' (that is, string of letters and digits) is
checked. If it is the name of a macro, then the name of the
macro is replaced by its defining text, and the resulting
string is pushed back onto the input to be rescanned. Mac-
ros may be called with arguments, in which case the argu-
ments are collected and substituted into the right places in
the defining text before it is rescanned.
M4 provides a collection of about twenty built-in mac-
ros which perform various useful operations; in addition,
the user can define new macros. Built-ins and user-defined
macros work exactly the same way, except that some of the
built-in macros have side effects on the state of the pro-
cess.
_U_s_a_g_e
On UNIX, use
m4 [files]
Each argument file is processed in order; if there are no
arguments, or if an argument is `-', the standard input is
read at that point. The processed text is written on the
- 3 -
standard output, which may be captured for subsequent pro-
cessing with
m4 [files] >outputfile
On GCOS, usage is identical, but the program is called ./m4.
_D_e_f_i_n_i_n_g _M_a_c_r_o_s
The primary built-in function of M4 is define, which is
used to define new macros. The input
define(name, stuff)
causes the string name to be defined as stuff. All subse-
quent occurrences of name will be replaced by stuff. name
must be alphanumeric and must begin with a letter (the
underscore _ counts as a letter). stuff is any text that
contains balanced parentheses; it may stretch over multiple
lines.
Thus, as a typical example,
define(N, 100)
...
if (i > N)
defines N to be 100, and uses this ``symbolic constant'' in
a later if statement.
The left parenthesis must immediately follow the word
define, to signal that define has arguments. If a macro or
- 4 -
built-in name is not followed immediately by `(', it is
assumed to have no arguments. This is the situation for N
above; it is actually a macro with no arguments, and thus
when it is used there need be no (...) following it.
You should also notice that a macro name is only recog-
nized as such if it appears surrounded by non-alphanumerics.
For example, in
define(N, 100)
...
if (NNN > 100)
the variable NNN is absolutely unrelated to the defined
macro N, even though it contains a lot of N'_s.
Things may be defined in terms of other things. For
example,
define(N, 100)
define(M, N)
defines both M and N to be 100.
What happens if N is redefined? Or, to say it another
way, is M defined as N or as 100? In M4, the latter is true
- M is 100, so even if N subsequently changes, M does not.
This behavior arises because M4 expands macro names
into their defining text as soon as it possibly can. Here,
that means that when the string N is seen as the arguments
- 5 -
of define are being collected, it is immediately replaced by
100; it's just as if you had said
define(M, 100)
in the first place.
If this isn't what you really want, there are two ways
out of it. The first, which is specific to this situation,
is to interchange the order of the definitions:
define(M, N)
define(N, 100)
Now M is defined to be the string N, so when you ask for M
later, you'll always get the value of N at that time
(because the M will be replaced by N which will be replaced
by 100).
_Q_u_o_t_i_n_g
The more general solution is to delay the expansion of
the arguments of define by _q_u_o_t_i_n_g them. Any text sur-
rounded by the single quotes ` and ' is not expanded immedi-
ately, but has the quotes stripped off. If you say
define(N, 100)
define(M, `N')
the quotes around the N are stripped off as the argument is
being collected, but they have served their purpose, and M
is defined as the string N, not 100. The general rule is
- 6 -
that M4 always strips off one level of single quotes when-
ever it evaluates something. This is true even outside of
macros. If you want the word define to appear in the out-
put, you have to quote it in the input, as in
`define' = 1;
As another instance of the same thing, which is a bit
more surprising, consider redefining N:
define(N, 100)
...
define(N, 200)
Perhaps regrettably, the N in the second definition is
evaluated as soon as it's seen; that is, it is replaced by
100, so it's as if you had written
define(100, 200)
This statement is ignored by M4, since you can only define
things that look like names, but it obviously doesn't have
the effect you wanted. To really redefine N, you must delay
the evaluation by quoting:
define(N, 100)
...
define(`N', 200)
In M4, it is often wise to quote the first argument of a
macro.
- 7 -
If ` and ' are not convenient for some reason, the
quote characters can be changed with the built-in change-
quote:
changequote([, ])
makes the new quote characters the left and right brackets.
You can restore the original characters with just
changequote
There are two additional built-ins related to define.
undefine removes the definition of some macro or built-in:
undefine(`N')
removes the definition of N. (Why are the quotes absolutely
necessary?) Built-ins can be removed with undefine, as in
undefine(`define')
but once you remove one, you can never get it back.
The built-in ifdef provides a way to determine if a
macro is currently defined. In particular, M4 has pre-
defined the names unix and gcos on the corresponding sys-
tems, so you can tell which one you're using:
ifdef(`unix', `define(wordsize,16)' )
ifdef(`gcos', `define(wordsize,36)' )
makes a definition appropriate for the particular machine.
- 8 -
Don't forget the quotes!
ifdef actually permits three arguments; if the name is
undefined, the value of ifdef is then the third argument, as
in
ifdef(`unix', on UNIX, not on UNIX)
_A_r_g_u_m_e_n_t_s
So far we have discussed the simplest form of macro
processing - replacing one string by another (fixed) string.
User-defined macros may also have arguments, so different
invocations can have different results. Within the replace-
ment text for a macro (the second argument of its define)
any occurrence of $n will be replaced by the n_t_h argument
when the macro is actually used. Thus, the macro bump,
defined as
define(bump, $1 = $1 + 1)
generates code to increment its argument by 1:
bump(x)
is
x = x + 1
A macro can have as many arguments as you want, but
only the first nine are accessible, through $1 to $9. (The
- 9 -
macro name itself is $0, although that is less commonly
used.) Arguments that are not supplied are replaced by null
strings, so we can define a macro cat which simply concaten-
ates its arguments, like this:
define(cat, $1$2$3$4$5$6$7$8$9)
Thus
cat(x, y, z)
is equivalent to
xyz
$4 through $9 are null, since no corresponding arguments
were provided.
Leading unquoted blanks, tabs, or newlines that occur
during argument collection are discarded. All other white
space is retained. Thus
define(a, b c)
defines a to be b c.
Arguments are separated by commas, but parentheses are
counted properly, so a comma ``protected'' by parentheses
does not terminate an argument. That is, in
define(a, (b,c))
there are only two arguments; the second is literally (b,c).
- 10 -
And of course a bare comma or parenthesis can be inserted by
quoting it.
_A_r_i_t_h_m_e_t_i_c _B_u_i_l_t-_i_n_s
M4 provides two built-in functions for doing arithmetic
on integers (only). The simplest is incr, which increments
its numeric argument by 1. Thus to handle the common pro-
gramming situation where you want a variable to be defined
as ``one more than N'', write
define(N, 100)
define(N1, `incr(N)')
Then N1 is defined as one more than the current value of N.
The more general mechanism for arithmetic is a built-in
called eval, which is capable of arbitrary arithmetic on
integers. It provides the operators (in decreasing order of
precedence)
unary + and -
** or ^ (exponentiation)
* / % (modulus)
+ -
== != < <= > >=
! (not)
& or && (logical and)
| or || (logical or)
Parentheses may be used to group operations where needed.
- 11 -
All the operands of an expression given to eval must ulti-
mately be numeric. The numeric value of a true relation
(like 1>0) is 1, and false is 0. The precision in eval is
32 bits on UNIX and 36 bits on GCOS.
As a simple example, suppose we want M to be 2**N+1.
Then
define(N, 3)
define(M, `eval(2**N+1)')
As a matter of principle, it is advisable to quote the
defining text for a macro unless it is very simple indeed
(say just a number); it usually gives the result you want,
and is a good habit to get into.
_F_i_l_e _M_a_n_i_p_u_l_a_t_i_o_n
You can include a new file in the input at any time by
the built-in function include:
include(filename)
inserts the contents of filename in place of the include
command. The contents of the file is often a set of defini-
tions. The value of include (that is, its replacement text)
is the contents of the file; this can be captured in defini-
tions, etc.
It is a fatal error if the file named in include cannot
be accessed. To get some control over this situation, the
- 12 -
alternate form sinclude can be used; sinclude (``silent
include'') says nothing and continues if it can't access the
file.
It is also possible to divert the output of M4 to tem-
porary files during processing, and output the collected
material upon command. M4 maintains nine of these diver-
sions, numbered 1 through 9. If you say
divert(n)
all subsequent output is put onto the end of a temporary
file referred to as n. Diverting to this file is stopped by
another divert command; in particular, divert or divert(0)
resumes the normal output process.
Diverted text is normally output all at once at the end
of processing, with the diversions output in numeric order.
It is possible, however, to bring back diversions at any
time, that is, to append them to the current diversion.
undivert
brings back all diversions in numeric order, and undivert
with arguments brings back the selected diversions in the
order given. The act of undiverting discards the diverted
stuff, as does diverting into a diversion whose number is
not between 0 and 9 inclusive.
The value of undivert is _n_o_t the diverted stuff.
Furthermore, the diverted material is _n_o_t rescanned for
- 13 -
macros.
The built-in divnum returns the number of the currently
active diversion. This is zero during normal processing.
_S_y_s_t_e_m _C_o_m_m_a_n_d
You can run any program in the local operating system
with the syscmd built-in. For example,
syscmd(date)
on UNIX runs the date command. Normally syscmd would be
used to create a file for a subsequent include.
To facilitate making unique file names, the built-in
maketemp is provided, with specifications identical to the
system function _m_k_t_e_m_p: a string of XXXXX in the argument is
replaced by the process id of the current process.
_C_o_n_d_i_t_i_o_n_a_l_s
There is a built-in called ifelse which enables you to
perform arbitrary conditional testing. In the simplest
form,
ifelse(a, b, c, d)
compares the two strings a and b. If these are identical,
ifelse returns the string c; otherwise it returns d. Thus
we might define a macro called compare which compares two
strings and returns ``yes'' or ``no'' if they are the same
- 14 -
or different.
define(compare, `ifelse($1, $2, yes, no)')
Note the quotes, which prevent too-early evaluation of
ifelse.
If the fourth argument is missing, it is treated as
empty.
ifelse can actually have any number of arguments, and
thus provides a limited form of multi-way decision capabil-
ity. In the input
ifelse(a, b, c, d, e, f, g)
if the string a matches the string b, the result is c. Oth-
erwise, if d is the same as e, the result is f. Otherwise
the result is g. If the final argument is omitted, the
result is null, so
ifelse(a, b, c)
is c if a matches b, and null otherwise.
_S_t_r_i_n_g _M_a_n_i_p_u_l_a_t_i_o_n
The built-in len returns the length of the string that
makes up its argument. Thus
len(abcdef)
is 6, and len((a,b)) is 5.
- 15 -
The built-in substr can be used to produce substrings
of strings. substr(s, i, n) returns the substring of s that
starts at the i_t_h position (origin zero), and is n charac-
ters long. If n is omitted, the rest of the string is
returned, so
substr(`now is the time', 1)
is
ow is the time
If i or n are out of range, various sensible things happen.
index(s1, s2) returns the index (position) in s1 where
the string s2 occurs, or -1 if it doesn't occur. As with
substr, the origin for strings is 0.
The built-in translit performs character translitera-
tion.
translit(s, f, t)
modifies s by replacing any character found in f by the
corresponding character of t. That is,
translit(s, aeiou, 12345)
replaces the vowels by the corresponding digits. If t is
shorter than f, characters which don't have an entry in t
are deleted; as a limiting case, if t is not present at all,
characters from f are deleted from s. So
- 16 -
translit(s, aeiou)
deletes vowels from s.
There is also a built-in called dnl which deletes all
characters that follow it up to and including the next new-
line; it is useful mainly for throwing away empty lines that
otherwise tend to clutter up M4 output. For example, if you
say
define(N, 100)
define(M, 200)
define(L, 300)
the newline at the end of each line is not part of the
definition, so it is copied into the output, where it may
not be wanted. If you add dnl to each of these lines, the
newlines will disappear.
Another way to achieve this, due to J. E. Weythman, is
divert(-1)
define(...)
...
divert
_P_r_i_n_t_i_n_g
The built-in errprint writes its arguments out on the
standard error file. Thus you can say
- 17 -
errprint(`fatal error')
dumpdef is a debugging aid which dumps the current
definitions of defined terms. If there are no arguments,
you get everything; otherwise you get the ones you name as
arguments. Don't forget to quote the names!
_S_u_m_m_a_r_y _o_f _B_u_i_l_t-_i_n_s
Each entry is preceded by the page number where it is
described.
- 18 -
3 changequote(L, R)
1 define(name, replacement)
4 divert(number)
4 divnum
5 dnl
5 dumpdef(`name', `name', ...)
5 errprint(s, s, ...)
4 eval(numeric expression)
3 ifdef(`name', this if true, this if false)
5 ifelse(a, b, c, d)
4 include(file)
3 incr(number)
5 index(s1, s2)
5 len(string)
4 maketemp(...XXXXX...)
4 sinclude(file)
5 substr(string, position, number)
4 syscmd(s)
5 translit(str, from, to)
3 undefine(`name')
4 undivert(number,number,...)
_A_c_k_n_o_w_l_e_d_g_e_m_e_n_t_s
We are indebted to Rick Becker, John Chambers, Doug
McIlroy, and especially Jim Weythman, whose pioneering use
of M4 has led to several valuable improvements. We are also
- 19 -
deeply grateful to Weythman for several substantial contri-
butions to the code.
_R_e_f_e_r_e_n_c_e_s
[1] B. W. Kernighan and P. J. Plauger, _S_o_f_t_w_a_r_e _T_o_o_l_s,
Addison-Wesley, Inc., 1976.