.pl 61
.po 4
..
.. this article was prepared for the C/Unix Programmer's Notebook
.. Copyright 1984 Pyramid Systems, Inc.
..
.. by A. Skjellum
..
.he "C/Unix Programmer's Notebook" for September, 1984 DDJ.
Introduction
.. okay
Past columns have been devoted to topics about C and Unix as they
exist in the real world. In this column, I depart from this precedent by
discussing some proposals for increasing the power and flexibility of C.
We work in an evolving field where change is inevitable. Consequently,
many languages go through regular upgrades and improvements (e.g., Fortran IV,
Fortran 66, and Fortran 77). C has been more immune to such revision than
other popular languages. The lack of upgrades is probably due mainly to C's
lack of intrinsic functions, but it also points to the basic elegance and
power of C.
.. okay
I am aware of only a few upgrades beyond the language definition
specified in The C Programming Language (K&R). These are enumerated types
and structure assignment. Both of these features were introduced with Unix
version 7 and are detailed in Unix v7 documentation. Structure assignment
is useful in what follows, but type enumeration will not be mentioned.
Therefore, while we base our definition on the Unix 7 version of C, this is
not likely to confuse those who have access only to The C Programming
Language.
.. okay
Purists may argue that I have no business recommending changes or
upgrades to C. Others may argue that many of the suggestions can be
implemented via compiler preprocessors or by function calls and need not be
part of the language. (This second point is discussed later in the
column.) In order to head off the criticism that I am 'tampering' with the
C language, I offer my recommendations as a new language grammar based on C
but called 'X.' The letter X was chosen to denote language extensibility,
which is the main point of the following proposals.
Language Extensibility
.. okay
Most languages allow user-defined functions and subroutines, and many
newer languages allow user-defined data types. Extensible languages like
Forth and APL allow functions, operators, and data types to be added to
the programming environment in a way that makes them equivalent in stature
to predefined operations. While C retains tremendous flexibility by
excluding intrinsic functions, it does not allow user defined types to be
treated as easily as ints, longs or floats. Specifically, one cannot
extend the definitions of operators such as addition or multiplication to
new data types created with typedef. This means that function calls must
be used, which, while a completely viable approach, lacks elegance. This
concept is illustrated in the following example.
.. okay
As part of a program, I need to define a data type called COMPLEX
which will function like Fortran's complex data type. This data type is
used for handling complex numbers of the form A + iB where i is the
imaginary unit and A and B are real numbers. This might be done with the
following definition:
.cp 6
    typedef struct      /* complex number type definition */
    {
        double _creal;  /* real part */
        double _cimag;  /* imaginary part */
    } COMPLEX;
.. okay
I will work with several variables of type COMPLEX (e.g., alpha and
beta) which are defined as follows:
    COMPLEX alpha, beta;    /* alpha and beta are complex #'s */
Up to this point, we have treated the complex data type equivalently to
built-in types. We can also work with pointers to or arrays of COMPLEX, so
there is no deficiency along these lines. However, to assign, add,
multiply, or subtract these COMPLEX variables, subroutines would have to be
invented. Subroutines for two representative operations are illustrated in
Figure I.
.pa
---------- FIGURE I. ----------
Assignment: alpha = A + iB;     /* pseudo code */
Function:
    calling sequence (K&R C):    cassign(&alpha,A,B);
    calling sequence (Unix 7 C): alpha = cassign(A,B);
    function definition (K&R C):
        cassign(comp,a,b)
        COMPLEX *comp;
        double a,b;
        {
            comp->_creal = a;
            comp->_cimag = b;
        }
    function definition (Unix 7 C):
        COMPLEX cassign(a,b)
        double a,b;
        {
            COMPLEX temp;       /* temporary variable */
            temp._creal = a;
            temp._cimag = b;
            return(temp);       /* return structure */
        }
Addition: gamma = alpha + beta; /* pseudo code */
Function:
    calling sequence (K&R C):    cadd(&gamma,&alpha,&beta);
    calling sequence (Unix 7 C): gamma = cadd(alpha,beta);
    function definition (K&R C):
        cadd(gamma,alpha,beta)
        COMPLEX *gamma;         /* destination */
        COMPLEX *alpha;         /* augend */
        COMPLEX *beta;          /* addend */
        {
            gamma->_creal = alpha->_creal + beta->_creal;
            gamma->_cimag = alpha->_cimag + beta->_cimag;
        }
.cp 10
    function definition (Unix 7 C):
        COMPLEX cadd(alpha,beta)
        COMPLEX alpha,beta;     /* augend, addend */
        {
            COMPLEX temp;       /* temporary */
            temp._creal = alpha._creal + beta._creal;
            temp._cimag = alpha._cimag + beta._cimag;
            return(temp);
        }
---------- END FIGURE I. ----------
.. okay
The pseudo code presented with the subroutines in Figure I is the
most convenient way to specify the operations desired. If the data types
had been intrinsic, we could have used similar real C statements in lieu of
subroutines. To utilize '+', '*' or other operators with the COMPLEX data
type, we must introduce a mechanism for defining these operations.
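For readers who want to compile the Unix 7 style routines of Figure I,
the following is a minimal sketch. The names cassign and cadd and the
COMPLEX layout come from the figure; the prototype-style parameter lists
are an assumption of this sketch (they are not in the original listing)
and are used only so that current compilers accept the code.

```c
/* Sketch of the Figure I routines in prototype style (an assumption;
 * the figure itself uses the older declaration syntax). */
typedef struct {
    double _creal;  /* real part */
    double _cimag;  /* imaginary part */
} COMPLEX;

/* Build a COMPLEX value from its real and imaginary parts. */
COMPLEX cassign(double a, double b)
{
    COMPLEX temp;

    temp._creal = a;
    temp._cimag = b;
    return temp;    /* the whole structure is returned by value */
}

/* Add two COMPLEX values component by component. */
COMPLEX cadd(COMPLEX alpha, COMPLEX beta)
{
    COMPLEX temp;

    temp._creal = alpha._creal + beta._creal;
    temp._cimag = alpha._cimag + beta._cimag;
    return temp;
}
```

With these in place, the pseudo code "gamma = alpha + beta" is written
as gamma = cadd(alpha, beta), which is exactly the notational burden the
operator proposal is meant to remove.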
Operators
.. okay
How could we specify new operations? For example how would we
define addition for the complex data type? The following type of
definition could be used to extend addition to the COMPLEX type:
    COMPLEX oper `+`(alpha,beta)    /* X grammar */
    COMPLEX alpha,beta;
    {
        COMPLEX __temp;     /* temporary */
        __temp._creal = alpha._creal + beta._creal;
        __temp._cimag = alpha._cimag + beta._cimag;
        return(__temp);     /* return result */
    }
The keyword oper is new: oper indicates that the following definition is
for an operator. The return keyword used in functions also appears here
with a similar meaning. Since COMPLEX precedes oper, this defines an
operation over the COMPLEX data type. Since there are two arguments
(alpha, beta), the operator is binary. Finally, note that the '+' sign is
enclosed in grave accents. Quoting by grave accents is chosen as a way
to distinguish operator names. We will see that quotation will not always
be needed.
.. okay
To use this new operator (assuming that '=' had also been
defined), the following statement could be used:
    gamma = alpha + beta;   /* add complex numbers */
Note that we have omitted the grave accents. Since the '+' can be
distinguished from keywords or identifiers in this context, quoting is not
required. The operator definition specified above gives the X compiler a
means to evaluate the addition request specified in the example statement.
The parser would break this statement down until it could pass the
arguments gathered from the left and right of the addition operator to that
definition, much as it does with intrinsic operators and data types. Whether
this results in a subroutine call or in-line code would depend on the
compiler's implementation.
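As an illustration of the breakdown just described, here is one way the
subroutine-call outcome could look in ordinary C. This is a sketch under
stated assumptions: complex_plus is a hypothetical name for the routine
an X compiler might generate from the operator definition, and sum3
stands for what it might emit for a chained expression.

```c
typedef struct {
    double _creal;  /* real part */
    double _cimag;  /* imaginary part */
} COMPLEX;

/* Hypothetical routine generated from:  COMPLEX oper `+`(alpha,beta) */
COMPLEX complex_plus(COMPLEX alpha, COMPLEX beta)
{
    COMPLEX t;

    t._creal = alpha._creal + beta._creal;
    t._cimag = alpha._cimag + beta._cimag;
    return t;
}

/* X source:  alpha + beta + gamma
 * '+' groups left to right, so the left operand of the second '+'
 * is itself the result of a call. */
COMPLEX sum3(COMPLEX alpha, COMPLEX beta, COMPLEX gamma)
{
    return complex_plus(complex_plus(alpha, beta), gamma);
}
```

An in-line implementation would produce the same values without the
calls; the choice is invisible to the program text.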
More on Operators
.. okay
Operators turn out to be a very powerful and useful concept. We
needn't limit ourselves to defining standard operations for new types.
There is nothing to stop the definition of arbitrary operators. A crude
facility already exists for this in C via the parameterized #define
statement. However, the above facility is more general and more consistent
with the syntax of C than the preprocessor #define approach. To encompass
the generation of inline code as provided by #define, we would also offer
the inline adjective, which could be used as follows:
    COMPLEX inline oper `-`(alpha,beta)     /* subtraction inline */
    ...
This keyword would instruct the compiler to generate inline code (as
opposed to a subroutine call) whenever possible. Its use is analogous to
the use of the register adjective: the compiler complies when feasible and
silently ignores the request when it cannot comply.
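For comparison, C as later standardized (C99) has an inline specifier
for functions with the same advisory character. The sketch below shows
that closest standard-C analogue; csub is a hypothetical name, and this
is an ordinary function, not an X operator.

```c
typedef struct {
    double _creal;  /* real part */
    double _cimag;  /* imaginary part */
} COMPLEX;

/* Subtraction as a C99 inline function. The specifier is a request:
 * the compiler may expand the body at the call site or emit a call. */
static inline COMPLEX csub(COMPLEX alpha, COMPLEX beta)
{
    COMPLEX t;

    t._creal = alpha._creal - beta._creal;
    t._cimag = alpha._cimag - beta._cimag;
    return t;
}
```

Unlike a parameterized #define, the arguments here are type checked and
evaluated exactly once.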
.. okay
In some cases, C definitions can be shortened when no ambiguity
exists (e.g., 'unsigned' instead of 'unsigned int'). Therefore, 'inline'
would replace 'inline oper' in actual practice. Furthermore, operators
would by default work on and return integers, as functions do by default.
Other Uses for Operators
.. okay
In my view, operators would be used not only to define existing
operations over new data types, but also for specifying other operations
over new as well as existing data types. These new operators would
normally have alphanumeric names and would thus require quoting in grave
accents when they appear in expressions. For example, we define the
operation of NAND (negated and) for integers as follows (no grave accents
are required in the definition, but they are required in the invocation
below):
    int oper nand(a,b)
    int a,b;
    {
        return(~(a & b));
    }
To use this in an actual expression we would have to quote the nand:
    c = a `nand` b;
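Until such an operator exists, the same computation has to be written
as a function call. A minimal sketch in ordinary C follows; the body is
taken from the operator definition above, and only the prototype-style
header is an assumption of the sketch.

```c
/* NAND on ints: the bitwise complement of the bitwise AND. */
int nand(int a, int b)
{
    return ~(a & b);
}
```

The X expression c = a `nand` b then corresponds to c = nand(a, b).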
Operator Hierarchy
.. okay
C already has a built-in hierarchy for known operations. The most
reasonable approach is to give user-defined operators the lowest priority.
This might require more parentheses, but seems logical.
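To see what lowest priority would mean in practice, the sketch below
spells out the two possible readings of a `nand` b + 1 as ordinary
function calls. It assumes that "lowest" places user-defined operators
below the built-in arithmetic operators; the function names are
hypothetical and exist only to label the two readings.

```c
/* NAND on ints, as in the operator example. */
int nand(int a, int b)
{
    return ~(a & b);
}

/* X source:  a `nand` b + 1
 * With `nand` at lowest priority, '+' binds first. */
int lowest_priority(int a, int b)
{
    return nand(a, b + 1);
}

/* The other reading needs parentheses in X:  (a `nand` b) + 1 */
int parenthesized(int a, int b)
{
    return nand(a, b) + 1;
}
```

For a = 6 and b = 2 the first reading yields -3 and the second -2, so
the extra parentheses are not merely cosmetic.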
Pointers to Operators
.. okay
C provides the facility to use pointers to functions. It could
potentially prove useful to have pointers to operators as well. A
function's address is specified by its name without trailing parentheses.
Unfortunately, operator names are used in this way to indicate the
operation they represent. In order to remove the ambiguity in requesting
the pointer, the operator name could be parenthesized (e.g., (+) or (`nand`)).
.. okay
Using pointers to operators implies that defined operations must
have subroutines associated with them. Thus truly inline operators could
have no pointers associated with them.
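The existing function pointer facility gives a feel for what pointers
to operators would permit. In the sketch below, the operations are
ordinary functions standing in for operators; binop and fold are
hypothetical names introduced for illustration.

```c
#include <assert.h>

int nand(int a, int b) { return ~(a & b); }
int add(int a, int b)  { return a + b; }

/* Pointer to a binary operation on ints. */
typedef int (*binop)(int, int);

/* Combine v[0..n-1] from left to right using op; n must be at least 1. */
int fold(binop op, const int *v, int n)
{
    int acc;
    int i;

    assert(n >= 1);
    acc = v[0];
    for (i = 1; i < n; i++)
        acc = op(acc, v[i]);
    return acc;
}
```

In X, a call such as fold((+), v, n) would presumably pass the
subroutine associated with the operator, which is why a truly inline
operator could not be used this way.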
Dichotomy of Operators and Functions
.. okay
Functions and operators are almost the same thing. However, the
compiler must know if an operator is binary or unary. Therefore, its
definition must be available before use. On the other hand, arguments to C
functions are not checked for number or type. Therefore, we choose to
keep operators and functions separate, although there is nothing to prevent
operators using function calls.
.. okay
In order to avoid lexical conflicts, operator and function names
would have to be different. This is also desirable from a programming
viewpoint, in order to avoid confusion and errors.
Other Proposals
.. okay
With the addition of operators, the X grammar provides a much more
consistent programming environment than standard C. However, there are
some other points which deserve consideration. The first of these is
providing a means to handle subroutines with a variable number of
arguments.
Since C makes no assumptions about its function library, the user
is free to write his own, should the standard functions prove inadequate.
However, the user cannot properly handle functions with a variable number of
arguments, as must be done by printf(), scanf() and their relatives. We
solve this problem by introducing a typing adjective called vec, which is
short for vector. This adjective is used to indicate that the number of
arguments to the function is variable. For example, the fictitious
function my_printf(), which allows variable arguments (and returns an
integer), would be defined as follows:
    vec int my_printf(argcnt,argvec)
    int argcnt;
    char *argvec[];
    {
        /* code goes here */
    }
.. okay
A function declared with vec always has two arguments: argcnt and argvec.
These variables are analogous to main()'s (argc,argv) pair. Before use, a
declaration of the form:
    vec my_printf();
would be included in each file where my_printf() is referenced. This
declaration causes the arguments of each call to be processed normally: the
rightmost argument is pushed (placed on the stack) first, and the leftmost
last when code is generated. However, the two additional arguments argcnt
and argvec are also stacked. The argvec variable points to the stack
location where the first real argument is located. Since normal stacks are
push-down, this should provide the arguments in the correct order. argcnt
contains the number of arguments plus one to account for argvec. This
makes it completely analogous to argc. argvec always contains an address,
but this is not very useful if no arguments were specified in the function
call.
.. okay
To illustrate the stacking mechanism, imagine that we invoke
my_printf() as follows:
    my_printf(arg1,arg2,arg3,arg4,arg5);
For this specific call, the stacking arrangement (excluding any special
register saves) would look as specified in Figure II. Note that argcnt is
six (five arguments) for this case, as described above.
.pa
---------- Figure II. ----------
Memory Layout for a variable argument function call
        -------------------
        |   Low memory    |
        -------------------
        |       ...       |
        -------------------
        |   argcnt = 6    |
        -------------------
        |  argvec = ADDR  |
        -------------------
  ADDR: |      arg1       |
        -------------------
        |      arg2       |
        -------------------
        |      arg3       |
        -------------------
        |      arg4       |
        -------------------
        |      arg5       |
        -------------------
        |       ...       |
        -------------------
        |   High memory   |
        -------------------
---------- End Figure II. ----------
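The (argcnt,argvec) convention can be imitated by hand in ordinary C if
the caller builds the argument block itself. The sketch below does so
under two assumptions that are not part of the proposal: every argument
is a string, and the caller supplies the count. total_length is a
hypothetical function used only for illustration.

```c
#include <string.h>

/* Stands in for a function declared "vec int ...(argcnt,argvec)".
 * argcnt is the number of arguments plus one, as in the text.
 * Returns the combined length of all string arguments. */
int total_length(int argcnt, char *argvec[])
{
    int nargs = argcnt - 1;     /* number of real arguments */
    int total = 0;
    int i;

    for (i = 0; i < nargs; i++)
        total += (int)strlen(argvec[i]);
    return total;
}
```

A caller with three strings would pass an array holding them together
with an argcnt of four. Under the vec proposal the compiler would do
this bookkeeping, and the arguments would not need to share one type.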
.. okay
It might be worthwhile to have variable argument calls, even if the
function were not declared as using this calling convention. To allow this,
we introduce the ellipsis (...) concept into the argument string. If
my_printf() were not declared as vec, we could force variable argument
format as follows:
    my_printf(arg1,arg2,arg3,arg4,arg5...);
Always including the ellipsis mark for this variety of call seems to
improve readability, but it is not required, so that compatibility with
current C usage is kept.
Fixed Arguments
.. okay
The argvec variable always points to the first argument specified
in the call. However, the function definition could still
explicitly declare a finite number of arguments which it may wish to
examine more directly. For example, if the first argument of my_printf()
were a control string, we could declare my_printf() as follows:
    vec int my_printf(argcnt,argvec,control_string)
    int argcnt;
    char **argvec;
    char *control_string;
Notice that the contents of control_string would be meaningless if argcnt
were less than two.
.. okay
One final note about variable argument control is that it enhances
a function's ability to detect incorrect input. With reference to
printf(), Kernighan and Ritchie state: "A warning: printf uses its first
argument to decide how many arguments follow and what their types are. It
will get confused, if there are not enough arguments or if they are the
wrong type." If implemented with the vec arrangement, printf() could at
least know if it has been given the right number of arguments. It still
would not know if they were of the correct types.
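The count check described above might be sketched as follows. The
sketch assumes the convention of this section (the control string is
the first argument and argcnt is the number of arguments plus one) and
deliberately simplifies format parsing: it ignores '*' widths, which
consume arguments of their own. Both function names are hypothetical.

```c
/* Count conversion specifications in a printf-style control string.
 * "%%" is a literal percent sign and consumes no argument. */
int count_conversions(const char *control)
{
    const char *p;
    int n = 0;

    for (p = control; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        if (p[1] == '%')
            p++;                /* skip the second '%' of "%%" */
        else if (p[1] != '\0')
            n++;                /* one conversion, one argument */
    }
    return n;
}

/* Nonzero when the number of values after the control string
 * (argcnt - 2) matches the number of conversions it requests. */
int arg_count_ok(int argcnt, const char *control)
{
    return argcnt >= 2 && count_conversions(control) == argcnt - 2;
}
```

As the text notes, this catches a wrong number of arguments only; a
value of the wrong type would still pass the check.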
Variable Length Automatic Arrays
.. okay
Another element of the X grammar is the ability to declare
automatic arrays which possess variable length. Since stack displacements
are computed at each entry to a block, this only forces a computed size
allocation. At worst, a memory allocation mechanism must be tied into the
compiler. This latter restriction can be serious if C is used in a very
low level environment, such as in operating system development. So that
the use of this feature can be seen readily, we require the use of the var
adjective in conjunction with such definitions. For most purposes, it
offers a welcome enhancement. Where it is inappropriate, this feature
should be disabled via a compiler switch.
As a general example, we declare a variable length array in the
following routine:
    /* declare an array of integers one larger than argument */
    array_test(length)
    int length;
    {
        var int test[length+1];     /* declare array */
        ...
    }
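A feature of this kind later entered standard C: C99 allows automatic
arrays whose length is computed at run time, without any keyword such
as var, and C11 made the feature optional for implementations. The
sketch below is the array_test example in that dialect; the body that
fills and sums the array is an addition for illustration, since the
original leaves it elided.

```c
#include <assert.h>

/* Declare an array of ints one larger than the argument, store
 * 0..length in it, and return the sum of the elements. */
int array_test(int length)
{
    assert(length >= 0);        /* the array needs at least one element */

    int test[length + 1];       /* C99 variable length array */
    int sum = 0;
    int i;

    for (i = 0; i <= length; i++)
        test[i] = i;
    for (i = 0; i <= length; i++)
        sum += test[i];
    return sum;
}
```

The storage comes from the stack, so a very large length can exhaust
it, which is the same low-level concern raised above.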
A New Looping Structure
.. okay
Many loops are unconditional with breaks generated only from
within. Therefore it is often useful to have an unconditional looping
command. This avoids a lot of 'while(1)' sequences. This could be
implemented as follows:
.cp 7
    loop
    {
        ... code ...
    }
replaces
    while(1)
    {
        ... code ...
    }
Preprocessors and Related Comments
.. okay
Preprocessors could be used to implement several of the X features
mentioned above. The statements and expressions would be expanded by the
preprocessor into standard function calls. The preprocessor would also
provide subroutines from definitions, as needed. New data types could
certainly be handled in this way. However, changes to the C parser must be
made in order to handle the vec and var features. Trivial additions such
as loop can be handled with the existing C preprocessor.
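The remark about loop is easy to check with the existing preprocessor.
A minimal sketch follows; the macro is one obvious way to do it, and
digits is a hypothetical function that serves only to exercise the
macro.

```c
/* "loop" as an unconditional looping command, via the preprocessor. */
#define loop while (1)

/* Count the decimal digits of n; the only exit is the break inside. */
int digits(unsigned n)
{
    int count = 0;

    loop {
        count++;
        n /= 10;
        if (n == 0)
            break;
    }
    return count;
}
```

The macro claims the identifier loop throughout any file that includes
it, which is a cost a true keyword would also carry.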
.. okay
Some programmers may argue that no additions are needed since most
of the features outlined above can be achieved through function calls.
In my view, the X grammar makes C more (and not less) consistent because it
allows both intrinsic and user-defined types to be handled similarly. It
also allows greater portability by defining a means through which variable
argument functions can be handled uniformly. In summation, it turns C into
an extensible language while adding only a few new keywords.
Conclusion
.. okay
In this column, I have suggested an enhanced C grammar which was
denoted X to indicate extensibility. It is the (Unix 7) C language with
enhancements designed to allow the incorporation of user-specified
operators into programs. This should provide more flexible and consistent
reference to user-defined data types. Also mentioned were variable length
automatic arrays (var) and a mechanism for allowing variable argument
functions (vec). Finally, the use of preprocessors for implementing these
ideas was mentioned.
I look forward to any other ideas about X or enhanced C which may
be forthcoming from our readers.