Volume Number: 9
Issue Number: 12
Column Tag: Pascal workshop
Pascal Programmer’s Guide
to Understanding ‘C’
Teach yourself to read another language - Part I
By Ken Gladstone, MacTech Magazine Technical Editor
Most of the articles that we receive here at MacTech Magazine use source code that is written in C. We have had a number of readers request that we either publish more articles that use Pascal, or that we provide Pascal translations of our source code listings. Since we can’t force authors to use Pascal, and since we really don’t want to translate every piece of C code that we publish, our reply has always been “You should be reading the articles to learn the programming concepts, and it shouldn’t matter what language we use in the listings. (Or you should submit more Pascal-based articles!)” Well, we have had several readers respond saying that this would be fine, except that they are unable to read C - a valid complaint.
With this article, I aim to remove that as a valid complaint. This article is designed to teach Pascal programmers how to understand C code listings. (Since my Pascal is a little rusty and I now program primarily in C, perhaps I should have written an article entitled “The ‘C’ Programmer’s Guide to Remembering Pascal!”) This article is in no way intended to be a complete C reference; it is simply intended to help you Pascal programmers understand the C listings that you see in our magazine or elsewhere.
Let me provide a little information on the history of C and some of its incarnations. The original Bible for C programmers is a book called “The C Programming Language” by Brian Kernighan and Dennis Ritchie, published by Prentice-Hall. The book is often referred to as “Kernighan and Ritchie,” or simply “K&R.” If after reading this article you want to see the true original C reference, I suggest you pick up a copy of K&R. Since the book first appeared, the C language has been expanded and standardized, resulting in ANSI C. In addition to supporting ANSI C, most current compilers have their own extensions to the language. At times, I will refer to Mac-specific aspects of the C language. So, let’s begin:
COMMENTS
We’ll start with comments because they are easy (and also because K&R start with them in the reference section of their book).
/* 1 */ /* A forward slash followed by a star signifies the beginning of a C comment. Comments can span multiple lines, and are ended by a star and another slash */
There is a second form of comment that is not in K&R C, but that is supported in many C compilers, including MPW and Think C. Two forward slashes signify a comment that continues until the end of the line. Here are three uses of that style comment:
// 2
// A one line comment
LineOfCode(ThisIs); // Comment at end of a line of code
// LetsCommentOut += ThisLineOfCode;
IDENTIFIERS
Both C and Pascal have similar rules about naming identifiers: you can use letters, digits and the underscore character, and the first character cannot be a digit. But while Pascal identifiers are not case sensitive, C identifiers are: variable, Variable and VARIABLE are three different identifiers. In fact, in C pretty much everything is case sensitive.
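As a small illustration of case sensitivity, the following sketch declares three different variables whose names differ only in case. The function name case_demo is made up for this example; a Pascal compiler would treat the equivalent declarations as duplicates of a single identifier.

```c
/* Hypothetical example: three distinct variables that differ only in case. */
int case_demo(void)
{
    int variable = 1;
    int Variable = 2;
    int VARIABLE = 3;

    /* Each name refers to its own storage, so the sum is 1 + 2 + 3. */
    return variable + Variable + VARIABLE;
}
```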
OPERATORS
Operators can be a great source of confusion because both C and Pascal use many of the same special characters, but with vastly different meanings. The following table summarizes the operators. I have highlighted the ones that I feel may cause Pascal programmers the most confusion in reading C. The operators are presented in K&R’s reference order:
C Operator        Pascal Equivalent         Description
struc.field       struc.field               Obtain a field of a record (structure).
ptr->field        ptr^.field                Obtain a field of a pointer to a record.
* ptr             ptr^                      Dereference a pointer
& variable        @ variable                Return pointer to (address of) a variable
- expr            - expr                    Unary negation
! expr            NOT expr                  Logical negation
~ expr            BNOT( expr)               Bitwise negation (one’s complement)
++ var            var := var + 1            Pre-increment (see notes below)
-- var            var := var - 1            Pre-decrement (see below)
var ++            var := var + 1            Post-increment (see below)
var --            var := var - 1            Post-decrement (see below)
(type)expr        type(expr)                C: Type casting, Pascal: type coercion
sizeof expr       Sizeof( expr )            C: sizeof operator, Pascal: Sizeof function
sizeof( type )    Sizeof( type )            C: sizeof operator, Pascal: Sizeof function
e1 * e2           e1 * e2                   Multiplication
e1 / e2           e1 / e2 ( e1 DIV e2 )     Division (see below)
e1 % e2           e1 MOD e2                 Remainder (modulo operator)
e1 + e2           e1 + e2                   Addition
e1 - e2           e1 - e2                   Subtraction
e1 << e2          BSL(e1,e2)                C: Left shift operator, Pascal: left shift function
e1 >> e2          BSR(e1,e2)                Right shift
e1 < e2           e1 < e2                   Relational “less than” comparison
e1 > e2           e1 > e2                   Relational “greater than” comparison
e1 <= e2          e1 <= e2                  Relational “less than or equal” comparison
e1 >= e2          e1 >= e2                  Relational “greater than or equal” comparison
e1 == e2          e1 = e2                   Equal to comparison
e1 != e2          e1 <> e2                  Not equal to comparison
e1 & e2           BAND( e1, e2)             Bitwise “and”
e1 ^ e2           BXOR( e1, e2)             Bitwise “exclusive or”
e1 | e2           BOR( e1, e2)              Bitwise “or”
e1 && e2          e1 & e2                   Logical “and”
e1 || e2          e1 | e2                   Logical “or”
e1 ? e2 : e3      see below                 The ternary “conditional” operator
var = expr        var := expr               Simple assignment
var += expr       var := var + expr         Add then assign
var -= expr       var := var - expr         Subtract then assign
var *= expr       var := var * expr         Multiply then assign
var /= expr       var := var / expr         Divide then assign
var >>= expr      var := BSR(var,expr);     Shift right then assign
var <<= expr      var := BSL(var,expr);     Shift left then assign
var &= expr       var := BAND(var,expr);    Bitwise “and” then assign
var ^= expr       var := BXOR(var,expr);    Bitwise “exclusive or” then assign
var |= expr       var := BOR(var,expr);     Bitwise “or” then assign
expr , expr       see below                 Comma operator
Some important things to note: In C, the single equal sign is used for assignment, while in Pascal, the single equal sign is used for “equal to” comparison. C uses two equal signs for “equal to” comparison. In C, the single ampersand ‘&’ is used both for the bitwise “and” operation (a binary operator) and for determining the address of a variable (a unary operator). In C, the asterisk ‘*’ is used both for multiplication (binary operator) and for dereferencing a pointer (unary operator). In C, parens are used both for determining order of operations, and also for type coercion (called casting in C). There are several C operators that translate to functions in Pascal (bitwise negation, sizeof, bit shifting, etc.). C does not have a built-in exponentiation operator to correspond to Pascal’s ‘**’ operator.
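To make the doubled-up symbols concrete, here is a minimal sketch. The function names are invented for illustration and are not part of any library. It contrasts the bitwise and logical forms of “and”, and shows the unary uses of ‘&’ and ‘*’.

```c
/* Hypothetical helper names; none of these come from a real library. */

/* Binary '&': bitwise "and", like BAND(a, b) in Pascal. */
int bitwise_and(int a, int b)
{
    return a & b;
}

/* '&&': logical "and"; the result is always 0 (false) or 1 (true). */
int logical_and(int a, int b)
{
    return a && b;
}

/* Unary '&' takes an address (Pascal's @); unary '*' dereferences (Pascal's ^). */
int double_via_pointer(int value)
{
    int *p = &value;    /* p := @value */
    return *p * 2;      /* p^ * 2      */
}
```

With these definitions, bitwise_and(6, 3) is 2 (binary 110 and 011 share only one bit), while logical_and(6, 3) is 1 because both operands are non-zero.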
Unlike Pascal, C does not have two separate division operators. It performs integer division if both operands are of integral types.
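The division rule is easy to see in a short sketch. The three function names below are made up for this example; the comments give the rough Pascal counterpart of each expression.

```c
/* Hypothetical helper names, for illustration only. */

/* Both operands are ints, so '/' truncates: like a DIV b in Pascal. */
int int_quotient(int a, int b)
{
    return a / b;
}

/* '%' gives the remainder: like a MOD b in Pascal. */
int int_remainder(int a, int b)
{
    return a % b;
}

/* Casting one operand to double forces floating point division:
   like a / b in Pascal. */
double real_quotient(int a, int b)
{
    return (double)a / b;
}
```

For example, int_quotient(7, 2) is 3 and int_remainder(7, 2) is 1, while real_quotient(7, 2) is 3.5.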
My table doesn’t tell the whole story of the very useful pre- and post-increment (and decrement) operators. These operators are probably best understood with an example. So here is a fragment of C code:
/* 3 */ a = ++b; c = d++;
The Pascal equivalent is:
{4} b := b + 1; a := b; c := d; d := d + 1;
Both the pre and post increment operators will increment the given variable. The difference is when. The pre-increment operator increments the variable before using the variable’s value. The post-increment operator uses the existing value and then increments the variable.
The table doesn’t explain the C ternary “conditional” operator, ‘a ? b : c’. If the first operand evaluates to non-zero, then the “conditional” operator evaluates to its second operand, otherwise it evaluates to its third operand. This operator is also probably easiest to understand by example. Here is some C code:
/* 5 */ result = a ? b : c;
And the Pascal equivalent:
{ 6 } IF a<>0 THEN result := b ELSE result := c;
The “comma” operator is another unusual one in C. It is used to group expressions (where you would usually expect to see only one expression) and it evaluates to the value of the right expression. Once again, here is a C example (calling a function that takes three parameters):
/* 7 */ result = myFunction( a, (b=3, b+2), c);
and the Pascal equivalent:
{ 8 } b := 3; result := myFunction( a, b + 2, c );
Both of these code fragments end up assigning 3 to b, and end up passing 5 as the second parameter to myFunction.
CONSTANTS
A sequence of digits generally signifies a decimal constant. If the sequence begins with a 0 (zero digit), then the constant is octal. If the sequence begins with 0x or 0X, then the constant is hexadecimal, and can also contain the letters a-f or A-F. If the sequence ends in the letter ell (l or L), the constant is an explicit long integer. For example 123L is of C type long (Pascal type longint).
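As a quick check of these rules, the sketch below writes the same value four ways. The function names are hypothetical and exist only for this example.

```c
/* Hypothetical function names; each returns the same value
   written with a different style of constant. */
long from_decimal(void) { return 31;   }   /* plain digits: decimal  */
long from_octal(void)   { return 037;  }   /* leading 0: octal       */
long from_hex(void)     { return 0x1F; }   /* leading 0x: hex        */
long from_long(void)    { return 31L;  }   /* trailing L: type long  */
```

The leading zero is the one to watch for when reading listings: 010 in C is eight, not ten.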
A character enclosed in single quotes represents a constant whose value is the ASCII code of the character. For example, '!' is the same as 33. Certain special character constants can be represented with the following escape sequences:
Sequence Meaning
\n The newline character (character 10)
\t The tab character (character 9)
\b The backspace character (character 8)
\r Carriage return (character 13)
\f Form feed (character 12)
\\ Backslash
\' Single quote
\ddd The octal constant represented by the 1, 2 or 3 digits ddd.
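Here is a minimal sketch of character constants in use, assuming an ASCII character set. The function names are invented for this example. Note that a C character constant is already an integer value, so no ORD() call is needed to get its code.

```c
/* Hypothetical function names, assuming an ASCII character set. */
int exclamation_code(void)  { return '!';    }   /* 33                    */
int tab_code(void)          { return '\t';   }   /* 9                     */
int letter_from_octal(void) { return '\101'; }   /* octal 101 = 65 = 'A'  */
int backslash_code(void)    { return '\\';   }   /* 92                    */
```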
PROGRAM STRUCTURE
Unlike Pascal, C does not differentiate between procedures and functions - in C, everything is a function. But C functions are not required to return values, so a C function that does not return a value is like a Pascal procedure. C uses the keyword void as the return type for a function that does not return a value. C also uses the keyword void as the parameter list for functions which take no parameters. Unlike Pascal, C does not allow nested functions. All C functions are at the same level.
C does not have an equivalent to the Pascal PROGRAM keyword. Instead, C knows where to start executing by looking for a function called main. main is not a reserved C keyword; it is just a convention that C compilers generate code that starts by executing a function called main. Pascal uses BEGIN and END to create a block of statements (a compound statement). C uses curly braces, so ‘{’ is equivalent to BEGIN and ‘}’ is equivalent to END.
Pascal only needs semicolons between two statements - it doesn’t need one on the last statement of a block (compound statement), nor in constructs with just a single statement. C needs a semicolon after every statement. However, C does not put a semicolon after the end of a block, nor does it put a period at the end of a unit. C compilers just keep parsing a source file until reaching the EOF.
Using the preceding rules, here is an example of a C function with no return value, and its Pascal equivalent procedure. First the C version:
/* 9 */ void myProc( long myFirstParam, char mySecondParam ) { /* Here we have some code that does something */ }
Now the Pascal version:
{ 10 } procedure myProc(myFirstParam: LONGINT; mySecondParam: CHAR); BEGIN (* Here we have some code that does something *) END;
Pascal returns its function values by assigning a value to the function name. C returns a value by using its return statement. Here is an example of a C function that takes no parameters and that returns a double precision floating point value, and then its Pascal equivalent:
/* 11 */ double myFunc( void ) /* C version */ { return 3.14; }
{12} function myFunc : DOUBLE; (* Pascal Version *) BEGIN myFunc := 3.14 END;
VARIABLE DECLARATIONS AND SCOPE
C makes a distinction between declaring and defining a variable. A declaration describes the characteristics of a variable, but may or may not create the actual storage for the variable. A definition will always create the actual storage. While Pascal separates its declarations into type, const and var sections, C specifies this information individually for each declaration. C declarations can occur either within the body of an individual function, or outside of any function. Generally, a variable declared inside a function can only be seen within that function, and only uses storage space while the program is executing the function. Generally, a variable declared outside of any function lives for the entire execution of the program, and is visible everywhere. But C provides some modifiers that alter those rules. Here is an example C variable declaration to show the various parts of a declaration:
static unsigned long myLongVar[2][3], * myPointerVar = & myLongVar[1][0];
The word static is a “storage class” specifier. Storage class keywords are optional, and tell the compiler such things as where the variable actually lives and the width of its scope. The unsigned and long keywords are “type” specifiers. They tell the compiler the type of the variable. myLongVar and myPointerVar are the comma-separated list of names of variables to declare. The [2][3] after myLongVar signifies that we are declaring a 2 by 3 array (with indexes starting at zero) of unsigned longs. The star before myPointerVar means that the variable is actually of type pointer to the base type unsigned long integer, and the = & myLongVar[1][0] is an initializer that assigns an initial value to the variable. In this case myPointerVar is initialized to the address of one of the elements of myLongVar (i.e. initialized to point to myLongVar[1][0]).
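To see the pieces of that declaration working together, here is a small sketch that places the same declaration at file scope and adds two helper functions. The helper names are hypothetical. In Pascal terms, myLongVar is roughly an ARRAY[0..1, 0..2] OF LONGINT.

```c
/* The declaration from the text, placed at file scope. */
static unsigned long myLongVar[2][3],
                     * myPointerVar = & myLongVar[1][0];

/* Hypothetical helper: store through the pointer, then read
   the same storage back through the array. */
unsigned long store_through_pointer(unsigned long value)
{
    *myPointerVar = value;      /* myPointerVar^ := value */
    return myLongVar[1][0];
}

/* Hypothetical helper: number of elements in the 2 by 3 array. */
unsigned long element_count(void)
{
    return sizeof(myLongVar) / sizeof(myLongVar[0][0]);
}
```

Because the pointer was initialized to the address of myLongVar[1][0], a value stored through the pointer shows up in that array element.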
Let’s start with the various “storage classes.” The following table describes the storage classes (and ANSI and Apple extension type qualifiers):
Keyword Description
auto auto is the implied type for variables that are declared within the body of a function. And since it is implied, you will probably never see the keyword actually used. It means that the variable is created on the stack each time the function is called, and that it vanishes each time the function is exited. The variable uses no storage at any other time, and cannot be seen outside the scope of the enclosing function. A recursive function with an auto variable would have a separate instance of the variable for each level of recursive depth.
register register variables are a variation of auto variables. They have the same scope and lifespan, but the register keyword requests that the compiler try to store the variable in a register (for faster access) instead of on the stack. Each compiler has its own rules about the number and type of variables that can be placed in registers, and compilers generally ignore the register keyword if it is used for too many variables, or for variables that are too large to store in a register.
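volatile This is an ANSI extension. Many compilers’ optimizers will automatically try to place auto variables in registers. The volatile keyword tells the compiler that the variable’s value may change in ways the compiler cannot see (for example, from an interrupt routine or a hardware register), so the optimizer must not keep the variable in a register or optimize away reads and writes to it.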
static The static keyword has two uses: First, a static variable defined inside a function is local to that function, but retains its value (and uses non-stack storage) throughout the life of the program. A recursive function that contains a static variable would only ever have a single instance of the variable. Second, a static variable defined outside of a function body has its scope limited to the file that contains it - it cannot be seen or used by other source files. Similarly, the static keyword can be applied to function names to prevent them from being seen or called by other files.
const This is an ANSI extension. The const qualifier is used to specify that the given identifier is a constant that cannot be changed, and is similar to identifiers declared in a Pascal const section. It modifies the word that follows it, so it can be used in various ways:
// 13
const int myInt = 5;    /* myInt can’t change value */
// The following 3 examples are pointers
// to characters
const char *p;          /* p can change, but what it points to can't */
char *const p = "Hi";   /* p can't change, but what it points to can */
const char *const p;    /* Neither can be changed */
extern The extern keyword is used to declare (without allocating any storage) the characteristics of a variable which is actually defined (with allocated storage) elsewhere. The most common use is to define a variable in one file (outside of the body of any function), and then to have an external declaration of that variable in a second file. In this way, code in multiple files can share variables without passing them as function parameters. This declaration concept can be applied to functions as well as variables. When a function is declared this way, the result is commonly referred to as a “function prototype.” This usage is similar to Pascal’s FORWARD directive.
typedef typedef is not used for declaring variables - it is used for defining types (similar to identifiers in a Pascal type section). So for typedefs, the name given at the end of the declaration is the desired name for the type instead of the desired name for a variable. Like extern, typedef does not create any storage.
pascal This is an Apple extension. The pascal keyword is used with function declarations, and is used for allowing C code to call Pascal code, and vice versa. The pascal keyword tells the compiler that the given function uses Pascal’s calling conventions. It can be used for declaring external functions that are written in Pascal (such as all the Macintosh toolbox calls), or it can be used for functions that are written in C but that will be called from a Pascal file. One of the differences between C and Pascal function calling is that C pushes function arguments onto the stack from right to left, and Pascal pushes function arguments from left to right.
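The difference between auto and static storage inside a function is easiest to see side by side. The following is a minimal sketch with made-up function names; the two functions are identical except for the static keyword.

```c
/* Hypothetical function names, for illustration only. */

/* The static counter is created once and keeps its value between calls. */
int count_with_static(void)
{
    static int howManyCalls = 0;
    ++howManyCalls;
    return howManyCalls;
}

/* The auto counter is created (and initialized) afresh on every call. */
int count_with_auto(void)
{
    int howManyCalls = 0;
    ++howManyCalls;
    return howManyCalls;
}
```

Calling count_with_static three times returns 1, 2 and then 3, while count_with_auto returns 1 every time.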
After the optional storage class specifiers, the next component of a declaration is the “type” section. The following table describes C’s various standard variable types:
C Type Pascal Equivalent Description
int integer or longint Signed integer variable type. This is supposed to be the “natural” integer size for the CPU. MPW C ints are 4 bytes, Think C ints can be either 2 or 4 bytes, selected by compiler option.
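char char One byte character variable type. Note that C chars are signed (-128..127) in the Mac compilers discussed here, while Pascal chars are unsigned (0..255); ANSI C itself leaves the signedness of plain char up to the compiler. See unsigned keyword below.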
short integer 2 byte signed integer variable type (on the Mac).
long longint 4 byte signed integer variable type (on the Mac). Can also be used in conjunction with the word double to signify an even longer float type (see extended below).
comp comp, computational Apple Extension. An 8 byte SANE signed integer.
float real, single 4 byte floating point variables (on the Mac).
double double 8 byte floating point variables (on the Mac).
extended extended Apple Extension. It means the same as long double, and signifies either a 10 or 12 byte floating point value (depending upon whether 68881 compiling is turned on).
struct record Used for variables with multiple fields. The field declarations are enclosed by curly braces, and each field declaration looks like a regular variable declaration.
union variant record Used for variables with a choice of types (see below). These look like structures, and are accessed in the same way, but are quite different.
enum TYPE=(val1, val2, ...) Enumerated type.
unsigned This modifier is used to specify unsigned versions of ints, chars, shorts, and longs. For example, an unsigned char is a one byte unsigned character variable, equivalent to a Pascal char. If used by itself, it means unsigned int.
signed This modifier is used to signify signed versions of the integer types. This is the default in most compilers.
void This type has a few uses. When used as the return type for a function, it signifies that the function does not return any value (similar to a Pascal procedure). When used as the parameter list of a function, it signifies that the function does not take any parameters. When used with a star (void *) it signifies a generic pointer type that can point to any base type.
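As a brief illustration of the unsigned modifier and the sizeof operator applied to these types, here is a sketch with hypothetical helper names. It assumes 8-bit chars, which is the case on the Mac.

```c
/* Hypothetical helper names, for illustration only. */

/* Converting to unsigned char keeps only the low 8 bits
   (assuming 8-bit chars), giving a value in 0..255. */
unsigned char to_byte(int value)
{
    return (unsigned char)value;
}

/* sizeof(char) is 1 by definition; the other sizes vary by
   compiler, which is why the table says "on the Mac". */
int sizes_are_ordered(void)
{
    return sizeof(char) == 1
        && sizeof(short) <= sizeof(int)
        && sizeof(int) <= sizeof(long);
}
```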
Now that we’ve covered the elements of declarations, and since tables and syntax descriptions can only go so far, here is an example that demonstrates many of these “variable declaration” concepts in action. The example consists of two (very contrived) files that show a bunch of common C constructs.
/* 14 */
/*************** Start of File1.c *****************/
/*
 * This file starts by defining some global variables
 * and declaring some new variable types.
 */
int globalToTheWorld = 10;         // Any code can access this var
static int globalToThisFile = 1;   // Only for code in this file

/*
 * This declaration does not create any storage nor any
 * variables. It just defines a new 'uchar' variable type.
 */
typedef unsigned char uchar;       // 'uchar' is a new type

/*
 * Define a structure variable (like a Pascal record).
 * The structure consists of two fields.
 */
struct {                           // Fields are enclosed by curly braces
    int anInteger;                 // This is one field
    int * pointerToAnInteger;      // This is another field
} myRecordVar;                     // This is the variable

/*
 * It is common to combine a 'typedef' with a 'struct' to
 * define a new variable type that is a structure (record).
 */
typedef struct {
    int anInt;
    uchar aUchar;
} myStructType;                    // No storage, just a type

/*
 * Declare a function that lives 'extern' elsewhere
 * (in this case, in the Macintosh toolbox).
 * Since there is no code here, the 'extern' keyword
 * is implied and is not actually necessary. This
 * line is generally called a function prototype, and
 * it is similar to a Pascal FORWARD directive.
 * The 'pascal' keyword tells the 'C' compiler to use
 * Pascal function calling rules for this function.
 * You actually won't have to prototype the toolbox
 * routines, as you'll see when we discuss the
 * C preprocessor.
 */
extern pascal void SysBeep( short Duration );

/*
 * Now that we have some types and some global variables,
 * here is some code. This first function has the static
 * keyword, and hence cannot be called from other files.
 * 'void' signifies that it takes no parameters.
 */
static float FunctionForThisFileOnly( void )
{
    int anAutoIntOnTheStack;       // an implied auto, on the stack
    register int inARegisterForSpeed = 2;
    // This next variable retains its value from call to call
    static int howManyCalls = 0;
    myStructType aStructure;
    myStructType * ptrToStruct;
    void * ptrToAnything = & globalToTheWorld;

    // each time we're called, increment this count
    ++ howManyCalls;

    ptrToStruct = & aStructure;    // ptr now points to the struct
    aStructure.anInt = 7;          // We can access members
                                   // from the var...
    ptrToStruct->aUchar = 'c';     // ...or from a ptr to the var.

    // A pointer can be assigned the address of a variable
    myRecordVar.pointerToAnInteger = & anAutoIntOnTheStack;

    /*
     * A star dereferences a pointer, so this statement
     * actually assigns 7 to the 'anAutoIntOnTheStack' var.
     */
    * myRecordVar.pointerToAnInteger = 7;

    /*
     * The next two statements are like the last two, but note
     * that to access what a void pointer points to, you must
     * first cast (coerce) it to a valid type. In this case,
     * we cast it to be a pointer to an integer, and then
     * dereference it.
     */
    ptrToAnything = & anAutoIntOnTheStack;
    * (int *)ptrToAnything = 6;

    inARegisterForSpeed = globalToThisFile + globalToTheWorld - howManyCalls;
    anAutoIntOnTheStack = inARegisterForSpeed * 2;

    // The function evaluates to (returns) this:
    return anAutoIntOnTheStack / 2.0;
}

/*
 * This second function is not static,
 * so it can be called from other files.
 * The two 'void' keywords signify that it
 * returns no value (like a Pascal procedure)
 * and that it takes no parameters.
 */
void GlobalDoubleBeep( void )
{
    SysBeep( 1 );
    SysBeep( 1 );
}
/****************** End of File1.c ************************/
/* 15 */
/******************** Start of File2.c ********************/
/*
 * Because of the 'extern' keyword here,
 * this declaration does not create any storage,
 * it just allows code in this file to use a variable
 * that is actually defined in another file (file1.c).
 */
extern int globalToTheWorld;

/*
 * This is a function prototype, like in file1.c,
 * but here I've left off the implied 'extern'.
 */
void GlobalDoubleBeep( void );

static float LocalFunction( void )
{
    /*
     * A union variable looks like a structure variable,
     * and it is accessed in the same manner. But instead
     * of containing enough storage for ALL of its fields,
     * it can only contain one of them at any given time.
     * Its size is determined by the size of its largest
     * field. Generally, something else in the code keeps
     * track of what kind of value is currently stored in
     * a given union variable. This is similar to having a
     * variant record (a case inside a record) in Pascal.
     */
    union {
        float CanBeFloat;
        long CanBeLongInt;
        char CanBeChar;
    } schizophrenia;

    enum {black, brown, red, orange, yellow, green,
          blue, violet, grey, white} color;

    /*
     * The numbers in curly braces are initial
     * values for this array. And since no size
     * for the array is provided in the square
     * brackets, the compiler uses the initializer
     * list to set the array size -- in this case 3.
     */
    int Array[] = { 7, 12, 6 };

    /*
     * Now that we've declared our variables, here's the code:
     */
    Array[0] = 2;            // An array of size 3 uses...
    Array[1] = 5;            // indexes 0, 1, and 2.
    Array[2] = 17;

    color = green;

    ++ globalToTheWorld;     // Use a variable from another file
    GlobalDoubleBeep();      // Use a function from another file

    schizophrenia.CanBeFloat = 1.234;

    /*
     * If we assign to a different field of the union, we
     * wipe out the CanBeFloat value that we just put in.
     */
    schizophrenia.CanBeLongInt = 123456789;

    /*
     * Here we're interpreting a long as a float,
     * so the result is basically meaningless!
     */
    return schizophrenia.CanBeFloat;
}
/******************** End of File2.c **********************/
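The File2.c comments mention that something else in the code usually keeps track of which union field is currently valid. One common way to do that, sketched here with hypothetical type and function names, is to wrap the union in a struct alongside a tag field, much like the tag field of a Pascal variant record.

```c
/* Hypothetical type and function names, for illustration only. */
typedef enum { kHoldsFloat, kHoldsLong, kHoldsChar } ValueKind;

typedef struct {
    ValueKind kind;              /* the tag: which union field is valid */
    union {
        float CanBeFloat;
        long  CanBeLongInt;
        char  CanBeChar;
    } value;
} TaggedValue;

TaggedValue make_long(long n)
{
    TaggedValue tv;
    tv.kind = kHoldsLong;
    tv.value.CanBeLongInt = n;
    return tv;
}

TaggedValue make_float(float f)
{
    TaggedValue tv;
    tv.kind = kHoldsFloat;
    tv.value.CanBeFloat = f;
    return tv;
}

/* Read back only the field that the tag says was stored. */
double tagged_to_double(TaggedValue tv)
{
    if (tv.kind == kHoldsFloat)
        return tv.value.CanBeFloat;
    if (tv.kind == kHoldsLong)
        return (double)tv.value.CanBeLongInt;
    return (double)tv.value.CanBeChar;
}
```

Because the reader always consults the tag first, a long is never misread as a float the way it is at the end of LocalFunction.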
I’m out of space - we’ll continue next month!