- .ta .5i 1i 1.5i 2i 2.5i 3i 3.5i 4i
- .ul
- 1. Introduction
- .et
- C is a computer language based on the earlier language B [1].
- The languages and their compilers differ in two
- major ways:
- C introduces the notion of types, and defines
- appropriate extra syntax and semantics;
- also, C on the \s8PDP\s10-11 is a true compiler, producing
- machine code where B produced interpretive code.
- .pg
- Most of the software for the \s8UNIX\s10 time-sharing system [2]
- is written in C, as is the operating system itself.
- C is also available on the \s8HIS\s10 6070 computer
- at Murray Hill and
- on the \s8IBM\s10 System/370
- at Holmdel [3].
- This paper is a manual only for the C language itself
- as implemented on the \s8PDP\s10-11.
- However, the text occasionally gives hints about
- implementation-dependent features.
- .pg
- The \s8UNIX\s10 Programmer's Manual [4]
- describes the library routines available to C programs under \s8UNIX\s10,
- and also the procedures for compiling programs under that
- system.
- ``The \s8GCOS\s10 C Library'' by Lesk and Barres [5]
- describes routines available under that system
- as well as compilation procedures.
- Many of these routines, particularly the ones having to do with I/O,
- are also provided under \s8UNIX\s10.
- Finally, ``Programming in C\(mi A Tutorial,''
- by B. W. Kernighan [6],
- is as useful as promised by its title and the author's
- previous introductions to allegedly impenetrable subjects.
- .ul
- 2. Lexical conventions
- .et
- There are six kinds of
- tokens:
- identifiers, keywords, constants, strings, expression operators,
- and other separators.
- In general, blanks, tabs, newlines,
- and comments (described below)
- are ignored except insofar as they serve to separate
- tokens.
- At least one of these characters is required to separate
- otherwise adjacent identifiers,
- constants, and certain operator-pairs.
- .pg
- If the input stream has been parsed into tokens
- up to a given character, the next token is taken
- to include the longest string of characters
- which could possibly constitute a token.
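- .pg
- The longest-match rule can be seen in a short sketch.
- It assumes a present-day C compiler, and the function names
- \fGlongest_match_sum\fR and \fGspaced_sum\fR are illustrative only.

```c
/* Longest-match tokenization: the text "a+++b" is split as  a ++ + b,
   because "++" is the longest token that can begin after "a". */
int longest_match_sum(int a, int b)
{
    int sum = a+++b;    /* parsed as (a++) + b, never as a + (++b) */
    return sum;         /* a++ yields the old value of a, so this is a + b */
}

/* Blanks separate the tokens differently and change the meaning. */
int spaced_sum(int a, int b)
{
    int sum = a + ++b;  /* explicitly a + (++b) */
    return sum;         /* a + (b + 1) */
}
```

- With arguments 1 and 2 the first function yields 3 and the second yields 4.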
- .ms
- 2.1 Comments
- .et
- The characters ^^\fG/\**\fR^^ introduce a comment, which terminates
- with the characters^^ \fG\**/\fR.
- .ms
- 2.2 Identifiers (Names)
- .et
- An identifier is a sequence of letters and digits;
- the first character must be alphabetic.
- The underscore ``\(ru'' counts as alphabetic.
- Upper and lower case letters
- are considered different.
- No more than the first eight characters
- are significant, and only the first seven for
- external identifiers.
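- .pg
- One way to picture the significance limits is a comparison that looks
- only at the leading characters of two spellings.
- The helper \fGsame_identifier\fR and the two macro names below are
- hypothetical; they model the rule and are not part of any compiler.

```c
#include <string.h>

/* Limits quoted in the text above. */
#define INTERNAL_SIGNIFICANT 8
#define EXTERNAL_SIGNIFICANT 7

/* Hypothetical helper: returns 1 if two spellings would name the same
   identifier when only the first `significant` characters count.
   The comparison is case-sensitive, as the text requires. */
int same_identifier(const char *a, const char *b, int significant)
{
    return strncmp(a, b, (size_t)significant) == 0;
}
```

- Under this model \fGbuffer_1\fR and \fGbuffer_2\fR are distinct as
- internal names (they differ in the eighth character) but collide as
- external names, where only seven characters count.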
- .ms
- 2.3 Keywords
- .et
- The following identifiers are reserved for use
- as keywords, and may not be used otherwise:
- .sp .7
- .in .5i
- .ta 2i
- .nf
- .ne 10
- .ft G
- int break
- char continue
- float if
- double else
- struct for
- auto do
- extern while
- register switch
- static case
- goto default
- return entry
- sizeof
- .sp .7
- .ft R
- .fi
- .in 0
- The
- .bd entry
- keyword is not currently implemented by any compiler but
- is reserved for future use.
- .ms
- 2.3 Constants
- .et
- There are several kinds
- of constants, as follows:
- .ms
- 2.3.1 Integer constants
- .et
- An integer constant is a sequence of digits.
- An integer is taken
- to be octal if it begins with \fG0\fR, decimal otherwise.
- The digits \fG8\fR and \fG9\fR have octal value 10 and 11 respectively.
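- .pg
- The rule can be sketched as a small evaluator.
- The function name \fGinteger_constant_value\fR is illustrative.
- Note that present-day C compilers reject the digits 8 and 9 in an
- octal constant, so the sketch models the rule stated here rather than
- relying on the compiler's own parsing.

```c
/* Sketch of the integer-constant rule described above:
   a leading 0 selects octal, anything else is decimal. */
long integer_constant_value(const char *text)
{
    int base = (text[0] == '0') ? 8 : 10;
    long value = 0;
    const char *p;

    /* Digits 8 and 9 are accepted in octal and contribute 8 and 9,
       that is, octal 10 and 11. */
    for (p = text; *p >= '0' && *p <= '9'; p++)
        value = value * base + (*p - '0');
    return value;
}
```

- For example, the text 017 evaluates to 15, and 09 evaluates to 9.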
- .ms
- 2.3.2 Character constants
- .et
- A character constant is 1 or 2 characters enclosed in single quotes
- ``\fG^\(aa^\fR''.
- Within a character constant a single quote must be preceded by
- a back-slash ``\\''.
- Certain non-graphic characters, and ``\\'' itself,
- may be escaped according to the following table:
- .sp .7
- .ta .5i 1.25i
- .nf
- \s8BS\s10 \\b
- \s8NL\s10 \\n
- \s8CR\s10 \\r
- \s8HT\s10 \\t
- \fIddd\fR \\\fIddd\fR
- \\ \\\\
- .fi
- .sp .7
- The escape ``\\\fIddd\fR''
- consists of the backslash followed by 1, 2, or 3 octal digits
- which are taken to specify the value of the
- desired character.
- A special case of this construction is ``\\0'' (not followed
- by a digit) which indicates a null character.
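- .pg
- The following sketch returns the integer value of each escape in the
- table. The function names are illustrative, and the values noted in
- the comments assume the \s8ASCII\s10 character set.

```c
/* Each function returns the integer value of one escaped character
   constant; the values in the comments assume ASCII. */
int backspace_code(void) { return '\b'; }    /* BS, 8 */
int newline_code(void)   { return '\n'; }    /* NL, 10 */
int return_code(void)    { return '\r'; }    /* CR, 13 */
int tab_code(void)       { return '\t'; }    /* HT, 9 */
int backslash_code(void) { return '\\'; }    /* the backslash itself, 92 */
int octal_code(void)     { return '\101'; }  /* octal 101, the letter A */
int null_code(void)      { return '\0'; }    /* the null character, 0 */
```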
- .pg
- Character constants behave exactly like integers
- (not, in particular, like objects
- of character type).
- In conformity with the addressing structure of the \s8PDP\s10-11,
- a character constant of length 1 has the code for the
- given character in the low-order byte
- and 0 in the high-order byte;
- a character constant of length 2 has the code for the
- first character in the low byte and that for the second
- character in the high-order byte.
- Character constants with more than one character are
- inherently machine-dependent and should
- be avoided.
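- .pg
- The byte layout just described can be modelled explicitly.
- The function \fGpack_char_constant\fR is a hypothetical helper;
- it computes the value directly because a present-day compiler is free
- to assign a different value to a two-character constant.

```c
/* Model of the PDP-11 layout described above: the first character
   goes in the low-order byte, the second in the high-order byte. */
unsigned pack_char_constant(unsigned char first, unsigned char second)
{
    return (unsigned)first | ((unsigned)second << 8);
}
```

- A constant of length 1 corresponds to a second argument of 0,
- which leaves the high-order byte clear.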
- .ms
- 2.3.3 Floating constants
- .et
- A floating constant consists of
- an integer part, a decimal point, a fraction part,
- an \fGe\fR, and an optionally signed integer exponent.
- The integer and fraction parts both consist of a sequence
- of digits.
- Either the integer part or the fraction
- part (not both) may be missing;
- either the decimal point or
- the \fGe\fR and the exponent (not both) may be missing.
- Every floating constant is taken to be double-precision.
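- .pg
- The permitted forms are sketched below, one per function.
- The function names are illustrative; the constants were chosen so that
- each value is exactly representable and the results do not depend on
- rounding.

```c
double with_all_parts(void)   { return 12.5e1; }  /* integer, point, fraction, exponent */
double without_integer(void)  { return .5e1; }    /* integer part missing */
double without_fraction(void) { return 12.e1; }   /* fraction part missing */
double without_exponent(void) { return 12.5; }    /* e and exponent missing */
double without_point(void)    { return 125e0; }   /* decimal point missing */

/* An unsuffixed floating constant has the size of a double. */
int constant_is_double(void)
{
    return sizeof(12.5) == sizeof(double);
}
```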
- .ms
- 2.4 Strings
- .et
- A string is a sequence of characters surrounded by
- double quotes ``^\fG"\fR^''.
- A string has the type
- array-of-characters (see below)
- and refers to an area of storage initialized with
- the given characters.
- The compiler places
- a null byte (^\\0^)
- at the end of each string so that programs
- which scan the string can
- find its end.
- In a string, the character ``^\fG"\fR^'' must be preceded by
- a ``\\''^;
- in addition, the same escapes as described for character
- constants may be used.
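- .pg
- A short sketch shows a program finding the end of a string by means of
- the null byte. The names \fGscan_length\fR, \fGstorage_size\fR, and
- \fGquoted\fR are illustrative only.

```c
/* Scans a string until it reaches the null byte the compiler appended. */
int scan_length(const char *s)
{
    int n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* The array holding "abc" has four elements: three letters and the null. */
int storage_size(void)
{
    return (int)sizeof("abc");
}

/* A double quote inside a string must be preceded by a backslash. */
const char *quoted(void)
{
    return "say \"hi\"";
}
```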
- .ul
- 3. Syntax notation
- .et
- In the syntax notation used in this manual,
- syntactic categories are indicated by
- \fIitalic\fR type,
- and literal words and characters
- in \fGgothic.\fR
- Alternatives are listed on separate lines.
- An optional terminal or non-terminal symbol is
- indicated by the subscript ``opt,'' so that
- .dp
- { expression\*(op }
- .ed
- would indicate an optional expression in braces.
- .ul
- 4. What's in a Name?
- .et
- C bases the interpretation of an
- identifier upon two attributes of the identifier: its
- .ft I
- storage class
- .ft R
- and its
- .ft I
- type.
- .ft R
- The storage class determines the location and lifetime
- of the storage associated with an identifier;
- the type determines
- the meaning of the values
- found in the identifier's storage.
- .pg
- There are four declarable storage classes:
- automatic,
- static,
- external,
- and
- register.
- Automatic variables are local to each invocation of
- a function, and are discarded on return;
- static variables are local to a function, but retain
- their values independently of invocations of the
- function; external variables are independent of any function.
- Register variables are stored in the fast registers
- of the machine; like automatic
- variables they are local to each function and disappear on return.
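- .pg
- The difference in lifetime between the storage classes can be sketched
- as follows. All names are illustrative, and the sketch assumes a
- present-day compiler, which treats \fGregister\fR as a request rather
- than a guarantee.

```c
int external_total = 0;        /* external: independent of any function */

int count_with_static(void)
{
    static int calls = 0;      /* static: keeps its value between invocations */
    calls++;
    external_total++;
    return calls;
}

int count_with_automatic(void)
{
    int calls = 0;             /* automatic: created afresh on each invocation */
    register int step = 1;     /* register: local, discarded on return */
    calls += step;
    return calls;
}
```

- Successive calls of \fGcount_with_static\fR return 1, 2, 3, and so on,
- while \fGcount_with_automatic\fR returns 1 every time.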
- .pg
- C supports four fundamental types of objects:
- characters, integers, and single- and double-precision
- floating-point numbers.
- .sp .7
- .in .5i
- Characters (declared, and hereinafter called, \fGchar\fR) are chosen from
- the \s8ASCII\s10 set;
- they occupy the right-most seven bits
- of an 8-bit byte.
- It is also possible to interpret \fGchar\fRs
- as signed, 2's complement 8-bit numbers.
- .sp .4
- Integers (\fGint\fR) are represented in 16-bit 2's complement notation.
- .sp .4
- Single-precision floating-point (\fGfloat\fR) quantities
- have magnitude in the range approximately
- 10\u\s7\(+-38\s10\d
- or 0; their precision is 24 bits or about
- seven decimal digits.
- .sp .4
- Double-precision floating-point (\fGdouble\fR) quantities have the same range
- as \fGfloat\fRs and a precision of 56 bits
- or about 17 decimal digits.
- .sp .7
- .in 0
- .pg
- Besides the four fundamental types there is a
- conceptually infinite class of derived types constructed
- from the fundamental types in the following ways:
- .sp .7
- .in .5i
- .ft I
- arrays
- .ft R
- of objects of most types;
- .sp .4
- .ft I
- functions
- .ft R
- which return objects of a given type;
- .sp .4
- .ft I
- pointers
- .ft R
- to objects of a given type;
- .sp .4
- .ft I
- structures
- .ft R
- containing objects of various types.
- .sp .7
- .in 0
- In general these methods
- of constructing objects can
- be applied recursively.
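- .pg
- As an illustration of recursive construction, the sketch below builds
- an array of pointers to functions that take a pointer to a structure.
- It is written in present-day C syntax, and every name in it is
- hypothetical.

```c
struct point { int x; int y; };   /* a structure containing two ints */

/* Functions returning int, each taking a pointer to a structure. */
int sum_point(struct point *p)  { return p->x + p->y; }
int diff_point(struct point *p) { return p->x - p->y; }

/* Derived recursively: an array of pointers to functions returning int. */
int (*operations[2])(struct point *) = { sum_point, diff_point };

/* Calls every function in the array and adds up the results. */
int apply_all(struct point *p)
{
    int i, total = 0;
    for (i = 0; i < 2; i++)
        total += operations[i](p);
    return total;
}
```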
- .ul
- 5. Objects and lvalues
- .et
- An object is a manipulatable region of storage;
- an lvalue is an expression referring to an object.
- An obvious example of an lvalue
- expression is an identifier.
- There are operators which yield lvalues:
- for example,
- if E is an expression of pointer type, then \**E is an lvalue
- expression referring to the object to which E points.
- The name ``lvalue'' comes from the assignment expression
- ``E1@=@E2'' in which the left operand E1 must be
- an lvalue expression.
- The discussion of each operator
- below indicates whether it expects lvalue operands and whether it
- yields an lvalue.
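- .pg
- A minimal sketch of lvalues in use follows; the function names are
- illustrative.

```c
/* Stores through a pointer: *e is an lvalue referring to the
   object to which e points. */
int store_through(int *e, int value)
{
    *e = value;
    return *e;
}

int lvalue_demo(void)
{
    int object = 1;      /* the identifier object is an lvalue */
    int *e = &object;

    *e = *e + 41;        /* the left operand of = must be an lvalue */
    /* (object + 1) = 0;    would be rejected: a sum is not an lvalue */
    return object;
}
```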
- .ul
- 6. Conversions
- .et
- A number of operators may, depending on their operands,
- cause conversion of the value of an operand from one type to another.
- This section explains the result to be expected from such
- conversions.
- .ms
- 6.1 Characters and integers
- .et
- A \fGchar\fR object may be used anywhere
- an \fGint\fR may be.
- In all cases the
- \fGchar\fR is converted to an \fGint\fR
- by propagating its sign through the
- upper 8 bits of the resultant integer.
- This is consistent with the two's complement representation
- used for both characters and integers.
- (However,
- the sign-propagation feature
- disappears in other implementations.)
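- .pg
- Because the conversion is implementation-dependent, the sketch below
- models the \s8PDP\s10-11 behavior with explicit arithmetic instead of
- relying on the compiler's treatment of \fGchar\fR.
- The names \fGchar_to_int\fR and \fGas_16_bits\fR are hypothetical.

```c
/* Models the conversion described above: the sign bit of the 8-bit
   character is propagated into the upper bits of the integer. */
int char_to_int(unsigned byte)
{
    int value = (int)(byte & 0xFFu);
    if (value & 0x80)
        value -= 256;    /* sign bit set: the result is negative */
    return value;
}

/* The low 16 bits of the result, to show the propagated sign. */
unsigned as_16_bits(int value)
{
    return (unsigned)value & 0xFFFFu;
}
```

- A byte of octal 200 thus becomes \(mi128, whose 16-bit pattern has all
- eight upper bits set.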
- .ms
- 6.2 Float and double
- .et
- All floating arithmetic in C is carried out in double-precision;
- whenever a \fGfloat\fR
- appears in an expression it is lengthened to \fGdouble\fR
- by zero-padding its fraction.
- When a \fGdouble\fR must be
- converted to \fGfloat\fR, for example by an assignment,
- the \fGdouble\fR is rounded before
- truncation to \fGfloat\fR length.
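- .pg
- The asymmetry between the two directions can be sketched as follows.
- The sketch assumes a present-day machine with IEEE floating point
- rather than the \s8PDP\s10-11 format, and the function names are
- illustrative.

```c
/* Lengthening: every float value is exactly representable as a double,
   so converting there and back recovers the original. */
int widening_is_exact(float f)
{
    volatile double widened = f;
    volatile float back = (float)widened;
    return back == f;
}

/* Shortening: the double is rounded to float precision, so the value
   changes unless it already fits in a float. */
int narrowing_changes(double d)
{
    volatile float narrowed = (float)d;
    return (double)narrowed != d;
}
```

- Under these assumptions the value 0.1 is changed by shortening,
- while 0.5, which fits exactly in a \fGfloat\fR, is not.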
- .ms
- 6.3 Float and double; integer and character
- .et
- All \fGint\fRs and \fGchar\fRs may be converted without
- loss of significance to \fGfloat\fR or \fGdouble\fR.
- Conversion of \fGfloat\fR or \fGdouble\fR
- to \fGint\fR or \fGchar\fR takes place with truncation towards 0.
- Erroneous results can be expected if the magnitude
- of the result exceeds 32,767 (for \fGint\fR)
- or 127 (for \fGchar\fR).
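- .pg
- Truncation towards 0 and the range check can be sketched as below.
- The macro \fGPDP11_INT_MAX\fR and the function names are hypothetical;
- the limit is the one quoted above for a 16-bit \fGint\fR.

```c
/* Largest magnitude quoted in the text for a 16-bit int. */
#define PDP11_INT_MAX 32767.0

/* Conversion to int discards the fraction, moving towards 0. */
int truncate_toward_zero(double d)
{
    return (int)d;
}

/* Returns 1 if the truncated magnitude would not exceed 32,767. */
int fits_pdp11_int(double d)
{
    double magnitude = d < 0 ? -d : d;
    return magnitude < PDP11_INT_MAX + 1.0;
}
```

- Both 2.9 and \(mi2.9 lose their fraction, giving 2 and \(mi2;
- a value such as 40000.0 fails the range check.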
- .ms
- 6.4 Pointers and integers
- .et
- Integers and pointers may be added and compared; in such a case
- the \fGint\fR is converted as
- specified in the discussion of the addition operator.
- .pg
- Two pointers to objects of the same type may be subtracted;
- in this case the result is converted to an integer
- as specified in the discussion of the subtraction
- operator.
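- .pg
- A brief sketch of both operations follows.
- It assumes the scaling by object size that present-day C defines for
- pointer addition and subtraction; the function names are illustrative.

```c
#include <stddef.h>

/* Adding an integer to a pointer advances by whole objects. */
int third_element(int *base)
{
    int *p = base + 2;
    return *p;
}

/* Subtracting two pointers into one array yields a count of elements. */
long elements_between(int *first, int *last)
{
    ptrdiff_t n = last - first;
    return (long)n;
}

/* Number of bytes covered by adding 1 to a pointer to int. */
long bytes_per_step(int *p)
{
    return (long)((char *)(p + 1) - (char *)p);
}
```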
-