- Indian Hill C Style and Coding Standards
- as amended for U of T Zoology UNIX|-
-
-
- L.W. Cannon
- R.A. Elliott
- L.W. Kirchhoff
- J.H. Miller
- J.M. Milner
- R.W. Mitze
- E.P. Schan
- N.O. Whittington
-
- Bell Labs
-
-
- Henry Spencer
-
- Zoology Computer Systems
- University of Toronto
-
-
-
- ABSTRACT
-
- This document is an annotated (by the last
- author) version of the original paper of the same
- title. It describes a set of coding standards and
- recommendations which are local standards for
- officially-supported UNIX programs. The scope is
- coding style, not functional organization.
-
-
-
- April 18, 1990
-
-
-
-
-
-
-
-
-
-
- _________________________
- |- UNIX is a trademark of Bell Laboratories.
-
- 1. Introduction
-
- This document is a result of a committee formed at
- Indian Hill to establish a common set of coding standards
- and recommendations for the Indian Hill community. The
- scope of this work is the coding style, not the functional
- organization of programs. The standards in this document
- are not specific to ESS programming only1. We have tried to
- combine previous work [1,6] on C style into a uniform set of
- standards that should be appropriate for any project using
- C2.
-
- _________________________
- 1. In fact, they're pretty good general standards. ``To
- be clear is professional; not to be clear is
- unprofessional.'' - Sir Ernest Gowers. This document
- is presented unadulterated; U of T variations,
- comments, exceptions, etc. are presented in footnotes.
-
- 2. Of necessity, these standards cannot cover all
- situations. Experience and informed judgement count
- for much. Inexperienced programmers who encounter
- unusual situations should consult 1) code written by
- experienced C programmers following these rules, or 2)
- experienced C programmers.
-
- 2. File Organization
-
- A file consists of various sections that should be
- separated by several blank lines. Although there is no max-
- imum length requirement for source files, files with more
- than about 1500 lines are cumbersome to deal with. The edi-
- tor may not have enough temp space to edit the file, compi-
- lations will go slower, etc. Since most of us use 300 baud
- terminals, entire rows of asterisks, for example, should be
- discouraged3. Also lines longer than 80 columns are not
- handled well by all terminals and should be avoided if pos-
- sible4.
-
- The suggested order of sections for a file is as fol-
- lows:
-
- 1. Any header file includes should be the first thing in
- the file.
-
- 2. Immediately after the includes5 should be a prologue
- that tells what is in that file. A description of the
- purpose of the objects in the files (whether they be
- functions, external data declarations or definitions,
- or something else) is more useful than a list of the
- object names.
-
- 3. Any typedefs and defines that apply to the file as a
- whole are next.
-
- 4. Next come the global (external) data declarations. If
- a set of defines applies to a particular piece of glo-
- bal data (such as a flags word), the defines should be
- immediately after the data declaration6.
-
- 5. The functions come last7.
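-
- As a rough illustration of this ordering, a small source file
- might be laid out as in the sketch below. The names MAXCALLS,
- COUNT, ncalls and bump are invented for the example, and an
- ANSI prototype is used so that it compiles with current
- compilers; neither is part of the original standard.

```c
#include <stdio.h>		/* 1. header file includes come first */

/*
 * 2. Prologue: a hypothetical call counter, present only to
 * illustrate the suggested order of sections in a file.
 */

#define	MAXCALLS	3	/* 3. defines and typedefs for the whole file */
typedef	long	COUNT;

static COUNT	ncalls = 0L;	/* 4. global data: calls counted so far */

/*
 * bump - count one call, saturating at MAXCALLS
 */
COUNT				/* 5. functions come last */
bump(void)
{
	if (ncalls < MAXCALLS)
		ncalls++;
	return(ncalls);
}
```

- Calling bump() repeatedly returns 1, 2, 3 and then stays at 3.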
- _________________________
-
- 3. This is not a problem at U of T, or most other sensible
- places, but rows of asterisks are still annoying.
-
- 4. Excessively long lines which result from deep indenting
- are often a symptom of poorly-organized code.
-
- 5. A common variation, in both Bell code and ours, is to
- reverse the order of sections 1 and 2. This is an
- acceptable practice.
-
- 6. Such defines should be indented to put the defines one
- level deeper than the first keyword of the declaration
- to which they apply.
-
- 7. They should be in some sort of meaningful order. Top-
- down is generally better than bottom-up, and a
- ``breadth-first'' approach (functions on a similar
-
- 2.1. File Naming Conventions
-
- UNIX requires certain suffix conventions for names of
- files to be processed by the cc command [5]8. The following
- suffixes are required:
-
- o C source file names must end in .c
-
- o Assembler source file names must end in .s
-
- In addition the following conventions are universally
- followed:
-
- o Relocatable object file names end in .o
-
- o Include header file names end in .h 9 or .d
-
- o Ldp10 specification file names end in .b
-
- o Yacc source file names end in .y
-
- o Lex source file names end in .l
-
- 3. Header Files
-
- Header files are files that are included in other files
- prior to compilation by the C preprocessor. Some are
- defined at the system level, like stdio.h, which must be
- included by any program using the standard I/O library.
- Header files are also used to contain data declarations and
- defines that are needed by more than one program11. Header
- _________________________
- level of abstraction together) is preferred over
- depth-first (functions defined as soon as possible
- after their calls). Considerable judgement is called
- for here. If defining large numbers of essentially-
- independent utility functions, consider alphabetical
- order.
-
- 8. In addition to the suffix conventions given here, it is
- conventional to use `Makefile' (not `makefile') for the
- control file for make and `README' for a summary of the
- contents of a directory or directory tree.
-
- 9. Preferred. An alternate convention that may be
- preferable in multi-language environments is to use the
- same suffix as an ordinary source file but with two
- periods instead of one (e.g. ``foo..c'').
-
- 10. No idea what this is.
-
- 11. Don't use absolute pathnames for header files. Use the
- <name> construction for getting them from a standard
- place, or define them relative to the current
-
- files should be functionally organized, i.e., declarations
- for separate subsystems should be in separate header files.
- Also, if a set of declarations is likely to change when code
- is ported from one machine to another, those declarations
- should be in a separate header file.
-
- Header files should not be nested. Some objects like
- typedefs and initialized data definitions cannot be seen
- twice by the compiler in one compilation. On non-UNIX sys-
- tems this is also true of uninitialized declarations without
- the extern keyword12. This can happen if include files are
- nested and will cause the compilation to fail.
-
- 4. External Declarations
-
- External declarations should begin in column 1. Each
- declaration should be on a separate line. A comment
- describing the role of the object being declared should be
- included, with the exception that a list of defined
- constants does not need comments if the constant names are
- sufficient documentation. The comments should be tabbed so that
- they line up underneath each other13. Use the tab character
- (CTRL I if your terminal doesn't have a separate key) rather
- than blanks. For structure and union template declarations,
- each element should be alone on a line with a comment
- describing it. The opening brace ( { ) should be on the
- same line as the structure tag, and the closing brace should
- be alone on a line in column 1, i.e.
-
-         struct boat {
-                 int     wllength;       /* water line length in feet */
-                 int     type;           /* see below */
-                 long    sarea;          /* sail area in square feet */
-         };
-         /*
-          * defines for boat.type (see footnote 14)
-          */
-         #define KETCH   1
-         #define YAWL    2
-         #define SLOOP   3
-         #define SQRIG   4
-         #define MOTOR   5
- _________________________
- directory. The -I option of the C compiler is the best
- way to handle extensive private libraries of header
- files; it permits reorganizing the directory structure
- without having to alter source files.
-
- 12. It should be noted that declaring variables in a header
- file is often a poor idea. Frequently it is a symptom
- of poor partitioning of code between files.
-
- 13. So should the constant names and their defined values.
-
- If an external variable is initialized15 the equal sign
- should not be omitted16.
-
-         int     x = 1;
-         char    *msg = "message";
-         struct boat     winner = {
-                 40,     /* water line length */
-                 YAWL,
-                 600     /* sail area */
-         };
-
- 17
-
- 5. Comments
-
- Comments that describe data structures, algorithms,
- etc., should be in block comment form with the opening /* in
- column one, a * in column 2 before each line of comment
- text18, and the closing */ in columns 2-3.
-
- _________________________
-
- 14. These defines are better put right after the
- declaration of type, within the struct declaration,
- with enough tabs after # to indent define one level
- more than the structure member declarations.
-
- 15. Any variable whose initial value is important should be
- explicitly initialized, or at the very least should be
- commented to indicate that C's default initialization
- to 0 is being relied on.
-
- 16. The empty initializer, ``{}'', should never be used.
- Structure initializations should be fully parenthesized
- with braces. Constants used to initialize longs should
- be explicitly long.
-
- 17. In any file which is part of a larger whole rather than
- a self-contained program, maximum use should be made of
- the static keyword to make functions and variables
- local to single files. Variables in particular should
- be accessible from other files only when there is a
- clear need that cannot be filled in another way. Such
- usages should be commented to make it clear that
- another file's variables are being used; the comment
- should name the other file.
-
- 18. Some automated program-analysis packages use a
- different character in this position as a marker for
- lines with specific items of information. In
- particular, a line with a `-' here in a comment
- preceding a function is sometimes assumed to be a one-
- line summary of the function's purpose.
-
-
-         /*
-          *      Here is a block comment.
-          *      The comment text should be tabbed over (see footnote 19)
-          *      and the opening /* and closing star-slash
-          *      should be alone on a line.
-          */
-
-
- Note that grep ^.\* will catch all block comments in
- the file. In some cases, block comments inside a function
- are appropriate, and they should be tabbed over to the same
- tab setting as the code that they describe. Short comments
- may appear on a single line indented over to the tab setting
- of the code that follows.
-
-         if (argc > 1) {
-                 /* Get input file from command line. */
-                 if (freopen(argv[1], "r", stdin) == NULL)
-                         error("can't open %s\n", argv[1]);
-         }
-
-
- Very short comments may appear on the same line as the
- code they describe, but should be tabbed over far enough to
- separate them from the statements. If more than one short
- comment appears in a block of code they should all be tabbed
- to the same tab setting.
-
-         if (a == 2)
-                 return(TRUE);           /* special case */
-         else
-                 return(isprime(a));     /* works only for odd a */
-
-
- 6. Function Declarations
-
- Each function should be preceded by a block comment
- prologue that gives the name and a short description of what
- the function does20. If the function returns a value, the
- type of the value returned should be alone on a line in
- column 1 (do not default to int). If the function does not
- return a value then it should not be given a return type.
- _________________________
-
- 19. A common practice in both Bell and local code is to use
- a space rather than a tab after the *. This is
- acceptable.
-
- 20. Discussion of non-trivial design decisions is also
- appropriate, but avoid duplicating information that is
- present in (and clear from) the code. It's too easy
- for such redundant information to get out of date.
-
- If the value returned requires a long explanation, it should
- be given in the prologue; otherwise it can be on the same
- line as the return type, tabbed over. The function name and
- formal parameters should be alone on a line beginning in
- column 1. Each parameter should be declared (do not default
- to int), with a comment on a single line. The opening brace
- of the function body should also be alone on a line begin-
- ning in column 1. The function name, argument declaration
- list, and opening brace should be separated by a blank
- line21. All local declarations and code within the function
- body should be tabbed over at least one tab.
-
- If the function uses any external variables, these
- should have their own declarations in the function body
- using the extern keyword. If the external variable is an
- array the array bounds must be repeated in the extern
- declaration. There should also be extern declarations for
- all functions called by a given function. This is
- particularly beneficial to someone picking up code written by
- another. If a function returns a value of type other than
- int, it is required by the compiler that such functions be
- declared before they are used. Having the extern declaration
- in the calling function's declarations section avoids
- all such problems22.
-
- In general each variable declaration should be on a
- separate line with a comment describing the role played by
- the variable in the function. If the variable is external
- or a parameter of type pointer which is changed by the func-
- tion, that should be noted in the comment. All such com-
- ments for parameters and local variables should be tabbed so
- that they line up underneath each other. The declarations
- should be separated from the function's statements by a
- blank line.
-
- A local variable should not be redeclared in nested
- blocks23. Even though this is valid C, the potential
- _________________________
-
- 21. Neither Bell nor local code has ever included these
- separating blank lines, and it is not clear that they
- add anything useful. Leave them out.
-
- 22. These rules tend to produce a lot of clutter. Both
- Bell and local practice frequently omit extern
- declarations for static variables and functions. This
- is permitted. Omission of declarations for standard
- library routines is also permissible, although if they
- are declared it is better to declare them within the
- functions that use them rather than globally.
-
- 23. In fact, avoid any local declarations that override
- declarations at higher levels.
-
- confusion is enough that lint will complain about it when
- given the -h option.
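-
- The hazard is easiest to see in a small case. In the sketch
- below (the function names are invented for illustration, and
- ANSI prototypes are used), the inner declaration of total hides
- the outer one, so the outer total is never updated.

```c
/*
 * sum_shadowed - BAD: the inner `total' hides the outer one
 */
int				/* always 0 */
sum_shadowed(const int *vals, int n)
{
	int	total = 0;	/* outer total, returned below */
	int	i;		/* index into vals */

	for (i = 0; i < n; i++) {
		int	total = 0;	/* redeclared: hides outer total */

		total += vals[i];	/* updates the inner copy only */
	}
	return(total);		/* outer total, still 0 */
}

/*
 * sum_fixed - the same loop without the redeclaration
 */
int				/* sum of vals[0..n-1] */
sum_fixed(const int *vals, int n)
{
	int	total = 0;	/* running sum */
	int	i;		/* index into vals */

	for (i = 0; i < n; i++)
		total += vals[i];
	return(total);
}
```

- Both functions compile, which is exactly why the redeclaration
- is easy to miss.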
-
- 6.1. Examples
-
-
-         /*
-          * skyblue()
-          *
-          * Determine if the sky is blue.
-          */
-
-         int                     /* TRUE or FALSE */
-         skyblue()
-
-         {
-                 extern int      hour;
-
-                 if (hour < MORNING || hour > EVENING)
-                         return(FALSE);  /* black */
-                 else
-                         return(TRUE);   /* blue */
-         }
-
-
-         /*
-          * tail(nodep)
-          *
-          * Find the last element in the linked list
-          * pointed to by nodep and return a pointer to it.
-          */
-
-         NODE *                  /* pointer to tail of list */
-         tail(nodep)
-
-         NODE    *nodep;         /* pointer to head of list */
-
-         {
-                 register NODE   *np;    /* current pointer advances to NULL */
-                 register NODE   *lp;    /* last pointer follows np */
-
-                 np = lp = nodep;
-                 while ((np = np->next) != NULL)
-                         lp = np;
-                 return(lp);
-         }
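-
- For readers whose compiler expects ANSI prototypes, the tail()
- example might be rendered as in the following sketch. The NODE
- type is not defined in the original; the definition here is an
- assumption made only so that the example is self-contained.

```c
#include <stddef.h>		/* NULL */

typedef struct node	NODE;	/* hypothetical list element */
struct node {
	int	value;		/* payload, not examined by tail() */
	NODE	*next;		/* next element, NULL at end of list */
};

/*
 * tail(nodep)
 *
 * Find the last element in the linked list
 * pointed to by nodep and return a pointer to it.
 * The list must contain at least one element.
 */
NODE *				/* pointer to tail of list */
tail(NODE *nodep)
{
	NODE	*np;		/* current pointer advances to NULL */
	NODE	*lp;		/* last pointer follows np */

	np = lp = nodep;
	while ((np = np->next) != NULL)
		lp = np;
	return(lp);
}
```

- As in the original, nodep is dereferenced without a check, so
- the caller is responsible for never passing an empty list.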
-
-
- 7. Compound Statements
-
- Compound statements are statements that contain lists
- of statements enclosed in braces. The enclosed list should
- be tabbed over one more than the tab position of the com-
- pound statement itself. The opening left brace should be at
-
- the end of the line beginning the compound statement and the
- closing right brace should be alone on a line, tabbed under
- the beginning of the compound statement. Note that the left
- brace beginning a function body is the only occurrence of a
- left brace which is alone on a line.
-
- 7.1. Examples
-
-
-         if (expr) {
-                 statement;
-                 statement;
-         }
-
-         if (expr) {
-                 statement;
-                 statement;
-         } else {
-                 statement;
-                 statement;
-         }
-
- Note that the right brace before the else and the right
- brace before the while of a do-while statement (below) are
- the only places where a right brace appears that is not
- alone on a line.
-
-         for (i = 0; i < MAX; i++) {
-                 statement;
-                 statement;
-         }
-
-         while (expr) {
-                 statement;
-                 statement;
-         }
-
-         do {
-                 statement;
-                 statement;
-         } while (expr);
-
-         switch (expr) {
-         case ABC:
-         case DEF:
-                 statement;
-                 break;
-         case XYZ:
-                 statement;
-                 break;
-         default:
-                 statement;
-                 break;          /* see footnote 24 */
-         }
-
- Note that when multiple case labels are used, they are
- placed on separate lines. The fall-through feature of the C
- switch statement should rarely if ever be used when code is
- executed before falling through to the next case. If this is
- done it must be commented for future maintenance.
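-
- A commented fall-through might look like the following sketch.
- The function and its option letters are invented for the
- example; the point is only the comment that marks the
- deliberate omission of break.

```c
/*
 * verbosity - map a hypothetical option letter to a message level
 */
int				/* level 0, 1 or 2 */
verbosity(int opt)
{
	int	level = 0;	/* accumulated message level */

	switch (opt) {
	case 'V':		/* very verbose implies verbose */
		level++;
		/* FALLTHROUGH: 'V' also does everything 'v' does */
	case 'v':
		level++;
		break;
	default:
		break;
	}
	return(level);
}
```

- Here 'V' yields 2, 'v' yields 1, and anything else yields 0.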
-
-         if (strcmp(reply, "yes") == EQUAL) {
-                 statements for yes
-                 ...
-         } else if (strcmp(reply, "no") == EQUAL) {
-                 statements for no
-                 ...
-         } else if (strcmp(reply, "maybe") == EQUAL) {
-                 statements for maybe
-                 ...
-         } else {
-                 statements for none of the above
-                 ...
-         }
-
- The last example is a generalized switch statement and the
- tabbing reflects the switch between exactly one of several
- alternatives rather than a nesting of statements.
-
- 8. Expressions
-
- 8.1. Operators
-
- The old versions of equal-ops =+, =-, =*, etc. should
- not be used. The preferred use is +=, -=, *=, etc. All
- binary operators except . and -> should be separated from
- their operands by blanks25. In addition, keywords that are
- followed by expressions in parentheses should be separated
- from the left parenthesis by a blank26. Blanks should also
- appear after commas in argument lists to help separate the
- arguments visually. On the other hand, macros with argu-
- ments and function calls should not have a blank between the
- name and the left parenthesis. In particular, the C prepro-
- cessor requires the left parenthesis to be immediately after
- _________________________
-
- 24. This break is, strictly speaking, unnecessary, but it
- is required nonetheless because it prevents a
- fall-through error if another case is added later after the
- last one.
-
- 25. Some judgement is called for in the case of complex
- expressions, which may be clearer if the ``inner''
- operators are not surrounded by spaces and the
- ``outer'' ones are.
-
- 26. Sizeof is an exception, see the discussion of function
- calls. Less logically, so is return.
-
- the macro name or else the argument list will not be recog-
- nized. Unary operators should not be separated from their
- single operand. Since C has some unexpected precedence
- rules, all expressions involving mixed operators should be
- fully parenthesized.
-
- Examples
-
-         a += c + d;
-         a = (a + b) / (c * d);
-         strp->field = str.fl - ((x & MASK) >> DISP);
-         while (*d++ = *s++)
-                 ;       /* EMPTY BODY */
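-
- One of the unexpected precedence rules is that == binds more
- tightly than &. A minimal sketch, with MASK given an arbitrary
- value and the function names invented for illustration:

```c
#define	MASK	0x0F		/* hypothetical low-nibble mask */

/*
 * low_clear_wrong - parses as x & (MASK == 0), that is, x & 0
 */
int				/* always 0 */
low_clear_wrong(int x)
{
	return(x & MASK == 0);
}

/*
 * low_clear - fully parenthesized version of the same test
 */
int				/* 1 if the low nibble of x is zero */
low_clear(int x)
{
	return((x & MASK) == 0);
}
```

- Many compilers warn about the first form, but it is legal C
- and silently gives the wrong answer.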
-
-
- 8.2. Naming Conventions
-
- Individual projects will no doubt have their own naming
- conventions. There are some general rules however.
-
- o An initial underscore should not be used for any
- user-created names27. UNIX uses it for names that the user
- should not have to know (like the standard I/O
- library)28.
-
- o Macro names, typedef names, and define names should be
- all in CAPS.
-
- o Variable names, structure tag names, and function names
- should be in lower case29. Some macros (such as
- getchar and putchar) are in lower case since they may
- also exist as functions. Care is needed when
- interchanging macros and functions since functions pass
- their parameters by value whereas macros pass their
- arguments by name substitution30.
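-
- The difference between passing by value and passing by name
- substitution can be sketched as follows. SQUARE_BAD, SQUARE,
- square, demo_bad and demo_good are hypothetical names used only
- for this illustration.

```c
#define	SQUARE_BAD(x)	x * x		/* parameters not parenthesized */
#define	SQUARE(x)	((x) * (x))	/* parameters and result parenthesized */

/*
 * square - function version: the argument is evaluated once, by value
 */
int
square(int x)
{
	return(x * x);
}

/*
 * demo_bad - SQUARE_BAD(a + 1) expands to a + 1 * a + 1
 */
int
demo_bad(int a)
{
	return(SQUARE_BAD(a + 1));
}

/*
 * demo_good - SQUARE(a + 1) expands to ((a + 1) * (a + 1))
 */
int
demo_good(int a)
{
	return(SQUARE(a + 1));
}
```

- Parentheses cure the precedence problem but not side effects:
- an argument such as i++ would still be substituted twice by
- either macro, whereas square() evaluates it once.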
- _________________________
-
- 27. Trailing underscores should be avoided too.
-
- 28. This convention is reserved for system purposes. If
- you must have your own private identifiers, begin them
- with a capital letter identifying the package to which
- they belong.
-
- 29. It is best to avoid names that differ only in case,
- like foo and FOO. The potential for confusion is
- considerable.
-
- 30. This difference also means that carefree use of macros
- requires care when they are defined. Remember that
- complex expressions can be used as parameters, and
- operator-precedence problems can arise unless all
- occurrences of parameters in the definition have
- parentheses around them. There is little that can be
-
-
- 8.3. Constants
-
- Numerical constants should not be coded directly31.
- The define feature of the C preprocessor should be used to
- assign a meaningful name. This will also make it easier to
- administer large programs since the constant value can be
- changed uniformly by changing only the define. The
- enumeration data type is the preferred way to handle situations
- where a variable takes on only a discrete set of values,
- since additional type checking is available through lint.
-
- There are some cases where the constants 0 and 1 may
- appear as themselves instead of as defines. For example if
- a for loop indexes through an array, then
-
-         for (i = 0; i < ARYBOUND; i++)
-
- is reasonable while the code
-
-         fptr = fopen(filename, "r");
-         if (fptr == 0)
-                 error("can't open %s\n", filename);
-
- is not. In the last example the defined constant NULL is
- available as part of the standard I/O library's header file
- stdio.h and must be used in place of the 0.
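-
- A short sketch of both recommendations, using invented names
- (NBOATS, enum rig, fleet, nsail): a define names the array
- bound, and an enumeration replaces a set of integer codes.

```c
#define	NBOATS	3		/* hypothetical fleet size */

enum rig {			/* the discrete set of rig types */
	KETCH,
	YAWL,
	SLOOP,
	MOTOR
};

static enum rig	fleet[NBOATS] = { SLOOP, MOTOR, KETCH };

/*
 * nsail - count the boats in the fleet that carry sail
 */
int				/* number of boats that are not MOTOR */
nsail(void)
{
	int	i;		/* index into fleet */
	int	n = 0;		/* sailboats seen so far */

	for (i = 0; i < NBOATS; i++)	/* a literal 0 is fine as a loop start */
		if (fleet[i] != MOTOR)
			n++;
	return(n);
}
```

- Changing the fleet size then means changing only the define
- and the initializer, not the loop.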
-
- 9. Portability
-
- The advantages of portable code are well known. This
- section gives some guidelines for writing portable code,
- where the definition of portable is taken to mean that a
- source file contains portable code if it can be compiled and
- executed on different machines with the only source change
- being the inclusion of possibly different header files. The
- header files will contain defines and typedefs that may vary
- from machine to machine. Reference [1] contains useful
- information on both style and portability. Many of the
- recommendations in this document originated in [1]. The
- following is a list of pitfalls to be avoided and recommen-
- dations to be considered when designing portable code:
-
- o First, one must recognize that some things are
- inherently non-portable. Examples are code to deal
- with particular hardware registers such as the program
- _________________________
- done about the problems caused by side effects in
- parameters except to avoid side effects in expressions
- (a good idea anyway).
-
- 31. At the very least, any directly-coded numerical
- constant must have a comment explaining the derivation
- of the value.
-
- status word, and code that is designed to support a
- particular piece of hardware such as an assembler or
- I/O driver. Even in these cases there are many
- routines and data organizations that can be made machine
- independent. It is suggested that source files be
- organized so that the machine-independent code and the
- machine-dependent code are in separate files. Then if
- the program is to be moved to a new machine, it is a
- much easier task to determine what needs to be
- changed32. It is also possible that code in the
- machine-independent files may have uses in other
- programs as well.
-
- o Pay attention to word sizes. The following sizes (in
- bits) apply to basic types in C for the machines that
- will be used most at IH33:
-
-         type    pdp11    3B     IBM
-         ____________________________
-         char       8      8       8
-         short     16     16      16
-         int       16     32      32
-         long      32     32      32
-
- In general if the word size is important, short or long
- should be used to get 16 or 32 bit items on any of the
- above machines34. If a simple loop counter is being
- used where either 16 or 32 bits will do, then use int,
- since it will get the most efficient (natural) unit for
- the current machine35.
- _________________________
-
- 32. If you #ifdef dependencies, make sure that if no
- machine is specified, the result is a syntax error, not
- a default machine!
-
- 33. The 3B is a Bell Labs machine. The VAX, not shown in
- the table, is similar to the 3B in these respects. The
- 68000 resembles either the pdp11 or the 3B, depending
- on the particular compiler.
-
- 34. Any unsigned type other than plain unsigned int should
- be typedefed, as such types are highly
- compiler-dependent. This is also true of long and short types
- other than long int and short int. Large programs
- should have a central header file which supplies
- typedefs for commonly-used width-sensitive types, to
- make it easier to change them and to aid in finding
- width-sensitive code.
-
- 35. Beware of making assumptions about the size of
- pointers. They are not always the same size as int.
- Nor are all pointers always the same size, or freely
- interconvertible. Pointer-to-character is a particular
-
-
- o Word size also affects shifts and masks. The code
-
-         x &= 0177770
-
- will clear only the three rightmost bits of an int on a
- PDP11. On a 3B it will also clear the entire upper
- halfword. Use
-
-         x &= ~07
-
- instead, which works properly on all machines36.
-
- o Code that takes advantage of the two's complement
- representation of numbers on most machines should not
- be used. Optimizations that replace arithmetic
- operations with equivalent shifting operations are
- particularly suspect. You should weigh the time savings
- against the potential for obscure and difficult bugs when
- your code is moved, say, from a 3B to a 1A.
-
- o Watch out for signed characters. On the PDP-11,
- characters are sign extended when used in expressions,
- which is not the case on any other machine. In
- particular, getchar is an integer-valued function (or macro)
- since the value of EOF for the standard I/O library is
- -1, which is not possible for a character on the 3B or
- IBM37.
-
- o The PDP-11 is unique among processors on which C exists
- in that the bytes are numbered from right to left
- within a word. All other machines (3B, IBM, Interdata
- 8/32, Honeywell) number the bytes from left to right38.
- Hence any code that depends on the left-right
- orientation of bits in a word deserves special scrutiny. Bit
- fields within structure members will only be portable
- _________________________
- trouble spot on machines which do not address to the
- byte.
-
- 36. The or operator ( | ) does not have these problems, nor
- do bitfields (which, unfortunately, are not very
- portable due to defective compilers).
-
- 37. Actually, this is not quite the real reason why getchar
- returns int, but the comment is valid: code which
- assumes either that characters are signed or that they
- are unsigned is unportable. It is best to completely
- avoid using char to hold numbers. Manipulation of
- characters as if they were numbers is also often
- unportable.
-
- 38. Actually, there are some more right-to-left machines
- now, but the comments still apply.
-
- so long as two separate fields are never concatenated
- and treated as a unit39. [1,3]
-
- o Do not default the boolean test for non-zero, i.e.
-
-         if (f() != FAIL)
-
- is better than
-
-         if (f())
-
- even though FAIL may have the value 0 which is
- considered to mean false by C40. This will help you out
- later when somebody decides that a failure return
- should be -1 instead of 0 41.
-
- o Be suspicious of numeric values appearing in the code.
- Even simple values like 0 or 1 could be better
- expressed using defines like FALSE and TRUE (see
- previous item)42. Any other constants appearing in a
- program would be better expressed as a defined constant.
- This makes it easier to change and also easier to read.
-
- o Become familiar with existing library functions and
- _________________________
-
- 39. The same applies to variables in general. Alignment
- considerations and loader peculiarities make it very
- rash to assume that two consecutively-declared
- variables are together in memory, or that a variable of
- one type is aligned appropriately to be used as another
- type.
-
- 40. A particularly notorious case is using strcmp to test
- for string equality, where the result should never ever
- be defaulted. The preferred approach is to define a
- macro STREQ:
-
-         #define STREQ(a, b) (strcmp((a), (b)) == 0)
-
-
- 41. An exception is commonly made for predicates, which are
- functions which meet the following restrictions:
-
- o Has no other purpose than to return true or false.
-
- o Returns 0 for false, 1 for true, nothing else.
-
- o Is named so that the meaning of (say) a `true' return
- is absolutely obvious. Call a predicate isvalid or
- valid, not checkvalid.
-
- 42. Actually, YES and NO often read better.
-
- defines43. You should not be writing your own string
- compare routine, or making your own defines for system
- structures44. Not only does this waste your time, but
- it prevents your program from taking advantage of any
- microcode assists or other means of improving
- performance of system routines45.
-
- o Use lint. It is a valuable tool for finding
- machine-dependent constructs as well as other inconsistencies
- or program bugs that pass the compiler46.
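-
- Three of the items above (masks, explicit tests, and the
- integer value of getchar) can be illustrated together. This is
- a sketch only: the function names are invented, STREQ is the
- macro suggested in footnote 40, and getc is used in place of
- getchar so that the input stream can be passed as a parameter.

```c
#include <stdio.h>
#include <string.h>

#define	STREQ(a, b)	(strcmp((a), (b)) == 0)

/*
 * clear_low3 - clear the three rightmost bits, whatever the word size
 */
int
clear_low3(int x)
{
	x &= ~07;		/* not 0177770, which assumes 16 bits */
	return(x);
}

/*
 * is_yes - explicit comparison rather than a defaulted strcmp test
 */
int				/* 1 if reply is "yes", else 0 */
is_yes(const char *reply)
{
	return(STREQ(reply, "yes"));
}

/*
 * count_chars - count the characters remaining on fp
 */
long
count_chars(FILE *fp)
{
	int	c;		/* int, not char, so EOF stays distinct */
	long	n = 0L;		/* characters seen so far */

	while ((c = getc(fp)) != EOF)
		n++;
	return(n);
}
```

- Declaring c as char in count_chars would make the loop depend
- on whether characters are signed on the machine at hand.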
-
- 10. Lint
-
- Lint is a C program checker [2] that examines C source
- files to detect and report type incompatibilities,
- inconsistencies between function definitions and calls, potential
- program bugs, etc. It is expected that projects will
- require programs to use lint as part of the official
- acceptance procedure47. In addition, work is going on in
- department 5521 to modify lint so that it will check for
- adherence to the standards in this document.
-
- It is still too early to say exactly which of the
- _________________________
-
- 43. But not too familiar. The internal details of library
- facilities, as opposed to their external interfaces,
- are subject to change without warning. They are also
- often quite unportable.
-
- 44. Or, especially, writing your own code to control
- terminals. Use the termcap package.
-
- 45. It also makes your code less readable, because the
- reader has to figure out whether you're doing something
- special in that reimplemented stuff to justify its
- existence. Furthermore, it's a fruitful source of
- bugs.
-
- 46. The use of lint on all programs is strongly
- recommended. It is difficult to eliminate complaints
- about functions whose return value is not used (in the
- current version of C, at least), but most other
- messages from _l_i_n_t really do indicate something wrong.
- The -h, -p, -a, -x, and -c options are worth learning.
- All of them will complain about some legitimate things,
- but they will also pick up many botches. Note that -p
- checks function-call type-consistency for only a subset
- of Unix library routines, so programs should be linted
- both with and without this option for best
- ``coverage''.
-
- 47. Yes.
-
-
- standards given here will be checked by lint. In some cases,
- such as whether a comment is misleading or incorrect, there
- is little hope of mechanical checking. In other cases, such
- as checking that the opening brace of a function body is
- alone on a line in column 1, the test has already been
- added48. Future bulletins will be used to announce new
- additions to lint as they occur.
-
- It should be noted that the best way to use lint is not
- as a barrier that must be overcome before official
- acceptance of a program, but rather as a tool to use whenever
- major changes or additions to the code have been made. Lint
- can find obscure bugs and ensure portability before problems
- occur.
-
- 11. Special Considerations
-
- This section contains some miscellaneous do's and
- don'ts.
-
- +o Don't change syntax via macro substitution. It makes
- the program unintelligible to all but the perpetrator.
-
- +o There is a time and a place for embedded assignment
- statements49. In some constructs there is no better
- way to accomplish the results without making the code
- bulkier and less readable. The while loop in section
- 8.1 is one example of an appropriate place. Another is
- the common code segment:
-
- while ((c = getchar()) != EOF) {
-         /* process the character */
- }
-
- Using embedded assignment statements to improve run-
- time performance is also possible. However, one should
- consider the tradeoff between increased speed and
- decreased maintainability that results when embedded
- assignments are used in artificial places. For exam-
- ple, the code:
-
- a = b + c;
- d = a + r;
-
- should not be replaced by
-
- _________________________
-
- 48. Little of this is relevant at U of T. The version of
- lint that we have lacks these mods.
-
- 49. The ++ and -- operators count as assignment statements.
- So, for many purposes, do functions with side effects.
-
-
- d = (a = b + c) + r;
-
- even though the latter may save one cycle. Note that
- in the long run the time difference between the two
- will decrease as the optimizer gains maturity, while
- the difference in ease of maintenance will increase as
- the human memory of what's going on in the latter piece
- of code begins to fade50.
-
- +o There is also a time and place for the ternary ? :
- operator and the binary comma operator. The logical
- expression operand before the ? : should be
- parenthesized:
-
- (x >= 0) ? x : -x
-
- Nested ? : operators can be confusing and should be
- avoided if possible. There are some macros like
- getchar where they can be useful. The comma operator
- can also be useful in for statements to provide
- multiple initializations or incrementations.
-
- +o Goto statements should be used sparingly, as in any
- well-structured code51. The main place where they can
- be usefully employed is to break out of several levels
- of switch, for, and while nesting52, e.g.
-
-         for (...)
-                 for (...) {
-                         ...
-                         if (disaster)
-                                 goto error;
-                 }
-         ...
- error:
-         clean up the mess
-
- When a goto is necessary, the accompanying label should
- be alone on a line and tabbed one tab position to the
- _________________________
-
- 50. Note also that side effects within expressions can
- result in code whose semantics are compiler-dependent,
- since C's order of evaluation is explicitly undefined
- in most places. Compilers do differ.
-
- 51. The continue statement is almost as bad. Break is less
- troublesome.
-
- 52. The need to do such a thing may indicate that the inner
- constructs should be broken out into a separate
- function, with a success/failure return code.
-
- left of the associated code that follows.
-
- +o This committee recommends that programmers not rely on
- automatic beautifiers for the following reasons.
- First, the main person who benefits from good program
- style is the programmer himself. This is especially
- true in the early design of handwritten algorithms or
- pseudo-code. Automatic beautifiers can only be applied
- to complete, syntactically correct programs and hence
- are not available when the need for attention to white
- space and indentation is greatest. It is also felt
- that programmers can do a better job of making clear
- the complete visual layout of a function or file, with
- the normal attention to detail of a careful program-
- mer53. Sloppy programmers should learn to be careful
- programmers instead of relying on a beautifier to make
- their code readable. Finally, it is felt that since
- beautifiers are non-trivial programs that must parse
- the source, the burden of maintaining them in the face
- of the continuing evolution of C is not worth the bene-
- fits gained by such a program.
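-
- The ? :, comma, and goto recommendations in this section can
- be sketched as follows. This is an illustration only, written
- with ANSI C prototypes (an assumption; the paper predates
- them). The names magnitude, reverse, copy_grid, NROWS, and
- NCOLS are hypothetical and do not come from the original paper.

```c
#include <string.h>

#define NROWS	3	/* hypothetical grid dimensions */
#define NCOLS	4

/* Absolute value; the logical expression before ? : is parenthesized. */
static int
magnitude(int x)
{
	return (x >= 0) ? x : -x;
}

/*
 * Reverse s in place.  The comma operator supplies two
 * initializations and two incrementations in one for statement.
 */
static void
reverse(char *s)
{
	size_t i, j;
	char tmp;

	if (s[0] == '\0')
		return;		/* nothing to do; avoids strlen(s) - 1 wrapping */
	for (i = 0, j = strlen(s) - 1; i < j; i++, j--) {
		tmp = s[i];
		s[i] = s[j];
		s[j] = tmp;
	}
}

/*
 * Copy src to dst, rejecting negative cells.
 * Returns 1 on success, 0 on failure (dst is then zeroed).
 * The goto leaves both loops at once; its label stands alone on a
 * line, one tab position to the left of the code that follows.
 */
static int
copy_grid(int dst[NROWS][NCOLS], int src[NROWS][NCOLS])
{
	int r, c;

	for (r = 0; r < NROWS; r++) {
		for (c = 0; c < NCOLS; c++) {
			if (src[r][c] < 0)
				goto error;
			dst[r][c] = src[r][c];
		}
	}
	return 1;

error:
	memset(dst, 0, sizeof(int) * NROWS * NCOLS);	/* clean up the mess */
	return 0;
}
```

- As footnote 52 suggests, the nested loops in copy_grid could
- instead be moved into a separate function that returns a
- success/failure code, which would remove the need for the goto.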
-
- 12. Project Dependent Standards
-
- Individual projects may wish to establish additional
- standards beyond those given here. The following issues are
- some of those that should be addressed by each project
- program administration group.
-
- +o What additional naming conventions should be followed?
- In particular, systematic prefix conventions for func-
- tional grouping of global data and also for structure
- or union member names can be useful.
-
- +o What kind of include file organization is appropriate
- for the project's particular data hierarchy?
-
- +o What procedures should be established for reviewing
- lint complaints? A tolerance level needs to be
- established in concert with the lint options to prevent
- unimportant complaints from hiding complaints about
- real bugs or inconsistencies.
-
- +o If a project establishes its own archive libraries, it
- should plan on supplying a lint library file [2] to the
- system administrators. This will allow lint to check
- for compatible use of library functions.
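-
- A systematic prefix convention of the kind raised in the first
- question might look like the sketch below. The module, the
- "sym_" prefix for global data and functions, and the "se_"
- prefix for structure members are all hypothetical choices made
- for illustration; a project would pick its own. ANSI C
- prototypes are assumed.

```c
#include <stddef.h>

#define SYM_MAX	64	/* hypothetical table capacity */

/* Members of struct sym_entry share the "se_" prefix. */
struct sym_entry {
	const char *se_name;	/* symbol name */
	int se_value;		/* symbol value */
};

/* Global data for the symbol-table group shares the "sym_" prefix. */
static struct sym_entry sym_table[SYM_MAX];	/* all known symbols */
static size_t sym_count = 0;			/* entries in use */

/* Add a symbol; returns 1 on success, 0 if the table is full. */
static int
sym_add(const char *name, int value)
{
	if (sym_count >= SYM_MAX)
		return 0;
	sym_table[sym_count].se_name = name;
	sym_table[sym_count].se_value = value;
	sym_count++;
	return 1;
}
```

- With such prefixes, a reader who meets se_value or sym_count
- elsewhere in the program can tell at once which structure or
- functional group the name belongs to.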
-
- _________________________
-
- 53. In other words, some of the visual layout is dictated
- by intent rather than syntax. Beautifiers cannot read
- minds.
-
- 13. Conclusion
-
- A set of standards has been presented for C programming
- style. One of the most important points is the proper use
- of white space and comments so that the structure of the
- program is evident from the layout of the code. Another
- good idea to keep in mind when writing code is that it is
- likely that you or someone else will be asked to modify it
- or make it run on a different machine sometime in the
- future.
-
- As with any standard, it must be followed if it is to
- be useful. The Indian Hill version of lint will enforce
- those standards that are amenable to automatic checking. If
- you have trouble following any of these standards, don't just
- ignore them. Programmers at Indian Hill should bring their
- problems to the Software Development System Group (Lee
- Kirchhoff, contact) in department 5522. Programmers outside
- Indian Hill should contact the Processor Application Group
- (Layne Cannon, contact) in department 5512 54.
-
-
- _________________________
-
- 54. At U of T Zoology, it's Henry Spencer in 336B.
-
- References
-
-
-
- [1] B.A. Tague, "C Language Portability", Sept 22, 1977.
- This document issued by department 8234 contains three
- memos by R.C. Haight, A.L. Glasser, and T.L. Lyon deal-
- ing with style and portability.
-
- [2] S.C. Johnson, "Lint, a C Program Checker", Technical
- Memorandum, 77-1273-14, September 16, 1977.
-
- [3] R.W. Mitze, "The 3B/PDP-11 Swabbing Problem", Memoran-
- dum for File, 1273-770907.01MF, September 14, 1977.
-
- [4] R.A. Elliott and D.C. Pfeffer, "3B Processor Common
- Diagnostic Standards- Version 1", Memorandum for File,
- 5514-780330.01MF, March 30, 1978.
-
- [5] R.W. Mitze, "An Overview of C Compilation of UNIX User
- Processes on the 3B", Memorandum for File, 5521-
- 780329.02MF, March 29, 1978.
-
- [6] B.W. Kernighan and D.M. Ritchie, The C Programming
- Language, Prentice-Hall 1978.
-
-
-
- /*
-  * The C Style Summary Sheet                    Block comment,
-  * by Henry Spencer, U of T Zoology             describes file.
-  */
-
- #include <errno.h>                              Headers; don't nest.
-
- typedef int SEQNO; /* ... */                    Global definitions.
- #define STREQ(a, b) (strcmp((a), (b)) == 0)
-
- static char *foo = NULL; /* ... */              Global declarations.
- struct bar {                                    Static whenever poss.
-         SEQNO alpha; /* ... */
- #       define NOSEQNO 0
-         int beta; /* ... */                     Don't assume 16 bits.
- };
-
- /*
-  * Many unnecessary braces, to show where.      Functions.
-  */
- static int /* what is returned */               Don't default int.
- bletch(a)
- int a; /* ... */                                Don't default int.
- {
-         int bar; /* ... */
-         extern int errno; /* ..., changed here */
-         extern char *index();
-
-         if (foobar() != FAIL) {                 if (!isvalid()) {
-                 return(OK);                             errno = ERANGE;
-         }                                       } else {
-                                                         x = &y + z->field;
-         while (x == (y & MASK)) {               }
-                 f += (x >= 0) ? x : -x;
-         }                                       for (i = 0; i < BOUND; i++) {
-                                                         /* lint -h[p]cax. */
-         do {                                    }
-                 /* Avoid nesting ?: */
-         } while (index(a, b) != NULL);          if (STREQ(x, "foo")) {
-                                                         x |= 07; /* 07 is... */
-         switch (...) {                          } else if (STREQ(x, "bar")) {
-         case ABC:                                       x &= ~077; /* 077 is... */
-         case DEF:                               } else if (STREQ(x, "ugh")) {
-                 printf("...", a, b);                    /* Avoid gotos */
-                 break;                          } else {
-         case XYZ:                                       /* and continues. */
-                 x = y;                          }
-                 /* FALLTHROUGH */
-         default:                                while ((c = getc()) != EOF)
-                 /* Limit imbedded =s. */                ; /* NULLBODY */
-                 break;
-         }
- }
-