

Double Dutch C++ Coding Style
Matt Stibbe
Hungarian notation, invented by Charles Simonyi, is popular among some PC programmers. And Macintosh programmers, led by Apple itself, have evolved an ad-hoc style guide for Mac programs.
The importance of code style guidelines, of whatever kind, is growing in proportion to program size and complexity. This article presents a style guide based on Apple's informal style guide and the sterner discipline of Hungarian. I'll refer to it as "Double Dutch," continuing the tradition of ironic national references (I am half Dutch). It is particularly aimed at C++ users, but should be applicable to other third-generation languages.
Theory
The fundamental principle of Double Dutch is that the form of variable and function names follows their function. An example is iCCh, which is read as "an integer counter of characters" and can be broken down as: [i] betokens an integer value, [C] indicates a counter of some kind, and [Ch] is the tag for a character value (a char), used here as the mnemonic.
In function (or method) names, the form is similar. For example, iCCh=iGetLengthSz("filename"); is parsed thus: [i] is again an integer value, but this time it indicates the function's return value; [GetLength] is a natural language transitive verb indicating the function's operation; and [Sz] is a tag for a zero terminated string (a C style string), which is the function's parameter (in this case, the file name).
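The parse above can be sketched in a few lines of C++. This is only an illustration: the article does not show a body for iGetLengthSz, so the version here simply assumes it returns the length of the string itself, and iCountFileNameChars is a hypothetical wrapper added to show the variable in use.

```cpp
#include <cassert>
#include <cstring>

// Assumed implementation (not from the article): returns an int [i],
// performs the action GetLength, and takes a zero terminated string [Sz].
int iGetLengthSz(const char* theSz)
{
    return static_cast<int>(std::strlen(theSz));
}

// Hypothetical caller: the leading 'i' of the variable matches the
// leading 'i' (return value) of the function it is assigned from.
int iCountFileNameChars()
{
    int iCCh = iGetLengthSz("filename");  // integer counter of characters
    return iCCh;
}
```

Reading the assignment aloud ("an integer count of characters is the integer length of a string") is the kind of check the notation is meant to make routine.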
It conveys a lot of information in a concise, formal and non-arbitrary way, but it isn't immediately readable. It is this apparent opacity, not Simonyi's nationality, that gave Hungarian notation its name.
Why go to such lengths to write apparently unreadable code? Because most programmers are born optimists. They tend to underestimate the length of a project, the complexity of their code, and the number of bugs in it.
A good style convention can bring estimates closer to reality. Brooks in "The Mythical Man Month" proposes a scheduling scheme of 1/3 planning, 1/6 coding, and 1/2 testing. Code conventions benefit each of these stages.
The planning stage usually involves constructing what might be termed a data dictionary: a class hierarchy containing data and methods. Double Dutch works best if the data formats are defined before coding begins. It provides a rigorous way to identify types, data structures, and functions in advance.
During coding, it enforces a close correspondence between a formal specification and implementation. Having a formalized way of writing variable and function names helps late-night coffee-assisted memories. It helps you avoid inconsistencies like the one between "DisposPtr" and "DisposeControl," where one name abbreviates the verb by dropping the final 'e' and the other spells it out.
Simonyi and Heller talk of "type calculus" in their August 1991 Byte article. This is a mental discipline that is aided by Double Dutch style notation. The function and variable examples shown above provide a trivial example because the leading 'i' in iCCh [variable type] corresponds to the leading 'i' [return value] in iGetLengthSz(...). Type calculus comes into its own with the complex pointer arithmetic that C++ sometimes introduces. For example, pCh is a pointer to a character. Therefore, the 'p' component carries a memory of the original definition "char*" with it, making it easier to remember when to dereference it.
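A small sketch of type calculus on pointers, under the assumption that each '*' applied to a value removes one leading 'p' from its tag. The helper names iCountChSz and chFirstSz are hypothetical; only the tags pCh, ch, Sz and i come from the article.

```cpp
#include <cassert>

// Hypothetical helper: counts characters up to the terminating zero.
// Return tag [i] and parameter tag [Sz] follow the article's pattern.
int iCountChSz(const char* theSz)
{
    int iCCh = 0;
    // pCh is a pointer to a character, so *pCh is a ch and can be
    // compared with another ch such as '\0'.
    for (const char* pCh = theSz; *pCh != '\0'; ++pCh) {
        ++iCCh;
    }
    return iCCh;
}

// Hypothetical helper: returns the first character of a string.
char chFirstSz(const char* theSz)
{
    const char* pCh = theSz;  // pointer to ch on the left, string on the right
    char ch = *pCh;           // '*' strips the 'p', leaving a plain ch
    return ch;
}
```

If an assignment such as ch = pCh appeared in review, the mismatched tags would flag the missing dereference before the compiler did.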
In the prehistoric days when C compilers did not do much, or any, type checking, type calculus was helpful in tracking down some bugs. Nowadays, type calculus is still useful as another way of reviewing code during the testing phase: it complements dry runs, source level debuggers, compiler error messages, and encoded checks by providing a formal way of comparing the expectations of the code to runtime reality.
Double Dutch conventions also help overcome typical programming problems such as arbitrary abbreviations, inconsistency, sloppiness, large code atrophy, and "neat hack"-ism. The last two need some explanation.
Large code atrophy is a phrase I use for the naming and style problems that arise in large programs. For example, a data type is defined in one header file. Later, a similar one gets defined in another file because the original has been forgotten or ignored, or because it bears a name that wrongly suggests it doesn't apply to the situation.
"Neat-hack-ism" is the tendency among some C and C++ programmers to generate incomprehensible "write-only" code because it is a "neat hack." Embedding context, structure and purpose information into variable and function names can alleviate this kind of obfuscation. These problems are magnified when you work across platforms and when several programmers work on one project.
Programmers new to Double Dutch style tend to object to it on grounds of readability, inflexibility, and "cramping my style." The first two are valid objections, the last mere prevarication. It's true that a program written in this style looks daunting, but then any high-level language looks daunting to a non-programmer. Once the simple format is learned, a quick reading of a Double Dutch program yields a more comprehensive understanding of a piece of code. It is simply a matter of learning how to parse the names and understanding the data structures unique to the program. This is what anyone has to do with a new program.
The accusation of inflexibility comes from the nuisance of updating variable and function names each time you change a type. In our example, if the programmer decided that a "long" rather than an "int" counter was required, every instance of the variable would have to be changed to lCCh, and the function to lGetLengthSz(...). This is a pain, even with global search and replace. In its defense, this change might draw attention to any dependence on an int counter.
C++ adds its own problems to programming by making it easier to write obscure code. Goldsmith and Palevich argue convincingly against frequent use of overloading and default arguments, and in favour of using strong type checking. A Double Dutch style complements this by expressing these self-imposed restraints in the code itself. Overloaded functions can be expressed without ambiguity in Double Dutch by changing the tags of the parameters or return value. For example, lGetLengthSz and lGetLengthFp might return the length of a file, but one takes a string and the other a file pointer as a parameter.
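The lGetLengthSz and lGetLengthFp pair might look something like the sketch below. The bodies are assumptions, not the author's code: the use of fseek/ftell to measure the file, the "rb" open mode, and the -1 failure value are all choices made for illustration, and the local names lSavPos, lLength and pFile are hypothetical.

```cpp
#include <cassert>
#include <cstdio>

// [l] long return value, [GetLength] action, [Fp] file pointer parameter.
// Returns the file length in bytes, or -1 on failure (an assumed convention).
long lGetLengthFp(std::FILE* theFp)
{
    if (theFp == nullptr) return -1;
    long lSavPos = std::ftell(theFp);             // Sav: position to restore
    if (lSavPos < 0) return -1;
    if (std::fseek(theFp, 0, SEEK_END) != 0) return -1;
    long lLength = std::ftell(theFp);
    std::fseek(theFp, lSavPos, SEEK_SET);         // restore the saved position
    return lLength;
}

// Same action and return tag, but the parameter is a file name [Sz].
long lGetLengthSz(const char* theSz)
{
    std::FILE* pFile = std::fopen(theSz, "rb");
    if (pFile == nullptr) return -1;
    long lLength = lGetLengthFp(pFile);
    std::fclose(pFile);
    return lLength;
}
```

Because the parameter tag is part of the name, a call site shows which variant is meant without the reader having to resolve an overload.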
Implementation
Name construction
The centerpiece of Double Dutch is name construction. A name contains up to four component parts (the scope, type, qualifier and mnemonic) in the form [scope][type[s]][qualifier][mnemonic]. Any or all of the parts can be omitted. Think of the name as an address: the more information that is added, the clearer the destination becomes. Each component begins with a capital letter. Variable names begin with a lower case character, function names begin with a capital letter. Underscore characters are not used.
Double Dutch is applied to function or method names thus: [Return Type][Mnemonic Action(s)][Parameter Types], where "Return Type" is the tag for the return value of the function, "Action" is a description of the action of the function or method, which may be a transitive verb (eg "print" or "find"), and "Parameter Types" lists the type tags of the formal arguments. In grammatical terms, the parameters are the objects of the verb.
Scope
The scope indicates the provenance of a variable. Function names don't really need scoping, as C++ itself enforces various kinds of scoping information. A static member function is prefixed by its class name (eg TScreen::Draw()), and other member functions have a "parent" object (eg theScreen->Draw()). The idea of scoping a variable draws on Apple's conventions, as embodied in MacApp.
the | A function or method argument, for example lGetLengthSz(char* theSz). |
f | A local or member variable, for example class TClass {int fI;}; |
k | A constant defined using #define. |
c | A constant defined using the const keyword. |
g | A global variable (including static members of classes); for example, gApplication, TGame::gPlayingField. |
T | A class definition (as in TWindow in MacApp). |
M | For multiple inheritance classes (or "mix-in" classes). |
e | Enumerated type (eg eColorConstant). |
ec | Enumerated type member item (eg ecRed). |
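The scope prefixes in the table can be seen together in one short sketch. All of the identifiers below (kMaxPlayers, cBoardSize, gPlayerCount, fScore, AddPoints, and so on) are hypothetical names chosen to illustrate each prefix; only the prefixes themselves and the TGame and eColorConstant examples come from the table.

```cpp
#include <cassert>

#define kMaxPlayers 4                 // k: constant defined using #define
const int cBoardSize = 8;             // c: constant defined using const

// e: enumerated type, ec: its member items
enum eColorConstant { ecRed, ecGreen, ecBlue };

class TGame {                         // T: class definition
public:
    static int gPlayerCount;          // g: static member, treated as a global
    int fScore;                       // f: member variable

    // the: function or method arguments
    explicit TGame(int theScore) : fScore(theScore) {}
    void AddPoints(int thePoints) { fScore += thePoints; }
};

int TGame::gPlayerCount = 0;          // definition of the static member
```

Inside AddPoints, the prefixes alone show that fScore outlives the call and thePoints does not.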
Type
Define base types as abbreviations or acronyms of the type's description, or as some other memorable or random sequence of characters, preferably two or three characters long. If it's truly necessary to refer to the native C types such as word, unsigned-word and long word types, the tags w, u and l are acceptable. Standard base types, derived from Hungarian, are:
bf(flag) | A boolean flag. The qualifier indicates the condition under which the value is true, for example bfOpen. |
ch | A 1 byte ASCII character. |
sz | A 'C' type null terminated string. |
sp | A Pascal type string, where the first byte contains the length. |
p | A pointer. For example, pch is a pointer to a character (char* in C). |
h | A handle - a pointer to a pointer. |
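As a rough illustration of the base type tags, the sketch below converts a C string (sz) to a Pascal string (sp) and dereferences a handle (h). Both functions are hypothetical; the 256-byte buffer requirement and the choice of unsigned char for the Pascal string are assumptions made for the example.

```cpp
#include <cassert>
#include <cstring>

// Copies a C string [Sz] into a Pascal string [Sp]: the first byte holds
// the length and the characters follow. theSp must point to at least
// 256 bytes (an assumption of this sketch). The boolean flag [bf]
// returned is true when the string fitted.
bool bfCopySpSz(unsigned char* theSp, const char* theSz)
{
    std::size_t cCh = std::strlen(theSz);    // count of characters
    if (cCh > 255) return false;             // one length byte holds at most 255
    theSp[0] = static_cast<unsigned char>(cCh);
    std::memcpy(theSp + 1, theSz, cCh);
    return true;
}

// A handle [H] is a pointer to a pointer, so reaching the character
// takes two dereferences: handle to pointer, then pointer to ch.
char chDerefH(char** theH)
{
    char* pCh = *theH;   // first dereference: h becomes p
    return *pCh;         // second dereference: p becomes ch
}
```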
Qualifier
The qualifier contains information about the use and purpose of the variable. This is almost pure Hungarian, and the following list is drawn from Simonyi and Heller:
i | An index into an array of elements with the given type. |
c | Some count of instances of the given type (for example, cch is a count of characters). |
d | The numeric difference between two instances of the given type (for example, DX is the integer difference called X, perhaps the width of a rectangle). |
Temp (or T) | A temporary variable. |
Sav | A temporary variable from which the value will be restored. |
Prev | A save value that lags behind a current value by one iteration. |
Cur | The current value in some enumeration. |
Next | Next value in some enumeration. |
Dest, Src | Destination and source, for example used in buffer handling. |
Nil | An empty, invalid value for some variable type. |
1,2 | Numbers can be used to distinguish between similar variables. |
Buf | A buffer. |
Min | Smallest legal index. Typically defined to be 0. |
Max | The allocation limit of some stack. |
First | First element of some interval. |
Last | Last element of some interval. |
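Several of these qualifiers tend to appear together in buffer handling. The sketch below is one possible reading, not a canonical one: iCopySzSz and kCChMax are hypothetical names, and the 16-character limit is arbitrary. It combines Src, Dest, Max and a count in the style of the article's iCCh.

```cpp
#include <cassert>

#define kCChMax 16   // k scope + count of characters + Max: buffer limit

// Copies at most kCChMax - 1 characters from the source string to the
// destination buffer, terminates it, and returns the count copied.
// theSzDest must point to at least kCChMax characters.
int iCopySzSz(char* theSzDest, const char* theSzSrc)
{
    const char* pChSrc = theSzSrc;   // Src: where characters come from
    char* pChDest = theSzDest;       // Dest: where characters go
    int iCCh = 0;                    // integer count of characters copied

    while (*pChSrc != '\0' && iCCh < kCChMax - 1) {
        *pChDest++ = *pChSrc++;
        ++iCCh;
    }
    *pChDest = '\0';                 // keep the destination a valid sz
    return iCCh;
}
```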
In naming functions, the mnemonic defines the operation of the function. It is possible to define a standard set of function mnemonics, for example "Get" and "Set" in instance access functions (theRect.IXGet() or thePoint.SetIX(10)).
Guidelines
See the sidebar for a brief list of style guidelines to keep handy, compiled from the articles listed in the bibliography and from our experience in-house. Guidelines are just that. They are not written in stone.
I hope this article will provoke debate and thought on the subject. Some kind of style convention is vital, whether it is a home grown "adhocracy" or a strictly imposed formal discipline. Because computer programming remains a literal process, it is still important to say what you mean and mean what you say.
Bibliography
- "The Hungarian Revolution," Charles Simonyi and Martin Heller, Byte August 1991.
- "Programmers At Work" interview with Charles Simonyi, Microsoft Press.
- "Unofficial C++ Style Guide" Goldsmith and Palevich, DEVELOP issue 2.
- "The Mythical Man Month," F.P. Brooks Jr., N.Carolina, Addison Wesley 1982.
