-
-
-
-
-
-
-
-
-
- How To Steal Code
- or
- Inventing The Wheel Only Once
-
-
- Henry Spencer
-
- Zoology Computer Systems
- 25 Harbord St.
- University of Toronto
- Toronto, Ont. M5S1A1 Canada
- {allegra,ihnp4,decvax,utai}!utzoo!henry
-
-
- ABSTRACT
-
- Much is said about ``standing on other
- people's shoulders, not their toes'', but in fact
- the wheel is re-invented every day in the Unix/C
- community. Worse, often it is re-invented badly,
- with bumps, corners, and cracks. There are ways
- of avoiding this: some of them bad, some of them
- good, most of them under-appreciated and under-
- used.
-
-
-
- Introduction
-
- ``Everyone knows'' that the UNIX/C† community and its
- programmers are the very paragons of re-use of software. In
- some ways this is true. Brian Kernighan [1] and others have
- waxed eloquent about how outstanding UNIX is as an
- environment for software re-use. Pipes, the shell, and the
- design of programs as `filters' do much to encourage
- programmers to build on others' work rather than starting
- from scratch. Major applications can be, and often are,
- written without a line of C. Of course, there are always
- people who insist on doing everything themselves, often
- citing `efficiency' as the compelling reason why they can't
- possibly build on the work of others (see [2] for some
- commentary on this). But surely these are the lamentable
- exceptions, rather than the rule?
-
- Well, in a word, no.
-
- At the level of shell programming, yes, software re-use is
- widespread in the UNIX/C community. Not quite as widespread
- or as effective as it might be, but definitely common. When
- the time comes to write programs in C, however, the
- situation changes. It took a radical change in directory
- format to make people use a library to read directories.
- Many new programs still contain hand-crafted code to analyze
- their arguments, even though prefabricated help for this has
- been available for years. C programmers tend to think that
- ``re-using software'' means being able to take the source
- for an existing program and edit it to produce the source
- for a new one. While that _is_ a useful technique, there are
- better ways.
-
- _________________________
- † UNIX is a trademark of Bell Laboratories.
-
- February 21, 1989
-
- Why does it matter that re-invention is rampant? Apart from
- the obvious, that programmers have more work to do, I mean?
- Well, extra work for the programmers is not exactly an
- unmixed blessing, even from the programmers' viewpoint!
- Time spent re-inventing facilities that are already
- available is time that is _not_ available to improve user
- interfaces, or to make the program run faster, or to chase
- down the proverbial Last Bug. Or, to get really picky, to
- make the code readable and clear so that our successors can
- _understand_ it.
-
- Even more seriously, re-invented wheels are often square.
- Every re-typing of a line of code is a new chance for bugs
- to be introduced. There will always be the temptation to
- take shortcuts based on how the code will be used: shortcuts
- that may turn around and bite the programmer when the
- program is modified or used for something unexpected. An
- inferior algorithm may be used because it's ``good enough''
- and the better algorithms are too difficult to reproduce on
- the spur of the moment... but the definition of ``good
- enough'' may change later. And unless the program is
- well-commented [here we pause for laughter], the next person
- who works on it will have to study the code at length to
- dispel the suspicion that there is some subtle reason for
- the seeming re-invention. Finally, to quote [2], ``if you
- re-invent the square wheel, you will not benefit when
- somebody else rounds off the corners''.
-
- In short, re-inventing the wheel ought to be a rare event,
- occurring only for the most compelling reasons. Using an
- existing wheel, or improving an existing one, is usually
- superior in a variety of ways. There is nothing
- dishonorable about stealing code* to make life easier and
- better.
-
- _________________________
- * Assuming no software licences, copyrights, patents,
- etc. are violated!
-
- Theft via the Editor
-
- UNIX historically has flourished in environments in which
- full sources for the system are available. This led to the
- most obvious and crudest way of stealing code: copy the
- source of an existing program and edit it to do something
- new.
-
- This approach does have its advantages. By its nature, it
- is the most flexible method of stealing code. It may be the
- only viable approach when what is desired is some variant of
- a complex algorithm that exists only within an existing
- program; a good example was V7 dumpdir (which printed a
- table of contents of a backup tape), visibly a modified copy
- of V7 restor (the only other program that understood the
- obscure format of backup tapes). And it certainly is easy.
-
- On the other hand, this approach also has its problems. It
- creates two subtly-different copies of the same code, which
- have to be maintained separately. Worse, they often have to
- be maintained ``separately but simultaneously'', because the
- new program inherits all the mistakes of the original. Fix-
- ing the same bug repeatedly is so mind-deadening that there
- is great temptation to fix it in only the program that is
- actually giving trouble... which means that when the other
- gives trouble, re-doing the cure must be preceded by re-
- doing the investigation and diagnosis. Still worse, such
- non-simultaneous bug fixes cause the variants of the code to
- diverge steadily. This is also true of improvements and
- cleanup work.
-
- A program created in this way may also be inferior, in some
- ways, to one created from scratch. Often there will be ves-
- tigial code left over from the program's evolutionary ances-
- tors. Apart from consuming resources (and possibly harbor-
- ing bugs) without a useful purpose, such vestigial code
- greatly complicates understanding the new program in isola-
- tion.
-
- There is also the possibility that the new program has
- inherited a poor algorithm from the old one. This is
- actually a universal problem with stealing code, but it is
- especially troublesome with this technique because the
- original program probably was not built with such re-use in
- mind. Even if its algorithms were good for _its_ intended
- purpose, they may not be versatile enough to do a good job
- in their new role.
-
- One relatively clean form of theft via editing is to alter
- the original program's source to generate either desired
- program by conditional compilation. This eliminates most of
- the problems. Unfortunately, it does so only if the two
- programs are sufficiently similar that they can share most
- of the source. When they diverge significantly, the result
- can be a maintenance nightmare, actually worse than two
- separate sources. Given a close similarity, though, this
- method can work well.
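-
- As a minimal sketch of this arrangement (not the actual V7
- code), one source file can yield either of two closely
- related programs depending on a compile-time flag. The macro
- LIST_ONLY and the function handle_record are hypothetical
- names chosen purely for illustration.

```c
#include <assert.h>
#include <stdio.h>

/*
 * One source, two programs.  Compiling with -DLIST_ONLY gives the
 * table-of-contents variant; compiling without it gives the variant
 * that also emits each record's contents.  All names are hypothetical.
 */
#ifdef LIST_ONLY
#define PROGRAM_NAME "lister"
#else
#define PROGRAM_NAME "extractor"
#endif

/*
 * handle_record - format one record into buf
 * Returns the number of characters snprintf wanted to write.
 */
int
handle_record(const char *name, const char *data, char *buf, size_t size)
{
    assert(name != NULL && data != NULL && buf != NULL);
#ifdef LIST_ONLY
    (void) data;                /* contents are not needed for a listing */
    return snprintf(buf, size, "%s\n", name);
#else
    return snprintf(buf, size, "%s: %s\n", name, data);
#endif
}
```

- Everything outside the #ifdef blocks is shared, so a bug fix
- there reaches both programs at once; the approach stops
- paying off once the conditional sections outgrow the shared
- ones.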
-
- Theft via Libraries
-
- The obvious way of using somebody else's code is to call a
- library function. Here, UNIX has had some success stories.
- Almost everybody uses the stdio library rather than
- inventing their own buffered-I/O package. (That may sound
- trivial to those who never programmed on a V6 or earlier
- UNIX, but in fact it's a great improvement on the earlier
- state of affairs.) The simpler sorts of string manipulations
- are usually done with the strxxx functions rather than by
- hand-coding them, although efficiency issues and the wide
- diversity of requirements have limited these functions to
- less complete success. Nobody who knows about qsort bothers
- to write his own sorting function.
-
- However, these success stories are pleasant islands in an
- ocean of mud. The fact is that UNIX's libraries are a
- disgrace. They are well enough implemented, and their design
- flaws are seldom more than nuisances, but there aren't
- _enough_ of them! Ironically, UNIX's ``poor cousin'', the
- Software Tools community [3,4], has done much better at
- this. Faced with a wild diversity of different operating
- systems, they were forced to put much more emphasis on
- identifying clean abstractions for system services.
-
- For example, the Software Tools version of ls runs
- unchanged, _without_ conditional compilation, on dozens of
- different operating systems [4]. By contrast, UNIX programs
- that read directories invariably dealt with the raw system
- data structures, until Berkeley turned this cozy little
- world upside-down with a change to those data structures.
- The Berkeley implementors were wise enough to provide a
- library for directory access, rather than just documenting
- the new underlying structure. However, true to the UNIX
- pattern, they designed a library which quietly assumed (in
- some of its naming conventions) that the underlying system
- used _their_ structures! This particular nettle has finally
- been grasped firmly by the IEEE POSIX project [5], at the
- cost of yet another slightly-incompatible interface.
-
- The adoption of the new directory libraries is not just a
- matter of convenience and portability: in general the
- libraries are faster than the hand-cooked code they replace.
- Nevertheless, Berkeley's original announcement of the change
- was greeted with a storm of outraged protest.
-
- Directories, alas, are not an isolated example. The UNIX/C
- community simply hasn't made much of an effort to identify
- common code and package it for re-use. One of the two major
- variants of UNIX still lacks a library function for binary
- search, an algorithm which is notorious for both the
- performance boost it can produce and the difficulty of
- coding a fully-correct version from scratch. No major
- variant of UNIX has a library function for either one of the
- following code fragments, both omnipresent (or at least,
- they _should_ be omnipresent [6]) in simple* programs that
- use the relevant facilities:
-
-     if ((f = fopen(filename, mode)) == NULL)
-         print error message with filename, mode, and specific
-         reason for failure, and then exit
-
-     if ((p = malloc(amount)) == NULL)
-         print error message and exit
-
- These may sound utterly trivial, but in fact programmers
- almost never produce as good an error message for fopen as
- ten lines of library code can, and half the time the return
- value from malloc isn't checked at all!
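-
- The two fragments above might be packaged roughly as
- follows. This is a sketch under stated assumptions, not the
- author's library code: the message wording, the exit status
- of 1, and the treatment of a zero-byte request are all
- choices made here for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* emalloc - malloc that never returns NULL: it complains and exits instead */
void *
emalloc(size_t amount)
{
    /* ask for at least one byte, since malloc(0) may legally return NULL */
    void *p = malloc(amount != 0 ? amount : 1);

    if (p == NULL) {
        fprintf(stderr, "emalloc: can't allocate %lu bytes\n",
            (unsigned long) amount);
        exit(1);
    }
    return p;
}

/* efopen - fopen that reports filename, mode and reason, then exits */
FILE *
efopen(const char *filename, const char *mode)
{
    FILE *f;

    assert(filename != NULL && mode != NULL);
    f = fopen(filename, mode);
    if (f == NULL) {
        fprintf(stderr, "efopen: can't open `%s' (mode `%s'): %s\n",
            filename, mode, strerror(errno));
        exit(1);
    }
    return f;
}
```

- A caller can then write in = efopen(name, "r"); and use the
- result without a NULL test, because the function either
- succeeds or does not return.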
-
- These examples illustrate a general principle, a side
- benefit of stealing code: the way to encourage
- standardization† and quality is to make it easier to be
- careful and standard than to be sloppy and non-standard. On
- systems with library functions for error-checked fopen and
- malloc, it is easier to use the system functions (which take
- some care to do ``the right thing'') than to kludge it
- yourself. This makes converts very quickly.
-
- These are not isolated examples. Studying the libraries of
- almost any UNIX system will yield other ideas for useful
- library functions (as well as a lot of silly nonsense that
- UNIX doesn't need, usually!). A few years of UNIX systems
- programming also leads to recognition of repeated needs.
- Does _your_ UNIX** have library functions to:
-
- o decide whether a filename is well-formed (contains no
-   control characters, shell metacharacters, or white
-   space, and is within any name-length limits your
-   system sets)?
-
- o close all file descriptors except the standard ones?
-
- o compute a standard CRC (Cyclic Redundancy Check
-   ``checksum'')?
-
- o operate on malloced unlimited-length strings?
-
- o do what access(2) does but using the effective
-   userid?
-
- o expand metacharacters in a filename the same way the
-   shell does? (the simplest way to make sure that the
-   two agree is to use popen and echo for anything
-   complicated)
-
- o convert integer baud rates to and from the speed
-   codes used by your system's serial-line ioctls?
-
- o convert integer file modes to and from the rwx
-   strings used‡ to present such modes to humans?
-
- o do a binary search through a file the way look(1)
-   does?
-
- _________________________
- * I include the qualification ``simple'' because
- complex programs often want to do more intelligent
- error recovery than these code fragments suggest.
- However, _most_ of the programs that use these functions
- _don't_ need fancy error recovery, and the error
- responses indicated are _better_ than the ones those
- programs usually have now!
- † Speaking of encouraging standardization: we use the
- names efopen and emalloc for the checked versions of
- fopen and malloc, and arguments and returned values are
- the same as the unchecked versions except that the
- returned value is guaranteed non-NULL if the function
- returns at all.
- ** As you might guess, my system has all of these. Most
- of them are trivial to write, or are available in
- public-domain forms.
- ‡ If you think only ls uses these, consider that rm and
- some similar programs _ought_ to use rwx strings, not
- octal modes, when requesting confirmation!
-
- The above are fairly trivial examples of the sort of things
- that _ought_ to be in UNIX libraries. More sophisticated
- libraries can also be useful, especially if the language
- provides better support for them than C does; C++ is an
- example [7]. Even in C, though, there is much room for
- improvement.
-
- Adding library functions does have its disadvantages. The
- interface to a library function is important, and getting it
- right is hard. Worse, once users have started using one
- version of an interface, changing it is very difficult even
- when hindsight clearly shows mistakes; the near-useless
- return values of some of the common UNIX library functions
- are obvious examples. Satisfactory handling of error
- conditions can be difficult. (For example, the
- error-checking malloc mentioned earlier is very handy for
- programmers, but invoking it from a library function would
- be a serious mistake, removing any possibility of more
- intelligent response to that error.) And there is the
- perennial headache of trying to get others to adopt your pet
- function, so that programs using it can be portable without
- having to drag the source of the function around too. For
- all this, though, libraries are in many ways the most
- satisfactory way of encouraging code theft.
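-
- To make one of the wished-for functions listed earlier
- concrete, here is a minimal sketch of converting file modes
- to and from rwx strings. The names mode_to_rwx and
- rwx_to_mode are hypothetical, and the sketch assumes only
- the nine ordinary permission bits matter (set-uid, set-gid
- and sticky bits are ignored).

```c
#include <assert.h>
#include <string.h>

/* mode_to_rwx - render the low nine permission bits as e.g. "rwxr-xr--" */
void
mode_to_rwx(unsigned int mode, char out[10])
{
    static const char letters[] = "rwxrwxrwx";
    int i;

    assert(out != NULL);
    for (i = 0; i < 9; i++) {
        unsigned int bit = 1u << (8 - i);   /* 0400 first, 0001 last */
        out[i] = (mode & bit) ? letters[i] : '-';
    }
    out[9] = '\0';
}

/*
 * rwx_to_mode - parse a nine-character rwx string
 * Returns 0 and stores the mode on success, -1 on a malformed string.
 */
int
rwx_to_mode(const char *text, unsigned int *mode)
{
    static const char letters[] = "rwxrwxrwx";
    unsigned int result = 0;
    int i;

    assert(text != NULL && mode != NULL);
    if (strlen(text) != 9)
        return -1;
    for (i = 0; i < 9; i++) {
        if (text[i] == letters[i])
            result |= 1u << (8 - i);
        else if (text[i] != '-')
            return -1;
    }
    *mode = result;
    return 0;
}
```

- With something like this in a library, a program asking for
- confirmation could show ``rw-r--r--'' rather than 0644
- without every author rewriting the loop.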
-
- Alas, encouraging code theft does not guarantee it. Even
- widely-available library functions often are not used nearly
- as much as they should be. A conspicuous example is getopt,
- for command-line argument parsing. Getopt supplies only
- quite modest help in parsing the command line, but the
- standardization and consistency that its use produces is
- still quite valuable; there are far too many pointless
- variations in command syntax in the hand-cooked argument
- parsers in most UNIX programs. Public-domain implementations
- of getopt have been available for years, and AT&T has
- published (!) the source for the System V implementation.
- Yet people continue to write their own argument parsers.
- There is one valid reason for this, to be discussed in the
- next section. There are also a number of excuses, mostly the
- standard ones for not using library functions:
-
- o ``It doesn't do quite what I want.'' But often it is
-   close enough to serve, and the combined benefits of
-   code theft and standardization outweigh the minor
-   mismatches.
-
- o ``Calling a library function is too inefficient.''
-   This is mostly heard from people who have never
-   profiled their programs and hence have no reliable
-   information about what their code's efficiency
-   problems are [2].
-
- o ``I didn't know about it.'' Competent programmers
-   know the contents of their toolboxes.
-
- o ``That whole concept is ugly, and should be
-   redesigned.'' (Often said of getopt, since the usual
-   UNIX single-letter-option syntax that getopt
-   implements is widely criticized as user-hostile.) How
-   likely is it that the rest of the world will go along
-   with your redesign (assuming you ever finish it)?
-   Consistency and a high-quality implementation are
-   valuable even if the standard being implemented is
-   suboptimal.
-
- o ``I would have done it differently.'' The triumph of
-   personal taste over professional programming.
-
- Theft via Templates
-
- Templates are a major and much-neglected approach to code
- sharing: ``boilerplate'' programs which contain a
- carefully-written skeleton for some moderately stereotyped
- task, which can then be adapted and filled in as needed.
- This method has some of the vices of modifying existing
- programs, but the template can be designed for the purpose,
- with attention to quality and versatility.
-
- Templates can be particularly useful when library functions
- are used in a stereotyped way that is a little complicated
- to write from scratch; getopt is an excellent example. The
- one really valid objection to getopt is that its invocation
- is not trivial, and typing in the correct sequence from
- scratch is a real test of memory. The usual getopt manual
- page contains a lengthy example which is essentially a
- template for a getopt-using program.
-
- When the first public-domain getopt appeared, it quickly
- became clear that it would be convenient to have a template
- for its use handy. This template eventually grew to
- incorporate a number of other things: a useful macro or two,
- definition of main, opening of files in the standard UNIX
- filter fashion, checking for mistakes like opening a
- directory, filename and line-number tracking for error
- messages, and some odds and ends. The full current version
- can be found in the Appendix; actually it diverged into two
- distinct versions when it became clear that some filters
- wanted the illusion of a single input stream, while others
- wanted to handle each input file individually (or didn't
- care).
-
- The obvious objection to this line of development is ``it's
- more complicated than I need''. In fact, it turns out to be
- surprisingly convenient to have all this machinery
- presupplied. It is much easier to alter or delete lines of
- code than to add them. If directories are legitimate input,
- just delete the code that catches them. If no filenames are
- allowed as input, or exactly one must be present, change one
- line of code to enforce the restriction and a few more to
- deal with the arguments correctly. If the arguments are not
- filenames at all, just delete the bits of code that assume
- they are. And so forth.
-
- The job of writing an ordinary filter-like program is
- reduced to filling in two or three blanks* in the template,
- and then writing the code that actually processes the data.
- Even quick improvisations become good-quality programs,
- doing things the standard way with all the proper amenities,
- because even a quick improvisation is easier to do by
- starting from the template. Templates are an unmixed
- blessing; anyone who types a non-trivial program in from
- scratch is wasting his time and his employer's money.
-
- Templates are also useful for other stereotyped files, even
- ones that are not usually thought of as programs. Most
- versions of UNIX have a simple template for manual pages
- hiding somewhere (in V7 it was /usr/man/man0/xx). Shell
- files that want to analyze complex argument lists have the
- same getopt problem as C programs, with the same solution.
- There is enough machinery in a ``production-grade'' make
- file to make a template worthwhile, although this one tends
- to get altered fairly heavily; our current one is in the
- Appendix.
-
- _________________________
- * All marked with the string `xxx' to make them easy
- for a text editor to find.
-
-
- Theft via Inclusion
-
- Source inclusion (#include) provides a way of sharing both
- data structures and executable code. Header files (e.g.
- stdio.h) in particular tend to be taken for granted. Again,
- those who haven't been around long enough to remember V6
- UNIX may have trouble grasping what a revolution it was when
- V7 introduced systematic use of header files!
-
- However, even mundane header files could be rather more
- useful than they normally are now. Data structures in header
- files are widely accepted, but there is somewhat less use of
- them to declare the return types of functions. One or two
- common header files like stdio.h and math.h do this, but
- programmers are still used to the idea that the type of
- (e.g.) atol has to be typed in by hand. Actually, all too
- often the programmer says ``oh well, on my machine it works
- out all right if I don't bother declaring atol'', and the
- result is dirty and unportable code. The X3J11 draft ANSI
- standard for C addresses this by defining some more header
- files and requiring their use for portable programs, so that
- the header files can do all the work and do it _right_.
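-
- In standard C as it was eventually adopted, the header that
- declares atol is stdlib.h, so the portable form is simply to
- include it. The wrapper parse_count below is a hypothetical
- name used only to give the call somewhere to live.

```c
#include <assert.h>
#include <stdlib.h>     /* declares atol() as returning long */

/* parse_count - convert a decimal string to long, trusting the header */
long
parse_count(const char *text)
{
    assert(text != NULL);
    /*
     * With no declaration in scope, old compilers assumed atol()
     * returned int, which silently truncates where long is wider.
     */
    return atol(text);
}
```

- Nothing in the function changes; the only difference
- between the clean and the dirty version is the #include
- line.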
-
- In principle, source inclusion can be used for more than
- just header files. In practice, almost anything that can be
- done with source inclusion can be done, and usually done
- more cleanly, with header files and libraries. There are
- occasional specialized exceptions, such as using macro
- definitions and source inclusion to fake parameterized data
- types.
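-
- As a rough sketch of the macro half of that trick, the
- following generates a small fixed-size stack for any element
- type. DEFINE_STACK, STACK_MAX and the generated names are
- all hypothetical; in the source-inclusion variant the macro
- body would instead live in a file that is #included once per
- element type.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_MAX 16    /* arbitrary capacity chosen for the sketch */

/*
 * DEFINE_STACK(T, NAME) - generate a fixed-size stack of T
 * Produces struct NAME plus NAME_init, NAME_push and NAME_pop.
 */
#define DEFINE_STACK(T, NAME)                                   \
    struct NAME { T item[STACK_MAX]; size_t depth; };           \
    void NAME##_init(struct NAME *s) { s->depth = 0; }          \
    int NAME##_push(struct NAME *s, T value)                    \
    {                                                           \
        if (s->depth >= STACK_MAX)                              \
            return -1;              /* full */                  \
        s->item[s->depth++] = value;                            \
        return 0;                                               \
    }                                                           \
    int NAME##_pop(struct NAME *s, T *value)                    \
    {                                                           \
        assert(value != NULL);                                  \
        if (s->depth == 0)                                      \
            return -1;              /* empty */                 \
        *value = s->item[--s->depth];                           \
        return 0;                                               \
    }

/* Two instantiations of the same "parameterized" type. */
DEFINE_STACK(int, int_stack)
DEFINE_STACK(double, double_stack)
```

- The compiler sees two unrelated structs and two unrelated
- sets of functions, so type checking still works; the price
- is that errors are reported against the expanded macro.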
-
- Theft via Invocation
-
- Finally, it is often possible to steal another program's
- code simply by invoking that program. Invoking other
- programs via system or popen for things that are easily done
- in C is a common beginner's error. More experienced
- programmers can go too far the other way, however, insisting
- on doing everything in C, even when a leavening of other
- methods would give better results. The best way to sort a
- large file is probably to invoke sort(1), not to do it
- yourself. Even invoking a shell file can be useful, although
- a bit odd-seeming to most C programmers, when elaborate file
- manipulation is needed and efficiency is not critical.
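-
- A minimal sketch of handing a file to sort(1) follows. It
- uses fork and execlp rather than the system or popen calls
- named above, purely so that the filenames never pass through
- a shell; that substitution, the function name sort_file, and
- the assumption that neither filename begins with `-' are
- choices made for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * sort_file - sort inpath into outpath by running sort(1)
 * Returns 0 if sort ran and exited successfully, -1 otherwise.
 */
int
sort_file(const char *inpath, const char *outpath)
{
    pid_t pid;
    int status;

    assert(inpath != NULL && outpath != NULL);
    pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* child: become sort(1), found via PATH */
        execlp("sort", "sort", "-o", outpath, inpath, (char *) NULL);
        _exit(127);             /* reached only if the exec failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

- All of the hard parts (merging runs that do not fit in
- memory, temporary files, locale-aware comparison) stay
- inside sort itself, which is the point of the theft.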
-
- Aside from invoking other programs at run time, it can also
- be useful to invoke them at compile time. Particularly when
- dealing with large tables, it is often better to dynamically
- generate the C code from some more compact and readable
- notation. Yacc and lex are familiar examples of this on a
- large scale, but simple sed and awk programs can build
- tables in more specialized, application-specific ways.
- Whether this is really theft is debatable, but it's a
- valuable technique all the same. It can neatly bypass a lot
- of objections that start with ``but C won't let me
- write...''.
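-
- The same idea can be sketched with a small generator written
- in C instead of sed or awk (a substitution made here to keep
- the examples in one language). It emits the source for a
- CRC-32 lookup table, using the common reflected polynomial
- 0xEDB88320; the function name emit_crc_table and the output
- layout are assumptions for illustration.

```c
#include <assert.h>
#include <stdio.h>

/*
 * emit_crc_table - write C source for a 256-entry CRC-32 table to out
 * Meant to run at build time, with the output compiled into the
 * real program.
 */
void
emit_crc_table(FILE *out)
{
    unsigned long c;
    int n, k;

    assert(out != NULL);
    fprintf(out, "/* generated file, do not edit */\n");
    fprintf(out, "const unsigned long crc_table[256] = {\n");
    for (n = 0; n < 256; n++) {
        c = (unsigned long) n;
        for (k = 0; k < 8; k++)     /* one shift per bit of the index */
            c = (c & 1) ? (0xEDB88320UL ^ (c >> 1)) : (c >> 1);
        /* four entries per output line */
        fprintf(out, "    0x%08lxUL,%s", c, (n % 4 == 3) ? "\n" : "");
    }
    fprintf(out, "};\n");
}
```

- A make rule would run the generator and redirect its output
- into a .c or .h file, so the 256 constants never have to be
- typed, or proofread, by hand.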
-
- An Excess of Invention
-
- With all these varied methods, why is code theft not more
- widespread? Why are so many programs unnecessarily invented
- from scratch?
-
- The most obvious answer is the hardest to counter: theft
- requires that there be something to steal. Use of library
- functions is impossible unless somebody sets up a library.
- Designing the interfaces for library functions is not easy.
- Worse, doing it _well_ requires insight, which generally
- isn't available on demand. The same is true, to varying
- degrees, for the other forms of theft.
-
- Despite its reputation as a hotbed of software re-use, UNIX
- is actually hostile to some of these activities. If UNIX
- directories had been complex and obscure, directory-reading
- libraries would have been present from the beginning. As it
- is, it was simply _too easy_ to do things ``the hard way''.
- There _still_ is no portable set of functions to perform the
- dozen or so useful manipulations of terminal modes that a
- user program might want to do, a major nuisance because
- changing those modes ``in the raw'' is simple but highly
- unportable.
-
- Finally, there is the Not Invented Here syndrome, and its
- relatives, Not Good Enough and Not Understood Here. How
- else to explain AT&T UNIX's persistent lack of the dbm
- library for hashed databases (even though it was developed
- at Bell Labs and hence is available to AT&T), and Berkeley
- UNIX's persistent lack of the full set of strxxx functions
- (even though a public-domain implementation has existed for
- years)? The X3J11 and POSIX efforts are making some
- progress at developing a common nucleus of functionality,
- but they are aiming at a common subset of current systems,
- when what is really wanted is a common superset.
-
- Conclusion
-
- In short, never build what you can (legally) steal! Done
- right, it yields better programs for less work.
-
- References
-
- [1] Brian W. Kernighan, ``The Unix System and Software
-     Reusability'', IEEE Transactions on Software Engineering,
-     Vol. SE-10, No. 5, Sept. 1984, pp. 513-518.
-
- [2] Geoff Collyer and Henry Spencer, ``News Need Not Be
-     Slow'', Usenix Winter 1987 Technical Conference,
-     pp. 181-190.
-
- [3] Brian W. Kernighan and P.J. Plauger, ``Software Tools'',
-     Addison-Wesley, Reading, Mass., 1976.
-
- [4] Mike O'Dell, ``UNIX: The World View'', Usenix Winter 1987
-     Technical Conference, pp. 35-45.
-
- [5] IEEE, ``IEEE Trial-Use Standard 1003.1 (April 1986):
-     Portable Operating System for Computer Environments'',
-     IEEE and Wiley-Interscience, New York, 1986.
-
- [6] Ian Darwin and Geoff Collyer, ``Can't Happen or /*
-     NOTREACHED */ or Real Programs Dump Core'', Usenix Winter
-     1985 Technical Conference, pp. 136-151.
-
- [7] Bjarne Stroustrup, ``The C++ Programming Language'',
-     Addison-Wesley, Reading, Mass., 1986.
-
- Appendix
-
- Warning: these templates have been in use for varying
- lengths of time, and are not necessarily all entirely
- bug-free.
-
- C program, single stream of input
-
- /*
-  * name - purpose xxx
-  *
-  * $Log$
-  */
- #include <stdio.h>
- #include <sys/types.h>
- #include <sys/stat.h>
- #include <string.h>
-
- #define MAXSTR 500 /* For sizing strings -- DON'T use BUFSIZ! */
- #define STREQ(a, b) (*(a) == *(b) && strcmp((a), (b)) == 0)
-
- #ifndef lint
- static char RCSid[] = "$Header$";
- #endif
-
- int debug = 0;
- char *progname;
-
- char **argvp;                       /* scan pointer for nextfile() */
- char *nullargv[] = { "-", NULL };   /* dummy argv for case of no args */
- char *inname;                       /* filename for messages etc. */
- long lineno;                        /* line number for messages etc. */
- FILE *in = NULL;                    /* current input file */
-
- extern void error(), exit();
- #ifdef UTZOOERR
- extern char *mkprogname();
- #else
- #define mkprogname(a) (a)
- #endif
-
- char *nextfile();
- void fail();
-
- /*
-  - main - parse arguments and handle options
-  */
- main(argc, argv)
- int argc;
- char *argv[];
- {
-     int c;
-     int errflg = 0;
-     extern int optind;
-     extern char *optarg;
-     void process();
-
-     progname = mkprogname(argv[0]);
-     while ((c = getopt(argc, argv, "xxxd")) != EOF)
-         switch (c) {
-         case 'xxx': /* xxx meaning of option */
-             xxx
-             break;
-         case 'd': /* Debugging. */
-             debug++;
-             break;
-         case '?':
-         default:
-             errflg++;
-             break;
-         }
-     if (errflg) {
-         fprintf(stderr, "usage: %s ", progname);
-         fprintf(stderr, "xxx [file] ...\n");
-         exit(2);
-     }
-
-     if (optind >= argc)
-         argvp = nullargv;
-     else
-         argvp = &argv[optind];
-     inname = nextfile();
-     if (inname != NULL)
-         process();
-     exit(0);
- }
-
- /*
-  - getline - get next line (internal version of fgets)
-  */
- char *
- getline(ptr, size)
- char *ptr;
- int size;
- {
-     register char *namep;
-
-     while (fgets(ptr, size, in) == NULL) {
-         namep = nextfile();
-         if (namep == NULL)
-             return(NULL);
-         inname = namep; /* only after we know it's good */
-     }
-     lineno++;
-     return(ptr);
- }
-
- /*
-  - nextfile - switch files
-  */
- char * /* filename */
- nextfile()
- {
-     register char *namep;
-     struct stat statbuf;
-     extern FILE *efopen();
-
-     if (in != NULL)
-         (void) fclose(in);
-
-     namep = *argvp;
-     if (namep == NULL) /* no more files */
-         return(NULL);
-     argvp++;
-
-     if (STREQ(namep, "-")) {
-         in = stdin;
-         namep = "stdin";
-     } else {
-         in = efopen(namep, "r");
-         if (fstat(fileno(in), &statbuf) < 0)
-             error("can't fstat `%s'", namep);
-         if ((statbuf.st_mode & S_IFMT) == S_IFDIR)
-             error("`%s' is directory!", namep);
-     }
-
-     lineno = 0;
-     return(namep);
- }
-
- /*
-  - fail - complain and die
-  */
- void
- fail(s1, s2)
- char *s1;
- char *s2;
- {
-     fprintf(stderr, "%s: (file `%s', line %ld) ", progname, inname, lineno);
-     fprintf(stderr, s1, s2);
-     fprintf(stderr, "\n");
-     exit(1);
- }
-
- /*
-  - process - process input data
-  */
- void
- process()
- {
-     char line[MAXSTR];
-
-     while (getline(line, (int)sizeof(line)) != NULL) {
-         xxx
-     }
- }
-
- C program, separate input files
-
- /*
-  * name - purpose xxx
-  *
-  * $Log$
-  */
- #include <stdio.h>
- #include <sys/types.h>
- #include <sys/stat.h>
- #include <string.h>
-
- #define MAXSTR 500 /* For sizing strings -- DON'T use BUFSIZ! */
- #define STREQ(a, b) (*(a) == *(b) && strcmp((a), (b)) == 0)
-
- #ifndef lint
- static char RCSid[] = "$Header$";
- #endif
-
- int debug = 0;
- char *progname;
-
- char *inname;   /* filename for messages etc. */
- long lineno;    /* line number for messages etc. */
-
- extern void error(), exit();
- #ifdef UTZOOERR
- extern char *mkprogname();
- #else
- #define mkprogname(a) (a)
- #endif
-
- void fail();
-
- /*
-  - main - parse arguments and handle options
-  */
- main(argc, argv)
- int argc;
- char *argv[];
- {
-     int c;
-     int errflg = 0;
-     FILE *in;
-     struct stat statbuf;
-     extern int optind;
-     extern char *optarg;
-     extern FILE *efopen();
-     void process();
-
-     progname = mkprogname(argv[0]);
-
-     while ((c = getopt(argc, argv, "xxxd")) != EOF)
-         switch (c) {
-         case 'xxx': /* xxx meaning of option */
-             xxx
-             break;
-         case 'd': /* Debugging. */
-             debug++;
-             break;
-         case '?':
-         default:
-             errflg++;
-             break;
-         }
-     if (errflg) {
-         fprintf(stderr, "usage: %s ", progname);
-         fprintf(stderr, "xxx [file] ...\n");
-         exit(2);
-     }
-
-     if (optind >= argc)
-         process(stdin, "stdin");
-     else
-         for (; optind < argc; optind++)
-             if (STREQ(argv[optind], "-"))
-                 process(stdin, "-");
-             else {
-                 in = efopen(argv[optind], "r");
-                 if (fstat(fileno(in), &statbuf) < 0)
-                     error("can't fstat `%s'", argv[optind]);
-                 if ((statbuf.st_mode & S_IFMT) == S_IFDIR)
-                     error("`%s' is directory!", argv[optind]);
-                 process(in, argv[optind]);
-                 (void) fclose(in);
-             }
-     exit(0);
- }
-
- /*
-  - process - process input file
-  */
- void
- process(in, name)
- FILE *in;
- char *name;
- {
-     char line[MAXSTR];
-
-     inname = name;
-     lineno = 0;
-
-     while (fgets(line, sizeof(line), in) != NULL) {
-         lineno++;
-         xxx
-     }
- }
-
- /*
-  - fail - complain and die
-  */
- void
- fail(s1, s2)
- char *s1;
- char *s2;
- {
-     fprintf(stderr, "%s: (file `%s', line %ld) ", progname, inname, lineno);
-     fprintf(stderr, s1, s2);
-     fprintf(stderr, "\n");
-     exit(1);
- }
-
- Make file
-
- # Things you might want to put in ENV and LENV:
- # -Dvoid=int		compiler lacks void
- # -DCHARBITS=0377	compiler lacks unsigned char
- # -DSTATIC=extern	compiler dislikes "static foo();" as forward decl.
- # -DREGISTER=		machines with few registers for register variables
- # -DUTZOOERR		have utzoo-compatible error() function and friends
- ENV = -DSTATIC=extern -DREGISTER= -DUTZOOERR
- LENV = -Dvoid=int -DCHARBITS=0377 -DREGISTER= -DUTZOOERR
-
- # Things you might want to put in TEST:
- # -DDEBUG		debugging hooks
- # -I.			header files in current directory
- TEST = -DDEBUG
-
- # Things you might want to put in PROF:
- # -Dstatic='/* */'	make everything global so profiler can see it.
- # -p			profiler
- PROF =
-
- CFLAGS = -O $(ENV) $(TEST) $(PROF)
- LINTFLAGS = $(LENV) $(TEST) -ha
- LDFLAGS = -i
-
- OBJ = xxx
- LSRC = xxx
- DTR = README dMakefile tests tests.good xxx.c
-
- xxx: xxx.o
- 	$(CC) $(CFLAGS) $(LDFLAGS) xxx.o -o xxx
-
- xxx.o: xxx.h
-
- lint: $(LSRC)
- 	lint $(LINTFLAGS) $(LSRC) | tee lint
-
- r: xxx tests tests.good # Regression test.
- 	xxx <tests >tests.new
- 	diff -h tests.new tests.good && rm tests.new
-
- # Prepare good output for regression test -- name isn't "tests.good"
- # because human judgement is needed to decide when output is good.
- good: xxx tests
- 	xxx <tests >tests.good
-
- dtr: r $(DTR)
- 	makedtr $(DTR) >dtr
-
- # The pattern allows for the spaces around "=" in the ENV/LENV lines.
- dMakefile: Makefile
- 	sed '/^L*ENV *=/s/ *-DUTZOOERR//' Makefile >dMakefile
-
- clean:
- 	rm -f *.o lint tests.new dMakefile dtr core mon.out xxx