- Path: sparky!uunet!spool.mu.edu!sdd.hp.com!cs.utexas.edu!usc!srhqla!quest!kdq
- From: kdq@quest.UUCP (Kevin D. Quitt)
- Newsgroups: comp.lang.c
- Subject: Re: Questions about token merging and trigraphs
- Message-ID: <XZo2VB1w165w@quest.UUCP>
- Date: Sat, 19 Dec 92 11:51:08 PST
- References: <1TA04FH@cdis-1.compu.com>
- Reply-To: srhqla!quest!kdq
- Organization: Job quest (805) 251-8210, So Cal: (800) 400-8210
- Lines: 118
-
- tanner@cdis-1.compu.com (Dr. T. Andrews) writes:
-
- > Underneath a bogus ``reply-to'' header, kdq@quest.UUCP writes..
-
- My reply-to line goes out of this site as "srhqla!quest!kdq"; if
- another site is munging that, there's little I can do.
-
- > First, necessity: a trivial proof exists that there is no machine
- > on which the characters can not be represented in some other way.
-
- The necessity is to provide a *single* representation that is
- accepted on *all* machines. Certainly, one could map the APL character
- set onto ASCII, and program that way; the fact that something can be
- done does not make it useful or meaningful.
-
-
- > Second, utility: the most commonly-cited character set for which
- > trigraphs are ``justified'', EBCDIC, certainly has the characters
- > in question. Disagreement on print-train convention may mean
- > that some people see different characters in place of their
- > expected, on some of their printers. A similar problem exists
- > for people whose terminals replace certain ASCII characters
- > according to local preference.
-
- From the rationale:
-
- "C derived its repertoire from the ASCII codeset. Unfortunately the
- ASCII repertoire is not a subset of all other commonly used character
- sets, and widespread practice in Europe is not to implement all of
- ASCII either, but use some parts of its collating sequence for special
- national characters.
-
- "The solution is an internationally agreed-upon repertoire, in terms of
- which an international representation of C can be defined. The ISO has
- defined such a standard: ISO 646 describes an _invariant subset_ of
- ASCII.
-
- "The characters in the ASCII repertoire used by C and absent from the
- ISO 646 repertoire are: # [ ] { } \ | ~ ^ <<hash mark, left and right
- bracket, left and right brace, back-slash, vertical bar, tilde and
- caret>>.
-
- "...The obvious idea of defining two-character escape sequences fails
- because C uses all the characters which _are_ in the ISO 646 repertoire:
- no single escape character is available. The best that can be done is
- to use a _trigraph_ - an escape digraph followed by a distinguishing
- character."
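
To make the mapping concrete, here is a rough sketch of the replacement a translator performs on source text: each of the nine characters above is spelled as two question marks plus a third character (= ( ) < > / ! - ' standing for # [ ] { } \ | ~ ^). The function name replace_trigraphs and the two lookup tables are made up for this illustration and are not taken from any compiler. The sketch works on an ordinary string, so it does not depend on the compiler's own trigraph handling.

```c
#include <assert.h>
#include <string.h>

/* Third character of each trigraph, and the character it stands for.
   Both tables are illustrative names, kept in matching order. */
static const char tri_third[] = "=()<>/!-'";
static const char tri_repl[]  = "#[]{}\\|~^";

/* Copy src to dst, replacing each trigraph with its single character.
   dst must have room for strlen(src) + 1 bytes. Returns dst. */
char *replace_trigraphs(char *dst, const char *src)
{
    char *out = dst;

    assert(dst != NULL && src != NULL);
    while (*src != '\0') {
        /* Two question marks followed by a recognized third character. */
        if (src[0] == '?' && src[1] == '?' && src[2] != '\0') {
            const char *hit = strchr(tri_third, src[2]);
            if (hit != NULL) {
                *out++ = tri_repl[hit - tri_third];
                src += 3;
                continue;
            }
        }
        *out++ = *src++;   /* anything else is copied unchanged */
    }
    *out = '\0';
    return dst;
}
```

Fed the trigraph spelling of an array assignment, with ( and ) as the third characters around the index, the sketch yields "a[i] = i;". Two question marks followed by any other character pass through untouched.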
-
-
- > This means that, without trigraphs, some people would have to
- > accept what their printers give; others would have to configure
- > terminals to show the ASCII characters when writing C. It is not
- > clear that trigraphs are preferable.
-
- This means that some people would have alphabetic characters in
- place of those glyphs. How'd you like to have code that looked like:
-
- for L i=0; i<max; i++ M J
- aNiO = i;
- K
-
-
- > Had the utility been clear, there would have been a large body of
- > prior art from which to draw. The lack of such a large body
- > suggests that the problem is insufficiently bothersome to justify
- > the development of the art.
-
- Either that, or those other systems had to put up with the ugliness
- above, or they couldn't run C, or programs written on them couldn't be
- ported to other unlike systems.
-
-
- > Third, harm: it is by no means unlikely that ``can't happen''
- > error messages may include text such as ``zero quark-count??!
- > contact support''. The harm of having the error message mangled
- > is small, but the benefit of this replacement in a string appears
- > to be zero. If you have to use a trigraph for the character, it
- > seems unlikely that you will benefit by putting it in a string.
-
- "?? was selected as the escape digraph because it is not used anywhere
- else in C (except as noted below); it suggests that something unusual
- is going on. The third character was chosen with an eye to graphical
- similarity to the character being represented.
-
- "The sequence ?? cannot currently occur anywhere in a legal C program
- except in strings, character constants, comments, or header names. The
- escape character sequence '\?' ...was introduced to allow two adjacent
- question-marks in such contexts to be represented as ?\?, a form
- distinct from the trigraph."
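
As a small sketch of that escape applied to the error message from the quoted article: writing the second question mark as \? keeps the text intact whether or not the translator replaces trigraphs. The names quark_msg and has_double_question_bang are invented for this example.

```c
#include <assert.h>
#include <string.h>

/* The message as intended; the escaped question mark prevents the
   two marks and the '!' from being read as a trigraph. */
static const char quark_msg[] = "zero quark-count?\?! contact support";

/* Returns 1 if msg contains two adjacent question marks followed by '!',
   0 otherwise. The needle is built from separate character constants so
   that this source file itself contains no trigraph. */
int has_double_question_bang(const char *msg)
{
    static const char needle[] = { '?', '?', '!', '\0' };

    assert(msg != NULL);
    return strstr(msg, needle) != NULL;
}
```

The escape \? denotes a plain question mark, so the string holds exactly the characters the author meant and the check on quark_msg succeeds.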
-
-
-
- > Reduced program legibility is also harmful.
-
- "The committee makes no claims that a program written using trigraphs
- looks attractive. As a matter of style, it may be wise to surround
- trigraphs with whitespace, so they stand out better in program text.
- Some users may wish to use preprocessing macros for some or all of the
- trigraph sequences."
-
-
- > Thus, because trigraphs are unnecessary, insufficiently useful,
- > and occasionally harmful, I judge that they were an invention of
- > questionable virtue.
-
- I believe I've shown that your first adjective is incorrect, and
- your second is questionable. As to harm, no code was broken; the only
- thing that changed was the appearance of some output - and you've
- yourself indicated that is of no great harm.
-
- After all, if some port of C is willing to use alphabetics that
- correspond to the ASCII codes for the 9 glyphs, they're free to do so.
- It's just that it's possible now, as it was not before, to write C code
- that is guaranteed to work on all implementations.
-
-
- _
- Kevin D. Quitt 96.37% of all statistics are made up. srhqla!quest!kdq
-