NetNews Usenet Archive 1992 #30


Internet Message Format | 1992-12-21 | 5.3 KB

Path: sparky!uunet!spool.mu.edu!sdd.hp.com!cs.utexas.edu!usc!srhqla!quest!kdq
From: kdq@quest.UUCP (Kevin D. Quitt)
Newsgroups: comp.lang.c
Subject: Re: Questions about token merging and trigraphs
Message-ID: <XZo2VB1w165w@quest.UUCP>
Date: Sat, 19 Dec 92 11:51:08 PST
References: <1TA04FH@cdis-1.compu.com>
Reply-To: srhqla!quest!kdq
Organization: Job quest (805) 251-8210, So Cal: (800) 400-8210
Lines: 118

tanner@cdis-1.compu.com (Dr. T. Andrews) writes:

> Underneath a bogus ``reply-to'' header, kdq@quest.UUCP writes..

My reply-to line goes out of this site as "srhqla!quest!kdq"; if another
site is munging that, there's little I can do.

> First, necessity: a trivial proof exists that there is no machine
> on which the characters can not be represented in some other way.

The necessity is to provide a *single* representation that is accepted
on *all* machines. Certainly, one could map the APL character set onto
ASCII, and program that way; the fact that something can be done does
not make it useful or meaningful.

> Second, utility: the most commonly-cited character set for which
> trigraphs are ``justified'', EBCDIC, certainly has the characters
> in question. Disagreement on print-train convention may mean
> that some people see different characters in place of their
> expected, on some of their printers. A similar problem exists
> for people whose terminals replace certain ASCII characters
> according to local preference.

From the rationale:

"C derived its repertoire from the ASCII codeset. Unfortunately the
ASCII repertoire is not a subset of all other commonly used character
sets, and widespread practice in Europe is not to implement all of
ASCII either, but use some parts of its collating sequence for special
national characters.

"The solution is an internationally agreed-upon repertoire, in terms of
which an international representation of C can be defined. The ISO has
defined such a standard: ISO 646 describes an _invariant subset_ of
ASCII.

"The characters in the ASCII repertoire used by C and absent from the
ISO 646 repertoire are:

    # [ ] { } \ | ~ ^

<<hash mark, left and right bracket, left and right brace, back-slash,
vertical bar, tilde and caret>>.

"...The obvious idea of defining two-character escape sequences fails
because C uses all the characters which _are_ in the ISO 646
repertoire: no single escape character is available. The best that can
be done is to use a _trigraph_ - an escape digraph followed by a
distinguishing character."

> This means that, without trigraphs, some people would have to
> accept what their printers give; others would have to configure
> terminals to show the ASCII characters when writing C. It is not
> clear that trigraphs are preferable.

This means that some people would have alphabetic characters in place
of those glyphs. How'd you like to have code that looked like:

    for L i=0; i<max; i++ M J aNiO = i; K

> Had the utility been clear, there would have been a large body of
> prior art from which to draw. The lack of such a large body
> suggests that the problem is insufficiently bothersome to justify
> the development of the art.

Either that, or those other systems had to put up with the ugliness
above, or they couldn't run C, or programs written on them couldn't be
ported to other unlike systems.

> Third, harm: it is by no means unlikely that ``can't happen''
> error messages may include text such as ``zero quark-count??!
> contact support''. The harm of having the error message mangled
> is small, but the benefit of this replacement in a string appears
> to be zero. If you have to use a trigraph for the character, it
> seems unlikely that you will benefit by putting it in a string.

"?? was selected as the escape digraph because it is not used anywhere
else in C (except as noted below); it suggests that something unusual
is going on. The third character was chosen with an eye to graphical
similarity to the character being represented.

"The sequence ??
cannot currently occur anywhere in a legal C program except in strings,
character constants, comments, or header names. The escape character
sequence '\?' ...was introduced to allow two adjacent question-marks in
such contexts to be represented as ?\?, a form distinct from the
trigraph."

> Reduced program legibility is also harmful.

"The committee makes no claims that a program written using trigraphs
looks attractive. As a matter of style, it may be wise to surround
trigraphs with whitespace, so they stand out better in program text.
Some users may wish to use preprocessing macros for some or all of the
trigraph sequences."

> Thus, because trigraphs are unnecessary, insufficiently useful,
> and occasionally harmful, I judge that they were an invention of
> questionable virtue.

I believe I've shown that your first adjective is incorrect, and your
second is questionable. As to harm, no code was broken; the only thing
that changed was the appearance of some output - and you've yourself
indicated that this is of no great harm.

After all, if some port of C is willing to use alphabetics that
correspond to the ASCII codes for the 9 glyphs, they're free to do so.
It's just that it's possible now, as it was not before, to write C code
that is guaranteed to work on all implementations.

_
Kevin D. Quitt    96.37% of all statistics are made up.
srhqla!quest!kdq