NetNews Usenet Archive 1993 #1

Internet Message Format | 1993-01-09 | 2.6 KB

Path: sparky!uunet!cs.utexas.edu!usc!sdd.hp.com!think.com!enterpoop.mit.edu!mintaka.lcs.mit.edu!ai-lab!wheat-chex!glenn
From: glenn@wheat-chex.ai.mit.edu (Glenn A. Adams)
Newsgroups: comp.std.internat
Subject: Re: Dumb Americans (was INTERNATIONALIZATION: JAPAN, FAR EAST)
Date: 9 Jan 1993 17:35:04 GMT
Organization: MIT Artificial Intelligence Laboratory
Lines: 42
Message-ID: <1in2c8INNmbj@life.ai.mit.edu>
References: <1993Jan7.033153.12133@fcom.cc.utah.edu> <1993Jan8.092754.6344@prl.dec.com> <1993Jan9.024546.26934@fcom.cc.utah.edu>
NNTP-Posting-Host: wheat-chex.ai.mit.edu
Keywords: Unicode ISO10646 CharacterEncoding

In article <1993Jan9.024546.26934@fcom.cc.utah.edu> you write:
>[ First a clarification of something which is my fault because of my
> background in comm software: I have been informed that the currently
> "blessed" correct terminology for what I have been calling "Runic
> encoding" is "Process code", "File code", or "Interchange code". I'll
> try to call it "Interchange code" from now on (I feel the other terms
> imply applications, some of which I disagree with). ]

I should have been more clear. A "process code" is a fixed-width
encoding suitable for internal processing, e.g., ASCII, Unicode,
10646 UCS2, 10646 UCS4, and EUC wide char; a "file code" or
"interchange code" is a potentially variable-length encoding suitable
for file storage (in non-memory-mapped environments) or interchange,
e.g., UTF1 and UTF2 (FSS-UTF), Shift JIS, and EUC Multibyte.

[My objection to your use of the word "rune" was (1) you weren't clear
about which of these encodings you were referring to, and (2) I hate
cute terminology which is opaque when perfectly transparent terminology
already exists.]

One should not in general use an interchange code (UTF1 or UTF2) for
processing.
While one may use a process code for interchange, some communication
channels may have difficulties with data transparency (e.g., Unicode
and 10646 UCS[24] allow NULL bytes and ISO2022 C0/C1 control code bytes
in any byte position of their "process codes").

I can't imagine why anyone in their right mind would want to use
UTF[12] or any other ostensible interchange code for processing, given
the problems of variable-length encodings. However, that doesn't mean
that unaware applications can't effectively use an interchange code
internally, e.g., 8-bit clean applications which interpret only the
ASCII (ISO646) characters could use UTF2 (FSS-UTF) without difficulty.

But if one is to create an aware application which uses more than the
ASCII subset, or if it is to memory map files, then use of a
fixed-width process code (even for backing store) becomes much more
sensible.

Glenn Adams
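
The distinction drawn in the post can be sketched roughly as follows.
This is only an illustration, not anything from the original thread: it
assumes Python's "utf-16-be" codec as a stand-in for 10646 UCS2 and its
"utf-8" codec as a stand-in for UTF2 (FSS-UTF), and the sample string
and the helper names nth_char_fixed and nth_char_variable are
hypothetical.

```python
# Hypothetical sample: "cafe" with an acute e, a slash, then three kanji.
text = "caf\u00e9/\u65e5\u672c\u8a9e"

# Stand-ins (assumptions): "utf-16-be" for UCS2, "utf-8" for UTF2 (FSS-UTF).
process_code = text.encode("utf-16-be")   # fixed width: 2 bytes per character here
interchange_code = text.encode("utf-8")   # variable width: 1 to 3 bytes per character here


def nth_char_fixed(buf: bytes, n: int) -> str:
    """Fixed width: the n-th character sits at byte offset 2 * n."""
    return buf[2 * n:2 * n + 2].decode("utf-16-be")


def nth_char_variable(buf: bytes, n: int) -> str:
    """Variable width: scan from the start to find where characters begin."""
    # Bytes of the form 10xxxxxx continue a character; all others start one.
    starts = [i for i, b in enumerate(buf) if b & 0xC0 != 0x80]
    end = starts[n + 1] if n + 1 < len(starts) else len(buf)
    return buf[starts[n]:end].decode("utf-8")


# Data transparency: the fixed-width form embeds NUL bytes for ASCII characters,
# while the interchange form does not.
has_nul_fixed = b"\x00" in process_code
has_nul_variable = b"\x00" in interchange_code

# An ASCII-only, 8-bit clean routine can split the interchange form on "/"
# without understanding the non-ASCII characters around it.
parts = interchange_code.split(b"/")
```

Under these assumptions, the fixed-width lookup is a single offset
computation, whereas the variable-width lookup has to walk the buffer,
which is the processing cost the post warns about. The split on "/"
shows the "unaware application" case: it works on the interchange form
because only the ASCII bytes are interpreted.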