- Path: sparky!uunet!zaphod.mps.ohio-state.edu!cs.utexas.edu!sun-barr!lll-winken!framsparc.ocf.llnl.gov!booloo
- From: booloo@framsparc.ocf.llnl.gov (Mark Boolootian)
- Newsgroups: comp.os.ms-windows.programmer.win32
- Subject: Printable multibyte encodings
- Message-ID: <143664@lll-winken.LLNL.GOV>
- Date: 17 Dec 92 00:25:29 GMT
- Sender: usenet@lll-winken.LLNL.GOV
- Organization: Lawrence Livermore National Laboratory
- Lines: 51
- Nntp-Posting-Host: framsparc.ocf.llnl.gov
-
- Following is part of a thread from the ietf mailing list. I thought it
- might be of some interest to the readers of this group (then again, maybe
- not...)
-
- From: henry@zoo.toronto.edu
- Date: Wed, 16 Dec 92 18:54:29 EST
- To: ietf@isi.edu
- Subject: Re: printable multibyte encodings
-
- >However, people should note that Windows NT, which promises to be a
- >very widespread and influential operating system, uses fixed size 16
- >bit Unicode throughout, including file names... How is this
- >going to be handled in FTP, Telnet, etc.? I believe that the Internet
- >should start migrating from predominantly 8 bit byte US ASCII to fixed
- >size 16 bit Unicode in most of its protocols where character strings
- >occur.
-
- Unfortunately, this breaks *everything*, unless it's negotiated in some
- way, in which case we end up with two parallel sets of code which are
- identical except for the width of characters handled.
-
- However... if you convince the NT versions of telnet etc. to encode their
- Unicode characters using the UTF-2 encoding before placing them on the net,
- then:
-
- (1) So long as the NT crowd sticks to characters found in ASCII, *nothing*
- has to change -- the UTF-2 representation of ASCII characters is
- identical to ASCII.
-
- (2) If the other Internet software is willing to tolerate 8-bit octets
- in filenames etc. -- which will mean some adjustments to protocols,
- and probably some reprogramming (but not nearly as much) -- then
- everything works, without multiple versions of the code. UTF-2
- avoids octets that break things like C string functions. The
- only code that has to actually *know* that something funny is going
- on is code that has to work with text character-by-character, e.g.
- filename wildcard expanders.
-
- (3) For text that is mostly ASCII, only one octet has to be transmitted
- or stored per character.
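
The three points above can be checked with a small sketch. It assumes that
"UTF-2" here is the Plan 9 encoding later standardized as UTF-8, and it only
handles the 16-bit code points discussed in this thread. The function names
encode_utf2 and encode_text are hypothetical, chosen for illustration.

```python
def encode_utf2(code_point: int) -> bytes:
    """Encode one 16-bit code point as 1 to 3 octets.

    Assumption: the bit layout is the one later standardized as UTF-8.
    """
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError("this sketch only handles 16-bit code points")
    if code_point < 0x80:
        # ASCII range: a single octet, identical to the ASCII value.
        return bytes([code_point])
    if code_point < 0x800:
        # Two octets: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    # Three octets: 1110xxxx 10xxxxxx 10xxxxxx
    return bytes([0xE0 | (code_point >> 12),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


def encode_text(text: str) -> bytes:
    """Encode a whole string, one code point at a time."""
    return b"".join(encode_utf2(ord(ch)) for ch in text)


# (1) and (3): ASCII text is unchanged, one octet per character.
ascii_name = encode_text("readme.txt")

# (2): every octet of a non-ASCII character has its high bit set, so
# no NUL or '/' octet can appear inside a multi-octet sequence.
kanji_name = encode_text("\u65e5\u672c\u8a9e")
high_bit_only = all(octet >= 0x80 for octet in kanji_name)
```

Because the octets of a multi-octet character never collide with ASCII
values, code that only copies or compares whole strings needs no changes;
only code that counts or matches individual characters has to decode them.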
-
- I'm afraid that, in an interoperable world, Windows NT got this wrong and
- Bell Labs's Plan 9 got it right. Using UTF-2 makes the transition a whole
- lot less painful, and doesn't break backward compatibility nearly as badly.
-
- Henry Spencer at U of Toronto Zoology
- henry@zoo.toronto.edu utzoo!henry
-
- --
- Mark Boolootian booloo@llnl.gov +1 510 423 1948
- Disclaimer: My fingers type for me alone.
-