NetNews Usenet Archive 1992 #30

Internet Message Format | 1992-12-16 | 2.7 KB

Path: sparky!uunet!zaphod.mps.ohio-state.edu!cs.utexas.edu!sun-barr!lll-winken!framsparc.ocf.llnl.gov!booloo
From: booloo@framsparc.ocf.llnl.gov (Mark Boolootian)
Newsgroups: comp.os.ms-windows.programmer.win32
Subject: Printable multibyte encodings
Message-ID: <143664@lll-winken.LLNL.GOV>
Date: 17 Dec 92 00:25:29 GMT
Sender: usenet@lll-winken.LLNL.GOV
Organization: Lawrence Livermore National Laboratory
Lines: 51
Nntp-Posting-Host: framsparc.ocf.llnl.gov

Following is part of a thread from the ietf mailing list. I thought
it might be of some interest to the readers of this group (then
again, maybe not...)

From: henry@zoo.toronto.edu
Date: Wed, 16 Dec 92 18:54:29 EST
To: ietf@isi.edu
Subject: Re: printable multibyte encodings

>However, people should note that Windows NT, which promises to be a
>very widespread and influential operating system, uses fixed size 16
>bit Unicode throughout, including file names... How is this
>going to be handled in FTP, Telnet, etc.? I believe that the Internet
>should start migrating from predominantly 8 bit byte US ASCII to fixed
>size 16 bit Unicode in most of its protocols where character strings
>occur.

Unfortunately, this breaks *everything*, unless it's negotiated in
some way, in which case we end up with two parallel sets of code
which are identical except for the width of characters handled.

However... if you convince the NT versions of telnet etc. to encode
their Unicode characters using the UTF-2 encoding before placing them
on the net, then:

(1) So long as the NT crowd sticks to characters found in ASCII,
*nothing* has to change -- the UTF-2 representation of ASCII
characters is identical to ASCII.

(2) If the other Internet software is willing to tolerate 8-bit
octets in filenames etc. -- which will mean some adjustments to
protocols, and probably some reprogramming (but not nearly as much)
-- then everything works, without multiple versions of the code.
UTF-2 avoids octets that break things like C string functions.
The only code that has to actually *know* that something funny is
going on is code that has to work with text character-by-character,
e.g. filename wildcard expanders.

(3) For text that is mostly ASCII, only one octet has to be
transmitted or stored per character.

I'm afraid that, in an interoperable world, Windows NT got this wrong
and Bell Labs's Plan 9 got it right. Using UTF-2 makes the transition
a whole lot less painful, and doesn't break backward compatibility
nearly as badly.

Henry Spencer at U of Toronto Zoology
henry@zoo.toronto.edu   utzoo!henry

--
Mark Boolootian    booloo@llnl.gov    +1 510 423 1948
Disclaimer: My fingers type for me alone.
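To make Spencer's three points concrete, here is a minimal Python sketch of the encoding. It assumes that UTF-2 uses the byte layout later standardized as UTF-8 and handles only the 16-bit code values that Windows NT stores. The function name `utf2_encode` is hypothetical and was chosen for illustration. ASCII values pass through as single identical octets, which covers points (1) and (3). Every octet of a multi-octet sequence has its high bit set, so NUL and '/' never appear inside one, which covers point (2).

```python
def utf2_encode(units):
    """Encode 16-bit Unicode code values as UTF-2 octets.

    Assumption: UTF-2 uses the same bit layout as the encoding later
    standardized as UTF-8; only the 16-bit range is handled here.
    """
    out = bytearray()
    for u in units:
        if not 0 <= u <= 0xFFFF:
            raise ValueError(f"not a 16-bit code value: {u:#x}")
        if u < 0x80:
            # ASCII passes through as the identical single octet.
            out.append(u)
        elif u < 0x800:
            # Up to 11 bits: 110xxxxx 10xxxxxx
            out.append(0xC0 | (u >> 6))
            out.append(0x80 | (u & 0x3F))
        else:
            # Up to 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
            out.append(0xE0 | (u >> 12))
            out.append(0x80 | ((u >> 6) & 0x3F))
            out.append(0x80 | (u & 0x3F))
    return bytes(out)


# Points (1) and (3): ASCII text is unchanged, one octet per character.
assert utf2_encode(map(ord, "README.txt")) == b"README.txt"

# Point (2): octets of multi-octet sequences are all >= 0x80, so they
# can never be mistaken for NUL or '/'.
assert all(b >= 0x80 for b in utf2_encode([0xE9, 0x65E5]))

encoded = utf2_encode(map(ord, "caf\u00e9/\u65e5\u672c"))
print(encoded)  # b'caf\xc3\xa9/\xe6\x97\xa5\xe6\x9c\xac'
```

Code that only copies, stores, or compares whole strings can treat this output as ordinary 8-bit text. Only code that works character by character, such as the filename wildcard expanders Spencer mentions, needs to recognize where each multi-octet sequence begins and ends.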