- Xref: sparky comp.std.internat:813 news.admin.misc:702
- Path: sparky!uunet!usc!sdd.hp.com!cs.utexas.edu!uwm.edu!linac!att!att!allegra!alice!andrew
- From: andrew@alice.att.com (Andrew Hume)
- Newsgroups: comp.std.internat,news.admin.misc
- Subject: Re: 16-bit news
- Summary: plan 9 utf details
- Message-ID: <24388@alice.att.com>
- Date: 11 Dec 92 21:47:14 GMT
- Article-I.D.: alice.24388
- References: <ByuIr9.8o8@ra.nrl.navy.mil> <ByuwAH.ICt@mudos.ann-arbor.mi.us> <1992Dec9.202955.4809@HQ.Ileaf.COM>
- Organization: AT&T Bell Laboratories, Murray Hill NJ
- Lines: 33
-
- In article <1992Dec9.202955.4809@HQ.Ileaf.COM>, walters@HQ.Ileaf.COM (Tim Walters) writes:
- > For those of us not attending Usenix, would it be possible to get a
- > brief description of the difference between fss-utf and utf? Or is
- > there an ftp-able description somewhere?
- > --
- > Tim Walters, Interleaf uunet!leafusa!walters, walters@HQ.Ileaf.com
-
-
- I have made available a package of the plan 9 utf related
- manual pages on netlib. there is an ascii and postscript version
- (utfman.asc and utfman.ps) in the directory research/memo.
- that's all you need to know for netlib; for ftp, log in
- to research.att.com as netlib.
-
- for a quick idea, i quote the man page:
-
- Letting numbers be binary, a rune x is converted to a multi-
- byte UTF sequence as follows:
-
- 01. x in [00000000.0bbbbbbb] -> 0bbbbbbb
- 10. x in [00000bbb.bbbbbbbb] -> 110bbbbb, 10bbbbbb
- 11. x in [bbbbbbbb.bbbbbbbb] -> 1110bbbb, 10bbbbbb, 10bbbbbb
-
- Conversion 01 provides a one-byte sequence that spans the
- ASCII character set in a compatible way. Conversions 10 and
- 11 represent higher-valued characters as sequences of two or
- three bytes with the high bit set. Plan 9 does not support
- the 4, 5, and 6 byte sequences proposed by X-Open. When
- there are multiple ways to encode a value, for example rune
- 0, the shortest encoding is used.
-
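As a rough illustration of the three conversions quoted above, here is a minimal Python sketch, not the Plan 9 code. The function name rune_to_utf is made up for this example, and the range check assumes a rune is a 16-bit value, as the bit patterns in the table suggest. Taking the first range that fits also gives the shortest encoding, which is what the man page asks for.

```python
def rune_to_utf(x: int) -> bytes:
    """Encode one 16-bit rune using conversions 01, 10 and 11.

    Hypothetical helper for illustration; the name and the ValueError
    are assumptions, not part of the quoted man page.
    """
    if not 0 <= x <= 0xFFFF:
        raise ValueError("rune does not fit in 16 bits")
    if x <= 0x7F:
        # conversion 01: 0bbbbbbb (plain ASCII, one byte)
        return bytes([x])
    if x <= 0x7FF:
        # conversion 10: 110bbbbb, 10bbbbbb
        return bytes([0xC0 | (x >> 6), 0x80 | (x & 0x3F)])
    # conversion 11: 1110bbbb, 10bbbbbb, 10bbbbbb
    return bytes([
        0xE0 | (x >> 12),
        0x80 | ((x >> 6) & 0x3F),
        0x80 | (x & 0x3F),
    ])
```

For example, rune_to_utf(0x41) gives the single byte b"A", while rune_to_utf(0x20AC) gives the three bytes E2 82 AC. Rune 0 comes out as one zero byte, the shortest of its possible encodings.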
-
- andrew
-