NetNews Usenet Archive 1993 #1


Text File | 1993-01-07 | 4.3 KB | 89 lines

Newsgroups: comp.std.internat
Path: sparky!uunet!haven.umd.edu!darwin.sura.net!gatech!usenet.ins.cwru.edu!agate!dog.ee.lbl.gov!hellgate.utah.edu!fcom.cc.utah.edu!cs.weber.edu!terry
From: terry@cs.weber.edu (A Wizard of Earth C)
Subject: Re: Dumb Americans (was INTERNATIONALIZATION: JAPAN, FAR EAST)
Message-ID: <1993Jan8.031949.6284@fcom.cc.utah.edu>
Keywords: Han Kanji Katakana Hiragana ISO10646 Unicode Codepages
Sender: news@fcom.cc.utah.edu
Organization: University of Utah Computer Center
References: <2615@titccy.cc.titech.ac.jp> <1993Jan5.090747.29232@fcom.cc.utah.edu> <id.EAHW.92A@ferranti.com> <1993Jan7.033153.12133@fcom.cc.utah.edu> <1ihfj7EINNhmj@uni-erlangen.de>
Date: Fri, 8 Jan 93 03:19:49 GMT
Lines: 76

In article <1ihfj7EINNhmj@uni-erlangen.de>, unrza3@cd4680fs.rrze.uni-erlangen.de (Markus Kuhn) writes:
|> terry@cs.weber.edu (A Wizard of Earth C) writes:
|>
|> >Consider that Runic encoding is antithetical in terms of single character
|> >changes for fixed record length files by virtue of its ability to either
|> >change record size (destroying the seek-offset record addressing) or by
|> >changing the amount of data representable in a field (destroying the
|> >ability to use fixed-length fields for input in the front end client).
|>
|> The C type Rune has been defined in the Thompson paper about Plan 9 Unicode
|> encoding as "unsigned short" = 16bit per character. The 1+ byte encoding is
|> called by all people UTF (there are different versions, Plan 9 uses UTF-2).
|> I believe you mixed up the meanings of UTF encoded and Runic encoding.
|>
|> It seems to be absolutely obvious that there are many applications,
|> where a fixed length runic encoding with 16 bit/character is useful.
|> That's also the reason, why UTF <-> Rune translation routines (which
|> are very easy to implement) have been included in Plan 9 libraries.
|>
|> There are also very good reasons to use UTF, especially, where
|> compatibility with ASCII is of benefit.
|>
|> Sorry, I don't understand your problem at all.
|>
|> Markus
|>
|> --
|> Markus Kuhn, Computer Science student -=-=- University of Erlangen, Germany
|> Internet: mskuhn@immd4.informatik.uni-erlangen.de | X.500 entry available
|> --- Wer, wie, was? Wieso, weshalb, warum? Wer nichts fragt bleibt dumm. ---

Ken Thompson, in the comments of utf-fss.c (anonymous FTP to metis.com),
describes the Plan 9 UTF mechanism:

 * Proposed FSS-UTF
 * ----------------
 *
 * The proposed UCS transformation format encodes UCS values in the range
 * [0,0x7fffffff] using multibyte characters of lengths 1, 2, 3, 4, 5,
 * and 6 bytes. For all encodings of more than one byte, the initial
 * byte determines the number of bytes used and the high-order bit in
 * each byte is set. Every byte that does not start 10xxxxxx is the
 * start of a UCS character sequence.
 *
 * An easy way to remember this transformation format is to note that the
 * number of high-order 1's in the first byte signifies the number of
 * bytes in the multibyte character:
 *
 * Bits  Hex Min   Hex Max   Byte Sequence in Binary
 *   7   00000000  0000007f  0vvvvvvv
 *  11   00000080  000007FF  110vvvvv 10vvvvvv
 *  16   00000800  0000FFFF  1110vvvv 10vvvvvv 10vvvvvv
 *  21   00010000  001FFFFF  11110vvv 10vvvvvv 10vvvvvv 10vvvvvv
 *  26   00200000  03FFFFFF  111110vv 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv
 *  31   04000000  7FFFFFFF  1111110v 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv
 *
 * The UCS value is just the concatenation of the v bits in the multibyte
 * encoding. When there are multiple ways to encode a value, for example
 * UCS 0, only the shortest encoding is legal.

My problem is not with ANSI-style fixed-length Runes, but with this style
of Runic encoding, where the length is *not* fixed, for the purposes of
compatibility with existing 7-bit US ASCII files (this makes 7-bit ASCII
automatically "encoded" at the expense of all other characters).
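
The record-size concern can be made concrete with a minimal C sketch of the
FSS-UTF table quoted above. This is an illustration only, not the code from
utf-fss.c: the names fss_utf_len, fss_utf_encode, and fss_utf_record_size are
hypothetical, and code points are held in an unsigned long rather than the
16-bit Rune type so that the full [0,0x7fffffff] range fits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper names; none of these come from utf-fss.c. */

/* Bytes FSS-UTF needs for one UCS value, or 0 if the value is out of range.
 * Always picking the first range that fits yields the shortest encoding. */
size_t fss_utf_len(unsigned long ucs)
{
    if (ucs <= 0x7FUL)       return 1;
    if (ucs <= 0x7FFUL)      return 2;
    if (ucs <= 0xFFFFUL)     return 3;
    if (ucs <= 0x1FFFFFUL)   return 4;
    if (ucs <= 0x3FFFFFFUL)  return 5;
    if (ucs <= 0x7FFFFFFFUL) return 6;
    return 0;
}

/* Encode one UCS value into out (room for at least 6 bytes).
 * Returns the number of bytes written, or 0 if the value is out of range. */
size_t fss_utf_encode(unsigned long ucs, unsigned char *out)
{
    /* Lead-byte prefixes from the table, indexed by sequence length. */
    static const unsigned char lead[7] =
        { 0x00, 0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC };
    size_t n = fss_utf_len(ucs);
    size_t i;

    assert(out != NULL);
    if (n == 0)
        return 0;

    /* Fill continuation bytes (10vvvvvv) from the end, 6 bits at a time. */
    for (i = n - 1; i > 0; i--) {
        out[i] = (unsigned char)(0x80 | (ucs & 0x3F));
        ucs >>= 6;
    }
    /* Whatever bits remain go into the lead byte. */
    out[0] = (unsigned char)(lead[n] | ucs);
    return n;
}

/* Encoded size in bytes of a record holding count UCS values.
 * Returns 0 if any value is out of range (or if count is 0). */
size_t fss_utf_record_size(const unsigned long *chars, size_t count)
{
    size_t total = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        size_t n = fss_utf_len(chars[i]);
        if (n == 0)
            return 0;
        total += n;
    }
    return total;
}
```

Under these assumptions, a four-character field holding "cafe" occupies 4
bytes, while the same field with the last character changed to U+00E9
occupies 5 bytes, because U+00E9 encodes as the two bytes C3 A9. A
single-character edit therefore changes the encoded length of the record,
which is the property objected to above for fixed-record-length files.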

					Terry Lambert
					terry@icarus.weber.edu
					terry_lambert@novell.com
---
Any opinions in this posting are my own and not those of
my present or previous employers.
-------------------------------------------------------------------------------
                                        "I have an 8 user poetic license" - me
 Get the 386bsd FAQ from agate.berkeley.edu:/pub/386BSD/386bsd-0.1/unofficial
-------------------------------------------------------------------------------