- Newsgroups: comp.std.internat
- Path: sparky!uunet!haven.umd.edu!darwin.sura.net!gatech!usenet.ins.cwru.edu!agate!dog.ee.lbl.gov!hellgate.utah.edu!fcom.cc.utah.edu!cs.weber.edu!terry
- From: terry@cs.weber.edu (A Wizard of Earth C)
- Subject: Re: Dumb Americans (was INTERNATIONALIZATION: JAPAN, FAR EAST)
- Message-ID: <1993Jan8.031949.6284@fcom.cc.utah.edu>
- Keywords: Han Kanji Katakana Hiragana ISO10646 Unicode Codepages
- Sender: news@fcom.cc.utah.edu
- Organization: University of Utah Computer Center
- References: <2615@titccy.cc.titech.ac.jp> <1993Jan5.090747.29232@fcom.cc.utah.edu> <id.EAHW.92A@ferranti.com> <1993Jan7.033153.12133@fcom.cc.utah.edu> <1ihfj7EINNhmj@uni-erlangen.de>
- Date: Fri, 8 Jan 93 03:19:49 GMT
- Lines: 76
-
- In article <1ihfj7EINNhmj@uni-erlangen.de>, unrza3@cd4680fs.rrze.uni-erlangen.de (Markus Kuhn) writes:
- |> terry@cs.weber.edu (A Wizard of Earth C) writes:
- |>
- |> >Consider that Runic encoding is antithetical in terms of single character
- |> >changes for fixed record length files by virtue of it's ability to either
- |> >change record size (destroying the seek-offset record addressing) or by
- |> >changing the amount of data representable in a field (destroying the
- |> >ability to use fixed-length fields for input in the front end client).
- |>
- |> The C type Rune has been defined in the Tompson paper about Plan 9 Unicode
- |> encoding as "unsigned short" = 16bit per character. The 1+ byte encoding is
- |> called by all people UTF (there are different versions, Plan 9 uses UTF-2).
- |> I believe you mixed up the meanings of UTF encoded and Runic encoding.
- |>
- |> It seems to be absolutely obvious that there are many applications,
- |> where a fixed length runic encoding with 16 bit/character is useful.
- |> That's also the reason, why UTF <-> Rune translation routines (which
- |> are very easy to implement) have been included in Plan 9 libraries.
- |>
- |> There are also very good reasons to use UTF, especially, where
- |> compatibility with ASCII is of benefit.
- |>
- |> Sorry, I don't understand your problem at all.
- |>
- |> Markus
- |>
- |> --
- |> Markus Kuhn, Computer Science student -=-=- University of Erlangen, Germany
- |> Internet: mskuhn@immd4.informatik.uni-erlangen.de | X.500 entry available
- |> --- Wer, wie, was? Wieso, weshalb, warum? Wer nichts fragt bleibt dumm. ---
-
- Ken Thompson, in the comments to utf-fss.c (available by anonymous FTP from
- metis.com), describes the Plan 9 UTF mechanism:
-
- * Proposed FSS-UTF
- * ----------------
- *
- * The proposed UCS transformation format encodes UCS values in the range
- * [0,0x7fffffff] using multibyte characters of lengths 1, 2, 3, 4, 5,
- * and 6 bytes. For all encodings of more than one byte, the initial
- * byte determines the number of bytes used and the high-order bit in
- * each byte is set. Every byte that does not start 10xxxxxx is the
- * start of a UCS character sequence.
- *
- * An easy way to remember this transformation format is to note that the
- * number of high-order 1's in the first byte signifies the number of
- * bytes in the multibyte character:
- *
- *   Bits  Hex Min   Hex Max   Byte Sequence in Binary
- *     7   00000000  0000007F  0vvvvvvv
- *    11   00000080  000007FF  110vvvvv 10vvvvvv
- *    16   00000800  0000FFFF  1110vvvv 10vvvvvv 10vvvvvv
- *    21   00010000  001FFFFF  11110vvv 10vvvvvv 10vvvvvv 10vvvvvv
- *    26   00200000  03FFFFFF  111110vv 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv
- *    31   04000000  7FFFFFFF  1111110v 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv 10vvvvvv
- *
- * The UCS value is just the concatenation of the v bits in the multibyte
- * encoding. When there are multiple ways to encode a value, for example
- * UCS 0, only the shortest encoding is legal.
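To make the table concrete, here is a minimal sketch of an encoder that follows
it row by row. It is not Thompson's code from utf-fss.c; the function name
ucs_to_fss_utf and its calling convention are made up for illustration, and it
assumes the caller supplies a buffer of at least 6 bytes.

```c
#include <stddef.h>

/* Hypothetical helper (not from utf-fss.c): encode one UCS value in
 * [0, 0x7fffffff] into buf, which must hold at least 6 bytes.
 * Returns the number of bytes written, or 0 if ucs is out of range. */
size_t ucs_to_fss_utf(unsigned long ucs, unsigned char *buf)
{
    /* "Hex Max" column of the table, one entry per encoded length. */
    static const unsigned long max_for_len[6] = {
        0x7fUL, 0x7ffUL, 0xffffUL, 0x1fffffUL, 0x3ffffffUL, 0x7fffffffUL
    };
    /* Lead-byte prefixes: 0, 110, 1110, 11110, 111110, 1111110. */
    static const unsigned char lead_prefix[6] = {
        0x00, 0xc0, 0xe0, 0xf0, 0xf8, 0xfc
    };
    size_t len, i;

    /* Pick the shortest length whose range covers the value. */
    for (len = 1; len <= 6; len++)
        if (ucs <= max_for_len[len - 1])
            break;
    if (len > 6)
        return 0;

    /* Fill continuation bytes from the end: 10vvvvvv, six bits each. */
    for (i = len - 1; i > 0; i--) {
        buf[i] = (unsigned char)(0x80 | (ucs & 0x3f));
        ucs >>= 6;
    }
    /* The remaining high-order bits go into the lead byte. */
    buf[0] = (unsigned char)(lead_prefix[len - 1] | ucs);
    return len;
}
```

Because the loop always stops at the first row that fits, the sketch only ever
produces the shortest encoding, which is the one the comment declares legal.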
-
- My problem is not with ANSI-style fixed-length Runes, but with this style of
- Runic encoding, where the length is *not* fixed, for the sake of
- compatibility with existing 7-bit US ASCII files (this makes 7-bit ASCII
- automatically "encoded", at the expense of all other characters, which take
- two or more bytes each).
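The fixed-record concern can be illustrated with a rough sketch that compares
the byte size of a record under the two schemes. The helper names
(fss_utf_len, fss_utf_record_bytes, rune_record_bytes) are hypothetical, and
the 2-bytes-per-Rune figure assumes the 16-bit Rune described in the quoted
text.

```c
#include <stddef.h>

/* Bytes FSS-UTF needs for one UCS value, per the table in the comment
 * above; returns 0 if the value is outside [0, 0x7fffffff]. */
size_t fss_utf_len(unsigned long ucs)
{
    if (ucs <= 0x7fUL)       return 1;
    if (ucs <= 0x7ffUL)      return 2;
    if (ucs <= 0xffffUL)     return 3;
    if (ucs <= 0x1fffffUL)   return 4;
    if (ucs <= 0x3ffffffUL)  return 5;
    if (ucs <= 0x7fffffffUL) return 6;
    return 0;
}

/* Encoded size of a record of n UCS values under FSS-UTF. The result
 * depends on the contents, not just on n. Out-of-range values count
 * as 0 bytes in this sketch. */
size_t fss_utf_record_bytes(const unsigned long *rec, size_t n)
{
    size_t i, total = 0;

    for (i = 0; i < n; i++)
        total += fss_utf_len(rec[i]);
    return total;
}

/* Size of the same record as fixed 16-bit Runes: it depends only on the
 * character count, so record k always starts at k * rune_record_bytes(n). */
size_t rune_record_bytes(size_t n)
{
    return n * 2;
}
```

For a six-character field, changing the single character 'u' (0x75) to
u-umlaut (0xFC) leaves the Rune record at 12 bytes but grows the FSS-UTF
record from 6 bytes to 7, which is the seek-offset problem in miniature.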
-
-
- Terry Lambert
- terry@icarus.weber.edu
- terry_lambert@novell.com
- ---
- Any opinions in this posting are my own and not those of my present
- or previous employers.
- -------------------------------------------------------------------------------
- "I have an 8 user poetic license" - me
- Get the 386bsd FAQ from agate.berkeley.edu:/pub/386BSD/386bsd-0.1/unofficial
- -------------------------------------------------------------------------------
-