NetNews Usenet Archive 1993 #1

Internet Message Format | 1993-01-07 | 2.5 KB

Path: sparky!uunet!munnari.oz.au!spool.mu.edu!sgiblab!nec-gw!nec-tyo!wnoc-tyo-news!cs.titech!titccy.cc.titech!necom830!mohta
From: mohta@necom830.cc.titech.ac.jp (Masataka Ohta)
Newsgroups: comp.std.internat
Subject: Dumb Terry
Keywords: Han Kanji Katakana Hirugana ISO10646 Unicode Codepages
Message-ID: <2639@titccy.cc.titech.ac.jp>
Date: 7 Jan 93 08:55:00 GMT
References: <1hu9v5INNbp1@rodan.UU.NET> <8490@charon.cwi.nl> <1hvu79INN4qf@rodan.UU.NET> <1993Jan7.063116.14846@fcom.cc.utah.edu>
Sender: news@titccy.cc.titech.ac.jp
Organization: Tokyo Institute of Technology
Lines: 45

In article <1993Jan7.063116.14846@fcom.cc.utah.edu> terry@cs.weber.edu (A Wizard of Earth C) writes:
>The collation sequences for Japanese, for instance, vary on pronunciation. If
>it were the intent of Unicode to provide a unified collation mechanism,
>this would be a very strong argument against Chinese/Japanese unification.

You completely miss the point. Japanese kanji do not have unique pronunciations, while Japanese words do.

>First, Unicode is not the sole definition of 10646; just the only currently
>defined character set within 10646. There is no reason to throw out 10646
>because of Unicode (although I could make an argument for 32 bits being a
>nifty reason for doing so).

10646 explicitly assigns corresponding C/J/K Han characters in the GB, JIS, and KSC national standards to the same code point. Thus, expanding 10646 to 32 bits can't separate them again. That is, 10646 is unusable, because it is polluted with Unicode.

>Second, Unicode buys more than simply another character set; it buys the
>ability to produce non-conflicting monolingual localizations of software
>systems (as opposed to conflicting ones as a result of a lack of standards
>coordination with existing standards). It also buys a platform for
>non-conflicting multinationalization (multilingual data processing) given
>a means of compounding documents by language/locale (there may be more than
>one locale per language).

For such a purpose, use EUC, which is a moderate subset of ISO 2022.

>The Unicode character set solves a number of long-standing problems.

Unicode is a genuine mixture of past failures:

1) 16-bitness, from the 8-bitness of ISO 8859/*
2) statefulness (by signature), from ISO 2022
3) combining characters, from ISO 6937
4) the coexistence of full-width and half-width characters, from JIS X0201/0208

and more, without solving any problems.

And now, you are importing one more failure: an incorrect dependence on the locale model.

Masataka Ohta
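Ohta's answer to Terry's collation point rests on the difference between a character and a word: the kanji 京 is read きょう in 京都 but けい in 京阪, while the word 京都 has the single reading きょうと. Japanese dictionary order therefore comes from word readings, not from character codes. A minimal Python sketch of that contrast; the READINGS table is a hypothetical, hand-written stand-in for a real dictionary, and sorting hiragana by code point is only a rough approximation of gojūon order.

```python
# Hypothetical readings table: each WORD has one reading, even though
# its kanji do not (京 is read きょう in 京都 but けい in 京阪).
READINGS = {
    "東京": "とうきょう",
    "京都": "きょうと",
    "大阪": "おおさか",
}


def by_code_point(words):
    """Order words by the Unicode code points of their kanji."""
    return sorted(words)


def by_reading(words):
    """Order words by their hiragana reading (a rough proxy for gojūon order)."""
    return sorted(words, key=READINGS.__getitem__)


words = list(READINGS)
print(by_code_point(words))  # ['京都', '大阪', '東京']
print(by_reading(words))     # ['大阪', '京都', '東京']
```

The two orders disagree, and the second cannot be derived from the characters alone: the reading has to come from word-level data.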
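Ohta's claim that 10646 maps corresponding Han characters from GB, JIS, and KSC to one code point can be checked with Python's standard codecs. This is a minimal sketch, assuming Python 3 and its gb2312, euc_jp, and euc_kr codecs as stand-ins for GB 2312, JIS X 0208, and KS C 5601; 中 is an arbitrary example character, and national_bytes and unified_code_points are hypothetical helper names.

```python
# The same Han character in three national standards, via their EUC forms.
NATIONAL_CODECS = {
    "GB 2312 (China)": "gb2312",
    "JIS X 0208 (Japan)": "euc_jp",
    "KS C 5601 (Korea)": "euc_kr",
}


def national_bytes(ch):
    """Encode ch in each national standard; byte values differ per standard."""
    return {std: ch.encode(codec) for std, codec in NATIONAL_CODECS.items()}


def unified_code_points(ch):
    """Decode each national encoding back to Unicode and report the code point."""
    return {
        std: f"U+{ord(raw.decode(NATIONAL_CODECS[std])):04X}"
        for std, raw in national_bytes(ch).items()
    }


for std, raw in national_bytes("中").items():
    print(f"{std:20} {raw.hex()}")
print(set(unified_code_points("中").values()))  # {'U+4E2D'}
```

The decoded code point carries no trace of which national standard it came from, which is why, as Ohta argues, a wider code space alone would not separate the unified characters again.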
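Ohta recommends EUC for multilingual processing. EUC fixes the designations of G0 through G3 in advance and reaches G2 and G3 only through the single shifts SS2 (0x8E) and SS3 (0x8F), so a decoder never has to track escape-sequence state the way an ISO-2022-JP decoder does after ESC $ B. The Python sketch below splits EUC-JP bytes into code-set units under that layout (ASCII in G0, JIS X 0208 in G1, JIS X 0201 half-width katakana in G2, JIS X 0212 in G3). It is an illustration only: split_euc_jp is a hypothetical helper, and it does not validate trail bytes.

```python
def split_euc_jp(data: bytes):
    """Split EUC-JP bytes into (code set, bytes) units.

    Each unit is identified by its own lead byte, so no shift state is
    carried from one character to the next. Trail bytes are not validated.
    """
    units, i = [], 0
    while i < len(data):
        b = data[i]
        if b < 0x80:        # G0: ASCII, one byte
            cs, n = "G0 ASCII", 1
        elif b == 0x8E:     # SS2 + one byte: G2, JIS X 0201 half-width katakana
            cs, n = "G2 JIS X 0201 kana", 2
        elif b == 0x8F:     # SS3 + two bytes: G3, JIS X 0212
            cs, n = "G3 JIS X 0212", 3
        else:               # two bytes in 0xA1-0xFE: G1, JIS X 0208
            cs, n = "G1 JIS X 0208", 2
        units.append((cs, data[i:i + n]))
        i += n
    return units


text = "A漢\uff76"  # U+FF76 is HALFWIDTH KATAKANA LETTER KA
for cs, raw in split_euc_jp(text.encode("euc_jp")):
    print(cs, raw.hex())
```

Because every unit is self-describing, a reader can start decoding at any character boundary without replaying earlier escape sequences.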
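Three of the features Ohta lists as inherited failures (items 2 to 4) can be observed with Python's standard codecs and unicodedata modules. In this sketch, decode_ucs2_with_signature is a hypothetical helper, falling back to big-endian when no signature is present is an assumption, and the normalization forms used (NFC, NFKC) were defined in later versions of Unicode than the one debated in this 1993 thread.

```python
import codecs
import unicodedata


def decode_ucs2_with_signature(data: bytes) -> str:
    """Item 2: the byte order of the whole stream depends on its initial signature."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    return data.decode("utf-16-be")  # assumption: big-endian when unsigned


# Item 2: identical bytes after the signature decode to different characters.
payload = b"\x4e\x2d"
be = decode_ucs2_with_signature(codecs.BOM_UTF16_BE + payload)
le = decode_ucs2_with_signature(codecs.BOM_UTF16_LE + payload)
print(f"U+{ord(be):04X}", f"U+{ord(le):04X}")  # U+4E2D U+2D4E

# Item 3: combining characters give one letter two encodings.
precomposed = "\u00e9"  # é as a single code point
combining = "e\u0301"   # e + COMBINING ACUTE ACCENT
print(precomposed == combining)                                # False
print(unicodedata.normalize("NFC", combining) == precomposed)  # True

# Item 4: half-width and full-width katakana KA are distinct code points.
half, full = "\uff76", "\u30ab"
print(half == full)                                 # False
print(unicodedata.normalize("NFKC", half) == full)  # True
```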