NetNews Usenet Archive 1993 #1

Internet Message Format | 1993-01-07 | 5.7 KB

Path: sparky!uunet!not-for-mail
From: avg@rodan.UU.NET (Vadim Antonov)
Newsgroups: comp.std.internat
Subject: Re: INTERNATIONALIZATION: JAPAN, FAR EAST
Date: 7 Jan 1993 15:21:49 -0500
Organization: UUNET Technologies Inc, Falls Church, VA
Lines: 112
Message-ID: <1ii3ctINNp4c@rodan.UU.NET>
References: <8490@charon.cwi.nl> <1hvu79INN4qf@rodan.UU.NET> <1993Jan7.063116.14846@fcom.cc.utah.edu>
NNTP-Posting-Host: rodan.uu.net
Keywords: Han Kanji Katakana Hirugana ISO10646 Unicode Codepages

In article <1993Jan7.063116.14846@fcom.cc.utah.edu> terry@cs.weber.edu (A Wizard of Earth C) writes:
>Multiple potential sort orders give lie to this argument. At least several
>languages have been mentioned (ie: German) where there are multiple possible
>sort orders.

All languages have a default sorting, namely the one used in dictionaries. The existence of multiple sorting rules does not nullify the need for a default sorting. A default is sufficient for most cases and, more importantly, it can be done in an environment that lacks the locale required for specialized sortings. I bet 99% of users will not care WHICH sorting is used as long as they are used to it. Others can have their own personal locale.

>Let's call a spade a spade -- collation sequences. The
>collation sequences for Japanese, for instance, vary on pronunciation. If
>it were the intent of Unicode to provide a unified collation mechanism,
>this would be a very strong argument against Chinese/Japanese unification.

I've heard the Japanese don't like that unification.

>Luckily, this was not a goal.

"This car doesn't move, but luckily it was not a goal."
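The idea that specialized orderings can sit on top of one default ordering can be sketched briefly in Python. The post returns to this later, arguing that local rules reduce to the default algorithm plus a trivial transliteration. The sketch uses German, the language cited in the quote above. Dictionary order files "ä" with "a", while phone-book order spells it "ae". The folding tables and the names default_key and phonebook_key are hypothetical simplifications, not a real locale definition.

```python
# Hypothetical "default" collation key: lowercase, then fold German umlauts to
# their base letters (dictionary style: "ä" sorts together with "a").
# This is an illustrative table, not a real locale definition.
DICTIONARY_FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})

# Hypothetical phone-book transliteration: "ä" is spelled out as "ae", etc.
PHONEBOOK_TRANSLIT = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue"})


def default_key(word):
    """Locale-free default key: case-folded, umlauts reduced to base letters."""
    return word.lower().translate(DICTIONARY_FOLD)


def phonebook_key(word):
    """Local variant: transliterate first, then reuse the unchanged default key."""
    return default_key(word.lower().translate(PHONEBOOK_TRANSLIT))


words = ["Mulde", "Müller", "Muff"]
print(sorted(words, key=default_key))    # ['Muff', 'Mulde', 'Müller']
print(sorted(words, key=phonebook_key))  # ['Müller', 'Muff', 'Mulde']
```

The phone-book rule adds only one transliteration step in front of the default key. Real collation standards also need tie-breaking rules (so that, say, "Müller" and "Mueller" get a fixed relative order), which this sketch omits.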
>>Sure, the information can (*must* if you're
>>going to do trivial things like sorting or case-insensitive comparisons)
>>be preserved off-text (in mail headers or in file attributes, for
>>example) but it effectively defeats the very purpose of ISO10646 --
>>why on Earth do I need to spend bits on encoding glyphs if
>>I already know the language and 8 (or 16 for oriental languages) bits
>>is quite enough to map the alphabet? Don't you see this gap in
>>the logic nullifying all benefits of 10646?
>
>I don't see how this nullifies the benefits of Unicode (which you seem to
>be using 10646 as a synonym for, given that this is the only codified
>portion).

It's not my fault that you don't see it. Suppose there are two codes, one with N bits per character and the other with M bits per character (N > M). Suppose also that an external constraint (aka the locale) restricts the applicable characters to a set representable in both codes. Then the M-bit code is more memory-efficient. It is no more complex than 2+2 if someone cares to think a bit.

>First, Unicode is not the sole definition of 10646; just the only currently
>defined character set within 10646. There is no reason to throw out 10646
>because of Unicode (although I could make an argument for 32 bits being a
>nifty reason for doing so).

They share a common design philosophy and the same fundamental mistakes.

>Second, Unicode buys more than simply another character set; it buys the
>ability to produce non-conflicting monolingual localizations of software
>systems (as opposed to conflicting ones as a result of a lack of standards
>coordination with existing standards).

Who told you that? Sure, ISO happened to introduce so many standards that they caused complete havoc in people's minds. So we don't need one more sloppy standard.

>It also buys a platform for
>non-conflicting multinationalization (multilingual data processing) given
>a means of compounding documents by language/locale (there may be more than
>one locale per language).
>Admittedly, this is not as elegant as a unified
>glyph set for all languages, but it does charge the penalty to the
>multilingual (minority) rather than the localized-monolingual (majority)
>user.

10646 is inadequate for true multinationalization because it breaks existing OS semantics. I doubt there will be many people eager to redesign everything from scratch for the sake of a few truly multilingual applications.

>By virtue of the "multiple collating sequences within a single language"
>argument, the same holds true of your solution -- worse, there are
>exception cases in your solution, while there is a potential uniform
>implementation on top of Unicode.

Huh? Where did you see exceptions that aren't present in Unicode as well? Quite the opposite: most variations of local sorting rules can be reduced to the default algorithm with a trivial transliteration.

>The fact is, a multilingual word processor will have to present its menus
>to the user, probably in his native language, by means of a locale
>mechanism. If a "vi" style implementation is used (no explicit commands if
>you ignore ":set" and ":map" and all OS escapes), there is still the
>requirement of localization of error messages and keyboard input. There
>is no divorcing the language from the application entirely, if the
>application is one which operates on text as data.

First, nobody said otherwise. Second, screen editors aren't applications but pieces of SYSTEM software interacting with specific hardware. The fact that all the termcap/terminfo stuff lives at user level is nothing more than an old Unix kludge.

>"The simplest explanation which fits the facts is the correct one"
>                -- William of Occam
>

If it fits. If it doesn't, it becomes a religion. Any religion breeds fanatics who mindlessly follow authorities. Those authorities assume their power by means of hierarchical institutions, ritual phrases and the assertion of the superior wisdom of collective bodies.
Their speeches are full of sacral words and references to obscure documents known only to those who "belong". They praise themselves for being able to make those collective bodies produce meaningless "words of wisdom" after endless debates on insignificant details. Isn't this picture familiar?

--vadim
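The post's N-bit versus M-bit storage argument can be made concrete with a short Python sketch. Several assumptions here do not come from the post. KOI8-R stands in for an 8-bit locale character set. UTF-16 and UTF-32, without byte-order marks, stand in for 16- and 32-bit universal codes. The helper name storage_bytes is hypothetical. When the locale already limits text to one alphabet, the 8-bit encoding needs half the space of the 16-bit one and a quarter of the 32-bit one.

```python
# Assumed stand-ins (not from the post): koi8_r = 8-bit locale charset,
# utf_16_le / utf_32_le = 16- and 32-bit universal codes (no byte-order mark).
ENCODINGS = ("koi8_r", "utf_16_le", "utf_32_le")


def storage_bytes(text):
    """Return the encoded size in bytes of `text` under each assumed encoding."""
    return {enc: len(text.encode(enc)) for enc in ENCODINGS}


sample = "привет мир"  # 10 characters, all within the KOI8-R repertoire
for enc, size in storage_bytes(sample).items():
    print(f"{enc:10} {size:3} bytes for {len(sample)} characters")
# koi8_r: 10 bytes, utf_16_le: 20 bytes, utf_32_le: 40 bytes
```

The saving holds only while the text stays within one locale's repertoire. A character outside KOI8-R makes `text.encode("koi8_r")` raise `UnicodeEncodeError`. That is exactly the external constraint the argument relies on.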