- Path: sparky!uunet!not-for-mail
- From: avg@rodan.UU.NET (Vadim Antonov)
- Newsgroups: comp.std.internat
- Subject: Re: INTERNATIONALIZATION: JAPAN, FAR EAST
- Date: 7 Jan 1993 15:21:49 -0500
- Organization: UUNET Technologies Inc, Falls Church, VA
- Lines: 112
- Message-ID: <1ii3ctINNp4c@rodan.UU.NET>
- References: <8490@charon.cwi.nl> <1hvu79INN4qf@rodan.UU.NET> <1993Jan7.063116.14846@fcom.cc.utah.edu>
- NNTP-Posting-Host: rodan.uu.net
- Keywords: Han Kanji Katakana Hirugana ISO10646 Unicode Codepages
-
- In article <1993Jan7.063116.14846@fcom.cc.utah.edu> terry@cs.weber.edu (A Wizard of Earth C) writes:
- >Multiple potential sort orders give lie to this argument. At least several
- >languages have been mentioned (ie: German) where there are multiple possible
- >sort orders.
-
- All languages have a default sorting -- namely the one which is used in
- dictionaries. The existence of multiple sorting rules does not nullify
- the necessity of a default sorting which is sufficient for most cases
- and --more important-- can be done in an environment lacking the
- locale required for specialized sortings.
-
- I bet 99% of users will not care WHICH sorting is used as long as they
- are used to it. Others can have their own personal locale.
-
- >Let's call a spade a spade -- collation sequences. The
- >collation sequences for Japanese, for instance, vary on pronunciation. If
- >it were the intent of Unicode to provide a unified collation mechanism,
- >this would be a very strong argument against Chinese/Japanese unification.
-
- I've heard the Japanese don't like that unification.
-
- >Luckily, this was not a goal.
-
- "This car doesn't move, but luckily it was not a goal."
-
- >>Sure, the information can (*must* if you're
- >>going to do trivial things like sorting or case-insensitive comparisons)
- >>be preserved off-text (in mail headers or in file attributes, for
- >>example) but it effectively defeats the very purpose of ISO10646 --
- >>why on the Earth do i need to spare bits for encoding glyphs if
- >>i already know the language and 8 (or 16 for oriental languages) bits
- >>is quite enough to map the alphabet. Don't you see this gap in
- >>the logic nullifying all benefits of 10646?
- >
- >I don't see how this nullifies the benefits of Unicode (which you seem to
- >be using 10646 as a synonym for, given that this is the only codified
- >portion).
-
- It's not my fault that you don't see it: if there are two codes,
- one with N bits and the other with M bits per character (N > M), and
- there is an external constraint (aka locale) restricting the applicable
- characters to a set that both codes can represent, then the M-bit code
- is more memory-efficient.
- It is no more complex than 2+2 if someone cares to think a bit.
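-
- A minimal sketch of the size argument, assuming KOI8-R as the M = 8 bit
- code and UTF-16 as the N = 16 bit code. Neither encoding is named in the
- post; they are stand-ins chosen for illustration, and the sample string
- is arbitrary.

```python
# Illustrative only: KOI8-R and UTF-16 stand in for the "M-bit" and "N-bit"
# codes; neither encoding is named in the post.
text = "Привет, мир"  # text known in advance to be Russian (the "locale")

narrow = text.encode("koi8_r")     # 8 bits per character
wide = text.encode("utf-16-be")    # 16 bits per character, no byte-order mark

# Character count, then the byte count under each code.
print(len(text), len(narrow), len(wide))

# Both byte strings decode back to the same text, so once the locale is
# known the wider code stores nothing the narrower one does not.
assert narrow.decode("koi8_r") == wide.decode("utf-16-be") == text
```

- The wider code only pays off when the locale is not known in advance,
- which is exactly the case the off-text language tag removes.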
-
- >First, Unicode is not the sole definition of 10646; just the only currently
- >defined character set within 10646. There is no reason to throw out 10646
- >because of Unicode (although I could make an argument for 32 bits being a
- >nifty reason for doing so).
-
- They share the common design philosophy and the same fundamental mistakes.
-
- >Second, Unicode buys more than simply another character set; it buys the
- >ability to produce non-conflicting monolingual localizations of software
- >systems (as opposed to conflicting ones as a result of a lack of standards
- >coordination with existing standards).
-
- Who told you that? Sure, ISO happened to introduce so many standards
- that it caused complete havoc in people's minds. That is exactly why we
- don't need one more sloppy standard.
-
-
- >It also buys a platform for
- >non-conflicting multinationalization (multilingual data processing) given
- >a means of compounding documents by language/locale (there may be more than
- >one locale per language). Admittedly, this is not as elegant as a unified
- >glyph set for all languages, but it does charge the penalty to the
- >multilingual (minority) rather than the localized-monolingual (majority)
- >user.
-
- 10646 is inadequate for true multinationalization because it breaks
- existing OS semantics, and I strongly doubt there will be many people
- eager to redesign everything from scratch for the sake of a few truly
- multilingual applications.
-
- >By virtue of the "multiple collating sequences within a single language"
- >argument, the same holds true of your solution -- worse, there are
- >exception cases in your solution, while there is a potential uniform
- >implementation on top of Unicode.
-
- Huh? Where did you see exceptions which aren't present in Unicode
- as well? Quite the opposite -- most variations of local sorting rules
- can be reduced to the default algorithm with trivial transliteration.
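-
- One way to picture "reduce to the default algorithm by transliteration",
- sketched under assumptions: German is used as the example language, the
- dictionary-style ordering (umlauts sort as their base vowels) is taken as
- the default, and the phone-book ordering (umlauts sort as vowel + "e") is
- the local variant. The table and function names are hypothetical and not
- from the post.

```python
# Hypothetical tables: dictionary-style folding is treated as the default
# ordering, phone-book-style expansion as the local variant.
DICTIONARY = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})
PHONEBOOK = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})

def default_key(word):
    """Default (dictionary-style) sort key: fold case, drop umlauts."""
    return word.lower().translate(DICTIONARY)

def phonebook_key(word):
    """Variant ordering: transliterate first, then reuse the default key."""
    return default_key(word.lower().translate(PHONEBOOK))

words = ["Muller", "Müller", "Mueller", "Mutter"]
by_default = sorted(words, key=default_key)
by_phonebook = sorted(words, key=phonebook_key)
print(by_default)
print(by_phonebook)
```

- The variant needs no comparison logic of its own: it is a character
- substitution followed by the default key, which is the reduction the
- paragraph above describes.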
-
- >The Fact is, a multilingual word processor will have to present its menus
- >to the user, probably in his native language, by means of a locale
- >mechanism. If a "vi" style implementation is used (no explicit commands if
- >you ignore ":set" and ":map" and all OS escapes), there is still the
- >requirement of localization of error messages and keyboard input. There
- >is no divorcing the language from the application entirely, if the
- >application is one which operates on text as data.
-
- First, nobody claimed otherwise; second, screen editors aren't
- applications but pieces of SYSTEM software interacting with specific
- hardware. The fact that all that termcap/terminfo stuff lives at user
- level is nothing more than an old Unix kludge.
-
- >"The simplest explanation which fits the facts is the correct one"
- > -- William of Occam
- >
-
- If it fits. If it doesn't, it becomes a religion. Any religion breeds
- fanatics who mindlessly follow authorities who assume their power
- by means of hierarchical institutions, ritual phrases and assertion
- of the superior wisdom of collective bodies. Their speeches are full of
- sacral words, references to obscure documents known only to those who
- "belong", and self-praise for being able to make those collective bodies
- produce meaningless "words of wisdom" after endless debates on
- insignificant details.
-
- Isn't this picture familiar?
-
- --vadim
-