- Newsgroups: comp.std.internat
- Path: sparky!uunet!spool.mu.edu!agate!dog.ee.lbl.gov!hellgate.utah.edu!fcom.cc.utah.edu!cs.weber.edu!terry
- From: terry@cs.weber.edu (A Wizard of Earth C)
- Subject: Re: INTERNATIONALIZATION: JAPAN, FAR EAST
- Message-ID: <1993Jan7.063116.14846@fcom.cc.utah.edu>
- Keywords: Han Kanji Katakana Hiragana ISO10646 Unicode Codepages
- Sender: news@fcom.cc.utah.edu
- Organization: Weber State University (Ogden, UT)
- References: <1hu9v5INNbp1@rodan.UU.NET> <8490@charon.cwi.nl> <1hvu79INN4qf@rodan.UU.NET>
- Date: Thu, 7 Jan 93 06:31:16 GMT
- Lines: 90
-
- In article <1hvu79INN4qf@rodan.UU.NET> avg@rodan.UU.NET (Vadim Antonov) writes:
- >The idea of visual encoding (and one letter-onr glyph is nothing more
- >than a compressed image of the text) is simply wrong because it
- >drops valuable information readily available at the point of the CREATION
- >of the text but not later.
-
- Multiple potential sort orders give the lie to this argument. Several
- languages have been mentioned (e.g., German) where there are multiple possible
- sort orders. Let's call a spade a spade -- collation sequences. The
- collation sequences for Japanese, for instance, vary with pronunciation. If
- it were the intent of Unicode to provide a unified collation mechanism,
- this would be a very strong argument against Chinese/Japanese unification.
- Luckily, this was not a goal. The non-intersecting unification of all sets
- such as suggested by yourself and Ohta is not supported by this argument
- unless you provide one character set per collation sequence. This is
- folly for single languages with multiple collation sequences. If one is to
- maintain multiple collating sequences for one font, why not all fonts?
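
To make the collation point concrete, here is a minimal Python sketch, assuming two simplified German orderings: a dictionary-style one that files umlauts with their base vowels, and a phone-book-style one that expands them to "ae", "oe", "ue". The key functions and the name list are illustrative assumptions, not taken from any standard library or from the thread. The same code points come out in a different order depending on which collation key is applied, so the encoding by itself does not settle the sort order.

```python
# Two simplified, hypothetical collation keys for German.
# Dictionary style: umlauts sort like their base vowels.
# Phone-book style: umlauts sort like the base vowel followed by "e".
DICTIONARY = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})
PHONEBOOK = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def dictionary_key(word):
    """Collation key for the dictionary-style ordering (illustrative only)."""
    return word.lower().translate(DICTIONARY)


def phonebook_key(word):
    """Collation key for the phone-book-style ordering (illustrative only)."""
    return word.lower().translate(PHONEBOOK)


names = ["Müller", "Mueller", "Muller", "Mutter"]

# Same strings, same code points, two different orders.
by_dictionary = sorted(names, key=dictionary_key)
by_phonebook = sorted(names, key=phonebook_key)

print(by_dictionary)
print(by_phonebook)
```

Under these assumed rules "Mueller" sorts ahead of "Müller" in the dictionary order but not in the phone-book order; ties between equal keys fall back to input order because Python's sort is stable.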
-
- >Sure, the information can (*must* if you're
- >going to do trivial things like sorting or case-insensitive comparisons)
- >be preserved off-text (in mail headers or in file attributes, for
- >example) but it effectively defeats the very purpose of ISO10646 --
- >why on the Earth do i need to spare bits for encoding glyphs if
- >i already know the language and 8 (or 16 for oriental languages) bits
- >is quite enough to map the alphabet. Don't you see this gap in
- >the logic nullifying all benefits of 10646?
-
- I don't see how this nullifies the benefits of Unicode (for which you seem
- to be using 10646 as a synonym, given that it is the only codified
- portion).
-
- First, Unicode is not the sole definition of 10646; just the only currently
- defined character set within 10646. There is no reason to throw out 10646
- because of Unicode (although I could make an argument for 32 bits being a
- nifty reason for doing so).
-
- Second, Unicode buys more than simply another character set; it buys the
- ability to produce non-conflicting monolingual localizations of software
- systems (as opposed to conflicting ones as a result of a lack of standards
- coordination with existing standards). It also buys a platform for
- non-conflicting multinationalization (multilingual data processing) given
- a means of compounding documents by language/locale (there may be more than
- one locale per language). Admittedly, this is not as elegant as a unified
- glyph set for all languages, but it does charge the penalty to the
- multilingual (minority) rather than the localized-monolingual (majority)
- user.
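
One way to picture "compounding documents by language/locale" is as a list of text runs, each carrying a locale tag alongside its Unicode text. The sketch below is only an illustration under that assumption; the `Run` type, the locale names, and the `locale_switches` helper are hypothetical and do not come from Unicode or 10646. It shows that a monolingual document needs no tag changes at all, while a multilingual one pays for each switch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Run:
    """A stretch of text plus the locale it should be interpreted in."""
    locale: str  # hypothetical tag such as "ja_JP" or "zh_TW"
    text: str    # Unicode code points, shared across locales


def locale_switches(document: List[Run]) -> int:
    """Count how many times the locale changes between adjacent runs."""
    return sum(
        1 for prev, cur in zip(document, document[1:])
        if prev.locale != cur.locale
    )


# A localized-monolingual document: one run, no switching overhead.
monolingual = [Run("de_DE", "Straße und Müller")]

# A multilingual document: identical code points can appear under
# different locales, and only the tag tells them apart.
multilingual = [
    Run("ja_JP", "漢字"),
    Run("zh_TW", "漢字"),
    Run("ja_JP", "かな"),
]

print(locale_switches(monolingual))
print(locale_switches(multilingual))
```

The design choice illustrated is that the tagging cost scales with the number of locale changes, which is zero for the monolingual case.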
-
- >10646 was meant as an encoding eliminating the necessity to carry off-text
- >information (which is not a piece of cake, especially in multi-lingual
- >texts). However, the "single glyph" approach ruined the very intent
- >because you need the off-text information to do trivial tasks anyway!
- >What's the gain? More wasted bits, yeah?
-
- By virtue of the "multiple collating sequences within a single language"
- argument, the same holds true of your solution -- worse, there are
- exception cases in your solution, while there is a potential uniform
- implementation on top of Unicode.
-
- The fact is, a multilingual word processor will have to present its menus
- to the user, probably in his native language, by means of a locale
- mechanism. If a "vi" style implementation is used (no explicit commands if
- you ignore ":set" and ":map" and all OS escapes), there is still the
- requirement of localization of error messages and keyboard input. There
- is no divorcing the language from the application entirely, if the
- application is one which operates on text as data.
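
The locale mechanism described above can be sketched as a message catalog keyed by locale, with a fallback to a default. This is a minimal Python illustration under stated assumptions: the catalog contents, the message identifiers, and the `message` function are all hypothetical, and a real system would more likely load catalogs from files than hold them in a dictionary.

```python
# Hypothetical in-memory message catalogs, one per locale.
# "C" stands in for the default (untranslated) locale.
CATALOGS = {
    "C": {"E_NOFILE": "No such file", "M_QUIT": "Quit"},
    "de_DE": {"E_NOFILE": "Datei nicht gefunden", "M_QUIT": "Beenden"},
}


def message(msg_id, locale="C"):
    """Look up msg_id in the given locale, then in "C", then give up.

    If no catalog knows the identifier, the identifier itself is
    returned so the caller still has something to display.
    """
    localized = CATALOGS.get(locale, {}).get(msg_id)
    if localized is not None:
        return localized
    return CATALOGS["C"].get(msg_id, msg_id)


print(message("E_NOFILE", "de_DE"))  # translated message
print(message("E_NOFILE", "fr_FR"))  # unknown locale, falls back to "C"
print(message("E_UNKNOWN"))          # unknown identifier, echoed back
```

Even an editor with no visible command names would need something along these lines for its error messages, which is the sense in which the application cannot be fully divorced from the language.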
-
- >Take a life, guys. We in Russia did that mistake (DKOI and "GOST" encodings)
- >many years ago and came to realize that this solution is too simple to
- >be correct.
-
- "The simplest explanation which fits the facts is the correct one"
- -- William of Occam
-
- I don't see why every solution which resembles a past solution (not that
- I am claiming a familiarity with "DKOI" or "GOST", so the resemblance is
- one which you see) must be incorrect by virtue of that resemblance. The
- Unicode character set solves a number of long standing problems.
-
-
- Terry Lambert
- terry@icarus.weber.edu
- terry_lambert@novell.com
- ---
- Any opinions in this posting are my own and not those of my present
- or previous employers.
- --
- -------------------------------------------------------------------------------
- "I have an 8 user poetic license" - me
- Get the 386bsd FAQ from agate.berkeley.edu:/pub/386BSD/386bsd-0.1/unofficial
- -------------------------------------------------------------------------------
-