- Newsgroups: comp.std.internat
- Path: sparky!uunet!elroy.jpl.nasa.gov!ames!agate!dog.ee.lbl.gov!hellgate.utah.edu!fcom.cc.utah.edu!cs.weber.edu!terry
- From: terry@cs.weber.edu (A Wizard of Earth C)
- Subject: Re: INTERNATIONALIZATION: JAPAN, FAR EAST
- Message-ID: <1993Jan8.072720.9554@fcom.cc.utah.edu>
- Keywords: Han Kanji Katakana Hirugana ISO10646 Unicode Codepages
- Sender: news@fcom.cc.utah.edu
- Organization: University of Utah Computer Center
- References: <8490@charon.cwi.nl> <1hvu79INN4qf@rodan.UU.NET> <1993Jan7.063116.14846@fcom.cc.utah.edu> <1ii3ctINNp4c@rodan.UU.NET>
- Date: Fri, 8 Jan 93 07:27:20 GMT
- Lines: 187
-
- In article <1ii3ctINNp4c@rodan.UU.NET>, avg@rodan.UU.NET (Vadim Antonov) writes:
- |> In article <1993Jan7.063116.14846@fcom.cc.utah.edu> terry@cs.weber.edu (A Wizard of Earth C) writes:
- |> >Multiple potential sort orders give lie to this argument. At least several
- |> >languages have been mentioned (ie: German) where there are multiple possible
- |> >sort orders.
- |>
- |> All languages have a default sorting -- namely the one which is used in
- |> dictionaries. Existence of multiple sorting rules does not nullify
- |> the necessity of a default sorting which is sufficient for most cases
- |> and --more important-- can be done in an environment lacking the
- |> locale required for specialized sortings.
-
- This is an arbitrary designation, and one not supported by the Bundespost.
-
- |> I bet 99% of users will not care WHICH sorting is used as long as they
- |> used to it. Other can have their own personal locale.
-
- I think I could make an argument for the applicability of the "don't care"
- state to non-natural collation (one not used at all by the native speakers).
- This would be a false argument, but I could make it. ;-}.
-
- The idea of a "personal locale" requires that there be a means of applying
- a localized collation. Since this ability is required to be there, why have
- two mechanisms for collation? (The locale version and the lexical version).
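-
- As a rough illustration of why one mechanism suffices, here is a minimal
- Python sketch in which two German orderings are just two key tables fed to
- the same sort. The table contents and the names dictionary_key and
- phone_book_key are my assumptions for the example (umlaut folded to the base
- vowel versus umlaut expanded to vowel plus "e"), not a complete description
- of either convention.

```python
# Two German collation conventions, sketched as sort keys.
# Assumption: "dictionary" order folds an umlaut to its base vowel, while
# "phone book" order expands it to vowel + "e". All names are hypothetical.

DICTIONARY = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})
PHONE_BOOK = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def dictionary_key(word: str) -> str:
    """Sort key for the assumed dictionary convention."""
    return word.lower().translate(DICTIONARY)


def phone_book_key(word: str) -> str:
    """Sort key for the assumed phone book convention."""
    return word.lower().translate(PHONE_BOOK)


names = ["Müller", "Mueller", "Muller", "Mutter"]

# Same sort routine, different table: the locale supplies the key.
by_dictionary = sorted(names, key=dictionary_key)
by_phone_book = sorted(names, key=phone_book_key)

print(by_dictionary)
print(by_phone_book)
```

- The sort call is identical in both cases; only the table changes, which is
- the sense in which a separate lexical mechanism looks redundant.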
-
- |> >Let's call a spade a spade -- collation sequences. The
- |> >collation sequences for Japanese, for instance, vary on pronunciation. If
- |> >it were the intent of Unicode to provide a unified collation mechanism,
- |> >this would be a very strong argument against Chinese/Japanese unification.
- |>
- |> I've heard Japanese don't like that unification.
-
- Many don't, for various reasons. Some are valid; most are not.
-
- |> >Luckily, this was not a goal.
- |>
- |> "This car doesn't move, but luckily it was not a goal."
-
- "This car won't carry 200 people; luckily, the goal was a car, not a bus."
-
- You want a bus. I want a car.
-
- |>
- |> >>Sure, the information can (*must* if you're
- |> >>going to do trivial things like sorting or case-insensitive comparisons)
- |> >>be preserved off-text (in mail headers or in file attributes, for
- |> >>example) but it effectively defeats the very purpose of ISO10646 --
- |> >>why on the Earth do i need to spare bits for encoding glyphs if
- |> >>i already know the language and 8 (or 16 for oriental languages) bits
- |> >>is quite enough to map the alphabet. Don't you see this gap in
- |> >>the logic nullifying all benefits of 10646?
- |> >
- |> >I don't see how this nullifies the benefits of Unicode (which you seem to
- |> >be using 10646 as a synonym for, given that this is the only codified
- |> >portion).
- |>
- |> That's not my fault that you don't see that if there are two codes
- |> one with N bits and the second with M bits per character (N > M) and
- |> there is an external constraint (aka locale) defining the set of applicable
- |> characters to be subsets of both M and N the M is more memory-efficient.
- |> It is no more complex than 2+2 if someone cares to think a bit.
-
- You have once again deleted my rationale and left only the statement. There
- are good reasons (which you keep deleting but not refuting) for a unified
- character set. You must agree with them, or you would not argue so fervently
- for your own form of unification. Surely you do not claim the concept of
- locale destroys them all?
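-
- To be fair to the arithmetic, it is easy to check, and so is what it leaves
- out. The sketch below uses Python's koi8_r codec as an example 8-bit code and
- utf-16-be as a stand-in for a 16-bit code; both choices are assumptions made
- for illustration, not anything either standard prescribes.

```python
# Bytes per character in an 8-bit national code versus a 16-bit unified code.
# Assumption: koi8_r and utf-16-be stand in for "M bits" and "N bits".

text = "Привет"  # six Cyrillic letters

eight_bit = text.encode("koi8_r")      # one byte per character
sixteen_bit = text.encode("utf-16-be")  # two bytes per character, no BOM

print(len(eight_bit), len(sixteen_bit))

# The smaller code only wins while the text stays inside its repertoire.
mixed = "Привет, 世界"
try:
    mixed.encode("koi8_r")
    fits_in_eight_bit = True
except UnicodeEncodeError:
    fits_in_eight_bit = False

mixed_sixteen_bit = mixed.encode("utf-16-be")
print(fits_in_eight_bit, len(mixed_sixteen_bit))
```

- The 8-bit code is half the size for monolingual text, and cannot represent
- the mixed string at all without switching code sets out of band.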
-
- |> >First, Unicode is not the sole definition of 10646; just the only currently
- |> >defined character set within 10646. There is no reason to throw out 10646
- |> >because of Unicode (although I could make an argument for 32 bits being a
- |> >nifty reason for doing so).
- |>
- |> They share the common design philosophy and the same fundamental mistakes.
-
- A broad statement for one who has never read the Unicode standard (by your
- own admission in "Message-ID: <1i13rrINNars@rodan.UU.NET>"):
-
- Glenn> If you have a genuine interest in learning about the facts surrounding
- Glenn> Unicode and 10646, I would recommend a good reading of the Unicode
- Glenn> Standard and the Proceedings of the Unicode Implementor's Workshops.
-
- Vadim> Thank you, i've got no time to read things i don't need.
-
- |> >Second, Unicode buys more than simply another character set; it buys the
- |> >ability to produce non-conflicting monolingual localizations of software
- |> >systems (as opposed to conflicting ones as a result of a lack of standards
- |> >coordination with existing standards).
- |>
- |> Who told you that? Sure, ISO happened to introduce so many standards
- |> that it caused a complete havoc in minds. Then, we don't need one
- |> more sloppy standard.
-
- They exist, and they must be retrofitted, if for no other reason than NFS
- crossmounts between new and old systems.
-
- |> >It also buys a platform for
- |> >non-conflicting multinationalization (multilingual data processing) given
- |> >a means of compounding documents by language/locale (there may be more than
- |> >one locale per language). Admittedly, this is not as elegant as a unified
- |> >glyph set for all languages, but it does charge the penalty to the
- |> >multilingual (minority) rather than the localized-monolingual (majority)
- |> >user.
- |>
- |> 10646 is inadequate for true multinationalization because it breaks
- |> existing OS semantics and i hardly doubt there will not be many people
- |> eager to redesign everything from scratch for the sake of few truly
- |> multilingual applications.
-
- I couldn't agree more. All the better to make design decisions in
- favor of localization as long as you don't make multinationalization 100%
- impossible. Apple did this with the Mac (substitute "user" and "programmer")
- and it was an amazing success.
-
- |> >By virtue of the "multiple collating sequences within a single language"
- |> >argument, the same holds true of your solution -- worse, there are
- |> >exception cases in your solution, while there is a potential uniform
- |> >implementation on top of Unicode.
- |>
- |> Huh? Where did you see the exceptions which aren't present in Unicode
- |> as well? Quite opposite -- most variations of local sorting rules
- |> can be reduced to the default algorithm with trivial transliteration.
-
- The "default algorithm" is the exception which is not present in Unicode.
- The rule is localized collation, which you admit must be supported.
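-
- A small Python sketch of the distinction, under the assumption that a
- "default" ordering means comparing raw code values. The function name
- localized_key and its folding table are hypothetical, chosen only to show
- that the natural order comes from a locale-supplied key, not from the codes.

```python
# Raw code-value order versus a locale-supplied collation key.
# Assumption: the "localized" key folds case and maps umlauts to base vowels;
# this is a toy table, not a full collation for any language.

FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u"})


def localized_key(word: str) -> str:
    """Hypothetical per-locale key: case-insensitive, umlaut-insensitive."""
    return word.lower().translate(FOLD)


words = ["zebra", "Äpfel", "apple", "Zoo"]

by_code_value = sorted(words)                   # compares code points only
by_locale_key = sorted(words, key=localized_key)

print(by_code_value)
print(by_locale_key)
```

- Code-value order puts "Zoo" first and "Äpfel" last, an ordering no
- dictionary uses; the keyed sort gives the order a reader would expect.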
-
- |> >The Fact is, a multilingual word processor will have to present its menus
- |> >to the user, probably in his native language, by means of a locale
- |> >mechanism. If a "vi" style implementation is used (no explicit commands if
- |> >you ignore ":set" and ":map" and all OS escapes), there is still the
- |> >requirement of localization of error messages and keyboard input. There
- |> >is no divorcing the language from the application entirely, if the
- |> >application is one which operates on text as data.
- |>
- |> Nobody told that first; and screen editors aren't applications but the
- |> pieces of SYSTEM software interacting with specific hardware, second.
- |> The fact that all that termcap/terminfo stuff is on user level is nothing
- |> more than the old Unix kludge.
-
- That explains why EDT works so wonderfully with my Wyse-50. What is your
- solution to the termcap/terminfo "kludge"?
-
- I may agree with the characterization of a program editor as system software,
- but I draw the line at a multilingual word processor.
-
- |>
- |> >"The simplest explanation which fits the facts is the correct one"
- |> > -- William of Occam
- |>
- |> If it fits. If it doesn't it becomes a religion. Any religion breeds
- |> fanatics who mindlessly follow authorities who assume their power
- |> by means of hierarchial institutions, ritual phrases and assertion
- |> of superior wisdom of collective bodies. Their speeches are full of
- |> sacral words, references to obscure documents known only to "belonging"
- |> and self-praise for being able to make those collective bodies to
- |> produce the meaningless "words of wisdom" after endless debates on
- |> insufficient details.
- |>
- |> Isn't this picture familiar?
-
- Sounds like the IRS. 8-).
-
- Seriously, you sound like you left out a paragraph beginning with "But I,
- as a founder of the one *true* religion...".
-
- Do you honestly believe that it is physically impossible to implement some
- application (like a multilingual word processor) on top of Unicode? What
- reference implementation would convince *you* to "convert"? Is it possible
- to "convert" you at all (to *anything* other than your suggestion, not to
- Unicode in particular)?
-
- All of your complaints have been addressed by one person's reply or another;
- admittedly, some of the solutions have been mighty ugly -- I haven't seen
- a point yet that doesn't have at least *some* workaround, however. Do you
- still claim "impossible!", or are you willing to back down to "It's too
- damn ugly for me!"?
-
-
- Terry Lambert
- terry@icarus.weber.edu
- terry_lambert@novell.com
- ---
- Any opinions in this posting are my own and not those of my present
- or previous employers.
- -------------------------------------------------------------------------------
- "I have an 8 user poetic license" - me
- Get the 386bsd FAQ from agate.berkeley.edu:/pub/386BSD/386bsd-0.1/unofficial
- -------------------------------------------------------------------------------
-