NetNews Usenet Archive 1993 #1

Newsgroups: comp.std.internat
Path: sparky!uunet!elroy.jpl.nasa.gov!ames!agate!dog.ee.lbl.gov!hellgate.utah.edu!fcom.cc.utah.edu!cs.weber.edu!terry
From: terry@cs.weber.edu (A Wizard of Earth C)
Subject: Re: INTERNATIONALIZATION: JAPAN, FAR EAST
Message-ID: <1993Jan8.072720.9554@fcom.cc.utah.edu>
Keywords: Han Kanji Katakana Hirugana ISO10646 Unicode Codepages
Sender: news@fcom.cc.utah.edu
Organization: University of Utah Computer Center
References: <8490@charon.cwi.nl> <1hvu79INN4qf@rodan.UU.NET> <1993Jan7.063116.14846@fcom.cc.utah.edu> <1ii3ctINNp4c@rodan.UU.NET>
Date: Fri, 8 Jan 93 07:27:20 GMT
Lines: 187

In article <1ii3ctINNp4c@rodan.UU.NET>, avg@rodan.UU.NET (Vadim Antonov) writes:
|> In article <1993Jan7.063116.14846@fcom.cc.utah.edu> terry@cs.weber.edu (A Wizard of Earth C) writes:
|> >Multiple potential sort orders give lie to this argument. At least several
|> >languages have been mentioned (ie: German) where there are multiple possible
|> >sort orders.
|>
|> All languages have a default sorting -- namely the one which is used in
|> dictionaries. Existance of multiple sorting rules does not nullify
|> the necessity of a default sorting which is sufficient for most cases
|> and --more imortant-- can be done in an environment lacking the
|> locale required for specialized sortings.

This is an arbitrary designation, and one not supported by the Bundespost.

|> I bet 99% of users will not care WHICH sorting is used as long as they
|> used to it. Other can have their own personal locale.

I think I could make an argument for the applicability of the "don't care"
state to non-natural collation (one not used at all by the native speakers).
This would be a false argument, but I could make it. ;-}

The idea of a "personal locale" requires that there be a means of applying
a localized collation. Since this ability is required to be there, why
have two mechanisms for collation (the locale version and the lexical
version)?

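For a concrete picture of the German case (a sketch, not part of the
original exchange): the dictionary convention files "ä" with "a", while
the phone-book convention files it with "ae". The Python below approximates
both with two hypothetical key functions, dictionary_key and phone_book_key.
The translation tables are simplified assumptions loosely following DIN 5007
variants 1 and 2; a real system would take such rules from its locale data
rather than hard-coding them.

```python
# Two German collation conventions, sketched as plain sort-key functions.
# Assumption: simplified stand-ins for DIN 5007 variant 1 (dictionary)
# and variant 2 (phone book); tie-breaking rules are left out.

DICTIONARY = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})
PHONE_BOOK = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def dictionary_key(word):
    """Hypothetical key: umlauts sort as their base vowel."""
    return word.lower().translate(DICTIONARY)


def phone_book_key(word):
    """Hypothetical key: umlauts sort as base vowel followed by 'e'."""
    return word.lower().translate(PHONE_BOOK)


names = ["Müller", "Mueller", "Muller", "Mutter"]

by_dictionary = sorted(names, key=dictionary_key)
by_phone_book = sorted(names, key=phone_book_key)
by_code_point = sorted(names)  # raw code-point order, no locale at all

print(by_dictionary)
print(by_phone_book)
print(by_code_point)
```

The three orderings disagree about where "Müller" belongs, which suggests
why a single built-in default cannot simply stand in for the locale
mechanism.
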
|> >Let's call a spade a spade -- collation sequences. The
|> >collation sequences for Japanese, for instance, vary on pronuciation. If
|> >it were the intent of Unicode to provide a unified collation mechanism,
|> >this would be a very strong argument against Chinese/Japanese unification.
|>
|> I've heard Japanese don't like that unification.

Many don't, for various reasons. Some are valid; most are not.

|> >Luckily, this was not a goal.
|>
|> "This car doesn't move, but luckily it was not a goal."

"This car won't carry 200 people; luckily, the goal was a car, not a bus."

You want a bus. I want a car.

|>
|> >>Sure, the information can (*must* if you're
|> >>going to do trivial things like sorting or case-insensitive comparisons)
|> >>be preserved off-text (in mail headers or in file attributes, for
|> >>example) but it effectively defeats the very purpose of ISO10646 --
|> >>why on the Earth do i need to spare bits for encoding glyphs if
|> >>i already know the language and 8 (or 16 for oriental languages) bits
|> >>is quite enough to map the alphabet. Don't you see this gap in
|> >>the logic nullifying all benefits of 10646?
|> >
|> >I don't see how this nullifies the benefits of Unicode (which you seem to
|> >be using 10646 as a synonym for, given that this is the only codified
|> >portion).
|>
|> That's not my fault that you don't see that if there are two codes
|> one with N bits and the second with M bits per character (N > M) and
|> there is an external constraint (aka locale) defining the set of applicable
|> characters to be subsets of both M and N the M is more memory-efficient.
|> It is no more complex than 2+2 if someone cares to think a bit.

You have once again deleted my rationale, and left only the statement. There
are good reasons (which you keep deleting but not refuting) for a unified
character set. You must agree with them, or you would not argue so fervently
for your own form of unification.
Surely you do not claim the concept of locale destroys them all?

|> >First, Unicode is not the sole definition of 10646; just the only currently
|> >defined character set within 10646. There is no reason to throw out 10646
|> >because of Unicode (although I could make an argument for 32 bits being a
|> >nifty reason for doing so).
|>
|> They share the common design philosophy and the same fundamental mistakes.

A broad statement for one who has never read the Unicode standard (by your
own admission in "Message-ID: <1i13rrINNars@rodan.UU.NET>"):

Glenn> If you have a genuine interest in learning about the facts surrounding
Glenn> Unicode and 10646, I would recommend a good reading of the Unicode
Glenn> Standard and the Proceedings of the Unicode Implementor's Workshops.

Vadim> Thank you, i've got no time to read things i don't need.

|> >Second, Unicode buys more than simply another character set; it buys the
|> >ability to produce non-conflicting monolingual localizations of software
|> >systems (as opposed to conflicting ones as a result of a lack of standards
|> >coordination with existing standards).
|>
|> Who told you that? Sure, ISO happened to introduce so many standards
|> that it caused a complete havoc in minds. Then, we don't need one
|> more sloppy standard.

They exist; they must be retrofitted, if for no other reason than NFS
crossmounts between new and old systems.

|> >It also buys a platform for
|> >non-conflicting multinationalization (multilingual data processing) given
|> >a means of compounding documents by language/locale (there may be more than
|> >one locale per language). Admittedly, this is not as elegant as a unified
|> >glyph set for all languages, but it does charge the penalty to the
|> >multilingual (minority) rather than the localized-monolingual (majority)
|> >user.
|>
|> 10646 is inadequate for true multinationalization because it breaks
|> existing OS semantics and i hardly doubt there will not be many people
|> eager to redesign everything from scratch for the sake of few truly
|> multilingual applications.

I couldn't agree more. All the better to make design decisions in favor
of localization as long as you don't make multinationalization 100%
impossible. Apple did this with the Mac (substitute "user" and
"programmer") and it was an amazing success.

|> >By virtue of the "multiple collating sequences within a single language"
|> >argument, the same holds true of your soloution -- worse, there are
|> >exception cases in your soloution, while there is a potential uniform
|> >impementation on top of Unicode.
|>
|> Huh? Where did you see the exceptions which aren't present in Unicode
|> as well? Quite opposite -- most variations of local sorting rules
|> can be reduced to the default algorithm with trivial transliteration.

The "default algorithm" is the exception which is not present in Unicode.
The rule is localized collation, which you admit must be supported.

|> >The Fact is, a multilingual word processor will have to present its menus
|> >to the user, probably in his native language, by means of a locale
|> >mechanism. If a "vi" style implementation is used (no explict commands if
|> >you ignore ":set" and ":map" and all OS escapes), there is still the
|> >requirement of localization of error messages and keyboard input. There
|> >is no divorcing the language from the application entirely, if the
|> >application is one which operates on text as data.
|>
|> Nobody told that first; and screen editors aren't applications but the
|> pieces of SYSTEM software interacting with specific hardware, second.
|> The fact that all that termcap/terminfo stuff is on user level is nothing
|> more than the old Unix klugde.

That explains why EDT works so wonderfully with my Wyse-50. What is your
solution to the termcap/terminfo "kludge"?
I may agree with the characterization of a program editor as system
software, but I draw the line at a multilingual word processor.

|>
|> >"The simplest explanation which fits the facts is the correct one"
|> >                                -- William of Occam
|>
|> If it fits. If it doesn't it becomes a religion. Any religion breeds
|> fanatics who mindlessly follow authorities who assume their power
|> by means of hierarchial institutions, ritual phrases and assertion
|> of superior wisdom of collective bodies. Their speeches are full of
|> sacral words, references to obscure documents known only to "belonging"
|> and self-praise for being able to make those collective bodies to
|> produce the meaningless "words of wisdom" after endless debates on
|> insufficient deatils.
|>
|> Is't this picture familiar?

Sounds like the IRS. 8-).

Seriously, you sound like you left out a paragraph beginning with "But I,
as a founder of the one *true* religion...".

Do you honestly believe that it is physically impossible to implement some
application (like a multilingual word processor) on top of Unicode? What
reference implementation would convince *you* to "convert"? Is it possible
to "convert" you at all (to *anything* other than your suggestion, not to
Unicode in particular)?

All of your complaints have been addressed by one person's reply or another;
admittedly, some of the solutions have been mighty ugly -- I haven't seen
a point yet that doesn't have at least *some* workaround, however. Do you
still claim "impossible!", or are you willing to back down to "It's too
damn ugly for me!"?


                                        Terry Lambert
                                        terry@icarus.weber.edu
                                        terry_lambert@novell.com
---
Any opinions in this posting are my own and not those of
my present or previous employers.
-------------------------------------------------------------------------------
                                        "I have an 8 user poetic license" - me
 Get the 386bsd FAQ from agate.berkeley.edu:/pub/386BSD/386bsd-0.1/unofficial
-------------------------------------------------------------------------------