- Newsgroups: comp.std.internat
- Path: sparky!uunet!zaphod.mps.ohio-state.edu!malgudi.oar.net!caen!hellgate.utah.edu!fcom.cc.utah.edu!cs.weber.edu!terry
- From: terry@cs.weber.edu (A Wizard of Earth C)
- Subject: Re: Dumb Americans (was INTERNATIONALIZATION: JAPAN, FAR EAST)
- Message-ID: <1993Jan7.071324.15413@fcom.cc.utah.edu>
- Sender: news@fcom.cc.utah.edu
- Organization: Weber State University (Ogden, UT)
- References: <1i0s05INNnfn@rodan.UU.NET> <TT.93Jan1135637@tarzan.jyu.fi> <1i2h7cINN3qj@rodan.UU.NET>
- Date: Thu, 7 Jan 93 07:13:24 GMT
- Lines: 76
-
- In article <1i2h7cINN3qj@rodan.UU.NET> avg@rodan.UU.NET (Vadim Antonov) writes:
- >In article <TT.93Jan1135637@tarzan.jyu.fi> tt@tarzan.jyu.fi (Tapani Tarvainen) writes:
- >>>Unicode (and for that matter Plan 9 UTF) does not support the last
- >>>two mentioned functions. I have yet to see Plan 9 _sort_ which will
- >>>sort Russian strings without being told explicitly that it is Russian.
- >>
- >>So what?
- >>I've yet to see anything even planned that would allow sorting
- >>both Finnish and German without being told which is wanted.
- >>In fact I can't even imagine one that would make any sense.
- >>In the case of a list of names, the very same data could be
- >>sorted differently depending on where it is going to be used.
- >
- >Pfrr, take a look at DEMOS Unix-likes -- they do sort both Russian and English
- >without being told which is wanted. [<ah>-<ya>]* (imagine real Cyrillic
- >letters in place of <ah> and <ya>) in shell really selects all files
- >starting with a lowercase Russian letter. lex generates correct
- >parsers for languages with russian keywords. Grep works as it is supposed to.
- >So far no user complained that there are two o's and two A's in the code.
- >
- >It is not impossible -- it's rather easy if the right code is chosen.
-
- Well, this is a bilingual example, not a multilingual one including East
- Asian languages. It's also less than perfect for bilingual mixing of
- character sets with intersecting glyph sets: not only is there an implied
- lexical order within each language (which *is* not valid for languages with
- multiple possible collating sequences, such as German), there is also an
- implied ordering of the languages themselves.
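To make the "implied ordering" concrete, here is a minimal Python sketch. It assumes Unicode strings and Python's default code-point comparison, not the 8-bit encodings the DEMOS systems used, so it illustrates the general problem rather than their exact behavior: the code order of a character set silently decides both which script sorts first and what a range like `[а-я]` covers.

```python
import re

words = ["яблоко", "apple", "Zebra", "арбуз", "ёж"]

# Default sorting compares code points: all Latin letters (U+0041..U+007A)
# sort before all Cyrillic letters (U+0400..U+04FF), and uppercase Latin
# sorts before lowercase Latin.
by_code_point = sorted(words)

# The range а-я spans U+0430..U+044F, which leaves out ё (U+0451).
lowercase_range = re.compile(r"[а-я]")
matches = [w for w in words if lowercase_range.match(w)]

print(by_code_point)
print(matches)
```

Here ё ends up after я and falls outside the range, even though it sits between е and ж in the Russian alphabet; the encoding's order, not the language, made that decision.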
-
- Let's say for the sake of argument that someone standardized your suggestion
- (no mean feat, considering the number of glyphs you want!), and let's say they
- even resolved a problem with your suggestion that nobody has mentioned yet,
- because most of us believe there are more significant problems than the ones
- you acknowledge: searching for explicit character patterns in a text file.
- For example, you search a file for "u and expect it to match every instance
- of 'Gr"uen' or any other u-umlaut, except it can't, because you typed only one
- of the 75 possible u-umlauts, and it was the wrong one.
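The u-umlaut search problem has a concrete form in Unicode: the same visible ü can be stored either as the precomposed U+00FC or as u followed by the combining diaeresis U+0308. That is two representations rather than 75, but enough to break a naive search. A minimal Python sketch, using the standard `unicodedata.normalize` function to bring both text and pattern to NFC before searching; the helper name `find_all` is hypothetical.

```python
import unicodedata


def find_all(text: str, pattern: str) -> list[int]:
    """Return start offsets of pattern in text after NFC-normalizing both.

    Offsets refer to the normalized text, not the original one.
    """
    text_nfc = unicodedata.normalize("NFC", text)
    pattern_nfc = unicodedata.normalize("NFC", pattern)
    offsets = []
    start = text_nfc.find(pattern_nfc)
    while start != -1:
        offsets.append(start)
        start = text_nfc.find(pattern_nfc, start + 1)
    return offsets


precomposed = "Gr\u00fcn"     # ü as a single code point
decomposed = "Gru\u0308n"     # u followed by COMBINING DIAERESIS
text = precomposed + " " + decomposed

naive = text.count("\u00fc")  # only sees the precomposed form
normalized = find_all(text, "\u00fc")
print(naive, normalized)
```

Normalization only unifies encodings of the same character; ASCII stand-ins such as "u or ue would still need a separate, application-specific mapping.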
-
- Accept all that as given. For the sake of argument, I will.
-
- I now demand that English be first in the implied lexical ordering of
- character sets within your "super character set" so that *my* files sort
- before *your* files.
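The demand is easy to satisfy once script order is a policy rather than a property of the encoding. A minimal sketch, assuming Python and using the character names from the standard `unicodedata` module as a crude script detector; `script_of` and `priority_key` are hypothetical names, and a real implementation would use proper Unicode script data.

```python
import unicodedata


def script_of(ch: str) -> str:
    """Crude script guess from the Unicode character name, e.g. 'LATIN'."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def priority_key(priority: list[str]):
    """Build a sort key: rank by the script of the first character, then by
    case-folded text. Unlisted scripts sort after all listed ones."""
    rank = {script: i for i, script in enumerate(priority)}

    def key(word: str):
        first = script_of(word[0]) if word else "UNKNOWN"
        return (rank.get(first, len(rank)), word.casefold())

    return key


files = ["отчет", "report", "заметки", "notes"]
english_first = sorted(files, key=priority_key(["LATIN", "CYRILLIC"]))
russian_first = sorted(files, key=priority_key(["CYRILLIC", "LATIN"]))
print(english_first)
print(russian_first)
```

Both users get "their" files first from the same character set, simply by passing a different priority list.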
-
- Uh-oh -- another can of worms, I guess... 8-).
-
- >>Tying sorting rules to character sets is not a good idea, IMHO.
- >
- >Would you like to specify the language for every range in every regular
- >expression you use?
-
- Nah. As a user, I'll get some programmer to do it for me once, and set
- my locale, and never worry about it again. I'll expect the word processor
- to apply the correct sort ordering to the German dictionary my first
- client is publishing and the German phone book my other client is publishing,
- of course -- you can do that with the "super character set", can't you?
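The German case is a concrete instance of one character set with two collating sequences. German standard DIN 5007 describes a dictionary ordering (variant 1, where ä sorts as a) and a phone-book ordering (variant 2, where ä sorts as ae). A minimal Python sketch of both as sort keys over the same strings; the key functions are hypothetical and ignore the finer tie-breaking rules a production collator such as ICU would apply.

```python
# Variant 1 ("dictionary"): umlauts fold to the base letter.
DICTIONARY = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})
# Variant 2 ("phone book"): umlauts expand to base letter + e.
PHONE_BOOK = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def dictionary_key(word: str) -> tuple[str, str]:
    folded = word.lower()
    # Primary key: umlauts folded; secondary key breaks ties deterministically.
    return (folded.translate(DICTIONARY), folded)


def phone_book_key(word: str) -> tuple[str, str]:
    folded = word.lower()
    # Primary key: umlauts expanded; secondary key breaks ties deterministically.
    return (folded.translate(PHONE_BOOK), folded)


names = ["Goethe", "Göbel", "Gobert"]
by_dictionary = sorted(names, key=dictionary_key)
by_phone_book = sorted(names, key=phone_book_key)
print(by_dictionary)
print(by_phone_book)
```

Göbel lands first in the dictionary but second in the phone book, with no change to the underlying characters; the choice lives entirely in the collation.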
-
- >Embedding sorting rules to character sets is not a "good idea" -- it's
- >a necessity and you can do nothing about it.
-
- Provide me with an implementation for a language with multiple collating
- sequences, and then maybe I'll agree with you, as long as it doesn't apply
- equally well to a Unicode character set.
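To illustrate that a collating sequence can sit on top of Unicode as a separate table rather than being baked into code-point order, here is a minimal Python sketch. It assumes a hypothetical lookup table for the 33-letter modern Russian alphabet, in which ё follows е even though its code point U+0451 lies after я; characters outside the table sort after the alphabet, in code-point order.

```python
RUSSIAN_ALPHABET = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"
RANK = {ch: i for i, ch in enumerate(RUSSIAN_ALPHABET)}


def russian_key(word: str) -> list[int]:
    """Map each character to its alphabet position; unknown characters
    sort after the alphabet, in code-point order."""
    return [RANK.get(ch, len(RANK) + ord(ch)) for ch in word.lower()]


words = ["жук", "ёж", "ель", "яма"]
by_code_point = sorted(words)                  # encoding order
by_alphabet = sorted(words, key=russian_key)   # Russian alphabetical order
print(by_code_point)
print(by_alphabet)
```

Swapping in a different table (or several, for a language with more than one collating sequence) requires no change to the character set itself.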
-
-
- Terry Lambert
- terry@icarus.weber.edu
- terry_lambert@novell.com
- ---
- Any opinions in this posting are my own and not those of my present
- or previous employers.
- --
- -------------------------------------------------------------------------------
- "I have an 8 user poetic license" - me
- Get the 386bsd FAQ from agate.berkeley.edu:/pub/386BSD/386bsd-0.1/unofficial
- -------------------------------------------------------------------------------
-