NetNews Usenet Archive 1993 #1

Text File | 1993-01-07 | 4.3 KB | 88 lines

Newsgroups: comp.std.internat
Path: sparky!uunet!zaphod.mps.ohio-state.edu!malgudi.oar.net!caen!hellgate.utah.edu!fcom.cc.utah.edu!cs.weber.edu!terry
From: terry@cs.weber.edu (A Wizard of Earth C)
Subject: Re: Dumb Americans (was INTERNATIONALIZATION: JAPAN, FAR EAST)
Message-ID: <1993Jan7.071324.15413@fcom.cc.utah.edu>
Sender: news@fcom.cc.utah.edu
Organization: Weber State University (Ogden, UT)
References: <1i0s05INNnfn@rodan.UU.NET> <TT.93Jan1135637@tarzan.jyu.fi> <1i2h7cINN3qj@rodan.UU.NET>
Date: Thu, 7 Jan 93 07:13:24 GMT
Lines: 76

In article <1i2h7cINN3qj@rodan.UU.NET> avg@rodan.UU.NET (Vadim Antonov) writes:
>In article <TT.93Jan1135637@tarzan.jyu.fi> tt@tarzan.jyu.fi (Tapani Tarvainen) writes:
>>>Unicode (and for that matter Plan 9 UTF) does not support the last
>>>two mentioned functions. I have yet to see Plan 9 _sort_ which will
>>>sort Russian strings without being told explicitly that it is Russian.
>>
>>So what?
>>I've yet to see anything even planned that would allow sorting
>>both Finnish and German without being told which is wanted.
>>In fact I can't even imagine one that would make any sense.
>>In the case of a list of names, the very same data could be
>>sorted differently depending on where it is going to be used.
>
>Pfrr, take a look at DEMOS Unix-likes -- they do sort both Russian and English
>without being told which is wanted. [<ah>-<ya>]* in shell really selects
>                                      ^    ^ -imagine real cyrillic letters here
>all files starting from lowercase russian letter. lex generates correct
>parsers for languages with russian keywords. Grep works as it is supposed to.
>So far no user complained that there are two o's and two A's in the code.
>
>It is not impossible -- it's rather easy if the right code is chosen.

Well, this is a bilingual example, not a multilingual one including East
Asian languages; it's also less than perfect for bilingual mixing of
character sets with intersecting glyph sets: there is not only an implied
lexical order in each language (which *is* not valid for languages with
multiple possible collating sequences, such as German), there is an
implied ordering of languages.

Let's say for the sake of argument that someone standardized your
suggestion (no mean feat, considering the number of glyphs you want!),
and let's say they even resolved the problem with your suggestion that
hasn't even been mentioned yet because the majority of us believe there
are more significant problems than you suggest: searching for explicit
character patterns in a text file (like searching for "u in a file and
expecting it to match every instance of 'Gr"uen' or other u-umlaut,
except it can't because you only typed in one of the 75 possible
u-umlauts, and it was the wrong one).

Accept all that as given. For the sake of argument, I will.

I now demand that English be first in the implied lexical ordering of
character sets within your "super character set" so that *my* files sort
before *your* files.

Uh-oh -- another can of worms, I guess... 8-).

>>Tying sorting rules to character sets is not a good idea, IMHO.
>
>Would you like to specify the language for every range in every regular
>expression you use?

Nah. As a user, I'll get some programmer to do it for me once, and set
my locale, and never worry about it again. I'll expect the word processor
to apply the correct sort ordering to the German dictionary my first
client is publishing and the German phone book my other client is
publishing, of course -- you can do that with the "super character set",
can't you?

>Embedding sorting rules to character sets is not a "good idea" -- it's
>a necessity and you can do nothing about it.

Provide me an implementation for a language with multiple collating
sequences, then maybe I'll agree with you, as long as it doesn't apply
equally well to a Unicode character set.


Terry Lambert
terry@icarus.weber.edu
terry_lambert@novell.com
---
Any opinions in this posting are my own and not those of
my present or previous employers.
--
-------------------------------------------------------------------------------
"I have an 8 user poetic license" - me
Get the 386bsd FAQ from agate.berkeley.edu:/pub/386BSD/386bsd-0.1/unofficial
-------------------------------------------------------------------------------
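
A rough illustration of the multiple-collating-sequence point made in the
post, as a minimal Python sketch (nothing here was posted in the thread).
German is commonly sorted at least two ways: a dictionary order that
treats u-umlaut like "u", and a phone book order that treats it like
"ue". The helper names dictionary_key and phonebook_key are hypothetical,
and the rules are deliberately simplified to the umlauts and sharp s. The
same code points come out in a different order depending on which key the
application picks, which suggests the ordering cannot be derived from the
character set alone.

```python
# Simplified, assumed rules for two German orderings; not a complete
# implementation of any standard. Escapes are used so the source stays ASCII.
_DICTIONARY = str.maketrans({
    "\u00e4": "a",   # a-umlaut sorts like a
    "\u00f6": "o",   # o-umlaut sorts like o
    "\u00fc": "u",   # u-umlaut sorts like u
    "\u00df": "ss",  # sharp s sorts like ss
})
_PHONEBOOK = str.maketrans({
    "\u00e4": "ae",  # a-umlaut sorts like ae
    "\u00f6": "oe",  # o-umlaut sorts like oe
    "\u00fc": "ue",  # u-umlaut sorts like ue
    "\u00df": "ss",  # sharp s sorts like ss
})


def dictionary_key(word):
    """Sort key for the (simplified) dictionary ordering."""
    return word.lower().translate(_DICTIONARY)


def phonebook_key(word):
    """Sort key for the (simplified) phone book ordering."""
    return word.lower().translate(_PHONEBOOK)


# Same strings, same code points, two different results.
names = ["Mutter", "M\u00fcller", "Mufti"]
as_dictionary = sorted(names, key=dictionary_key)
as_phonebook = sorted(names, key=phonebook_key)

print(as_dictionary)
print(as_phonebook)
```

In the dictionary ordering the name with the u-umlaut lands between
"Mufti" and "Mutter"; in the phone book ordering it moves to the front.
The choice of key is made by the caller (or a locale setting), not by the
encoding of the strings.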