NetNews Usenet Archive 1993 #1

Text File | 1993-01-07 | 4.9 KB | 108 lines

Newsgroups: comp.std.internat
Path: sparky!uunet!spool.mu.edu!agate!dog.ee.lbl.gov!hellgate.utah.edu!fcom.cc.utah.edu!cs.weber.edu!terry
From: terry@cs.weber.edu (A Wizard of Earth C)
Subject: Re: Dumb Americans (was INTERNATIONALIZATION: JAPAN, FAR EAST)
Message-ID: <1993Jan7.065611.15193@fcom.cc.utah.edu>
Keywords: Han Kanji Katakana Hirugana ISO10646 Unicode Codepages
Sender: news@fcom.cc.utah.edu
Organization: Weber State University (Ogden, UT)
References: <1i0s05INNnfn@rodan.UU.NET> <1993Jan1.114158.17149@prl.dec.com> <1i2emiINN2td@rodan.UU.NET>
Date: Thu, 7 Jan 93 06:56:11 GMT
Lines: 95

In article <1i2emiINN2td@rodan.UU.NET> avg@rodan.UU.NET (Vadim Antonov) writes:
>In article <1993Jan1.114158.17149@prl.dec.com> boyd@prl.dec.com (Boyd Roberts) writes:
>>In article <1i0s05INNnfn@rodan.UU.NET>, avg@rodan.UU.NET (Vadim Antonov) writes:
>>>
>>> A good encoding should support easy (i'd say natural) localization.
>>> It should provide simple algorithms for simple functions
>>> like getting string length, searching a character, case-insensitive
>>> comparison, lexicographical comparison.
>>>
>>
>>Well that's where you're wrong.  The characters and how they are used
>>are distinct problems.
>
>Don't you realize that having trivial programs to ask which language
>they're doing operation in effectively defeats the entire purpose of
>Unicode?

This is not directly required (nor desirable) given the standard tools
for locale specification.  It does not follow as an inevitable conclusion
of Boyd's statement.

>Should my shell ask me about language of every [a-z] in my commands?

No, the shell should be localized to the user's language preference, as
should the commands.

>If it shouldn't then it has to get the information somewhere, right?

Right.

>If the information is kept outside the text (file names in this case)
>then why do i need all those extra bits -- my program *already* knows the exact
>(small) alphabet.

Good question.
Answer: Because it reduces the set of functions required to process each
language.  There need not be a separate text-manipulation function for
each language.  It simplifies the interface to do file manipulation based
on text, and it allows for relatively simple localization of any
application.  In addition, it provides a basis for multilingual
processing; you can either do this however you choose to, or wait for
follow-on standards, or use existing standards which don't conflict with
the basic Unicode concept.

I wouldn't place the information in the file name itself, however; this
has the unfortunate effect of needing to attribute a character to a
particular language -- this assumption that the file name could hold the
information is not valid for Unicode (as you have so frequently pointed
out).  Much better to either embed information in the data (for a
multilingual file) or in the file attributes within the file system (for
a monolingual file).

>"Unicode -- a code for texts which will never be sorted!"  Great.

As I (and nearly everyone else in this forum) pointed out in a prior
article, the argument that sorting is associated with lexical order fails
even under your assumption of a character set per language/locale because
of the existence of multiple sort orders per locale.  If the information
must be maintained externally to the data for such languages, why not for
all languages?

>>Problem 2 (localisation) is damn hard.
>
>Tell me.  I've spent ten years doing *real* localization and i know
>the price of ill-thought solutions on the ground level (aka character
>set ordering).

Again, sorting cannot be safely tied to character set lexical order for
all languages.

I disagree with Boyd here.  Localization with Unicode is a piece of cake.
Unicode allows it to be entirely data driven, with no locale-specific
algorithms or hard-coded data.

>>Should Problem 1 cater for the fact I type `localisation' whereas
>>you type `localization'?
>>We're both using English, typed on American
>>keyboards (I guess, oops mine's made in West Germany) so where are you
>>going to draw the line.  Is this Problem 1?  I say it's Problem 2.
>
>The example is artificial and has nothing to do with the character sets.
>As you well aware it is different words in the same alphabet.

This example becomes more of a problem when translated to one of a glyph
variant between Chinese and Japanese.  I agree that the problem is one of
words, not characters -- however, in ideographic languages, words *are*
characters.  The example is not as artificial as you make out.


                                        Terry Lambert
                                        terry@icarus.weber.edu
                                        terry_lambert@novell.com
---
Any opinions in this posting are my own and not those of
my present or previous employers.
-- 
-------------------------------------------------------------------------------
                                        "I have an 8 user poetic license" - me
 Get the 386bsd FAQ from agate.berkeley.edu:/pub/386BSD/386bsd-0.1/unofficial
-------------------------------------------------------------------------------
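
The post's point that sort order is locale data rather than a property of
the character codes, and that one locale can have more than one sort
order, might be sketched as below.  This is only an illustration and not
anything from the original thread: the table names, `sort_key`, and
`sort_words` are hypothetical, and the two German tables are heavily
simplified stand-ins for the dictionary and phone-book conventions.

```python
# Hypothetical, simplified collation tables.  Each maps a character to the
# string that stands in for it when a sort key is built.  Real collation
# data is far larger; these few entries exist only for illustration.
COLLATION_TABLES = {
    "de_dictionary": {"ä": "a", "ö": "o", "ü": "u", "ß": "ss"},
    "de_phonebook": {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"},
}


def sort_key(text, table_name):
    """Build a sort key for text from the named table (data, not code)."""
    table = COLLATION_TABLES[table_name]
    # Assumes precomposed characters (e.g. a single code point for "ö").
    return "".join(table.get(ch, ch) for ch in text.lower())


def sort_words(words, table_name=None):
    """Sort by raw code-point order, or by a table-driven key if named."""
    if table_name is None:
        return sorted(words)
    return sorted(words, key=lambda w: sort_key(w, table_name))


WORDS = ["Ofen", "Öl", "Oz"]

if __name__ == "__main__":
    print("code point:", sort_words(WORDS))
    print("dictionary:", sort_words(WORDS, "de_dictionary"))
    print("phone book:", sort_words(WORDS, "de_phonebook"))
```

The same three words come out in three different orders, and the sorting
routine itself never changes; only the table selected at run time does.
Adding another ordering in this sketch means adding a table, not a
function.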