- Newsgroups: comp.std.internat
- Path: sparky!uunet!spool.mu.edu!agate!dog.ee.lbl.gov!hellgate.utah.edu!fcom.cc.utah.edu!cs.weber.edu!terry
- From: terry@cs.weber.edu (A Wizard of Earth C)
- Subject: Re: Dumb Americans (was INTERNATIONALIZATION: JAPAN, FAR EAST)
- Message-ID: <1993Jan7.065611.15193@fcom.cc.utah.edu>
- Keywords: Han Kanji Katakana Hirugana ISO10646 Unicode Codepages
- Sender: news@fcom.cc.utah.edu
- Organization: Weber State University (Ogden, UT)
- References: <1i0s05INNnfn@rodan.UU.NET> <1993Jan1.114158.17149@prl.dec.com> <1i2emiINN2td@rodan.UU.NET>
- Date: Thu, 7 Jan 93 06:56:11 GMT
- Lines: 95
-
- In article <1i2emiINN2td@rodan.UU.NET> avg@rodan.UU.NET (Vadim Antonov) writes:
- >In article <1993Jan1.114158.17149@prl.dec.com> boyd@prl.dec.com (Boyd Roberts) writes:
- >>In article <1i0s05INNnfn@rodan.UU.NET>, avg@rodan.UU.NET (Vadim Antonov) writes:
- >>>
- >>> A good encoding should support easy (i'd say natural) localization.
- >>> It should provide simple algorithms for simple functions
- >>> like getting string length, searching a character, case-insensitive
- >>> comparison, lexicographical comparison.
- >>>
- >>
- >>Well that's where you're wrong. The characters and how they are used
- >>are distinct problems.
- >
- >Don't you realize that having trivial programs to ask which language
- >they're doing operation in effectively defeats the entire purpose of
- >Unicode?
-
- This is not directly required (nor desirable) given the standard tools
- for locale specification. It does not follow as an inevitable conclusion
- of Boyd's statement.
-
- >Should my shell ask me about language of every [a-z] in my commands?
-
- No, the shell should be localized to the user's language preference, as should
- the commands.
-
- >If it shouldn't then it has to get the information somewhere, right?
-
- Right.
-
- >If the information is kept outside the text (file names in this case)
- >then why do i need all those extra bits -- my program *already* knows the exact
- >(small) alphabet.
-
- Good question. Answer: Because it reduces the set of functions required
- to process each language. You do not need a separate text-manipulation
- function per language. It simplifies the interface for file manipulation
- based on text, and it allows for relatively simple localization of any
- application. In addition, it provides a basis for multilingual processing;
- you can do this however you choose, wait for follow-on standards,
- or use existing standards which don't conflict with the basic Unicode
- concept.
-
- I wouldn't place the information in the file name itself, however; this has
- the unfortunate effect of needing to attribute a character to a particular
- language -- this assumption that the file name could hold the information
- is not valid for Unicode (as you have so frequently pointed out). Much
- better to either embed information in the data (for a multilingual file)
- or in the file attributes within the file system (for a monolingual file).
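
A minimal sketch of the two placements described above, in Python. The
names `Run`, `Document`, and `default_lang` are hypothetical and invented
for illustration; `default_lang` merely stands in for a file system
attribute. The point is only that the language tag travels beside the
characters, so the same code points can carry different tags.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Run:
    """A stretch of text plus the language it is written in (hypothetical)."""
    lang: str  # e.g. "ja", "zh", "en"
    text: str


@dataclass
class Document:
    """Text whose language information lives beside the characters."""
    runs: List[Run]
    default_lang: Optional[str] = None  # stands in for a file attribute

    def languages(self) -> List[str]:
        """Return the distinct languages used, in order of first appearance."""
        seen: List[str] = []
        for run in self.runs:
            if run.lang not in seen:
                seen.append(run.lang)
        return seen


def monolingual(text: str, lang: str) -> Document:
    """Monolingual case: one run, language recorded once for the whole file."""
    return Document(runs=[Run(lang, text)], default_lang=lang)


# Multilingual case: identical code points, different language tags.
mixed = Document(runs=[Run("ja", "\u6f22\u5b57"), Run("zh", "\u6f22\u5b57")])
plain = monolingual("hello", "en")
```

Nothing in the character codes themselves distinguishes the two runs in
`mixed`; only the tag does.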
-
- >"Unicode -- a code for texts which will never be sorted!" Great.
-
- As I (and nearly everyone else in this forum) pointed out in a prior
- article, the argument that sorting is associated with lexical order fails
- even under your assumption of a character set per language/locale because
- of the existence of multiple sort orders per locale. If the information
- must be maintained externally to the data for such languages, why not for
- all languages?
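
To make the "multiple sort orders per locale" point concrete, here is a
rough Python sketch using German, where dictionary ordering treats an
umlaut like its base vowel and phone book ordering expands it to the
vowel plus "e". The table and function names are invented for
illustration, and the tables are heavily simplified (no tie-breaking
rules); this is an assumption-laden toy, not a real collation
implementation.

```python
# Two collation tables for the SAME language and the SAME character set.
# Simplified: real German collation also defines tie-breaking rules.
DICTIONARY = {"ä": "a", "ö": "o", "ü": "u", "ß": "ss"}
PHONEBOOK = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def sort_key(word, table):
    """Build a comparison key by substituting characters per the table."""
    return "".join(table.get(ch, ch) for ch in word.lower())


def collate(words, table):
    """Sort words using an externally supplied collation table."""
    return sorted(words, key=lambda w: sort_key(w, table))


names = ["Goethe", "Göbel", "Godard"]
as_dictionary = collate(names, DICTIONARY)  # keys: goethe, gobel, godard
as_phonebook = collate(names, PHONEBOOK)    # keys: goethe, goebel, godard
```

The two results differ even though the input, the language, and the
character codes are identical, so the ordering cannot have come from the
code values.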
-
- >>Problem 2 (localisation) is damn hard.
- >
- >Tell me. I've spent ten years doing *real* localization and i know
- >the price of ill-thought solutions on the ground level (aka character
- >set ordering).
-
- Again, sorting cannot be safely tied to character set lexical order for
- all languages. I disagree with Boyd here. Localization with Unicode is
- a piece of cake. Unicode allows it to be entirely data driven, with no
- locale-specific algorithms or hard-coded data.
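
As a hedged illustration of "entirely data driven", the Python sketch
below does case-insensitive comparison with one function for every
locale; the only locale-specific part is a table. `CASE_OVERRIDES`,
`fold`, and `equal_ignoring_case` are hypothetical names, and the single
Turkish entry (dotted and dotless i) is just the best known exception,
not a complete table.

```python
# Per-locale exceptions to the default case mapping, held as data.
# Turkish: "I" lowers to dotless i (U+0131); dotted capital I (U+0130)
# lowers to plain "i".
CASE_OVERRIDES = {
    "tr": {"I": "\u0131", "\u0130": "i"},
}


def fold(text, locale):
    """Lowercase text, applying any per-locale overrides first."""
    overrides = CASE_OVERRIDES.get(locale, {})
    return "".join(overrides.get(ch, ch.lower()) for ch in text)


def equal_ignoring_case(a, b, locale):
    """Compare two strings without regard to case, for the given locale."""
    return fold(a, locale) == fold(b, locale)


english = equal_ignoring_case("TITLE", "title", "en")
turkish_plain = equal_ignoring_case("TITLE", "title", "tr")
turkish_dotted = equal_ignoring_case("T\u0130TLE", "title", "tr")
```

Supporting another locale in this sketch means adding a table entry, not
writing another function.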
-
- >>Should Problem 1 cater for the fact I type `localisation' whereas
- >>you type `localization'? We're both using English, typed on American
- >>keyboards (I guess, oops mine's made in West Germany) so where are you
- >>going to draw the line. Is this Problem 1? I say it's Problem 2.
- >
- >The example is artificial and has nothing to do with the character sets.
- >As you are well aware it is different words in the same alphabet.
-
- This example becomes more of a problem when translated to one of a glyph
- variant between Chinese and Japanese. I agree that the problem is one
- of words, not characters -- however, in ideographic languages, words *are*
- characters. The example is not as artificial as you make out.
-
-
- Terry Lambert
- terry@icarus.weber.edu
- terry_lambert@novell.com
- ---
- Any opinions in this posting are my own and not those of my present
- or previous employers.
- --
- -------------------------------------------------------------------------------
- "I have an 8 user poetic license" - me
- Get the 386bsd FAQ from agate.berkeley.edu:/pub/386BSD/386bsd-0.1/unofficial
- -------------------------------------------------------------------------------
-