- Path: sparky!uunet!munnari.oz.au!spool.mu.edu!sgiblab!nec-gw!nec-tyo!wnoc-tyo-news!cs.titech!titccy.cc.titech!necom830!mohta
- From: mohta@necom830.cc.titech.ac.jp (Masataka Ohta)
- Newsgroups: comp.unix.bsd
- Subject: Re: Dumb Americans (was INTERNATIONALIZATION: JAPAN, FAR EAST)
- Keywords: Han Kanji Katakana Hirugana ISO10646 Unicode Codepages
- Message-ID: <2615@titccy.cc.titech.ac.jp>
- Date: 4 Jan 93 18:57:31 GMT
- References: <id.M2XV.VTA@ferranti.com> <1992Dec18.043033.14254@midway.uchicago.edu> <1992Dec18.212323.26882@netcom.com> <1992Dec19.083137.4400@fcom.cc.utah.edu> <2564@titccy.cc.titech.ac.jp> <1992Dec28.062554.24144@fcom.cc.utah.edu>
- Sender: news@titccy.cc.titech.ac.jp
- Organization: Tokyo Institute of Technology
- Lines: 131
-
- In article <1992Dec28.062554.24144@fcom.cc.utah.edu>
- terry@cs.weber.edu (A Wizard of Earth C) writes:
-
- >|> Do you know what Shift JIS is? It's a de facto standard for character encoding
- >|> established by Microsoft, NEC, ASCII etc. and common in the Japanese PC market.
- >
- >I am aware of JIS; however, even you must agree that the Japanese hardware
- >and software markets have not reached the level of "commodity hardware"
- >found elsewhere in the world (ie: the US and Europe).
-
- Sigh... With DOS/V you can use Japanese on YOUR "commodity hardware".
-
- >I think other mechanisms, such as ATOK, Wnn, and KanjiHand deserve to be
- >examined. One method would be to adopt exactly the input mechanism of
- >"Ichi-Taro" (the most popular NEC 98 word processor).
-
- They also run on the IBM PC.
-
- >|> In the workstation market in Japan, some support Shift JIS, some
- >|> support EUC and some support both. Of course, many US companies
- >|> sell Japanized UNIX on their workstations.
- >
- >I think this is precisely what we want to avoid -- localization. The basic
- >difference, to my mind, is that localization involves the maintenance of
- >multiple code sets, whereas internationalization requires maintenance of
- >multiple data sets, a much smaller job.
-
- >This I don't understand. The maximum translation table from one 16 bit value
- >to another is 16k.
-
- WHAAAAT? It's 128KB, not 16k.
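-
- The 128KB figure follows from simple arithmetic: a full table needs one
- 16-bit entry for each of the 2^16 possible inputs. A minimal sketch of that
- calculation (the variable names are illustrative, not from the thread):

```python
# Size of a complete lookup table that maps every 16-bit value
# to another 16-bit value.
ENTRIES = 2 ** 16        # 65,536 possible 16-bit inputs
BYTES_PER_ENTRY = 2      # each output value is itself 16 bits wide

table_bytes = ENTRIES * BYTES_PER_ENTRY
table_kib = table_bytes // 1024

print(table_bytes, "bytes =", table_kib, "KB")  # 131072 bytes = 128 KB
```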
-
- >This means 2 16k tables for translation into/out of
- >Unicode for Input/Output devices,
-
- I'm afraid you don't know what Unicode is. What do you mean by "tables for
- translation"?
-
- >I don't see why the storage mechanism in any way affects the validity of the
- >data
-
- *I* don't see why the storage mechanism in any way affects the validity of the
- data
-
- >and thus I don't understand *why* you say "with Unicode, we can't
- >achieve internationalization."
-
- Because we can't process data that mixes Japanese and Chinese.
-
- >I don't understand this, either. This is like saying PC ASCII can not cover
- >both the US and the UK because the American and English pound signs are not
- >the same, or that it can't cover German or Dutch because of the 7 characters
- >difference needed for support of those languages.
-
- Wrong. The US and UK pound signs are the same character, even though they
- might be assigned different code points in different countries.
-
- Thus, in a universal coded character set, it is correct to assign a
- single code point to the single pound sign, even though the character
- is used in both the US and the UK.
-
- But corresponding characters in China and Japan, which do not share the
- same graphical representation even on moderate-quality printers and are
- thus different characters, are assigned the same code point in Unicode.
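-
- The unification described here can be observed directly. The sketch below
- assumes Python's standard "shift_jis" and "gb2312" codecs and uses U+76F4 as
- an example (my choice, not one named in the thread; it is commonly cited as a
- character drawn differently in Japanese and Chinese typography). The national
- encodings use different bytes, but both decode to one Unicode code point, so
- the code point alone does not say which glyph shape to print.

```python
# One Han character, encoded in a Japanese and in a Chinese national standard.
ch = "\u76f4"

sjis_bytes = ch.encode("shift_jis")   # Japanese encoding (JIS X 0208 repertoire)
gb_bytes = ch.encode("gb2312")        # Chinese encoding (GB 2312 repertoire)

# The byte sequences differ between the two national encodings...
print(sjis_bytes.hex(), gb_bytes.hex())

# ...but both round-trip to the same single Unicode code point.
from_japanese = sjis_bytes.decode("shift_jis")
from_chinese = gb_bytes.decode("gb2312")
print(hex(ord(from_japanese)), hex(ord(from_chinese)))
```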
-
- >|> Of course, it is possible to LOCALIZE Unicode so that it produces
- >|> Japanese characters only or Chinese characters only. But don't we
- >|> need internationalization?
- >
- >The point of an internationalization effort (as *opposed* to a localization
- >effort) is the coexistence of languages within the same processing means.
- >The point is not to produce something which is capable of "only English" or
- >"only French" or "only Japanese" at the flick of an environment variable;
- >the point is to produce something which is *data driven* and localized by
- >a change of data rather than by a change of code. To do otherwise would
- >require the use of multiple code trees for each language, which was the
- >entire impetus for an internationalization effort in the first place.
-
- That is THE problem of Unicode.
-
- I was informed that Microsoft will provide a LOCALIZATION mechanism
- to print the corresponding Chinese/Japanese characters of Unicode
- differently.
-
- So, HOW can we MIX Chinese and Japanese without LOCALIZATION?
-
- >your argument that the lexical order of the
- >target language affects the usability of a storage standard is invalid.
-
- My argument has nothing to do with lexical ordering.
-
- >Sure, the translation mechanisms may be *easier* to code given localization
- >of lexical ordering, but that doesn't mean they *can't* be coded otherwise;
-
- Of course, any coding is equally OK for translation.
-
- >This involves yet another
- >set of localization-specific storage tables to translate from an ISO or
- >other local font to Unicode and back on attributed file storage.
-
- FILE ATTRIBUTE!!!!!????? *IT* *IS* *EVIL*. Do you really know UNIX?
-
- How can you "cat" two files with different file attributes?
-
- What attribute can you attach to a semi-binary file, in which one field
- contains an ASCII string and another field contains a JIS string?
-
- >To do
- >otherwise would require 16 bit storage of files, or worse, runic encoding
- >of any non-US ASCII characters in a file. This either doubles the file
- >size for all text files (something the west _will_not_accept_),
-
- Do you know what UTF is?
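-
- The point of a UTF-style transformation format is that ASCII text does not
- double in size. The sketch below uses UTF-8 as a stand-in; that is an
- assumption, since the post does not say which UTF is meant. It compares
- UTF-8 with a fixed 16-bit encoding for the same ASCII string.

```python
# ASCII-only text: compare a variable-length UTF with fixed 16-bit storage.
text = "plain ASCII text"

utf8_bytes = text.encode("utf-8")        # one byte per ASCII character
utf16_bytes = text.encode("utf-16-be")   # two bytes per character, no BOM

print(len(text), len(utf8_bytes), len(utf16_bytes))

# The UTF-8 form of pure ASCII is byte-for-byte identical to the ASCII form.
same_as_ascii = utf8_bytes == text.encode("ascii")
print(same_as_ascii)
```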
-
- >or
- >"pollutes" the files (all files except those stored in US-ASCII have file
- >sizes which no longer reflect true character counts on the file).
-
- That's already true for languages like Japanese, whose characters are
- NOT ALWAYS (but sometimes) represented with a single byte.
-
- But, what's wrong with that?
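-
- To make the byte count versus character count mismatch concrete, here is a
- small sketch assuming Python's standard "shift_jis" and "euc_jp" codecs. The
- sample string is my own: three kanji followed by three ASCII letters.

```python
# Mixed Japanese/ASCII text: the byte count no longer equals the character count.
sample = "\u65e5\u672c\u8a9eABC"   # three kanji followed by "ABC"

char_count = len(sample)
sjis_size = len(sample.encode("shift_jis"))  # kanji take 2 bytes, ASCII 1 byte
euc_size = len(sample.encode("euc_jp"))      # same widths for this sample

print(char_count, sjis_size, euc_size)  # 6 9 9
```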
-
- >Admittedly, these mechanisms are adaptable for XPG4 (not widely available)
- >and XPG3 (does not support eastern languages), but the Microsoft adoption
- >of Unicode tells us that at least 90% of the market is now committed to
- >Unicode, if not now, then in the near future.
-
- Do you think Microsoft will use file attributes?
-
- Masataka Ohta
-