- Newsgroups: comp.unix.bsd
- Path: sparky!uunet!cs.utexas.edu!hellgate.utah.edu!fcom.cc.utah.edu!cs.weber.edu!terry
- From: terry@cs.weber.edu (A Wizard of Earth C)
- Subject: Re: INTERNATIONALIZATION: JAPAN, FAR EAST
- Message-ID: <1992Dec16.221634.4879@fcom.cc.utah.edu>
- Keywords: Han Kanji Katakana Hiragana ISO10646 Unicode Codepages
- Sender: news@fcom.cc.utah.edu
- Organization: Weber State University (Ogden, UT)
- References: <1992Dec14.185028.9757@fcom.cc.utah.edu> <1gksolINNmkg@frigate.doc.ic.ac.uk> <mathias.724467456@sune.stacken.kth.se>
- Date: Wed, 16 Dec 92 22:16:34 GMT
- Lines: 128
-
- In article <mathias.724467456@sune.stacken.kth.se> mathias@stacken.kth.se (Mathias Bage) writes:
- >In <1gksolINNmkg@frigate.doc.ic.ac.uk> kd@doc.ic.ac.uk (Kostis Dryllerakis) writes:
- [ ... Re: INTERNATIONALIZATION ... ]
- >> Preliminary attempts have already been made (I personally work
- >>under X Windows with Greek ISO-standard characters without many
- >>problems), but a coordinated effort for internationalisation is indeed
- >>necessary. Note that the rest of the operating systems are currently
- >>"externally touched" in order to support the Greek language, i.e., by
- >>hacking your way out.
- >
- > Has anyone in this newsgroup ever heard of the Unicode/ISO10646
- >(UCS) standard? It exists today and has everything (almost), even
- >though the Japanese don't like the sort order of the Kanji
- >characters... Look/ask in comp.internat.std for more info. See also
- >RFC 1345.
-
- I mentioned Unicode as the proposed 386BSD target standard, with ISO
- character set attribution on specific files *within* the file system
- as a way to avoid wasting large amounts of storage for languages that
- already have 8-bit representations. The to/from translation would be
- done in a file system layer common to all file systems (perhaps the
- VFS syscall layer).
-
- I would be more likely to endorse Unicode than the 10646 draft standard
- (which includes Unicode), simply because ISO-10646 *is* still a draft.
-
- Unicode (judging from 5 of the 7 responses garnered so far) is pretty
- much uniformly hated in Japan; the Japanese seem to prefer the JIS
- encoding (à la kterm and jterm). While JIS *is* embodied in an existing
- standard (XPG4), it has the drawback of preventing a unified character
- glyph space such as the one Unicode provides.
-
- I suspect this preference stems from existing equipment, state
- tables, and IBM VGA support for JIS, more than from any real technical
- objection to the standard.
-
- The unvarnished facts are:
-
- 1) Microsoft NT is Unicode based.
- 2) Unicode makes a single ROMable X font possible (we'd have to build
- one; the advantage over JIS actually comes from Unicode's
- non-overlapping glyph space).
- 3) Unicode provides a means of simultaneous storage of multilingual
- documents on the same system.
- 4) Use of Unicode within the file system's directory service name
- space provides a means of internationalizing 386BSD itself.
- 5) A "Unicode outline font" project is currently under way in
- China.
- 6) Unicode allows for "localization ready" as opposed to simply
- "internationalizable" UNIX tools and utilities.
- 7) Fixed field lengths are observed in utilities/programs regardless
- of the localized language (i.e., 80 English characters = 80 Greek
- characters = 80 Cyrillic characters = 80 Kanji characters). A "runic"
- (variable-length multibyte) implementation would cause field lengths
- to vary, perhaps radically.
- 8) Support for nearly all written human languages, with a proposed
- expansion for a larger set.
-
- The drawbacks are:
-
- 1) Non-compliance with XPG4.
- 2) Probable non-compliance with ISO-10646 (due to it being incomplete).
- 3) Japanese engineers don't like it (probable reason: the existing
- investment in JIS, in man-hours and money).
- 4) "Connection rules" for languages like Tamil and Arabic do not
- translate readily into X display technology.
- 5) A rewrite is necessary for most of the JIS input tables and
- semantics to give an identical key sequence/Kanji presentation
- for Japanese.
-
- The arguments are:
-
- 1) Non-compliance with XPG4 is not a problem, since it is impossible
- to comply with both XPG4 and ISO-10646.
- 2) By utilizing the ISO-10646 draft, conflicts with the completed
- standard can be minimized.
- 3) This is sticky. If the reason Japanese engineers dislike Unicode
- is simply embedded technology (JIS/XPG4-JIS), then we don't have
- a problem... the technology used should not be apparent to the
- user in any case. If the JIS technology is preferred over the
- Unicode technology because of engineering simplification for
- romaji/kana-to-kanji conversion, then the problem is a little
- more difficult, but is surmountable with ~16K of conversion
- vector tables (small overhead compared to the memory taken by a
- single font). If the JIS ordering is preferred because it aids
- in stroke-count analysis for symbol recognition, *then* we have
- a problem.
- 4) Connection rules for Tamil, for instance, cannot be resolved
- adequately using any of the existing character technologies for
- X; thus it is not at issue.
- 5) A rewrite will be necessary for these tables regardless, even were
- we to choose XPG4-JIS encoding, if only because the encoding is
- going to vary when the character tables are offset to form a
- Unicode-like non-intersecting glyph set (necessary for "localization
- ready" as opposed to "internationalizable" OS and tools).
-
- Definitions:
-
- localization ready:    Missing per-locale translation of text
-                        strings. All work has been done on the
-                        display drivers & environment to support
-                        drop-in message databases in the local
-                        language.
-
- internationalizable:   Missing per-locale translation of text
-                        strings. Missing OS/FS support for
-                        local language representation. May
-                        run "localized" apps like jterm/kterm.
-
-
- A significant advantage of a "localization ready" OS is the ability to
- supply a static "default" environment that is then modified by
- examining the "LOCALE" (or other language-specification) setting in
- the user's environment. Thus all applications written on the
- system are already "enabled" by virtue of their use of the C library;
- this assumes use of "unichar" types, etc., within the applications.
-
-
- Terry Lambert
- terry@icarus.weber.edu
- terry_lambert@novell.com
- ---
- Any opinions in this posting are my own and not those of my present
- or previous employers.
- --
- -------------------------------------------------------------------------------
- "I have an 8 user poetic license" - me
- Get the 386bsd FAQ from agate.berkeley.edu:/pub/386BSD/386bsd-0.1/unofficial
- -------------------------------------------------------------------------------
-