NetNews Usenet Archive 1992 #30

Text File | 1992-12-16 | 6.5 KB | 141 lines

Newsgroups: comp.unix.bsd
Path: sparky!uunet!cs.utexas.edu!hellgate.utah.edu!fcom.cc.utah.edu!cs.weber.edu!terry
From: terry@cs.weber.edu (A Wizard of Earth C)
Subject: Re: INTERNATIONALIZATION: JAPAN, FAR EAST
Message-ID: <1992Dec16.221634.4879@fcom.cc.utah.edu>
Keywords: Han Kanji Katakana Hirugana ISO10646 Unicode Codepages
Sender: news@fcom.cc.utah.edu
Organization: Weber State University (Ogden, UT)
References: <1992Dec14.185028.9757@fcom.cc.utah.edu> <1gksolINNmkg@frigate.doc.ic.ac.uk> <mathias.724467456@sune.stacken.kth.se>
Date: Wed, 16 Dec 92 22:16:34 GMT
Lines: 128

In article <mathias.724467456@sune.stacken.kth.se> mathias@stacken.kth.se (Mathias Bage) writes:
>In <1gksolINNmkg@frigate.doc.ic.ac.uk> kd@doc.ic.ac.uk (Kostis Dryllerakis) writes:

[ ... Re: INTERNATIONALIZATION ... ]

>>   Preliminary attempts have already been made (I personally work
>>under X-windows with greek ISO-standard characters without many
>>problems) but a coordinated effort for internationalisation is indeed
>>necessary. Note that the rest of the operating systems are currently
>>"externally touched" in order to support the greek language i.e. by
>>hacking your way out.
>
>   Has anyone in this newsgroup ever heard of the Unicode/ISO10646
>(UCS) standard? It exists today and has everything (almost), even
>though the Japanese don't like the sort order of the Kanji
>characters... Look/ask in comp.internat.std for more info. See also
>RFC 1345.

I mentioned Unicode as the proposed 386BSD target standard, with ISO
character set attribution on specific files *within* the file system as
a means of avoiding eating huge chunks of storage in languages with
existing 8-bit representations (i.e., the to/from translation would be
done in a file system layer, perhaps the VFS syscall layer, common to
all file systems).

I would be more likely to endorse Unicode than the 10646 draft standard
(which includes Unicode) simply because ISO-10646 *is* still a draft.
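
As a rough illustration of the to/from translation described above, here
is a minimal sketch for the simplest case only. It assumes a file
attributed as ISO 8859-1, whose 256 code points map one-to-one onto the
first 256 Unicode values; other 8-bit sets (the Greek ISO set, for
instance) would need a lookup table instead. The function names and the
16-bit "unichar" typedef are hypothetical, and nothing here is actual
386BSD code.

```c
#include <stddef.h>

/* Hypothetical 16-bit Unicode code unit; the name is an assumption. */
typedef unsigned short unichar;

/*
 * Read path: widen bytes from a file attributed as ISO 8859-1 into
 * 16-bit Unicode values.  ISO 8859-1 code points are numerically
 * identical to Unicode values 0x0000-0x00FF, so no table is needed.
 * Returns the number of characters written to out.
 */
static size_t latin1_to_unicode(const unsigned char *in, size_t n,
                                unichar *out)
{
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = in[i];
    return n;
}

/*
 * Write path: narrow 16-bit Unicode values back to ISO 8859-1 bytes.
 * Returns 0 on success, or -1 if some character has no 8-bit
 * representation, in which case the file could not keep its 8-bit
 * attribution.
 */
static int unicode_to_latin1(const unichar *in, size_t n,
                             unsigned char *out)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (in[i] > 0xFF)
            return -1;
        out[i] = (unsigned char)in[i];
    }
    return 0;
}
```

The point of the sketch is the storage argument: the file stays at one
byte per character on disk, while everything above the translation layer
sees 16-bit characters.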
Unicode (from 5 of the 7 responses garnered so far) is pretty much
uniformly hated in Japan; the Japanese seem to prefer the JIS encoding
(a la kterm and jterm). While this *is* embodied in an existing standard
(XPG4), it has the drawback of preventing a unified character glyph
space, such as that provided by Unicode. I suspect this preference stems
from the existing equipment, state tables, and IBM VGA support for JIS
more than from any real prejudice against the standard for technical
reasons.

The unvarnished facts are:

1)	Microsoft NT is Unicode based.

2)	Unicode makes a ROMable X font possible (we'd have to build one;
	the real advantage over JIS is the non-overlapping glyph space).

3)	Unicode provides a means of simultaneous storage of multilingual
	documents on the same system.

4)	Use of Unicode within the file system's directory service name
	space provides a means of internationalizing 386BSD itself.

5)	A "Unicode outline font" project is currently under way in China.

6)	Unicode allows for "localization ready" as opposed to simply
	"internationalizable" UNIX tools and utilities.

7)	Fixed field lengths are observed in utilities/programs regardless
	of the localized language (i.e., 80 English characters = 80 Greek
	characters = 80 Cyrillic characters = 80 Kanji characters). A
	runic implementation would cause field lengths to vary, perhaps
	radically.

8)	Unicode supports nearly all written human languages, with a
	proposed expansion for a larger set.

The drawbacks are:

1)	Non-compliance with XPG4.

2)	Probable non-compliance with ISO-10646 (due to it being
	incomplete).

3)	Japanese engineers don't like it (probable reason: current JIS
	investment in man hours/money).

4)	"Connection rules" for languages (like Tamil and Arabic) do not
	translate readily into X display technology.

5)	A rewrite is necessary for most of the JIS input tables and
	semantics to give an identical key sequence/Kanji presentation
	for Japanese.
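
Fact 7 above (fixed field lengths) can be put in numbers. The sketch
below takes "runic" to mean a variable-length multibyte encoding, which
is an assumption, and uses UTF-8-style lengths (1, 2, or 3 bytes per
16-bit character) as a stand-in for such an encoding. All function names
are hypothetical.

```c
#include <stddef.h>

/* Hypothetical 16-bit Unicode code unit; the name is an assumption. */
typedef unsigned short unichar;

/* Storage per character in the fixed-width representation. */
#define UNICHAR_BYTES 2

/* Bytes needed for a field of nchars fixed-width 16-bit characters. */
static size_t fixed_field_bytes(size_t nchars)
{
    return nchars * UNICHAR_BYTES;
}

/*
 * Bytes one character occupies in a UTF-8-style multibyte encoding,
 * used here only as a stand-in for a variable-length scheme.
 */
static size_t multibyte_len(unichar c)
{
    if (c < 0x80)
        return 1;
    if (c < 0x800)
        return 2;
    return 3;
}

/* Bytes needed for the same field in the variable-length encoding. */
static size_t multibyte_field_bytes(const unichar *s, size_t nchars)
{
    size_t i, total = 0;

    for (i = 0; i < nchars; i++)
        total += multibyte_len(s[i]);
    return total;
}
```

Under these assumptions an 80 character field is always 160 bytes in the
fixed-width form, while the multibyte form needs 80 bytes for English
text, 160 for Greek or Cyrillic, and 240 for Kanji, so a byte-sized
field no longer holds a predictable number of characters.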

The arguments are:

1)	Non-compliance with XPG4 is not a problem, since it is impossible
	to comply with both XPG4 and ISO-10646.

2)	By utilizing the ISO-10646 draft, conflicts with the completed
	standard can be minimized.

3)	This is sticky. If the reason Japanese engineers dislike Unicode
	is simply embedded technology (JIS/XPG4-JIS), then we don't have
	a problem... the technology used should not be apparent to the
	user in any case. If the JIS technology is preferred over the
	Unicode technology because of engineering simplification for
	romaji/kana conversion to kanji, then the problem is a little
	more difficult, but is surmountable with ~16K of conversion
	vector tables (small overhead compared to the memory taken by a
	single font). If the JIS ordering is preferred because it aids
	in stroke-count analysis for symbol recognition, *then* we have
	a problem.

4)	Connection rules for, for instance, Tamil cannot be resolved
	adequately using any of the existing character technologies for
	X; thus this is not at issue.

5)	A rewrite will be necessary for these tables regardless, even
	were we to choose XPG4-JIS encoding, if only because the
	encoding is going to vary when the character tables are offset
	to form a Unicode-like non-intersecting glyph set (necessary for
	"localization ready" as opposed to "internationalizable" OS and
	tools).

Definitions:

localization ready:	Missing per-locale translation of text strings.
			All work has been done to display drivers &
			environment to support drop in message databases
			in the local language.

internationalizable:	Missing per-locale translation of text strings.
			Missing OS/FS support for local language
			representation. May run "localized" apps like
			jterm/kterm.

A significant advantage of a "localization ready" OS is the ability to
supply a "default" environment through a static which is modified by
examination of the "LOCALE" or other language specification mechanism in
the user's environment.
Thus all applications written on the system are already "enabled" by
virtue of their use of the C library; this assumes use of "unichar"
types, etc., within the applications.


					Terry Lambert
					terry@icarus.weber.edu
					terry_lambert@novell.com
---
Any opinions in this posting are my own and not those of
my present or previous employers.
-- 
-------------------------------------------------------------------------------
                                        "I have an 8 user poetic license" - me
 Get the 386bsd FAQ from agate.berkeley.edu:/pub/386BSD/386bsd-0.1/unofficial
-------------------------------------------------------------------------------