NetNews Usenet Archive 1993 #1

Internet Message Format | 1993-01-06 | 4.3 KB

Path: sparky!uunet!europa.asd.contel.com!emory!gatech!enterpoop.mit.edu!mintaka.lcs.mit.edu!ai-lab!wheat-chex!glenn
From: glenn@wheat-chex.ai.mit.edu (Glenn A. Adams)
Newsgroups: comp.std.internat
Subject: Unicode Han Characters [was Re: Language tagging]
Date: 6 Jan 1993 17:59:12 GMT
Organization: MIT Artificial Intelligence Laboratory
Lines: 67
Message-ID: <1if6lgINN6ri@life.ai.mit.edu>
References: <2609@titccy.cc.titech.ac.jp> <1iav6tINNee2@life.ai.mit.edu> <MELBY.93Jan6113951@dove.yk.fujitsu.co.jp>
NNTP-Posting-Host: wheat-chex.ai.mit.edu
Keywords: Unicode ISO10646 Hanzi Kanji Hanja

In article <MELBY.93Jan6113951@dove.yk.fujitsu.co.jp> melby@dove.yk.fujitsu.co.jp (John B. Melby) writes:
>If what I have heard about the Unicode standard is accurate, Chinese
>simplified forms are not distinguished from Chinese unsimplified forms
>when they are effectively equivalent.

What you have heard is inaccurate. Simplified Chinese forms are encoded separately from their traditional counterparts.

>Of course, there is one major flaw in the preliminary Unicode version: the
>Japanese simplified form of "sakura" (ying1hua1 de ying1) is not included.

I'm not sure which "preliminary" Unicode version you are referring to; however, Unicode does have at U+6A31 the character ying1hua1 de ying1. Is the simplified Japanese form you refer to contained in JISX0208 or JISX0212? If it is, then it is in Unicode 1.0. If it isn't, a second level of Han characters is now being formulated by the CJK-JRG for inclusion in a future version of Unicode.

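
As a rough illustration of the question above (whether a particular written form is present in a particular national character set), here is a minimal Python sketch. It is not part of the original discussion. It assumes that the Python standard library codecs `shift_jis` and `gb2312` are acceptable stand-ins for JIS X 0208 and GB 2312-80, and the names `FORMS`, `CODECS`, and `is_encodable` are hypothetical, chosen only for this example. The code points are taken from current Unicode data, in which the traditional, simplified Chinese, and simplified Japanese forms of this character each have their own code point.

```python
# Three written forms of the "cherry" character, given as current Unicode
# code points. The dictionary keys are descriptive labels chosen for this sketch.
FORMS = {
    "traditional": "\u6afb",
    "simplified Chinese": "\u6a31",   # the code point cited above
    "simplified Japanese": "\u685c",
}

# Assumption: Python's "shift_jis" codec stands in for JIS X 0208 and its
# "gb2312" codec for GB 2312-80. Neither is an official mapping table.
CODECS = {"JIS X 0208": "shift_jis", "GB 2312-80": "gb2312"}


def is_encodable(char: str, codec: str) -> bool:
    """Return True if the codec assigns a code to `char`."""
    try:
        char.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    # Report, for each form, which of the two stand-in sets contains it.
    for label, char in FORMS.items():
        found = [name for name, codec in CODECS.items() if is_encodable(char, codec)]
        print(f"U+{ord(char):04X} ({label}): {', '.join(found) or 'neither set'}")
```

Running the script lists, for each form, which of the two stand-in sets contains it. A form that appears in either set should have a code point of its own in Unicode.
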
The initial collection of Han characters in Unicode comprises the Unified Repertoire and Ordering 2.0 produced by CJK-JRG (CJK Joint Research Group), currently headed by Kato Shigenobu of Toppan Printing, Japan, with delegation leaders Dominic Cheng (Hong Kong Information Technologies Federation); Miyazawa Akira (NACSIS), Japan; Su Liang (Mitac, Taipei Computer Association), Taiwan; Oh Young-Taik (Korean Bureau of Standards); Zhang Zhoucai (Center for Computer Information Development Research, Ministry of Machinery and Electronic Industry), China.

The members of the CJK-JRG editorial subcommittee who were responsible for final review and verification of URO 2.0 were Lee Collins (Taligent), USA; Kao Tien-Cheu (Institute for Information Industry), Taiwan; Koike Tateo (Hitachi), Japan; Prof. Lee Choon Tack (Kongju National University), Korea; and Zhang Zhoucai (CCID/MMEI), China.

As is quite clear from this list, the Han characters in Unicode (10646) were hardly the sole effort of USA members. An extraordinary amount of expertise, predominantly from the CJK countries themselves, went into the development of URO 2.0. As I mentioned above, the work of CJK-JRG is not complete, as a subsequent collection of characters is now being developed for inclusion into the Unicode/10646 standard.

The first collection of Han characters currently included in Unicode represents the most important, and the vast majority, of existing CJK character sets. These include all characters from GB 2312-80, GB 12345-90, GB 8565-89, CNS 11643 (planes 1 & 2), JIS X 0208-1980, JIS X 0212-1990, KS C 5601-1989, and KS C 5657-1991. In addition, some characters of the unsimplified form of GB 7589-87, the unsimplified form of GB 7590-87, CNS 11643 (plane 14), the old Chinese telegraph code, and unique characters from ANSI Z39.64-1989 were included.

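
The coverage claim for one of the source sets listed above can be spot-checked with a short round-trip test. The sketch below is an illustration only. It assumes that Python's standard `gb2312` codec (an EUC-CN encoding of GB 2312-80) is a fair proxy for that source set, and `gb2312_hanzi_round_trip` is a hypothetical helper name. It walks the hanzi rows of the set, converts each assigned cell to Unicode and back, and counts how many distinct Unicode characters result.

```python
def gb2312_hanzi_round_trip() -> dict:
    """Decode every assigned GB 2312 hanzi cell to Unicode and encode it back.

    Assumption: Python's "gb2312" codec is used as a proxy for GB 2312-80.
    """
    chars = []      # Unicode characters obtained from assigned cells
    failures = 0    # cells whose bytes did not survive the round trip
    for row in range(16, 88):        # rows 16..87 hold the hanzi
        for cell in range(1, 95):    # cells 1..94 within each row
            raw = bytes([0xA0 + row, 0xA0 + cell])
            try:
                char = raw.decode("gb2312")
            except UnicodeDecodeError:
                continue             # unassigned cell in the source set
            if char.encode("gb2312") != raw:
                failures += 1
            chars.append(char)
    return {
        "assigned": len(chars),
        "distinct": len(set(chars)),
        "failures": failures,
    }


if __name__ == "__main__":
    print(gb2312_hanzi_round_trip())
```

If `distinct` equals `assigned` and `failures` is zero, every character of the source set has its own Unicode code point and data converted to Unicode can be converted back without loss. The same loop could be adapted to other codecs, with the row and cell ranges adjusted for each set.
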
If two characters were distinct in any character set of the first list of character sets above, then they are distinct in Unicode -- this is called the Source Set Separation rule.

The second collection of Han characters now being considered by CJK-JRG includes those not encoded by any of the above-mentioned character sets, as well as those not currently encoded by any character set. The analysis and inclusion of these new Han characters into Unicode will be a major project over the next few years.

Because Unicode *does* incorporate all Han characters used by the important and widely used CJK character sets, implementations based on Unicode will immediately be able to interoperate with data currently encoded in these character sets.

If you would like to obtain (unofficial) mappings between a number of these character sets and Unicode, you can find them by anonymous FTP on METIS.COM [140.186.33.40] in /pub/csets.

Glenn Adams