- Path: sparky!uunet!europa.asd.contel.com!emory!gatech!enterpoop.mit.edu!mintaka.lcs.mit.edu!ai-lab!wheat-chex!glenn
- From: glenn@wheat-chex.ai.mit.edu (Glenn A. Adams)
- Newsgroups: comp.std.internat
- Subject: Unicode Han Characters [was Re: Language tagging]
- Date: 6 Jan 1993 17:59:12 GMT
- Organization: MIT Artificial Intelligence Laboratory
- Lines: 67
- Message-ID: <1if6lgINN6ri@life.ai.mit.edu>
- References: <2609@titccy.cc.titech.ac.jp> <1iav6tINNee2@life.ai.mit.edu> <MELBY.93Jan6113951@dove.yk.fujitsu.co.jp>
- NNTP-Posting-Host: wheat-chex.ai.mit.edu
- Keywords: Unicode ISO10646 Hanzi Kanji Hanja
-
- In article <MELBY.93Jan6113951@dove.yk.fujitsu.co.jp> melby@dove.yk.fujitsu.co.jp (John B. Melby) writes:
- >If what I have heard about the Unicode standard is accurate, Chinese
- >simplified forms are not distinguished from Chinese unsimplified forms
- >when they are effectively equivalent.
-
- What you have heard is inaccurate. Simplified Chinese forms are encoded
- separately from their traditional counterparts.
-
- >Of course, there is one major flaw in the preliminary Unicode version: the
- >Japanese simplified form of "sakura" (ying1hua1 de ying1) is not included.
-
- I'm not sure which "preliminary" Unicode version you are referring to;
- however, Unicode does have at U+6A31 the character ying1hua1 de ying1.
- Is the simplified Japanese form you refer to contained in JISX0208 or
- JISX0212? If it is, then it is in Unicode 1.0. If it isn't, a second
- level of Han characters is now being formulated by the CJK-JRG for inclusion
- in a future version of Unicode. The initial collection of Han characters in
- Unicode comprises the Unified Repertoire and Ordering 2.0 produced by
- CJK-JRG (CJK Joint Research Group), currently headed by Kato Shigenobu
- of Toppan Printing, Japan, with delegation leaders Dominic Cheng (Hong
- Kong Information Technologies Federation); Miyazawa Akira (NACSIS),
- Japan; Su Liang (Mitac, Taipei Computer Association), Taiwan; Oh Young-Taik
- (Korean Bureau of Standards); Zhang Zhoucai (Center for Computer Information
- Development Research, Ministry of Machinery and Electronic Industry), China.
- The members of the CJK-JRG editorial subcommittee who were responsible
- for final review and verification of URO 2.0 were Lee Collins (Taligent), USA;
- Kao Tien-Cheu (Institute for Information Industry), Taiwan; Koike Tateo
- (Hitachi), Japan; Prof. Lee Choon Tack (Kongju National University), Korea;
- and Zhang Zhoucai (CCID/MMEI), China.
-
- As is quite clear from this list, the Han characters in Unicode (10646)
- were hardly the work of USA members alone. An extraordinary amount
- of expertise, predominantly from the CJK countries themselves, went into
- the development of URO 2.0. As I mentioned above, the work of CJK-JRG
- is not complete, as a subsequent collection of characters is now being
- developed for inclusion into the Unicode/10646 standard.
-
- The first collection of Han characters currently included in Unicode
- covers the most important, and the vast majority, of existing CJK character
- sets. It includes all characters from GB 2312-80, GB 12345-90,
- GB 8565-89, CNS 11643 (planes 1 & 2), JIS X 0208-1980, JIS X 0212-1990,
- KS C 5601-1989, and KS C 5657-1991. In addition, some characters of the
- unsimplified form of GB 7589-87, the unsimplified form of GB 7590-87,
- CNS 11643 (plane 14), the old Chinese telegraph code, and unique characters
- from ANSI Z39.64-1989 were included. If two characters are distinct
- in any character set in the first list above, then they are distinct
- in Unicode -- this is called the Source Set Separation rule.
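-
- One way to read the Source Set Separation rule is as a check on each
- source-to-Unicode mapping table: no two distinct source codes may map
- to the same Unicode code point. The following is only a minimal Python
- sketch of that check. The table format (a dict from source code to code
- point), the function name, and the placeholder source codes "A" and "B"
- are assumptions made for illustration; they are not real JIS or GB
- values and are not taken from the URO tables.

```python
from collections import defaultdict

def separation_violations(table):
    """Return {code_point: [source codes]} for every Unicode code point
    that more than one source code maps to. An empty result means the
    table keeps all source characters distinct."""
    hits = defaultdict(list)
    for source_code, code_point in table.items():
        hits[code_point].append(source_code)
    return {cp: sorted(srcs) for cp, srcs in hits.items() if len(srcs) > 1}

# Hypothetical tables; the keys are placeholders, not real source codes.
ok_table = {"A": 0x6AFB, "B": 0x685C}   # two source characters, two code points
bad_table = {"A": 0x6AFB, "B": 0x6AFB}  # two source characters merged into one

print(separation_violations(ok_table))
print(separation_violations(bad_table))
```

- A table that yields a non-empty result would merge characters that its
- source set keeps apart, which is what the rule forbids.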
-
- The second collection of Han characters now being considered by CJK-JRG
- includes those which were not encoded by any of the above-mentioned
- character sets and those which are not currently encoded by any character
- set. The analysis and inclusion of these new Han characters into
- Unicode will be a major project over the next few years.
-
- Because Unicode *does* incorporate all Han characters used by the important
- and widely used CJK character sets, implementations based on Unicode will
- immediately be able to interoperate with data currently encoded in these
- character sets.
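-
- As a rough illustration of that interoperability, the sketch below
- round-trips two characters through legacy encodings. It uses Python's
- built-in "gb2312" and "euc_jp" codecs, which are an assumption of this
- example: they are modern implementations, not the mapping tables
- mentioned in this post, and their coverage may differ in detail from
- the 1980 and 1990 editions of the standards. U+6A31 is the character
- discussed above; U+685C is included here as the Japanese simplified form.

```python
def round_trips(ch, codec):
    """True if ch survives encoding to the legacy codec and back."""
    try:
        return ch.encode(codec).decode(codec) == ch
    except UnicodeError:
        # The codec has no mapping for this character.
        return False

samples = [
    ("\u6a31", "gb2312"),  # simplified Chinese form, via a GB 2312 codec
    ("\u685c", "euc_jp"),  # Japanese simplified form, via a JIS-based codec
]

for ch, codec in samples:
    print(f"U+{ord(ch):04X} via {codec}: {round_trips(ch, codec)}")
```

- A character with no mapping in the chosen codec simply reports False
- rather than raising an error.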
-
- If you would like to obtain (unofficial) mappings between a number of
- these character sets and Unicode, you can find them by anonymous FTP
- on METIS.COM [140.186.33.40] in /pub/csets.
-
- Glenn Adams
-
-
-