- Path: sparky!uunet!think.com!enterpoop.mit.edu!mintaka.lcs.mit.edu!ai-lab!wheat-chex!glenn
- From: glenn@wheat-chex.ai.mit.edu (Glenn A. Adams)
- Newsgroups: comp.std.internat
- Subject: Re: Dumb Americans (was INTERNATIONALIZATION: JAPAN, FAR EAST)
- Date: 9 Jan 1993 18:23:18 GMT
- Organization: MIT Artificial Intelligence Laboratory
- Lines: 42
- Message-ID: <1in56mINNnhq@life.ai.mit.edu>
- References: <1993Jan7.065611.15193@fcom.cc.utah.edu> <1993Jan8.094119.6795@prl.dec.com> <1993Jan9.031217.27425@fcom.cc.utah.edu>
- NNTP-Posting-Host: wheat-chex.ai.mit.edu
- Keywords: Han Kanji Katakana Hirugana ISO10646 Unicode Codepages
-
- In article <1993Jan9.031217.27425@fcom.cc.utah.edu> terry@cs.weber.edu (A Wizard of Earth C) writes:
- >In article <1993Jan8.094119.6795@prl.dec.com> boyd@prl.dec.com (Boyd Roberts) writes:
- >>Tell me how I sort on stroke count in Unicode without ``locale-specific
- >>algorithms or hard-coded data''?
-
- If it is Han characters that are being sorted, then Unicode already orders
- them by KangXi radical (214 in all) and then by additional stroke count.
- Characters with simplified radicals immediately follow those with the
- traditional radical. Remaining ties are broken by consulting, in order of
- precedence, KangXi Zidian, Dai Kanwa Ziten, Hanyu Da Zidian, and Dae
- Jaweon. To quote the relevant passage from the Unicode standard,
- volume 2 (p. 14):
-
- "When a character is found in the KangXi Zidian, it follows the KangXi
- Zidian order. When it is not found in the KangXi Zidian and is found in
- Dai Kanwa Ziten, it is given a position extrapolated from the KangXi
- position of the preceding character in Dai Kanwa Ziten. When it is not
- found in either KangXi or Dai Kanwa, Hanyu Da Zidian and Dae Jaweon
- dictionaries are consulted in a similar manner."
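-
- As a rough illustration of what this ordering buys you, the sketch below
- sorts a handful of Han characters by code point alone. It is not taken
- from the standard; the sample characters and the helper name
- code_point_order are my own choices, and the radical numbers in the
- comments are the usual KangXi assignments.

```python
# Sample Han characters, deliberately shuffled.
# Radical numbers in the comments are the usual KangXi assignments.
samples = [
    "\u6c34",  # U+6C34 water, radical 85
    "\u4e00",  # U+4E00 one, radical 1
    "\u5c71",  # U+5C71 mountain, radical 46
    "\u4eba",  # U+4EBA person, radical 9
    "\u6728",  # U+6728 tree, radical 75
]


def code_point_order(chars):
    """Sort characters by Unicode code point value (hypothetical helper)."""
    return sorted(chars, key=ord)


ordered = code_point_order(samples)
print([f"U+{ord(c):04X}" for c in ordered])
```

- For these five characters, plain code point order comes out in radical
- order (1, 9, 46, 75, 85) with no auxiliary table at all.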
-
- Total stroke-count, four-corner, and similar orderings are typically
- used only for input mechanisms. Transforming Unicode Han encodings into
- any of various weighted indexes for sorting is trivial: a simple table
- lookup suffices.
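-
- A minimal sketch of such a table lookup, assuming a total-stroke-count
- table keyed by code point. The table here is a tiny hand-made excerpt,
- and the names TOTAL_STROKES, stroke_key, and sort_by_strokes are
- hypothetical; a real table would cover every character of interest.

```python
# Hypothetical excerpt of a total-stroke-count table, keyed by code point.
TOTAL_STROKES = {
    0x4EBA: 2,   # person
    0x53E3: 3,   # mouth
    0x6728: 4,   # tree
    0x611B: 13,  # love
    0x8A9E: 14,  # language
}


def stroke_key(ch):
    """Weighted sort key: total strokes first, code point as tie-breaker."""
    cp = ord(ch)
    # Characters missing from the table sort after all known ones.
    return (TOTAL_STROKES.get(cp, float("inf")), cp)


def sort_by_strokes(chars):
    """Sort characters by the table-derived key instead of raw code point."""
    return sorted(chars, key=stroke_key)


shuffled = ["\u8a9e", "\u611b", "\u6728", "\u53e3", "\u4eba"]
by_strokes = sort_by_strokes(shuffled)
by_code_point = sorted(shuffled, key=ord)
```

- Here by_strokes places U+6728 (4 strokes) before U+611B (13 strokes),
- whereas by_code_point places U+611B first. Swapping in a four-corner or
- other table changes the order without touching the encoded text.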
-
- I have a Unicode Kanji Database available for anyone who is interested.
- It contains the following information:
-
- Unicode to JISX0208 mappings.
- Onyomi pronunciation(s), in katakana, of each Unicode Kanji.
- Kunyomi pronunciation(s), in hiragana, of each Unicode Kanji.
- Romaji for each on/kun kana reading (yomi).
- Radical, total stroke count, and added stroke count of each Unicode Kanji.
-
- Note: only those Unicode Han characters that are also in JISX0208 are
- present in this database.
-
- Please send mail to <ucjis-request@metis.com> if you would like further
- information.
-
- Glenn Adams
-