NetNews Usenet Archive 1993 #3

Internet Message Format | 1993-01-22 | 1.8 KB

Path: sparky!uunet!cs.utexas.edu!sun-barr!sh.wide!wnoc-tyo-news!cs.titech!titccy.cc.titech!necom830!mohta
From: mohta@necom830.cc.titech.ac.jp (Masataka Ohta)
Newsgroups: comp.std.internat
Subject: Re: Radicals Instead of Characters
Message-ID: <2791@titccy.cc.titech.ac.jp>
Date: 22 Jan 93 09:44:54 GMT
References: <1j8kroINNf59@flop.ENGR.ORST.EDU> <ISHIKAWA.93Jan18203811@ds5200.personal-media.co.jp> <1j9sfpINN46t@life.ai.mit.edu> <1jfgq1INNqmn@flop.ENGR.ORST.EDU>
Sender: news@titccy.cc.titech.ac.jp
Organization: Tokyo Institute of Technology
Lines: 36

In article <1jfgq1INNqmn@flop.ENGR.ORST.EDU> crowl@jade.CS.ORST.EDU (Lawrence Crowl) writes:
>The question I was asking was "can you _identify_ a han/kanji character
>based on a sequence of radicals"

No, you can't. Radicals are for indexing only. The rest of the
character has its own complex shape.

>and "would it be reasonable to encode
>han/kanji on that basis".

Such an encoding is too lengthy.

>Agreed. However, there is no natural size for tables. Table sizes of
>4000 are much cheaper than table sizes of 64000.

If you use radical-based encoding, it makes everything complex.
Moreover, you will have to have sixteen 4000-entry tables, which
together are as large as a single 64000-entry table.

>But, can sixteen bits represent _all_ historical Han characters _and_
>the historical texts of all other languages? My guess is 16 bits can
>_if_ Han characters are coded as radicals,

Maybe, maybe not. Many complex Han characters are just unique.

>If the level 1 Han characters were also coded as radicals where
>possible, you'd have a coding system like what I was proposing. Of
>course, the characters might be several radicals long.

BTW, from the viewpoint of programmers, combining characters are just
unusable.

Masataka Ohta
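Ohta's point about "sixteen 4000-entry tables" can be made concrete with a small sketch. The class below is a hypothetical two-level lookup table; `TwoLevelTable`, `SUBTABLE_BITS` and the other names are illustrative, not from the post. It assumes 4096-entry subtables rather than 4000 so that a 16-bit code splits evenly into a 4-bit table selector and a 12-bit index. Subtables are allocated only when first written, so splitting saves space only while some subtables stay empty.

```python
# Hypothetical two-level lookup table for 16-bit character codes.
# The top 4 bits pick one of 16 subtables; the low 12 bits index into it.
from typing import List, Optional, Tuple

SUBTABLE_BITS = 12                          # low bits used inside a subtable
SUBTABLE_SIZE = 1 << SUBTABLE_BITS          # 4096 slots per subtable
NUM_SUBTABLES = 1 << (16 - SUBTABLE_BITS)   # 16 subtables cover 2**16 codes


class TwoLevelTable:
    """Maps 16-bit codes to arbitrary string payloads (e.g. glyph names)."""

    def __init__(self) -> None:
        # None marks a subtable that has not been allocated yet.
        self.subtables: List[Optional[List[Optional[str]]]] = [None] * NUM_SUBTABLES

    @staticmethod
    def _split(code: int) -> Tuple[int, int]:
        # Reject anything outside the 16-bit code space.
        if not 0 <= code < (1 << 16):
            raise ValueError(f"code {code:#x} does not fit in 16 bits")
        return code >> SUBTABLE_BITS, code & (SUBTABLE_SIZE - 1)

    def set(self, code: int, value: str) -> None:
        hi, lo = self._split(code)
        if self.subtables[hi] is None:
            # Allocate the whole subtable on first use.
            self.subtables[hi] = [None] * SUBTABLE_SIZE
        self.subtables[hi][lo] = value

    def get(self, code: int) -> Optional[str]:
        hi, lo = self._split(code)
        sub = self.subtables[hi]
        return None if sub is None else sub[lo]

    def allocated_slots(self) -> int:
        """Total slots actually allocated across all subtables."""
        return sum(len(sub) for sub in self.subtables if sub is not None)


if __name__ == "__main__":
    table = TwoLevelTable()
    table.set(0x4E00, "one")                 # touches subtable 4 only
    print(table.allocated_slots())           # 4096
    for hi in range(NUM_SUBTABLES):
        table.set(hi << SUBTABLE_BITS, "x")  # touch every subtable once
    print(table.allocated_slots())           # 65536, same as a flat table
```

Storing a single entry allocates one 4096-slot subtable, but once every subtable holds at least one entry the total is 16 × 4096 = 65536 slots, exactly the size of a flat 16-bit table, which is the situation the post describes.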
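The closing remark about combining characters can be illustrated with modern Unicode, which includes combining marks. The Python sketch below is an illustration under that assumption, not anything from the 1993 discussion: the helpers `naive_length`, `base_character_count` and `split_characters` are hypothetical, and they only approximate real grapheme segmentation by treating every code point with a nonzero `unicodedata.combining()` class as part of the preceding character. A radical-based Han encoding would raise the same issues, since one character would span several code units.

```python
import unicodedata
from typing import List


def naive_length(s: str) -> int:
    """Length as most string APIs report it: one unit per code point."""
    return len(s)


def base_character_count(s: str) -> int:
    """Approximate count of visible characters: code points that are not
    combining marks. Real grapheme segmentation (UAX #29) has more rules."""
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)


def split_characters(s: str) -> List[str]:
    """Attach each combining mark to the code point before it."""
    clusters: List[str] = []
    for ch in s:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


if __name__ == "__main__":
    precomposed = "caf\u00e9"   # e-acute as one code point (U+00E9)
    decomposed = "cafe\u0301"   # e + COMBINING ACUTE ACCENT (U+0301)
    print(naive_length(precomposed), naive_length(decomposed))       # 4 5
    print(base_character_count(decomposed))                          # 4
    print(precomposed == decomposed)                                 # False
    print(unicodedata.normalize("NFC", decomposed) == precomposed)   # True
    print(decomposed[:4])   # cafe  (slicing by code point dropped the accent)
    print(len(split_characters(decomposed)))                         # 4
```

The same visible text has two different lengths, compares unequal until it is normalized, and is silently damaged by code-point slicing; every string operation must be made cluster-aware to avoid this.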