NetNews Usenet Archive 1993 #3

Path: sparky!uunet!mcsun!sun4nl!cwi.nl!dik
From: dik@cwi.nl (Dik T. Winter)
Newsgroups: comp.std.internat
Subject: Re: Cleanicode
Message-ID: <8689@charon.cwi.nl>
Date: 21 Jan 93 02:34:48 GMT
References: <C138zr.r3@poel.juice.or.jp> <ISHIKAWA.93Jan20182546@ds5200.personal-media.co.jp>
Sender: news@cwi.nl
Organization: CWI, Amsterdam
Lines: 35

In article <ISHIKAWA.93Jan20182546@ds5200.personal-media.co.jp> ishikawa@personal-media.co.jp writes:
> Why not unify Latin/Cyrillic/Greek 'A'? This simple question also is
> the cause of uncomfortable feeling many Japanese programmers seem to
> have (including myself).
>
I understand that. I think some unification might be done with the LCG glyphs. But the LCG scripts have the feature that each glyph comes in two forms: majuscule and minuscule. The distinction between the two forms is very small; in many cases (e.g. sorting) it does not matter whether the majuscule or the minuscule form is used. But that breaks down unification of the Latin/Cyrillic and Greek 'A', because the Greek minuscule form is different. Still worse are examples like 'T' and 'B', where the minuscule form is different in all three scripts.

On the other hand, I do not think Unicode is consistent (I do not know for sure; when I tried to buy the book it was sold out). I think that the Turkish dotless 'I' and dotted 'I' each share half a code point with the Latin 'I' (one of them its majuscule, the other its minuscule). My preference would be three (times two) code points: Latin 'I', Turkish 'I' with dot and Turkish 'I' without dot. But I (as a westerner) understand why it is not done. It is impossible to distinguish the majuscule Latin 'I' from the Turkish majuscule dotless 'I', so separate code points would make things more difficult for the user.

On the other hand, as a programmer, I see the difficulty in doing a case-insensitive search. With the two Turkish 'I's there will be more false matches unless the language is also encoded, but again, that makes it more difficult for the user.
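The case-insensitive search difficulty can be illustrated with a minimal Python sketch, assuming Unicode default case folding as exposed by Python's `str.casefold()`. The `turkish` flag is a hypothetical stand-in for "the language is also encoded", and the names `fold` and `caseless_contains` are made up for illustration; this is not how any particular system implements it.

```python
def fold(text: str, turkish: bool = False) -> str:
    """Fold case for caseless matching (illustrative helper).

    Default: Unicode full case folding via str.casefold(), so plain
    'I' folds to 'i' and dotted capital 'İ' (U+0130) folds to 'i'
    followed by U+0307 COMBINING DOT ABOVE.
    turkish=True (hypothetical language tag): first map dotted 'İ' to
    'i' and plain 'I' to dotless 'ı' (U+0131), then fold the rest.
    """
    if turkish:
        text = text.replace("\u0130", "i").replace("I", "\u0131")
    return text.casefold()


def caseless_contains(haystack: str, needle: str, turkish: bool = False) -> bool:
    """Report whether needle occurs in haystack, ignoring case.

    Folding can change string length, so only a yes/no answer is
    returned rather than a match position.
    """
    return fold(needle, turkish) in fold(haystack, turkish)


if __name__ == "__main__":
    # Turkish 'IŞIK' is written 'ışık' in lowercase; default folding misses it.
    print(caseless_contains("IŞIK", "ışık"))                # False
    print(caseless_contains("IŞIK", "ışık", turkish=True))  # True
    # Default folding also equates Turkish capital dotless 'I' with 'i':
    # a false match, since Turkish 'KIŞ' lowercases to 'kış', not 'kiş'.
    print(caseless_contains("KIŞ", "kiş"))                  # True
    # The Turkish tailoring applied to English text loses a real match.
    print(caseless_contains("INDEX", "in"))                 # True
    print(caseless_contains("INDEX", "in", turkish=True))   # False
```

Either setting of the flag gives a wrong answer for text in the other language, which is the trade-off described here: correct caseless matching of 'I' depends on knowing the language.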
But I think that unification of those majuscule/minuscule glyph pairs that are (up to font differences) identical would make sense. This includes Latin/Cyrillic 'A/a', 'J/j' (is the latter included in Cyrillic?) and Latin/Cyrillic/Greek 'O/o'.

As I understand it, some of the problems with CJK unification are of a similar nature: while the base character is the same, there may be different simplifications. But that is only as far as I understand it.
-- 
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland
home: bovenover 215, 1025 jn amsterdam, nederland; e-mail: dik@cwi.nl