NetNews Usenet Archive 1993 #1

Internet Message Format | 1993-01-05 | 2.3 KB

Path: sparky!uunet!wupost!gumby!yale!mintaka.lcs.mit.edu!ai-lab!wheat-chex!glenn
From: glenn@wheat-chex.ai.mit.edu (Glenn A. Adams)
Newsgroups: comp.std.internat
Subject: Re: Dumb Americans (was INTERNATIONALIZATION: JAPAN, FAR EAST)
Date: 5 Jan 1993 03:41:46 GMT
Organization: MIT Artificial Intelligence Laboratory
Lines: 34
Message-ID: <1ib01qINNfaf@life.ai.mit.edu>
References: <1993Jan1.094759.8021@fcom.cc.utah.edu> <1i2k09INN4hl@rodan.UU.NET> <id.E1FW.PX5@ferranti.com>
NNTP-Posting-Host: wheat-chex.ai.mit.edu
Keywords: Han Kanji Katakana Hirugana ISO10646 Unicode Codepages

In article <id.E1FW.PX5@ferranti.com> peter@ferranti.com (peter da silva) writes:
>You have identified two problems with Unicode and ISO 10646: case conversion
>and lexical ordering.

I do not agree. I believe that Vadim thought that similar glyphs were
unified irrespective of script (e.g., Latin T vs. Cyrillic T). Since this
is incorrect, his claim regarding case conversion is unfounded. (I must
admit there are case conversion problems which I didn't see mentioned,
e.g., Turkish i/DOTTED CAPITAL I; this is a real issue, though a bit
different from what was claimed.)

As for lexical ordering, no character set can solve this problem unless
it is defined for use with a single writing system. A universal character
set abstracts away the differences between writing systems (i.e.,
languages) by encoding scripts; thus no universal character set which
encodes scripts can simultaneously define all requisite lexical orderings
(though it might choose one ordering arbitrarily).

Consequently, I would say that these are not problems with Unicode or
10646; instead, they are problems having to do with text processing in
general.
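
The two case conversion points above can be illustrated with a minimal sketch. It assumes Python 3 and its standard unicodedata module, neither of which appears in the original post; describe() and turkish_upper() are hypothetical helper names introduced only for this illustration. The first part shows that Latin T and Cyrillic T are separate characters whose case mappings stay within their own script; the second shows that a language-independent case mapping does not give the Turkish result, so the tailoring has to be supplied above the character set.

```python
import unicodedata

LATIN_T = "\u0054"      # LATIN CAPITAL LETTER T
CYRILLIC_TE = "\u0422"  # CYRILLIC CAPITAL LETTER TE (looks like Latin T)

def describe(ch):
    """Return (code point, character name, code point of the lowercase form)."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), f"U+{ord(ch.lower()):04X}")

# Similar glyphs, but distinct characters with distinct lowercase forms.
print(describe(LATIN_T))
print(describe(CYRILLIC_TE))

def turkish_upper(text):
    """Hypothetical helper: uppercase using the Turkish pairing i/İ and ı/I.

    The tailoring is applied before the default mapping, because the
    default mapping pairs i with I regardless of language.
    """
    return text.replace("i", "\u0130").replace("\u0131", "I").upper()

word = "diyarbak\u0131r"    # contains both dotted i and dotless ı
print(word.upper())         # default mapping: both letters become I
print(turkish_upper(word))  # Turkish mapping: İ and I stay distinct
```

The Turkish case is a language-specific rule layered on top of the encoding, which is a different matter from two scripts sharing one code for look-alike letters.
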
And no universal character set which unifies scripts can solve these
problems. Furthermore, I would argue that a universal character set which
does not unify scripts (i.e., one that encodes writing systems directly)
will be not only hopelessly inefficient (because of the much, much larger
encoding space required) but also hopelessly incomplete (because new
writing systems are being created all the time).

Unicode made the correct choice by encoding scripts independently of
writing system (language/orthography); it also made the correct choice in
determining that the problem of lexical ordering is a higher-level
problem, not to be solved by a character set.

Glenn Adams
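
To illustrate why lexical ordering belongs to a layer above the character set, here is a minimal sketch, again assuming Python 3 (not part of the original post). The word list, strip_marks_key(), swedish_key(), and SWEDISH_ALPHABET are hypothetical names chosen for the example, and both sort keys are deliberate simplifications of real collation rules. The same four strings are sorted three ways: by raw code point (the one ordering a character set might choose arbitrarily), by a simplified German-style dictionary order that ignores diacritics, and by the Swedish alphabet, in which å, ä, and ö follow z.

```python
import unicodedata

WORDS = ["zebra", "äpple", "apfel", "ånger"]

def strip_marks_key(word):
    """Sort key that ignores diacritics (simplified German-style dictionary order)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

# Swedish alphabet: å, ä, ö are separate letters placed after z.
SWEDISH_ALPHABET = unicodedata.normalize("NFC", "abcdefghijklmnopqrstuvwxyzåäö")

def swedish_key(word):
    """Sort key giving each letter its position in the Swedish alphabet.

    Only handles lowercase letters from SWEDISH_ALPHABET; anything else
    raises ValueError, which is acceptable for a sketch.
    """
    composed = unicodedata.normalize("NFC", word)
    return [SWEDISH_ALPHABET.index(c) for c in composed]

by_code_point = sorted(WORDS)                     # order fixed by the encoding
by_german = sorted(WORDS, key=strip_marks_key)    # ä sorts with a
by_swedish = sorted(WORDS, key=swedish_key)       # å, ä sort after z

print(by_code_point)
print(by_german)
print(by_swedish)
```

All three results differ even though the encoded characters are identical, so no single ordering built into the character set could serve both languages.
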