- Path: sparky!uunet!wupost!gumby!yale!mintaka.lcs.mit.edu!ai-lab!wheat-chex!glenn
- From: glenn@wheat-chex.ai.mit.edu (Glenn A. Adams)
- Newsgroups: comp.std.internat
- Subject: Re: Dumb Americans (was INTERNATIONALIZATION: JAPAN, FAR EAST)
- Date: 5 Jan 1993 03:41:46 GMT
- Organization: MIT Artificial Intelligence Laboratory
- Lines: 34
- Message-ID: <1ib01qINNfaf@life.ai.mit.edu>
- References: <1993Jan1.094759.8021@fcom.cc.utah.edu> <1i2k09INN4hl@rodan.UU.NET> <id.E1FW.PX5@ferranti.com>
- NNTP-Posting-Host: wheat-chex.ai.mit.edu
- Keywords: Han Kanji Katakana Hiragana ISO10646 Unicode Codepages
-
- In article <id.E1FW.PX5@ferranti.com> peter@ferranti.com (peter da silva) writes:
- >You have identified two problems with Unicode and ISO 10646: case conversion
- >and lexical ordering.
-
- I do not agree. I believe that Vadim thought that similar glyphs were
- unified irrespective of script (e.g., Latin T vs Cyrillic T). Since
- this is incorrect, his claim regarding case conversion is unfounded
- (though I must admit there are case conversion problems which I didn't
- see mentioned, e.g., Turkish i/DOTTED CAPITAL I -- this is an issue,
- though a somewhat different one from what was claimed).
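-
To make the Turkish case concrete, here is a minimal Python sketch. The helper names `upper_for_language` and `lower_for_language` and the `TURKIC` language set are hypothetical, introduced only for illustration; the point is that the correct mapping depends on the language of the text, which a character code alone does not carry.

```python
# Sketch only: language-sensitive case mapping for the Turkish i problem.
# Helper names and the TURKIC set are illustrative assumptions.

DOTTED_CAPITAL_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = "\u0131"   # LATIN SMALL LETTER DOTLESS I
TURKIC = {"tr", "az"}        # languages assumed to pair i/İ and ı/I


def upper_for_language(text, lang):
    """Uppercase text, pairing i with the dotted capital in Turkic languages."""
    if lang in TURKIC:
        # Map i explicitly; the default mapping would produce plain I.
        text = text.replace("i", DOTTED_CAPITAL_I)
    return text.upper()


def lower_for_language(text, lang):
    """Lowercase text, pairing I with the dotless small i in Turkic languages."""
    if lang in TURKIC:
        # Handle the dotted capital first, then send plain I to dotless i.
        text = text.replace(DOTTED_CAPITAL_I, "i").replace("I", DOTLESS_SMALL_I)
    return text.lower()


if __name__ == "__main__":
    print(upper_for_language("istanbul", "en"))
    print(upper_for_language("istanbul", "tr"))
    print(lower_for_language("ISI", "tr"))
```

The same input string yields different results depending on the `lang` argument, so the information needed for correct case conversion has to come from outside the character set.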
-
- As for lexical ordering, no character set can solve this problem
- unless it is defined for use with a single writing system. A universal
- character set abstracts the differences between writing systems
- (i.e., languages) by encoding scripts; thus no universal character
- set which encodes scripts can simultaneously define all requisite
- lexical orderings (though it might choose one ordering arbitrarily).
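-
As a rough illustration of the ordering point, the Python sketch below sorts the same three strings three ways: by raw code point (the one ordering a character set gets "for free"), by a simplified Swedish rule, and by a simplified German dictionary rule. The key functions `swedish_key` and `german_key` are hypothetical and deliberately incomplete; real collation rules are considerably more involved.

```python
# Sketch only: one set of code points, several language-specific orderings.
# Both key functions are simplified assumptions, not complete collation rules.

# Swedish treats a-ring, a-umlaut, o-umlaut as separate letters after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyz\u00e5\u00e4\u00f6"

# German dictionary order (simplified): umlauts sort with their base letters.
GERMAN_FOLD = str.maketrans(
    {"\u00e4": "a", "\u00f6": "o", "\u00fc": "u", "\u00df": "ss"}
)


def swedish_key(word):
    """Rank each letter by its position in the Swedish alphabet."""
    return [
        SWEDISH_ALPHABET.index(ch)
        if ch in SWEDISH_ALPHABET
        else len(SWEDISH_ALPHABET) + ord(ch)  # unknown characters sort last
        for ch in word
    ]


def german_key(word):
    """Compare on the folded form first, then on the original as a tiebreak."""
    return (word.translate(GERMAN_FOLD), word)


words = ["zebra", "\u00e4rger", "arm"]  # second word begins with a-umlaut

by_code_point = sorted(words)
by_swedish = sorted(words, key=swedish_key)
by_german = sorted(words, key=german_key)

if __name__ == "__main__":
    print(by_code_point)
    print(by_swedish)
    print(by_german)
```

Here the code point order happens to agree with the Swedish rule and disagree with the German one. Whatever order the code points are assigned in, it can match at most one such convention, which is why ordering belongs in a layer above the character set.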
-
- Consequently, I would say that these are not problems with Unicode
- or 10646; instead, they are problems having to do with text processing
- in general. No universal character set which unifies scripts
- can solve these problems; furthermore, I would argue that a universal
- character set which does not unify scripts (i.e., one which encodes
- writing systems directly) will be not only hopelessly inefficient
- (because of the much, much larger encoding space required), but also
- hopelessly incomplete (because new writing systems are being created
- all the time).
-
- Unicode made the correct choice by encoding scripts independently
- of writing system (language/orthography); it also made the correct
- choice in determining that the problem of lexical ordering is a
- higher-level problem, not to be solved by a character set.
-
- Glenn Adams
-