- Newsgroups: comp.std.internat
- Path: sparky!uunet!psinntp!ficc!peter
- From: peter@ferranti.com (peter da silva)
- Subject: Re: Dumb Americans (was INTERNATIONALIZATION: JAPAN, FAR EAST)
- Message-ID: <id.N4FW.SAC@ferranti.com>
- Keywords: ISO10646 Unicode
- Organization: Xenix Support, FICC
- References: <1i13rrINNars@rodan.UU.NET> <id.68CW.A16@ferranti.com> <1i2m57INN4vr@rodan.UU.NET>
- Date: Mon, 4 Jan 1993 17:35:38 GMT
- Lines: 43
-
- In article <1i2m57INN4vr@rodan.UU.NET> avg@rodan.UU.NET (Vadim Antonov) writes:
- > In article <id.68CW.A16@ferranti.com> peter@ferranti.com (peter da silva) writes:
- > >In article <1i13rrINNars@rodan.UU.NET> avg@rodan.UU.NET (Vadim Antonov) writes:
- > >> We were talking about lexicographical sorting, not about phonetics.
-
- > >But lexicographic sorting (actually, lexicographic ordering) is a minor part of
- > >this. Most sorting computers do is algorithmic ordering, to optimise some
- > >combination of operations on data structures (searching, for example). The
- > >character set is irrelevant there.
-
- > Wrong-o. Nobody does numerical sorts since invention of secondary
- > indices.
-
- I'm afraid you'll have to translate this sentence. It parses as valid English,
- and uses appropriate sentence structure and terminology for the context, but
- seems almost completely irrelevant to anything I said. For efficient lookup
- the index needs to be ordered in some fashion whether it's a flat table or a
- tree. Unless you find hashing adequate for all possible purposes, perhaps?
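
To make the point about ordered indices concrete, here is a minimal Python sketch (the key list and the `contains` helper are hypothetical, chosen only for illustration). The index is sorted by raw code point, which is not dictionary order in any language, yet binary search works because all it needs is some consistent total order.

```python
import bisect

# Keys sorted by raw code point: "Zebra" lands before "apple" and the
# o-umlaut word lands after "zoo". No human would call this alphabetical.
keys = sorted(["apple", "Zebra", "\u00f6l", "zoo", "banana"])


def contains(index, key):
    """Binary search in a sorted list; True if key is present."""
    i = bisect.bisect_left(index, key)
    return i < len(index) and index[i] == key


if __name__ == "__main__":
    print(keys)
    print(contains(keys, "\u00f6l"), contains(keys, "pear"))
```

The same lookup would work with any other consistent comparison; the choice of ordering only starts to matter when the sorted result is shown to a person.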
-
- > The problem is not in searching -- the problem is in presenting
- > the information and in regular expressions ([a-z] - does it include "o?)
-
- No. The regular expression '[a-z]' is a side effect of the fact that ASCII
- happens to be in numerical order for the base alphanumeric characters used
- in English computer text. It's invalid for EBCDIC, for example. The POSIX
- alternative for what you *mean* here is something like '[:lower:]', and I
- would hope that in the long term this will be extended to specify localization
- information, for example '[:lower/english/usa:]' so that it would allow
- loan words like clich'e or na"ive, or names like 'da Silva' (with a non
- blank space between the 'a' and 'S').
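
A rough Python sketch of the difference between a code-range test and a class-membership test follows. Two assumptions are worth stating loudly: Python's `re` module does not implement POSIX classes (in a POSIX pattern the class would be written inside a bracket expression, as `[[:lower:]]`), so `str.islower()` stands in for "ask the character database"; and the `cp037` codec is used as one representative EBCDIC code page. The helper names are made up for the example.

```python
import re

ascii_range = re.compile(r"[a-z]+")


def all_lower_by_range(word):
    """True only if every character falls in the code range a..z."""
    return ascii_range.fullmatch(word) is not None


def all_lower_by_class(word):
    """Stand-in for a class test: uses Unicode letter and case properties."""
    return word.isalpha() and word.islower()


# In EBCDIC the lowercase letters are not contiguous, so the byte range
# from 'a' to 'z' covers more than 26 code positions.
ebcdic_a = "a".encode("cp037")[0]
ebcdic_z = "z".encode("cp037")[0]
ebcdic_span = ebcdic_z - ebcdic_a + 1

if __name__ == "__main__":
    for w in ["cliche", "clich\u00e9", "na\u00efve", "HELLO"]:
        print(w, all_lower_by_range(w), all_lower_by_class(w))
    print("EBCDIC a..z spans", ebcdic_span, "code positions")
```

The range test rejects the accented loan words while the class test accepts them, and the EBCDIC span shows why `a-z` as a numeric range is an accident of ASCII rather than a definition of "lowercase letter".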
-
- Sure, it's a mouthful, so you'd do this:
-
- setenv LOWER '[:lower/english/usa:]'
-
- You need to do that for scripts, anyway, since you want your program to
- continue to work when it's downloaded from some site in Finland and used
- in London or Beirut.
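
The same idea, sketched in Python rather than csh: the script takes its notion of "lowercase" from one named setting instead of whatever the host happens to provide. This is only an illustration. The `[:lower/english/usa:]` syntax above is a proposal that no regex engine is assumed to support, so an ordinary bracket expression stands in for it, and `DEFAULT_LOWER` and `lower_pattern` are hypothetical names.

```python
import os
import re

# Hypothetical default: the character class this script was written against.
DEFAULT_LOWER = "[a-z]"


def lower_pattern(environ=os.environ):
    """Build a matcher from LOWER if it is set, else from the script's default."""
    char_class = environ.get("LOWER", DEFAULT_LOWER)
    return re.compile(char_class + "+")


if __name__ == "__main__":
    pinned = lower_pattern({"LOWER": "[a-z\u00e9]"})
    print(pinned.fullmatch("clich\u00e9") is not None)
```

Passing the environment in as a parameter keeps the behaviour explicit: the script matches the same strings wherever it runs unless someone deliberately sets `LOWER`.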
- --
- Peter da Silva `-_-'
- Ferranti International Controls Corporation 'U`
- Sugar Land, TX 77487-5012 USA
- +1 713 274 5180 "Zure otsoa besarkatu al duzu gaur?"
-