- Path: sparky!uunet!enterpoop.mit.edu!ira.uka.de!fauern!uni-erlangen.de!not-for-mail
- From: unrza3@cd4680fs.rrze.uni-erlangen.de (Markus Kuhn)
- Newsgroups: comp.std.internat
- Subject: Re: Let's develop ISO sorting rules
- Date: 5 Jan 1993 18:29:49 +0100
- Organization: Regionales Rechenzentrum Erlangen
- Message-ID: <1icgidEINN4v3@uni-erlangen.de>
- References: <8496@charon.cwi.nl> <C0Cuz5.2wy@flatlin.ka.sub.org> <1ibmdcEINNooe@uni-erlangen.de> <1993Jan5.150305.755@klaava.Helsinki.FI>
- Reply-To: mskuhn@immd4.informatik.uni-erlangen.de
- NNTP-Posting-Host: cd4680fs.rrze.uni-erlangen.de
- Lines: 67
- Keywords: Han Kanji Katakana Hiragana ISO10646 Unicode Codepages
-
- wirzeniu@klaava.Helsinki.FI (Lars Wirzenius) writes:
-
- >mskuhn@immd4.informatik.uni-erlangen.de writes:
- >>THE SOLUTION IS SIMPLE ONCE YOU ACCEPTED THAT INCOMPATIBILITY WITH
- >>EXISTING HISTORICAL RULES IS NECESSARY!
-
- >I don't accept it. Computers have to change to please users, not the
- >other way around.
-
- The requirements of users change quickly once they see the solutions
- offered by computers, especially if a new solution has great
- advantages over the traditional methods that people insist on only
- because they know no alternatives. In some situations, the paradigm
- 'computers have to change to please users' is a dead end: users often
- don't agree when you ask them the first time, and you end up
- implementing every solution that has been explained to you. (I have
- had bad experiences with we-offer-all-possible-solutions systems
- recently; I'll just say OSI.) Simple and efficient solutions have
- always pleased users. This will also be true for many
- internationalization issues.
-
- >There is no sorting order that will satisfy everybody. Thus it is not
- >a good idea to embed one into the character set and make everybody use
- >it.
-
- My vision is NOT a sorting order that is embedded in the character set.
- That would be too trivial, of course. The Unicode developers had good
- reasons to embed one into the code table. No, I have a slightly more
- clever algorithm in mind, one that works in two passes:
-
- 1. Ignore punctuation etc. and group letters together before
- comparing the strings.
-
- 2. Pass 1 does not yield a total order, which a beautiful sorting
- standard should supply. So if pass 1 cannot decide, compare the
- strings completely, without throwing any trivial information away.
- Rule 2 must not conflict with rule 1; it may only complete the
- partial ordering.
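The post does not publish the algorithm, so what follows is only a hypothetical Python sketch of the two-pass idea as described above, not Kuhn's actual method. It assumes that "ignore punctuation and group letters together" means: keep only letters and digits, strip accents, and fold case. The names `primary_key` and `compare` are illustrative, not from the original.

```python
import unicodedata
from functools import cmp_to_key


def primary_key(s: str) -> str:
    # Pass 1 (assumed interpretation): decompose accented letters (NFD),
    # keep only letters (L*) and digits (N*), which drops punctuation and
    # combining accents, then fold case so variants group together.
    decomposed = unicodedata.normalize("NFD", s)
    kept = [c for c in decomposed if unicodedata.category(c)[0] in ("L", "N")]
    return "".join(kept).casefold()


def compare(a: str, b: str) -> int:
    # Classic comparator: negative, zero or positive.
    ka, kb = primary_key(a), primary_key(b)
    if ka != kb:
        # Pass 1 decides: the strings differ in their letters.
        return -1 if ka < kb else 1
    # Pass 2: only reached on a pass-1 tie, so it refines the partial
    # order without ever reversing it. Raw code-point comparison of the
    # complete strings makes the overall order total.
    if a != b:
        return -1 if a < b else 1
    return 0


names = ["O'Brien", "Obermann", "o'brien", "OBrien"]
print(sorted(names, key=cmp_to_key(compare)))
```

With this sketch the list sorts as Obermann, O'Brien, OBrien, o'brien: pass 1 groups the three spellings of O'Brien together, and pass 2 orders them by raw code points.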
-
- I have been playing around with an algorithm that works this way for
- a few days, and the results are very promising and easy to understand
- intuitively. I believe, for example, that my method is far superior to
- the complex and often non-deterministic rules of the 60-year-old
- German DIN 5007 standard. An elegant implementation does not need two
- passes, but the algorithm is easier to understand when explained with
- two. It might even be described with three passes ... :-)
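One way to see why two separate passes are not strictly necessary is a single composite sort key. In this hypothetical sketch (not the author's algorithm), a tuple compares its first element, an assumed pass-1 key that keeps letters and digits only, strips accents and folds case, and consults the second element, the untouched string, only on a tie. That is exactly "complete the partial order, never contradict it". The names `primary_key` and `sort_key` are illustrative.

```python
import unicodedata


def primary_key(s: str) -> str:
    # Hypothetical pass-1 key: letters and digits only, accents removed
    # via NFD decomposition, case folded.
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(
        c for c in decomposed if unicodedata.category(c)[0] in ("L", "N")
    ).casefold()


def sort_key(s: str) -> tuple[str, str]:
    # Tuples compare element by element, so the original string only
    # matters when the primary keys tie: both passes in one key.
    return (primary_key(s), s)


words = ["Müller", "Muller", "MULLER", "Mueller"]
print(sorted(words, key=sort_key))
```

This yields Mueller, MULLER, Muller, Müller. The sketch treats ü simply as u plus an accent that the primary key discards; other conventions, such as expanding ü to ue, would need a different primary key.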
-
- The method deals well with punctuation in the strings (e.g. in
- bibliographic references and personal names), is quite efficient, and
- is easy to implement. Word lists produced by my algorithm are easy for
- human eyes to scan. The solution is much more general than the simple
- upper-case conversion followed by lexicographic comparison of
- character codes that is often used today with US-ASCII.
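To make the contrast with upper-case conversion concrete, here is a small Python comparison. `upcase_key` models the conventional approach (upper-case, then compare character codes); `sort_key` is a hypothetical two-level key in the spirit of the method described, assuming a primary key that keeps letters and digits only, strips accents and folds case. Neither is taken from the post; both names are illustrative.

```python
import unicodedata


def upcase_key(s: str) -> str:
    # Conventional approach: fold to upper case, then compare raw codes.
    return s.upper()


def primary_key(s: str) -> str:
    # Hypothetical pass-1 key: letters and digits only, accents removed
    # via NFD decomposition, case folded.
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(
        c for c in decomposed if unicodedata.category(c)[0] in ("L", "N")
    ).casefold()


def sort_key(s: str) -> tuple[str, str]:
    # Primary key first; the full string only breaks ties.
    return (primary_key(s), s)


names = ["Zorn", "Ärger", "ONeill", "Oakley", "O'Neill"]
print(sorted(names, key=upcase_key))
print(sorted(names, key=sort_key))
```

Upper-casing gives O'Neill, Oakley, ONeill, Zorn, Ärger: the apostrophe's code sorts before every letter, separating O'Neill from ONeill, and Ä's code lies above Z. The two-level key gives Ärger, Oakley, O'Neill, ONeill, Zorn.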
-
- I still don't know whether I should post the algorithm here or write
- a paper or tech report first, as it is much more promising than all
- the character-code-based lexical orderings that have been proposed
- here so far.
-
- No, the answer that NO internationally suitable sorting algorithm (not
- just a sorted character table!) is possible is too simple in my eyes.
-
- Markus
-
- --
- Markus Kuhn, Computer Science student -=-=- University of Erlangen, Germany
- Internet: mskuhn@immd4.informatik.uni-erlangen.de | X.500 entry available
- ----- Anyone participating in the use of MS-DOS, Heroin or Cocaine is -----
- ---- simply not getting the most out of life possible. (Brian Downing) ----
-