NetNews Usenet Archive 1993 #1

Internet Message Format | 1993-01-05 | 3.9 KB

Path: sparky!uunet!enterpoop.mit.edu!ira.uka.de!fauern!uni-erlangen.de!not-for-mail
From: unrza3@cd4680fs.rrze.uni-erlangen.de (Markus Kuhn)
Newsgroups: comp.std.internat
Subject: Re: Let's develop ISO sorting rules
Date: 5 Jan 1993 18:29:49 +0100
Organization: Regionales Rechenzentrum Erlangen
Message-ID: <1icgidEINN4v3@uni-erlangen.de>
References: <8496@charon.cwi.nl> <C0Cuz5.2wy@flatlin.ka.sub.org> <1ibmdcEINNooe@uni-erlangen.de> <1993Jan5.150305.755@klaava.Helsinki.FI>
Reply-To: mskuhn@immd4.informatik.uni-erlangen.de
NNTP-Posting-Host: cd4680fs.rrze.uni-erlangen.de
Lines: 67
Keywords: Han Kanji Katakana Hirugana ISO10646 Unicode Codepages

wirzeniu@klaava.Helsinki.FI (Lars Wirzenius) writes:

>mskuhn@immd4.informatik.uni-erlangen.de writes:
>>THE SOLUTION IS SIMPLE ONCE YOU ACCEPTED THAT INCOMPATIBILITY WITH
>>EXISTING HISTORICAL RULES IS NECESSARY!

>I don't accept it. Computers have to change to please users, not the
>other way around.

The requirements of users change quickly once they see the solutions
offered by computers, especially if a new solution offered by a
computer has great advantages over the traditional methods that people
require only because they know no alternatives.

In some situations, the paradigm 'computers have to change to please
users' is a dead end, namely when the users do not agree when you first
ask them and you then implement every solution that has been explained
to you. (I have had bad experiences with
we-offer-all-possible-solutions systems recently; I just say OSI.)
Simple and efficient solutions have always pleased users. This will
also be true for many internationalization issues.

>There is no sorting order that will satisfy everybody. Thus it is not
>a good idea to embed one into the character set and make everybody use
>it.

My vision is NOT a sorting order that is embedded in the character set.
That would be too trivial, of course. The Unicode developers had good
reasons to embed one into the code table.
No, I have a slightly more clever algorithm in mind, one that makes
2 passes:

  1. Ignore punctuation etc. and group letters together before
     comparing the strings.

  2. No. 1 will not offer a total order, which should be supplied by a
     beautiful sorting standard. So if 1 fails to decide, then compare
     the strings completely without throwing any trivial information
     away.

Rule 2 must not conflict with rule 1; the partial ordering must only
be completed.

I have been playing around with an algorithm that works this way for a
few days, and the results are very promising and easy to understand
intuitively. I believe, e.g., that my method is far superior to the
complex and often non-deterministic rules in the 60-year-old German
DIN 5007 standard. In an elegant implementation, 2 passes are not
necessary, but the algorithm is easier to understand if explained with
two passes. It might even be described with 3 passes ... :-)

The method deals fine with punctuation in the strings (e.g. in
bibliographic references and person names), is pretty efficient and is
easy to implement. Word lists produced by my algorithm are pretty easy
for human eyes to scan. The solution is much more general than the
simple upcase conversion before lexicographic character code comparison
that is often used today with US-ASCII.

I still don't know whether I should post the algorithm here or whether
I should write a paper or techreport first, as it is much more
promising than all character-code-based lexical orderings that have
been proposed here so far.

No, the answer that NO internationally suitable sorting algorithm (not
only a sorted character table!) is possible is too simple in my eyes.

Markus

--
Markus Kuhn, Computer Science student -=-=- University of Erlangen, Germany
Internet: mskuhn@immd4.informatik.uni-erlangen.de | X.500 entry available
----- Anyone participating in the use of MS-DOS, Heroin or Cocaine is -----
---- simply not getting the most out of life possible. (Brian Downing) ----
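
The post does not give the algorithm itself, so the following is only
a minimal sketch of the two-pass idea it describes, not the author's
method. It assumes that "group letters together" means folding case
and accents, and that the tie-break of rule 2 is a plain code point
comparison of the untouched strings. The names primary_key and
sort_key are hypothetical.

```python
import unicodedata


def primary_key(s):
    """Pass 1 (assumed reading): drop punctuation, fold case and accents."""
    # NFKD splits an accented letter into base letter + combining mark;
    # combining marks and punctuation are not alphanumeric, so they vanish.
    decomposed = unicodedata.normalize("NFKD", s)
    return "".join(ch.casefold() for ch in decomposed if ch.isalnum())


def sort_key(s):
    """Pass 2 only breaks ties left by pass 1, so it can never contradict it."""
    # Tuples compare element by element: the original string is consulted
    # only when the primary keys are equal, which completes the partial
    # order into a total one.
    return (primary_key(s), s)


words = ["co-op", "Coop", "coop", "cop", "Müller", "Muller", "Mueller"]
print(sorted(words, key=sort_key))
```

Building one composite key per string also shows why, as the post
remarks, two separate passes are not needed in an implementation: a
single comparison of the tuples does both steps.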