NetNews Usenet Archive 1993 #1

Internet Message Format | 1993-01-07 | 5.0 KB

Path: sparky!uunet!gatech!enterpoop.mit.edu!eru.mt.luth.se!lunic!sunic!corax.udac.uu.se!Riga.DoCS.UU.SE!andersa
From: andersa@Riga.DoCS.UU.SE (Anders Andersson)
Newsgroups: comp.std.internat
Subject: Re: Let's develop ISO sorting rules
Message-ID: <1ifr9sINNd0o@corax.udac.uu.se>
Date: 6 Jan 93 23:51:24 GMT
References: <1i0vnmINN352@rodan.UU.NET> <8494@charon.cwi.nl>
 <1i2durINN2pj@rodan.UU.NET> <8496@charon.cwi.nl>
 <C0Cuz5.2wy@flatlin.ka.sub.org> <1ibmdcEINNooe@uni-erlangen.de>
 <1993Jan5.222627.29561@jarvis.csri.toronto.edu>
 <1iev27EINNmc4@uni-erlangen.de>
Organization: Uppsala University, Sweden
Lines: 87
NNTP-Posting-Host: riga.docs.uu.se

In article <1iev27EINNmc4@uni-erlangen.de>,
unrza3@cd4680fs.rrze.uni-erlangen.de (Markus Kuhn) writes:
> flaps@dgp.toronto.edu (Alan J Rosenthal) writes:
> >I believe you. I'm sure that the analogous statement is true for French. But
> >I'm almost as sure that Swedish readers won't want "a-circle" to be anywhere
> >near "a". I think your flexibility on this is a language-relative phenomenon.
>
> This would be really bad news, could someone from Sweden comment on this?

I can support Alan's assumption. The three common Swedish letters that
are the issue here are _always_ sorted after Z when done here properly.
The only place where I've seen them occur elsewhere is in foreign-made
gazetteers (such as the index of a world atlas), where it wouldn't make
sense to convene readers of Swedish in particular.

This of course doesn't mean that we would be totally lost in the dark
if a foreign sorting method were forced down our throats, but I'm sure
a lot of people would complain about it if they were told it was for
the benefit of computer standardization...

I can see that A and A-ring have a common graphical component, but so
what? We consider them different. A Swedish typist might consider 'l'
and '1' equivalent, but not A and A-ring.

The situation is the same in Finland, Norway and Denmark, though the
relative positions of (and the actual glyphs used for) the three vowels
differ somewhat. When we sort Norwegian and Danish words in a Swedish
context, we usually regard their vowels as equivalent to ours based on
phonetics (i.e. Danish AE ligature == Swedish A-diaeresis). Further,
W is sorted as V, and U-diaeresis as Y.

Your suggested rule does apply to most other accented and special
letters of the Latin alphabet, though, whenever they occur in a Swedish
context (most often E-acute, and in proper names of people).

The original language of the sorted word does not matter to us (I don't
think it does to anybody); O-diaeresis comes last in the alphabet
regardless of whether it belongs to a German, Hungarian, or Turkish name.

> And the following rule may be understood by every user within less than
> 10 seconds:
>
> sort latin characters with diacritics (e.g. "a, 'a, ^a, ...) near their
> pure latin version a.

The issue is not whether the rule is understood, but whether it's
accepted. It may not turn into a political issue like the one about
nuclear power, but people generally prefer doing things the way they
have always done them.

Btw, where would you put the AE ligature? With a or with ae?

> I'd like to discuss here, where the cyrillic, greek etc. letters
> should be included. The positions that correspond to the latin letters
> used in international transcription systems might be a good starting
> point, if this is possible while preserving the ordering of e.g. the
> greek alphabet.

If you take the Latin alphabet as your frame of reference, then it's
not possible to preserve the alphabetic order of either Cyrillic
(A, B, V, G, D...) or Greek (A, B, G, D, E...) letters. However,
Cyrillic appears to have more in common with Greek than with Latin.
Maybe we should settle for Greek order? :-)

> Without it, there will terrible
> ISO 10646 sorting methods based on the Unicode code number of each character
> become common practice!!! The definition of these groups might perhaps also be
> useful for case-and-diacritic-invariant searching, because case-only-invariant
> searching is only half a solution in Unicode that was ok with ASCII.

Why bother, really? If you need a sorting order that is the same all
over the world, for the purpose of building a database where ISO 10646
strings are used as keys, then simply use code point order. We have
done that with ASCII for decades; it's not pretty, but it's probably
not intended for human consumption anyway.

If, on the other hand, the purpose is to produce a human-readable index
or something, then strive to accommodate the human, which means using
the user's preferred sorting order. If the index is to be printed once
and not changeable thereafter, and to be used by many users, decide
upon one existing natural language, and sort according to its rules
(English has worked fine for the gazetteers I've seen).

Do you have an example of where your method would be used?
-- 
Anders Andersson, Dept. of Computer Systems, Uppsala University
Paper Mail: Box 325, S-751 05 UPPSALA, Sweden
Phone: +46 18 183170   EMail: andersa@DoCS.UU.SE
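
The difference between the Swedish convention described in the post and plain code point order can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a complete collation: the names SWEDISH_ORDER, EQUIV and swedish_key are made up for the example, and the mapping of Danish/Norwegian o-slash to O-diaeresis is an assumption by analogy with the AE ligature rule (the post names only AE explicitly).

```python
import unicodedata

# Assumed Swedish ordering per the post: a-ring, a-diaeresis and o-diaeresis
# come after z. 'w' is left out because it is sorted as 'v'.
SWEDISH_ORDER = "abcdefghijklmnopqrstuvxyzåäö"
RANK = {ch: i for i, ch in enumerate(SWEDISH_ORDER)}

# Equivalences mentioned in the post: W as V, U-diaeresis as Y, Danish AE
# ligature as Swedish A-diaeresis. The 'ø' entry is an assumption.
EQUIV = {"w": "v", "ü": "y", "æ": "ä", "ø": "ö"}


def swedish_key(word):
    """Return a sort key that roughly follows the rules in the post."""
    ranks = []
    for ch in unicodedata.normalize("NFC", word.lower()):
        ch = EQUIV.get(ch, ch)
        if ch not in RANK:
            # Other accented letters (e.g. e-acute) sort with their base letter.
            ch = unicodedata.normalize("NFD", ch)[0]
        # Anything still unknown sorts after the alphabet.
        ranks.append(RANK.get(ch, len(SWEDISH_ORDER)))
    # The lowercased word is a tie-breaker for words with equal ranks.
    return (ranks, word.lower())


words = ["öl", "zebra", "ål", "apa", "ärm"]
swedish = sorted(words, key=swedish_key)
code_point = sorted(words)
```

With these five words, the Swedish key puts "ål" before "ärm", while code point order puts "ärm" first, because a-diaeresis has a lower code point than a-ring in Latin-1 and ISO 10646.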