- Path: sparky!uunet!usc!howland.reston.ans.net!paladin.american.edu!gatech!concert!rutgers!cmcl2!gauss.cims.nyu.edu!mckenney
- From: mckenney@gauss.cims.nyu.edu (Alan M. McKenney)
- Newsgroups: comp.std.internat
- Subject: Re: Dumb Americans (was INTERNATIONALIZATION: JAPAN, FAR EAST)
- Summary: German has more than one "alphabetical order"
- Keywords: ISO10646 Unicode German Sorting
- Message-ID: <C0E2H4.B17@cmcl2.nyu.edu>
- Date: 5 Jan 93 16:15:02 GMT
- References: <8494@charon.cwi.nl> <1i2durINN2pj@rodan.UU.NET> <8496@charon.cwi.nl>
- Sender: notes@cmcl2.nyu.edu (Notes Person)
- Organization: Courant Institute, NYU, NY, NY, USA
- Lines: 63
- Nntp-Posting-Host: gauss.cims.nyu.edu
-
- In article <8496@charon.cwi.nl> dik@cwi.nl (Dik T. Winter) writes:
- >In article <1i2durINN2pj@rodan.UU.NET> avg@rodan.UU.NET (Vadim Antonov) writes:
- ....
-
- > > >Moreover, one question: how would you encode the German A-umlaut such that
- > > >it sorts properly (i.e. as if it is the letter combination AE)?
- > >
- > > The sorting order should be strict -- if you have two identical words
- > > with a-umlaut and ae in the middle is it the same word? If it is then
- > > ae IS a variation of a-umlaut and should always be treated as a single
- > > letter.
- > >
- >I do not think you understand. From the AVON (Amtliches Verzeichnis der
- >Ortsnetzkennzahlen) edition 1985, which gives area codes for the places in
- >Germany. The following is a selection of places mentioned ("o is o-umlaut):
- > Modautal
- > M"ockm"uhl
- > ...
- > M"ornsheim
- > Moers
- > M"ossingen
- > ...
- > M"otzingen
- > Mogendorf
- >now come up with a coding that allows this (standard German) sorting.
- [and further mentions that "o is equiv. to oe, but oe is not equiv. to
- "o.]
-
- I'm afraid that Dik Winter, as well-informed as he is, has
- oversimplified things a little. :-) There is not one sorting
- scheme used in Germany, but (at least?) two.
-
- The scheme described by Dik *is* the scheme that is used in
- telephone books (and generally for proper names?), for very good
- reasons.
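As an illustration only, here is a minimal Python sketch of the telephone-book scheme, assuming nothing more than "expand each umlaut to vowel + e, and sharp s to ss, then compare". The helper name `phonebook_key` is hypothetical. Real collation also has to deal with case, other accented letters, and tie-breaking, none of which this sketch attempts. Sorting a scrambled copy of Dik's place names with this key reproduces his AVON order.

```python
# Hypothetical helper: a simplified "telephone book" sort key.
# Each umlaut is expanded to its base vowel plus "e", and sharp s to "ss",
# before ordinary string comparison. Case, other accents, and tie-breaking
# are deliberately ignored in this sketch.
PHONEBOOK_MAP = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})

def phonebook_key(word: str) -> str:
    """Lowercase the word, then expand umlauts and sharp s."""
    return word.lower().translate(PHONEBOOK_MAP)

# The place names quoted above, deliberately shuffled.
places = ["Mogendorf", "Moers", "Möckmühl", "Mötzingen",
          "Modautal", "Mössingen", "Mörnsheim"]

for name in sorted(places, key=phonebook_key):
    print(name)
```

Because "Möckmühl" becomes "moeckmuehl", it lands after "modautal" and before "mogendorf", which is exactly where the telephone book puts it.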
-
- It is *not* the sorting that is used in any German dictionary I
- have seen. In dictionaries, a-umlaut is equivalent to a (not ae)
- for sorting purposes, o-umlaut to o, u-umlaut to u, and "sharp s"
- to ss.
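For comparison, a minimal sketch of the dictionary scheme described here, under the same simplifying assumptions: an umlaut sorts as its plain vowel and sharp s as "ss". The helper name `dictionary_key` is hypothetical. The tie-break on the raw lowercase word is also an assumption of this sketch; it places a plain vowel before its umlauted form (e.g. "Bar" before "Bär") simply because "a" has a lower code point than "ä".

```python
# Hypothetical helper: a simplified "dictionary" sort key.
# Primary key: umlauts folded to the plain vowel, sharp s to "ss".
# Secondary key (an assumption of this sketch): the lowercase word itself,
# which breaks ties such as "Bar" vs "Bär" in favour of the plain vowel.
DICTIONARY_MAP = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})

def dictionary_key(word: str) -> tuple[str, str]:
    lowered = word.lower()
    return (lowered.translate(DICTIONARY_MAP), lowered)

# The same place names as in Dik's telephone-book list, shuffled.
places = ["Mogendorf", "Moers", "Möckmühl", "Mötzingen",
          "Modautal", "Mössingen", "Mörnsheim"]

for name in sorted(places, key=dictionary_key):
    print(name)
```

Under this key "Möckmühl" folds to "mockmuhl" and moves to the front of the list, ahead of "Modautal", so the same names come out in a different order than in the telephone book.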
-
- This further complicates Vadim Antonov's scheme for having character
- codes encode sorting order: we would have to somehow specify that
- under certain circumstances "a sorts one way, and under others
- it sorts another.
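One way to picture the problem is a minimal sketch in which the character code for "ö" stays fixed and the sort order is supplied as a separately selectable table. The names `SCHEMES` and `sort_key` and the scheme labels are hypothetical, and both tables are the simplified ones assumed above. The same two names swap places depending on which table is chosen, so no single code value for "ö" can encode both orders.

```python
# Hypothetical sketch: the encoding of "ö" never changes; only the
# selected collation table decides whether it sorts as "oe" or as "o".
SCHEMES = {
    "phonebook": str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}),
    "dictionary": str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"}),
}

def sort_key(word: str, scheme: str) -> str:
    """Fold the word according to the chosen (simplified) scheme."""
    return word.lower().translate(SCHEMES[scheme])

pair = ["Modautal", "Möckmühl"]
for scheme in SCHEMES:
    # The lambda is used immediately by sorted(), so it sees the current scheme.
    print(scheme, sorted(pair, key=lambda w: sort_key(w, scheme)))
```

With the telephone-book table "Modautal" comes first ("mod" < "moe"); with the dictionary table "Möckmühl" comes first ("moc" < "mod").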
-
- I even seem to recall a third scheme, but I don't remember what
- it was, and it could have been something only used by users
- of English-"speaking" computers.
-
-
- (However, I am inclined to think that totally different alphabets,
- e.g., the Cyrillics and the Latins, should be represented separately.
- I am not familiar with 10646, but some posts here have suggested
- that 10646 uses the same code for all letters that look like, e.g.,
- T -- I can understand someone not being happy with that, if it is
- true. However, I don't have much experience in producing a unified
- character set.)
-
-
- --
- Alan McKenney E-mail: mckenney@cims.nyu.edu (INTERNET)
- Courant Institute,NYU,USA ...!cmcl2!cims.nyu.edu!mckenney (UUCP)
-