- Xref: sparky comp.std.internat:643 soc.culture.turkish:9799 soc.culture.nordic:5580
- Newsgroups: comp.std.internat,soc.culture.turkish,soc.culture.nordic
- Path: sparky!uunet!mcsun!sunic!corax.udac.uu.se!Riga.DoCS.UU.SE!andersa
- From: andersa@Riga.DoCS.UU.SE (Anders Andersson)
- Subject: Latin unification in ISO 10646
- Message-ID: <1992Sep9.163417.8803@corax.udac.uu.se>
- Followup-To: comp.std.internat
- Sender: news@corax.udac.uu.se
- Organization: Uppsala University, Sweden
- References: <1992Sep7.195212.2614@boole.uucp> <HAAVARDF.92Sep8012952@gluon.uio.no> <TT.92Sep9114439@tarzan.jyu.fi>
- Date: Wed, 9 Sep 1992 16:34:17 GMT
- Lines: 79
-
- [Note move of thread from soc.culture.nordic to comp.std.internat, as well
- as a single hint to readers of soc.culture.turkish in case of interest.]
-
- In article <TT.92Sep9114439@tarzan.jyu.fi>, tt@tarzan.jyu.fi (Tapani Tarvainen) writes:
- > In article <1992Sep8.160511.1976@corax.udac.uu.se> andersa@Riga.DoCS.UU.SE (Anders Andersson) writes:
- > >The Turkish alphabet is different here (and more consistent, in my
- > >opinion), as it has two different vowels 'i'; one with dot and the
- > >other without (both letters of course appear in upper- and lowercase).
- > >I'm afraid even ISO 10646 fails to support them properly...
- >
- > I'm fairly sure even some less ambitious ISO character set
- > (probably 8859-n, where n>1) supports Turkish completely, including
- > the dotless i (treated as a separate character).
-
- I suggest we look at ISO 8859-3 (which I understand is the official name
- for Latin alphabet No. 3) for reference, as it claims to support Turkish.
- In the following, I'm disputing the 'completeness' of that support:
-
- Latin-3 contains, among other (mostly southern European) characters,
-
- 0xA9 capital letter I with dot above, and
- 0xB9 small letter i without dot above.
-
- Of course, these are supposed to be used in conjunction with the
- 'normal' ASCII characters of the LH part of the table, in particular
-
- 0x49 (Latin) capital letter I, and
- 0x69 (Latin) small letter i,
-
- to make up the two different kinds of 'i' used in Turkish, each in
- upper- and lowercase. From a purely typographic standpoint (a unique
- code for each visually distinguishable glyph), I consider this support
- complete.
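The typographic claim can be checked with a short Python sketch. It assumes Python 3's built-in 'iso8859_3' codec and the standard 'unicodedata' module, and the variable names are only illustrative. The sketch decodes the four Latin-3 bytes listed above and prints the Unicode name of each resulting character. Each of the four Turkish 'i' forms comes out as a different character.

```python
import unicodedata

# The four Latin-3 (ISO 8859-3) codes involved in Turkish 'i', as listed above.
TURKISH_I_BYTES = bytes([0x49, 0x69, 0xA9, 0xB9])

# Decode with Python's built-in ISO 8859-3 codec.
chars = TURKISH_I_BYTES.decode("iso8859_3")

# Iterating over a bytes object yields ints, paired here with the decoded chars.
for code, ch in zip(TURKISH_I_BYTES, chars):
    print(f"0x{code:02X} {unicodedata.name(ch)}")
```

The four names printed are LATIN CAPITAL LETTER I, LATIN SMALL LETTER I, LATIN CAPITAL LETTER I WITH DOT ABOVE and LATIN SMALL LETTER DOTLESS I. This gives one code per glyph.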
-
- Programmers are used to performing case conversion on ASCII letters
- by simply adding a fixed constant (0x20) to the character code, or
- subtracting it, provided that the code lies within a particular range
- (A-Z or a-z). With later ISO standards this is not quite such a simple
- task, due to the sometimes ad-hoc placement of lowercase letters
- relative to the corresponding uppercase letters (examples available
- upon request), but it is still possible using tables that record the
- relationship.
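The two approaches can be sketched in Python as follows. The ASCII routines rely only on the fixed 0x20 offset. 'build_lower_table' is a hypothetical helper that derives a Latin-3 lowercase table from Python's Unicode case data through the standard 'iso8859_3' codec. It is not a table published with the standard.

```python
CASE_OFFSET = 0x20  # 'a' (0x61) minus 'A' (0x41) in ASCII


def ascii_to_lower(code: int) -> int:
    """Arithmetic conversion: works only because A-Z and a-z are parallel runs."""
    return code + CASE_OFFSET if 0x41 <= code <= 0x5A else code


def ascii_to_upper(code: int) -> int:
    """Inverse of ascii_to_lower for the range a-z."""
    return code - CASE_OFFSET if 0x61 <= code <= 0x7A else code


def build_lower_table(encoding: str = "iso8859_3") -> list:
    """Table-driven conversion for a single-byte character set.

    Entry N holds the lowercase code for byte N. Positions that are
    unassigned, or whose lowercase form is not a single byte in the
    same set, map to themselves.
    """
    table = list(range(256))
    for code in range(256):
        try:
            lower = bytes([code]).decode(encoding).lower().encode(encoding)
        except (UnicodeDecodeError, UnicodeEncodeError):
            continue  # unassigned position, or lowercase not encodable
        if len(lower) == 1:
            table[code] = lower[0]
    return table


LATIN3_LOWER = build_lower_table()
print(hex(ascii_to_lower(0x41)), hex(LATIN3_LOWER[0xA1]))  # prints: 0x61 0xb1
```

Capital H with stroke at 0xA1 lowercases to 0xB1. That offset is 0x10 rather than 0x20, which is why a table is needed. Under this derivation 0xA9 (capital I with dot above) maps to itself. Python lowercases it to 'i' followed by a combining dot above, and Latin-3 cannot encode that sequence.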
-
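- However, since the same character code is now used for both Latin
- capital 'I' and Turkish capital dotless 'I', case conversion is no
- longer a trivial matter. Consider TO_LOWER(TO_UPPER(dotless 'i')).
- The round trip ought to give back the dotless 'i', but TO_UPPER can
- only yield the shared code 0x49. A single TO_LOWER table must then
- map 0x49 to either dotless 'i' (0xB9) or dotted 'i' (0x69), never to
- both. So what's the result?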
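The round-trip problem can be reproduced with a small Python sketch. 'latin3_case' is a hypothetical helper that applies Python's locale-independent Unicode case mappings to a single Latin-3 byte through the standard 'iso8859_3' codec. It leaves the byte unchanged when the result is not a single Latin-3 character.

```python
LATIN3 = "iso8859_3"


def latin3_case(code: int, upper: bool) -> int:
    """Upper- or lowercase one Latin-3 byte using Unicode default case rules."""
    try:
        ch = bytes([code]).decode(LATIN3)
        converted = (ch.upper() if upper else ch.lower()).encode(LATIN3)
    except (UnicodeDecodeError, UnicodeEncodeError):
        return code  # unassigned byte, or result not representable in Latin-3
    return converted[0] if len(converted) == 1 else code


DOTLESS_SMALL_I = 0xB9
upper_code = latin3_case(DOTLESS_SMALL_I, upper=True)  # the shared capital 'I', 0x49
round_trip = latin3_case(upper_code, upper=False)      # plain dotted 'i', 0x69
print(hex(upper_code), hex(round_trip))                # prints: 0x49 0x69
```

A Turkish-specific table that lowercases 0x49 to 0xB9 would fix this round trip. However, it would also turn the ordinary Latin 'I' of non-Turkish text into a dotless 'i'.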
-
- Is it somehow understood that automatic case conversion of letters
- of the Latin, Greek and Cyrillic alphabets (and possibly others) is
- beyond the scope of ISO character standards, or is this just an odd
- case that has been overlooked? Judging from the little I've seen of
- ISO 10646, it contains no better support for Turkish 'i' variants
- than Latin-3 does (see positions 0x0130 and 0x0131 in UCS-2).
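The following Python sketch checks the Latin-3 to UCS-2 correspondence for these four letters. It assumes Python's Unicode code points, which coincide with the UCS-2 positions cited above for these letters. 'latin3_to_ucs2' is a hypothetical helper. The Turkish capital dotless 'I' and small dotted 'i' still share 0x0049 and 0x0069 with the Latin letters. Only the other two Turkish forms get positions of their own.

```python
# Latin-3 codes for the four Turkish 'i' forms (from the table earlier in the post).
LATIN3_I_CODES = [0x49, 0x69, 0xA9, 0xB9]


def latin3_to_ucs2(code: int) -> int:
    """Return the 16-bit UCS-2 value for one Latin-3 byte.

    For characters in the Basic Multilingual Plane the UCS-2 value equals
    the code point, so UTF-16 big-endian yields the same two bytes.
    """
    encoded = bytes([code]).decode("iso8859_3").encode("utf-16-be")
    return int.from_bytes(encoded, "big")


for code in LATIN3_I_CODES:
    print(f"Latin-3 0x{code:02X} -> UCS-2 0x{latin3_to_ucs2(code):04X}")
```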
-
- My proposal: Add two specifically Turkish letters to ISO 10646,
- one capital 'I' without dot and one small 'i' with dot, and consider
- them different from the Latin 'I' and 'i'. I have no formal
- relationship with any standardization body, so I'll have to leave
- this proposal for any interested party to bring it up in the proper
- forum.
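The following Python sketch shows why the proposal would make case conversion symmetric, using a hypothetical letter inventory. The two proposed Turkish letters get symbolic placeholder names only, because no code points exist for them. The existing letters are identified by their UCS-2 positions. With three disjoint upper/lower pairs, the mapping is one-to-one, and every round trip returns the starting letter.

```python
# Existing letters, identified by ISO 10646 (UCS-2) position.
LATIN_CAPITAL_I = "U+0049"
LATIN_SMALL_I = "U+0069"
TURKISH_CAPITAL_DOTTED_I = "U+0130"
TURKISH_SMALL_DOTLESS_I = "U+0131"
# Proposed letters: hypothetical placeholders, not real code points.
TURKISH_CAPITAL_DOTLESS_I = "PROPOSED-CAPITAL-DOTLESS-I"
TURKISH_SMALL_DOTTED_I = "PROPOSED-SMALL-DOTTED-I"

# One case pair per letter; no code appears in more than one pair.
TO_LOWER = {
    LATIN_CAPITAL_I: LATIN_SMALL_I,
    TURKISH_CAPITAL_DOTLESS_I: TURKISH_SMALL_DOTLESS_I,
    TURKISH_CAPITAL_DOTTED_I: TURKISH_SMALL_DOTTED_I,
}
# Inverting is safe because the pairs are disjoint.
TO_UPPER = {lower: upper for upper, lower in TO_LOWER.items()}


def round_trips(small: str) -> bool:
    """True if TO_LOWER(TO_UPPER(small)) gives back the same letter."""
    return TO_LOWER[TO_UPPER[small]] == small


print(all(round_trips(small) for small in TO_UPPER))  # prints: True
```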
-
- Latin and Cyrillic capital 'M' look the same, while the small forms
- don't. Those capital 'M' letters have different codes in ISO 10646,
- though maybe for reasons of systematic tabulation rather than in order
- to support case conversion. We did away with the old typewriter
- unification of '1' and 'l' long ago (and the same for '0' and 'O',
- if that ever was a problem). Is Turkish 'i' vs. Latin 'i' really in
- such a different ballpark?
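The 'M' example can be illustrated in Python, whose strings use Unicode code points that match the ISO 10646 positions for these letters. The two capitals look alike but have different codes, so each lowercases to its own small letter.

```python
import unicodedata

LATIN_M = "\u004D"     # LATIN CAPITAL LETTER M
CYRILLIC_M = "\u041C"  # CYRILLIC CAPITAL LETTER EM

for capital in (LATIN_M, CYRILLIC_M):
    small = capital.lower()
    print(f"U+{ord(capital):04X} {unicodedata.name(capital)} -> "
          f"U+{ord(small):04X} {unicodedata.name(small)}")
```

This prints that U+004D lowercases to U+006D (LATIN SMALL LETTER M) and U+041C lowercases to U+043C (CYRILLIC SMALL LETTER EM).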
-
- Are there word processors today that know how to case-convert a word
- containing Turkish letters? What are Turkish typists used to?
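The following Python sketch shows what handling Turkish letters would involve, using a pair of hypothetical language-tailored helpers, 'turkish_upper' and 'turkish_lower'. They deal with the two 'i' pairs explicitly before falling back on the default case rules. This only works when the program knows that the text is Turkish.

```python
# Hypothetical helpers: Turkish-tailored case conversion on Unicode strings.
CAPITAL_DOTTED_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE
SMALL_DOTLESS_I = "\u0131"   # LATIN SMALL LETTER DOTLESS I


def turkish_upper(word: str) -> str:
    # Dotted 'i' must become the dotted capital; dotless 'i' already uppercases to 'I'.
    return word.replace("i", CAPITAL_DOTTED_I).upper()


def turkish_lower(word: str) -> str:
    # Map both capitals first, so the default rules never turn 'I' into dotted 'i'.
    return word.replace("I", SMALL_DOTLESS_I).replace(CAPITAL_DOTTED_I, "i").lower()


words = ["diyarbak\u0131r", "istanbul"]
uppercased = [turkish_upper(w) for w in words]
restored = [turkish_lower(u) for u in uppercased]  # equals words: both 'i' kinds survive
```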
-
- Are there other letters in other alphabets suffering from similar
- unification problems in current ISO standards?
- --
- Anders Andersson, Dept. of Computer Systems, Uppsala University
- Paper Mail: Box 325, S-751 05 UPPSALA, Sweden
- Phone: +46 18 183170 EMail: andersa@DoCS.UU.SE
-