- Xref: sparky comp.std.internat:872 news.admin.misc:787
- Path: sparky!uunet!cs.utexas.edu!uwm.edu!caen!sol.ctr.columbia.edu!ira.uka.de!news.belwue.de!math.fu-berlin.de!fauern!uni-erlangen.de!not-for-mail
- From: unrza3@cd4680fs.rrze.uni-erlangen.de (Markus Kuhn)
- Newsgroups: comp.std.internat,news.admin.misc
- Subject: Re: 8-bit representation, plus an X problem
- Date: 16 Dec 1992 15:32:56 +0100
- Organization: Regionales Rechenzentrum Erlangen
- Message-ID: <1gnemoEINN3qo@uni-erlangen.de>
- References: <171@complex.complex.is> <eaRsVB2w165w@blues.kk.sub.org> <BzBJ0I.34s@ra.nrl.navy.mil>
- Reply-To: mskuhn@immd4.informatik.uni-erlangen.de
- NNTP-Posting-Host: cd4680fs.rrze.uni-erlangen.de
- Lines: 79
- Keywords: ISO8859-1 CP850 fidonet gateway
-
- atkinson@itd.nrl.navy.mil (Randall Atkinson) writes:
-
- > Shouldn't really take any longer if done correctly. Certainly the
- >good folks at AT&T Bell Labs have shown that it works quite nicely.
- >The Bell Labs encoding for ISO-10646 appears to be the leading
- >contender for adoption by MIME as the conventional way to encode
- >ISO-10646 in MIME email -- and it would be just fine with either 7-bit
- >or 8-bit transport (if I understand Andrew Hume correctly).
-
- The Plan 9 UTF format is clearly an 8-bit format. So we would get the following
- transformation pipeline:
-
-
-   1. The bitmaps of the characters, encoded in a local charset
-      with local codes (e.g. Latin 1, IBM CP850, ...)
-
-                      |
-                      V   ISO 10646
-
-   2. The numbers in the range 0 to 65535
-
-                      |
-                      V   one of the proposed UTFs (e.g. from P9)
-
-   3. A stream of bytes (8-bit!!!)
-
-                      |
-     only for mail -> V   an 8-bit transparent 7-bit encoding
-
-   4. A printable stream suitable for historical
-      mail systems.
-
- The receiving system has to perform the reverse steps. (Yes, I know
- that I used the wrong terminology and that there are ISO standards defining
- a much more detailed reference model of the character set universe,
- but a poor CS student can't even afford the ISO documents that define
- them, so how could I read them ... :-( )
-
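The four stages above, and the reverse path on the receiving side, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Python's "utf-8" codec stands in for "one of the proposed UTFs" (it is the later standardized descendant of the Plan 9 format, not the 1992 draft itself), quoted-printable stands in for the 7-bit encoding, and the function names encode_for_mail and decode_from_mail are made up for this example.

```python
import quopri


def encode_for_mail(local_bytes, local_charset="latin-1"):
    """Walk stages 1 to 4: local codes, numbers, byte stream, printable stream."""
    # Stage 1 -> 2: map local codes to character numbers (code points).
    text = local_bytes.decode(local_charset)
    code_points = [ord(ch) for ch in text]
    # Stage 2 -> 3: serialize the numbers as an 8-bit byte stream.
    # Assumption: UTF-8 stands in for the proposed UTF.
    utf_bytes = text.encode("utf-8")
    # Stage 3 -> 4 (mail only): wrap the 8-bit stream in a 7-bit encoding.
    # Assumption: quoted-printable is used as that encoding.
    printable = quopri.encodestring(utf_bytes)
    return code_points, utf_bytes, printable


def decode_from_mail(printable, local_charset="latin-1"):
    """Receiving side: perform the reverse steps, back to local codes."""
    utf_bytes = quopri.decodestring(printable)
    text = utf_bytes.decode("utf-8")
    return text.encode(local_charset)


if __name__ == "__main__":
    local = b"Gr\xfc\xdfe"  # "Gruesse" with u-umlaut and sharp s, in Latin 1
    code_points, utf_bytes, printable = encode_for_mail(local)
    print(code_points)
    print(utf_bytes)
    print(printable)
    print(decode_from_mail(printable) == local)
```

Note that only the last stage depends on the transport: an 8-bit clean channel such as news could carry the stage 3 bytes directly.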
- The only STANDARD which we will have in a few months is a mapping
- from the characters (bitmaps) to a space of numbers in a huge table.
- Unfortunately, ISO 10646 doesn't tell us a lot about how we should deal
- with these numbers. We still have to define the second and third steps
- in the above pipeline. I have heard rumors that the Plan 9 UTF version will
- be added as UTF-2 to an ISO 10646 annex; can anyone confirm this? Perhaps
- we really should wait, as suggested by Randall Atkinson, for the final
- version of ISO 10646 before starting to include anything new (e.g.
- a 16-bit encoding) in MIME.
-
- Will there be other ISO standards (e.g. POSIX, etc.) that define
- precisely how to handle the numbers defined by ISO 10646? The Plan 9
- stuff seems to be very promising.
-
- >Moreover,
- >the transported character set and the displayed character set need not
- >be the same.
-
- Of course. But USENET users will have difficulty understanding this,
- as there isn't currently anything like a presentation layer (as defined in the
- OSI reference model) that performs the conversion between the local
- representation (e.g. I prefer Latin 1 files) and the encoding used on
- the network (e.g. any 8-bit encoding on news and 7-bit encoding on
- email). This won't cause any news/email interworking troubles, as
- every text has to be converted anyway. (Perhaps in the far future of
- ISO 10646 terminals and editors, the presentation layer (e.g. realised
- by software converting MIME to/from a local format) might disappear again,
- as then the local format and the network format will be the same, as
- is the case today with US-ASCII.) Yes, MIME is a presentation
- layer protocol for USENET and SMTP.
-
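A presentation layer of this kind can be sketched with the MIME support in Python's standard email package. This is only a rough illustration, not a proposal: it assumes Latin 1 as the local format and UTF-8 as the network charset, and the helper names to_network and from_network are hypothetical.

```python
from email import message_from_string
from email.mime.text import MIMEText


def to_network(local_bytes, local_charset="latin-1"):
    """Sender side: convert the local representation to a MIME body part."""
    text = local_bytes.decode(local_charset)
    # Assumption: UTF-8 is the network charset. MIMEText labels the charset
    # and picks a 7-bit safe transfer encoding for it.
    msg = MIMEText(text, "plain", "utf-8")
    return msg.as_string()


def from_network(wire_text, local_charset="latin-1"):
    """Receiver side: undo the transfer encoding, then convert to local codes."""
    msg = message_from_string(wire_text)
    network_charset = msg.get_content_charset()
    text = msg.get_payload(decode=True).decode(network_charset)
    return text.encode(local_charset)


if __name__ == "__main__":
    local = b"Gr\xfc\xdfe"  # Latin 1 bytes as stored in a local file
    wire = to_network(local)
    print(wire)
    print(from_network(wire) == local)
```

The user's files stay in the local charset on both ends; only the two conversion functions need to know what travels over the network.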
- Such a model should perhaps be explained more clearly in future
- standards that extend MIME.
-
- Markus
-
- --
- Markus Kuhn, Computer Science student -=-=- University of Erlangen, Germany
- Internet: mskuhn@immd4.informatik.uni-erlangen.de | X.500 entry available
- ----- Anyone participating in the use of MS-DOS, Heroin or Cocaine is -----
- ---- simply not getting the most out of life possible. (Brian Downing) ----
-