Comments (best understood after reading iso8859.networking.txt)
The documentation in this directory started to be written in the early 90s.
At that time, it was not widely understood that text data had to be translated
in communication to cope with the different character sets of hosts, let alone
how to do it.
The theory and translation tables have now gained wide acceptance, especially
the translation table for the Macintosh.
There remains a big problem, though.
As clearly stated, the scope of the paper is limited to 8-bit character sets
on communication lines, and an 8-bit character set is limited by nature.
The best understood example is the case of the Macintosh.
I have received various comments that Macintosh translation could be different.
These comments are perfectly right: there are billions of ways to make another
translation table just as good as mine because a large part of it is arbitrary.
The comments always related to the arbitrary part, only designed to be unique
so that Macintosh to Macintosh communication doesn't alter data.
Never a word about the other part.
So, guess why...
The reason is that ISO 8859-1 is a well established communication standard and
hence that no one questions the part of the Macintosh table adhering to it.
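A minimal Python sketch of how such a table can be put together may make the
two parts concrete. It is not the table published in this directory: the codec
names "mac_roman" and "latin-1" are Python's, the names build_table, TO_LINE
and FROM_LINE are hypothetical, and the pairing used for the arbitrary part is
only one of the many equally valid choices.

```python
def build_table():
    """Return a 256-byte table mapping Macintosh codes to line codes."""
    fixed = {}         # Mac byte -> ISO 8859-1 byte, dictated by the standard
    leftover_mac = []  # Mac bytes whose character has no ISO 8859-1 code
    for b in range(256):
        try:
            fixed[b] = bytes([b]).decode("mac_roman").encode("latin-1")[0]
        except UnicodeError:
            leftover_mac.append(b)

    # Line codes not claimed by the standard part.
    used = set(fixed.values())
    free_line = [c for c in range(256) if c not in used]

    # Arbitrary part (assumption: pair leftovers in ascending order).
    # Any other pairing would do, as long as it stays one-to-one.
    table = dict(fixed)
    table.update(zip(leftover_mac, free_line))
    return bytes(table[b] for b in range(256))


TO_LINE = build_table()

# Invert the table so that the receiving Macintosh can undo the translation.
_inverse = bytearray(256)
for mac_code, line_code in enumerate(TO_LINE):
    _inverse[line_code] = mac_code
FROM_LINE = bytes(_inverse)

# Every byte value survives a Macintosh to Macintosh round trip.
data = bytes(range(256))
assert data.translate(TO_LINE).translate(FROM_LINE) == data
```

Only the first loop is constrained by ISO 8859-1. The pairing of the leftovers
is the part that could be done differently without making the table any better
or worse.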
The problem is that there simply is no indisputable way to transmit the rest of
the Macintosh characters on a communication line.
Oh yes, as strange as it may sound, there is still no way for a Macintosh to be
a full network computer regarding character sets by the end of this century.
For example, there is an "oe" character both on the Macintosh and on Windows
(ligature oe is a single character).
No one in the world will ever agree on what code point must be used on the
communication line to send this character from the Macintosh to Windows or
conversely unless something other than ISO 8859-1 is used on the communication
line. The UNAVOIDABLE conclusion is that a character set wider than 8 bits HAS
to be used in communication for the Macintosh to become a network computer.
This wider network character set IS there to be used in the form of ISO 10646,
aka its subset called UNICODE, with their encodings. If THAT were used, no one
would ever think of wondering how a Macintosh must transmit "oe". And this
holds for any character of any character set that can be used on any computer.
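The "oe" case can be checked with a short Python sketch. The assumptions here
are mine, not part of the original tables: Python's "mac_roman" and "cp1252"
codecs stand in for the Macintosh and Windows character sets, and UTF-8 stands
in for one of the encodings of ISO 10646.

```python
OE = "\u0153"  # LATIN SMALL LIGATURE OE, a single character

# Each system has its own 8-bit code for the character.
mac_byte = OE.encode("mac_roman")
win_byte = OE.encode("cp1252")

# ISO 8859-1 has no code for it at all, so nothing can go on the line.
try:
    OE.encode("latin-1")
    in_latin1 = True
except UnicodeEncodeError:
    in_latin1 = False

# With a wider character set on the line, the question disappears:
# the sender encodes, the receiver decodes into its own local set.
wire = mac_byte.decode("mac_roman").encode("utf-8")
received = wire.decode("utf-8").encode("cp1252")

print("Macintosh code:", mac_byte.hex())   # cf
print("Windows code:  ", win_byte.hex())   # 9c
print("in ISO 8859-1: ", in_latin1)        # False
print("on the line:   ", wire.hex())       # c593
assert received == win_byte
```

Neither end needs to know the other's local code. Both only need to agree on
the wide character set used on the line.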
When discussion was going on about MIME's encoding of character sets, I stated
loudly that it was a key matter to use 10646 exclusively in mail. Alas, 10646
was regarded as not ready, and a menagerie of 8-bit character sets resulted.
I've been happy to volunteer and contribute to the evolution of international
character transmission, but doing that with an 8-bit character set soon became
a pain in my heart because I knew that I was slowing down the emergence of
the only true solution.
So, before sending me comments about my 8-bit Mac translation, please think
twice about whether you wouldn't prefer using ISO 10646, and maybe send comments
to those whose real job is to design communication and ask them for ISO 10646.
People from the ISO standards have done an extremely good job with 10646.
Others like authors of Web browsers have put it to good use.
Translation, however, is normally not an application feature but part of the
presentation layer of the OSI model, and hence of the system, and the grand
solution will be put together when ftp translates text as easily as any
other application by invoking the same system primitives.
Heartfelt thanks for your interest,
André (C3A9, not E9), May 1998.