- Xref: sparky comp.std.internat:900 news.admin.misc:835
- Path: sparky!uunet!pipex!doc.ic.ac.uk!doc.ic.ac.uk!usenet
- From: rap@news (Ross Paterson)
- Newsgroups: comp.std.internat,news.admin.misc
- Subject: Re: 8-bit representation, plus an X problem
- Date: 18 Dec 1992 11:00:20 GMT
- Organization: Dept. of Computing, Imperial College, University of London, UK.
- Lines: 28
- Message-ID: <1gsb04INNpqi@frigate.doc.ic.ac.uk>
- References: <24433@alice.att.com>
- Reply-To: rap@doc.ic.ac.uk
- NNTP-Posting-Host: peaberry.doc.ic.ac.uk
-
- From article <24433@alice.att.com>, by andrew@alice.att.com (Andrew Hume):
- | UTF-1 does #00-#9f in one byte, #a0-#4015 in two bytes, and
- | #4016-#ffff in three bytes. FSS-UTF does #00-#7f in one byte, #80-#7ff
- | in two bytes and #800-#ffff in three bytes. [...]
- |
- | certainly, the characters between #800 and #4015 will take
- | 50% more space in FSS-UTF than UTF-1. from our file command (which
- | guesses at languages), this would seem to include Devanagari, Bengali,
- | Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam, Thai, Lao,
- | Tibetan, Georgian, Japanese (hiragana etc), chinese (some) and korean.
- | (with the exceptions of the CJK scripts and perhaps georgian, there
- | is no measurable email traffic in these scripts.)
-
- Yes, UTF-1 gets the ALPHABET and SYMBOL zones, and the important part of
- the CJK AUXILIARY zone, in 2 bytes. FSS-UTF goes up to 3 bytes just before
- it gets to the alphabets of Asia beyond the Middle East. For this you get
- synchronization, but is it worth it?
-
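To make the size comparison concrete, here is a minimal sketch that computes the encoded length of a single code point under each scheme, using only the byte ranges quoted above. The function names are hypothetical, and the sketch assumes the three-byte UTF-1 range begins immediately after #4015. It computes lengths only; it does not produce the encoded bytes.

```python
def utf1_length(cp: int) -> int:
    """Bytes for code point cp under UTF-1, per the ranges quoted above."""
    if not 0 <= cp <= 0xFFFF:
        raise ValueError("only #0000-#ffff is covered by the quoted ranges")
    if cp <= 0x9F:
        return 1
    if cp <= 0x4015:   # quoted two-byte range: #a0-#4015
        return 2
    return 3           # assumption: everything above #4015 up to #ffff


def fss_utf_length(cp: int) -> int:
    """Bytes for code point cp under FSS-UTF, per the ranges quoted above."""
    if not 0 <= cp <= 0xFFFF:
        raise ValueError("only #0000-#ffff is covered by the quoted ranges")
    if cp <= 0x7F:
        return 1
    if cp <= 0x7FF:
        return 2
    return 3


if __name__ == "__main__":
    samples = [
        (0x0041, "LATIN CAPITAL LETTER A"),
        (0x00E9, "LATIN SMALL LETTER E WITH ACUTE"),
        (0x0915, "DEVANAGARI LETTER KA"),
        (0x3042, "HIRAGANA LETTER A"),
        (0x4E2D, "CJK IDEOGRAPH 4E2D"),
    ]
    for cp, name in samples:
        print(f"#{cp:04x} {name}: UTF-1 {utf1_length(cp)}, "
              f"FSS-UTF {fss_utf_length(cp)}")
```

Under these ranges, any code point from #800 through #4015 (for example #0915 or #3042) takes 2 bytes in UTF-1 and 3 bytes in FSS-UTF, which is the 50% increase mentioned in the quoted text. Below #800 and above #4015 the two schemes differ only for #80-#9f, where UTF-1 uses one byte and FSS-UTF two.
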
- | and even for these scripts, i would guess connectivity (in the
- | sense of being able to send/receive messages at all) matters more
- | to those folks than the increase in space (how much of your disk is
- | mail?) or decrease in bandwidth (in most messages i see, headers are
- | 30-50% of the message).
-
- They can get their connectivity by specifying an existing 7-bit standard
- in a MIME header.
-
- How much of a Plan 9 disk is text?
-