- Xref: sparky comp.std.internat:893 news.admin.misc:818
- Path: sparky!uunet!spool.mu.edu!uwm.edu!zaphod.mps.ohio-state.edu!pacific.mps.ohio-state.edu!linac!att!att!allegra!alice!andrew
- From: andrew@alice.att.com (Andrew Hume)
- Newsgroups: comp.std.internat,news.admin.misc
- Subject: Re: 8-bit representation, plus an X problem
- Summary: let's not get carried away here
- Message-ID: <24433@alice.att.com>
- Date: 17 Dec 92 16:01:33 GMT
- Article-I.D.: alice.24433
- References: <24426@alice.att.com> <1gpruaINNhfm@frigate.doc.ic.ac.uk>
- Organization: AT&T Bell Laboratories, Murray Hill NJ
- Lines: 34
-
- In article <1gpruaINNhfm@frigate.doc.ic.ac.uk>, rap@news (Ross Paterson) writes:
- > It's hard to imagine that FSS-UTF will be popular with users of those
- > alphabets (all originating in Asia, BTW) whose letters are going to
- > take up 3 bytes, while they take up 2 in UTF-1 and 7 bits or so in
- > existing standards.
-
-
- i wish folks would just come out and say the character sets they mean.
- if you mean kanji (shift-jis), or chinese (GB2312-80 or Big 5), or korean,
- then the existing practice is two bytes per character.
-
- UTF-1 does #00-#9f in one byte, #a0-#4015 in two bytes, and #4016-#ffff
- in three bytes. FSS-UTF does #00-#7f in one byte, #80-#7ff in two bytes and
- #800-#ffff in three bytes. given the unified han characters now start at #4e00,
- i'd say FSS-UTF and UTF-1 had pretty much the same performance here.
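The ranges above can be checked with a rough sketch. It only computes bytes per character from the boundaries quoted in this post; it is not an encoder for either scheme. The function names `utf1_len` and `fss_utf_len` and the sample code points are illustrative choices, not part of any standard or library.

```python
def utf1_len(cp):
    """Bytes per character in UTF-1, using the ranges quoted in the post."""
    if cp < 0 or cp > 0xFFFF:
        raise ValueError("only the 16-bit range discussed here is covered")
    if cp <= 0x9F:
        return 1
    if cp <= 0x4015:
        return 2
    return 3


def fss_utf_len(cp):
    """Bytes per character in FSS-UTF, using the ranges quoted in the post."""
    if cp < 0 or cp > 0xFFFF:
        raise ValueError("only the 16-bit range discussed here is covered")
    if cp <= 0x7F:
        return 1
    if cp <= 0x7FF:
        return 2
    return 3


# Sample code points (chosen for illustration): an ASCII letter, a C1 control,
# a Greek letter, a Devanagari letter, a hiragana letter, and the first
# unified han character at #4e00.
for cp in (0x41, 0x90, 0x3A9, 0x905, 0x3042, 0x4E00):
    print(f"#{cp:04x}  UTF-1: {utf1_len(cp)}  FSS-UTF: {fss_utf_len(cp)}")
```

Under these assumptions the two schemes differ only for #80-#9f (one byte against two) and for #800-#4015 (two bytes against three); from #4e00 upward both need three bytes.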
-
- certainly, the characters between #800 and #4015 will take 50% more
- space in FSS-UTF than UTF-1. from our file command (which guesses at languages),
- this would seem to include Devanagari, Bengali, Gurmukhi, Gujarati, Oriya,
- Tamil, Telugu, Kannada, Malayalam, Thai, Lao, Tibetan, Georgian,
- Japanese (hiragana etc), chinese (some) and korean. (with the exceptions
- of the CJK scripts and perhaps georgian, there is no measurable email
- traffic in these scripts.)
-
- and even for these scripts, i would guess connectivity (in the sense
- of being able to send/receive messages at all) matters more to those folks
- than the increase in space (how much of your disk is mail?) or decrease in
- bandwidth (in most messages i see, headers are 30-50% of the message).
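To put a rough number on the header point: the sketch below assumes the headers stay the same size (ASCII is one byte either way) and that every body character falls in #800-#4015, so the body grows by the full 50%. Both are worst-case assumptions for illustration, and `total_growth_percent` is a hypothetical helper, not a measurement of real traffic.

```python
def total_growth_percent(header_percent, body_growth_percent=50):
    """Growth of a whole message when only the body grows.

    header_percent: share of the original message taken by headers (0-100).
    body_growth_percent: growth of the body alone (50 for 2 -> 3 bytes).
    """
    body_percent = 100 - header_percent
    return body_percent * body_growth_percent / 100


# Header shares of 30% and 50%, the range mentioned in the post.
for header_percent in (30, 50):
    growth = total_growth_percent(header_percent)
    print(f"headers {header_percent}%: whole message grows by {growth}%")
```

With these assumptions the whole message grows by 25-35% rather than 50%, and by less for any body that mixes in ASCII or characters outside #800-#4015.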
-
- in any case, i agree there is a space overhead. it's up to users
- to figure out if it's worth it; i just wanted to clear up factual errors.
-
-
- andrew
-