- Submitted-by: enag@ifi.uio.no (Erik Naggum)
-
- Peter da Silva <peter@ferranti.com> writes:
- >In article <16rpgaINNol0@ftp.UU.NET> david@mks.com (David Rowley) writes:
- >> Note that UTF and 8-bit Latin 1 (ISO 8859-1) are identical for
- >> characters 0x00 to 0x9f. Codepoints above 0x9f are used to
- >> introduce the multibyte sequences.
- >
- >That seems strange. 0x80 through 0x9f are all controls, and all the
- >national characters in Latin-1 are in 0xA0 to 0xFF. Why would they allow
- >Latin-1 control codes (CSI, etc) and blow off all the graphics? Are you
- >sure they didn't overload the high control range (0x80 to 0x9f)? That
- >would seem a much more useful encoding.
-
- Character numbers 128 (0x80) through 159 (0x9F) are not used in ISO
- 10646, and are not used in UTF, either. It's highly misleading to claim
- that they are used, since, in fact, they aren't even graphic characters
- in _any_ ISO 4873-conforming coded character set (of which the ISO 8859
- family is an instance), and row 0 of ISO 10646 (but only row 0) conforms
- to ISO 4873 with respect to not populating the control character ranges
- with graphic characters.
-
- ISO 8859-1 characters (i.e. the right half of row 0) are introduced with
- character number 160 (0xA0). Following this "code extension" character
- is a single ISO 8859-1 character with the same character number that the
- character has in ISO 8859-1.
-
- For example, if the original string is (hex) A1 43 61 72 61 6d 62 61 21
- ("!Caramba!" with the first ! upside down) in ISO 8859-1, it will be
- (hex) A0 A1 43 61 72 61 6d 62 61 21 in ISO 10646 UTF.
-
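The rule described above can be sketched in a few lines of Python. This is only an illustration of the row 0 case as stated in this post, not a full ISO 10646 UTF codec: the function names are hypothetical, bytes below 0xA0 are simply passed through unchanged, and any byte above 0xA0 that appears where a new sequence should start is rejected, on the assumption that it belongs to a longer multibyte sequence that this sketch does not cover.

```python
def latin1_to_utf_row0(data: bytes) -> bytes:
    """Prefix every ISO 8859-1 byte in 0xA0..0xFF with the 0xA0 introducer.

    Hypothetical helper: covers only row 0 as described in the post.
    """
    out = bytearray()
    for b in data:
        if b >= 0xA0:
            out.append(0xA0)  # the "code extension" byte
        out.append(b)         # the character keeps its ISO 8859-1 number
    return bytes(out)


def utf_row0_to_latin1(data: bytes) -> bytes:
    """Inverse of latin1_to_utf_row0 for input limited to row 0."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b == 0xA0:
            # 0xA0 must be followed by one byte in 0xA0..0xFF.
            if i + 1 >= len(data) or data[i + 1] < 0xA0:
                raise ValueError("truncated or invalid 0xA0 sequence")
            out.append(data[i + 1])
            i += 2
        elif b > 0xA0:
            # Assumed to start a longer sequence; outside this sketch.
            raise ValueError("multibyte sequence beyond row 0 not handled")
        else:
            out.append(b)  # bytes below 0xA0 pass through unchanged
            i += 1
    return bytes(out)


# The string from the example above, in ISO 8859-1.
caramba = bytes.fromhex("A1 43 61 72 61 6d 62 61 21")
encoded = latin1_to_utf_row0(caramba)
```

Here `encoded` holds the ten bytes A0 A1 43 61 72 61 6d 62 61 21, and `utf_row0_to_latin1(encoded)` gives back the original nine.
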
- Best regards,
- </Erik>
- --
- Erik Naggum | ISO 8879 SGML | +47 295 0313
- | ISO 10744 HyTime |
- <erik@naggum.no> | ISO 10646 UCS | Memento, terrigena.
- <enag@ifi.uio.no> | ISO 9899 C | Memento, vita brevis.
-
-
- Volume-Number: Volume 29, Number 10
-
-