- Path: sparky!uunet!uunet!not-for-mail
- From: david@mks.com (David Rowley)
- Newsgroups: comp.std.unix
- Subject: Re: ISO 10646 files
- Date: 18 Aug 1992 14:19:06 -0700
- Organization: Mortice Kern Systems Inc., Waterloo, Ontario, CANADA
- Lines: 29
- Sender: sef@ftp.UU.NET
- Approved: sef@ftp.uucp (Moderator, Sean Eric Fagan)
- Message-ID: <16rpgaINNol0@ftp.UU.NET>
- References: <16p6bmINNs1l@ftp.UU.NET>
- NNTP-Posting-Host: ftp.uu.net
- X-Submissions: std-unix@uunet.uu.net
-
- Submitted-by: david@mks.com (David Rowley)
-
- In article <16p6bmINNs1l@ftp.UU.NET> mskuhn@immd4.informatik.uni-erlangen.de (Markus Kuhn) writes:
- >But how may UCS-4 files be identified? Do they always begin with 0000feff
- >and are converted if they begin with fffe0000 or other permutations?
- >Does ISO 10646 say anything about this or will any future POSIX extension do?
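-
- [A minimal sketch of the signature check asked about above, written in
- Python. It assumes a UCS-4 stream begins with the signature value
- 0000feff stored in the writer's byte order, and it handles only the two
- byte patterns quoted in the question. The function name and the returned
- labels are hypothetical, not taken from ISO 10646 or POSIX.]
-
```python
# Hypothetical helper: guess the UCS-4 byte order from a leading signature.
# Only the two patterns mentioned in the question are recognised.
SIGNATURES = {
    bytes.fromhex("0000feff"): "big-endian",     # signature read as written
    bytes.fromhex("fffe0000"): "little-endian",  # signature fully byte-swapped
}


def ucs4_byte_order(data):
    """Return the byte order implied by the first four bytes, or None."""
    return SIGNATURES.get(bytes(data[:4]))
```
-
- [Any other permutation, or a file with no signature at all, yields None
- here, so the caller would still need some out-of-band convention.]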
-
- I am relatively new to ISO 10646, but I believe the intent is to
- use the UCS Transformation Format (Annex F of 10646) as the
- standard external representation (file contents and the
- like). This multibyte encoding can represent both the UCS-2 and
- UCS-4 repertoires.
-
- Note that UTF and 8-bit Latin 1 (ISO 8859-1) are identical for
- characters 0x00 to 0x9f. Byte values above 0x9f are used to
- introduce the multibyte sequences.
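-
- To make the byte layout concrete, here is a rough Python sketch of a
- UTF encoder. It assumes the layout as I understand it from the published
- ISO 10646-1 text (the form usually called UTF-1); the draft annex may
- differ in detail, and the names utf1_encode and _t are my own.
-
```python
def _t(z):
    """Map 0..0xFF to a byte value. Inputs 0..0xBD land on graphic bytes
    (0x21..0x7E, 0xA0..0xFF); the rest land on controls, space and DEL."""
    if z < 0x5E:
        return z + 0x21   # 0x21..0x7E
    if z < 0xBE:
        return z + 0x42   # 0xA0..0xFF
    if z < 0xDF:
        return z - 0xBE   # 0x00..0x20
    return z - 0x60       # 0x7F..0x9F


def utf1_encode(cp):
    """Encode one code point (0..0x7FFFFFFF) as bytes; assumed UTF-1 layout."""
    if cp < 0 or cp > 0x7FFFFFFF:
        raise ValueError("code point out of range")
    if cp < 0xA0:
        return bytes([cp])                 # same byte as ISO 8859-1
    if cp < 0x100:
        return bytes([0xA0, cp])           # 0xA0 escapes the upper Latin 1 half
    if cp < 0x4016:
        y = cp - 0x100
        return bytes([0xA1 + y // 0xBE, _t(y % 0xBE)])
    if cp < 0x38E2E:
        y = cp - 0x4016
        return bytes([0xF6 + y // 0xBE**2,
                      _t(y // 0xBE % 0xBE),
                      _t(y % 0xBE)])
    y = cp - 0x38E2E
    return bytes([0xFC + y // 0xBE**4,
                  _t(y // 0xBE**3 % 0xBE),
                  _t(y // 0xBE**2 % 0xBE),
                  _t(y // 0xBE % 0xBE),
                  _t(y % 0xBE)])
```
-
- Under these assumptions utf1_encode(0x41) is the single byte 0x41, while
- utf1_encode(0xA0) is the two bytes 0xA0 0xA0. Trailing bytes are always
- taken modulo 0xBE, so they never fall in the control ranges.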
-
- One problem, though, is that the UTF description in ISO 10646 is
- informative rather than normative. That being the case,
- can implementors safely point to UTF as a standard encoding?
-
- --
- David Rowley
- Mortice Kern Systems Inc.
- 35 King Street North, Waterloo, ON, Canada N2J 2W9
- 519/884-2251, FAX 519/884-8861, david@mks.com
-
-
- Volume-Number: Volume 29, Number 3
-