- Path: sparky!uunet!uunet!not-for-mail
- From: mohta@necom830.cc.titech.ac.jp (Masataka Ohta)
- Newsgroups: comp.std.unix
- Subject: Re: ISO 10646 files
- Date: 18 Aug 1992 21:29:55 -0700
- Organization: Tokyo Institute of Technology
- Lines: 43
- Sender: sef@ftp.UU.NET
- Approved: sef@ftp.uucp (Moderator, Sean Eric Fagan)
- Message-ID: <16sio3INN3lr@ftp.UU.NET>
- References: <16p6bmINNs1l@ftp.UU.NET>
- NNTP-Posting-Host: ftp.uu.net
- X-Submissions: std-unix@uunet.uu.net
-
- Submitted-by: mohta@necom830.cc.titech.ac.jp (Masataka Ohta)
-
- In article <16p6bmINNs1l@ftp.UU.NET>
- mskuhn@immd4.informatik.uni-erlangen.de (Markus Kuhn) writes:
-
- >How UCS-2 files have to be handled under future OS versions (e.g. UNIX)
- >seems to be quite obvious:
- >
- > - Every UCS-2 file begins with feff. If it begins with fffe, then library
- > routines will activate a 'byte order swap mode' that corrects the
- > data from an other-endian machine.
-
- What?
-
- How can 'cat' know the file being read is a text file?
-
- Do you want to introduce an infamous "FILE TYPE" to UNIX?
-
- > - In this way, every UNIX tool (cc, cat, ...) can easily determine
- > how the file has to be interpreted, because everything starting
- > with something else is considered to be an 8-bit Latin 1 encoded
- > file (if it is interpreted as a 'text file' at all).
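The heuristic quoted above can be sketched as follows. This is a minimal illustration, not code from the thread or any real library: the function name decode_by_signature is hypothetical, the "expected" order of the feff signature is assumed to be big-endian, and Python's UTF-16 codecs stand in for UCS-2 (they agree on text without surrogates).

```python
def decode_by_signature(data: bytes) -> str:
    """Hypothetical sketch of the quoted proposal: guess the encoding
    from the first two bytes of the file."""
    if data[:2] == b"\xfe\xff":
        # Signature in the assumed expected (big-endian) order: no swap.
        return data[2:].decode("utf-16-be")
    if data[:2] == b"\xff\xfe":
        # Signature appears byte-swapped: the file came from an other-endian
        # machine, so read each 16-bit unit with its two bytes swapped.
        return data[2:].decode("utf-16-le")
    # Anything else is taken to be 8-bit Latin 1 text.
    return data.decode("latin-1")


print(decode_by_signature(b"\xfe\xff\x00A\x00B"))  # AB
print(decode_by_signature(b"\xff\xfeA\x00B\x00"))  # AB
print(decode_by_signature(b"abc"))                 # abc
```

Note that the decision rests entirely on the first two bytes of content; nothing outside the file tells the library whether those bytes are a signature or ordinary text.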
-
- What if an 8-bit Latin 1 file begins with 0xfffe?
-
- Code points 0xfe and 0xff represent valid Latin 1 characters.
-
- 0xfe: LATIN SMALL LETTER THORN
- 0xff: LATIN SMALL LETTER Y WITH DIAERESIS
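The collision can be checked directly. The sketch below repeats a hypothetical two-byte signature test (looks_like_ucs2 is an illustrative name, not a real API) and feeds it ordinary Latin 1 text that happens to start with y with diaeresis followed by thorn.

```python
import unicodedata


def looks_like_ucs2(data: bytes) -> bool:
    """Hypothetical classification step: treat the file as UCS-2 if it
    starts with either byte order of the 0xFEFF signature."""
    return data[:2] in (b"\xfe\xff", b"\xff\xfe")


# Ordinary Latin 1 text whose first two characters are 0xFF and 0xFE.
raw = "\u00ff\u00fe and more Latin-1 text".encode("latin-1")

print(unicodedata.name(chr(raw[0])))  # LATIN SMALL LETTER Y WITH DIAERESIS
print(unicodedata.name(chr(raw[1])))  # LATIN SMALL LETTER THORN
print(looks_like_ucs2(raw))           # True: misread as byte-swapped UCS-2
```

The mirror case fails the same way: a Latin 1 file starting with thorn followed by y with diaeresis (bytes 0xfe 0xff) would be taken for unswapped UCS-2.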
-
- Can you still say "quite obvious"?
-
- >But how may UCS-4 files be identified? Do they always begin with 0000feff
- >and are converted if they begin with fffe0000 or other permutations?
- >Does ISO 10646 say anything about this or will any future POSIX extension do?
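To make the UCS-4 question concrete, the sketch below simply enumerates how the value 0x0000FEFF would be laid out under the four commonly cited 32-bit byte orders. The order labels are conventional shorthand assumed for illustration, not terms taken from ISO 10646, and the helper name signature_bytes is hypothetical.

```python
# Byte positions for a 32-bit unit, indexing into its big-endian bytes.
ORDERS = {
    "1234 (big-endian)": (0, 1, 2, 3),
    "4321 (little-endian)": (3, 2, 1, 0),
    "2143": (1, 0, 3, 2),
    "3412": (2, 3, 0, 1),
}


def signature_bytes(order) -> bytes:
    """Serialize 0x0000FEFF using the given byte permutation."""
    be = (0x0000FEFF).to_bytes(4, "big")  # b'\x00\x00\xfe\xff'
    return bytes(be[i] for i in order)


for name, order in ORDERS.items():
    print(name, signature_bytes(order).hex())
# 1234 (big-endian) 0000feff
# 4321 (little-endian) fffe0000
# 2143 0000fffe
# 3412 feff0000

# The little-endian UCS-4 signature is byte-for-byte identical to a
# byte-swapped UCS-2 signature followed by the character U+0000.
print(signature_bytes(ORDERS["4321 (little-endian)"]) == b"\xff\xfe\x00\x00")  # True
```

So even a rule for UCS-4 signatures would overlap with the UCS-2 rule quoted earlier, on top of the Latin 1 collision.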
-
- It is one of the well-known defects of ISO 10646, which the standardizing
- committee simply neglected.
-
- Masataka Ohta
-
-
- Volume-Number: Volume 29, Number 5
-