- Submitted-by: mskuhn@immd4.informatik.uni-erlangen.de (Markus Kuhn)
-
- How UCS-2 files have to be handled under future OS versions (e.g. UNIX)
- seems to be quite obvious:
-
- - Every UCS-2 file begins with feff. If it begins with fffe, then library
- routines will activate a 'byte order swap mode' that corrects the
- data from an other-endian machine.
-
- - In this way, every UNIX tool (cc, cat, ...) can easily determine
- how the file has to be interpreted, because everything starting
- with something else is considered to be an 8-bit Latin 1 encoded
- file (if it is interpreted as a 'text file' at all).
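A minimal sketch in C of the detection rule in the two points above. The names text_kind, classify_text and ucs2_swap are hypothetical and come from no standard. The sketch assumes the first two bytes are read as one 16-bit value in the reading machine's own byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical classification; not taken from ISO 10646 or POSIX. */
enum text_kind {
    TEXT_LATIN1,        /* no signature: treat as 8-bit Latin 1 */
    TEXT_UCS2_NATIVE,   /* first 16-bit value reads as feff */
    TEXT_UCS2_SWAPPED   /* first 16-bit value reads as fffe */
};

/* Look at the first two bytes of a file that is already in memory. */
static enum text_kind classify_text(const unsigned char *buf, size_t len)
{
    uint16_t first;

    if (len < 2)
        return TEXT_LATIN1;
    memcpy(&first, buf, sizeof first);   /* native-order 16-bit read */
    if (first == 0xFEFF)
        return TEXT_UCS2_NATIVE;
    if (first == 0xFFFE)
        return TEXT_UCS2_SWAPPED;        /* enable 'byte order swap mode' */
    return TEXT_LATIN1;
}

/* Exchange the two bytes of one UCS-2 value. */
static uint16_t ucs2_swap(uint16_t u)
{
    return (uint16_t)((u << 8) | (u >> 8));
}
```

Under this rule a Latin 1 file that happens to start with the bytes fe ff or ff fe would be taken for UCS-2.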
-
- But how may UCS-4 files be identified? Do they always begin with 0000feff,
- and are they converted if they begin with fffe0000 or other permutations?
- Does ISO 10646 say anything about this, or will any future POSIX extension do so?
-
- It should not be too complicated to develop C library routines that
- are based on new types (let's call them ucs2_t or ucs4_t) that hide
- the 8-bit vs. UCS-2 difference completely from the programmer.
- Once these things are specified, writing 16-bit oriented operating systems
- and applications should be quite simple. I think we need a standard for
- this NOW, otherwise UCS-2 files won't be as simple to handle as ASCII
- files.
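
One way such a routine might look, as a sketch only: the type name ucs2_t is the one suggested above, but its definition as a 16-bit unsigned integer and the function ucs2_decode are assumptions. The caller receives ucs2_t characters whether the input was 8-bit Latin 1 or UCS-2 in either byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t ucs2_t;   /* assumed definition of the suggested type */

/* Hypothetical routine: decode an in-memory file into ucs2_t characters.
   Returns the number of characters stored in out (at most out_cap). */
static size_t ucs2_decode(const unsigned char *buf, size_t len,
                          ucs2_t *out, size_t out_cap)
{
    size_t n = 0, i;
    ucs2_t first = 0;

    if (len >= 2)
        memcpy(&first, buf, sizeof first);   /* native-order read */

    if (first == 0xFEFF || first == 0xFFFE) {
        int swap = (first == 0xFFFE);        /* written on an other-endian machine */
        /* Skip the signature, then read 16 bits at a time. */
        for (i = 2; i + 1 < len && n < out_cap; i += 2) {
            ucs2_t c;
            memcpy(&c, buf + i, sizeof c);
            if (swap)
                c = (ucs2_t)((c << 8) | (c >> 8));
            out[n++] = c;
        }
    } else {
        /* No signature: Latin 1 bytes have the same values as UCS 0..255. */
        for (i = 0; i < len && n < out_cap; i++)
            out[n++] = buf[i];
    }
    return n;
}
```

A tool built on a routine like this would never look at the raw bytes itself, which is the point of hiding the difference behind the new types.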
-
- What do you think?
-
- Markus
-
- --
- Markus Kuhn, Computer Science student -=-=- University of Erlangen, Germany
- Internet: mskuhn@immd4.informatik.uni-erlangen.de | X.500 entry available
- -A distributed system is one in which the failure of a computer you didn't-
- -even know existed can render your own computer unusable. (Leslie Lamport)-
-
-
- Volume-Number: Volume 28, Number 102
-
-