- Xref: sparky comp.std.internat:914 news.admin.misc:861
- Path: sparky!uunet!pipex!bnr.co.uk!uknet!mcsun!Germany.EU.net!incom!kostis!blues!kosta
- From: kosta@blues.kk.sub.org (Kosta Kostis)
- Newsgroups: comp.std.internat,news.admin.misc
- Subject: Re: Data tagging (was: 8-bit representation, plus an X problem)
- Keywords: magic codes, portable data
- Message-ID: <mq62VB1w165w@blues.kk.sub.org>
- Date: 20 Dec 92 01:52:45 GMT
- References: <1gtrpdINN6c4@corax.udac.uu.se>
- Organization: The Blues Family
- Lines: 53
-
- andersa@Riga.DoCS.UU.SE (Anders Andersson) writes:
-
- > [note Followup-To: comp.std.internat]
- >
- > In article <1gt5a2EINNin3@uni-erlangen.de>, unrza3@cd4680fs.rrze.uni-erlangen
- > > It should also be noted, that at least one existing OS (Windows NT)
- > > uses a 2 byte encoding both internally (e.g. in filenames in Fnodes
- > > on the disc) as well as in text files. Text files always begin with
- > ^^
- > > FEFF as a magic code for ISO 10646 textes. This code also indicates,
- > > whether it is a littleendian file.
- >
- > Is this magic code visible to the user without any special tricks,
- > or is it filtered away by the operating system when the file is
- > opened for reading? Suppose I obtain a file, that is labeled as
- > containing IS 10646 text, via FTP from a server running Windows NT,
- > to a client running a different system--will I then get this 0xFEFF
- > magic code (which is meaningless on my system) too, or will I get a
- > 'clean' IS 10646 text?
-
- Well, I guess programs will "see" either 0xFEFF or 0xFFFE, depending
- on whether the file was written on a big-endian or a little-endian
- machine. If your system can make sense of ISO 10646 text files at
- all, then the "magic" code isn't meaningless to it either.
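-
- To make that concrete, here is a rough sketch of how a reading
- program might use the tag. This is only an illustration, not how
- Windows NT or any other system actually does it. The function names
- are made up, and Python's "utf-16-be"/"utf-16-le" codecs stand in
- for the 16-bit form of ISO 10646, which is an assumption on my part.

```python
def detect_byte_order(data: bytes) -> str:
    """Guess the byte order from the first two bytes (hypothetical helper).

    Returns "big" if the data starts with FE FF, "little" if it starts
    with FF FE, and "unknown" if neither tag is present.
    """
    if data[:2] == b"\xfe\xff":
        return "big"
    if data[:2] == b"\xff\xfe":
        return "little"
    return "unknown"


def decode_tagged_text(data: bytes) -> str:
    """Strip the tag and decode the rest as 16-bit text (hypothetical helper).

    Assumption: the utf-16 codecs are used as a stand-in for 16-bit
    ISO 10646 text.
    """
    order = detect_byte_order(data)
    if order == "big":
        return data[2:].decode("utf-16-be")
    if order == "little":
        return data[2:].decode("utf-16-le")
    raise ValueError("no byte-order tag found")
```

- A reader that works like this never shows the tag to the user. It
- only uses it to decide how to interpret the bytes that follow.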
-
- > I remember seeing text files containing an explicit ^Z (0x1A) at
- > the end, due to their origin on some home computer where ^Z was the
- > ordinary EOF marker, even though I was sitting on a system with
- > perfectly functional EOF pointers in the file descriptor blocks...
-
- Never mind. That's a totally different story.
-
- > I hope the above isn't yet another version of that problem (non-
- > standard tags or markers floating around with standards-compliant
- > data on systems not understanding them)?
-
- The tag *is* defined in the Unicode standard, so it should be defined
- in ISO 10646, too (this is an assumption). All systems capable of
- reading/displaying Unicode/ISO 10646 16-bit text (the latter is not
- the official name) should follow this tagging.
-
- > Alternatively, does this magic code have any chance of becoming
- > a standard itself?
-
- See above.
-
- Kosta
-
-
- --
- Kosta Kostis, Talstrasse 25, D-6074 Roedermark 3, Germany
- kosta@blues.kk.sub.org (home)
- sw authors: please support ISO 8859-1! äöüÄÖÜß=aeoeueAEOEUEss
-