- Path: sparky!uunet!cis.ohio-state.edu!pacific.mps.ohio-state.edu!linac!att!att!allegra!alice!andrew
- From: andrew@alice.att.com (Andrew Hume)
- Newsgroups: comp.std.internat
- Subject: Re: Data tagging (was: 8-bit representation, plus an X problem)
- Message-ID: <24551@alice.att.com>
- Date: 5 Jan 93 02:41:49 GMT
- Article-I.D.: alice.24551
- References: <24426@alice.att.com> <1gpruaINNhfm@frigate.doc.ic.ac.uk> <2606@titccy.cc.titech.ac.jp>
- Organization: AT&T Bell Laboratories, Murray Hill NJ
- Lines: 97
-
- be warned that my comments here are not meant to be belligerent;
- i am just trying to figure out the complaints.
-
- In article <2606@titccy.cc.titech.ac.jp>, mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
- ~ In article <24494@alice.att.com> andrew@alice.att.com (Andrew Hume) writes:
- ~
- ~ > this is a fundamental and persistent misapprehension of
- ~ >10646. it was neither designed for, nor can it allow a system
- ~ >to display identical Han characters (same 10646 codepoint)
- ~ >in different fonts WITHOUT some other (non-10646) convention.
- ~
- ~ As it defines the same code point for corresponding Japanese/Chinese
- ~ characters, differentiation is impossible.
-
- this is agreeing with me, right?
-
- ~ >10646 simply defines codepoints, allowing the unambiguous specification
- ~ >of the characters. if you want to set (or switch) fonts, then you
- ~ >have to invent some scheme to do it, as say ISO 2022 does
- ~ >(in a slightly clumsy way).
-
- ~ Then, why you use ISO 10646? Can't we use ISO 2022?
-
- because 2022 is utterly hopeless. in fact, 2022 simply defines
- how to interpret byte streams (7 or 8 bit) using special sequences
- to load new character sets and to switch between them. to actually know
- what characters are meant, you have to know what the special escape
- sequences are AND know how that particular small character set works.
- furthermore, this is now a stateful encoding, so you can't seek into the
- middle of a file and know which character set is in effect.
- just horrible. at least with 10646, i know all the characters NOW, and there
- is no state switching.
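
a minimal sketch of the statefulness point, under stated assumptions: python's "iso2022_jp" codec stands in for ISO 2022 in general, and "utf-32-be" stands in for a fixed-width four-byte form of 10646. the function names and the sample string are made up for illustration.

```python
# sketch: seeking into a stateful ISO 2022 stream vs. a fixed-width form.
# assumptions: "iso2022_jp" stands in for ISO 2022 generally, and
# "utf-32-be" stands in for a four-byte fixed-width form of 10646.

TEXT = "abc\u65e5\u672c\u8a9edef"   # "abc", three kanji, "def"
ESC_TO_JIS = b"\x1b$B"              # escape that designates JIS X 0208


def seek_iso2022(text: str) -> str:
    """decode from just past the escape sequence, i.e. with the state lost."""
    data = text.encode("iso2022_jp")
    start = data.index(ESC_TO_JIS) + len(ESC_TO_JIS)
    # a fresh decoder starts in the ASCII state, so the two-byte
    # kanji codes are misread as pairs of ASCII characters
    return data[start:].decode("iso2022_jp")


def seek_fixed_width(text: str, char_index: int) -> str:
    """decode from the char_index'th character; no state is needed."""
    data = text.encode("utf-32-be")
    return data[4 * char_index:].decode("utf-32-be")


if __name__ == "__main__":
    print(repr(seek_iso2022(TEXT)))
    print(repr(seek_fixed_width(TEXT, 3)))
```

the first call returns plain ASCII garbage where the kanji should be, because the escape sequence that set the state was skipped; the second recovers the text from any character boundary.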
-
- ~ > i claim NT has some convention in addition to the locale
- ~ >(perhaps, it is part of the locale's definition) so it can do this.
- ~ >but i don't know.
- ~
- ~ Then, how can I create a text file which contains both Chinese
- ~ and Japanese?
-
- i said i didn't know; presumably, there are magic commands (a la RTF)
- that change fonts.
-
- ~ >> Shouldn't we use error numbers internally and interpret them by local
- ~ >> programs?
- ~
- ~ > when the protocol was updated, this was taken out
- ~ >and errors are now just text strings (64 bytes i think).
- ~
- ~ Making everything English text is another fatal misdesign of Plan 9
- ~ as an internationalized OS.
-
- damned if i know where i said the text was english. it happens to
- be english for us, for the file server we have for our jukebox. we don't
- legislate the language, just that it is 10646 characters (sorry, you can't
- mix japanese and chinese within error messages).
-
- ~ > the bottom line is, when you connect to remote services,
- ~ >there is simply no way to know what they might report as errors.
- ~
- ~ If you are connecting to some server, you know the protocol you use
- ~ in which meanings on error numbers are contained.
-
- just wrong. i attempt to open a file. an error is returned.
- the protocol doesn't define nor restrict how that may have failed.
- sure we can guess common ones ('file does not exist'); but what about rare
- ones ("it's bob's file and he's gone home")?
-
- ~ >the problem is that a file server, say, may have an arbitrary
- ~ >number of error conditions, many of which may not be known locally.
- ~
- ~ Or, how can the client program know the severity of the error? Should
- ~ client program retry on error? How many times? Or should the client
- ~ abort the current transaction and retry? Or should the client
- ~ abort the entire job?
-
- good idea (not that it has anything to do with error numbers)!
- by default, we assume that file servers are working or not. but in general,
- it would be plausible for the file server to prefix its error message
- with a conventional token indicating whether or not to retry. again, this
- should be a file-server-dependent thing and therefore cannot be tied
- to any particular set of error numbers.
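
a sketch of what such a conventional token could look like on the client side. the tokens "retry:" and "fatal:" and the function name are hypothetical, invented here for illustration; they are not part of any real file server or protocol.

```python
# sketch: a conventional token at the front of a textual error message.
# the tokens "retry:" and "fatal:" are hypothetical, not taken from any
# real file server or protocol.
from typing import Tuple

RETRY_TOKEN = "retry:"
FATAL_TOKEN = "fatal:"


def parse_error(message: str) -> Tuple[bool, str]:
    """return (should_retry, text to show the user)."""
    if message.startswith(RETRY_TOKEN):
        return True, message[len(RETRY_TOKEN):].strip()
    if message.startswith(FATAL_TOKEN):
        return False, message[len(FATAL_TOKEN):].strip()
    # no token: fall back to the default of not retrying
    return False, message.strip()
```

a message without a token is still displayed in full, so a server that knows nothing of the convention loses nothing.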
-
- ~ Even if the version of the server is updated and new error codes are added,
- ~ the server should know the version of the client and return old error code.
- ~ Or, how can the client program behave? Of course, for purely
- ~ informational purpose only, you can still report it, say, as "unknown
- ~ error # 42", which is of no help.
-
- not sure what this means. in any case, returning text avoids any
- synchronisation problems.
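
a small sketch of the synchronisation point, with an invented client-side table and the number 42 borrowed from the quoted example. nothing here reflects an actual protocol.

```python
# sketch: why text sidesteps client/server version skew.
# the table contents and function names are illustrative only.

CLIENT_ERRORS = {1: "file does not exist", 2: "permission denied"}


def describe_number(code: int) -> str:
    """client-side lookup; uninformative when the server is newer."""
    return CLIENT_ERRORS.get(code, f"unknown error # {code}")


def describe_text(message: str) -> str:
    """the server's text is shown as-is; no shared table is needed."""
    return message
```

with numbers, the client and server must agree on the table; with text, a newer server's message reaches the user unchanged.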
-
-
- and just in case you were wondering why this is being pursued,
- it shows why having international character sets isn't the same as i18n.
- you have to deal with things like errno and so on.
-
- andrew hume
-