NetNews Usenet Archive 1993 #1

Internet Message Format | 1993-01-05 | 4.7 KB

Path: sparky!uunet!cis.ohio-state.edu!pacific.mps.ohio-state.edu!linac!att!att!allegra!alice!andrew
From: andrew@alice.att.com (Andrew Hume)
Newsgroups: comp.std.internat
Subject: Re: Data tagging (was: 8-bit representation, plus an X problem)
Message-ID: <24551@alice.att.com>
Date: 5 Jan 93 02:41:49 GMT
Article-I.D.: alice.24551
References: <24426@alice.att.com> <1gpruaINNhfm@frigate.doc.ic.ac.uk> <2606@titccy.cc.titech.ac.jp>
Organization: AT&T Bell Laboratories, Murray Hill NJ
Lines: 97

be warned that my comments here are not meant to be belligerent;
i am just trying to figure out the complaints.

In article <2606@titccy.cc.titech.ac.jp>, mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
~ In article <24494@alice.att.com> andrew@alice.att.com (Andrew Hume) writes:
~
~ > this is a fundamental and persistent misapprehension of
~ >10646. it was neither designed for, nor can it allow a system
~ >to display identical Han characters (same 10646 codepoint)
~ >in different fonts WITHOUT some other (non-10646) convention.
~
~ As it defines the same code point for corresponding Japanese/Chinese
~ characters, differentiation is impossible.

this is agreeing with me, right?

~ >10646 simply defines codepoints, allowing the unambiguous specification
~ >of the characters. if you want to set (or switch) fonts, then you
~ >have to invent some scheme to do it, as say ISO 2022 does
~ >(in a slightly clumsy way).
~ Then, why you use ISO 10646? Can't we use ISO 2022?

because 2022 is utterly hopeless. in fact, 2022 simply defines
how to interpret byte streams (7 or 8 bit) using special sequences
to load new character sets and to switch between them. to actually
know what characters are meant, you have to know what the special
escape sequences are AND know how that particular small character
set works. furthermore, this is now a stateful encoding, so you
can't seek or anything. just horrible. at least with 10646, i know
all the characters NOW, and there is no state switching.

~ > i claim NT has some convention in addition to the locale
~ >(perhaps, it is part of the locale's definition) so it can do this.
~ >but i don't know.
~
~ Then, how can I create a text file which contains both Chinese
~ and Japanese?

i said i didn't know; presumably, there are magic commands
(a la RTF) that change fonts.

~ >> Shouldn't we use error numbers internally and interpret it by local
~ >> programs?
~
~ > when the protocol was updated, this was taken out
~ >and errors are now just text strings (64 bytes i think).
~
~ Making everything English text is another fatal misdesign of Plan 9
~ as an internationalized OS.

damned if i know where i said the text was english. it happens
to be for us for the file server we have for our jukebox. we don't
legislate the language, just that it is 10646 characters (sorry, you
can't mix japanese and chinese within error messages).

~ > the bottom line is, when you connect to remote services,
~ >there is simply no way to know what they might report as errors.
~
~ If you are connecting to some server, you know the protocol you use
~ in which meanings on error numbers are contained.

just wrong. i attempt to open a file. an error is returned.
the protocol doesn't define nor restrict how that may have failed.
sure we can guess common ones ('file does not exist'); but what
about rare ones ("its bob's file and he's gone home")?

~ >the problem is that a file server, say, may have an arbitrary
~ >number of error conditions, many of which may not be known locally.
~
~ Or, how can the client program know the severity of the error? Should
~ client program retry on error? How many times? Or should the client
~ abort the current transaction and retry? Or should the client
~ abort the entire job?

good idea (not that it has anything to do with error numbers)!
by default, we assume that file servers are working or not.
but in general, it would be plausible for the file server to prefix
its error message with a conventional token indicating whether or not
to retry. again, this should be a file server dependent thing and
therefore cannot be tied to any particular set of error numbers.

~ Even if the version of the server is updated and new error codes are added,
~ the server should know the version of the client and return old error code.
~ Or, how can the client program behave? Of course, for purely
~ informational purpose only, you can still report it, say, as "unknown
~ error # 42", which is of no help.

not sure what this means. in any case, returning text avoids
any synchronisation problems.

and just in case you were wondering why this is being pursued,
it shows why having international character sets isn't the same as i18n.
you have to deal with things like errno and so on.

	andrew hume
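
The idea of prefixing an error string with a conventional retry token could look roughly like the Python sketch below. Everything in it is hypothetical: the token names `retry` and `fatal`, the `": "` separator, and the `parse_error` helper are invented for illustration and are not part of any Plan 9 protocol described in this post.

```python
from typing import Optional, Tuple

# Assumed convention (not from the post): "<token>: <free text>".
RETRY_TOKENS = {"retry": True, "fatal": False}


def parse_error(message: str) -> Tuple[Optional[bool], str]:
    """Split a server error string into (retry hint, human-readable text).

    The hint is None when the server used no recognised token, in which
    case the whole string is passed through unchanged for display.
    """
    token, sep, rest = message.partition(": ")
    if sep and token in RETRY_TOKENS:
        return RETRY_TOKENS[token], rest
    return None, message


print(parse_error("retry: server busy"))
print(parse_error("its bob's file and he's gone home"))
```

A client that has never heard of a particular failure can still show the text to the user, and only the small set of tokens has to be agreed between client and server, rather than a full table of error numbers.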