NetNews Usenet Archive 1993 #1

Text File | 1993-01-07 | 5.9 KB | 120 lines

Newsgroups: comp.std.internat
Path: sparky!uunet!wupost!crcnis1.unl.edu!moe.ksu.ksu.edu!zaphod.mps.ohio-state.edu!saimiri.primate.wisc.edu!ames!agate!dog.ee.lbl.gov!hellgate.utah.edu!fcom.cc.utah.edu!cs.weber.edu!terry
From: terry@cs.weber.edu (A Wizard of Earth C)
Subject: Re: An alternative I18N paradigm
Message-ID: <1993Jan7.054124.14059@fcom.cc.utah.edu>
Sender: news@fcom.cc.utah.edu
Organization: Weber State University (Ogden, UT)
References: <1hncs1INN1qq@corax.udac.uu.se> <DAN.92Dec29102634@dan.watson.ibm.com> <1hu9v9INN923@life.ai.mit.edu>
Date: Thu, 7 Jan 93 05:41:24 GMT
Lines: 108

Vadim dragged me into this group by referencing an article I posted only
in comp.unix.bsd in a "mass-rebuttal" posted here. I might as well attempt
to contribute.

In article <1hu9v9INN923@life.ai.mit.edu> glenn@wheat-chex.ai.mit.edu (Glenn A. Adams) writes:
>The real problem, at least as I see it, is that the locale model
>doesn't distinguish between the consumer of information and the
>producer of information. It naively assumes that an individual
>end user must choose the manner in which (linguistically and
>culturally sensitive) information is to be presented, and that
>this choice can be determined by one fixed value parameter (i.e.,
>the locale setting).

I agree. A pure locale model is unacceptable for multinationalization
(use in a multilingual environment). This is the problem with "code pages"
in DOS. I don't think that a "pure" locale is required, though.

[ ... ]

>However, if the producer is another individual, then
>the locale probably should be ignored, and information contained in the
>text itself should be consulted about matters such as character set encoding,
>font(s), language tag(s), or other presentation information. In this case,
>the locale may be of help only in the case that such explicit information
>(encoding, font, etc.) is absent, and here, it may produce complete garbage
>if it makes the wrong assumptions.

Agreed; for locale to be useful in a multinational implementation, a
compounding mechanism is necessary, potentially combined with some method
of per-document attribution for the majority case of monolingual documents.

>
>One may ask where 10646/Unicode fits into all of this? It provides only
>a small part of the solution; namely, a single universal character set
>encoding rather than many non-universal encodings. Of course, one might
>propose to solve a small part of the problem by using 10646: use 10646
>for all encoded text. However, as has been pointed out by Ohta-san and
>others, this doesn't solve many other problems, e.g., whether to use a
>Chinese or a Japanese font to display a given Han character. So more
>is needed still.

Right; what is needed are the above-mentioned out-of-band (outside of
Unicode) mechanisms for achieving multinationalization, or some similar
methods, such as those used in the ISO 2022 standard, but with the
advantages of Unicode by virtue of being layered on top of Unicode.

I see localized monolingual as the majority case, and multilingual as the
minority case. Because name space information is not locale specific, you
can't have a "/home" for an English user on the system and a "/casa" for
the Spanish user. This means that an actual multinational system which is
not partitioned by language (chroot with hard links or a translucent
"namespace mount"?) must have the ability to display the character sets of
all the languages in use on the system at any time, if only for consistent
operation of "ls" in shared or publicly examinable directories. This is a
heavy burden for minority use to carry.

I further believe it to be a long way from the acceptance of a greater
than 16-bit font to the ability of, for instance, the X system to support
2^20 character glyphs downloaded to the terminal (20 bits has been
consistently used by "Ohta-san" (correctly, the honorific "san" should be
on the surname, and I am not sure if he has chosen a westernized ordering
on his name; you may be using the equivalent of "Glenn-san"... but I
digress 8-)) as a working approximation for his idea of the size
requirements of a non-unified set containing the non-intersecting
"unification" of all existing sets).

>I, for one, do not believe that 10646 will become universally used
>in a fortnight. Local and other standard encodings will continue to
>exist, probably forever. So we need to start doing one thing very
>quickly: tagging character data as to its encoding. Another thing we
>need to do, is add language (or writing system) tags to texts which
>mix multiple languages. Alternatively, this could be done by tagging
>font runs and then associating languages with those runs [I do not
>advocate this method - I prefer explicit language or writing system
>tags]. Other kinds of tags might be necessary for certain types
>of processing, e.g., yomi (phonetic reading) tags for allowing the
>display of furugana, sort keys for allowing producer specified sorting
>behavior, and so forth.

Yes!

>10646 will not even address any of these matters. However, Unicode may
>do so in the form of implementation guidelines or further work on I18N.

Yes!

>Nonetheless, I think that it is quite important for many parties to begin
>implementing such systems so that development of standard tags and tagging
>systems can proceed. We need prior art and experience in these areas
>before effective standards can be developed with a reasonable hope of
>success.

Yes! Thank you for your succinct statement of the issues!

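To make the tagging idea above concrete, here is a minimal sketch in Python of text runs that carry producer-supplied encoding and language tags, with the consumer's locale consulted only when a tag is absent. Everything named here (TaggedRun, FONTS, decode_run, and the font names) is hypothetical and invented for illustration; it is not taken from any standard or from the articles quoted above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class TaggedRun:
    """A run of bytes plus optional producer-supplied tags (hypothetical layout)."""
    data: bytes
    encoding: Optional[str] = None  # e.g. "euc_jp"; None means "not tagged"
    language: Optional[str] = None  # e.g. "ja" or "zh"; None means "not tagged"


# Hypothetical font table: language tag -> font name (names are made up).
FONTS = {"ja": "japanese-font", "zh": "chinese-font"}


def decode_run(run: TaggedRun, locale_encoding: str,
               locale_language: str) -> Tuple[str, str, str]:
    """Prefer the producer's tags; fall back to the consumer's locale."""
    encoding = run.encoding or locale_encoding
    language = run.language or locale_language
    text = run.data.decode(encoding)
    # The language tag, not the character code, picks the font. This is
    # the Chinese-versus-Japanese font question for a given Han character.
    font = FONTS.get(language, "default-font")
    return text, language, font


if __name__ == "__main__":
    han = "漢".encode("euc_jp")
    # Tagged by the producer: the consumer's locale is ignored.
    print(decode_run(TaggedRun(han, "euc_jp", "ja"), "iso-8859-1", "en"))
    # Untagged: the locale guess is applied and the text comes out wrong.
    print(decode_run(TaggedRun(han), "iso-8859-1", "en"))
```

The second call shows the failure mode described in the quoted text: with no tags, the locale's assumptions are applied to someone else's data and the result is garbage rather than an error.
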
Terry Lambert
terry@icarus.weber.edu
terry_lambert@novell.com
---
Any opinions in this posting are my own and not those of
my present or previous employers.
--
-------------------------------------------------------------------------------
"I have an 8 user poetic license" - me
Get the 386bsd FAQ from agate.berkeley.edu:/pub/386BSD/386bsd-0.1/unofficial
-------------------------------------------------------------------------------