- Newsgroups: comp.std.internat
- Path: sparky!uunet!wupost!crcnis1.unl.edu!moe.ksu.ksu.edu!zaphod.mps.ohio-state.edu!saimiri.primate.wisc.edu!ames!agate!dog.ee.lbl.gov!hellgate.utah.edu!fcom.cc.utah.edu!cs.weber.edu!terry
- From: terry@cs.weber.edu (A Wizard of Earth C)
- Subject: Re: An alternative I18N paradigm
- Message-ID: <1993Jan7.054124.14059@fcom.cc.utah.edu>
- Sender: news@fcom.cc.utah.edu
- Organization: Weber State University (Ogden, UT)
- References: <1hncs1INN1qq@corax.udac.uu.se> <DAN.92Dec29102634@dan.watson.ibm.com> <1hu9v9INN923@life.ai.mit.edu>
- Date: Thu, 7 Jan 93 05:41:24 GMT
- Lines: 108
-
- Vadim dragged me into this group by referencing an article I posted only in
- comp.unix.bsd in a "mass-rebuttal" posted here. I might as well attempt to
- contribute.
-
-
- In article <1hu9v9INN923@life.ai.mit.edu> glenn@wheat-chex.ai.mit.edu (Glenn A. Adams) writes:
- >The real problem, at least as I see it, is that the locale model
- >doesn't distinguish between the consumer of information and the
- >producer of information. It naively assumes that an individual
- >end user must choose the manner in which (linguistically and
- >culturally sensitive) information is to be presented, and that
- >this choice can be determined by one fixed value parameter (i.e.,
- >the locale setting).
-
- I agree. A pure locale model is unacceptable for multinationalization (use
- in a multilingual environment). This is the problem with "code pages" in
- DOS. I don't think that a "pure" locale is required, though.
-
- [ ... ]
- >However, if the producer is another individual, then
- >the locale probably should be ignored, and information contained in the
- >text itself should be consulted about matters such as character set encoding,
- >font(s), language tag(s), or other presentation information. In this case,
- >the locale may be of help only in the case that such explicit information
- >(encoding, font, etc.) is absent, and here, it may produce complete garbage
- >if it makes the wrong assumptions.
-
- Agreed. To make a multinational implementation of locale useful, a
- compounding mechanism is necessary, potentially combined with some method
- of per-document attribution for the majority case of monolingual documents.
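A minimal sketch of the fallback being discussed: use the attribution carried by the document when there is one, and fall back to the locale only when there is none. The function name `decode_text`, its parameters, and the sample encodings are illustrative assumptions, not anything proposed in the thread.

```python
def decode_text(data: bytes, declared_encoding=None, locale_encoding="latin-1"):
    """Decode with the document's own tag if present, else the locale default."""
    # Assumption: the per-document tag is simply an encoding name.
    encoding = declared_encoding or locale_encoding
    # errors="replace" keeps the sketch from raising on undecodable bytes.
    return data.decode(encoding, errors="replace")

# Text produced by someone else, in an encoding the local locale does not use.
data = "мир".encode("koi8_r")

tagged = decode_text(data, declared_encoding="koi8_r")  # producer's tag honored
guessed = decode_text(data)                             # no tag: locale assumed
```

With the tag present, `tagged` recovers the original word; without it, `guessed` is the same three bytes read as Latin-1, which is the "complete garbage" case Glenn describes.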
- >
- >One may ask where 10646/Unicode fits into all of this? It provides only
- >a small part of the solution; namely, a single universal character set
- >encoding rather than many non-universal encodings. Of course, one might
- >propose to solve a small part of the problem by using 10646: use 10646
- >for all encoded text. However, as has been pointed out by Ohta-san and
- >others, this doesn't solve many other problems, e.g., whether to use a
- >Chinese or a Japanese font to display a given Han character. So more
- >is needed still.
-
- Right; what is needed is the above-mentioned out-of-band (outside Unicode)
- mechanisms for achieving multinationalization, or some similar methods,
- such as those used in the ISO 2022 standard, but with the advantages of
- Unicode by virtue of being layered on top of Unicode.
-
- I see localized monolingual as the majority case, and multilingual as
- the minority case. Because name space information is not locale specific,
- you can't have a "/home" for an English user on the system and a "/casa"
- for the Spanish user. This means that an actual multinational system which
- is not partitioned by language (chroot with hard links, or a translucent
- "namespace mount"?) must have the ability to display the character sets
- of all the languages in use on the system at any time, if only for
- consistent operation of "ls" in shared or publicly examinable directories.
- This is a heavy burden for minority use to carry.
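To make the "ls" burden concrete, here is a rough sketch that checks which names in a shared directory a display limited to one locale's character set could show. The function `displayable`, the sample names, and the use of Latin-1 as the local character set are all assumptions for illustration.

```python
def displayable(names, locale_encoding):
    """Map each name to True if the locale's character set can represent it."""
    result = {}
    for name in names:
        try:
            name.encode(locale_encoding)
            result[name] = True
        except UnicodeEncodeError:
            result[name] = False
    return result

# A shared directory with entries created by users of different languages.
names = ["home", "casa", "дом"]
shown = displayable(names, "latin-1")
```

Any entry that maps to False is one that a single-locale "ls" cannot list faithfully, which is why the shared name space forces every display to handle every character set in use.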
-
- I further believe it to be a long way between the acceptance of a greater
- than 16-bit font and the ability of, for instance, the X system to support
- 2^20 character glyphs downloaded to the terminal. (20 bits has been
- consistently used by "Ohta-san" as a working approximation for his idea of
- the size requirements of a non-unified set containing the non-intersecting
- "unification" of all existing sets. Correctly, the honorific "san" should
- be on the surname, and I am not sure if he has chosen a westernized
- ordering on his name; you may be using the equivalent of "Glenn-san"... but
- I digress 8-).)
-
- >I, for one, do not believe that 10646 will become universally used
- >in a fortnight. Local and other standard encodings will continue to
- >exist, probably forever. So we need to start doing one thing very
- >quickly: tagging character data as to its encoding. Another thing we
- >need to do, is add language (or writing system) tags to texts which
- >mix multiple languages. Alternatively, this could be done by tagging
- >font runs and then associating languages with those runs [I do not
- >advocate this method - I prefer explicit language or writing system
- >tags]. Other kinds of tags might be necessary for certain types
- >of processing, e.g., yomi (phonetic reading) tags for allowing the
- >display of furigana, sort keys for allowing producer specified sorting
- >behavior, and so forth.
-
- Yes!
-
- >10646 will not even address any of these matters. However, Unicode may
- >do so in the form of implementation guidelines or further work on I18N.
-
- Yes!
-
- >Nonetheless, I think that it is quite important for many parties to begin
- >implementing such systems so that development of standard tags and tagging
- >systems can proceed. We need prior art and experience in these areas
- >before effective standards can be developed with a reasonable hope of
- >success.
-
- Yes!
-
- Thank you for your succinct statement of the issues!
-
-
- Terry Lambert
- terry@icarus.weber.edu
- terry_lambert@novell.com
- ---
- Any opinions in this posting are my own and not those of my present
- or previous employers.
- --
- -------------------------------------------------------------------------------
- "I have an 8 user poetic license" - me
- Get the 386bsd FAQ from agate.berkeley.edu:/pub/386BSD/386bsd-0.1/unofficial
- -------------------------------------------------------------------------------
-