- Newsgroups: comp.std.internat
- Path: sparky!uunet!spool.mu.edu!agate!dog.ee.lbl.gov!hellgate.utah.edu!fcom.cc.utah.edu!cs.weber.edu!terry
- From: terry@cs.weber.edu (A Wizard of Earth C)
- Subject: Re: Dumb Americans (was INTERNATIONALIZATION: JAPAN, FAR EAST)
- Message-ID: <1993Jan11.193710.29580@fcom.cc.utah.edu>
- Keywords: Unicode ISO10646 CharacterEncoding
- Sender: news@fcom.cc.utah.edu
- Organization: Weber State University (Ogden, UT)
- References: <1in2c8INNmbj@life.ai.mit.edu> <1993Jan10.000115.28150@fcom.cc.utah.edu> <1ipo2kINN6g2@life.ai.mit.edu>
- Date: Mon, 11 Jan 93 19:37:10 GMT
- Lines: 110
-
- In article <1ipo2kINN6g2@life.ai.mit.edu> glenn@wheat-chex.ai.mit.edu (Glenn A. Adams) writes:
- >The FSS-UTF (UTF2), developed by Ken Thompson, and used in Plan9, had
- >a different set of goals which were established by their usage requirements.
- >I would not claim that UTF2 is inappropriate given the goals articulated
- >by the Plan9 designers; however, I do not believe one should take
- >their goals to be those of the Unicode/10646 community at large. Most
- >Unicode/10646 implementors that I'm aware of, tend to use 10646 UCS2
- >(Unicode) fixed width 16-bit encodings only, for both processing and
- >interchange.
-
- I definitely agree with this approach for processing. I *don't* agree
- for interchange, if that interchange takes the form of a non-internationalized
- or 8-bit internationalized (with an 8-bit or smaller character set) NFS
- client of an internationalized system. Storage should be locale specific,
- both for interoperability and so that non-international users do not pay
- the higher storage price of raw encoding; the storage argument holds even
- if the NFS argument is thrown out the window.
-
- >>In my opinion, UTF and other species of what I have been calling "Runic
- >>encoding" are inappropriate for storage of non per-language attributable
- >>data.
- >
- >I think you may be misusing the term as Ken Thompson used it (I believe)
- >to refer to fixed width 16-bit 10646 UCS2 (Unicode) encodings, and not to
- >the variable width UTF encodings. The term "multibyte" encoding as
- >currently used in the Posix, X/Open, and Unix communities tends to be
- >used for what is equivalent to UTF variable length multibyte encodings.
-
- This is indeed what I meant. A variable number of bytes per character
- is nearly intolerable.
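
To make the complaint concrete, here is a minimal sketch comparing per-character widths. It assumes Python's "utf-8" codec as a stand-in for FSS-UTF (UTF2), and "utf-16-be" as a stand-in for fixed-width UCS2, which holds only for characters in the 16-bit range; the function name encoded_widths is made up for illustration.

```python
def encoded_widths(text):
    """Return (char, utf8_len, ucs2_len) for each character in text.

    Assumption: "utf-8" stands in for FSS-UTF and "utf-16-be" for UCS2;
    this is only valid for characters that fit in 16 bits.
    """
    rows = []
    for ch in text:
        utf8_len = len(ch.encode("utf-8"))      # varies with the code point
        ucs2_len = len(ch.encode("utf-16-be"))  # always 2 in the 16-bit range
        rows.append((ch, utf8_len, ucs2_len))
    return rows


# ASCII "A", Latin-1 e-acute, and a CJK ideograph
for ch, u8, u16 in encoded_widths("A\u00e9\u65e5"):
    print(f"U+{ord(ch):04X}: UTF {u8} byte(s), UCS2 {u16} bytes")
```

The ASCII user pays one byte per character in UTF, while the CJK user pays three; the fixed-width form charges everyone the same two.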
-
- >>This [language attribution] *must* be in-band for multilingual Unicode
- >>documents if we are to overcome the (I believe reasonable) objection to
- >>the lack of information for display localization.
- >
- >No, this is incorrect. Language attribution *is not* necessary for
- >the legible display of any multilingual Unicode data. That is, unless
- >you consider that font attribution is necessary for display. Neither
- >Ohta-san's claims nor Vadim's claims have shown that language attributes
- >are necessary to perform legible display. If you (or anyone else) can
- >demonstrably show this to be the case, then I welcome you to do it.
- >Otherwise, you can take my word (having implemented a Unicode rendering
- >engine) that language attributes are not necessary.
-
- You misunderstand the basis of the objections. The objections are not
- made on the basis of legibility, but rather on the [apparent] imposition
- of cultural imperialism on those languages undergoing unification. The
- point is esthetic in many cases, rather than technical.
-
- I can say that an English text mixing normal, italic, and bold characters
- will "look like hell" when printed in a single font. The point of
- language attribution is to make font selection possible. In a monolingual
- document, the locale information (a la file attribute or per system) is
- sufficient to provide the rendering clues; a multilingual document is
- a compound document.
-
- It *is* possible to separate out language attribution on a per-file basis
- for any compound (multilingual) document by splitting the document into
- one file per language. This is unacceptable if the file is, for instance,
- a document like Roland Lange's "Japanese Verbs", with drastically mixed
- text. The issue is not that the text is mixed in and of itself, but that
- the text is closely interspersed, and that it isn't practical to separate
- it into files by language, with a document description file to do the
- compounding.
-
- Consider the worst case scenario: a Japanese text telling how to write
- Chinese characters.
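
A rough sketch of what in-band attribution buys the renderer in that case. Everything here is hypothetical: the Run record, the language tags, and the font names are placeholders for illustration, not part of Unicode, 10646, or any real rendering engine.

```python
from dataclasses import dataclass


@dataclass
class Run:
    lang: str  # language tag carried in-band with the text (hypothetical scheme)
    text: str


# Hypothetical font table; the font names are placeholders.
FONTS = {"ja": "ExampleMincho", "zh": "ExampleSong", "en": "ExampleRoman"}


def pick_fonts(runs, default="ExampleRoman"):
    """Pair each run with the font chosen for its language tag."""
    return [(FONTS.get(run.lang, default), run.text) for run in runs]


# The same unified code point, U+76F4, tagged once as Japanese and once as
# Chinese; without the tag the renderer cannot tell the two runs apart.
doc = [Run("ja", "\u76f4"), Run("zh", "\u76f4")]
for font, text in pick_fonts(doc):
    print(font, text)
```

Strip the lang field and both runs are the identical code point, so font selection has nothing to go on.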
-
- >I agree that language attribution is desirable for typographically
- >correct display; however, so are font and style attributes.
-
- Agreed; and a good argument could be made that this is indeed a thin line.
-
- >>The destruction of information, like record boundaries or an arithmetic
- >>relationship between the character count in a document and the number of
- >>bytes in a file, is unavoidable in a Unicode implementation (10646 can
- >>escape this by implementing Vadim or Ohta's "super character set" elsewhere
- >>in the vastly larger space provided by the 32 bit space).
- >
- >No, this is incorrect:
- >
- > byte_count = unicode_char_count * sizeof (unsigned short)
- >
- >Your claim is only true in the context of UTF(s), not Unicode (or 10646)
- >in general.
-
- Right. Again, UTF encoding was what I was referring to.
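
A minimal sketch of the difference, again assuming Python's "utf-8" codec as a stand-in for UTF2 and "utf-16-be" for 16-bit UCS2 (the function names are made up for illustration). With the fixed-width form the character count falls out of the byte count by division; with UTF the data has to be scanned.

```python
def ucs2_char_count(data: bytes) -> int:
    """Fixed width: character count is byte count divided by two."""
    if len(data) % 2:
        raise ValueError("UCS2 data must have an even number of bytes")
    return len(data) // 2


def utf_char_count(data: bytes) -> int:
    """Variable width: scan and count every byte that is not a
    continuation byte (continuation bytes look like 10xxxxxx)."""
    return sum(1 for b in data if (b & 0xC0) != 0x80)


text = "\u65e5\u672c\u8a9e verbs"  # three CJK characters, a space, "verbs"
ucs2 = text.encode("utf-16-be")
utf = text.encode("utf-8")

print(len(ucs2), ucs2_char_count(ucs2))  # byte count is exactly 2 * characters
print(len(utf), utf_char_count(utf))     # no fixed ratio; found only by scanning
```

The same reasoning applies to seeking: the Nth character sits at byte offset 2 * N in the fixed-width file, but can only be found by walking the UTF file from the start.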
-
- >>To my mind, the way UTF is formulated seems like a buy-off of the 7-bit
- >>ASCII world, with little benefit to non (or only partial) 7-bit ASCII
- >>users.
- >
- >UTF2 buys significant software backward compatibility with many 8-bit
- >clean for ASCII only applications. I believe that was its design goal,
- >so it can't be faulted on that basis.
-
- Perhaps not, but there are cleaner mechanisms which buy backward
- compatibility with character sets other than simply 7-bit US ASCII.
-
-
- Terry Lambert
- terry@icarus.weber.edu
- terry_lambert@novell.com
- ---
- Any opinions in this posting are my own and not those of my present
- or previous employers.
- --
- -------------------------------------------------------------------------------
- "I have an 8 user poetic license" - me
- Get the 386bsd FAQ from agate.berkeley.edu:/pub/386BSD/386bsd-0.1/unofficial
- -------------------------------------------------------------------------------
-