NetNews Usenet Archive 1993 #1

Text File | 1993-01-07 | 5.2 KB | 103 lines

Newsgroups: comp.std.internat
Path: sparky!uunet!spool.mu.edu!agate!dog.ee.lbl.gov!hellgate.utah.edu!fcom.cc.utah.edu!cs.weber.edu!terry
From: terry@cs.weber.edu (A Wizard of Earth C)
Subject: Re: INTERNATIONALIZATION: JAPAN, FAR EAST
Message-ID: <1993Jan7.063116.14846@fcom.cc.utah.edu>
Keywords: Han Kanji Katakana Hirugana ISO10646 Unicode Codepages
Sender: news@fcom.cc.utah.edu
Organization: Weber State University (Ogden, UT)
References: <1hu9v5INNbp1@rodan.UU.NET> <8490@charon.cwi.nl> <1hvu79INN4qf@rodan.UU.NET>
Date: Thu, 7 Jan 93 06:31:16 GMT
Lines: 90

In article <1hvu79INN4qf@rodan.UU.NET> avg@rodan.UU.NET (Vadim Antonov) writes:
>The idea of visual encoding (and one letter-onr glyph is nothing more
>than a compressed image of the text) is simply wrong because it
>drops valuable information readily available at the point of the CREATION
>of the text but not later.

Multiple potential sort orders give the lie to this argument. Several
languages have been mentioned (e.g., German) that have more than one
possible sort order. Let's call a spade a spade -- collation sequences.

The collation sequences for Japanese, for instance, vary with
pronunciation. If it were the intent of Unicode to provide a unified
collation mechanism, this would be a very strong argument against
Chinese/Japanese unification. Luckily, this was not a goal.

The non-intersecting unification of all sets, such as that suggested by
yourself and Ohta, is not supported by this argument unless you provide
one character set per collation sequence. This is folly for single
languages with multiple collation sequences. If one is to maintain
multiple collating sequences for one font, why not all fonts?

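As a minimal sketch of the point about multiple collating sequences within
one language (an illustration, not anything from the standards under
discussion): the same encoded strings can be ordered two ways by swapping
only the collation table, with no change to the character set. The tables
DICTIONARY and PHONEBOOK and the helpers collation_key and sort_words are
hypothetical names, and the mappings are heavily simplified versions of the
German dictionary and phone-book conventions.

```python
# Two simplified, hypothetical collation tables over the same code points.
# Dictionary-style ordering treats an umlaut as its base vowel;
# phone-book-style ordering expands it to base vowel + "e".
DICTIONARY = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})
PHONEBOOK = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def collation_key(word, table):
    """Map a word to the string that is actually compared when sorting."""
    return word.lower().translate(table)


def sort_words(words, table):
    """Sort words under the collating sequence described by `table`.

    sorted() is stable, so words with equal keys keep their input order.
    """
    return sorted(words, key=lambda w: collation_key(w, table))


words = ["Müller", "Mugler", "Mueller"]

dictionary_order = sort_words(words, DICTIONARY)
# -> ['Mueller', 'Mugler', 'Müller']

phonebook_order = sort_words(words, PHONEBOOK)
# -> ['Müller', 'Mueller', 'Mugler']
```

The encoding of the text is identical in both calls; only the off-text
choice of collating sequence differs, which is why one character set per
sort order does not follow from the sorting argument.
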
>Sure, the information can (*must* if you're
>going to do trivial things like sorting or case-insensitive comparisons)
>be preserved off-text (in mail headers or in file attributes, for
>example) but it effectively defeats the very purpose of ISO10646 --
>why on the Earth do i need to spare bits for encoding glyphs if
>i already know the language and 8 (or 16 for oriental languages) bits
>is quite enough to map the alphabet. Don't you see this gap in
>the logic nullifying all benefits of 10646?

I don't see how this nullifies the benefits of Unicode (for which you
seem to be using 10646 as a synonym, given that Unicode is the only
codified portion).

First, Unicode is not the sole definition of 10646; it is just the only
currently defined character set within 10646. There is no reason to
throw out 10646 because of Unicode (although I could make an argument
for 32 bits being a nifty reason for doing so).

Second, Unicode buys more than simply another character set; it buys the
ability to produce non-conflicting monolingual localizations of software
systems (as opposed to conflicting ones that result from a lack of
coordination with existing standards). It also buys a platform for
non-conflicting multinationalization (multilingual data processing),
given a means of compounding documents by language/locale (there may be
more than one locale per language). Admittedly, this is not as elegant
as a unified glyph set for all languages, but it does charge the penalty
to the multilingual (minority) user rather than to the
localized-monolingual (majority) user.

>10646 was meant as an encoding eliminating the necessity to carry off-text
>information (which is not a piece of cake, especially in multi-lingual
>texts). However, the "single glyph" approach ruined the very intent
>because you need the off-text information to do trivial tasks anyway!
>What's the gain? More wasted bits, yeah?

By virtue of the "multiple collating sequences within a single language"
argument, the same holds true of your solution -- worse, there are
exception cases in your solution, while there is a potential uniform
implementation on top of Unicode.

The fact is, a multilingual word processor will have to present its
menus to the user, probably in his native language, by means of a locale
mechanism. If a "vi" style implementation is used (no explicit commands
if you ignore ":set" and ":map" and all OS escapes), there is still the
requirement of localization of error messages and keyboard input. There
is no divorcing the language from the application entirely, if the
application is one which operates on text as data.

>Take a life, guys. We in Russia did that mistake (DKOI and "GOST" encodings)
>many years ago and came to realize that this solution is too simple to
>be correct.

"The simplest explanation which fits the facts is the correct one"
	-- William of Occam

I don't see why every solution which resembles a past solution must be
incorrect by virtue of that resemblance (not that I am claiming a
familiarity with "DKOI" or "GOST", so the resemblance is one which you
see). The Unicode character set solves a number of long-standing
problems.


					Terry Lambert
					terry@icarus.weber.edu
					terry_lambert@novell.com
---
Any opinions in this posting are my own and not those of
my present or previous employers.
--
-------------------------------------------------------------------------------
                                        "I have an 8 user poetic license" - me
 Get the 386bsd FAQ from agate.berkeley.edu:/pub/386BSD/386bsd-0.1/unofficial
-------------------------------------------------------------------------------