- Newsgroups: comp.unix.bsd
- Path: sparky!uunet!cs.utexas.edu!wupost!gumby!destroyer!gatech!news.byu.edu!ux1!fcom.cc.utah.edu!cs.weber.edu!terry
- From: terry@cs.weber.edu (A Wizard of Earth C)
- Subject: Re: Ohta enpitsu inke desu
- Message-ID: <1993Jan7.222423.899@fcom.cc.utah.edu>
- Keywords: Han Kanji Katakana Hirugana ISO10646 Unicode Codepages
- Sender: news@fcom.cc.utah.edu
- Organization: Weber State University (Ogden, UT)
- References: <2628@titccy.cc.titech.ac.jp> <1993Jan7.045612.13244@fcom.cc.utah.edu> <2637@titccy.cc.titech.ac.jp>
- Date: Thu, 7 Jan 93 22:24:23 GMT
- Lines: 313
-
- In article <2637@titccy.cc.titech.ac.jp> mohta@necom830.cc.titech.ac.jp (Masataka Ohta) writes:
- >In article <1993Jan7.045612.13244@fcom.cc.utah.edu>
- > terry@cs.weber.edu (A Wizard of Earth C) writes:
- >
- >>Before I proceed, I will [ once again ] remove the "dumb Americans" from my
- >>original topic line.
- >
- >I changed the subject to reflect the content better.
-
- I changed the subject as you did, to be insulting rather than useful. My,
- aren't we resolving these issues wonderfully. I expect you to change the
- topic again, but hopefully to something germane to the content of the
- posting rather than another insult.
-
- >>>>>>This I don't understand. The maximum translation table from one 16
- >>>>>>bit value to another is 16k.
- >>>>>WHAAAAT? It's 128KB, not 16k.
- >>It is still a translation of one 16 bit value to another. It is *not* an
- >>*arbitrary* translation we are talking about, since the spanning sets will
- >>be known.
- >You wrote MAXIMUM.
-
- I was referring to a spanning set smaller than the full 16 bit set. To anyone
- who took these sentences in context (you, obviously, did not, so as to
- manufacture a situation in which you could disagree with me), it is
- obvious that by "maximum" I was referring to "maximum translation table
- spanning set", not "maximum translation table for the full 16 bit set".
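-
- For concreteness, the arithmetic behind the two figures can be sketched
- in C. The helper name table_bytes is hypothetical, and the 8192-entry
- spanning set in the note below is only an assumption chosen to show what
- a 16 KB table would hold; no entry counts are given in the quoted posts.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes occupied by a flat lookup table with `entries` slots,
 * each `entry_size` bytes wide. (Hypothetical helper.) */
size_t table_bytes(size_t entries, size_t entry_size)
{
    return entries * entry_size;
}
```

- With this, table_bytes(65536, sizeof(uint16_t)) is 131072 bytes (128 KB)
- for the full 16 bit set, while a spanning set of 8192 characters would
- need table_bytes(8192, sizeof(uint16_t)), which is 16384 bytes (16 KB).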
-
- Keep this up, and I will be openly insulting.
-
- >>>>>>This means 2 16k tables for translation into/out of
- >>>>>>Unicode for Input/Output devices,
- >>Sorry; I misspoke (mistyped?) here.
- >
- >You are dumb.
-
- I can speak just fine, thank you, not that a handicap of that nature, were
- I to have it, would make me any less of a person.
-
- Nice of you to remove the context here, as well. The statement "you are
- dumb" is, of course, a brilliant rebuttal of my point, which follows:
-
- >>I meant to refer to any arbitrary 8-bit
- >>set for which a localization set is available (example: an ISO 8859-x set).
- >
- >Do you know what HASHING is? If not, read Knuth.
-
- Hashing involves a loss of information (read Knuth yourself). I was not
- suggesting that information be destroyed in the mapping process (as you
- apparently wish would happen, since it would invalidate my argument). The
- translation would be from 16-bit Unicode through a 16-to-8-bit spanning
- table to a specific 8-bit ISO character set. This is *not* hashing.
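-
- A minimal sketch of such a spanning table, assuming ISO 8859-5 as the
- target set and covering only ASCII plus the basic Cyrillic block (bytes
- 0xB0-0xEF, which correspond to U+0410-U+044F). The function names and the
- -1 "no mapping" return value are hypothetical. Every character inside the
- set round-trips exactly, and anything outside it is reported rather than
- folded onto some other value, which is the difference from hashing.

```c
#include <stdint.h>

#define UNMAPPED_UCS 0xFFFFu  /* U+FFFF is a noncharacter; used as a marker */

static uint16_t to_ucs_table[256];     /* 8-bit set -> Unicode (512 bytes) */
static uint8_t  from_ucs_table[65536]; /* Unicode -> 8-bit set (64 KB)     */

/* Fill both tables from the partial ISO 8859-5 layout described above. */
void init_tables(void)
{
    unsigned b;

    for (b = 0; b < 256; b++) {
        uint16_t u = UNMAPPED_UCS;

        if (b < 0x80)
            u = (uint16_t)b;                      /* ASCII maps to itself */
        else if (b >= 0xB0 && b <= 0xEF)
            u = (uint16_t)(0x0410 + (b - 0xB0));  /* basic Cyrillic block */

        to_ucs_table[b] = u;
        if (u != UNMAPPED_UCS)
            from_ucs_table[u] = (uint8_t)b;
    }
}

/* Returns the 8-bit code for ucs, or -1 if ucs is outside the set. */
int ucs_to_8bit(uint16_t ucs)
{
    if (ucs == 0)
        return 0;  /* NUL is the only code point that maps to byte 0 */
    return from_ucs_table[ucs] != 0 ? from_ucs_table[ucs] : -1;
}

/* Returns the Unicode value for byte, or -1 if the byte is unassigned here. */
long eightbit_to_ucs(uint8_t byte)
{
    return to_ucs_table[byte] != UNMAPPED_UCS ? (long)to_ucs_table[byte] : -1;
}
```

- After init_tables(), ucs_to_8bit(0x0416) gives 0xB6 and
- eightbit_to_ucs(0xB6) gives 0x0416 back, while a code point outside the
- set, such as U+3042, yields -1 instead of colliding with a valid byte.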
-
- >>Obviously, by this response, you meant "cat two files to a third file" rather
- >>than what you stated,
- >
- >You don't have to create a third file, as the output might be piped.
- >
- >>what you stated, which would have resulted in the files going to the
- >>screen. Display device attribution based on supported character
- >
- >While you may not know UNIX at all, "cat" has nothing to do with display.
- >Instead, some device drivers and terminal emulators might.
-
- EXCUSE ME, BUT YOUR ORIGINAL STATEMENT WAS:
- ] How can you "cat" two files with different file attributes?
-
- To which I replied:
-
- ] By localizing the display output mechanism.
-
- Since you did not suggest that the output would be other than the default
- for cat, I made the mistake of taking your words to mean what they meant.
- This you intentionally misinterpreted as:
-
- ] Wow! Apparently he thinks "cat" is a command to display content of
- ] files. No wonder he think file attributes good.
-
- DO YOU DENY THIS?
-
- From that was derived the quoted (">") section just above.
-
- Anyone with half a brain knows that cat can be used to display files, that
- the default output of cat is to fd 1 (stdout), and that by the phrasing
- `you "cat" two files` you implied with "you" that I would do it
- personally rather than as part of a script. Further, stdout in an
- interactive environment is attached to a device driver for a tty or a pty
- -- a display device. You, of all people, are exactly qualified to know
- this.
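-
- The point about fd 1 can be checked from a program. This is a small
- sketch using the standard isatty() call; the wrapper name fd_is_display
- is hypothetical.

```c
#include <unistd.h>

/* Returns 1 if fd is attached to a terminal device (a tty or a pty),
 * and 0 if it is a file, a pipe, or not an open descriptor at all. */
int fd_is_display(int fd)
{
    return isatty(fd) ? 1 : 0;
}
```

- Run interactively, fd_is_display(STDOUT_FILENO) would be expected to
- return 1; once the output is redirected to a file or piped to another
- command it returns 0, which is the distinction being argued over.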
-
- >>Obviously what you are asking is "how do I make two monolingual/bilingual/
- >>multilingual files of different language attribution into a single bilingual/
- >>multilingual file using cat" -- not the question as you have phrased it, nor
- >>as I have answered it, but in the context of the discussion, clearly the
- >>intended tack.
- >
- >"How to "cat" files with different attributes" is the classic question
- >to piss off attribute-lovers, which all UNIX lovers know.
-
- It didn't piss me off; I answered it in good faith, and provided a workable
- solution that you could even call "cat" if you wanted.
-
- Yes, it introduced a case where multiple output streams combined to
- produce its input failed; but it worked in all other cases. We can
- name it "cat" instead of "combine" if we choose to say that this is a
- case where the behaviour is undefined. This is exactly analogous to
- the ANSI C standard changing expected behaviour to undefined behaviour
- for things like memcpy() to overlapping areas, or similar changes to
- the action of system calls under Posix. I do not see you claiming
- that ANSI C is not C or that a Posix compliant UNIX is not UNIX. My
- redefinition of "cat" stands as a potential solution to your attribute
- problems.
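-
- To illustrate the memcpy() analogy: ANSI C leaves a copy between
- overlapping areas undefined for memcpy() and gives defined behaviour
- only to memmove(). A short sketch follows; the function name
- shift_right_by_one is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Shift the first n bytes of buf one position to the right.
 * Source and destination overlap, so memcpy() would be undefined
 * here; memmove() is the call whose behaviour is specified.
 * The caller must make sure buf has room for n + 1 bytes. */
void shift_right_by_one(char *buf, size_t n)
{
    memmove(buf + 1, buf, n);
}
```

- The language is no less C for declaring the overlapping memcpy() case
- out of bounds; the program simply has to use the operation that is
- defined for it.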
-
- If there is not a default attribution of files and a default attribution
- of all files below a mount point where the mount goes remote via NFS to
- an older system, how do you propose to deal with use of non-international
- files on an internationalized system? You may cop out for 7 bit US ASCII,
- but whatever your answer, it damn well doesn't hold for existing files
- from 8-bit clean internationalizations in Western Europe, Russia, and
- elsewhere where small glyph-set character sets are currently in use --
- or would you have us all update all our systems and all our software
- simultaneously?
-
- The reverse case of a non-internationalized system mounting an exported
- file system from an internationalized system applies here as well. How
- do you propose to solve this problem with a character set containing
- nonintersecting (non-unified) national character sets?
-
- Obviously, you will make a snide comment about me rather than answering the
- questions, unless you choose to take the tack that there are bound to
- be incompatibilities with existing software (the same answer you berate
- me for here).
-
- >Of course, there are several other reasons why not to use file attributes,
- >which you don't know. But, I'm tired.
- >
- >>Rather than pretending I don't know what you are getting at,
- >
- >Then, don't post anymore.
-
- I should have pretended that I didn't know you were attempting to disguise
- the '"cat" of attributed files' problem and let you work up to it over
- the period of a week? Seems like sour grapes on your part.
-
- >>The answer is "you don't use 'cat'". The "cat" command does not deal with
- >
- >OK, say it in comp.unix.misc and see what happens.
-
- I will, provided I don't delete the context from this (as you did) and
- state that the "cat" command can be replaced with the "combine" command,
- and that the "combine" command can be renamed to "cat" as long as you
- don't construct wildly pessimistic code, an example of which I pointed
- out. I am well aware of drawbacks in suggestions I make, and, unlike you,
- I not only admit them, but point them out. Did it occur to you that
- criticism is necessary only to draw attention to a flaw, and if the
- originator of the flaw admits it, perhaps they are looking for
- suggestions rather than someone parroting their own words back to them?
-
- >>What this means is that all files which are multilingual in nature require
- >>a compound document architecture.
- >
- >No thank you. I do want to grep my multilingual files.
-
- Grep for "macro". It's a Latin word used in many, many western languages;
- tell me: how will this match "macro" in Latin, German, and English when
- the character sets are not unified? Is your "grep" going to unify
- internally? How does your suggestion (which does not have a standard
- codified for it) resolve this issue?
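-
- To make the question concrete, here is a sketch under a purely
- hypothetical encoding: each character is a 32 bit value whose upper 16
- bits carry a language tag and whose lower 16 bits carry the character.
- This layout, the tag numbers, and the names tag_string and find_tagged
- are all assumptions for illustration; they are not taken from any
- proposal or standard.

```c
#include <stddef.h>
#include <stdint.h>

#define CHAR_MASK 0x0000FFFFu  /* keeps the character, drops the language tag */
#define FULL_MASK 0xFFFFFFFFu  /* compares the language tag as well           */

/* Encode an ASCII string under the hypothetical tagged layout.
 * out must have room for one element per character of s. */
void tag_string(const char *s, uint16_t lang, uint32_t *out)
{
    size_t i;

    for (i = 0; s[i] != '\0'; i++)
        out[i] = ((uint32_t)lang << 16) | (unsigned char)s[i];
}

/* Index of the first occurrence of pat in text, comparing only the
 * bits selected by mask, or -1 if there is no match. */
long find_tagged(const uint32_t *text, size_t tlen,
                 const uint32_t *pat, size_t plen, uint32_t mask)
{
    size_t i, j;

    if (plen == 0)
        return 0;
    for (i = 0; i + plen <= tlen; i++) {
        for (j = 0; j < plen; j++)
            if (((text[i + j] ^ pat[j]) & mask) != 0)
                break;
        if (j == plen)
            return (long)i;
    }
    return -1;
}
```

- If "macro" is stored with one language tag and searched for with
- another, the FULL_MASK comparison finds nothing; only the CHAR_MASK
- comparison matches, and that amounts to unifying inside grep.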
-
- >>What this means is that a utility to combine documents (let's call it
- >>"combine") must have the ability to either generate language attributed
- >>files (if the source files are all of a single language attribution) or
- >>our default compound document format (TBD).
- >
- >You are making simple problem unsolvable.
-
- You are taking information from my posting out of context to make it seem
- as if this were the case. This is not a technique of rational discourse;
- it is a sales job. Salesman.
-
- >>The correct approach is to note that since Unicode does not provide a
- >>mechanism directly for language attribution, and that file attribution
- >>is only a partial solution,
- >
- >So, the correct approach is not to use Unicode as it is.
-
- No, the correct approach is to use a full solution; one potential full
- solution is in my previous posting (following the comma in the
- "mysteriously" truncated half line above).
-
- >>What this means is that a utility to combine documents (let's call it
- >>"combine")
- >
- >Wow!
-
- "Wow" indeed, as you "cleverly" omit the fact that "combine" may be renamed
- to "cat" if we omit a single contrived and pessimistic use from the set
- of defined behaviours for "cat"... as I stated in my previous posting.
-
- >>Does this answer your "cat" question sufficiently?
- >
- >Congratulations! You are now prepared to accept the second question.
- >
- >Under internationalized environment, we often create a file with Japanese
- >name. At the same time,
- >
- > 1) we might have a file having Chinese name in the same directory.
- > 2) we might have a file having Chinese name in the different directory.
- > 3) the Japanese file's full pathname might contain Chinese at its
- > intermediate directory name.
- >
- >Could you design a replacement of "ls" for such a situation?
-
- Yes, no problem, since the name space information is not considered to
- be multilingual text in common usage, but rather it is considered to be
- name space information.
-
- Each name in a file is already tagged in the inode as to the nature of
- the language to be used within the inode (for monolingual documents).
- For documents which are *not* monolingual, the file name must have been
- entered in the context of a particular language-dependent input mechanism
- for the file to exist within the file system name space at all. Thus the
- language tagging of the file name itself is also derivable at creation time.
-
- This is only untrue if you are proposing the maintenance of multiple name
- spaces, one per language used on the machine. This is both at odds with
- your stated intent of minimizing the currently loaded font sets (a natural
- requirement of your expansion of the combined font size) and an unworkable
- solution in both Unicode and your suggested environment. This also has
- the ramification of mapping files into other than their creation name space
- at creation time, or, to save space taken up by directories, on first
- reference within a particular name space. Unless you have personally
- solved the machine translation problem, there are attributes which do
- not move from language to language in the file system name space itself,
- such as file names denoting ownership ("bobs.file") or the contents of the
- file ("QuartlySales.Q3.1991"). Thus there is nothing added in doing this
- which is unresolvable, unless it is also unresolvable in your suggested
- mechanism.
-
- The only possible argument is collating sequence, and we both know your
- proposed solution breaks down in languages with multiple possible
- collation sequences (i.e., German dictionary vs. phone book order). It
- requires an exception. There is no reason not to make the exception the
- rule, and provide routines for alpha sort and locale-specific tables
- for all languages, instead of just the exceptions. This solution is
- one that has been proposed for a Unicode-based environment as well.
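-
- A rough sketch of what making the exception the rule could look like
- for the German case, assuming ISO 8859-1 input and the usual two
- conventions as I understand them: dictionary order sorts an umlaut as
- its base vowel, phone book order sorts it as the base vowel followed by
- "e". The names german_order, make_key, and german_compare are
- hypothetical, and this is not a complete collation routine.

```c
#include <stddef.h>
#include <string.h>

enum german_order { ORDER_DICTIONARY, ORDER_PHONEBOOK };

/* Build a lower-case ASCII sort key from an ISO 8859-1 string.
 * Input that does not fit in key is silently truncated. */
void make_key(const unsigned char *s, enum german_order order,
              char *key, size_t keylen)
{
    size_t n = 0;

    if (keylen == 0)
        return;
    /* Each input byte adds at most two key bytes, plus the final NUL. */
    for (; *s != '\0' && n + 2 < keylen; s++) {
        char base;

        switch (*s) {
        case 0xE4: case 0xC4: base = 'a'; break;   /* a-umlaut */
        case 0xF6: case 0xD6: base = 'o'; break;   /* o-umlaut */
        case 0xFC: case 0xDC: base = 'u'; break;   /* u-umlaut */
        case 0xDF:                                 /* sharp s  */
            key[n++] = 's';
            key[n++] = 's';
            continue;
        default:
            key[n++] = (*s >= 'A' && *s <= 'Z') ? (char)(*s - 'A' + 'a')
                                                : (char)*s;
            continue;
        }
        key[n++] = base;
        if (order == ORDER_PHONEBOOK)
            key[n++] = 'e';
    }
    key[n] = '\0';
}

/* strcmp()-style comparison of two names under the chosen order. */
int german_compare(const char *a, const char *b, enum german_order order)
{
    char ka[256], kb[256];

    make_key((const unsigned char *)a, order, ka, sizeof ka);
    make_key((const unsigned char *)b, order, kb, sizeof kb);
    return strcmp(ka, kb);
}
```

- Under ORDER_DICTIONARY, "M\xFCller" sorts after "Mufti"; under
- ORDER_PHONEBOOK it sorts before. Same two strings, two orders, so the
- table has to be selected by locale and cannot be fixed by the character
- code alone.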
-
- >Then, the third:
- >
- >>Attribution of output and clever construction of out output device drivers
- >>would even allow us to switch fonts as dictated by the compound document
- >>architecture controls embedded in the file and/or the attribution of the
- >>file descriptor (the absence of such attribution being an indicator of a
- >>compound document).
- >
- >Given the above situation for "ls", I'm afraid that "argv" to any command
- >be the compound document. Am I correct? Is it still have a type "char"?
- >Do you think the entire OS still UNIX?
-
- In order:
- No.
-
- No, unless you don't mean "byte" by "char".
-
- Yes, if POSIX and ANSI C haven't "unUNIXed UNIX" by their specification of
- previously nonexistent exception cases; no, if you mean SVID -- but
- then again, your proposal (or any multibyte proposal) fails this test,
- as does 386BSD itself, the OS to which we will be applying the work.
-
- >>The problem seemed to
- >>be that there was not a means around the problem from your point of view.
- >
- >Just include language information in character code, and the problem
- >disappears.
-
- Unfortunately (or fortunately, since it means I am not culpable and do
- not owe you nor anyone else an explanation on the matter), I am not a
- member of the responsible standards committee, or I might have done what
- you suggest. If you could suggest a standard that did what you are
- suggesting, allowed X11 to operate on 16 bit fonts (since X is our
- only possible common user interface at this time and most servers do
- not support 32 bit fonts), and allowed language specific compaction by
- character set choice as an optimization for monolingual documents (or
- did not disallow it!), then I would adopt your approach. Even a draft
- standard which was under serious consideration by a standards committee
- would be acceptable.
-
- One can not build a palace of bricks when one has only straw; but with
- straw, one may build bricks.
-
- Unicode is straw.
-
- The work on 386BSD is widely distributed, and it is not possible to use
- an approach which has not been formally documented when the developers
- are so widely separated geographically. It is not possible to use a
- "standard" where a reference takes the form of "ask Ohta; it's his
- standard" (if you had a car accident, we would all be screwed). It is
- useless to use a standard which has no hope of becoming codified by
- a respected standards committee... thus it must be a draft standard
- under consideration or an actual standard.
-
-
- Terry Lambert
- terry@icarus.weber.edu
- terry_lambert@novell.com
- ---
- Any opinions in this posting are my own and not those of my present
- or previous employers.
- --
- -------------------------------------------------------------------------------
- "I have an 8 user poetic license" - me
- Get the 386bsd FAQ from agate.berkeley.edu:/pub/386BSD/386bsd-0.1/unofficial
- -------------------------------------------------------------------------------
-