- From: guy@sun.com (Guy Harris)
- Date: Mon, 20 Oct 86 10:49:33 PDT
-
- Responses to a couple of messages:
-
- From Mark Horton:
-
- > Any solution to this problem must be in the kernel, or possibly
- > in libc underneath such subroutines as open, unlink, and chmod, (if you
- > have shared libraries or full source to recompile) or it won't work all
- > the time.
-
- Any solution to this problem must be applied to operating systems other than
- UNIX. As John Bruner pointed out, mandating case-insensitivity will only
- have the effect of removing UNIX from the list of standard-conforming
- systems. Changing the semantics of file names at this late date is unlikely
- to meet with approval from many UNIX vendors and users. For one thing, what
- are you going to do about directories that contain files named, say,
- "makefile" and "Makefile" (yes, they exist)? You may feel that having
- directories like this is a mistake, but declaring them to be a mistake isn't
- going to make them go away.
-
- There seem to be two issues here:
-
- 1) Should POSIX mandate case-sensitivity?
-
- 2) Should UNIX be changed to be case-insensitive if POSIX doesn't mandate
- case-sensitivity?
-
- These are rather separate issues. A case can be made that POSIX should not
- mandate case-sensitivity. Applications must then not depend on
- case-sensitivity. This will affect programs that create files with names
- other than those provided by the user. It could also affect programs that
- *read* directories, since they'd have to know that "foobar" and "FoOBaR"
- refer to the same file.
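
A minimal sketch of what a directory-reading program would have to do on a case-insensitive system: treat names that differ only by case as one file. The function name `case_collisions` and the sample listing are hypothetical, and plain ASCII lowercasing is an assumption; which folding rule would apply to non-ASCII names is exactly the kind of question left open above.

```python
from collections import defaultdict


def case_collisions(names):
    """Return groups of names that differ only by case.

    Keys are the lowercased names; values are the sorted original
    spellings. Only groups with more than one member are returned.
    """
    groups = defaultdict(list)
    for name in names:
        # Assumption: simple lowercasing is the folding rule.
        groups[name.lower()].append(name)
    return {key: sorted(members)
            for key, members in groups.items()
            if len(members) > 1}


# Hypothetical directory listing containing the pairs mentioned above.
listing = ["Makefile", "makefile", "README", "foobar", "FoOBaR", "main.c"]
collisions = case_collisions(listing)
```

Run over an existing case-sensitive tree, a check like this would show which directories could not be carried over unchanged, such as one holding both "makefile" and "Makefile".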
-
- I see great difficulty in changing UNIX to be case-insensitive, however. It
- certainly wouldn't pose any great *implementation* difficulties, but I would
- not like to bet that no user or program would be greatly affected.
-
- From Mark R. Crispin:
-
- > It seems that the two sides in this issue boil down to this:
- > . "gee, since we're defining a standard portable operating system
- > that isn't necessarily the present de facto Unix, let's fix
- > this case sensitivity cretinism"
- > . "case sensitivity is what makes Unix better than any other
- > operating system, and only a cretin can't understand why this
- > is wonderful"
-
- Not really. A POSIX standard that does not *mandate* case-sensitivity need
- not *forbid* it. And I have seen *no* arguments that "case sensitivity is
- what makes UNIX better than any other operating system."
-
- > Let's start by discarding the arguments which are bogus.
- > The most glaring of these has got to be the international
- > compatibility argument. The only advocates of this argument seem
- > to be pro case sensitivity Americans who have seized upon this as
- > an argument to shore up their position without really thinking
- > over the issue carefully.
-
- Well, it may seem that way, but it isn't. I admit to being a United States
- citizen, but I am not unreservedly pro-case-sensitivity. I see the merits
- to both sides of the argument, but I see more problems with
- case-insensitivity than with case-sensitivity.
-
- > Unix does not allow arbitrary strings in filenames. Any
- > number of "funny" characters must be within a quoted string. I
- > can't say
- > rm foo.bar;1
- > I have to say
- > rm "foo.bar;1"
- > Guess what. A number of foreign keyboards use those "funny"
- > characters to be non-English glyphs.
-
- As the moderator pointed out, the shell, not the operating system,
- interprets these funny characters. Applications need not get file names
- passed as arguments from the shell. The office automation system we
- developed at CCI had its own shell, which did no parsing of path names
- whatsoever; the only characters it forbade were the slash and the null
- character (because they are not allowed in UNIX filenames) and those
- characters its forms package didn't allow you to type in (because we never
- got around to changing it to do so). I frequently used file names
- containing blanks within this application, even though it made it
- inconvenient to manipulate those files using commands typed at the UNIX
- shell.
-
- > I have yet to hear of any organization in Japan using kanzi
- > or hirogana or katakana in filenames.
-
- I have a document in front of me from ASCII Corporation in Japan, describing
- changes made to 4.2BSD to support Kanji and Kana. It says:
-
- It is possible to create a file whose name contains Kana and/or
- Kanji characters, since the file system and Kanji version of
- the shell support it. However, we don't recommend such filenames,
- because it is impossible to handle such files from ASCII terminals.
-
- The argument against such file names would not apply if, for example, no
- terminals attached to the machine were ASCII terminals and the site didn't
- expect to export these files to machines with only ASCII terminals attached.
- The developers of this system may be coming from a more "traditional" UNIX
- environment, where you have many ASCII terminals attached to the machine and
- where you frequently exchange files with other sites not running the same
- hardware and software that you are running. In an office environment, it may
- be possible to provide everyone with a Kanji/Kana terminal, and it may not be
- as important to worry about exchanging files with some random development
- machine in the United States.
-
- > There are good reasons for
- > this! One is that there isn't a single way of representing
- > written Japanese. In older terminals, the high order bit when
- > set indicated katakana (much as DEC VT220's use the high order
- > bit for their "international characters"). Modern Japanese
- > terminals use the JIS (Japanese Industrial Standard) system of
- > ESCAPE followed by two bytes to define a 14 bit character.
-
- The system they describe uses "Shift JIS" code for Kanji, and supports both
- terminals that use this code and the regular JIS code for Kanji; it does
- code conversion between the codes for JIS-Kanji terminals.
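
As a rough illustration of that kind of code conversion, the sketch below converts a name between Shift JIS and the escape-sequence JIS form. It uses Python's standard `shift_jis` and `iso2022_jp` codecs as stand-ins; the helper names are hypothetical, and nothing here is taken from the ASCII Corporation implementation.

```python
def sjis_to_jis(data: bytes) -> bytes:
    """Convert Shift JIS bytes to the 7-bit, escape-sequence JIS form."""
    return data.decode("shift_jis").encode("iso2022_jp")


def jis_to_sjis(data: bytes) -> bytes:
    """Convert escape-sequence JIS bytes back to Shift JIS."""
    return data.decode("iso2022_jp").encode("shift_jis")


# The two characters of the word "kanji", written as escapes so the
# example does not depend on the encoding of this text.
sjis_name = "\u6f22\u5b57".encode("shift_jis")
jis_name = sjis_to_jis(sjis_name)

# The Shift JIS form uses bytes with the eighth bit set; the JIS form
# is 7-bit clean and starts with an ESC sequence.
print(sjis_name.hex(), jis_name)
```

The conversion is lossless in both directions for characters present in both code sets, which is what would let a driver serve both kinds of terminal from one stored representation.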
-
- > Some German keyboards use various 7-bit glyphs (I believe
- > "@" is umlaut-a) for their umlauts and ess-tset. Or, there's the
- > VT220 system. I just tried creating a file called Goethestrasse
- > (using umlaut-o for "oe" and ess-tset for "ss") on my local Unix
- > system using my VT220 clone. It made "GVthestra_e", the 7-bit
- > form.
-
- The latter sounds like ISO Latin Alphabet No. 1; "umlaut-O" has the hex code
- D6 and capital V has the code 56; 56 hex + 80 hex is D6 hex. (I believe DEC
- recommended the VT220 code set to ISO for standardization.)
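
The arithmetic can be checked with a short sketch that clears the eighth bit of each byte, assuming the terminal sent ISO Latin-1 codes and something along the path dropped the high bit. The function name `strip_high_bit` is hypothetical.

```python
def strip_high_bit(text: str) -> str:
    """Encode text as ISO Latin-1, clear bit 7 of every byte, and
    return the resulting 7-bit ASCII string."""
    data = text.encode("latin-1")
    return bytes(b & 0x7F for b in data).decode("ascii")


# "G", capital O-umlaut (0xD6), "thestra", sharp s (0xDF), "e"
name = "G\xd6thestra\xdfe"
print(strip_high_bit(name))  # GVthestra_e
```

A lowercase o-umlaut (hex F6) would come out as a lowercase "v" instead, so the capital "V" in the reported name is consistent with a capital umlaut-O having been typed.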
-
- > Dare I mention that in German, only nouns (and the first
- > word in a sentence) are capitalized?
-
- The same is true of English; so what?
-
- > The point is that Unix does *not* support international
- > character sets in filenames. It supports 7-bit USASCII. So
- > let's leave that issue to rest.
-
- As the moderator pointed out, this is not the case. The kernel supports all
- characters except slash and the null character, except for the 4.[23]BSD
- kernel which (too helpfully) refuses to create files with characters in
- their name that have the eighth bit set. Certain UNIX utilities do not
- handle 8-bit characters; this is not, however, an intrinsic characteristic
- of the UNIX system. I would ask European and Asian customers what they
- wanted the UNIX system to do about character sets other than 7-bit USASCII
- before I casually dismissed the possibility of supporting them.
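
The kernel rule described above is small enough to sketch directly: a file name component is any non-empty byte string that contains neither a slash nor a null byte. The function name is hypothetical, and the sketch ignores length limits and the 4.[23]BSD eighth-bit restriction.

```python
def is_valid_component(name: bytes) -> bool:
    """Return True if name could be a single file name component:
    non-empty, with no slash and no null byte."""
    return len(name) > 0 and b"/" not in name and b"\x00" not in name


# Characters that only the shell treats specially are fine here.
print(is_valid_component(b"foo.bar;1"))       # True
print(is_valid_component(b"name with blanks"))  # True
print(is_valid_component(b"G\xd6thestra\xdfe"))  # True: 8-bit bytes allowed
print(is_valid_component(b"a/b"))             # False
```

Anything stricter than this, such as the need to quote ";" or blanks, comes from the shell or from individual utilities rather than from the rule itself.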
-
- > I haven't yet heard of any serious use of full 8-bit bytes
- > for filenames on any other operating system, which, if you are
- > serious about supporting international character sets, you must
- > do. There's this small problem of getting 8-bit (as opposed to
- > 7-bit) ASCII through various pieces of hardware and networks
- > which think that the high order bit is parity...
-
- Not all such pieces of hardware have this limitation. The paper from ASCII
- Corporation simply says "Kana and Kanji terminals must be set up to use 8
- bit no parity mode." If other terminals use a 7-bit encoding of an 8-bit
- data stream, the terminal driver can do code translation transparently to
- the rest of the system.
-
- The fact that most OSes haven't solved these problems, and don't provide for
- full 8-bit characters in file names, doesn't mean there is no demand for
- full 8-bit characters in file names. The users in non-English-speaking
- countries may just have learned to get around this problem, and either use
- English-language file names or approximate their native spelling in file
- names.
-
- Volume-Number: Volume 7, Number 76
-
-