- From: cbosgd!cbosgd.ATT.COM!mark@ucbvax.berkeley.edu (Mark Horton)
- Date: Fri, 17 Oct 86 11:20:32 edt
- Organization: AT&T Medical Information Systems, Columbus
-
- Don Provan raises some interesting questions about foreign languages.
- In general, I think we know how to do a case-insensitive comparison
- properly, by extending a function defined in ANSI C (strxfrm; its
- companion strcoll compares two strings directly). The function is like
- strcpy, except that the destination buffer receives a translation of the
- string that collates properly under a plain lexicographic comparison such
- as strcmp. If we extend this function to also fold to one case (as
- appropriate) and let each country define its own version, it becomes
- technically possible to ignore case. Whether it's fast enough for
- the UNIX filesystem is unclear, although this problem is not restricted
- to UNIX.
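-
- A minimal sketch in C of the idea, assuming the C library's strxfrm
- and a plain per-byte tolower() as the case-folding rule (a real
- per-country rule would come from the locale, not from tolower). The
- names fold_key and fold_cmp are hypothetical. Each name is folded to
- lower case, turned into a collation key, and the keys are compared
- with strcmp:
-
```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build a case-insensitive collation key for `name`.
 * Case folding is a per-byte tolower(), an assumption that only suits
 * single-byte character sets. The caller frees the returned key. */
char *fold_key(const char *name)
{
    size_t len = strlen(name);
    char *folded = malloc(len + 1);
    char *key;
    size_t need;

    if (folded == NULL)
        return NULL;
    for (size_t i = 0; i <= len; i++)           /* copies the NUL as well */
        folded[i] = (char)tolower((unsigned char)name[i]);

    need = strxfrm(NULL, folded, 0) + 1;        /* size of transformed string */
    key = malloc(need);
    if (key != NULL)
        strxfrm(key, folded, need);             /* locale-specific collation key */
    free(folded);
    return key;
}

/* Hypothetical helper: compare two names by comparing their keys with strcmp. */
int fold_cmp(const char *a, const char *b)
{
    char *ka = fold_key(a);
    char *kb = fold_key(b);
    int r;

    assert(ka != NULL && kb != NULL);           /* sketch: no real error handling */
    r = strcmp(ka, kb);
    free(ka);
    free(kb);
    return r;
}
```
-
- In the default "C" locale this makes "README" and "readme" compare
- equal and sorts "Zeta" after "alpha", where plain strcmp puts "Zeta"
- first. A program that calls setlocale(LC_COLLATE, "") would pick up the
- local collation order through the same strxfrm call; multibyte
- character sets would need more than tolower().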
-
- I think it would be interesting to hear what other case-insensitive
- operating systems do about these issues. What do MS-DOS, VM/CMS,
- VMS, or whatever else do with their case-insensitive file names in
- Europe, or Japan, or wherever?
-
- If the answer is that file names are restricted to use the same character
- set as in the USA, and that extra letters are disallowed, then we need to
- know how well this is accepted by the users on other systems. Maybe it's
- good enough. Do users in other countries often create files whose names
- contain extra letters? If they try, does the shell get in the way when,
- for example, their letter shares its code with "|" (as ö does in some
- national variants of ASCII)?
-
- If the answer is that other operating systems have forced other countries
- to put up with Americanisms, and that POSIX is an opportunity to break new
- ground by handling other languages properly, then by all means let's do it
- right. This might require 8 bit characters in file names, for example.
-
- Incidentally, I've seen it claimed here that UNIX allows arbitrary byte
- streams in file names. Perhaps this is the intent, but in reality the
- UNIX filesystem is far from a transparent path. There are lots of
- restrictions, some of which are:
-
- The slash character is special.
- The null character is special.
- A path component (a run of characters not containing a slash)
- longer than 14 chars is either illegal, truncated to 14
- significant chars, or significant up to 255 chars, depending
- on the version of UNIX.
- Characters with the 8th bit turned on are not allowed.
- Since many commands take names beginning with "-" as flags,
- file names beginning with "-" don't always work.
- Since the shell treats many of the punctuation characters
- specially, file names containing space, #, $, &, *, (, ),
- [, ], ;, ', ", \, |, <, >, and ? do not always work
- properly. Quoting them helps only once: the shell strips
- off the quotes, so when the name passes through more
- than one layer of shell (for example, uux), it still fails.
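-
- The restrictions listed above can be checked mechanically. This is a
- sketch only: the flag names and name_problems() are hypothetical, the
- 14-character limit is assumed as the conservative case, and the shell
- metacharacter set is exactly the one listed above. The NUL restriction
- needs no check because a C string cannot contain a NUL byte:
-
```c
#include <assert.h>
#include <string.h>

/* Hypothetical flags, one per restriction in the list above. */
enum {
    NAME_HAS_SLASH    = 1 << 0,
    NAME_TOO_LONG     = 1 << 1,
    NAME_HAS_8BIT     = 1 << 2,
    NAME_LEADING_DASH = 1 << 3,
    NAME_SHELL_META   = 1 << 4
};

#define OLD_NAME_MAX 14   /* assumption: the conservative older-UNIX limit */

/* Return a bitmask of the problems found in one file name component. */
int name_problems(const char *name)
{
    static const char meta[] = " #$&*()[];'\"\\|<>?";
    int flags = 0;

    assert(name != NULL);
    if (strlen(name) > OLD_NAME_MAX)
        flags |= NAME_TOO_LONG;
    if (name[0] == '-')
        flags |= NAME_LEADING_DASH;             /* commands may read it as a flag */
    for (const unsigned char *p = (const unsigned char *)name; *p != '\0'; p++) {
        if (*p == '/')
            flags |= NAME_HAS_SLASH;            /* would split the component */
        else if (*p & 0x80)
            flags |= NAME_HAS_8BIT;             /* 8th bit set */
        else if (strchr(meta, *p) != NULL)
            flags |= NAME_SHELL_META;           /* special to the shell */
    }
    return flags;
}
```
-
- For example, name_problems("-rf") reports only the leading-dash
- problem, and a name such as "my file" reports a shell metacharacter.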
-
- Because some of these problems only affect certain uses of the filesystem
- (whether or not you go through the shell, whether or not you're going
- through a command that takes arguments) it's not unusual for casual users
- to create a file and then have trouble using, renaming, or even removing it.
- I recall that removing a file whose name contains characters with the
- 8th bit set is a frequent topic on net.unix.
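-
- A common workaround for the leading "-" problem is to prefix a
- relative name with "./", which names the same file but no longer looks
- like a flag; a program can also sidestep it by calling unlink()
- directly, since only commands, not the kernel, interpret "-". The
- helper safe_arg() below is a hypothetical sketch of the first technique:
-
```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy `name` into `out`, prefixing "./" if it begins
 * with "-" so that a command will not mistake it for a flag.
 * Returns 0 on success, -1 if `out` is too small. */
int safe_arg(const char *name, char *out, size_t outsz)
{
    size_t len, prefix;

    assert(name != NULL && out != NULL);
    len = strlen(name);
    prefix = (name[0] == '-') ? 2 : 0;      /* room for "./" only when needed */
    if (prefix + len + 1 > outsz)
        return -1;                          /* out is too small */
    memcpy(out, "./", prefix);
    memcpy(out + prefix, name, len + 1);    /* includes the terminating NUL */
    return 0;
}
```
-
- Thus safe_arg("-rf", ...) yields "./-rf", which can be handed to rm
- without being read as options. It does nothing for names that the
- shell itself mangles.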
-
- If the filesystem were really transparent, the designers of /proc would
- not have had to encode process IDs in ASCII digits; they could have
- used the binary representation directly.
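-
- To see why binary process IDs would not work as names, consider the
- bytes involved. The sketch below (proc_path() and binary_name_is_legal()
- are hypothetical helpers, and the 4-byte, most-significant-first layout
- is an assumption) builds the ASCII form that /proc uses and checks
- whether the raw binary form of a PID would contain a NUL or a slash:
-
```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Build the ASCII name for a process, e.g. 47 -> "/proc/47".
 * Returns 0 on success, -1 if `out` is too small. */
int proc_path(unsigned long pid, char *out, size_t outsz)
{
    int n;

    assert(out != NULL);
    n = snprintf(out, outsz, "/proc/%lu", pid);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}

/* Would the raw binary form of `pid` be a legal name component?
 * A NUL byte would end the name early; a '/' byte would split it. */
int binary_name_is_legal(uint32_t pid)
{
    unsigned char bytes[4];

    /* Most-significant byte first; this byte order is an assumption. */
    bytes[0] = (unsigned char)(pid >> 24);
    bytes[1] = (unsigned char)(pid >> 16);
    bytes[2] = (unsigned char)(pid >> 8);
    bytes[3] = (unsigned char)pid;
    return memchr(bytes, '\0', sizeof bytes) == NULL
        && memchr(bytes, '/', sizeof bytes) == NULL;
}
```
-
- Any PID below 2^24 has a zero high-order byte in this layout, so nearly
- every real PID fails the binary check; PID 47 fails twice, since 47 is
- also the code for "/".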
-
- It's for these reasons that I feel that a conservative UNIX user should
- restrict themselves to certain "reasonable" filename conventions; basically
- using only lower case letters, digits, and a few safe punctuation characters
- such as . and - in their filenames. Just because it's possible to put a
- space in a file name doesn't make it a good idea.
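-
- The conservative convention can be enforced with a whitelist check.
- This is a sketch: is_conservative_name() is a hypothetical helper, and
- rejecting the empty name, a leading "-", and the special names "." and
- ".." are safety additions rather than part of the convention as stated:
-
```c
#include <assert.h>
#include <string.h>

/* Hypothetical check: lower case letters, digits, '.' and '-' only.
 * The letters are listed explicitly so the result does not depend on
 * the locale, as islower() would. */
int is_conservative_name(const char *name)
{
    static const char allowed[] =
        "abcdefghijklmnopqrstuvwxyz0123456789.-";

    assert(name != NULL);
    if (name[0] == '\0' || name[0] == '-')      /* empty, or looks like a flag */
        return 0;
    if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0)
        return 0;                               /* reserved directory names */
    return name[strspn(name, allowed)] == '\0'; /* every byte is allowed */
}
```
-
- Under this check "report-1.txt" passes, while "Report.txt", "my file",
- and "-draft" are all rejected.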
-
- Mark
-
- Volume-Number: Volume 7, Number 67
-
-