Usenet 1994 January

Internet Message Format | 1987-06-30 | 2.9 KB

Date: Fri, 3 Oct 86 12:26:22 PDT
From: sun!gorodish!guy@utastro.UUCP (Guy Harris)

> From: mark@cbosgd.att.com (Mark Horton)
> Subject: Case sensitive file names

> I think this is a mistake. UNIX is the only major operating system
> that treats things like file names, logins, host names, and commands
> as case sensitive.

It's been a while since I used Multics; I think it was case-sensitive. Of course, I don't know whether it counts as "major" here or not; I don't know how many sites are around. Are you sure there are no others?

> It's also reasonable to leave the case alone, but ignore case in
> comparisons.

This would probably be the best scheme (I think the Xerox Alto's operating system did this). Some people may want to use mixed case in file names for aesthetic reasons, for example.

> There is also probably a good argument for keeping it case sensitive
> (after all, there are probably 5 or 6 people out there who really need
> both makefile and Makefile...

This means UNIX probably can't change, at least not without a fair bit of pain. I know of at least one directory on a UNIX system that has both "makefile" and "Makefile" in it; this would cause some upset on a case-mapping UNIX system.

However, there is another problem with case mapping: it depends on the language the text is in! Doing case mapping is all very well and good for English-speaking users; the algorithm for mapping characters between cases in English is straightforward. However, in German the "sharp s" (ß, an "ss" sound) is a single special character in lower case, but its upper-case form is the two characters "SS".
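The "leave the case alone, but ignore case in comparisons" scheme, and the German problem above, can be sketched in a few lines of Python. This is a minimal, hypothetical illustration: the class `CaseInsensitiveDir` and its methods are invented here and are not part of any UNIX proposal, and the sketch leans on Python's Unicode `str.casefold()`, which postdates this discussion.

```python
# Hypothetical sketch of a case-preserving, case-insensitive directory:
# names are stored exactly as created, but lookups and collision checks
# compare case-folded keys.

class CaseInsensitiveDir:
    """In-memory stand-in for a directory; maps folded name -> stored name."""

    def __init__(self):
        self._entries = {}  # folded key -> name exactly as created

    @staticmethod
    def _key(name):
        # str.casefold() applies Unicode full case folding, so the German
        # sharp s folds to "ss" and "Straße" matches "STRASSE".
        return name.casefold()

    def create(self, name):
        key = self._key(name)
        if key in self._entries:
            raise FileExistsError(f"{name!r} collides with {self._entries[key]!r}")
        self._entries[key] = name  # preserve the original spelling

    def lookup(self, name):
        # Returns the stored spelling, or None if there is no match.
        return self._entries.get(self._key(name))


d = CaseInsensitiveDir()
d.create("Makefile")
print(d.lookup("makefile"))  # Makefile
try:
    d.create("makefile")  # the makefile/Makefile pair cannot coexist
except FileExistsError as e:
    print("refused:", e)

# Case mapping is not length-preserving: one lower-case character
# becomes two upper-case ones.
print("straße".upper())  # STRASSE
print(len("straße"), len("straße".upper()))  # 6 7
```

Under this scheme a directory holding both "makefile" and "Makefile" simply cannot exist, which is the compatibility pain described above.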
Even if you don't have anomalies like this, the current schemes proposed by AT&T for "international UNIX" use various ISO codes. This means that the character whose hex value is E6 is the "ae" ligature (æ) in the ISO Latin Alphabet #1, and thus matches the character whose hex value is C6 (the "AE" ligature, Æ); however, in the JIS C6226 Kanji set, E6 is probably the first byte of a two-byte sequence representing a Kanji symbol, and I don't think it gets case mapped at all.

This means that the operating system would have to know which character set a particular character was in, so that it could map its case correctly; this would be best done with sequences embedded in the file name indicating shifts in the character set to which bytes belong. (These same sequences should be used in text files, character strings in programs, etc. Other suggestions include a per-file character set designator, which would presumably apply to any file containing character strings, including directories; however, this means that *all* strings in that file must be in the same character set, which is not always a reasonable restriction.) It would then have to know how to do case mapping for all character sets supported by the system, and would have to be modified or have new information supplied to it if a new character set was to be supported.

Volume-Number: Volume 7, Number 16
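The point that the system must know which character set each byte belongs to can be sketched as follows. Everything here is a hypothetical illustration: the charset names, the list of `(charset, bytes)` segments standing in for the shift sequences embedded in a name, and all function names are assumptions. EUC-JP is used as one 8-bit encoding of the JIS kanji set in which a byte like E6 is the first byte of a two-byte character.

```python
# Hypothetical sketch: case mapping that depends on the character set a
# byte belongs to. Charset names, the segment representation and all
# function names are illustrative assumptions.

def upper_ascii_byte(b: int) -> int:
    """Map ASCII a-z (0x61-0x7A) to A-Z; leave every other byte alone."""
    return b - 0x20 if 0x61 <= b <= 0x7A else b


def upper_latin1(data: bytes) -> bytes:
    """Upper-case a byte string encoded in ISO 8859-1 (Latin Alphabet No. 1)."""
    out = bytearray()
    for b in data:
        # 0xE0-0xFE map to 0xC0-0xDE (e.g. 0xE6 "ae" -> 0xC6 "AE"), except
        # 0xF7, the division sign. 0xDF (sharp s) and 0xFF (y diaeresis)
        # have no single-byte upper-case form in Latin-1 and are kept as-is.
        if 0xE0 <= b <= 0xFE and b != 0xF7:
            out.append(b - 0x20)
        else:
            out.append(upper_ascii_byte(b))
    return bytes(out)


def upper_euc_jp(data: bytes) -> bytes:
    """Upper-case a byte string encoded in EUC-JP."""
    # Every byte of a multibyte EUC-JP character is >= 0x80, so only ASCII
    # bytes are mapped; a byte like 0xE6 starts a two-byte kanji and is
    # left untouched.
    return bytes(upper_ascii_byte(b) for b in data)


# The system can only case-map the character sets registered here;
# supporting a new one means supplying a new entry.
UPPER = {"latin1": upper_latin1, "euc-jp": upper_euc_jp}


def upper_name(segments):
    """Upper-case a name given as (charset, bytes) segments, which stand in
    for the charset-shift sequences embedded in the name."""
    return [(charset, UPPER[charset](data)) for charset, data in segments]


name = [("latin1", b"\xe6ble"), ("euc-jp", b"\xe6\xa1")]
print(upper_name(name))
# [('latin1', b'\xc6BLE'), ('euc-jp', b'\xe6\xa1')]
```

The same byte, E6, is mapped to C6 in the Latin-1 segment and left alone in the EUC-JP segment. Asking for a charset with no entry in `UPPER` raises `KeyError`, mirroring the need to supply new case-mapping information for each new character set.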