Usenet 1994 January

Text File | 1992-02-21 | 2.6 KB | 55 lines

Submitted-by: rml@hpfcdc.fc.hp.com (Bob Lenk)

In article <1991Nov21.235529.9196@uunet.uu.net> gwyn@smoke.brl.mil (Doug Gwyn) writes:
>     cc -Dmacrostufff -Iheaderdir -c -O foo.c bar.o mylib.a -lX
>
> The requirement that this invocation (when -I etc. aren't being used)
> obtain a C implementation that conforms to the C standard could be left
> as a separate specification, not necessarily required for 1003.2 proper.

Then what use would the 1003.2 spec be?  An application (script/makefile)
using it couldn't depend on it compiling standard C, or K&R C, or 6th
edition C, or perhaps even Cobol.  The separate specification that binds
1003.2 to the C Standard would be required to write portable
applications, and it would have to specify that existing practice be
violated.

> >More than that, regexp's as usually implemented were hopelessly
> >ethnocentric; changing languages was impossible.
>
> No, to the contrary the existing regexp implementation was acultural;
> you're referring to the idea that "[a-z]" for example ought to mean
> "match any lowercase character in the current locale", but that is
> NOT what it meant.  It actually meant "match any byte having value
> between the values I gave you around the dash-representation" (this
> already was important to understand on machines that preferred
> EBCDIC codesets, for example).

Now let's look at reality.  How are subranges in regular expressions
really used?  How many scripts have you written that really want to find
all characters with encodings between those of 'a' and 'z'?  How many
scripts have you written that take advantage of the coincidence that
"[a-z]" happens to match "any lowercase character" on an ASCII machine
in an English-speaking country?  Now expand "you" in the previous two
sentences to all users of regular expressions.  How many scripts using
the existing definition work as intended except on an ASCII machine on
English language data?  Do you think regular expressions would have been
developed with this definition on EBCDIC machines or in Denmark or
Japan?  Do you think anyone would have used them if they had been?

IMHO subranges in regular expressions are only interesting, worth
standardizing, or even worth implementing because of the coincidence
that they can be used for concepts like "any lowercase character".  The
people who are happy with the traditional definition are happy because
that coincidence applies with their language and codeset.  Basing an
international standard on this would be not only ethnocentric, but (as
Doug helps to point out) codeset-centric as well.

		Bob Lenk
		rml@fc.hp.com
		{uunet,hplabs}!fc.hp.com!rml

Volume-Number: Volume 26, Number 9
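
Illustration (not part of the original article): the two readings of
"[a-z]" debated above can be compared in a few lines.  This is only a
rough sketch in Python, which neither poster used.  It assumes the
cp037 codec is an acceptable stand-in for "an EBCDIC codeset", and the
helper byte_range_members and the Danish sample word are made up for
the example.  It does not model any particular regexp implementation
or the 1003.2 draft.

```python
import re
import string


def byte_range_members(codec, lo="a", hi="z"):
    """Return every character whose single-byte encoding in `codec`
    lies between the encodings of `lo` and `hi`, inclusive.
    (Hypothetical helper: the "byte value" reading of "[lo-hi]".)"""
    first = lo.encode(codec)[0]
    last = hi.encode(codec)[0]
    return bytes(range(first, last + 1)).decode(codec)


# On ASCII the byte range happens to be exactly the 26 lowercase letters.
ascii_members = byte_range_members("ascii")

# On EBCDIC (cp037 assumed here) the letters are not contiguous, so the
# same byte range also picks up characters that are not a-z.
ebcdic_members = byte_range_members("cp037")
ebcdic_extras = [c for c in ebcdic_members if c not in string.ascii_lowercase]

# On Danish data the range misses lowercase letters outside a-z, while a
# "lowercase character" test (str.islower) accepts every letter.
danish_word = "blåbærgrød"
matched_by_range = re.findall(r"[a-z]", danish_word)
lowercase_letters = [c for c in danish_word if c.islower()]

if __name__ == "__main__":
    print("ASCII  range a-z:", len(ascii_members), "characters")
    print("EBCDIC range a-z:", len(ebcdic_members), "characters")
    print("EBCDIC non-letters in range:", len(ebcdic_extras))
    print("Danish letters matched by [a-z]:", "".join(matched_by_range))
    print("Danish lowercase letters:", "".join(lowercase_letters))
```

The ASCII case is the "coincidence" the article refers to; the other
two cases show where the byte-range reading and the "any lowercase
character" reading part ways.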