- Submitted-by: rml@hpfcdc.fc.hp.com (Bob Lenk)
-
- In article <1991Nov21.235529.9196@uunet.uu.net> gwyn@smoke.brl.mil (Doug Gwyn) writes:
-
- > cc -Dmacrostufff -Iheaderdir -c -O foo.c bar.o mylib.a -lX
- >
- > The requirement that this invocation (when -I etc. aren't being used)
- > obtain a C implementation that conforms to the C standard could be left
- > as a separate specification, not necessarily required for 1003.2 proper.
-
- Then what use would the 1003.2 spec be? An application (script/makefile)
- using it couldn't depend on it compiling standard C, or K&R C, or 6th
- edition C, or perhaps even Cobol. The separate specification that binds
- 1003.2 to the C Standard would be required to write portable applications,
- and it would have to specify that existing practice be violated.
-
- > >More than that, regexp's as usually implemented were hopelessly
- > >ethnocentric; changing languages was impossible.
- >
- > No, to the contrary the existing regexp implementation was acultural;
- > you're referring to the idea that "[a-z]" for example ought to mean
- > "match any lowercase character in the current locale", but that is
- > NOT what it meant. It actually meant "match any byte having value
- > between the values I gave you around the dash-representation" (this
- > already was important to understand on machines that preferred
- > EBCDIC codesets, for example).
-
- Now let's look at reality. How are subranges in regular expressions
- really used? How many scripts have you written that really want to find
- all characters with encodings between those of 'a' and 'z'? How many
- scripts have you written that take advantage of the coincidence that
- "[a-z]" happens to match "any lowercase character" on an ASCII machine
- in an English-speaking country? Now expand "you" in the previous two
- sentences to all users of regular expressions. How many scripts using
- the existing definition work as intended except on an ASCII machine on
- English language data? Do you think regular expressions would have been
- developed with this definition on EBCDIC machines or in Denmark or
- Japan? Do you think anyone would have used them if they had been?
-
- IMHO subranges in regular expressions are only interesting, worth
- standardizing, or even worth implementing because of the coincidence
- that they can be used for concepts like "any lowercase character". The
- people who are happy with the traditional definition are happy because
- that coincidence applies with their language and codeset. Basing an
- international standard on this would be not only ethnocentric, but (as
- Doug helps to point out) codeset-centric as well.
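
To make the coincidence concrete, here is a minimal sketch in Python (not part of the original exchange). It assumes Python's `re` module, where "[a-z]" is a range over encoded values, and uses the "cp037" codec merely as one example of an EBCDIC codeset; the sample characters are illustrative choices.

```python
import re

# Traditional reading: "[a-z]" is a range of encoded values, nothing more.
byte_range = re.compile(r"[a-z]")

# Illustrative samples: an English letter, a Danish letter, an accented
# letter, an uppercase letter, and a punctuation character.
samples = ["q", "ø", "é", "Q", "~"]
for ch in samples:
    in_range = byte_range.fullmatch(ch) is not None
    print(repr(ch), "in [a-z]:", in_range, "| lowercase:", ch.islower())

# The same range taken over an EBCDIC codeset (cp037 is an assumption,
# chosen only because Python ships a codec for it) is not contiguous:
# it spans more encoded values than the 26 letters.
lo = "a".encode("cp037")[0]
hi = "z".encode("cp037")[0]
ebcdic_span = bytes(range(lo, hi + 1)).decode("cp037")
extras = [c for c in ebcdic_span if not ("a" <= c <= "z")]
print(len(ebcdic_span), "values in range;", len(extras), "are not a-z")
```

On this reading, "ø" and "é" are lowercase letters that fall outside the range, while the EBCDIC range takes in values that are not letters of the English alphabet at all. A locale-aware definition would correspond more closely to the `islower()` column than to the range column.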
-
- Bob Lenk
- rml@fc.hp.com
- {uunet,hplabs}!fc.hp.com!rml
-
-
- Volume-Number: Volume 26, Number 9
-
-