- From: Dominic Dunlop <domo@tsa.co.uk>
-
- Note Followup-To: above. Post elsewhere iff you think appropriate.
-
- In article <1990Jun8.174056.15313@icc.com> wdm@icc.com (Bill Mulert) writes:
- : ... I would
- : like to have a tool, call it regex, that would allow me to say:
- :
- : regex ' "^[^=]*=\(.*\)\" '
- : and have regex say, in plain language, what the expression means.
-
- In article <8353@jpl-devvax.JPL.NASA.GOV> lwall@jpl-devvax.JPL.NASA.GOV
- (Larry Wall) writes:
- >It's not likely to be too practical, for a couple of reasons.
- >
- >First, there are a number of different standards out there. For instance,
- >sed and expr use \( ... \) to indicate grouping, while egrep and perl
- >use ( ... ) for grouping, and \( and \) to indicate real parens...
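A minimal sketch of the grouping difference. It assumes a POSIX-conforming `grep` (basic REs, as in sed, expr and ed) and `grep -E` (extended REs, as in egrep); the helper functions `bre` and `ere` are hypothetical names introduced only for this illustration. Each prints how many input lines the pattern matches.

```sh
#!/bin/sh
# Hypothetical helpers: count the lines of $2 matched by pattern $1.
bre() { printf '%s\n' "$2" | grep -c -- "$1"; }     # basic RE (sed, expr, ed)
ere() { printf '%s\n' "$2" | grep -E -c -- "$1"; }  # extended RE (egrep)

bre '(ab)'   '(ab)'   # 1: in a BRE, ( and ) are ordinary characters
bre '\(ab\)' 'ab'     # 1: in a BRE, \( and \) group
ere '(ab)'   'ab'     # 1: in an ERE, ( and ) group
ere '\(ab\)' '(ab)'   # 1: in an ERE, \( and \) match real parens
```

The same pattern text thus means different things depending on which utility reads it, which is the heart of the objection.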
-
- We interrupt this posting for a word from our Sponsor Executive
- Committee. But seriously, the 1003.2 working group for the Shell and
- Tools has at least documented the sh*t out of both ``Basic Regular
- Expressions'' (as in ed) and Extended Regular Expressions (egrep) (and
- perl, not that the thought of getting their claws (clauses?) into perl
- seems yet to have occurred to standards people). The result of 1003.2
- should be no obvious functional change in your favourite RE-using
- utility, but rather the clearing up of what should happen in any number
- of limiting cases around their edges. (Actually, there are wide-ranging
- and useful functional extensions, which add yet more
- hieroglyphic syntax: [=, =], [: and :] become special. These
- character pairs were chosen so as to minimise the danger of breaking
- existing REs.) The availability of a rigorous definition of REs and
- EREs should make easier the work of anybody who wants to write a RE to
- English translator. (1003.2 could be considerably more rigorous if it
- used formal techniques, but let's leave that for another year or few.)
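A short sketch of the new bracket syntax, again assuming a POSIX `grep`. `[:alpha:]` and `[:digit:]` are standard class names. The last line shows why the new pairs are unlikely to break an older RE: a class name is only recognized as `[:name:]` inside an enclosing bracket expression, so the old-style `[:a]` still means "a colon or an a". Equivalence classes (`[= =]`) depend on the locale and are left out of this sketch.

```sh
#!/bin/sh
# [:alpha:] and [:digit:] are only special inside an enclosing [ ... ].
printf 'key=42\n' | grep -c '^[[:alpha:]]*=[[:digit:]]*$'   # 1
# A pre-1003.2 bracket expression that merely contains ':' keeps its
# old meaning: [:a] matches a single ':' or 'a'.
printf ':\n' | grep -c '^[:a]$'                             # 1
```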
-
- > On top of that, when are ?, +, |, { and } metacharacters? They
- >are in some programs, and aren't in others. Are you going to have a
- >switch?
- >
- > regex -sed ' "^[^=]*=\(.*\)\" ' # In 1003.2
- > regex -expr ' "^[^=]*=\(.*\)\" ' # In 1003.2
- > regex -egrep ' "^[^=]*=\(.*\)\" ' # In 1003.2
- > regex -ed ' "^[^=]*=\(.*\)\" ' # In 1003.2
- > regex -perl ' "^[^=]*=\(.*\)\" ' # Not 1003.2 territory
- > regex -emacs ' "^[^=]*=\(.*\)\" ' # Not 1003.2 territory
- > regex -vi ' "^[^=]*=\(.*\)\" ' # In 1003.2 User Portability Extension
-
- Probably, even with 1003.2: the precise details of RE syntax vary between
- utilities, as those who worked on the standard usually opted in favour of
- documenting existing practice, even when the temptation to fix things was
- strong. Of course, to conform to 1003.2's command line syntax rules, usage
- would have to be
-
- regex -t sed; regex -t expr # or similar...
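Whether ?, +, |, { and } are metacharacters can be checked directly. This is a sketch assuming POSIX `grep` (basic REs) and `grep -E` (extended REs); `bre` and `ere` are hypothetical helper names that print a count of matching lines. GNU grep also accepts `\+`, `\?` and `\|` in basic REs as an extension, which is exactly the kind of per-utility variation a translator would need to be told about.

```sh
#!/bin/sh
# Hypothetical helpers: count the lines of $2 matched by pattern $1.
bre() { printf '%s\n' "$2" | grep -c -- "$1"; }     # basic RE
ere() { printf '%s\n' "$2" | grep -E -c -- "$1"; }  # extended RE

bre 'a+b'      'a+b'      # 1: + is ordinary in a BRE
ere 'a+b'      'aab'      # 1: + means "one or more" in an ERE
bre 'colou?r'  'colou?r'  # 1: ? is ordinary in a BRE
ere 'colou?r'  'color'    # 1: ? means "zero or one" in an ERE
bre 'cat|dog'  'cat|dog'  # 1: | is ordinary in a BRE
ere 'cat|dog'  'dog'      # 1: | is alternation in an ERE
bre '^a\{3\}$' 'aaa'      # 1: BRE intervals are written \{m,n\}
ere '^a{3}$'   'aaa'      # 1: ERE intervals are written {m,n}
```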
- >
- >Second, your big problem is not so much the regular expressions themselves
- >as it is all the quoting you have to put around them because of the paucity of
- >quoting mechanisms. Take your first example:
- >
- > echo "`expr \"$1\" : \"^[^=]*=\(.*\)\"`"
- >
- And so on, at rightly frustrated length.
-
- One of several features which the 1003.2 definition of the shell lifts from
- the korn shell is the construct
-
- $(command)
-
- as a preferred alternative to the now-deprecated
-
- `command`
-
- (It's stuff like this which has vendors jumping up and down, complaining
- that they'll actually have to do some work before they can conform to the
- standard.) The new construct does cut down on the number and depth of
- backslashes needed in... er... more interesting shell commands. Mind you,
- my fingers still fly for the backslashes, even though I normally use a
- korn shell...
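A sketch of the difference, assuming a POSIX shell. Nesting with backquotes needs the inner pair escaped, while $( ) nests as written. The `extract` function is a hypothetical rewrite of the quoted `expr` line: inside $( ) the double quotes start afresh, so "$1" needs no \", and single-quoting the RE keeps its \( \) intact. Two further changes are assumptions of this sketch, not part of the original: the leading `x` stops `expr` from mistaking an argument such as `(` for an operator, and the `^` is dropped because `expr` anchors its REs anyway and POSIX leaves a leading `^` unspecified there.

```sh
#!/bin/sh
# Nested substitution, old style: the inner backquotes must be escaped.
old=`basename \`dirname /usr/local/bin/perl\``
# The same with $( ): each level simply nests.
new=$(basename $(dirname /usr/local/bin/perl))
echo "$old $new"    # prints "bin bin"

# Hypothetical helper: print whatever follows the first '=' in $1,
# mirroring the echo "`expr ...`" line quoted above.
extract() {
    echo "$(expr "x$1" : 'x[^=]*=\(.*\)')"
}
extract 'name=value'   # prints "value"
```

If the argument contains no `=`, `expr` prints an empty string, so `extract` prints an empty line.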
-
- >Unix is not a simple language.
-
- I guess it's that way because it was designed to be easy for simple
- computers to understand.
- --
- Dominic Dunlop
-
-
- Volume-Number: Volume 20, Number 31
-
-