Text File | 1992-04-23 | 3KB | 93 lines
What is a regular expression ?
==============================
It's a fancy wildcarding system - i.e. you don't need exact matches. A
regular expression is really a mini-programming language with the
following "commands":
. any character
^ start of line
$ end of line
\` start of string (*)
\' end of string
\w word character [A-Za-z0-9]
\< start of word
\> end of word
\@ word boundary
[...] set of characters (*)
\s first character of a symbol [a-z_A-Z]
\y any symbol character [a-z_A-Z0-9]
\S a whole symbol
\W a whole word
\a, \b, \f, \n, \r, \t match the corresponding C escape character
If you need to specify a code by number, use the C \0.. or \x.. escape
sequences. Spaces are significant. ANY OTHER CHARACTER MATCHES ITSELF.
There are also modifiers:
| or, "re1|re2" matches either RE1 or RE2
~ not, "~c" matches anything but the next CHARACTER
+ many, "re+" matches RE repeated one or more times
* repeat, "re*" matches RE repeated zero or more times
? optional, "re?" matches RE repeated zero or one times
Parentheses () can be used to group REs
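For readers who want to try the modifiers out, here is a rough sketch using
Python's re module as a stand-in. That is an assumption made purely for
illustration: the program described here has its own dialect, and Python is
not part of it. In particular "~c" has no direct Python operator, so the
nearest equivalent, a negated set "[^c]", is used instead.

```python
import re

# Python's re module is only a stand-in for the dialect described above.
# Each entry: (Python pattern, text that matches, text that does not).
EXAMPLES = {
    "or":       (r"cat|dog", "dog",   "cow"),
    "not":      (r"b[^a]t",  "bit",   "bat"),   # "~a" written as "[^a]"
    "many":     (r"ab+c",    "abbc",  "ac"),
    "repeat":   (r"ab*c",    "ac",    "adc"),
    "optional": (r"ab?c",    "abc",   "abbc"),
    "group":    (r"(ab)+c",  "ababc", "abac"),
}

def check(name):
    """True if the 'good' text matches in full and the 'bad' text does not."""
    pattern, good, bad = EXAMPLES[name]
    return bool(re.fullmatch(pattern, good)) and not re.fullmatch(pattern, bad)

for name in EXAMPLES:
    print(name, check(name))
```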
Finally, there are memories:
\{ start recording
\} end recording, and place in next memory (starts with memory 1)
\<n> \1, ... \9 will then match the corresponding memory
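A minimal sketch of the memory idea, again with Python's re module standing in
for the dialect (an assumption for illustration only). In Python, plain
parentheses both group and record, whereas the dialect above keeps "()" for
grouping and uses "\{" and "\}" for recording. The helper name
first_doubled_text is hypothetical.

```python
import re

# Record a run of lowercase letters, then require the same text again
# after a single space. Roughly the dialect's:  \{[a-z]+\} \1
DOUBLED = re.compile(r"([a-z]+) \1")

def first_doubled_text(line):
    """Return the recorded text if it is repeated after a space, else None."""
    m = DOUBLED.search(line)
    return m.group(1) if m else None

print(first_doubled_text("it was the the best"))
print(first_doubled_text("no repeats here"))
```

Note that no word boundaries are involved here, so "this is" would also count
as a repeat of "is". The examples further down show how boundaries cure that.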
(*) strings:
strings are the same as lines in this program
(*) character sets:
character sets are used to match one of a group of characters. All
characters are taken literally except "-", "]", and "\". The "-"
indicates a range, unless it is the first or last character, or the
first character after a previous range. The "\" can be used to include
a "]" or a "-" in the set, or to introduce a C escape sequence, or (if
not the rhs of a range) one of the predefined character sets ("\s",
"\y", or "\w").
Example regular expressions
===========================
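As a warm-up, the character-set rules described earlier can be tried out with
Python's re module. This is only a sketch under an assumption: Python's sets
follow similar but not identical rules to the dialect of this program, and
Python's "\w" also accepts "_" and non-ASCII letters, unlike the [A-Za-z0-9]
given above. All names in the block are hypothetical.

```python
import re

HEX_DIGIT = re.compile(r"[0-9A-Fa-f]")    # three ranges in one set
DASH_OR_DIGIT = re.compile(r"[-0-9]")     # a leading "-" is taken literally
BRACKET_OR_DASH = re.compile(r"[\]\-]")   # "\" includes "]" and "-" in the set
SYMBOL_CHAR = re.compile(r"[\w$]")        # a predefined set used inside a set

def all_match(pattern, chars):
    """True if every single character in chars is a member of the set."""
    return all(pattern.fullmatch(c) for c in chars)

print(all_match(HEX_DIGIT, "09afAF"))
print(all_match(DASH_OR_DIGIT, "-5"))
print(all_match(BRACKET_OR_DASH, "]-"))
```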
For RISC OS users unacquainted with REs, note that the standard filename
wildcarding can be done if you substitute "." for "#", and ".*" for "*"!
The directory specification "$.doc##.monday*" would translate to the RE:
"\$\.doc..\.monday.*"
 ^^ ^   ^         ^^
 |  |   |         ++--> "*" translates to ".*"
 |  |   |
 |  |   +-------------> "#" translates to "."
 |  |
 |  +-----------------> "." is a reserved character
 |
 +--------------------> "$" is a reserved character
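The translation above is mechanical, so it can be sketched as a small Python
function. This is an illustration, not part of the program: the function name
wildcard_to_re and the RESERVED set are assumptions, with the reserved
characters guessed from the command and modifier lists earlier in this file.

```python
import re

# Characters assumed to be reserved in the dialect described in this file.
RESERVED = set(".^$\\[]|~+?()")

def wildcard_to_re(wildcard):
    """Translate a RISC OS filename wildcard into a regular expression."""
    parts = []
    for ch in wildcard:
        if ch == "#":
            parts.append(".")          # "#" matches any single character
        elif ch == "*":
            parts.append(".*")         # "*" matches any run of characters
        elif ch in RESERVED:
            parts.append("\\" + ch)    # reserved characters get a backslash
        else:
            parts.append(ch)           # everything else matches itself
    return "".join(parts)

pattern = wildcard_to_re("$.doc##.monday*")
print(pattern)   # \$\.doc..\.monday.*
```

This particular result happens to mean the same thing to Python's re module,
so re.fullmatch(pattern, "$.doc01.monday_am") can be used to try it out.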
To find any of the words "has", "have", or "had":
"\<(has|have|had)\>"
 ^               ^
 +---------------+---> These ensure we don't match "hasty", "haddock",
                       or "chastity", etc.
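The same search can be sketched in Python's re module, used here only as a
stand-in for the dialect (an assumption). Python has a single word-boundary
escape, "\b", which plays the role of both "\<" and "\>". The names
HAVE_FORMS and find_have_forms are hypothetical.

```python
import re

# "\b" on each side stops the pattern matching inside a longer word.
# "(?:...)" groups without recording, like plain "()" in the dialect.
HAVE_FORMS = re.compile(r"\b(?:has|have|had)\b")

def find_have_forms(text):
    """Return every standalone 'has', 'have' or 'had' found in text."""
    return HAVE_FORMS.findall(text)

print(find_have_forms("He has a haddock; they had a hasty meal of chastity."))
# prints ['has', 'had']
```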
Look for symbols ending with a digit
------------------------------------
"\y[0-9]\>"
Find a symbol used in two places on a line
------------------------------------------
1. "\{\S\}.*\1"
Almost. But fails on "Zappa = Frank_Zappa".
2. "\{\S\}.*\<\1\>"
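Both symbol searches above can be approximated in Python's re module. This is
a sketch under stated assumptions rather than a description of how the program
itself works: Python has no "\S" or "\y", so a symbol is spelled out from the
[a-z_A-Z] and [a-z_A-Z0-9] sets given earlier, and "\b" stands in for "\<" and
"\>". Because Python treats "_" as a word character, "\b" behaves as a symbol
boundary here. All names in the block are hypothetical.

```python
import re

# A whole symbol, built from the two symbol sets listed earlier in this file.
SYMBOL = r"[A-Za-z_][A-Za-z_0-9]*"

# Symbols ending with a digit (this captures the whole symbol).
ENDS_IN_DIGIT = re.compile(r"\b" + SYMBOL + r"[0-9]\b")

# Attempt 1: no boundaries, so the second use may sit inside a longer symbol.
NAIVE = re.compile(r"(" + SYMBOL + r").*\1")

# Attempt 2: boundaries around both the recorded symbol and its repeat.
BOUNDED = re.compile(r"\b(" + SYMBOL + r")\b.*\b\1\b")

def repeated_symbol(pattern, line):
    """Return the symbol the pattern finds twice on the line, or None."""
    m = pattern.search(line)
    return m.group(1) if m else None

print(ENDS_IN_DIGIT.findall("x1 = count + y2z + tmp3"))
print(repeated_symbol(NAIVE, "Zappa = Frank_Zappa"))
print(repeated_symbol(BOUNDED, "Zappa = Frank_Zappa"))
print(repeated_symbol(BOUNDED, "total = total + 1"))
```

The naive pattern reports "Zappa" for the troublesome line, while the bounded
one reports nothing, which is the behaviour the second attempt is after.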