- What is a regular expression ?
- ==============================
-
- It's a fancy wildcarding system - i.e. you don't need exact matches. A
- regular expression is really a mini-programming language with the
- following "commands":
- . any character
- ^ start of line
- $ end of line
- \` start of string (*)
- \' end of string
- \w word character [A-Za-z0-9]
- \< start of word
- \> end of word
- \@ word boundary
- [...] set of characters (*)
-
- \s first character of a symbol [a-z_A-Z]
- \y any symbol character [a-z_A-Z0-9]
- \S a whole symbol
- \W a whole word
-
- \a, \b, \f, \n, \r, \t match the corresponding C escape character. If you
- need to specify a code by number, use the C \0.. or \x.. escape
- sequences. Spaces are significant. ANY OTHER CHARACTER MATCHES ITSELF.
-
- There are also modifiers:
-
- | or, "re1|re2" match either RE1 or RE2
- ~ not, "~c" matches anything but the next CHARACTER
- + many, "re+" matches RE repeated one or more times
- * repeat, "re*" matches RE repeated zero or more times
- ? optional, "re?" matches RE repeated zero or one times
-
- Parentheses () can be used to group REs.
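
For readers who want to experiment, the modifiers can be approximated with Python's "re" module. This is only a sketch of the same ideas in a different dialect: Python has no "~" operator, so "~c" is written as "[^c]" here, and the sample patterns and strings are made up for illustration.

```python
import re

# Each pattern is paired with sample strings it should match in full.
examples = {
    r"cat|dog": ["cat", "dog"],        # or: either alternative
    r"[^c]at": ["bat", "hat"],         # not: Python spells "~c" as "[^c]"
    r"(ab)+": ["ab", "ababab"],        # many: one or more repeats of a group
    r"ab*": ["a", "abbb"],             # repeat: zero or more
    r"colou?r": ["color", "colour"],   # optional: zero or one
}

for pattern, samples in examples.items():
    for sample in samples:
        # Every line printed here ends in True
        print(pattern, sample, bool(re.fullmatch(pattern, sample)))
```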
-
- Finally, there are memories:
-
- \{ start recording
- \} end recording, and place in next memory (starts with memory 1)
- \<n> \1, ... \9 will then match the corresponding memory
-
- (*) strings:
- strings are the same as lines in this program
-
- (*) character sets:
- character sets are used to match one of a group of characters. All
- characters are taken literally except "-", "]", and "\". The "-"
- indicates a range, unless it is the first or last character, or the
- first character after a previous range. The "\" can be used to include
- a "]" or a "-" in the set, or to introduce a C escape sequence, or (if
- not the rhs of a range) one of the predefined character sets ("\s",
- "\y", or "\w").
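
Python's "re" module follows similar rules for "-" and "]" inside a set, so it can serve as a rough playground for the description above. Treat this as a sketch in a different dialect: the predefined sets differ (in Python "\s" means whitespace, not the first character of a symbol), so they are left out, and the variable names are invented for the example.

```python
import re

hex_digit = re.compile(r"[0-9A-Fa-f]")  # three ranges in one set
sign = re.compile(r"[-+]")              # "-" first in the set is literal
closers = re.compile(r"[\]\-]")         # "\" includes a "]" or a "-"

print(bool(hex_digit.fullmatch("c")))   # True
print(bool(hex_digit.fullmatch("g")))   # False
print(bool(sign.fullmatch("-")))        # True
print(bool(closers.fullmatch("]")))     # True
```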
-
-
-
- Example regular expressions
- ===========================
-
- For RISC OS users unacquainted with REs, note that the standard filename
- wildcarding can be done if you substitute "." for "#", and ".*" for "*"!
-
- The directory specification "$.doc##.monday*" would translate to the RE:
- "\$\.doc..\.monday.*"
-  ^^ ^   ^         ^^
-  |  |   |         ++--> "*" translates to ".*"
-  |  |   |
-  |  |   +-------------> "#" translates to "."
-  |  |
-  |  +-----------------> "." is a reserved character
-  |
-  +--------------------> "$" is a reserved character
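
The translation rules can be sketched in Python. This is an illustration only: the function name wildcard_to_re is made up, and the set of reserved characters is an assumption collected from the command and modifier tables in this file, not an official list.

```python
# Characters assumed to be reserved in this RE dialect, gathered from the
# command and modifier tables (an assumption, not an official list).
RESERVED = set(".^$\\[]|~+*?()")


def wildcard_to_re(wildcard):
    """Translate a RISC OS filename wildcard into an RE string."""
    parts = []
    for ch in wildcard:
        if ch == "#":
            parts.append(".")        # "#" matches any single character
        elif ch == "*":
            parts.append(".*")       # "*" matches any run of characters
        elif ch in RESERVED:
            parts.append("\\" + ch)  # reserved characters need a "\"
        else:
            parts.append(ch)         # everything else matches itself
    return "".join(parts)


print(wildcard_to_re("$.doc##.monday*"))  # \$\.doc..\.monday.*
```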
-
- To find either of the words "has", "have", or "had":
- "\<(has|have|had)\>"
-  ^               ^
-  +---------------+---> These ensure we don't match "hasty", "haddock",
-                        or "chastity", etc.
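
The same search can be tried in Python as a rough cross-check. Python's "re" module has no "\<" or "\>", so this sketch uses "\b" (word boundary) as an approximate stand-in for both; the sample sentence is invented.

```python
import re

# "\b" stands in for both "\<" and "\>" (an approximation).
pattern = re.compile(r"\b(has|have|had)\b")

text = "He has had a hasty haddock; they have chastity."
print(pattern.findall(text))  # ['has', 'had', 'have']
```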
-
- Look for symbols ending with a digit
- ------------------------------------
-
- "\y[0-9]\»"
-
- Find a symbol used in two places on a line
- ------------------------------------------
-
- 1. "\{\S\}.*\1"
-
- Almost. But fails on "Zappa = Frank_Zappa".
-
- 2. "\{\S\}.*\«\1\»"
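
Both attempts can be approximated in Python to see why the second one is needed. This is a sketch in Python's dialect, not this program's: "\S" is spelled out as an explicit symbol pattern, "\{...\}" becomes a "(...)" group, and "\b" stands in for the boundaries around the memory. The names SYMBOL, naive and bounded are invented for the example.

```python
import re

# A symbol: first character [a-z_A-Z], then any of [a-z_A-Z0-9].
SYMBOL = r"[A-Za-z_][A-Za-z_0-9]*"

# Attempt 1: the remembered symbol may reappear inside a longer symbol.
naive = re.compile(r"\b(" + SYMBOL + r")\b.*\1")

# Attempt 2: the repeat must itself be a whole symbol.
bounded = re.compile(r"\b(" + SYMBOL + r")\b.*\b\1\b")

line = "Zappa = Frank_Zappa"
print(bool(naive.search(line)))    # True  (the false hit)
print(bool(bounded.search(line)))  # False
print(bounded.search("total = total + 1").group(1))  # total
```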
-