Regular Expression Search Operators


Search Operators Index

Basic Operators
  * - Zero or More
  + - One or More
  ? - Exactly One
  | - Or Operator
  ! - Not Operator
  ^ - Start of Line
  $ - End of Line
  ^^ - Start of File
  $$ - End of File

Sub-Expression Operators
  [] - Range Operator
  () - Sub-Expression
  +n - Column Operator

Additional Issues
   Notes on Search Expressions
   Literal Characters
   Operations in Word Documents & binary files

Related Topics
   Regular Expression Replacement Operators
   Special Replacement Operators
   Regular Expression Examples
   Regular Expression Counter Operators
   Regular Expressions Overview

 


 

Regular Expression Search - Match Operators

*

Zero or More Operator: Matches zero or more occurrences of the expression enclosed in () or []. * may be used by itself, although it is intended to be used around strings. If the * operator is entered alone, it will match all characters from the start of the line to the end of the line. You can match characters between two or more strings, up to the maximum regular expression size, by specifying a range after the * operator. Take care when entering several expressions in a row that contain *, since overlapping matches may produce unpredictable results.


 

*(is)              matches   is, Mississippi
*[is]              matches   Some, Expression, single
Windows *[0-9]     matches   Windows 95, Windows 98
Windows*[]95       matches   up to 32767 characters (across several lines)
                             between Windows and 95
Windows*[\0- ]     matches   same as the above (older syntax)

 

Note: When * is combined with a numeric range and the %n>> or %n>starting value> replacement operators, the search expression above, Windows *[0-9], would be part of a Regular Expression Counter Operation.

 

Note: By design, * alone does not match characters under 'space' - ASCII 32 or 20 hex. If you need to search for all possible characters using *, use the form *[] or *[\0x-00- ].
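
For readers used to conventional regex engines, the following Python sketch approximates two of the expressions above. It is a rough translation under stated assumptions, not Search and Replace's own behavior: Python's re module puts the quantifier after the class, and the lazy *? and the [\s\S] class are guesses at the closest equivalents of *[].

```python
import re

# Hypothetical Python approximations of two Search and Replace expressions.

# "Windows *[0-9]": the literal "Windows " followed by zero or more digits.
windows_digits = re.compile(r"Windows [0-9]*")

# "Windows*[]95": anything, line breaks included, between "Windows" and "95".
# [\s\S] stands in for the empty range [], which matches all characters.
windows_span = re.compile(r"Windows[\s\S]*?95")

digit_hits = windows_digits.findall("Windows 95, Windows 98")

span_match = windows_span.search("Windows\r\nversion 95")
span_hit = span_match.group(0) if span_match else None
```

Here digit_hits holds both version strings and span_hit holds the text that crosses the line break.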

 

+

One Or More Operator: Matches one or more occurrences of the expression. + is intended to be combined with (), and care should be taken when using + by itself. For example,

 

+(is)              matches   is, Mississippi
w+e                matches   wide, white, write but not we

 

Note: By design, + alone does not match characters under 'space' - ASCII 32 or 20 hex. If you need to search for all possible characters using +, including low order characters, use the form +[] or +[\0x-00- ].
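
As a point of comparison, the two + examples can be approximated in Python. This is only a sketch: the character class [^\x00-\x1f] is an assumed stand-in for "characters at or above space", and the lazy +? is a guess at how the operator chooses the shortest match.

```python
import re

# Assumed equivalent of "+(is)": one or more repetitions of the string "is".
is_run = re.compile(r"(?:is)+")

# Assumed equivalent of "w+e": "w", then one or more characters at or above
# ASCII 32 (space), then "e".
w_plus_e = re.compile(r"w[^\x00-\x1f]+?e", re.IGNORECASE)

is_hits = is_run.findall("Mississippi")

words = ["wide", "white", "write", "we"]
matched = [word for word in words if w_plus_e.search(word)]
```

The word we is rejected because at least one character must sit between the w and the e.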

 

?

One Occurrence Operator: Matches exactly one character either before or after a string. ? also matches any single character between two strings. When combined with (), ? matches exactly one expression enclosed in (). Using the ? operator by itself will match every character in a file, one at a time, and therefore should probably be avoided.

 

?(is)              matches   is
Win?95             matches   Win 95, Win-95, Win/95

 

Note: By design, ? alone does not match characters under 'space' - ASCII 32 or 20 hex. If you need to search all possible single characters using ?, including low order characters, use the form ?[] or ?[\0x-00- ].

 

|

Or Operator: Matches the simple expression either before or after the | (pipe) symbol. This should be used in conjunction with (). Or expressions should not contain other operators such as *+^$?. You may, however, make use of other operators outside the (). For example,

 

(01/|02/)+[0-9](/95|/98)   matches   01/15/98 & 02/12/98
w*[a-z ](98|NT)\?          matches   Windows 98?, Win NT?
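
The first example above can be approximated in Python as follows. This is a sketch under the assumption that +[0-9] corresponds to [0-9]+ in Python's syntax; the sample dates are illustrative only.

```python
import re

# Assumed translation of "(01/|02/)+[0-9](/95|/98)": a month prefix of 01/ or
# 02/, one or more digits, then a year suffix of /95 or /98.
date_pattern = re.compile(r"(01/|02/)[0-9]+(/95|/98)")

samples = ["01/15/98", "02/12/98", "03/15/98", "01/15/99"]
hits = [s for s in samples if date_pattern.fullmatch(s)]
```

Only the first two samples satisfy both alternations.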

 

!

Not Operator: A match is made when both a 'positive' hit component and a !() or ![] component of the expression are found. The complete expression requires both components. The first may be as simple as a single regular expression operator such as * or ?. You should provide a wildcard operator of some type before the ! component, and the ! component should be enclosed in () or []. Be sure to nest ( ) when using |: ?!(a|b) won't work, so use ?!((a|b)) instead. Multiple !() components can be used to create an 'or', e.g., ?!(a)!(b)!(c). You can also use other regular expressions inside ( ). Additional 'positive hit' strings and/or regular expressions to find may be specified after !() or ![]. Note, however, that regular expressions following the !() or ![] will not be available to the %n operators. See Not Operator Notes for more information.

 

?at!((b|c)at)            matches   mat & sat but not 'b'at or 'c'at
*file!(beg*file)         matches   a file & this file but not 'beginning of file'
*98!(Windows 98)         matches   98 in 1998 but not in 'Windows 98'
*98!(+[a-z ]98)          matches   98 in 1998 but not in 'Windows 98'
a?b?c!(aub?c)!(a?bvc)    matches   In a search of 'aXbYc', find where X not 'u'
                                   and Y not 'v'
^*!(^t)!(</p>$)          matches   Lines that don't begin w/ t or end with </p>
#include*!()"C           matches   "ChildFrm.h" in "#include "ChildFrm.h"" but finds
                                   nothing in "#include "Edit.h"". This example
                                   uses !() as an AND operator of sorts.
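
In conventional regex engines the closest counterpart to the Not operator is a negative lookahead. The Python sketch below approximates the first example on that assumption; it does not claim to reproduce how Search and Replace evaluates !() internally.

```python
import re

# Assumed translation of "?at!((b|c)at)": any single character followed by
# "at", unless the three characters together are "bat" or "cat".
not_bat_or_cat = re.compile(r"(?!(?:b|c)at)[^\x00-\x1f]at")

words = ["mat", "sat", "bat", "cat"]
kept = [word for word in words if not_bat_or_cat.fullmatch(word)]
```

The lookahead plays the role of the !() component and the rest of the pattern is the 'positive' hit.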

 

^

Beginning Of Line Operator: Matches an expression at the beginning of a line. ^ should be the first character in your search term. ^ is best thought of as an 'anchor' - it anchors the entire expression to the start of a line. ^ can be combined with other wildcard and operators, with the following qualifications:

- Only one ^ can be present in an expression. If you need to consider two 'beginning of line' terms, use line boundary characters (\r\n) as literals as in the example below.

- A search such as, \r\nFind this, is the same thing as ^Find this (if your files are PC format).

- ^ can be used in 'not' expressions but do not use ^ inside () expressions. Use *(\r)\n instead.

- Do not use ^ and $ in a single expression. If you need to anchor a search to the start and end of a line, use literal line boundary characters at the end of your term. For example, if your files are PC format, use something like ^find this as the only thing on a line\r\n. If you are making a replace, include \r\n in your replace string so you don't strip out the line boundary characters.

- The ^, $, ^^, and $$ operators are counted for the purposes of an %n operator in a replacement expression. For example, in the search expression ^+[ ][a-zA-Z], the corresponding %n terms are: %1 = ^, %2= +[ ], %3 = [a-zA-Z].

- A Trick/Tip: During replacements, Search and Replace assumes ^ in the replacement term, so it is often not necessary to reference ^ specifically in your replacement string. For example, an operation to remove the first character from each line could use either of the following:
S:^?* R:%1%3
S:^?* R:%3
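
For comparison, the same first-character removal can be sketched in Python. The function name drop_first_char is hypothetical, and the pattern assumes that a line's first character should never be a line boundary character.

```python
import re

def drop_first_char(text: str) -> str:
    """Remove the first character of every non-empty line."""
    # re.MULTILINE makes ^ match at the start of each line. [^\r\n] keeps the
    # pattern from eating the \r of an empty line in a PC format file.
    return re.sub(r"^[^\r\n]", "", text, flags=re.MULTILINE)

result = drop_first_char("abc\r\n\r\ndef")
```

Unlike the %n form, Python's re.sub replaces only the matched character, so the rest of the line does not need to be captured and re-emitted.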

Some other examples of ^ are:

 

^the                             matches   the, The, THE, tHE at the beginning of a line,
                                           if case sensitive is off.
^(the|[a<])                      matches   the in The opening paragraph... OR
                                           A in At 5 pm ... OR
                                           < in <HR>
^*( )BEnd\r\n*( )Exit Function   matches   <space(s)>BEnd <immediately followed on next line by>
                                           <space(s)>Exit Function
^the*end.\r\n                    matches   An entire line that begins with The and ends with end.

 


$

End Of Line Operator: The $ operator is similar to the ^ operator but anchors your search to the end of a line. $ can be used with other wildcard and subexpression operators, with the following qualifications:

- Only one $ can be present in an expression. If you need to anchor a search to two line ends, use line boundary characters (\r\n) as literals. See the example below.

- These two search terms are the same: FindThis$ FindThis\r\n (PC format files)

- $ can be used in 'not' expressions but do not use $ inside () expressions. Use *(\r)\n instead.

- Do not use ^ and $ in a single expression. If you need to anchor a search to the start and end of a line, use literal line boundary characters at the end of your term. For example, if your files are PC format, use something like ^find this as the only thing on a line\r\n. If you are making a replace, include \r\n in your replace string so you don't strip out the line boundary characters.

- Note: The ^, $, ^^, and $$ operators are counted for the purposes of an %n operator in a replacement expression. For example, in the search expression l+[ls]$, corresponding %n terms are: %1 = +[ls], %2= $.

Some examples of $ are:

 

end$                     matches   end only if it is at the end of a line
*L1End\r\n*L2End$        matches   two sequential lines, the first ending with L1End
                                   and the second ending with L2End
(the end|of a line.)$    matches   the end in "This is the end"
                                   of a line. in "the end of a line. "
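
The equivalence between FindThis$ and FindThis\r\n for PC format files does not carry over directly to every regex engine. In Python, for instance, $ stops before \n only, so the \r has to be handled explicitly. The sketch below is one assumed way to approximate end$ for such files.

```python
import re

# Assumed translation of "end$" for text that may use PC (\r\n) line endings.
# The lookahead accepts an optional \r before the line feed, or end of text.
end_of_line = re.compile(r"end(?=\r?\n|\Z)")

text = "the end\r\nendless\r\nat the end"
positions = [m.start() for m in end_of_line.finditer(text)]
```

The end inside endless is skipped because it is not followed by a line boundary.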

 

^^

Beginning Of File Operator: Matches an expression found at the beginning of a file. Usage is similar to ^. Do not use ^^ inside ().

Note: The ^, $, ^^, and $$ operators are counted for the purposes of an %n operator in a replacement expression. For example, in the search expression ^^?omething, the corresponding %n terms are: %1 = ^^, %2= ?.

 

^^First        matches   First in "First line of the file" if that string is on the first
                         line of the file.
^^+50[]        matches   The first 50 characters in the file.
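
A rough Python counterpart of ^^+50[] is shown below. It assumes that \A is an acceptable stand-in for ^^ and that [\s\S]{50} corresponds to a 50-column match of the empty range.

```python
import re

# Assumed translation of "^^+50[]": \A anchors to the start of the text and
# [\s\S]{50} takes exactly 50 characters of any kind, line breaks included.
first_fifty = re.compile(r"\A[\s\S]{50}")

text = "First line of the file\r\n" + "x" * 100
match = first_fifty.match(text)
head = match.group(0) if match else ""
```

If the text is shorter than 50 characters, the pattern does not match and head is empty.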

 

$$

End Of File Operator: Matches an expression found at the end of the file. Usage is similar to $. Do not use $$ inside ().

Note: The ^, $, ^^, and $$ operators are counted for the purposes of an %n operator in a replacement expression. For example, in the search expression below, *$$, the corresponding %n terms are: %1 = *, %2= $$.

 

*$$            matches   The last line in the file

 

Regular Expression Search - Sub-Expression Operators

[ ]

Range Operator: This may be a list of single characters such as [gdo], one or more ranges of characters such as [d-o0-2], or a more complex expression using other match or sub-expression operators such as do[g|uble]. Ranges use an "a-z" type of notation and are parsed in the order of the table of characters in the Binary Mode - Binary Codes list. If you need to include the - character as a specific character, make it a literal by specifying \-.

Use the ?, *, or + operators to modify the range to be matched by the [] sub-expression operator. If ?, *, or + are not specified, expressions that use a range in [ ] match single characters; this is the same as specifying ?.

When nothing is specified inside the brackets, [] matches all characters and is equivalent to ?[]. Be careful if you specify [] as the only string to search for: it will match all characters in the file, one at a time. The term *[] spans one or more lines, up to the number of characters specified by Options-Search: Maximum Regular Expression Size. *[] is very useful for 'finding anything' between two other components of your search term.

Some examples are:

 

t[]e                   matches   The, Toe
H*[]d                  matches   Hello (cr-lf) World across two lines
<title>*[]</title>     matches   All characters in the html title tag, even if they span
                                 across multiple lines
*[0-9]                 matches   234907, 5795 (or an empty string)
+[niewW]               matches   one or more strings such as Win, new, win
[a-z]                  matches   any lower case string if case sensitive is on
                                 and any words if not case sensitive

 

Note: When [] is combined with a numeric range, the * operator, and the %n>> or %n>starting value> replacement operators, a search expression such as Windows *[0-9] would be part of a Regular Expression Counter Operation.
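
The <title>*[]</title> example is a common use of the empty range, so a Python approximation may be useful for comparison. It assumes that [\s\S]*? is a fair stand-in for *[]; the sample HTML is illustrative only.

```python
import re

# Assumed translation of "<title>*[]</title>": capture everything between
# the tags, including line breaks. The lazy *? stops at the first </title>.
title_pattern = re.compile(r"<title>([\s\S]*?)</title>", re.IGNORECASE)

html = "<html><head><title>Search\r\nOperators</title></head></html>"
match = title_pattern.search(html)
title = match.group(1) if match else None
```

The captured title keeps its embedded line break, mirroring the way *[] spans lines.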

 

( )

Subexpression Operator: Parentheses are used to denote one or more sub-expressions. This is usually combined with the OR operator - |. For example,

 

Win( 95|dows 95)   matches   Windows 95, Win 95

 

+n

Column Specifier: This is used to denote the number of columns to match either before or after an expression. The Column Specifier may be used with a simple search term, such as the expression +4The, or in combination with the other [] or () sub-expression operators. A range of columns to match may also be specified, such as The+4-10. Note that you should combine the + operator with [] or () if you want to be clear about literal strings that serve as an anchor to the expression. Some examples:

 

w+2[a-z]           matches   Wor in Hello World
+4[]w              matches   llo W in Hello World
[ ]+5-15[0-9.]     matches   100.01, 123.9, & 543.21 in
                                Data1 100.01 Somethin'
                                Dat2 123.9 Nuthin'
                                Dataa3 543.21 and junk
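
The column examples map fairly naturally onto counted repetition in other regex dialects. The Python sketch below assumes that +n corresponds to {n} and +m-n to {m,n}, and that the leading space in the third expression serves only as an anchor (hence the lookbehind).

```python
import re

# Assumed translations of the three column examples.
w_two = re.compile(r"w[a-z]{2}", re.IGNORECASE)       # w+2[a-z]
four_then_w = re.compile(r"[\s\S]{4}w", re.IGNORECASE)  # +4[]w
numbers = re.compile(r"(?<= )[0-9.]{5,15}")            # [ ]+5-15[0-9.]

wor = w_two.search("Hello World").group(0)
llo_w = four_then_w.search("Hello World").group(0)

data = "Data1 100.01 Somethin'\nDat2 123.9 Nuthin'\nDataa3 543.21 and junk"
values = numbers.findall(data)
```

Whether Search and Replace includes the anchoring space in the reported match is not stated above, so the lookbehind is a design choice of this sketch.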

 

Regular Expression Search - Special Literal Characters

- + * ? ( ) [ ] \ | $ ^ !

If you wish to search for any of these characters, they must be preceded by the \ character to be interpreted as a literal in a search.
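
When search terms are built by a script, the escaping rule above can be applied mechanically. The helper below is a hypothetical sketch (escape_literal and SPECIAL_CHARS are names invented here); it simply prefixes each of the listed characters with a backslash.

```python
# The special characters listed above: - + * ? ( ) [ ] \ | $ ^ !
SPECIAL_CHARS = set("-+*?()[]\\|$^!")

def escape_literal(text: str) -> str:
    """Prefix each special search character with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)

escaped = escape_literal("C++ (beta)?")
```

Printing escaped gives C\+\+ \(beta\)\?, which can then be used as a literal search string.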