Searching with Regular Expressions  
 
 

Studio supports regular expressions (regexes) in the Extended Find and Replace commands for matching patterns in character strings. Regular expressions let you specify all the possible variants in a search and precisely control replacements. Ordinary characters are combined with special characters to define the search pattern. The regex parser evaluates the selected files and returns each string that matches the pattern.

In the Find command, each matching string is added to the find list. In the Replace operation, each match triggers insertion of the replacement string. When replacing a string, it is just as important to control what is not matched as what is. Simple regular expressions can be concatenated into complex search criteria.

 
 
  Note  
 

The rules listed in this section are for creating regular expressions in Studio. The rules used by other regex parsers may differ.

 
 
  Special characters  
 
 

Because special characters are the operators in regular expressions, you must precede a special character with a backslash (\) to represent it as an ordinary character. For example, a double backslash (\\) represents a literal backslash.
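As a rough illustration of escaping, the following sketch uses Python's re module. That choice is an assumption made for the example: Python is not Studio's parser, although it treats the backslash the same way for the characters shown. The sample strings and variable names are hypothetical.

```python
import re

# Python's re module stands in for Studio's parser here (an assumption).
text = "Total cost (in $): 3.50"

# Unescaped, the period is a wildcard, so the pattern also matches "3x50".
unescaped = re.compile(r"3.50")
# Escaped, the period matches only a literal period.
escaped = re.compile(r"3\.50")

matches_wildcard = bool(unescaped.search("3x50"))
matches_literal = bool(escaped.search("3x50"))
finds_price = bool(escaped.search(text))

# A double backslash in the pattern matches one literal backslash.
backslash_hits = re.findall(r"\\", r"C:\TEMP\LOGS")

print(matches_wildcard, matches_literal, finds_price, len(backslash_hits))
```

The unescaped pattern is the kind of expression that finds more than intended, which matters most in a Replace operation.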

 
 
  Single-character regular expressions  
 
 

This section describes the rules for creating regular expressions. You can use regular expressions in the Search > Extended Find and Replace commands to match complex string patterns.

The following rules govern one-character regexes that match a single character:

  • Special characters are: + * ? . [ ^ $ ( ) { | \
  • Any character that is not a special character matches itself.
  • A backslash (\) followed by any special character matches the literal character itself, that is, the backslash escapes the special character.
  • A period (.) matches any character. For example, ".umpty" matches either "Humpty" or "Dumpty".
  • A set of characters enclosed in brackets ([]) is a one-character regular expression that matches any of the characters in that set. For example, "[akm]" matches an "a", "k", or "m".
  • Any regular expression can be followed by one of the following suffixes. The suffix {m,n} forces a match of m through n (inclusive) occurrences of the preceding regular expression. The suffix {m,} forces a match of at least m occurrences of the preceding regular expression. The syntax {,n} is not allowed.
  • A range of characters in a set can be indicated with a dash. For example, "[a-z]" matches any lowercase letter. If the first character of the set is the caret (^), the regex matches any single character except those in the set; it does not match the empty string. For example, "[^akm]" matches any character except "a", "k", or "m". The caret loses its special meaning if it is not the first character of the set.
  • All regular expressions can be made case insensitive by substituting individual characters with character sets, for example, [Nn][Ii][Cc][Kk].
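The single-character rules above can be tried out in a short script. The sketch below uses Python's re module as a stand-in, which is an assumption: it is not Studio's parser, and although it agrees with the rules above for these patterns, other details may differ. The sample strings and variable names are made up for illustration.

```python
import re

# Python's re module is a stand-in for Studio's parser (an assumption).
words = ["Humpty", "Dumpty", "umpty"]

# A period matches any single character, so one character must precede "umpty".
dot_matches = [w for w in words if re.fullmatch(r".umpty", w)]

# A bracketed set matches exactly one of the listed characters.
set_matches = re.findall(r"[akm]", "make a mark")

# A leading caret negates the set.
negated = re.findall(r"[^akm]", "makes")

# A dash inside a set denotes a range.
lowercase = re.findall(r"[a-z]", "Nick 42")

# Per-character sets give a case-insensitive match.
candidates = ["nick", "NICK", "NiCk", "nack"]
nick = [s for s in candidates if re.fullmatch(r"[Nn][Ii][Cc][Kk]", s)]

print(dot_matches, negated, lowercase, nick)
```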
 
 
  Character classes  
 
 

You can specify a character by using one of the POSIX character classes. Write the class name between colons inside square brackets, and place that inside the brackets of a character set, as in this example:

REReplace("Allaire's Web Site","[[:space:]]","*","ALL")

This code replaces all the spaces with *, producing this string:

Allaire's*Web*Site

The following table shows the POSIX character classes that Studio supports.

Supported Character Classes

Character Class   Matches
alpha             Matches any letter. Same as [A-Za-z].
upper             Matches any upper-case letter. Same as [A-Z].
lower             Matches any lower-case letter. Same as [a-z].
digit             Matches any digit. Same as [0-9].
alnum             Matches any alphanumeric character. Same as [A-Za-z0-9].
xdigit            Matches any hexadecimal digit. Same as [0-9A-Fa-f].
space             Matches a tab, new line, vertical tab, form feed, carriage return, or space.
print             Matches any printable character.
punct             Matches any punctuation character, that is, one of ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~
graph             Matches any of the characters defined as a printable character except those defined to be part of the space character class.
cntrl             Matches any character not part of the character classes [:upper:], [:lower:], [:alpha:], [:digit:], [:punct:], [:graph:], [:print:], or [:xdigit:].
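For readers who want to test the character classes outside Studio, here is a minimal sketch in Python. It rests on two assumptions: Python's re module has no [[:name:]] syntax, so each class is spelled out as the equivalent ASCII set listed above, and the helper expand_posix and the POSIX_CLASSES table are hypothetical names invented for this example. Only the classes with a simple "same as" set are included.

```python
import re

# Assumption: explicit ASCII sets stand in for the POSIX class names,
# because Python's re module does not accept [[:name:]].
POSIX_CLASSES = {
    "alpha": "A-Za-z",
    "upper": "A-Z",
    "lower": "a-z",
    "digit": "0-9",
    "alnum": "A-Za-z0-9",
    "xdigit": "0-9A-Fa-f",
    "space": r" \t\n\v\f\r",
}

def expand_posix(pattern: str) -> str:
    """Replace each [:name:] in a pattern with its explicit character list."""
    return re.sub(r"\[:([a-z]+):\]", lambda m: POSIX_CLASSES[m.group(1)], pattern)

# Reproduce the replacement shown earlier in this section.
pattern = expand_posix("[[:space:]]")
result = re.sub(pattern, "*", "Allaire's Web Site")
print(result)  # Allaire's*Web*Site
```

The helper raises a KeyError for a class name that is not in the table, which is deliberate in a sketch this small: a silent fallback would hide a mistyped class name.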

 
 
  Multi-character regular expressions  
 
 

You can use the following rules to build multi-character regular expressions:

  • Parentheses group parts of regular expressions together into grouped sub-expressions that can be treated as a single unit. For example, (ha)+ matches one or more instances of "ha".
  • A one-character regular expression or grouped sub-expression followed by an asterisk (*) matches zero or more occurrences of the regular expression. For example, [a-z]* matches zero or more lower-case characters.
  • A one-character regular expression or grouped sub-expression followed by a plus (+) matches one or more occurrences of the regular expression. For example, [a-z]+ matches one or more lower-case characters.
  • A one-character regular expression or grouped sub-expression followed by a question mark (?) matches zero or one occurrence of the regular expression. For example, xy?z matches either "xyz" or "xz".
  • The concatenation of regular expressions creates a regular expression that matches the corresponding concatenation of strings. For example, [A-Z][a-z]* matches any capitalized word.
  • The OR character (|) allows a choice between two regular expressions. For example, jell(y|ies) matches either "jelly" or "jellies".
  • Braces ({}) indicate a range of occurrences of a regular expression, in the form {m,n}, where m is an integer greater than or equal to zero that marks the start of the range and n is an integer greater than or equal to m that marks the end of the range. For example, (ba){0,3} matches zero to three consecutive occurrences of "ba".
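The multi-character rules above can be checked with a few small patterns. The sketch below again assumes Python's re module as a stand-in for Studio's parser; the helper name full and the sample strings are hypothetical.

```python
import re

def full(pattern: str, text: str) -> bool:
    """Return True when the pattern matches the whole string."""
    return re.fullmatch(pattern, text) is not None

# Grouping with +: one or more instances of "ha".
laughs = full(r"(ha)+", "hahaha")

# Question mark: zero or one "y".
optional = [s for s in ["xyz", "xz", "xyyz"] if full(r"xy?z", s)]

# Alternation inside a group.
plural = [s for s in ["jelly", "jellies", "jell"] if full(r"jell(y|ies)", s)]

# Concatenation: an upper-case letter followed by zero or more lower-case letters.
capitalized = re.findall(r"[A-Z][a-z]*", "Find and Replace in Studio")

# Braces: zero to three occurrences of "ba".
bounded = [n for n in range(5) if full(r"(ba){0,3}", "ba" * n)]

print(laughs, optional, plural, capitalized, bounded)
```

Matching against the whole string keeps each check unambiguous; a search for (ba){0,3} anywhere in a string always succeeds, because zero occurrences is an allowed match.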
 
 
  Backreferences  
 
 

Studio supports backreferencing, which allows you to match text in previously matched sets of parentheses. A backslash followed by a digit n (\n) is used to refer to the nth parenthesized sub-expression.

One example of how backreferencing can be used is searching for doubled words, for example, to find instances of "the the" or "is is" in text. The following example shows the syntax you use for backreferencing in regular expressions:

REReplace("There is is coffee in the the kitchen",
"([A-Za-z]+)[ ]+\1","*","ALL")

This code searches for a run of letters ([A-Za-z]+), followed by one or more spaces ([ ]+), followed by the same text that the first parenthesized sub-expression matched (\1). The parser detects the two occurrences of "is" as well as the two occurrences of "the" and replaces each pair with an asterisk, resulting in the following text:

There * coffee in * kitchen
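The same doubled-word search can be sketched in Python, assuming the re module as a stand-in for Studio's parser. The second pattern adds \b word boundaries, which is Python syntax and not something this section documents for Studio; it is included only to show why the unanchored pattern can match inside longer words.

```python
import re

text = "There is is coffee in the the kitchen"

# \1 refers back to whatever the first parenthesized group matched.
doubled = r"([A-Za-z]+)[ ]+\1"
result = re.sub(doubled, "*", text)
print(result)  # There * coffee in * kitchen

# Without word boundaries, the pattern also matches the "is is" inside "this is".
partial = re.sub(doubled, "*", "this is fine")

# Assumption: \b is Python's word-boundary syntax, not documented here for Studio.
bounded = r"\b([A-Za-z]+)[ ]+\1\b"
safe = re.sub(bounded, "*", "this is fine")
still_works = re.sub(bounded, "*", text)
```

This is a case where controlling what is not matched matters as much as what is: the unanchored pattern turns "this is fine" into "th* fine".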
 
 
  Anchoring a regular expression to a string  
 
 

All or part of a regular expression can be anchored to either the beginning or end of the string being searched:

  • If a caret (^) is at the beginning of a (sub)expression, the matched string must be at the beginning of the string being searched.
  • If a dollar sign ($) is at the end of a (sub)expression, the matched string must be at the end of the string being searched.
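A brief sketch of both anchors follows, assuming Python's re module as a stand-in for Studio's parser and treating one sample line as the string being searched. The sample line and variable names are hypothetical.

```python
import re

line = "cfset name = cfset"

# ^ ties the match to the beginning of the string being searched.
at_start = re.search(r"^cfset", line)
# $ ties the match to the end of the string being searched.
at_end = re.search(r"cfset$", line)
# Without anchors, both occurrences are found.
all_hits = re.findall(r"cfset", line)

print(at_start.start(), at_end.start(), len(all_hits))  # 0 13 2

# Using both anchors requires the whole string to fit the pattern.
digits_only = [s for s in ["2024", "v2024", "2024b"] if re.search(r"^[0-9]+$", s)]
```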
 
 
  Expression examples  
 

The following examples show some regular expressions and describe what they match.

Regular Expression Examples

Expression                                           Description
[\?&]value=                                          A URL parameter named value, preceded by ? or &.
[A-Z]:(\\[A-Z0-9_]+)+                                An uppercase DOS/Windows full path that (a) is not the root of a drive, and (b) has only letters, numbers, and underscores in its text.
[A-Za-z][A-Za-z0-9_]*                                A ColdFusion variable with no qualifier.
([A-Za-z][A-Za-z0-9_]*)(\.[A-Za-z][A-Za-z0-9_]*)?    A ColdFusion variable with no more than one qualifier, for example, Form.VarName, but not Form.Image.VarName.
(\+|-)?[1-9][0-9]*                                   An integer that does not begin with a zero and has an optional sign.
(\+|-)?[1-9][0-9]*(\.[0-9]*)?                        A real number with an optional sign, an integer part that does not begin with a zero, and an optional fractional part.
(\+|-)?[1-9]\.[0-9]*E(\+|-)?[0-9]+                   A real number in scientific (E) notation, for example, 1.5E-3.
a{2,4}                                               Two to four occurrences of 'a': aa, aaa, aaaa.
(ba){3,}                                             At least three 'ba' pairs: bababa, babababa, ...
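Several of the expressions above can be exercised against sample input. The sketch below assumes Python's re module as a stand-in for Studio's parser; the dictionary keys, the helper name matches, and the test strings are hypothetical and serve only to show what each expression accepts and rejects.

```python
import re

# Expressions copied from the examples above; the key names are illustrative.
EXAMPLES = {
    "integer": r"(\+|-)?[1-9][0-9]*",
    "real": r"(\+|-)?[1-9][0-9]*(\.[0-9]*)?",
    "dos_path": r"[A-Z]:(\\[A-Z0-9_]+)+",
    "variable": r"([A-Za-z][A-Za-z0-9_]*)(\.[A-Za-z][A-Za-z0-9_]*)?",
}

def matches(name: str, text: str) -> bool:
    """Return True when the named expression matches the whole string."""
    return re.fullmatch(EXAMPLES[name], text) is not None

integers = [s for s in ["-42", "+7", "042"] if matches("integer", s)]
reals = [s for s in ["3.14", "10.", "0.5"] if matches("real", s)]
paths = [s for s in [r"C:\WINNT\SYSTEM32", "C:\\", r"c:\temp"] if matches("dos_path", s)]
variables = [s for s in ["VarName", "Form.VarName", "Form.Image.VarName"]
             if matches("variable", s)]

# The URL parameter expression is meant to be found inside a longer string.
url_hit = re.search(r"[\?&]value=", "page.cfm?id=1&value=3")

print(integers, reals, paths, variables, url_hit.group(0))
```

Note that the real-number expression rejects 0.5 because the integer part must start with a digit from 1 to 9, a limitation worth checking before using it in a Replace operation.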

 
 
  Resources  
 
 

An excellent reference on regular expressions is Mastering Regular Expressions by Jeffrey E.F. Friedl, published by O'Reilly & Associates, Inc.



 
 
 
 

allaire     AllaireDoc@allaire.com
    Copyright © 1998, Allaire Corporation. All rights reserved.