Regular Expression Pattern Syntax
Literals
A literal is a character that matches itself. All characters are literals except: ".", "|", "*", "?", "+", "(", ")", "{", "}", "[", "]", "^", "$" and "\". These characters are literals when preceded by a "\".
Wildcard
The dot character "." matches any single character.
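The wildcard can be tried out with a short sketch. It uses Python's `re` module as a stand-in for the Regex++ engine, which is an assumption: the two agree on this basic syntax but differ in places (for example, Python's "." does not match a newline unless `re.DOTALL` is given).

```python
import re

# "." stands for exactly one character, whatever that character is.
pattern = re.compile(r"b.t")

candidates = ["bat", "bit", "b t", "bt", "boot"]
# fullmatch() succeeds only if the whole string matches the pattern.
matches = [s for s in candidates if pattern.fullmatch(s)]

print(matches)  # ['bat', 'bit', 'b t']
```

"bt" fails because the dot must consume one character, and "boot" fails because it consumes only one.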
Repeats
A repeat is an expression that is matched an arbitrary number of times. An expression followed by "*" can be repeated any number of times, including zero. An expression followed by "+" can be repeated any number of times, but at least once. An expression followed by "?" may be repeated zero or one times only. When the minimum and maximum number of repeats must be specified explicitly, the bounds operator "{}" may be used: "a{2}" is the letter "a" repeated exactly twice, "a{2,4}" is the letter "a" repeated between 2 and 4 times, and "a{2,}" is the letter "a" repeated at least twice with no upper limit. Note that there must be no white-space inside the {}, and there is no upper limit on the values of the lower and upper bounds. All repeat expressions refer to the shortest possible previous sub-expression: a single character, a character set, or a sub-expression grouped with "()".
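The repeat operators can be compared side by side with a minimal sketch. Python's `re` module is used here as a stand-in for the Regex++ engine (an assumption), and the helper `matching` is a hypothetical name introduced only for this illustration.

```python
import re

def matching(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

candidates = ["b", "ba", "baa", "baaa", "baaaa", "baaaaa"]

star = matching(r"ba*", candidates)         # zero or more "a"
plus = matching(r"ba+", candidates)         # one or more "a"
optional = matching(r"ba?", candidates)     # zero or one "a"
bounded = matching(r"ba{2,4}", candidates)  # two to four "a"

print(star)      # all six candidates
print(plus)      # everything except 'b'
print(optional)  # ['b', 'ba']
print(bounded)   # ['baa', 'baaa', 'baaaa']
```

In every pattern the repeat applies to the "a" alone, the shortest previous sub-expression, so the leading "b" is always required exactly once.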
Examples:
"ba*" will match all of "b", "ba", "baaa" etc.
"ba+" will match "ba" or "baaaa" for example but not "b".
"ba?" will match "b" or "ba".
"ba{2,4}" will match "baa", "baaa" and "baaaa".
Parentheses
Parentheses are used to group items together into a sub-expression. For example, the expression "(ab)*" would match all of the string "ababab".
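The effect of grouping on a repeat can be sketched as follows, again assuming Python's `re` module as a stand-in for the Regex++ engine.

```python
import re

# Without grouping, "*" applies only to the "b" immediately before it,
# so the pattern means "a" followed by any number of "b".
ungrouped = re.fullmatch(r"ab*", "ababab")

# With grouping, "*" repeats the whole sub-expression "ab".
grouped = re.fullmatch(r"(ab)*", "ababab")

print(ungrouped is None)  # True: "ab*" cannot match the whole string
print(grouped.group(0))   # ababab
```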
Alternatives
Alternatives occur when the expression can match either one sub-expression or another; the alternatives are separated by a "|". Each alternative is the largest possible previous sub-expression; this is the opposite behavior from repetition operators.
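The difference in scope between "|" and the repeat operators can be sketched like this. Python's `re` module stands in for the Regex++ engine (an assumption), and `full` is a hypothetical helper used only for this illustration.

```python
import re

def full(pattern, text):
    """True if the pattern matches the whole of text."""
    return re.fullmatch(pattern, text) is not None

# "|" takes the largest possible sub-expression on each side,
# so the alternatives here are all of "abc" and all of "def".
whole = [s for s in ["abc", "def", "abcef", "abdef"] if full(r"abc|def", s)]

# Parentheses narrow the alternatives to just "c" or "d".
narrowed = [s for s in ["abc", "def", "abcef", "abdef"] if full(r"ab(c|d)ef", s)]

print(whole)     # ['abc', 'def']
print(narrowed)  # ['abcef', 'abdef']
```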
Examples:
"a(b|c)" could match "ab" or "ac".
"abc|def" could match "abc" or "def".
Sets
A set matches any single character that is a member of the set. Sets are delimited by "[" and "]" and can contain literals, character ranges, character classes, collating elements and equivalence classes. Set declarations that start with "^" contain the complement of the elements that follow.
Examples:
Character literals:
"[abc]" will match either of "a", "b", or "c".
"[^abc]" will match any character other than "a", "b", or "c".
Character ranges:
"[a-z]" will match any character in the range "a" to "z" and in the range "A" to "Z".
Note that character ranges are highly locale dependent: they match any character that collates between the endpoints of the range. For example, [a-z] will match the ASCII characters a-z, and also 'A', 'B' etc.
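Sets and ranges can be explored with a small sketch. It assumes Python's `re` module as a stand-in, and here the two engines differ: Python compares ranges by character code rather than by locale collation, so "[a-z]" on its own does not match "A". The `re.IGNORECASE` flag is shown as one way to get a case-spanning range in Python; it is not a claim about how Regex++ implements collation.

```python
import re

# A set matches one character that is a member of the set.
in_set = [c for c in "abcd" if re.fullmatch(r"[abc]", c)]
# A leading "^" inverts the set.
not_in_set = [c for c in "abcd" if re.fullmatch(r"[^abc]", c)]

# Python ranges are code-point ranges, so "A" is outside [a-z] ...
lower_only = re.fullmatch(r"[a-z]", "A") is None
# ... unless case is ignored explicitly.
either_case = re.fullmatch(r"[a-z]", "A", re.IGNORECASE) is not None

print(in_set)       # ['a', 'b', 'c']
print(not_in_set)   # ['d']
print(lower_only)   # True
print(either_case)  # True
```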
Character classes are denoted using the syntax "[:classname:]" within a set declaration, for example "[[:space:]]" is the set of all whitespace characters. The available character classes are:
| Class name | Matches |
|---|---|
| alnum | Any alphanumeric character. |
| alpha | Any alphabetical character a-z and A-Z. Other characters may also be included depending upon the locale. |
| blank | Any blank character, either a space or a tab. |
| cntrl | Any control character. |
| digit | Any digit 0-9. |
| graph | Any graphical character. |
| lower | Any lower case character a-z. Other characters may also be included depending upon the locale. |
| print | Any printable character. |
| punct | Any punctuation character. |
| space | Any whitespace character. |
| upper | Any upper case character A-Z. Other characters may also be included depending upon the locale. |
| xdigit | Any hexadecimal digit character, 0-9, a-f and A-F. |
| word | Any word character: all alphanumeric characters plus the underscore. |
| unicode | Any character whose code is greater than 255. This applies to the wide character traits classes only. |
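Python's `re` module does not understand the "[:classname:]" syntax, so the following is only a rough sketch of what a few of the classes stand for. The table `POSIX_CLASSES` and the function `expand_classes` are hypothetical names invented for this illustration, and the expansions are ASCII-only assumptions that ignore the locale-dependent members mentioned above.

```python
import re

# Hypothetical ASCII-only expansions for some of the classes listed above.
POSIX_CLASSES = {
    "alnum": "a-zA-Z0-9",
    "alpha": "a-zA-Z",
    "blank": " \\t",
    "digit": "0-9",
    "lower": "a-z",
    "upper": "A-Z",
    "xdigit": "0-9a-fA-F",
    "space": " \\t\\n\\r\\f\\v",
    "word": "a-zA-Z0-9_",
}

def expand_classes(pattern):
    """Replace each [:name:] in a pattern with its ASCII expansion.

    Raises KeyError for a class name that is not in POSIX_CLASSES.
    """
    return re.sub(r"\[:(\w+):\]", lambda m: POSIX_CLASSES[m.group(1)], pattern)

hex_pattern = expand_classes("[[:xdigit:]]+")
ident_pattern = expand_classes("[[:alpha:]_][[:word:]]*")

print(hex_pattern)                                      # [0-9a-fA-F]+
print(bool(re.fullmatch(hex_pattern, "1aF")))           # True
print(bool(re.fullmatch(hex_pattern, "1aG")))           # False
print(bool(re.fullmatch(ident_pattern, "cell_42")))     # True
```

The expansion happens inside the surrounding "[" and "]", which is why the class syntax is only meaningful within a set declaration.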
There are some shortcuts that can be used in place of the character classes:
\w in place of [:word:]
\s in place of [:space:]
\d in place of [:digit:]
\l in place of [:lower:]
\u in place of [:upper:]
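Three of these shortcuts also exist in Python's `re` module, which is used below as a stand-in for the Regex++ engine (an assumption). Python has no "\l" or "\u" shortcut, so an explicit set takes their place in this sketch.

```python
import re

# re.ASCII restricts \w, \s and \d to the ASCII members of each class.
word = re.fullmatch(r"\w+", "cell_42", re.ASCII) is not None
space = re.fullmatch(r"\s+", " \t\n", re.ASCII) is not None
digits = re.findall(r"\d+", "row 12, column 345", re.ASCII)

# Stand-in for \l: runs of lower case ASCII letters.
lower_runs = re.findall(r"[a-z]+", "BeeGrid")

print(word)        # True
print(space)       # True
print(digits)      # ['12', '345']
print(lower_runs)  # ['ee', 'rid']
```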
To include a literal "-" in a set declaration, make it the first character after the opening "[" or "[^", make it the endpoint of a range, or precede it with an escape character as in "[\-]". To include a literal "[", "]" or "^" in a set, make it the endpoint of a range or precede it with an escape character.
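The placement and escaping rules can be checked with a brief sketch, assuming Python's `re` module as a stand-in for the Regex++ engine. Only the "first character" and "escape" forms are shown, since those behave the same way in Python.

```python
import re

# "-" placed first in the set is a literal, not a range operator.
leading = re.findall(r"[-+]", "a-b+c")

# An escaped "-" between "a" and "c" is a literal too,
# so "b" is not part of the set.
escaped = re.findall(r"[a\-c]", "a-b-c")

# "[", "]" and "^" escaped inside a set.
brackets = re.findall(r"[\[\]\^]", "x[1]^2")

print(leading)   # ['-', '+']
print(escaped)   # ['a', '-', '-', 'c']
print(brackets)  # ['[', ']', '^']
```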
Line anchors
An anchor is something that matches the null string at the start or end of a line: "^" matches the null string at the start of a line, "$" matches the null string at the end of a line.
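Line anchors can be sketched as follows. Python's `re` module is assumed as a stand-in, and it needs the `re.MULTILINE` flag before "^" and "$" match at every line rather than only at the start and end of the whole text.

```python
import re

text = "alpha\nbeta\nalphabet"

# "^" matches the empty string at the start of each line.
starts = re.findall(r"^alpha", text, re.MULTILINE)
# "$" matches the empty string at the end of each line.
ends = re.findall(r"beta$", text, re.MULTILINE)
# Both anchors together require the line to be exactly "alpha".
whole_lines = re.findall(r"^alpha$", text, re.MULTILINE)

print(starts)       # ['alpha', 'alpha']
print(ends)         # ['beta']
print(whole_lines)  # ['alpha']
```

The anchors consume no characters: they only constrain where the rest of the pattern may begin or end.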
Escape operator
When the escape operator "\" precedes a special character, that character is treated as a literal; for example, "\*" represents a literal "*" rather than the repeat operator.
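The difference an escape makes can be seen in a short sketch, assuming Python's `re` module as a stand-in for the Regex++ engine. `re.escape` is a Python convenience for escaping a whole string; it is not part of the syntax described here.

```python
import re

text = "2*3 = 6; 223 = 223"

# Unescaped, "*" is a repeat: zero or more "2" followed by "3".
unescaped = re.findall(r"2*3", text)
# Escaped, "\*" is a literal asterisk.
escaped = re.findall(r"2\*3", text)

# re.escape() adds the backslashes needed to match a string literally.
literal = re.fullmatch(re.escape("1+1=2"), "1+1=2") is not None

print(unescaped)  # ['3', '223', '223']
print(escaped)    # ['2*3']
print(literal)    # True
```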
Single character escape sequences
The following escape sequences are aliases for single characters:
| Escape sequence | Character code | Meaning |
|---|---|---|
| \a | 0x07 | Bell character. |
| \f | 0x0C | Form feed. |
| \n | 0x0A | Newline character. |
| \r | 0x0D | Carriage return. |
| \t | 0x09 | Tab character. |
| \v | 0x0B | Vertical tab. |
| \e | 0x1B | ASCII Escape character. |
| \0dd | 0dd | An octal character code, where dd is one or more octal digits. |
| \xXX | 0xXX | A hexadecimal character code, where XX is one or more hexadecimal digits. |
| \x{XX} | 0xXX | A hexadecimal character code, where XX is one or more hexadecimal digits, optionally a Unicode character. |
| \cZ | Z-@ | An ASCII escape sequence control-Z, where Z is any ASCII character greater than or equal to the character code for '@'. |
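The character codes can be checked with a short sketch. It assumes Python's `re` module as a stand-in, which accepts "\a", "\f", "\n", "\r", "\t", "\v", octal "\0dd" and two-digit "\xXX" but not "\e", "\x{XX}" or "\cZ"; those three are therefore covered only by plain arithmetic. Form feed is taken as 0x0C, its ASCII code.

```python
import re

# Escape sequences that Python's re also accepts, with their ASCII codes.
codes = {
    r"\a": 0x07,
    r"\f": 0x0C,
    r"\n": 0x0A,
    r"\r": 0x0D,
    r"\t": 0x09,
    r"\v": 0x0B,
    r"\012": 0x0A,  # octal form of newline
    r"\x1B": 0x1B,  # hexadecimal form of the ASCII Escape character
}

# Each escape sequence should match exactly the character with that code.
all_agree = all(re.fullmatch(esc, chr(code)) for esc, code in codes.items())

# \cZ is the character whose code is the code of "Z" minus the code of "@".
ctrl_z = ord("Z") - ord("@")

print(all_agree)    # True
print(hex(ctrl_z))  # 0x1a
```

By the same arithmetic, "\c@" would be character code 0 and "\cA" character code 1.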
The regular expression code in the BeeGrid control is based on the Regex++ library. Here follows its copyright notice.
Copyright (c) 1998-9 Dr John Maddock
Permission to use, copy, modify, distribute and sell this software and its documentation for any purpose is hereby granted without fee, provided that the above copyright notice appear in all copies and that both that copyright notice and this permission notice appear in supporting documentation. Dr John Maddock makes no representations about the suitability of this software for any purpose. It is provided "as is" without express or implied warranty.