Regular Expression Pattern Syntax
 

Literals

A literal is a character that matches itself. All characters are literals except: ".", "|", "*", "?", "+", "(", ")", "{", "}", "[", "]", "^", "$" and "\". These characters become literals when preceded by a "\".
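As a rough illustration of escaping, the sketch below uses Python's `re` module as a stand-in engine. This is an assumption made for demonstration only: the BeeGrid control's own API is not shown in this document, and the helper name `matches_itself_when_escaped` is hypothetical.

```python
import re

# Special characters taken from the list above. Python's `re` is only a
# stand-in here; it is not the engine used by the control.
SPECIALS = list(".*?+(){}[]^$")

def matches_itself_when_escaped(ch: str) -> bool:
    """Return True if "\\" followed by ch, used as a pattern, matches exactly ch."""
    return re.fullmatch("\\" + ch, ch) is not None

print(all(matches_itself_when_escaped(c) for c in SPECIALS))  # True
print(re.fullmatch(r"a\.b", "a.b") is not None)  # True: the escaped dot is a literal "."
print(re.fullmatch(r"a\.b", "axb") is not None)  # False: "x" is not a "."
print(re.fullmatch(r"a.b", "axb") is not None)   # True: the unescaped dot is a wildcard
```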

Wildcard

The dot character "." matches any single character.

Repeats

A repeat is an expression that is matched an arbitrary number of times. An expression followed by "*" can be repeated any number of times, including zero. An expression followed by "+" can be repeated any number of times, but at least once. An expression followed by "?" may be repeated zero or one times only. When the minimum and maximum number of repeats must be specified explicitly, the bounds operator "{}" may be used: "a{2}" is the letter "a" repeated exactly twice, "a{2,4}" is the letter "a" repeated between 2 and 4 times, and "a{2,}" is the letter "a" repeated at least twice with no upper limit. There must be no white-space inside the "{}", and there is no upper limit on the values of the lower and upper bounds. All repeat expressions refer to the shortest possible previous sub-expression: a single character, a character set, or a sub-expression grouped with "()", for example.

Examples:

"ba*" will match all of "b", "ba", "baaa" etc.

"ba+" will match "ba" or "baaaa" for example but not "b".

"ba?" will match "b" or "ba".

"ba{2,4}" will match "baa", "baaa" and "baaaa".

Parentheses

Parentheses are used to group items together into a sub-expression. For example the expression "(ab)*" would match all of the string "ababab".
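The repeat operators and grouping described above can be checked with a short sketch. It assumes Python's `re` module as a stand-in engine (the control's own API is not shown in this document), and the helper `whole` and the table `CASES` are hypothetical names introduced for illustration.

```python
import re

def whole(pattern: str, text: str) -> bool:
    """Return True when `pattern` matches all of `text`."""
    return re.fullmatch(pattern, text) is not None

# (pattern, text, expected result) for the operators described above.
CASES = [
    ("ba*", "b", True), ("ba*", "baaa", True),        # "*": zero or more
    ("ba+", "ba", True), ("ba+", "b", False),         # "+": at least once
    ("ba?", "ba", True), ("ba?", "baa", False),       # "?": zero or one
    ("ba{2,4}", "ba", False), ("ba{2,4}", "baa", True),
    ("ba{2,4}", "baaaa", True), ("ba{2,4}", "baaaaa", False),
    ("ba{2,}", "baaaaaaa", True),                     # no upper limit
    # A repeat binds to the shortest previous sub-expression:
    ("ab*", "ababab", False),    # "*" applies to "b" only
    ("ab*", "abbb", True),
    ("(ab)*", "ababab", True),   # "*" applies to the whole group
]

for pattern, text, expected in CASES:
    print(pattern, repr(text), whole(pattern, text), "expected:", expected)
```

The last three cases show why parentheses matter: without them the "*" repeats only the "b".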

Alternatives

Alternatives occur when the expression can match either one sub-expression or another; the alternatives are separated by "|". Each alternative is the largest possible previous sub-expression; this is the opposite behavior from repetition operators.
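The scoping rule for "|" can be sketched as follows, again assuming Python's `re` module as a stand-in engine rather than the control's own API. The helper `whole` and the table `CASES` are hypothetical names.

```python
import re

def whole(pattern: str, text: str) -> bool:
    """Return True when `pattern` matches all of `text`."""
    return re.fullmatch(pattern, text) is not None

CASES = [
    # Inside parentheses, "|" only chooses between "b" and "c".
    ("a(b|c)", "ab", True), ("a(b|c)", "ac", True), ("a(b|c)", "abc", False),
    # Without parentheses, each alternative is as large as possible:
    # "abc|def" means "abc" or "def", not "ab", then "c" or "d", then "ef".
    ("abc|def", "abc", True), ("abc|def", "def", True),
    ("abc|def", "abdef", False),
    ("ab(c|d)ef", "abdef", True),
]

for pattern, text, expected in CASES:
    print(pattern, repr(text), whole(pattern, text), "expected:", expected)
```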

Examples:

"a(b|c)" could match "ab" or "ac".

"abc|def" could match "abc" or "def".

Sets

A set is a set of characters that can match any single character that is a member of the set. Sets are delimited by "[" and "]" and can contain literals, character ranges, character classes, collating elements and equivalence classes. Set declarations that start with "^" contain the complement of the elements that follow.

Examples:

Character literals:

"[abc]" will match either of "a", "b", or "c".

"[^abc]" will match any character other than "a", "b", or "c".

Character ranges:

"[a-z]" will match any character in the range "a" to "z"; depending on the locale, it may also match upper-case letters such as "A" and "B", as the note that follows explains.

Note that character ranges are highly locale dependent: they match any character that collates between the endpoints of the range. For example, [a-z] will match the ASCII characters a-z, and also 'A', 'B' etc.
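A minimal sketch of set matching follows, assuming Python's `re` module as a stand-in engine. One difference is worth stating loudly: Python compares range endpoints by code point rather than by locale collation, so its "[a-z]" does not match "A", whereas the control described here may behave differently, as the note above says. The helper `whole` is a hypothetical name.

```python
import re

def whole(pattern: str, text: str) -> bool:
    """Return True when `pattern` matches all of `text`."""
    return re.fullmatch(pattern, text) is not None

print(whole("[abc]", "b"))    # True: "b" is a member of the set
print(whole("[abc]", "d"))    # False
print(whole("[^abc]", "d"))   # True: "^" takes the complement
print(whole("[^abc]", "a"))   # False
print(whole("[a-z]", "q"))    # True
# Assumption: this result is specific to Python's code-point ranges and
# does not reflect the locale-dependent collation described above.
print(whole("[a-z]", "A"))    # False in Python
```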

Character classes are denoted using the syntax "[:classname:]" within a set declaration, for example "[[:space:]]" is the set of all whitespace characters. The available character classes are:

 

alnum     Any alphanumeric character.
alpha     Any alphabetical character a-z and A-Z. Other characters may also be included depending upon the locale.
blank     Any blank character, either a space or a tab.
cntrl     Any control character.
digit     Any digit 0-9.
graph     Any graphical character.
lower     Any lower case character a-z. Other characters may also be included depending upon the locale.
print     Any printable character.
punct     Any punctuation character.
space     Any whitespace character.
upper     Any upper case character A-Z. Other characters may also be included depending upon the locale.
xdigit    Any hexadecimal digit character, 0-9, a-f and A-F.
word      Any word character: all alphanumeric characters plus the underscore.
unicode   Any character whose code is greater than 255; this applies to the wide character traits classes only.

 

There are some shortcuts that can be used in place of the character classes:

\w in place of [:word:]

\s in place of [:space:]

\d in place of [:digit:]

\l in place of [:lower:]

\u in place of [:upper:]
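The first three shortcuts can be tried out with Python's `re` module, used here purely as a stand-in engine. This is an assumption for illustration: Python's `re` does not offer the "[:classname:]" syntax or the \l and \u shortcuts, so only \w, \s and \d are shown, and the `re.ASCII` flag is used to keep \w and \d to ASCII characters.

```python
import re

# \d stands for a digit, like [0-9] in ASCII mode.
print(re.findall(r"\d+", "order 66, item 7", flags=re.ASCII))     # ['66', '7']
print(re.findall(r"[0-9]+", "order 66, item 7"))                  # ['66', '7']

# \w stands for a word character: alphanumerics plus the underscore.
print(re.findall(r"\w+", "foo_bar baz-9", flags=re.ASCII))        # ['foo_bar', 'baz', '9']
print(re.findall(r"[A-Za-z0-9_]+", "foo_bar baz-9"))              # ['foo_bar', 'baz', '9']

# \s stands for a whitespace character (space, tab, newline, ...).
print(re.split(r"\s+", "a b\t\tc\nd"))                            # ['a', 'b', 'c', 'd']
```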

To include a literal "-" in a set declaration, make it the first character after the opening "[" or "[^", make it the endpoint of a range, or precede it with an escape character, as in "[\-]". To include a literal "[", "]" or "^" in a set, make it the endpoint of a range or precede it with an escape character.
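The placement and escaping rules for "-", "[", "]" and "^" can be sketched with Python's `re` module, assumed here as a stand-in engine that happens to accept the same forms. Only the "first character" and "escape" variants are shown; the helper `whole` and the table `CASES` are hypothetical names.

```python
import re

def whole(pattern: str, text: str) -> bool:
    """Return True when `pattern` matches all of `text`."""
    return re.fullmatch(pattern, text) is not None

CASES = [
    ("[-a]", "-", True),       # "-" first after "[" is a literal
    ("[^-a]", "-", False),     # "-" first after "[^" is a literal, then complemented
    ("[^-a]", "b", True),
    ("[a-c]", "b", True),      # unescaped "-" between characters forms a range
    ("[a-c]", "-", False),
    (r"[a\-c]", "-", True),    # escaped "-" is a literal, so no range is formed
    (r"[a\-c]", "b", False),
    (r"[\[\]\^]", "[", True),  # escaped "[", "]" and "^" are literals
    (r"[\[\]\^]", "]", True),
    (r"[\[\]\^]", "^", True),
]

for pattern, text, expected in CASES:
    print(pattern, repr(text), whole(pattern, text), "expected:", expected)
```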

Line anchors

An anchor is something that matches the null string at the start or end of a line: "^" matches the null string at the start of a line, "$" matches the null string at the end of a line.
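A short sketch of line anchors follows, assuming Python's `re` module as a stand-in engine. In Python the per-line behaviour described above has to be requested with the `re.MULTILINE` flag; whether the control needs an equivalent setting is not stated in this document.

```python
import re

text = "alpha one\nbeta two\ngamma three"

# "^" anchors the match to the start of each line.
print(re.findall(r"^\w+", text, flags=re.MULTILINE))   # ['alpha', 'beta', 'gamma']

# "$" anchors the match to the end of each line.
print(re.findall(r"\w+$", text, flags=re.MULTILINE))   # ['one', 'two', 'three']

# The anchors match the null string, so replacing "^" inserts text
# at the start of every line without consuming any characters.
quoted = re.sub(r"^", "> ", text, flags=re.MULTILINE)
print(quoted)
```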

Escape operator

The escape character "\" turns the special character that follows it into a literal; for example, "\*" represents a literal "*" rather than the repeat operator.

Single character escape sequences

The following escape sequences are aliases for single characters:

 

Escape sequence   Character code   Meaning
\a                0x07             Bell character.
\f                0x0C             Form feed.
\n                0x0A             Newline character.
\r                0x0D             Carriage return.
\t                0x09             Tab character.
\v                0x0B             Vertical tab.
\e                0x1B             ASCII Escape character.
\0dd              0dd              An octal character code, where dd is one or more octal digits.
\xXX              0xXX             A hexadecimal character code, where XX is one or more hexadecimal digits.
\x{XX}            0xXX             A hexadecimal character code, where XX is one or more hexadecimal digits, optionally a unicode character.
\cZ               Z-@              An ASCII escape sequence control-Z, where Z is any ASCII character greater than or equal to the character code for '@'. The resulting code is the code of Z minus the code of '@'.
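The sketch below checks some of these escape sequences against the standard ASCII codes, using Python's `re` module as a stand-in engine. This is an assumption for illustration: Python's `re` does not accept \e, \x{XX} or \cZ, so the control-character rule is shown as plain arithmetic instead. The names `ESCAPES` and `control_code` are hypothetical.

```python
import re

# Escape sequences that Python's `re` also understands, with their ASCII codes.
# 0x0C is the ASCII code for form feed.
ESCAPES = {
    r"\a": 0x07, r"\f": 0x0C, r"\n": 0x0A,
    r"\r": 0x0D, r"\t": 0x09, r"\v": 0x0B,
}

for seq, code in ESCAPES.items():
    # Each sequence, used as a pattern, should match the single character with that code.
    print(seq, hex(code), re.fullmatch(seq, chr(code)) is not None)

print(re.fullmatch(r"\x41", "A") is not None)    # True: hexadecimal code 0x41 is "A"
print(re.fullmatch(r"\011", "\t") is not None)   # True: octal code 011 is the tab character

def control_code(ch: str) -> int:
    """Code of control-ch: the code of ch minus the code of '@'."""
    return ord(ch) - ord("@")

print(hex(control_code("Z")))   # 0x1a
print(hex(control_code("[")))   # 0x1b, the ASCII Escape character
```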

 

 


The regular expression code in the BeeGrid control is based on the Regex++ library. Here follows its copyright notice.

Copyright (c) 1998-9 Dr John Maddock

Permission to use, copy, modify, distribute and sell this software and its documentation for any purpose is hereby granted without fee, provided that the above copyright notice appear in all copies and that both that copyright notice and this permission notice appear in supporting documentation. Dr John Maddock makes no representations about the suitability of this software for any purpose. It is provided "as is" without express or implied warranty.