Syntax for Regular Expressions: Previous Syntax


The RX methods of JS and JSComponent (for example, buttonRX() and dialogRX()) use regular expressions. In this regular-expression syntax, a few characters are special constructs and the rest are "ordinary." This section describes the syntax and shows how to use it to match strings against complex criteria.

Syntax for regular expressions

Term  Definition
Ordinary characters  Any character that is not special. An ordinary character is a simple regular expression that matches that character and nothing else. For example, the letter f matches only itself and does not match any other character or string.
Special characters  {, }, ~, *, +, ?, the comma (,), -, [, ], and \. The backslash (\) is defined only as an escape character: it allows a special character to be used as a literal.
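Because the characters listed above are special, a label that contains them must escape each one with '\' before it can be matched literally. The following Java sketch shows one way to do that. The class EscapeLiteral and its escape method are hypothetical helpers written for this illustration; they are not part of JavaStar.

```java
public class EscapeLiteral {
    // The special characters of this syntax, including the escape character '\' itself.
    static final String SPECIAL = "{}~*+?,-[]\\";

    /** Hypothetical helper: prefixes every special character with '\' so it matches literally. */
    public static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\'); // escape character
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("Save As...")); // prints Save As...
        System.out.println(escape("C++ {beta}")); // prints C\+\+ \{beta\}
    }
}
```

Note that the period is not special in this syntax, so "Save As..." comes back unchanged.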

Concatenating Regular Expressions

When you concatenate two regular expressions, for example A and B, the result is a regular expression that matches a string if A matches some leading portion of that string and B matches the rest of it.

For example, concatenating the regular expressions a and t gives the regular expression at, which matches only the string at.
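The definition of concatenation can be made concrete by treating a regular expression as a test that accepts or rejects a whole string. In this Java sketch, the names Concat, lit, and concat are hypothetical, and this is not a description of how JavaStar implements matching. Concatenating A and B tries every split point of the string: the part before the split must match A and the part after it must match B.

```java
import java.util.function.Predicate;

public class Concat {
    /** A pattern that matches exactly the given literal string and nothing else. */
    static Predicate<String> lit(String text) {
        return s -> s.equals(text);
    }

    /** Concatenation: s matches AB if some prefix of s matches A and the remainder matches B. */
    static Predicate<String> concat(Predicate<String> a, Predicate<String> b) {
        return s -> {
            for (int i = 0; i <= s.length(); i++) {
                if (a.test(s.substring(0, i)) && b.test(s.substring(i))) {
                    return true;
                }
            }
            return false;
        };
    }

    public static void main(String[] args) {
        Predicate<String> at = concat(lit("a"), lit("t"));
        System.out.println(at.test("at"));  // prints true
        System.out.println(at.test("cat")); // prints false
        System.out.println(at.test("a"));   // prints false
    }
}
```

Trying every split point is simple but slow for long strings; the sketch is meant only to illustrate the definition.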

To do something more complex, you need to use special characters, as this table shows:

Special character use for string concatenation

Character  Use
[ ... ]  '[' begins a "character set", which is terminated by ']'. In the simplest case, the characters between the two brackets form the set. Thus, '[ad]' matches either one 'a' or one 'd', and '[ad]*' matches any string composed only of 'a's and 'd's (including the empty string). The expression 'c[ad]*r' matches 'cr', 'car', 'cdr', 'caddaar', and so on. You can include a range of characters in a set by writing two characters with a '-' between them. Thus, '[a-z]' matches any lowercase letter. Ranges may be mixed freely with individual characters, as in '[a-z$%.]', which matches any lowercase letter or '$', '%', or a period.
~[ ... ]  '~[' begins a "complement character set", which matches any single character except the ones specified. Thus, '~[a-z0-9A-Z]' matches any character except a letter or a digit.
{ ... }  A grouping construct that serves two purposes:
  • To enclose a set of alternatives separated by ','. Thus, '{foo,bar}x' matches either 'foox' or 'barx'.
  • To enclose a complicated expression for the postfix '*', '?', or '+' to operate on. Thus, 'ba{na}*' matches 'ba', 'bana', 'banana', 'bananana', and so on, with any number (zero or more) of 'na' strings.
*  '*' means any number (zero or more) of something. It may be used in three ways:
  • Standalone, to mean any sequence of characters, including none. For example, 'ab*' matches any string that starts with 'ab', including 'ab' itself.
  • Immediately following the ']' of a character set (regular or complement). Thus, 't~[e]*' matches any string that begins with 't' and contains no 'e' (such as 't', 'this', or 'that').
  • Immediately following the '}' of a grouping. For example, '{aa,cc}*' matches '', 'aa', 'cc', 'aacc', 'ccaa', 'aaccaa', and so on.
+  '+' is like '*', except that it requires one or more of something. It has the same three uses:
  • Standalone, to mean any sequence of at least one character. For example, '+ab+' matches any string that contains 'ab' with at least one character before it and at least one after it: 'cabs' or 'Alabama', but not 'cab' or 'abe'.
  • Immediately following the ']' of a character set (regular or complement). Thus, '[a-z]+' matches any nonempty string made up only of lowercase letters.
  • Immediately following the '}' of a grouping. For example, '{aa,cc}+' matches 'aa', 'cc', 'aacc', 'ccaa', 'aaccaa', and so on, but not the empty string.
?  '?' has the same three uses as '*' and '+', but standalone it matches exactly one character, while after a character set or grouping it matches zero or one occurrence:
  • Standalone, to mean any single character. For example, 'x?y' matches 'xay' or 'xvy' but not 'xy' or 'xaay'.
  • Immediately following the ']' of a character set (regular or complement). Thus, '[a-z]?' matches either the empty string or any single lowercase letter.
  • Immediately following the '}' of a grouping. For example, '{aa,cc}?' matches only 'aa', 'cc', or the empty string.
-  '-' is allowed unescaped only inside a character set.
,  ',' is allowed unescaped only inside a grouping.
~  '~' means "not". It has two uses:
  • Preceding '[', as the start of a complement character set.
  • Preceding any other character, to match any single character other than that one. For example, 'b~at' matches any three-character string that starts with 'b' and ends with 't', except 'bat'. This includes 'b t', 'bit', 'bot', and 'b%t'.
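Readers who know the standard java.util.regex package may find it useful to see the table's examples next to equivalent Java patterns. The correspondences in this sketch are hand translations made for illustration, not JavaStar behavior: standalone '*', '+', and '?' become '.*', '.+', and '.'; a complement set '~[...]' becomes '[^...]'; '~c' becomes '[^c]'; and a grouping '{a,b}' becomes the non-capturing group '(?:a|b)'. The sketch assumes whole-string matching (Matcher.matches()), because the table's examples describe whole strings. The class name TableExamples and its helper m are hypothetical.

```java
import java.util.regex.Pattern;

public class TableExamples {
    // Whole-string match; DOTALL lets '.' match any character, like standalone '*', '+', '?'.
    static boolean m(String javaRegex, String s) {
        return Pattern.compile(javaRegex, Pattern.DOTALL).matcher(s).matches();
    }

    public static void main(String[] args) {
        // c[ad]*r       -> c[ad]*r
        System.out.println(m("c[ad]*r", "caddaar"));    // prints true
        // ~[a-z0-9A-Z]  -> [^a-z0-9A-Z]
        System.out.println(m("[^a-z0-9A-Z]", "%"));     // prints true
        // {foo,bar}x    -> (?:foo|bar)x
        System.out.println(m("(?:foo|bar)x", "barx"));  // prints true
        // ba{na}*       -> ba(?:na)*
        System.out.println(m("ba(?:na)*", "bananana")); // prints true
        // t~[e]*        -> t[^e]*
        System.out.println(m("t[^e]*", "the"));         // prints false
        // +ab+          -> .+ab.+
        System.out.println(m(".+ab.+", "Alabama"));     // prints true
        System.out.println(m(".+ab.+", "cab"));         // prints false
        // x?y           -> x.y
        System.out.println(m("x.y", "xy"));             // prints false
        // {aa,cc}?      -> (?:aa|cc)?
        System.out.println(m("(?:aa|cc)?", ""));        // prints true
        // b~at          -> b[^a]t
        System.out.println(m("b[^a]t", "b%t"));         // prints true
    }
}
```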
These regular expressions use '\' as an escape character. Java also uses '\' as an escape character in string literals, which can be confusing: to put a single '\' into a Java String, you write '\\' in the literal. For example, in the Java code:

rx1 actually contains '[\\\+\-]', which matches a '\', '+', or '-'.

rx2 actually contains '[', the whitespace characters themselves (which are not visible here), and ']+', and it matches any nonempty run of whitespace.
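Because the Java code that declares rx1 and rx2 does not appear here, this sketch uses its own hypothetical constant, RX1, written so that at run time it holds the same characters as rx1, '[\\\+\-]'. Each backslash that the regular expression needs is doubled in the Java literal. As it happens, java.util.regex reads this particular character set the same way (a literal '\', '+', or '-'), so the sketch uses Pattern.matches to check the matches; that is a convenience for the illustration, not a claim about JavaStar's matcher.

```java
import java.util.regex.Pattern;

public class JavaEscapes {
    // Six backslashes in the source become three in the string: [ \ \ \ + \ - ]
    static final String RX1 = "[\\\\\\+\\-]";

    public static void main(String[] args) {
        System.out.println(RX1);          // prints [\\\+\-]
        System.out.println(RX1.length()); // prints 8
        // Check which one-character strings the character set accepts.
        for (String s : new String[] {"\\", "+", "-", "a"}) {
            System.out.println(s + " " + Pattern.matches(RX1, s));
        }
    }
}
```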

More Examples

'[\+\-/\*]' matches any of the four standard arithmetic operators '+', '-', '/', and '*'. Note that the three special characters ('+', '-', and '*') must be preceded by the escape character in this expression.

'[A-Za-z][a-z]*' is a minimal test for a well-formed English word (other than an acronym): it requires one letter of either case followed by zero or more lowercase letters.

'{OK,Done,Quit,Cancel,Continue}' matches any of five frequently used button labels.

'[A-Za-z_$][A-Za-z_$0-9]*' matches any legal Java identifier written in ASCII (Java also allows Unicode characters in identifiers).
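The rules in this section can be combined into a translator from this syntax to java.util.regex, which makes it easy to try the examples above. The class PreviousSyntax and its methods toJavaRegex and matches are hypothetical, written for this illustration under the assumption that a pattern must match the whole string. This is not how JavaStar is documented to work, and error handling is minimal: for example, a pattern that ends right after '\' or '~' throws StringIndexOutOfBoundsException.

```java
import java.util.regex.Pattern;

/** Hypothetical translator from this regular-expression syntax to java.util.regex. */
public class PreviousSyntax {

    // Literal character outside a set: escape everything except letters and digits.
    private static String lit(char c) {
        return Character.isLetterOrDigit(c) ? String.valueOf(c) : "\\" + c;
    }

    // Literal character inside a set: escape the characters java.util.regex treats specially there.
    private static String litInSet(char c) {
        return "[]^&-\\".indexOf(c) >= 0 ? "\\" + c : String.valueOf(c);
    }

    public static String toJavaRegex(String rx) {
        StringBuilder out = new StringBuilder();
        int depth = 0;               // current '{' nesting level
        boolean afterClose = false;  // true right after a ']' or '}'
        for (int i = 0; i < rx.length(); i++) {
            char c = rx.charAt(i);
            boolean closed = false;
            switch (c) {
                case '\\':           // escaped character: always literal
                    out.append(lit(rx.charAt(++i)));
                    break;
                case '~':
                    if (rx.charAt(i + 1) == '[') {   // complement character set
                        i = copySet(rx, i + 2, out.append("[^"));
                        closed = true;
                    } else {                          // any one character except the next
                        char n = rx.charAt(++i);
                        if (n == '\\') n = rx.charAt(++i);
                        out.append("[^").append(litInSet(n)).append(']');
                    }
                    break;
                case '[':
                    i = copySet(rx, i + 1, out.append('['));
                    closed = true;
                    break;
                case '{':
                    depth++;
                    out.append("(?:");
                    break;
                case ',':
                    if (depth == 0) throw new IllegalArgumentException("',' outside a grouping");
                    out.append('|');
                    break;
                case '}':
                    if (depth-- == 0) throw new IllegalArgumentException("unmatched '}'");
                    out.append(')');
                    closed = true;
                    break;
                case '*': case '+': case '?':
                    if (afterClose) out.append(c);               // postfix on a set or grouping
                    else out.append(c == '?' ? "." : "." + c);  // standalone: any characters
                    break;
                case '-': case ']':
                    throw new IllegalArgumentException("unescaped '" + c + "'");
                default:
                    out.append(lit(c));
            }
            afterClose = closed;
        }
        if (depth != 0) throw new IllegalArgumentException("unclosed '{'");
        return out.toString();
    }

    // Copies set members from index i up to the closing ']'; returns the index of that ']'.
    private static int copySet(String rx, int i, StringBuilder out) {
        for (; rx.charAt(i) != ']'; i++) {
            char c = rx.charAt(i);
            if (c == '\\') out.append(litInSet(rx.charAt(++i)));
            else if (c == '-') out.append('-');  // range separator, as in a-z
            else out.append(litInSet(c));
        }
        out.append(']');
        return i;
    }

    /** Whole-string match; DOTALL lets standalone '*', '+', '?' match any character. */
    public static boolean matches(String rx, String s) {
        return Pattern.compile(toJavaRegex(rx), Pattern.DOTALL).matcher(s).matches();
    }
}
```

For example, toJavaRegex("{OK,Done}") returns "(?:OK|Done)", and matches("[A-Za-z][a-z]*", "USA") returns false because only the first letter may be uppercase.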




Send feedback to JavaStar-feedback@suntest.com
Copyright © 1998 Sun Microsystems, Inc. 901 San Antonio Road, Palo Alto, CA 94303. All rights reserved.