Syntax for Regular Expressions: Previous Syntax
The RX methods of JS and JSComponent (for example, buttonRX() and dialogRX()) use regular expressions. In this syntax, a few characters are special constructs and the rest are "ordinary." This section describes the syntax and shows how to build expressions that match complex criteria.
Syntax for regular expressions
Term | Definition
Ordinary characters | Each ordinary character is a simple regular expression that matches that character and nothing else. For example, a simple alphabetical character such as f matches only itself and does not match any other character or string.
Special characters | { } ~ * + ? , - [ ] and \. The \ character is defined only as an escape character, to allow the use of special characters as literals.
Concatenating Regular Expressions
When you concatenate two regular expressions--for example, A and B--the result is a regular expression that matches a string if A matches some amount of the beginning of that string and B matches the rest of the string.
In a simple example, you can concatenate the regular expressions a and t to get the regular expression at. This expression matches only the string at.
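The concatenation rule can be tried out in Java. This is a minimal sketch that uses java.util.regex as a stand-in, on the assumption that ordinary characters behave the same way there as in the syntax described here; it is not the engine behind the RX methods, and ConcatDemo and matchesConcat are hypothetical names.

```java
import java.util.regex.Pattern;

final class ConcatDemo {
    // Concatenate two sub-expressions and test the result against the WHOLE input,
    // mirroring "A matches the beginning and B matches the rest".
    static boolean matchesConcat(String a, String b, String input) {
        return Pattern.matches(a + b, input);
    }

    public static void main(String[] args) {
        System.out.println(matchesConcat("a", "t", "at"));  // prints true
        System.out.println(matchesConcat("a", "t", "a"));   // prints false
        System.out.println(matchesConcat("a", "t", "cat")); // prints false
    }
}
```

Only "at" is accepted because the match must cover the entire string, not just part of it.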
To do something more complex, you need to use special characters, as this table shows:
Special character use for string concatenation
Character | Use
[ ... ] | '[' begins a "character set", which is terminated by a ']'. In the simplest case, the characters between the two form the set. Thus, '[ad]' matches either one 'a' or one 'd', and '[ad]*' matches any string composed of just 'a's and 'd's (including the empty string). The expression 'c[ad]*r' matches 'cr', 'car', 'cdr', 'caddaar', etc. You can include character ranges in a character set by writing two characters with a '-' between them. Thus, '[a-z]' matches any lowercase letter. Ranges may be intermixed freely with individual characters, as in '[a-z$%.]', which matches any lowercase letter or '$', '%', or period.
~[ ... ] | '~[' begins a "complement character set", which matches any character except the ones specified. Thus, '~[a-z0-9A-Z]' matches any character except letters and digits.
{ ... } | A grouping construct that serves two purposes: (1) To enclose a set of ',' alternatives. Thus, '{foo,bar}x' matches either 'foox' or 'barx'. (2) To enclose a complicated expression for the postfix '*', '?', or '+' to operate on. Thus, 'ba{na}*' matches 'ba', 'bana', 'bananana', etc., with any number (zero or more) of 'na' strings.
* | '*' means any number (zero or more) of something. It may be used in three ways: (1) Standalone, to mean any number of characters. For example, 'ab*' matches any string starting with 'ab', including 'ab' itself. (2) Immediately following the ']' of a character set (regular or complemented). Thus, 't~[e]*' matches any string beginning with 't' and including no 'e's (such as 't', 'this', or 'that'). (3) Immediately following the '}' of a grouping. For example, '{aa,cc}*' matches '', 'aa', 'cc', 'aacc', 'ccaa', 'aaccaa', etc.
+ | '+' is just like '*', except that it requires one or more of something. It has the same three basic uses: (1) Standalone, to mean one or more characters (anything except the empty string). For example, '+ab+' matches any string containing 'ab' with at least one character before it and at least one character after it: 'cabs' or 'Alabama', but not 'cab' or 'abe'. (2) Immediately following the ']' of a character set (regular or complemented). Thus, '[a-z]+' matches any non-empty string containing only lowercase letters. (3) Immediately following the '}' of a grouping. For example, '{aa,cc}+' matches 'aa', 'cc', 'aacc', 'ccaa', 'aaccaa', etc., but not the empty string.
? | '?' is like '+' and '*', except that it matches at most one of something: exactly one character when used standalone, and zero or one after a character set or grouping. It has the same three basic uses: (1) Standalone, to mean any single character. For example, 'x?y' matches 'xay' or 'xvy' but not 'xy' or 'xaay'. (2) Immediately following the ']' of a character set (regular or complemented). Thus, '[a-z]?' matches either the empty string or any single lowercase letter. (3) Immediately following the '}' of a grouping. For example, '{aa,cc}?' matches only 'aa', 'cc', or the empty string.
- | '-' is only allowed unescaped inside a character set.
, | ',' is only allowed unescaped inside a grouping.
~ | '~' is used to mean "not". It has two uses: (1) Preceding '[' as the start of a complement character set. (2) Preceding any other character, to match any single character other than that one. For example, 'b~at' matches any three-character string starting with 'b' and ending in 't' except 'bat'. This includes 'b t', 'bit', 'bot', and 'b%t'.
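The rules in the table above can be sketched as a small translator from this syntax to java.util.regex. This is an illustration only, not how the RX methods are implemented: LegacyRx, toPattern, and matches are hypothetical names, whole-string matching and "any character includes line breaks" are assumptions, and cases the table does not define (such as '~' directly before a special character) are handled arbitrarily.

```java
import java.util.regex.Pattern;

final class LegacyRx {
    // Emit one literal character. In java.util.regex a backslash before a
    // non-alphanumeric character always denotes that literal character.
    private static String lit(char c) {
        return Character.isLetterOrDigit(c) ? String.valueOf(c) : "\\" + c;
    }

    // Read the character at index i, failing clearly if the expression ends early.
    private static char at(String rx, int i) {
        if (i >= rx.length()) throw new IllegalArgumentException("unexpected end: " + rx);
        return rx.charAt(i);
    }

    // Copy a character set whose '[' is already consumed; returns the index after ']'.
    private static int copySet(String rx, int i, StringBuilder out, boolean negate) {
        out.append(negate ? "[^" : "[");
        while (true) {
            char c = at(rx, i++);
            if (c == ']') { out.append(']'); return i; }
            if (c == '-') { out.append('-'); continue; }  // unescaped '-' is a range
            if (c == '\\') c = at(rx, i++);                // escaped literal
            out.append(lit(c));
        }
    }

    static Pattern toPattern(String rx) {
        StringBuilder out = new StringBuilder();
        int depth = 0;               // nesting depth of { ... } groupings
        boolean afterClose = false;  // true only right after ']' or '}'
        int i = 0;
        while (i < rx.length()) {
            char c = rx.charAt(i++);
            boolean closes = false;
            switch (c) {
                case '\\':
                    out.append(lit(at(rx, i++)));
                    break;
                case '[':
                    i = copySet(rx, i, out, false);
                    closes = true;
                    break;
                case '~':
                    if (at(rx, i) == '[') {            // complement character set
                        i = copySet(rx, i + 1, out, true);
                        closes = true;
                    } else {                           // "not this one character"
                        char n = at(rx, i++);
                        if (n == '\\') n = at(rx, i++);
                        out.append("[^").append(lit(n)).append(']');
                    }
                    break;
                case '{':
                    out.append("(?:");
                    depth++;
                    break;
                case '}':
                    if (depth == 0) throw new IllegalArgumentException("unmatched '}': " + rx);
                    depth--;
                    out.append(')');
                    closes = true;
                    break;
                case ',':
                    if (depth == 0) throw new IllegalArgumentException("',' outside a grouping: " + rx);
                    out.append('|');
                    break;
                // Postfix after ']' or '}', otherwise standalone "any character(s)".
                case '*': out.append(afterClose ? "*" : ".*"); break;
                case '+': out.append(afterClose ? "+" : ".+"); break;
                case '?': out.append(afterClose ? "?" : "."); break;
                case '-':
                case ']':
                    throw new IllegalArgumentException("unescaped '" + c + "': " + rx);
                default:
                    out.append(lit(c));
            }
            afterClose = closes;
        }
        if (depth != 0) throw new IllegalArgumentException("unclosed '{': " + rx);
        return Pattern.compile(out.toString(), Pattern.DOTALL);
    }

    static boolean matches(String rx, String input) {
        return toPattern(rx).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("c[ad]*r", "caddaar")); // prints true
        System.out.println(matches("+ab+", "cab"));        // prints false
        System.out.println(matches("b~at", "bit"));        // prints true
    }
}
```

The single afterClose flag carries the table's distinction between a postfix operator (directly after ']' or '}') and a standalone one, which is why 'ab*' becomes "ab" followed by any characters rather than "a" followed by repeated 'b's.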
These regular expressions use '\' as an escape. Java also uses '\' as an escape inside string literals, which can be confusing: to put a single '\' into a Java String, you write '\\' in the literal. For example, in the Java code:
String rx1 = "[\\\\\\+\\-]";
String rx2 = "[\t\r\n ]+";
rx1 actually contains '[\\\+\-]', which matches a '\', '+', or '-'.
rx2 actually contains '[', a tab, a carriage return, a newline, a space, and ']+' (the whitespace characters are invisible when printed), and matches any non-empty run of whitespace.
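To see what the two literals really contain, you can print them. The sketch below relies only on standard Java string-literal rules; EscapeDemo and visible are hypothetical names, and the markers such as <TAB> are just a display convention chosen here.

```java
final class EscapeDemo {
    static final String RX1 = "[\\\\\\+\\-]";
    static final String RX2 = "[\t\r\n ]+";

    // Replace invisible whitespace with visible markers so the contents can be inspected.
    static String visible(String s) {
        return s.replace("\t", "<TAB>")
                .replace("\r", "<CR>")
                .replace("\n", "<LF>")
                .replace(" ", "<SP>");
    }

    public static void main(String[] args) {
        System.out.println(RX1);           // prints [\\\+\-]
        System.out.println(RX1.length());  // prints 8
        System.out.println(visible(RX2));  // prints [<TAB><CR><LF><SP>]+
        System.out.println(RX2.length());  // prints 7
    }
}
```

Each pair of backslashes in the source literal becomes one backslash in the string, so the twelve characters between the quotes of rx1 shrink to eight.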
More Examples
'[\+\-/\*]' matches any of the four standard operators '+', '*', '/', '-'. Note that the three special characters had to be prefaced with the escape character for this expression.
'[A-Za-z][a-z]*' is a minimal pattern for a well-formed English word (other than acronyms): it requires a letter followed by zero or more lowercase letters.
'{OK,Done,Quit,Cancel,Continue}' matches any of five frequently used button labels.
'[A-Za-z_$][A-Za-z_$0-9]*' matches any legal Java identifier (in plain ASCII; Java also allows Unicode characters).
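For comparison, here are the same four examples translated by hand into java.util.regex. The translations are assumptions made for illustration (the grouping becomes an alternation with '|', and '+', '*' and '$' need no escape inside a java.util.regex character class); MoreExamplesDemo and whole are hypothetical names.

```java
import java.util.regex.Pattern;

final class MoreExamplesDemo {
    static final Pattern OPERATOR = Pattern.compile("[+\\-/*]");
    static final Pattern WORD     = Pattern.compile("[A-Za-z][a-z]*");
    static final Pattern BUTTON   = Pattern.compile("OK|Done|Quit|Cancel|Continue");
    static final Pattern IDENT    = Pattern.compile("[A-Za-z_$][A-Za-z_$0-9]*");

    // Whole-string match, as in the examples above.
    static boolean whole(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(whole(OPERATOR, "/"));    // prints true
        System.out.println(whole(WORD, "Hello"));    // prints true
        System.out.println(whole(WORD, "NASA"));     // prints false
        System.out.println(whole(BUTTON, "Cancel")); // prints true
        System.out.println(whole(IDENT, "1abc"));    // prints false
    }
}
```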
Copyright © 1998 Sun Microsystems, Inc. 901 San Antonio Road, Palo Alto, CA 94303. All rights reserved.