MORegularExpression


Abstract

Represents a regular expression which can be matched against candidate strings.

Discussion

MORegularExpression objects are initialized from a pattern string written in a syntax similar to Unix-style regular expressions (such as that used by egrep) and can be used to match other strings against the pattern. In addition to the pattern string, you can specify whether the expression should be case insensitive. MORegularExpression objects are immutable. If you need to match another pattern, create another MORegularExpression.

The implementation is almost entirely provided by Henry Spencer's Unicode-based regular expression package, which the MOKit framework uses in a (slightly) modified form and which was originally taken from TCL (8.3.2). The unmodified code can be found in the HSRegexp group/folder in the Readmes and Notes group of the MOKit_2 project. (Using FileMerge to compare the original HSRegexp folder with the modified MORegexp folder will show the exact changes made.)

MORegularExpression uses the Advanced Regular Expression (ARE) syntax, which is a further extension of the POSIX Extended Regular Expression (ERE) syntax (roughly what egrep uses). Details on the syntax can be found in the document RESyntax.rtf included with the MOKit framework (Safari and OmniWeb will show this RTF document directly in the browser; other browsers may need to use a helper application).

In addition to simply matching candidate strings, MORegularExpressions can take advantage of the subexpressions defined within the regular expression and can return the matching ranges or substrings for any subexpression from a matching candidate string.
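As a minimal sketch of the basic workflow described above, the following creates expressions with the convenience factories and tests candidate strings. The helper function names are hypothetical, and the `<MOKit/MOKit.h>` umbrella header is an assumption about how the framework is imported. The patterns are anchored with `^` and `$` so the result does not depend on whether matching is applied to the whole candidate or as a search within it.

```objc
#import <Foundation/Foundation.h>
#import <MOKit/MOKit.h>  // ASSUMPTION: umbrella header of the MOKit framework

// Hypothetical helper: YES if candidate consists only of ASCII digits.
static BOOL ExampleIsAllDigits(NSString *candidate) {
    MORegularExpression *digits =
        [MORegularExpression regularExpressionWithString:@"^[0-9]+$"];
    if (digits == nil) {
        return NO;  // the pattern string did not compile
    }
    return [digits matchesString:candidate];
}

// Hypothetical helper: case-insensitive test against the word "hello".
static BOOL ExampleIsHello(NSString *candidate) {
    MORegularExpression *hello =
        [MORegularExpression regularExpressionWithString:@"^hello$"
                                              ignoreCase:YES];
    return hello != nil && [hello matchesString:candidate];
}
```

Because the objects are immutable, an expression created once can be kept and reused for many candidates rather than rebuilt on each call.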



Methods

validExpressionString:
Syntax checks a regular expression string.
regularExpressionWithString:ignoreCase:
Convenience factory for creating a new regular expression instance.
regularExpressionWithString:
Convenience factory for creating a new regular expression instance.
initWithExpressionString:ignoreCase:
Init method. Designated Initializer.
initWithExpressionString:
Init method.
expressionString
Returns the regular expression string.
ignoreCase
Returns whether the receiver is case insensitive.
matchesCharacters:inRange:
Check whether a specific range in a candidate character buffer matches the regular expression.
matchesString:inRange:
Check whether a specific range in a candidate string matches the regular expression.
matchesString:
Check whether a candidate string matches the regular expression.
rangeForSubexpressionAtIndex:inCharacters:range:
Retrieve a subexpression match range.
rangeForSubexpressionAtIndex:inString:range:
Retrieve a subexpression match range.
rangeForSubexpressionAtIndex:inString:
Retrieve a subexpression match range.
substringForSubexpressionAtIndex:inString:
Retrieve a subexpression match substring.
rangesForSubexpressionsInCharacters:range:
Retrieve all subexpression match ranges.
subexpressionsForString:
Retrieve subexpression matches.

expressionString

Returns the regular expression string.
- ( NSString *) expressionString;

Returns the regular expression string that was used to initialize the receiver.

method result
The regular expression string.

ignoreCase

Returns whether the receiver is case insensitive.
- ( BOOL ) ignoreCase;

Returns whether the receiver is case insensitive.

method result
YES if the receiver is case insensitive, NO if not.

initWithExpressionString:

Init method.
- ( id ) initWithExpressionString:
        (NSString *) expressionString;

Given a regular expression string, this method initializes the receiver. It simply calls the Designated Initializer with ignoreCase:NO, so the new expression will be case sensitive.

Parameter Descriptions
expressionString
The regular expression string.
method result
The initialized MORegularExpression, or nil if expressionString is not a valid regular expression string.

initWithExpressionString:ignoreCase:

Init method. Designated Initializer.
- ( id ) initWithExpressionString:
        (NSString *) expressionString ignoreCase:
        (BOOL ) ignoreCaseFlag;

This is the Designated Initializer for the MORegularExpression class. Given a regular expression string and a flag indicating whether the expression should be case insensitive, this method initializes the receiver.

Parameter Descriptions
expressionString
The regular expression string.
ignoreCaseFlag
Whether the expression object should ignore case differences when matching candidate strings.
method result
The initialized MORegularExpression, or nil if expressionString is not a valid regular expression string.

matchesCharacters:inRange:

Check whether a specific range in a candidate character buffer matches the regular expression.
- ( BOOL ) matchesCharacters:
        (const unichar *) candidateChars inRange:
        (NSRange ) searchRange;

Given a candidate character buffer and a range to match in, this method will return whether or not it matches the regular expression. This is the primitive matching method. All others call through to this one eventually.

Parameter Descriptions
candidateChars
The unichar buffer to test against the regular expression.
searchRange
The range of the buffer to use for matching.
method result
YES if the searchRange of the candidateChars matches the expression, NO if not.
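A minimal sketch of calling the primitive matching method directly, assuming the caller owns a unichar buffer. Here the buffer is copied out of an NSString purely for illustration; the helper name is hypothetical and the `<MOKit/MOKit.h>` import is an assumption.

```objc
#import <Foundation/Foundation.h>
#import <MOKit/MOKit.h>  // ASSUMPTION: umbrella header of the MOKit framework
#include <stdlib.h>

// Hypothetical helper: copies text into a unichar buffer and matches the
// entire buffer against the expression.
static BOOL ExampleBufferMatches(MORegularExpression *expression,
                                 NSString *text) {
    if (expression == nil || text == nil) {
        return NO;
    }
    NSUInteger length = [text length];
    // Allocate at least one element so an empty string still gets a buffer.
    unichar *buffer = malloc(sizeof(unichar) * (length > 0 ? length : 1));
    if (buffer == NULL) {
        return NO;
    }
    [text getCharacters:buffer range:NSMakeRange(0, length)];
    BOOL matched = [expression matchesCharacters:buffer
                                         inRange:NSMakeRange(0, length)];
    free(buffer);
    return matched;
}
```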

matchesString:

Check whether a candidate string matches the regular expression.
- ( BOOL ) matchesString:
        (NSString *) candidate;

Given a candidate string, this method will return whether or not it matches the regular expression. This method calls -matchesString:inRange: with a range encompassing the whole string.

Parameter Descriptions
candidate
The string to test against the regular expression.
method result
YES if the string matches the expression, NO if not.

matchesString:inRange:

Check whether a specific range in a candidate string matches the regular expression.
- ( BOOL ) matchesString:
        (NSString *) candidate inRange:
        (NSRange ) searchRange;

Given a candidate string and a range to match in, this method will return whether or not it matches the regular expression. This extracts a unichar buffer and calls -matchesCharacters:inRange:.

Parameter Descriptions
candidate
The string to test against the regular expression.
searchRange
The range of the string to use for matching.
method result
YES if the searchRange of the string matches the expression, NO if not.

rangeForSubexpressionAtIndex:inCharacters:range:

Retrieve a subexpression match range.
- ( NSRange ) rangeForSubexpressionAtIndex:
        (unsigned ) index inCharacters:
        (const unichar *) candidateChars range:
        (NSRange ) searchRange;

Given a candidate character buffer, a range to match in, and the index of a subexpression, this method will return the range from the candidate characters that matched the given subexpression index (if the string matches at all).

Parameter Descriptions
index
The index of the subexpression range to return.
candidateChars
The unichar buffer to test against the regular expression.
searchRange
The range of the buffer to use for matching.
method result
If the candidate characters match, the range of the subexpression match. If the candidate does not match, the range (NSNotFound, 0).

rangeForSubexpressionAtIndex:inString:

Retrieve a subexpression match range.
- ( NSRange ) rangeForSubexpressionAtIndex:
        (unsigned ) index inString:
        (NSString *) candidate;

Given a candidate string and the index of a subexpression, this method will return the range from the candidate string that matched the given subexpression index (if the string matches at all).

Parameter Descriptions
index
The index of the subexpression range to return.
candidate
The string to test against the regular expression.
method result
If the candidate string matches, the range of the subexpression match. If the candidate does not match, the range (NSNotFound, 0).
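The sketch below shows one way to use the returned range, checking for the (NSNotFound, 0) result before extracting a substring. The helper name is hypothetical and the `<MOKit/MOKit.h>` import is an assumption. Which parenthesized group a given index refers to is also an assumption: the example supposes index 0 denotes the whole match and index 1 the first group, which this reference does not state.

```objc
#import <Foundation/Foundation.h>
#import <MOKit/MOKit.h>  // ASSUMPTION: umbrella header of the MOKit framework

// Hypothetical helper: for a "key=value" string, returns the text matched by
// the subexpression at index 1, or nil if the candidate does not match.
static NSString *ExampleSubexpressionOne(NSString *candidate) {
    MORegularExpression *expression =
        [MORegularExpression regularExpressionWithString:@"^([a-z]+)=([a-z]+)$"];
    if (expression == nil || candidate == nil) {
        return nil;
    }
    // ASSUMPTION: index 1 is the first parenthesized group ("key").
    NSRange range = [expression rangeForSubexpressionAtIndex:1
                                                    inString:candidate];
    if (range.location == NSNotFound) {
        return nil;  // the candidate did not match at all
    }
    return [candidate substringWithRange:range];
}
```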

rangeForSubexpressionAtIndex:inString:range:

Retrieve a subexpression match range.
- ( NSRange ) rangeForSubexpressionAtIndex:
        (unsigned ) index inString:
        (NSString *) candidate range:
        (NSRange ) searchRange;

Given a candidate string, a range to match in, and the index of a subexpression, this method will return the range from the candidate string that matched the given subexpression index (if the string matches at all).

Parameter Descriptions
index
The index of the subexpression range to return.
candidate
The string to test against the regular expression.
searchRange
The range of the string to use for matching.
method result
If the candidate string matches, the range of the subexpression match. If the candidate does not match, the range (NSNotFound, 0).

rangesForSubexpressionsInCharacters:range:

Retrieve all subexpression match ranges.
- ( NSRange *) rangesForSubexpressionsInCharacters:
        (const unichar *) candidateChars range:
        (NSRange ) searchRange;

Given a candidate character buffer and a range to match in, this method will return an array of ranges from the candidate characters that matched the subexpressions (if the string matches at all). The array is MO_REGEXP_MAX_SUBEXPRESSIONS in length, and any unused subexpressions will be {NSNotFound, 0}. The returned array is valid only until the next match or subexpression operation on the receiver. This method is useful when working with large candidate buffers and when you need to get information about multiple subexpressions. MORegularExpression's caching usually makes repeated queries about a given string cheap, but when the string is large, MORegularExpression does not cache (since the cost of caching starts to outweigh the benefit). This API gives you a way to get all the data you might need in one operation.

Parameter Descriptions
candidateChars
The unichar buffer to test against the regular expression.
searchRange
The range of the buffer to use for matching.
method result
If the candidate characters match, the array of subexpression match ranges. If the candidate does not match, NULL.
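A minimal sketch of walking the returned array, assuming MO_REGEXP_MAX_SUBEXPRESSIONS is visible through the MOKit headers. The helper name is hypothetical and the `<MOKit/MOKit.h>` import is an assumption. The ranges are read immediately, before any further operation on the receiver could invalidate the array.

```objc
#import <Foundation/Foundation.h>
#import <MOKit/MOKit.h>  // ASSUMPTION: umbrella header of the MOKit framework
#include <stdlib.h>

// Hypothetical helper: returns how many entries of the subexpression range
// array are in use for the given text, or 0 if the text does not match.
static int ExampleCountUsedRanges(MORegularExpression *expression,
                                  NSString *text) {
    if (expression == nil || text == nil) {
        return 0;
    }
    NSUInteger length = [text length];
    unichar *buffer = malloc(sizeof(unichar) * (length > 0 ? length : 1));
    if (buffer == NULL) {
        return 0;
    }
    [text getCharacters:buffer range:NSMakeRange(0, length)];

    NSRange *ranges =
        [expression rangesForSubexpressionsInCharacters:buffer
                                                  range:NSMakeRange(0, length)];
    int used = 0;
    if (ranges != NULL) {
        // Consume the array now; it is only valid until the next match or
        // subexpression operation on the receiver.
        int i;
        for (i = 0; i < MO_REGEXP_MAX_SUBEXPRESSIONS; i++) {
            if (ranges[i].location != NSNotFound) {
                used++;
            }
        }
    }
    free(buffer);
    return used;
}
```

If the ranges are needed later, copy them into storage you own before calling any other matching method on the same expression.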

regularExpressionWithString:

Convenience factory for creating a new regular expression instance.
+ ( id ) regularExpressionWithString:
        (NSString *) expressionString;

Given a regular expression string this method returns a newly allocated, autoreleased MORegularExpression. The new expression will be case sensitive.

Parameter Descriptions
expressionString
The regular expression string.
method result
The new autoreleased MORegularExpression, or nil if expressionString is not a valid regular expression string.

regularExpressionWithString:ignoreCase:

Convenience factory for creating a new regular expression instance.
+ ( id ) regularExpressionWithString:
        (NSString *) expressionString ignoreCase:
        (BOOL ) ignoreCaseFlag;

Given a regular expression string and a flag indicating whether the expression should be case insensitive, this method returns a newly allocated, autoreleased MORegularExpression.

Parameter Descriptions
expressionString
The regular expression string.
ignoreCaseFlag
Whether the expression object should ignore case differences when matching candidate strings.
method result
The new autoreleased MORegularExpression, or nil if expressionString is not a valid regular expression string.

subexpressionsForString:

Retrieve subexpression matches.
- ( NSArray *) subexpressionsForString:
        (NSString *) candidate;

Given a candidate string, this method will return an array of all subexpression matches.

This method should not be used and is included only for compatibility. Use the rangeForSubexpression... or substringForSubexpression... methods instead, which more accurately distinguish between the no-match case and the zero-length-match case.

method result
An array of the subexpression substrings.

substringForSubexpressionAtIndex:inString:

Retrieve a subexpression match substring.
- ( NSString *) substringForSubexpressionAtIndex:
        (unsigned ) index inString:
        (NSString *) candidate;

Given a candidate string and the index of a subexpression, this method will return the substring from the candidate string that matched the given subexpression index (if the string matches at all). The return value will be nil if the candidate does not match and the empty string if the candidate matches but the subexpression matched a zero-length range. This is a convenience method that calls -rangeForSubexpressionAtIndex:inString: and then creates a substring from the range. The convenience method is only implemented for the simple case of matching a whole NSString. If you're matching within a sub-range or using unichar buffers, use the appropriate rangeForSubexpressionAtIndex:... API.

Parameter Descriptions
index
The index of the subexpression range to return.
candidate
The string to test against the regular expression.
method result
If the candidate string matches, the substring of the subexpression match. If the candidate does not match, nil.
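The distinction between nil (no match) and the empty string (zero-length match) can be handled as in the sketch below. The helper name is hypothetical, the `<MOKit/MOKit.h>` import is an assumption, and the mapping of index 1 to the first parenthesized group is likewise an assumption not stated in this reference.

```objc
#import <Foundation/Foundation.h>
#import <MOKit/MOKit.h>  // ASSUMPTION: umbrella header of the MOKit framework

// Hypothetical helper: describes the outcome for the subexpression at index 1
// of a pattern whose groups may match zero characters.
static NSString *ExampleDescribeSubexpression(NSString *candidate) {
    MORegularExpression *expression =
        [MORegularExpression regularExpressionWithString:@"^([a-z]*)=([a-z]*)$"];
    if (expression == nil || candidate == nil) {
        return @"no match";
    }
    // ASSUMPTION: index 1 is the first parenthesized group.
    NSString *substring =
        [expression substringForSubexpressionAtIndex:1 inString:candidate];
    if (substring == nil) {
        return @"no match";           // the candidate did not match at all
    }
    if ([substring length] == 0) {
        return @"zero-length match";  // matched, but the group was empty
    }
    return substring;
}
```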

validExpressionString:

Syntax checks a regular expression string.
+ ( BOOL ) validExpressionString:
        (NSString *) expressionString;

Given a candidate regular expression string, this method attempts to compile it into a regular expression to see if it is valid. In effect it syntax checks regular expression strings.

Parameter Descriptions
expressionString
The candidate regular expression string.
method result
YES if the expressionString is a valid regular expression, NO otherwise.
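A minimal sketch of validating a pattern (for example, one typed by a user) before building an expression from it. The helper name is hypothetical and the `<MOKit/MOKit.h>` import is an assumption. The explicit check is optional in the sense that the factory methods also return nil for an invalid string; it is useful when you want to report the problem without creating an object.

```objc
#import <Foundation/Foundation.h>
#import <MOKit/MOKit.h>  // ASSUMPTION: umbrella header of the MOKit framework

// Hypothetical helper: returns an autoreleased expression for pattern, or nil
// if the pattern is missing or fails the syntax check.
static MORegularExpression *ExampleExpressionFromInput(NSString *pattern) {
    if (pattern == nil) {
        return nil;
    }
    if (![MORegularExpression validExpressionString:pattern]) {
        return nil;  // e.g. an unbalanced parenthesis such as "("
    }
    return [MORegularExpression regularExpressionWithString:pattern
                                                 ignoreCase:NO];
}
```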

(Last Updated 3/20/2005)