Class rxScanner

PatternPro
ClassModule rxScanner

Implements functionality needed to create flexible multi-state scanners, tokenizers and parsers.

Description:
The scanner is designed to recognize tokens from an input stream using regular expressions. Matching of patterns to the input is performed in the order in which the patterns are added to the scanner. The first successful match of any pattern causes a token to be returned.

Multiple scanner states are implemented using a stack, allowing for the recognition of different sets of patterns throughout the scanning process. The scanner is shifted into an alternate state using the PushState method and returned to the previous state using the PopState method.

Pattern matching is expected to occur at the beginning of the input. When a match is found, the matching text is returned and removed from the input stream.
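
The matching rule described above (patterns tried in the order added, match anchored at the beginning of the input, matched text removed) can be modelled in a few lines. The sketch below is Python rather than Visual Basic and is not PatternPro code. The name next_token and the sample patterns are hypothetical, and Python's re syntax is assumed to stand in for PatternPro's pattern syntax.

```python
import re

def next_token(patterns, text):
    """Return (token_id, lexeme, remaining_text), or None if nothing matches."""
    for regex, token_id in patterns:      # patterns are tried in the order added
        m = re.match(regex, text)         # re.match only matches at the start of text
        if m:
            # The matched text is returned and removed from the input.
            return token_id, m.group(0), text[m.end():]
    return None

# Hypothetical pattern table: (pattern, ID) pairs in the order they were added.
patterns = [(r"if", 1), (r"[a-z]+", 2), (r"[0-9]+", 3)]

print(next_token(patterns, "iffy"))   # (1, 'if', 'fy')
print(next_token(patterns, "42x"))    # (3, '42', 'x')
```

Note that "iffy" yields the keyword token rather than the longer identifier: in this model the first pattern that matches wins, not the longest match.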

Author:
Andrew Friedl

Copyright:
1998, 1999, 2000 BlackBox Software & Consulting


General Methods
object members
ScanInput Public Property Let ScanInput(S As String)
        Sets the scanner's input to a specified string value.
ScanInput Public Property Get ScanInput() As String
        Returns the current value of the scanner's input.
Newline Public Property Let Newline(S As String)
        Sets the character sequence to be recognized as a "newline".
Newline Public Property Get Newline() As String
        Returns the current value of the Newline setting.
BOLmatchesBeg Public Property Let BOLmatchesBeg(B As Boolean)
        Sets the flag governing the behavior of the "^" metacharacter in patterns.
BOLmatchesBeg Public Property Get BOLmatchesBeg() As Boolean
        Returns the current value of the "beginning of line" matching flag.
EOLmatchesEnd Public Property Let EOLmatchesEnd(B As Boolean)
        Sets the flag governing the behavior of the "$" metacharacter in patterns.
EOLmatchesEnd Public Property Get EOLmatchesEnd() As Boolean
        Returns the current value of the "end of line" matching flag.
CurrentState Public Function CurrentState() As String
        Returns the current state of the scanner.
GetToken Public Function GetToken(TokenType As Integer, TokenValue As Variant) As Boolean
        Requests the next token from the scanner.
AddState Public Sub AddState(StateName As String, ID As Integer)
        Adds a named state to the scanner.
AddPattern Public Sub AddPattern(Pattern As String, ID As Integer, Optional StateName)
        Adds a pattern to the scanner.
PushState Public Sub PushState(StateName As String)
        Places the scanner into a named "state".
PopState Public Sub PopState()
        Returns the scanner to the previous state.
UnGetToken Public Sub UnGetToken(Token As String)
        Places a string value at the head of the scanner's input.

General Methods - Detail
object members

ScanInput

Sets the scanner's input to a specified string value.

Description:
This method sets the scanner's input to a specified string value. Previous input assigned to the scanner is discarded.

Definition:
Public Property Let ScanInput(S As String)

Parameters:
S A string value that will become the scanner's input stream.


ScanInput

Returns the current value of the scanner's input.

Definition:
Public Property Get ScanInput() As String


Newline

Sets the character sequence to be recognized as a "newline".

Description:
This property allows users to set the value of the "newline". Typically, the ^ and $ metacharacters recognize a single character as the end of a line: in Unix text this is the linefeed "\n". In the DOS world, a line ends with a carriage return "\r" followed by a linefeed "\n". PatternPro defaults to a line ending with both characters, ensuring that the linefeed is not misread as the first character of the "next line". This property may be set to reflect Unix (vbLf), DOS (vbCrLf) and Mac (vbCr) style text.

Definition:
Public Property Let Newline(S As String)

Parameters:
S The newline sequence. Must be one of vbCr, vbLf, vbCrLf, or the default vbNewLine.


Newline

Returns the current value of the Newline setting.

Definition:
Public Property Get Newline() As String


BOLmatchesBeg

Sets the flag governing the behavior of the "^" metacharacter in patterns.

Description:
This flag determines whether the "^" metacharacter matches the beginning of a string. Setting this value to True causes "^" to match the string's beginning. A False value causes "^" to match only after a valid newline sequence. The default value for this property is True. (see Newline)

Definition:
Public Property Let BOLmatchesBeg(B As Boolean)

Parameters:
B A Boolean value.
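
The effect of the flag can be pictured as a position test. The following is a Python model of the rule as described, not the PatternPro implementation. The function name bol_matches is hypothetical, and it assumes that "^" still matches after a newline sequence when the flag is True, which the description implies but does not state.

```python
def bol_matches(text, pos, newline="\r\n", bol_matches_beg=True):
    """Return True if "^" would match at index pos under the described rules."""
    if pos == 0:
        # At the very start of the string the flag decides.
        return bol_matches_beg
    # Elsewhere "^" matches only directly after a complete newline sequence.
    return text[:pos].endswith(newline)

text = "ab\r\ncd"
print(bol_matches(text, 0))                         # True
print(bol_matches(text, 0, bol_matches_beg=False))  # False
print(bol_matches(text, 4))                         # True
print(bol_matches(text, 3))                         # False
print(bol_matches(text, 3, newline="\r"))           # True
```

The last two calls show why the Newline setting matters: with a lone "\r" as the newline, the "\n" at index 3 is treated as the first character of the next line.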


BOLmatchesBeg

Returns the current value of the "beginning of line" matching flag.

Definition:
Public Property Get BOLmatchesBeg() As Boolean

Returns:
A Boolean


EOLmatchesEnd

Sets the flag governing the behavior of the "$" metacharacter in patterns.

Description:
This flag determines whether the "$" metacharacter matches the end of a string. Setting this value to True causes "$" to match the input's end. A False value causes "$" to match only at the character position prior to a valid newline sequence. The default value for this property is True. (see Newline)

Definition:
Public Property Let EOLmatchesEnd(B As Boolean)

Parameters:
B A Boolean value.
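
The "$" rule can be modelled in the same way as a position test. This is a Python sketch of the behavior described above, not PatternPro code. The name eol_matches is hypothetical, and it assumes that "$" still matches before a newline sequence when the flag is True.

```python
def eol_matches(text, pos, newline="\r\n", eol_matches_end=True):
    """Return True if "$" would match at index pos under the described rules."""
    if pos == len(text):
        # At the very end of the input the flag decides.
        return eol_matches_end
    # Elsewhere "$" matches only directly before a complete newline sequence.
    return text.startswith(newline, pos)

text = "ab\r\ncd"
print(eol_matches(text, 2))                         # True
print(eol_matches(text, 3))                         # False
print(eol_matches(text, 6))                         # True
print(eol_matches(text, 6, eol_matches_end=False))  # False
```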


EOLmatchesEnd

Returns the current value of the "end of line" matching flag.

Definition:
Public Property Get EOLmatchesEnd() As Boolean

Returns:
A Boolean


CurrentState

Returns the current state of the scanner.

Definition:
Public Function CurrentState() As String

Returns:
The string name of the current state.


GetToken

Requests the next token from the scanner.

Description:
This method causes the scanner to locate the next match using the patterns defined on its current state. If a match is found, the token is returned both as a string and as the integer ID assigned to the pattern when it was added to the scanner.

Definition:
Public Function GetToken(TokenType As Integer, TokenValue As Variant) As Boolean

Parameters:
TokenType The integer ID assigned to the pattern that was matched. This parameter must be passed by reference.
TokenValue The actual string that was matched from the scanner's input. This parameter must be passed by reference.

Returns:
True if a token was found, False otherwise.
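
A typical caller requests tokens in a loop until the method reports that nothing was found. The sketch below models that calling pattern in Python, not Visual Basic. MiniScanner is a hypothetical single-state stand-in for rxScanner, and because Python has no by-reference parameters, get_token returns the three values as a tuple instead.

```python
import re

class MiniScanner:
    """Hypothetical single-state stand-in for rxScanner."""

    def __init__(self, patterns, text):
        self.patterns = patterns   # list of (regex, id), in the order added
        self.text = text           # remaining input

    def get_token(self):
        """Return (found, token_type, token_value)."""
        for regex, token_id in self.patterns:
            m = re.match(regex, self.text)
            if m:
                self.text = self.text[m.end():]   # consume the matched text
                return True, token_id, m.group(0)
        return False, None, None

scanner = MiniScanner([(r"[0-9]+", 1), (r"[+*]", 2)], "12+34*5")
tokens = []
while True:
    found, token_type, token_value = scanner.get_token()
    if not found:                  # corresponds to GetToken returning False
        break
    tokens.append((token_type, token_value))

print(tokens)   # [(1, '12'), (2, '+'), (1, '34'), (2, '*'), (1, '5')]
```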


AddState

Adds a named state to the scanner.

Description:
This method creates a new named state on the scanner. An error is raised if a state already exists with the same name.

Definition:
Public Sub AddState(StateName As String, ID As Integer)

Parameters:
StateName The name to be given to the new state.
ID A unique integer identifier for the state. Uniqueness is not enforced by the scanner.


AddPattern

Adds a pattern to the scanner.

Description:
This method is used to add regular expression patterns to states existing within the scanner. The ID value passed to the scanner is returned whenever the pattern locates a match. The optional parameter StateName allows the pattern to be assigned to a specific named state.

Important: The scanner is programmed to automatically ignore any and all tokens, regardless of state, that have been assigned an ID value equal to zero. This is to allow easy implementation of whitespace removal.

Definition:
Public Sub AddPattern(Pattern As String, ID As Integer, Optional StateName)

Parameters:
Pattern A well-formed, syntactically correct regular expression pattern.
ID An integer identifier for the token definition. This value is also returned when a match is found for this pattern.
StateName Optional. Specifies the named state to which the pattern should be added. If no state name is supplied, the pattern is added to the default state, named "default".
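
The zero-ID rule and the default state can be illustrated with a small model. This is a Python sketch under stated assumptions, not the PatternPro implementation: MiniScanner and its method names are hypothetical, only the "default" state is scanned, and the guard against empty matches is an addition of this sketch to keep the loop from stalling.

```python
import re

class MiniScanner:
    """Hypothetical model of AddPattern and the ID-zero rule."""

    def __init__(self, text):
        self.states = {"default": []}   # state name -> ordered (regex, id) list
        self.text = text

    def add_pattern(self, pattern, token_id, state_name="default"):
        # With no state name, the pattern goes to the "default" state.
        self.states[state_name].append((re.compile(pattern), token_id))

    def get_token(self):
        """Return (token_id, lexeme), or None when nothing matches."""
        while self.text:
            for regex, token_id in self.states["default"]:
                m = regex.match(self.text)
                if m and m.end() > 0:   # assumption: empty matches are skipped
                    self.text = self.text[m.end():]
                    if token_id == 0:
                        break           # ignored token: discard it and rescan
                    return token_id, m.group(0)
            else:
                return None             # no pattern matched at this position
        return None

scanner = MiniScanner("  foo   bar")
scanner.add_pattern(r"\s+", 0)          # ID 0: whitespace is silently skipped
scanner.add_pattern(r"[a-z]+", 1)
first = scanner.get_token()
second = scanner.get_token()
third = scanner.get_token()
print(first, second, third)             # (1, 'foo') (1, 'bar') None
```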


PushState

Places the scanner into a named "state".

Description:
This method places the scanner into any of the named states defined upon it, including the default state "default". States are defined using the AddState method. If the named state exists, it is pushed onto the scanner's internal stack; otherwise an error is raised.

Definition:
Public Sub PushState(StateName As String)

Parameters:
StateName The named state into which the scanner should be placed.


PopState

Returns the scanner to the previous state.

Description:
This method returns the scanner to the previous state. Assuming that alternate states have been "pushed", this method causes the scanner to remove the topmost state from its internal stack. If the scanner is at the default or lowest state, an error will be raised.

Definition:
Public Sub PopState()
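
The interaction of AddState, PushState, PopState and CurrentState can be sketched as a plain stack. The model below is Python and is not PatternPro code. The class name StateStack and the choice of exception types are assumptions, since the documentation says only that "an error is raised".

```python
class StateStack:
    """Hypothetical model of the scanner's state stack."""

    def __init__(self):
        self.known = {"default"}     # the default state always exists
        self.stack = ["default"]     # bottom of the stack is the default state

    def add_state(self, name):
        if name in self.known:
            raise ValueError("state already exists: " + name)
        self.known.add(name)

    def push_state(self, name):
        if name not in self.known:
            raise KeyError("unknown state: " + name)
        self.stack.append(name)

    def pop_state(self):
        if len(self.stack) == 1:
            raise IndexError("already at the lowest state")
        self.stack.pop()

    def current_state(self):
        return self.stack[-1]

states = StateStack()
states.add_state("comment")
states.push_state("comment")
print(states.current_state())   # comment
states.pop_state()
print(states.current_state())   # default
```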


UnGetToken

Places a string value at the head of the scanner's input.

Description:
It is occasionally necessary to place a token back into the input stream. This method allows any string value to be placed at the head of the scanner's input.

Definition:
Public Sub UnGetToken(Token As String)

Parameters:
Token The string value to be placed at the head of the scanner's input stream.
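
Pushing a token back amounts to prepending its text to the remaining input. A minimal Python model of that idea follows. The helper names take_word and unget are hypothetical, and this is not PatternPro code.

```python
import re

def take_word(text):
    """Remove the leading lowercase word from text; return (word, remainder)."""
    m = re.match(r"[a-z]+", text)
    return m.group(0), text[m.end():]

def unget(token, text):
    """Place token back at the head of the input."""
    return token + text

text = "end if"
word, rest = take_word(text)     # word is 'end', rest is ' if'
restored = unget(word, rest)     # the input is back to its original form
replaced = unget("stop", rest)   # any string may be placed at the head
print(restored)   # end if
print(replaced)   # stop if
```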


Generated by DocuPro for VB5
Copyright 1999-2000 BlackBox Software & Consulting