The plug-in validator, VALLEX, performs validation by searching data for occurrences of key words or phrases. This type of validation is known as lexical analysis. See page 7-86 for more details on VALLEX.
VALLEX uses a special file, known as an ExpressionList file, to obtain the majority of the information it needs when performing lexical analysis. This information includes the expressions to be searched for, configuration settings and any loading values[1] that are to be applied.
The format of the file is line based, with a complete statement appearing on a line of its own.
Comments can be included in the file. A comment is denoted by placing the semicolon character at the start of the comment. The comment is terminated at the end of the line.
; This is a comment line.
The file can be considered as being split into three sections: the loading statements, the configuration directives and the expression list, in that order.
Note: It is important not to change the ordering of these sections, due to the way VALLEX performs the search.
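The ordering rule can be checked mechanically. The following Python sketch is not part of VALLEX; it assumes that comment lines start with a semicolon, that loading statements start with the word Loading, that expression lines start with +, - or a double quote, and that any other line containing = is a configuration directive. The function names are hypothetical.

```python
# Hypothetical helper, not part of VALLEX: checks that the statements in an
# ExpressionList file appear as loadings, then directives, then expressions.
SECTION_ORDER = {"loading": 0, "directive": 1, "expression": 2}


def classify(line):
    """Return 'blank', 'comment', 'loading', 'directive' or 'expression'."""
    text = line.strip()
    if not text:
        return "blank"
    if text.startswith(";"):           # assumption: whole-line comments only
        return "comment"
    if text.split()[0] == "Loading":   # assumption: keyword is spelled exactly so
        return "loading"
    if text[0] in '+-"':
        return "expression"
    if "=" in text:
        return "directive"
    raise ValueError(f"unrecognised statement: {line!r}")


def check_section_order(lines):
    """Return True if no statement appears before an earlier section's statement."""
    highest = 0
    for line in lines:
        kind = classify(line)
        if kind in ("blank", "comment"):
            continue
        rank = SECTION_ORDER[kind]
        if rank < highest:
            return False
        highest = rank
    return True
```

A file that lists an expression before a loading statement would make check_section_order return False.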
The loading statements are listed first in the ExpressionList file.
These statements define conditional loading values that may be applied when calculating the final lexical analysis score. If a loading value is to be applied, the numeric total generated after all the data has been searched is multiplied by it to give the final lexical analysis score. For more details on how this score is generated see page 7-138.
Loading values are usually applied to allow different results to be generated depending on the value of previously set attributes.
For example, you may decide that it is more important to capture profanity in outgoing messages than it is in incoming messages, or you may consider that sensitive company information leaving your sales department carries a higher risk than company information leaving other departments.
In these situations an attribute expression[2] can be used to determine the direction and/or source of the message. An appropriate loading value can be applied depending on the results.
The format of the loading statement is:
Loading <Value> when <Attribute_Expression>
Note: There can be several loading statements listed in the file.
For example:
Loading 3 when direction==in
Loading 4 when direction==out
Loading 2 Default
The Default loading value is used if no matching attribute expression is found. If the Default loading statement is not present and no attribute expression evaluates to TRUE then no loading value is applied.
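The selection rule can be sketched in Python. This is an illustration only, not the VALLEX implementation: it assumes an attribute expression is a single name==value comparison, that the first matching statement wins when several match, and that loading values are plain numbers. The function names are hypothetical.

```python
# Hypothetical sketch of choosing a loading value; not the VALLEX implementation.
def parse_loading(line):
    """Return (value, condition); condition is None for the Default statement."""
    parts = line.split()
    if len(parts) == 3 and parts[0] == "Loading" and parts[2] == "Default":
        return float(parts[1]), None
    if len(parts) == 4 and parts[0] == "Loading" and parts[2] == "when":
        # Assumption: only simple 'name==value' expressions are handled here.
        name, sep, wanted = parts[3].partition("==")
        if sep:
            return float(parts[1]), (name, wanted)
    raise ValueError(f"not a loading statement: {line!r}")


def select_loading(lines, attributes):
    """Return the loading value to apply, or None if no loading applies."""
    default = None
    for line in lines:
        value, condition = parse_loading(line)
        if condition is None:
            default = value                      # remember the Default value
        elif attributes.get(condition[0]) == condition[1]:
            return value                         # assumption: first match wins
    return default
```

With the three example statements, an attribute set of {"direction": "out"} selects 4, and an empty attribute set falls back to the Default value of 2.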
In the example loading statements above:
If the direction[1] attribute is in then a loading value of 3 is applied when calculating the final lexical analysis score.
If the direction[1] attribute is out then a loading value of 4 is applied when calculating the final lexical analysis score.
As a second example:
Loading 4 when department=out_of_sales
If the department[3] attribute is out_of_sales (for example, to indicate a message is leaving the sales department) then a loading value of 4 is applied.
If the department attribute is not out_of_sales (to indicate any other department) then no loading value is applied.
The configuration section can contain three directives: CaseSensitive, Separators and MaxWordLength.
Note: Any of these directives can be omitted from the ExpressionList file. If so, the default values are used.
The CaseSensitive directive indicates whether or not expression matches are to be case sensitive. It can have a value of TRUE or FALSE.
CaseSensitive=TRUE indicates that all expression matches are to be case sensitive.
CaseSensitive=FALSE indicates that expression matches are not case sensitive.
The Separators directive lists the characters that are to be used as word separators during the search. These characters are entered as a string enclosed within double quotes, using escaped characters where necessary, for example, \t for tab, \n for linefeed, \r for carriage return, \" for double quotes. For a full list of escaped characters see page 7-4.
Separators=" \t\n\r\"-,."
In this example, the separator characters used are space, tab, linefeed, carriage return, double quotes, hyphen, comma and full stop.
Any character can be included in the list by using the three digit decimal representation of the character, preceded by a backslash. A range of characters can be specified by separating two such entries with a hyphen. There must be no space either side of the hyphen.
Separators=" \t\n\r\"-,.\001-\031"
These separators would typically be used in more complicated files, for example, Word documents that include formatting commands.
If no separator characters are specified in the file then the default characters used are space, tab, linefeed, carriage return, full stop and comma. A null character is always interpreted as a separator character.
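To make the escape and range rules concrete, here is a minimal Python sketch that decodes the text between the double quotes into a set of separator characters and then splits data into words. It is not VALLEX code: the names are hypothetical, only the escapes mentioned above are mapped, and any other escaped character is assumed to stand for itself.

```python
import re

# Escapes named in this section; the full list is on another page of the manual.
SIMPLE_ESCAPES = {"t": "\t", "n": "\n", "r": "\r", '"': '"'}

# Alternatives, in order: \ddd-\ddd range, single \ddd, other escape, plain char.
TOKEN = re.compile(r'\\(\d{3})-\\(\d{3})|\\(\d{3})|\\(.)|(.)', re.DOTALL)


def decode_separators(raw):
    """Decode the text between the quotes of a Separators directive."""
    seps = {"\0"}  # a null character is always a separator
    for match in TOKEN.finditer(raw):
        low, high, code, escaped, plain = match.groups()
        if low is not None:
            # Three-digit codes are decimal, so \001-\031 means 1 to 31.
            seps.update(chr(c) for c in range(int(low), int(high) + 1))
        elif code is not None:
            seps.add(chr(int(code)))
        elif escaped is not None:
            # Assumption: an unlisted escape stands for the character itself.
            seps.add(SIMPLE_ESCAPES.get(escaped, escaped))
        else:
            seps.add(plain)
    return seps


def split_words(data, seps):
    """Split data into words at any run of separator characters."""
    words, current = [], []
    for ch in data:
        if ch in seps:
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words
```

For instance, decoding the raw text of the second example above yields a set containing the characters with decimal codes 1 to 31 as well as the punctuation listed.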
The MaxWordLength directive is used to alter the maximum word length allowed in the search. It would typically be used to ensure that VALLEX performs efficiently when presented with non-text data.
The value can be any integer between 1 and 4095 inclusive (for example, MaxWordLength=4095). The default value is 127.
Note: When performing lexical analysis on compound documents, such as Word documents, it is recommended that you set the MaxWordLength directive to 4095.
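The value rules for the three directives can be summarised as a small validation routine. This Python sketch is illustrative only; the function name is hypothetical, and it assumes directive names and the TRUE and FALSE values are written exactly as shown in this section.

```python
# Hypothetical validator for configuration directive lines; not VALLEX code.
def parse_directive(line):
    """Return (name, value) for one configuration directive line."""
    name, sep, value = line.strip().partition("=")
    if not sep:
        raise ValueError(f"not a directive: {line!r}")
    if name == "CaseSensitive":
        if value not in ("TRUE", "FALSE"):
            raise ValueError("CaseSensitive must be TRUE or FALSE")
        return name, value == "TRUE"
    if name == "MaxWordLength":
        number = int(value)
        if not 1 <= number <= 4095:      # range stated in this section
            raise ValueError("MaxWordLength must be between 1 and 4095")
        return name, number
    if name == "Separators":
        if len(value) < 2 or value[0] != '"' or value[-1] != '"':
            raise ValueError("Separators must be enclosed in double quotes")
        return name, value[1:-1]         # raw text, escapes not yet decoded
    raise ValueError(f"unknown directive: {name!r}")
```

A caller would fall back to the documented default of 127 when no MaxWordLength line is present.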
Each expression is listed as a string within double quotes, along with a numeric value for the expression.
Note: An optional + or - character can also be included, to indicate whether expression matches are case sensitive. If omitted, the default is the value set by the CaseSensitive directive (see page 7-136).
The format of each entry is:
[+|-] "<Expression>" <Value>
where [+|-] indicates that the + or - character is optional.
Each expression can be a word or phrase. A phrase can contain a maximum of 20 words. Each phrase is split into words using one or more separator characters defined by the Separators directive.
"Company Confidential" 10
"Policy N*$" 5
"Copyright" 4
"Internal" 3
Certain wildcard characters can be included in an expression. These characters match with certain characters during the search and can be used to make the search more general.
"Policy N*$" 5
In this example, the * character matches any number of characters, including none. The $ character matches any number of digits, including none. Valid matches for the second word would therefore be `N', `NA5', `NABC5', `NA55', `NABC555' and so on.
The wildcard characters that can be used in an expression are shown in the following table:
Wildcard | Matches |
---|---|
* | Any number of non-separator characters, including none. |
? | Any single non-separator character. |
$ | Any number of any digits (0 to 9) including none. |
# | Any single digit (0 to 9). |
If any of the wildcard characters need to be included in the search, as themselves, each character must be preceded with a backslash (\).[4]
A*B - reads as any text string that starts with the character A and ends with the character B. (The * matches any string of characters.)
A\*B - reads as the text string A*B.
Note: If the backslash is also required in the search then use two backslashes (\\).
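One way to picture the wildcard and escape rules is as a translation of each word of an expression into a regular expression. The Python sketch below is an assumption about equivalent behaviour, not a description of how VALLEX matches internally. The function name is hypothetical and the default separator set is the documented default plus the null character.

```python
import re


def word_pattern_to_regex(pattern, separators=" \t\n\r.,\0", case_sensitive=True):
    """Translate one word of an expression into a compiled regular expression."""
    # Character class matching any single non-separator character.
    non_sep = "[^" + "".join(re.escape(c) for c in separators) + "]"
    wildcards = {
        "*": non_sep + "*",   # any number of non-separator characters
        "?": non_sep,         # any single non-separator character
        "$": "[0-9]*",        # any number of digits, including none
        "#": "[0-9]",         # any single digit
    }
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # A backslash makes the next character literal (\* or \\).
            out.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        out.append(wildcards.get(ch, re.escape(ch)))
        i += 1
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile("".join(out), flags)
```

A word is treated as a match when fullmatch succeeds on it, so the pattern N*$ accepts NABC555 while A\*B accepts only the literal text A*B.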
Two other characters can be used in association with expression lists. These are + and -. These characters are optional and are used to indicate whether or not individual expression matches are case sensitive.
Note: The value of this character overrides the value set by the CaseSensitive directive (see page 7-136), but only for the one associated expression.
+ "Company Confidential" 10
- "Internal" 3
This example indicates that matches with the expression `Company Confidential' are case sensitive and that matches with the expression `Internal' are not.
Note: If neither the + nor the - character is specified then the value set by the CaseSensitive directive is used.
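The entry format and the override rule can be sketched as a small parser. This Python example is illustrative only: the function name is hypothetical, and it assumes values are non-negative integers and that a backslash inside the quotes escapes the following character.

```python
import re

# Optional + or -, a double-quoted expression, then an integer value.
EXPRESSION_LINE = re.compile(r'^\s*([+-])?\s*"((?:[^"\\]|\\.)*)"\s+(\d+)\s*$')


def parse_expression(line, default_case_sensitive):
    """Return (expression, value, case_sensitive) for one expression line."""
    match = EXPRESSION_LINE.match(line)
    if match is None:
        raise ValueError(f"not an expression line: {line!r}")
    sign, text, value = match.groups()
    if sign == "+":
        case_sensitive = True                     # + forces case-sensitive matching
    elif sign == "-":
        case_sensitive = False                    # - forces case-insensitive matching
    else:
        case_sensitive = default_case_sensitive   # fall back to CaseSensitive
    return text, int(value), case_sensitive
```

The second argument stands for the value of the CaseSensitive directive, so an unmarked line simply inherits it.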
The numeric value assigned to the expression depends on the nature and context of the expression. The more important you consider the expression to be, the higher the value that should be assigned to it.
In the above example, the expression `Company Confidential' is considered to be more important than the expression `Internal' so it is assigned a higher value.
The numeric value assigned to an expression is added to a running total whenever the expression is encountered in the data being validated. This running total represents a numeric score for the expressions found in the data so far.
If a loading value is to be applied (see page 7-134), the running total generated when the search is complete is multiplied by it. This results in a final score being generated for the data.
The final score is checked against a list of numeric values.[5] Comparison with these values sets the <Response> that is generated as the result of the lexical analysis.
[AMU]
Authfile=c:\MSW\CONFIG\AUTHFILE.TXT
If=allow_in,direction=in,allow
If=allow_out,direction=out,allow
[Validation]
F-PROT=VALEXE
Confidential=VALLEX
ValidateAttributes=VALATTR
[Confidential]
ExpressionList=C:\MSW\CONFIG\CONFID.LST
20=ConfidentialModerate
50=Confidential
The If directive checks the <Response> generated by AMUcheck. If it is allow_in or allow_out then an attribute called direction is set, with the value in or out respectively. The <Response> is then reset to allow, which is the actual <Response> generated by AMUcheck. It allows the message to be delivered normally, assuming no higher priority <Response> is returned by one of the other validator instances (for example, the VALLEX validator instance defined in this example).
A new instance of the VALLEX validator, Confidential, is defined by creating an entry in the [Validation] section. A configuration section with the same name is created in the body of the file. The configuration section maps numeric values to <Response> values. It also specifies the path to the ExpressionList file, called CONFID.LST. See page 7-86 for more details on how to configure a VALLEX validator instance.
CONFID.LST contains the expressions to be searched for, loading values that may be applied and some configuration settings. An example file is shown on the next page.
In CONFID.LST (the ExpressionList file):
;
;Direction Loadings
;
Loading 3 when direction==in
Loading 4 when direction==out
Loading 2 Default
;
; Configuration
;
CaseSensitive=TRUE
Separators=" \t\n\r-,."
MaxWordLength=4095
;
; Expression list
;
"Company Confidential" 10
"Policy N*$" 5
"Copyright" 4
"Internal" 3
Next any loading values to be applied are determined. The message is incoming so the attribute direction has the value in.[6] A loading value of 3 is therefore applied: the running total is multiplied by 3 to give the final score for the message.
The final score is checked against the numeric values defined in the [Confidential] section (see the previous page). The value of 20 has been exceeded, but the value of 50 has not been reached. The <Response> returned in this instance is therefore ConfidentialModerate.
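The scoring steps can be traced with a short Python sketch. The match counts used here are hypothetical, because the message and its running total are not reproduced in this section: a single match of `Company Confidential' in an incoming message is assumed purely for illustration. The function names are also hypothetical, and the sketch assumes that a score equal to a configured value counts as having reached it.

```python
# Hypothetical scoring sketch; values and thresholds are taken from the example files.
VALUES = {"Company Confidential": 10, "Policy N*$": 5, "Copyright": 4, "Internal": 3}
THRESHOLDS = {20: "ConfidentialModerate", 50: "Confidential"}


def final_score(match_counts, values, loading=None):
    """Add each expression's value once per match, then apply any loading value."""
    running_total = sum(values[expr] * count for expr, count in match_counts.items())
    return running_total if loading is None else running_total * loading


def response_for(score, thresholds):
    """Return the response for the highest value reached, or None if none is reached."""
    response = None
    for limit in sorted(thresholds):
        if score >= limit:       # assumption: equality counts as reached
            response = thresholds[limit]
    return response


# Assumed input: one match of 'Company Confidential', loading 3 for direction in.
score = final_score({"Company Confidential": 1}, VALUES, loading=3)
print(score, response_for(score, THRESHOLDS))
```

Under these assumed counts the score falls between the two configured values, which is the same band as the result described above.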
[1] MAILsweeper only. Loading values are usually applied to allow different results to be generated, depending on previously set attribute values.
[2] The attribute is usually set by AMUcheck, using the If directive. See page 7-102 for details.
[3] This attribute would be set by AMUcheck, using the If directive. See page 7-102 for details.
[5] These values are defined in the VALLEX configuration section. See page 7-86 for details.
[6] This is the attribute set by AMUcheck, using the If directive. See page 7-102 for details.
Copyright © 1998, Content Technologies Limited. All rights reserved.