Inappropriate pages can be determined by:
Note: WEBsweeper can only check the PICS rating if it is contained in the body of the HTML page. It cannot check ratings that are requested separately.
For ease of configuration, WEBsweeper maps the PICS ratings into its own internal rating system, using several categories, shown below. These mappings are set using the VALHTML validator. See page 7-92 for details.
Category | Values |
---|---|
WSW_AGE | 0-4 |
WSW_SEX | 0-4 |
WSW_LANGUAGE | 0-4 |
WSW_VIOLENCE | 0-4 |
WSW_OTHER | 0-4 |
WSW_LABEL | TRUE/FALSE |
[Validation]
PICS=VALATTR
[PICS]
;NoRating=WSW_LABEL==FALSE
HaveOther=WSW_OTHER>2
HaveAge=WSW_AGE>2
HaveLanguage=WSW_LANGUAGE>2
HaveViolence=WSW_VIOLENCE>2
HaveSex=WSW_SEX>2
The mapped ratings of the page are compared against the threshold values for each of the categories. If a mapped rating exceeds the threshold value for one or more categories then an appropriate <Response> is generated, according to the VALATTR rules.
If the mapped rating for the WSW_AGE category is greater than 2 and no other rating is mapped then the <Response> generated is HaveAge.
If the mapped rating for the WSW_AGE category is greater than 2 and the mapped rating for the WSW_LANGUAGE category is also greater than 2 then the <Response> generated is HaveLanguage.
If no threshold is exceeded then the <Response> equates to DefaultDisposal.
See page 7-81 for more details on how VALATTR performs validation.
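The threshold comparison described above can be sketched in Python. This is only an illustration of the [PICS] rules from the example, not WEBsweeper's implementation: the function name `triggered_responses` and the rule parsing are assumptions made for the sketch. It returns every rule that fires and does not model how VALATTR selects the highest priority <Response>.

```python
import re

# Active threshold rules copied from the [PICS] example.
# NoRating is left out because it is commented out there.
PICS_RULES = {
    "HaveOther": "WSW_OTHER>2",
    "HaveAge": "WSW_AGE>2",
    "HaveLanguage": "WSW_LANGUAGE>2",
    "HaveViolence": "WSW_VIOLENCE>2",
    "HaveSex": "WSW_SEX>2",
}

# Hypothetical rule grammar: <attribute><operator><operand>,
# covering only the two operators that appear in the examples.
RULE_PATTERN = re.compile(r"^(\w+)(==|>)(\w+)$")


def triggered_responses(ratings, rules=PICS_RULES):
    """Return the names of all rules that match the mapped ratings."""
    hits = []
    for response, rule in rules.items():
        match = RULE_PATTERN.match(rule)
        if match is None:
            raise ValueError(f"unsupported rule: {rule!r}")
        attribute, operator, operand = match.groups()
        value = ratings.get(attribute)
        if value is None:
            continue  # no mapped rating for this category
        if operator == ">" and int(value) > int(operand):
            hits.append(response)
        elif operator == "==" and str(value) == operand:
            hits.append(response)
    return hits
```

With only WSW_AGE mapped to 3, the sketch reports HaveAge alone. With WSW_AGE and WSW_LANGUAGE both above 2, it reports both rules, and choosing between them is left to the priority rules described on page 7-81.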
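Each <Response> used in the [PICS] configuration section has a corresponding entry in the [Disposal] section. The entry maps the <Response> to a final disposition for the data.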
[Disposal]
DefaultDisposal=Clean
...
NoRating=BlockNoRating
HaveOther=BlockOther
HaveAge=BlockAge
HaveLanguage=BlockLanguage
HaveViolence=BlockViolence
HaveSex=BlockSex
...
VIRUSPRESENT=Virus
Using this example, assuming that HaveLanguage is the highest priority <Response> generated by validation, the final disposition for the Web page in this instance will be BlockLanguage.
Each disposition listed has a corresponding configuration section in the same file, used to control the disposal actions taken.
[BlockLanguage]
InformText=Page blocked - content unsuitable
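The two-step lookup, from <Response> to disposition and from disposition to its configuration section, can be pictured as a pair of dictionary lookups. The sketch below is a hypothetical Python model of the example only: the names `final_disposition` and `inform_text` are invented for illustration, the elided "..." entries are omitted, and treating an empty <Response> as DefaultDisposal is an assumption.

```python
# Entries copied from the [Disposal] example (elided "..." lines omitted).
DISPOSAL = {
    "DefaultDisposal": "Clean",
    "NoRating": "BlockNoRating",
    "HaveOther": "BlockOther",
    "HaveAge": "BlockAge",
    "HaveLanguage": "BlockLanguage",
    "HaveViolence": "BlockViolence",
    "HaveSex": "BlockSex",
    "VIRUSPRESENT": "Virus",
}

# Only the [BlockLanguage] section is shown in the example.
DISPOSITION_SECTIONS = {
    "BlockLanguage": {"InformText": "Page blocked - content unsuitable"},
}


def final_disposition(response, disposal=DISPOSAL):
    """Map a <Response> to its final disposition."""
    # Assumption: an empty <Response> is treated as DefaultDisposal.
    key = response if response else "DefaultDisposal"
    return disposal[key]  # raises KeyError if the <Response> has no entry


def inform_text(disposition, sections=DISPOSITION_SECTIONS):
    """Return the InformText for a disposition, or None if none is listed."""
    return sections.get(disposition, {}).get("InformText")
```

For instance, `final_disposition("HaveLanguage")` gives `"BlockLanguage"`, and `inform_text("BlockLanguage")` gives the message text from the [BlockLanguage] section.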
Pages with no PICS rating, or a rating that cannot be mapped, are assigned the attribute WSW_LABEL, with the value FALSE. The value of this attribute can be checked like all the other values.
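To block pages that have no PICS rating, edit the [PICS] configuration section of the http configuration file, HTTP.CFG, so that the NoRating directive is no longer commented out. The first listing below shows the section with NoRating commented out; the second shows it enabled.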
[PICS]
;NoRating=WSW_LABEL==FALSE
HaveOther=WSW_OTHER>2
HaveAge=WSW_AGE>2
HaveLanguage=WSW_LANGUAGE>2
HaveViolence=WSW_VIOLENCE>2
HaveSex=WSW_SEX>2
[PICS]
NoRating=WSW_LABEL==FALSE
HaveOther=WSW_OTHER>2
HaveAge=WSW_AGE>2
HaveLanguage=WSW_LANGUAGE>2
HaveViolence=WSW_VIOLENCE>2
HaveSex=WSW_SEX>2
The above example will block any page whose mapped rating is greater than 2 in any category, and also any page that has no PICS rating assigned.
Note: The majority of HTML pages do not currently have a PICS rating assigned. The above example will therefore result in more pages being blocked than may be practical.
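The effect of removing the leading semicolon can be illustrated with a small Python sketch that lists which directives in a [PICS] section are active. It is not part of WEBsweeper: the helper name `active_directives` is hypothetical, and the parsing assumes only what the listings show, namely one directive per line and a leading `;` marking a commented-out line.

```python
def active_directives(section_text):
    """Return the directives in a section that are not commented out."""
    directives = {}
    for raw_line in section_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, commented-out lines and the section header.
        if not line or line.startswith(";") or line.startswith("["):
            continue
        # Split on the first "=" only, so "WSW_LABEL==FALSE" stays intact.
        name, separator, rule = line.partition("=")
        if separator:
            directives[name] = rule
    return directives


BEFORE = """[PICS]
;NoRating=WSW_LABEL==FALSE
HaveOther=WSW_OTHER>2
HaveAge=WSW_AGE>2
"""

AFTER = """[PICS]
NoRating=WSW_LABEL==FALSE
HaveOther=WSW_OTHER>2
HaveAge=WSW_AGE>2
"""
```

Here `active_directives(BEFORE)` has no NoRating entry, while `active_directives(AFTER)` includes it with the rule `WSW_LABEL==FALSE`.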
The mappings from external rating services to WEBsweeper's rating scheme are found in the file PICSMAP.CFG. See Appendix A for details and for a full description of the PICS rating scheme.
Another method of detecting and blocking unsuitable pages is to search the HTML text for certain expressions, for example, words or phrases that indicate profanity is present. This can be achieved using the lexical analysis validator, VALLEX.

The following example shows how VALLEX can be configured to detect unsuitable content, by searching the HTML text for certain keywords and phrases.
[Validation]
F-PROT=VALEXE
LEX=VALLEX
PICS=VALATTR
[LEX]
PerformIf=ContainerName==PlainText
ExpressionList=C:\MSW\CONFIG\PROF.LST
1=HaveProfane
A new instance of the VALLEX validator is created, called LEX. It is defined in the [Validation] section and a corresponding [LEX] configuration section is created in the body of the file.

The [LEX] configuration section specifies the name of an ExpressionList file that contains the expressions to be searched for and certain other configuration information. In this example the file is called PROF.LST.

The [LEX] configuration section also maps numeric values that may be obtained as a result of the search to <Response> values. In this example there is only one mapping, that is, 1=HaveProfane. This mapping has a numeric value of 1 and a <Response> of HaveProfane.
If the numeric value obtained is 0 the <Response> generated is an empty string. This equates to a <Response> of DefaultDisposal.
If the numeric value obtained is 1 then the <Response> generated is HaveProfane.
[Disposal]
DefaultDisposal=Clean
...
HaveProfane=BlockProfane
...
VIRUSPRESENT=Virus
Each <Response> used in the [LEX] configuration section has a corresponding entry in the [Disposal] section. In this example there is only one entry, for the HaveProfane <Response>. This entry maps the <Response> to a final disposition for the Web data.

Assuming that HaveProfane is the highest priority <Response> generated by validation, the final disposition is BlockProfane.

The BlockProfane disposition has a corresponding configuration section in the same file. This configuration section controls the disposal actions taken.
[BlockProfane]
InformText=Page blocked - content unsuitable
In this example the page is discarded and replaced with a message indicating that the download was not successful. The message text sent is the string specified by the value of the InformText directive.

See page 7-43 for more details on the InformText directive.
In PROF.LST (the ExpressionList file):
"profane_word1" 1
"profane_word2" 1
Each expression is given a numeric value, depending on its considered importance in the search. In this example, each expression is considered to be of equal importance, so each is given the same value, that is, 1.

Each time an expression is found in the data being searched, the associated numeric value is added to a score generated for the message so far. At the end of validation a final numeric score is obtained. This score is used to determine the <Response> generated, by comparing it with the entries listed in the [LEX] configuration section, as explained on page 5-59.

In this example, if any of the expressions listed are detected, even once, the <Response> generated is HaveProfane.
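The scoring process can be sketched in Python as follows. This is a simplified model rather than VALLEX itself, and several details are assumptions: expressions are matched as plain case-sensitive substrings, the list file holds one quoted expression and value per line, and the score selects the mapping with the largest numeric value that it reaches. The actual comparison rules are those given on page 5-59. All function names are hypothetical.

```python
import re

# Assumed line format: a quoted expression followed by its numeric value.
EXPRESSION_LINE = re.compile(r'^"(?P<expr>.+)"\s+(?P<weight>\d+)$')


def parse_expression_list(text):
    """Parse ExpressionList text into (expression, value) pairs."""
    expressions = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        match = EXPRESSION_LINE.match(line)
        if match is None:
            raise ValueError(f"unrecognised line: {line!r}")
        expressions.append((match["expr"], int(match["weight"])))
    return expressions


def score_text(text, expressions):
    """Add an expression's value to the score for each occurrence found."""
    return sum(text.count(expr) * weight for expr, weight in expressions)


def response_for(score, mappings):
    """Pick the <Response> for a final score.

    Assumption: the mapping with the largest value not above the score
    applies; a score below every value yields the empty string.
    """
    reached = [value for value in mappings if value <= score]
    return mappings[max(reached)] if reached else ""


PROF_LST = '"profane_word1" 1\n"profane_word2" 1'
LEX_MAPPINGS = {1: "HaveProfane"}  # from the [LEX] entry 1=HaveProfane
```

Under these assumptions, text containing no listed expression scores 0 and yields an empty <Response>, while a single occurrence of either expression scores 1 and yields HaveProfane.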
See the VALLEX section on page 7-86 and the Disposal section on page 7-22 for more details.
Note: The PerformIf directive in the [LEX] configuration section is used to ensure that lexical analysis is only performed on plain text.
Copyright © 1998, Content Technologies Limited. All rights reserved.