27 June 1999: SRE-http and caching.
SRE-http ver 1.3h supports several forms of caching. This document outlines
what levels of caching may apply to a request, and what you can do
to increase (or decrease) the extent to which caches answer requests.
Hint: The appendix discusses how to configure the "cache relevant"
response headers used by SRE-http.
There are several different sorts of caches that may apply. In order of decreasing
universality, these include:
1) Proxy server caches.
For purposes of this discussion, a "proxy server" is any intermediate
site, somewhere on the web, that may handle a request issued by a client.
These sites may store responses, and use these cached responses the next
time the same request is received. When such a stored response is used,
the origin server is typically not contacted (the origin server does not
know that the proxy delivered content to a client).
** Perhaps the principal advantage of http/1.1 (over http/1.0) is the
** attention given to making the web proxy-cache friendly.
The appendix discusses how to configure the response headers used by
SRE-http to "talk to proxy servers".
2) The GoServe cache.
The GoServe cache consists of a list that matches selectors (the local
portion of a URI) to filenames. When a request for the same selector
arrives, GoServe can resolve the request by sending the matched file
(and a few http/1.0 response headers). As an option, the GoServe cache
can "run the filter anyways", which allows the filter to perform post-filter
actions (such as auditing).
3) The SREPROXY cache.
SREPROXY is a front-end to SRE-http. SREPROXY maintains a cache that matches
selectors to files. These files may be temporary files (say, as generated
by adding SSIs to an HTML document). In addition, SREPROXY can resolve a
few "dynamic" SSIs (such as the current time), and can do a limited amount
of access control.
4) The SSI and !DIR caches.
SREFILTR (the main filter) maintains a cache for SSI documents (that contains
"partially compiled" server side includes) and a cache for !DIR requests
(that contains directory listings). These are used when a matching selector
is received. Note that the SSI cache is often used as a base to which
dynamic SSIs are added, where "dynamic SSIs" refers to information that
changes on a request-specific basis (e.g., the current time, the client's
IP address, and output from INTERPRET SSIs).
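To make the "partially compiled" idea concrete, here is a minimal Python sketch of an SSI-style cache. It assumes a cached document is stored as a list of parts: static text that was already expanded when the document was cached, plus ("dynamic", name) markers that are evaluated afresh for every request. The handler names (client_ip, current_time), the ssi_cache dictionary, and render() are hypothetical illustrations, not SRE-http's own code or SSI syntax.

```python
from typing import Callable, Dict, List, Tuple, Union

# A cached part is either static text (expanded once, when the document was
# cached) or a ("dynamic", name) marker that must be evaluated per request.
Part = Union[str, Tuple[str, str]]

# Hypothetical handlers for two request-specific values; SRE-http's own
# dynamic SSI keywords are not reproduced here.
DYNAMIC_HANDLERS: Dict[str, Callable[[dict], str]] = {
    "client_ip": lambda request: request["client_ip"],
    "current_time": lambda request: request["current_time"],
}

# selector -> "partially compiled" document
ssi_cache: Dict[str, List[Part]] = {}


def render(selector: str, request: dict) -> str:
    """Build a response from the cached parts plus fresh dynamic values."""
    out = []
    for part in ssi_cache[selector]:
        if isinstance(part, str):
            out.append(part)  # static text: reused as-is from the cache
        else:
            _, name = part
            out.append(DYNAMIC_HANDLERS[name](request))  # computed per request
    return "".join(out)
```

For example, after `ssi_cache["/index.htm"] = ["<p>Your address: ", ("dynamic", "client_ip"), "</p>"]`, a call to `render("/index.htm", {"client_ip": "10.0.0.1", "current_time": "12:00"})` returns `"<p>Your address: 10.0.0.1</p>"`. Only the dynamic markers cost work on each request; the expensive static expansion happened once.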
The basic notion behind the use of a cache is to reduce processing
requirements and bandwidth demands. Proxy caches are highly effective at
both -- when successful, no communication with the origin server is
necessary. The GoServe cache does not save bandwidth, but can reduce
server load considerably (by skipping the "call the filter to resolve this
request" step). SREPROXY is similar -- although it is a filter that has
to be called, it's much smaller and faster than the regular (SREFILTR)
filter. Lastly, the SSI and !DIR caches can save a lot of processing for
SSI-including and directory-listing "processor intensive" resources.
Each of these caches has advantages and disadvantages.
Proxy Caches:
Advantages
* Very fast response times
* Can completely eliminate load on your server
* Helps reduce internet traffic
Disadvantages
* Should not be used with actively changing, or access controlled,
resources
* Should not be used when accurate auditing is important
GoServe Cache
Advantages:
* Response times are very fast (compared with SREFILTR)
* Minimizes load on your server
Disadvantages:
* Should not be used with actively changing, or access controlled,
resources
* Currently, the GoServe cache is http/1.0, but not http/1.1,
compliant.
SREPROXY cache:
Advantages:
* Response times are fast
* Can reduce load (since SREPROXY is smaller than SREFILTR)
* Can be used with changing and access controlled resources
* No loss of functionality -- when in doubt, SREFILTR is used
Disadvantages:
* Introduces another round of processing -- if a request does not
match a cached entry, the net result is to diminish response time.
* On occasion, a stale response may be returned
SSI and !DIR caches:
Advantages:
* Fully functional -- changes are immediately detected
* Greatly reduces processing for a subset of otherwise processing
intensive requests.
Disadvantages:
* On rare occasions, stale responses may be returned
It should be stressed that these caches are not mutually exclusive. In fact,
a typical scenario would have the three higher caches (proxy servers, GoServe,
and SREPROXY) examining a request, which may then be resolved via the use of
the SSI (or !DIR) cache. Thus, optimal performance is achieved by using each
cache in a complementary fashion.
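The layering described above can be pictured as a chain of lookups, where each layer either answers or passes the request on. The following Python sketch models only the server-side layers (proxy servers sit elsewhere on the web and are not modelled); the layer names, the resolve() function, and the use of plain dictionaries as caches are illustrative assumptions, not how GoServe or SRE-http are actually wired together.

```python
from typing import Callable, List, Optional, Tuple

# Each layer returns a response for a selector, or None if it cannot answer.
Layer = Tuple[str, Callable[[str], Optional[str]]]


def resolve(selector: str, layers: List[Layer]) -> Tuple[str, str]:
    """Ask each cache layer in order; the first one that answers wins."""
    for name, lookup in layers:
        response = lookup(selector)
        if response is not None:
            return name, response
    raise LookupError(f"no layer could resolve {selector!r}")
```

Usage, with a hypothetical last layer standing in for the full filter, which can always produce a response:

```python
goserve_cache = {"/bg.gif": "<gif bytes>"}
layers = [
    ("goserve", goserve_cache.get),
    ("sreproxy", {}.get),
    ("srefiltr", lambda s: "generated " + s),
]
resolve("/bg.gif", layers)    # ("goserve", "<gif bytes>")
resolve("/page.htm", layers)  # ("srefiltr", "generated /page.htm")
```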
The following discusses some tricks and techniques you can use. In addition,
the appendix discusses the "cache relevant" response headers used by SRE-http.
Proxy Servers:
* If you have a very dynamic site of non-access controlled resources,
transparency concerns may override the desire for faster throughput.
That is, you might want to suppress all proxy caching.
This can be accomplished by setting proxy_cache=0 (in INIT_STA.80)
Alternatively, you can use proxy_cache to "force revalidation."
See the appendix for more details, or see the description of the
PROXY_CACHE variable in INITFILT.DOC.
* SRE-http will automatically suppress proxy caching whenever access controls
(such as CHECKLOG and ALLOW_ACCESS), or dynamic SSIs, apply to the resource.
If desired, you can explicitly allow these resources to be cached: just
include a CACHE (or CACHE*) "permission" in a selector-specific entry
in ACCESS.IN (or in ATTRIBS.CFG). Alternatively, resources listed
as PUBLIC URLS (using PUBURLS.IN or ATTRIBS.CFG) are assumed to
be cacheable by proxy caches.
* See HITMETER.DOC for hints on how to resolve problems associated
with accurate metering of hits when proxy servers may be active.
GoServe cache:
* If you do enable the GoServe cache, be aware that it uses an http/1.0
response algorithm. Thus, your site will sometimes return http/1.1
responses, and sometimes http/1.0 responses. Although this is not
fatal, it may have strange impacts (and it's somewhat aesthetically
displeasing).
Therefore, SRE-http will only use the GoServe cache (that is, allow a
request to be cached by GoServe) when a CACHE* permission exists.
Alternatively, resources listed as LITERAL_NORECORD PUBLIC URLS
(in PUBLURL.IN) are assumed to be cacheable by the GoServe cache.
* In general, we recommend using the GoServe cache only for resources that
you do not care to audit (such as backgrounds and icons). In this vein,
we recommend checking the "do not call filter" GoServe caching option.
* Future releases of GoServe may upgrade the GoServe cache, so that it
returns appropriate http/1.1 response headers.
* The GoServe cache ignores TE: request headers.
SREPROXY:
* If your site is highly access controlled, or consists primarily of dynamic
HTML documents (with lots of SSIs) or addons/cgi-bin scripts, then use
of SREPROXY may hurt (increase) response times.
* NUSTATUS contains an option that will display simple statistics on
the proportion of requests satisfied by SREPROXY.
* SREPROXY.DOC contains a detailed discussion on how to use SREPROXY.
* If SREPROXY detects a TE: GZIP request header, it will NOT resolve
the request.
SSI and !DIR caches:
* There is almost no reason not to use these caches....
the exceptions being:
i) You have lots of HTML documents, and not much extra disk space
ii) Your documents change rapidly (have lots of dynamic SSIs).
iii) HTML files are constantly being edited, added, and removed.
-----------------------------
Appendix: Cache relevant response headers used by SRE-http
There are several ways to affect the "cache relevant" response headers returned
by SRE-http:
a) the setting of the PROXY_CACHE variable (in INIT_STA.80)
b) the setting of the FIX_EXPIRE variable (in INITFILT.80)
c) the use of PUBLIC_URLS
d) the use of the NOCACHE, CACHE and CACHE* selector-specific permissions
e) the use of selector-specific advanced options to specify an explicit
response header.
Note that these are used for "normal" responses -- cgi scripts and
addons may override these rules, and provide their own headers.
This listing goes from general to more specific -- with the setting of PROXY_CACHE
controlling default behavior, whereas specification of selector-specific
advanced options can be used to override these defaults.
I) PROXY_CACHE:
PROXY_CACHE can take 4 basic values, which yield the following
"default" response headers.
0= disallow caching
If this is a dynamic file (i.e., contains dynamic SSIs)
Cache-control: no-cache
otherwise
Cache-control: private
1= allow caching
Cache-control: public
2= allow caching, with revalidation
Cache-control: public,max-age=0
If this is a dynamic resource (i.e.; an HTML document with SSIs),
the following "stronger form" is used instead:
Cache-control: public,max-age=0, must-revalidate
3= allow proxy caching with revalidation, full caching by private caches
(private caches include the "browser's cache")
Cache-control:public,s-maxage=0
I.1) A modification: If PROXY_CACHE=n_mmmmm, where n=0, 1, 2, or 3, and mmmmm is
an integer number of seconds, then the following modifications occur:
1: Cache-control: public,max-age=mmmmm
2: Cache-control: public,max-age=mmmmm, or
Cache-control: public,max-age=mmmmm, must-revalidate
3: Cache-control:public,s-maxage=mmmmm
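Rule I and its I.1 modification amount to a small lookup, sketched below in Python. The sketch assumes the setting arrives as a string such as "2" or "1_3600", and that the caller already knows whether the response is "dynamic". The function name and the ", " spacing between directives are illustrative choices (spacing after commas is not significant in Cache-control values); this is not SRE-http's own code. Since the text above does not say what n_mmmmm does for n=0, the sketch treats it the same as plain 0.

```python
def proxy_cache_header(proxy_cache: str, dynamic: bool) -> str:
    """Map a PROXY_CACHE setting ("n" or "n_mmmmm") to a Cache-control value."""
    mode, _, seconds = proxy_cache.partition("_")
    n = int(mode)
    age = int(seconds) if seconds else 0  # plain "2" or "3" behaves like mmmmm=0

    if n == 0:
        # Disallow caching; assumption: any _mmmmm suffix is ignored here.
        return "no-cache" if dynamic else "private"
    if n == 1:
        return f"public, max-age={age}" if seconds else "public"
    if n == 2:
        value = f"public, max-age={age}"
        # Dynamic resources get the stronger must-revalidate form.
        return value + ", must-revalidate" if dynamic else value
    if n == 3:
        # Proxies revalidate after s-maxage; private caches may ignore it.
        return f"public, s-maxage={age}"
    raise ValueError(f"unsupported PROXY_CACHE value: {proxy_cache!r}")
```

For instance, `proxy_cache_header("2", dynamic=True)` gives `"public, max-age=0, must-revalidate"`, and `proxy_cache_header("1_3600", dynamic=False)` gives `"public, max-age=3600"`.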
II) PUBLIC_URL
If this selector is a "PUBLIC_URL" (i.e.; belongs to a PUBLIC realm, as
specified in ATTRIBS.CFG), then rule I is ignored. Instead:
Cache-control: public,s-maxage=mmmmm
where mmmmm is either 0, or the mmmmm value from the proxy_cache variable.
II.1) A Modification: If the PUBLIC_URL is a "literal" public_url, then
use the following "re-validate in 1 day" header:
Cache-control: public, max-age=86400
III) If a NOCACHE, CACHE or CACHE* permission is used, then the I and II
rules are ignored. Instead, use the following:
NOCACHE: use
Cache-control: no-cache
Pragma: nocache
CACHE: use
Cache-control: public,s-maxage=mmmmm
(mmmmm is from proxy_cache)
CACHE*: use
Cache-control: public
Note:
* only one of these should be specified. Should a mistake occur, with
more than one specified, then NOCACHE overrides CACHE*, and CACHE*
overrides CACHE.
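Rules II and III form a precedence chain: a permission beats a PUBLIC_URL setting, which in turn beats the PROXY_CACHE default. The Python sketch below encodes that order, returning None when neither rule applies (so rule I would decide). The function and parameter names are hypothetical, `seconds` stands for the mmmmm value taken from PROXY_CACHE, and only the Cache-control value is modelled (the Pragma header that NOCACHE also adds is left out).

```python
from typing import Iterable, Optional


def permission_header(permissions: Iterable[str], public_url: bool = False,
                      literal: bool = False, seconds: int = 0) -> Optional[str]:
    """Apply rules III and II; None means fall back to the PROXY_CACHE rule."""
    perms = {p.upper() for p in permissions}
    # Rule III: if several permissions are (mistakenly) given,
    # NOCACHE beats CACHE*, and CACHE* beats CACHE.
    if "NOCACHE" in perms:
        return "no-cache"
    if "CACHE*" in perms:
        return "public"
    if "CACHE" in perms:
        return f"public, s-maxage={seconds}"
    # Rule II: PUBLIC_URLs (literal ones revalidate after one day).
    if public_url and literal:
        return "public, max-age=86400"
    if public_url:
        return f"public, s-maxage={seconds}"
    return None
```

So `permission_header(["CACHE", "NOCACHE"])` yields `"no-cache"`, while `permission_header([])` yields None and leaves the decision to PROXY_CACHE.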
IV) FIX_EXPIRE
If FIX_EXPIRE is specified, and this is a "dynamic" response, then
Expires: current_time+fix_expire
is also added.
If FIX_EXPIRE is not specified, and either
i) PROXY_CACHE is 0, 2, or 3, or
ii) a NOCACHE permission is specified,
then
Expires: current_time
is also added.
Note that "Expires: current_time" implies immediate expiration.
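Rule IV can be sketched in Python using the standard library's `email.utils.formatdate`, which produces the GMT date format that HTTP expects. Two assumptions are made here and should be checked against INITFILT.DOC: FIX_EXPIRE is treated as a number of seconds, and when FIX_EXPIRE is set but the response is not dynamic, no Expires: header is produced (the text above does not cover that case). The function and parameter names are hypothetical.

```python
from email.utils import formatdate
from typing import Optional


def expires_header(now: float, fix_expire: Optional[int], dynamic: bool,
                   proxy_cache_mode: int, nocache: bool) -> Optional[str]:
    """Return an Expires: value per rule IV, or None if none is added."""
    if fix_expire is not None:
        # Assumption: FIX_EXPIRE is a count of seconds; non-dynamic
        # responses get no Expires: header in this sketch.
        return formatdate(now + fix_expire, usegmt=True) if dynamic else None
    if proxy_cache_mode in (0, 2, 3) or nocache:
        return formatdate(now, usegmt=True)  # "current_time": expire immediately
    return None
```

With `now=0` (the Unix epoch), `expires_header(0, None, False, 0, False)` returns `"Thu, 01 Jan 1970 00:00:00 GMT"`.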
V) Advanced options
If you specify a header as a "selector specific" advanced option, then
the matching header will be suppressed, and the header you specify will be
used instead.
This allows you to fine tune your "cache relevant" response headers.
For example:
Header add Cache-control: public,max-age=100000
means:
"ignore rules I, II, and III, and use
Cache-control: public, max-age=100000"
Header add Expires: Sat, 20 Jun 1998 10:11:12 GMT
means:
"ignore rule IV, and use
Expires: Sat, 20 Jun 1998 10:11:12 GMT"
(a date in the past means "immediate expiration")
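The override behaviour of rule V (a selector-specific header suppresses the default header of the same name) can be sketched as a merge of two header lists. The Python below assumes the option text has exactly the "Header add Name: value" shape shown in the examples above; both function names are hypothetical, and header names are compared case-insensitively, as HTTP requires.

```python
from typing import List, Tuple

Header = Tuple[str, str]


def parse_header_option(line: str) -> Header:
    """Parse a 'Header add Name: value' option line into (name, value)."""
    prefix = "header add "
    if not line.lower().startswith(prefix):
        raise ValueError(f"not a 'Header add' option: {line!r}")
    name, _, value = line[len(prefix):].partition(":")
    return name.strip(), value.strip()


def apply_overrides(defaults: List[Header], overrides: List[Header]) -> List[Header]:
    """Drop defaults whose names match an override, then append the overrides."""
    override_names = {name.lower() for name, _ in overrides}
    kept = [(n, v) for n, v in defaults if n.lower() not in override_names]
    return kept + list(overrides)
```

For example, overriding the defaults `[("Cache-control", "public"), ("Expires", "...")]` with the parsed first example leaves the Expires: header alone and replaces only Cache-control.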
Notes:
* Summarizing the more important Cache-control directives:
NO-CACHE: a cache must not use this response without first
re-validating it with the origin server
PUBLIC: this can be cached in a public place (a proxy server cache)
PRIVATE: this can be cached by user-agents, or other "client side"
caches
MAX-AGE: after this response is this many seconds old, all caches
must re-validate
S-MAXAGE: after this response is this many seconds old, proxy (non-
private) caches must re-validate this response. Private
caches can ignore this directive.
MUST-REVALIDATE: a strong re-validation (discourages caches from
tolerating stale responses)
Revalidation typically means sending a If-modified-since
request to the origin server; which SRE-http can quickly
answer if there has been no change.
For further details, please see the http/1.1 specification.
* When a response carries both, http/1.1 caches give a Cache-control:
max-age (or s-maxage) directive precedence over an Expires: response
header. http/1.0 proxies will typically ignore a Cache-control: response
header.
* resources subject to content negotiation will often add a Vary: header.
The Vary header lists the request headers that MUST match (in addition
to the URI); such as the Accept and Accept-Language request headers.
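The notes above describe how a cache reads these headers. The Python sketch below shows two pieces of that logic: working out how long a stored response may be reused without revalidation (s-maxage wins over max-age for shared caches and is ignored by private ones), and building a cache key from the URI plus the request headers named in Vary:. It is a simplified illustration, not a complete http/1.1 cache: Expires:, heuristic freshness, and must-revalidate handling are left out, and all function names are hypothetical.

```python
from typing import Dict, Optional, Tuple


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-control value into {directive: argument-or-None}."""
    directives: Dict[str, Optional[str]] = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, arg = item.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def freshness_lifetime(cache_control: str, shared: bool) -> Optional[int]:
    """Seconds a stored response may be reused without revalidation.

    Returns None when the header sets no explicit lifetime; a real cache
    would then fall back to Expires: or its own heuristics.
    """
    d = parse_cache_control(cache_control)
    if "no-cache" in d or (shared and "private" in d):
        # no-cache: always revalidate. A shared cache should not store a
        # "private" response at all; 0 is a conservative stand-in here.
        return 0
    if shared and d.get("s-maxage") is not None:
        return int(d["s-maxage"])  # s-maxage overrides max-age for proxies
    if d.get("max-age") is not None:
        return int(d["max-age"])
    return None  # e.g. a private cache, which ignores s-maxage


def cache_key(uri: str, vary: str,
              request_headers: Dict[str, str]) -> Optional[Tuple]:
    """Key a stored response by URI plus the request headers named in Vary."""
    names = sorted({n.strip().lower() for n in vary.split(",") if n.strip()})
    if "*" in names:
        return None  # Vary: * means the stored response never matches
    headers = {k.lower(): v for k, v in request_headers.items()}
    return (uri,) + tuple((n, headers.get(n, "")) for n in names)
```

With PROXY_CACHE=3's `"public,s-maxage=0"`, `freshness_lifetime(..., shared=True)` is 0 (proxies revalidate every time), while a private cache gets None and is free to apply its own policy. This matches the "full caching by private caches" description of that setting.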