INTERNET-DRAFT
Vinod Valloppillil
Microsoft Corporation
Josh Cohen
Netscape Communications
21 April 1997
Expires October 1997

Hierarchical HTTP Routing Protocol

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

Abstract

Recent interest in finding solutions for traffic problems stemming from HTTP has centered around the use of cooperating proxy-caches. We contend that by using a deterministic, hash-based approach for routing URLs within an "array" of proxy servers, many of the benefits of alternative cache cooperation protocols (such as ICP) may be realized.

As an example of such an implementation we propose the use of "Proxy Client Configuration Files" between proxy servers in order to exchange routing information. This implementation is motivated in part by the adoption of this file by existing, popular web browsers to provide intelligent URL request routing.

This draft discusses adopting this well-understood, widely implemented browser protocol by web proxies in order to facilitate intelligent routing of requests within a network of proxy servers.

Valloppillil & Cohen [Page 1]
INTERNET-DRAFT Hierarchical HTTP Routing Protocol 21 April 1997

1. Introduction

There is significant interest in the Internet community, and in the ICP working group in particular, in finding mechanisms by which the public caches on individual proxy servers can be further aggregated and shared by as many browsers as possible.

Philosophically, protocols such as ICPv2 are based on dynamic "pinging" of neighboring proxy servers in an attempt to locate copies of cached objects. We propose an alternate approach based on hash-based routing of URLs.

The hash-based routing approach documented here uses a known "request resolution path" through a network of proxies that is determined by the URL of the request. An interesting side effect of this deterministic mechanism is that cache duplication is avoided. Hashing distributes the URL space among several proxies which are assumed to be relatively equidistant from each other.

Additionally, this hash-based approach is more tuned for "hierarchical" deployments of proxy servers. One example of this might be a departmental-level proxy which routes into an "array" of top-level proxies in a corporation, which in turn provide the gateway to an ISP. The ISP might operate another "array" of proxies at its POP. By contrast, ICP networks typically involve peered caches which may operate at the top level of many ISP hierarchies.

As an example of an implementation of hash-based routing, we propose extending the existing "Proxy Client Configuration File" protocol used by browsers to intelligently route HTTP requests. Our proposal would implement this protocol on proxy servers in order to provide a vendor-independent mechanism for specifying sophisticated hop-by-hop HTTP routing between groups of proxy servers.

We also demonstrate that intelligent utilization of this routing protocol can yield almost all of the benefits of alternative cache cooperation protocols.

We do NOT propose any specific routing scripts and instead leave determination of such scripts up to individual vendor implementations.
Although there are clear advantages to the use of the Proxy Client Configuration File as the vehicle for transporting routing information, there may be interest in the working group in exploring other vehicles (e.g. publishing a static data table containing proxies in an "array" implementing a well-known hash function within proxies).

2. Proxy Client Configuration File

The Proxy Client Configuration File is described in [1] and [2]. Additionally, multiple interoperable implementations of this protocol are available in popular client browsers.

As originally constructed, this file is intended for consumption by client programs (web browsers) and is evaluated per URL to be retrieved by the browser. The output of this script provides an ordered series of proxy servers to be used by the browser to retrieve the object specified by the URL.

One of the excellent properties of the HTTP proxy protocol [5] is that it exposes proxy servers to upstream servers and upstream proxies as regular clients. Because the administrator of a group of proxies may wish to make assumptions about a downstream client's ability to interpret a script, we wish to extend the metaphor to include use of the configuration file by proxies as well as "classical" clients.

3. Example implementation

Researchers have documented the concept of using client-side hash-based routing to spread load across multiple proxy servers. The deterministic nature of many of these algorithms has the additional benefit of improving cache hit rates by creating the image of a single logical cache spread over many proxies [4].

In this proposal, the administrator of an "array" of proxies at an ISP may wish to construct a script that hashes URLs and distributes the hash space across each of his/her proxy servers.
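As a rough illustration only (this draft deliberately leaves actual routing scripts to individual implementations), such a script might resemble the sketch below. The proxy host names, the hashUrl helper, and its multiply-by-31 hash are assumptions made for this example and are not specified anywhere in this document; the FindProxyForURL entry point and the "PROXY host:port" return syntax follow the configuration file format described in [1].

```javascript
// Hypothetical members of the ISP's proxy "array" (illustrative names only).
var PROXIES = [
  "proxy1.isp.example:8080",
  "proxy2.isp.example:8080",
  "proxy3.isp.example:8080"
];

// Illustrative string hash; the draft does not mandate any particular
// hash function. The result is kept within unsigned 32 bits.
function hashUrl(url) {
  var h = 0;
  for (var i = 0; i < url.length; i++) {
    h = (h * 31 + url.charCodeAt(i)) >>> 0;
  }
  return h;
}

// Entry point evaluated once per URL by the browser or downstream proxy.
// Returns an ordered list: the array member that owns this URL's slice of
// the hash space first, then the remaining members as fallbacks.
function FindProxyForURL(url, host) {
  var n = PROXIES.length;
  var first = hashUrl(url) % n;
  var result = [];
  for (var k = 0; k < n; k++) {
    result.push("PROXY " + PROXIES[(first + k) % n]);
  }
  return result.join("; ");
}
```

Because every downstream client or proxy that evaluates this script computes the same hash for a given URL, each URL is consistently forwarded to the same array member, and the other members appear only as ordered fallbacks.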
Using the same downstream script, the administrator should be able to service both dial-in clients (whose browsers already support the protocol) and leased lines to corporate proxies.

The hop-by-hop nature of the routing provides additional flexibility in this example. The corporation may wish to use one particular routing script internally (one which tells clients to directly access intranet content, for example) whereas the ISP may wish for the corporation's proxy servers to use a different script to route into the ISP's proxies (one which routes all requests through the caches for maximum hit rates).

4. Security Considerations

Security issues are not directly addressed in this document. Any security functionality is derived from the underlying HTTP layer.

Some consideration may need to be given to ensuring the integrity and security of the initial script passing. More specifically, this draft doesn't address issues that may stem from the possibility that malicious scripts may be constructed.

5. Advantages of script-based routing vs. ICP v2

We now provide a comparison of this proposal vs. the current Internet Cache Protocol draft [3].

a. Symmetric protocol between client -> proxy and proxy -> proxy

This preserves the symmetry of HTTP's presentation of proxy servers as "mega clients" to upstream servers and proxies. ICP is not currently processed or generated by client browsers.

b. Eliminate messages for cache 'miss' events.

A very significant percentage of all ICP messages exchanged in the field are cache "misses." [NLANR's field experience indicates that 85-90% of all ICP transactions are "misses".] Because this protocol eliminates querying, miss messages no longer occur (the outcome of every forward is now either "cache hit" or "continue resolving upstream").

c. Takes advantage of all HTTP work including options, cache-control, authentication, etc.
HTTP already provides protocol options to perform functions such as proxy-to-proxy authentication. These functions don't have to be re-invented. Additionally, much of the new behavior in the HTTP 1.1 cache-control headers is not expressible in ICPv2. Forwarding the entire HTTP request to the next upstream or neighboring proxy allows it to be privy to these options.

d. Already implemented on the browser

Eases compliance testing and demonstrates soundness of the protocol (in a limited case).

e. Sorted requests between proxies = single logical cache

Over time, assuming that URL requests are randomly routed (e.g. round robin DNS) to a set of peer ICP neighbors (e.g. on a LAN within an ISP's head-end), the contents of these neighboring caches will eventually become roughly identical.

A deterministic hash-based routing scheme, however, provides for a single logical cache image across 'n' proxies instead of 'n' identical caches. ICP's peer-to-peer queries are replaced by intelligent request routing in the previous level of the hierarchy.

f. No new transport protocols

The behavior of HTTP is already well understood by system administrators and is already passed through firewalls. By contrast, ICP is relatively unknown in the vast majority of intranets, which may affect speed of deployment.

In general, the development and deployment of new wire protocols should be a carefully evaluated endeavor due to huge support costs and "entropy" effects on corporate networks.

6. Advantages of ICP v2 vs. script-based routing

a. Exchange of messages over WAN

ICP is sometimes used across very wide area links to perform cache look-ups. An example of this might be peered top-level caches between two overseas ISPs. This protocol is intended more for use by proxies that are in relative proximity to each other.

One critical question is whether these transoceanic cache look-ups are worth their cost.
This is especially a concern given the opportunity to build larger caches within a traditional cache hierarchy. Do large local caches "skim" most of the potential cache hits? This question could be answered with some idea of the hit rate for ICP over WAN links between very large peer caches.

b. Exchange of messages across peer administrative domains

Correct implementation of the proxy configuration script is in part dependent on having a series of proxies within the same administrative domain which share their logical cache. Because ICP maintains a very loose relationship between neighbors, it is easier to implement across such domains.

However, once again, the question of whether anything more than 2 or 3 levels of cache look-ups is valuable becomes pertinent. If not, then a 2-3 level hierarchical array of proxies within corporations and ISPs might be sufficient for maximum cache hit rates.

c. Binary protocol

ICP is clearly faster and easier to parse than HTTP due to its binary nature. However, the construction of efficient HTTP engines is already at a premium due to the wide deployment of the protocol.

d. Connectionless transport

ICP can be, and often is, transported over UDP, which is lighter weight than HTTP's TCP connection. Much of the resulting disadvantage for HTTP may be mitigated by performance optimizations such as keep-alives and pipelining.

Additionally, notice that in the case of a cache hit, ICP may require construction of a TCP connection to transport the requested object. Furthermore, the lack of congestion control on ICP messages is the obvious downside of connectionless transport. In this scheme, connections between proxy servers would almost certainly be HTTP Keep-Alive sessions.

e. Failure case benefit.

If, for some reason, the ICP cache that holds a URL is too slow to respond or is down, an alternate cache will be used to fulfill the request.
It is likely that this cache will cache the results. At any later point in time, this cache will respond with a HIT message when queried about the URL. This allows very busy URLs to be spread among multiple caches and stems from the non-deterministic nature of the protocol.

In the hashing scheme, if a busy set of URLs is assigned to one cache via the hash, and that server is too slow or down, another cache will handle and cache that request. Unfortunately, that cached version is of no further use to any clients or proxies, since they will never go to that proxy again if it does not match the hash function.

f. Server distance determination

In the field, a secondary benefit of ICP has been use of its UDP round-trip times as a means of gauging relative distance between peer caches. Because hash-based routing relies on TCP and implies hierarchies known a priori, this feature of ICP isn't realized.

g. Current installed base

ICP currently has an installed base of ~3000 proxies.

7. Open Issues

As specified via Proxy Client Configuration files, there are two primary open issues associated with this protocol:

1) Standardization of the Proxy Client Configuration file. Currently, this protocol is only a de facto standard and has not been formally accepted or endorsed by the IETF.

2) Performance of script evaluation on proxy servers. There are potentially significant issues with evaluating proxy configuration scripts per URL processed by a proxy server. Requiring an interpreter for JavaScript [1] may be outside of the bounds of the working group. Additionally, performance of the script and script interpreter may be a significant cost for proxy servers which need to handle high transaction volumes.

8. Acknowledgements

The authors would like to thank Brian Smith, Kip Compton, Ari Luotonen, and Kerry Schwartz for their assistance in preparing this document.
9. References

[1] Luotonen, Ari, "Navigator Proxy Auto-Config File Format", Netscape Corporation, http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/proxy-live.html, March 1996.

[2] Microsoft Corporation, "Automatic Proxy Configuration", http://www.microsoft.com/ie/ieak/autosys.htm, March 21, 1997.

[3] Wessels, Duane, "Internet Cache Protocol Version 2", http://ds.internic.net/internet-drafts/draft-wessels-icp-v2-00.txt, March 21, 1997.

[4] Sharp Corporation, "Super Proxy Script", http://naragw.sharp.co.jp/sps/, August 9, 1996.

[5] Fielding, R., et al., "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2068, UC Irvine, January 1997.

10. Author Information

Vinod Valloppillil
Microsoft Corporation
One Microsoft Way
Redmond, WA 98052
Phone: 1.206.703.3460
Email: VinodV@Microsoft.Com

Josh Cohen
Netscape Communications Corporation
501 E. Middlefield Rd.
Mountain View, CA 94043
Phone: 1.415.937.4157
Email: Josh@Netscape.Com