The author runs several websites and uses this module for a lot of tasks. Some special solutions are listed here as practical examples.
There are a lot of other situations where this module can help, or indeed which can only be solved with the help of this module.
- URL Canonicalization
To rewrite a lot of URL notations into their canonical form, e.g.
    requested URL:                  gets rewritten to:
    ---------------------------     ------------------------------
    /                               /e/www/
    /~user                          /u/user
    /{u,g,e}/{user,group,entity}    /{u,g,e}/{user,group,entity}/

which directly corresponds to the filesystem layout on our machines.
    RewriteEngine On

    #   canonicalize the rootdir
    RewriteRule  ^/$                  /e/www/   [R,L]

    #   canonicalize the Unix shorthand for user dirs
    #   (we don't do a 'L'ast command here because below all
    #   /[uge] dirs will be redirected, too)
    RewriteRule  ^/~([^/]+)/?(.*)     /u/$1/$2  [R]

    #   always append / to homedirs if the client forgot it
    #   (if this matches it is the 'L'ast rule and it does a 'R'edirect)
    RewriteRule  ^/([uge])/([^/]+)$   /$1/$2/   [R,L]

    #   enable the Robot Exclusion Standard configuration file
    RewriteRule  ^/robots.txt  /v/sw/free/lib/apache/internal/html/robots.txt

    #   disable getting of .wwwacl, .wwwpasswd and .wwwgroups files
    RewriteRule  .*/\.wwwacl$     /internal/cgi/errors/nph-404-notfound
    RewriteRule  .*/\.wwwpasswd$  /internal/cgi/errors/nph-404-notfound
    RewriteRule  .*/\.wwwgroups$  /internal/cgi/errors/nph-404-notfound
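When experimenting with a ruleset like this, mod_rewrite's own logfile is a great help. This is not part of the original example, and the path is only illustrative:

    RewriteLog      /v/sw/free/lib/apache/logs/rewrite.log
    RewriteLogLevel 9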
- Homogeneous URL Layout
Create a homogeneous and consistent URL layout over all WWW servers of an Intranet webcluster, i.e. all URLs (per definition server-local and thus server-dependent!) actually become server-independent! This is obtained by instructing all servers to redirect URLs of the form

    /{u,g,e}/{user,group,entity}/anypath...

to

    http://physical-host/{u,g,e}/{user,group,entity}/anypath...

when /{u,g,e}/{user,group,entity}/ is not locally valid on the server that received the request, i.e. when the homepage of the user does not reside on that machine. This gives our WWW namespace a consistent layout: no URL has to include any physically correct target server, because every server knows the physical target host and does an external redirect if needed. The knowledge of the target servers comes from (distributed) external maps which are used by mapping functions inside the rewriting rules.
    RewriteEngine On

    #   the map files:
    RewriteMap  user-to-host    txt:/v/sw/free/lib/apache/conf/maps/map.user-to-host
    RewriteMap  group-to-host   txt:/v/sw/free/lib/apache/conf/maps/map.group-to-host
    RewriteMap  entity-to-host  txt:/v/sw/free/lib/apache/conf/maps/map.entity-to-host

    #   and the rules:
    RewriteRule  ^/u/([^/]+)/?(.*)  http://${user-to-host:$1|en2.en.sdm.de}/u/$1/$2
    RewriteRule  ^/g/([^/]+)/?(.*)  http://${group-to-host:$1|en2.en.sdm.de}/g/$1/$2
    RewriteRule  ^/e/([^/]+)/?(.*)  http://${entity-to-host:$1|en2.en.sdm.de}/e/$1/$2

    #   we do an explicit expansion of the effective homedirs by
    #   manually inserting the "UserDir" (see above) into the path!
    #   this gives us the feature of "virtual homedirs", i.e. homedirs
    #   which actually don't have a corresponding user (UID, homedir).
    RewriteRule  ^/([uge])/([^/]+)/?$         /$1/$2/.www/
    RewriteRule  ^/([uge])/([^/]+)/([^.]+.+)  /$1/$2/.www/$3
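For illustration: a txt: map file is plain text with one ``key value'' pair per line (lines starting with # are comments). The entries below are hypothetical; only the fallback host en2.en.sdm.de is taken from the rules above. A map.user-to-host could then look like this:

    ##  map.user-to-host -- illustrative contents only
    #   username    physical host serving his homedir
    rse         en1.en.sdm.de
    netsw       en2.en.sdm.de
    joeuser     en3.en.sdm.de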
- Secure CGI Script Integration
Be able to pipe any script.scgi CGI-program through the popular CGIwrap utility. CGIwrap checks a CGI-program for security problems; if it passes, the CGI-program runs under the UID/GID of its physical owner. This could not be achieved by a simple Action directive (mod_actions), because the executable cgiwrap requires its PATH_INFO in a special form, and not as /u/user/.../script.scgi.
    #   transform our canonical path into the one CGIwrap wants
    RewriteEngine On

    RewriteRule  ^/[uge]/([^/]+)/\.www/(.+)\.scgi(.*)  ...
    ...  /internal/cgi/user/cgiwrap/~$1/$2.scgi$3  [NS,T=application/x-http-cgi]
- Simplification Of Services
To be able to add some string to a URL to start a service which operates on that particular URL. For example: we have a search-engine query form, running as a CGI-program, which receives the directory to operate on via the QUERY_STRING variable ``i''. Usually the user had to reference this program directly and supply an ``i=directory'' QUERY_STRING part inside the URL, e.g. to call the search form for /u/foo/abc/def/ a URL reference to

    /internal/cgi/user/swwidx?i=/u/foo/abc/def/

would have been needed. This was really bad, because the user had to know and hard-code both the location of our search-form CGI script and the location of the directory. With the help of the rewriting module he can now simply reference the URL

    /u/foo/abc/def/swwidx

and this gets rewritten on the fly to the physically needed format. The same technique is used for another tool which extracts information about the particular URL from the local access.log file.
    RewriteEngine On

    RewriteRule  ^/([uge])/([^/]+)(/?.*)/\*      /internal/cgi/user/swwidx?i=/$1/$2$3/
    RewriteRule  ^/([uge])/([^/]+)(/?.*)/swwidx  /internal/cgi/user/swwidx?i=/$1/$2$3/
    RewriteRule  ^/([uge])/([^/]+)(/?.*):swwlog  /internal/cgi/user/swwlog?f=/$1/$2$3
- Backward Compatibility for Obsolete URLs
Suppose you have just renamed the file oldfile.html inside your homepage structure to newfile.html and want the old URL to remain valid, i.e. a request for oldfile.html should deliver the contents of newfile.html. You can achieve this with the following rule in the .htaccess file of the local directory where oldfile.html resides:
    RewriteRule ^oldfile\.html$ newfile.html
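This does a silent internal rewrite, so visitors keep seeing the old URL. If you would rather have the browser learn (and bookmark) the new name, the same rule with the R flag produces an external redirect instead:

    RewriteRule ^oldfile\.html$ newfile.html [R]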
- The Trailing Slash Problem
Every webmaster can sing a song about the problem of the trailing slash on URLs referencing directories. If it is missing, the server returns an error, because if you say /somepath/somedir instead of /somepath/somedir/ the server searches for a file named somedir, and since this file is a directory it complains.
The solution to this subtle problem is to let the server add the trailing slash automatically. To do this correctly we have to use an external redirect, so that the browser correctly requests subsequent images etc. If we only did an internal rewrite, this would only work for the directory page, but would go wrong when any images are included in this page with relative URLs, because the browser would resolve the relative references against the slash-less URL. For instance, a request for image.gif in /somepath/somedir/index.html would become /somepath/image.gif without the external redirect!
So, to do this trick we write:
    RewriteRule ^/somepath/somedir$ /somepath/somedir/ [R]

The crazy and lazy can do the following in the top-level .htaccess file of their homedir:
    RewriteBase  /~userfoo
    RewriteCond  %{REQUEST_FILENAME}  -d
    RewriteRule  ^(.+[^/])$           $1/  [R]
- Map External Stuff into Local Namespace
You can use the internal proxy module of the Apache server to map remote stuff into your local namespace. This gives a more powerful implementation of the ProxyPass directive from mod_proxy. It is activated by the P (proxy) flag.
Suppose we want to map the latest mod_rewrite manual into a subdirectory, say /u/rse/manuals/, i.e. if the URL /u/rse/manuals/mod_rewrite.html is requested from our local server, it should deliver the same content the user would get by requesting http://www.engelschall.com/sw/mod_rewrite/mod_rewrite.html. To achieve this we set up the following rule in /u/rse/manuals:
    RewriteEngine On

    RewriteRule  ^mod_rewrite\.html$  ...
    ...  http://www.engelschall.com/sw/mod_rewrite/mod_rewrite.html  [P]

Or if we want to map the whole mod_rewrite homepage we can do:
    RewriteEngine On

    RewriteRule  ^mod_rewrite/(.*)$  ...
    ...  http://www.engelschall.com/sw/mod_rewrite/$1  [P]

Notice! The proxy feature does not copy the files into your directory; it only looks that way to the user. Instead the content is retrieved internally by the Apache proxy module and perhaps cached internally. This feature is very useful because it allows you to have virtual, up-to-date copies of hot stuff available locally.
- Hardcore Example: net.sw
Here is a hardcore example: a killer application which heavily uses per-directory RewriteRules to get a smooth look and feel on the Web while its data structure is never touched or adjusted.
Background:
net.sw is my archive of freely available Unix software packages, which I started to collect in 1992. It is both my hobby and my job to do this, because while I'm studying computer science I have also worked for many years as a system and network administrator in my spare time. Every week I need some sort of software, so I created a deep hierarchy of directories where I store the packages:
    drwxrwxr-x   2 netsw  users    512 Aug  3 18:39 Audio/
    drwxrwxr-x   2 netsw  users    512 Jul  9 14:37 Benchmark/
    drwxrwxr-x  12 netsw  users    512 Jul  9 00:34 Crypto/
    drwxrwxr-x   5 netsw  users    512 Jul  9 00:41 Database/
    drwxrwxr-x   4 netsw  users    512 Jul 30 19:25 Dicts/
    drwxrwxr-x  10 netsw  users    512 Jul  9 01:54 Graphic/
    drwxrwxr-x   5 netsw  users    512 Jul  9 01:58 Hackers/
    drwxrwxr-x   8 netsw  users    512 Jul  9 03:19 InfoSys/
    drwxrwxr-x   3 netsw  users    512 Jul  9 03:21 Math/
    drwxrwxr-x   3 netsw  users    512 Jul  9 03:24 Misc/
    drwxrwxr-x   9 netsw  users    512 Aug  1 16:33 Network/
    drwxrwxr-x   2 netsw  users    512 Jul  9 05:53 Office/
    drwxrwxr-x   7 netsw  users    512 Jul  9 09:24 SoftEng/
    drwxrwxr-x   7 netsw  users    512 Jul  9 12:17 System/
    drwxrwxr-x  12 netsw  users    512 Aug  3 20:15 Typesetting/
    drwxrwxr-x  10 netsw  users    512 Jul  9 14:08 X11/

In July 1996 I decided to make this 350 MB archive public to the world via a nice Web interface (http://net.sw.engelschall.com/net.sw/). "Nice" means that I wanted to offer an interface where you can browse directly through the archive hierarchy. And "nice" means that I didn't want to change anything inside this hierarchy, not even by putting some CGI scripts at the top of it. Why? Because the above structure is accessible via FTP as well, and I didn't want my CGI scripts to be there.

Solution:
The solution has two parts: The first is a set of CGI scripts which create all the pages at all directory levels on the fly. I put them under /e/netsw/.www/ as follows:
    -rw-r--r--   1 netsw  users    1318 Aug  1 18:10 .wwwacl
    drwxr-xr-x  18 netsw  users     512 Aug  5 15:51 DATA/
    -rw-rw-rw-   1 netsw  users  372982 Aug  5 16:35 LOGFILE
    -rw-r--r--   1 netsw  users     659 Aug  4 09:27 TODO
    -rw-r--r--   1 netsw  users    5697 Aug  1 18:01 netsw-about.html
    -rwxr-xr-x   1 netsw  users     579 Aug  2 10:33 netsw-access.pl
    -rwxr-xr-x   1 netsw  users    1532 Aug  1 17:35 netsw-changes.cgi
    -rwxr-xr-x   1 netsw  users    2866 Aug  5 14:49 netsw-home.cgi
    drwxr-xr-x   2 netsw  users     512 Jul  8 23:47 netsw-img/
    -rwxr-xr-x   1 netsw  users   24050 Aug  5 15:49 netsw-lsdir.cgi
    -rwxr-xr-x   1 netsw  users    1589 Aug  3 18:43 netsw-search.cgi
    -rwxr-xr-x   1 netsw  users    1885 Aug  1 17:41 netsw-tree.cgi
    -rw-r--r--   1 netsw  users     234 Jul 30 16:35 netsw-unlimit.lst

The DATA/ subdirectory holds the above directory structure, i.e. the real net.sw stuff, and gets automatically updated via rdist from time to time.

The second part of the problem remains: how to link these two structures together into one smooth-looking URL tree? We want to hide the DATA/ directory from the user while running the appropriate CGI scripts for the various URLs. This is the solution: first I put the following into the per-directory configuration file in the Document Root of the server to rewrite the announced URL /net.sw/ to the internal path /e/netsw:
    RewriteRule  ^net.sw$       net.sw/     [R]
    RewriteRule  ^net.sw/(.*)$  e/netsw/$1

The first rule is for requests which miss the trailing slash! The second rule does the real thing. And here comes the killer configuration which stays in the per-directory config file /e/netsw/.www/.wwwacl:
    Options       ExecCGI FollowSymLinks Includes MultiViews

    RewriteEngine on

    #   we are reached via /net.sw/ prefix
    RewriteBase   /net.sw/

    #   first we rewrite the root dir to
    #   the handling cgi script
    RewriteRule   ^$                     netsw-home.cgi  [L]
    RewriteRule   ^index\.html$          netsw-home.cgi  [L]

    #   strip out the subdirs when
    #   the browser requests us from perdir pages
    RewriteRule   ^.+/(netsw-[^/]+/.+)$  $1              [L]

    #   and now break the rewriting for local files
    RewriteRule   ^netsw-home\.cgi.*     -               [L]
    RewriteRule   ^netsw-changes\.cgi.*  -               [L]
    RewriteRule   ^netsw-search\.cgi.*   -               [L]
    RewriteRule   ^netsw-tree\.cgi$      -               [L]
    RewriteRule   ^netsw-about\.html$    -               [L]
    RewriteRule   ^netsw-img/.*$         -               [L]

    #   anything else is a subdir which gets handled
    #   by another cgi script
    RewriteRule   !^netsw-lsdir\.cgi.*   -               [C]
    RewriteRule   (.*)                   netsw-lsdir.cgi/$1

Some hints for interpretation:
- Notice the L (last) flag and no substitution field ('-') in the fourth part
- Notice the ! (not) character and the C (chain) flag at the first rule in the last part
- Notice the catch-all pattern in the last rule
- Static HTML, Dynamically Created
We will now present another tricky example: assume we have a CGI-script which generates HTML on the fly, but the generation takes a lot of time and the output is not so time-critical that it must be regenerated on every request; regenerating it from time to time would be enough. So we let the CGI-script, say page.cgi, not only write the HTML code to stdout but also to a file named page.html. Now when a request for page.html comes in, we serve that file if it exists (and is non-empty); if not, we run the CGI-script to (re)generate it. The user should not see any difference. The following config makes this possible:
    RewriteCond  %{REQUEST_FILENAME}  !-s
    RewriteRule  ^page\.html$         page.cgi  [L]
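For illustration, here is a minimal sketch of what such a page.cgi could look like. The script contents are hypothetical (the static @body list stands in for the real, expensive generation step); the point is only that the output goes both to the client and into page.html:

    #!/usr/local/bin/perl
    ##  page.cgi -- hypothetical sketch: send the page to the client
    ##  and cache a copy in page.html for subsequent requests
    $| = 1;

    ##  the HTTP header goes to the client only, never into the cache
    print "Content-type: text/html\n\n";

    ##  stand-in for the real, expensive generation step
    @body = ("<html><body>\n",
             "<h1>expensively generated page</h1>\n",
             "</body></html>\n");

    ##  send the body to the client and into the cache file
    ##  (assumes the server's UID may write into this directory)
    open(CACHE, ">page.html") || die "cannot write page.html: $!";
    foreach $line (@body) {
        print $line;
        print CACHE $line;
    }
    close(CACHE);

Removing page.html, e.g. from a cron job, then forces the next request to regenerate the page.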
- Blocking some URLs
With the help of the "forbidden|F" flag you can block some URLs according to certain conditions. For example, the following config will block all requests for inlined-in-page.gif when the request contains a ``Referer:'' header which does not end in page-with-gif.html. This way, no one can (theoretically!) include your images into their pages. But in practice this is only true if his browser can send a Referer: header.
    RewriteCond  %{HTTP_REFERER}         !.*/page-with-gif\.html$
    RewriteRule  ^inlined-in-page\.gif$  -  [F]

Or you can block all access to a security-sensitive page from the IP address 1.2.3.4:
    RewriteCond  %{REMOTE_ADDR}          ^1\.2\.3\.4$
    RewriteRule  ^security-page\.html$   -  [F]

Or, to get rid of hits from a specific robot on a specific subtree, do the following:
    RewriteCond  %{HTTP_USER_AGENT}  ^HatedFooRobot.*
    RewriteRule  ^/somepath.*        -  [F]
- Programmed Maps
This example shows how to do very complicated URL rewriting which cannot be done with the basic functionality of mod_rewrite. There is a RewriteMap type ``prg'' for programs: with it you can set up dynamic maps, i.e. a program which acts like a map. The program receives one key per lookup on stdin and has to provide the corresponding value as one line on stdout. If it wants to say ``no value found'', it returns the string ``NULL''. Here is a trivial example:
    RewriteMap   foopath-map  prg:/usr/local/lib/apache/maps/foopath.pl
    RewriteRule  ^/foo/(.*)$  /foo/${foopath-map:$1}

This gives us the ability to implement the URL rewriting in an external program. For this example we take a trivial Perl script named foopath.pl with the following contents:
    #!/usr/local/bin/perl
    $| = 1;
    while (<>) {
        s|bar|quux|;
        print $_;
    }

Apache forks foopath.pl once at startup, and mod_rewrite then communicates with this "map" through the stdin/stdout filehandles of foopath.pl. In the example above this rewrites /foo/bar/test to /foo/quux/test.
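Note that the trivial script above never actually answers ``NULL''. Purely as a sketch, a slightly more careful variant could perform the same substitution but give the documented ``NULL'' reply when there is nothing to rewrite; a failed lookup can then be covered with the ${map:key|default} syntax shown earlier:

    #!/usr/local/bin/perl
    ##  foopath.pl -- sketch of a prg: map with explicit "NULL" replies
    $| = 1;                      # stdout must be unbuffered!
    while (<STDIN>) {
        chomp;
        if (s|bar|quux|) {       # key could be rewritten: answer the new value
            print "$_\n";
        }
        else {                   # nothing to rewrite: the documented "no value"
            print "NULL\n";
        }
    }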
- Partially Forwarded Homepages
Another common situation is the following: you have two webservers, say www.company.dom and www2.company.dom, and each holds its own variant of the homepages, i.e. the users (or even just some directories or some files) are spread over the two machines. Now you want to provide all pages virtually through www.company.dom. This can be achieved with the following configuration (assuming all homepages stay under ``/home'' and the UserDir is ``.www''):
    ProxyRequests on

    RewriteEngine on
    RewriteRule  ^/~([^/]+)/?(.*)  /home/$1/.www/$2

    RewriteCond  %{REQUEST_FILENAME}  !-f
    RewriteCond  %{REQUEST_FILENAME}  !-d
    RewriteRule  ^/home/([^/]+)/\.www/?(.*)  http://www2.company.dom/~$1/$2  [P]