home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
Internet Info 1997 December
/
Internet_Info_CD-ROM_Walnut_Creek_December_1997.iso
/
www_text
/
www-servers.txt
< prev
next >
Wrap
Text File
|
1996-07-29
|
15KB
|
325 lines
An Information Provider's Guide to Web Servers
INTRODUCTION
This document is an introduction to the programs that provide
information on the World Wide Web. It is not an introduction to the Web
--- see the parallel document ``World Wide Web Primer'' for this (see
the section ``How to obtain this document'' for instructions on
obtaining it). It describes the current HTTP servers and their relative
features, as well as discussing whether an HTTP is even necessary.
This document is available on the Web, as well as being posted
fortnightly to the Usenet newsgroups comp.infosystems.www,
alt.hypertext, news.answers, comp.answers and alt.answers. It is
available as LaTeX, plain ASCII, DVI and Postscript files via anonymous
ftp. For instructions on retrieving the latest version of this
document, consult the last section, called ``How to obtain this
document''.
This document was last revised on Wed Sep 15 15:26:45 NZT 1993by Nathan
Torkington.
TABLE OF CONTENTS
Introduction
Table of Contents
Hypermedia or Not?
Load Generated by Hypermedia Servers
CERN server
NCSA server
Plexus server
Obtaining This Document
HYPERMEDIA OR NOT?
You will probably want to provide information in a hypermedia format
(HTML) rather than plain text, because of the power of HTML. Not only
can you represent plain text in HTML, but you can also represent
gopher-like menus, true hypertext (where certain words in a paragraph
can bring up other pages), and hypermedia (where images and audio can be
the destination of a hyper-link). If you only want to provide plain
text and menus, you might want to try something like gopher (which is
accessable by Web browsers).
Running an HTTP isn't the only way to put hypermedia on the Web
(browsers can access FTP sites and gopher servers). This is because (on
the Web) the protocol and the data format are different --- you can
provide both hypermedia and plain text via FTP and HTTP. Knowing this,
you can decide how you want to provide information.
FTP servers have the benefit that they may already be set up at your
site, they have reasonable logging, and automatic filename indexing (via
archie). There is a lot of behind-the-scenes overhead for the browser
programs in obtaining files via FTP, however (the anonymous user and
password need to be sent each time a file is retrieved). Because
hypertext often consists of little files, with lots of links, this
overhead may prove the deciding factor against using an FTP server.
Gopher also has automatic title indexing (Veronica), and a fairly simple
setup struture (see Chapter n). Because gopher's HTML type isn't
recognised by gopher clients, hypertext cannot be easily served through
gopher and for this reason I recommend setting up a gopher server only
if you don't need to serve hypertext.
HTTP servers are geared toward hypertext, and because plain text is a
degenerate case of hypertext they do equally well at serving plain text.
There are three main HTTP servers in use, and all three are briefly
described below.
LOAD GENERATED BY HYPERMEDIA SERVERS
The load generated by the servers varies, depending on the task
requested by the browser. Resource intensive tasks such as searching
files, translating between data formats, or starting other programs,
will cause a larger load than simple document delivery. In general, a
well-used server such as that run by NCSA or CERN, should sit on a
devoted low-to-mid-range machine, whereas less-used servers can exist
quite happily on a multiuser machine.
CERN SERVER
CERN is a high-energy physics organisation, based in Switzerland. They
started the World Wide Web project, and provided much of the initial
software that helped it gain acceptance. E-mail regarding the server
should be sent to www-bugs@info.cern.ch.
Features
Remapping of requests This enables requests for files to remapped onto
requests for other files, not necessarily on the
same server. For instance, I can specify in my
rule file that requests for /cern/* can be
remapped into requests for http://info.cern.ch/*
and the server will remap requests anything in the
/cern/ directory to requests for files from a
machine in Switzerland.
Mapping from filename suffix to file type
You can specify rules to convert file suffixes
(.tex, for instance) for instance, onto MIME types
(application/tex, in this case).
HTTP/1.0 ability The initial, simple, implementation of HTTP had no
way of specifying which data formats clients could
cope with, which version of HTTP was being used,
and no MIME typing. HTTP/1.0 is a version of HTTP
which does provide all these features (and more).
Ability to act as a gateway to WAIS
Using the remapping above, wais: queries can be
passed on to other machines.
Ability to act as a gateway through a firewall
Also implemented using the remapping feature.
Both standalone and inetd capability
Being able to be run ``standalone'' means that you
don't have to be superuser to use it. ``inetd''
is a Unix system utility that provides a nice
interface between TCP ports and programs, but
requires superuser access to add programs to.
Automatically presents directory listings nicely
When a browser requests a directory, rather than a
file, the server will produce a menu of the files
in the directory.
Support for README files
If a browser requests a directory, and there is a
README file present, the server will prepend the
README file to the directory listing.
Multiformat documents If you have the same document stored in multiple
formats, the server will return a format that the
browser can understand (if the browser is using
the HTTP/1.0 protocol).
Logging For each request, the server logs the date, time,
IP number of the machine originating the request,
and the text of the request (without the HTTP/1.0
MIME information).
Access Control Simple user authentication and access control is
new in this version.
The CERN server is available as
ftp://info.cern.ch/pub/www/src/WWWLineMode_XXX.tar.Z where XXX is the
latest version number. It requires the CERN WWW library, available as
ftp://info.cern.ch/pub/www/src/WWWLibrary_XXX.tar.Z where XXX is the
library version number.
It will compile automatically for most systems.
NCSA SERVER
NCSA, the National Centre for Supercomputing Applications, is based out
of the University of Illinois at Urbana-Champaign, in the USA. They are
responsible for providing the Mosaic series of browsers, and
accelerating the acceptance of Web browsers. E-mail regarding the
server should be sent to httpd@ncsa.uiuc.edu.
Features
Simple It consists of less than ten source-code files,
and is easy to install because of it.
Can operate from a gopher setup
The server will map the gopher .cap and
.linksfiles into menus, when directories are
requested.
inetd and standalone support
See the same section in the description of the
CERN server.
Logging For each request, the server logs the hostname of
the machine originating the connection, the date,
time, and the request (without the HTTP/1.0 MIME
information).
HTTP/1.0 ability See the same section in the description of the
CERN server.
Because of its extreme simplicity, the NCSA server will compile readily
on most systems. It is available in
ftp://ftp.ncsa.uiuc.edu/Web/ncsa_httpd/. It is a good place to start if
you are already running a gopher server.
PLEXUS
Plexus is written by Tony Sanders (sanders@bsdi.com) and is written in
perl (an interpreted language, suitable for most text-processing and
system management tasks).
Features
Written in perl Because perl is an interpreted language, there is
no compilation step between changing the code and
running it. Because of this, making changes to
the code is quicker than changing (for instance)
the CERN code.
Built-in setext, archie, calendar, manual page and finger gateways
These provide excellent base services for a local
server, as well as giving good indication on how
to implement new gateways.
Easily extendable The code is exceptionally easy to understand and
add to, and perl is not difficult to learn.
Access control on a per-directory basis
You can deny or permit access to files in
directories based on the IP address/hostname of
the machine the browser is running from.
Recommended only as stand-alone
Because the perl interpreter is rather large, it
is not recommended that Plexus be run from inetd
(which would run perl for each connection),
although it does have inetd support if you really
want to do this.
Logging For each request, Plexus logs the hostname of the
machine originating the connection, the date and
time, and the text of the request.
HTTP/1.0 ability See the same section in the description of the
CERN server.
Perl is available in ftp://ftp.uu.net/pub/languages/perl/.
Configuring Plexus
(this section is for release 2 of Plexus. Release 3 will probably have
a different system).
The configuration for Plexus is done in the file plexus.conf, and via
environment variables. The environment variables are:
$HTTPD The directory base from which the server can serve
files.
$HTTPD_CONF The configuration file (relative to the directory
base).
$HTTPD_DEBUG Whether debugging should be turned on.
The variables to set in plexus.conf are:
$http_support This should be HTML that describes how to report
errors. For instance,
'<ADDRESS>www-admin@vuw.ac.nz</ADDRESS>.
$http_homepage The file (relative to the directory base above)
that should be returned if the user requests
http://host/
$http_index The filename in a directory that should be
returned if the user requests http://host/path/
$http_log The filename to place log messages in. This does
not need to be relative to the directory base.
$http_indexdirs Set to 1 if directories should be indexed, 0
otherwise.
$http_chroot If non-zero, the server should use the chroot()
call.
Also inside plexus.conf are the mappings which decide which HTTP
commands are understood. These look like:
$method{'get'} = "do_get";
Any mapping commented out with a # at the start of the line, will not
have the corresponding command recognised by the server. The mappings
commented out in the distribution are the mappings which the server
doesn't have code for (they are only included for completeness).
After the mappings in plexus.conf are the configuration options for the
methods, the configuration options for the gateways, and the list of
scripts to load. Gateways are implemented by mapping URLs like
http://host/specialstring/blah into a request for blah from the gateway.
These mappings are called translations, and are defined after the list
of scripts to load.
The mapping from filename extensions to MIME content types is done
through the list of assignments after the translations. These all look
like
$ext{'dvi'} = $ext{'DVI'} = 'application/dvi';
Similarly the MIME encoding definitions follow those for content types.
The remainder of plexus.conf is all internal to plexus and should not be
changed.
HOW TO OBTAIN THIS DOCUMENT
The latest version of this document is always available on the Web as
http://www.vuw.ac.nz/non-local/gnat/www-servers.html, and the most
recently posted ASCII version will be available via anonymous FTP from
rtfm.mit.edu in the directory /pub/usenet/news.answers/www as servers.
The ASCII, LaTeX, DVI, and PostScript versions will be available via
anonymous FTP from wuarchive.wustl.edu in the directory /doc/misc/www/.
This document is part of a series: ``World Wide Web Primer'', ``An
Information Provider's Guide to HTML'', and ``An Information Provider's
Guide to Web Servers''. The other documents in the series are available
from the archives above.
Please send feedback to the author, Nathan Torkington, at the e-mail
address Nathan.Torkington@vuw.ac.nz --- all discussion will be treated
as public domain and may be used in future versions of this document.