CGI Programming FAQ

The Table of contents starts at 0 (preamble). Older (pre-HTML 3) or broken browsers may start it at 1: apologies for any confusion.

Preamble
Basic Questions
HTTP Headers and NPH Scripts
Techniques: "How do I..."
Applications: Is there an existing script to ...
Troubleshooting a CGI application
Further Reading
1. Other FAQs/collections (including online book)
2. Reference Pages

INDEX

Section 0: Preamble

NOTE: the Reply-to address in this FAQ is an autoresponder.   If you
want to write to me, you'll have to set the "To:" line by hand:
mailto:nick@webthing.com

NOTE: the numbering in this document is automatically generated by my
posting software, and will change between postings if new questions are
added (as _may_ happen when I see - or someone contributes - a FAQ I've
previously overlooked :-)

0.1: Changes

Last Modified: July 12th 1997:
* Added question on remote hostname (thanks Alain Deckers)
* Added reference to Login tutorial
* Updated reference for Selena's scripts

[Table of Contents] [Index]

0.2: Notice and Disclaimer

Copyright 1996-7 Nick Kew.

You are free to copy or distribute this document in whole or in part
for any purpose and on any medium you choose, provided: 

      You DON'T do so for profit.
      You DO include this notice and disclaimer in full.

Disclaimer: This information is offered in good faith and in the hope
that it may be of use, but is not guaranteed to be correct, up to date
or suitable for any particular purpose.   The author accepts no liability
in respect of this information or its use.

0.3: Where to get this document

The homes of this document on the Web are now
* the WebThing Virtual Office, at http://www.webthing.com/:
	URL  http://www.webthing.com/page.cgi/cgifaq
* the Web Design Group, at http://htmlhelp.com/
	URL  http://htmlhelp.com/faq/cgifaq.html

NOTE - If you want to mirror the FAQ on your WWW site, the best document
to use is the HTML version from my autoresponder (see below).   If you're
putting it on a publicly-visible server, please make sure you keep it
up-to-date (if you let me know you have it, I can automate the updates).

Other known sources are:

(1) USENET: posted to newsgroups				(TEXT)
	news:comp.infosystems.www.authoring.cgi
	news:comp.answers
	news:news.answers

(2) RTFM and mirror sites					(TEXT)
	ftp://rtfm.mit.edu/pub/usenet/news.answers/www/cgi-faq

(3) RTFM WWW mirror sites, including			(Partial HTML)
	Europe - http://www.cs.ruu.nl/cgi-bin/faqwais 
	America - http://www.cis.ohio-state.edu/hypertext/faq/usenet/

(4) By EMAIL from my autoresponder 			(HTML or TEXT)
	Send blank email to
		mailto:nick+cgi_text@webthing.com
	or
		mailto:nick+cgi_html@webthing.com
	(depending on which version you want)
**** NOTE CHANGE FROM PREVIOUS AUTORESPONDER SETUP! ****

(5) By EMAIL from the FAQserver at RTFM 			(TEXT)
	Send email to mailto:mail-server@rtfm.mit.edu with
		send usenet/news.answers/www/cgi-faq
	in the body of your message

0.4: How to contribute to this document?

The WebThing software permits collaborative authoring using your web
browser. When you are reading any entry in this InterFAQ, you can add a
new entry which will then appear as another "more on" subject.
http://www3.pair.com/webthing/
(note: the version at this site is no longer listed in the previous question)

In order to maintain the quality of the FAQ, and avoid inappropriate
'commercial' entries, write permission is limited using an Access Control
List. If you have a contribution to make, send me an email including your
WebThing userid (i.e. what you entered in the registration form) and I'll
add you to the list.

InterFAQ readers - If your browser isn't showing a "new entry" button, then
either you aren't logged in or you're not on the access control list.

Note that this InterFAQ is limited to questions-and-answers appropriate to
periodic Usenet posting. Other types of contribution can be added
elsewhere in the WebCentre. For example

    * If you have a relevant website and want to link to it, enter it in the
      appropriate collection (e.g. "scripts" or "misc").    You can then
      also include a description of your site, and have it indexed.
    * If you want to post a question or comment on something in this
      document, you can post it as a followup to the "flat" version of the
      FAQ (library document in the "FAQS" collection). 

If you don't want to use the InterFAQ you can always mail me
( mailto:nick@webthing.com )

0.5: Can I email the author my questions?

I am not a free advice centre, but in response to persistent questions
I have recently (July '97) opened a commercial help service.
If you're willing to pay, see http://www.webthing.com/support/general.html

If you think something already in the FAQ needs clarifying, feel free
to mail me: don't expect a personal reply, but I *might* add
something to the answer in question, so check the next posting (or three).
The newsgroup is the appropriate place for free advice.   But remember:
bad questions usually get bad answers, so think carefully before posting.

0.6: What's up with posting to comp.infosystems.www.authoring.cgi?

This is now a moderated newsgroup.   The moderator is a bot run by
Thomas Boutell ( mailto:boutell@boutell.com ).   The charter for
moderation is as follows:

  This newsgroup is self-moderated.  Your first posting will not appear
  until you have read and responded to an automatic welcome mailing, at
  which point your posting will appear with no further delay.  Provision
  will also be made to automatically approve first postings that contain
  a header requesting this.  Subsequent postings are approved
  automatically.

If posting normally doesn't work - as could be the case if your
newsfeed has trouble with moderated groups - you can post articles
by emailing them to:
	mailto:authoring-cgi@boutell.com
Provided the return address in your mail is correct, you will then
receive precise instructions for having your post(s) automatically approved.

Alternative means of posting are detailed in the WWW FAQ, posted
regularly by Thomas Boutell.

0.7: Credits

This FAQ was written by Nick Kew, and has been considerably improved
with the help of comments and criticisms, newsgroup posts and
miscellaneous suggestions from correspondents including
Nathan Neulinger, Maurice L. Marvin, Matthew Healy, Alan J. Flavell,
Don Libes, Alain Deckers, and no doubt others I've forgotten to
credit (please remind me if necessary).

Section 1: Basic Questions

This section aims to deal with basic questions, addressing the role and
nature of CGI, and its place in Web programming. Questions/answers which
just don't appear to 'fit' under any other section may also be included
here.

1.1: What is CGI?

[ from the CGI reference http://hoohoo.ncsa.uiuc.edu/cgi/overview.html ]

The Common Gateway Interface, or CGI, is a standard for external
gateway programs to interface with information servers such as HTTP servers.
A plain HTML document that the Web daemon retrieves is static,
which means it exists in a constant state: a text file that doesn't change.
A CGI program, on the other hand, is executed in real-time, so that it
can output dynamic information.

1.2: Is it a script or a program?

The distinction is semantic.   Traditionally, compiled executables
(binaries) are called programs, and interpreted programs are usually
called scripts.   In the context of CGI, the distinction has become
even more blurred than before.   The words are often used interchangeably
(including in this document).   Current usage favours the word "scripts"
for CGI programs.

1.3: When do I need to use CGI?

There are innumerable caveats to this answer, but basically any
Webpage containing a form will require a CGI script or program
to process the form inputs.

1.4: Should I use CGI or JAVA?

[This answer to a non-question is an attempt to reduce the noise level of
the recurrent "CGI vs JAVA" threads.]

CGI and JAVA are fundamentally different, and for most applications
are NOT interchangeable.   Nor are the two mutually exclusive: you could
in principle write a CGI program in JAVA, although it is hard to
think of an instance where this would be the best choice.

CGI is a mechanism for running programs on a WWW server.
Typical applications include accessing a database, submitting
an order, or posting messages to a bulletin board.
JAVA enables programs to run on the Client machine, and is
suited to such tasks as detailed manipulation of an image.
Alternatives to JAVA may include the X windows client/server
protocol, use of browser plugins and helper applications, and
other clientside languages such as SafeTCL and perl/penguin.

In certain instances the two may be combined in a single application:
for example a JAVA applet to define a region of interest from a
geographical map, together with a CGI script to process a query
for the area defined.

1.5: Should I use CGI or SSI?

CGI and SSI (Server-Side Includes) are often interchangeable, and it may
be no more than a matter of personal preference.   Here are a few
guidelines:
  1) CGI is a common standard agreed and supported by all major HTTPDs.
     SSI is NOT a common standard, but an innovation of NCSA's HTTPD
     which has been widely adopted in later servers.   CGI has the
     greatest portability, if this is an issue.
  2) If your requirement is sufficiently simple that it can be done
     by SSI without invoking an exec, then SSI will probably be
     more efficient.   A typical application would be to include
     sitewide 'house styles', such as toolbars, netscapeised <body>
     tags or embedded CSS stylesheets.
  3) For more complex applications - like processing a form -
     where you need to exec (run) a program in any case, CGI
     is usually the best choice.

1.6: Should I use CGI or an API?

APIs are proprietary programming interfaces supported by particular
platforms.   By using an API, you lose all portability.   If you know
your application will only ever run on one platform (OS and HTTPD),
and it has a suitable API, go ahead and use it.   Otherwise stick to CGI.

1.7: What do I absolutely need to know?

If you're already a programmer, CGI is extremely straightforward, and just
three resources should get you up to speed in the time it takes to read them:
  1) Installation notes for your HTTPD.   Is it configured to run CGI
     scripts, and if so how does it identify that a URL should be executed?
     (Check your manuals, READMEs, ISP webpages/FAQS, and if you still can't
     find it ask your server administrator).
  2) The CGI specification at NCSA tells you all you need to know
     to get your programs running as CGI applications.
     http://hoohoo.ncsa.uiuc.edu/cgi/interface.html
  3) WWW Security FAQ.   This is not required to 'get it working', but
     is essential reading if you want to KEEP it working!
     http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html

If you're NOT already a programmer, you'll have to learn.   If you would
find it hard to write, say, a 'grep' or 'cat' utility to run from the
commandline, then you will probably have a hard time with CGI.   Make
sure your programs work from the commandline BEFORE trying them with CGI,
so that at least one possible source of errors has been dealt with.

1.8: Does CGI create new security risks?

Yes.   Period.
There is a lot you can do to minimise these.   The most important thing
to do is read and understand Lincoln Stein's excellent WWW security
FAQ, at http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html .

1.9: Do I need to be on Unix?

No, but it helps.   The Web, along with the Internet itself, C, Perl,
and almost every other Good Thing in the last 20 years of computing,
originated in Unix.   At the time of writing, this is still the
most mature and best-supported platform for Web applications.

1.10: Do I have to use Perl?

No - you can use any programming language you please.   Perl is simply
today's most popular choice for CGI applications.   Some other widely-
used languages are C, C++, TCL, BASIC and - for simple tasks -
even shell scripts.

Reasons for choosing Perl include its powerful text manipulation
capabilities (in particular the 'regular' expression) and the fantastic
WWW support modules available.

1.11: Do I have to put it in cgi-bin?

see next question

1.12: Do I have to call it .cgi? .pl?

Maybe.   It depends on your server installation.

These types of filenames are commonly used conventions - no more.
It is up to the server administrator whether or not CGI scripts are
enabled, and (if so) what conventions tell the server to run or
to print them.

If you are running your own server, read the manual.
If you're on an ISP's or other rented webspace, check their webpages for
information or FAQs.   As a last resort, ask the server administrator.

1.13: What is CGIWrap, and how does it affect my program?

[ quoted from http://www.umr.edu/~cgiwrap/intro.html ]

> CGIWrap is a gateway program that allows general users to use CGI scripts
> and HTML forms without compromising the security of the http server.
> Scripts are run with the permissions of the user who owns the script. In
> addition, several security checks are performed on the script, which will not
> be executed if any checks fail. 
> 
> CGIWrap is used via a URL in an HTML document. As distributed, cgiwrap
> is configured to run user scripts which are located in the
> ~/public_html/cgi-bin/ directory. 

See http://www.umr.edu/~cgiwrap/

1.14: How do I decode the data in my Form?

The normal format for data in HTTP requests is URLencoded.   All Form data
is encoded in a string, of the form
	param1=value1&param2=value2&...paramn=valuen
Many non-alphanumeric characters are "escaped" in the encoding:
the character whose hexadecimal number is "XY" will be represented by
the character string "%XY".

Decoding this string is a fundamental function of every CGI library.

Another format is "multipart/form-data", also known as "file upload".
You will get this from the HTML markup
<form method="POST" enctype="multipart/form-data">

(but note you must accept URLencoded input in any case, since not all
browsers support multipart forms).

Most(?) CGI libraries will handle this transparently.
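
As an illustration of the URLencoded format described above, here is a
minimal sketch in Python (not the language of any particular CGI library
mentioned in this FAQ).   The function names are made up for this example.
It relies on the standard urllib.parse module, and on the fact that "+"
stands for a space in this encoding:

```python
from urllib.parse import parse_qs, unquote_plus


def decode_form(encoded):
    """Decode 'a=1&b=2' style form data into {name: [values...]}."""
    # keep_blank_values keeps fields that the user left empty
    return parse_qs(encoded, keep_blank_values=True)


def decode_by_hand(encoded):
    """The same decoding spelled out step by step, for illustration."""
    fields = {}
    for pair in encoded.split("&"):
        if not pair:
            continue
        name, _, value = pair.partition("=")
        # '+' stands for a space, and %XY for the byte with hex code XY
        fields.setdefault(unquote_plus(name), []).append(unquote_plus(value))
    return fields


if __name__ == "__main__":
    print(decode_form("name=Nick+Kew&topic=CGI%20FAQ&empty="))
```

Each name maps to a list because the same parameter may legitimately
appear more than once (eg a group of checkboxes).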

Section 2: HTTP Headers and NPH Scripts

This is a fairly technical section dealing with HTTP, the protocol of
the Web. It also includes NPH, the mechanism by which CGI programs can
return HTTP header information directly to the Client.

2.1: What is HTTP (HyperText Transfer Protocol)?

HTTP is the protocol of the Web, by which Servers and Clients (typically
browsers) communicate.  An HTTP transaction comprises a Request sent by
the Client to the Server, and a Response returned from the Server to
the Client.
Every HTTP request and response includes a message header, describing
the message.   These are processed by the HTTPD, and may often be
mostly ignored by CGI applications (but see below).
A message body may also be included:
  1) A HEAD or GET request sends only a header.   Any form data is encoded
     in the query string of the request URL (the part after the "?"), which
     is available to the CGI program as the environment variable QUERY_STRING.
  2) A POST request sends both header and body.   The body typically
     comprises data entered by a user in a form.
  3) A HEAD request does not expect a body in the response.
  4) A GET or POST request will accept a response with or without a body,
     according to the header.   The body of a response is typically an
     HTML document.

2.2: What HTTP request headers can I use?

Most HTTP request headers are passed to the CGI script as environment
variables.   Some are guaranteed by the CGI spec.   Others are server,
browser and/or application dependent.

To see what _your_ browser and server are telling each other, just use
a trivial little CGI script to print out the environment.   In Unix:
	#!/bin/sh
	echo "Content-type: text/plain"
	echo
	set

(Just call it "env.cgi" or something, and put it where your server
will execute it.   Then point your browser at
http://your.server/path/to/env.cgi ).

This enables you to see at-a-glance what useful server variables are set.
Note that dumping the environment like this within a more complex
script can be a useful debugging technique.

For details, see the CGI Environment Variables specification at
http://hoohoo.ncsa.uiuc.edu/cgi/env.html
(which also includes a version of the above script - somewhat more
nicely formatted - online).
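
A variation on the same idea, sketched in Python: under the CGI
convention, a request header such as "User-Agent" normally reaches the
script as an environment variable named HTTP_USER_AGENT.   The helper
names below are invented for this example, and what you actually see
will depend on your server and browser:

```python
import os


def request_headers(environ):
    """Return {header-name: value} for the HTTP_* variables in environ."""
    headers = {}
    for key, value in environ.items():
        if key.startswith("HTTP_"):
            # HTTP_USER_AGENT -> User-Agent
            name = key[5:].replace("_", "-").title()
            headers[name] = value
    return headers


def render(environ):
    """Build a complete text/plain CGI response listing those headers."""
    lines = ["Content-type: text/plain", ""]
    for name, value in sorted(request_headers(environ).items()):
        lines.append(f"{name}: {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render(os.environ), end="")
```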

2.3: What Environment variables are available to my application?

See previous question.   Those you can rely on are documented in NCSA's
pages; those associated with your particular server and browser can
be determined using the above script.

2.4: What HTTP response headers do I need to know about?

Unless you are using NPH, the HTTPD will insert necessary response
headers on your behalf, always provided it is configured to do so.

However, it is conventional for servers to insert the Content-Type header
based on a page's filename, and CGI scripts cannot rely on this.  Hence
the usual advice is to print an explicit Content-Type header.
At least one of "Content-Type", "Status" and "Location" is almost
always required.

A few other headers you may wish to use explicitly are:
Status		(to set HTTP return code explicitly.   Caveats:
		   (1) Behaviour is undefined if it conflicts with
		   another header. (2) This is NOT an HTTP header.)
Location	(to redirect the user to another URI, which may or may
		not be on your own server)
Set-cookie	(Netscape/Nonstandard) Set a cookie
Refresh		(Netscape/Nonstandard) Clientpull

You can also use general MIME headers: eg "Keywords" for the benefit of
indexers (although in this instance some major search robots have
regrettably introduced a new protocol to do the same thing).

The 'official' list of HTTP response headers is at
http://www.w3.org/pub/WWW/Protocols/HTTP/Object_Headers.html
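
To make the layout concrete, here is a small Python sketch of a non-NPH
response: header lines first, then a blank line, then the body.   The
function name and its parameters are invented for this example, and the
exact handling of "Status" and "Location" remains up to your HTTPD:

```python
def cgi_response(body, content_type="text/html", status=None, location=None):
    """Return header lines, a blank line, then the body (non-NPH CGI)."""
    lines = []
    if status:
        # "Status" is a CGI header addressed to the server, not an HTTP header
        lines.append(f"Status: {status}")
    if location:
        lines.append(f"Location: {location}")
    lines.append(f"Content-Type: {content_type}")
    # The blank line marks the end of the headers
    return "\n".join(lines) + "\n\n" + body


if __name__ == "__main__":
    print(cgi_response("<p>Hello</p>\n"), end="")
```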

2.5: What is NPH?

NPH = No Parsed Headers.   The script undertakes to print the entire
HTTP response including all necessary header fields.   The HTTPD
is thereby instructed not to parse the headers (as it would normally do)
nor add any which are missing.

2.6: Must/should/can I write nph scripts?

Generally, no.   It is usually better to save yourself hassle by letting
the HTTPD produce the headers for you.

If you are going to use NPH, be sure to read and understand the HTTP spec at
http://www.w3.org/pub/WWW/Protocols/

Your headers should be complete and accurate, because you're instructing
the HTTPD not to correct them or insert what's missing.

Possible circumstances where the use of NPH is appropriate are:
  * When your headers are sufficiently unusual that they might be
    differently parsed by different HTTPDs (eg combining "Location:"
    with a "Status:" other than 302).
  * When returning output over a period of time (eg displaying
    unbuffered results of a slow operation in 'real' time).
See http://www.w3.org/pub/WWW/Protocols/HTTP/HTRESP.html
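
For illustration only, this Python sketch shows roughly what an NPH
script has to produce: the status line and every header, with the line
endings HTTP expects.   The function name is invented, and the choice of
headers here is an assumption about a simple case, not a complete recipe:

```python
import sys


def nph_response(body, status="200 OK", content_type="text/plain"):
    """Build a complete HTTP/1.0 response, as an NPH script must."""
    data = body.encode("utf-8")
    head = (
        f"HTTP/1.0 {status}\r\n"           # status line: the server won't add it
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(data)}\r\n"
        "\r\n"                              # blank line ends the headers
    )
    return head.encode("ascii") + data


if __name__ == "__main__":
    sys.stdout.buffer.write(nph_response("Hello from an NPH script\n"))
    sys.stdout.buffer.flush()
```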

2.7: Do I have to call it nph-*?

According to NCSA's reference pages, this is the standard for telling
the server that your script is NPH, so this should be a fully portable
convention.

2.8: What is the difference between GET and POST?

Firstly, the HTTP protocol specifies differing usages for the two
methods.   GET requests should always be idempotent on the server.
This means that whereas one GET request might (rarely) change some state
on the Server, two or more identical requests will have no further effect.

This is a theoretical point which is also good advice in practice.
If a user hits "reload" on his/her browser, an identical request will be
sent to the server, potentially resulting in two identical database or
guestbook entries, counter increments, etc.   Browsers may reload a
GET URL automatically, particularly if caching is disabled (as is usually
the case with CGI output), but will typically prompt the user before
re-submitting a POST request.   This means you're far less likely to get
inadvertently-repeated entries from POST.

GET is (in theory) the preferred method for idempotent operations, such
as querying a database, though it matters little if you're using a form.
There is a further practical constraint that many systems have builtin
limits to the length of a GET request they can handle: when the total size
of a request (URL+params) approaches or exceeds 1Kb, you are well-advised
to use POST in any case.

In terms of mechanics, they differ in how parameters are passed to the
CGI script.   In the case of a POST request, form data is passed on
STDIN, so the script should read from there (the number of bytes to be
read is given by the Content-length header, which the script sees as the
CONTENT_LENGTH environment variable).   In the case of GET, the
data is passed in the environment variable QUERY_STRING.   The content-type
(application/x-www-form-urlencoded) is identical for GET and POST requests.
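
The mechanics can be sketched in Python as follows.   The function name
is invented for this example.   It assumes URLencoded data only (not
multipart), and takes the environment and input stream as arguments so
that it can be tried from the commandline:

```python
import os
import sys
from urllib.parse import parse_qs


def read_form(environ, stdin):
    """Return decoded form fields for a GET or POST request."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # Read exactly CONTENT_LENGTH bytes: there may be no end-of-file
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length).decode("ascii", "replace")
    else:
        raw = environ.get("QUERY_STRING", "")
    return parse_qs(raw, keep_blank_values=True)


if __name__ == "__main__":
    form = read_form(os.environ, sys.stdin.buffer)
    print("Content-type: text/plain")
    print()
    print(form)
```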

Section 3: Techniques: "How do I..."

This section comprises programming hints and tips for a number of popular
tasks. Also included are a number of common questions to which the answer
is "you can't", with the reasons why.

3.1: Can I get information about who is visiting?

*sigh*
Many people keep mailing me questions or suggested hacks to get
visitor information, particularly email addresses.   It seems they
won't take "NO" for an answer.

The bottom line is that whatever information is available to _you_
is _equally_ available to every spammer on the net.   Therefore when
a browser bug _does_ permit personal data to be collected, it gets
reported and fixed very quickly (one short-lived Netscape release
reportedly had such a bug).

You can get some limited information from the environment variables
passed to you by the browser.   Relatively few of these are guaranteed
to be available, and some may be misleading.   For particular types
of information, see below.   For full details, see NCSA's reference pages.

3.2: Can I get the email of visitors?

Why do you want to do this?

The best information available is the REMOTE_ADDR and REMOTE_HOST,
which tell you nothing about the user.   Techniques such as "finger@"
are not reliable, are widely disliked, and generally serve only to
introduce long delays in your CGI.   Better - as well as more polite -
just to ask your users to fill in a form.

BTW: the "From:" header line (HTTP_FROM variable) is usually only set
by robots, since human visitors to your webpage will not normally want
their addresses collected without permission, and browsers respect this.

3.3: "But I saw some.kool.site display my email address..."

Some sites will play party tricks, which can get *some* users' email
addresses.   Possible tell-tale signs of this are inordinate delays
loading a page (fingering @REMOTE_HOST - doesn't often work but
probably can't be detected from the webpage), or a submit button that
appears to do nothing at all (a mailto: link - works quite well but
trivially detectable).   As a "snoop" party trick that's fine, but
if you find someone abusing these facilities (eg they send you
junkmail), alert their service provider!

3.4: Can I verify the email addresses people enter in my Form?

Unfortunately people will sometimes enter an incorrect or invalid
email address in your Form.   Worse, they may enter a valid but
incorrect email address that will deliver to someone who doesn't
want your mail.

Proposed regexps to match email addresses are sometimes posted.
Most of these will fail against perfectly valid email addresses,
like "S=N.OTHER/OU1=X12345A/RECIPNUM=1/MTA-BASIC@attmail.com"
(which is what your address looks like if you are connected to
the Internet via X400 - and if you think that example is too easy,
check the ones at the end of Eli the Bearded's Email Addressing FAQ).

Probably the most complete parser and checker available for download
is Tom Christiansen's, at
http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/ckaddr.gz
Of course, this still says nothing about deliverability.

A frequently-suggested hack that doesn't work is to use
SMTP EXPN or VRFY commands.   Modern versions of sendmail permit
administrators to disable these commands, and many sites take
advantage of this facility to protect their users' privacy.

Probably the best way to verify an email address is to send mail to
it, asking the user to respond.   Include a clause like "if you have
received this mail in error, please accept our apologies..."
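
One way to prepare such a message is sketched below in Python.   The
random token and the confirmation URL are assumptions of this example
(the idea being that a reply or visit carrying the token shows the mail
was actually received); the function name and the addresses are made up,
and actually sending the message is left out:

```python
import secrets
from email.message import EmailMessage


def confirmation_message(address, sender, confirm_url):
    """Build (token, message) asking the owner of `address` to respond."""
    # An unguessable token ties the response to this particular form entry
    token = secrets.token_urlsafe(16)
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = address
    msg["Subject"] = "Please confirm your email address"
    msg.set_content(
        "Someone entered this address in a form on our website.\n"
        "To confirm it, please visit:\n"
        f"{confirm_url}?token={token}\n\n"
        "If you have received this mail in error, please accept our\n"
        "apologies and ignore it.\n"
    )
    return token, msg


if __name__ == "__main__":
    _, message = confirmation_message(
        "user@example.com",
        "webmaster@example.com",
        "http://www.example.com/cgi-bin/confirm.cgi",  # hypothetical URL
    )
    print(message)
```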

3.5: How can I get the hostname of the remote user?

You can't. Well, not always.

IF it is available, you'll find it in the REMOTE_HOST environment
variable.  However, this will more often than not contain the numerical
IP address rather than the IP name of the remote host. Remember that
not all IP addresses have a hostname associated with them; this is the
case of most IP addresses assigned to dialup users, for example. Your
web server may also not perform a reverse lookup on incoming
connections, in which case REMOTE_HOST will contain the IP address even
if it has a corresponding IP name. In the second case, you can do a
reverse lookup yourself in your script, but this is expensive and
should probably be avoided unless absolutely necessary.

Even if you do manage to obtain a hostname, you should be aware that it
may not correspond to the hostname the user is accessing your page
from. It may instead be that of an intervening proxy host.

The short answer is therefore that there is no reliable way of finding
out what the remote user's hostname is.
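
If you do decide the reverse lookup is worth its cost, it might look
something like this Python sketch.   The function name is invented, and
the "resolve" parameter exists only so the lookup can be replaced when
trying the code out; the result is still subject to all the caveats above:

```python
import os
import socket


def remote_hostname(environ, resolve=socket.gethostbyaddr):
    """Best-effort hostname for the remote end; may just be an IP address."""
    addr = environ.get("REMOTE_ADDR", "")
    host = environ.get("REMOTE_HOST", "")
    if host and host != addr:
        # The server has already done the reverse lookup for us
        return host
    if not addr:
        return None
    try:
        # Expensive: this may block while the DNS is queried
        return resolve(addr)[0]
    except OSError:
        # No name is registered for this address
        return addr


if __name__ == "__main__":
    print("Content-type: text/plain")
    print()
    print(remote_hostname(os.environ))
```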

3.6: Can I get browser details and return different pages?

Why do you want to do this?

Well-written HTML will display correctly in any browser, so the correct
answer to this question is to design a template for your output in good
HTML, and make sure your output is correct.

If you insist on a different answer, you can use the HTTP_USER_AGENT
environment variable.  This requires care, and can lead to unexpected
results.   For example, checking for "Mozilla" and serving a frameset
to it ensures that you *also* serve the frameset to early (Non-Frame)
Netscapes, me-too browsers (notably Microsoft) and others who have
chosen to lie to you about their browser.

Note also that not every User Agent is a browser.   Your page may be
read by a user agent you've never heard of, and then displayed by
100 different browsers.   Or retrieved by different browsers from
a cache.   Another reason to write good HTML, and not try to
devise a clever or koool substitute.

3.7: Can I trace where a user has come from/is going to?

HTTP_REFERER might or might not tell you anything.   By all means
use it to collect partial statistics if you participate in (say)
an advertising banner scheme.   But it is not always set, and may
be meaningless (eg if a user has accessed your page from a bookmark,
and the browser is too dumb to cope with this).

You cannot trace outgoing links at all.   If you really must try,
point all the external links to your HTTPD and use its redirection
facility (which gives you generally-reliable logs).   This is much
less inefficient than using a CGI script.

BTW: don't even think about asking Javascript to send you information
on some event: it's a violation of privacy which Netscape fixed as
soon as complaints about its abuse started coming in.   If it works
with *your* browser, you should upgrade!

3.8: Can I launch a long process and return a page before it's finished?

[UNIX]
You have to fork/spawn the long-running process.
The important thing to remember is to close all its file descriptors;
otherwise nothing will be returned to the browser until it's finished.
The standard trick to accomplish this is redirection to/from /dev/null:

        exec ("long_process < /dev/null > /dev/null 2>&1 &")
        print HTML page as usual

(don't take "exec" as literal in anything but a shell script - in
C, Perl, etc use fork+exec or system() :-)
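
The same trick might be written in Python roughly as follows.   This is
a sketch for Unix: the function name is invented, and the command run in
the demonstration is merely a stand-in for your own long process:

```python
import subprocess
import sys


def launch_detached(command):
    """Start `command` with no file descriptors tied to the browser."""
    return subprocess.Popen(
        command,
        stdin=subprocess.DEVNULL,    # equivalent of < /dev/null
        stdout=subprocess.DEVNULL,   # equivalent of > /dev/null
        stderr=subprocess.DEVNULL,   # equivalent of 2>&1
        close_fds=True,
        start_new_session=True,      # detach from the server's process group
    )


if __name__ == "__main__":
    # Stand-in for a real long-running job
    launch_detached([sys.executable, "-c", "import time; time.sleep(5)"])
    print("Content-type: text/html")
    print()
    print("<p>Your job has been started.</p>")
```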

3.9: Can I launch a long process which the user interacts with?

This does not fit well with the basic mechanics of the Web, in which
each transaction comprises a single request and response.
If your processing can be done on the Client machine, you can use
a clientside application; for example a Java applet.

For processing on the server, one trick that works well for Clients
running an X server (and far more efficient than a JAVA solution) is:
  if ( fork() ) {
    print HTML page explaining what's going on and advising about xhost
  } else {
    exec ("xterm -display THEIR_DISPLAY -title MY_APP -e MY_PROG ARGS
        < /dev/null > /dev/null 2>&1 &") ;
  }
NOTE: THEIR_DISPLAY is not necessarily the same as REMOTE_HOST or REMOTE_ADDR.
You have to ask users to supply their display (set REMOTE_HOST as default).

[ Question: Is there a JAVA alternative to xterm yet, for platforms
  which support JAVA but not X ? ]

3.10: Can I password-protect my pages?

Yes.   Use your HTTPD's authentication, just as you would a basic HTML page.
Now you'll have the identity of every visitor in REMOTE_USER.

3.11: Can I do HTTP authentication using CGI?

It depends on which version of the question you asked.

Yes, you can use CGI to trigger the browser's standard Username/Password
dialogue.   Send a response code 401, together with a "WWW-authenticate"
header including details of the authentication scheme and realm:
e.g. (in a non-NPH script)

	Status: 401 Unauthorized to access the document
	WWW-authenticate: Basic realm="foobar"
	Content-type: text/plain

	Unauthorised to access this document

The use you can make of this is server-dependent, and harder,
since most servers expect to deal with authentication before ever
reaching the CGI (eg through .www_acl or .htaccess).
Thus it cannot usefully replace the standard login sequence, although
it can be applied to other situations, such as re-validating a user -
e.g. after a certain timeout period or if the same person may need to
login under more than one userid.

What you can never get in CGI is the credentials returned by the user.
The HTTPD takes care of this, and simply sets REMOTE_USER to the
username if the correct password was entered.

For a much longer discussion of this question (with code extracts),
see my discussion at http://www.webthing.com/tutorials/login.html

3.12: Can I identify users/sessions without password protection?

The most usual (but browser-dependent) way to do this is to set a cookie.
If you do this, you are accepting that not all users will have a 'session'.

An alternative is to pass a session ID in every GET URL, and in hidden
fields of POST requests.   This can be a big overhead unless _every_ page
requires CGI in any case.

Another alternative is the Hyper-G solution of encoding a session-id in
the URLs of pages returned:
	http://hyper-g.server/session_id/real/path/to/page
This has the drawback of making the URLs very confusing, and causes any
bookmarked pages to generate old session_ids.

Note that a session ID based solely on REMOTE_HOST (or REMOTE_ADDR)
will NOT work, as multiple users may access your pages concurrently
from the same machine.
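
The second approach (a session ID in every GET URL and in hidden fields)
might be sketched in Python as follows.   The parameter name "sid" and
the helper names are hypothetical, chosen for this example; the ID is
random rather than derived from REMOTE_HOST, for the reason just given.

```python
import secrets
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def new_session_id() -> str:
    """Return an unguessable session ID (32 hex characters)."""
    return secrets.token_hex(16)


def add_session(url: str, session_id: str, name: str = "sid") -> str:
    """Return url with the session ID carried in its query string."""
    parts = urlsplit(url)
    # Drop any stale copy of the parameter before appending the current one.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != name]
    query.append((name, session_id))
    return urlunsplit(parts._replace(query=urlencode(query)))


def hidden_field(session_id: str, name: str = "sid") -> str:
    """Return the hidden form field that carries the ID in POST requests."""
    return f'<input type="hidden" name="{escape(name)}" value="{escape(session_id)}">'
```

Every link and form the script prints has to go through helpers like
these, which is where the overhead mentioned above comes from.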

[Table of Contents] [Index]

3.13: Can I redirect users to another page?

For permanent and simple redirection, use the HTTPD configuration file:
it's much more efficient than doing it yourself.   Some servers enable
you to do this using a file in your own directory (eg Apache) whereas
others use a single configuration file (eg CERN).

For more complicated cases (eg process form inputs and conditionally
redirect the user), use the "Location:" response header.
If the redirection target is itself a CGI script, it is easy to pass it
URL-encoded parameters in a GET request, but don't forget to escape the URL!
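
A minimal Python sketch of such a redirect, assuming a non-NPH script.
The function name and the example URL are illustrative only; urlencode
from the standard library does the escaping.

```python
from urllib.parse import urlencode


def redirect(base_url: str, params: dict) -> str:
    """Build a CGI response redirecting the browser to base_url?params."""
    target = base_url
    if params:
        # urlencode escapes spaces, '&', '=' and other reserved characters.
        target += "?" + urlencode(params)
    # A Location header followed by a blank line is a complete CGI response.
    return f"Location: {target}\n\n"


if __name__ == "__main__":
    import sys
    sys.stdout.write(redirect("http://www.example.com/cgi-bin/next.cgi",
                              {"name": "A B", "q": "x&y"}))
```

Without the escaping, the "&" in the second value would be read as the
start of a new parameter by the receiving script.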

[Table of Contents] [Index]

3.14: Can I run a CGI script without returning a new page to the browser?

Yes, but think carefully first:  How are your readers going to know
that their "submit" has succeeded?   They may hit 'submit' many times!

The correct solution according to the HTTP specification is to
return HTTP status code 204.   As an NPH script, this would be:

	#!/bin/sh
	# do processing (or launch it as background job)
	echo "HTTP/1.0 204 No Content"
	echo

Alan J Flavell has pointed out that this will fail with certain
popular browsers, and suggests a workaround to accommodate them:

> 1. Send status 204, Content-type of text/html, and a short body content
> that (for those few browsers that display it) will tell the reader that
> their browser does not handle this response correctly, and invites them
> to use their browser's Back function (hey, if someone tells me to put
> a back button on the HTML page itself, I think I shall scream...).

His survey is at
http://ppewww.ph.gla.ac.uk/%7Eflavell/status204/results.html
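
For a non-NPH script, the workaround quoted above might look like this
Python sketch.   The function name and the wording of the fallback
message are assumptions made for illustration; a body accompanying a 204
is there only for the browsers that mishandle the status.

```python
FALLBACK = (
    "<title>No new page</title>\n"
    "<p>Your request was received.   Your browser does not handle this\n"
    "response correctly: please use its Back function.</p>\n"
)


def no_new_page_response(with_fallback: bool = True) -> str:
    """Build a non-NPH CGI response that asks the browser to stay put."""
    headers = ["Status: 204 No Content"]
    if with_fallback:
        # Only a few browsers will ever display this body.
        headers.append("Content-Type: text/html")
    body = FALLBACK if with_fallback else ""
    return "\n".join(headers) + "\n\n" + body


if __name__ == "__main__":
    import sys
    sys.stdout.write(no_new_page_response())
```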

[Table of Contents] [Index]

3.15: Can I write output to a different Netscape frame?

Yep.   The fact you're using CGI makes no difference: use
"target=" in your links as usual.   Alternatively, the script
can print a "Window-target:" header.   Read Netscape's pages
for detail: these answer all the questions about things like
"getting rid of" or "breaking out of" frames, too.

[Table of Contents] [Index]

3.16: Can I write output to several frames at once?

A single CGI script can only ever print to one frame.

However, this limitation may be overcome by using more than one script.
The first script (the URL of the "submit" button) prints a frameset,
typically to a "_parent" or "_top" target.   The sources for one or
more of the frames thus generated may also be CGI scripts, to which
you can easily pass parameters (eg encoded in URLs with method GET).
This hack is definitely not recommended.   If you find yourself wanting
to update several frames from a single user event, it probably means
you should review the design of your application at a higher level.

Warnings:
 1. Don't forget to escape your URLs.
 2. This technique results in your server being hit by multiple 
    concurrent CGI requests.   You'll need LOTS of memory, especially
    if you use a memory-hog like Perl.   It can be a good recipe
    for bringing a server to its knees.

JavaScript is often a valid alternative here, but note just how silly
it can (and often does) look in a different browser.

[Table of Contents] [Index]

3.17: Can I use a CGI script to generate both text and inline images?

Not directly.   One script generates one response to one request.

If you want to generate a dynamic page including dynamic images
(say, a report including graphs, all of which depend on user input)
then your primary script will print the usual
   <img src="[script-to-generate-image]" alt="[what you asked for]">
and, just as in the multiple frames case, you can pass data to the
image-generating program encoded in a GET URL.   Of course, the same
caveats apply: see above.
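
A sketch of how the primary script might build that tag in Python.   The
script path and parameter names are hypothetical.   Note the two levels
of escaping: the parameters are URL-encoded, and the finished URL is
then HTML-escaped for use inside the attribute.

```python
from html import escape
from urllib.parse import urlencode


def image_tag(script_url: str, params: dict, alt: str) -> str:
    """Return an <img> tag whose source is an image-generating CGI script."""
    # First level: URL-encode the data passed to the image script.
    src = script_url + "?" + urlencode(params)
    # Second level: HTML-escape the URL and alt text for the attributes.
    return f'<img src="{escape(src)}" alt="{escape(alt)}">'


if __name__ == "__main__":
    print(image_tag("/cgi-bin/graph.cgi",
                    {"year": "1997", "type": "bar"},
                    "Sales by month"))
```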

[Table of Contents] [Index]

3.18: How can I use Caches to make CGI scripts faster and more Net-friendly?

This is currently beyond the scope of this FAQ (whose author urgently
needs to improve his own applications in this regard).   However,
there is an excellent introduction to net-friendly webpages, including
CGI pages, at http://vancouver-webpages.com/CacheNow/

A sample caching Perl/CGI script by Andrew Daviel is available at
http://vancouver-webpages.com/proxy/log-tail.pl

[Table of Contents] [Index]

3.19: How can I avoid users hitting "submit" twice?

You can't.   You just have to deal with it when they do.

You can avoid re-processing a submission by embedding a unique ID in your
Form each time it is displayed.   When you process the form, you enter
the ID in a database.  Or, if it's already there, you don't repeat the
processing.

You probably want to expire your database entries after a little time:
an hour should be fine in a typical situation.
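
The bookkeeping described above can be sketched in Python as below.
This is an illustration under a loud assumption: the "database" here is
an in-memory dictionary, whereas a real CGI script runs as a fresh
process per request and would need a persistent store (a DBM file or an
SQL table, say).   The class and method names are made up for the
example.

```python
import secrets
import time
from typing import Optional


class SubmissionLog:
    """Remember which form IDs have been processed, for a limited time."""

    def __init__(self, ttl_seconds: float = 3600.0):
        self.ttl = ttl_seconds
        self.seen = {}  # form ID -> time it was first processed

    def new_form_id(self) -> str:
        """Return a unique ID to embed (hidden) in each form displayed."""
        return secrets.token_hex(16)

    def first_time(self, form_id: str, now: Optional[float] = None) -> bool:
        """Record form_id; return False if it was already processed."""
        now = time.time() if now is None else now
        # Expire entries older than the time-to-live.
        self.seen = {fid: t for fid, t in self.seen.items()
                     if now - t < self.ttl}
        if form_id in self.seen:
            return False
        self.seen[form_id] = now
        return True
```

The script would call first_time() with the ID from the submitted form
and skip the processing (but still return a sensible page) when it
returns False.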

If you're already using cookies (e.g. a shopping cart), an alternative is
to use the cookie as a unique identifier.   This means you also have to
handle the situation where a user deliberately "goes round twice" and
submits the same form with different contents.

If your script may take some time to process, you should also consider
running it as a background job, and returning an immediate
acknowledgement to the user (see above if your "immediate" response
gets delayed until processing is complete in any case).

[Table of Contents] [Index]

3.20: How can I stop my CGI script reading and writing files as "nobody"?

CGI scripts are run by the HTTPD, and therefore under the UID of the HTTPD
process, which is (by convention) usually a special user "nobody".

There are two basic ways to run a script under your own userid:
(1) The direct approach: use a setuid program.
(2) The double-server approach: have your CGI script communicate
    with a second process (e.g. a daemon) running under your userid,
    which is responsible for the actual file management.

The direct approach is usually faster, but the client-server architecture
may help with other problems, such as maintaining integrity of a database.

When running a compiled CGI program (e.g. C, C++), you can make it
setuid by simply setting the setuid bit:
e.g. "chmod 4755 myprog.cgi"

For security reasons, this is not possible with scripting languages
(eg Perl, Tcl, shell).   A workaround is to run them from a setuid
program, such as cgiwrap.

In most cases where you'd want to use the client-server approach,
the server is a finished product (such as an SQL server) with its
own CGI interface.
A lightweight alternative to this is Don Libes' "expect" package.

Note that any program running under your userid has access to all your
files, and could do serious damage if hacked.   Take care!

[Table of Contents] [Index]

Section 4: Applications: Is there an existing script to ...

There are a lot of applications available.   For all the tasks
listed here, there are free systems you can download and install
yourself (at least if you're on UNIX).   Many are excellent.

Before ever *buying* software, do a Net search on what you want and
check what freeware is available.   Does the commercial system you
had in mind *really* have any advantages?   If you can't follow
the jargon they use to explain the merits of their system, insist
on some clarification (hey, that's not just for Web software :-)

Most questions under this heading are probably best answered by
reference to appropriate review sites on the Web (in many cases,
Thomas Boutell's WWW FAQ).   In cases where I know of one or more
good sites, I've referenced them.

4.1: Where to look for programs, scripts, and other resources?

Matt Wright - himself author of many popular CGI resources - has
recently (March 1997) opened a new website dedicated to this subject:

http://www.cgi-resources.com/

I am happy to recommend this as a rich and well-organised collection,
and probably the best of its kind on the Web today.

[Table of Contents] [Index]

4.2: Where to look for free scripts for my application?

(see also previous question, which should perhaps replace this one altogether)

Some popular places to look for a wide range of free CGI applications are:

Selena Sol's Public Domain CGI Scripts
http://www.extropia.com/Scripts/

Matt Wright's Script Archive
http://www.worldwidemart.com/scripts/

Dale Bewley has a much longer list of script archives
(along with his own scripts) at
http://www.engr.iupui.edu/~dbewley/perl/

[Table of Contents] [Index]

4.3: Discussion group/bulletin board

David R Woolley maintains a list of currently around 100 systems at
http://freenet.msp.mn.us/~drwool/webconf.html
("Conferencing on the Web").

[Table of Contents] [Index]

4.4: CSCW/Groupware

There are several overview sites for this.   A few are:

The CSCW Yellow Pages, at
http://www11.informatik.tu-muenchen.de/cscw/yp/YP-index-type.html

NCSA Web Collaboration pages, at
http://union.ncsa.uiuc.edu/HyperNews/get/www/collaboration.html

[Table of Contents] [Index]

4.5: Database

This subject deserves its own FAQ.   When someone recently asked about one,
Matthew.Healy@yale.edu (Matthew D. Healy) posted this answer (slightly chopped)

> : Is there a CGI and Database FAQ available?
> : If so, could someone tell me where can I get it?
> 
> Dunno about a FAQ on that.  I can recommend a couple of published
> works, however:
> 
> 1. I wrote a chapter about CGI/Database work for the book
> {Special Edition Using CGI}.  Fulltext is online at the
> publisher's WWW site:
> 
> http://www.mcp.com/que/et/se_cgi/  The book
> http://www.mcp.com/que/et/se_cgi/Cgi13fi.htm  My chapter on WWW/DBMS
> 
> 2. Jeff Rowe wrote an excellent book, {Building Internet Database
> Servers With CGI}.  URL for more info:
> 
> http://cscsun1.larc.nasa.gov/~beowulf/db/existing_products.html
> 
> Jeff's WWW site has scads of useful information on WWW/DBMS programming,
> and pointers to lots more sites.

Matthew's CGI links page at http://ycmi.med.yale.edu/~healy/cgilinks.html
expands the list, and includes links to popular packages including
Bo Frese Rasmussen's WDB at http://venus.dtv.dk/~bfr/wdb/

[Table of Contents] [Index]

4.6: Is there a non-setuid script to allow users to change their password?

Yes.  Here is an example:

        http://pitch.nist.gov/cgi-bin/cgi.tcl/passwd-form.cgi

It is an Expect script that wraps itself around the passwd command.
With this technique, there is no need to make scripts setuid (e.g.,
cgiwrap).  This same technique lends itself to many other scripts that
might otherwise need setuid.

(contributed by Don Libes <libes@cme.nist.gov>)

[Table of Contents] [Index]

Section 5: Troubleshooting a CGI application

Since this subject is quite well covered by other documents, this FAQ has
relatively little to say.

Tom Christiansen's "Idiot's guide to solving Perl/CGI problems" is a
slightly tongue-in-cheek list of common problems, and how to track
them down.  Much of what Tom covers is not specifically Perl, but
applies equally to CGI programming in other languages. 

Marc Hedlund's CGI FAQ and Thomas Boutell's WWW FAQ also
deal with this subject. 

See "Further Reading" below (if you don't already know where to find these
documents).

5.1: Are there some interactive debugging tools and services available?

(1) Several CGI programming libraries offer powerful interactive
    debugging facilities.   These include:

	- for Perl, Lincoln Stein's CGI.pm
	http://www-genome.wi.mit.edu/ftp/pub/software/WWW/cgi_docs.html

	- for Tcl, Don Libes' cgi.tcl
	http://expect.nist.gov/cgi.tcl

	- for C++, Nick Kew's CGI++
	http://www3.pair.com/webthing/cgiplusplus/

(2) Nathan Neulinger's cgiwrap is another package with debugging aids.
http://www.umr.edu/~cgiwrap/

(3) The "mod_cgi" Apache module (new with Apache 1.2) enables you to
capture script output and errors for diagnosis.

See also the next question.

[Table of Contents] [Index]

5.2: I'm having trouble with my headers. What can I do?

For simple cases, examining your response headers "by hand" may suffice:
(1) telnet to the host and port where the server is running - e.g.
        telnet www.myhost.com 80
(2) Enter an HTTP request.   The most useful for this purpose is usually HEAD; eg
        HEAD /index.html HTTP/1.0
        (optionally other headers)
        (followed by a blank line)
Now you'll get a full HTTP response header back.
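
If no telnet client is to hand, the same check can be scripted.   The
following Python sketch uses only the standard socket module; the
function names are made up for this example, and the Host header is one
of the optional "other headers" mentioned above.

```python
import socket


def build_head_request(host: str, path: str = "/") -> bytes:
    """Return the bytes of a HEAD request, ending with the blank line."""
    return f"HEAD {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")


def fetch_headers(host: str, port: int = 80, path: str = "/",
                  timeout: float = 10.0) -> str:
    """Send a HEAD request and return the raw response headers as text."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_head_request(host, path))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # an HTTP/1.0 server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("iso-8859-1")
```

Calling fetch_headers("www.myhost.com", 80, "/index.html") corresponds
to the telnet session above.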

For complex cases, such as sending a request with several headers
(as a browser does) or POSTing a form, there is a free diagnosis
service at the WebThing WebCentre.   This will take a request from your
browser (eg form inputs) and forward the identical request to your
server, printing a full report of your request (request headers and
form data) and the response from your server (response headers and data).
http://www3.pair.com/webthing/

[Table of Contents] [Index]

5.3: Why do I get Error 500 ("the script misbehaved", or "Internal Server Error")?

Your script must follow the CGI interface, which requires it to print:
(1) One or more Header lines.
(2) A blank line
(3) (optional, but strongly advised) a document body.

This error means it didn't.

The Header lines can include anything that's valid under HTTP, but must
normally include at least one of the special CGI headers:
	Content-Type
	Location
	Status

Example (a very minimal HTML page via CGI)
Content-Type: text/html			<= Header
					<= Blank Line
<title>HelloWorld</title>Hello World	<= Document Body
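
The same minimal page as a complete script, here sketched in Python (any
language will do; the function name is illustrative):

```python
import sys


def hello_response() -> str:
    """Return header, blank line and document body, in that order."""
    header = "Content-Type: text/html\n"
    blank_line = "\n"
    body = "<title>HelloWorld</title>Hello World\n"
    return header + blank_line + body


if __name__ == "__main__":
    sys.stdout.write(hello_response())
    # Flush explicitly so the header cannot be left sitting in a buffer.
    sys.stdout.flush()
```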

A common reason for scripts failing is that they crash before printing
the header and blank line (or while these are still buffered).  Another
possible reason is that the script printed something else - like an error
message - among the headers.   Check the error logs, put a dummy header
right at the top (for debugging only), check the "Idiot's Guide", and use
the debug mode of your CGI library.
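
One way to make such crashes visible while debugging is to wrap the real
work so that a valid header is always produced, whatever happens.   A
Python sketch follows; run_cgi and the handler convention (a function
returning the HTML body) are assumptions of this example, and showing
tracebacks to the browser is for debugging only.

```python
import traceback


def run_cgi(handler) -> str:
    """Call handler() and return a complete CGI response, even on failure."""
    try:
        # Build the whole body first, so a crash cannot leave half a header.
        body = handler()
        return "Content-Type: text/html\n\n" + body
    except Exception:
        # Debugging aid: report the error as a valid response, instead of
        # leaving the server to produce a bare Error 500.
        return ("Status: 500 Internal Server Error\n"
                "Content-Type: text/plain\n\n" + traceback.format_exc())


if __name__ == "__main__":
    import sys
    sys.stdout.write(run_cgi(lambda: "<title>Test</title>It works\n"))
```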

[Table of Contents] [Index]

5.4: I tried to use (Content-Type|Location|whatever), but it appears in my Browser?

That means you put the line in the wrong place.  It must appear in the
CGI Header, not the document body.  See previous question.

It's also possible that you didn't print a header at all, or had a blank
line or other noise before or in the header, but that the HTTPD has
corrected this error for you (servers which correct your errors may give
rise to the "works on A not on B" phenomenon).   See previous question.

[Table of Contents] [Index]

Section 6: Further Reading

6.1: Other FAQs/collections (including online book)

****	Lincoln Stein's FAQ is probably the most	****
****	important WWW document you will ever read.	****

Special Edition Using CGI (full book text available online)
http://www.mcp.com/que/et/se_cgi/

The Web Authoring FAQ by 'Galactus' Engelfriet and John Pozadzides
http://htmlhelp.com/links/wdgfaq.htm
(although at the time of writing the online version appears to be a little
behind the updated drafts posted).

For general WWW issues, the World Wide Web FAQ by Thomas Boutell
http://www.boutell.com/faq/

Another CGI FAQ, by Marc Hedlund
http://www.best.com/~hedlund/cgi-faq/

Perl/CGI programming FAQ, by Shishir Gundavaram and Tom Christiansen
http://www.perl.com/perl/faq/perl-cgi-faq.html

The Idiot's Guide to solving Perl/CGI problems by Tom Christiansen
http://www.perl.com/perl/faq/idiots-guide.html

The WWW Security FAQ by Lincoln Stein
http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html

CGI Resources Library
http://www.cgi-resources.com/

The WWW Virtual Library
http://WWW.Stars.com/Vlib/

[Table of Contents] [Index]

6.2: Reference Pages

The Common Gateway Interface (CGI)
http://www.ast.cam.ac.uk/%7Edrtr/cgi-spec.html
http://hoohoo.ncsa.uiuc.edu/cgi/interface.html

HyperText Transfer Protocol (HTTP)
http://www.w3.org/pub/WWW/Protocols/HTTP/

HyperText Markup Language (HTML)
http://www.w3.org/pub/WWW/MarkUp/

Up to Table of Contents

INDEX

The index is generated from an arbitrary list of keywords. If I've missed anything obvious that should be here, please let me know.