Client/Server, the Internet, and WWW
By David J. Greer
Abstract
Much of the Internet was made possible by client/server
computing. The World Wide Web (WWW) is a means of providing
hypertext access to the Internet using client/server protocols.
The WWW allows you to point at links to text, pictures, music, or
video located on servers anywhere in the world and then play the
files on your local client PC, workstation or terminal (along
with more links to related information). You never need to know
where the information is located or learn any obscure commands to
access it.
This presentation will teach you how the WWW client/server
architecture works, how to set up your own WWW server for MPE or
HP-UX, and what the differences are among various WWW clients.
You will also receive useful tips about how to find information
on the Web. David Greer set up Robelle's WWW service and he
participates in the development of Lynx, the character-mode WWW
client. David is the President of Robelle Consulting Ltd. and
the person in charge of Research and Development for its Qedit
and Suprtool products.
Robelle Consulting Ltd.
Unit 201, 15399-102A Ave.
Surrey, B.C. Canada V3R 7K1
Toll-free: 1-800-561-8311
Phone: (604) 582-1700
Fax: (604) 582-1799
E-mail: david_greer@robelle.com
WWW: http://www.robelle.com
Copyright Robelle Consulting Ltd. 1995-1996
Permission is granted to reprint this document (but not
for profit), provided that copyright notice is given.
Overview
The World Wide Web (WWW) (http://www.w3.org) is a collection of
servers distributed all over the world that respond to various
clients. The WWW allows you to click on links to text, pictures,
music, or video located on these servers and then to play the
selected files on your local client PC, workstation, or terminal,
along with more links to related information. You never need to
know where the information is located or to learn any obscure
commands to access it.
The on-line version of this paper is available as a linked set of
files (http://www.robelle.com/www-paper/overview.html) or as a
large single file (http://www.robelle.com/www-paper/paper.html).
Downloading this paper as a single file may take some time, but
has the advantage of making it convenient to save or print the
entire paper with your Web browser.
To help you understand the World Wide Web, we have organized this
paper into these major sections:
WWW Introduction (http://www.robelle.com/www-paper/intro.html)
To understand the WWW, it helps if you understand some basic
Web concepts. Fundamental to this understanding is the
concept of client/server computing on a global scale.
The Language of the Web
(http://www.robelle.com/www-paper/language.html)
Whether you're reading WWW documents or creating your own, it
helps if you understand the basic components of the WWW
language.
WWW Clients (http://www.robelle.com/www-paper/clients.html)
One powerful feature of the WWW is that the information you
publish on your server can be read by many different clients.
In this section, we provide a quick introduction to some of
the popular WWW clients.
WWW Servers (http://www.robelle.com/www-paper/servers.html)
If you want to make your own information available to WWW
clients, you'll want to set up your own server. In this
section, we discuss some common WWW server software and give
our suggestions for how WWW server information should be
designed.
Interesting Places to Visit
(http://www.robelle.com/www-paper/links.html)
The WWW is a big place. Here are a few pointers to some of
the things that we have liked or found useful.
Summing it Up (http://www.robelle.com/www-paper/summary.html)
These are our parting thoughts on client/server, WWW, and the
Internet.
Bibliography (http://www.robelle.com/www-paper/bib.html)
A short list of books that we have found very useful for
learning more about the WWW.
Jump on board for a ride on the Web. We hope that you'll find
enough information here to join us with your own WWW information.
Introduction
The WWW is a new way of viewing information -- and a rather
different one. If, for example, you are viewing this paper as a
WWW document, you will view it with a browser, in which case you
can immediately access hypertext links. If you are reading this
on paper, you will see the links indicated in parentheses and in
a different font. Keep in mind that the WWW is constantly
evolving. We have tried to pick stable links, but sites
reorganize and sometimes they even move. By the time you read
the printed version of this paper, some WWW links may have
changed.
The World Wide Web
The WWW project has the potential to do for the Internet what
Graphical User Interfaces (GUIs) have done for personal computers
-- make the Net useful to end users. The Internet contains vast
resources in many fields of study (not just in computer and
technical information). In the past, finding and using these
resources has been difficult.
The Web provides consistency: Servers provide information in a
consistent way and clients show information in a consistent way.
To add a further thread of consistency, many users view the Web
through graphical browsers that look and behave like the other
windowed applications (Microsoft Windows, Macintosh, or
X-Windows) that they use.
A principal feature of the Web is its links between one document
and another. These links, described in the section on hypertext,
allow you to move from one document to another. Hypertext links
can point to any server connected to the Internet and to any type
of file. These links are what transform the Internet into a web.
A History of the Web
The Web project was started by Tim Berners-Lee at the European
Particle Physics Laboratory (CERN) in Geneva, Switzerland. Tim
wanted to find a way for scientists doing projects at CERN to
collaborate with each other on-line. He thought of hypertext as
one possible method for this collaboration.
Tim started the WWW project at CERN in March 1989. In January
1992, the first versions of WWW software, known as Hypertext
Transfer Protocol (HTTP), appeared on the Internet. By October
1993, 500 known HTTP servers were active. When Robelle joined
the Internet in June 1994, we were about the 80,000th registered
HTTP server. By the end of 1994, it was estimated that there
were over 500,000 HTTP servers. Attempts to keep track of the
number of HTTP servers on the Internet have not been successful.
Programs that try to automatically count HTTP servers never stop
-- new servers are being added constantly.
On-Line versus Batch
This paper is available on the World Wide Web (on-line) or as a
paper document (batch). If you are reading this via Robelle's
WWW Service (http://www.robelle.com), you probably already know
how to access the on-line version.
Much of the value of the Web lies in its links between one
document and another. When you view this paper with a WWW
browser, the links are hidden from you. When you read the text
or paper copy of this paper, you see the links in parentheses.
Because links tend to be long, they do not format well in the
text and paper versions. Since more than half the effort of
writing this paper went into finding and testing the links, we
have left them in the text and printed versions, despite their
distracting appearance. We will describe what the links mean a
little later.
What is Hypertext?
Hypertext provides the links between different documents and
different document types. If you have used the Microsoft Windows
WinHelp system or the Macintosh
(http://emu.mit.edu/mac_resource.html) HyperCard application, you
likely know how to use hypertext. In a hypertext document, links
from one place in the document to another are included with the
text. By selecting a link, you are able to jump immediately to
another part of the document or even to a different document. In
the WWW, links can go not only from one document to another, but
from one computer to another.
Client/Server Computing
The last few years have seen an explosion of information about
client/server computing. For many people, the definition of
client/server is still unclear. We describe it as a method of
distributing applications over one or more computers. A client
is one process that requests services of another process. These
processes can be on different computers or on the same computer.
The processes communicate via a networking protocol.
People often think of client/server computing in terms of local
area networks, PCs with graphical user interface capabilities,
and servers with information that is needed by the PC clients.
You do not have to implement client/server computing this way.
It is possible for the same computer to be both the client and
the server. The key point is that there is a communications
protocol that allows two processes (often on different computers)
to request and to respond to demands for services.
The Hypertext Transfer Protocol
When you use a WWW client, it communicates with a WWW server
using the Hypertext Transfer Protocol (HTTP)
(http://www.w3.org/pub/WWW/Protocols/). When you select a WWW
link, the following things happen:
1. The client looks up the hostname, makes a connection with
the WWW server, and sends its request.
2. The HTTP software on the server responds to the client's
request.
3. The client and the server close the connection.
Compare this with traditional terminal/host computing. Users
usually logon (connect) to the server and remain connected until
they logoff (disconnect). An HTTP connection, on the other hand,
is made only for as long as it takes for the server to respond to
a request. Once the request is completed, the client and the
server are no longer in communication.
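The three steps above can be sketched in a few lines of Python.
This is only an illustration under our own assumptions: the
function names are hypothetical, the request is the simplest
HTTP/1.0 form, and the Host header is an addition that many
servers expect but that the steps above do not mention.

```python
import socket


def build_request(path, host):
    """Return a minimal HTTP/1.0 GET request as bytes."""
    lines = [f"GET {path} HTTP/1.0", f"Host: {host}", "", ""]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response):
    """Split the first line of a reply into (version, code, reason)."""
    first_line = response.split(b"\r\n", 1)[0].decode("ascii", "replace")
    version, code, reason = (first_line.split(" ", 2) + ["", ""])[:3]
    return version, int(code), reason


def fetch(host, path="/", port=80, timeout=10.0):
    """Make one connection, send one request, and read the whole reply."""
    # Step 1: look up the hostname and connect to the WWW server.
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_request(path, host))
        # Step 2: the server responds; keep reading until it stops sending.
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    # Step 3: leaving the "with" block closes the connection.
    return b"".join(chunks)
```

Calling fetch("www.robelle.com", "/") would make exactly one
connection and return the raw reply; a second link means a
second call and a second connection.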
WWW clients use the same technique for other protocols. For
example, if you request a directory at an anonymous FTP site
(e.g., ftp://ftp.robelle.com), the WWW client makes an FTP
connection, logs on as an anonymous user, switches to the
directory, requests the directory contents, and then logs off the
FTP server. If you then select a file, the WWW client once again
makes an FTP connection, logs on again, changes directories,
downloads the file, and then logs off. If you use an FTP client
to do the same thing, you would normally log on to the FTP
server, change directories several times, and download one or
more files. Only when you were finished would you log off.
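The directory request described above might look like this with
Python's standard ftplib module. It is a sketch, not what any
particular browser does internally; the function name and the
ftp_factory parameter (which lets you substitute a stand-in for
testing) are our own inventions.

```python
import ftplib


def list_directory(host, directory, ftp_factory=ftplib.FTP):
    """Connect, log on anonymously, list one directory, and log off."""
    ftp = ftp_factory(host)      # make the FTP connection
    try:
        ftp.login()              # no arguments means an anonymous logon
        ftp.cwd(directory)       # switch to the requested directory
        return ftp.nlst()        # request the directory contents
    finally:
        ftp.quit()               # log off; the next request starts over
```

Each call repeats the whole connect, log on, and log off cycle,
which is the behavior that distinguishes a WWW client from an
interactive FTP session.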
The Internet
The Internet is the world's largest interconnected computer
network. Computers on the Internet communicate using the
Internet Protocol (IP) and the Transmission Control Protocol
(TCP). You identify individual computers by their IP address.
This address is a 32-bit number that is usually written as four
decimal octets separated by periods (e.g., 192.40.254.0).
Fortunately, you can usually
refer to a computer by its name (e.g., www.robelle.com
(http://www.robelle.com)).
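To make the relationship between the 32-bit number and its four
octets concrete, here is a small Python sketch. The function
names are our own; Python's standard ipaddress module offers the
same conversions if you prefer not to write them by hand.

```python
def dotted_to_int(address):
    """Convert a dotted address such as 192.40.254.0 to a 32-bit integer."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= octet <= 255 for octet in octets):
        raise ValueError(f"not a four-octet address: {address!r}")
    value = 0
    for octet in octets:
        # Each octet supplies the next eight bits of the number.
        value = (value << 8) | octet
    return value


def int_to_dotted(value):
    """Convert a 32-bit integer back to dotted notation."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))
```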
If you can send network packets to one computer on the Internet,
you can send network packets to any computer on the Internet.
This feature is what makes the Internet so powerful; it is also
what concerns system managers. If you can send packets to the
Internet, it follows that anyone can send packets to your
computer, even the PC on your desktop.
Accessing the Internet
If you are reading the text or paper version of this paper,
you're probably wondering "How do I get started on the Internet?"
It is much easier to connect an individual PC and a modem to the
Internet than it is to connect a server like an HP 3000 or HP
9000. We suggest that you find a local Internet access provider
to connect your PC to the Net. Most access providers include
everything you need to log on and start exploring. In addition,
several books on connecting to the Internet also provide all the
software and the telephone numbers of Internet access providers
you need to get started.
Once you're connected to the Internet, you can begin
investigating many of the sites described in this paper. You
will also be able to access and download much of the software
needed to create your own WWW application which, as we discuss
further on, can be of help to you, even if you never plan to
connect your servers to the Internet.
The Language of the Web
In order to use the WWW, you must know something about the
language used to communicate in the Web. There are three main
components to this language:
Uniform Resource Locators (URLs)
URLs provide the hypertext links between one document and
another. These links can access a variety of protocols (e.g.,
ftp, gopher, or http) on different machines (or your own
machine).
Hypertext Markup Language (HTML)
WWW documents contain a mixture of directives (markup), and
text or graphics. The markup directives do such things as
make a word appear in bold type. This is similar to the way
UNIX users write nroff or troff documents, and MPE users write
with Galley, TDP, or Prose. For PC users, this is completely
different from WYSIWYG editing. However, a number of tools
are now available on the market that hide the actual HTML.
Common Gateway Interface (CGI)
Servers use CGI to execute local programs. CGI scripts
provide a gateway between the HTTP server software and the
host machine.
Uniform Resource Locators (URLs)
Uniform Resource Locators
(http://www.w3.org/hypertext/WWW/Addressing/URL/Overview.html)
(URLs) specify the access-method (how), the server name (where),
and the location (what) needed for a WWW client to find and
access a WWW object. The general form of a URL is
access-method://server-name[:port]/location
Access Methods
The three most popular access methods are
http:
This is the method provided by WWW servers. It includes
hypertext linking, the hypertext markup language, and server
scripts.
gopher:
Gopher (gopher://gopher.micro.umn.edu) was developed at the
University of Minnesota as a distributed campus information
service. There are gopher servers everywhere -- many of them
provide campus-wide information systems. Gopher information
is organized into menus. Because hypertext provides the same
services as gopher and more, many sites are moving from
gopher-supplied information to WWW-supplied information.
ftp:
The File Transfer Protocol is one of the oldest and most
popular of all Internet services. You can access millions of
files, documentation, source code, and other useful objects on
anonymous FTP archives. You can use a WWW browser to view and
to retrieve information from FTP archives.
Server Name
The server name is an IP host name or an IP address. WWW
servers often start with the name "www" as in www.robelle.com
(http://www.robelle.com) or www.mayfield.hp.com
(http://www.mayfield.hp.com).
The port number is usually not needed. If there are many servers
on one machine (e.g., two different WWW servers on the same
host), you would use a port number to select one of them. By
default, WWW servers are on port 80. Other protocols have
different ports (e.g., the default for FTP is 21). Most users
never need to know about port numbers.
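The general form of a URL can be taken apart with Python's
standard urllib.parse module. The sketch below is illustrative
only: the describe_url name and the dictionary keys are our own,
and the table of default ports covers just the three access
methods discussed here.

```python
from urllib.parse import urlsplit

# Default ports for the three access methods described in this section.
DEFAULT_PORTS = {"http": 80, "ftp": 21, "gopher": 70}


def describe_url(url):
    """Split a URL into access method, server name, port, and location."""
    parts = urlsplit(url)
    port = parts.port
    if port is None:
        # No explicit port was given, so fall back on the default.
        port = DEFAULT_PORTS.get(parts.scheme)
    return {
        "access_method": parts.scheme,
        "server_name": parts.hostname,
        "port": port,
        "location": parts.path or "/",
    }
```

For example, describe_url("ftp://ftp.robelle.com") reports the
access method ftp, the server name ftp.robelle.com, and the
default port 21, even though the URL itself never mentions a
port.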
Welcome Page
Most WWW servers provide a welcome or home page. This is the
document that you see if you specify a machine name, but not a
document name (see all the examples above under "Server Name").
Good WWW welcome pages provide a short description of the
information the WWW server provides, as well as links to all the
other information available on the server. The welcome page must
be explicitly configured for each WWW server. If you access a
WWW server without giving a document name, and receive the error
message "no document found", you should try one of the following
common document names: welcome.html, index.html, or
default.html.
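If you find yourself trying these names often, the list of
candidates is easy to generate. This is a trivial sketch with a
hypothetical function name; it only builds the URLs and does not
check whether any of them exist.

```python
# The three common welcome-page names mentioned above.
COMMON_WELCOME_NAMES = ("welcome.html", "index.html", "default.html")


def welcome_candidates(server_name):
    """Return the URLs worth trying when a server has no welcome page."""
    return [f"http://{server_name}/{name}" for name in COMMON_WELCOME_NAMES]
```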
Location
The location can be a filename, a directory, a directory and
filename, a server-script name, or something specific to the
access-method. Filenames and directory structure often change,
so don't be surprised if a URL that worked a few months ago no
longer works now.
Hypertext Markup Language (HTML)
When you write documents for WWW, you use the Hypertext Markup
Language (HTML)
(http://www.ncsa.uiuc.edu/General/Internet/WWW/HTMLPrimerP1.html).
In a markup language, you mix your text with the marks that
indicate how formatting is to take place. Most WWW browsers have
an option to "View Source" that will show you the HTML for the
current document that you are viewing.
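To show how marks and text are mixed, the following Python
sketch feeds a one-line HTML sample to the standard html.parser
module and records the marks and the hypertext links it finds.
The sample document and the class and function names are made up
for illustration.

```python
from html.parser import HTMLParser

# A made-up sample: a heading, a paragraph, bold text, and one link.
SAMPLE = (
    "<h1>Robelle</h1>"
    "<p>Read the <b>WWW</b> "
    '<a href="http://www.robelle.com/www-paper/overview.html">paper</a>.</p>'
)


class LinkCollector(HTMLParser):
    """Remember every opening mark and every link target seen."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag == "a":
            # An anchor mark carries its link target in the href attribute.
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def collect(html_text):
    """Return (marks, links) found in a piece of HTML."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.tags, parser.links
```

Running collect(SAMPLE) reports the marks h1, p, b, and a, along
with the single URL that the anchor points to. A browser does
the same kind of scan before deciding how to render each mark.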
Each WWW browser renders HTML in its own way. Character-mode
browsers use terminal highlights (e.g., inverse video, dim, or
underline) to show links, bold, italics, and so on. Graphical
browsers use different typefaces, colors, and bold and italic
formats to display different HTML marks. Writers have to
remember that each browser in effect has its own HTML style
sheet. For example, Lynx and Mosaic do not insert a blank line
before unnumbered user lists, but Netscape does.
If you want to see how your browser handles standard and
non-standard HTML, try the WWW Test Pattern
(http://www.uark.edu/~wrg/). The test pattern will show
differences between your browser, standard HTML, and other
browsers.
Creating HTML
Creating HTML is awkward, but not that difficult. The most
common method of creating HTML is to write the raw markup
language using a standard text editor. If you are creating HTML
yourself, we have found the chapter Authoring for the Web in the
O'Reilly (http://www.ora.com) book "Managing Internet Information
Services" to be an excellent resource. You might also find the
HTML Quick Reference
(http://kuhttp.cc.ukans.edu/lynx_help/HTML_quick.html) to be
useful.
Bob Green, founder of Robelle,
finds HTML Writer (http://lal.cs.byu.edu/people/nosack) to be
useful for learning HTML. Instead of hiding the HTML tags, HTML
Writer provides menus with all of the HTML elements and inserts
these into a text window. To see how your documents look, you
must use a separate Web browser.
If you don't want to deal directly with HTML, you can get a
WYSIWYG HTML editor. On the PC, we have tried HoTMetal and the
Microsoft Word Internet add-on. HoTMetal is produced by SoftQuad
(http://www.sq.com). There is a free version, which we found
somewhat unreliable, and a professional version. HoTMetal
probably works best if you are writing HTML documents from
scratch (we tried to edit existing documents, some of which may
have had invalid HTML).
Microsoft has produced a new add-on to Microsoft Word that
produces HTML. The Internet Assistant
(http://www.microsoft.com/msoffice/freestuf/msword/download/ia/default)
is available from Microsoft at no charge. You will need to know
the basic concepts of Microsoft Word to take advantage of the
Internet Assistant. Since we are not experienced Microsoft Word
users, we found that the Internet Assistant didn't help us much.
The HTML area of WWW is changing quickly. Users do not want to
go back to ASCII text editing after they've used WYSIWYG editors
for the last several years. The Web itself carries a list of
WYSIWYG HTML editors
(http://www.yahoo.com/Computers/World_Wide_Web/HTML_Editors) for
a variety of operating systems.
Common Gateway Interface (CGI)
The Common Gateway Interface (CGI)
(http://hoohoo.ncsa.uiuc.edu/cgi/overview.html) provides a method
for WWW servers to invoke other programs. You can write these
programs with any tool or language. They usually return HTML as
their output. The Robelle WWW server statistics
(http://www.robelle.com/server.html) are provided by a CGI script
that runs the getstats program
(http://www.eit.com/software/getstats/getstats.html).
Forms
The WWW supports simple forms
(http://www.robelle.com/forms/comments.html) with text boxes,
radio buttons, and pull-down lists. Forms are processed by CGI
scripts.
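A CGI script can be very small. The Python sketch below assumes
a form with a single field called name (our invention) submitted
with the GET method, so that the server passes the field in the
QUERY_STRING environment variable. The script writes a content
type header, a blank line, and then HTML.

```python
import html
import os
from urllib.parse import parse_qs


def cgi_response(environ):
    """Build the complete CGI output for a one-field greeting form."""
    fields = parse_qs(environ.get("QUERY_STRING", ""))
    # "name" is a hypothetical form field; use a default if it is missing.
    name = fields.get("name", ["visitor"])[0]
    # Escape the value so that form input cannot inject its own markup.
    body = f"<html><body><p>Hello, {html.escape(name)}.</p></body></html>"
    # A header, a blank line, and then the document itself.
    return "Content-Type: text/html\n\n" + body + "\n"


if __name__ == "__main__":
    # When run by a WWW server, the form data arrives in the environment.
    print(cgi_response(os.environ), end="")
```

Keeping the logic in a function that takes the environment as an
argument makes the script easy to try without a server.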
WWW Clients
You will likely first experience the World Wide Web through a WWW
client. In WWW terms, these are called browsers. Browsers are
available for almost all major computer platforms; however, you
also need the appropriate network infrastructure to make them
work.
Network Infrastructure
What browser you use depends largely on how you are connected
to the Internet. If you are using a terminal emulator and a
serial connection, you will most likely use a character-mode
browser. If you can send network packets from your computer
to the Internet, you will probably use a graphical-mode
browser.
Character-Mode Browsers
A popular character-mode browser is Lynx
(http://www.cc.ukans.edu/about_lynx/about_lynx.html). You
cannot use Lynx to display graphical images, but it does
support forms, as well as all HTML 2.0.
Graphical Browsers
Three popular graphical browsers are Mosaic
(http://www.ncsa.uiuc.edu), Netscape (http://www.mcom.com) and
Microsoft Internet Explorer
(http://www.microsoft.com/ie/msie.htm).
Mosaic and Netscape are available for Microsoft Windows,
X-Windows, and the Macintosh, while Microsoft's IE is only
available for Microsoft Windows. Mosaic and Microsoft IE are
free to anyone; Netscape is free to any not-for-profit
institution.
Network Infrastructure
How you connect to the Internet affects how you view the WWW. If
you connect via a modem, you won't be able to view large WWW
pages, images, sounds, or video; if you have a T1 connection
(1.544M bits/second), you will be able to enjoy these features.
Some WWW pages assume that you have a fast connection to the
Internet.
Local Area Networks
If your Local Area Network has a gateway to the Internet (there
are several different methods to do this), you should be able to
use a graphical browser on your own workstation to cruise the
WWW. If you are using a PC with Microsoft Windows, you'll need
to have a Winsock
(http://www.microsoft.com/pages/developer/winsock/default.html)
interface installed (in addition to the regular networking
configuration). Macintosh users already have network support via
MacTCP. UNIX workstation users should also have built-in support
for networking.
Dial-in Access
There are two methods of dialing into a machine to get access to
the Internet. If you dial in and log on as usual (on UNIX you
see "login:" and a shell prompt, or on MPE you type "HELLO" and get
a colon prompt), your computer is not directly connected to the
Internet, so it cannot send network packets from your PC to the
Internet. In this case, you will have to use Lynx to access the
WWW.
If you dial-in using SLIP (Serial Line IP) or PPP (Point-to-Point
Protocol), your computer becomes part of the Internet, which
means it can send network packets to and from the Internet. In
this case, you can use graphical browsers like Mosaic or Netscape
to access the WWW. The Internet Adapter
(http://marketplace.com/tia/tiahome.html) is supposed to allow
users with only shell account access to obtain a SLIP connection.
Shiva (http://www.shiva.com) and Livingston
(http://www.livingston.com) provide products that allow users to
dial into hosts using SLIP or PPP.
Character-Mode Browsers
While Lynx is not the only character-mode browser, it is one of
the most powerful. Lynx (ftp://ftp2.cc.ukans.edu/pub/lynx) is
available for many platforms. You can obtain a pre-compiled
version of Lynx for MPE/iX from
(http://jazz.external.hp.com/src/www_src/index.html).
Some users are disappointed that Lynx's display is limited to
text. What Lynx does demonstrate is that a single server can
provide information to both character-mode and graphical clients.
Still, to gain a full understanding of how powerful the
client/server concept can be, you should compare Lynx's
capabilities to the capabilities of graphical browsers such as
Mosaic or Netscape.
Graphical Browsers
Mosaic is one of the tools that makes the WWW so popular. With
Mosaic, you can view in-line graphical images surrounded by
proportional font text in multiple colors. For an excellent
introduction to Mosaic, see the O'Reilly book The Mosaic Handbook
(http://www.ora.com). Three versions of the book are available
(Windows, Macintosh, and X-Windows). The PC version of Mosaic
requires the Win32s subsystem which is described in the Mosaic
readme file
(ftp://ftp.ncsa.uiuc.edu/Web/Mosaic/Windows/README.TXT).
While Mosaic is popular, the newer Netscape browser is even more
appealing, especially when used with slower network connections.
Earlier versions of Mosaic did not display anything until an
entire URL (and its associated graphical images) had been
downloaded. Netscape, by contrast, starts displaying as soon as
a screenful of information is available. As you page down
through a document, Netscape barely pauses as it continues to
download the URL in the background.
The newest graphical browser is the Microsoft Internet Explorer
(http://www.microsoft.com/ie/msie.htm). This browser is part of
Microsoft's strategy to make the Internet an important part of
all Microsoft products. Like Netscape, the Microsoft IE also
does background network transfers. We prefer Netscape over
Microsoft IE, due to Netscape's user interface and better
reliability.
External Viewers
Neither Mosaic nor Netscape tries to handle all the data that can
potentially be served up on the Web. They both understand HTML,
in-line graphics, and URLs. Netscape can display external GIF
(Graphics Interchange Format) files, but Mosaic cannot. To view
images, listen to sound, watch movies, or view spread sheets, you
must have external tools
(http://www.ncsa.uiuc.edu/SDG/Software/WinMosaic/viewers.htm) to
support these data formats. For Microsoft Windows users, a
popular graphical viewer is LView
(ftp://ftp.ncsa.uiuc.edu/PC/Windows/Mosaic/viewers). The Mosaic
Handbook provides a good introduction to the external tools that
you need to support full multimedia applications. Most of these
tools also work with Netscape.
WWW Servers
WWW servers provide information to the Web. Server software is
available for many computer platforms, but setting up a server
isn't always easy.
Why Set Up a WWW Server?
Even if you don't have an Internet connection, there are lots
of uses for an internal WWW server.
WWW Server Design
Setting up a server to provide information to the many
different Internet clients requires extra thought, but the
effort is worth it.
Setting Up Your WWW Server
Server software exists for UNIX, MPE, Windows NT, Microsoft
Windows, and even MS-DOS.
Maintaining Your WWW Server
Like most applications, your WWW server will need a little
help from time to time.
Why Set Up a WWW Server?
If you have a full-time Internet connection, you might want to
set up a WWW server to provide information about your company,
your division, your group, or yourself. Even if you are not
connected to the Internet, you still might want to set up a
server.
Hypertext is a useful way to distribute information because it
can contain mixed text and graphics (or more), as well as links
to other documents. Using WWW servers, you can create
sophisticated help systems without a lot of work. Once
established, these systems then become available to all users on
your internal network who have suitable client software
(browsers).
With CGI scripts and e-mail, you can automate forms which you now
process by hand (e.g., expense reports, travel reports, or
purchase requisitions). With some extra work, you could even
have the forms processed directly into a database. You can also
design scripts to look up information in your existing databases
and display it for clients.
If your users are pushing for Microsoft Windows interfaces to all
of their database data, you can use your WWW server as an
intermediate solution. This way users get an immediate graphical
interface and managers can experience the difficulties of
managing client/server configurations.
WWW Server Design
When you set up a WWW server, keep in mind that many different
clients will be accessing your server. If your server is
available on the Internet, you should not assume that the clients
will all have high-speed Internet connections and graphical
browsers.
Consider these things when designing your WWW server:
* Concentrate on your text. Well-written text conveys a lot of
information. If you use text to convey essential information,
then your server will be friendly to text-based clients like
Lynx.
* Organize documents the way you would organize a book: gather
information together into chapters; each chapter should
describe a single idea or related topics. Provide
navigational tools (like previous or next chapter) and an
overview with a table-of-contents. We have attempted to have
all of these elements in this paper.
* Question each graphical image that you provide. Does the
graphic add meaning to the text or is it just neat? Compare
the size of the graphics file to the size of your text files.
If the graphical image is much larger, does it really add a
lot of necessary information?
* If your WWW server is on a fast network, do all the clients
have fast access to your server? You may have a T1 connection
(1.544M bits/sec), but many WWW clients connect via 14.4K
bits/sec modems. Some commercial Internet providers even charge by the
hour, which makes it more expensive for clients to download
large files and graphics. If your clients have a fast
connection to the Internet, you can provide more graphical
information and larger text files without annoying them.
Nevertheless, it's a good idea to keep these limitations in
mind when you're developing your server.
* Try to keep files to a reasonable size (we suggest three to
ten thousand bytes long). When converting existing documents
to HTML, remember that they will often end up quite large
(tens of thousands of bytes). Do clients want to download
such a large file only to find that it is of no interest? The
converse is also true. Can clients download a single file
with the complete text (e.g., this paper), without having to
follow all the hypertext links?
* Hypertext does not mean disorganization. Provide an index or
a table of contents to your web pages, so users can quickly
find information. Provide summaries for long articles and
files.
* Use graphic-design common sense. Use white space to increase
readability. If you use special effects (bold, italics,
underline, horizontal rules, etc.), use them sparingly to
increase their effect.
* If your WWW server is available on the Internet, many visitors
will access your server out of curiosity. Make your welcome
page attractive, but clearly identify what information your
WWW server is providing. Of all the files you publish, be
most careful of the size of your welcome page. It will likely
be the most frequently accessed page.
We also suggest that you look at the W3 Style Guide
(http://www.w3.org/hypertext/WWW/Provider/Style/Overview.html).
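The file-size advice above is easier to judge with some rough
arithmetic. The Python sketch below estimates transfer time from
file size and line speed. It ignores protocol overhead, modem
compression, and network congestion, so treat the results as
best-case figures; the names are our own.

```python
MODEM_14_4 = 14_400      # bits per second for a 14.4 modem
T1 = 1_544_000           # bits per second for a T1 connection


def download_seconds(size_bytes, bits_per_second):
    """Estimate the best-case transfer time for a file, in seconds."""
    return size_bytes * 8 / bits_per_second
```

By this estimate, a 10,000-byte file takes about five and a half
seconds over a 14.4 modem and about a twentieth of a second over
a T1 line. A 50,000-byte converted document takes nearly half a
minute over the modem.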
Setting Up A WWW Server
First, you need to decide what computer will host your WWW
information (or you could pick several hosts). If your WWW
server will make information available to many machines, the host
must be connected to your network or the Internet.
While WWW server software is available for a variety of machines,
each server software package runs only on certain operating
systems. The server software you pick will have to be compatible
with the host machine that provides the WWW service.
WWW Server Software
W3 maintains a good list of WWW Server software
(http://www.w3.org/hypertext/WWW/Daemon/Overview.html). Two of
the most popular UNIX WWW server software packages are NCSA HTTPD
(http://hoohoo.ncsa.uiuc.edu) and CERN HTTPD
(http://www.w3.org/hypertext/WWW/Daemon/Status.html). A
pre-compiled copy of the NCSA HTTPD software is available for
MPE/iX (http://jazz.external.hp.com/src/www_src/index.html).
Windows NT is becoming more popular as a WWW server, largely due
to its built-in networking support and its familiar Windows
interface. Free Windows NT HTTP Server software
(http://emwac.ed.ac.uk/html/internet_toolchest/https/contents.html)
is available from the European Microsoft Windows NT Academic
Center (http://emwac.ed.ac.uk). The Robelle Windows NT WWW
Server (http://wwwnt.robelle.com) uses the O'Reilly Website
(http://website.ora.com) software. Website comes with
comprehensive documentation -- something other server software is
lacking.
Configuration and management are different for each package. We
found the O'Reilly Book (http://www.ora.com) Managing Internet
Information Services to be a valuable resource in setting up our
WWW servers. The book is an excellent introduction to HTML, with
many good examples of configurations. Unfortunately, the book
only covers the configuration of the NCSA HTTPD software.
Security
The CERN and NCSA HTTPD packages allow the WWW administrator to
configure security. By default, both packages allow anyone to
connect to your WWW service. However, you can configure the
servers to allow connections only from specific IP addresses (be
sure to do this if your WWW service is for internal use only).
You can also password protect individual files. The MPE WWW
Server (http://jazz.external.hp.com/demo.html) includes a
demonstration of the NCSA security features.
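The address check described above can be sketched in a few lines
of Python. This is only an illustration of the idea, not the
configuration syntax of the CERN or NCSA servers; the network
ranges and the function name is_allowed are assumptions chosen
for the example.

```python
import ipaddress

# Hypothetical internal networks; substitute your own address ranges.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.168.10.0/24"),
    ipaddress.ip_network("10.1.0.0/16"),
]


def is_allowed(client_address):
    """Return True if the client address falls inside an allowed network."""
    addr = ipaddress.ip_address(client_address)
    return any(addr in net for net in ALLOWED_NETWORKS)
```

A server restricted this way would refuse any request for which
is_allowed returns False before looking at the requested file.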
By default, the CERN and NCSA server software allow individual
users to publish their own directories of hypertext files. If
someone specifies a URL with a directory starting with tilde (~),
the server software treats the rest of that name as a user name
and looks in that user's home directory for the directory
public_html.
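As a rough sketch of the tilde lookup, the Python below maps a
URL path to a file name. It assumes that home directories live
under /home (the real location depends on your system) and the
function name resolve_user_url is ours, not part of any server.
It also rejects paths that try to climb out of public_html,
which is a sensible precaution for this kind of mapping.

```python
import posixpath

HOME_ROOT = "/home"        # assumption: where user home directories live
USER_DIR = "public_html"   # directory name given in the text


def resolve_user_url(url_path):
    """Map /~user/some/file.html to a file under the user's public_html.

    Returns None if the path does not start with a tilde directory or
    if it would end up outside the user's public_html directory.
    """
    if not url_path.startswith("/~"):
        return None
    user, _, rest = url_path[2:].partition("/")
    if not user:
        return None
    base = posixpath.join(HOME_ROOT, user, USER_DIR)
    full = posixpath.normpath(posixpath.join(base, rest))
    # Refuse paths such as /~david/../../etc/passwd that leave base.
    if full != base and not full.startswith(base + "/"):
        return None
    return full
```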
Writing HTML
Once you have the WWW server software running, you need to create
WWW information. WWW documents use the Hypertext Markup Language
(HTML). See the HTML description
(http://www.robelle.com/www-paper/language.html) earlier in this
paper for suggestions and tools for writing HTML.
Be sure to test your files before adding them to your WWW server.
We test with at least three different browsers (Lynx, Mosaic, and
Netscape). We also use Weblint
(http://www.khoros.unm.edu/staff/neilb/weblint.html) on all of
our Web documents. Weblint checks for common errors in HTML.
While Weblint isn't perfect, it does help produce HTML that is
acceptable to the widest range of WWW browsers.
Weblint is written in Perl (http://www.cis.ufl.edu/perl). To use
Weblint, you must have a working copy of Perl. Perl is short for
"Practical Extraction and Report Language". Perl is designed to
be more powerful than the shell, but easier to use than C.
Host Name
If your WWW server is available on the Internet, it's a good idea
to create an alias for the actual computer that hosts your WWW
service. Most people choose "www" as the alias name. This will
make it easier for you to change the host without affecting users
of your WWW service.
Robots
WWW servers on the Internet are often visited by robots
(http://web.nexor.co.uk/mak/doc/robots/robots.html). Robots
usually visit Web sites in order to create indexes of the
information that you publish on your WWW server. Since robots
can cause problems for a WWW server, it's a good idea to create a
robots.txt (http://web.nexor.co.uk/mak/doc/robots/norobots.html)
file. This file tells well-behaved robots which parts of your
WWW server they may visit. You might want to exclude graphical
images, CGI scripts, and forms from a robot search, but include
all other information about your WWW server.
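To see how a well-behaved robot interprets such a file, here is
a small Python sketch that uses the standard urllib.robotparser
module. The robots.txt contents, the directory names, the host
www.example.com, and the robot name are all placeholders for
illustration; your own exclusions will differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the directory names are placeholders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Disallow: /cgi-bin/
Disallow: /forms/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def robot_may_visit(path, agent="ExampleRobot"):
    """Return True if the sample robots.txt lets the robot fetch path."""
    return parser.can_fetch(agent, "http://www.example.com" + path)
```

With this file, robot_may_visit("/index.html") is allowed, while
anything under /cgi-bin/ is not. Keep in mind that robots.txt is
only a request; a badly behaved robot can ignore it.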
Internal WWW Servers
If your WWW server will only be available on a Local Area
Network, you have more flexibility in your design. Since users
will have reasonably fast access to the server, you can make your
HTML pages larger. You can also distribute more binary objects,
such as graphics, word-processing documents, and spreadsheets.
You do have to remember to configure each client browser with the
information on how to handle each filename suffix (e.g., you
might want to associate ".doc" with Microsoft Word). See the
section on External Viewers in the Clients section of this paper
for more information.
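The suffix association amounts to a lookup table. The Python
sketch below shows the idea; the table contents, the helper
program names, and the function viewer_for are assumptions made
up for this example, since each browser stores this information
in its own configuration format.

```python
import os

# Hypothetical table: suffix -> (MIME type, helper application).
# The helper names are placeholders for whatever is installed on the client.
VIEWERS = {
    ".doc": ("application/msword", "winword.exe"),
    ".xls": ("application/vnd.ms-excel", "excel.exe"),
    ".gif": ("image/gif", None),  # None: the browser displays it itself
}


def viewer_for(filename):
    """Look up the MIME type and helper for a file, ignoring suffix case."""
    suffix = os.path.splitext(filename)[1].lower()
    return VIEWERS.get(suffix, ("application/octet-stream", None))
```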
Maintaining Your WWW Server
Once you have your WWW server working, you need to continue
maintaining it. The Web is changing rapidly. You need to ensure
that you obtain newer versions of the HTTPD software from the
original source.
All WWW server software can produce log files. If you do enable
log files (some software has them enabled by default and others
not), they usually grow without bounds. At Robelle, we make a
copy of the current log files once a day and then we empty them.
We keep the daily copies for approximately 60 days. This lets us
provide statistics (http://www.robelle.com/server.html) about our
WWW service through the getstats program
(http://www.eit.com/software/getstats/getstats.html).
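A daily job along these lines could be sketched in Python as
follows. This is not the procedure we actually run; the function
name rotate_log, the dated file-name scheme, and the archive
directory are assumptions for the example. Only the 60-day
retention period comes from the description above.

```python
import datetime
import os
import shutil

KEEP_DAYS = 60  # the text keeps daily copies for approximately 60 days


def rotate_log(log_path, archive_dir, today=None):
    """Copy the log to a dated file, empty the original, and delete
    dated copies older than KEEP_DAYS. Returns the path of the new copy."""
    today = today or datetime.date.today()
    os.makedirs(archive_dir, exist_ok=True)
    base = os.path.basename(log_path)
    copy_path = os.path.join(archive_dir, f"{base}.{today:%Y%m%d}")
    shutil.copy2(log_path, copy_path)
    # Truncate rather than delete, so the log file itself still exists.
    with open(log_path, "w"):
        pass
    cutoff = today - datetime.timedelta(days=KEEP_DAYS)
    for name in os.listdir(archive_dir):
        prefix, _, stamp = name.rpartition(".")
        if prefix != base:
            continue
        try:
            day = datetime.datetime.strptime(stamp, "%Y%m%d").date()
        except ValueError:
            continue  # not one of our dated copies
        if day < cutoff:
            os.remove(os.path.join(archive_dir, name))
    return copy_path
```

Entries written between the copy and the truncation would be
lost, so a job like this is best run at a quiet time of day.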
Because more and more users are joining the Internet, you will
likely want to continue to improve and expand your WWW
information. This is a challenge, since the conversion and
authoring tools are not yet well developed. At Robelle, we have
tried to automate some of the production of our WWW information.
For example, when the most recent change notices for Qedit/MPE
(http://www.robelle.com/ftp/changes/qeditmpe.txt), Qedit/UX
(http://www.robelle.com/ftp/changes/qeditux.txt), Suprtool/MPE
(http://www.robelle.com/ftp/changes/suprtool.txt) and Suprtool/UX
(http://www.robelle.com/ftp/changes/suprux.txt) are released,
they are automatically posted to the Robelle FTP Service
(ftp://ftp.robelle.com).
Interesting Places to Visit
The WWW is a huge place. The following are a few personal
recommendations for sites that we have found interesting or
useful. Your mileage may vary.
Virtual References
The Web contains links to everywhere. We show you a few sites
that have a lot of excellent reference materials.
Travel Resources
Finding good travel information is a challenge. Here are a
few suggestions for WWW travel resources.
Searching WWW
So much information is available via the WWW that finding the
answer to a specific question can be hard. Here are some WWW
search engines that help you to search the Web.
Virtual References
Yahoo (http://www.yahoo.com) contains links to many Internet
resources organized into subject categories.
If you have ever had trouble finding someone's e-mail address,
try the Four11 Directory Services (http://www.four11.com) or
WhoWhere? (http://www.whowhere.com). You can also add
your own e-mail address and other information about yourself to
the Four11 or WhoWhere? directories.
Travel Resources
Curious about a city, a region, or a country? Planning for that
big trip across Europe or Asia? You might first want to check
out one of these travel resources.
We have found the Rec.Travel Library
(http://www.solutions.mb.ca/rec-travel) to be useful. The travel
library is based on discussions from the rec.travel newsgroup.
O'Reilly and Associates (http://www.ora.com) publishes technical
books, especially about UNIX. O'Reilly was one of the first
companies to publish an on-line magazine, called The Global
Network Navigator (http://gnn.com/GNNhome.html). GNN includes
the GNN Travel Center
(http://gnn.com/meta/travel/index.html), with current travel
information and links to many Internet travel resources.
Internet travel resources tend to be organized into major areas
(e.g., Canada and the US, Europe, Asia). You often have to be
patient when accessing their indexes, since they cover all
countries and cities in an area. Keep in mind that England,
Scotland, and Wales are usually indexed under United Kingdom,
which is at the end of any listing for Europe.
Searching WWW
Users have invented robots
(http://web.nexor.co.uk/mak/doc/robots/robots.html) to search the
Web for documents. Since searches take a long time, these robots
usually index everything they find into a database. A WWW server
then provides the tools to search each database. For example,
InfoSeek (http://www2.infoseek.com/), Lycos
(http://lycos.cs.cmu.edu/), Alta Vista from Digital
(http://altavista.digital.com), WebCrawler Search Database
(http://webcrawler.com/), or Architext Excite
(http://www.excite.com/query.html) are all good. Because these
databases are indexed from the entire WWW, you usually have to
qualify your searches in order to find what you are looking for.
For example, if you search for "travel" you will likely have too
many choices, but if you search for "travel Alaska" the list may
be just what you want. Each database is different, so be sure to
try two or three before giving up on your search for information
on the Web. MetaCrawler
(http://metacrawler.cs.washington.edu:8080/index.html) will
search many of the popular search databases at once.
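To show why adding a second word narrows a search, here is a toy
version of such an index in Python. Real search services are far
more elaborate; the page list, the example.com URLs, and the
function names are invented for this sketch, which simply
returns the pages that contain every word in the query.

```python
from collections import defaultdict

# Hypothetical page titles standing in for documents a robot has fetched.
PAGES = {
    "http://www.example.com/alaska.html": "Travel guide to Alaska",
    "http://www.example.com/europe.html": "Travel tips for Europe",
    "http://www.example.com/qedit.html": "Qedit change notice",
}


def build_index(pages):
    """Build a word -> set-of-URLs index from the fetched pages."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs containing every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


INDEX = build_index(PAGES)
```

Here a search for "travel" matches two pages, while "travel
Alaska" matches only one.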
Summary
The World Wide Web demonstrates how powerful client/server
computing can be. If you are thinking of implementing
client/server computing in your organization, it wouldn't hurt to
first take a look at the Web.
A WWW server is an application. System managers must pay
attention to the security and maintenance problems that go with
any large application.
Creating Web documents is time-consuming. It took me at least
twice as long as I expected to write this paper. I spent a lot
of the time finding and checking the many WWW links. With our
9600-baud connection to the Internet, this was a slow process.
Tools for creating HTML are still in their infancy. We expect a
lot of new tools to appear in the next year to help create HTML.
It's easy to waste time on the Web, but it is one of the largest
and most up-to-date resources available anywhere in the world.
Get an Internet connection, a WWW client program, and start
surfing!
Bibliography
Here is a short list of books that we have found very useful in
understanding the WWW and in creating our own WWW services.
Managing Internet Information Services
Managing Internet Information Services
Cricket Liu, Jerry Peek, Russ Jones, Bryan Buus, and Adrian
Nye
O'Reilly and Associates, Inc.
ISBN: 1-56592-051-1
If you are managing any Internet information services (e.g., ftp,
gopher, or WWW), you should get this book. The book includes an
excellent primer on writing HTML. There are lots of hints on how
to set up your own WWW server and extensive documentation on the
NCSA server software for UNIX. The book also includes examples
of CGI scripts.
The Mosaic Handbook
The Mosaic Handbook
Dale Dougherty and Richard Koman
O'Reilly and Associates, Inc.
ISBN: 1-56592-094-5
There are three versions of this book: MS Windows, Macintosh,
and X-windows. The book includes a copy of Enhanced Mosaic.
There is a good explanation of the WWW and how clients and
servers work together. The chapter Using Mosaic for Multimedia
includes a description of MIME types, how to configure them, and
some suggestions for external viewers. This section of the book
would apply to any graphical browser.
Teach Yourself Web Publishing with HTML in a Week
Teach Yourself Web Publishing with HTML in a Week
Laura Lemay
SAMS
ISBN: 0-672-30667-0
This book really does do what the title says. Here is the
description from the author's home page.
This book describes how to write, design, and publish information
on the World Wide Web. In addition to describing the HTML
language itself, it provides extensive information on using
images, sounds, video, interactivity, gateway programs (CGI),
forms, and imagemaps. Through the use of dozens of real-life
examples, the book helps you not only learn the technical details
of writing Web pages, but also teaches you how to communicate
information effectively through the Web.
The Whole Internet User's Guide and Catalog
The Whole Internet User's Guide and Catalog
Ed Krol
O'Reilly and Associates, Inc.
ISBN: 1-56592-063-5
One of the best introductions to the Internet. Ed Krol covers
most major Internet services (e.g., ftp and WWW). He also
includes references to many useful Internet resources. The
appendix Getting Connected to the Internet discusses the
different grades of service and provides a list of suggested
Internet connection providers.
WWW Pointers
These are the WWW pointers for these books.
* O'Reilly and Associates, Inc. (http://www.ora.com).
* SAMS (http://www.mcp.com/sams).
* Laura Lemay (http://slack.lne.com/lemay/theBook/index.html).
* The Whole Internet User's Guide and Catalog
(http://gnn.com/gnn/wic/index.html).