28 Mar 1995 - Preliminary Information

Simple Forms Programming

Most of the time a Web user asks the server to send an HTML file stored on disk. However, the Web protocols also allow the remote user to request that a program be run on the server. Data can be sent to the program either as a "canned" hyperlink from a keyword in another document or from forms fields filled in by the user interactively.

A Form is a collection of interactive objects displayed on the user's screen by the Browser program in response to specific tags in the HTML. The user enters data and clicks a button (or presses Enter in some cases). The Browser then sends a new request to the server consisting of a prefix (URL) specified in the Form followed by the data that the user entered or the selections that the user made. The Web standards describe the format of the HTML and of the submitted data.

There is no formal standard on how the user data should be processed. The original NCSA and CERN Unix Web servers call separate programs to process the data. The interface between the server and the programs is called "CGI". In CGI, information about the server and the request is passed in environment variables, the remote user's data is passed as the standard input file, and the program's response is written to the standard output file. These conventions are natural for Unix programs, though they may not translate well to other operating systems.
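The CGI conventions described above can be sketched in a few lines. This is only an illustration in Python, not code from SpHyDir or from any of the servers mentioned. The function name handle_cgi and the sample field names are made up for the example, and the GET branch (data taken from QUERY_STRING) is an addition based on common CGI practice rather than on anything stated here.

```python
import io
from urllib.parse import parse_qs


def handle_cgi(environ, stdin, stdout):
    """Read form data the CGI way and write a plain-text reply."""
    # The server describes the request in environment variables.
    method = environ.get("REQUEST_METHOD", "GET")
    if method == "POST":
        # The user's data arrives on standard input; CONTENT_LENGTH says how much.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length)
    else:
        # Assumption: for GET requests the data follows the "?" in the URL.
        raw = environ.get("QUERY_STRING", "")
    fields = parse_qs(raw, keep_blank_values=True)

    # The reply goes to standard output: a header, a blank line, then the body.
    stdout.write("Content-Type: text/plain\r\n\r\n")
    for name in sorted(fields):
        stdout.write(f"{name} = {fields[name][0]}\n")


# Simulate one POST request without a real server (hypothetical field names).
reply = io.StringIO()
handle_cgi(
    {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "17"},
    io.StringIO("name=Ann&city=NYC"),
    reply,
)
print(reply.getvalue())
```

A real CGI program would pass os.environ, sys.stdin, and sys.stdout instead of the simulated objects.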

Just as SpHyDir reexamined the idea of HTML, and moved from "formatting" to "structure" in order to simplify the production of text, so it makes sense to reexamine the way that CGI has been done previously to see if the user's task can be simplified.

Visual HTML

Visual Programming is supported in Windows with Visual Basic and Delphi and in OS/2 with VX-Rexx. The user creates a screen form by dropping GUI objects from a toolbar. Each object has a variable name. Connected to the objects are event handler routines written in the associated language (Basic for VB, Pascal for Delphi, Rexx for VX-Rexx). Within the event handler, the objects and attributes are often available as native variables.

Web Forms programming cannot be made quite that simple. For one thing, the user interface to the Form is managed by the Browser on the client machine, while the programming runs on the server. There is only one event (the form is submitted) and then all of the entered data must be processed at once.

However, Forms programming can be simplified by nearly eliminating the HTML language and the mechanics of the Client/Server interface. SpHyDir accomplishes this first for Rexx programs running on OS/2. Later this can be extended to other languages and other server environments.

The first step is to create the Form itself. A separate section of this document describes the Forms tools and their options. HTML produced by SpHyDir is system independent. It can be shipped to a Unix server and displayed on a Macintosh screen. However, if the resulting user data is transmitted back to the Unix server, then a CGI program must be written (usually in C or Perl) to process it. Writing the CGI program is not made any harder if SpHyDir is used to design the forms than if they are coded by hand.

When the form is constructed, the author assigns a variable name to every element. When the user submits the form, the data or selections will be transmitted from the Browser to the Server as a series of "variable=value" statements. When SpHyDir generates the form, it knows the names of these variables and the type of form object with which each is associated.
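As a rough illustration of the "variable=value" series, the following Python sketch encodes two fields the way a Browser would and decodes them again on the server side. The variable names and values are hypothetical; the encoding shown is the standard URL form encoding, in which pairs are joined by "&" and special characters are escaped.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical form: variable names chosen by the author, values typed by the user.
entered = [("name", "Ann Lee"), ("dept", "R&D")]

# What the Browser transmits: pairs joined by "&", special characters escaped.
wire = urlencode(entered)
print(wire)

# What the program on the server recovers.
decoded = dict(parse_qsl(wire))
print(decoded)
```

The escaping matters because a value may itself contain "&" or "=", which would otherwise be mistaken for separators.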

Each field in the form can have a default value. This value is transmitted to the client Browser as text in the HTML language. The particular syntax changes from one type of field to another. When SpHyDir is used to edit the form, the author can specify a "static" default value that would be transmitted if the file is sent out without further editing. However, SpHyDir also remembers the byte offset and length of every default value, along with the name of each associated variable and the type of the field. This can be used by helper routines to replace the static default values with information dynamically extracted from any processing program when the form is transmitted back in response to a previous transaction.
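The idea of replacing static defaults by offset and length can be sketched as follows. The layout of the SpHyDir symbol table is not described here, so the Field record, the fill_defaults helper, and the sample HTML are all hypothetical. The sketch also uses character offsets in a Python string, which match byte offsets only for plain ASCII text.

```python
from html import escape
from typing import NamedTuple


class Field(NamedTuple):
    name: str    # variable name assigned by the author
    kind: str    # type of form object, e.g. "text"
    offset: int  # where the static default starts in the HTML
    length: int  # length of the static default


def fill_defaults(page, table, values):
    """Return the page with static defaults replaced by dynamic values."""
    # Work from the last field to the first so earlier offsets stay valid.
    for field in sorted(table, key=lambda f: f.offset, reverse=True):
        if field.name in values:
            new_text = escape(values[field.name], quote=True)
            end = field.offset + field.length
            page = page[:field.offset] + new_text + page[end:]
    return page


# Hypothetical form fragment with a static default of "Boston".
page = '<INPUT NAME="city" VALUE="Boston">'
table = [Field("city", "text", page.index("Boston"), len("Boston"))]
filled = fill_defaults(page, table, {"city": "New Haven"})
print(filled)
```

Because the replacement is a splice at a known position, the helper never has to parse the HTML, which is the point of recording the offsets when the form is built.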

The Symbol Table is stored as an Extended Attribute of the HTML file. The information in this table can be interpreted directly by helper functions (this is the preferred path for programs written in Rexx) or it can be translated by an external utility into header (*.H) files.

The objective is to free the programmer from the semantics of the HTML and Web environment. Ideally, the CGI or other Web processing routine should consist exclusively of statements that manipulate the value of native variables that correspond to the data in the form or in the reply, and should contain only trivial statements to manage the Web environment.

The URL/URI Thing

When the user requests a document, follows a hypertext link, or submits a forms request, the Browser transmits a Uniform Resource Identifier to the Server. The URI contains a protocol (http:, gopher:, or ftp:), the name of a server machine, the name of a resource, and optionally some data from a form or map-click.
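The parts of a URI listed above can be picked apart with a standard library call. In this Python sketch the host name, path, and form data are made up for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical request; the host and path are invented for the example.
parts = urlsplit("http://www.example.com/forms/order.80?item=disk&qty=2")

print(parts.scheme)  # the protocol
print(parts.netloc)  # the name of the server machine
print(parts.path)    # the name of the resource
print(parts.query)   # optional data from a form
```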

In normal use, the "name of a resource" is a file name. This isn't part of the standard, but most requests are for HTML files and the file system provides the simplest environment in which to store them. The standard simply leaves room for the names to extend to system objects and SQL queries. Current Unix servers require that the URI contain a file name.

The GOSERVE package written by IBM's Mike Cowlishaw implements a "purer" version of the standard. The URI part of every arriving request is passed to a Rexx program called the Filter. It determines how the server will respond. The supplied filter simply sends back a named file or generates an error message. However, this Server structure allows a programmer to easily experiment with other definitions of the URI "name space."

Rexx is an interpreted language, so it is less efficient than code written in C. However, the Filter program does very little work and its execution cost is a small part of the overall transaction. Rexx also allows each transaction to be handled in a thread of the Server, rather than loading a complete new program. It is unclear what the performance of an industrial-strength server would be. However, hypertext links can jump back and forth between different server machines. An OS/2 server can afford to specialize in those transactions where features of the OS/2 system or of the Rexx language offer important advantages. In this context, GOSERVE provides an attractive environment to experiment with broader interpretations of the URI.

Challenge Assumptions

There may be some minor but annoying problems in the current Information Superhighway. On the other hand, I may just be ranting. Decide for yourself:

Self-Extracting ZIP is harmful

The "self-extracting" ZIP file is a useful feature for the small number of casual users who have not entirely figured out how to download UNZIP for themselves. For the rest of us, it is an annoyance.

If you run "zipinfo thing" then the utility fails to find THING.EXE. It doesn't get recognized as a Zipfile. The larger the library, the more confusing it becomes just exactly what that THING really is.

In Windows or OS/2, files are associated with a viewing program by their type. You can associate a *.ZIP file with an archive viewer, but *.EXE files are treated as programs and cannot be associated with anything.

If you accidentally double-click such a file to open it, it executes and splats its contents all over the ZIP library. There is no "Undo" for this mistake. You have to delete all those files by hand.

If an uninformed DOS user receives a file on diskette, then the self-extracting format means that there is one less step to execute and one less thing to learn. By the time that a Windows or OS/2 user has a Web Browser and auxiliary Viewer programs to work with, the ZIP-as-EXE has lost any real benefit.

Don't send any old junk

Traditional Web servers will transmit any file that they find in the data directory. If they recognize a file type by extension, they assign it a specific MIME data type. Otherwise, it is sent as generic bytes. As a result, files that are not supposed to be transmitted (such as CGI programs) have to be stored in a separate directory.
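The traditional behavior amounts to a table lookup with a catch-all. The Python sketch below is only an illustration: the table contents and the function name content_type are assumptions, and "application/octet-stream" is used as the usual MIME label for generic bytes.

```python
import os

# A small, assumed extension table; real servers carry a much longer one.
MIME_BY_EXTENSION = {
    ".htm": "text/html",
    ".html": "text/html",
    ".txt": "text/plain",
    ".gif": "image/gif",
    ".jpg": "image/jpeg",
}


def content_type(filename):
    """Pick a MIME type by extension, falling back to generic bytes."""
    extension = os.path.splitext(filename)[1].lower()
    return MIME_BY_EXTENSION.get(extension, "application/octet-stream")


print(content_type("INDEX.HTM"))
print(content_type("thing.exe"))
```

Note that nothing is ever refused: an unrecognized file is still sent, only with a vaguer label.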

However, in the current state of the art, Browsers are able to internally process HTML, plain text, GIF, XBM, and JPEG. They often have external viewer programs for sound (AU, WAV), PostScript (PS), and MPEG. Binary file distribution is normally in the form of compressed archives (ZIP, tar.gz, or tar.Z). Any other file format can be converted to a ZIP for transmission.

Of course there are special formats. Windows HLP files, OS/2 INF files, IBM BOO or BOOK files, and Microsoft Word DOC files are examples of data types with special viewers. For a particular library and server, the set of supported file types may be extended. However, it doesn't have to extend to EXE, CMD, C, H, OBJ, or database types.

The convention that current Web Servers will transmit data of any type that happens to be in the library is not helpful. The idea of file types was not established as strongly in the Unix world as it has been on other operating systems. However, security and flexibility can be improved if the Web Server is configured to reply only with files that are one of the declared Information System data types.

The GOSERVE design incorporates request filtering. The supplied filters simply duplicate the original Unix conventions. However, it is trivial to recode the Rexx to be more selective. SpHyDir supplies a sample replacement filter that transmits only files with well-known data types. If one assumes that ZIP files will be named ZIP and not EXE, there is no confusion between data and program extensions.

Unix compatibility isn't necessary

Hypertext links can freely jump between machines. Even a single Web library of interrelated documents doesn't have to be hosted on a single machine. If a particular function is portable, then it will probably be hosted on the Primary Web Server. Other servers will specialize.

Large companies will probably select a commercial server with added function, such as the servers offered by Netscape that run on Unix. Universities may prefer one of the free software packages, but they will also run on Unix. Smaller companies will probably stick with the Microsoft family of products and mount a server on Windows NT.

OS/2 is the "integration platform." It is designed to exist within a larger network of heterogeneous machines. This means that the customers at which it is targeted probably have other machines to act as the Primary Web Server. However, if the data needed to respond to the query is located inside the DB2 family of databases, or if it requires an APPC transaction to the mainframe, then OS/2 may be the most appropriate specialized platform to handle those particular requests.

OS/2 is also an attractive development environment. Unix systems are not widely deployed as Personal Computers. Windows (except for NT) doesn't have the system infrastructure to support Servers in general and Forms/CGI in particular. OS/2 is split between the Client and Server roles. This means that all of the tools can be run together on a single, modestly sized, desktop PC.

SpHyDir can be used to develop the HTML for documents, menus, forms, and replies. GOSERVE provides an easily installed, wonderfully flexible server environment. Web Explorer provides a Browser for testing. VX-Rexx provides the development environment for the applications and for customizing the GOSERVE Filter.

SpHyDir-GOSERVE Tools

The first step is to get the current copy of the GOSERVE distribution package. Unzip it to a directory on disk ("\GOSERVE" is recommended). Read the documentation that comes with it. Test running GOSERVE as a Web Server using the files that come in that package. When you have the distributed code working, then consider the SpHyDir files.

When GOSERVE is running, it calls a Rexx program named GOFILTER.80 every time a request is received from a remote user. When a program written in another language calls Rexx as a "macro processor", that program gets to choose the file extension that the Rexx programs will have. Since Web servers operate on TCP port 80, GOSERVE uses "80" as the file extension. GOFILTER.80 examines the URL of the request and decides how to respond. It can send back an error message, send a file from disk, or call another program. When it calls another Rexx program, that file also has to have a *.80 file extension.

It is not reasonable to use the GOFILTER.80 file unmodified as IBM distributes it with GOSERVE. That program has a lot of example code showing how things might be done, but it doesn't support external programs in any language.

The SpHyDir distribution package includes an alternative GOFILTER.80 program. It can be used as a simple production server filter if its assumptions agree with local conventions. Rather than disturbing the original IBM code as little as possible, the SpHyDir GOFILTER contains only the functions that are actually needed. It seemed to make the most sense to start with a minimal version and add features as needed.

The SpHyDir GOFILTER will send back files with meaningful file types (HTM, GIF, JPG, WAV, TXT, PS, ...). The list can be extended with additional types. If the file type is "80" then the corresponding Rexx program is called.
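The decision this filter makes can be sketched as a small function. The real filter is a Rexx program and is not reproduced here; the Python below is only an illustration of the rule. The name decide, the MIME strings in the table, and the choice of a 403 status for refused types are all assumptions.

```python
import posixpath

# Extensions the sketch is willing to send, with assumed MIME types.
SENDABLE = {
    "htm": "text/html",
    "txt": "text/plain",
    "gif": "image/gif",
    "jpg": "image/jpeg",
    "wav": "audio/x-wav",
    "ps": "application/postscript",
}


def decide(uri_path):
    """Return ("run", name), ("send", name, mime), or ("error", status)."""
    # Simplification: only the last path component is considered.
    name = posixpath.basename(uri_path)
    extension = posixpath.splitext(name)[1].lstrip(".").lower()
    if extension == "80":
        # A *.80 name stands for a program to call, not a file to send.
        return ("run", name)
    if extension in SENDABLE:
        return ("send", name, SENDABLE[extension])
    # Anything else, such as EXE or CMD, is refused rather than sent as bytes.
    return ("error", 403)


print(decide("/docs/index.htm"))
print(decide("/forms/order.80"))
print(decide("/tools/thing.exe"))
```

Extending the list of supported types means adding one entry to the table; no other part of the rule changes.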

PCLT

Copyright 1995 PCLT -- SpHyDir Web Document Manager -- H. Gilbert
May be distributed with SpHyDir program

This document generated by SpHyDir, another fine product of PC Lube and Tune.