OS/2 Help File | 1995-06-26 | 240KB | 3,665 lines
═══ 1. The SpHyDir Project ═══
SpHyDir is an Object Oriented tool that builds documents for Web Browsers such
as Netscape, Mosaic, and Web Explorer. With SpHyDir the user concentrates on
the important issues of content and overall document structure. Since SpHyDir
automatically generates the HTML (Hypertext Markup Language), it is possible to
generate flawless Web documents without studying obscure syntax diagrams.
A Web document is an ordinary text file that contains formatting instructions
in the form of HTML "tags". Normally, these tags are processed by a Web Browser
(such as Netscape Navigator) and are then viewed on a computer screen. If the
final image "looks right" then many authors are satisfied. This is reasonable
when producing a single personal home page.
A library of information should adopt a common document structure. Similar
information should have a common format. The same type of information should be
presented in the same way on all files. The reader should be able to jump to
related information, or to view a sequence of documents covering a larger
topic. Traditional printed books solve this problem with chapters, page
numbers, a table of contents, an index, and the editorial control of a
professional publisher.
SpHyDir is the Structured Professional Hypertext Directory Manager.
It is Structured because it sees a Library that contains Documents that
are made up of interlinked HTML Files. Each File contains Sections
(Chapter, Topic, Appendix) which in turn contain figures, paragraphs,
lists, and tables. Each of these document elements is presented as an
Object and the overall document is structured as a tree of such Objects.
It is Professional because it makes it easy for the author to control
more of the advanced Web features than any other tool. Simple HTML
editors and Word Processing packages insert only the most basic tag
options. A professional job requires more: alternate text should be
provided if the user chooses not to fetch images automatically, image
sizes should be included in each reference so the browser can operate at
maximum efficiency, parameters should be used to generate special effects
efficiently rather than transmitting images, etc.
It is Hypertext because links between documents and to other files can be
made through simple Drag-and-Drop. To build a link to a remote Web
resource, just display the resource in Web Explorer. SpHyDir will pick
the URL reference out of the WE window and attach it to text or to an
image in the document under construction.
The Directories of an HTML library contain related files. An editor or
word processor handles one file at a time. SpHyDir manages the structure
of interrelated files and automatically generates navigational
information such as Next and Previous pointers.
SpHyDir runs in OS/2 Warp and follows the Workplace model of its user
interface. The user drags a document into the workarea. New elements are added
to the document by dragging empty paragraph, image, list, or forms objects and
dropping them into the document. Hypertext links are created by dragging files
from the library or URL references from a Web Browser and dropping them on
existing text or images.
SpHyDir only reads Web (HTML) documents. However, after editing it produces
both a revised *.HTM file and a second *.IPF file that is input to the OS/2
Help compiler. IPF source can be used to generate INF documentation and HLP
program help files. INF files can be viewed in OS/2 and (with a tool from IBM)
in Windows. Viewing information from a local hypertext file is faster, and
there are additional keyword search functions not available through the Web.
The document you are now viewing (and its related files) is also available as
SPHYDIR.INF. Since this represents a useful example of many SpHyDir features,
the source is available for download along with the SpHyDir program.
═══ 2. Project Status ═══
SpHyDir is updated irregularly. Generally something is posted at the start of
the week and critical bug fixes can be posted at any time. Check here for the
latest information.
═══ 2.1. June 26 ═══
On request, the IPF generation has been brought back to its prior level of
support. SpHyDir should be able to generate INF files for HTML 2.0 constructs.
At this time there is no attempt to map HTML 3.0 attributes to IPF or to try to
deal with Tables and other new features. The SpHyDir.INF file is now up to
date.
Characters that are in the Latin-1 set (and therefore in Code Page 850) should
display correctly in the INF file. If someone configures the ENTITIES, CHARIN,
and CHAROUT tables for another code page, sets CODEPAGE in CONFIG.SYS, runs
IPFC, and does not get proper display of characters in any standard Latin IBM
code page, please report back to the author.
═══ 2.2. June 21 ═══
[Note: a defective version of the June 21 code was posted between Midnight and
11:00 EDT on that date. Please replace it with the corrected version]
═══ 2.2.1. Editing International Character Sets ═══
A World Wide Web has to deal with international character sets. Unfortunately,
there are more characters in the World than can be easily handled in any simple
one-byte encoding. Three solutions are available.
1. Unicode provides a two-byte character set that can handle all the Western
languages and Chinese, Japanese, and Korean. This is the ultimate
solution, but it is new and there are no good tools available.
2. There is a family of one-byte character sets that provide coverage for
most languages. Since no single code can include the Western "Latin"
alphabet and Hebrew, Arabic, and Cyrillic, the ISO proposed a set of
8859-x (where x=1...n) character sets covering each major alphabet group.
The 8859-1 (also called "Latin 1") character set covers all the languages
of Western Europe (and America, Australia, etc.). Starting with HTML 2.0,
the Web standards hold that an HTML document is assumed to be in 8859-1
unless stated otherwise. The HTTP and some of the HTML <HEAD> area
conventions may provide for the use of other 8859-x tables for other
alphabets, though the Netscape Browser, for example, only supports 8859-1
and Japanese.
3. Since a complex document may include characters (including Math and other
special symbols) from different 8859-x character sets, HTML has a
notation for "Entities". An Entity reference begins with the "&" escape,
then contains the character name, and ends in a semicolon. For example,
it is becoming widely accepted that the copyright symbol © can be
represented by the Entity reference "&copy;".
From the beginning, SpHyDir read and created Entity references for the three
special control characters "<", ">", and "&" (&lt; &gt; and &amp;). The
Entities were converted to the native character for easy display and editing.
When HTML was generated, these characters were converted back to Entities.
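The round trip described above can be sketched in a few lines of Python. This is only an illustration of the idea, not SpHyDir's own code; the function names are made up for the example.

```python
# Illustrative sketch only: the function names are hypothetical and this is
# not SpHyDir's implementation.

def entities_to_native(markup: str) -> str:
    """Convert the three control Entities to native characters for editing."""
    # "&amp;" is handled last so that "&amp;lt;" becomes "&lt;" and not "<".
    for entity, char in (("&lt;", "<"), ("&gt;", ">"), ("&amp;", "&")):
        markup = markup.replace(entity, char)
    return markup


def native_to_entities(text: str) -> str:
    """Convert native control characters back to Entities when HTML is generated."""
    # "&" is handled first so the ampersands of the new Entities are not re-escaped.
    text = text.replace("&", "&amp;")
    return text.replace("<", "&lt;").replace(">", "&gt;")
```

The order of the replacements matters in both directions, which is why the ampersand is treated last on input and first on output.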
Starting with the May 28th release, SpHyDir supported foreign language Entity
names, but without any translation. The "&" character was simply converted to
a Smiley Face dingbat character (so that the real "&" character could be
treated as normal text in expressions like "PC Lube&Tune"). A user could
create new Entity references by entering the Smiley Face character in the Text
Edit Window (you can enter any character in the PC set by typing its decimal
value on the keypad while holding down the Alt key. Since Smiley Face is 1,
hold down Alt, type "1" on the numeric pad, and release Alt).
This made the entry of foreign characters possible, but not natural. The name
of a landmark in the Stuttgart area had to be rendered with the Entity
reference to the German sharp s (&szlig;) as Schlo&szlig;kirche (castle
church).
The best approach would be to display foreign characters natively, just as
SpHyDir translates the < > and & characters natively. After trying
unsuccessfully to come up with some magic application that would solve the
problem, SpHyDir now caves in and handles the problem with the OS/2 standard
Code Page Solution. Now Schloßkirche will display in its native format in the
SpHyDir Workarea and in the Text Edit window, provided that you are using Code
Page 850 or convert the supplied tables to work with some other Code Page.
The original PC character set is known today as Code Page 437. It is the
default and is the set you get if you don't know enough to change it in
CONFIG.SYS. The 437 set does not have all the characters in the "Latin 1"
group. However, it does have the most important Western European characters.
Although SpHyDir recommends and supplies tables for the improved Code Page 850
(along with instructions to change the CODEPAGE statement in CONFIG.SYS to
make it the default), a user who insists on keeping the old 437 Code Page
could change the supplied tables to support it.
SpHyDir ships files named ENTITIES.850, CHARIN.850, and CHAROUT.850. They
should be copied to the root directory of the HTML library. ENTITIES.850
provides a mapping between the Entity names and the code assignments for the
corresponding characters in the 850 Code Page. This is just a text file and it
can be used as a model to produce an ENTITIES.437 if you insist on using the
old defective 437 Code Page. It can also be changed to any other national use
Code Page supported by OS/2. The other two files provide translate tables
between the ISO 8859-1 encoding and the Code Page locations of the
corresponding characters in the IBM numbered Code Page. This is a bit more
tedious to edit, but it can be done.
SpHyDir doesn't do anything special about the keyboard. The assumption must be
that a user has already selected a Code Page and Keyboard layout that allow
the entry of characters to the ordinary editors and windows of the system.
Rather than trying to duplicate (or worse to override) that support, SpHyDir
simply provides a mechanism completely under the user's control to
translate HTML entities into the standard operating system character support.
If a file is encoded in the ISO 8859-1 standard, the SpHyDir supplied
translate tables will convert it to Code Page 850 as it is read in. This
version of SpHyDir is biased to Entity notation, so when SpHyDir generates
HTML it will use Entity notation for all characters that have values with
defined Entity names. It will convert back to ISO 8859-1 only the characters
that do not have Entity names. This version of SpHyDir will not display as
native characters anything entered with the "numeric" entity notation. Only
named characters will be translated for native display.
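As a rough illustration of the read and generate steps just described, the following Python sketch uses Python's built-in "iso-8859-1" and "cp850" codecs in place of the CHARIN/CHAROUT tables, and the standard html.entities table in place of ENTITIES.850. The function names are hypothetical, and the handling of characters that have neither an Entity name nor a Latin-1 code (numeric references) is an assumption, not documented SpHyDir behavior.

```python
# Sketch under stated assumptions: Python codecs stand in for the SpHyDir
# translate tables, and html.entities stands in for ENTITIES.850.
from html.entities import codepoint2name


def read_latin1_as_cp850(raw: bytes) -> bytes:
    """Translate ISO 8859-1 file bytes to Code Page 850 bytes on input."""
    return raw.decode("iso-8859-1").encode("cp850", errors="replace")


def generate_html_text(native: bytes) -> bytes:
    """Generate HTML text from Code Page 850 bytes, preferring Entity names."""
    pieces = []
    for ch in native.decode("cp850"):
        name = codepoint2name.get(ord(ch))
        # Characters with a defined Entity name are written as "&name;".
        pieces.append(f"&{name};" if name is not None else ch)
    # Everything else goes back to ISO 8859-1. Characters outside Latin-1
    # become numeric references here (an assumption made for this sketch).
    return "".join(pieces).encode("iso-8859-1", errors="xmlcharrefreplace")
```

For example, the Latin-1 bytes for "Schloßkirche" are read in with the sharp s at its Code Page 850 position (0xE1) and written back out as "Schlo&szlig;kirche".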
A new section of the SpHyDir document describes Code Pages in general and the
ENTITIES definition file in particular.
═══ 2.3. June 20 ═══
Saw a request on the net for JPEG support. Added JPG and JPEG (case
insensitive) to the list of extensions regarded as IMG file types. At this
time, SpHyDir does not know how to find the size of JPEG files, so it does not
add WIDTH and HEIGHT attributes as it does for GIF.
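Finding the size of a GIF is simple because the GIF format stores the logical screen width and height as two little-endian 16-bit integers immediately after the 6-byte signature. A minimal Python sketch of that lookup follows; the function names are hypothetical and this is not SpHyDir's code.

```python
# Illustrative sketch: reads WIDTH and HEIGHT from the fixed GIF header.
import struct


def gif_size(data: bytes) -> tuple[int, int]:
    """Return (width, height) from the first 10 bytes of a GIF file."""
    if len(data) < 10 or data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF file")
    # Bytes 6-9: logical screen width and height, little-endian unsigned shorts.
    width, height = struct.unpack("<HH", data[6:10])
    return width, height


def img_tag(src: str, data: bytes) -> str:
    """Build an IMG tag that carries the image size."""
    width, height = gif_size(data)
    return f"<IMG SRC={src} WIDTH={width} HEIGHT={height}>"
```

JPEG files have no such fixed-position size field, so the same trick does not carry over directly.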
Discovered a user was still using TARGET. It was reported that if you drag the
TARGET object from the toolbar (which was by the way blank now for some reason)
and try to save the file, SpHyDir crashes. Well the TARGET object isn't
supposed to be there, which is why the tool is blank. However, there really
still was a TARGET tool under that blank face, and yes when you insisted on
using it the HTML generation failed. This, however, also exposed a problem with
the ToolChest, since SpHyDir would fail if the user created a tool with an
unsupported type. Anyone who ran SpHyDir in the last few weeks has a TARGET
tool in the ToolChest that won't go away until the TOOLDEF.TXT file is deleted
from the root directory of the HTML Library. So HTML Generation is now
protected against invalid object types.
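How the protection works is not described here. One common way to guard a generator against unknown object types is a lookup table with an explicit fallback, sketched below in Python. The object types, dictionary layout, and function names are all hypothetical.

```python
# Hypothetical sketch: a dispatch table that skips unsupported object types
# instead of failing. Nothing here is taken from SpHyDir's source.

def paragraph(obj: dict) -> str:
    return f"<P>{obj['text']}</P>"


def rule(obj: dict) -> str:
    return "<HR>"


GENERATORS = {"PARAGRAPH": paragraph, "RULE": rule}


def generate_html(objects: list[dict]) -> tuple[str, list[str]]:
    """Generate HTML for known objects and report the types that were skipped."""
    lines, skipped = [], []
    for obj in objects:
        generator = GENERATORS.get(obj["type"])
        if generator is None:
            # Unsupported type (for example a leftover TARGET tool): skip it.
            skipped.append(obj["type"])
            continue
        lines.append(generator(obj))
    return "\n".join(lines), skipped
```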
═══ 2.4. June 15 ═══
Corrected a bug deleting Links. Added most remaining Netscape non-standard
extensions.
Note: Netscape has this terrible idea to make BORDER an attribute of IMG taking
a numeric value. This contrasts to the use of BORDER in TABLE which is just a
Yes or No. SpHyDir II doesn't currently have the ability to distinguish valid
ranges of values when the same attribute has different types of values in
different tags. Reluctantly, BORDER is now defined as taking any type of value.
So while it was possible previously to simply select BORDER from the Properties
popup menu of a TABLE object and it would be set to the right thing, now when
you select BORDER you will get a dialog box and you have to type in "Yes" and
press OK.
═══ 2.5. IMG and FIG ═══
SpHyDir has been struggling with the problem of Image Objects. The HTML 2.0
standard allows IMG tags to appear in the middle of sentences, headers, and
captions. An IMG is regarded as a large letter. This is why the official ALIGN
options for IMG (TOP, MIDDLE, BOTTOM) relate to how the image is aligned
vertically with the characters immediately preceding and following it.
That is not the way that most people want to treat large images. Netscape
proposed two additional alignments (LEFT, RIGHT) that allow text to flow around
the margin of an image, much as word processors handle inserted graphics. HTML
3.0 also includes these values for ALIGN, but for the most part it wants to use
FIG.
A Figure is enclosed in the <FIG>..</FIG> tags. The FIG tag contains a SRC
attribute that points to a graphic file, much as IMG. FIG has a number of
extensions over IMG that might prove useful if any browser supported it.
However, at this point no mainstream Web browser knows about FIG and all ignore
it. Of course, they do not ignore the markup inside the FIG tag, just the tag
itself.
The ordinary text content of the FIG tag is supposed to be displayed on
non-graphic browsers as an alternative to the image:
<FIG SRC=monalisa.gif>
A famous painting of a woman smiling.
</FIG>
The IMG tag has an ALT attribute that contains alternate text for the same
purpose. However, the FIG tag can contain much larger descriptions with
paragraphs, lists, headings, and all other document elements. A major objective
is to allow detailed description of diagrams for readers who are visually
impaired and use text-to-speech browsers.
Currently, however, no major browser supports the FIG tag. Following standard
practice, a browser ignores tags that it does not understand, but does not
ignore their contents. This makes it difficult in the near term to use the FIG
contents for its intended purpose since it will be displayed by graphic
browsers like Netscape, Web Explorer, and Mosaic.
However, FIG appears to be a way out of the hole into which SpHyDir
has fallen. That hole was caused by the desire to have an Image Object. An
Object is a nice big thing. You can drop GIF files on it. You can link to it
easily. You can parameterize it with properties. The problem, of course, is
that an object cannot occur between a couple of words, and since HTML 2.0 wants
to regard the IMG tag as part of the ordinary paragraph text, this in the long
run produced all sorts of problems.
The FIG is a much more suitable starting point for a Document Object. It stands
alone, like a paragraph. It even aligns LEFT, CENTER, RIGHT, and JUSTIFY like a
paragraph. It cannot appear in the middle of a sentence or in a Heading.
Since FIG isn't currently supported by browsers, there has to be a migration
strategy. As with "<CENTER><P ALIGN=CENTER>", the SpHyDir approach to HTML
migration is to do the thing "every which way" it can be done so that every
browser must do precisely what is intended. Of course, there is no way to fake the
advanced features that caused FIG to be invented in the first place. The
browsers will just have to catch up. However, it is possible right now to
generate a FIG that can replace standalone and text-wraparound images:
<FIG SRC=monalisa.gif ALIGN=LEFT>
<IMG SRC=monalisa.gif ALIGN=LEFT ALT="A famous painting of a woman smiling">
</FIG>
1. An HTML 3.0 graphical browser will understand FIG, display the GIF file,
and ignore the contents.
2. An HTML 2.0 graphical browser will not understand FIG and will ignore it.
It will not ignore the contents, and so will process the IMG tag. Again
the GIF file is displayed.
3. A non-graphical browser may or may not understand FIG, but in any case
will ignore it because the browser doesn't display images. It will then
look at the IMG tag and again decide not to display the GIF, but it will
print the ALT text.
Although the GIF file appears in both the FIG and IMG tags, the two are
mutually exclusive. One or the other will be processed, but not both. As to
the ALIGN=LEFT, this is valid for a FIG in HTML3 and is a widely supported
Netscape extension of IMG in HTML 2.
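The FIG+IMG construct is mechanical enough to generate from three values. The Python sketch below shows one way to produce it; the function name and parameters are hypothetical and it is only an illustration of the pattern described above.

```python
# Illustrative sketch: emits the FIG wrapper with an IMG fallback inside it.

def fig_with_img_fallback(src: str, alt: str, align: str = "") -> str:
    """Return a FIG tag whose content is an equivalent IMG tag."""
    align_attr = f" ALIGN={align}" if align else ""
    return (
        f"<FIG SRC={src}{align_attr}>\n"
        f'<IMG SRC={src}{align_attr} ALT="{alt}">\n'
        "</FIG>"
    )
```

Because the same file name and ALIGN value are written into both tags, the two cannot drift apart when the image is changed.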
SpHyDir 1 did not deal with IMG tags that were embedded in the middle of a
paragraph or Heading. SpHyDir II accommodated such IMG tags by creating another
"dingbat" sequence in the text. The 0x08 character, which looks like a box
with a hole, or roughly "[o]", is used to start and end an IMG reference.
Between these two dingbats are the name of the GIF file, then optionally a
blank and the alternate text.
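A short Python sketch of how such a dingbat sequence could be turned back into an IMG tag is shown below. It assumes the file name contains no blanks and that everything after the first blank is the alternate text; the function name is hypothetical and this is not SpHyDir's code.

```python
# Illustrative sketch: expands "\x08file.gif alternate text\x08" into IMG tags.
import re

# 0x08 starts and ends the sequence; group 1 is the file name, group 2 the
# optional alternate text after the first blank.
_EMBEDDED_IMG = re.compile(r"\x08([^\x08 ]+)(?: ([^\x08]*))?\x08")


def expand_embedded_images(text: str) -> str:
    """Replace each dingbat-delimited image reference with an IMG tag."""
    def to_tag(match: re.Match) -> str:
        src, alt = match.group(1), match.group(2)
        if alt:
            return f'<IMG SRC={src} ALT="{alt}">'
        return f"<IMG SRC={src}>"

    return _EMBEDDED_IMG.sub(to_tag, text)
```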
However, in the first cut SpHyDir II continued to extract IMG tags from the
start of a paragraph (or when they form a paragraph by themselves). They
become IMG objects. Later on, SpHyDir II tries to decide if they should be
merged back into the paragraph that follows them.
The intent is to change this. Images that are contained in a paragraph of
their own, or that have ALIGN values LEFT or RIGHT will be extracted as before
to form a separate object. However, the object will then be regenerated as a
FIG+IMG construct as described above. This will allow the Image Object to
begin to have Properties derived from the more powerful FIG tag, instead of
limiting it to the current IMG Properties. Conversely, IMG tags that fall at
the start of a paragraph containing other text, and that have ALIGN values of
TOP, MIDDLE, or BOTTOM, will be treated as embedded images and will be
represented by the 0x08 dingbat character sequence.
═══ 2.6. June 14 Update ═══
The Text Edit Window now allows drag and drop. Position the cursor or select a
phrase of text. Now
Drag and Drop a GIF file anywhere on the Text Edit Window. This will
generate an "embedded IMG". If no text is selected, the dingbat character
and file name will be placed at the cursor position. If text is selected,
then the selected text will become ALT alternate text within the
dingbats.
Select some text. Hold down Ctrl-Shift and drop any file from the HTML
library on the Text Edit Window. The previously selected text becomes a
hypertext link to the file. Previously it was necessary to save the text,
Link-Drop a file on the Paragraph Object in the workarea, and then select
the text from the Hotword Selection window. Allowing links to be formed
directly in the Text Edit window simplifies this process. Note that links
created this way are saved only if the rest of the edited text is saved.
Pressing the Cancel Button or closing the Text Edit window cancels the
new Links as well.
Select text and drop a URL Object created by Web Explorer from the
Workplace on the Text Edit Window. The selected text will be converted to
a Link to the remote resource represented by the URL.
Do not select text. Leave the cursor at an insertion point in the
paragraph (usually after a blank). Drop a URL Object created by WE on the
Text Edit Window. The title of the remote resource is inserted at the
point of the cursor and becomes a hotlink to the document itself.
Unfortunately, it is not possible to drop Link Manager list items on the
Text Edit Window. To maintain the integrity of the data being edited, the
Text Edit Window locks up the underlying Workarea until the edit
completes. This also blocks the Link Manager from functioning. So links
have to come from the WPS environment, or save the text and use the Link
Manager as it has traditionally been used.
Extra blank spaces and lines after the <LI>, <TH>, and <TD> tags were removed.
They were errors picked up by Netscape producing undesired results.
SpHyDir now declines to insert some of the ending tags that nobody else bothers to
generate. Tables generated too many </TH>, </TD>, and </TR> tags. It made the
HTML ugly and hard to read.
The temporary dialog for new Tables has been replaced by a more polished
dialog box. Enter the number of rows and columns and select by a checkbox if
they are to be labelled.
Select a Table Row Object. Click the Second Mouse Button. Select Create
Another from the popup menu. A new row is created with the same number of
label and cell objects as the previous row. [Adding a new column is harder and
is left to a later date.]
When generating an Ordered or Unordered list, SpHyDir II has added an extra
step because List Points no longer contain text. One might have to drop a new
Point on the list, then go back and drop a new Paragraph on the Point. A
shortcut is to popup the Second Mouse Button window for the previous Point and
choose Create Another. This not only creates another Point object, but it also
creates another Paragraph under it and opens the Text Edit window directly.
More generally, Create Another populates the new Point with another object of
the same type as the first object contained in the old point, so the trick
works for Points of Images as well.
Forms bugs caused by rewrite: TYPE did not default to TEXT, NAME attribute
generated twice, SUBMIT incorrectly generated as HIDDEN.
═══ 2.7. June 12 SpHyDir II ═══
SpHyDir II now appears stable enough to remove some of the disclaimers. There
are certain to be problems with the new HTML 3.0 tags that nobody is using, but
more bugs have been fixed in the old code than seem to be problems with the
new. SpHyDir II is now the "standard" distribution. Where the old code is
mentioned, it is called "SpHyDir 1". The documentation has been updated.
A user had problems with the small characters. SpHyDir now remembers the WPS
font that has been dropped on the Workarea, Properties Table, Edit Windows, and
Link Manager. The Link manager window can now be widened.
═══ 2.7.1. Entity Syntax ═══
Newer versions of the HTML standards pointed out a number of details about
Entities. An ampersand is only regarded as a possible entity if it is followed
by letters or numbers. The sequence "A & P" is legal. An entity doesn't require
an ending ";" except to separate it from characters that could be part of the
entity name. "A &amp P" is also valid. SpHyDir now recognizes these forms,
though on output it always generates the full "A &amp; P".
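These rules can be sketched as a single regular-expression pass in Python. The sketch treats an ampersand as an Entity only when it is followed by a numeric reference or by a name found in Python's standard html.entities table; that lookup is an assumption made for the example (SpHyDir uses its own ENTITIES file), and the function name is hypothetical.

```python
# Illustrative sketch: accepts "A & P", "A &amp P", and "A &amp; P" on input
# and always writes the full "&amp;" form.
import re
from html.entities import name2codepoint

# "&", optionally followed by "#digits" or a name, optionally followed by ";".
_AMPERSAND = re.compile(r"&(?:(#[0-9]+|[A-Za-z][A-Za-z0-9]*);?)?")


def normalize_ampersands(text: str) -> str:
    """Rewrite every ampersand as a complete, semicolon-terminated Entity."""
    def fix(match: re.Match) -> str:
        name = match.group(1)
        if name is None:
            return "&amp;"  # bare ampersand, as in "A & P"
        if name.startswith("#") or name in name2codepoint:
            return f"&{name};"  # real Entity; add the ";" if it was missing
        # Letters follow, but they do not name an Entity: escape the "&" only.
        return "&amp;" + match.group(0)[1:]

    return _AMPERSAND.sub(fix, text)
```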
═══ 2.7.2. Target Objects become ID Property ═══
After an initial false start, SpHyDir 1 support for Targets (the object
corresponding to the HTML <A NAME=xxx> tag) became stalled. The problem is that
HTML standards and use permitted the <A> tag to include text and Headings.
Unfortunately, this might mean a construct of the form:
<A NAME=FUZZY>the end of one topic.
<H2>Now for Something Completely Different</H2>
On a completely unrelated matter, </A>
This is perfectly legal HTML, but any attempt to make structural sense out of
it is hopeless.
HTML 3.0 presented a much better idea. Labels can be assigned to headers or
paragraphs with the ID attribute. This presents a Hypertext label whose
location and purpose is unambiguous. Unfortunately, ID is not widely supported.
SpHyDir II takes its inspiration from the "Recommended" syntax of HTML 2 and 3.
"Recommended" practice holds that an anchor should go inside a Header rather
than including the header. This would produce
<H2><A NAME="Python Introduction">
Now For Something Completely Different</A></H2>
One big advantage is that this syntax is effectively interchangeable with the
preferred (but currently not widely supported) HTML 3 construct:
<H2 ID="Python Introduction">
Now For Something Completely Different</H2>
Now a reasonable strategy appears. It eliminates ambiguity, gets rid of the
Target object (which was cute but a problem), and provides for the sane
migration from HTML 2 to 3. First, ambiguous structure is resolved by asserting
that legacy HTML should have been following the Recommended practice. The <A
NAME=xxx> tag is logically associated with the very next thing that follows it
no matter where the </A> is located. The Name, however, becomes an attribute of
the object that contains the next thing that follows the <A>.
"<A NAME=X>Fred. <H2>Mary" - Assigns the name X to the
Paragraph or other text object containing "Fred". The name has nothing to
do with the following Mary section.
"Fred.<A NAME=X><H2>Mary" - Assigns the name X to the
Section associated with the Header for Mary. Although the <A> tag is
"outside" and "before" the Header, nothing else comes between the tag and
the start of the Header. The name applies to the first thing that follows
the <A> tag, not the superficial location of the <A> tag itself.
"Fred.<H2><A NAME=X>Mary" - HTML 2.0 Recommended practice.
SpHyDir will convert the previous case to this when generating new HTML.
That is, SpHyDir will move the <A> tag inside the <H2> tag.
"Fred.<H2 ID=X>Mary" - Recommended HTML 3.0 practice.
Unfortunately, many browsers don't support this yet. SpHyDir will
recognize it and "backlevel it" to the previous case of Recommended HTML
2.0 practice. Later on, when the browsers catch up, this will become the
syntax that SpHyDir will produce and SpHyDir will "upgrade" HTML 2 to 3.
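The "backlevel" step in the last case can be sketched in Python as a textual rewrite. The sketch only handles headings whose single attribute is ID and whose end tag is present; the function name is hypothetical and this is not how SpHyDir is implemented (it works on a tree of objects, not on raw text).

```python
# Illustrative sketch: rewrites <Hn ID=x>...</Hn> as <Hn><A NAME=x>...</A></Hn>.
import re

_ID_HEADING = re.compile(
    r'<(H[1-6])\s+ID=("[^"]*"|[^\s>]+)\s*>(.*?)</\1>',
    re.IGNORECASE | re.DOTALL,
)


def backlevel_heading_ids(html: str) -> str:
    """Move a heading's ID value into an anchor placed inside the heading."""
    def rewrite(match: re.Match) -> str:
        tag, label, content = match.group(1), match.group(2), match.group(3)
        return f"<{tag}><A NAME={label}>{content}</A></{tag}>"

    return _ID_HEADING.sub(rewrite, html)
```

The label is copied as written, quotes included, so a multiple-word name such as "Python Introduction" survives the conversion unchanged.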
The Target button has now been restored to the Link Manager window. This time
it works. Pressing the target button displays all the target labels in the
current document tree. They can be dragged and dropped onto Images and text to
form hyperlinks as previous Link Manager entries were used. SpHyDir does not
intend to extend the reach of the target button outside the current document.
Rather, XSpO programs will be developed to identify targets in other documents
or databases.
Although it is common practice to give short names to targets, HTML allows the
ID/NAME value to be long and to have multiple words if it is quoted. Note that
names are case sensitive. Using somewhat more descriptive names is helpful
when a hypertext link must be selected from a long list of available labels.
═══ 2.7.3. CENTER Again ═══
CENTER and ALIGN=CENTER should be handled correctly in most cases. There is one
area where problems can arise. When an IMG appears at the start of a paragraph,
SpHyDir tries to break it out as a separate object. Objects are easier to
change, since you can update properties with the table and can drop a new GIF
file on the icon. Thus
<P><IMG SRC=xxx ALIGN=MIDDLE>This is typical.</P>
is processed by SpHyDir to produce an Image Object and a Paragraph Object. The
ALIGN=MIDDLE on the Image is the flag that warns SpHyDir to shuffle the image
back "inside" the paragraph when HTML is generated.
This is even harder to code than it is to describe. It becomes impossible,
however, when you add CENTER:
<CENTER>
<P><IMG SRC=xxx ALIGN=MIDDLE>This is typical.</P>
</CENTER>
The problem is that CENTER is not an attribute of IMG tags. The "MIDDLE" value
means to align the image vertically so that the text that follows is at the
middle of the image. It has nothing to do with CENTER which is horizontal
alignment. An IMG can have ALIGN values of TOP, MIDDLE, BOTTOM, and with
extensions LEFT and RIGHT. However, the way to center an IMG is to put it
inside a centered paragraph as above (Netscape) or below (HTML 3.0):
<P ALIGN=CENTER><IMG SRC=xxx ALIGN=MIDDLE>This is typical.</P>
SpHyDir is left with three bad choices. One approach is to give up entirely on
Image Objects. This reduces the drag and drop functionality of the system. A
second approach is to allow Paragraphs to be "opened" to expose embedded images
as objects. This would be a major revision of current use. So SpHyDir will try
to salvage the current approach from the onslaught of new HTML features that
make it more difficult.
═══ 2.7.4. Groupies ═══
Suppose you want to center two lines. One HTML 3.0 approach is to apply the
center attribute to each separately:
<P ALIGN=CENTER>Tastes Great!</P>
<P ALIGN=CENTER>Less Filling!</P>
This does the job, but if you change your mind it becomes necessary to uncenter
each separately. The Netscape extension does the job:
<CENTER>Tastes Great!<P>Less Filling!</CENTER>
but there are a number of technical semantic problems with the CENTER tag that
make it unlikely to survive standardization. The HTML 3.0 view is to group the
lines:
<DIV CLASS=BUDLITE ALIGN=CENTER>
<P>Tastes Great!</P>
<P>Less Filling!</P>
</DIV>
Alignment on a DIV applies to everything inside it. The best use of CLASS names
in this context is unclear.
SpHyDir tries to migrate everything in the direction of the formal standard. To
that purpose, SpHyDir introduces the Group Object which is logically associated
with a DIV tag. The Insert - Structure - Group option of the Second Mouse
Button popup will create an empty group. Alternately, Mark a range of objects
and select Group from the popup to create a group containing all the marked
objects (leaving them where they were).
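As a rough picture of what a Group Object turns into, the Python sketch below wraps a list of paragraph texts in a single DIV. The function name and parameters are hypothetical; the point is only that alignment is stated once on the DIV instead of once per paragraph.

```python
# Illustrative sketch: one DIV carries the alignment for everything inside it.

def group(paragraphs: list[str], align: str = "", css_class: str = "") -> str:
    """Wrap paragraphs in a DIV with optional CLASS and ALIGN attributes."""
    attrs = ""
    if css_class:
        attrs += f" CLASS={css_class}"
    if align:
        attrs += f" ALIGN={align}"
    lines = [f"<DIV{attrs}>"]
    lines += [f"<P>{text}</P>" for text in paragraphs]
    lines.append("</DIV>")
    return "\n".join(lines)
```

Changing one's mind about centering then means changing a single property of the group.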
However, the final use of Group and DIV is not clear. The standard provides
little direction, and there is no body of use on the Net to point to the right
direction. Clearly the current SpHyDir casual approach to Section Objects
should be formalized by creating <DIV CLASS=xxx> markups. However, if a
document spans multiple files in a tree, how does one determine the CLASS names
(VOLUME, CHAPTER, SECTION, SUBSECTION, APPENDIX, etc.)? It should also be
possible to convert a Group object into a Section Object (by adding a Title) or
demote a Section to a plain group. What other transformations are needed?
═══ 2.8. June 5 SpHyDir II Beta ═══
SpHyDir II supports most of the HTML 3.0 and Netscape extended functions.
Anything missing will be added quickly. This code is Beta because a large
amount of core logic had to be ripped up and reorganized. There has not been
enough time to test everything.
For the next few weeks, before using this code make an archive copy of the
original file. Check carefully for any individual element that might have been
dropped out of the document because of a bug. Please report problems back to
the author.
Rewriting the documentation is one of the tasks ahead. As a result, SpHyDir II
Beta is available only by FTP from pclt.cis.yale.edu in the sphydir
subdirectory of /pub. Source will temporarily be unavailable to Professional
users, though the key that enables Professional features on SpHyDir 1 continues
to work on II.
═══ 2.8.1. Properties Table ═══
An HTML tag has attributes. SpHyDir II document objects have properties. There
is largely a one to one correspondence between attributes and properties.
HTML 3.0 adds a ton of attributes to previously fairly simple tags. Even the
<P> tag can now become
<P ID="HomeTown" ALIGN=CENTER CLEAR=ALL>
SpHyDir II needs a way to display and change all these new properties for each
object.
The Properties Table is modelled after similar windows in Visual Basic and
Delphi. Since PM doesn't exactly have the same kind of controls, SpHyDir
settles for a Container (the same type of object as an open WPS folder or the
Workarea) set to Details View. There are two columns in the table, a property
description and a value.
It seemed to be confusing to list all the properties that every object might
have, so the table lists only those with a significant value. However, if you
point to the whitespace of the table (below the last entry) and click the
second mouse button, a popup menu will list all the properties known to be
valid for this type of object. During the Beta period, SpHyDir may be a bit
fuzzy about this selection and may include a few properties that belong to a
larger class of objects of which the current object is a particular case. For
example, the properties of all the forms objects are jumbled together and need
to be sorted out.
The value column of the table can be directly edited. Since this is a
container, changing the value this way uses the same technique for renaming a
file in WPS. Hold down Alt and click on the old value. A box appears around the
old value and it can be edited. Clicking elsewhere in the table completes the
process and saves the new value. As with WPS file renaming, this interface is
not ideal.
Since the value column may be narrow and awkward, an alternative strategy is to
doubleclick the property. This pops up a box with a bit more room to change the
old value, and some edit rules that are a bit nicer to use.
If a property has a list of possible values, the list can be displayed by
clicking on the property with the second mouse button. The possible values
popup as a menu. SpHyDir II does not feel strongly that it really knows the
absolutely correct list of possible values. First, the HTML 3.0 standard
changes a lot. Secondly, the same attribute name can have different possible
values in different contexts. So the user is free to type in values that are
not in the list. It's just that the second mouse button popup menu cannot be
used to set other values.
It is a restriction for the near term that the values "Yes" and "No" may not be
used for any property other than a logical switch. So don't try to title a
section "No" because that won't work. Switches correspond to attributes whose
presence signals an option, such as "COMPACT" in a list. Setting a logical
switch to "No" is eventually going to delete it from the properties table,
because "No" is the default setting for a switch and corresponds to the
attribute not being present in the tag.
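The switch rule can be sketched in a few lines. This is only an illustration in
Python, not SpHyDir's Rexx code: the function name render_tag and the idea of
holding properties in a dictionary are assumptions made for the example. A
value of "Yes" emits the bare attribute, "No" emits nothing, and any other
value is written as NAME="value".

```python
def render_tag(name, properties):
    """Build an opening tag from a property dictionary.

    Assumption: the strings "Yes" and "No" mark logical switches, as the
    text describes; every other value is emitted as NAME="value".
    """
    parts = [name]
    for prop, value in properties.items():
        if value == "No":
            continue              # default setting: the attribute is absent
        if value == "Yes":
            parts.append(prop)    # switch: presence alone signals the option
        else:
            parts.append(f'{prop}="{value}"')
    return "<" + " ".join(parts) + ">"


print(render_tag("UL", {"COMPACT": "Yes"}))
print(render_tag("P", {"ID": "HomeTown", "ALIGN": "CENTER"}))
```

This also shows why "Yes" and "No" cannot be used as ordinary values for now:
the generator has no other way to tell a switch from a text property.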
A few properties are changed implicitly by dropping things on an object.
For example, dropping a GIF file on the Document object changes the Background
property and produces Netscape/HTML3 backgrounds. The Title of the Document or
of a Section appears both as a property and as the caption of the Workarea
Object. It can be edited by doubleclicking the Section object.
ΓòÉΓòÉΓòÉ 2.8.2. New Features ΓòÉΓòÉΓòÉ
The HEAD tags are now parsed. Along with the attributes of the BODY tag, they
generate attributes of the document. Support is provided for BASE, ISINDEX,
LINK (REV=HOME,TOC,INDEX, GLOSSARY, HELP, BOOKMARK), and the BODY Netscape
attributes for background and color control. META attributes will be added if
anyone can send me a list.
<BLOCKQUOTE>
What, Me Worry?
<CREDIT>Alfred E. Neuman
</BLOCKQUOTE>
This construct should be properly supported. At this time, SpHyDir accepts both
BLOCKQUOTE and BQ but it currently generates BLOCKQUOTE to the output file.
Although BQ is recommended, it is not supported by all current browsers, while
BLOCKQUOTE is universal. It has been observed that <CREDIT> is not supported by
WE.
The sequence:
<UL>Some text.
<LI>One.
<LI>Two.
</UL>
is upgraded so that the isolated text is rendered as
<LH>Some text.</LH>
List header text is displayed as the caption of the List Object and can be
changed by doubleclicking the object.
Similarly, the construct
<Table>Some text.
<TR> etc.
is upgraded to
<CAPTION>Some text.</CAPTION>
In general, SpHyDir II will put ending tags in all output even when the tags
can be legally omitted. In general, SpHyDir will put all text in some block,
with <P>...</P> as the default. In the previous case the <P> is not appropriate
because <CAPTION> (and <CREDIT>,<LH>, and a few other such things) are
themselves the block container and are not allowed to contain other blocks.
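A rough sketch of the upgrade, written in Python rather than SpHyDir's own
Rexx. The table HEADER_TAG and the function upgrade_loose_text are names
invented for this example, and the pattern only handles opening tags without
attributes.

```python
import re

# Hypothetical mapping from a container tag to the tag that should wrap
# loose text found directly after the container's opening tag.
HEADER_TAG = {"UL": "LH", "TABLE": "CAPTION"}


def upgrade_loose_text(html):
    """Wrap text that directly follows <UL> or <TABLE> in LH or CAPTION."""
    def wrap(match):
        container = match.group(1)
        text = match.group(2).strip()
        if not text:
            return match.group(0)          # nothing loose, leave untouched
        tag = HEADER_TAG[container.upper()]
        return f"<{container}><{tag}>{text}</{tag}>\n"

    # Text runs up to the next tag; attribute-bearing openers are ignored.
    return re.sub(r"<(UL|TABLE)>([^<]*)", wrap, html, flags=re.IGNORECASE)


print(upgrade_loose_text("<UL>Some text.\n<LI>One.\n<LI>Two.\n</UL>"))
```

Note that the wrapper is written with its ending tag and without a <P>, since
LH and CAPTION are themselves the block container.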
SpHyDir 1 tried to promote all IMG references to objects. There is a certain
simplicity when this can be done, because you can drop a GIF file on an IMG
object to set the file association. However, SpHyDir 1 was therefore unable to
place IMG references in a Heading or in the middle of a sentence. An IMG can
now appear in the previously unsupported places. In this context, the IMG is
treated as "honorary text". A dingbat (corresponding to the PC character for
the value 8) appears before and after the embedded image. Between the dingbats,
the first word is the name of the GIF file and the remaining text is treated as
ALT text.
There is currently no nice support for creating new embedded images. It will be
added by the end of the beta period. For now, you can always type this stuff in
manually. In the Text editor window, position the insert where you want the
image to go. Hold down Alt, press the "8" key on the numeric pad, and release
the Alt key. The dingbat appears. Type the name of the file in the usual HTML
format, say "../icons/face.gif". Type alternate text if you choose. End by
repeating the Alt trick to create a second dingbat. (This can also be used to
enter other unsupported tags and markup).
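The dingbat convention is simple enough to sketch. The Python below is an
illustration only: expand_embedded_images is a made-up name, and the code
assumes the dingbats always come in pairs. Text between a pair is split into a
file name (the first word) and optional ALT text (everything else).

```python
DINGBAT = "\x08"   # PC character with the value 8, used as the delimiter


def expand_embedded_images(text):
    """Turn dingbat-delimited runs into IMG tags.

    Assumes dingbats are paired, so the odd-numbered pieces of the split
    are image descriptions and the even-numbered pieces are plain text.
    """
    out = []
    for index, piece in enumerate(text.split(DINGBAT)):
        if index % 2 == 0:
            out.append(piece)                  # ordinary text
            continue
        words = piece.split(None, 1)           # file name, then ALT text
        if not words:
            continue                           # empty pair, emit nothing
        tag = f'<IMG SRC="{words[0]}"'
        if len(words) > 1:
            tag += f' ALT="{words[1].strip()}"'
        out.append(tag + ">")
    return "".join(out)


sample = "See \x08../icons/face.gif Smiling face\x08 here"
print(expand_embedded_images(sample))
```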
No attempt has been or will be made to do HTML 3.0 Math markup. No support is
provided for the horizontal tab <TAB> tag.
FIG has not been attempted this week. It should not be hard, but needs a bit of
study to get it just right.
It is not clear just how many of the proposed new forms of character emphasis
should be supported. <DFN>, <Q>, <LANG>, <AU>, <PERSON>, <ACRONYM>, <INS>,
<DEL>, <BIG>, and <SMALL> seem to be stretching things a bit. It is not clear
that all of them will actually survive the standardization process, especially
since few browsers do anything particularly meaningful with the HTML 2.0
character formatting tags that already exist.
Most of the attributes in the HTML 3.0 standard are supported. A bunch of
Netscape stuff was added, but a few more things are needed before the end of
the Beta. To know what is supported, select an object of the appropriate type,
then click the second mouse button on the whitespace of the Properties Table.
If the attribute doesn't show up in the list, it's not supported. Write me about
it.
ΓòÉΓòÉΓòÉ 2.8.3. Table ΓòÉΓòÉΓòÉ
A major feature of HTML 3.0 is tables. They allow information to be laid out
in columns. The rows and columns may have labels. Each cell of the table
contains any type of document element (paragraphs, images, buttons, etc).
Architecturally, a table is a two dimensional version of an Unordered List.
Viewed as HTML, Tables involve a large number of confusing tags. They are hard
to edit by hand. SpHyDir allows you to construct specialized tables, but it
will automate the process of building simple N by M tables. Even if you want
something special, like a heading that spans two columns, it may be easier to
let SpHyDir start by generating the normal table and then change or delete the
automatically generated entries that you don't need.
When you use the Table Object, a dialog box pops up. During the Beta it is a
bit cheesy. You can abort the dialog, leave a bare Table object, and add
elements yourself (or you will be able to, once all the elements are available
in some toolbar). Alternatively, specify the number of rows and columns and
choose whether labels are to be generated or not for each.
A simple 2x3 table might look like:
CL0 CL1 CL2 CL3
RL1 X11 X12 X13
RL2 X21 X22 X23
Where CL# are the column labels, RL# are the row labels, and X## are the cells.
HTML is going to ravel this out row by row. The nasty part is getting the tags
right. If SpHyDir II is asked to construct this table, it will produce a Table
Object containing three Row Objects. The first Row Object contains the four
Label Objects for the columns. The second and third Row Objects contain one
Label Object (the row label) and three Cell objects. Initially, all are empty.
TABLE
  ROW
    LABEL (CL0)
    LABEL (CL1)
    LABEL (CL2)
    LABEL (CL3)
  ROW
    LABEL (RL1)
    CELL (X11)
    CELL (X12)
    CELL (X13)
  ROW
    LABEL (RL2)
    CELL (X21)
    CELL (X22)
    CELL (X23)
The table is then filled in by dropping Paragraph Objects (or Image or any
other document element) on each Label and Cell object to provide contents for that
label or cell. The twelve objects that need to be assigned contents are all at
the third level of the tree (under the three Row Objects that are in turn under
the one Table Object). It is fairly easy to see what needs to be done.
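As a sketch of what the dialog has to generate, here is the same skeleton
built in Python. The function build_table and the tuple representation are
assumptions for the example; the names CL0, RL1, X11 and so on are only the
placeholders used above, since the real objects start out empty.

```python
def build_table(rows, cols, row_labels=True, col_labels=True):
    """Return the object tree for an N by M table as a list of rows.

    Each row is a list of ("LABEL", name) or ("CELL", name) tuples.
    """
    table = []
    if col_labels:
        # CL0 is the corner label; it only exists when rows are labelled too.
        first = 0 if row_labels else 1
        table.append([("LABEL", f"CL{c}") for c in range(first, cols + 1)])
    for r in range(1, rows + 1):
        row = [("LABEL", f"RL{r}")] if row_labels else []
        row += [("CELL", f"X{r}{c}") for c in range(1, cols + 1)]
        table.append(row)
    return table


for row in build_table(2, 3):
    print("ROW", " ".join(name for _, name in row))
```

For the 2x3 case this yields three rows of four objects each, which matches
the twelve objects counted above.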
During the Beta period, SpHyDir II may not support all the defined table
attributes. VALIGN, COLSPEC, ROWSPEC need some study. Currently SpHyDir doesn't
have tools to create new Rows or cells. This is by design. It is the intent of
the design that the table be expanded by selecting an existing object, clicking
the second mouse button, and then choosing Create Another from the menu popup.
If SpHyDir can dope out the table, it would then be able (after asking for your
intention) to create all of the objects needed for another column or row.
However this is not currently available.
ΓòÉΓòÉΓòÉ 2.8.4. Things ΓòÉΓòÉΓòÉ
After learning more about HTML details, it became clear that SpHyDir 1 had made
a big mistake. List Points should not contain text. Semantically, a proper list
is of the form:
<UL>
<LI><P>First point.</P></LI>
<LI><P>Second point.</P></LI>
</UL>
Nobody ever actually codes a list this way, so it is easy to miss. In HTML 2.0,
the <LI> and <P> tags have no attributes, so they appear to be redundant. Then
along comes HTML 3.0. Now proper construction of the list is "Recommended", and
any tool that plans to read in HTML had better understand this implied
structure because <LI> and <P> tags now have meaningful attributes. The SpHyDir
1 view that a List Point object contained the text of the implied paragraph has
now become unworkable.
When SpHyDir II reads a document with lists, it creates a second level of tree
indentation. The List object contains Point objects, and each Point object now
contains paragraphs and stuff. You can no longer doubleclick a Point to get the
Text Edit window.
Since the opportunity presented itself, a Point in a definition list has as one
of its properties the term from the <DT> clause. This can be changed with the
Properties Table.
This mess had to get cleared up before it was possible to do tables. A Table
looks like a List. The Points of the Table are Rows which are themselves like a
nested List. The Labels and Cells act like points. If this thing was going to
be added, then the original List Points had to get cleared up.
This produces a generalization about Things that contain Stuff. In addition to
the obvious Things (the Document, Section objects, and the three types of
Lists) there are ten other Things that contain other objects: Points, Table
Cells, ADDRESS, BLOCKQUOTE, DIV, FIG, FN, NOTE, BANNER, and CENTER. It did not
seem to make sense to structurally distinguish DIV and BANNER (a few months ago
they were merged in an earlier version of the HTML proposal). CENTER is an
obsolete construct that has not been cleanly replaced. Points, Cells, Address,
and BlockQuote seem to need their own objects. The rest SpHyDir II will try to
collect under the category of a GROUP object. The icon for a group is a
brightly colored Folder. It is an objective that Group become an option of the
second mouse button popup menu for the workarea when items are marked. Choosing
Group will create a new Group item and place all the marked objects in the
group. The collection can then be assigned properties by assigning the property
to the Group that contains it. In particular, this is the preferred way to
center a collection of things.
There may be a transition during the Beta period for anyone using the previous
SpHyDir 1 haphazard approach to <CENTER>. In a few weeks, SpHyDir will have dug
itself out of the hole. Lacking any Group object, SpHyDir 1 assigned a Centered
attribute to every object between <CENTER> and </CENTER>. The new thinking
holds that if you chose to center a collection of objects, then the objects
must collectively form a group with common properties. So SpHyDir will create
the group for you and will use both new <DIV ALIGN=CENTER> and old <CENTER>
syntax. Currently, however, CENTER may not work right.
ΓòÉΓòÉΓòÉ 2.8.5. Internal Reorganization ΓòÉΓòÉΓòÉ
There were two key decisions in SpHyDir II. The first was to completely
reorganize key areas in order to make them ready for Object Oriented
technology. The second was the choice to stick with existing Rexx and not use
Object Rexx quite yet.
SpHyDir uses the services of VX-Rexx to store information. The Workarea that
the user sees is a VX-Rexx container in Tree-Name view. What the user doesn't
see is that the records in that container contain all the text and attributes.
SpHyDir 1 was designed around the HTML 2.0 standard. Since it was slowly moving
toward formal adoption, it seemed that any design that handled 2.0 would be
good enough to last for years. Then the Netscape folks captured a big share of
the market and pushed the 3.0 features to the front. This added a whole bunch
of attributes that would break the SpHyDir 1 design for storing information.
One solution would be to build real Object Rexx classes and store the
information there.
Instead, SpHyDir II ripped out a lot of ugly, bug prone logic and created a
simpler (though possibly slightly less efficient) general purpose data store
within the existing VX-Rexx support. Internal control structures were
generalized by more aggressive use of Stem variables.
SpHyDir 1 processed HTML input and generated HTML output from large SELECT/WHEN
blocks. SpHyDir II breaks this logic into a mass of small subroutines. The
subroutine names are registered in stem tables indexed by the tag name, the
attribute name, or the object type.
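The idea carries over to most languages. The sketch below uses a Python
dictionary in the role of the Rexx stem table; the handler names (emit_paragraph
and friends) and the object layout are invented for the example and are not the
real SpHyDir subroutine names.

```python
# Hypothetical handler routines, one small routine per object type.
def emit_paragraph(obj):
    return f"<P>{obj['text']}</P>"


def emit_rule(obj):
    return "<HR>"


def emit_break(obj):
    return '<BR CLEAR="ALL">'


# Registry indexed by object type, playing the role of the stem table.
EMITTERS = {
    "PARAGRAPH": emit_paragraph,
    "HR": emit_rule,
    "BR": emit_break,
}


def generate(objects):
    """Look up each object's handler by type instead of one long SELECT."""
    lines = []
    for obj in objects:
        handler = EMITTERS.get(obj["type"])
        if handler is None:
            raise KeyError(f"no emitter registered for {obj['type']}")
        lines.append(handler(obj))
    return "\n".join(lines)


print(generate([{"type": "PARAGRAPH", "text": "Hi"}, {"type": "HR"}]))
```

Adding support for a new object then means writing one routine and adding one
table entry, rather than editing a large block of WHEN clauses.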
Rexx doesn't make it easy to initialize large syntax tables. Some tables are
handled by simply listing all possible words in a character string. Rexx has
some very nice WORDxxx functions that make it easy to manipulate such strings.
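For readers who don't know Rexx, the word-list trick looks roughly like this
in Python. The helper wordpos imitates the convention of the Rexx WORDPOS
built-in (1-based position, 0 when absent), and the contents of BLOCK_TAGS are
just an illustrative table, not SpHyDir's actual one.

```python
# One string per table, in the spirit of the word-list approach.
BLOCK_TAGS = "P UL OL DL PRE BQ FORM"


def wordpos(word, table):
    """Return the 1-based position of word in table, or 0 if it is absent."""
    words = table.split()
    return words.index(word) + 1 if word in words else 0


print(wordpos("OL", BLOCK_TAGS))   # position of a known block tag
print(wordpos("B", BLOCK_TAGS))    # 0 means "not a block tag"
```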
However, such wholesale changes will produce bugs when an isolated piece of the
old code is missed during the update. A few routines handle most of the logic
for creating new objects. However, initial debugging discovered that the New
button in the Text Edit window was a special case that created a new paragraph
without drag-and-drop or menus. That logic had to be updated also. The Beta
period should identify any other special cases that slipped through the cracks.
ΓòÉΓòÉΓòÉ 2.9. May 28 Release ΓòÉΓòÉΓòÉ
IBM released Web Explorer Beta (5/25) on Friday. It creates URL-file objects that
can be dragged from WE to a disk directory to save interesting Web locations in
WPS. You can drag these URL objects from WPS and drop them on SpHyDir objects
to create Links, just as you previously dropped Link Manager URLs and XSpOs.
A ToolChest Window has been added. The ToolChest is a container that is
intended to provide an extension or alternative to the current Toolbar.
Currently, however, the ToolChest simply duplicates a subset of the Toolbar
objects (though it adds descriptive captions missing from the Toolbar).
ΓòÉΓòÉΓòÉ 2.9.1. Preserve Entities in HTML ΓòÉΓòÉΓòÉ
In HTML, an "entity" is a special character represented by a name preceded by
"&" and ending in ";". Because they have special significance to the syntax,
"<", ">", and "&" must be represented in HTML documents as "&lt;", "&gt;", and
"&amp;". SpHyDir previously supported only these three entities, based on the
incorrect assumption that all the other entities existed only to support ISO
accented characters that would be better displayed using the international
character set. However, ISO editing got put off, and a more careful examination
of HTML 3.0 entities shows that they will include ISO, Greek, math, dingbats,
and many other characters not found in any single code page. To allow simple
editing of the "&" character, SpHyDir has to change the introducer to some
funny character. Therefore, as HTML is parsed in, the leading "&" is
converted to the 0x01 ("smiley face") PC character and it is converted back to
"&" on output. There is no explicit GUI support for entities, but as with any
funny character you can enter it from the keyboard. To get a copyright symbol (©),
hold down ALT, press the 1 key on the numeric pad, then release ALT (now you
have a smiley face) then type the name "copy" and a trailing ";". Funny
characters can be deleted or edited just like any other character.
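The round trip can be sketched as follows. This is a Python illustration, not
the Rexx that SpHyDir actually runs; the function names are made up, and the
use of a 0x00 byte as a temporary placeholder assumes that character never
appears in a document.

```python
INTRODUCER = "\x01"   # the "smiley face" PC character


def html_to_edit_text(html):
    """Decode the three syntax entities and hide every other '&'."""
    text = html.replace("&lt;", "<").replace("&gt;", ">")
    text = text.replace("&amp;", "\x00")      # park the real ampersands
    text = text.replace("&", INTRODUCER)      # any '&' left starts an entity
    return text.replace("\x00", "&")


def edit_text_to_html(text):
    """Reverse the mapping when the document is written out."""
    html = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    return html.replace(INTRODUCER, "&")


source = "A &amp; B &lt; C &copy; 1995"
edited = html_to_edit_text(source)
print(repr(edited))
print(edit_text_to_html(edited) == source)
```

In the edit text a plain "&" is always a literal ampersand, so the author can
type it freely, while the smiley face marks the start of a preserved entity.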
ΓòÉΓòÉΓòÉ 2.9.2. Image Left of Heading ΓòÉΓòÉΓòÉ
Although it doesn't solve the entire problem, SpHyDir now has limited support
for putting an image in a Section title. It allows one image to appear in front
of the title. Worse, except for the H1 at the start of the document, you cannot
create this construction with normal SpHyDir drag and drop but instead have to
(ick) edit the HTML file with a plain text editor. Clearly there is room for
improvement. In HTML terms, the construct looks like the following (from the
PCLT home page):
<H1 >
<IMG SRC="exitsign.gif" ALIGN=MIDDLE WIDTH="218" HEIGHT="171">
Welcome to PC Lube and Tune </H1>
The <IMG> tag has to come after the <Hn> and it must have an ALIGN value
(MIDDLE generally looks best). To put an image in front of the H1 tag in the
document, drag the Image Object from the toolbar and drop it on the Document
Object. The Image Object will be created just before the first Section object.
Now drop a GIF file on the Image Object and set its ALIGN attribute to MIDDLE
(or TOP or BOTTOM).
Once the Image is set up, it will be read in by SpHyDir and regenerated
properly. So the worst case is to manually set it up once.
ΓòÉΓòÉΓòÉ 2.10. SpHyDir II - Statement of Direction ΓòÉΓòÉΓòÉ
When SpHyDir was first created, some objectives were announced in the
documentation. Subsequent developments have shown some of these claims to be ill
advised. It seems appropriate to provide users with advance warning of a change
in direction. The term "SpHyDir II" is now being introduced to reflect some new
ground rules that will be required to make further progress.
The biggest mistake was to promise that SpHyDir would generate HTML that would
pass through a validator. Effectively, that ties it to HTML 2.0 syntax (for
which there are standards) at a time when everyone is moving rapidly to HTML 3.0
or "Netscape" extensions long before rigorous validation is possible. SpHyDir
users, or at least the more vocal of them, want the extensions now.
There seems to be no limit on the number of structures that HTML can include,
nor the number of attributes that will be added to HTML tags. There is no room
at the top of the screen in the Toolbar or entry fields for everything that the
new language features support. If the number of features grows much larger,
then simple icons will not be enough to remember what's what. Furthermore,
users are asking for specialized features that may require user customization.
At the same time, HTML remains a poorly specified language. Important syntax
changes occur from one version to the next. There are some differences between
the "human" explanation of what is going on and the formal syntax descriptions.
But the most important feature is that there is an enormous amount of invalid
HTML on the Web that is accommodated because the mistakes don't prevent the
browsers from displaying the correct image to users, and the final image is the
only thing that seems to count. Worse, the standards documents explicitly
mention common invalid constructions and urge browsers to accommodate them.
When people are first learning C, it is a common mistake to code
if (a=5) ...
when the correct statement is
if (a==5) ...
Yet no matter how common the error is, nobody would expect a C compiler to
accommodate the user and automatically "correct" the program. Web tools, by
contrast, are expected, perhaps even required, to accommodate HTML syntax
violations. Even so, SpHyDir cannot advance to support more complicated syntax
without some rigour.
SpHyDir has to resolve a conflict between two goals:
<UL>
<LI>To support the common use of ordinary people.
<LI><P>To encourage "recommended" practice</P></LI>
</UL>
HTML has levels of conformance. "HTML.Recommended" holds that text should be
contained in a block (P, PRE, BQ) instead of standing alone. The "<LI>text"
construction is thus not "Recommended" but is widely used. There is an
important difference between the human explanation of what is going on here and
the formal definition.
Books and articles on the "Complete Moron's Guide to HTML" will divide tags
into those that create paragraph breaks (P, H1..H6, CENTER, UL, OL, DL, LI,
HR, etc.) and those that don't (B, I, A, IMG). BR creates a line break but not
a paragraph break, though <BR><BR> might be hard to distinguish visibly on most
browsers. The non-rigorous explanation would be that <LI><P> seems redundant
because <LI> itself generates the necessary break.
HTML is rigorously defined in a document called the "DTD". The DTD defines a
%block as a P, UL, OL, DL, PRE, BQ, FORM, etc. At the Recommended level, a UL
contains LI structures, and an LI contains %blocks. At the Recommended level,
ordinary text is not supposed to be in the document BODY or in a list or list
element. It is only supposed to appear as the contents of a P, PRE, BQ, header,
etc. The DTD identifies "<LI>text" as not Recommended, but it doesn't specify
what to do about it.
The DTD allows some ending tags to be omitted. The effect of a previous tag
ends when a new tag is encountered that cannot be contained within the current
structure. Thus "<LI> [stuff] <LI>" implies "<LI> [stuff] </LI><LI>" because
one list item cannot occur within the previous list item.
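The omitted-tag rule for list items can be sketched in Python. The function
close_implied_items and the flat token list are assumptions for the example,
and only a single, non-nested UL is handled, which is enough to show the rule.

```python
def close_implied_items(tokens):
    """Insert the </LI> that a following <LI> or </UL> implies.

    tokens is a flat list of strings; tags are upper-case strings such as
    "<LI>" and anything else is treated as text.
    """
    out = []
    item_open = False
    for token in tokens:
        if token in ("<LI>", "</UL>") and item_open:
            out.append("</LI>")       # the open item cannot contain this tag
            item_open = False
        if token == "<LI>":
            item_open = True
        elif token == "</LI>":
            item_open = False
        out.append(token)
    return out


print(close_implied_items(["<UL>", "<LI>", "One", "<LI>", "Two", "</UL>"]))
```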
Having said this, there is not one shred of usable standard to transform
tolerated HTML into Recommended HTML.
<LI>Speak softly and carry a big stick</LI>
<LI><P>Speak softly and carry a big stick</P></LI>
Once the leading <P> is added, the ending </P> can be deduced because the </LI>
cannot be inside the paragraph. However, no amount of DTD will ever explain
why the free text in the list item should have been turned into a paragraph in
the first place. This is reasonable, because the reader could argue that
BLOCKQUOTE is a justifiable alternative to the P tag in the case of this
particular famous phrase.
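A heuristic of this kind might look like the Python sketch below. It is not
taken from Read_HTML or Parse_Block; the function name and token list are
invented, and the choice of P as the default block is exactly the sort of
decision the DTD does not make for you.

```python
def promote_item_text(tokens):
    """Wrap bare text that directly follows <LI> in <P>...</P>.

    tokens is a flat list of strings; anything that does not start with
    "<" is treated as text. P is used as the default block by assumption.
    """
    out = []
    previous = None
    for token in tokens:
        is_text = not token.startswith("<")
        if previous == "<LI>" and is_text:
            out.extend(["<P>", token, "</P>"])   # tolerated -> Recommended
        else:
            out.append(token)
        previous = token
    return out


item = ["<LI>", "Speak softly and carry a big stick", "</LI>"]
print(promote_item_text(item))
```

Input that is already in Recommended form passes through unchanged, because
the token after <LI> is then a tag rather than text.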
SpHyDir can only accomplish its original objective if its HTML parsing is
heuristic. A parse that is driven simply by syntax tables will not be quite
enough. This means that SpHyDir will keep, though it ought to clean up, its
current logic in the Read_HTML and Parse_Block routines.
At the same time, SpHyDir won't really work as an object oriented environment
unless users can create their own objects. Sometimes people want to support new
or experimental tags. Sometimes an author uses the same special construction in
all documents. There have been requests, for example, for a document toolbar
object of the form:
<P><A HREF= ><IMG SRC= ></A>...<A HREF= ><IMG SRC= ></A></P>
One can imagine creating this with an XSpO (an external Rexx
program dropped onto the Workarea), but it would then lose its identity. HTML
3.0 provides the solution with the CLASS attribute.
<P CLASS="TOOLBAR">
<A HREF= ><IMG SRC= ></A>...
<A HREF= ><IMG SRC= ></A>
</P>
CLASS allows most block tags to be assigned a user specified category name.
CLASS definitions are intended to be hierarchical and there is some implication
in the standard of one class being derived from another with inheritance.
SpHyDir can also make object distinctions based on the value of other
attributes in the tag. For example, the current code distinguishes between
<BR>, which is treated as part of paragraph text, and <BR CLEAR=ALL> which is
treated as a document object like <HR>.
The proposal, then, is to allow new SpHyDir objects to be defined externally.
The new objects would appear in the ToolChest Container window (maybe in the
menu popup if it can be changed dynamically). New objects would be recognized
as the HTML is read in by a new Tag name, the presence of an attribute or a
special value assigned to an attribute on an existing tag, or a CLASS attribute
specification. This is not designed to allow SpHyDir to assign user defined
objects to raw HTML from external sources. The only claim is that SpHyDir
should be able to read back HTML that it had previously written and recognize
and redisplay extended constructions.
Simple user objects could be defined based on the fundamental attributes that
SpHyDir currently potentially assigns to each object (an icon, caption, text
content, name, variable name, variable value, etc.). Objects could also be
define that contain other objects from a certain set of types. In its simplest
version, this will be used to create "macros". For example, a SECTION-like
object could be created named CHAPTER. It would be managed like an ordinary
SECTION, but it would generate something more complicated:
<DIV CLASS="CHAPTER" CLEAR="ALL">
<HR SIZE=6>
<H1 ALIGN="CENTER">This is the ordinary Section title</H1>
[ordinary section contents]
</DIV>
The second time through, SpHyDir II will recognize its own construction by the
DIV tag with the CHAPTER class. It will match up the </DIV> ender to determine
the scope. It then has to suck up and discard the boilerplate tags (the HR
SIZE=6 in this case). Although there is a HR object in the SpHyDir vocabulary,
this particular HR is part of the formal expansion of a CHAPTER object and
should not generate a separate object. Except in the HTML that it generates,
the CHAPTER object would then behave in every way as if it were the existing
SECTION object.
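The write-then-recognize cycle for such a macro can be sketched in Python.
Nothing here exists in SpHyDir yet: generate_chapter, recognize_chapter and the
pattern are hypothetical, and the matcher only accepts the exact text the
generator writes (nested DIV tags inside the contents are not handled).

```python
import re


def generate_chapter(title, contents):
    """Expand a CHAPTER object into the boilerplate shown above."""
    return (
        '<DIV CLASS="CHAPTER" CLEAR="ALL">\n'
        '<HR SIZE=6>\n'
        f'<H1 ALIGN="CENTER">{title}</H1>\n'
        f'{contents}\n'
        '</DIV>'
    )


CHAPTER_PATTERN = re.compile(
    r'<DIV CLASS="CHAPTER" CLEAR="ALL">\n'
    r'<HR SIZE=6>\n'                          # boilerplate, discarded on read
    r'<H1 ALIGN="CENTER">(?P<title>.*?)</H1>\n'
    r'(?P<contents>.*?)\n'
    r'</DIV>',
    re.DOTALL,
)


def recognize_chapter(html):
    """Return (title, contents) if html is a generated CHAPTER, else None."""
    match = CHAPTER_PATTERN.fullmatch(html)
    if match is None:
        return None
    return match.group("title"), match.group("contents")


html = generate_chapter("Intro", "<P>Hello</P>")
print(recognize_chapter(html))
```

The HR never surfaces as an object of its own: it is consumed as part of the
pattern, which is the "suck up and discard" step described above.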
Now comes the tricky part. Programmers should instantly recognize that, in
object oriented lingo, this example creates the CHAPTER class as a subclass of
the SECTION class inheriting SECTION's methods (mostly the way to edit titles
and its behavior as a container) but overriding a few methods (HTML parsing and
generation). It seems likely that many new objects will have behavior exactly
modelled on Paragraph, Section, or Image objects. SpHyDir might add a few new
built-in objects on which new things can be constructed. However, full
implementation of the concept will require that everything be rewritten in
Object Rexx, and that will be disruptive enough to be put off until it is
unavoidable.
SpHyDir has no dependencies on other programs, though it is distributed with
a few utilities (GBM, RCS) that have proven useful. However, the new release of
GOSERVE (2.30) is getting too slick to ignore. Since GOSERVE uses Rexx, and
SpHyDir is written in Rexx, a closer relationship should be worked out between
the two programs.
Currently, when SpHyDir wants to test a document, it calls Web Explorer or
Netscape with its local file name. However, hyperlinks from this first document
will not work if they are designed for another server and have fully qualified
URL's or if the document contains a BASE statement with the production server's
name. This made support for BASE a long requested but seemingly impossible
objective.
Any machine running SpHyDir can probably run GOSERVE in the background. GOSERVE
can be told to serve documents out of the HTMLLIB directory tree. The final
trick is to override the name of the production server machine with a pointer
to the local loopback IP address. This can be accomplished by adding an entry
in the \TCPIP\ETC\HOSTS file with values like:
127.0.0.1 sphydir
127.0.0.1 pclt.cis.yale.edu
If TCP/IP is set to check the HOSTS file first, then with this entry any URL
for "http://pclt.cis.yale.edu/pclt/sphydir/status.htm" will be redirected to
the GOSERVE running on the local machine, which will then fetch
pclt/sphydir/status.htm out of the HTMLLIB directory. BASE will then work, and
all the URL's that point back to PCLT work. Of course, to FTP files to the real
PCLT server I need to use a different alias for that machine (or temporarily
change the HOSTS file).
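To check which URLs such a HOSTS file would redirect to the local GOSERVE, a
small Python sketch can parse the file text and test a URL's host name against
it. The function names are invented, and the code only models the lookup; it
does not consult the real TCP/IP resolver or its search order.

```python
from urllib.parse import urlsplit

LOOPBACK = "127.0.0.1"


def parse_hosts(text):
    """Map each host name in HOSTS-file text to its IP address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()    # drop comments and blanks
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), address)   # first entry wins
    return table


def served_locally(url, hosts):
    """True if the URL's host name is mapped to the loopback address."""
    host = urlsplit(url).hostname
    return host is not None and hosts.get(host) == LOOPBACK


hosts = parse_hosts("127.0.0.1 sphydir\n127.0.0.1 pclt.cis.yale.edu\n")
print(served_locally("http://pclt.cis.yale.edu/pclt/sphydir/status.htm", hosts))
```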
GOSERVE also provides an environment to test FORMS and CGI-like programming.
Some days it seems like an HTML form displayed on the Web Explorer window would
be a more flexible way to choose options and configure SpHyDir or document
objects than popup VX-Rexx windows.
In any event, future releases of SpHyDir may move toward almost requiring that
GOSERVE and either Web Explorer or seamless Netscape be running.
ΓòÉΓòÉΓòÉ 2.11. May 15 Release ΓòÉΓòÉΓòÉ
Document Objects now have a popup menu. Click the second mouse button to
display Open, Settings, Insert, Create Another, Mark, Delete, etc. Insert
provides a quick way to add a new Paragraph, Image, Point, or List without
going to the Toolbar. You can also Insert a Horizontal Rule for which there is
no tool (though you have been able to generate it with ALT-H for quite some
time). Create Another creates a second object like the current object (say
another paragraph or point). Mark duplicates the old Alt-L. Delete duplicates
the old Ctrl-D. Currently you can only Open the contents of the object
(duplicates current DoubleClick function). Settings is under construction.
ΓòÉΓòÉΓòÉ 2.12. May 8 Release ΓòÉΓòÉΓòÉ
SpHyDir now refreshes the title of a subdocument as it reads the parent file
in. Thus if you change the title of a subdocument, the pointer to it will be
changed the next time that the parent (or the entire tree) is processed.
ΓòÉΓòÉΓòÉ 2.12.1. Test (F5) ΓòÉΓòÉΓòÉ
The File-Test (F5) operation now dynamically communicates to running copies of
popular browsers. SpHyDir first generates a temporary copy of the document in
TEMPDOC.HTM in the HTMLLIB root directory. In previous releases it then started
Web Explorer to view the document. Now this is the last alternative. Before
launching a new WE, SpHyDir now tries two things:
1. SpHyDir first attempts to establish a DDE link with a running copy of
Netscape. For this to be useful, Netscape should be running in seamless
WINOS2 mode. SpHyDir passes Netscape a request to display the Tempdoc
file.
2. If Netscape is not found, then SpHyDir looks through the windows on the
screen for a running version of Web Explorer. If one is found and it is
already viewing an older version of TEMPDOC, then SpHyDir sends it a F5
to refresh the document. If it is running and viewing something else,
SpHyDir enters the name and path of TEMPDOC in its entry area and sends
an Enter key.
Netscape has known problems running seamless on OS/2. Fortunately, the most
serious issues involve user interaction with the menus. By controlling
Netscape from SpHyDir, interaction is minimized. However, the code to control
both Netscape and WE is new and may require some fine tuning.
ΓòÉΓòÉΓòÉ 2.12.2. The BR Object ΓòÉΓòÉΓòÉ
The design of a <BR> object finally became clear when a user reported by EMail
that he was having trouble with Netscape extensions. Normally, an Image either
appears by itself or is aligned with a single line of text. The Netscape
ALIGN=LEFT (which SpHyDir has "supported" from the start) causes multiple lines
of text to flow through the space left to the right of the image.
Unfortunately, this option doesn't just flow text. It also "sucks up" any
following images. The Netscapism for breaking this pattern and starting the
next line under the image is to add a <BR CLEAR="ALL"> tag.
I have previously noted the need for a <BR> object to separate buttons in a
form. The problem was to distinguish when reading in the HTML a <BR> acting as
an object from a <BR> in the middle of a paragraph that acts instead like a
character or as a CR/LF pair. Although SpHyDir has tried to avoid non-standard
HTML, it seems very compact to declare that <BR CLEAR="ALL"> would be
recognized as the Object and plain <BR> as the character.
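The distinction is easy to express as a test on the tag text. The Python
below is a sketch only; classify_br is a made-up name, and treating every other
form of <BR> as the character is an assumption that goes slightly beyond what
is stated above.

```python
import re

# Matches <BR CLEAR=ALL> with or without quotes, in any letter case.
BR_OBJECT = re.compile(r'<BR\s+CLEAR\s*=\s*"?ALL"?\s*>', re.IGNORECASE)


def classify_br(tag):
    """Return "object" for <BR CLEAR="ALL">, otherwise "character"."""
    if BR_OBJECT.fullmatch(tag.strip()):
        return "object"
    return "character"


print(classify_br('<BR CLEAR="ALL">'))
print(classify_br("<BR>"))
```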
Browsers should ignore attributes they don't understand. The only problem
occurs if you try to validate HTML that contains Netscape extensions with a
validator looking for HTML 2.0. If you don't want <BR CLEAR="ALL">, then don't
create the object. Incidentally, to create such a tag, position at the next
object and press Alt-B. The BR Object will be positioned in front of the
currently selected object. For the most part, the BR and HR objects are very
similar. Neither has a specific icon at the moment.
═══ 2.12.3. Backup using RCS ═══
Backup of HTML files has been a serious issue. First, SpHyDir cannot be subject
to terribly aggressive testing between weekly releases. If a syntax error
occurs, SpHyDir can abort in the middle of writing a file. If SpHyDir
encounters HTML that it doesn't recognize, information can be lost. The
previous strategy of saving the old copy of the file in the BACKUP directory
addressed only part of the problem. SpHyDir now introduces the heavy artillery.
For other text files, the most powerful free software system is the RCS version
control package from Unix. When fully exploited, it allows several people to
check out and work on files in a shared library. It remembers changes to the
file and who made the changes. It is possible to reconstruct older versions of
the data if something goes wrong.
Use of RCS is optional. If it is not used, SpHyDir continues to make a copy of
the previous version of the file in the BACKUP subdirectory.
For each original data file, RCS builds a control file that keeps a copy of its
current contents and the information needed to recover any previous versions.
The first version is "1.1" and each time the file is changed a new version
number is generated.
By default, RCS will archive f:\pclt\sphydir\status.htm in a control file named
f:\pclt\sphydir\RCS\status.htmv. That is, it stores files in the RCS
subdirectory of the path where the data file is found, and it adds the suffix
character "v" to the file type.
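The naming rule is mechanical, so it can be written down as a small function.
The sketch below is Python rather than Rexx, and the name rcs_control_file is
invented for the example; it uses the ntpath module so that the backslash
paths behave the same on any host system.

```python
import ntpath  # DOS/OS2-style path handling, independent of the host system


def rcs_control_file(data_file, suffix="v", subdir="RCS"):
    """Map a data file to the control file described in the text.

    The control file lives in the RCS subdirectory next to the data file,
    and the suffix character is appended to the file type.
    """
    directory, name = ntpath.split(data_file)
    return ntpath.join(directory, subdir, name + suffix)
```

Given f:\pclt\sphydir\status.htm, this returns
f:\pclt\sphydir\RCS\status.htmv, which is the example above.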
Some future version of SpHyDir may maintain enough variables for a document to
allow individual decisions about what to manage under RCS. Currently, however,
if you use RCS at all you have to use it as the backup for the entire library.
SpHyDir is triggered to use it if there is an RCS subdirectory under the
HTMLLIB root. On my machine, "f:\pclt" is the HTMLLIB for PCLT articles, so
SpHyDir looks for "f:\pclt\RCS" to decide to use RCS, for all of the documents
in all of the directories under f:\pclt.
Unexpectedly, RCS will not create the needed subdirectory automatically, and it
will not quite work correctly if the subdirectory doesn't exist. A future
version of SpHyDir may fix this, once I develop more confidence about the best
arrangement. For now, manually create RCS subdirectories throughout your
HTMLLIB tree if you intend to use this facility. If you forget, RCS will create
the control file in the same directory as the data file and you can create the
RCS subdirectory and move the file to it later on. This is not a problem on
HPFS volumes, but it could present an issue on FAT directories where "HTMV"
might get truncated to "HTM". RCS backup is probably not a good idea for
SpHyDir users with only FAT directories.
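Creating the RCS subdirectories by hand is tedious in a large library. The
following is a hedged sketch of a helper that does it in one pass. It is
Python, it is not part of SpHyDir, and the function name create_rcs_subdirs is
an assumption made for the example.

```python
import os


def create_rcs_subdirs(htmllib_root):
    """Create an RCS subdirectory in every directory under htmllib_root.

    Returns the list of directories that were actually created.
    """
    created = []
    for dirpath, dirnames, _filenames in os.walk(htmllib_root):
        # Do not descend into RCS directories that already exist.
        dirnames[:] = [d for d in dirnames if d.upper() != "RCS"]
        target = os.path.join(dirpath, "RCS")
        if not os.path.isdir(target):
            os.mkdir(target)
            created.append(target)
    return created
```

Running it a second time creates nothing, so it is safe to repeat after new
subdirectories have been added to the library.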
Before generating new HTML, SpHyDir backs up the previous version of a file by
issuing the command:
ci -xv -l -m"backup" -t-"backup" xxxx.htm(l)
This runs CI.EXE (Check In) of the RCS version control system. The -m and -t
parameters provide dummy log messages so the program does not prompt for a
description of changes or of the file. The -l parameter checks the file back
out immediately (so that it remains in the library and can be rewritten). The
-xv adds a "v" letter on the end of the file type (htmv or htmlv) to provide
the file type of the RCS control file.
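For anyone who wants to issue the same backup from a script, the command can
be assembled as an argument list. This is a sketch in Python with invented
function names; only the ci options themselves come from the command shown
above. In list form the quotation marks are not needed, because each list
element is passed as one argument.

```python
import subprocess


def ci_backup_command(filename, message="backup"):
    """Build the argument list for the check-in described in the text."""
    return [
        "ci",
        "-xv",              # control file type is the data file type plus "v"
        "-l",               # check the file back out at once so it stays in place
        "-m" + message,     # dummy log message, so ci does not prompt
        "-t-" + message,    # dummy file description, so ci does not prompt
        filename,
    ]


def backup_with_rcs(filename):
    """Run ci; needs the RCS executables on the PATH (FileNotFoundError if not)."""
    return subprocess.run(ci_backup_command(filename), check=False).returncode
```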
SpHyDir does not check the PATH for a copy of the RCS executables. It tries to
use RCS based on the existence of a directory in HTMLLIB. It is the user's
responsibility to install RCS on the OS/2 system before using this facility.
Unzip the RCS567PC.ZIP distribution file and copy the contents of the BIN32
subdirectory to a library in your PATH. RCS is now available on the same FTP
file servers as SpHyDir itself.
SpHyDir is using RCS to provide a super safe backup, not to do true version
control. There is no provision to check out locked files, or to check in and
unlock a final version. However, the user can build real version control
outside SpHyDir by issuing RCS commands before or after running SpHyDir to
provide real parameters and version numbers.
RCS is a serious system with some heavy duty manuals. SpHyDir's only direct
function is to call CI.EXE to generate the backup. Comparing different versions
of a file, or recovering old versions from the backup, requires direct use of
the other RCS commands. RTFM. If RCS appears to be too complicated, feel free
to continue to use the old SpHyDir BACKUP.
Use of RCS was requested in E-mail by a user several weeks ago. Initially it
seemed like a really bad idea. The problem is that RCS and all the other
version control systems have this view of tracking changes by line. Since
SpHyDir only puts a CR/LF line break at the end of paragraphs, it appears to
have really, really long lines. Change one word, and RCS regards the entire
paragraph as changed. Flowing the text into 80 character lines would not make
much difference, because any change in one section will flow changes onto all
the subsequent lines of the paragraph. However, there are no better version
control mechanisms, and after some consideration the long lines do not appear
to be unworkable.
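The effect is easy to reproduce with any line-based comparison. The sketch
below uses Python's difflib as a stand-in for the RCS differencing (an
assumption; RCS uses its own diff) and counts how many lines are reported as
removed or added when one word changes in a sixty-word paragraph.

```python
import difflib
import textwrap


def changed_lines(old_text, new_text):
    """Count the lines that a line-based diff reports as removed or added."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    return sum(1 for line in diff if line.startswith(("- ", "+ ")))


def demo():
    """Change the first word of a paragraph, stored two different ways."""
    words = ["word%02d" % i for i in range(60)]
    old = " ".join(words)
    new = " ".join(["a_much_longer_word"] + words[1:])
    # Stored as one long line: the whole paragraph is one removal and one addition.
    as_long_line = changed_lines(old, new)
    # Reflowed to 80 columns: the longer word pushes text onto the following lines.
    as_wrapped = changed_lines(textwrap.fill(old, 80), textwrap.fill(new, 80))
    return as_long_line, as_wrapped
```

With the long line the diff is the entire paragraph, once removed and once
added. With the wrapped form the change ripples through the lines that follow,
so wrapping buys little.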
Before reporting any bugs, please realize that the version being "checked in"
to RCS is not the version on the screen. What is being checked in is the old
version on disk from before the current edit session. So if you read a file in,
make a ton of changes, press F2 to save it, and look at the Console window, do
not be surprised if it reads:
F:\PCLT\sphydir\RCS/STATUS.HTMv <-- F:\PCLT\sphydir\STATUS.HTM
file is unchanged; reverting to previous revision 1.3
It is not saying that there are no changes in the current version, just that
there were no changes in the old version that you are just about to replace.
RCS is a Unix utility that has been ported to the OS/2 environment using the
EMX package. EMX is a version of the GNU development tools and the GCC
compiler. These tools are normally found in the /unix subdirectory of the OS/2
files at ftp.cdrom.com and ftp-os2.nmsu.edu. The minimum files needed are the
EMX runtime DLL library (emxrt.zip) and the RCS distribution (rcs567pc.zip).
They will also be added to the SpHyDir FTP directory.
This facility is currently more "experimental" than the rest of SpHyDir, so its
output is not captured. The CI command writes to "standard output" and VX-Rexx
captures that file and displays it in the VX-Rexx Console window. If the
service seems to work well, this output will be suppressed in a future release.
Meanwhile, the Console window will have to be manually closed when SpHyDir
ends.
═══ 3. SpHyDir Project Objectives ═══
Produce the highest quality HTML documents automatically. Upgrade
obsolete syntax to current "Recommended" practice. Add additional markup
to get the best possible results on all known viewers. Provide a
transition to new standard features.
Build larger documents from many small, structurally interrelated
hypertext files. Easily generate links to other files, to target labels
in certain files, and to remote documents (by extracting URL references
while a Browser displays the remote file).
Present an entirely different approach to Web document construction.
Microsoft and Word Perfect have interfaces from their Word Processors.
Oracle promises an interface from Oracle Book. HTML editors are being
written all the time. If SpHyDir isn't completely different (and IMHO
better) then it isn't worth the effort.
Automatically generate navigational links, copyright notices, and other
standard features at the beginning or end of every document.
Support all the features of HTML 3.0 and common Netscape extensions.
Provide for user extensions both to document structure and library
management.
Provide direct links to Netscape and Web Explorer to test document
changes. SpHyDir is not WYSIWYG, but you can immediately format what you
are structurally editing to see how it will look.
Simplify the construction of data entry forms (entry fields, check
boxes, radio buttons, push buttons) and tables.
SpHyDir II supports all the tags and attributes of the current HTML 3.0 draft
standard. SpHyDir should be able to process any Web document that uses these
features correctly and in context. However, HTML is a formatting language and
SpHyDir is a document structure tool. Incorrect syntax, or the use of a tag
out of context to achieve a particular effect, can confuse the analysis. In
particular, the use of <H6> to get "fine print" where no heading is actually
intended will certainly produce bad results.
SpHyDir II is Object Oriented. It examines an input HTML document and
produces a tree of Objects that corresponds to the apparent document
structure. The simplest Object is a paragraph that contains text. Other
objects include the Image (for inserted graphics), ordered and unordered Lists,
Tables, Forms, Horizontal Rules, etc.
Each object has properties. There is a fairly close tie between the
Properties of an Object (in SpHyDir) and the Attributes of a Tag (in HTML).
When an object is selected in the tree, its properties are displayed in the
Properties Table. This behavior is intentionally modelled on tools like Visual
Basic and Delphi. However, since most HTML attributes have default values that
can be ignored, the SpHyDir properties table only shows the items that have
been assigned an explicit value.
The casual user can concentrate on text, graphics, and basic document
structure (sections, lists of points, hypertext links). If more advanced
features become needed, SpHyDir can display all the legal properties that any
object is permitted. For example, any Paragraph can have an ID (jump-to
label), ALIGN (LEFT|CENTER|RIGHT|JUSTIFY), CLEAR (LEFT|RIGHT|ALL), and NOWRAP.
SpHyDir will list the common standard values but allows the user to type in
other values (such as entering "100 pixels" as the value of the CLEAR
property).
With this approach, SpHyDir doesn't require the user to be familiar with
HTML, but it also doesn't prevent the HTML expert from using the more obscure
language options. The author can "ease into" advanced features.
═══ 4. The SpHyDir Idea ═══
To create a personal home page or an ad layout, one must concentrate on
graphic layout. To publish a large body of useful, interrelated information on
the Web, it is more important to focus on content and the organization of the
entire library. This is the purpose of PC Lube & Tune and so it is the design
objective of SpHyDir.
The SpHyDir program icon is configured with the path to a library of Web
files. SpHyDir will only edit files and build links to the subdirectories that
fall under that starting point (the "HTML Library"). To edit a file, drop its
icon into the SpHyDir workarea window.
SpHyDir reads in the HTML and converts it to a sequence of Document Objects.
These Objects correspond to paragraphs, images, sections (chapters, topics),
numbered lists, bullet lists, tables, and so on. The Objects are arranged in a
tree, because the document contains chapters, the chapters contain paragraphs,
images, and lists, the lists contain points, and so on.
Most of the objects that SpHyDir creates correspond directly with elements of
the HTML language. A few have to be invented and several more have to be
guessed. The future HTML standard (3.0) will include Divisions that break the
document up into chapters. Current HTML (2.0) doesn't support this, and few Web
documents include the HTML 3 features. So SpHyDir has to invent the "Section"
object by looking for Header tags that are part of the 2.0 standard. The
assumption is that a Header normally starts something. Therefore, everything
after a Header (up to the next Header of the same type) must be a Section of
the document.
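The Section rule can be sketched as a single pass over the tags with a stack
of open Sections. This Python fragment is only an illustration of the idea,
not SpHyDir's Rexx code; the dictionary layout and the name build_sections
are assumptions. It also assumes that a higher-level Header (a lower number)
closes any deeper Sections that are still open.

```python
import re

HEADING = re.compile(r"^H([1-6])$")


def build_sections(items):
    """Turn a flat list of (tag, text) pairs into a nested tree of Sections."""
    root = {"level": 0, "title": None, "children": []}
    stack = [root]                     # Sections that are currently open
    for tag, text in items:
        match = HEADING.match(tag.upper())
        if match:
            level = int(match.group(1))
            # A Header ends every open Section of the same or a deeper level.
            while stack[-1]["level"] >= level:
                stack.pop()
            section = {"level": level, "title": text, "children": []}
            stack[-1]["children"].append(section)
            stack.append(section)
        else:
            # Anything else belongs to the innermost open Section.
            stack[-1]["children"].append((tag, text))
    return root
```

Feeding it H1, P, H2, P, H2 yields one top Section whose children are the
first paragraph and two subsections, the first of which holds the second
paragraph.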
At first the SpHyDir objects may seem a bit awkward. Dividing everything up
formally into paragraph objects and list objects is more precise than normal
word processing. However, once SpHyDir has decomposed the original HTML into
document objects, and those objects have been updated, SpHyDir is now in a
position to generate a document with flawless, precise HTML syntax. There are a
lot of erroneous documents in the Web. Some documents display correctly on one
browser but are wrong on another browser. Few Web authors are HTML experts, and
there are many misunderstandings. SpHyDir converts the HTML to something that
most people instinctively understand (chapters, paragraphs). In many cases it
will upgrade obsolete or "deprecated" HTML elements to current "recommended"
use.
═══ 5. SpHyDir is not for Everyone ═══
HTML marks up documents so that they look good. SpHyDir assumes that the
markup corresponds to valid document structure. Some things display nicely but
are impossible to structure. For example, because <H6> produces very tiny text,
it is sometimes used to get "fine print":
<H2>Lease a new car for $200 a month</H2>
<H6>engine not included</H6>
SpHyDir requires that all H1..H6 tags be used to start sections. Also SpHyDir
doesn't preserve the heading numbers, just their relative position compared to
each other. In the previous example, SpHyDir would change the H6 to an H3
because that is the next number down from H2.
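The renumbering can be pictured as follows. This is a guess at the behavior
written in Python, not the actual SpHyDir logic; the function name
renumber_headings is invented, and the sketch assumes that a heading with no
enclosing Section keeps its own number.

```python
def renumber_headings(levels):
    """Keep only the relative nesting of a sequence of heading numbers.

    Each heading nested inside another gets the parent's new number plus one.
    """
    stack = []     # (original_level, new_level) for each open Section
    result = []
    for level in levels:
        # Close Sections at the same or a deeper original level.
        while stack and stack[-1][0] >= level:
            stack.pop()
        new_level = stack[-1][1] + 1 if stack else level
        stack.append((level, new_level))
        result.append(new_level)
    return result
```

For the car lease example the input [2, 6] comes back as [2, 3].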
A large number of Web documents have invalid HTML. They display as intended
because the browsers don't complain about errors that do not affect formatting.
For example, when someone wants to print in big letters, they frequently use
heading tags:
<H1>Get Rich Quick<P>Act Now<P>Limited Time Offer</H1>
<P> tags are not permitted inside a header, but most browsers tolerate this
construction, using H1 to change font and /H1 to revert back to normal size.
SpHyDir expects Headings to be a simple character string as the standard
specifies. Paragraphs are other types of objects, and headings cannot contain
objects.
SpHyDir II attempts to include almost all the valid syntax in HTML 3.0 and
Netscape. The Math support will be omitted for a very long time. The FIG
structure will be supported when it is more widely used by browsers. Netscape
extensions will not be supported when they seem to directly overlap more
appropriate HTML 3.0 constructs.
HTML goes through revisions. Old constructions that have been replaced are
called "deprecated" in the standard. An even tighter reading of the standard is
called "recommended." SpHyDir reads the HTML in, understands it, and then
generates new HTML based on the structure. It can automatically upgrade old
"deprecated" files to "recommended". For example, it will automatically convert
<MENU> and <DIR> to <UL> and will convert <XMP> and <LISTING> to <PRE>. If you
want to keep the old stuff as is, then SpHyDir is not the right choice.
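The tag upgrade amounts to a small translation table. The sketch below shows
the idea in Python on raw HTML text; SpHyDir itself works on its object tree,
so this is an illustration only, and the names UPGRADES and
upgrade_deprecated are assumptions.

```python
import re

# Deprecated tag name -> recommended replacement, as listed in the text.
UPGRADES = {"MENU": "UL", "DIR": "UL", "XMP": "PRE", "LISTING": "PRE"}
TAG_NAME = re.compile(r"<(/?)(MENU|DIR|XMP|LISTING)\b", re.IGNORECASE)


def upgrade_deprecated(html):
    """Rename deprecated start and end tags, leaving any attributes alone."""
    def rename(match):
        return "<" + match.group(1) + UPGRADES[match.group(2).upper()]
    return TAG_NAME.sub(rename, html)
```

One caution: the contents of XMP are literal text while PRE still interprets
markup, so a real conversion would also have to escape any "<" and "&" inside
the block. The sketch does not do that.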
There are some constructions that the HTML standard permits, but maybe only
because the DTD language in which the standard is written cannot express
certain rules well. SpHyDir requires that a Definition List have sequences of
one term (DT) and one definition (DD). The Definition can have multiple
paragraphs. The sequence:
<DT>canned
<DD>packaged in a can
<DD>fired from a job
appears to be technically valid. It even has a certain obvious meaning (one term
with two definitions). The HTML DTD standard says that a <DL> tag can only have
<DT> or <DD> contents, but it doesn't specify how many or in what order. Some
very bad HTML uses <DL><DD> <DD> </DL> to get a certain level of indentation.
If you like this sort of thing, find another editor.
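The stricter rule that SpHyDir applies is simple to state as a check. This
Python sketch looks only at the order of the DT and DD tags directly inside a
list; the function name dl_is_structured is invented for the example.

```python
def dl_is_structured(tags):
    """True if the tags form strict pairs: DT, DD, DT, DD, and so on."""
    if len(tags) % 2 != 0:
        return False
    pairs = zip(tags[0::2], tags[1::2])
    return all(term == "DT" and definition == "DD" for term, definition in pairs)
```

The "canned" example (DT, DD, DD) fails the check, and so does the
indentation trick that uses DD alone.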
SpHyDir "understands" tag names and attributes. The name is the part of the
tag that follows "<" and the attributes follow the name as either a keyword or
keyword, equals sign, and a value. If SpHyDir doesn't explicitly support the
tag name, it copies the tag as ordinary text. If it understands the tag name
but not the attribute, it discards the attribute.
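In outline, the two rules look like this. The table of known tags below is a
made-up fragment for the example (SpHyDir's real tables are much larger), and
handle_tag is a hypothetical name; the sketch is Python, not Rexx.

```python
# Hypothetical fragment of a tag table: tag name -> attributes it supports.
KNOWN = {
    "BR": {"CLEAR"},
    "IMG": {"SRC", "ALT", "ALIGN"},
}


def handle_tag(raw, name, attributes):
    """Apply the two rules: unknown tags become text, unknown attributes vanish."""
    name = name.upper()
    if name not in KNOWN:
        return ("text", raw)                 # copied through as ordinary text
    kept = {key.upper(): value
            for key, value in attributes.items()
            if key.upper() in KNOWN[name]}   # unsupported attributes are dropped
    return ("object", name, kept)
```

Note the asymmetry: an unknown tag survives a round trip through the editor,
but an unknown attribute on a known tag is lost.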
HTML 3.0 has introduced some attributes whose use is unclear. There is, for
example, a LANG attribute that may assign an ISO standard abbreviation for the
language and country. According to the standard, "it can be used by the parsers
to select language specific choices for quotation marks, ligatures, and
hyphenation rules". It is not really clear that this is useful. There is a much
stronger requirement, for changing from Latin 1 to other character sets, which
is not addressed by this description. SpHyDir may choose to skip features of
the HTML 3.0 draft that are unclear or appear poorly thought out. If any user
needs an attribute that has been omitted, please E-mail the author with a
description of its use.
SpHyDir builds its internal tables keyed to the tag, object, and attribute.
Unfortunately, several attribute names have meanings that depend on context.
The NAME attribute can be a variable name (in FORMS related objects) or it can
be the label of a jump (in the <A> tag). The ALIGN attribute has one set of
values for an IMG, a second set for CAPTION, and another set for Paragraphs,
Headings, and Divisions. The worst thing, however, is that ALIGN is also a
switch that appears with no value in tables. In some contexts SIZE means WIDTH
while elsewhere it is HEIGHT.
There is no way that SpHyDir can ever make sense out of this mess, but it will
try to "correct" some of these ambiguities for the normal end user who is not
an HTML expert. Near term, SpHyDir may offer to generate HTML attribute values
that are not valid for the attribute name used in its current context.
SpHyDir is not written tightly enough to trap its own syntax errors and
recover. Rexx simply stops the program when it encounters a problem. Since Rexx
is an interpreted language, syntax errors may only be detected during
execution. When the program aborts, it can leave the output file half-written.
This is the primary reason for making a backup of the previous copy of the file
before generating a new copy.
═══ 6. How to Get SpHyDir ═══
SpHyDir is a copyrighted program which is a personal project and property of
the author. It is made available on the network and may be used free of charge
under the license terms distributed with the package. Essentially, you agree to
leave in all HTML documents produced by SpHyDir the credit that appears at the
bottom of all of these Web pages: "This document generated by SpHyDir, another
fine product of PC Lube and Tune."
This arrangement is called "Personal SpHyDir." If a large organization wants
to generate more professional looking documents and omit the credit, other
licensing arrangements can be made with the author.
The following references are correct. They work with Web Explorer and Netscape
and conform to current HTTP and HTML standards. If they don't work on your
Browser, get a better Browser. Otherwise, you can fetch the files with FTP from
pclt.cis.yale.edu. They are in the SPHYDIR subdirectory of PUB. If you have
trouble with your browser, then read the trailing tutorial on Web handling of
binary files to figure out what went wrong.
With a good browser, just select the name of any desired files and save them
to disk on your machine. All are compressed with the ZIP utility from the
INFOZIP project.
SPHYDIR.ZIP - The basic SpHyDir package. Includes the program and some
sample External Rexx "XSpO" scripts.
VROBJ21C.ZIP - The VX-Rexx 2.1 runtime library at Patch Level C
(VROBJ.DLL). This Dynamic Link Library must be in one of the directories
listed in your LIBPATH statement in CONFIG.SYS. This file is also
required for many other freeware and shareware packages, so you may
already have a copy of this file. After June 23, SpHyDir will be
generated with the "C" version of this module, and may complain if it is
started on a system with only the "B" level of the runtime. Current
information about VX-Rexx is available from the vendor Watcom.
SPHYDOC.ZIP - A copy of all these HTML pages and their associated GIF
files. Unlike other PCLT documents, the SpHyDir documents may be
downloaded and copied. This provides a good example of lots of SpHyDir
use.
GBM.ZIP - A freeware package written by an IBM employee and distributed
through a number of sources. This OS/2 program converts between a number
of popular image formats (GIF, TIFF, XBM, BMP, etc.) and can crop or
resize images. Use this package to convert BMP or Clipboard images into
GIF suitable for including in a Web document.
RCS is a programmer's Revision Control system ported to OS/2 from Unix.
It archives updates to a source file and keeps a change history. You can
display differences and recover any previous version of the file. SpHyDir
doesn't require its use, but a professional HTML editor quickly learns
the value of keeping a history of all document updates.
For a simple and configurable Web server that can run on the same OS/2
machine, PCLT recommends the GOSERVE package from Mike Cowlishaw. There is
also a supplementary collection of routines named GOHTTP, from D. L. Meyer,
that adds better CGI and forms support.
═══ 6.1. Distributing Binaries through the Web ═══
Fetching a ZIP file through the Web should be a trivial matter. Unfortunately,
a number of popular Browsers (particularly NCSA Mosaic) don't do a reasonable
job of handling such files.
Web Servers support the HTTP (HyperText Transfer) Protocol. The first version
of HTTP (0.9) simply transmitted Web files back to the reader. The current
standard (1.0) precedes each file with a statement of its data type in Internet
MIME style. This allows the Browser to distinguish between HTML, plain Text,
ZIP binaries, and MPEG movies.
Web Browsers can also read files using the FTP protocol. With FTP, the server
doesn't provide any indication of the data type, but the file name contains an
extension that usually indicates the type of data (*.ZIP, *.JPG, *.GIF, etc.).
In the early days of the Web, HTTP was generally used to distribute HTML files,
and FTP was generally used to distribute other binary formats.
No Operating Systems record the MIME file type in the disk directory. So most
HTTP servers look at the file type and create a MIME data type based on the
extension of the file requested. Thus if a browser fetches SPHYDIR.ZIP using
FTP, it will decide that it is a ZIP file because of the *.ZIP extension, but
if it fetches the same file using HTTP from the same server, the Server
will look at the *.ZIP extension, decide that it is a ZIP file, send the MIME
header with that information, and the Browser will react accordingly.
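What the server does here is a table lookup on the extension. The sketch
below is a minimal Python illustration; the table is a small sample chosen
for the example, the function names are invented, and the fallback type for
unknown extensions is an assumption rather than something every server does.

```python
import os

# A small sample table; real servers carry a much longer, configurable list.
MIME_BY_EXTENSION = {
    ".htm": "text/html",
    ".html": "text/html",
    ".txt": "text/plain",
    ".zip": "application/zip",
    ".gif": "image/gif",
    ".jpg": "image/jpeg",
    ".mpg": "video/mpeg",
}


def content_type(filename, default="application/octet-stream"):
    """Pick a MIME type from the file extension, ignoring case."""
    extension = os.path.splitext(filename)[1].lower()
    return MIME_BY_EXTENSION.get(extension, default)


def content_type_header(filename):
    """Format the header line that precedes the file in an HTTP 1.0 reply."""
    return "Content-Type: " + content_type(filename)
```

Either way the decision rests on the same three letters of the file name. The
only difference is whether the Server or the Browser looks at them.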
The problem is that a lot of Web Browsers have developed the convention that
anything that comes over HTTP protocol should be either displayed on the screen
or played through the speakers, while files that come over FTP can be saved to
disk if they have a file extension that makes that seem right. Nothing in the
standards says any such thing. Architecturally, a URL can call up ftp:,
gopher:, or http: protocols to fetch a file. What you do with the file should
then be determined by the type of data and not by the protocol used to fetch
it. But it is hard to convince some Browsers to save a ZIP file to disk if it
came over HTTP protocol. In most cases, the ZIP file is actually on disk in the
Browser's CACHE directory, but it may be hard to find. When in doubt, fall back
on plain FTP.
═══ 7. Using SpHyDir ═══
This section will explain:
Object content, properties, and links
Steps to install SpHyDir
How to begin editing existing HTML files
How to create a new HTML file
How to edit text in paragraphs or headings
How to delete document objects
How to add new sections, paragraphs, lists, etc.
How to link to other files or remote Web documents
How to save the document and exit
How to move paragraphs and sections around
SpHyDir converts the document to a sequence of objects. Each object has three
attributes: content, properties, and links.
Content
The content of a file is the text or program contained in that
file. A word processing file may also contain formatting
information. The content of a SpHyDir Paragraph Object is the text
of the paragraph, along with all the HTML language features that
operate at the level of words or characters and therefore cannot be
turned into larger objects. To access the content of a file,
doubleclick its icon on the desktop. To access the content of a
SpHyDir paragraph, doubleclick its icon. This opens the SpHyDir Text
Edit Window.
For convenience, it is also possible to doubleclick other objects
that have Headings, Titles, or Captions. This opens a dialog box
that allows the heading to be changed. For example, doubleclicking
the point in a Definition List allows the defined term to be edited.
Although the Heading, Title, Caption, or Term can be opened as if it
was contents, these features are really Properties and can also be
viewed that way.
Properties
In Visual Basic or Delphi, each GUI object has Properties.
Properties include the size, location, font, color, enabled status,
and caption. SpHyDir objects have Properties that derive from HTML
attributes. They may include horizontal alignment (LEFT, MIDDLE,
RIGHT), size, source file (for images), label, shape, and so on.
HTML 3.0 creates all sorts of attributes for each type of object.
The SpHyDir workarea has a Properties Table that displays the
current meaningful values of Properties for the currently selected
document object. Click on a different object, and the table changes
to reflect the new object.
If the user doubleclicks a Property line in the Properties Table, a
dialog box appears that allows the value to be changed. Selecting a
property and clicking the Second Mouse Button pops up a list of the
common values that the Property can take (if it is associated with a
list of alternatives). The doubleclick dialog box allows a property
to be set to values that are not in the popup list if the user is
familiar with extended syntax.
Clicking with the Second Mouse Button in the unused part of the
Properties Table pops up a list of Properties that are valid for the
object. At this time, SpHyDir does not allow the user to add
Property names to an Object, so if SpHyDir doesn't support an HTML
attribute, send a note to the author.
Links
A link connects an Image or section of text in a paragraph to
another file in the library or to a remote network resource. Links
are formed by dropping the shadow of a file, a Web Explorer URL
object, or Link Manager database entry on a document object. If such
an object is dropped on a paragraph, the Hotword Selection window
opens. Highlight the word or phrase associated with the link and
click the OK button.
The SpHyDir user interface has been modelled on the native behavior of the
OS/2 Workplace. A document object can be deleted by dropping it on the OS/2
Shredder. Clicking the Second Mouse Button generates a Popup menu of
operations on the object. However, SpHyDir document objects are not files, so
they cannot be printed by dropping them on a printer, nor can they be moved to
a folder.
═══ 7.1. How is SpHyDir Installed? ═══
Although SpHyDir is a big Rexx program, through the magic of the Watcom
VX-Rexx Development Environment it is packaged as SPHYDIR.EXE. It can be placed
in any program library. The VX-Rexx runtime module VROBJ.DLL must be located
somewhere in the LIBPATH, but there are so many VX-Rexx programs in use that
this step may have already been performed.
SpHyDir is distributed with a number of useful freeware utilities. The GBM
package from IBM can be used to view and convert bitmap files. The RCS package
can be used to maintain a log of changes to the HTML files. Neither is required
for SpHyDir to work, but they are helpful.
Make a copy of the production library of HTML files on the OS/2 machine that
will be doing the editing. The PCLT library, which appears to be
"http://pclt.cis.yale.edu/pclt/" to Web Browsers on the Internet, is
"D:\HTTP\PCLT" on the NT machine that acts as the server. The "D:\HTTP" is
configured to the server program as the starting point for all HTTP file
references. The OS/2 machine on which the files are prepared stores a copy of
the files in "F:\HTTP\PCLT" and establishes "F:\HTTP" as the "HTML Library" for
SpHyDir editing. This is also configured to GOSERVE as the starting point for
HTML service. Files are edited and tested on the OS/2 machine, then transferred
to the production server.
The SpHyDir library can be specified with the SET HTMLLIB environment
variable. Otherwise, it will be taken as the active directory when SpHyDir
starts. SpHyDir can be configured to operate on several different file
structures by creating several SpHyDir program objects, each with a different
initial current directory.
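The lookup order can be summarized in a few lines. This is a Python sketch of
the rule as described, with an invented function name; the optional
parameters exist only so the rule can be tried without changing the real
environment.

```python
import os


def html_library_root(environ=None, cwd=None):
    """Return the HTML Library: HTMLLIB if it is set, else the active directory."""
    environ = os.environ if environ is None else environ
    cwd = os.getcwd() if cwd is None else cwd
    return environ.get("HTMLLIB") or cwd
```

With SET HTMLLIB=F:\HTTP in effect the result is F:\HTTP no matter where the
program is started. Without it, the working directory of the program object
decides.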
SpHyDir remembers parameters such as window size and location in the file
SPHYDIR.INI in the root directory of the library. Therefore, if there are
several SpHyDir program objects with several directories, each will have its
own version of the saved parameters.
═══ 7.2. How to load an HTML document ═══
There are three ways to start SpHyDir on an existing HTML file.
1.
If the SpHyDir Workarea is open and is either empty or the previous file
can now be discarded, then drag the WPS icon of the file over and drop it
in the whitespace of the workarea (not on any individual icon or
caption). SpHyDir will abandon any old file and will read in the new
HTML.
2.
If SpHyDir is not running, but a WPS Program Object has been constructed
for it, then drop any WPS Icon for an HTML file on the SpHyDir program
icon. SpHyDir will start up and read in the file.
3.
It is possible to associate a Program Object for SpHyDir with files of
the type *.HTM or *.HTML. Then SpHyDir will be automatically launched
when any such file is opened. However, this is not always the right
thing. Sometimes it is useful to launch Web Explorer to view the file
after it has been formatted. It is also sometimes useful to read HTML
files into the System Editor and view the raw tags. So SpHyDir is not the
only tool that can be used to view such files.
There is also one common practice that will not work. The File pulldown menu
does not have an Open option. This is a personal choice of the author and may
be regarded as part of the program design. Choosing a file by name from the
standard file dialog is a lot less attractive than drag and drop from the
Workplace folders.
═══ 7.3. How to create a new HTML file? ═══
Drag the first tool (the one that looks like a book) from the upper left
corner of the toolbar and drop it on the whitespace of the Workarea. SpHyDir
clears the workarea to start a new document. A window pops up asking for the
filename. Type the name as though it were a part of a URL. For example, type
"sphydir/sample.htm" to create a new file in the "sphydir" subdirectory of the
current library.
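The mapping from the typed name to a disk file can be pictured like this. The
sketch is Python, the function name library_path is invented, and the refusal
of empty, "." and ".." components is an assumption based on the rule that
SpHyDir only edits files under the library root.

```python
import ntpath  # DOS/OS2-style paths, independent of the host system


def library_path(library_root, url_name):
    """Turn a URL-style name such as 'sphydir/sample.htm' into a library path."""
    parts = url_name.split("/")
    if any(part in ("", ".", "..") for part in parts):
        raise ValueError("name must stay inside the library: %r" % url_name)
    return ntpath.join(library_root, *parts)
```

With a library root of F:\HTTP, the name "sphydir/sample.htm" becomes
F:\HTTP\sphydir\sample.htm.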
═══ 7.4. How to Edit Text (and Stuff) ═══
In the Workarea, the document is represented by icons. The text is displayed
to the right of the paragraph icons as a "caption". OS/2 allows captions to be
edited, but this doesn't provide a very nice environment. It is much simpler to
"Open" the paragraph by doubleclicking on the icon or on the caption.
Opening the paragraph displays the Text Edit window. The paragraph text is
loaded into a Multiline Edit control. Words wrap automatically to the next
line. A very large paragraph will activate the vertical scroll bars. Within the
Text Edit window, the text is just ordinary ASCII data. It can be Cut or Copied
to the OS/2 Clipboard, and characters from other programs can be pasted into
the paragraph. All the usual rules about selecting text and using special keys
like Del and End hold within the Text Edit window.
Although word processors and the EPM editor can display text in multiple fonts
and colors, a simple Multiline Edit control can use only one font. This is not
a terribly serious problem, because HTML overpowers the ability of any WYSIWYG
editor. Although it is fairly simple to do italics and bold, how can any simple
editor distinguish formats labelled EM, STRONG, CODE, SAMP, KBD, VAR, CITE,
DFN, PERSON, ACRONYM, ABBREV, BIG, and SMALL? These are the reasonably named
forms of character emphasis. HTML 3.0 threatens to add another dozen even more
obscure types of character tagging.
Special functions are represented in the Text Edit window by special "dingbat"
characters that are not part of any standard Web character set. There are four
of these dingbat functions:
1.
When a file or URL is linked to a "hotword" through the Link Manager, a
pair of inward pointing triangle characters bracket the link. If SpHyDir
could make the triangles Read-Only it would. Editing or deleting these
characters can cause trouble, because the URL for the link is stored
separately. To remove a link, display the Link Manager window, select the
URL for the link, and delete it. However, if the two triangles are left
alone, the text in the middle can be changed to alter the hotword phrase.
2.
Character emphasis, and for that matter, any unrecognized tags in the
original HTML, are embedded in the text. Because the "<" and ">"
characters are presented as normal text, they must be replaced with
dingbat characters. Thus "<CITE>Debt of Honor</CITE> by <PERSON>Tom
Clancy</PERSON>" is going to appear pretty much as seen here, except that
each "<" will be replaced by an upward pointing triangle and every ">"
will be replaced by one that points down. The simplest way to apply
character emphasis is to select text with the mouse and then use the
"Emphasis" menu in the Text Edit window to generate the appropriate tag.
Unlike the previous case of hotwords, character emphasis has no special
structural significance. These tags can be edited to change the type of
emphasis or the tag can be deleted entirely.
3.
HTML defines named sets of foreign, math, and special use characters. In
HTML, such a character is referenced by an Entity. An Entity reference
starts with "&" and then continues with the name of the character. A
semicolon delimits the end of the Entity if it is immediately followed by
normal characters. For example, the copyright symbol © is represented as
"&copy;". SpHyDir wants to allow users to type the "&" character as
needed, so when the HTML is read in the "&" is replaced by the special PC
character that looks like a "smiley face" and has the numeric value 1. As
with all special PC characters, it can be generated from the keyboard by
holding down the Alt key and typing the number on the keypad. So to
generate a copyright character, hold down Alt, press 1, release Alt. A
smiley-face now appears on the screen. Type "copy;". When the paragraph
is written out as HTML, the smiley-face will be turned into an "&" and
the browsers will display the Entity correctly.
4.
Small Images can be embedded in the middle of text. Normally this is
used for icons. SpHyDir represents this with a dingbat that looks like a
box with a circle in the middle, something like "[o]". To generate such a
reference, hold down Alt, press "8" on the numeric pad, release Alt. Now
type the file name of the icon/image. Optionally type a space and then
alternate text. End by repeating the Alt "8" sequence.
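The substitutions in items 2 and 3 can be illustrated with a small Python
sketch. This is not SpHyDir code. Only the value 1 for the smiley face comes
from the description above. The codes 0x1E and 0x1F for the two triangles are
assumptions based on the PC character set, and the function names are made up
for the example. How a literal "&" typed by the user is written out is not
modelled here.

```python
# Hypothetical sketch of the dingbat substitution; not taken from SpHyDir.
AMP_DINGBAT = "\x01"  # smiley face, numeric value 1 (stated in the text)
LT_DINGBAT = "\x1e"   # assumed code for the upward pointing triangle
GT_DINGBAT = "\x1f"   # assumed code for the downward pointing triangle

TO_EDIT = str.maketrans({"&": AMP_DINGBAT, "<": LT_DINGBAT, ">": GT_DINGBAT})
TO_HTML = str.maketrans({AMP_DINGBAT: "&", LT_DINGBAT: "<", GT_DINGBAT: ">"})


def html_to_edit_text(html):
    """Replace markup characters with dingbats for the Text Edit window."""
    return html.translate(TO_EDIT)


def edit_text_to_html(text):
    """Turn the dingbats back into markup characters when HTML is written."""
    return text.translate(TO_HTML)


sample = "<CITE>Debt of Honor</CITE> &copy; 1994"
edited = html_to_edit_text(sample)
# The round trip restores the original HTML text.
assert edit_text_to_html(edited) == sample
```

Because each markup character maps to exactly one dingbat, the conversion is
reversible as long as the dingbats themselves are not typed as ordinary text.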
All other information is a property of the object and appears in the
properties table at the top right corner of the workarea. Each property has a
name and current value. Properties are either character strings, numbers, or
choices. To edit a string or number property, hold the ALT-key and click on
the old value. When a property must have a value from a list, it may be faster
to click on the property with the second mouse button. The list of available
values will then popup as a menu and a new value can be selected.
═══ 7.5. How to delete objects ═══
The WPS approach is to drag the object to the desktop Shredder. The keyboard
approach is to select the object and press Ctrl-D. The Mouse approach selects
the object, presses the second mouse button to popup the menu, and selects
"Delete to Clipboard".
It would be reasonable to expect the Del key to delete things. Unfortunately,
Del is a character delete key and is used to correct mistyping in other
windows. It is difficult to know exactly which window has the focus. A few bad
experiences where an attempt to correct a mistyping accidentally deleted whole
sections of a document suggested that Del was simply too dangerous as the
Object Delete key.
The Delete operation applies to everything contained within the selected
object. Delete a list and all the points and paragraphs are also deleted.
A user complained that the Second Mouse Button menu made it too easy to delete
objects. To address this problem, a previous simple delete was converted to
"Delete to Clipboard". As explained elsewhere, SpHyDir maintains a special
Clipboard-like window that is able to hold the special document objects. Delete
to Clipboard makes a copy of the objects in the clipboard window and deletes
them from the Workarea. If a mistake was made, they can be restored by
selecting a location and pressing Shift-Ins.
═══ 7.6. How to Add to the Document ═══
At the top left of the workarea window there is a set of icons alternately
viewed as the "Toolbar" and as "Templates". They serve the document much as
the OS/2 Template folder serves the rest of the Workplace. Drag the icon for a
Section, Paragraph, Image, List, or other tool and drop it where you want the
new element to go. This creates an empty element that needs to be filled in.
Objects can also be inserted by pressing the Second Mouse Button and choosing
Insert from the popup menu. The menu includes the most commonly used objects
(paragraphs, points, and lists) and some less common objects for which there
was no room in the toolbar.
When the paragraph tool is dropped, the Text Edit Window opens and you
may type text. You can also paste text from the Clipboard. At the bottom
of the Text Edit Window there are buttons. One is labelled "New".
Pressing the New button saves the current text in the current paragraph
and creates a new empty paragraph into which the next information can be
typed.
When the image tool is dropped, it creates an empty Image object. Drag a
GIF file over from one of the OS/2 disk folders and drop it on the object
to assign a file. Add alternate text in the entry field at the top of the
workarea. Pushbuttons allow you to select the alignment of the image with
any trailing text.
═══ 7.7. How to link to other files? ═══
To create a hypertext link to another file in the HTML library, open the WPS
folder to display the file object. Hold down Ctrl-Shift and drag the file as if
you were going to make a shadow of it on the desktop. Drop the icon on a
Paragraph or Image object.
If you drop on an image, then there is no further work. Images can have only
one hypertext link and the entire image is the link. If you drop on a
paragraph, then the Hotword Selection window opens displaying the available
text. Drag the mouse to "select" the word or phrase that will represent the
link. Click the OK button. Hotwords are delimited in the text by an opening and
closing triangle character. You may change the contents of the hotword area,
but do not delete the funny triangle characters or SpHyDir will get confused.
SpHyDir has a Link Manager in the list of Windows to handle other types of
links. When the Link Manager window is visible, the top list box displays the
URLs of links from the currently selected Workarea document object. The lower
list box can be used to select other links. There are two buttons on the
bottom. Pressing the button with a Web Explorer icon displays all the items in
the current Web Explorer hotlist. Pressing the Target button displays all the
target labels in the current document tree. Select an element from the list,
drag and drop it to a paragraph or image in the workarea.
The Web Explorer will also produce URL objects in the Workplace. Dropping a WE
URL object onto a Workarea object will also create a link to the referenced
object.
SpHyDir provides XSpO programs to form links. An eXternal Sphydir Object is a
Rexx program that can be dropped on SpHyDir windows. XSpO's provide a way for
Rexx-literate users to customize or extend SpHyDir without mucking in the
source. One XSpO can be dropped on the lower Link Manager list box and
duplicates the function of the Web Explorer button. It can be used as a model
for programs that extract hotlists from other sources. Another XSpO is dropped
on a Paragraph or Image. It searches through the windows on the desktop to find
Web Explorer and extracts the URL of the current document that WE is
displaying. This provides a shortcut compared with creating a WE URL object and
then dropping it on the SpHyDir workarea. Another XSpO shows how to generate a
"Mailto" link.
═══ 7.8. How do I save the file and quit? ═══
When the workarea has the input focus, press F2 to generate HTML and continue
editing, F3 to quit without generating, and F4 to generate HTML and then quit.
The status message at the bottom of the window will indicate that HTML is being
generated and then has been written. If you press F2 and nothing happens, then
click once on the workarea to make sure it has the focus. If you try to quit
and have modified the file in the Workarea, a message will pop up asking
whether you want to Generate or Discard the file.
═══ 7.9. How do I move paragraphs around? ═══
You can drag an individual paragraph around, but only within the visible
window. To move more data, to move a greater distance, or to move between
files, there is special support to mark a range of objects, copy them to a
"clipboard" and paste them somewhere else.
SpHyDir wants to create the image of selecting a range of objects, moving them
around, copying them to the clipboard, and pasting them back into the file.
However, the native support for selection, movement, and the real OS/2
clipboard is not able to handle this problem correctly. Reluctantly, SpHyDir
has been forced to reinvent some of this infrastructure.
The user can select a range of objects to move within a document or to copy to
another document. First select one object and press Alt-L as if you were
establishing a "line mark" in the EPM or Kedit editors. Once you begin to mark
a section of the document, you may extend the marked area forward or backward,
but only within the current level of the document tree. The mark can be
extended over but not into lists or subsections. Nor can the start or end of
the mark be extended outside the section or list in which it is started.
Marking creates two new objects: Mark Start and Mark End. Initially these
objects are placed around the currently selected object when Alt-L is pressed.
The Mark objects can then be "slid" forward or backward along the line that
represents the current level of the tree. They cannot be slid into a subsection
or list (to a lower level) nor can they be slid outside the section or list in
which they started. The Mark can also be automatically adjusted by selecting
another object at the same level of the tree and pressing Alt-L again
(expanding the scope of the Mark just as additional lines are added in the EPM
editor when you move to another line and press Alt-L a second time).
Once a section has been marked, you can copy it to the Clipboard by pressing
Ctrl-Ins (the standard OS/2 keyboard sequence for Copy). However, the OS/2
Clipboard really doesn't know how to hold SpHyDir objects, so the same effect
is achieved by opening a new Window and copying all of the objects between the
two marks (including all the objects contained in subsections and lists) from
the workarea window to a second container that SpHyDir calls "The Clipboard".
In the current release, the Clipboard window becomes visible (for debugging)
though it can be minimized or can be dragged over to the side of the desktop.
The objects in the Clipboard can then be moved to another part of the original
document by selecting a destination object and pressing Shift-Ins (the OS/2
standard for Paste). They can be copied to another file by dragging another HTM
file to the workarea (replacing the original source document) and then pasting
from the Clipboard to a second document.
However, Clipboard objects cannot be copied to another part of the same
document. This fell out from the way the Clipboard got coded and, at the
moment, it seems to be a useful feature. When the user marks objects and
presses Ctrl-Ins, there were two programming choices. One choice copies all of
the objects to the Clipboard. The alternative creates what is essentially a
Shadow of the original record in the clipboard (what VX-Rexx calls a "shared
record"). Like the Workplace shadow, the two objects are actually different
views of the same data. If you were to edit the text of a paragraph after
copying it to the SpHyDir Clipboard, the text of the Clipboard copy would also
change. However, while a Workplace shadow cannot exist when the original is
deleted, a VX-REXX shared record continues to exist until all of its related
objects have been deleted. Thus the Clipboard copy of the data continues to
exist after the original object in the workarea has been deleted or the entire
document has been replaced.
A shared record can exist in two different containers, but there can be only
one copy of the record per container. By choosing to use Shadows in the
Clipboard instead of full copies, SpHyDir does not support duplicating large
blocks of text within the same Hypertext document.
When you select another location and press Shift-Ins, the SpHyDir Paste tries
to copy the shared record from the Clipboard back to the original document.
However, since there is already a copy of the record in the workarea and no
container can have two copies of the same record, the Paste operation actually
moves the old record from its previous location to the new position. If you
delete the document in the workarea and load a new document (even a new copy of
the original document) then a new set of records is created. Now the Clipboard
has the only copy of the old records and Paste copies the information into the
new document.
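The move-instead-of-copy behavior can be modelled in a few lines of Python.
This is only an illustration of the rule "one copy of a shared record per
container". It does not use VX-REXX, and the Record and Container classes are
invented for the example.

```python
# Hypothetical model of shared records; the class names are assumptions.
class Record:
    """A document object; several containers may share the same record."""

    def __init__(self, text):
        self.text = text


class Container:
    """An ordered container holding at most one reference to each record."""

    def __init__(self):
        self.records = []

    def insert(self, record, position):
        """Insert before index `position`; a record already here is moved."""
        if record in self.records:
            old = self.records.index(record)
            self.records.pop(old)
            if old < position:
                position -= 1
        self.records.insert(position, record)


workarea = Container()
clipboard = Container()
a, b, c = Record("A"), Record("B"), Record("C")
for rec in (a, b, c):
    workarea.insert(rec, len(workarea.records))

clipboard.insert(a, 0)   # copy to Clipboard: both containers share record A
a.text = "A (edited)"    # an edit is visible through both views
workarea.insert(a, 3)    # paste into the same document moves A to the end

new_document = Container()                    # a freshly loaded document
new_document.insert(clipboard.records[0], 0)  # here the paste adds the record
```

After the paste the workarea holds B, C, A in that order, and the Clipboard
still holds the same record A. Pasting into the new document adds the record
there because that container did not hold it before.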
I am a bit suspicious of any feature that takes this much time to explain. On
the other hand, a hypertext document should be short and it doesn't make a lot
of sense to duplicate large blocks of text within such a file. There is a
strong sense that the way this Mark and Clipboard logic works is probably the
Right Way to handle this particular problem with this particular set of data.
Only by gaining experience with this technique will it become clear if this is
really the best approach.
Note that the SpHyDir specialized Clipboard, Cut, and Paste apply only to the
management of objects from the Workarea. Within the Text Edit window opened by
double-clicking a paragraph or point, the behavior of text selection, Cut,
Copy, and Paste is completely normal and operates through the normal OS/2
clipboard. Text data can be exchanged between another OS/2 program and the Text
Edit window through the ordinary cut and paste mechanisms.
═══ 8. The Toolbar and Document Objects ═══
At the top of the Workarea there is a collection of icons that represent the
Toolbar (or Template) area. These items can be dragged into the document to
create new elements.
═══ 8.1. The Document Tool and Object ═══
The Document Tool serves two functions. If it is dropped on the "whitespace"
of the workarea, away from any existing document objects (or anywhere in an
empty workarea), then this tool is treated as a request to start a new
document. It replaces the more traditional New option on the File pulldown
menu.
If there is an existing document in the Workarea, the user will be prompted
whether it is all right to abandon that document. Pressing ESC returns to the
previous document and cancels the request.
Otherwise, the Workarea is cleared and SpHyDir prompts for the name of the
file. The name is relative to the start of the document library and it should
be given with the "/" notation commonly used in Web documents. SpHyDir cannot
force the user to act rationally, but it is generally a good idea to organize
the local library with the same structure that documents have on the production
server. Thus the document http://pclt.cis.yale.edu/pclt/sphydir/status.htm
would have the name "pclt/sphydir/status.htm" on the editing machine,
corresponding to the part of the URL that follows the server machine name.
The library on the editing machine may have some upper path components that
are not visible. If HTMLLIB is set to "F:\HTTP" (or if there is no HTMLLIB
variable and F:\HTTP is the current directory when SpHyDir starts) then this
disk letter and upper directory will be inserted in front of the URL part to
form the actual OS/2 file name.
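The mapping from the name typed by the user to the actual file name can be
sketched in Python as follows. The function name and its parameters are
invented for the example, and PureWindowsPath is used here only as a stand-in
for OS/2 drive-letter and backslash path rules.

```python
from pathlib import PureWindowsPath


def library_file_name(url_part, environ, current_dir):
    """Join the library root and a '/'-separated document name.

    `environ` is a mapping of environment variables. If it has no HTMLLIB
    entry, the directory that was current at startup is used as the root.
    """
    root = environ.get("HTMLLIB") or current_dir
    return str(PureWindowsPath(root, *url_part.split("/")))


# HTMLLIB set explicitly
print(library_file_name("pclt/sphydir/status.htm",
                        {"HTMLLIB": "F:\\HTTP"}, "C:\\"))
# No HTMLLIB variable; F:\HTTP was the current directory
print(library_file_name("pclt/sphydir/status.htm", {}, "F:\\HTTP"))
```

Both calls produce the same OS/2 style name, F:\HTTP\pclt\sphydir\status.htm.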
After the file name has been supplied, the user is prompted for a Title.
Whatever is entered becomes the initial caption of the Document object (used to
set the <TITLE> tag in HTML) and of the first Section object (used to set the
<H1> string). This seems to be redundant, but the two tags are used for
different purposes in HTML. The <TITLE> describes the document (say in an index
of the library). The <TITLE> text may not be displayed by some browsers, or it
appears as the title bar of the window. The <H1> text appears in big letters.
SpHyDir doesn't go one step further and add an initial paragraph object. In
some cases, the document might start with an IMG or other object instead. Drag
the appropriate tool from the toolbar and drop it on the Section object to start
adding content.
Most of the Properties of the Document Object correspond to fields in the
<HEAD> area of the HTML. This includes LINKs to other documents, and a BASE URL
value. A few Properties are taken from HTML3/Netscape extensions to the <BODY>
tag to assign a Background pattern and to control the color of text on top of
that background.
If a GIF file is dropped on the Document Object of a file in the Workarea,
that file name is set as the value of the Background property for the document.
If the Document tool is dropped on a previous document object, then it becomes
a request to add a Subdocument link. Subdocument pointers are used to build a
larger document out of many smaller Web files. A subdocument pointer to another
file means that that file is a continuation of the current file. Any word or
phrase can be hotlinked to another Web page. Ordinary hotlinks do not imply a
relationship. Subdocument links indicate that the other file is a child
belonging to the current Web page.
There are some rules to good document construction. SpHyDir chooses not to
enforce them, but their proper use is highly recommended. Most importantly, a
file should only be a subdocument of one other file. If two different parents
try to claim the same file as a subdocument, things are going to get all messed
up.
A Subdocument object can go anywhere, but best results will be obtained if all
the Subdocument pointers are together at the end of a file.
Although it may be tempting to make everything a subdocument of the highest
level index page, this produces a structure that is awkward to handle. The
entire library should not be treated as related pages. Restrict subdocument relationships
to files that are really part of the same paper, article, subject, tutorial, or
brochure. Use ordinary hyperlinks from the main page or library table of
contents.
Drag the Document Tool over and drop it in the file as you would create a
paragraph or point. The Subdocument definition is then completed by dragging
the WPS icon for an HTM or HTML file over and dropping it on the newly created
object. If the dropped file was previously processed by SpHyDir, the Title of
that document can be extracted from the Extended Attributes of the file and
will appear next to the Subdocument icon.
Within the current file, the Subdocument object behaves like a Paragraph whose
entire contents is the TITLE of another HTML file and where that TITLE text is
a hypertext link to the other file.
Every time SpHyDir loads a file with subdocument pointers, it checks each
identified file to see what its current Title is. Title changes appear
immediately in the caption of the Subdocument object and are written back when
the parent document is regenerated.
The Subdocument structure is also stored as Extended Attributes of the files
in the library. Each *.HTM or *.HTML file has pointers to the files that it
claims as subdocuments and to its "parent" (a file that claims it as a
subdocument). The order in which the Subdocument objects appear in the parent
establishes a Next and Previous ordering to the subdocument HTML files, which
SpHyDir maintains and can use to generate uniform Headers and Trailers.
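As an illustration of how the Next and Previous ordering falls out of the
order of the Subdocument objects, here is a minimal Python sketch. The file
names are hypothetical, and the real information is kept in Extended
Attributes rather than in a dictionary.

```python
def sibling_links(subdocuments):
    """Map each subdocument to its (previous, next) neighbours.

    `subdocuments` lists the files in the order in which their Subdocument
    objects appear in the parent. None marks the first or last file.
    """
    links = {}
    last = len(subdocuments) - 1
    for i, name in enumerate(subdocuments):
        prev_name = subdocuments[i - 1] if i > 0 else None
        next_name = subdocuments[i + 1] if i < last else None
        links[name] = (prev_name, next_name)
    return links


# Hypothetical children of one parent file, in document order
order = ["intro.htm", "tools.htm", "links.htm"]
for name, (prev_name, next_name) in sibling_links(order).items():
    print(name, "previous:", prev_name, "next:", next_name)
```

Reordering the Subdocument objects in the parent changes the list, and with it
the neighbours that a generated Header or Trailer would point to.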
═══ 8.2. The Section Tool and Object ═══
Every ordinary paper document is organized into Chapters, Sections, and
Subsections. That, after all, is what an Outline is all about. HTML has
provision for Headings, but it is not all that clear about what exactly a
Heading introduces.
HTML 3.0 addresses this by introducing a DIV tag. The concept is that large
documents would be broken up into segments delimited by a <DIV CLASS=CHAPTER>
or <DIV CLASS=APPENDIX> tag. However, these tags are not currently used.
SpHyDir forms sections based on the appearance of an H1..H6 tag. Eventually,
SpHyDir may automate the transition from HTML 2 to 3 and generate the DIV tags
automatically.
SpHyDir will recognize a DIV tag when it is encountered, but there is no
strong body of use to know what to do with it. The author would appreciate
E-mail if any important use of DIV is discovered on the Net.
For this to be successful, SpHyDir has to determine where the Section ends.
There is no explicit HTML marking, but it can reasonably be assumed that the
Section extends until there is a new Heading tag at the same or a higher
logical level than the tag that started the Section. Lower level Headings are
assumed to start subsections of the current Section object.
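The rule for where a Section ends can be sketched in Python with a stack of
open sections. This is an illustration of the rule only, not the parser that
SpHyDir uses. The regular expression and the dictionary layout are assumptions
made for the example.

```python
import re

# Matches <H1>..</H1> through <H6>..</H6>, with optional attributes.
HEADING = re.compile(r"<H([1-6])[^>]*>(.*?)</H\1>", re.IGNORECASE | re.DOTALL)


def build_sections(html):
    """Return the top-level sections; each holds its nested subsections."""
    root = {"level": 0, "title": None, "children": []}
    stack = [root]
    for match in HEADING.finditer(html):
        level = int(match.group(1))
        # A heading at the same or a higher level closes the open sections.
        while stack[-1]["level"] >= level:
            stack.pop()
        section = {"level": level, "title": match.group(2).strip(),
                   "children": []}
        stack[-1]["children"].append(section)
        stack.append(section)
    return root["children"]


page = "<H1>Tools</H1><P>text<H2>Image</H2><H2>List</H2><H1>Links</H1>"
for top in build_sections(page):
    print(top["title"], [child["title"] for child in top["children"]])
```

Here "Image" and "List" become subsections of "Tools", and the second H1 ends
the "Tools" section and starts "Links".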
The Properties of a Section Object come from the attributes of the H1..H6
heading tags in the HTML 3 proposed standard. This produces a small confusion.
In all other document Objects that contain other objects, an attribute of the
container applies to all the objects that it contains. However, attributes on a
Section Object apply only to the Heading. For example, ALIGN=CENTER means that
the Heading is centered and does not apply to the paragraphs in the Section.
═══ 8.3. The Paragraph Tool and Object ═══
The Paragraph Object contains text. In HTML, "text" is more broadly defined to
contain ordinary characters, Entities, character emphasis tags (Bold, Italics,
CITE, etc), hyperlink hotwords, line breaks, and embedded images.
SpHyDir is not currently prepared to break the paragraph down into any finer
objects. So all this non-text "text" has to be encoded with special characters.
The user can doubleclick a paragraph to display all the text in the Text Edit
window. Special characters can be added to the document using the standard rule
for special keyboard entry (hold down Alt, type a number in decimal notation on
the numeric pad, and release the Alt key).
However, hotword links are only partially contained in the text. The rest of
the hotword link is a URL that can be displayed in the Link Manager Window.
Before deleting text containing a hotword, use Link Manager to delete the link
and remove the inward pointing triangles.
Properties of the Paragraph Object are mostly derived from the attributes of
the <P> tag in HTML 3.
═══ 8.4. The Image Tool ═══
The Image Tool represents a graphic insert. First, drag and drop the image
tool to the location in the document where the image is logically positioned.
Then finish the definition by dropping the WPS icon of a GIF file on top of the
Image object. Currently SpHyDir requires the GIF data type (XBM and JPG may be
added later). Since the IBM IPF system doesn't support GIF, generated IPF
source uses the same file name and an extension of BMP. GIF files can be
converted to BMP files using the GBM utilities or any of a number of other
graphics programs.
HTML authors are reminded that there are still a number of disadvantaged users
who try to surf the net using character mode Unix. To accommodate such people, an
Image should have alternate text that describes the content or meaning of the
image. This alternate text is entered as the value of the ALT Property. If an
image has alternate text, it is displayed as the caption to the right of the
icon for the object. Otherwise, the file name is the caption.
The Image Object was created by SpHyDir to provide an icon on which GIF files
can be dropped and from which hyperlinks can be made. HTML doesn't really seem
to regard Images as objects. Rather, HTML syntax seems to treat each image as
an unusually large text character.
The SpHyDir Object will work if the image appears all by itself or at the
beginning of a paragraph. If the image has to appear in the middle of a heading
or paragraph text, then it cannot be promoted to full object status. SpHyDir
calls such things embedded images. An embedded image is represented by a
dingbat character corresponding to the value 0x08, followed by the name of the
GIF file. Then optionally there is a blank and alternate text. The sequence
ends with a second 0x08 dingbat. Embedded images have no properties, and the
user cannot drop anything onto them. They can be part or all of a hotlink
phrase.
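The embedded image encoding can be illustrated with a short Python sketch that
expands each 0x08 sequence into an IMG tag. The exact form of the HTML that
SpHyDir generates is an assumption here, and the function names are invented
for the example.

```python
import re
from html import escape

# 0x08, a file name without blanks, optional blank plus alternate text, 0x08
EMBEDDED = re.compile(r"\x08([^\s\x08]+)(?: ([^\x08]*))?\x08")


def _img_tag(match):
    """Build the tag for one embedded image sequence."""
    src, alt = match.group(1), match.group(2)
    if alt:
        return '<IMG SRC="%s" ALT="%s">' % (src, escape(alt, quote=True))
    return '<IMG SRC="%s">' % src


def expand_embedded_images(text):
    """Replace every dingbat-delimited image reference with an IMG tag."""
    return EMBEDDED.sub(_img_tag, text)


line = "Press \x08help.gif Help icon\x08 for more."
print(expand_embedded_images(line))
```

The alternate text is everything between the first blank and the closing
dingbat, so it may contain blanks of its own.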
When an image appears at the start of a paragraph, it could be rendered using
the "inline image" syntax. However, SpHyDir extracts it from the paragraph and
builds a separate object. The value of the ALIGN property for the Image Object
determines how it will interact with the paragraph that follows it. ALIGN=NONE
(a value made up by SpHyDir) displays the Image by itself. Other values of
ALIGN position the following text to the right of the image, and in some cases
the text flows around the image border.
HTML 3.0 introduces a FIG tag to extend the functions of the current IMG tag.
Unfortunately, it is not widely supported and its use is currently not clear. A
later version of SpHyDir may provide automatic migration of current IMG syntax
to the preferred FIG syntax after it becomes a viable alternative.
═══ 8.5. The Ordered List Tool ═══
An Ordered List contains a sequence of numbered points. Although points
are normally simple paragraphs, HTML allows a point to contain almost anything:
paragraphs, images, even check boxes or radio buttons.
SpHyDir 1 tried to combine the functions of Points and paragraphs. This worked
well 99% of the time. It caused trouble when the list item started with
anything that wasn't ordinary text.
SpHyDir II views lists the way that the HTML standard views them. A list
contains points. The points contain paragraphs. The paragraphs contain text.
HTML 3.0 allows a list to begin with a header. This is represented by a
<LH>text</LH> sequence before the first point. Following the recommendation of
the standard, SpHyDir takes any free text that is found in old HTML documents
outside the list points and turns it into a list heading so it is legal.
The properties of an ordered list are taken from HTML 3.0. The properties
allow a second list to resume numbers where a previous list left off. Netscape
has some nifty ideas to control format, whether items are listed as 1 2 3, A B
C, a b c, or i ii iii.
═══ 8.6. The Unordered List Tool ═══
An Unordered List contains unnumbered points, generally delimited by a
"bullet" character. Unordered lists follow the same rules as the previous
discussion of Ordered Lists.
The properties of an Unordered list allow the bullet character to be replaced
with another choice. The PLAIN attribute allows the bullet to be omitted
entirely. SRC allows the bullet to be generated as a GIF image.
HTML has two obsolete list formats based on the <DIR> and <MENU> tags. SpHyDir
will read such lists in, but will convert them to the preferred <UL COMPACT>
tag.
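At the level of plain text, the conversion of the obsolete list tags looks
roughly like the Python sketch below. SpHyDir performs the conversion on its
document objects rather than with regular expressions, so this only shows the
effect. Any attributes on the old tags are dropped in this simplified version.

```python
import re

OBSOLETE_OPEN = re.compile(r"<(DIR|MENU)\b[^>]*>", re.IGNORECASE)
OBSOLETE_CLOSE = re.compile(r"</(DIR|MENU)\s*>", re.IGNORECASE)


def modernize_lists(html):
    """Rewrite <DIR> and <MENU> lists as <UL COMPACT> lists."""
    html = OBSOLETE_OPEN.sub("<UL COMPACT>", html)
    return OBSOLETE_CLOSE.sub("</UL>", html)


print(modernize_lists("<MENU><LI>Open<LI>Save</MENU>"))
```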
═══ 8.7. The Definition (Glossary) List Tool ═══
A Definition List defines a set of terms. Each term is followed by an indented
definition. In SpHyDir, the term is a Property of each point in the List. As
with other lists, the Point then contains paragraphs that are the definition of
the term.
With this structure, SpHyDir requires a properly formed Definition list with
alternating pairs of <DT> term <DD> definition.
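What "properly formed" means can be shown with a small Python check that
splits the inside of a Definition List into (term, definition) pairs and
rejects anything that does not alternate. The function is invented for this
illustration and handles only bare <DT> and <DD> tags without attributes.

```python
import re


def definition_points(dl_body):
    """Split the contents of a <DL> into (term, definition) pairs."""
    pieces = re.split(r"<(DT|DD)>", dl_body, flags=re.IGNORECASE)
    tags = [tag.upper() for tag in pieces[1::2]]
    texts = [text.strip() for text in pieces[2::2]]
    if tags != ["DT", "DD"] * (len(tags) // 2):
        raise ValueError("expected alternating <DT> term <DD> definition")
    return list(zip(texts[0::2], texts[1::2]))


body = "<DT>URL<DD>Uniform Resource Locator<DT>GIF<DD>An image format"
for term, definition in definition_points(body):
    print(term, "=", definition)
```

Each pair corresponds to one Point: the first element is the "Term" Property
and the second is the text of the paragraph inside the Point.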
═══ 8.8. The Point Tool ═══
The icon of a hand making a "point" represents the general list item. An
Ordered, Unordered, or Definition List can contain only Points. Each point then
contains paragraphs, images, and other document content objects. In an ordered
or unordered list, the Properties of the Point object correspond to the
attributes of the <LI> tag. In a definition list, the Point object also has a
"Term" Property which is the text contained in the <DT> tag. The <DD> tag ends
the term and begins the paragraphs which are contained within the Point.
═══ 8.9. The Forms Tools ═══
The Form Tool creates an interactive area in which the Web user can enter data
to submit a request or query. A Form can include all of the previous document
objects, and data fields from the bottom row of the toolbar. To process a form,
the server must execute a program written by the form designer. This makes the
use of Forms an advanced topic that will be described in a separate section.
═══ 8.10. Missing Tools ═══
Less frequently needed Objects (and tools that have no particularly good icon
and would look ugly on the Toolbar) can be inserted by selecting an object in
the Workarea, pressing the Second Mouse Button, and choosing Insert from the
popup menu. The most frequently used Tools (Paragraph and Image for example)
can also be inserted this way. However, there are a few objects that can only
be inserted from the menu.
The Horizontal Rule Object draws a horizontal line across the screen. Its
thickness can be controlled with the SIZE property (a Netscape extension
supported by Web Explorer).
SpHyDir has a BR object. Normally a simple line break is honorary text and
appears embedded inside paragraphs. The idea of a special break was introduced
by Netscape, which needed it to clear dangling images. The <BR CLEAR=ALL> stops
flowing text to the right or left of an image and starts at the first line
clear of all images. The BR object is also useful in Forms to put a line break
between fields, boxes, buttons, and other non-text objects.
═══ 9. The Problem of Position ═══
SpHyDir would be simple if the VX-Rexx and OS/2 programming interface allowed
the user to drop tools and components in between two existing document
elements. This would clearly indicate where the new element is to go. However,
the environment requires the tools, files, and other objects to be dropped on
top of existing components. This forces SpHyDir to invent some rules about
positioning.
Sections and Lists contain things. In the Workplace, if you drop a file on the
icon of a folder, the file goes into the folder. So the normal behavior is that
if you drop anything (other than a Target) on a Section or a List, then the new
element is added inside that Section or List in front of anything already
there.
If you drop something on a Paragraph, Image, or Point then the new item goes
after the thing you dropped it on. Thus to add a new Point to the end of an
existing list, drop the Point tool on the last Point in the list. To add a new
Point to the beginning of a list, drop the Point tool on the parent List
object.
These rules seem to cover all but two cases. Lists can be nested inside other
lists. When this occurs, there is no way to add a new outer point after the end
of a nested inner list because every time you try to drop a point on the inner
list icon the new point is positioned inside the inner list instead of after it
in the outer list. Similarly, there is no way to add one section after another
because whenever you drop something on a section it goes inside it and not
behind it. So there is an extra rule that if you hold down Ctrl when dropping a
Point on a List the Point goes after the list, and if you hold down Ctrl when
dropping the Section tool on an existing Section, the new Section goes behind
the current section. This is not entirely satisfactory. A section can go on for
many screens, and it is somewhat unexpected to have to go many screens back to
the start of a section in order to drop something on the section object and add
it many screens down after the section end. I am waiting for a better idea to
come to mind.
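The positioning rules above can be summarized in a few lines of code. The following is only a sketch in Python, not SpHyDir's own VX-Rexx logic; the Node class, the drop function, and the "File" root object are names invented for the illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Kinds that hold other elements. "File" is a hypothetical root added for
# this sketch so that a top-level Section has a parent to be inserted into.
CONTAINERS = {"File", "Section", "List"}


@dataclass(eq=False)
class Node:
    kind: str                     # "Section", "List", "Point", "Paragraph", ...
    label: str = ""
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def insert(self, index: int, child: "Node") -> "Node":
        """Insert child at the given position and record its parent."""
        child.parent = self
        self.children.insert(index, child)
        return child


def drop(new: Node, target: Node, ctrl: bool = False) -> Node:
    """Place `new` relative to `target` using the positioning rules."""
    # Ctrl only matters for a Point on a List or a Section on a Section.
    ctrl_applies = ctrl and (new.kind, target.kind) in {
        ("Point", "List"),
        ("Section", "Section"),
    }
    if target.kind in CONTAINERS and not ctrl_applies:
        # Dropped on a container: goes inside, in front of what is there.
        return target.insert(0, new)
    # Dropped on a Paragraph, Image, or Point, or Ctrl-dropped on a
    # container: goes immediately after the target within its parent.
    siblings = target.parent.children
    return target.parent.insert(siblings.index(target) + 1, new)


root = Node("File")
section = drop(Node("Section", "Intro"), root)
outer = drop(Node("List", "outer"), section)
first = drop(Node("Point", "first"), outer)
inner = drop(Node("List", "inner"), first)            # nested list after "first"
drop(Node("Point", "inside inner"), inner)            # lands inside the inner list
drop(Node("Point", "after inner"), inner, ctrl=True)  # lands in the outer list
drop(Node("Section", "Next"), section, ctrl=True)     # lands after "Intro"

print([c.label for c in outer.children])
print([c.label for c in root.children])
```

Note that a Ctrl-drop of anything other than a Point on a List or a Section on a Section falls through to the ordinary rule, which matches the observation that a paragraph put "behind" a section ends up inside it anyway.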
Originally, the idea was that Ctrl-dropping a tool placed the tool after
the thing on which you dropped it. That seemed like a good rule, but it doesn't
work with Sections because you can't put a paragraph, image, or list after a
Section. Sections don't end, you see, until a new Section begins (in HTML
terms, a section ends when a new H1...H6 header is encountered). The only thing
that you can put behind a section is another Section. Everything else that you
try to put behind the section ends up inside it anyway.
═══ 10. Managing Links ═══
Most HTML editors expect the user to type in the document referenced (the URL)
in order to create a hypertext link. Since it is easy to make a mistake,
authors are urged to test their documents thoroughly. SpHyDir provides a
simpler and more reliable method of constructing links.
═══ 10.1. Forming Links from the Desktop ═══
Links to other files in the same library can be constructed using standard
Workplace Shell behavior. Hold down Ctrl and Shift and drag the icon of
another file in the library to a paragraph or image object in the SpHyDir
Workarea. If the link is made to a paragraph object, the Hotword Selection
window opens. Use the mouse to select a word or phrase and press the OK button.
Links to remote documents should be managed with the aid of Web Explorer. The
SpHyDir philosophy holds that before you generate a link to a document, you
should be able to display it in the Browser. There are two fairly direct ways
that WE references can be used to generate SpHyDir links.
The simplest option is to use the ability of the current Web Explorer program
to generate URL objects. Such objects can be dropped on the desktop, but it is
better if they are stored as disk files. They can then be dropped on SpHyDir to
generate a link to the corresponding resource.
SpHyDir also provides an XSpO Rexx program named WE_URL.CMD. First use Web
Explorer to view the desired network resource. Then drop the WE_URL icon on a
paragraph or image in the SpHyDir Workarea. The WE_URL program locates the Web
Explorer window on the desktop, extracts the current URL from it, and passes it
on to form a link.
The XSpO library supplied with SpHyDir also has MAILTO.CMD, an example of how
to form a link that uses the Mailto URL.
═══ 10.2. Using the Link Manager Window ═══
To display the Link Manager, select it from the Window pulldown menu on the
Workarea window. The Link Manager presents two list box areas and a pair of
buttons.
The top list box shows the URLs of any links from the current document object
selected in the workarea. The number of URLs listed should correspond to the
number of pairs of inward pointing triangle dingbat characters in the text of
the paragraph. The order of the URLs in the box also corresponds to the order
of the hotword phrases in the paragraph. An Image object would have only one
link.
To delete a link, select the URL in the top list box and press the Ctrl-D key.
The URL is removed from the list, and the triangle dingbat characters will also
disappear in the text from around the previous hotword phrase. If a hotword
phrase is to be deleted, it is important to remove the link first. SpHyDir has
no way to connect hotwords to URLs except to pair them off in order when
generating HTML. If a hotword is deleted with the editor, then the following
hotword gets paired with the URL of the deleted link, and the meanings of
subsequent hotwords are similarly shifted.
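The effect of pairing by position is easy to demonstrate. This is a sketch in Python under stated assumptions: the two marker characters are stand-ins for the triangle dingbats, and the file names and the pair_links function are invented for the example.

```python
import re

# Hypothetical stand-ins for the triangle dingbats that bracket a hotword.
OPEN, CLOSE = "\u25ba", "\u25c4"
HOTWORD = re.compile(re.escape(OPEN) + "(.*?)" + re.escape(CLOSE))


def pair_links(paragraph, urls):
    """Pair hotword phrases with URLs purely by their order of appearance."""
    return list(zip(HOTWORD.findall(paragraph), urls))


def mark(phrase):
    """Wrap a phrase in the hotword markers."""
    return OPEN + phrase + CLOSE


urls = ["intro.htm", "forms.htm", "links.htm"]   # hypothetical file names
before = "See the %s, the %s, and the %s." % (
    mark("introduction"), mark("forms chapter"), mark("link manager"))
# The first hotword is deleted in the editor, but its URL is left behind.
after = "See the %s and the %s." % (mark("forms chapter"), mark("link manager"))

print(pair_links(before, urls))
print(pair_links(after, urls))
```

In the second result "forms chapter" has inherited the URL that belonged to the deleted hotword, which is exactly the shift described above. Removing the link first keeps the two lists the same length.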
The larger Link Manager list box proposes new links from a database. Two
buttons are presented at the bottom to populate this list. The button with the
Web Explorer icon fills the box with entries from the current Web Explorer
hotlist. The Target button fills it with target labels from the current
document.
A target is a label assigned to a section or paragraph in the middle of the
document. HTML 2.0 generates such a label with the <A NAME=xxx> tag. HTML 3.0
also supports the ID attribute on most tags, as in <P ID=xxx>. SpHyDir supports
both types of HTML, but its interface is modelled on HTML 3.0.
SpHyDir document objects have an ID Property. When it is set, ID appears in
the Properties table for the current object. Any object can be labelled by
adding the ID Property. Select the object, click with the Second Mouse button
on the whitespace of the Properties table, and select ID from the list of
properties. A dialog box appears in which a label value can be typed.
Programmers frequently assume that the labels must be short, or that they
cannot contain spaces or special characters, or that they are all uppercase.
All these things are wrong. The label can be any reasonable length, it can
contain blanks, and is case-sensitive. The label "Case" will not be matched by
a search for "case". At this time, SpHyDir cannot guarantee that this or any
other property can safely have special characters such as '<', '>', '&', or
doublequote.
If links are to be formed manually, then it is probably a good idea to keep
the name short. Type one character wrong, even in the wrong case, and the
search misses its target. However, if links are formed automatically by
selecting a target from a list, then there is no chance of a mistyping. In this
case, it makes sense to make the labels longer and more descriptive, so that
they can be identified more easily in a larger database.
When the target button at the bottom of the Link Manager is pressed, SpHyDir
first searches the current file for all objects with an ID property. If this
document is part of a larger document tree, it then goes up through the Parent
Extended Attribute pointers to find the root document, and proceeds down
through the tree locating all the other target labels.
At this point, it is not part of the SpHyDir plan to expand the scope of
buttons in the Link Manager to other targets in the Library. Rather, XSpOs will
be developed to populate the Link Manager list with specialized targets, such
as glossary references. If anyone wishes to develop specialized XSpO routines,
the names of all targets in an HTML file are stored by SpHyDir in the
PCLT-SPHYDIR.TARGETS Extended Attribute.
═══ 11. The Subdocument Tree ═══
Most HTML editor tools operate on a single text file. However, good practice
holds that hypertext documents should be divided into a large number of small
files. Managing all these files and maintaining a consistent overall structure
then becomes a serious problem.
═══ 11.1. The Library ═══
PC Lube and Tune has developed into a library structure that seems generally
applicable. Because no one application can assume to own the entire server, the
files fall under a common starting directory. During development, this is
x:\PCLT on the author's machine. In distribution, the same structure becomes
http://pclt.cis.yale.edu/pclt/ on the server.
SpHyDir gets the local library name from the HTMLLIB environment variable. In
this case, "SET HTMLLIB=F:\PCLT" is put in CONFIG.SYS. All of the HTML and GIF
files that SpHyDir processes have to fall on this disk under this directory.
SpHyDir is then programmed to moderate between the native OS/2 file naming
conventions (with "\") and the more general file naming conventions used in
most hypertext links (with "/"). In concept, it should be possible to move the
entire structure from OS/2 to a Unix server.
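The translation between the two naming conventions amounts to stripping the library prefix and swapping the separator. The following Python sketch shows the idea; the function names and the SPHYDIR\links.htm file are assumptions made for the example, and SpHyDir's own Rexx code may handle more cases.

```python
from pathlib import PureWindowsPath


def local_to_href(local_path, htmllib, base_url):
    """Turn a file name under the HTMLLIB directory into a server URL."""
    relative = PureWindowsPath(local_path).relative_to(PureWindowsPath(htmllib))
    return base_url.rstrip("/") + "/" + relative.as_posix()


def href_to_local(href, htmllib, base_url):
    """Turn a server URL inside the library back into a local file name."""
    prefix = base_url.rstrip("/") + "/"
    if not href.startswith(prefix):
        raise ValueError("not inside the library: " + href)
    return str(PureWindowsPath(htmllib, *href[len(prefix):].split("/")))


HTMLLIB = r"F:\PCLT"
BASE = "http://pclt.cis.yale.edu/pclt/"

print(local_to_href(r"F:\PCLT\SPHYDIR\links.htm", HTMLLIB, BASE))
print(href_to_local(BASE + "SPHYDIR/links.htm", HTMLLIB, BASE))
```

A file outside the library directory raises an error in both directions, which mirrors the requirement that everything SpHyDir processes must fall under HTMLLIB.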
Although it is possible to dump all the files in one directory, the library
becomes more manageable if each major subject has its own directory. Any large
collection of related files can be collected in the same subdirectory.
═══ 11.2. Chapter and Verse ═══
It is possible for a collection of random short documents to be collected
together in some free-form association. No structure would be needed for such a
grouping. However, most collections of hypertext files actually started as a
larger paper document. The material was broken into smaller files because it is
best if each file on the Web is only a few screens long. However, the original
structure of chapters, sections, and subsections is still logically
present.
To accommodate this, SpHyDir supports the concept of a Subdocument. A
Subdocument is a special kind of "paragraph" object in a file. Any word in an
ordinary paragraph or point can be a hypertext link to another file. However,
such links do not establish a relationship between the file containing the link
and the file to which the link points.
A Subdocument link, however, claims that the other file is logically a part of
the file that references it. When one file claims another as a Subdocument,
then the first file is said to be the "parent" of the claimed file. A thousand
different files can have ordinary hypertext links to the same Web page, but
only one file can claim to be its parent. (This is a restriction that the user
should obey. SpHyDir is not currently in a position to enforce it).
Just as each library generally has a "front door" or "home page", so any
collection of subdocuments has a starting point. The "root" document is the one
member of the group that has no parent. It points to subdocuments, and they in
turn can point to other subdocuments.
═══ 11.3. Objects and Attributes ═══
Physically, a Subdocument Object produces a paragraph whose only content is
the TITLE of the Subdocument. This TITLE is a hypertext link to the
Subdocument. In addition, however, the Subdocument object has a structural
effect upon the parent, the named document, and other subdocuments that are
also claimed by the same parent.
Subdocuments are normally a series of chapters or sections. If the text were
printed out, they would be printed and read in order. The order in which the
Subdocument objects appear in the parent produces a Next/Previous relationship
between the subdocuments themselves. HTML 2.0 doesn't have a formal method of
expressing this relationship. HTML 3.0 will have syntax for Next and Previous
links. Until this becomes widely available, SpHyDir manages the relationships
itself.
In OS/2 a file can have Extended Attributes. The normal attributes are things
like Date and Size. Extended Attributes are maintained by the application that
creates the file. SpHyDir creates Extended Attributes for the HTML files to
manage the larger logical document structure within the subdocument tree.
One EA provides quick external access to the document TITLE without having to
read through the HTML. Another lists all the Subdocuments that the current
document claims. Another lists the parent, if any, of the current document.
Another lists all the Header text and levels of all the Sections contained
within the document.
To create a Subdocument link, first drag the "Book" tool (the first one in the
Toolbar) and drop it anywhere a paragraph or list point can go. The definition
is completed by dragging the Workplace icon of another HTML file from the
library and dropping it on the newly created object. If the dragged file was
previously generated by SpHyDir, then when it is dropped on the Subdocument
object, SpHyDir will extract its TITLE (from the EA) and display it as the
caption of the object. This title will appear in the final page on a line by
itself hypertext linked to the referenced file.
When HTML is generated for the current file, the list of Subdocument objects
in the order that they appear will be stored as an Extended Attribute of the
current file, and an Extended Attribute will be created on each of the
referenced files pointing back to the current file as the parent.
Subdocument objects are not a formal construct of HTML 2.0, but there is some
fully documented syntax that comes very close. When the Subdocument object is
converted to HTML, it is generated in one of two forms (a paragraph or a list
item):
<P><A HREF="xxx.htm" REL="Subdocument"> ...title...</A></P>
or
<LI><A HREF="xxx.htm" REL="Subdocument"> ...title...</A>
If SpHyDir processes an existing HTML document with the REL="Subdocument"
attribute it will try to convert it back to a subdocument object.
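Recognizing these links in an existing file only requires looking for anchors that carry the REL attribute. A minimal sketch in Python using the standard html.parser module follows; the class name and the sample file names are invented, and this is not how SpHyDir itself reads HTML.

```python
from html.parser import HTMLParser


class SubdocumentFinder(HTMLParser):
    """Collect (href, title) for every anchor with REL="Subdocument"."""

    def __init__(self):
        super().__init__()
        self.subdocuments = []
        self._href = None      # set while inside a Subdocument anchor
        self._text = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names, but not values.
        attributes = dict(attrs)
        if tag == "a" and (attributes.get("rel") or "").lower() == "subdocument":
            self._href = attributes.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            self.subdocuments.append((self._href, title))
            self._href = None


sample = """<P><A HREF="intro.htm" REL="Subdocument"> Introduction</A></P>
<UL>
<LI><A HREF="forms.htm" REL="Subdocument"> Forms Support</A>
<LI><A HREF="other.htm">an ordinary link</A>
</UL>"""

finder = SubdocumentFinder()
finder.feed(sample)
print(finder.subdocuments)
```

The ordinary link is skipped, and the two Subdocument links come back in document order, which is the order that determines Next and Previous.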
═══ 11.4. Next and Previous ═══
HEADER and TRAILER can contain variables which are replaced with current
information. Variable names are enclosed in "[" and "]" characters.
[Date] is replaced by the current date.
[Doctitle] is replaced by the TITLE of the document.
[Up] is replaced by the file that claims this as a subdocument.
[Previous] and [Next] are replaced by the files that appear before and after
this file in the Subdocument list of the Parent.
The [Up], [Next], and [Previous] relationships don't always exist. For example,
the document at the top of the tree has no Up. The first document listed as a
Subdocument has no Previous, and the last document has no Next. To accommodate
this, any line in HEADER or TRAILER that references a nonexistent variable is
entirely deleted. The idea is that you put on one line all the stuff that would
relate to a relationship, and when it doesn't exist then the entire package is
deleted.
An example HEADER might include the lines:
<P>
[<A HREF="[Up]">Up</A>]
[<A HREF="[Previous]">Previous</A>]
[<A HREF="[Next]">Next</A>]
</P>
<P><I> [Date] </I></P>
Every document gets a line containing the current date in italics. Above that
line there may be 0-3 hyperlinks depending on the number of available
relationships. If all three links are generated, then the line looks like:
[Up] [Previous] [Next]
with each word acting as a link.
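The substitution rule can be sketched in a few lines of Python. This is an illustration only: the expand function is invented, the file names are hypothetical, and treating the variable names as case-sensitive is an assumption. Only the five known names are matched, so the literal "[" and "]" around each link are left alone.

```python
import re

# Match only the known variable names, not every bracketed string.
VARIABLE = re.compile(r"\[(Date|Doctitle|Up|Previous|Next)\]")


def expand(template, values):
    """Substitute variables; drop any line naming a variable with no value."""
    output = []
    for line in template.splitlines():
        names = VARIABLE.findall(line)
        if any(values.get(name) is None for name in names):
            continue                      # the whole line is deleted
        output.append(VARIABLE.sub(lambda m: values[m.group(1)], line))
    return "\n".join(output)


HEADER = """<P>
[<A HREF="[Up]">Up</A>]
[<A HREF="[Previous]">Previous</A>]
[<A HREF="[Next]">Next</A>]
</P>
<P><I> [Date] </I></P>"""

# First subdocument of its parent: it has an Up and a Next but no Previous.
values = {"Date": "26 Jun 1995", "Up": "index.htm", "Next": "chap2.htm"}
print(expand(HEADER, values))
```

For this file the Previous line disappears entirely, leaving the Up and Next links and the date.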
═══ 11.5. The Document Tree Window ═══
The Window pulldown menu of the SpHyDir Workarea includes an option to display
the Document Tree for whatever HTML file is currently in the Workarea.
To build this window, SpHyDir checks for the Parent of the current file, and
then for the parent of the parent, until it finally reaches the Root document.
It then proceeds down through the Extended Attributes of the Root and all the
subdocuments and sub-subdocuments. For each file, the TOC Extended Attribute
lists all of the Headers in that file.
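The walk up to the Root and back down can be sketched as follows. In this Python illustration the three Extended Attributes are modelled as plain dictionaries, and all file names, header texts, and function names are invented. Listing a file's own headers ahead of its subdocuments is a simplification.

```python
def find_root(start, parent):
    """Follow Parent pointers upward until a file with no parent is found."""
    current, seen = start, {start}
    while parent.get(current):
        current = parent[current]
        if current in seen:
            raise ValueError("parent pointers form a loop at " + current)
        seen.add(current)
    return current


def cumulative_toc(name, subdocs, toc):
    """List (file, level, header) for a file and then for its subdocuments."""
    rows = [(name, level, text) for level, text in toc.get(name, [])]
    for child in subdocs.get(name, []):
        rows.extend(cumulative_toc(child, subdocs, toc))
    return rows


# Stand-ins for the Parent, Subdocument list, and TOC Extended Attributes.
parent = {"chap1.htm": "book.htm", "chap2.htm": "book.htm",
          "chap2a.htm": "chap2.htm"}
subdocs = {"book.htm": ["chap1.htm", "chap2.htm"], "chap2.htm": ["chap2a.htm"]}
toc = {"book.htm": [(1, "The Book")], "chap1.htm": [(1, "Chapter One")],
       "chap2.htm": [(1, "Chapter Two")], "chap2a.htm": [(2, "Details")]}

root = find_root("chap2a.htm", parent)
for row in cumulative_toc(root, subdocs, toc):
    print(row)
```

The loop check is there because nothing stops two files from claiming each other as subdocuments.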
The Document Tree window displays a complete cumulative Table of Contents for
all of the files in the document tree structure. It is intended to eventually
create a TOC file and simplify the creation of references from one part of the
tree to a section in another file.
Currently, the major feature of this window is the ability, from the File
pulldown menu, to trigger SpHyDir to regenerate HTML for all of the files in
the tree. This is a convenient way to clean things up if the HEADER or TRAILER
files have been changed or when the logical order of files has been rearranged.
═══ 12. XSpO - External SpHyDir Rexx Code ═══
It is nice to have code that understands Web Explorer, but other people use
Netscape, Mosaic, or other browsers. SpHyDir can't handle every type of hotlist
file. The solution to this and other problems is an External SpHyDir Object.
These are called XSpO's (pronounced "expo") but given that they are written in
Rexx, it is acceptable to roll an "R" in front of the name and call it a
"Rexx-spo".
An XSpO is an external Rexx program that resides as a CMD file on disk. If you
click on the file with the second mouse button, open its Settings, and change
the icon, you can give it some meaningful icon. Then you put a shadow of the
file in a desktop folder, probably along with your program object for SpHyDir.
An XSpO acts something like a Tool. You drag it from its workplace folder and
drop it somewhere in the SpHyDir program. The nature of the XSpO decides where
it can be dropped. Unlike the Tools, an XSpO could be dropped on an entry area
or list.
The XSpO interface will be extended whenever an idea comes to mind. Currently,
the two supported uses of an XSpO are to fill the New Links list box in the
Link Manager window and to add a URL Link to an object in the work area. Sample
XSpO files are supplied with SpHyDir for both purposes and will be discussed
here.
SpHyDir assumes that it has an XSpO whenever the user drops a CMD file on an
object. When the object accepts an XSpO, it generates a Rexx Call to the file as
an external procedure. Since the caller is running in the VX-Rexx environment,
all of the VX-Rexx functions are available to the XSpO. However, it will be
difficult to make use of them without 1) a copy of the VX-Rexx manual and 2)
some hints from me about the environment. An XSpO that uses VX-Rexx function
calls to manipulate objects is said to be "dirty." The internal implementation
of SpHyDir may change in the future, and such files may need to be changed. An
object that supports XSpO will generally provide a convention using only
arguments, the return value, and the stack. Details may differ from object to
object. An XSpO that does not directly call VX-Rexx functions is said to be
"clean." The terms are relative, and it may be convenient to use "quick and
dirty" techniques from time to time.
When an XSpO is called, it is always passed as an argument the name of the
object on which it was dropped. There is no good way (currently) for XSpO's to
declare a type, so the XSpO itself has to make sure it has been dropped in the
right place and return without doing anything if it is called by the wrong
object. Objects that call XSpO's should ignore any null return.
When an XSpO is dropped on the New Links listbox of the Link Manager window,
it is passed no arguments other than the "New_Links" object name. The XSpO puts
new list entries on the Rexx stack. Each entry begins with a URL (no blanks are
allowed by SpHyDir in a URL) and then a Title (blanks are OK in the title). Each
line in the Rexx queue is one list entry. Only the title will show up in the
list box; the URL is kept as user data and is presented later on when the title
is dragged to create a link. If the XSpO returns the word "CLEAR" from the
function call, then the list box is cleared and the new list becomes its only
contents. Otherwise, the new links are added in front of the existing links.
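Seen from the receiving side, the convention is simple: split each queued line at the first blank, then either replace the list or add to the front of it. The sketch below models that in Python with invented function names and titles. It does not use the real Rexx queue, and it is not SpHyDir's code.

```python
def parse_queue(lines):
    """Split each queued line into (url, title) at the first blank."""
    entries = []
    for line in lines:
        url, _, title = line.strip().partition(" ")
        if url:
            entries.append((url, title.strip()))
    return entries


def update_list(existing, lines, result):
    """Apply the queued entries to the current contents of the list box."""
    new_entries = parse_queue(lines)
    if result == "CLEAR":
        return new_entries                # the new list replaces the old one
    return new_entries + existing         # otherwise new links go in front


current = [("http://pclt.cis.yale.edu/pclt/", "PC Lube and Tune")]
queued = ["intro.htm The SpHyDir Project", "forms.htm Forms Support"]

print(update_list(current, queued, ""))
print(update_list(current, queued, "CLEAR"))
```

Because the URL is everything up to the first blank, a URL containing a blank would be split in the wrong place, which is why blanks are not allowed in it.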
The following is the complete text of an XSpO that duplicates the existing Web
Explorer Link Manager function. This file is distributed with SpHyDir and may
be adapted to support other quicklist formats.
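/* XSpO version of Web Explorer Links */
arg object                               /* name of the object dropped on */
if object<>"NEW_LINKS" then return       /* wrong drop target: do nothing */
exploreini=Value("ETC",,"OS2ENVIRONMENT")"\EXPLORE.INI"
strm_status = Stream( exploreini, "Command", "Open Read" )
/* Skip ahead to the [quicklist] section of EXPLORE.INI */
if strm_status="READY:" then
  do while lines(exploreini)>0
    line=linein(exploreini)
    if line="[quicklist]" then leave
  end
/* Each entry is a "quicklist=" title line followed by a URL line */
do while lines(exploreini)>0
  line=linein(exploreini)
  parse var line "quicklist=" title
  if title="" then return                /* no more quicklist entries */
  url=linein(exploreini)
  queue url title                        /* one list entry per queued line */
end
return "CLEAR"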
The Workarea also supports XSpO's, but the only function currently supported
is to add a Link to an object. Dropping this type of XSpO on a Workarea object
is simpler than adding the link to the Link Manager list and then dragging the
list item over and dropping it on the object. A supplied XSpO uses some rather
"dirty" logic to find the current URL in Web Explorer (you have to enable the
WE option that displays the URL in a box at the top of the window). Dropping
this XSpO on a paragraph, point, or image puts a link to whatever page is
currently being displayed in WE (without requiring that the page be added to
the Quicklist).
arg object
if wordpos(object, "NEW_LINKS WORKAREA")=0 then return
/* Walk the top-level windows on the desktop looking for Web Explorer */
desktop = "?HWND1"
app = VRGet( desktop, "FirstChild" )
do while app<>""
  title=VRGet(app,"Caption")
  if substr(title,1,16 )="IBM WebExplorer " then
    do
      title=strip(substr(title,19),"B")  /* page title follows the program name */
      kid= VRGet(app,"FirstChild")
      url=Searcher(kid)
      if url<>"" then queue url title
      if object="WORKAREA" then return "LINK"
      return "ADD"
    end
  app = VRGet( app, "Sibling" )
end
return ""
/* Search visible child windows, depth first, for the entry field with the URL */
Searcher: procedure
  parse arg w
  do while w <> ""
    if VRGet( w, "Visible" ) = 1 then do
      class = VRGet( w, "ClassName" )
      caption = VRGet( w, "Caption" )
      if class="WC_ENTRYFIELD" then return caption
      subkid = VRGet( w, "FirstChild" )
      url= searcher(subkid)
      if url<>"" then return url
    end
    w = VRGet( w, "Sibling" )
  end
return ""
═══ 13. Forms Support ═══
Modern HTML and Web Browser programs allow the user to enter data and make
selections with standard GUI Boxes, Buttons, and Lists. Collectively, these
features are known as "forms" support. There are two steps. First, the author
must design the data entry form using HTML language elements. Secondly, a
program must be written in some supported language to process the data that the
user enters.
HTML forms provide a subset of the standard GUI dialog features that will be
familiar to users of Visual Basic or other visual programming languages. The
user is presented with a set of single line and multiline text entry fields,
checkboxes, radio buttons, selection lists, and push buttons. The user makes
selections and enters data. Then a push button (or the Enter key) transmits
data to the server.
═══ 13.1. Forms Handling Programs ═══
The data entered in a form has to be passed to locally written code that runs
on the Web server machine. For a Unix machine, this program receives data
through the "CGI" protocol. CGI specifies a particular way to pass information
about the request, the remote machine, and the local server environment. Most
CGI programs are written in either C or Perl.
However, SpHyDir runs in OS/2 and is written in Rexx. IBM has a very nice Web
server package for this environment called GOSERVE. Each arriving request is
passed to a locally customized Rexx filter program running as a subthread of
the server. Whatever efficiency is lost using an interpreted language like Rexx
is gained back by using threads instead of creating a new process for each
request. Although GOSERVE provides all the necessary forms support, it doesn't
use precisely the same conventions as the CGI interface. SpHyDir will talk more
generically about a "forms processing program" while other sources would
probably call the same thing a "CGI program" without assuming that there could
be any other kind of server.
Each GUI object in the HTML form is associated with a variable name. The data
and selections are transmitted as a sequence of "name = value" pairs, where
name is the variable name associated with a field or button and value is the
data typed or the alternative selected. This sequence of name and value pairs
must be decoded by the program that processes the request.
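On the wire, browsers send the pairs in the standard URL-encoded form: pairs are joined with "&", blanks become "+", and other special characters are written as %XX escapes. As an illustration of the decoding step, here is a sketch using Python's standard library. The field names "protocol" and "comment" are invented, and this is not the SpHyDir or GOSERVE helper.

```python
from urllib.parse import parse_qsl


def decode_form(data):
    """Decode 'name=value&name=value' form data into a dictionary."""
    # keep_blank_values retains fields that the user left empty.
    return dict(parse_qsl(data, keep_blank_values=True))


fields = decode_form("sampentry=Yale+University&protocol=Gopher&comment=")
print(fields)
```

A dictionary keeps only the last value when a name is repeated; a form that can send the same name several times would need the list of pairs that parse_qsl returns.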
After the request is processed, the results are sent back to the remote user.
Normally, the format of this result is another HTML file. Frequently, the
response will also have Forms objects. The contents of the response file will
include some insertions based on the results of the previous request.
Thus a comprehensive tool to simplify Form processing has to solve three
problems:
1. It must provide the user with an easy way to specify the GUI objects
(entry fields, buttons, check boxes, and selection lists). SpHyDir does
this by providing a Toolbar of GUI objects just as Visual Basic and VX-Rexx
solve the same problem with similar toolbars.
2. It must provide a simple way to decode the incoming "variable=value"
pairs. The Rexx language (along with some helper functions provided by
GOSERVE) makes this a trivial task, but it is not a very difficult
problem in any language. SpHyDir provides a Rexx "helper" routine named
SpHyDir_Decode in the SPHYHLPR.VRS file that provides this service.
3. It must provide a way to insert data into the reply sent back to the
user.
Some existing programs generate the entire response with program
statements:
printf("<TITLE>Response to Your Request</TITLE>\n");
This is tedious, difficult to read, and impossible to validate.
A second approach scans an HTML file and inserts data:
<TITLE>
%insert TITLETEXT
</TITLE>
This is slow because it requires a syntax scan during every reply.
SpHyDir provides (IMHO) a better solution. The programmer uses SpHyDir
to create an ordinary HTML file with text and forms objects. As SpHyDir
generates the HTML, it separately tabulates the location of strings or
insertion points that correspond to the various forms variable names. If
the file is fetched as a *.HTM file, then everything goes out as it was
designed. However, if a forms processing program wants to send the file
back as a reply to a previous query, then it can call a helper routine
(SpHyDir_Reply in the supplied Rexx-GOSERVE example) that extracts from
the program the current value of all variables whose names correspond to
the variable names assigned to the forms objects in the HTML file. These
current values from the program are inserted into the file as it is sent
back to the user and populate the fields, boxes, buttons, and lists that
are available for the next reply.
Rexx is a particularly attractive language in which to do this kind of
programming because access to its variable names and symbol table is simple
and flexible. The combination of SpHyDir, Web Explorer, GOSERVE, and
Rexx-based Forms processing programs provides a simple but powerful Web
development environment. However, local requirements will soon make it
necessary to extend this development environment to real CGI programs running
on Windows NT or Unix servers.
═══ 13.2. Forms are poorly Form-matted ═══
The ambiguities of HTML that cause problems for SpHyDir in normal text are
made worse when Forms are processed. Consider a simple example:
The top line is a simple entry area for typed characters. The second line
presents three alternatives using the "radio button" metaphor (only one can be
selected, and choosing one deselects the others). The last line is a check box
that can be set or cleared by clicking it.
In visual programming languages, such as Visual Basic, each radio button or
check box has a "caption" defining the text that follows the box or button and
describes the option. In this example, the captions are "HTTP", "Gopher",
"FTP", and "BINARY". Occasionally, but less frequently, a Text Entry object
would also have a caption (in this case "Identify a Server Machine:"). In any
case, the Caption is an attribute of the object and is part of the object
definition.
However, in HTML a box or button object is just the box or button itself. Any
caption text is just ordinary "paragraph" text. There is no limit on its size,
contents, or structure. Just as SpHyDir had to invent a chapter and section
structure by looking at Heading tags, it must also construct GUI programming
objects by assuming that the captions are reasonable and obvious.
All GUI objects (entry areas, buttons, boxes, and selection lists) must be
inside a FORM area. However, the form can also contain ordinary text, images,
ordered and unordered lists, sections, and everything else that is valid in a
document. Unlike a paper form, where the instructions are usually separate so
that the input can be easily processed, an HTML form can have the input widely
scattered through the text. When the form is submitted, only the values of the
entry fields and the selections made by the user are transmitted, not the
captions and text.
A user will become confused, however, if each Radio Button option is
accompanied with three screens full of explanation. The relationship between
the buttons would be lost. Therefore, it is probably best if each field or
button has a short clean caption. Furthermore, based on a universal GUI
practice, the caption of a data entry area would usually come in front of the
entry field (as the example "Identify a Server Machine:"), while the label of a
check box or radio button comes right after it.
In normal text, most of the SpHyDir objects start a new line. This is not true
of Form Objects. If a browser can fit the next button on the same line, it will
do so. The only way to be sure that there is a line break is to create a
paragraph (<P>) tag.
In normal text, every SpHyDir object is "paragraph sized" or larger. SpHyDir
knows to create a line break when paragraphs, ordered lists, and headers are
encountered. But several forms objects may have to go on the same line. One
idea would be to create a higher "grouping" object to which they might all
belong, but SpHyDir is based on the principle that format should follow from
document structure, and it seems wrong to create artificial structure to
duplicate a format feature.
It has always been possible to create an empty paragraph. Simply drag the
Paragraph tool to the document to create a new paragraph, then type nothing in
it. When the HTML is generated, this creates a line of the form:
<P> </P>
in the output. The problem is that SpHyDir ignores empty paragraphs when
reading in normal text, so this structural element is lost when the document is
re-edited. SpHyDir relaxes this rule, and will preserve empty paragraphs when
they are encountered inside a Form structure. A form designer should drop an
empty paragraph object between any two buttons, fields, or boxes that are
supposed to appear on different lines.
═══ 13.3. Form Tools ═══
The Toolbar contains template objects for all the GUI elements that HTML
supports. If this document is viewed using a Web Browser, examples of the Forms
objects will appear in the document. They are not connected to any processing
program at this time. Attempting to submit anything from these form objects
will return an error message. Just go back to the document and continue. Forms
examples will not appear in the INF version of this material, because forms are
not supported in IPF.
═══ 13.3.1. The Forms Tool ═══
Interactive form elements are valid only within a section of a document marked
as a Form. The Form Tool creates such a section. Drag the Form Tool over and
drop it anywhere in a document except within another Form. This creates a new
level in the document tree. All other form objects, and all ordinary document
objects, are valid within a Form section.
Each form must be associated with the name of a program that the server will
run to process the data from the form. When the form object is created or
selected, an entry area becomes visible into which a program identifier can be
typed. The exact format for program identifiers depends on the type of server
being used. On a Unix server, this is usually the name of a program in the
"cgi-bin" subdirectory, as in "/cgi-bin/program". On other systems, this may be
any program name.
═══ 13.3.2. The Single Line Text Entry Field ═══
The Entry Field Tool creates a "single line" text entry area. This is the type
of field that would be used to read simple data like a name, phone number,
E-Mail address, or book title.
The caption for the Entry Field Object is treated like paragraph text. When
the field is created, or when the object is double-clicked, the standard Text
Edit window opens. Although the user can type an arbitrary amount of text into
the window, the caption should generally be short. When the object is closed,
it is the caption and not the default field contents that appears next to the
Entry Field Object in the SpHyDir Workarea.
An Entry Field object has attributes. When the object is created or is
selected by clicking with the first mouse button, a set of fields becomes
visible in the upper right section of the Workarea. Yes, these are also "entry
fields", but they are part of the VX-Rexx application and not the HTML Forms.
Many of these attributes are common or similar across all the Forms objects.
For each type of object, the appropriate set of fields becomes visible.
The first (top) attribute is a variable name. When the form is submitted, the
text entered into the field will be transmitted as the value of a "name=value"
sequence. For example, entering "Yale University" into a field with this
definition would transmit the sequence:
sampentry=Yale University
to the Web Server. This value, along with any other values from other fields in
the form, will be passed to the program designated by the FORM object to handle
the data.
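On the wire the pair is URL-encoded, so the blank in "Yale University" is not sent literally. The following is a minimal Python sketch of that encoding and of how a forms program might decode it. Python is used only for illustration (SpHyDir and its helper routines are Rexx), and the field name is the one from the Entry Field example.

```python
from urllib.parse import urlencode, parse_qs

# Field name and value taken from the Entry Field example in the text.
fields = {"SAMPENTRY": "Yale University"}

# Browsers send form data URL-encoded: spaces become "+" and
# name=value pairs are joined with "&".
body = urlencode(fields)

# A forms program reverses the encoding to recover the typed text.
decoded = parse_qs(body)

print(body)
print(decoded["SAMPENTRY"][0])
```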
In many cases, the Text Entry field will be initially empty and the user will
be expected to type a value in. HTML allows an initial value to be transmitted
from the server. This string will appear in the Text Entry field and will be
transmitted back as its value if the user doesn't change it. A static default
value can be entered in the second (long middle) field.
A default value can also be generated dynamically from a previous Web Server
program that requested transmission of the current page. To allow this, SpHyDir
creates a "symbol table" external but connected to the HTML source for the
page. This table is attached as an Extended Attribute of the file in the OS/2
or NT file system, and is stored less elegantly as a separate file in Unix. For
this field, the table would contain a line of the form:
ENTRY SAMPENTRY nnnn 18
Where "ENTRY" is the type of forms object, "SAMPENTRY" is the name of the
variable associated with the field, "nnnn" will be replaced with the byte
offset in the file of the default value (in this example, the offset of the
"S" in "Sample Entry Field"), and 18 is the length of the static default value.
An Entry field generates HTML text of the form:
<INPUT TYPE="TEXT" NAME="SAMPENTRY" VALUE="Sample Entry Field" SIZE="30"
MAXLENGTH="30">
If no static default text is provided, a VALUE="" is generated to simplify the
insertion of a dynamic default text from the symbol table. SpHyDir helper
routines simplify the insertion of dynamic default text from forms processing
programs.
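The substitution that the symbol table makes possible can be sketched in a few lines of Python. This is an illustration only, not the SpHyDir helper routines themselves: the function names are hypothetical, and the offset is computed from the sample tag rather than read from a real Extended Attribute.

```python
def parse_symbol_line(line):
    """Split a line such as 'ENTRY SAMPENTRY 43 18' into its four parts."""
    kind, name, offset, length = line.split()
    return kind, name, int(offset), int(length)


def insert_default(html, entry, new_value):
    """Replace the static default (offset, length) with a dynamic value."""
    _kind, _name, offset, length = entry
    return html[:offset] + new_value + html[offset + length:]


html = (b'<INPUT TYPE="TEXT" NAME="SAMPENTRY" VALUE="Sample Entry Field" '
        b'SIZE="30" MAXLENGTH="30">')

# "nnnn" is the byte offset of the "S" in "Sample Entry Field".
offset = html.index(b"Sample Entry Field")
entry = parse_symbol_line("ENTRY SAMPENTRY %d 18" % offset)

result = insert_default(html, entry, b"Yale University")
print(result.decode("ascii"))
```

A real forms program would also have to escape any quotation marks in the dynamic value before inserting it into the VALUE attribute.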
The last attributes of an Entry Field include a checkbox to declare that this
is a Password field (so the data typed in should be masked out) and two length
fields. The first length specifies the size of the box, the second field is the
maximum amount of data that can be typed into the box. If the maximum amount is
larger than the size of the box, or is omitted altogether, then when the user
gets to the end of the box the previous characters shift left to make room.
ΓòÉΓòÉΓòÉ 13.3.3. The Multiline Entry Field ΓòÉΓòÉΓòÉ
A Multiline Entry (MLE) Object generates an area with scroll bars into which
the user can type an arbitrary amount of text. This is usually used for
freeform feedback (to send comments, suggestions, or complaints to the author).
It can also be used to annotate information.
An MLE is a large object, so it has no formal caption. If you want to describe
it, do so in the paragraph that precedes or follows it. The contents of the MLE
object, which can be edited by double clicking the object and opening the Text
Edit window, is the static data that will appear as a default within the MLE
window when it is displayed on the remote screen.
An MLE field in a Web Browser will not support font changes or hypertext
links. SpHyDir may eventually get around to disabling these options in the Text
Edit window. Meanwhile, when editing default text for an MLE, don't use
italics, bold, or any of the other format tags.
An MLE is associated with a variable name. When the form is submitted, the new
content of the MLE will be assigned as a value to that variable name. SpHyDir
creates an entry in the Variables Extended Attribute with the type of "MLE", the
name of the variable, the location of the start of the default text, and the
length of the static default text. This can be used by the helper routine to
insert an alternate string dynamically into the form as it is being
transmitted. The content of such a string would be whatever HTML declares to be
valid between the <TEXTAREA> and </TEXTAREA> tags.
An MLE also has a size specified as rows and columns. They appear in the two
lower numeric boxes and can be changed to fit the application needs.
ΓòÉΓòÉΓòÉ 13.3.4. The Checkbox Tool ΓòÉΓòÉΓòÉ
The Checkbox Tool creates a standard GUI Checkbox object. A caption follows
the Checkbox to describe the option. The caption is regarded as the "contents"
of the object and may be edited by double-clicking the checkbox object to open
the Text Edit window. Unlike the MLE, the checkbox caption is ordinary text and
may contain emphasis (bold, italics) or hypertext links.
The Checkbox is associated with a variable name. When the checkbox is selected,
a "name=ON" pair is returned. A static default value can be set by clicking the
"Checked" option when the checkbox object is currently selected.
A Checkbox has a variable name. It can also be statically assigned an initial
value by checking the "Checked" checkbox for the Checkbox object. [This is
about the fourth pass through this document, and it just gets worse as it gets
more precise.]
There are different ways to express the value of a Checkbox variable. As a
number it would be 0 or 1. In other contexts it might be "YES" and "NO" or
"TRUE" and "FALSE". In HTML, the checkbox is turned on by adding the keyword
"CHECKED" to the tag that defines it:
<INPUT TYPE="CHECKBOX" NAME="NOMAYO" CHECKED>
However, when the user submits the form and the box is checked, the variable
name is returned with the value "ON" as in:
NOMAYO=ON
Clearly this is a muddy area and may be subject to further refinement.
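One way for a forms program to cope with this is to reduce the returned data to a simple true or false. The Python sketch below assumes the standard behavior that an unchecked box sends nothing at all, and compares the value without regard to case. The function name and the extra SIZE field are hypothetical.

```python
from urllib.parse import parse_qs


def checkbox_state(query, name):
    """Return True if the checkbox variable came back as name=ON."""
    values = parse_qs(query).get(name, [])
    # An unchecked box is simply absent from the submitted data.
    return any(value.upper() == "ON" for value in values)


checked = checkbox_state("NOMAYO=ON&SIZE=LARGE", "NOMAYO")
unchecked = checkbox_state("SIZE=LARGE", "NOMAYO")
print(checked, unchecked)
```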
When SpHyDir generates the Variables EA for this field, the entry will have
the form:
CHECKBOX NOMAYO nnnn 7
Where the type is CHECKBOX, the variable name is NOMAYO, nnnn is the byte
offset in the file of the blank following the variable name, and the length is
either 0 or 7 since the word "CHECKED" has seven letters and is either present
or omitted.
ΓòÉΓòÉΓòÉ 13.3.5. The Radio Button Tool ΓòÉΓòÉΓòÉ
The RadioButton Tool is used to specify one of a set of mutually exclusive
alternatives. Only one can be selected, and selecting that option automatically
turns off the other alternatives.
[Example form omitted: the prompt "The Web server is:" followed by a group of radio buttons.]
The caption of the RadioButton, which can be edited by doubleclicking the
object to open the Text Edit Window, is ordinary text and may have emphasis and
hyperlinks. However, if the captions are large enough so that the alternatives
cannot all fit on the same line, the user must provide additional HTML markup
(such as the <HR> tag) to group related buttons together.
When a RadioButton Object is created or selected, three fields become visible
at the top of the Workarea. The first field provides the variable name for this
button (and implicitly all other buttons that are part of the same grouping).
The second field contains a string that will be assigned to the variable when
this particular button is selected. Under these fields, a Checkbox allows this
particular button to be selected as the default for the group. To be
meaningful, only one button in each group can be checked as the default.
In Visual Basic and VX-Rexx, radio buttons have to be collected in a Group Box
to be related to each other. In HTML forms, radio buttons are related by having
the same variable name. The value assigned to that variable name distinguishes
one button from another.
Radio Buttons pose a problem for the symbol table in the Extended Attribute.
Up to this point, every HTML object produced one entry with its own variable
name, and there was one insertion point for the value of that variable.
However, each Radio Button has a tag location, and to override a static default
with dynamic information from a program, the "CHECKED" attribute in all of the
tags has to be manipulated. So for every radio button, the Variables EA gets a
separate entry:
RADIOBUT SERVER=UNIX nnnn 0
RADIOBUT SERVER=OS2 nnnn 0
RADIOBUT SERVER=NT nnnn 0
The "nnnn" in each line is the offset in the file of the blank that follows the
name and either precedes ">" (if the length is 0) or "CHECKED>" (if the length
is 7). An acceptable strategy is to process these entries in order, checking
the current value of the program's "SERVER" variable against the possible
matching strings "UNIX", "OS2", and "NT". If a match is made, then "CHECKED" is
inserted into the HTML file. If there is no match and the length is 7, then the
old "CHECKED" string is removed.
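A rough Python sketch of that strategy follows. It is an illustration under stated assumptions, not the SpHyDir helper code: the function name and the tag layout are hypothetical, the word "CHECKED" is assumed to start at the byte after the blank, and the entries are given as already parsed tuples. Because each insertion or removal moves the rest of the file, the sketch keeps a running adjustment for the later offsets.

```python
def apply_radio_defaults(html, entries, current):
    """Set CHECKED on the button whose value matches the program variable.

    entries: (name, value, offset, length) tuples in file order, where
    offset is the blank after the name and length is 0 or 7.
    current: the program's variables, e.g. {"SERVER": "OS2"}.
    """
    shift = 0  # earlier insertions and removals move every later offset
    for name, value, offset, length in entries:
        start = offset + shift + 1  # assumed: the word starts after the blank
        word = b"CHECKED" if current.get(name) == value else b""
        html = html[:start] + word + html[start + length:]
        shift += len(word) - length
    return html


# Hypothetical tag layout; only the blank before ">" or "CHECKED>" matters.
tags = [
    b'<INPUT TYPE="RADIO" VALUE="UNIX" NAME="SERVER" CHECKED>',
    b'<INPUT TYPE="RADIO" VALUE="OS2" NAME="SERVER" >',
    b'<INPUT TYPE="RADIO" VALUE="NT" NAME="SERVER" >',
]
html = b"\n".join(tags)

# Build the entries that the Variables EA would hold for this file.
entries, pos = [], 0
for tag, value, length in zip(tags, ("UNIX", "OS2", "NT"), (7, 0, 0)):
    blank = pos + tag.index(b'"SERVER"') + len(b'"SERVER"')
    entries.append(("SERVER", value, blank, length))
    pos += len(tag) + 1  # +1 for the newline joining the tags

result = apply_radio_defaults(html, entries, {"SERVER": "OS2"})
print(result.decode("ascii"))
```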
ΓòÉΓòÉΓòÉ 13.3.6. The Spin and Listbox Objects ΓòÉΓòÉΓòÉ
A Spin field displays a sequence of alternatives within a single window. CUA
rules suggest that the Spin choice is appropriate when the alternatives are
ordered, but the Spin object also allows a small number of alternatives to be
meaningfully displayed in a small space. In HTML terms, a Spin object
corresponds to a SELECT tag with no SIZE parameter.
[Example form omitted: the prompt "Get a dozen eggs:" followed by a Spin field.]
An interesting feature here is that Web Explorer seems to mess up the order
and selection rules. It defaults to the last alternative chosen, when the
standard clearly says that the first is the default, and it seems to get
"bigger" and "smaller" reversed.
A Listbox provides another way to display alternatives. It is probably more
suitable if the number of options is large. This Object is also a SELECT list,
but with the SIZE parameter specified.
For both selection objects, a static list of alternatives can be entered
through the Text Edit window by doubleclicking the object. Each alternative is
typed on a separate line. Press Enter between alternatives. Do not use
character emphasis or try to assign links to the alternatives.
List alternatives can be assigned dynamically by creating an array of
character strings. For example, in Rexx a set of alternatives might be
specified by the sequence:
account.0=3
account.1="Checking"
account.2="Savings"
account.3="Money Market"
If the user chose the second option, this would then feed back as the string
"account=Savings" which the Rexx helper routines would use to assign the string
"Savings" to the variable ACCOUNT in the next program. [A note to those who are
not Rexx wizards: the scalar variable ACCOUNT is completely independent of the
"stem" ACCOUNT. (with the trailing period). This strategy uses the stem to hold
the list of alternatives, and uses the scalar to designate which alternative
was selected.]
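To make the connection to HTML concrete, here is a short Python sketch that turns such a list of alternatives into a SELECT tag. The function name and its parameters are hypothetical, and the exact markup SpHyDir writes may differ. Omitting the size gives the Spin form, and supplying it gives the Listbox form.

```python
from html import escape


def select_html(name, alternatives, size=None, selected=None):
    """Build a SELECT list; with no size it corresponds to a Spin object."""
    size_attr = "" if size is None else ' SIZE="%d"' % size
    lines = ['<SELECT NAME="%s"%s>' % (escape(name), size_attr)]
    for alternative in alternatives:
        mark = " SELECTED" if alternative == selected else ""
        lines.append("<OPTION%s>%s" % (mark, escape(alternative)))
    lines.append("</SELECT>")
    return "\n".join(lines)


account = ["Checking", "Savings", "Money Market"]
spin = select_html("account", account)
listbox = select_html("account", account, size=3, selected="Savings")
print(spin)
print(listbox)
```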
ΓòÉΓòÉΓòÉ 13.3.7. Pushbuttons ΓòÉΓòÉΓòÉ
After filling in the required fields, the user triggers an action on the
server by pressing a Pushbutton. If no Pushbutton object appears in the form,
pressing the Enter key may also transmit data.
A default Pushbutton with no options is labelled "SUBMIT". It will trigger
transmission of the data, but will add nothing to the datastream itself.
Multiple "SUBMIT" buttons would be indistinguishable from each other.
Each Pushbutton has attributes:
The left entry box is the name of a variable. The right box is both the value
assigned to the variable when the button is pushed and also the label placed on
the face of the button.
When an explicit variable name is assigned to a Pushbutton object, an entry is
also made in the Variable Extended Attribute. It identifies a type of
"PUSHBUT", the variable name, the offset of the static value string, and its
length. If the helper functions are used, they will check for a variable of the
same name in the calling program and will substitute its current value in the
Pushbutton definition. This means that the caption of the Pushbutton can be
dynamically changed by the calling program.
A special version of the Pushbutton control is established if the Hidden
attribute is checked when the button object is selected. A Hidden field doesn't
appear on the user's screen, but it is passed back as part of the data stream
to the next program. This can be used to pass a handle, transaction ID, or
other state information from one screen to the next.
ΓòÉΓòÉΓòÉ 14. Bugs and Restrictions ΓòÉΓòÉΓòÉ
VX-Rexx 2.1B has a bug when moving a tree of records in a container. Suppose,
for example, you decide to move one section in front of another. You can click
on the sections to collapse the tree so that just the two icons are showing.
You can then drag the second icon in front of the first. However, when you
re-expand the tree, you will see that elements two or three levels down in the
tree have been incorrectly reorganized. For now, the safe way to move large
sections of the document is to mark them with Alt-L and move them through the
SpHyDir special "Clipboard" window.
Web Explorer creates unusual objects that cannot be directly dropped on the
SpHyDir windows. To process a document, drag the document from the WE window to
a folder in the HTML library on the current machine. SpHyDir can only process
files that are in the library. Drop a URL object on the desktop or in a folder
first, then use it to build a link.
ΓòÉΓòÉΓòÉ 15. Supported and Unsupported HTML ΓòÉΓòÉΓòÉ
SpHyDir II was restructured to simplify extensions. Most of the HTML 3.0 and
Netscape tags and attributes are now supported, or will be shortly.
SpHyDir generates LINK tags for Subdocument relationships (Next, Previous,
Up). It preserves, as properties of the Document Object, LINKs mentioned in the
current HTML 3.0 draft (Home, TOC, Index, Glossary, Copyright, Help, and
Bookmark). It plans to support Header and Trailer links for document specific
boilerplate files. Other LINK tags are not preserved.
The SpHyDir objects have a place for every valid construction, but they may
not support constructions that are invalid, even when frequently used. If there
is a reasonable strategy, current incorrect markup may be "upgraded" to valid
status. For example, lists may not contain any data outside the list items:
<OL>Text here is illegal, but there is a lot of it in practice.
<LI>This is an implied paragraph
<LI><P>This is an explicit paragraph</P></LI>
Text here is nominally illegal.
</OL>
SpHyDir will take one text string outside the points and "upgrade" it to the
HTML 3.0 List Header <LH> contents. In other places, loose text may be upgraded
to a <CAPTION> or <CREDIT>. However, when there is no place to put it, the text
may get lost.
SpHyDir needs where possible to convert HTML constructs to the properties of
an Object. A particular problem is created by hypertext labels generated by <A
NAME=xxx>word</A>. Since SpHyDir cannot manage properties for individual words,
it assigns the NAME to the ID property of the Paragraph, Section, or other
object in which the labelled word appears. It is the intention of SpHyDir to
migrate this to the preferred <P ID=xxx> syntax of HTML 3.0 as soon as that
syntax is universally supported. For now, SpHyDir rewrites the HTML by applying
the <A NAME=xxx> tag to the entire text content of the Paragraph or Header in
which it previously appeared. SpHyDir does not support two <A NAME=xxx> labels
within the same paragraph or header.
In many visual programming languages, buttons and boxes have a caption. This
is not an HTML concept. SpHyDir follows the more common practice to simplify
use. In HTML, a check box is just the box:
<INPUT TYPE="CHECKBOX" NAME="BIN"> BINARY
Syntactically, the last word "BINARY" is outside the tag. It is just text. If
SpHyDir didn't make any structural assumptions, it would just appear as ordinary
paragraph text. However, SpHyDir depends on creating "objects" that are bigger
than just a "[]" or "O". So the Entry field, Checkbox, and Radiobutton forms
objects include text that functions as the "caption" of the object.
ΓòÉΓòÉΓòÉ 16. Character Sets ΓòÉΓòÉΓòÉ
Character set issues are usually overlooked in the US. However, a World Wide
Web has to confront the problem of displaying information in languages other
than English. This is a fairly difficult problem that must be approached
carefully.
The most complete solution would be Unicode, a two-byte character set that
includes every modern language in the world. This may prove important in the
future, but its use today is premature. A more modest solution is to use the
ISO "8859" family of one-byte character sets. In particular, the ISO 8859-1
"Latin 1" character set supports all the Western European languages from
Iceland, to the Nordic countries, to Italy.
There is little perspective in Connecticut about how people overseas actually
configure their personal computers. The screen is a more powerful device and
can support many different character sets. The keyboard is more constrained.
Through the years there have been many different approaches to the keyboard
entry of foreign language character sets. If SpHyDir is going to provide an
easy to use editing environment, the data entry is an important part of the
problem.
Without any user input, SpHyDir now caves in to the OS/2 System design. It
embraces the IBM architecture of Code Pages. The assumption is that IBM sells
hardware and software overseas and if it insists on pushing an architecture
like Code Pages, then that must be how people are actually using the system. A
few terms need to be defined:
character set
A character set is a collection of characters that completely
address a particular need. For example, the upper and lower case
alphabet is a character set that can be used to express all the
common names of people in the US (since names like "Sally2" and
"Bi$$" don't occur). The minimal useful computer character set is
the 94 characters in the ASCII set (although for many purposes you
can get along without ~ ` { } or ^). Extensions to this character set
exist to support particular foreign languages or special purposes
(math, APL).
font
A font is a set of instructions for drawing each character in a
character set on a screen or printer. The system normally uses a
small set of bitmap fonts to display characters of normal size.
Algorithmic fonts such as Microsoft's TrueType or Adobe's ATM fonts
can be displayed in any size.
code
A standard that assigns number values to every character in a
character set, allowing those characters to be stored in a computer
memory, on disk, or to be transmitted on a communications line.
ASCII and EBCDIC are examples of codes. A code always has some
control characters to represent the end of a line, a backspace, a
tab, and other functions. In ASCII, the control character values are
from 0 to 31 and in EBCDIC they are from 0 to 63.
code page
A Code Page is (essentially) a character code in which all the
control values have been removed and replaced with additional
printable characters. Code Page is mostly an IBM term, though it has
rubbed off on Microsoft. It allows a display or printer to have some
additional special use characters that can be displayed in contexts
where the normal functions of control characters are not needed.
When IBM designed the PC in 1980 there were no general international
standards for character sets and code pages beyond the standard ASCII set. The
PC created a Code Page by filling in the remaining 256-94=162 code locations
with a haphazard collection of box drawing, international, and dingbat (club,
face, "small house") characters. Years later this was designated in IBM terms
as Code Page 437.
Later on during the 1980's, the International Standards Organization (ISO)
finally developed a set of one-byte character sets that extended the ASCII
standard to other character sets.
8859-1 covers Western Europe
8859-2 covers "Latin" Eastern Europe
8859-5 Cyrillic
8859-6 Arabic
8859-7 Greek
8859-8 Hebrew
8859-9 like 8859-1 but drop Iceland and add Turkey
The HTML 2.0 standard makes 8859-1 the default encoding for HTML documents.
However, the HTTP and MIME standards allow a document to be encoded in any of
the ISO 8859 family of code sets. It would be a mistake for SpHyDir to drop
its USA-centered perspective only to adopt a slightly broader 8859-1 Western
European perspective.
The "Latin 1" character set on which the 8859-1 code is based includes some
characters which were not part of the IBM PC Code Page 437. Most of the
vendors (Microsoft with Windows and NT, Adobe with PostScript and ATM, DEC)
simply adopted 8859-1 as their standard code. IBM decided that it was too
important to leave the basic box-drawing characters in their current location.
Instead, they created Code Page 850, which includes all the Latin 1 characters
but does not assign them to their 8859-1 code values.
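The difference between the two codes is easy to see with the code page tables that ship with Python (the codecs "cp850", "cp437", and "latin-1"). This is only an illustration using a modern tool; it is not part of SpHyDir.

```python
a_acute = "\u00c1"  # capital A with acute accent

# The same character has different code values in the two codes.
in_850 = a_acute.encode("cp850")
in_8859_1 = a_acute.encode("latin-1")
print(in_850.hex(), in_8859_1.hex())

# Code Page 850 kept most box-drawing characters where Code Page 437
# had them, so the 8859-1 value for this letter shows up as a
# box-drawing character when displayed under either code page.
as_850 = in_8859_1.decode("cp850")
as_437 = in_8859_1.decode("cp437")
print(as_850 == as_437)
```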
The OS/2 Presentation Manager has a dummy Code Page 1004 that reflects the
ISO 8859-1 character values. However, this is not recognized as a "real" Code
Page number by most of the commands and OS/2 services that deal with such
things.
Before beating up on IBM, it should be noted that the ISO 8859-1 standard may
not be quite as useful as it first appears. While it is fairly simple to
display 256 different characters (or more) on a computer screen or printer, it
is very difficult to squeeze all those characters on the keyboard. Any
one-byte code page will have too many characters for easy keyboard input, but
not enough characters to handle the total information system requirement.
Long before modern computers and laser printers made a complete 8-bit code
set possible, foreign countries had adopted variations on the old 7-bit
"ASCII" character set. The idea was to give up a character you don't need for
one that is more important in your country. The characters ` ~ ! @ # $ % ^ { }
[ ] \ | < > could be replaced with Æ or Þ. These substitutions created other
Code Pages in which the foreign use characters have been placed in the
familiar ASCII location, and the ASCII characters that they displaced have been
put somewhere else.
There is also the problem that in any large publishing project, the character
sets quickly expand beyond any 256 character subset. Beyond French and
Spanish, there are Hebrew, Arabic, Cyrillic, Greek, and then the problem of
mathematical symbols, special punctuation, and the stupid box drawing
characters that caused all the trouble in the first place. Some of this you
can handle with GIF files, but the rest pose a problem.
HTML and current World Wide Web practice address this issue with Entities.
The characters that are not part of the standard ASCII set are referenced by
name. An Entity reference to a character begins with "&", then contains the
character name, and ends with ";". The special characters used in HTML syntax
are converted to Entities, with < > and & referenced as &lt; &gt; and &amp;
respectively. The Æ symbol is denoted &AElig; (short for "A-E ligature").
Going back to the earlier analysis, the Entity name allows HTML to refer to a
character in a character set without becoming dependent on any particular code
mapping. While a code mapping would limit you to 256 characters, the range of
possible names is unlimited. Entities also allow you to accommodate Code Pages
that either reflect historical accident (the original PC Code Page 437) or
National Use subsets.
The CODEPAGE statement in the OS/2 CONFIG.SYS dataset specifies first the
default Code Page number, and then an alternate value. IBM normally makes 437
the default to support obsolete DOS utilities. In modern use, particularly
when someone edits HTML files, it makes more sense to at least make 850 the
default:
CODEPAGE=850,437
For more information, look up CODEPAGE in the Command Reference file in the
Information folder.
SpHyDir does not change the current Code Page. The whole idea behind the
current SpHyDir strategy is that whatever Code Page the user has currently
selected must be familiar. The user must already know how to deal with it and
how to comfortably enter data in the local language. So SpHyDir converts HTML
use to the Code Page environment rather than trying to change OS/2 to some
other character set.
The number of the current code page is used as a file extension. SpHyDir
searches the root directory of the HTML library (determined from the HTMLLIB
environment variable or the current directory when SpHyDir starts up). It
looks for three files: ENTITIES.xxx, CHARIN.xxx, and CHAROUT.xxx where xxx is
the Code Page number. SpHyDir is distributed with *.850 versions of these
three files for the recommended Code Page 850.
The ENTITIES file is an ordinary text file with entries to map the Entity
names to values in the current code page. For example, the ENTITIES.850 file
begins with the lines:
b5 Aacute Á Capital A, acute accent
b7 Agrave À Capital A, grave accent
b6 Acirc Â Capital A, circumflex accent
Only the first two items are significant. On the first line, "b5" is the hex
representation of the value assigned to the character in the 850 code page and
"Aacute" is the name of the entity (with the leading "&" and trailing ";"
stripped off). The rest of the line is commentary.
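Reading such a file takes only a few lines. The Python sketch below is a hypothetical reader, not SpHyDir's own code: it keeps the first two items of each line and ignores the commentary, and it assumes that every line with a valid hex value in front is an entry.

```python
def parse_entities(lines):
    """Map each entity name to its code value in the current code page."""
    table = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # blank or incomplete line
        try:
            code = int(parts[0], 16)  # first item: hex code value
        except ValueError:
            continue  # first item is not a hex number
        table[parts[1]] = code  # second item: entity name
    return table


sample = [
    "b5 Aacute Capital A, acute accent",
    "b7 Agrave Capital A, grave accent",
    "b6 Acirc Capital A, circumflex accent",
]
table = parse_entities(sample)
print(table)
```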
SpHyDir has a built-in knowledge of the &lt; &gt; and &amp; Entity names. These
are also the only Entities that can be mapped to a code value below 80 hex.
All the other Entity names that SpHyDir will process specially come from the
ENTITIES.xxx file. However, if SpHyDir encounters an Entity name that is not
defined in the file, it simply converts the "&" to a Smiley Face dingbat
character and retains it in its Entity form in the Workarea and Text Edit
windows. Later on the Smiley Face is turned back to "&" when the HTML is
generated.
The CHARIN.xxx and CHAROUT.xxx files provide translate tables to handle files
encoded in the ISO 8859-1 character set. The CHARIN table translates
characters from the HTML file with code values from A0 to FF hex to the
corresponding codes in the current Code Page. The CHAROUT file provides a
table to translate Code Page characters with a hex value of 80 to FF to the
external ISO 8859-1 set.
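The distributed tables were generated with a C program. As an illustration of what such tables contain, the Python sketch below builds equivalent 256-byte translate tables from Python's own "cp850" and "latin-1" codecs. The function names are hypothetical, the on-disk format of CHARIN.xxx and CHAROUT.xxx is not shown, and the choice of "?" for characters with no 8859-1 equivalent is an assumption made for this example.

```python
def build_charin(codepage="cp850"):
    """Table mapping ISO 8859-1 values A0-FF to the given code page."""
    table = bytearray(range(256))
    for value in range(0xA0, 0x100):
        char = bytes([value]).decode("latin-1")
        try:
            table[value] = char.encode(codepage)[0]
        except UnicodeEncodeError:
            pass  # not in this code page: leave the value untranslated
    return bytes(table)


def build_charout(codepage="cp850"):
    """Table mapping code page values 80-FF back to ISO 8859-1."""
    table = bytearray(range(256))
    for value in range(0x80, 0x100):
        try:
            char = bytes([value]).decode(codepage)
            table[value] = char.encode("latin-1")[0]
        except (UnicodeDecodeError, UnicodeEncodeError):
            table[value] = ord("?")  # assumption: no 8859-1 equivalent
    return bytes(table)


charin = build_charin()
charout = build_charout()

latin1_text = "\u00c1rbol".encode("latin-1")  # text as read from an HTML file
in_memory = latin1_text.translate(charin)     # now in Code Page 850
written = in_memory.translate(charout)        # back to ISO 8859-1
print(in_memory.hex(), written.hex())
```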
SpHyDir provides CHARIN.850 and CHAROUT.850. Since the 850 Code Page contains
the entire Latin 1 character set, this appears to be a fairly reasonable
arrangement. The user is free to create a CHARIN.437 to support the older PC
character set, but since it does not contain all the characters in the Latin 1
alphabet some characters may be lost on input. Also, the CHAROUT table cannot
meaningfully translate PC dingbat characters (like the box drawing characters)
that are not part of the 8859-1 set.
Assuming that the user adopts this suggestion to make 850 the default Code
Page:
If a CHARIN.850 file has been copied to the root directory of the HTML
library, then immediately after reading in an HTML file somewhere in that
library, SpHyDir uses the table in that file to translate any ISO 8859-1
extended code values to their corresponding Code Page values. Note that
this simply shuffles one set of code above hex 80 to another set of codes
also above hex 80. Since all the HTML markup and entity names use
standard ASCII characters below hex 80, the initial translation will not
affect any of the subsequent syntax analysis.
If there is no CHARIN table, then any code value in the HTML file will
be read into SpHyDir untranslated. It will display with whatever
character the current Code Page assigns to that code value. However,
without a CHAROUT table it will also be written back to HTML with its
original code value. SpHyDir will not provide any help displaying or
editing such characters, but it will not damage them if the user leaves
them undisturbed.
When processing text, SpHyDir identifies an Entity from the leading "&".
It will handle the &lt;, &gt;, and &amp; Entities automatically. Without an
ENTITIES.850 file in the root directory of the HTML library, those are
the only Entities that it knows about. With an ENTITIES file, it will
look up any other Entity names that the file defines and replace the
Entity reference with the code value in the current Code Page for the
corresponding character. The character will then display normally in the
Workarea document tree and in the Text Edit Window.
Any Entity name that is not matched against the file remains an Entity.
Since SpHyDir wants the "&" character to edit normally, and since a lot
of dingbat characters are available, the "&" introducer is replaced by
the Smiley Face character whose code value (in all Code Pages) is 01.
When it goes to generate HTML, SpHyDir converts the Smiley Face dingbat
back to an "&". Thus anything that displays as a SmileyFace Entity in the
Text Edit window will become a regular HTML Entity in the final file. It
will not be translated to anything else.
Any text in a Paragraph or Header that contains extended code values
(above hex 80) will be checked against the table built from the
ENTITIES.xxx table. If a match is found, the character is replaced with
an Entity reference to the character name.
Any extended code values that do not match Entity names will be
translated by the CHAROUT.xxx table should it exist. These characters
will remain as single byte codes. However, if they are proper Latin 1
characters then they should be assigned their 8859-1 values and should
display properly with a browser.
If there is no CHAROUT table, any character in SpHyDir memory will be
written to the HTML file without translation. If this happens to be a
valid 8859-1 character, then it will display on most browsers.
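The Entity handling in these steps can be sketched as a pair of Python functions. This is a simplified illustration, not SpHyDir's implementation: the function names are hypothetical, the table maps Entity names to Python characters rather than to code page byte values, and the CHARIN and CHAROUT translation steps are left out.

```python
import re

SMILEY = "\x01"  # stands in for "&" in Entities that are not in the table
BUILTIN = {"lt": "<", "gt": ">", "amp": "&"}
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def html_to_edit(text, entities):
    """Replace known Entities with characters; mark unknown ones."""
    def replace(match):
        name = match.group(1)
        if name in BUILTIN:
            return BUILTIN[name]
        if name in entities:
            return entities[name]
        return SMILEY + name + ";"  # keep the Entity form, hide the "&"
    return ENTITY.sub(replace, text)


def edit_to_html(text, entities):
    """Turn edited text back into HTML with Entity references."""
    names = {char: name for name, char in entities.items()}
    out = []
    for char in text:
        if char == SMILEY:
            out.append("&")
        elif char == "&":
            out.append("&amp;")
        elif char == "<":
            out.append("&lt;")
        elif char == ">":
            out.append("&gt;")
        elif char in names:
            out.append("&" + names[char] + ";")
        else:
            out.append(char)
    return "".join(out)


entities = {"Aacute": "\u00c1"}
source = "&Aacute;bc &lt;x&gt; &amp; &Omega;"
edited = html_to_edit(source, entities)
restored = edit_to_html(edited, entities)
print(restored)
```

The unknown Entity in the sample survives the round trip unchanged, which is the point of the Smiley Face substitution.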
Although 850 is the recommended International Code Page, many users may
prefer other OS/2 Code Pages tailored to a particular country. It is trivial
to generate another ENTITIES.xxx table file. Generating the CHARIN and CHAROUT
tables is a bit more difficult, but the existing tables were generated with a
C program and, given a bit more time, it may be possible for SpHyDir to
provide these tables for other defined numbers:
852 Latin 2 (Czechoslovakia, Hungary, Poland)
857 Turkish
860 Portuguese
861 Iceland
863 Canada (French-speaking)
865 Nordic
This will remain an exercise unless some real user out on the Web reports that
they use one of these Code Pages and would like them to be supported.
It is not clear if more is needed to support the right-to-left characters.
For that matter, it is not clear if there is any Web Support for:
862 Hebrew-speaking
864 Arabic-speaking
Again, input from users would be helpful.
I have not studied the scope of the National Use Code Pages. They may not
include all of the Latin 1 characters. Lacking official Entity names, and any
usable Web standards, and any support from Browsers such as Netscape, it seems
premature for PCLT to try to solve this problem all by itself. This is,
however, an area where Entity notation has a substantial advantage over
CHARIN/CHAROUT single character translation. A user with an Icelandic keyboard
can still generate the occasional Turkish character as a named Entity even if
that character cannot be natively displayed in the Text Edit Window. This is
the reasoning behind the SpHyDir bias to generate output as Entity notation
instead of as single byte 8859-1 encoding.
If this isn't exactly what you want, please E-mail Howard.Gilbert@yale.edu
with additional suggestions.