19 Jun 1995 - Preliminary Information

SpHyDir is not for Everyone

HTML marks up documents so that they look good. SpHyDir assumes that the markup corresponds to valid document structure. Some things display nicely but are impossible to structure. For example, because <H6> produces very tiny text, it is sometimes used to get "fine print":
<H2>Lease a new car for $200 a month</H2>
<H6>engine not included</H6>
SpHyDir requires that all H1..H6 tags be used to start sections. Also SpHyDir doesn't preserve the heading numbers, just their relative position compared to each other. In the previous example, SpHyDir would change the H6 to an H3 because that is the next number down from H2.
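
The renumbering described above can be sketched in a few lines of Python. This is only an illustration, not SpHyDir's own Rexx code: the function name `renumber_headings` is hypothetical, and the sketch assumes that the first heading keeps its original number and that each nested heading is placed exactly one level below its parent.

```python
def renumber_headings(levels):
    """Map original heading numbers (1-6) to consecutive nesting levels."""
    if not levels:
        return []
    base = levels[0]      # assumption: the first heading keeps its number
    stack = []            # original numbers of the currently open sections
    result = []
    for level in levels:
        # Close every open section that is at the same depth or deeper.
        while stack and stack[-1] >= level:
            stack.pop()
        stack.append(level)
        # Depth in the stack decides the new number; HTML stops at H6.
        result.append(min(base + len(stack) - 1, 6))
    return result
```

Under these assumptions, `renumber_headings([2, 6])` gives `[2, 3]`, which matches the lease example: the H6 becomes an H3.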

A large number of Web documents have invalid HTML. They display as intended because the browsers don't complain about errors that do not affect formatting. For example, when someone wants to print in big letters, they frequently use heading tags:
<H1>Get Rich Quick<P>Act Now<P>Limited Time Offer</H1>
<P> tags are not permitted inside a heading, but most browsers tolerate this construction, using H1 to change font and /H1 to revert to normal size. SpHyDir expects Headings to be a simple character string as the standard specifies. Paragraphs are other types of objects, and headings cannot contain objects.

SpHyDir II attempts to include almost all the valid syntax in HTML 3.0 and Netscape. The Math support will be omitted for a very long time. The FIG structure will be supported when it is more widely used by browsers. Netscape extensions will not be supported when they seem to directly overlap more appropriate HTML 3.0 constructs.

HTML goes through revisions. Old constructions that have been replaced are called "deprecated" in the standard. An even tighter reading of the standard is called "recommended." SpHyDir reads the HTML in, understands it, and then generates new HTML based on the structure. It can automatically upgrade old "deprecated" files to "recommended". For example, it will automatically convert <MENU> and <DIR> to <UL> and will convert <XMP> and <LISTING> to <PRE>. If you want to keep the old stuff as is, then SpHyDir is not the right choice.
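
As a rough illustration of this kind of upgrade, the Python sketch below renames the four deprecated tags mentioned above. It is not how SpHyDir works internally (SpHyDir rebuilds the HTML from the document structure); the names `UPGRADES` and `upgrade_deprecated` are hypothetical, and the sketch only renames tags. A faithful XMP or LISTING conversion would also have to escape any "<" and "&" characters in the literal text.

```python
import re

# Deprecated tag name -> recommended replacement, as listed in the text.
UPGRADES = {"MENU": "UL", "DIR": "UL", "XMP": "PRE", "LISTING": "PRE"}

# Matches the start of an opening or closing tag with a deprecated name.
_DEPRECATED = re.compile(r"<(/?)(MENU|DIR|XMP|LISTING)\b", re.IGNORECASE)

def upgrade_deprecated(html):
    """Rename deprecated tags, leaving attributes and content untouched."""
    def rename(match):
        slash, name = match.group(1), match.group(2).upper()
        return "<" + slash + UPGRADES[name]
    return _DEPRECATED.sub(rename, html)
```

For example, `upgrade_deprecated("<MENU><LI>one</MENU>")` returns `"<UL><LI>one</UL>"`.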

There are some constructions that the HTML standard permits, but maybe only because the DTD language in which the standard is written cannot express certain rules well. SpHyDir requires that a Definition List have sequences of one term (DT) and one definition (DD). The Definition can have multiple paragraphs. The sequence:
<DT>canned
<DD>packaged in a can
<DD>fired from a job
appears to be technically valid. It even has a certain obvious meaning (one term with two definitions). The HTML DTD standard says that a <DL> tag can only have <DT> or <DD> contents, but it doesn't specify how many or in what order. Some very bad HTML uses <DL><DD> <DD> </DL> to get a certain level of indentation. If you like this sort of thing, find another editor.
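
The stricter rule can be stated as a small check. The Python sketch below is an assumption about how such a rule might be expressed, not SpHyDir's code; `is_strict_definition_list` is a hypothetical name, and the input is simply the sequence of child tag names found inside a <DL>.

```python
def is_strict_definition_list(children):
    """True when the children are DT, DD, DT, DD, ... in strict pairs."""
    if len(children) % 2 != 0:
        return False      # an unpaired term or definition is left over
    for index, tag in enumerate(children):
        expected = "DT" if index % 2 == 0 else "DD"
        if tag.upper() != expected:
            return False
    return True
```

The "canned" example, with children DT, DD, DD, fails this check, and so does the DD, DD indentation trick, although the DTD accepts both.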

SpHyDir "understands" tag names and attributes. The name is the part of the tag that follows "<", and the attributes follow the name, each written either as a bare keyword or as a keyword, an equals sign, and a value. If SpHyDir doesn't explicitly support the tag name, it copies the tag as ordinary text. If it understands the tag name but not the attribute, it discards the attribute.
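
A minimal Python sketch of this policy follows. The `SUPPORTED` table is a small made-up sample, not SpHyDir's real tag list, and `filter_tag` is a hypothetical name. The sketch handles opening tags only and uses a simplified attribute syntax.

```python
import re

# Hypothetical sample: tag name -> set of attributes the editor understands.
SUPPORTED = {"A": {"HREF", "NAME"}, "IMG": {"SRC", "ALT", "ALIGN"}}

_TAG = re.compile(r"<\s*([A-Za-z][A-Za-z0-9]*)([^>]*)>")
# A bare keyword, or keyword = value (quoted or unquoted).
_ATTR = re.compile(r'([A-Za-z][A-Za-z0-9-]*)(?:\s*=\s*("[^"]*"|[^\s">]+))?')

def filter_tag(tag_text):
    """Return ("text", original) for unknown tags, else ("tag", name, attrs)."""
    match = _TAG.fullmatch(tag_text)
    if match is None or match.group(1).upper() not in SUPPORTED:
        return ("text", tag_text)          # unknown tag: copy as ordinary text
    name = match.group(1).upper()
    kept = []
    for key, value in _ATTR.findall(match.group(2)):
        if key.upper() in SUPPORTED[name]:  # unknown attributes are dropped
            kept.append((key.upper(), value.strip('"') if value else None))
    return ("tag", name, kept)
```

With this sample table, `filter_tag('<A HREF="x.html" BLINK=1>')` keeps HREF and silently drops BLINK, while `filter_tag("<BLINK>")` comes back unchanged as text.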

HTML 3.0 has introduced some attributes whose use is unclear. There is, for example, a LANG attribute that may assign an ISO standard abbreviation for the language and country. According to the standard, "it can be used by the parsers to select language specific choices for quotation marks, ligatures, and hyphenation rules". It is not really clear that this is useful. There is a much stronger requirement, for changing from Latin 1 to other character sets, which is not addressed by this description. SpHyDir may choose to skip features of the HTML 3.0 draft that are unclear or appear poorly thought out. If any user needs an attribute that has been omitted, please E-mail the author with a description of its use.

SpHyDir builds its internal tables keyed to the tag, object, and attribute. Unfortunately, several attribute names have meanings that depend on context. The NAME attribute can be a variable name (in FORMS related objects) or it can be the label of a jump (in the <A> tag). The ALIGN attribute has one set of values for an IMG, a second set for CAPTION, and another set for Paragraphs, Headings, and Divisions. The worst thing, however, is that ALIGN is also a switch that appears with no value in tables. In some contexts SIZE means WIDTH while elsewhere it is HEIGHT.
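
One way to picture a table keyed on both tag and attribute is the Python sketch below. It is an illustration only: the value sets are recalled from the HTML 3.0 draft and are not taken from SpHyDir's tables, and the names `ALLOWED_VALUES` and `check_attribute` are hypothetical.

```python
# Illustrative value sets (assumed, not SpHyDir's data). The same attribute
# name maps to different legal values depending on the tag it appears in.
ALLOWED_VALUES = {
    ("IMG", "ALIGN"): {"top", "middle", "bottom", "left", "right"},
    ("CAPTION", "ALIGN"): {"top", "bottom", "left", "right"},
    ("P", "ALIGN"): {"left", "center", "right", "justify"},
}

def check_attribute(tag, attr, value):
    """True when value is legal for this attribute in this tag's context."""
    allowed = ALLOWED_VALUES.get((tag.upper(), attr.upper()))
    if allowed is None:
        return False      # the pair is not in the table at all
    return value.lower() in allowed
```

Keying on the pair rather than on the attribute alone is what lets ALIGN=middle be accepted for an IMG and rejected for a paragraph.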

There is no way that SpHyDir can ever make sense out of this mess, but it will try to "correct" some of these ambiguities for the normal end user who is not an HTML expert. Near term, SpHyDir may offer to generate HTML attribute values that are not valid for the attribute name used in its current context.

SpHyDir is not written tightly enough to trap its own syntax errors and recover. Rexx simply stops the program when it encounters a problem. Since Rexx is an interpreted language, syntax errors may only be detected during execution. When the program aborts, it can leave the output file half-written. This is the primary reason for making a backup of the previous copy of the file before generating a new copy.
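
The backup step can be sketched as follows. This is a Python illustration of the idea, not SpHyDir's Rexx code; the function name `write_with_backup` and the ".bak" suffix are assumptions.

```python
import os
import shutil

def write_with_backup(path, new_text, backup_suffix=".bak"):
    """Copy the existing file aside, then write the new version."""
    if os.path.exists(path):
        # The previous copy survives even if the write below is interrupted.
        shutil.copy2(path, path + backup_suffix)
    with open(path, "w", encoding="utf-8") as out:
        out.write(new_text)
```

If the program stops halfway through the write, the half-written file can be replaced by hand from the backup copy.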


This document generated by SpHyDir, another fine product of PC Lube and Tune.