2. HTML Specification

2.6 Working with Structured Text

2.6.1 - HTML Elements
2.6.2 - Names
2.6.3 - Attributes
2.6.4 - Special Characters
2.6.5 - Comments

An HTML document is like a text file, except that some of the characters are markup. Markup, in the form of tags, defines the structure of the document.

To identify information as HTML, each HTML document should start with the prologue:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN//2.0">

NOTE: If the body of a text/html body part does not begin with a document type declaration, an HTML user agent should infer the above document type declaration.
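The inference described in the note can be sketched in a few lines of Python. This is an illustration only, not part of the specification: the function name `with_prologue` is hypothetical, and ignoring leading white space and letter case when looking for the declaration is an assumption of the sketch.

```python
# The prologue given above, used when a document supplies none.
DEFAULT_DOCTYPE = '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN//2.0">'


def with_prologue(body: str) -> str:
    """Return body unchanged if it already starts with a document type
    declaration; otherwise prepend the default declaration.

    Assumption: leading white space is skipped and the keyword is
    matched without regard to case.
    """
    if body.lstrip().upper().startswith("<!DOCTYPE"):
        return body
    return DEFAULT_DOCTYPE + "\n" + body
```

For example, `with_prologue("<HTML></HTML>")` returns the default declaration followed by the original text on the next line.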

HTML documents should also contain an <HTML> tag at the beginning of the file, after the prologue, and an </HTML> tag at the end. Within those tags, an HTML document is organized as a head and a body, much like a memo or a mail message. Within the head, you can specify the title and other information about the document. Within the body, you can structure text into paragraphs and lists, highlight phrases, and create links. You do this using HTML elements.

NOTE: Technically, the start and end tags for HTML, Head, and Body elements are omissible; however, this is not recommended, since the head/body structure allows an implementation to determine certain properties of a document, such as the title, without parsing the entire document.


2.6.1 HTML Elements

In HTML documents, tags define the start and end of headings, paragraphs, lists, character highlighting and links. Most HTML elements are identified in a document as a start tag, which gives the element name and attributes, followed by the content, followed by the end tag. Start tags are delimited by < and >, and end tags are delimited by </ and >.

Example:

<H1>This is a Heading</H1>

Some elements only have a start tag without an end tag. For example, to create a line break, you use the <BR> tag. Additionally, the end tags of some other elements, such as Paragraph (<P>), List Item (<LI>), Definition Term (<DT>), and Definition Description (<DD>) elements, may be omitted.

The content of an element is a sequence of characters and nested elements. Some elements, such as anchors, cannot be nested. Anchors and character highlighting may be put inside other constructs.

NOTE: The SGML declaration for HTML specifies SHORTTAG YES, which means that there are other valid syntaxes for tags, such as NET tags, <EM/.../; empty start tags, <>; and empty end tags, </>. Until support for these idioms is widely deployed, their use is strongly discouraged.


2.6.2 Names

A name consists of a letter followed by up to 71 letters, digits, periods, or hyphens. Element names are not case sensitive, but entity names are. For example, <BLOCKQUOTE>, <BlockQuote>, and <blockquote> are equivalent, whereas &amp; is different from &AMP;.

In a start tag, the element name must immediately follow the tag open delimiter <.
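The name rules above can be expressed as regular expressions. The following Python sketch is illustrative only and not part of the specification: the helper names (`is_valid_name`, `same_element_name`, `start_tag_name`) are hypothetical, and "letter" is assumed to mean an ASCII letter.

```python
import re

# Characters allowed after the first letter of a name.
NAME_CHARS = r"[A-Za-z0-9.\-]"
# A letter followed by up to 71 letters, digits, periods, or hyphens.
NAME_RE = re.compile(r"[A-Za-z]" + NAME_CHARS + r"{0,71}")
# In a start tag the name must immediately follow "<"; the lookahead
# rejects names that run past the 72-character limit.
START_TAG_RE = re.compile(
    r"<(" + NAME_RE.pattern + r")(?!" + NAME_CHARS + r")"
)


def is_valid_name(name: str) -> bool:
    """True if the whole string satisfies the name syntax."""
    return NAME_RE.fullmatch(name) is not None


def same_element_name(a: str, b: str) -> bool:
    """Element names are compared without regard to case.
    Entity names, by contrast, would be compared with plain ==."""
    return a.upper() == b.upper()


def start_tag_name(tag: str):
    """Return the upper-cased element name of a start tag, or None if
    the text does not begin with "<" followed directly by a name."""
    match = START_TAG_RE.match(tag)
    return match.group(1).upper() if match else None
```

Under these assumptions, `start_tag_name("<BlockQuote>")` yields `"BLOCKQUOTE"`, while `start_tag_name("< H1>")` yields `None` because white space separates the delimiter from the name.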


2.6.3 Attributes

In a start tag, white space and attributes are allowed between the element name and the closing delimiter. An attribute typically consists of an attribute name, an equal sign, and a value (although some attributes may be just a value). White space is allowed around the equal sign.

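The value of the attribute may be either:

A string literal, delimited by single or double quotes
A name token (a sequence of letters, digits, periods, or hyphens)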

In this example, A is the element name, HREF is the attribute name, and http://host/dir/file.html is the attribute value:

<A HREF="http://host/dir/file.html">

NOTE: Some non-SGML implementations consider any occurrence of the > character to signal the end of a tag. For compatibility with such implementations, when > appears in an attribute value, you may want to represent it with an entity or numeric character reference (see Section 2.17.1), such as:

<IMG SRC="eq1.ps" alt="a &#62; b">

To include a quotation mark inside a quoted attribute value, you may use the entity reference &quot;, as in:

<IMG SRC="image.ps" alt="First &quot;real&quot; example">

The length of an attribute value is limited to 1024 characters after replacing entity and numeric character references.

NOTE: Some non-SGML implementations allow any character except space or > in a name token. Attribute values must be quoted only if they don't satisfy the syntax for a name token.
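The quoting rules in this section can be combined into one small routine. The Python sketch below is a hedged illustration rather than a normative algorithm. The function name `format_attribute` is hypothetical; a name token is assumed to be one or more letters, digits, periods, or hyphens; and escaping & as &amp; is an added assumption, made so that a literal ampersand is not read as the start of a reference.

```python
import re

# The limit applies to the value after references are replaced,
# that is, to the unescaped text handled here.
MAX_VALUE_LENGTH = 1024

# Assumption: one or more letters, digits, periods, or hyphens.
NAME_TOKEN_RE = re.compile(r"[A-Za-z0-9.\-]+")


def format_attribute(name: str, value: str) -> str:
    """Render name=value, quoting and escaping only when required."""
    if len(value) > MAX_VALUE_LENGTH:
        raise ValueError("attribute value exceeds 1024 characters")
    if NAME_TOKEN_RE.fullmatch(value):
        # A name token may be written without quotes.
        return f"{name}={value}"
    escaped = (
        value.replace("&", "&amp;")    # must come first
        .replace('"', "&quot;")        # quote inside a quoted value
        .replace(">", "&#62;")         # for non-SGML implementations
    )
    return f'{name}="{escaped}"'
```

With these assumptions, `format_attribute("alt", "a > b")` produces `alt="a &#62; b"`, matching the example in the note above. Quoting every value is always permitted, so an implementation may reasonably skip the name token test altogether.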

Attributes with a declared value of NAME, such as ISMAP and COMPACT, may be written using a minimized syntax. The markup:

<UL COMPACT="compact">

can be written as:

<UL COMPACT>

NOTE: Some non-SGML implementations only understand the minimized syntax.


2.6.4 Special Characters

The characters between the tags represent text in the ISO-Latin-1 character set, which is a superset of ASCII. Because certain characters will be interpreted as markup, they should be represented by markup -- entity or numeric character references. For more information, see Section 2.16.
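As an illustration of representing markup characters by references, the following Python sketch uses the standard library's `html.escape`. The wrapper name `escape_text` is hypothetical, and treating &, <, and > as the characters to escape is an assumption of this sketch; Section 2.16 remains the authority.

```python
import html


def escape_text(text: str) -> str:
    """Escape characters that would otherwise be read as markup."""
    # With quote=False, html.escape rewrites &, <, and > as
    # &amp;, &lt;, and &gt; and leaves quotation marks alone.
    escaped = html.escape(text, quote=False)
    # Raises UnicodeEncodeError if any character lies outside
    # ISO Latin-1; the encoded bytes themselves are discarded.
    escaped.encode("latin-1")
    return escaped
```

For instance, `escape_text("a < b & c")` returns `a &lt; b &amp; c`.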


2.6.5 Comments

To include comments in an HTML document that will be ignored by the HTML user agent, surround them with <!-- and -->. After the comment delimiter, all text up to the next occurrence of --> is ignored. Hence comments cannot be nested. White space is allowed between the closing -- and >, but not between the opening <! and --.

For example:

<HEAD>
<TITLE>HTML Guide: Recommended Usage</TITLE>
<!-- Id: Text.html,v 1.6 1994/04/25 17:33:48 connolly Exp -->
</HEAD>

NOTE: Some historical HTML user agents incorrectly consider a > sign to terminate a comment.
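The comment rules can be captured in a single regular expression. The Python sketch below is illustrative only; the function name `strip_comments` is hypothetical, and "white space" is assumed to mean whatever Python's `\s` matches.

```python
import re

# "<!--" with no white space inside the opening delimiter, then the
# shortest run of text up to "--", optional white space, and ">".
COMMENT_RE = re.compile(r"<!--.*?--\s*>", re.DOTALL)


def strip_comments(document: str) -> str:
    """Remove every comment from the document text."""
    return COMMENT_RE.sub("", document)
```

Because the match stops at the first closing delimiter, an attempt to nest comments leaves the tail of the outer comment in the output, which is the behavior the text describes. A > sign inside a comment does not end it.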


HTML 2.0 Specification (Internet Draft) - 29 NOV 94