A Quick Guide to HTML

by John English

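Contents:

   About this document
   HTML basics
   Structure of an HTML document
   Document headings
   Preformatted text
   Lists
   Miscellaneous tags
   Including images in your text
   Hypertext links
   URLs
   Summary

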
About this document.

This is a quick-and-dirty guide to writing HTML documents, written specifically for BURKS. The "Try It" button at the bottom brings up a test window that you can type HTML into and see the formatted result, so you can experiment with different features of HTML as I explain them. Only basic HTML features are covered here, but there are more complete tutorials available:

A Beginner's Guide to HTML
   A good introduction to HTML from NCSA which is included on this CD.
A Guide to HTML and CGI scripts
   An interactive HTML tutorial by Mike Smith at Brighton. It includes forms that you can type HTML into and see the results.

A good thing to do when reading this document is to use your browser's "View Document Source" or "View Frame Source" command (on the View menu) so that you can see the raw HTML used in this document and compare it to the way your browser displays it. You can also copy examples from the text into the test window; press the "Try It" button at the bottom to bring up the test window, select some text from this document using your mouse, then press Control-C to copy it to the Windows clipboard. Now click on the text entry box in the test window and press Control-V to paste in the text from the clipboard.


HTML basics.

HTML is the "markup language" used by web browsers to display documents. A web browser treats text as a continuous sequence of words separated by "white space" (one or more spaces, tabs or line breaks) and displays it according to the width of the display window, using "word wrapping" to put as many words as will fit on each line before starting the next one. Changing the width of the window will reformat the text so it still fits inside the window (try it!).

Since line breaks are ignored, your document will end up as one long continuous paragraph if you don't do anything about it, regardless of how you laid it out when you wrote it. To tell the browser to start a new paragraph, you have to use markup tags which will be interpreted specially. HTML markup tags are written inside angle brackets "<...>"; the tag to tell a browser to start a new paragraph is <P>. It doesn't matter if you use capitals or not for tags, so <p> means the same thing as <P>.

You can also use markup tags to tell the browser about special formatting requirements (bold or italic text, and so on):

   <B> ... </B>      Text between <B> and </B> will be
                     displayed as bold text
   <I> ... </I>      Text between <I> and </I> will be
                     displayed as italic text

HTML tags are almost always used in pairs, like brackets; the closing tag is the same as the opening tag but preceded by "/", so <B> is the opening "boldface" tag and </B> is the closing "boldface" tag, and so on.
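
As a rough sketch of how these tags combine (the wording of the text is made up for illustration), here are two paragraphs typed with deliberately untidy spacing. The browser should ignore the layout, start a new paragraph at each <P>, and apply bold and italic only between the paired tags:

```html
<P>This is the first paragraph. It can be typed
on several lines,     with any amount of spacing,
and the browser will still wrap it to fit the window.

<P>This is the second paragraph. It contains
<B>bold text</B>, <I>italic text</I> and even
<B><I>bold italic text</I></B>.
```

Note that the nested tags in the last line are closed in the reverse of the order they were opened, just like nested brackets.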

Because the characters "<" and ">" and a few others are treated specially by browsers, you have to encode them like this:

   To display this:    write this:
         <                &lt;
         >                &gt;
         &                &amp;
         "                &quot;

Any tags that a browser doesn't recognise will just be ignored, so that if you forget to encode "<" as "&lt;" the browser will treat what follows as a tag. If it doesn't recognise the text after "<" as a valid tag, everything up to the next occurrence of ">" will be ignored, which means that a chunk of your text will just disappear completely. The easiest way to write HTML is to use an HTML editor, which will take care of all these details automatically.
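
Here is a small illustration of the encodings in the table above, using a made-up fragment of program code as the text to be displayed:

```html
<P>To compare two numbers in C you might write
<B>if (a &lt; b &amp;&amp; b &gt; 0)</B>.

<P>The paragraph tag is written as &lt;P&gt;, and the
ampersand itself is written as &amp;amp;.
```

The first paragraph should display the condition as if (a < b && b > 0) in bold, and the second should show the text <P> and &amp; literally rather than treating them as markup.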


Structure of an HTML document.

An HTML document is actually divided into two parts: a header (which is not displayed) and a body (the text that is actually displayed in the browser window). The overall structure looks like this:

   <HTML>                   -- start of HTML document
      <HEAD>                -- start of document header
         ...                -- header contents
      </HEAD>               -- end of header
      <BODY>                -- start of document body
         ...                -- body contents
      </BODY>               -- end of body
   </HTML>                  -- end of document

The only thing the document header needs to contain is a document title which will be displayed in the browser's title bar. A title is enclosed in <TITLE> ... </TITLE> like this:

   <TITLE>This is a document title</TITLE>

In fact, the document structure tags given above (<HTML>, <HEAD> and <BODY>) are normally ignored by browsers; usually, as soon as a browser sees anything which can't be part of the document header, it assumes that it's got to the document body and starts displaying text in the browser window. All the same, it's good practice to put these tags in since some browsers might require them.
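
Putting the pieces together, a minimal complete document might look something like this. The title and body text are invented for the example, and the lines written as <!-- ... --> are HTML comments, which this guide doesn't otherwise cover; browsers do not display them:

```html
<HTML>
   <HEAD>
      <!-- The header is not displayed; the title goes in the title bar -->
      <TITLE>My first HTML document</TITLE>
   </HEAD>
   <BODY>
      <!-- Everything in the body is displayed in the browser window -->
      <P>This text appears in the browser window.
      <P>The title above appears in the browser's title bar instead.
   </BODY>
</HTML>
```

You can paste this into the test window as a starting point and replace the body contents with your own text.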


Document headings.

To provide headings like the one immediately above, you can use the tag <H1> ... </H1>. The text in between is displayed as a separate paragraph in a large font. For example, if you write this:

   <H1>A Level 1 Heading</H1>

it will be displayed like this:

A Level 1 Heading

Level 1 headings like this are normally only used at the start of a document. There are five other levels for subheadings:

   <H2>A Level 2 Heading</H2>
   <H3>A Level 3 Heading</H3>
   <H4>A Level 4 Heading</H4>
   <H5>A Level 5 Heading</H5>
   <H6>A Level 6 Heading</H6>

which will be displayed like this:

A Level 2 Heading

A Level 3 Heading

A Level 4 Heading

A Level 5 Heading

A Level 6 Heading
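
In practice you would rarely need all six levels. As an illustrative sketch (the subject matter and headings here are made up), a short document might use one level 1 heading at the start and a few level 2 and level 3 subheadings to divide up the text:

```html
<H1>A Guide to Garden Birds</H1>
<P>An introduction to the whole document goes here.

<H2>Common visitors</H2>
<P>Some text about birds that are seen all year round.

<H3>Robins</H3>
<P>Some text about robins.

<H3>Blackbirds</H3>
<P>Some text about blackbirds.

<H2>Rare visitors</H2>
<P>Some text about birds that are seen only occasionally.
```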


Preformatted text.

Sometimes you want text to be displayed exactly as you've written it (e.g. program code). To do this, enclose the text in <PRE> ... </PRE> like this:

   <PRE>
      This text will be displayed exactly as it was typed
                        including any indentation
      or alignment      into columns
      like              this
      Blank lines       are also possible

      You can still use <B>bold text</B> or <I>italic text</I> in
      preformatted text.
   </PRE>

This will be displayed as:

      This text will be displayed exactly as it was typed
                        including any indentation
      or alignment      into columns
      like              this
      Blank lines       are also possible

      You can still use bold text or italic text in
      preformatted text.
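
Since tags like <B> still work inside preformatted text, the special characters described earlier still have to be encoded there too. Here is a sketch using an invented fragment of C code, which is full of characters that need this treatment:

```html
<PRE>
   for (i = 0; i &lt; 10; i++) {
      if (a[i] &gt; max &amp;&amp; a[i] != 0) {
         max = a[i];
      }
   }
</PRE>
```

The browser should keep the indentation and line breaks exactly as typed, and display &lt;, &gt; and &amp; as <, > and &.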


Lists.

If you want to write a bulleted list, you enclose the entire list in <UL> ... </UL> and then start individual list items with <LI>. For example:

   <UL>
      <LI>List item 1
      <LI>List item 2
   </UL>

will be displayed like this:

  * List item 1
  * List item 2

To produce a numbered list instead of a bulleted list, use <OL> ... </OL> instead of <UL> ... </UL>:

   <OL>
      <LI>List item 1
      <LI>List item 2
   </OL>

will be displayed like this:

  1. List item 1
  2. List item 2
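
Numbered lists are a natural fit for step-by-step instructions. As a sketch, here are the instructions for copying an example into the test window (from the start of this document) rewritten as a numbered list, with an introductory paragraph and some bold text inside a list item:

```html
<P>To copy an example into the test window:
<OL>
   <LI>Press the <B>Try It</B> button to bring up the test window
   <LI>Select some text from this document using your mouse
   <LI>Press Control-C to copy it to the clipboard
   <LI>Click on the text entry box in the test window
   <LI>Press Control-V to paste in the text
</OL>
```

The browser works out the numbers for you, so if you add or remove a step you don't have to renumber anything.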

You can also produce definition lists using <DL> ... </DL>. Each entry in a definition list is in two parts: a definition term which begins with <DT> and a definition part which begins with <DD>. For example, here is an extract from a glossary of terms elsewhere on this CD:

   <DL>
      <DT>BTW
      <DD>&quot;By the way&quot;
      <DT>RTFM
      <DD>&quot;Read the f***ing manual&quot; (yes, really...)
   </DL>

which will be displayed like this:

BTW
   "By the way"
RTFM
   "Read the f***ing manual" (yes, really...)


Miscellaneous tags.

Here are a couple more useful tags to round things off:

   <HR>      A horizontal rule (like the one above the heading for this section)
   <BR>      A line break

The line break starts a new line, but doesn't put a gap between lines the way that starting a new paragraph would.
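
A typical use for line breaks is something like an address, where each part belongs on a line of its own but the lines should stay close together. This is only a sketch; the layout of the address is my own choice for the example:

```html
<P>John English<BR>
University of Brighton<BR>
je@brighton.ac.uk

<HR>

<P>The rule above separates the address from this paragraph.
```

If you replaced each <BR> with <P>, the three lines of the address would be spaced out as three separate paragraphs instead.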


Including images in your text.

To include an image, you need to have the image available in a .GIF or .JPG (JPEG) file. To reference the file you use an IMG tag, like this:

   <IMG SRC="filename.gif">

This will display the image in the file filename.gif as part of the current paragraph. If you want the image to be displayed as a separate paragraph, start a new paragraph before and after the IMG tag, or put line breaks (<BR>) before and after.
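
The two placements described above can be sketched like this, using the same placeholder file name filename.gif (you would need to substitute the name of a real image file for this to display anything):

```html
<P>This sentence has an image <IMG SRC="filename.gif"> in the
middle of it, so the image is part of the paragraph.

<P><IMG SRC="filename.gif">

<P>The image above is displayed as a paragraph of its own.
```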

There's an example of this at the very beginning of the document. Slightly simplified (use "Frame source" from the "View" menu to see the whole truth), it looks like this:

A Beginner's Guide to HTML
   A good introduction to HTML from NCSA. It's a single HTML document, so it's easy to save a copy for offline viewing.

which is produced by the following markup:

   <DL>
      <DT><IMG SRC="../../../link.gif"> A Beginner's Guide to HTML
      <DD>A good introduction to HTML from NCSA. It's a single HTML
          document, so it's easy to save a copy for offline viewing.
   </DL>

The image is in the file link.gif in the directory three levels above the current one (standard Unix filename conventions are used, so directory names are separated by "/" and ".." means "the directory above this one").

In fact, the filename can be any URL (Uniform Resource Locator) so that it can be on any accessible machine anywhere in the world. URLs are described more fully below.


Hypertext links.

Hypertext links are what make web documents so powerful. A link like this can be used to reference another document, which can be another local file or (like an image) it can be another document anywhere in the world.

Links are generated by using anchor tags. The link above is written like this in HTML:

   <A HREF="../.././welldone.htm">like this</A>

The text between <A> and </A> is highlighted by the browser, and when you click on it the browser goes to the file specified by the HREF part of the tag (in this case, the file welldone.htm in the directory two levels above this one). Simple, isn't it?

You can also use images as hypertext links:

Press me!

Pressing the "button" will take you to another document. This was done with the following markup:

   <A HREF="../../../welldone.htm"><IMG SRC="../../../link.gif"></A>
   Press me!

If you want to link to a specific section in a document, put #section after the filename. Following the link will then go to the section named "section" in the specified document:

   <A HREF="somefile.htm#index">The index in some file</A>

If the reference is to a section of the current document, you just use #section on its own:

   <A HREF="#contents">Go to the table of contents</A>

which will be displayed like this:

Go to the table of contents

To attach a section name to part of a document, you need to use another variation of the <A> tag:

   <A NAME="section-name">Some text</A>

For example, the bookmark "contents" was attached to the heading for the table of contents at the beginning of this document like this:

   <P><B><A NAME="contents">Contents:</A></B>

This has no visible effect on the text. All the section headings in this document have bookmarks attached, which are referenced from the table of contents at the start of the document.
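
Here is a rough sketch of how a table of contents and its bookmarks might fit together in a small document. The section names "intro" and "details" and the headings are made up for the example, and the <!-- ... --> lines are HTML comments, which browsers do not display:

```html
<!-- The table of contents, itself bookmarked as "contents" -->
<P><B><A NAME="contents">Contents:</A></B>
<UL>
   <LI><A HREF="#intro">Introduction</A>
   <LI><A HREF="#details">Details</A>
</UL>

<!-- Each heading carries the bookmark that the contents list refers to -->
<H2><A NAME="intro">Introduction</A></H2>
<P>Some introductory text.
<P><A HREF="#contents">Go to the table of contents</A>

<H2><A NAME="details">Details</A></H2>
<P>Some more detailed text.
<P><A HREF="#contents">Go to the table of contents</A>
```

The name after the "#" in each HREF has to match the NAME of the bookmark it refers to.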


URLs.

As I mentioned earlier, images and hypertext links can both use Uniform Resource Locators (URLs) which can reference documents all over the world. A typical URL looks like this:

   http://www.comp.it.bton.ac.uk/je/burks.html

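which references the front page for the online copy of BURKS at the University of Brighton. The URL consists of:

   http                      -- the protocol to use (HTTP)
   www.comp.it.bton.ac.uk    -- the name of the server
   je/burks.html             -- the document to fetch from that server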

In general, a URL looks like this:

   protocol://server/document

URLs are not limited to HTTP; they can specify many different Internet protocols, of which FTP, mail and Usenet news are among the commonest. The formats for these are as follows:

   ftp://server/filename     -- transfer filename from server
                                using anonymous FTP
   mailto:user@site          -- send email to the email address
                                user@site
   news:groupname            -- connect to the newsgroup groupname

For example:

   ftp://ftp.brighton.ac.uk/pub/je/adacraft/adacraft.zip
                             -- get the file adacraft.zip from
                                the directory pub/je/adacraft
                                by anonymous FTP from
                                ftp.brighton.ac.uk
   mailto:je@brighton.ac.uk  -- send email to John English (je)
                                at Brighton University
                                (brighton.ac.uk)
   news:comp.lang.ada        -- read the newsgroup comp.lang.ada

If you leave out the protocol and server name, the protocol and server name from the current URL will be assumed. So by leaving out the protocol and server name and just providing a file name, you end up referring to a file whose location is relative to the document containing the link. The full gory details are described in RFC 1738 elsewhere on this CD. (Note that the link to RFC 1738 is specified like this:

   <A HREF="../rfc1738.htm">

or in other words, the file rfc1738.htm in the directory above the one where this document is located.)
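
To show how the different kinds of URL look when used in links, here is a sketch of a bulleted list of links built from the example URLs in this section. The link text is my own wording; the URLs are the ones given above:

```html
<UL>
   <LI><A HREF="http://www.comp.it.bton.ac.uk/je/burks.html">BURKS online</A>
   <LI><A HREF="ftp://ftp.brighton.ac.uk/pub/je/adacraft/adacraft.zip">Get adacraft.zip by FTP</A>
   <LI><A HREF="mailto:je@brighton.ac.uk">Send email to John English</A>
   <LI><A HREF="news:comp.lang.ada">Read comp.lang.ada</A>
   <LI><A HREF="../rfc1738.htm">RFC 1738</A>
</UL>
```

The first four links give a protocol explicitly, so they mean the same thing wherever the document is stored. The last one is a relative link, so it only works if the file rfc1738.htm really is in the directory above the document containing the link.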


Summary.

Here's a quick roundup of the HTML tags covered in this document:

Paragraph types:

   <P>                      Paragraph break
   <H1> ... </H1>           Heading level 1
   <H2> ... </H2>           Heading level 2
   <H3> ... </H3>           Heading level 3
   <H4> ... </H4>           Heading level 4
   <H5> ... </H5>           Heading level 5
   <H6> ... </H6>           Heading level 6
   <UL> ... </UL>           Bulleted (unordered) list
   <OL> ... </OL>           Numbered (ordered) list
   <LI>                     List item in a bulleted or
                            numbered list
   <DL> ... </DL>           Definition list
   <DT>                     Definition term
   <DD>                     Definition

Text formatting:

   <B> ... </B>             Bold text
   <I> ... </I>             Italic text

Miscellaneous:

   <TITLE> ... </TITLE>     Document title
   <BR>                     Line break
   <HR>                     Horizontal rule

Images and hyperlinks:

   <IMG SRC="url">          Inline image
   <A HREF="url"> ... </A>  Hyperlink to another document
   <A NAME="tag"> ... </A>  Bookmark within a document