A Quick Guide to HTML
by John English
This is a quick-and-dirty guide to writing HTML documents, written specifically for BURKS. The "Try It" button at the bottom brings up a test window where you can type in some HTML and see the formatted result, so you can experiment with different features of HTML as I explain them. Only basic HTML features are covered here, but there are more complete tutorials available:
A good thing to do when reading this document is to use your browser's "View Document Source" or "View Frame Source" command (on the View menu) so that you can see the raw HTML used in this document and compare it to the way your browser displays it. You can also copy examples from the text into the test window; press the "Try It" button at the bottom to bring up the test window, select some text from this document using your mouse, then press Control-C to copy it to the Windows clipboard. Now click on the text entry box in the test window and press Control-V to paste in the text from the clipboard.
HTML is the "markup language" used by web browsers to display documents. A web browser treats text as a continuous sequence of words separated by "white space" (one or more spaces, tabs or line breaks) and displays it according to the width of the display window, using "word wrapping" to put as many words as possible on each line before starting the next one. Changing the width of the window will reformat the text so it still fits inside the window (try it!).
Since line breaks are ignored, your document will end up as one long continuous paragraph if you don't do anything about it, regardless of how you laid it out when you wrote it. To tell the browser to start a new paragraph, you have to use markup tags which will be interpreted specially. HTML markup tags are written inside angle brackets "<...>"; the tag to tell a browser to start a new paragraph is <P>. It doesn't matter if you use capitals or not for tags, so <p> means the same thing as <P>.
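Here is a rough sketch of the point above, which you can paste into the test window. The sample sentences are made up for illustration. The text is typed on several lines with uneven spacing, but a browser should show it as just two paragraphs, because only the <P> tag starts a new one. Anything between <!-- and --> is a comment, which browsers do not display:

```html
<!-- Line breaks and extra spaces in the source count as ordinary white space -->
This   sentence is typed
on three separate
lines.
<P>
<!-- Only the P tag above starts a new paragraph -->
This sentence starts a second paragraph.
```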
You can also use markup tags to tell the browser about special formatting requirements (bold or italic text, and so on):
<B> ... </B> | Text between <B> and </B> will be displayed as bold text |
<I> ... </I> | Text between <I> and </I> will be displayed as italic text |
HTML tags are almost always used in pairs, like brackets; the closing tag is the same as the opening tag but preceded by "/", so <B> is the opening "boldface" tag and </B> is the closing "boldface" tag, and so on.
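As a small illustration of pairing (the sample sentence is invented), tags can be placed inside one another. It is safest to close them in the reverse of the order in which they were opened, just as you would with brackets:

```html
<!-- The inner I pair is closed before the outer B pair -->
<B>This is bold, <I>this is bold and italic,</I> and this is bold again.</B>
<P>
<!-- P is one of the few tags used here without a closing partner -->
This is plain text in a new paragraph.
```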
Because the characters "<" and ">" and a few others are treated specially by browsers, you have to encode them like this:
To display this: | write this: |
< | &lt; |
> | &gt; |
& | &amp; |
" | &quot; |
Any tags that a browser doesn't recognise will just be ignored, so that if you forget to encode "<" as "&lt;" the browser will treat what follows as a tag. If it doesn't recognise the text after "<" as a valid tag, everything up to the next occurrence of ">" will be ignored, which means that a chunk of your text will just disappear completely. The easiest way to write HTML is to use an HTML editor, which will take care of all these details automatically.
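To make the encoding concrete, here is a sketch using an invented line of program-like text. Each special character is replaced by its code: &lt; for "<", &gt; for ">", &amp; for "&" and &quot; for a double quote.

```html
<!-- Each special character below is written using its code -->
if (a &lt; b &amp;&amp; c &gt; d) print &quot;yes&quot;
```

A browser should display this line as: if (a < b && c > d) print "yes"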
An HTML document is actually divided into two parts: a header (which is not displayed) and a body (the text that is actually displayed in the browser window). The overall structure looks like this:
<HTML>      -- start of HTML document
<HEAD>      -- start of document header
...         -- header contents
</HEAD>     -- end of header
<BODY>      -- start of document body
...         -- body contents
</BODY>     -- end of body
</HTML>     -- end of document
The only thing the document header needs to contain is a document title which will be displayed in the browser's title bar. A title is enclosed in <TITLE> ... </TITLE> like this:
<TITLE>This is a document title</TITLE>
In fact, the document structure tags given above (<HTML>, <HEAD> and <BODY>) are normally ignored by browsers; usually, as soon as a browser sees anything which can't be part of the document header, it assumes that it's got to the document body and starts displaying text in the browser window. All the same, it's good practice to put these tags in since some browsers might require them.
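Putting the pieces together, a complete document might look something like the sketch below. The title and body text are placeholders I have invented for the example; only the arrangement of the tags matters.

```html
<HTML>
<HEAD>
<!-- The title appears in the browser's title bar, not in the page itself -->
<TITLE>My first document</TITLE>
</HEAD>
<BODY>
<!-- Everything in the body is displayed in the browser window -->
This is the first paragraph of the body.
<P>
This is the second paragraph.
</BODY>
</HTML>
```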
To provide headings like the one immediately above, you can use the tag <H1> ... </H1>. The text in between is displayed as a separate paragraph in a large font. For example, if you write this:
<H1>A Level 1 Heading</H1>
it will be displayed like this:
Level 1 headings like this are normally only used at the start of a document. There are five other levels for subheadings:
<H2>A Level 2 Heading</H2>
<H3>A Level 3 Heading</H3>
<H4>A Level 4 Heading</H4>
<H5>A Level 5 Heading</H5>
<H6>A Level 6 Heading</H6>
which will be displayed like this:
Sometimes you want text to be displayed exactly as you've written it (e.g. program code). To do this, enclose the text in <PRE> ... </PRE> like this:
<PRE>
This text will be displayed exactly as it was typed
    including any indentation
    or   alignment   into   columns
    like this

Blank lines are also possible

You can still use <B>bold text</B> or <I>italic text</I>
in preformatted text.
</PRE>
This will be displayed as:
This text will be displayed exactly as it was typed
    including any indentation
    or   alignment   into   columns
    like this

Blank lines are also possible

You can still use bold text or italic text
in preformatted text.
If you want to write a bulleted list, you enclose the entire list in <UL> ... </UL> and then start individual list items with <LI>. For example:
<UL>
<LI>List item 1
<LI>List item 2
</UL>
will be displayed like this:
To produce a numbered list instead of a bulleted list, use <OL> ... </OL> instead of <UL> ... </UL>:
<OL>
<LI>List item 1
<LI>List item 2
</OL>
will be displayed like this:
You can also produce definition lists using <DL> ... </DL>. Each entry in a definition list is in two parts: a definition term which begins with <DT> and a definition part which begins with <DD>. For example, here is an extract from a glossary of terms elsewhere on this CD:
<DL>
<DT>BTW
<DD>"By the way"
<DT>RTFM
<DD>"Read the f***ing manual" (yes, really...)
</DL>
which will be displayed like this:
Here are a couple more useful tags to round things off:
<HR> | A horizontal rule (like the one above the heading for this section) |
<BR> | A line break |
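As a sketch of how these two tags might be used, here is an invented name and address laid out one line at a time, followed by a rule:

```html
<!-- BR starts a new line without starting a new paragraph -->
John Smith<BR>
1 Example Street<BR>
Anytown
<!-- HR draws a line across the window to separate what follows -->
<HR>
Text after the rule.
```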
To include an image, you need to have the image available in a .GIF or .JPG (JPEG) file. To reference the file you use an IMG tag, like this:
<IMG SRC="filename.gif">
This will display the image in the file filename.gif as part of the current paragraph. If you want the image to be displayed as a separate paragraph, start a new paragraph before and after the IMG tag, or put line breaks (<BR>) before and after.
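For instance, assuming the same file filename.gif, the following sketch uses paragraph breaks to put the image in a paragraph of its own between two pieces of text (the surrounding text is just filler):

```html
Text before the image.
<!-- A paragraph break before and after keeps the image separate from the text -->
<P>
<IMG SRC="filename.gif">
<P>
Text after the image.
```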
There's an example of this at the very beginning of the document. Slightly simplified (use "Frame source" from the "View" menu to see the whole truth), it looks like this:
which is produced by the following markup:
<DL>
<DT><IMG SRC="../../../link.gif"> A Beginner's Guide to HTML
<DD>A good introduction to HTML from NCSA. It's a single HTML document, so it's easy to save a copy for offline viewing.
</DL>
The image is in the file link.gif in the directory three levels above the current one (standard Unix filename conventions are used, so directory names are separated by "/" and ".." means "the directory above this one").
In fact, the filename can be any URL (Uniform Resource Locator) so that it can be on any accessible machine anywhere in the world. URLs are described more fully below.
Hypertext links are what make web documents so powerful. A link like this can be used to reference another document, which can be another local file or (like an image) it can be another document anywhere in the world.
Links are generated by using anchor tags. The link above is written like this in HTML:
<A HREF="../.././welldone.htm">like this</A>
The text between <A> and </A> is highlighted by the browser, and when you click on it the browser goes to the file specified by the HREF part of the tag (in this case, the file welldone.htm in the directory two levels above this one). Simple, isn't it?
You can also use images as hypertext links:
Pressing the "button" will take you to another document. This was done with the following markup:
<A HREF="../../../welldone.htm"><IMG SRC="../../../link.gif"></A> Press me!
If you want to link to a specific section in a document, you need to put #section after the filename, which will go to the section called section in the specified document:
<A HREF="somefile.htm#index">The index in some file</A>
If the reference is to a section of the current document, you just use #section on its own:
<A HREF="#contents">Go to the table of contents</A>
which will be displayed like this:
To attach a section name to part of a document, you need to use another variation of the <A> tag:
<A NAME="section-name">Some text</A>
For example, the bookmark "contents" was attached to the heading for the table of contents at the beginning of this document like this:
<P><B><A NAME="contents">Contents:</A></B>
This has no visible effect on the text. All the section headings in this document have bookmarks attached, which are referenced from the table of contents at the start of the document.
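Here is a sketch showing both halves working together in a single document. The bookmark name "summary" and the text are invented for the example. Note that the "#" is used in the HREF that refers to the bookmark but not in the NAME that defines it, and it is safest to spell the name identically in both places.

```html
<!-- A link near the top of the document that jumps to the bookmark below -->
<A HREF="#summary">Go to the summary</A>
<P>
... the rest of the document ...
<!-- The bookmark itself, attached to a heading -->
<H2><A NAME="summary">Summary</A></H2>
```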
As I mentioned earlier, images and hypertext links can both use Uniform Resource Locators (URLs) which can reference documents all over the world. A typical URL looks like this:
http://www.comp.it.bton.ac.uk/je/burks.html
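which references the front page for the online copy of BURKS at the University of Brighton. The URL consists of a protocol (http), the name of a server (www.comp.it.bton.ac.uk) and the name of a document on that server (je/burks.html).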
In general, a URL looks like this:
protocol://server/document
URLs can refer to many different Internet protocols besides HTTP: FTP, mail and Usenet news are among the commonest. The formats for these are as follows:
ftp://server/filename -- transfer filename from server using anonymous FTP
mailto:user@site -- send email to the email address user@site
news:groupname -- connect to the newsgroup groupname
For example:
ftp://ftp.brighton.ac.uk/pub/je/adacraft/adacraft.zip -- get the file adacraft.zip from the directory pub/je/adacraft by anonymous FTP from ftp.brighton.ac.uk
mailto:je@brighton.ac.uk -- send email to John English (je) at Brighton University (brighton.ac.uk)
news:comp.lang.ada -- read the newsgroup comp.lang.ada
If you leave out the protocol and server name, the protocol and server name from the current URL will be assumed. So by leaving out the protocol and server name and just providing a file name, you end up referring to a file whose location is relative to the document containing the link. The full gory details are described in RFC 1738 elsewhere on this CD. (Note that the link to RFC 1738 is specified like this:
<A HREF="../rfc1738.htm">
or in other words, the file rfc1738.htm in the directory above the one where this document is located.)
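To compare the different forms side by side, here is a sketch of four links. The file name notes.htm is hypothetical; the other addresses are the ones used as examples in this section.

```html
<!-- Full URL: protocol, server and document are all given -->
<A HREF="http://www.comp.it.bton.ac.uk/je/burks.html">BURKS online</A>
<BR>
<!-- Relative URL: a file in the same directory as this document -->
<A HREF="notes.htm">Some notes</A>
<BR>
<!-- Relative URL: a file in the directory above this one -->
<A HREF="../rfc1738.htm">RFC 1738</A>
<BR>
<!-- A different protocol: this link sends email instead of fetching a document -->
<A HREF="mailto:je@brighton.ac.uk">Send email to John English</A>
```

The two relative links keep working if the whole set of files is moved to another machine or directory, which is why they are usually the better choice for files that belong together.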
Here's a quick roundup of the HTML tags covered in this document:
Paragraph types:
<P> | Paragraph break |
<H1> ... </H1> | Heading level 1 |
<H2> ... </H2> | Heading level 2 |
<H3> ... </H3> | Heading level 3 |
<H4> ... </H4> | Heading level 4 |
<H5> ... </H5> | Heading level 5 |
<H6> ... </H6> | Heading level 6 |
<UL> ... </UL> | Bulleted (unordered) list |
<OL> ... </OL> | Numbered (ordered) list |
<LI> | List item in a bulleted or numbered list |
<DL> ... </DL> | Definition list |
<DT> | Definition term |
<DD> | Definition |
Text formatting
<B> ... </B> | Bold text |
<I> ... </I> | Italic text |
Miscellaneous
<TITLE> ... </TITLE> | Document title |
<BR> | Line break |
<HR> | Horizontal rule |
Hyperlinks
<IMG SRC="url"> | Inline image |
<A HREF="url"> ... </A> | Hyperlink to another document |
<A NAME="tag"> ... </A> | Bookmark within a document |