


 The same simplicity that
gave HTML mass accessibility and appeal has also created all sorts of headaches for Web
developers and surfers alike. What does the future hold for HTML? Rose Vines travels along
its not-so-smooth path to find out.
The advent of the World Wide Web transformed the Internet. Before the
Web, the Internet was merely a research and academic network. It was highly successful in
what it did and well-used by an elite international community, but it didn't impinge on
the general consciousness.
With the coming of the Web, the Internet became
mass market. In an extraordinarily short period, the Web remade the Internet into an
almost unavoidable feature of everyday life.
This astounding change was very largely due to
the simplicity of hypertext markup language (HTML), the language used to create documents
on the Web. HTML consisted of a handful of tags for 'marking up' text so it could
be displayed with some basic formatting. A program called a Web browser was used
to read these tags and display the formatted page.
The 'hypertext' part of HTML's name referred to
the inclusion of document links, called hyperlinks. Clicking a link on a page
would transport you to the destination page referred to by the link, and display that page
in your browser.
Thus, the Web is a series of interlinked,
formatted documents. While from day one HTML has done a pretty good job of linking
documents, right from the start it's been a pretty rotten markup language.
Markup languages
Markup languages have been with us for a long time. The term 'markup' comes from the
printing and publishing industries. In order to turn plain text into a final printed
product complete with bold, italics, different typefaces, headings, structured tables and
so on, editors marked up copy with standard notations telling the typesetter how to format
each element on the page.
That practice continues today with computer-based desktop publishing. For instance, if I
want the words 'markup language' to be printed in italics, as I type I'll enclose those
words in italics tags: <I>markup language</I>. Our desktop publishing program will
automatically strip out the tags and convert the enclosed text to italics, thus: markup language.
If you've tried your hand at Web page authoring, you'll recognise the similarity between
PC User's typesetting tags and HTML's own tags.
Markup language purists will point out that marking text as italic in this way is
technically incorrect: a markup language is supposed to describe the structure
of a document, not its presentation. Thus, a valid markup tag might take the form:
<page heading>Heading Text</page heading>
indicating that 'Heading Text' should be formatted in the
standard page heading style. Note that the tag doesn't indicate what this style is: it
doesn't specify 24 point, bold Garamond, for instance. Instead, it merely indicates which
page element to use. The actual way to present page heading elements is separate from the
markup language itself.
That's why you'll find <em> instead of <I> used
in early HTML (and still supported by browsers today). The <em> tag indicates
emphasis; it doesn't specify the style of emphasis to use (italics, bold, highlight),
merely that the marked text is an emphasised element of the page.
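As a minimal sketch of the distinction (the sentence itself is invented for illustration), the two paragraphs below will usually look identical in a graphical browser, yet only the first says anything about what the marked text is:

```html
<!-- Structural markup: the phrase is emphasised; how to show that is left to the browser -->
<p>HTML is a <em>markup language</em>.</p>

<!-- Presentational markup: the phrase is to be set in italics, and nothing more -->
<p>HTML is a <i>markup language</i>.</p>
```

Most browsers happen to show emphasis as italics, but nothing in the first paragraph obliges them to.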
In the computing world, the mother of all markup languages,
Standard Generalised Markup Language (SGML), grew out of work begun in the 1960s and became an international standard (ISO 8879) in 1986. SGML is a
meta-language: that is, it's a language which can be used to define other markup
languages. SGML is complex, unwieldy, powerful and infinitely extensible, and it has been
used by the military, newspapers, large organisations and academics to define document
standards for a variety of purposes. With SGML, you can define markup for everything from
a memo to a complete book.
Had SGML been used as the basis of the World Wide Web,
chances are you'd not be surfing the Web today. That's because SGML is about as accessible
as Annapurna. It also has no in-built linking support and has to resort to using another
system, HyTime, to provide document links.
HTML, on the other hand, is about as simple as a markup
language can be. It's at the other extreme from SGML; in fact, HTML is a single type of
SGML document.
The problem with HTML
The original version of HTML provided for little more than headings, paragraph breaks and
indented lists.
It didn't take long for people to start clamouring for a little more spice in their Web
documents. In particular, they wanted to be able to display images as well as text. Marc
Andreessen (who later went on to co-found Netscape) came to the party by adding an <img>
tag to his Mosaic browser.
That was the start of the tag war. Browser developers started
including support for new tags such as <background> and <font> and the
shamefully abused <blink>. It wasn't long before we saw <table> and
<frame>. Microsoft's Internet Explorer weighed in with <marquee> and
<bgsound>, neither supported by other browsers. Netscape replied with its very own
<layer> tag. It wasn't very pretty.
In the meantime, a group of people at the World Wide Web
Consortium (known as the W3C) was attempting to provide some sort of sanity in the form of
HTML standards. A series of revised standards appeared, adding the most popular and
workable new features already incorporated in the rival browsers.
The end result? A Web where site designers spend an
aggravating amount of time designing multiple versions of their pages which can be viewed
by numerous incompatible browsers. A Web where surfers stumble over sites that display
poorly, if at all, in their own particular browser.
It's a mess and we're all to blame. There's no doubt that
Microsoft and Netscape have been driven into a tag proliferation battle by the desire of
Web users for something faster, neater, snazzier, louder, and more entertaining. The two
companies have tried to lure us with new tags offering better content. At the same time
the W3C, working at a comparatively snail-like pace, has tried to bring some order to the
scene by revising the HTML standard. It's not surprising that a committee focussed on
getting it right has been left behind by two highly competitive companies trying to get it
delivered.
Structure and presentation
Apart from the strife engendered by non-standard HTML implementations, HTML has been
suffering from another problem. As a markup language, HTML's original job was to define
the structure of Web documents. But in the rush to produce a more riveting
experience on the Web, this purpose has been lost. HTML as it now stands is being used to
control presentation as well as structure.
Web designers, faced with the total lack of tools for controlling the layout of their pages,
have forced HTML way beyond its bounds. The italics tag has replaced the emphasis tag;
tables are used to position text and graphics; spacer graphics (1-pixel invisible GIF
files) are used to create space between page elements. Look at the source code of most
sites and you'll find HTML has been forced into Gumby-esque contortions.
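A typical contortion looks something like the sketch below. The file name spacer.gif and the pixel widths are assumptions made for illustration; the pattern is a table holding no tabular data, an invisible image stretched to hold a column open, and italics standing in for emphasis.

```html
<!-- A layout table: it contains no tabular data and exists only to position things -->
<table border="0" cellpadding="0" cellspacing="0" width="600">
  <tr>
    <!-- spacer.gif (hypothetical file) is a 1-pixel transparent GIF stretched to 150 pixels wide -->
    <td width="150"><img src="spacer.gif" width="150" height="1" alt=""></td>
    <!-- The italics tag doing the job of the emphasis tag -->
    <td><i>Welcome</i> to our site.</td>
  </tr>
</table>
```

None of this markup describes the structure of the page; every tag and attribute is there purely for its visual side-effects.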
The situation is exacerbated by the addition of incompatible
implementations of scripting languages and controls (JavaScript, VBScript, ActiveX) used
to add a degree of interactivity to Web sites.
Fixing HTML: CSS and XML
The one good thing about HTML's current parlous state is that everyone -- from surfer to
designer to standards-setter -- realises how hopeless things have become.
Microsoft and Netscape are promising to mend their ways and
adhere to standard HTML. That hasn't happened yet, of course, but each company recognises
that compatibility with HTML standards is a strong selling point.
The latest HTML standard from the W3C -- HTML 4.0 -- supports
cascading style sheets. Style sheets allow Web page designers to separate presentation
from structure, and give designers a much greater degree of control over the layout of
their pages.
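A minimal sketch of that separation, reusing the 24 point bold Garamond heading mentioned earlier (the page content and the yellow highlight are invented for illustration): the style element carries the presentation, while the body contains nothing but structure.

```html
<html>
<head>
<title>Style sheet sketch</title>
<style type="text/css">
  /* Presentation lives here, apart from the content */
  h1 { font-family: Garamond, serif; font-size: 24pt; font-weight: bold; }
  /* Emphasis shown as a highlight rather than italics */
  em { font-style: normal; background-color: yellow; }
</style>
</head>
<body>
<h1>Heading Text</h1>
<p>This paragraph contains an <em>emphasised</em> phrase.</p>
</body>
</html>
```

Changing the look of every heading or emphasised phrase then means editing one rule rather than every page element.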
Unfortunately, browser support for style sheets is poor.
Navigator 4 and Communicator 4 provide limited support; Internet Explorer 4's support is
much better, but still incomplete and buggy; and most alternative browsers lack style
sheet support altogether. This means that designers, who relish the control provided by
style sheets, are having to adopt a 'slowly, slowly' approach.
Netscape and Microsoft are currently working on their
fifth-generation browsers which, hopefully, will provide full support for the W3C's second
Cascading Style Sheet specification, CSS2. CSS2 provides for extensive control over
positioning of page elements, as well as control over fonts, colours, text spacing,
interaction, and other stylistic features.
XML is another acronym you'll need to add to your Web
lexicon. It stands for eXtensible Markup Language, and it's the next big step in evolving
a Web language that can accommodate all the new uses -- multimedia, database publishing,
interactive presentations, and so on -- that are appearing on the Web.
The key word in the acronym is, of course, extensible.
Extensible is just what HTML isn't, even though we've tried to make it so by pummelling
and pulling it out of shape.
XML is the bridge between the power and huge flexibility of
SGML and the simplicity and rigidity of HTML. Unlike HTML, which is merely a single SGML
document type, XML is a genuine subset of SGML. Like SGML, XML is a meta-language: a
language which can be used to define other languages.
New dialects: CDF, CML, SMIL
Because it can be used to define other languages, XML provides almost limitless scope.
Already, it has been used by Microsoft as the basis for its push content format, Channel
Definition Format (CDF). XML has also been used to create Chemical Markup Language (CML),
a markup language which lets scientists and researchers publish documents containing
chemical symbols and formulae. In April of this year the W3C released MathML (Mathematical
Markup Language) as the first application of XML that it has recommended.
Yet another XML by-product, Synchronised Multimedia Integration
Language (SMIL), is waiting in the wings. SMIL will allow developers to produce multimedia
presentations on the Web. It lets developers separate text, audio, static images and video
into separate streams, and then combine them. SMIL provides control over the timing of the
display of the various streams, much as current presentation software does in
desktop applications.
CDF, CML, MathML and SMIL signal the beginning of what may
turn out to be a flood of XML applications, which will bring new capabilities to the Web.
XML languages will offer enormous benefits to vertical markets and specialised industries.
At the same time, there's a danger that these specialised
applications will undo one of the key features of the Web as we know it: its broad
accessibility. If the main browser vendors have trouble producing fully HTML
standard-compliant browsers, what will happen when there are dozens of specialised
XML-based languages? And what will browser vendors themselves do in response to the huge
flexibility of XML? If we're bothered by tag proliferation now, what happens when vendors
have a tool like XML in their hands?
Coping with HTML
While questions about XML are being raised by the technical Web community, HTML is showing
no signs of disappearing immediately. According to the W3C, HTML is here "for several
years to come". HTML 4 won't be the last version, and we'll see HTML and XML coexist
for some time. Hopefully during that time Microsoft and Netscape will get in sync with one
another and with the developing HTML and CSS standards.
In the meantime, we still have to cope with incompatible
browsers, so what's the best way to approach a Web without true HTML standards?
If you design your own Web pages, you need to assess your
audience and design for the lowest common denominator. That doesn't mean avoiding all the
new features supported by the latest browsers. It does mean offering versions of your site
that are readable by browsers which may be incapable of understanding Dynamic HTML,
Cascading Style Sheets, JavaScript and all the latest tricks.
At the same time, you should be learning how to separate
style from content. Use Cascading Style Sheets -- in the long run they'll make your life
infinitely easier. In the short term, you can write (or borrow -- there are plenty free on
the Net) a JavaScript routine that determines which browser each visitor is using. You can
automatically shunt people with incompetent browsers off to your alternative site. Or you
can wrap the code between your <script> and <style> tags in HTML comments so that it
remains invisible to such browsers.
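The two techniques can be sketched together, with some caveats. The file name plain.html and the rule that 'version 4 or later can cope' are assumptions made for illustration, not a recommended routine. The comment markers inside the script and style elements are the usual way of hiding their contents from browsers that don't recognise those tags.

```html
<html>
<head>
<title>Browser check sketch</title>
<style type="text/css">
<!--
  /* Older browsers treat everything between the comment markers as an HTML comment */
  h1 { color: navy; }
-->
</style>
<script type="text/javascript">
<!-- hide the script from older browsers
  // navigator.appVersion begins with the browser's version number
  var version = parseInt(navigator.appVersion, 10);
  // Assumption for this sketch: anything reporting less than version 4 gets the simpler site
  if (isNaN(version) || version < 4) {
    // plain.html is a hypothetical alternative version of this page
    window.location.href = "plain.html";
  }
// -->
</script>
</head>
<body>
<h1>Welcome</h1>
<p>This is the full version of the page.</p>
</body>
</html>
```

A browser with no JavaScript support never runs the check, so it stays on this page. That is why the page itself should still be readable without scripts or style sheets.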
If you're a surfer rather than a designer, there's one big
thing you can do for yourself: install the very best browser your hardware will support.
Get the latest version of Navigator, Communicator or Internet Explorer if possible. If
you're pressed for hard disk space, try Opera (www.operasoftware.com),
a compact and very fast browser. Opera 4 promises to support both Cascading Style Sheets
and Java, and is due out in July or August.


The Web as it used to look. The original
version of HTML appeared in shades of grey, with standard sized headings and little to
alleviate the text, apart from underlined and (sometimes) coloured links.

The Web as it looks today. Even largely text-based sites look totally
different when they employ the latest features of HTML and style sheets. Web designers can
control the appearance and placement of text and graphics, as well as adding elements such
as animation, sound and interaction.