Welcome to Australian PC User Magazine Offline CD-ROM

Net Guide - Whither HTML?

The same simplicity that gave HTML mass accessibility and appeal has also created all sorts of headaches for Web developers and surfers alike. What does the future hold for HTML? Rose Vines travels along its not-so-smooth path to find out.

The advent of the World Wide Web transformed the Internet. Before the Web, the Internet was merely a research and academic network. It was highly successful in what it did and well-used by an elite international community, but it didn't impinge on the general consciousness.

With the coming of the Web, the Internet became mass market. In an extraordinarily short period, the Web remade the Internet into an almost unavoidable feature of everyday life.

This astounding change was very largely due to the simplicity of hypertext markup language (HTML), the language used to create documents on the Web. HTML consisted of a handful of tags for 'marking up' text so it could be displayed with some basic formatting. A program called a Web browser was used to read these tags and display the formatted page.

The 'hypertext' part of HTML's name referred to the inclusion of document links, called hyperlinks. Clicking a link on a page would transport you to the destination page referred to by the link, and display that page in your browser.

Thus, the Web is a series of interlinked, formatted documents. While from day one HTML has done a pretty good job of linking documents, right from the start it's been a pretty rotten markup language.

 

Markup languages
Markup languages have been with us for a long time. The term 'markup' comes from the printing and publishing industries. In order to turn plain text into a final printed product complete with bold, italics, different typefaces, headings, structured tables and so on, editors marked up copy with standard notations telling the typesetter how to format each element on the page.

That practice continues today with computer-based desktop publishing. For instance, if I want the words 'markup language' to be printed in italics, as I type I'll enclose those words in italics tags: <I>markup language<I>. Our desktop publishing program will automatically strip out the tags and convert the enclosed text to italics, thus: markup language. If you've tried your hand at Web page authoring, you'll recognise the similarity between PC User's typesetting tags and HTML's own tags.

Markup language purists will point out that this is technically incorrect: the markup language is supposed to describe the structure of a document, not its presentation. Thus, a valid markup tag might take the form:

<page heading>Heading Text</page heading>

indicating that 'Heading Text' should be formatted in the standard page heading style. Note that the tag doesn't indicate what this style is: it doesn't specify 24 point, bold Garamond, for instance. Instead, it merely indicates which page element to use. The actual way to present page heading elements is separate from the markup language itself.

That's why you'll find <em> instead of <I> used in early HTML (and still supported by browsers today). The <em> tag indicates emphasis; it doesn't specify the style of emphasis to use (italics, bold, highlight), merely that the marked text is an emphasised element of the page.
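To make the distinction concrete, here is a small illustrative fragment. The sentence is invented for the example, and the fragment is a sketch rather than a recommended page.

```html
<html>
<head><title>Structure versus presentation</title></head>
<body>

<!-- Structural markup: says WHAT the text is, not how it should look. -->
<h1>Whither HTML?</h1>
<p>HTML is a <em>markup language</em>, not a page layout tool.</p>

<!-- Presentational markup: dictates the appearance directly. -->
<p>HTML is a <i>markup language</i>, not a page layout tool.</p>

</body>
</html>
```

Most graphical browsers will display the two paragraphs identically, because italics is the usual way of showing emphasis. The difference is that a browser is free to present the first one some other way, while the second leaves it no choice.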

In the computing world, the mother of all markup languages, Standard Generalised Markup Language (SGML), has its roots in the 1960s, when IBM developed its forerunner, GML; SGML itself became an international (ISO) standard in 1986. SGML is a meta-language: that is, it's a language which can be used to define other markup languages. SGML is complex, unwieldy, powerful and infinitely extensible, and it has been used by the military, newspapers, large organisations and academics to define document standards for a variety of purposes. With SGML, you can define markup for everything from a memo to a complete book.

Had SGML been used as the basis of the World Wide Web, chances are you'd not be surfing the Web today. That's because SGML is about as accessible as Annapurna. It also has no in-built linking support and has to resort to using another system, HyTime, to provide document links.

HTML, on the other hand, is about as simple as a markup language can be. It's at the other extreme from SGML; in fact, HTML is just one document type defined using SGML.

 

The problem with HTML
The original version of HTML provided for little more than headings, paragraph breaks and indented lists.

It didn't take long for people to start clamouring for a little more spice in their Web documents. In particular, they wanted to be able to display images as well as text. Marc Andreessen (who later went on to co-found Netscape) came to the party by adding an <img> tag to his Mosaic browser.

That was the start of the tag war. Browser developers started including support for new markup such as page backgrounds (an attribute of the <body> tag), <font> and the shamefully abused <blink>. It wasn't long before we saw <table> and <frame>. Microsoft's Internet Explorer weighed in with <marquee> and <bgsound>, neither supported by other browsers. Netscape replied with its very own <layer> tag. It wasn't very pretty.

In the meantime, a group of people at the World Wide Web Consortium (known as the W3C) was attempting to provide some sort of sanity in the form of HTML standards. A series of revised standards appeared, adding the most popular and workable new features already incorporated in the rival browsers.

The end result? A Web where site designers spend an aggravating amount of time designing multiple versions of their pages which can be viewed by numerous incompatible browsers. A Web where surfers stumble over sites that display poorly, if at all, in their own particular browser.

It's a mess and we're all to blame. There's no doubt that Microsoft and Netscape have been driven into a tag proliferation battle by the desire of Web users for something faster, neater, snazzier, louder, and more entertaining. The two companies have tried to lure us with new tags offering better content. At the same time the W3C, working at a comparatively snail-like pace, has tried to bring some order to the scene by revising the HTML standard. It's not surprising that a committee focussed on getting it right has been left behind by two highly competitive companies trying to get it delivered.

 

Structure and presentation
Apart from the strife engendered by non-standard HTML implementations, HTML has been suffering from another problem. As a markup language, HTML's original job was to define the structure of Web documents. But in the rush to produce a more riveting experience on the Web, this purpose has been lost. HTML as it now stands is being used to control presentation as well as structure.

Web designers, faced with the total lack of tools for controlling the layout of their pages, have forced HTML way beyond its bounds. The italics tag has replaced the emphasis tag; tables are used to position text and graphics; spacer graphics (1-pixel invisible GIF files) are used to create space between page elements. Look at the source code of most sites and you'll find HTML has been forced into Gumby-esque contortions.
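A typical example of these contortions looks something like the sketch below. The file names (index.htm, clear.gif) and the pixel widths are assumptions made up for illustration.

```html
<!-- A layout table: there is no tabular data here, only positioning. -->
<table border="0" cellpadding="0" cellspacing="0" width="600">
  <tr>
    <!-- Navigation column -->
    <td width="150" valign="top">
      <a href="index.htm">Home</a>
    </td>
    <!-- An invisible 1-pixel GIF stretched to 20 pixels to force a gutter -->
    <td width="20"><img src="clear.gif" width="20" height="1" alt=""></td>
    <!-- Main text column: the italics tag stands in for emphasis -->
    <td width="430" valign="top">
      <font face="Arial" size="2">This is <i>really</i> important.</font>
    </td>
  </tr>
</table>
```

Nothing in this markup describes what the content is. Every tag and attribute is there purely to push text and images into position.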

The situation is exacerbated by the addition of incompatible implementations of scripting languages and controls (JavaScript, VBScript, ActiveX) used to add a degree of interactivity to Web sites.

 

Fixing HTML: CSS and XML
The one good thing about HTML's current parlous state is that everyone -- from surfer to designer to standards-setter -- realises how hopeless things have become.

Microsoft and Netscape are promising to mend their ways and adhere to standard HTML. That hasn't happened yet, of course, but each company recognises that compatibility with HTML standards is a strong selling point.

The latest HTML standard from the W3C -- HTML 4.0 -- supports cascading style sheets. Style sheets allow Web page designers to separate presentation from structure, and give designers a much greater degree of control over the layout of their pages.
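As a minimal sketch of the idea, the page below keeps all of its presentation in a style sheet and all of its structure in the body. The class name 'intro' and the particular fonts and colours are assumptions chosen for the example.

```html
<html>
<head>
<title>Style sheet sketch</title>
<style type="text/css">
  /* Presentation lives here, in one place. */
  h1 { font-family: Garamond, serif; font-size: 24pt; font-weight: bold; }
  /* Emphasis shown as bold rather than the usual italics. */
  em { font-style: normal; font-weight: bold; }
  p.intro { margin-left: 2em; color: #333333; }
</style>
</head>
<body>
  <!-- Structure lives here: no fonts, sizes or colours in the markup. -->
  <h1>Whither HTML?</h1>
  <p class="intro">HTML is a <em>markup language</em>.</p>
</body>
</html>
```

Change the rules in the style sheet and the look of every heading and emphasised phrase changes with them, without touching the marked-up text.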

Unfortunately, browser support for style sheets is poor. Navigator 4 and Communicator 4 provide limited support; Internet Explorer 4's support is much better, but still incomplete and buggy; and most alternative browsers lack style sheet support altogether. This means that designers, who relish the control provided by style sheets, are having to adopt a 'slowly, slowly' approach.

Netscape and Microsoft are currently working on their fifth-generation browsers which, hopefully, will provide full support for the W3C's second Cascading Style Sheet specification, CSS2. CSS2 provides for extensive control over positioning of page elements, as well as control over fonts, colours, text spacing, interaction, and other stylistic features.

XML is another acronym you'll need to add to your Web lexicon. It stands for eXtensible Markup Language, and it's the next big step in evolving a Web language that can accommodate all the new uses -- multimedia, database publishing, interactive presentations, and so on -- that are appearing on the Web.

The key word in the acronym is, of course, extensible. Extensible is just what HTML isn't, even though we've tried to make it so by pummelling and pulling it out of shape.

XML is the bridge between the power and huge flexibility of SGML and the simplicity and rigidity of HTML. Unlike HTML, which is merely a single SGML document type, XML is a genuine subset of SGML. Like SGML, XML is a meta-language: a language which can be used to define other languages.

 

 

New dialects: CDF, CML, SMIL
Because it can be used to define other languages, XML provides almost limitless scope. Already, it has been used by Microsoft as the basis for its push content format, Channel Definition Format (CDF). XML has also been used to create Chemical Markup Language (CML), a markup language which lets scientists and researchers publish documents containing chemical symbols and formulae. In April of this year the W3C released MathML (Mathematical Markup Language) as the first application of XML that it has recommended.

Yet another XML by-product, Synchronised Multimedia Integration Language (SMIL), is waiting in the wings. SMIL will allow developers to produce multimedia presentations on the Web. It lets developers separate text, audio, static images and video into separate streams, and then combine them. SMIL provides control over the timing of the display of the various streams, much as current presentation software does in desktop applications.

CDF, CML, MathML and SMIL signal the beginning of what may turn out to be a flood of XML applications, which will bring new capabilities to the Web. XML languages will offer enormous benefits to vertical markets and specialised industries.

At the same time, there's a danger that these specialised applications will undo one of the key features of the Web as we know it: its broad accessibility. If the main browser vendors have trouble producing browsers that comply fully with the HTML standard, what will happen when there are dozens of specialised XML-based languages? And how will the browser vendors themselves respond to the huge flexibility of XML? If we're bothered by tag proliferation now, what happens when vendors have a tool like XML in their hands?

 

Coping with HTML
While questions about XML are being raised by the technical Web community, HTML is showing no signs of disappearing immediately. According to the W3C, HTML is here "for several years to come". HTML 4 won't be the last version, and we'll see HTML and XML coexist for some time. Hopefully during that time Microsoft and Netscape will get in sync with one another and with the developing HTML and CSS standards.

In the meantime, we still have to cope with incompatible browsers, so what's the best way to approach a Web without true HTML standards?

If you design your own Web pages, you need to assess your audience and design for the lowest common denominator. That doesn't mean avoiding all the new features supported by the latest browsers. It does mean offering versions of your site that are readable by browsers which may be incapable of understanding Dynamic HTML, Cascading Style Sheets, JavaScript and all the latest tricks.

At the same time, you should be learning how to separate style from content. Use Cascading Style Sheets -- in the long run they'll make your life infinitely easier. In the short term, you can write (or borrow -- there are plenty free on the Net) a JavaScript routine that determines which browser each visitor is using. You can automatically shunt people with incompetent browsers off to your alternative site. Or you can wrap the code inside your <script> and <style> tags in HTML comments, so that browsers which don't understand those tags ignore the code instead of displaying it as text on the page.
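Both techniques can be combined on one page, roughly as sketched here. This is an illustration only: the page name basic.htm and the rule of treating version 4 and later as capable are assumptions, and a real routine would need to look more closely at each browser.

```html
<html>
<head>
<title>Browser check sketch</title>
<style type="text/css">
<!--
  h1 { color: navy; }
-->
</style>
<script type="text/javascript">
<!-- hide the code from older browsers
  // Assumed rule: anything reporting a version below 4 gets the plain site.
  var version = parseInt(navigator.appVersion, 10);
  if (isNaN(version) || version < 4) {
    // basic.htm is a hypothetical name for the alternative page
    window.location.href = "basic.htm";
  }
// end hiding -->
</script>
</head>
<body>
<h1>Welcome</h1>
<!-- Browsers without scripting never run the check, so offer a link too -->
<noscript>
<p>A <a href="basic.htm">plain version</a> of this site is available.</p>
</noscript>
</body>
</html>
```

The comment markers mean a browser that has never heard of style sheets or scripts simply skips the code, while a browser that does understand them carries on regardless.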

If you're a surfer rather than a designer, there's one big thing you can do for yourself: install the very best browser your hardware will support. Get the latest version of Navigator, Communicator or Internet Explorer if possible. If you're pressed for hard disk space, try Opera (www.operasoftware.com), a compact and very fast browser. Opera 4 promises to support both Cascading Style Sheets and Java, and is due out in July or August.

 

[Image: HTML01s.gif] The Web as it used to look. The original version of HTML appeared in shades of grey, with standard sized headings and little to alleviate the text, apart from underlined and (sometimes) coloured links.


[Image: HTML02s.gif] The Web as it looks today. Even largely text-based sites look totally different when they employ the latest features of HTML and style sheets. Web designers can control the appearance and placement of text and graphics, as well as adding elements such as animation, sound and interaction.
