IBM alphaWorks
XML Education: Online Courses

XML Education for Programmers

Ralf I Pfeiffer
IBM XML Technology Group




Tutorial 1

Overview of XML

What is XML? (Yet another markup language?)

We are already familiar with markup tags from HTML. For example, Web authors often use tables to present their product information.

This is an HTML source code example, illustrating a product catalog entry as presented on a corporate Web site:

<TABLE BORDER=1>
        <TR>
                <TH>Product ID</TH>
                <TH>Description</TH>
                <TH>Price</TH>
        </TR>
        <TR>
                <TD>12345678-Q</TD>
                <TD>Thinkpad 2000D</TD>
                <TD>$999.99</TD>
        </TR>
</TABLE>

Your HTML browser renders the example above like so:

Product ID  | Description    | Price
------------+----------------+--------
12345678-Q  | Thinkpad 2000D | $999.99

XML uses markup tags as well but, unlike HTML tags, XML tags describe the content rather than the presentation of that content. So, instead of using <TABLE> in the example above, we would define our own tag called <product>. We can then find a specific product in any document that follows this markup convention, and we can distinguish products from the many other kinds of data that are presented as HTML tables.

In XML, you define your own tags, and you can also define your own attributes for them. Using one child tag per field, the same example above could be coded in XML as:

<product>
        <id>12345678-Q</id>
        <description>Thinkpad 2000D</description>
        <price>$999.99</price>
</product>

Ironically, by keeping formatting tags out of the data and instead marking the meaning of the data itself with <product>, we actually make it easier for a client to search various individual servers for a product and receive a product list tailored to the preferences of the user.
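As a rough illustration of such a search, the sketch below parses a small catalog and collects the descriptions of products that match a keyword. It uses the standard JAXP and DOM classes that ship with Java. The <catalog> wrapper tag, the second product, and the class and method names are assumptions made up for this example; they are not part of the tutorial.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ProductSearch {
    // Hypothetical catalog: the <catalog> wrapper and the second product are invented.
    static final String CATALOG =
        "<catalog>"
      + "<product><id>12345678-Q</id><description>Thinkpad 2000D</description>"
      + "<price>$999.99</price></product>"
      + "<product><id>87654321-Z</id><description>Example Mouse</description>"
      + "<price>$19.99</price></product>"
      + "</catalog>";

    // Returns the description of every <product> whose description contains the keyword.
    static List<String> descriptionsMatching(String xml, String keyword) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        List<String> hits = new ArrayList<>();
        // The tag name itself tells us which parts of the document are products.
        NodeList products = doc.getElementsByTagName("product");
        for (int i = 0; i < products.getLength(); i++) {
            Element product = (Element) products.item(i);
            String description =
                product.getElementsByTagName("description").item(0).getTextContent();
            if (description.toLowerCase().contains(keyword.toLowerCase())) {
                hits.add(description);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(descriptionsMatching(CATALOG, "thinkpad")); // prints [Thinkpad 2000D]
    }
}
```

Nothing in the search depends on how the catalog would be displayed; the same lookup against an HTML table would have to guess which column holds the description.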

Figure 1 - XML makes data accessible from any device.


In Figure 1 above, note how meaningful searches can be applied to XML data, and the result can be rendered differently, depending on the destination device. Note also that the XML processor can exist on the server, the client, or both.

Defining what your data means with XML tags, in the natural vocabulary of your data's domain, is the key motivation for XML's invention and the basis of its usefulness.

Where does XML come from? What is its status?

XML is a simplified subset of the Standard Generalized Markup Language (SGML). SGML was standardized in 1986, based on the Generalized Markup Language invented by IBM in 1969. XML was simplified for more general use on the Web and as a data interchange format. The simplifications don't detract from XML's extensibility, but make it easier for anyone to write valid XML. It has also been simplified so that a parser can easily and quickly verify that documents are well-formed and valid.

The XML specification is an accepted recommendation awaiting formal acceptance as a standard by the W3C. The sister DOM specification, for programmatic access to XML documents, is a proposed recommendation of the W3C. The DOM specification was developed more recently, but most XML parsers, such as the IBM XML for Java (XML4J) Parser, have been tracking the latest DOM specification.

As a technology, XML is in the unique position of being embraced by all of the leaders in the computer industry. Also, many vertical industries are embracing XML for its ability to expedite the availability of their domain-specific information for internal and external use.

There are a number of W3C-proposed extensions to XML. Most of the proposed extensions use the XML language, which minimizes the differences in syntax that must be learned. These extensions are beyond the scope of this tutorial, but are listed in our Appendix A - Proposed Extensions to XML.

Why does my manager want XML?

From the business perspective, almost any type of data can be represented as XML, with a grammar to describe its structure.

These are a few business areas that will benefit immediately from XML use:

e-commerce

In the business domain, there will soon be specific XML languages to describe orders, transactions, inventory, and billing. These open XML languages will allow manufacturers, retailers, and consumers, and even banking and accounting systems, to share the same data.

Figure 2 - Claire starts a whole chain of XML transactions.


Meaningful searches, platform independence

If Web data is encoded in XML, describing the meaning of the actual content, suddenly customers can find your product or service, and purchase it, without regard for the platform that serves the data, or the platform that the client is using.

Data accessibility

Some proposed XML extensions, such as Document Content Description (DCD), allow XML documents to encode the datatype information and relationships found in existing databases. Other extensions, such as the XML Query Language (XQL), have been proposed for specifying queries within XML. Database information can then be made immediately accessible as XML on corporate intranets or the Web.

Figure 3 - Databases can be presented as XML


Future Web servers will incorporate XML processors to access data from databases and the file system, and return it as HTML or XML.

Application simplification

Today's memory- and disk-consuming applications suffer from file-format bloat. That is, typical image or text processing applications are often forced to read and write many different formats. XML's open and extensible nature allows us to represent all of the functionality of different formats in one XML domain-specific grammar. An application could simply read and write XML for its domain.

XML will greatly ease the sharing of data over the Internet. Since the data itself is encoded, and not its presentation, it may be presented on any output device, from a PC with a browser to a cell phone or a small PDA. Indeed, the prevalence of XML data will drive browsers and applications to become XML-aware.

Note also that XML opens up end-user access to data and, with it, the ability to make intelligent choices. XML will further accelerate the connection between business and consumer.

Why should a programmer learn XML?

XML enables the separation of the meaning or semantics of the data from the way it is used by an application or rendered on the screen or output device. This is analogous to the practice of model/view separation in good object-oriented design. An example of model/view separation is Java's JFC Swing components. Two different instances of JFC presentation components can share the same model.

Model/view separation allows for multiple different presentations of the same type of XML data. The different presentations can be tailored to the output platform capability, or even to user preference. The simultaneous presentation of data in tabular and graphical ways is often helpful to understanding the data.
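To make the idea concrete, here is a minimal sketch in which one parsed <product> element (the model) feeds two different views: an HTML table row for a browser and a one-line summary for a small display. The class and method names are hypothetical, and the sketch skips HTML escaping because the sample values contain no special characters.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ProductViews {
    static final String PRODUCT_XML =
        "<product><id>12345678-Q</id><description>Thinkpad 2000D</description>"
      + "<price>$999.99</price></product>";

    // The model: the parsed root element of the product document.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Reads the text of the first child element with the given tag name.
    static String field(Element product, String tag) {
        return product.getElementsByTagName(tag).item(0).getTextContent();
    }

    // View 1: a table row for an HTML browser (no escaping; sample data only).
    static String asHtmlRow(Element product) {
        return "<TR><TD>" + field(product, "id") + "</TD><TD>"
            + field(product, "description") + "</TD><TD>"
            + field(product, "price") + "</TD></TR>";
    }

    // View 2: a short line for a device with a small text display.
    static String asPlainText(Element product) {
        return field(product, "description") + ": " + field(product, "price");
    }

    public static void main(String[] args) {
        Element product = parse(PRODUCT_XML);
        System.out.println(asHtmlRow(product));
        System.out.println(asPlainText(product)); // prints Thinkpad 2000D: $999.99
    }
}
```

Both views read from the same element, so adding a third presentation would not require touching the data or the other views.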

An XML document can be thought of as an instance of its grammar or DTD, similar to the way a Java object is an instance of a Class or Interface. Like a Java object or JavaBean, XML is self-describing! Just as the Reflection capability in Java allows you to ask a class or bean about its methods and behavior at run time, XML provides DOM, a programmatic API to access the logical structure and content of any XML document.
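The comparison with reflection can be sketched as follows: the code below knows nothing about the document's vocabulary in advance, yet it can discover the element structure by walking the DOM tree. It relies only on the standard org.w3c.dom interfaces; the class name DomOutline and its methods are made up for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomOutline {
    // Returns one line per element, indented two spaces per nesting level.
    static String outline(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        StringBuilder out = new StringBuilder();
        walk(doc.getDocumentElement(), 0, out);
        return out.toString();
    }

    // Depth-first walk; text nodes and other non-element nodes are skipped.
    private static void walk(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(node.getNodeName()).append('\n');
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), depth + 1, out);
        }
    }

    public static void main(String[] args) {
        // Prints "product", then "id", "description", and "price" indented beneath it.
        System.out.print(outline(
            "<product><id>12345678-Q</id><description>Thinkpad 2000D</description>"
          + "<price>$999.99</price></product>"));
    }
}
```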

XML parsers that expose the DOM API, like the XML for Java Parser, will soon be ubiquitous, both in Java and natively on various platforms. The benefits to the programmer are many:

  • The programming task of writing multiple parsers for the data an application must read is reduced to coding against an API that presents the logical structure of the data itself!
  • Domain-specific frameworks will be written on top of DOM, to present the DOM tree structure in ways natural to that domain.
  • Since the DOM spec is a set of language-independent interfaces expressing a logical tree structure, translators from existing formats can easily be created by parsing the existing format and building a DOM structure from it. You can then use a DOM parser that not only parses XML but also generates XML from DOM (like the XML for Java Parser) to write XML corresponding to the existing format.
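The translator idea in the last point can be sketched like this: a comma-separated record stands in for the "existing format", the code builds a DOM tree from it, and the tree is then written out as XML. The comma-separated input, the field order, and the class name are assumptions for illustration only. Serialization here uses the JAXP Transformer that ships with Java rather than the XML for Java Parser's own output facility.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class CsvToXml {
    // Assumed field order of the legacy record (hypothetical).
    static final String[] FIELDS = {"id", "description", "price"};

    static String toXml(String csvLine) {
        String[] values = csvLine.split(",");
        if (values.length != FIELDS.length) {
            throw new IllegalArgumentException("expected " + FIELDS.length + " fields");
        }
        try {
            // Step 1: build the DOM tree from the parsed legacy record.
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element product = doc.createElement("product");
            doc.appendChild(product);
            for (int i = 0; i < FIELDS.length; i++) {
                Element field = doc.createElement(FIELDS[i]);
                field.setTextContent(values[i].trim());
                product.appendChild(field);
            }
            // Step 2: write the DOM tree out as XML text.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("could not build or write XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml("12345678-Q, Thinkpad 2000D, $999.99"));
    }
}
```

Because the translator only produces a DOM tree, any code already written against DOM interfaces can consume the converted data without knowing the original format.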

In essence, it is much easier for the programmer to tag the data in XML and code against DOM interfaces to access application data than to write a separate parser for every format.

Summary

XML is a simple, cross-platform, and extensible way to mark up data. Like HTML, XML lives on the Web as a first-class Web format. Clients will access XML data as easily as they access HTML today.

Commerce and the software industry are gearing up to use XML because of its benefits to users. Currently there are many proprietary ways of accessing databases, catalogs, weather, and stock data. When this same data is presented as XML, search engines and agents can finally present the user with exactly what he or she is interested in. Also, since the data itself is tagged, rather than the presentation, XML can be custom-rendered for whatever device the user might have at the moment, whether it is a browser or cell-phone.

Unlike HTML, XML has built-in validity checking, which allows users to trust the data. Validity checking makes XML appropriate for transactions, electronic commerce, and inventory management.
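As a hedged sketch of what validity checking looks like in practice, the code below asks a validating parser to check a product document against a DTD. The DTD shown is an assumption written for this example (the tutorial does not define one), as are the class and method names. It requires exactly one id, one description, and one price, in that order.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class ValidityCheck {
    // Hypothetical DTD for the product example, supplied as an internal subset.
    static final String DTD =
        "<!DOCTYPE product [\n"
      + "<!ELEMENT product (id, description, price)>\n"
      + "<!ELEMENT id (#PCDATA)>\n"
      + "<!ELEMENT description (#PCDATA)>\n"
      + "<!ELEMENT price (#PCDATA)>\n"
      + "]>\n";

    // Returns true only if the document is well-formed and valid against its DTD.
    static boolean isValid(String xmlWithDoctype) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Without a handler, validity errors are reported but do not stop parsing.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xmlWithDoctype)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed, or not valid
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("parser could not be set up", e);
        }
    }

    public static void main(String[] args) {
        String complete = DTD + "<product><id>12345678-Q</id>"
            + "<description>Thinkpad 2000D</description><price>$999.99</price></product>";
        String missingPrice = DTD + "<product><id>12345678-Q</id></product>";
        System.out.println(isValid(complete));     // prints true
        System.out.println(isValid(missingPrice)); // prints false
    }
}
```

A receiving system can reject an incomplete order or catalog entry this way before any application logic runs.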

And finally, XML solves both technical and strategic problems of application programming. On the technical side, the programmer simply codes against the DOM, a set of interfaces to access the structure and content of a parsed XML document. Freed from writing parsers, the programmer can thus concentrate on the expertise and value-add of the application domain. Strategically, an open data format like XML allows applications to compete based on their feature sets, rather than compete by locking out other applications due to secret proprietary formats.

What's next?

You are probably ready to start writing XML documents. If you know a little about HTML, you will probably find that XML is not much of a leap. The next tutorial will teach you to start writing XML by comparing it to HTML and highlighting the subtle differences. You will learn coding techniques through interactive XML examples--which you can edit and parse, and correct and re-parse--from within the tutorial.
