Package com.ms.xml.om

About com.ms.xml.om

The Java-based XML parser from Microsoft® loads XML documents and builds a tree structure of Element objects, starting with a root object of type Document. Each XML tag represents either an interior node or a leaf of this tree. You can then browse and edit the tree using the methods of the Element class, and you can save the tree back out in XML format. The parser can also validate the document against the DTD that the document specifies.

Loading a Document

A document can be loaded by specifying a URL or an InputStream. For example:

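     Document d = new Document();
     try {
         URL url = new URL("http://www.w3.org/foo.xml"); 
         d.load(url);
     } catch (MalformedURLException e) {
         // new URL(...) rejects strings that are not well-formed URLs.
         System.out.println(e);
         return;
     } catch (ParseException e) {
         // Report where and why parsing failed.
         d.reportError(e, System.out);
         return;
     }
     ...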

You can then examine several properties of the document, such as the name (getName) and id (getId) specified in the <!DOCTYPE> tag.

Enumerating Children

You can traverse the tree by getting the root node (getRoot), enumerating its children with the Enumeration returned from the Element.getElements method, and then repeating the process for each child's children, and so on.

In the following XML snippet:

     <CHANNEL>
         <TITLE>Breaking News</TITLE>
         <!-- This is a comment -->
     </CHANNEL>

The following Element hierarchy is constructed:

    DOCUMENT
    +---ELEMENT CHANNEL
        +---WHITESPACE 0xd 0xa 0x9
        +---ELEMENT TITLE
        |   +---PCDATA "Breaking News"
        +---WHITESPACE 0xd 0xa 0x9
        +---COMMENT --
        |   +---CDATA " This is a comment "
        +---WHITESPACE 0xd 0xa

Note the new ignorable WHITESPACE nodes that capture the exact whitespace between tags. These nodes can be ignored and are only used by the Document.save() method to write the XML back out in the same format as it came in. You can override this formatting at save time by using the COMPACT or PRETTY save options.
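As a rough illustration of skipping the ignorable nodes while walking children, the sketch below rebuilds the CHANNEL hierarchy with a small stand-in tree. The Node and Kind types and the significantChildren helper are hypothetical and exist only for this example; they are not part of com.ms.xml.om.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in types for illustration only; this is NOT the com.ms.xml.om API.
class WhitespaceSketch {
    // Hypothetical node kinds mirroring the hierarchy shown above.
    enum Kind { ELEMENT, PCDATA, COMMENT, WHITESPACE }

    static final class Node {
        final Kind kind;
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(Kind kind, String label) {
            this.kind = kind;
            this.label = label;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Returns "KIND label" for every immediate child that is not WHITESPACE.
    static List<String> significantChildren(Node parent) {
        List<String> out = new ArrayList<>();
        for (Node child : parent.children) {
            if (child.kind != Kind.WHITESPACE) {
                out.add(child.kind + " " + child.label);
            }
        }
        return out;
    }

    // Builds the CHANNEL subtree from the example above.
    static Node sampleChannel() {
        Node title = new Node(Kind.ELEMENT, "TITLE")
            .add(new Node(Kind.PCDATA, "Breaking News"));
        return new Node(Kind.ELEMENT, "CHANNEL")
            .add(new Node(Kind.WHITESPACE, "\r\n\t"))
            .add(title)
            .add(new Node(Kind.WHITESPACE, "\r\n\t"))
            .add(new Node(Kind.COMMENT, " This is a comment "))
            .add(new Node(Kind.WHITESPACE, "\r\n"));
    }

    public static void main(String[] args) {
        System.out.println(significantChildren(sampleChannel()));
    }
}
```

Of the five children of CHANNEL, only the TITLE element and the comment survive the filter; the three WHITESPACE nodes matter only when the original layout must be reproduced on save.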

To find the first element with a given tag name, such as TITLE, and return its text as a string, you could write the following:

    public String findTag(Name tag, Element node)
    {
        // Walk the immediate children of 'node' in document order.
        Enumeration en = node.getElements();
        while (en.hasMoreElements()) 
        {
            Element e = (Element)en.nextElement();
            if (e.getTagName() == tag) 
            {
                return e.getText();
            } 
            else if (e.numElements() > 0)
            {
                // Recurse into children that have children of their own.
                String s = findTag(tag, e);
                if (s != null)
                    return s;
            }
        }
        return null;
    }

    ...
    String tag = findTag(Name.create("TITLE"),document);
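The search above is a depth-first traversal that stops at the first match. Because the parser classes are needed to run it, here is a self-contained sketch of the same idea over a stand-in tree. The Node class, its getText method, and sampleDocument are assumptions made for this example, not the com.ms.xml.om types.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in tree for illustration only; NOT the com.ms.xml.om classes.
class FindTagSketch {
    static final class Node {
        final String tag;   // null for text nodes
        final String text;  // null for element nodes
        final List<Node> children = new ArrayList<>();

        Node(String tag, String text) {
            this.tag = tag;
            this.text = text;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }

        // Concatenates the text of this node and all of its descendants.
        String getText() {
            if (text != null) {
                return text;
            }
            StringBuilder sb = new StringBuilder();
            for (Node c : children) {
                sb.append(c.getText());
            }
            return sb.toString();
        }
    }

    // Depth-first search: text of the first descendant with the given tag,
    // or null when no descendant matches.
    static String findTag(String tag, Node node) {
        for (Node child : node.children) {
            if (tag.equals(child.tag)) {
                return child.getText();
            }
            String s = findTag(tag, child);
            if (s != null) {
                return s;
            }
        }
        return null;
    }

    // DOCUMENT -> CHANNEL -> TITLE -> "Breaking News"
    static Node sampleDocument() {
        Node title = new Node("TITLE", null).add(new Node(null, "Breaking News"));
        Node channel = new Node("CHANNEL", null).add(title);
        return new Node("DOCUMENT", null).add(channel);
    }

    public static void main(String[] args) {
        System.out.println(findTag("TITLE", sampleDocument()));
    }
}
```

Returning null for "not found" mirrors the original method, so callers need to check the result before using it.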

Find Specific Children

You can also construct an ElementEnumeration directly, providing a tag name, to find all immediate children that have a matching tag name. For example, suppose you want to iterate through all CHANNEL elements that are immediate children of the document root:

    ElementEnumeration iter = new ElementEnumeration(
        document.getRoot(),
        Name.create("CHANNEL"));

    while (iter.hasMoreElements()) 
    {
        Element e = (Element)iter.nextElement();
        // Now you are guaranteed that 'e' 
        // is a CHANNEL element.
    }

Element Collections

If you prefer array semantics, an alternative way to enumerate children is to use the ElementCollection class, which is also the type returned from getChildren. For example, to iterate through all the CHANNEL tags with an element collection, do the following:

    ElementCollection iter = new ElementCollection(
        document.getRoot(),
        Name.create("CHANNEL"), 
        Element.ELEMENT);

    for (int i = 0; i < iter.getLength(); i++)
    {
        Element e = iter.getChild(i);
        // Now you are guaranteed that 'e' 
        // is a CHANNEL element.
    }

You can also use this to iterate over other node types, such as comments, by passing a null tag name and the flag Element.COMMENT.

Enumerating Attributes

You can also traverse the attributes of a given element by using the getAttributes method. The following code finds the first attribute whose value is a Name object:

    public Attribute findNameAttr(Element e)
    {
        AttributeEnumeration ae = e.getAttributes();
        while (ae.hasMoreElements()) 
        {
            Attribute a = (Attribute)ae.nextElement();
            if (a.getValue() instanceof Name)
                return a;
        }
        return null;
    }

Modifying the Document

You can modify the tree by using createElement and addChild. For example, to append a copyright notice at the end of a document, do the following:

    public void AddCopyright(Document d)
    {
        Element e = d.createElement( 
            Element.ELEMENT,
            Name.create("COPYRIGHT"));
        Element t = d.createElement(Element.PCDATA,null);
        t.setText("Copyright (c) 1997 " +
            "Microsoft Corporation. " +
            "All Rights Reserved.");
        e.addChild(t,null);
        d.getRoot().addChild( e, null); 
    }

Modifying Attributes

Modifying attributes is easily done using the setAttribute and removeAttribute methods. For example, if COPYRIGHT were an attribute of the root element instead of a child element, you could do the following:

    public void AddCopyright(Document d)
    {
        d.getRoot().setAttribute(Name.create("COPYRIGHT"),
            "Copyright (c) 1997 " +
            "Microsoft Corporation. " +
            "All Rights Reserved.");
    }

Saving the Document

You can now save your modified document back out to a given OutputStream using the save method. For example, to save the document to a file on disk do the following:

    import java.io.FileOutputStream;
    import java.io.OutputStream;
    ...
    // Backslashes must be doubled inside a Java string literal.
    OutputStream o = new FileOutputStream("c:\\mydocs\\new.xml");
    XMLOutputStream xout = document.createOutputStream(o);
    document.save(xout);

You can also control how the file is written by using the setOutputStyle and setEncoding methods, along with other methods on the XMLOutputStream. The createOutputStream method makes sure the document is written out in the same encoding as the original input file. If you do not want this, you can create a new XMLOutputStream using the empty constructor.
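To show why the output encoding matters, the following sketch writes the same markup through two character encodings using only standard java.io classes. It does not use XMLOutputStream; the EncodingSketch class and its write helper are hypothetical names chosen for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustration only: standard java.io classes, not XMLOutputStream.
class EncodingSketch {
    // Encodes 'xml' with the given charset and returns the bytes written.
    static byte[] write(String xml, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(bytes, charset)) {
            w.write(xml);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // \u00e9 is 'e' with an acute accent, a non-ASCII character.
        String xml = "<TITLE>Caf\u00e9</TITLE>";
        System.out.println(write(xml, StandardCharsets.ISO_8859_1).length);
        System.out.println(write(xml, StandardCharsets.UTF_8).length);
    }
}
```

The two byte counts differ because the accented character takes one byte in ISO-8859-1 and two in UTF-8, so a reader that assumes the wrong encoding will misread the file. Preserving the input encoding by default avoids that mismatch.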

Namespaces

The Name objects returned from Element.getTagName and Attribute.getName contain two fields, the name itself, and an optional namespace. For example, if you have the following XML:

    <?xml:namespace ns="urn:Zooology" src="http://www.foo.com/zoo.dtd" prefix="ZOO"?>
    <ZOO:ANIMAL>Tiger</ZOO:ANIMAL>

Then the following code can be used to find all the child elements of the document that belong to the ZOO namespace:

    static Atom zooNameSpace = 
        Atom.create("urn:Zooology");
    
    for (Enumeration en = document.getElements();
        en.hasMoreElements(); )
    {
        Element e = (Element)en.nextElement();
        Name name = e.getTagName();
        if (name.getNameSpace() == zooNameSpace)
        {
            // ...process the element...
        }
    }
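The loop above compares namespaces with == rather than equals. That only works if equal names are represented by a single shared object. The sketch below shows one way such a factory could behave, assuming Atom.create interns its argument; the Atom class here is a hypothetical stand-in written for this example, not the parser's own class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in; the real Atom class may be implemented differently.
class AtomSketch {
    static final class Atom {
        // One shared instance per distinct string value.
        private static final Map<String, Atom> TABLE = new HashMap<>();
        private final String value;

        private Atom(String value) {
            this.value = value;
        }

        // Returns the shared Atom for 'value', creating it on first use.
        static synchronized Atom create(String value) {
            Atom a = TABLE.get(value);
            if (a == null) {
                a = new Atom(value);
                TABLE.put(value, a);
            }
            return a;
        }

        @Override
        public String toString() {
            return value;
        }
    }

    public static void main(String[] args) {
        Atom first = Atom.create("urn:Zooology");
        Atom second = Atom.create("urn:Zooology");
        // Identity comparison succeeds because both refer to one object.
        System.out.println(first == second);
    }
}
```

With this design a namespace check costs a single reference comparison, which is cheap inside a loop over many elements. The private constructor is what keeps the guarantee intact, since callers cannot create an unshared instance.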

See Namespaces Proposal for more information.