Microsoft XML Parser in Java
Release Notes for Version 1.8


December 4, 1997

* XML Language Issues
* Object Model Issues
* Recent Bug Fixes
* Earlier Release Notes (version 1.6)


XML Language Issues

Feature Details
Case Sensitivity The XML language recently changed to become case-sensitive. MSXML uses the exact case for all keywords as defined in the XML Language Specification, so the following keywords were changed to lowercase:
    <?xml version="..." encoding="..." ?>
    xml-space=(default|preserve)
    <?xml:namespace as="..." href="..."?>
This is clearly a breaking change from the 1.0 version of the parser and is enabled by default, but for backwards compatibility the object model provides a switch to set the parser back to case-insensitive, as follows:
    Document d = new Document();
    d.setCaseInsensitive(true);
    d.load("http://www.foo.com/example.xml");
Namespaces The parser supports namespaces as outlined in the XML Namespaces document. Namespace support was used to implement the new xml:lang attribute.
XML Encoding On the Windows platform, MSXML uses C++ optimization for input. This currently supports only UCS2, UTF8, and the Windows code page. On other platforms, the encodings supported depend on which encodings the Java Virtual Machine provides in the InputStreamReader. MSXML also supports little-endian and big-endian Unicode storage formats and maintains the same format when the document is saved.
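
As a rough illustration of the non-Windows path, the following sketch asks the Java Virtual Machine whether it provides an encoding and then decodes bytes through an InputStreamReader. It is not MSXML code: the class and method names are hypothetical, and java.nio.charset.Charset is a later Java API than the one available when these notes were written.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical helper, not part of MSXML.
final class EncodingSketch {
    /** Decodes bytes with the named encoding, if this JVM provides it. */
    static String decode(byte[] bytes, String encoding) {
        if (!Charset.isSupported(encoding)) {
            throw new IllegalArgumentException(
                "Encoding not provided by this JVM: " + encoding);
        }
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), encoding)) {
            StringBuilder out = new StringBuilder();
            char[] buffer = new char[256];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                out.append(buffer, 0, n);
            }
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The same character stored little-endian and big-endian.
        byte[] littleEndian = {0x3C, 0x00};
        byte[] bigEndian = {0x00, 0x3C};
        System.out.println(decode(littleEndian, "UTF-16LE"));
        System.out.println(decode(bigEndian, "UTF-16BE"));
    }
}
```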

White Space Handling Section 2.10 says that xml-space can be specified on any element, controlling whether white space is preserved or normalized. The default is to normalize white space (that is, to collapse each run of white-space characters down to a single space). To preserve white space, set xml-space to preserve -- this setting is inherited down the hierarchy. To switch back to the default, set xml-space to default.
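
The normalization rule can be sketched in a few lines. This is only an illustration, not the parser's implementation: the class and method names are hypothetical, and it assumes that leading and trailing runs are also collapsed to one space rather than trimmed.

```java
// Hypothetical helper, not part of MSXML.
final class WhiteSpaceSketch {
    /** Collapses each run of space, tab, CR, or LF into a single space. */
    static String normalize(String text) {
        StringBuilder out = new StringBuilder(text.length());
        boolean inRun = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean ws = c == ' ' || c == '\t' || c == '\r' || c == '\n';
            if (ws) {
                if (!inRun) {
                    out.append(' ');
                }
                inRun = true;
            } else {
                out.append(c);
                inRun = false;
            }
        }
        return out.toString();
    }

    /** Applies the xml-space value in effect: "preserve" keeps text as-is. */
    static String apply(String xmlSpace, String text) {
        return "preserve".equals(xmlSpace) ? text : normalize(text);
    }

    public static void main(String[] args) {
        String text = "The  quick\n\tbrown fox.";
        System.out.println("[" + apply("default", text) + "]");
        System.out.println("[" + apply("preserve", text) + "]");
    }
}
```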
Standalone Document Declaration Section 2.9 says that the xml declaration can contain a standalone attribute with the value yes or no. This replaces the old RMD attribute. The standalone attribute is currently not used by the parser. If you want to stop the parser from loading external DTDs and entities, use the Document method:
    document.setLoadExternal(false);

   
End-of-Line Handling Section 2.11 of the spec specifies that all new lines are now returned to the application as the single character 0xa. So to make sure that Document.save() still generates a valid text file on each platform, the parser now writes out the System.getProperty("line.separator") every time it sees the 0xa character.
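
The save-side behavior described above might look roughly like this. The class and method names are hypothetical and this is not the Document.save() source; it only shows each 0xa character being replaced by a line separator on output.

```java
// Hypothetical helper, not part of MSXML.
final class LineEndingSketch {
    /** Writes the given separator wherever the in-memory text has 0xa. */
    static String toPlatformText(String text, String separator) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\n') {
                out.append(separator);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Use the platform's own separator, as the release note describes.
        String separator = System.getProperty("line.separator");
        System.out.print(toPlatformText("<test>\n</test>\n", separator));
    }
}
```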
Language Identification Section 2.12 adds a new xml:lang attribute. This means that any element can now have this attribute regardless of ATTLIST declaration. For example, the following is valid, even though the DTD says that the test element doesn't have any attributes.
    <!DOCTYPE test [
        <!ELEMENT test ANY>
     ]>
    <test xml:lang="en">
        The quick brown fox.
    </test>

   


Object Model Issues

Change Details
Ignorable White Space The parser generates ignorable white-space nodes, as defined in the W3C DOM Specification. This makes Element.getChild(index) unreliable, because white-space nodes may or may not be present and they shift the index. A more reliable way to get the FOO element is as follows:
    Element root = document.getRoot();
    Element foo = root.getChildren().getChild(0);

This works because the default ElementCollection returned from getChildren() automatically filters out the white-space nodes.
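
To see why the raw index is unreliable, consider this self-contained sketch. The Node type and the withoutWhiteSpace method are hypothetical stand-ins, not the com.ms.xml classes; the sketch only mimics a collection that skips white-space-only text nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model, not part of MSXML.
final class ChildFilterSketch {
    /** A minimal node: tag is null for text nodes. */
    static final class Node {
        final String tag;
        final String text;

        Node(String tag, String text) {
            this.tag = tag;
            this.text = text;
        }

        boolean isIgnorableWhiteSpace() {
            return tag == null && text.trim().isEmpty();
        }
    }

    /** Returns the children with white-space-only text nodes removed. */
    static List<Node> withoutWhiteSpace(List<Node> children) {
        List<Node> kept = new ArrayList<>();
        for (Node child : children) {
            if (!child.isIgnorableWhiteSpace()) {
                kept.add(child);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Children of a root element written with indentation around FOO.
        List<Node> raw = Arrays.asList(
            new Node(null, "\n    "),
            new Node("FOO", ""),
            new Node(null, "\n"));
        System.out.println("raw index 0 tag: " + raw.get(0).tag);
        System.out.println("filtered index 0 tag: "
            + withoutWhiteSpace(raw).get(0).tag);
    }
}
```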

C++ XML Object Model JavaBeans are provided that expose the Java Object Model in a way that is as close as possible to the C++ XML Object Model that shipped with Internet Explorer 4.0. This makes JavaScript pages work the same regardless of whether the back-end parser is C++ or Java.
ElementFactory The element factory is now designed so that the factory is responsible for building the XML tree. It now has the following methods:
    Element createElement(Element parent, int type,
        Name tag, String text);
    void parsedAttribute(Element e, Name name,
        Object value);
This makes it possible to provide a factory that builds nothing and returns null from createElement -- which obviously results in faster parsing.
Name Tokenization The Name class automatically tokenizes commonly used names. This can save a lot of memory and, as a result, can also speed up parsing. For example, the msnbc.cdf Non-SBN link file creates only 58 unique Names, and shares a whopping 3522 Name objects. All Name objects are created using a static method, as follows:
    Name foo = Name.create("FOO");
These names are stored in a hash table so that multiple instances of the same name share the actual Name object. This is useful for XML tags and XML entities, so the APIs in the object model now take and return Name objects instead of strings wherever applicable, so that clients can also receive these benefits.
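
The hash-table sharing described above can be sketched as follows. NameSketch is a hypothetical class written for illustration, not the actual Name class; only the idea of a static create method backed by a table is taken from the description.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of name tokenization, not the MSXML Name class.
final class NameSketch {
    private static final Map<String, NameSketch> TABLE = new HashMap<>();

    private final String text;

    private NameSketch(String text) {
        this.text = text;
    }

    /** Returns the one shared instance for the given text. */
    static synchronized NameSketch create(String text) {
        NameSketch name = TABLE.get(text);
        if (name == null) {
            name = new NameSketch(text);
            TABLE.put(text, name);
        }
        return name;
    }

    @Override
    public String toString() {
        return text;
    }

    public static void main(String[] args) {
        NameSketch first = create("FOO");
        NameSketch second = create("FOO");
        // Both variables refer to the same object, so == is enough to compare.
        System.out.println(first == second);
    }
}
```

Because equal names share one object, callers can compare them by reference instead of comparing strings character by character.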


Recent Bug Fixes

Fixed bugs in Details
Parameter Entities Fixed inclusion of parameter entities inside entity declarations. The new table in section 4.4.0 of the XML Language Specification is a useful guide for how entities are handled.
Portability The XMLInputStream.java file will now compile on all platforms. It uses the following trick to detect whether the Windows-specific optimization is available:
    Class clazz = Class.forName(
        "com.ms.xml.xmlstream.XMLStream");
    xmlis = (XMLStreamReader)clazz.newInstance();
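
A fuller version of this detection pattern needs a fallback for platforms where the class is missing. The sketch below shows one way to do that with standard reflection; the class and method names are hypothetical, and the fallback object stands in for whatever portable reader the parser actually uses.

```java
// Hypothetical helper, not part of MSXML.
final class OptionalClassSketch {
    /**
     * Tries to load and instantiate the named class; returns the fallback
     * if the class is missing or cannot be created on this platform.
     */
    static Object newInstanceOrFallback(String className, Object fallback) {
        try {
            Class<?> clazz = Class.forName(className);
            return clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | LinkageError e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        // On a JVM without the Windows-specific class, the fallback is used.
        Object reader = newInstanceOrFallback(
            "com.ms.xml.xmlstream.XMLStream", "portable reader");
        System.out.println(reader);
    }
}
```

Because the class is referenced only by name at run time, the source file compiles even where the optimized class does not exist.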
Entity Validation There was a bug in validation when entities were involved. For example, the following now works correctly:
    <!DOCTYPE doc [
    <!ELEMENT doc (foo)>
    <!ELEMENT foo EMPTY>
    <!ENTITY foo "<foo/>">
    ]>
    <doc>&foo;</doc>
This was complicated by the presence of EntityRef nodes. The validator no longer assumes an entity is PCDATA.
Duplicate Entities Fixed a bug in the parser's handling of a declaration for an entity that has already been declared. The parser used to use the second definition, but the spec says to ignore the second definition.
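
The first-declaration-wins rule can be sketched with a small table. EntityTableSketch is a hypothetical class, not the parser's DTD code, and it assumes entity values are never null.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical entity table, not part of MSXML.
final class EntityTableSketch {
    private final Map<String, String> entities = new HashMap<>();

    /**
     * Records a declaration unless the name is already declared.
     * Returns true if the declaration was kept, false if it was ignored.
     */
    boolean declare(String name, String value) {
        return entities.putIfAbsent(name, value) == null;
    }

    String lookup(String name) {
        return entities.get(name);
    }

    public static void main(String[] args) {
        EntityTableSketch table = new EntityTableSketch();
        table.declare("foo", "<foo/>");
        table.declare("foo", "<bar/>"); // ignored: foo is already declared
        System.out.println(table.lookup("foo"));
    }
}
```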


© 1998 Microsoft Corporation. All rights reserved. Terms of use.

 
