November 6, 1997
Breaking Changes
XML Language Features
Object Model Changes
Bug Fixes
Please check the Release 1.8 Release Notes for changes in the latest version of the XML Parser.
The XML language recently changed to become case-sensitive. Case sensitivity is now enabled by default, which is a breaking change from the 1.0 version of the parser. For backwards compatibility, the object model provides a switch that sets the parser back to case-insensitive mode, as follows:
    Document d = new Document();
    d.setCaseInsensitive(true);
    d.load("http://www.foo.com/example.xml");
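To illustrate what case sensitivity means for tag lookups, here is a minimal sketch. It uses the standard JDK DOM API (`javax.xml.parsers`, `org.w3c.dom`) rather than the parser classes described in these notes, and the class and method names (`CaseSensitiveTags`, `countExact`, `countIgnoringCase`) are hypothetical. It is not a reimplementation of `setCaseInsensitive`; it only shows that `<item>` and `<ITEM>` are distinct names to a case-sensitive parser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class CaseSensitiveTags {
    // Parses XML text with the JDK's built-in (case-sensitive) parser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse: " + e.getMessage(), e);
        }
    }

    // Counts elements whose tag name matches exactly, including case.
    static int countExact(String xml, String tag) {
        return parse(xml).getElementsByTagName(tag).getLength();
    }

    // Counts elements whose tag name matches when case is ignored.
    static int countIgnoringCase(String xml, String tag) {
        NodeList all = parse(xml).getElementsByTagName("*");
        int matches = 0;
        for (int i = 0; i < all.getLength(); i++) {
            if (all.item(i).getNodeName().equalsIgnoreCase(tag)) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String xml = "<Root><item/><ITEM/></Root>";
        System.out.println(countExact(xml, "item"));
        System.out.println(countExact(xml, "Item"));
        System.out.println(countIgnoringCase(xml, "Item"));
    }
}
```

Code written against the 1.0 parser that looked up tags without regard to case would need either the compatibility switch or an explicit case-insensitive comparison like the one sketched here.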
The other potentially breaking change is the introduction of ignorable white-space nodes, as defined in the W3C DOM Specification. This makes Element.getChild(index) unreliable, since white-space nodes may or may not be present and they shift the index. The following two examples, although semantically identical, result in two different object models:
Example 1:

    <ROOT>
        <FOO/>
    </ROOT>

    DOCUMENT
    +---ELEMENT ROOT
        |---WHITESPACE 0xd 0xa 0x9
        |---ELEMENT FOO
        +---WHITESPACE 0xd 0xa

Example 2:

    <ROOT><FOO/></ROOT>

    DOCUMENT
    +---ELEMENT ROOT
        +---ELEMENT FOO
This means that if you have a reference to the ROOT element, getChild(0) will not always return the FOO element. A more reliable way to get the FOO element is as follows:
    Element root = document.getRoot();
    Element foo = root.getChildren().getChild(0);
This works because the default ElementCollection returned from getChildren() automatically filters out the white-space nodes.
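The same indexing pitfall can be reproduced with the standard JDK DOM API, which also keeps white-space text nodes between elements when no DTD is present. The sketch below is an illustration under that assumption, not the parser's own `ElementCollection`. The names `WhitespaceChildren`, `rawFirstChildKind`, and `elementChild` are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class WhitespaceChildren {
    // Parses XML text with the JDK parser and returns the root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse: " + e.getMessage(), e);
        }
    }

    // Reports what the unfiltered first child of the root is.
    static String rawFirstChildKind(String xml) {
        Node first = parseRoot(xml).getFirstChild();
        return first.getNodeType() == Node.ELEMENT_NODE ? "element" : "text";
    }

    // Returns the index-th child that is an element, skipping
    // white-space (and any other non-element) nodes; null if none.
    static Element elementChild(Element parent, int index) {
        int seen = 0;
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                if (seen == index) {
                    return (Element) n;
                }
                seen++;
            }
        }
        return null;
    }

    // Convenience wrapper: tag name of the first element child of the root.
    static String firstElementName(String xml) {
        return elementChild(parseRoot(xml), 0).getTagName();
    }

    public static void main(String[] args) {
        String spaced = "<ROOT>\r\n\t<FOO/>\r\n</ROOT>";
        String compact = "<ROOT><FOO/></ROOT>";
        System.out.println(rawFirstChildKind(spaced) + " / " + firstElementName(spaced));
        System.out.println(rawFirstChildKind(compact) + " / " + firstElementName(compact));
    }
}
```

Filtering by node type gives the same answer for both inputs, which is the property the filtered collection provides.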
Feature | Details |
---|---|
Conditional sections | In the DTD (INCLUDE and IGNORE keywords). |
Namespaces | See separate XML Namespaces document. |
XML encoding | Support for the encoding attribute on the <?XML ...?> tag was added. The encodings that are actually supported depend on the Java Virtual Machine installed on your computer. Under Internet Explorer 3.02, only ISO-10646-UCS-2 and ASCII are supported. Under the final release of Internet Explorer 4.0, UTF-8, ISO-10646-UCS-2, Shift_JIS, Big5, and ISO-8859-1 are supported. The parser also supports little-endian and big-endian storage formats and preserves the original byte order when the document is saved. |
XML-SPACE | Implemented according to the XML specification. The default for the parser is to normalize white space. |
RMD | The RMD attribute on the <?XML?> tag is now supported with the possible values NONE, INTERNAL, and ALL. Whichever value is set, well-formedness is still checked in the internal subset. |
Floating ampersand | The parser can now parse the text "this & that" since the ampersand is not followed by a valid name character. This makes it possible to parse existing CDF files. |
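The floating-ampersand rule in the table above can be sketched as a small check: an ampersand is treated as literal text when the character after it cannot start a reference. This is only an illustration. The class and method names are hypothetical, the name-start test is simplified to letters, `_`, and `:`, and the way the parser itself makes this decision is an assumption.

```java
public class FloatingAmpersand {
    // Returns true if the '&' at position i cannot begin an entity or
    // character reference, so it can be read as plain text.
    static boolean isFloating(String text, int i) {
        if (text.charAt(i) != '&') {
            throw new IllegalArgumentException("no ampersand at index " + i);
        }
        if (i + 1 >= text.length()) {
            return true; // nothing follows, so no name can start here
        }
        char next = text.charAt(i + 1);
        // '#' starts a character reference such as &#38;
        // A letter, '_' or ':' starts a name (simplified XML rule).
        boolean startsReference =
                next == '#' || next == '_' || next == ':' || Character.isLetter(next);
        return !startsReference;
    }

    // Counts the ampersands in the text that would be read as literals.
    static int countFloating(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '&' && isFloating(text, i)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countFloating("this & that"));
        System.out.println(countFloating("this &amp; that"));
    }
}
```

Under this rule an input such as "R&D" is still treated as the start of a reference, because the ampersand is followed by a letter.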
Change | Details |
---|---|
Introduced ignorable white space | A new white-space node has been added that remembers all the white space between elements. This makes it possible to save the XML in exactly the same format as it was read. |
Synchronized with C++ XML object model | Several changes were made to the object model in order to sync up with the XML object model provided by the C++ XML Parser. This makes JavaScript pages work the same regardless of whether the back-end parser is C++ or Java. |
Improved document save options | New feature for selecting document save format: DEFAULT, COMPACT, or PRETTY. DEFAULT saves in original format, COMPACT has no white space, and PRETTY has new lines and tabbed indenting. |
Pushed Name class up to API level | The Name class is a useful class that automatically tokenizes commonly used names. This can save a lot of memory, and as a result it can also speed up parsing. For example, the msnbc.cdf |
Added method on ElementFactory | A new method was added to ElementFactory to notify the factory when an Element was completely parsed. This is useful for clients who provide their own factory to know when an element that they have created is complete. |
Created new DTD abstraction | DTD handling code was extracted from the Document class and placed in a new class called DTD. This way the parser no longer has direct knowledge about the Document class. |
Added ElementEnumeration | This can be used to iterate over the immediate children of a given node in the tree that have a matching tag name. |
Added ElementCollection | This provides a collection interface similar to that already used in the Internet Explorer 4.0 C++-based XML object model. |
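The memory saving described in the Name class row above comes from storing each distinct name once and reusing it. The following is a minimal sketch of that idea only. `NameTable` and its methods are hypothetical and are not the actual Name class API.

```java
import java.util.HashMap;
import java.util.Map;

public class NameTable {
    // Maps each distinct name to the single shared instance kept for it.
    private final Map<String, String> names = new HashMap<>();

    // Returns the shared instance for this name, storing it on first use.
    String intern(String raw) {
        String existing = names.putIfAbsent(raw, raw);
        return existing != null ? existing : raw;
    }

    // Number of distinct names stored so far.
    int size() {
        return names.size();
    }

    public static void main(String[] args) {
        NameTable table = new NameTable();
        // new String(...) forces separate objects, as a parser reading
        // the same tag name from different places in a file would create.
        String first = table.intern(new String("CHANNEL"));
        String second = table.intern(new String("CHANNEL"));
        System.out.println(first == second);
        System.out.println(table.size());
    }
}
```

Because repeated names resolve to one object, comparisons can use reference equality and a document with many repeated tags holds far fewer strings.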
Fixed bugs in | Details |
---|---|
Root-level tags | The handling of root-level comments and processing instructions (Misc elements) was broken. The XML specification allows for any number of Misc elements before and after the <!DOCTYPE> and root-level Element tags. This information is now also preserved in the object model for saving out in the same order. |
DTD validation | These fixes include bugs reported by people who used the Alpha 1.0 release of MSXML and new bugs found internally. We also made improvements in error reporting and in making sure that the saved DTD looks the same as the original DTD. |
Document save | Some things were missing in Document save that caused the resulting output to be invalid in some circumstances. |
Entity handling | External parameter entities now fetch the external file and parse it (used by namespaces). Entities are now also stored as nodes in the tree, which means that they can now be saved properly. |
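As an illustration of the root-level Misc elements mentioned in the table above, the sketch below parses a document with comments and processing instructions before and after the root element and lists the top-level nodes in order. It uses the standard JDK DOM API rather than the parser described in these notes. The class name, method name, and sample document are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class RootLevelMisc {
    // Lists the kinds of nodes found directly under the document,
    // in document order.
    static List<String> topLevelKinds(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse: " + e.getMessage(), e);
        }
        List<String> kinds = new ArrayList<>();
        for (Node n = doc.getFirstChild(); n != null; n = n.getNextSibling()) {
            switch (n.getNodeType()) {
                case Node.COMMENT_NODE: kinds.add("comment"); break;
                case Node.PROCESSING_INSTRUCTION_NODE: kinds.add("pi"); break;
                case Node.DOCUMENT_TYPE_NODE: kinds.add("doctype"); break;
                case Node.ELEMENT_NODE: kinds.add("element"); break;
                default: kinds.add("other"); break;
            }
        }
        return kinds;
    }

    public static void main(String[] args) {
        String xml = "<!-- before --><?note first?>"
                + "<!DOCTYPE ROOT [<!ELEMENT ROOT EMPTY>]>"
                + "<ROOT/>"
                + "<?note last?><!-- after -->";
        System.out.println(topLevelKinds(xml));
    }
}
```

Keeping these nodes in the tree, in their original order, is what allows a document to be saved with its prolog and trailing comments intact.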
© 1998 Microsoft Corporation. All rights reserved. Terms of use.