December 4, 1997
XML Language Issues
Object Model Issues
Recent Bug Fixes
Earlier Release Notes (version 1.6)
Feature | Details |
---|---|
Case Sensitivity |
The XML language recently changed to become case-sensitive.
MSXML uses the exact case for all keywords as defined in the
XML Language Specification, so the following were changed to
lowercase.
<?xml version="..." encoding="..." ?>
xml-space=(default|preserve)
<?xml:namespace as="..." href="..."?>
This is clearly a breaking change from the 1.0 version of the parser and is enabled by default, but for backwards compatibility the object model provides a switch to set the parser back to case-insensitive, as follows:
Document d = new Document();
d.setCaseInsensitive(true);
d.load("http://www.foo.com/example.xml"); |
Namespaces | The parser supports namespaces as outlined in the XML Namespaces document. Namespace support was used to implement the new xml:lang attribute. |
XML encoding |
On the Windows platform, MSXML uses C++ optimization for input.
This currently supports only UCS-2, UTF-8, and the Windows code page.
On other platforms, the encodings supported depend on which
encodings the Java Virtual Machine provides in the InputStreamReader.
MSXML also supports little-endian and big-endian Unicode
storage formats and maintains the same format when the document
is saved.
|
White space Handling | Section 2.10 says that xml-space can be specified on any element to control whether white space is preserved or normalized. The default is to normalize white space, which collapses each run of white space characters into a single space. To preserve white space, set xml-space to preserve; this setting is inherited down the hierarchy. To switch back to the default, set xml-space to default. |
Standalone Document Declaration |
Section 2.9 says that the XML declaration can contain
a standalone attribute with the value yes or no.
This replaces the old RMD attribute.
The standalone attribute is currently not used by the parser.
If you want to stop the parser from loading external DTDs and
entities, use the following Document method:
document.setLoadExternal(false); |
End-of-Line Handling | Section 2.11 of the spec specifies that all line ends are now returned to the application as the single character 0xa. To make sure that Document.save() still generates a valid text file on each platform, the parser now writes out the value of System.getProperty("line.separator") each time it sees the 0xa character. |
Language Identification |
Section 2.12 adds a new xml:lang attribute.
This means that any element can now have this attribute regardless of
ATTLIST declaration. For example, the following is valid, even
though the DTD says that the test element doesn't have any attributes.
<!DOCTYPE test [
<!ELEMENT test ANY>
]>
<test xml:lang="en">
The quick brown fox.
</test> |
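
The white space and end-of-line rules in the table above can be illustrated with a short, self-contained sketch. It is not MSXML code: the class WhiteSpaceSketch and both method names are hypothetical, and the sketch assumes that "normalize" means collapsing each run of space, tab, carriage return, and line feed into one space without trimming the ends.

```java
// Illustrative sketch only. WhiteSpaceSketch and its methods are hypothetical
// names and are not part of the MSXML object model.
final class WhiteSpaceSketch {

    // Default xml-space behavior as described above: each run of white space
    // characters is collapsed into a single space. Leading and trailing runs
    // are collapsed too, but not removed (an assumption of this sketch).
    static String normalize(String text) {
        StringBuilder out = new StringBuilder();
        boolean inRun = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean ws = (c == ' ' || c == '\t' || c == '\r' || c == '\n');
            if (ws) {
                if (!inRun) {
                    out.append(' ');
                }
                inRun = true;
            } else {
                out.append(c);
                inRun = false;
            }
        }
        return out.toString();
    }

    // Save-time end-of-line handling: every 0xa character is written out as
    // the supplied separator, typically System.getProperty("line.separator").
    static String toPlatformLines(String text, String separator) {
        return text.replace("\n", separator);
    }
}
```

Passing the separator as a parameter keeps the second method easy to check on any platform; a caller that wants the behavior described above would pass System.getProperty("line.separator").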
Change | Details |
---|---|
Ignorable White Space |
The parser generates ignorable white-space nodes, as defined in the
W3C DOM Specification. This makes
Element.getChild(index) unreliable, since there may or may not be white-space
nodes that affect this index. A more reliable way to get the FOO element is
as follows:
Element root = document.getRoot();
Element foo = root.getChildren().getChild(0);
This works because the default ElementCollection returned from getChildren() automatically filters out the white-space nodes. |
C++ XML object model | JavaBeans are provided that expose the Java Object Model in a way that is as close as possible to the C++ XML Object Model that shipped with Internet Explorer 4.0. As a result, JavaScript pages work the same regardless of whether the back-end parser is C++ or Java. |
ElementFactory |
The element factory is now designed so that the factory is responsible
for building the XML tree. It now has the following methods:
Element createElement(Element parent, int type, Name tag, String text);
void parsedAttribute(Element e, Name name, Object value);
This makes it possible to provide a factory that builds nothing and returns null from createElement, which results in faster parsing. |
Name Tokenization |
The Name class automatically tokenizes commonly used names. This can
save a lot of memory, and as a result it can also speed up parsing.
For example, the
msnbc.cdf |
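
The memory saving from name tokenization can be sketched in a few lines of plain Java. This is only an illustration of the general idea, assuming that tokenizing means handing out one shared instance per distinct name; the NameTable class and its methods are hypothetical and do not reproduce the actual Name class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the idea behind name tokenization; the real
// Name class in the parser may work differently.
final class NameTable {
    private final Map<String, String> names = new HashMap<>();

    // Returns one shared String instance per distinct name, so a tag that
    // appears thousands of times in a document is stored only once.
    String tokenize(String name) {
        String existing = names.get(name);
        if (existing == null) {
            names.put(name, name);
            return name;
        }
        return existing;
    }

    // Number of distinct names seen so far.
    int size() {
        return names.size();
    }
}
```

Because every occurrence of a name maps to the same object, later comparisons can use a cheap reference check instead of a character-by-character comparison, which is one plausible source of the parsing speedup mentioned above.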
Fixed bugs in | Details |
---|---|
Parameter Entities | Fixed inclusion of parameter entities inside entity declarations. The new table in section 4.4.0 of the XML Language Specification is a useful guide for how entities are handled. |
Portability |
The XMLInputStream.java file will now compile on all platforms.
It uses the following trick to detect whether the Windows-specific
optimization is available:
Class clazz = Class.forName("com.ms.xml.xmlstream.XMLStream");
xmlis = (XMLStreamReader)clazz.newInstance(); |
Entity Validation |
There was a bug in validation when entities were involved. For
example, the following now works correctly:
<!DOCTYPE doc [
<!ELEMENT doc (foo)>
<!ELEMENT foo EMPTY>
<!ENTITY foo "<foo/>">
]>
<doc>&foo;</doc>
This was complicated by the presence of EntityRef nodes. The validator no longer assumes an entity is PCDATA. |
Duplicate Entities | Fixed a bug in the handling of a declaration for an entity that has already been declared. The parser used to use the second definition, but the spec says to ignore the second definition. |
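
The portability trick in the Portability row above relies on loading an optional class by name at run time. The following is a minimal, generic sketch of that pattern, not the parser's own code: OptionalClassSketch and loadOrFallback are hypothetical names, the fallback handling is an assumption, and the sketch uses getDeclaredConstructor().newInstance() because Class.newInstance() is deprecated in current Java versions.

```java
// Generic sketch of the "load an optional class by name" pattern. The class
// and method names here are hypothetical, not part of the parser.
final class OptionalClassSketch {

    // Tries to create an instance of className through its no-argument
    // constructor. If the class is missing, cannot be linked (for example
    // because a native library is absent), or cannot be instantiated,
    // the supplied fallback object is returned instead.
    static Object loadOrFallback(String className, Object fallback) {
        try {
            Class<?> clazz = Class.forName(className);
            return clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | LinkageError e) {
            return fallback;
        }
    }
}
```

Since the optional class is referenced only by its name in a string, the calling code compiles on platforms where that class does not exist, which is the property the release note describes.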
© 1998 Microsoft Corporation. All rights reserved. Terms of use.