Data Types in XML Instance

Two categories of data types may occur in XML schemas you import into XML Instance: the XML 1.0 data types and the data-oriented types.

Data Types in XML 1.0

The XML 1.0 data types are used primarily to identify attribute types for use in document contexts. All validating XML 1.0 parsers check these constraints to make sure that attribute values fit these rules. None of these data types is really oriented toward data in the sense used by programmers and database developers, but they are useful for describing relationships within documents and for constraining data to a list of acceptable values.

Identifier Meaning and Constraints
text(cdata) The attribute value may be any series of legal XML characters and general entities. (The < and & characters must still be represented with the predefined entities &lt; and &amp;. The > character may be written as &gt;, and the quote character that delimits the value must be escaped inside it.)
ID The value must be an XML name, beginning with a letter or underscore and otherwise composed of letters, digits, hyphens, underscores, and full stop characters. (Colons are prohibited for documents conforming to the Namespaces in XML 1.0 W3C Recommendation.) The value of the attribute must also be unique within the document among all attributes of type ID. ID attributes may never have fixed default values. Only one attribute per element may be of type ID. Typically, attributes containing ID values are named 'id', though this is not required.
IDREF The attribute value must match the value of an ID attribute of an element contained within the same XML document.
IDREFS Multiple values of ID attributes may appear, separated by white space, but all must match ID values in the document. (A single ID value is also acceptable.)
ENTITY The attribute value must match the name of an external unparsed entity declared elsewhere in the document type definition. (Colons are prohibited within the value of this type of attribute for documents conforming to the Namespaces recommendation.)
ENTITIES Like ENTITY, except that multiple names of unparsed entities may appear with white space separating the values. (Colons are prohibited within the value of this type of attribute for documents conforming to the Namespaces recommendation.)
NMTOKEN The attribute value must contain letters, digits, periods, dashes, underscores, combining characters or extenders. No other characters (including white space) may appear. (Colons are prohibited within the value of this type of attribute for documents conforming to the Namespaces recommendation.)
NMTOKENS Like NMTOKEN, except that multiple name values may appear with white space separating the values. (Colons are prohibited within the value of this type of attribute for documents conforming to the Namespaces recommendation.)
enumerated Provides a list of acceptable values. The word enumerated isn't stated in the declaration. Instead, a list of possible values for the attribute appears in parentheses, separated by vertical ('or') bars: (value | value ...).
NOTATION The NOTATION keyword must be followed by a list of acceptable notation identifiers in the same format - (value | value ...) - used for enumerated values. All values provided must have been declared as NOTATIONS elsewhere in the document type definition.
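
A validating parser enforces the ID, IDREF, and IDREFS rules above on its own. As a rough illustration of what those rules mean, the following Python sketch performs the same checks by hand with the standard library's non-validating parser. It assumes, purely for the example, that the ID attribute is named 'id' and that the hypothetical attributes 'ref' and 'refs' are declared as IDREF and IDREFS; in a real document those names come from the DTD.

```python
import xml.etree.ElementTree as ET


def check_id_references(xml_text, id_attr="id",
                        idref_attrs=("ref",), idrefs_attrs=("refs",)):
    """Return a list of ID / IDREF / IDREFS problems found in xml_text.

    The attribute names are assumptions made for this sketch; a DTD
    would normally say which attributes carry these types.
    """
    root = ET.fromstring(xml_text)
    problems = []

    # Pass 1: every ID value must be unique within the document.
    seen = set()
    for elem in root.iter():
        value = elem.get(id_attr)
        if value is None:
            continue
        if value in seen:
            problems.append(f"duplicate ID: {value}")
        seen.add(value)

    # Pass 2: every IDREF, and every token of an IDREFS, must match an ID.
    for elem in root.iter():
        for name in idref_attrs:
            value = elem.get(name)
            if value is not None and value not in seen:
                problems.append(f"unresolved IDREF: {value}")
        for name in idrefs_attrs:
            value = elem.get(name)
            if value is None:
                continue
            for token in value.split():  # IDREFS is white-space separated
                if token not in seen:
                    problems.append(f"unresolved IDREFS entry: {token}")
    return problems


if __name__ == "__main__":
    sample = '<doc><item id="a1"/><item id="a2" ref="a1"/></doc>'
    print(check_id_references(sample))
```

The two passes reflect that a reference may point forward in the document, so all ID values are collected before any reference is resolved.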

Data-oriented Types

While the final usage of data-oriented types in XML schemas is still in development at the World Wide Web Consortium (W3C), the data types available in certain XML schemas used with XML Instance let you begin planning how your documents will use data types. Applications that want to perform an extra validation step may do so by using XML Instance to apply a schema created in XML Authority, which currently stores these data types as additional fixed attributes. These data types are based on the XML-Data proposal submitted to the W3C.

Data Type Constraints Imposed
string Content is a text string.
number Content is a number of some kind.
integer Content is an integer number.
currency Content represents a currency value.
float Content is a floating-point number.
boolean Content is a boolean (true or false).
dateTime Content is a date and time.
date Content is a date.
time Content is a time.
datetime.tz Content is a date and time plus time zone information.
time.tz Content is a time plus time zone information.
interval  
1-byte integer Content is an integer represented by a single byte.
2-byte integer Content is an integer represented by two bytes.
4-byte integer Content is an integer represented by four bytes.
8-byte integer Content is an integer represented by eight bytes.
1-byte unsigned integer Content is an unsigned integer represented by a single byte.
2-byte unsigned integer Content is an unsigned integer represented by two bytes.
4-byte unsigned integer Content is an unsigned integer represented by four bytes.
8-byte unsigned integer Content is an unsigned integer represented by eight bytes.
4-byte float Content is a floating-point number represented by four bytes.
8-byte float Content is a floating-point number represented by eight bytes.
UUID Content is hexadecimal digits representing octets.
bin hex Content is BinHex-encoded, representing binary information with a text transformation. (Commonly used in Apple Macintosh projects.)
base64 Content is base64-encoded, representing binary information with a text transformation. (Commonly used in MIME-based projects.)
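
To make a few of these constraints concrete, here is a minimal Python sketch of the kind of extra validation step an application might run on element content. It is not how XML Instance or XML Authority implements the checks. The function names are hypothetical, and the sized integer types are assumed to use two's-complement ranges for the signed forms.

```python
import base64
import binascii
import uuid


def integer_fits(text, n_bytes, signed=True):
    """True if text parses as an integer that fits in n_bytes bytes.

    Assumes two's-complement ranges for signed values (an assumption
    of this sketch, not something the table above states).
    """
    try:
        value = int(text)
    except ValueError:
        return False
    bits = 8 * n_bytes
    if signed:
        low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        low, high = 0, (1 << bits) - 1
    return low <= value <= high


def is_base64(text):
    """True if text is valid base64 once white space is removed."""
    compact = "".join(text.split())
    try:
        base64.b64decode(compact, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True


def is_uuid(text):
    """True if text is 32 hexadecimal digits in a form uuid.UUID accepts."""
    try:
        uuid.UUID(text)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    print(integer_fits("128", 1))                # 1-byte integer
    print(integer_fits("255", 1, signed=False))  # 1-byte unsigned integer
    print(is_base64("aGVsbG8="))
    print(is_uuid("12345678-1234-5678-1234-567812345678"))
```

Each check takes the element content as a string, because that is all an XML parser delivers. The data type only says how that string should be interpreted.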

Copyright 2000 Extensibility, Inc.

Suite 250, 200 Franklin Street, Chapel Hill, North Carolina 27516