Document type definitions (DTDs) are connected to documents by document type declarations. Document type declarations must appear before the root element of the document, and tell the XML parser what the structure of the document should look like. Validating parsers are required to load the entire set of declarations referenced by the document type declaration, while non-validating parsers are only required to process declarations included directly within the document type declaration, known as the internal DTD subset.
The document type declaration (often abbreviated doctype declaration) identifies the root element of the document, may point to an external file containing declarations, and may contain declarations. The general syntax for the document type declaration looks like:
<!DOCTYPE rootElement externalReference [declarations]>
Usually, the externalReference is provided as a SYSTEM identifier: the keyword SYSTEM, followed by a URI (in quotes) pointing to the actual file. For example, if a document used the personnel DTD included with XML Authority, the document type declaration might look like:
<!DOCTYPE personnel SYSTEM "personal.dtd">
This document type declaration tells the parser that the root element of this document will be the personnel element, and that it should load the personal.dtd file for a complete list of declarations describing the structure of that document. Storing DTDs in external files makes it much simpler to manage document structures and make certain that all of the documents in a set actually have the same structure.
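As a rough illustration of what a parser learns from the document type declaration alone, the following sketch uses Python's standard xml.parsers.expat module to report the root element name and the SYSTEM identifier. The helper name read_doctype and the one-line sample document are assumptions made for this example; the parser here is non-validating and never opens personal.dtd.

```python
import xml.parsers.expat


def read_doctype(xml_text):
    """Collect what the document type declaration states (hypothetical helper)."""
    found = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # expat calls this once, when it has read the <!DOCTYPE ...> header.
        found["root"] = name
        found["system_id"] = system_id
        found["public_id"] = public_id
        found["has_internal_subset"] = bool(has_internal_subset)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)  # True marks this as the final chunk of input
    return found


# Sample document assumed for illustration; only the declaration matters here.
doc = '<!DOCTYPE personnel SYSTEM "personal.dtd"><personnel/>'
info = read_doctype(doc)
print(info["root"], info["system_id"])
```

Because no PUBLIC identifier is given, the public_id value comes back empty, and has_internal_subset is false since the declaration has no bracketed section.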
Note: Some applications may use PUBLIC identifiers or PUBLIC and SYSTEM identifiers in combination. If you are working with software that can process PUBLIC identifiers, see your software's documentation for instructions identifying which PUBLIC identifiers you should use.
In some cases, the document type declaration will contain declarations, rather than refer to an external DTD file. For example:
<!DOCTYPE myDoc [
<!ELEMENT myDoc (#PCDATA)>
]>
This type of declaration is fairly rare; it requires every document to carry a complete description of its structure if it will ever be used with validating parsers. More typically, the internal declarations are used to supplement an external reference, as in:
<!DOCTYPE personnel SYSTEM "personal.dtd" [
<!ENTITY bigBoss "Alexander the Great">
]>
In this case, a validating parser would read both the declarations in the personal.dtd file and the entity declaration in the internal subset. It gets the information about the document's structure from personal.dtd, and it also learns that whenever it encounters the entity reference &bigBoss; it should replace it with the text 'Alexander the Great'.
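The entity replacement described here can be observed with a small sketch in Python, using the standard xml.etree.ElementTree module. The boss element is an assumption made for this example and is not taken from the personnel DTD. Note that ElementTree is a non-validating parser: it processes the internal subset but does not load personal.dtd.

```python
import xml.etree.ElementTree as ET

# The <boss> element is hypothetical; it only gives the entity reference a home.
doc = """<!DOCTYPE personnel SYSTEM "personal.dtd" [
<!ENTITY bigBoss "Alexander the Great">
]>
<personnel><boss>&bigBoss;</boss></personnel>"""

root = ET.fromstring(doc)

# By the time the application sees the text, the reference has been replaced.
boss_text = root.find("boss").text
print(boss_text)
```

The application never sees the reference itself, only its replacement text, which is why entities declared in the internal subset are a convenient way to supply document-specific values.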
Vocabulary note: The declarations in the file referenced by the SYSTEM identifier (and any files referenced by that file) are referred to as the external subset, while declarations made directly within the document type declaration are referred to as the internal subset. Declarations in the internal subset may override declarations in the external subset when such overrides are permitted. (Only one element type declaration may appear for each element name.)
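One way such overrides work: in XML 1.0, when the same entity is declared more than once, the first declaration encountered is binding, and the internal subset is treated as coming before the external subset. The sketch below, in Python with the standard xml.etree.ElementTree module, shows the first-declaration rule inside a single internal subset. The second replacement value and the boss element are made up for this example.

```python
import xml.etree.ElementTree as ET

# Two declarations of the same entity; the second value is hypothetical
# and stands in for a declaration that would be read later.
doc = """<!DOCTYPE personnel [
<!ENTITY bigBoss "Alexander the Great">
<!ENTITY bigBoss "A later redeclaration">
]>
<personnel><boss>&bigBoss;</boss></personnel>"""

# The parser keeps the first declaration and ignores the second.
boss_text = ET.fromstring(doc).find("boss").text
print(boss_text)
```

An entity declared in the internal subset therefore takes precedence over a declaration of the same name in the external subset, since the internal one is encountered first.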
Warning: The internal subset effectively allows authors to modify the structure of documents. Using the internal subset for anything that describes document structure (like element type declarations, attribute list declarations, or parameter entities that reference these declarations) may prove dangerous when an application encounters a structure it wasn't built to accommodate. When building standards, use the external subset.
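To make the risk concrete, here is a small sketch in Python, again using the standard xml.etree.ElementTree module. The status attribute and its default value are assumptions invented for this example and are not part of the personnel DTD. An attribute list declaration in the internal subset gives the root element an attribute that never appears in the document's markup.

```python
import xml.etree.ElementTree as ET

# The "status" attribute is hypothetical; it is declared only in the
# internal subset, with a default value.
doc = """<!DOCTYPE personnel SYSTEM "personal.dtd" [
<!ATTLIST personnel status CDATA "draft">
]>
<personnel/>"""

root = ET.fromstring(doc)

# The parser supplies the defaulted attribute even though the
# <personnel/> tag itself carries none.
attrs = dict(root.attrib)
print(attrs)
```

An application written against the external DTD alone would have no reason to expect this attribute, which is the kind of surprise the warning above describes.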
Copyright 2000 Extensibility, Inc.
Suite 250, 200 Franklin Street, Chapel Hill, North Carolina 27516