XML provides an extraordinarily flexible set of structures that can hold many different types of information, from highly structured database tables and lists to much more free-form documents. The tightly defined requirements for XML documents ensure that all applications play by the same rules when they read (technically, parse) and write documents. The XML 1.0 specification leaves very little room for conflicting syntactical interpretations of the same XML document, allowing the exchange of documents across a wide variety of platforms, applications, and development environments. More recent schema developments provide even more powerful tools for describing information and simplifying information exchange.
Reading documents the same way every time, even in different environments, opens incredible new possibilities for networked information sharing, but it only goes part of the way. Using XML's tight syntax effectively requires common vocabularies that allow programs and people to understand the meaning of documents, not simply parse them into elements and attributes. A formal description of the vocabulary and structures used in a document allows all parties to the transaction to share a common understanding of the document contents. XML includes tools, called schemas, that allow developers and authors to create common structures that can help provide meaning as well as syntax. Schemas can help developers build networks of understanding, acting as contracts between all parties. Taking data modeling seriously and developing clean schemas will help you understand your documents better, producing cleaner and more reliable structures that can help you communicate more freely.
A schema describes the vocabulary and structures that may appear within a document conforming to that schema. Schemas use their own formal grammars to express document structures and vocabulary. If a set of documents uses the same schema, the documents may have markedly different contents, but can share common processing. A schema for invoices, for example, would describe a class of documents that have very different contents (sender, recipient, rates and prices, services and goods, and, of course, total) but which have the same basic structure and can be processed by generic tools for handling invoices. Applications check documents against the schema, and process them only if the document passes inspection (more commonly called validation). This way, applications don't need to provide extensive error-handling or implement complex logic for determining the structure of a variety of different invoice formats. The schema allows applications to coordinate their activities safely and relatively easily.
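The validate-then-process workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element names (invoice, sender, recipient, items, total), the INVOICE_CHILDREN list, and both function names are hypothetical, and the hand-written check stands in for a real schema processor, which would enforce far richer rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for an invoice schema: the child elements that
# <invoice> must contain, in the order they must appear.
INVOICE_CHILDREN = ["sender", "recipient", "items", "total"]


def validate_invoice(xml_text):
    """Return a list of problems; an empty list means the document passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    if root.tag != "invoice":
        return [f"expected <invoice> root, found <{root.tag}>"]
    found = [child.tag for child in root]
    if found != INVOICE_CHILDREN:
        return [f"expected children {INVOICE_CHILDREN}, found {found}"]
    return []


def process_invoice(xml_text):
    """Process a document only if it passes inspection first."""
    problems = validate_invoice(xml_text)
    if problems:
        # The problems could also be reported back to the sender.
        raise ValueError("; ".join(problems))
    # Past this point the structure is known, so no defensive checks are needed.
    return ET.fromstring(xml_text).findtext("total")


GOOD = ("<invoice><sender>Acme</sender><recipient>Bolt</recipient>"
        "<items/><total>42.00</total></invoice>")
BAD = "<invoice><sender>Acme</sender><total>42.00</total></invoice>"
```

Because process_invoice refuses anything that fails validation, the code after the check can assume the structure it expects. That is the simplification in error handling that the paragraph above attributes to schemas.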
Schemas provide constraints that documents must meet to be considered 'valid' and therefore safely processable. Those constraints can be used in a number of different applications, because schemas provide a formal vocabulary that can be processed and repurposed. Editing software can read a schema and use it to provide support to document authors, presenting them with acceptable choices along the way or perhaps even building their entire interface (like an entry form) around the contents of the schema. Applications that exchange documents can use the schema to double-check each other's work and make certain that all applications participating in the exchange are playing by the same set of rules. When errors appear, the receiving application can report them back to the sender, and hopefully have them corrected. Schemas provide an extra level of safety net above the core XML document structure, making it much simpler to exchange information reliably.
Another key task schemas help manage is the integration of documents and data. XML provides a framework in which both document and data structures can co-exist. Data may dominate a document (as when a document represents a relational database table) or appear as fragments scattered among document structures. When data and documents from multiple sources must merge into a single document, schemas can smooth the process by making sure the inputs are what they claim to be and that the output is delivered as it should be. Schemas also help programs separate the parts of a document they need from the parts they can ignore, helping search engines, data mining tools, and agents. XML documents can use schemas to identify themselves, advertising that they contain information of a particular type, and programs can use that identity and the structural information it contains to inspect documents and extract (or create) the right information. Whether an organization is using XML to facilitate electronic commerce, manage its documents, present a Web site for public consumption, or exchange information among internal processes, schemas provide a framework that ensures the safe transmission of information using common vocabularies and protocols.
In many cases, describing the document structure - the vocabulary that identifies different parts of a document, and how those parts all fit together - is all that's needed. XML provides a fairly complete set of tools for describing the parts of a document (elements), annotating those parts (attributes), and constraining the parts that can appear within the elements and attributes (content models and attribute types). XML inherits SGML's strong document management heritage, providing a solid set of tools for describing documents of many kinds: memos, annual reports, survey responses, poetry, recipes, Web pages, scripts, books, and magazines. The enormous amount of information that lives outside databases, and that has so far resisted systematic storage, has finally found a home. XML promises to store this kind of information, providing structures that are firm enough to manage and exchange the information but flexible enough to allow authors to create documents that reflect their needs.
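A content model constrains which child elements may appear inside an element, and in what order. One rough way to picture this is as a regular expression over the sequence of child element names. The sketch below assumes a hypothetical memo vocabulary (to, from, subject, para) and a hypothetical helper named matches_model. It is not how any particular XML processor implements content models.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content model for <memo>, in the style of a DTD declaration:
#   (to+, from, subject?, para*)
# Each name is followed by a space so that adjacent names cannot run together.
MEMO_MODEL = re.compile(r"(to )+(from )(subject )?(para )*")


def matches_model(element, model):
    """Check the sequence of child element names against a content model."""
    names = "".join(child.tag + " " for child in element)
    return model.fullmatch(names) is not None


ok = ET.fromstring("<memo><to/><to/><from/><para/><para/></memo>")
bad = ET.fromstring("<memo><from/><to/></memo>")

print(matches_model(ok, MEMO_MODEL))
print(matches_model(bad, MEMO_MODEL))
```

The first memo satisfies the model. The second does not, because its from element appears before any to element.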
Schemas play a key role in this process, describing which features are allowed to appear in particular document types and which are not. This framework can then provide a solid foundation for presentation, storage, and interchange, assisting with document creation and editing along the way. Schemas give document authors and programmers a common set of expectations, making it possible to create smarter applications that support document authors more completely. Once a document has been created, the schema can provide a roadmap to the document that other applications - like search engines, document management systems, or presentation tools - can use to help them find and manage the information inside the document.
While the structures described above are very useful, many applications need to know much more about the information in a document than what goes where and what it's called. Data-oriented applications also want to know whether the content inside an element is an integer, a string, a database key value, a currency value, a date, a boolean (true or false) value, or any of a number of other possibilities. Data-oriented schemas provide an extra layer of information that lets an application hand more of this work to a validation component: identifying the type of each piece of information and verifying that the content conforms to the rules of that type. By adding data typing to the document structures described above, schemas become useful for a much broader set of applications. Document structures remain important, but now it becomes possible to define documents like invoices, with their dates, quantities, and currencies, both abstractly and completely. Purely data-focused applications, like database interchange, gain an extra level of processing security that is far more meaningful in this context.
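To make the idea of data typing concrete, the following sketch attaches a type to each element name and lets one component check the content of every element. The element names (date, quantity, total, paid), the INVOICE_TYPES table, and the helper functions are all hypothetical. Python's built-in conversions are also more lenient than a real schema datatype would be, so treat this as an illustration of the division of labor rather than a faithful validator.

```python
import xml.etree.ElementTree as ET
from datetime import date
from decimal import Decimal, InvalidOperation


def parse_boolean(text):
    """Accept only the literal strings 'true' and 'false'."""
    if text == "true":
        return True
    if text == "false":
        return False
    raise ValueError(f"not a boolean: {text!r}")


# Hypothetical type assignments for the children of an <invoice> element.
INVOICE_TYPES = {
    "date": date.fromisoformat,   # a date such as 2000-03-15
    "quantity": int,              # an integer
    "total": Decimal,             # a currency amount
    "paid": parse_boolean,        # true or false
}


def check_types(element, types):
    """Return (values, errors): converted values plus a list of type errors."""
    values, errors = {}, []
    for name, convert in types.items():
        text = element.findtext(name)
        if text is None:
            errors.append(f"<{name}> is missing")
            continue
        try:
            values[name] = convert(text.strip())
        except (ValueError, InvalidOperation):
            errors.append(f"<{name}> has invalid value {text!r}")
    return values, errors
```

An application that calls check_types receives values that are already dates, integers, and decimals, or else a list of errors it can report. It does not have to inspect the raw strings itself.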
XML is an opportunity to start treating documents and data as partners rather than as separate worlds. XML schemas are a key tool for realizing this promise, creating structures that clearly identify document structure and data types within that structure. XML schemas can provide a map to your documents, letting authors mix documents and data freely without ever locking the information into a document format that can't later be used for data extraction and processing. Data can be data, documents can be documents, and the two can be integrated and intertwined without losing information. While XML has great promise in both document and data processing, its unique ability to bring the two together makes it far more powerful.
Providing constraints and supporting structured editing is useful, but the implications of schemas go far beyond their use with specific document sets. Schemas provide a common formal vocabulary for describing the terms on which information will be exchanged, an easily enforced contract between senders and receivers (and creators and consumers) of information. While not every application is going to check to make sure that the contract is in force - doing so requires potentially precious processing time - having a common set of rules makes it much easier to create communities that share common sets of information. By providing a formal set of structures (often customized to meet particular needs), schemas allow both large and small communities to share information reliably. Additionally, schemas provide a necessary set of tools for maintaining order in a world where many different applications from different organizations may be creating, managing, and processing the same files.
Instead of relying on a single vendor's interpretation of a document structure, which is often a proprietary creation described only by an implementation, organizations working with XML can use an abstract description of that document structure to build their own custom applications around the structure. Changes to the schema, to be effective, must be public, creating new opportunities for discussion of improvements and leveling the playing field for all participants. Building infrastructure based on a formal and readily available schema is much less risky than building on (and sometimes reverse-engineering) the proprietary structures that presently dominate much of computing. Because schemas provide a clear formal means of describing constraints and structures, many open standards may also migrate to XML, taking advantage of its easy processing and formal validation process. Vocabulary creation and management can be greatly simplified when formal tools can take over much of the work involved.
Copyright 2000 Extensibility, Inc.
Suite 250, 200 Franklin Street, Chapel Hill, North Carolina 27516