
Content-Oriented, Structure-Oriented, and Presentation-Oriented Schemas


XML gives you immense freedom to name and structure your elements however you like. Although there are fiercely divided, almost religious schools of thought on how to create a 'proper' schema, in practice a few paths are commonly traveled. At one end of the spectrum are schemas based entirely on their content; at the other are schemas based entirely on their presentation. In the middle are schemas based on the structures that unite content and presentation, often borrowing from both ends of the spectrum. Your own approach may follow one of these paths or combine several, but knowing what they are may help you choose an appropriate style for your schema.

Content-Oriented Schemas

Content-oriented schemas come in many varieties, but they are used most heavily in data-centric applications. They typically reflect a model of how the information works in the 'real world', using hierarchies to represent containment relationships. A schema describing the parts of a toy might use an element describing the entire toy as its root element, with the various subassemblies as child elements. Those subassemblies might in turn contain smaller assemblies and parts, all the way down to the smallest wires and springs. A schema describing financial transactions might instead group all the parties to a transaction and the terms of that transaction as child elements under a parent transaction element.
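The toy example can be made concrete with a short sketch. It uses Python's standard `xml.etree.ElementTree` module; the element names (`toy`, `assembly`, `part`) and the `quantity` attribute are hypothetical choices for illustration, not part of any published schema. Because nesting mirrors physical containment, a simple recursive walk can answer questions such as "how many parts does this subassembly contain?"

```python
import xml.etree.ElementTree as ET

# Hypothetical content-oriented document: element names describe the things
# themselves, and nesting mirrors physical containment.
TOY_XML = """<toy name="wind-up robot">
  <assembly name="torso">
    <assembly name="motor">
      <part name="spring" quantity="1"/>
      <part name="gear" quantity="3"/>
    </assembly>
    <part name="shell" quantity="2"/>
  </assembly>
  <assembly name="head">
    <part name="wire" quantity="4"/>
  </assembly>
</toy>"""


def count_parts(element):
    """Sum the quantities of every part contained anywhere below element."""
    total = 0
    for child in element:
        if child.tag == "part":
            total += int(child.get("quantity", "1"))
        else:
            total += count_parts(child)  # descend into a subassembly
    return total


toy = ET.fromstring(TOY_XML)
print(count_parts(toy))                                  # 10
print(count_parts(toy.find("assembly[@name='torso']")))  # 6
```

The same walk works at any level of the hierarchy, which is the practical payoff of letting the markup follow the real-world containment.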

Content-oriented schemas are very useful for working with 'pure' data in situations where the structure of the information itself matters more than any particular presentation of it. They often make useful modules for inclusion in structure-oriented schemas, or sources for transformation into purely presentation-oriented formats, but displaying their contents directly is often difficult for human readers or meaningless to them. Content-oriented schemas do, however, serve very well for transporting data between applications, for representing information with complex internal structures, and for preserving as much information as possible, free of the losses that concessions to readability would introduce.
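The sketch below illustrates both uses under stated assumptions: a hypothetical `transaction` document is read into a plain Python dictionary, as another application might consume it, and then turned into one possible HTML presentation. In practice such transformations are often written in XSLT; plain Python is used here only to keep the example self-contained, and the function names `to_record` and `to_html` are illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical content-oriented transaction: parties and terms are grouped
# as children of a parent <transaction> element.
TRANSACTION_XML = """<transaction id="T-1001">
  <party role="buyer">Acme Toys</party>
  <party role="seller">Springs Ltd.</party>
  <terms>
    <amount currency="USD">1250.00</amount>
    <due>2000-09-30</due>
  </terms>
</transaction>"""


def to_record(root):
    """Read the transaction into a plain dict, e.g. for another application."""
    return {
        "id": root.get("id"),
        "parties": {p.get("role"): p.text for p in root.findall("party")},
        "amount": float(root.findtext("terms/amount")),
        "currency": root.find("terms/amount").get("currency"),
        "due": root.findtext("terms/due"),
    }


def to_html(record):
    """Derive one of many possible presentations from the same content."""
    ul = ET.Element("ul")
    for label, value in [
        ("Buyer", record["parties"]["buyer"]),
        ("Seller", record["parties"]["seller"]),
        ("Amount", f'{record["amount"]:.2f} {record["currency"]}'),
        ("Due", record["due"]),
    ]:
        li = ET.SubElement(ul, "li")
        li.text = f"{label}: {value}"
    return ET.tostring(ul, encoding="unicode")


record = to_record(ET.fromstring(TRANSACTION_XML))
print(record["amount"])  # 1250.0
print(to_html(record))
```

The content-oriented source stays the single authoritative copy; the HTML list is disposable and could be replaced by any other presentation.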

Note: Content-oriented markup is often called 'semantic' markup, though the term is sometimes applied to structure-oriented markup as well. 'Semantic' markup means that the element names provide clues to the meaning of the content, not just to its presentation.
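A small, hypothetical comparison shows why meaningful element names matter. The same catalog entry is marked up once with presentational `b` elements and once with semantic `price` and `shipping` elements (both invented for this sketch); only the semantic version lets a program find the price without guessing.

```python
import xml.etree.ElementTree as ET

# The same catalog entry marked up two ways (both hypothetical).
PRESENTATIONAL = "<p>Wind-up robot, <b>19.95</b>, ships in <b>3</b> days</p>"
SEMANTIC = """<item>
  <name>Wind-up robot</name>
  <price currency="USD">19.95</price>
  <shipping unit="days">3</shipping>
</item>"""


def prices_semantic(xml_text):
    # Element names say what the content means, so the query is direct.
    return [float(p.text) for p in ET.fromstring(xml_text).iter("price")]


def bold_text(xml_text):
    # Presentational names only say how text looks: <b> matches both the
    # price and the shipping time, and nothing tells them apart.
    return [b.text for b in ET.fromstring(xml_text).iter("b")]


print(prices_semantic(SEMANTIC))  # [19.95]
print(bold_text(PRESENTATIONAL))  # ['19.95', '3']
```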

Presentation-Oriented Schemas

A small minority of XML schemas are truly presentation-oriented, focusing squarely on formatting to the total exclusion of content. One of the most important is the XSL Formatting Objects vocabulary (XSL-FO), which describes content purely in terms of presentation needs. Scalable Vector Graphics (SVG), another standard under development at the W3C, defines an XML vocabulary for graphics, though its features for reusing graphic elements tip it somewhat toward the structure-oriented approach. Other presentation-oriented schemas might serve as targets when converting content-oriented documents into the particular formats needed to control devices, read information aloud, or present it in particular contexts.
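To show what a purely presentation-oriented vocabulary looks like, the following sketch generates a minimal XSL-FO document from plain paragraphs. It follows the element and attribute names of the XSL 1.0 Recommendation (`fo:root`, `fo:simple-page-master`, `fo:flow`, `fo:block`, and so on); earlier drafts differed in some details, and the page size and font values are arbitrary assumptions. Nothing in the output says what the text means, only how it should be laid out.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"  # the XSL-FO namespace URI
ET.register_namespace("fo", FO_NS)           # serialize with the fo: prefix


def fo(tag):
    """Qualify a local name with the XSL-FO namespace (ElementTree style)."""
    return f"{{{FO_NS}}}{tag}"


def fo_document(paragraphs):
    """Wrap plain paragraphs in a minimal single-page-master XSL-FO layout."""
    root = ET.Element(fo("root"))
    layout = ET.SubElement(root, fo("layout-master-set"))
    master = ET.SubElement(layout, fo("simple-page-master"), {
        "master-name": "letter", "page-height": "11in", "page-width": "8.5in"})
    ET.SubElement(master, fo("region-body"))
    sequence = ET.SubElement(root, fo("page-sequence"),
                             {"master-reference": "letter"})
    flow = ET.SubElement(sequence, fo("flow"), {"flow-name": "xsl-region-body"})
    for text in paragraphs:
        # Every attribute here is about appearance; none is about meaning.
        block = ET.SubElement(flow, fo("block"), {
            "font-family": "serif", "font-size": "12pt", "space-after": "6pt"})
        block.text = text
    return ET.tostring(root, encoding="unicode")


print(fo_document(["Wind-up robot", "Price: 19.95"]))
```

A formatter that understands XSL-FO could render this output to a printed page, but a program trying to find the price in it would face the same guessing problem as with any presentational markup.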

Structure-Oriented Schemas

Structure-oriented schemas typically reflect some level of presentation information, though not necessarily the details of that presentation. They often begin with a document that holds a particular kind of information and then model that document (an invoice, for instance) rather than the information underlying it (the transaction the invoice describes). Structure-oriented schemas often mix content-oriented information with presentation-oriented information, typically storing content in whatever form is most convenient for presenting it.
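The invoice/transaction distinction can be sketched with a hypothetical structure-oriented invoice whose elements (`heading`, `address`, `table`, `totalline`) follow the parts of the printed page. Rendering it is a matter of walking the elements in document order, but reusing the data means digging the amount back out of display text. The element names and the string parsing in `total_amount` are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical structure-oriented invoice: elements follow the parts of the
# printed document (heading, address block, table, total line).
INVOICE_XML = """<invoice>
  <heading>Invoice No. 1001</heading>
  <address>Acme Toys, 12 Main St.</address>
  <table>
    <row><cell>Springs (100)</cell><cell>$50.00</cell></row>
    <row><cell>Gears (300)</cell><cell>$1,200.00</cell></row>
  </table>
  <totalline>Total due: $1,250.00</totalline>
</invoice>"""


def render_text(root):
    """Presentation is easy: emit each part in document order."""
    lines = []
    for child in root:
        if child.tag == "table":
            for row in child.iter("row"):
                lines.append("  ".join(cell.text for cell in row.iter("cell")))
        else:
            lines.append(child.text)
    return "\n".join(lines)


def total_amount(root):
    """Reuse is harder: the amount is buried in text formatted for display."""
    text = root.findtext("totalline")  # e.g. "Total due: $1,250.00"
    return float(text.rsplit("$", 1)[1].replace(",", ""))


invoice = ET.fromstring(INVOICE_XML)
print(render_text(invoice))
print(total_amount(invoice))  # 1250.0
```

`total_amount` breaks as soon as the display wording or currency format changes, which is the cost of storing content in a form chosen for presentation.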

HTML (HyperText Markup Language) is the best-known structure-oriented schema in use today. Although HTML has occasionally dipped into purely presentational markup (elements like FONT and attributes like SIZE), the W3C seems to be steering HTML back toward structural markup: identifying paragraphs with P and headings with H1, H2, and so forth, and encouraging authors to use style sheet positioning rather than tables to create their page layouts.
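Structural markup is what lets software recover a page's organization. As a rough sketch, the parser below uses Python's standard `html.parser.HTMLParser` to build an outline from the H1 through H6 elements of a small sample page (the page and the class name `OutlineParser` are hypothetical); the presentational FONT element contributes nothing to the outline.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collect [level, text] pairs for H1-H6 elements, in document order."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.outline = []
        self._level = None  # level of the heading currently open, if any

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase.
        if tag in self.HEADINGS:
            self._level = int(tag[1])
            self.outline.append([self._level, ""])

    def handle_endtag(self, tag):
        if tag in self.HEADINGS:
            self._level = None

    def handle_data(self, data):
        if self._level is not None:
            self.outline[-1][1] += data


PAGE = """<html><body>
<H1>Schemas</H1>
<p>Intro text.</p>
<h2>Content-Oriented</h2>
<p><FONT SIZE="4">Styled</FONT> text.</p>
<h2>Structure-Oriented</h2>
</body></html>"""

parser = OutlineParser()
parser.feed(PAGE)
parser.close()
for level, text in parser.outline:
    print("  " * (level - 1) + text)
```

Had the sample page marked its headings with FONT SIZE instead of H1 and H2, the parser would have no reliable way to tell a heading from emphasized body text.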

Copyright 2000 Extensibility, Inc.

Suite 250, 200 Franklin Street, Chapel Hill, North Carolina 27516