Text File | 1997-10-05 | 53.0 KB | 1,243 lines |
- Frequently Asked Questions about the Extensible Markup Language
-
- The XML FAQ
-
- Version 1.1 (1 October 1997)
-
- Maintained on behalf of the World Wide Web Consortium's XML
- Special Interest Group by Peter Flynn, (Silmaril Consultants),
- with the collaboration of Terry Allen, (), Tom Borgman,
- (Harlequin Ltd), Tim Bray, (Textuality, Inc), Robin Cover,
- (Summer Institute of Linguistics), Christopher Maden, (O'Reilly &
- Associates), Eve Maler, (Arbortext, Inc), Peter Murray-Rust,
- (Nottingham University), Liam Quin, (), Michael Sperberg-McQueen,
- (University of Illinois at Chicago), Joel Weber, (MIT), Murata,
- Makoto (Fuji Xerox Japan), and many other members of the XML
- Special Interest Group of the W3C as well as FAQ readers around
- the world. Please use the form at the end for any corrections or
- additions.
-
- Recent changes
-
- 1 October 1997
-
- * No more minimization parameters in element declarations
- * Parsers must now pass all white-space to the application
- * Everything is now case-sensitive, including all markup
- * A new proposal for stylesheets: XSL, which combines DSSSL
- and CSS in an XML format
- * Java[Script] and metadata and their use in XML
- * Updated list of software
- * First XML book is published
- * New public mailing list XML-L
-
- Paragraphs which have been added since the last version are shown
- prefixed with a pilcrow (╢). Paragraphs which have been changed
- since the last version are shown prefixed with a section sign
- (º). Paragraphs marked for future deletion but retained at the
- moment for information are prefixed with a plus/minus sign (▒).
-
- Summary
-
- º This document contains the most frequently-asked questions
- (with answers) about XML, the Extensible Markup Language. It is
- intended as a first resource for users, developers, and the
- interested reader, and should not be regarded as a part of the
- XML Draft Specification.
-
- Organization
-
- The FAQ is divided into four parts: a) General, b) User, c)
- Author, and d) Developer. The questions are numbered
- independently within each section. As the numbering may therefore
- change with each version, comments and suggestions should refer
- to the version number (see Recent changes above) as well as the
- Part and Question Number.
-
- There is a form at the end of this document which you can use to
- submit bug reports, suggestions for improvement, and other
- comments relating to this FAQ only. Comments about the XML Draft
- Specification itself should be sent to the W3C.
-
- Availability
-
- The SGML file for use with any conforming SGML system is
- available at http://www.ucc.ie/xml/faq.sgml (this can also be
- used online with SGML browsers like Panorama or Multidoc Pro; you
- can also download the DTD and stylesheet installation
- self-extractor for faster local access with these browsers, or
- the DTD set as ASCII files).
-
- The same text is available in an HTML version for use with an
- HTML browser (eg Netscape Navigator, Microsoft Internet Explorer,
- Spry Mosaic, NCSA Mosaic, Lynx, Opera, GNUscape Navigator etc) at
- http://www.ucc.ie/xml/.
-
- ╢ An XML version will be produced once the specification has been
- agreed and when DTDs and browsers are available to handle it.
-
- A plaintext (ASCII) version is available from the Web and
- (eventually) by anonymous FTP to one of several FAQ repositories.
- The versions above are also available by electronic mail to the
- WebMail server (for users with email-only access).
-
- º For printed copies there are PostScript(TM) versions for A4 and
- Letter sizes of paper.
-
- The document is also available in oil-based toner on flattened
- dead trees by sending $10 (or equivalent) to the editor (email
- first to check currency and postal address).
-
- º Thanks to Murata Makoto for making this document available in
- Japanese: see
- http://www.iijnet.or.jp/FXIS/XSoft/sgml/xml/XMLFAQ.htm
-
- You can download the XML logo and an icon for your files in ICO
- (Microsoft Windows) or XBM (X Window system) format (volunteer
- wanted to do a Mac icon).
-
- ---------------------------------------------------------------------------
-
- The Questions
-
- A. General questions
-
- A.1 What is XML?
-
- A.2 What is XML for?
-
- A.3 What is SGML?
-
- A.4 What is HTML?
-
- A.5 Aren't XML, SGML, and HTML all the same thing?
-
- A.6 Who is responsible for XML?
-
- A.7 Why is XML such an important development?
-
- A.8 How does XML make SGML simpler and still let you define
- your own document types?
-
- A.9 Why not just carry on extending HTML?
-
- A.10 Why do we need all this SGML stuff? Why not just use Word
- or Notes?
-
- A.11 Where do I find more information about XML?
-
- A.12 Where can I discuss implementation and development of XML?
-
- B. Users of SGML (including browsers of HTML)
-
- B.1 Do I have to do anything to use XML?
-
- B.2 Why should I use XML instead of HTML?
-
- B.3 Where can I get an XML browser?
-
- B.4 Do I have to switch from SGML or HTML to XML?
-
- C. Authors of SGML (including writers of HTML)
-
- C.1 Does XML replace HTML?
-
- C.2 What does an XML document look like inside?
-
- C.3 How does XML handle white-space in my documents?
-
- C.4 Which parts of an XML document are case-sensitive?
-
- C.5 How can I make my existing HTML files work in XML?
-
- C.6 If XML is just a subset of SGML, can I use XML files
- directly with SGML tools?
-
- C.7 I'm used to authoring and serving HTML. Can I learn XML
- easily?
-
- C.8 Will XML be able to use non-Latin characters?
-
- C.9 What's a Document Type Definition (DTD) and where do I get
- one?
-
- C.10 How will XML affect my document links?
-
- C.11 Can I do mathematics using XML?
-
- C.12 How does XML handle metadata?
-
- C.13 Can I use Java, ActiveX, etc in XML?
-
- C.14 How do I control appearance?
-
- D. Developers and Implementors (including WebMasters and server operators)
-
- D.1 Where's the spec?
-
- D.2 What are these terms `DTDless', `valid', and `well-formed'?
-
- D.2.1 `Well-formed' documents
-
- D.2.2 Valid XML
-
- D.3 What else has changed between SGML and XML?
-
- D.4 What XML software can I use today?
-
- D.5 Do I have to change any of my server software to work with
- XML?
-
- D.6 Can I still use server-side INCLUDEs?
-
- D.7 Can I (and my authors) still use client-side INCLUDEs?
-
- D.8 I'm trying to understand the XML Spec: why does SGML (and
- XML) have such difficult terminology?
-
- D.9 Is there a Developer's API kit for XML?
-
- ---------------------------------------------------------------------------
-
- The Answers
-
- ---------------------------------------------------------------------------
-
- A. General questions
-
- A.1 What is XML?
-
- XML is the `Extensible Markup Language' (extensible because it is not a
- fixed format like HTML). It is designed to enable the use of SGML on the
- World-Wide Web.
-
- º It's actually slightly misnamed: XML itself is not a single markup
- language: it's a metalanguage to let you design your own markup language. A
- regular markup language defines a way to describe information in a certain
- class of documents (eg HTML). XML lets you define your own customized
- markup languages for many classes of document. It can do this because it's
- written in SGML, the international standard metalanguage for markup languages.
-
- A.2 What is XML for?
-
- º XML is designed `to make it easy and straightforward to use SGML on the
- Web: easy to define document types, easy to author and manage SGML-defined
- documents, and easy to transmit and share them across the Web.'
-
- It defines `an extremely simple dialect of SGML which is completely
- described in the Draft XML Specification. The goal is to enable generic
- SGML to be served, received, and processed on the Web in the way that is
- now possible with HTML.'
-
- `For this reason, XML has been designed for ease of implementation, and for
- interoperability with both SGML and HTML' [quotes from the XML spec].
-
- A.3 What is SGML?
-
- º SGML is the Standard Generalized Markup Language (ISO 8879), the
- international standard for defining descriptions of the structure and
- content of different types of electronic document. There is an SGML FAQ at
- http://www.infosys.utas.edu.au/info/sgmlfaq.txt.
-
- A.4 What is HTML?
-
- º HTML is the HyperText Markup Language (RFC 1866), a specific application
- of SGML used in the World-Wide Web.
-
- A.5 Aren't XML, SGML, and HTML all the same thing?
-
- º Not quite. SGML is the `mother tongue', used for describing thousands of
- different document types in many fields of human activity, from
- transcriptions of ancient Sumerian scrolls to the technical documentation
- for stealth bombers, and from patients' clinical records to musical
- notation.
-
- º HTML is just one of these document types, the one most frequently used in
- the Web. It defines a single, fixed type of document with markup that lets
- you describe a common class of simple office-style report, with headings,
- paragraphs, lists, illustrations, etc, and some provision for hypertext and
- multimedia.
-
- XML is an abbreviated version of SGML, to make it easier for you to define
- your own document types, and to make it easier for programmers to write
- programs to handle them. It omits the more complex and less-used parts of
- SGML in return for the benefits of being easier to write applications for,
- easier to understand, and more suited to delivery and interoperability over
- the Web. But it is still SGML, and XML files may still be parsed and
- validated the same as any other SGML file (see the question on XML
- software).
-
- Programmers may find it useful to think of XML as being SGML-- rather than
- HTML++.
-
- A.6 Who is responsible for XML?
-
- º XML is a project of the World-Wide Web Consortium (W3C), and the
- development of the specification is being supervised by their XML Working
- Group. A Special Interest Group of co-opted contributors and experts from
- various fields contributes comments and reviews by email.
-
- XML is a public format: it is not a proprietary development of any company.
-
- A.7 Why is XML such an important development?
-
- º It removes two constraints which are holding back Web developments:
-
- 1. dependence on a single, inflexible document type (HTML);
-
- 2. the complexity of full SGML, whose syntax allows many powerful but
- hard-to-program options.
-
- XML simplifies the levels of optionality in SGML, and allows the
- development of user-defined document types on the Web.
-
- A.8 How does XML make SGML simpler and still let you define your own
- document types?
-
- º To make SGML simpler, XML redefines some of SGML's internal values and
- parameters, and removes a large number of the more complex and sometimes
- less-used features which made it harder to write processing programs (see
- Appendix A of the XML specification).
-
- º But it retains all of SGML's structural abilities which let you define
- your own document type. It also introduces a new class of document which
- does not require you to use a predefined document type. See the questions
- about `valid' and `well-formed' documents, and how to define your own
- document types in the Developers' Section.
-
- A.9 Why not just carry on extending HTML?
-
- HTML is already overburdened with dozens of interesting but often
- incompatible inventions from different manufacturers, because it provides
- only one way of describing your information.
-
- XML will allow groups of people or organizations to create their own
- customized markup languages for exchanging information in their domain
- (music, chemistry, electronics, hill-walking, finance, surfing,
- linguistics, knitting, history, engineering, rabbit-keeping etc).
-
- HTML is at the limit of its usefulness as a way of describing information,
- and while it will continue to play an important role for the content it
- currently represents, many new applications require a more robust and
- flexible infrastructure.
-
- A.10 Why do we need all this SGML stuff? Why not just use Word or Notes?
-
- Information on a network which connects many different types of computer
- has to be usable on all of them. Public information cannot afford to be
- restricted to one make or model or manufacturer, or to cede control of its
- data format to private hands. It is also helpful for such information to be
- in a form that can be reused in many different ways, as this can minimize
- wasted time and effort.
-
- º SGML is the international standard which is used for defining this kind
- of application, but those who need an alternative based on different
- software are entirely free to implement similar services using such a
- system, especially if they are for private use.
-
- A.11 Where do I find more information about XML?
-
- Online, there's the XML Draft Specification and ancillary documentation
- available from the W3C; an XML section with an extensive list of online
- reference material in Robin Cover's SGML pages; and a summary and condensed
- FAQ from Tim Bray.
-
- ╢ The items listed below are the ones the maintainer has been able to
- discover: please mail me if you come across others. Old items are retained
- here for reference at the moment; they will eventually expire.
-
- * ╢ Eve Maler is giving a one-day tutorial (working title `XML for the
- SGML-Knowledgeable') at the GCA offices on 14 November 1997. This will
- also be given at SGML/XML '97 (see below).
-
- * º Technology Appraisals Ltd are holding a seminar in London, England,
- entitled `XML ready for prime time?' on 24-25 November 1997. Details from
- Susan Dennington at TAL.
-
- * Peter Murray-Rust is preparing an XML/Java Virtual Course entitled
- `Scientific Information Components using Java and XML'. Details are at
- http://www.vsms.nottingham.ac.uk/vsms/java/advert/advert.txt. The XML
- will be very low-level (ie well-formed only, balanced tags, and quoted
- attributes; no DTDs, entities, marked sections, catalogs, links, etc.)
- It concentrates on building element trees (including those from legacy
- files).
-
- * º The annual SGML Conference run by the Graphic Communications
- Association has been renamed the SGML/XML Conference. SGML/XML '97
- will be held in Washington DC, 8-11 December 1997 (further details on
- the GCA's Web site).
-
- º There is a list of articles on XML which have appeared in the computing
- press: details are being kept in Robin Cover's SGML pages.
-
- º The first XML books are starting to appear:
-
- Light, Richard: Presenting XML. Sams.Net, 1-57521-334-6, 414pp,
- http://www.mcp.com/info/1-57521/1-57521-334-6/
- With contributions from Simon North and Charles Allen, and a
- foreword from Tim Bray
-
- A.12 Where can I discuss implementation and development of XML?
-
- There is a mailing list called xml-dev for those committed to developing
- components for XML. You can subscribe by sending a 1-line mail message to
- majordomo@ic.ac.uk saying:
-
- subscribe xml-dev yourname@yoursite
-
- The list is hypermailed for online reference at
- http://www.lists.ic.ac.uk/hypermail/xml-dev/.
-
- Note that this list is for those people actively involved in developing
- resources for XML. It is not for general information about XML (see this
- FAQ and other sources) or for general discussion about SGML implementation
- and resources (see comp.text.sgml).
-
- There is a general-purpose mailing list XML-L for public discussions: to
- subscribe, send a 1-line mail message to LISTSERV@listserv.hea.ie saying
-
- subscribe XML-L forename surname
-
- (substituting your own forename and surname). To unsubscribe, send a 1-line
- message to the same address saying
-
- unsubscribe XML-L
-
- Please Read The Fine Documentation which you will be sent when you join
- either mailing list, as it contains important information, particularly
- about what to do when your email address changes.
-
- ---------------------------------------------------------------------------
-
- B. Users of SGML (including browsers of HTML)
-
- B.1 Do I have to do anything to use XML?
-
- º Not yet. XML is still being developed, but there are already some pilot
- browsers, so you can experiment with them. When the specification is
- complete, more software should start to appear, and you may be able to
- download browsers and use them to browse the Web much as you do with
- current applications.
-
- º You can use the pilot browsers to look at some of the emerging XML
- material, such as Jon Bosak's Shakespeare plays and the molecular
- experiments of the Chemical Markup Language (CML). There are some more
- example sources listed at http://www.sil.org/sgml/xml.html#examples.
-
- If you want to start preparations for writing your own XML, see the
- questions in the Authors' Section.
-
- B.2 Why should I use XML instead of HTML?
-
- * Authors and providers can design their own document types using XML,
- instead of being stuck with HTML. Document types can be explicitly
- tailored to an audience, so the cumbersome fudging that has to take
- place with HTML to achieve special effects should become a thing of
- the past: authors and designers will be free to invent their own
- markup elements;
-
- * Information content can be richer and easier to use, because the
- hypertext linking abilities of XML are much greater than those of
- HTML.
-
- * XML can provide more and better facilities for browser presentation
- and performance;
-
- * It removes many of the underlying complexities of SGML in favor of a
- more flexible model, so writing programs to handle XML will be much
- easier than doing the same for full SGML.
-
- * Information will be more accessible and reusable, because the more
- flexible markup of XML can be used by any XML software instead of
- being restricted to specific manufacturers as has become the case with
- HTML.
-
- * Valid XML files are kosher SGML, so they can be used outside the Web
- as well, in an SGML environment (once the spec is stable and SGML
- software adopts it).
-
- B.3 Where can I get an XML browser?
-
- There are already some browsers emerging (see below), but the XML
- specification is still under development. As with HTML, there won't be just
- one browser, but many. However, because the potential number of different
- XML applications is not limited, no single browser should be expected to
- handle 100% of everything.
-
- The generic parts of XML (eg parsing, tree management, searching,
- formatting, etc) are being combined into general-purpose browser libraries
- or toolkits to make it easier for developers to take a consistent line when
- writing XML applications. Such applications could then be customized by
- adding semantics for specific markets, or using languages like Java to
- develop plugins for generic browsers and have the specialist modules
- delivered transparently over the Web.
-
- Netscape and Microsoft are both now developing XML facilities: some
- development work at Microsoft can be seen at
- http://www.microsoft.com/msdn/sdk/inetsdk/help/inet5017.htm.
-
- * JUMBO is a prototype GUI browser/editor/search/rendering tool for the
- output of XML parsers, developed as part of the project to produce
- CML. It displays the abstract document tree which can be queried and
- edited in limited fashion. Java classes can be dynamically loaded for
- the current DTD and allow complex transformation and rendering. The
- emphasis is on the import of legacy files into structured documents,
- and the management of non-textual data, including common data
- structures (trees, tables, lists, etc). Currently JUMBO parses a
- subset of XML files (ie only elements and their attributes) and will
- be grafted onto other parsers as soon as possible. The software and a
- wide range of XML demo files, including Jon Bosak's PLAY, can be
- downloaded for any Java-enabled browser from
- http://www.venus.co.uk/omf/cml/
-
- * The DynaWeb server from Inso Corporation (formerly EBT) can serve
- other forms of SGML translated on-the-fly to XML (demonstrated at the
- GCA's XML Conference in San Diego, March 1997). Sun Microsystems are
- currently serving XML using this software on an experimental basis
- (message to the xml-dev mailing list dated Mon, 17 Mar 1997 16:49:42
- -0800 from Jon Bosak).
-
- * Microsoft are defining their proposed Channel Definition Format (CDF)
- as an application of XML, but no DTD is available yet. Their Open
- Software Description (OSD; endorsed by CyberMedia, InstallShield,
- LANovation, Lotus, and Netscape) is also proposed as an XML
- application. For further information on these formats, contact
- Microsoft.
-
- See also the notes on software for authors and developers, and the more
- detailed list at http://www.sil.org/sgml/xml.html.
-
- B.4 Do I have to switch from SGML or HTML to XML?
-
- No, existing SGML and HTML applications software will continue to work with
- existing files. But as with any enhanced facility, if you want to view or
- download and use XML files, you will need to add XML-aware software when it
- becomes available.
-
- ---------------------------------------------------------------------------
-
- C. Authors of SGML (including writers of HTML)
-
- Authors should also read the Developers' Section, which contains
- further information about the internals of XML files.
-
- C.1 Does XML replace HTML?
-
- º No, XML itself does not replace HTML: instead, it provides an alternative
- by allowing you to define your own set of markup elements. HTML is expected
- to remain in common use for some time to come, and DTDs will be available
- in XML versions as well as the original SGML versions. XML is designed to
- make the writing of DTDs much simpler than with full SGML.
-
- º Work is going on to produce XML versions of HTML and other popular DTDs,
- but this may not take off until the specification for XML 1.0 is complete
- (targeted November 1997). Watch comp.text.sgml and XML-L for announcements.
-
- C.2 What does an XML document look like inside?
-
- The basic structure is very similar to most other applications of SGML,
- including HTML. XML documents can be very simple, with no document type
- declaration, and straightforward nested markup of your own design:
-
- <?XML version="1.0" RMD="NONE"?>
- <conversation>
- <greeting>Hello, world!</greeting>
- <response>Stop the planet, I want to get off!</response>
- </conversation>
-
- Or they can be more complicated, with a DTD specified, and maybe an
- internal subset, and a more complex structure:
-
- <?XML version="1.0" RMD="ALL" encoding="UTF-8"?>
- <!DOCTYPE titlepage SYSTEM "typo.dtd"
- [<!ENTITY % active.links "INCLUDE">]>
- <titlepage>
- <white-space type="vertical" amount="36"/>
- <title font="Baskerville" size="24/30"
- alignment="centered">Hello, world!</title>
- <white-space type="vertical" amount="12"/>
- <!-- In some copies the following decoration is
- hand-colored, presumably by the author -->
- <image location="http://www.foo.bar/fleuron.eps" type="URL" alignment="centered"/>
- <white-space type="vertical" amount="24"/>
- <author font="Baskerville" size="18/22" style="italic">Munde Salutem</author>
- </titlepage>
-
- Or they can be anywhere between: a lot will depend on how you want to
- define your document type (or whose you use) and what it will be used for.
- See the question on valid and well-formed files.
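-
- As a rough illustration only (it is not part of the XML Draft
- Specification), the first example above can be read by a present-day
- parser. The sketch below assumes Python and its standard
- xml.etree.ElementTree module. The draft-style <?XML ... RMD="NONE"?>
- declaration is left out, on the assumption that parsers written for the
- final XML 1.0 syntax will not accept it.

```python
import xml.etree.ElementTree as ET

# The simple DTDless example from above, minus the draft-style declaration
# (an assumption: parsers for the final XML 1.0 syntax would reject it).
DOCUMENT = """<conversation>
<greeting>Hello, world!</greeting>
<response>Stop the planet, I want to get off!</response>
</conversation>"""

root = ET.fromstring(DOCUMENT)

# Walk the element tree: the root element, then each child with its text.
child_names = [child.tag for child in root]
for child in root:
    print(f"{root.tag}/{child.tag}: {child.text}")
```

- No DTD is involved here: the parser only checks that the markup nests
- correctly, which is all a well-formed file requires.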
-
- C.3 How does XML handle white-space in my documents?
-
- Note
-
- This section contains a major change from the previous version.
-
- º The SGML rules regarding white-space have been changed for XML, so all
- white-space, including linebreaks, TAB characters, and regular spaces, is
- passed by the parser unchanged to the application (browser, formatter,
- viewer, etc). This means:
-
- * `insignificant' white-space between structural elements (those which
- can contain only other elements, not text data) will now get passed
- to the application (under `full' SGML this white-space is suppressed);
-
- * `significant' white-space within elements which can contain text and
- markup mixed together (ie paragraphs) will also get passed to the
- application as before.
-
- <chapter>
- <section>
- <title>
- My title for Section
- 1.
- </title>
- <para>
- ...
- </para>
- </section>
- </chapter>
-
- º The parser must, however, still inform the application what white-space
- occurred in element content, if known. (Users of `full' SGML may recognize
- that this information was not in the ESIS, but it is in the grove.) In the
- above example, the application will receive all the pretty-printing
- linebreaks, TABs, and spaces between the elements as well as those embedded
- in the section title. It is the function of the application (browser,
- formatter, viewer, etc) to decide which type of white-space to discard and
- which to retain.
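-
- The division of labor described above can be sketched as follows. This
- is a minimal example, assuming Python's standard xml.etree.ElementTree
- module as the parser and a made-up `application' that collapses runs of
- white-space in titles. The indentation in the sample document is our own
- layout, not something the specification prescribes.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<chapter>
  <section>
    <title>
      My title for Section
      1.
    </title>
  </section>
</chapter>"""

chapter = ET.fromstring(DOCUMENT)
section = chapter.find("section")
title = section.find("title")

# White-space between structural elements reaches the application intact:
# ElementTree exposes it as the parent's .text and each child's .tail.
between_elements = chapter.text

# White-space inside the title is passed through as well.
raw_title = title.text

# Deciding what to keep is the application's job; this one collapses runs.
normalized_title = " ".join(raw_title.split())
print(repr(between_elements), repr(raw_title), normalized_title)
```

- A different application (a pretty-printer, say) might make the opposite
- choice and keep every linebreak; the parser does not decide for it.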
-
- C.4 Which parts of an XML document are case-sensitive?
-
- Note
-
- This section contains a major change from the previous version.
-
- º All of an XML file is case-sensitive, both markup and text. This is
- significantly different from HTML and many other SGML document types.
-
- * Element names (used in start-tags and end-tags) are case-sensitive:
- you must stick with whatever combination of upper- or lower-case was
- used to define them in the DTD (for valid files);
-
- * For well-formed files with no DTD, the first occurrence of an element
- name defines the casing. So you can't say <BODY> . . . </body>: upper-
- and lower-case must match; thus <IMG/> and <img/> are two different
- elements;
-
- * Attribute names are also case-sensitive, on a per-element basis: for
- example <PIC width="7in"/> and <PIC WIDTH="6in"/> in the same file
- exhibit two separate attributes, because the different casings of
- width and WIDTH distinguish them;
-
- * Attribute values are also case-sensitive. Character data values (eg
- HRef="MyFile.SGML") are exactly as before, but ID and IDREF attributes
- are case-sensitive and no longer get folded to uppercase for
- comparisons;
-
- * All entity names (eg &Aacute;), and your data content (your text), are
- case-sensitive, exactly as before.
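-
- The rules above can be tried out with any XML parser. The following is
- a small sketch, assuming Python's standard xml.etree.ElementTree module;
- the helper name is_well_formed is ours, not part of any library.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# Start-tag and end-tag must match exactly, including case.
mismatched = is_well_formed("<BODY>text</body>")
matched = is_well_formed("<body>text</body>")

# width and WIDTH are two separate attributes, so both may appear
# on the same element without clashing.
pic = ET.fromstring('<PIC width="7in" WIDTH="6in"/>')
attribute_names = sorted(pic.attrib)
print(mismatched, matched, attribute_names)
```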
-
- C.5 How can I make my existing HTML files work in XML?
-
- º All XML documents must be well-formed (see below), but a DTD is optional.
- HTML files currently have to be DTDless in XML, because there is no XML
- version of the HTML DTD yet. Many HTML authoring tools already produce
- almost (but not quite) well-formed XML.
-
- º If you have created your HTML files conforming to one of the several HTML
- Document Type Definitions (DTDs), and they validate OK, then they can be
- converted as follows:
-
- * replace the DOCTYPE declaration and any internal subset (basically
- everything within the first set of angled brackets <!DOCTYPE HTML...>)
- with the XML Declaration <?XML version="1.0" RMD="NONE"?>
-
- * change any EMPTY elements (eg <ISINDEX>, <BASE>, <META>, <LINK>,
- <NEXTID> and <RANGE> in the header, and <IMG>, <BR>, <HR>, <FRAME>,
- <WBR>, <BASEFONT>, <SPACER>, <AUDIOSCOPE>, <AREA>, <PARAM>, <KEYGEN>,
- <COL>, <LIMITTEXT>, <SPOT>, <TAB>, <OVER>, <RIGHT>, <LEFT>, <CHOOSE>,
- <ATOP>, and <OF> in the body) so that they end with `/>', for example
- <IMG SRC="mypic.gif" alt="Picture"/>
-
- * ensure there are correctly-matched explicit end-tags for all non-empty
- elements; eg every <P> must have a </P>, etc: this can be automated by
- a normalizer program like sgmlnorm (part of SP) or a function in an
- editor like Emacs/psgml's sgml-normalize;
-
- * escape all markup characters (< and &) as &lt; and &amp;
-
- * ensure all attribute values are in quotes;
-
- * ensure all occurrences of all element names in start-tags and end-tags
- match with respect to upper- and lower-case and that they are
- consistent throughout the file;
-
- * ensure all attribute names are similarly in a consistent case
- throughout the file.
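-
- Two of the steps above (escaping markup characters and quoting
- attribute values) are easy to automate. The sketch below assumes Python
- and its standard xml.sax.saxutils helpers; the sample text and file name
- are invented for illustration, and a real conversion would also need the
- end-tag and case-matching steps.

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

# Hypothetical content taken from an HTML page being converted.
text = "Fish & chips cost < 5 pounds"
image_source = "mypic.gif"

# escape() turns & and < (and >) into entity references;
# quoteattr() escapes the value and wraps it in quotes.
paragraph = f"<P>{escape(text)}</P>"
image = f"<IMG SRC={quoteattr(image_source)} alt={quoteattr('Picture')}/>"
converted = f"<BODY>{paragraph}{image}</BODY>"

# A parser should now accept the result and hand back the original text.
body = ET.fromstring(converted)
print(converted)
```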
-
- ╢ Be aware that many HTML browsers may not accept XML-style EMPTY elements
- with the trailing slash, so the above changes are not backwards-compatible.
- An alternative might be to add a dummy end-tag to all EMPTY elements, so
- <IMG> becomes <IMG></IMG>.
-
- º If you have a lot of valid HTML files, you could write a script in an
- SGML conversion system to do this (such as Omnimark, Balise, SGMLC, or a
- system using one of the SGML Perl libraries), or you could even use edit
- macros if you know what you're doing.
-
- º If your HTML files are invalid then they will almost certainly have to be
- converted manually, although if the deformities are regular and carefully
- constructed, the files may actually be almost well-formed, and you could
- write a program or script to do as described above. To test for invalidity
- and non-conformance, check the following:
-
- * º do the files contain markup syntax errors? For example, are there
- any backslashes instead of forward slashes on end-tags; or elements
- which nest incorrectly (<SAMP>an element which starts <EM>inside one
- element</SAMP> but ends outside it</EM>);
-
- * º do the files contain markup which conflicts with the HTML DTDs, such
- as headings inside list items, or list items outside list
- environments;
-
- * º do the files use elements which are not in any DTD? Although this is
- easy to transform to a DTDless well-formed file (because you don't
- have to define elements in advance) most proprietary
- [browser-specific] extensions have never been formally defined, so it
- is often difficult to work out where they can meaningfully be used.
-
- ╢ Markup which is valid but which is meaningless or void may need to be
- edited out before conversion (such as repeated empty paragraphs or
- linebreaks, empty tables, invisible `spacing' GIFs etc: XML uses
- stylesheets, so you won't need any of these contrivances)
-
- ╢ See the rules for `well-formed' XML files for details of what you need to
- check in XML when converting.
-
- ╢ Note there are XML versions of the HTML DTD in preparation:
-
- * Ben Trafford has developed an XML version of HTML 3.2
-
- * [details of others sought: please contact the editor]
-
- C.6 If XML is just a subset of SGML, can I use XML files directly with
- SGML tools?
-
- º Yes, provided: a) the document has a valid Document Type Definition
- (DTD), ie the files are valid, not just well-formed; and b) you use
- software which knows about the features needed to support XML, such as the
- special form for EMPTY elements; some aspects of the SGML Declaration such
- as NAMECASE GENERAL NO; multiple attribute declarations.
-
- º At the moment there are few tools which handle XML files unchanged
- because of the format of these EMPTY elements, but this is changing. The
- nsgmls parser has an experimental XML conformance switch, and the first
- XML-specific editors and parsers are appearing (see the question on
- software).
-
- ╢ The rules of ISO 8879 are up for minor amendments, some of which are to
- facilitate changes needed for Web-enablement.
-
- C.7 I'm used to authoring and serving HTML. Can I learn XML easily?
-
- Yes, very easily, but at the moment there is still a need for tutorials,
- simple tools, and more examples of XML documents. Well-formed XML documents
- may look similar to HTML except for some small but very important points of
- syntax.
-
- As every user community can have their own document type defined, it should
- be much easier to learn, because element names can be picked for relevance.
-
- C.8 Will XML be able to use non-Latin characters?
-
- º Yes, the XML Draft Specification explicitly says XML uses ISO 10646, the
- international standard 31-bit character repertoire which covers most human
- (and some non-human) written languages. This is currently congruent with
- Unicode.
-
- º ` . . . all XML processors must accept the UTF-8 and UCS-2 encodings of
- ISO 10646 . . . '. UCS-2 is the 16-bit version of Unicode. UTF-8 is an
- encoding of Unicode into 8-bit characters: the first 128 are the same as
- ASCII, the rest are used to encode the rest of Unicode into sequences of
- between 2 and 6 bytes. UTF-8 in its single-octet form is therefore the same
- as ISO 646 IRV (ASCII), so you can continue to use ASCII for English or
- other unaccented languages using the Latin alphabet. Note that UTF-8 is
- incompatible with ISO 8859-1 (ISO Latin-1) after code point 127 decimal
- (the end of ASCII).
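-
- As a rough illustration of the byte counts described above, the following
- Python sketch encodes a few characters as UTF-8 and compares the result
- with ISO 8859-1. The sample characters and the helper name utf8_length are
- chosen here for illustration only; they do not come from the XML
- Specification.

```python
# Compare UTF-8 with ASCII and ISO 8859-1 (Latin-1) for a few characters.
# The sample characters are illustrative assumptions, not from the spec.
SAMPLES = ["A", "\u00e9", "\u05d0", "\u3042"]  # A, e-acute, Hebrew alef, hiragana A


def utf8_length(ch):
    """Return the number of bytes needed to store one character in UTF-8."""
    return len(ch.encode("utf-8"))


for ch in SAMPLES:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {encoded.hex()} ({utf8_length(ch)} bytes)")

# Plain ASCII text is byte-for-byte identical in ASCII and UTF-8.
ascii_same = "XML".encode("ascii") == "XML".encode("utf-8")

# Above code point 127 the encodings diverge: Latin-1 uses one byte
# for e-acute, UTF-8 uses two.
latin1_bytes = "\u00e9".encode("latin-1")
utf8_bytes = "\u00e9".encode("utf-8")
```

- A file saved as ISO 8859-1 and read as UTF-8 will therefore be misread (or
- rejected) as soon as it contains an accented character.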
-
- º ` . . . the mechanisms for signalling which of the two are in use, and
- for bringing other encodings into play, are [ . . . ] in the discussion of
- character encodings.' The XML Draft Specification explains how to specify
- in your XML file which coded character set you are using.
-
- º Use of UCS-4 can only legally be specified in SGML or XML when the
- pending `WebSGML' Technical Corrigendum to ISO 8879 comes into force to
- enable numbers longer than eight digits to be used in the SGML Declaration.
-
- º `Regardless of the specific encoding used, any character in the ISO 10646
- character set may be referred to by the decimal or hexadecimal equivalent
- of its bit string': so no matter which character set you personally use,
- you can still refer to specific individual characters from elsewhere in the
- encoded repertoire by using &#dddd; (decimal character code) or &#xhhhh;
- (hexadecimal character code).
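-
- A minimal sketch of numeric character references in practice: the document
- below is pure ASCII, yet it refers to a Hebrew letter by its decimal code
- (&#1488;) and by its hexadecimal code (&#x5D0;). The helper names
- decimal_ref and hex_ref are invented for this example, and a present-day
- XML parser from the Python standard library is assumed.

```python
import xml.etree.ElementTree as ET


def decimal_ref(ch):
    """Build a decimal character reference such as &#1488;."""
    return f"&#{ord(ch)};"


def hex_ref(ch):
    """Build a hexadecimal character reference such as &#x5D0;."""
    return f"&#x{ord(ch):X};"


alef = "\u05d0"  # HEBREW LETTER ALEF, used purely as an example

# The document text itself contains only ASCII characters.
document = f"<p>{decimal_ref(alef)} and {hex_ref(alef)}</p>"

# The parser replaces both references with the same character.
parsed_text = ET.fromstring(document).text
```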
-
- ╢ The terminology can get confusing, as can the numbers: see the ISO 10646
- Concept Dictionary.
-
- C.9 What's a Document Type Definition (DTD) and where do I get one?
-
- º A DTD is usually a file (or several files to be used together) which
- contains a formal definition of a particular type of document. This sets
- out what names can be used for elements, where they may occur, and how they
- all fit together. For example, if you want a document type to describe
- <list>s which contain <item>s, part of your DTD would contain something
- like
-
- <!ELEMENT item (#PCDATA)>
- <!ELEMENT list (item)+>
-
- This defines items containing text, and lists containing items. It's a
- formal language which lets processors automatically parse a document and
- identify where every element comes and how they relate to each other, so
- that stylesheets, navigators, browsers, search engines, databases, printing
- routines, and other applications can be used.
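-
- To show what `parse a document against a definition' amounts to, here is a
- hand-written Python sketch which checks a document against the two
- declarations above (a list holds one or more items; an item holds only
- text). It is not a DTD processor: the function name check_list and the
- rules coded into it are assumptions made for this example, and a
- validating parser would read the same rules from the DTD itself.

```python
import xml.etree.ElementTree as ET


def check_list(xml_text):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    root = ET.fromstring(xml_text)
    if root.tag != "list":
        return [f"root element is <{root.tag}>, expected <list>"]
    # (item)+ means one or more item elements and nothing else.
    if len(root) == 0:
        problems.append("<list> must contain at least one <item>")
    if (root.text or "").strip():
        problems.append("<list> may not contain character data directly")
    for child in root:
        if child.tag != "item":
            problems.append(f"<{child.tag}> is not allowed inside <list>")
        elif len(child) > 0:
            # (#PCDATA) means character data only, no child elements.
            problems.append("<item> may contain only character data")
        if (child.tail or "").strip():
            problems.append("<list> may not contain character data directly")
    return problems


good = check_list("<list><item>one</item><item>two</item></list>")
bad = check_list("<list><item>one<b/></item><para>two</para></list>")
```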
-
- [Note that in XML, there are no minimization parameters (`-' and `O'
- characters in element definitions between element name and content model),
- because all elements except empty ones must have both start-tag and end-tag
- present at all times.]
-
- There are thousands of SGML DTDs already in existence in all kinds of areas
- (see the SGML Web pages for examples). Many of them can be downloaded and
- used freely; or you can write your own. As with any language, you need to
- learn it to do this: but XML is much simpler than full SGML: see the list
- of restrictions which shows what has been cut out. Existing SGML DTDs need
- to be converted to XML for use with XML systems: expect to see
- announcements soon of popular DTDs becoming available in XML format.
-
- C.10 How will XML affect my document links?
-
- º The linking abilities of XML systems are much more powerful than those of
- HTML, so you'll be able to do much more with them. Existing HREF-style
- links will remain usable, but new linking technology is based on the
- lessons learned in the development of other standards involving hypertext,
- such as TEI and HyTime, which let you manage bidirectional and multi-way
- links, as well as links to a span of text (within your own or other
- documents) rather than to a single point. This is already implemented for
- SGML in browsers like Panorama and Multidoc Pro.
-
- º The XML Linking Specification (XLL) document contains a detailed
- specification. An XML link can be either a URL or a TEI-style Extended
- Pointer (`Xptr'), or both. A URL on its own is assumed to be a resource (as
- with HTML); if an Xptr follows it, it is assumed to be a sub-resource of
- that URL; an Xptr on its own is assumed to apply to the current document.
-
- º An Xptr is always preceded by one of #, ?, or |. The # and ? mean the
- same as in HTML applications; the | means the sub-resource can be found by
- applying the Xptr to the resource, but the method of doing this is left to
- the application.
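-
- The division of a link into resource, connector, and Xptr can be sketched
- in Python as follows. This is a simplification for illustration only: the
- function name split_link is invented here, and the sketch simply splits at
- the first #, ?, or | it finds, which is not a full implementation of the
- XML Linking draft.

```python
import re

# Everything up to the first connector is the resource; the rest is the Xptr.
LINK_PATTERN = re.compile(
    r"^(?P<resource>[^#?|]*)(?:(?P<connector>[#?|])(?P<xptr>.*))?$",
    re.DOTALL,
)


def split_link(link):
    """Return (resource, connector, xptr); the last two are None if absent."""
    match = LINK_PATTERN.match(link)
    return match.group("resource"), match.group("connector"), match.group("xptr")


# A URL on its own: the whole resource.
plain = split_link("http://www.w3.org/TR/WD-xml-link")

# A URL followed by an Xptr: a sub-resource of that URL.
both = split_link("http://www.ucc.ie/xml/faq.sgml#ID(faq-hypertext)CHILD(3,*)(4,*)")

# An Xptr on its own: applies to the current document.
local = split_link("#ID(tei-link)")
```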
-
- º The TEI Extended Pointer Notation (EPN) is much more powerful than the
- `fragment address' on the end of some URLs. For example, the word `Xptr'
- two paragraphs back could be referred to as
- http://www.ucc.ie/xml/faq.sgml#ID(faq-hypertext)CHILD(3,*)(4,*), meaning
- the fourth child object within the third child object of the element
- whose ID is faq-hypertext. Count the objects from the start of this
- question in the SGML version (which has the ID `faq-hypertext'):
-
- 1. the title of the question;
-
- <SECT2 ID="faq-hypertext">
- <TITLE>How will XML affect my document links?</TITLE>
-
- 2. the first paragraph;
-
- <PARA><LINK LINKEND="tei-link">The linking abilities of XML
- systems</LINK> are much more powerful than those...
-
- 3. the second paragraph:
-
- 1. the character data from the start of the paragraph to the first
- item of markup:
-
- <PARA>The
-
- 2. the markup item:
-
- <ULINK URL="http://www.w3.org/TR/WD-xml-link">XML Linking
- Specification (XLL)</ULINK>
-
- 3. the next stretch of character data:
-
- document contains a detailed specification. An XML link can
- be either a URL or a TEI-style Extended Pointer (
-
- 4. the next markup item:
-
- <LINK LINKEND="tei-link">Xptr</LINK>
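-
- The same count can be followed mechanically. The Python sketch below walks
- an abridged copy of the markup shown in the steps above; the end-tags and
- the shortened first paragraph are supplied here so that the sample parses,
- and the function names are invented for the example. It assumes that
- stretches of character data count as child objects but that white-space
- between elements does not, which matches the count above but is not a
- complete implementation of Extended Pointer Notation.

```python
from xml.dom import minidom

SAMPLE = """<SECT2 ID="faq-hypertext">
<TITLE>How will XML affect my document links?</TITLE>
<PARA><LINK LINKEND="tei-link">The linking abilities of XML
systems</LINK> are much more powerful than those of HTML.</PARA>
<PARA>The <ULINK URL="http://www.w3.org/TR/WD-xml-link">XML Linking
Specification (XLL)</ULINK> document contains a detailed specification.
An XML link can be either a URL or a TEI-style Extended Pointer
(<LINK LINKEND="tei-link">Xptr</LINK>), or both.</PARA>
</SECT2>"""


def significant_children(node):
    """Child elements and text, ignoring white-space-only text nodes."""
    return [
        child for child in node.childNodes
        if not (child.nodeType == child.TEXT_NODE and not child.data.strip())
    ]


def find_by_id(node, value):
    """Depth-first search for the element whose ID attribute equals value."""
    if node.nodeType == node.ELEMENT_NODE and node.getAttribute("ID") == value:
        return node
    for child in node.childNodes:
        found = find_by_id(child, value)
        if found is not None:
            return found
    return None


def resolve(document, start_id, steps):
    """Follow ID(start_id) and then one CHILD(n,*) step per number in steps."""
    node = find_by_id(document.documentElement, start_id)
    for n in steps:
        node = significant_children(node)[n - 1]  # steps count from 1
    return node


doc = minidom.parseString(SAMPLE)
target = resolve(doc, "faq-hypertext", [3, 4])
print(target.toxml())
```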
-
- º If you view this file with Panorama or MultiDoc Pro you can click on the
- highlighted cross-reference button at the start of the example sentence,
- and it will display the locations in Extended Pointer Notation of all the
- links to it, including the word `Xptr' mentioned. (Doing this in an HTML
- browser is not meaningful, as they do not support bidirectional linking or
- EPN.)
-
- C.11 Can I do mathematics using XML?
-
- Yes, if the document type you use provides for math. The long-expired HTML3
- could be used, or HTML Pro, or ISO 12083 Math, or the proposals of the
- OpenMath or HTML-Math projects, or one of your own making. Browsers which
- display simple math embedded in SGML already exist (eg Panorama, Multidoc
- Pro), and the mathematics-using communities may develop their own XML
- software.
-
- º The sophistication could vary from math expressions like $x_i$ through
- simple inline equations such as $E = mc^2$ to display equations like
-
- $$\sum_{i=1}^{n} (x_i - p)^2 / n$$
-
- (If you are using an HTML browser to read this, the above equations may not
- be rendered correctly unless you have a math plugin for Netscape like IBM's
- TechExplorer which reads the embedded TeX equivalent.)
-
- C.12 How does XML handle metadata?
-
- Because XML lets you define your own markup language, you can make full use
- of the extended hypertext features (see the question on Links) of XML to
- store or link to metadata in any format (eg Dublin Core, Warwick Framework,
- Resource Description Framework (RDF), and Platform for Internet Content
- Selection (PICS)).
-
- There are no predefined elements in XML, because it is an architecture, not
- an application, so it is not part of XML's job to specify how or if authors
- should or should not implement metadata. You are therefore free to use any
- suitable method from simple attributes to the embedding of entire Dublin
- Core/Warwick Framework metadata records. Browser makers may also have their
- own architectural recommendations or methods to propose.
-
- C.13 Can I use Java, ActiveX, etc in XML?
-
- This depends on what facilities the browser makers implement. XML is about
- describing information; scripting languages and languages for embedded
- functionality are the software which enables the information to be
- manipulated at the user's end.
-
- XML itself provides a way to define the markup needed to implement
- scripting languages: as a neutral standard it neither encourages nor
- discourages their use, and does not favour one language over another, so
- the field is wide open. Developments are ongoing: see John Tigue's
- suggestions for standardising the API for Java in respect of XML.
-
- Scripting languages are provided for in a proposal for an Extensible Style
- Language, XSL (see question on Stylesheets).
-
- C.14 How do I control appearance?
-
- The use of a stylesheet is implicit in XML. Some browsers may possibly
- provide simple default styles for popular elements like <PARA>, or <LIST>
- containing <ITEM>, but in general a stylesheet gives the author much better
- control of the layout. But as with any system where files can be viewed at
- random by arbitrary users, the author cannot know what resources (such as
- fonts) are on the user's system, so care is needed.
-
- * The international standard for stylesheets for SGML documents is
- DSSSL, the Document Style Semantics and Specification Language (ISO
- 10179). This provides Scheme-like languages for stylesheets and
- document conversion, and is extensively implemented in the Jade
- formatter.
-
- * The Cascading Stylesheet Specification (CSS) provides a simple syntax
- for assigning styles to elements, and has been implemented in HTML
- browsers.
-
- * The Synex stylesheet DTD is already used in Panorama and MultiDoc Pro.
-
- * A new Extensible Style Language (XSL) is being proposed for use
- specifically with XML. This uses XML syntax (a stylesheet is actually
- an XML file) and combines formatting features from both DSSSL and CSS
- (HTML) and has already attracted support from several major vendors.
-
- It remains to be seen which ones browsers will implement.
-
- ---------------------------------------------------------------------------
-
- D. Developers and Implementors (including WebMasters and server operators)
-
- D.1 Where's the spec?
-
- Right here (http://www.w3.org/TR/WD-xml). Includes the EBNF. There's also a
- version in Japanese.
-
- D.2 What are these terms `DTDless', `valid', and `well-formed'?
-
- Full SGML uses a Document Type Definition (DTD) to describe the markup
- (elements) available in any specific type of document. However, the design
- and construction of a DTD can be a complex and non-trivial task, so XML has
- been designed so it can be used either with or without a DTD. DTDless
- operation means you can invent markup without having to define it formally.
-
- To make this work, a DTDless file in effect `defines' its own markup,
- informally, by the existence and location of elements where you create
- them. But when an XML application such as a browser encounters a DTDless
- file, it needs to be able to understand the document structure as it reads
- it, because it has no DTD to tell it what to expect, so some changes have
- been made to the rules.
-
- For example, HTML's <IMG> element is defined as `EMPTY': it doesn't have an
- end-tag. Without a DTD, an XML application would have no way to know
- whether or not to expect an end-tag for an element, so the concept of
- `well-formed' has been introduced. This makes the start and end of every
- element, and the occurrence of EMPTY elements, completely unambiguous.
-
- D.2.1 `Well-formed' documents
-
- All XML documents must be well-formed:
-
- * if there is no DTD in use, the document must start with a Required
- Markup Declaration (RMD) saying so:
-
- <?XML version="1.0" RMD="NONE"?>
- <foo>
- <bar>...<blort/>...</bar>
- </foo>
-
- * all tags must be balanced: that is, all elements which may contain
- character data must have both start- and end-tags present (omission is
- not allowed except for empty elements, see below);
-
- * all attribute values must be in quotes (the single-quote character
- [the apostrophe] may be used if the value contains a double-quote
- character, and vice versa): if you need both, use &apos; and &quot;
-
- * any EMPTY element tags (eg those with no end-tag like HTML's <IMG>,
- <HR>, and <BR> and others) must either end with `/>' or you have to
- make them non-EMPTY by adding a real end-tag;
-
- Example: <BR> would become either <BR/> or <BR></BR>.
-
- * º there must not be any isolated markup characters (< or &) in your
- text data (ie they must be escaped as &lt; and &amp;), and the
- sequence ]]> must be escaped as ]]&gt; if it does not occur as the end
- of a CDATA marked section;
-
- * º elements must nest inside each other properly (no overlapping
- markup, same rule as for all SGML);
-
- * ╢ Well-formed files with no DTD may use attributes on any element, but
- the attributes must all be of type CDATA by default.
-
- º Well-formed XML files with no DTD are considered to have &lt;, &gt;,
- &apos;, &quot;, and &amp; predefined and thus available for use even
- without a DTD. Valid XML files must declare them explicitly if they use
- them.
-
- º Note that the value of the RMD `NONE' indicates that an XML processor can
- parse the document correctly without first reading any part of the DTD, so
- it can also be used if you do supply a DTD but don't want it used on this
- occasion. See the next section for other values of the RMD.
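-
- The rules in this subsection can be tried out with any XML parser. The
- Python sketch below is one hedged way of doing so: it assumes a
- present-day XML 1.0 parser from the Python standard library, which does
- not accept the draft's <?XML ... RMD="NONE"?> declaration, so the samples
- omit it. The function name is_well_formed and the sample strings are
- invented for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


SAMPLES = [
    ("<foo><bar>...<blort/>...</bar></foo>", "balanced tags, empty element"),
    ("<P>line<BR></P>", "empty element without /> or end-tag"),
    ("<P>line<BR/></P>", "empty element ending in />"),
    ("<P>line<BR></BR></P>", "empty element with a real end-tag"),
    ("<IMG SRC=pic.gif/>", "attribute value not in quotes"),
    ("<P>AT&T</P>", "isolated ampersand"),
    ("<P>AT&amp;T</P>", "escaped ampersand"),
    ("<B><I>text</B></I>", "overlapping markup"),
]

for text, note in SAMPLES:
    verdict = "well-formed" if is_well_formed(text) else "NOT well-formed"
    print(f"{verdict:16} {note}: {text}")
```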
-
- D.2.2 Valid XML
-
- Valid XML files are those which have a Document Type Definition (DTD) like
- all other SGML applications, and which adhere to it. They must also be
- well-formed.
-
- A valid file begins like any other SGML file with a Document Type
- Declaration, but may have an optional XML Declaration prepended:
-
- <?XML version="1.0"?>
- <!DOCTYPE advert SYSTEM "http://www.foo.org/ad.dtd">
- <advert>
- <headline>...<pic/>...</headline>
- <text>...</text>
- </advert>
-
- The XML Specification defines an SGML Declaration for XML which is fixed
- for all instances. An XML version of the specified DTD must be accessible
- to the XML processor, either by being available locally (ie the user
- already has a copy on disk), or by being retrievable via the network. You
- can enable this either by supplying the URL for the DTD in a System
- Identifier (as in the example above) or by supplying the Formal Public
- Identifier, eg
-
- <!DOCTYPE advert PUBLIC "-//Foo, Inc//DTD Advertisements//EN">
-
- and providing a catalog file which equates FPIs with their URL equivalents.
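-
- As a sketch of what such a catalog lookup involves, the Python fragment
- below reads entries of the form PUBLIC "identifier" "location", as used in
- SGML Open catalogs, and resolves a Formal Public Identifier to a URL. The
- pairing of the example FPI with http://www.foo.org/ad.dtd is assumed here
- purely for illustration, as are the function names; real catalog files
- support further entry types which this sketch ignores.

```python
import shlex

# Hypothetical catalog content: one PUBLIC entry mapping an FPI to a URL.
CATALOG_TEXT = """
PUBLIC "-//Foo, Inc//DTD Advertisements//EN" "http://www.foo.org/ad.dtd"
"""


def parse_catalog(text):
    """Collect PUBLIC entries into a dictionary of FPI -> system identifier."""
    mapping = {}
    for line in text.splitlines():
        tokens = shlex.split(line)  # honours the quoted strings
        if len(tokens) == 3 and tokens[0] == "PUBLIC":
            mapping[tokens[1]] = tokens[2]
    return mapping


def resolve_fpi(fpi, catalog):
    """Return the location recorded for the FPI, or None if it is unknown."""
    return catalog.get(fpi)


catalog = parse_catalog(CATALOG_TEXT)
location = resolve_fpi("-//Foo, Inc//DTD Advertisements//EN", catalog)
```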
-
- The Required Markup Declaration (RMD) can take two other values (apart from
- `NONE' which was discussed in the previous subsection): `INTERNAL' and
- `ALL'.
-
- `INTERNAL' indicates that the XML processor is required to read and process
- the internal subset of the DTD, if provided, to parse the document
- correctly:
-
- <?XML version="1.0" RMD="INTERNAL"?>
- <!DOCTYPE foo [
- <!ENTITY alephhb cdata "à">
- ]>
- <foo>The first letter is &alephhb;</foo>
-
- `ALL' is the default, when no XML Declaration is present, and indicates
- that the DTD and the internal subset must both be read in order to parse
- the document correctly. See the XML Specification for a more detailed
- description.
-
- The defaults for the other attributes of the XML Declaration are
- version="1.0" and encoding="UTF-8".
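-
- Why the internal subset sometimes has to be read can be seen in a small
- Python sketch. It assumes a present-day XML 1.0 parser, so the declaration
- is written <?xml version="1.0"?> with no RMD, and the entity is declared
- without the cdata keyword; the value &#1488; (the Hebrew letter alef) is
- chosen here for illustration. With the declaration in the internal subset
- the entity reference is expanded; without it the same reference is an
- error.

```python
import xml.etree.ElementTree as ET

# The internal subset declares the entity used in the document body.
WITH_SUBSET = """<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY alephhb "&#1488;">
]>
<foo>The first letter is &alephhb;</foo>"""

# The same body with no declaration for the entity.
WITHOUT_SUBSET = "<foo>The first letter is &alephhb;</foo>"

# The parser reads the internal subset and expands the entity reference.
text = ET.fromstring(WITH_SUBSET).text

# Without the declaration the reference cannot be resolved.
try:
    ET.fromstring(WITHOUT_SUBSET)
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False
```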
-
- D.3 What else has changed between SGML and XML?
-
- The principal changes are in what you can do in writing a Document Type
- Definition (DTD). To simplify the syntax and make it easier to write
- processing software, a large number of markup declaration options have been
- suppressed (see Appendix A of the XML Specification).
-
- ╢ A new delimiter is permitted in Names (the colon) for use in experiments
- with namespaces (enabling DTDs to distinguish element source, ownership, or
- application). A colon may only appear in mid-name, though, not at the start
- or the end, and the syntax may change in a future version.
-
- D.4 What XML software can I use today?
-
- There is a modification under development for Emacs/psgml-mode to handle
- XML files.
-
- Most of the well-known SGML vendors are working on XML versions of editors
- and other tools; the editors with a product released (or very close) so far
- are:
-
- * GriF's Symposia Doc+ (http://www.grif.fr/)
-
- * STiLO's WebWriter (http://www.stilo.com/)
-
- * ArborText's ADEPT*Editor (http://www.arbortext.com/)
-
- * [anyone with details of others please let me know]
-
- There is a growing number of XML parsers which can be used to check that
- your files conform to the Draft XML Specification:
-
- * Norbert Mikula's NXP at http://www.edu.uni-klu.ac.at/~nmikula/NXP/
-
- * Tim Bray's Lark at http://www.textuality.com/Lark/
-
- * Sean Russell's Java test kernel at
- http://jersey.uoregon.edu/ser/software/XML.tar.gz
-
- * Microsoft's MSXML parser at
- http://www.microsoft.com/standards/xml/xmlparse.htm
-
- * Steve Ball's parser in Tcl at http://tcltk.anu.edu.au/XML/
-
- * [anyone with details of others please let me know]
-
- º For browsers see the question on XML Browsers and the details of the
- xml-dev mailing list for software developers. Bert Bos keeps a list of some
- XML developments in bison, flex, perl and Python.
-
- D.5 Do I have to change any of my server software to work with XML?
-
- º Only to serve up .xml files as the correct MIME type. MIME types of
- text/xml and application/xml have been submitted for approval, so for
- serving XML documents all that is needed is to edit the mime-types file (or
- its equivalent) and add the lines
-
- text/xml xml XML
- application/xml xsl XSL
-
- However, more sophisticated applications may require HTTP content
- negotiation to determine what tools the client has for display. Also, since
- XML is designed to support stylesheets and sophisticated hyperlinking, XML
- documents will be accompanied by ancillary files such as DTDs, entity
- files, catalogs, stylesheets, etc, which may need their own MIME entry, and
- which require placing in the appropriate directories.
-
- If you run scripts generating HTML, which you wish to work with XML, they
- will need to be modified to produce the relevant document type.
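-
- For servers or scripts which choose the content type in code rather than
- from a mime-types file, the same two mappings can be sketched in Python
- with the standard mimetypes module. The function name content_type and the
- fallback to application/octet-stream are assumptions made for this
- example, not a recommendation from the XML Specification.

```python
import mimetypes

# A private table, so the process-wide defaults are left untouched.
types = mimetypes.MimeTypes()
types.add_type("text/xml", ".xml")
types.add_type("application/xml", ".xsl")


def content_type(filename):
    """Return the MIME type to send for a file, with a generic fallback."""
    mime, _encoding = types.guess_type(filename)
    return mime or "application/octet-stream"


for name in ("faq.xml", "faq.xsl"):
    print(f"{name}: {content_type(name)}")
```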
-
- D.6 Can I still use server-side INCLUDEs?
-
- Yes, so long as what they generate ends up as part of an XML-conformant
- file (ie either valid or just well-formed).
-
- D.7 Can I (and my authors) still use client-side INCLUDEs?
-
- The same rule applies as for server-side INCLUDEs, so you need to ensure
- that any embedded code which gets passed to a third-party engine (eg SDQL
- enquiries, Java writes, LiveWire requests, streamed content, etc) does not
- contain any characters which might be misinterpreted as XML markup (ie no
- angle brackets or ampersands): either use a CDATA marked section to avoid
- your XML application parsing the embedded code, or use the standard &lt;,
- &gt;, and &amp; character entity references instead.
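-
- Both approaches can be sketched in Python. The embedded script text and
- the function name as_cdata are invented for this example, and a
- present-day XML parser is assumed. Note that a CDATA marked section cannot
- itself contain the sequence ]]>, so the sketch splits any occurrence
- across two adjacent sections.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical embedded code containing <, >, & and the sequence ]]>.
script = "if (a < b && c > d) { x = y[z[0]]>1; }"


def as_cdata(code):
    """Wrap code in a CDATA marked section, splitting any embedded ]]>."""
    return "<![CDATA[" + code.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Option 1: replace the markup characters with entity references.
escaped_doc = "<script>" + escape(script) + "</script>"

# Option 2: shield the code from the parser with a CDATA marked section.
cdata_doc = "<script>" + as_cdata(script) + "</script>"

# Either way, the application receives the original code unchanged.
from_escaped = ET.fromstring(escaped_doc).text
from_cdata = ET.fromstring(cdata_doc).text
```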
-
- D.8 I'm trying to understand the XML Spec: why does SGML (and XML) have
- such difficult terminology?
-
- For implementation to succeed, the terminology needs to be precise.
-
- ╢ Example: `element' and `tag' are not synonymous: an element is a whole
- unit of information with its markup, and may consist of a start-tag alone
- (as in HTML's <BR>) or a start-tag and an end-tag and the content which
- goes between them; tags alone are simply the markers at the start and end
- of elements.
-
- Sloppy terminology in specifications causes misunderstandings, so formal
- standards have to be phrased in formal terminology. This is not a formal
- document, and the astute reader may already have noticed it refers to
- `element names' where `element type names' is more correct; but the former
- is more widely understood.
-
- Those new to SGML may want to read something like the Gentle Introduction
- to SGML chapter of the TEI.
-
- D.9 Is there a Developer's API kit for XML?
-
- Several are reported to be under development. The ones I have found so far
- are:
-
- * The Language Technology Group has produced the LT XML toolkit
- (http://www.ltg.ed.ac.uk/software/xml/) and the DSSSL Syntax Checker
- (DSC: http://www.ltg.ed.ac.uk/~ht/dsc-blurb.html).
-
- * [anyone with details of others please let me know]
-
- The big SGML conversion and application development engines like Balise,
- Omnimark, and SGMLC are all working on XML versions. Details of SGML
- software of all kinds are on the SGML Web pages.
-
- ---------------------------------------------------------------------------
-
- Response and query form
-
- Illustration from Dale Dougherty's article in Web Review (courtesy of the
- publishers).
- [XMLfiles image]
-
-