- Newsgroups: comp.text.sgml
- Path: sparky!uunet!mcsun!sunic!aun.uninett.no!nuug!ifi.uio.no!naggum.no
- From: Erik Naggum <erik@naggum.no>
- Sender: Erik Naggum <enag@ifi.uio.no>
- Reply-To: Erik Naggum <enag@ifi.uio.no>
- Message-ID: <23266B@erik.naggum.no>
- Date: 31 Jul 1992 03:56:49 +0200
- Subject: Overlapping element structures
- Lines: 72
-
- SGML cannot handle overlapping element structures under the intuitive
- semantics of the element structure, but if we introduce an ability to
- concatenate elements, as additional semantics in the application, we
- could easily accomplish this.
-
- Steve DeRose has proposed using a new attribute type, and I draw from
- his suggestion, but I consider it difficult to use ID and IDREF, or any
- scheme derived from them. Overlapping regions can easily be split by
- intervening markup, and the pieces are then the same element instance,
- conceptually continued, rather than one piece referring in any sense
- to the other. That is, we should ideally have two elements with the
- same ID attribute value, but we can't do that, because IDs have to be
- unique across the entire document. Eliot Kimber has pointed out the
- difficulty of maintaining the ID space in large documents, and I think
- it's therefore wise not to use IDs for this purpose.
-
- My proposal makes use of a special name attribute, which I call
- "instance" (but which, for more generality, could be an attribute
- architectural form), which identifies, within an application-defined
- part of the element structure (such as inside a particular element), the
- individual instance of an element. Such an element can be split into
- pieces which are dispersed in the document instance, but which are
- conceptually concatenated for those applications which need it.
-
- Take the example of paragraphs and quotations. When quoting direct
- speech, paragraph breaks may occur in the quotation, and the easiest
- way to handle them is to allow paragraphs in quotations. This,
- however, does violence to the sequence of paragraphs: a given
- paragraph element then contains the entire quotation, instead of only
- the first paragraph of the quotation. The result is very
- counter-intuitive addressing if, say, HyTime treelocs are used, or
- when intelligent search and retrieval functions are used. If
- paragraphs containing quotations and paragraphs contained in them
- appear identically, a paragraph number will not correspond to the
- formatted output, and the nesting needlessly clutters the addressing
- scheme.
-
- A named instance of the element should solve the problem, and we leave
- it to the application to specify the semantics of the concatenation.
- As a contrived example, consider the following fragments:
-
- <!ELEMENT p (#PCDATA|...|q|...)*>
- <!ELEMENT q (#PCDATA|...)*>
- <!ATTLIST q instance NAME #CURRENT>
-
- <p><q instance=ceotalk>Ladies and gentlemen...</q>, he said, and paused
- to allow the guests to finish the small talk at the tables.</p>
- <p><q>We are gathered here today...</q></p>
- <p><q>...</q></p>
-
- Note the use of the CURRENT default value. The first Q element must
- specify INSTANCE, and every later Q element that omits it takes the
- most recently specified value. This means that all Q elements get an
- INSTANCE attribute value, which could be a disadvantage. The other
- option is to use IMPLIED as the default value, and require all Q
- elements which are part of a concatenated element to specify the
- attribute value. This means more typing, and makes use of SHORTREF
- next to impossible, but is significantly easier to handle, and is
- not as error prone as the CURRENT approach, in particular with
- respect to an erroneously omitted attribute value specification.
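-
- The CURRENT defaulting discussed above can be sketched roughly as
- follows, in Python. This is only an illustration under stated
- assumptions: the function name resolve_current and the representation
- of each Q element as a (specified value or None, text) pair are
- hypothetical, and are not part of any SGML parser's interface.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: one (specified INSTANCE value or None, text)
# pair per Q element, in document order.
QElement = Tuple[Optional[str], str]


def resolve_current(elements: List[QElement]) -> List[Tuple[str, str]]:
    """Apply CURRENT-style defaulting to a sequence of Q elements.

    An omitted value takes the most recently specified one. The first
    element must specify a value, so an omission there is an error.
    """
    resolved = []
    current = None
    for specified, text in elements:
        if specified is not None:
            current = specified
        elif current is None:
            raise ValueError("first occurrence must specify the attribute")
        resolved.append((current, text))
    return resolved


# The three Q elements of the example: only the first specifies INSTANCE.
example = [
    ("ceotalk", "Ladies and gentlemen..."),
    (None, "We are gathered here today..."),
    (None, "..."),
]
for instance, text in resolve_current(example):
    print(instance, "|", text)
```

- The sketch also shows the weakness of CURRENT: an unrelated quotation
- later in the document that omits the attribute would silently become
- part of the same instance.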
-
- It would now be possible to extract the entire speech by requesting all
- elements of type Q with INSTANCE equal to CEOTALK.
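-
- Such a request could look roughly like the following Python sketch.
- Everything named here is an assumption for illustration: the Element
- class, the concatenate_instance function, and the choice to join the
- pieces with a single space (the concatenation semantics are left to
- the application). Names are compared without regard to case, assuming
- the case folding of the reference concrete syntax.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Element:
    """Hypothetical flattened view of one element in document order."""
    gi: str        # generic identifier (element type name)
    instance: str  # effective INSTANCE value, "" if the element has none
    text: str      # character data of the element


def concatenate_instance(elements: List[Element], gi: str, instance: str,
                         separator: str = " ") -> str:
    """Join the text of all elements of type `gi` whose INSTANCE matches.

    Document order is preserved. The separator is an application choice,
    not something SGML defines.
    """
    wanted_gi = gi.upper()
    wanted_instance = instance.upper()
    pieces = [
        e.text
        for e in elements
        if e.gi.upper() == wanted_gi and e.instance.upper() == wanted_instance
    ]
    return separator.join(pieces)


# The Q elements of the example, plus one unrelated quotation.
document = [
    Element("q", "ceotalk", "Ladies and gentlemen..."),
    Element("q", "ceotalk", "We are gathered here today..."),
    Element("q", "other", "An unrelated remark."),
    Element("q", "ceotalk", "..."),
]
print(concatenate_instance(document, "Q", "CEOTALK"))
```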
-
- This does not necessarily handle the problem of overlapping regions,
- where the contents of one element are also the contents of another,
- but this may be best solved with judicious use of the CONCUR feature
- (provided one has a parser which does not insist on giving the
- application program access to only one document type at a time).
-
- I think this solves most of the problem, anyway.
-
- Best regards,
- </Erik>
- --
- Erik Naggum | ISO 8879 SGML | +47 295 0313
- | ISO 10744 HyTime |
- <erik@naggum.no> | ISO 10646 UCS | Memento, terrigena.
- <enag@ifi.uio.no> | ISO 9899 C | Memento, vita brevis.
-