- Newsgroups: comp.text.sgml
- Path: sparky!uunet!mcsun!sunic!aun.uninett.no!nuug!ifi.uio.no!enag
- From: Erik Naggum <SGML@ifi.uio.no>
- Message-ID: <19930107.006@erik.naggum.no>
- Date: 07 Jan 1993 22:20:40 +0100
- References: <1993Jan6.221136.27780@twg.com> <19930106.155234.608@almaden.ibm.com>
- Subject: Re: SGML for data querying
- Summary: The element structure is the _result_ of sequential processing.
- Lines: 51
-
- [Eliot Kimber]
- :
- | <digression>
- |
- | The examples above, and most SGML applications in existence today,
- | process SGML documents sequentially from start to finish. However,
- | SGML does not require sequential processing, and it can make just as
- | much sense to define applications that work with SGML documents
- | wholistically as a tree, rather than sequentially, at least in the
- | abstract. In this processing model, access is via queries, rather than
- | sequential access by waiting for the element you want to flow by. ...
-
- This is a good point, but an important distinction needs to be made (and
- "at least in the abstract" needs to be stressed).
-
- The SGML document consists of one or more _entities_, each a character
- sequence; together they represent a fragmented linearization of the element
- structure. This fragmented linearization needs to be parsed (and
- reconstructed) before we can access the element structure. The element
- structure can fruitfully be regarded as the _product_ of the parser (cf.
- ESIS), and the parser has to parse the document "from start to finish" to
- be able to build this structure.
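To make "the element structure is the product of the parser" concrete, here is a minimal Python sketch. It is a toy, not an SGML parser. The entity table, the `events` and `build_tree` functions, and the sample text are all hypothetical. Only `<name>`, `</name>` and `&name;` are recognized, and there are no attributes, omitted tags, short references or marked sections. The tree exists only after the event stream has been consumed from start to finish. Note also that the first `sec` element starts in the document entity and ends in the entity `rest`.

```python
import re

# Toy tokenizer (hypothetical): end tag, start tag, entity reference, or data.
TOKEN = re.compile(r"</(\w+)>|<(\w+)>|&(\w+);|([^<&]+)")


def events(entities, name="#DOC"):
    """Yield ESIS-like events by reading the entities strictly in sequence."""
    for m in TOKEN.finditer(entities[name]):
        end, start, ref, data = m.groups()
        if start:
            yield ("start", start)
        elif end:
            yield ("end", end)
        elif ref:
            # Entity reference: read the referenced entity, then resume here.
            yield from events(entities, ref)
        else:
            yield ("data", data)


def build_tree(evts):
    """Reconstruct the element tree from the sequential event stream."""
    root = {"name": "#ROOT", "children": []}
    stack = [root]
    for kind, value in evts:
        if kind == "start":
            node = {"name": value, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif kind == "end":
            assert stack[-1]["name"] == value, "mismatched end tag"
            stack.pop()
        else:
            stack[-1]["children"].append(value)
    return root["children"][0]


# Hypothetical document: the first <sec> begins in #DOC and ends in "rest".
entities = {
    "#DOC": "<doc><sec>First part, &rest;</doc>",
    "rest": "second part.</sec><sec>Next.</sec>",
}
tree = build_tree(events(entities))
print(tree["children"][0]["children"])  # ['First part, ', 'second part.']
```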
-
- According to the standard, the only way you can possibly parse an SGML
- document is sequentially (cf. 6.2 SGML Entities), and the only
- non-sequential access to a data stream is access to the entities in it
- (cf. SDIF, ISO 9069).
-
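- This problem has been discussed before, in relation to what #CURRENT means
- as an attribute default value declaration: an omitted attribute takes the
- most recently specified value, which is only defined with respect to a
- sequential pass through the document.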
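Here is a minimal sketch of why #CURRENT ties attribute values to document order. It assumes a simplified model in which one attribute is shared by a flat list of elements given in document order. Real scoping follows the attribute definition list, and the function name `resolve_current` is hypothetical. An element that omits the attribute inherits the last value specified before it, so you cannot determine the value for a given element by jumping straight to it.

```python
def resolve_current(elements, attr):
    """Resolve a #CURRENT-style default for `attr` in one sequential pass.

    `elements` is a list of (element_name, attributes_dict) in document order.
    An element that omits `attr` receives the most recently specified value.
    """
    last = None
    resolved = []
    for name, attrs in elements:
        if attr in attrs:
            last = attrs[attr]  # an explicit value becomes the new "current" one
        elif last is None:
            # Simplification in this sketch: no earlier value means we give up.
            raise ValueError(f"no earlier value for {attr!r} before <{name}>")
        resolved.append((name, {**attrs, attr: last}))
    return resolved


doc = [("p", {"lang": "en"}), ("p", {}), ("p", {"lang": "no"}), ("p", {})]
langs = [a["lang"] for _, a in resolve_current(doc, "lang")]
print(langs)  # ['en', 'en', 'no', 'no']
```

The value of the fourth `p` depends on the third, which in turn depends on whatever came before it.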
-
- In order to access elements in the element structure non-sequentially, you
- have to build some sort of index into the entity structure, or store the
- data returned from the sequential parsing process. Since element contents
- can span entities (one of SGML's significant strengths), software that
- accesses the unparsed entity structure directly faces a much heavier task
- than software that stores a fully parsed SGML document in some form
- optimized for retrieval by the application. Indexing the unparsed entities
- becomes daunting in the presence of frequent short references, character
- entity references, and references to other internal entities.
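Here is a minimal sketch of the second option: store the parser's output and index that, rather than indexing the raw entities. It assumes the parser emits ESIS-like `("start", name)`, `("end", name)` and `("data", text)` events. The function names and sample events are hypothetical. In the sample, the two data events of the first `sec` could have come from different entities. Once they are stored as parser output, that boundary no longer matters to the index.

```python
from collections import defaultdict


def build_index(events):
    """One sequential pass over stored parser events.

    Returns {element_name: [(start_pos, end_pos), ...]} with positions into
    `events`, listed in order of the elements' end tags, so an element can
    be fetched later without re-parsing anything.
    """
    index = defaultdict(list)
    open_stack = []
    for pos, (kind, value) in enumerate(events):
        if kind == "start":
            open_stack.append((value, pos))
        elif kind == "end":
            name, start = open_stack.pop()
            assert name == value, "mismatched end tag"
            index[name].append((start, pos))
    return index


def text_of(events, span):
    """Concatenate the data events inside an indexed element (inclusive span)."""
    start, end = span
    return "".join(v for k, v in events[start:end + 1] if k == "data")


events = [
    ("start", "doc"), ("start", "sec"), ("data", "First part, "),
    ("data", "second part."), ("end", "sec"), ("start", "sec"),
    ("data", "Next."), ("end", "sec"), ("end", "doc"),
]
idx = build_index(events)
print(text_of(events, idx["sec"][1]))  # Next.
```

The cost of the single sequential pass is paid once; after that, queries go straight to the stored events.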
-
- Several interesting problems follow from the constraints that this
- represents. (No, that's not the "understatement of the year" -- it's only
- January 7th. ;-)
-
- Best regards,
- </Erik>
- --
- Erik Naggum ISO 8879 SGML +47 295 0313
- ISO 10744 HyTime
- <erik@naggum.no> ISO 9899 C Memento, terrigena
- <SGML@ifi.uio.no> ISO 10646 UCS Memento, vita brevis
-