- Path: sparky!uunet!ralvm13.VNET.IBM.COM
- From: drmacro@ralvm13.VNET.IBM.COM
- Message-ID: <19930107.090654.710@almaden.ibm.com>
- Date: Thu, 7 Jan 93 12:05:39 EST
- Newsgroups: comp.text.sgml
- Subject: Re: SGML for data querying
- Disclaimer: This posting represents the poster's views, not those of IBM
- News-Software: UReply 3.1
- References: <19930106.155234.608@almaden.ibm.com> <1993Jan7.021616.11906@nuscc.nus.sg>
- <1993Jan7.142852.23671@infodev.cam.ac.uk>
- Lines: 85
-
- In <1993Jan7.142852.23671@infodev.cam.ac.uk> Alasdair Grant writes:
-
- >But support for queries of the form "move to previous <item> element"
- >would be difficult to support without storing the entire database in
- >fully parsed form. Then you're not really talking about SGML at
- >all, you're talking about a hierarchical database, object-oriented even,
- >with facilities for SGML import and export.
-
- But it's not that simple. If you define "SGML import and export" as
- being able to import a document and then immediately export, byte for
- byte, the original document, then what you are storing in your
- database is SGML, even if the internal representation is proprietary.
- Thus you must support SGML features like entity references, SDATA,
- NDATA, etc., etc. This support can be added to hierarchical,
- object-oriented databases, and implementing the storage that way makes
- the data no less "SGML". The key is the logical constructs
- and functions that are defined and required by SGML, not the way
- the data is stored and accessed. A typical example of such a system
- is OpenText's PAT database coupled with SGML/Search. PAT is a structured
- text database and SGML/Search adds true SGML parsing and querying on
- top of it. The fact that PAT does some magical thing to store the
- data makes the data it stores no less SGML because SGML/Search
- provides an SGML-compliant and knowledgeable interface to the data.
- A system that used SGML/Search to access the SGML data could still
- be a conforming SGML system.
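
To make the "byte for byte" criterion concrete, here is a minimal sketch in Python. Everything in it is an illustration of mine: `import_document`, `export_document`, and the record layout are hypothetical, and the regular expression is only a crude stand-in for an SGML parser (no DTD, no tag minimization, no marked sections). The point is just that the internal form can be anything at all, as long as entity references are stored unexpanded and export reproduces the original source exactly.

```python
import re

# Assumption: a crude tokenizer standing in for a real SGML parser.
# It recognizes only "<...>" markup and "&name;" entity references.
TOKEN = re.compile(r"(<[^>]*>|&[A-Za-z][A-Za-z0-9.-]*;?)")


def import_document(source):
    """Turn the source into (kind, text) records without losing a character."""
    records = []
    # With one capturing group, re.split alternates: data, token, data, ...
    for index, piece in enumerate(TOKEN.split(source)):
        if not piece:
            continue
        if index % 2 == 0:
            kind = "data"
        elif piece.startswith("<"):
            kind = "markup"
        else:
            kind = "entity-ref"  # kept unexpanded so export can reproduce it
        records.append((kind, piece))
    return records


def export_document(records):
    """Rebuild the source text from the stored records."""
    return "".join(text for _, text in records)


doc = "<list><item>AT&amp;T</item>\n<item>see &fig1;</item></list>"
records = import_document(doc)
assert export_document(records) == doc  # the round trip is exact
```

A store that expanded `&fig1;` on import would fail that last assertion, which is the sense in which the entity reference itself has to be part of what the database holds.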
-
- You're right that providing a function that lets you retrieve data
- requires implementing random access into the SGML document and that
- the most efficient way to do that is probably to read in the entire
- thing and index it for subsequent fast retrieval, but that's not a
- hard requirement. I could just as easily use the source as my
- data structure and simply re-parse it for each new query. It may not
- be a good implementation choice, but it works just fine.
- Also, there will always be cases where part of the document (or
- referenced subdocuments) cannot be parsed until actually queried
- for or linked to, so there has to be a facility to dynamically query
- and access unpreprocessed SGML data. This will especially be the case
- in dynamic, online presentation and retrieval systems working
- against heterogeneous databases of SGML and non-SGML information.
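
The "re-parse the source for each query" approach can be sketched as follows, again in Python. This is a hedged illustration, not how any product mentioned here works: `previous_element` is a hypothetical name, the start-tag pattern assumes unminimized markup with no ">" inside attribute values, and treating element names as case-insensitive is an assumption of the sketch. It answers "move to previous <item> element" by rescanning from the top of the document on every call, with no stored index.

```python
import re


def previous_element(source, name, offset):
    """Return the (start, end) span of the last <name ...> start tag that
    begins before `offset`, or None if there is none.

    Nothing is cached: the source is rescanned from the top on each call.
    """
    # Assumption: start tags are written out in full, as "<name" followed
    # by whitespace or ">", and names are matched case-insensitively.
    start_tag = re.compile(
        r"<" + re.escape(name) + r"(?=[\s>])[^>]*>", re.IGNORECASE
    )
    found = None
    for match in start_tag.finditer(source):
        if match.start() >= offset:
            break
        found = match.span()
    return found


doc = "<list><item>one</item><item>two</item><item>three</item></list>"
current = doc.index("<item>three")
span = previous_element(doc, "item", current)
assert doc[span[0]:].startswith("<item>two")
```

Each query costs a pass over the text up to the current position, which is why indexing is usually the better choice, but the result is the same.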
-
- >In fact thinking of this as SGML could raise many problems, for example
- >forcing users to make distinctions between element-attributes and element
- >data, and forcing them to know about ordering even when (to the data) it
- >might not be significant. To me, choosing whether to store data in
- >content or attributes is one of the real hassles of DTD design.
-
- I'm not sure that this is really a problem. The decision of what to
- make content and what to make attributes is made by the application
- designer who designs a given element set. If they've done a good job
- they will presumably have logical reasons for what is attributes and
- what is data, logic that is consistent, predictable, and documented.
- Also, one can reasonably assume that applications of significant
- complexity will have tools built for them (e.g., editors, etc.) that
- hide the details of the SGML markup in any case, only exposing the
- logical constructs (e.g., present input panels to capture data
- rather than directly typing tags). Online query systems can either
- remove the need to know the underlying element structure or
- provide interfaces for learning the structure as you need it. EBT's
- Dynatext does both of these things, providing accessible tools for
- readers to make use of the SGML structure of the document without the
- need to know beforehand the structure of the markup. With Dynatext,
- it is not even necessary for readers to know it's SGML underneath
- because authors can create "canned" queries that readers simply
- use. This lets authors use their presumably detailed knowledge
- of the documents they create to define optimized, SGML-based searches
- that their readers can use directly. In other words, authors can
- use their specialized knowledge to optimize retrieval for readers,
- in much the same way they use their specialized knowledge to create
- indexes, cross references, and other print retrieval aids today.
-
- > Surely
- >the way to go is define a common view of object-oriented databases, and
- >then settle on a way in which these can be imported and exported as SGML.
-
- But doesn't SGML define this common view, at least for text data? The
- problem is not to define a common view of object-oriented text
- information, but to define the most effective way to express and
- support SGML with generic database technology *coupled with* SGML
- technology (e.g., parsers and SGML-knowledgeable retrieval systems).
-
- Eliot Kimber Internet: drmacro@ralvm13.vnet.ibm.com
- Dept E14/B500 IBMMAIL: USIB2DK9@IBMMAIL
- Network Programs Information Development Phone: 1-919-543-7091
- IBM Corporation
- Research Triangle Park, NC 27709
-