NetNews Usenet Archive 1993 #1

Internet Message Format | 1993-01-07 | 5.4 KB

Path: sparky!uunet!ralvm13.VNET.IBM.COM
From: drmacro@ralvm13.VNET.IBM.COM
Message-ID: <19930107.090654.710@almaden.ibm.com>
Date: Thu, 7 Jan 93 12:05:39 EST
Newsgroups: comp.text.sgml
Subject: Re: SGML for data querying
Disclaimer: This posting represents the poster's views, not those of IBM
News-Software: UReply 3.1
References: <19930106.155234.608@almaden.ibm.com> <1993Jan7.021616.11906@nuscc.nus.sg> <1993Jan7.142852.23671@infodev.cam.ac.uk>
Lines: 85

In <1993Jan7.142852.23671@infodev.cam.ac.uk> Alasdair Grant writes:
>But support for queries of the form "move to previous <item> element"
>would be difficult to support without storing the entire database in
>fully parsed form. Then you're not really talking about SGML at
>all, you're talking about a hierarchical database, object-oriented even,
>with facilities for SGML import and export.

But it's not that simple. If you define "SGML import and export" as being able to import a document and then immediately export, byte for byte, the original document, then what you are storing in your database is SGML, even if the internal representation is proprietary. Thus you must support SGML features like entity references, SDATA, NDATA, etc., etc. This support can be added to hierarchical, object-oriented databases, and implementing the storage that way makes the data no less "SGML". The key is the logical constructs and functions that are defined and required by SGML, not the way the data is stored and accessed.

A typical example of such a system is OpenText's PAT database coupled with SGML/Search. PAT is a structured text database, and SGML/Search adds true SGML parsing and querying on top of it. The fact that PAT does some magical thing to store the data makes the data it stores no less SGML, because SGML/Search provides an SGML-compliant and SGML-knowledgeable interface to the data. A system that used SGML/Search to access the SGML data could still be a conforming SGML system.
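To illustrate the byte-for-byte import/export point, here is a minimal sketch, assuming a store that keeps each general entity reference as its own node instead of expanding it on import. The names `import_text`, `export_text`, `resolved_text`, `Text`, `EntityRef` and `ENTITY_REF` are hypothetical, and the sketch handles only character data: tags, markup declarations, marked sections, SDATA/NDATA entities and the rest of a real SGML parser's work are deliberately left out.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Text:
    value: str  # literal character data, stored verbatim


@dataclass
class EntityRef:
    name: str  # entity name, e.g. "copy"
    raw: str   # reference exactly as written in the source, e.g. "&copy;" or "&amp"


Node = Union[Text, EntityRef]

# Hypothetical, simplified general entity reference: '&' + name, with the ';'
# reference close optional (SGML allows omitting it before a non-name character).
# It does NOT understand tags, comments, CDATA or marked sections.
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);?")


def import_text(source: str) -> List[Node]:
    """Split character data into text runs and unexpanded entity references."""
    nodes: List[Node] = []
    pos = 0
    for m in ENTITY_REF.finditer(source):
        if m.start() > pos:
            nodes.append(Text(source[pos:m.start()]))
        nodes.append(EntityRef(m.group(1), m.group(0)))
        pos = m.end()
    if pos < len(source):
        nodes.append(Text(source[pos:]))
    return nodes


def export_text(nodes: List[Node]) -> str:
    """Reproduce the original source byte for byte, references included."""
    return "".join(n.value if isinstance(n, Text) else n.raw for n in nodes)


def resolved_text(nodes: List[Node], entities: Dict[str, str]) -> str:
    """Expanded view for searching; unknown entities are left as written."""
    return "".join(
        n.value if isinstance(n, Text) else entities.get(n.name, n.raw)
        for n in nodes
    )
```

Importing `Copyright &copy; 1993 &amp IBM` and exporting it returns the identical string, while `resolved_text` with `{"copy": "(c)", "amp": "&"}` yields `Copyright (c) 1993 & IBM`. The internal representation is free to differ from the source, as long as the original markup can be recovered exactly.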

You're right that providing a function that lets you retrieve data requires implementing random access into the SGML document, and that the most efficient way to do that is probably to read in the entire thing and index it for subsequent fast retrieval, but that's not a hard requirement. I could just as easily use the source as my data structure and simply re-parse it for each new query. It may not be a good implementation choice, but it works just fine.

Also, there will always be cases where part of the document (or referenced subdocuments) cannot be parsed until actually queried for or linked to, so there has to be a facility to dynamically query and access unpreprocessed SGML data. This will especially be the case in dynamic, online presentation and retrieval systems working against heterogeneous databases of SGML and non-SGML information.

>In fact thinking of this as SGML could raise many problems, for example
>forcing users to make distinctions between element-attributes and element
>data, and forcing them to know about ordering even when (to the data) it
>might not be significant. To me, choosing whether to store data in
>content or attributes is one of the real hassles of DTD design.

I'm not sure that this is really a problem. The decision of what to make content and what to make attributes is made by the application designer who designs a given element set. If they've done a good job, they will presumably have logical reasons for what is attributes and what is data, logic that is consistent, predictable, and documented.

Also, one can reasonably assume that applications of significant complexity will have tools built for them (e.g., editors) that hide the details of the SGML markup in any case, exposing only the logical constructs (e.g., presenting input panels to capture data rather than having users type tags directly).
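The random-access trade-off described above, re-parsing the source for every query versus reading it once and indexing it, can be sketched for the "move to previous <item> element" query. This is a minimal illustration under stated assumptions, not how PAT or SGML/Search work: the hypothetical `ITEM_START` pattern finds `<item` start tags with a regular expression, which a real system would replace with a proper SGML parser, since minimized tags, comments, marked sections and short references all defeat a regex. The names `previous_item_reparse` and `ItemIndex` are likewise hypothetical.

```python
import bisect
import re
from typing import List, Optional

# Hypothetical, simplified start-tag matcher for <item> or <item attr=...>.
# IGNORECASE because element names are case-insensitive under the reference
# concrete syntax (NAMECASE GENERAL YES). Not a substitute for a parser.
ITEM_START = re.compile(r"<item[\s>]", re.IGNORECASE)


def previous_item_reparse(source: str, offset: int) -> Optional[int]:
    """Re-scan the source on every call: no stored state, O(n) per query.

    Returns the offset of the last <item> start tag beginning before `offset`.
    """
    prev = None
    for m in ITEM_START.finditer(source):
        if m.start() >= offset:
            break
        prev = m.start()
    return prev


class ItemIndex:
    """Scan once up front, then answer each query by binary search, O(log n)."""

    def __init__(self, source: str) -> None:
        # Offsets come out of finditer in increasing order, so the list is sorted.
        self.starts: List[int] = [m.start() for m in ITEM_START.finditer(source)]

    def previous_item(self, offset: int) -> Optional[int]:
        # bisect_left finds the first start >= offset; the one before it is
        # the last <item> that begins strictly before offset.
        i = bisect.bisect_left(self.starts, offset)
        return self.starts[i - 1] if i > 0 else None
```

For `<list><item>a<item>b<item>c</list>`, the index holds the offsets `[6, 13, 20]`, and both functions return `20` for offset `26` and `None` for offset `6`. The indexed version pays for one full scan and keeps state; the re-parsing version keeps nothing and rescans each time, which is the "not a good implementation choice, but it works" option.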

Online query systems can either remove the need to know the underlying element structure or provide interfaces for learning the structure as you need it. EBT's Dynatext does both of these things, providing accessible tools that let readers make use of the SGML structure of the document without needing to know the structure of the markup beforehand. With Dynatext, readers do not even need to know it's SGML underneath, because authors can create "canned" queries that readers simply use. This lets authors use their (presumably detailed) knowledge of the documents they create to define optimized, SGML-based searches that their readers can use directly. In other words, authors can use their specialized knowledge to optimize retrieval for readers, in much the same way they use their specialized knowledge to create indexes, cross references, and other print retrieval aids today.

> Surely
>the way to go is define a common view of object-oriented databases, and
>then settle on a way in which these can be imported and exported as SGML.

But doesn't SGML define this common view, at least for text data? The problem is not to define a common view of object-oriented text information, but to define the most effective way to express and support SGML with generic database technology *coupled with* SGML technology (e.g., parsers and SGML-knowledgeable retrieval systems).

Eliot Kimber                                 Internet: drmacro@ralvm13.vnet.ibm.com
Dept E14/B500                                IBMMAIL: USIB2DK9@IBMMAIL
Network Programs Information Development     Phone: 1-919-543-7091
IBM Corporation
Research Triangle Park, NC 27709