- Path: sparky!uunet!ralvm13.VNET.IBM.COM
- From: drmacro@ralvm13.VNET.IBM.COM
- Message-ID: <19930106.155234.608@almaden.ibm.com>
- Date: Wed, 6 Jan 93 18:51:16 EST
- Newsgroups: comp.text.sgml
- Subject: Re: SGML for data querying
- Disclaimer: This posting represents the poster's views, not those of IBM
- News-Software: UReply 3.0
- References: <1993Jan6.221136.27780@twg.com>
- Lines: 96
-
- In <1993Jan6.221136.27780@twg.com> "David Herron" <david@twg.com> writes:
-
- >Taking the `forms' example. A tag might be `<NAME>' and the value at
- >that tag is the persons name. Or a purchase requisition form might
- >have <AMOUNT-REQUIRED> <ITEM-DESCRIPTION> <CATALOG-NUMBER> <UNIT-PRICE>
- >and <PRICE> tags, with associated values. Is there a concept wherein
- >some `tags' have `values'? Or does a tag only create a context, and the
- >data within that context is interpreted in a particular way?
-
- The SGML markup defines only a structural context, e.g., Amount-Required
- must precede Item-Description, or Name contains <lastname> followed
- by <firstname>. It is up to a particular processing application to
- define what those elements mean, which it does by associating some
- processing with the elements and their data. This processing can be
- anything you can program. SGML neither defines nor constrains the
- purpose to which you put SGML-encoded data.
-
- Elements (tags are the syntactic constructs that delimit element
- boundaries) can have values in a sense.
- All elements (except one special case we can ignore for simplicity)
- can contain plain text (PCDATA or "parsed character data"), other
- elements only, or elements and PCDATA ("mixed content"). When an element
- takes only PCDATA as its content, the PCDATA is generally intended
- as that element's "value" in the way you mean. Note that in
- SGML, elements always contain things; they do not simply represent
- start and end flags. That containment is tracked and reported
- by the SGML parser, and applications can use their knowledge of the
- containment hierarchy to make decisions about any element or data.
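-
- To make the containment idea concrete, here is a minimal sketch in
- Python. It is only an illustration under stated assumptions: it uses
- Python's xml.sax parser on a well-formed XML document as a stand-in
- for an SGML parser, and the sample document and the names SAMPLE,
- PathTracker, and element_paths are invented for this example. It
- keeps a stack of open elements and reports each piece of PCDATA
- together with the chain of elements that contains it.

```python
import xml.sax

# Hypothetical sample document (XML used as a stand-in for SGML).
SAMPLE = (
    b"<form><name><lastname>Doe</lastname><firstname>Jane</firstname></name>"
    b"<price>9.50</price></form>"
)


class PathTracker(xml.sax.ContentHandler):
    """Tracks the stack of open elements and records PCDATA with its context."""

    def __init__(self):
        super().__init__()
        self.stack = []    # names of the currently open elements
        self.found = []    # (containment path, text) pairs
        self._buffer = []  # text seen since the last tag

    def _flush(self):
        # Record buffered text against the current containment path.
        text = "".join(self._buffer).strip()
        if text:
            self.found.append(("/".join(self.stack), text))
        self._buffer = []

    def startElement(self, name, attrs):
        self._flush()
        self.stack.append(name)

    def endElement(self, name):
        self._flush()
        self.stack.pop()

    def characters(self, content):
        # The parser may deliver text in several chunks, so buffer it.
        self._buffer.append(content)


def element_paths(document):
    handler = PathTracker()
    xml.sax.parseString(document, handler)
    return handler.found
```

- With a context path in hand, an application can treat a "price"
- inside one element differently from a "price" inside another,
- which is the kind of decision the containment hierarchy supports.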
-
- >In any case it appears that for `querying' type applications the
- >SGML parser must build up some data structure, and that the application
- >program must then be able to probe the data structure at its leisure.
- >For a normal text processing application (SGML -> TeX -> PS conversion)
- >you can get away with creating and destroying the data structures on
- >the fly. But to be consulted at the applications leisure the data
- >structures must stay around for a long time.
- >
- >So... Are there any SGML programming toolkits available which allow
- >for this sort of thing? Is the ARCSGML/JClark suitable in this way?
- >(I'm on a small budget)
-
- Someone has provided a Rexx interface to ARCSGML for use with
- Personal Rexx under MS-DOS. With this, you could fairly easily
- write "query" applications. The basic idea is
- to first define the data structures you need to hold the data
- you want, just as you would for any other data processing
- application. You then process the document, grabbing data
- off of the relevant elements as they go by and putting it into your data
- structures. The rest you just let flow through and ignore.
-
- When you're done processing the document, you do whatever needs
- to be done with the data you've collected, such as create
- reports, load databases, etc. This should be fairly straightforward
- processing.
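-
- The collect-as-it-flows-by approach might look roughly like the
- following Python sketch. It is not the Rexx/ARCSGML interface
- mentioned above: it assumes Python's xml.sax parser and a
- well-formed XML document as a stand-in for SGML, and the document,
- the element names, and the names Collector, collect_items, and
- total_cost are all made up for illustration.

```python
import xml.sax

# Hypothetical requisition document (XML used as a stand-in for SGML).
DOC = b"""<requisition>
  <name><lastname>Doe</lastname><firstname>Jane</firstname></name>
  <item>
    <amount-required>3</amount-required>
    <item-description>Toner cartridge</item-description>
    <unit-price>40.00</unit-price>
  </item>
  <item>
    <amount-required>2</amount-required>
    <item-description>Printer paper</item-description>
    <unit-price>12.50</unit-price>
  </item>
</requisition>"""

# The only elements this application cares about.
WANTED = {"amount-required", "item-description", "unit-price"}


class Collector(xml.sax.ContentHandler):
    """Fills one dict per <item>; everything else flows by and is ignored."""

    def __init__(self):
        super().__init__()
        self.items = []       # the data structure we defined up front
        self._current = None  # the record being filled, if inside an <item>
        self._field = None    # the wanted element currently open, if any
        self._text = []

    def startElement(self, name, attrs):
        if name == "item":
            self._current = {}
        elif name in WANTED and self._current is not None:
            self._field = name
            self._text = []

    def characters(self, content):
        if self._field is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == self._field:
            self._current[name] = "".join(self._text).strip()
            self._field = None
        elif name == "item":
            self.items.append(self._current)
            self._current = None


def collect_items(document):
    handler = Collector()
    xml.sax.parseString(document, handler)
    return handler.items


def total_cost(items):
    # The "do whatever needs to be done" step: here, a simple report figure.
    return sum(int(i["amount-required"]) * float(i["unit-price"]) for i in items)
```

- Calling total_cost(collect_items(DOC)) on the sample gives 145.0.
- The <name> element is never looked at; it simply flows through.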
-
- <digression>
-
- The examples above, and most SGML applications in existence today,
- process SGML documents sequentially from start to finish. However,
- SGML does not require sequential processing, and it can make just
- as much sense to define applications that work with SGML documents
- holistically as a tree, rather than sequentially, at
- least in the abstract. In this processing model, access is
- via queries, rather than by waiting for the
- element you want to flow by. Some SGML-based online presentation
- systems, such as EBT's Dynatext, present this sort of view of
- documents, providing query-based direct access to elements and
- data, which makes sense in a dynamic, direct access application
- like online presentation.
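-
- As a rough illustration of the tree-and-query view (and not a
- description of how Dynatext works), the sketch below assumes
- Python's xml.etree.ElementTree and a well-formed XML document as a
- stand-in for SGML. The document and the helper names describe and
- priced_over are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document (XML used as a stand-in for SGML).
CATALOG = """<requisition>
  <item>
    <catalog-number>A-1</catalog-number>
    <item-description>Toner cartridge</item-description>
    <unit-price>40.00</unit-price>
  </item>
  <item>
    <catalog-number>B-7</catalog-number>
    <item-description>Printer paper</item-description>
    <unit-price>12.50</unit-price>
  </item>
</requisition>"""

# The whole document is held as a tree that can be probed at leisure.
root = ET.fromstring(CATALOG)


def describe(root, catalog_number):
    """Query directly for the item with the given catalog number."""
    item = root.find(f".//item[catalog-number='{catalog_number}']")
    return None if item is None else item.findtext("item-description")


def priced_over(root, limit):
    """Return the catalog numbers of items whose unit price exceeds limit."""
    return [
        item.findtext("catalog-number")
        for item in root.iter("item")
        if float(item.findtext("unit-price")) > limit
    ]
```

- Here describe(root, "B-7") returns "Printer paper" without the
- application having to wait for that element to flow by.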
-
- You can also define processing models that use both sequential
- and query-based processing. For example, you might use a query
- to access the root of some subtree within a document, and then
- process that subtree sequentially to build some output, or in
- the course of sequential processing, you might use a query to
- do lookahead to resolve a cross reference (as opposed to
- doing some sort of two-pass process). This sort of system might
- make the most sense when your documents are already stored in
- an SGML-knowledgeable database system that provides the retrieval
- function you need.
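-
- A small sketch of the mixed model, again under assumptions: it uses
- Python's xml.etree.ElementTree on a well-formed XML document as a
- stand-in for an SGML-knowledgeable database, and the document, the
- element names, and the functions render_chapter and render_para are
- hypothetical. A query locates the subtree root, the subtree is then
- walked in document order, and a second query does the lookahead
- needed to resolve a cross reference.

```python
import xml.etree.ElementTree as ET

# Hypothetical document (XML used as a stand-in for SGML).
MANUAL = """<manual>
  <chapter id="intro"><title>Introduction</title>
    <para>See <xref target="query"/> for details.</para>
  </chapter>
  <chapter id="query"><title>Querying</title>
    <para>Queries address elements directly.</para>
  </chapter>
</manual>"""


def find_chapter(root, chapter_id):
    # Query step: go straight to the chapter with this id.
    return root.find(f".//chapter[@id='{chapter_id}']")


def render_para(root, para):
    parts = [para.text or ""]
    for child in para:
        if child.tag == "xref":
            # Lookahead by query instead of a second pass over the document.
            target = find_chapter(root, child.get("target"))
            title = target.findtext("title") if target is not None else "???"
            parts.append(f'"{title}"')
        parts.append(child.tail or "")
    return "".join(parts).strip()


def render_chapter(root, chapter_id):
    chapter = find_chapter(root, chapter_id)
    if chapter is None:
        raise KeyError(chapter_id)
    lines = []
    # Sequential step: process the subtree in document order.
    for elem in chapter.iter():
        if elem.tag == "title":
            lines.append((elem.text or "").upper())
        elif elem.tag == "para":
            lines.append(render_para(root, elem))
    return lines
```

- For the sample, render_chapter(ET.fromstring(MANUAL), "intro")
- yields the title line followed by the paragraph with the reference
- to the later chapter already resolved.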
-
- The point is that even though bytehead hackers like myself who
- knock together quick programs to do pragmatic stuff tend to
- think about processing SGML sequentially, especially because
- that's what our tools give us, SGML processing is not limited
- to that one processing model. By thinking of SGML documents
- as databases against which you can do queries, you might discover
- very interesting and useful ways to use your
- SGML documents that you might not have thought of before.
-
- Eliot Kimber Internet: drmacro@ralvm13.vnet.ibm.com
- Dept E14/B500 IBMMAIL: USIB2DK9@IBMMAIL
- Network Programs Information Development Phone: 1-919-543-7091
- IBM Corporation
- Research Triangle Park, NC 27709
-