- Path: sparky!uunet!ralvm13.VNET.IBM.COM
- From: drmacro@ralvm13.VNET.IBM.COM
- Message-ID: <19930107.141259.567@almaden.ibm.com>
- Date: Thu, 7 Jan 93 17:11:39 EST
- Newsgroups: comp.text.sgml
- Subject: Re: SGML for data querying
- Disclaimer: This posting represents the poster's views, not those of IBM
- News-Software: UReply 3.0
- References: <1993Jan6.221136.27780@twg.com> <19930106.155234.608@almaden.ibm.com>
- <19930107.006@erik.naggum.no>
- Lines: 96
-
- In <19930107.006@erik.naggum.no> Erik Naggum <SGML@ifi.uio.no> writes:
- >[Eliot Kimber]
- >:
- >| <digression>
- >|
- >| The examples above, and most SGML applications in existence today,
- >| process SGML documents sequentially from start to finish. However,
- >| SGML does not require sequential processing, and it can make just as
- >| much sense to define applications that work with SGML documents
- >| wholistically as a tree, rather than sequentially, at least in the
- >| abstract. In this processing model, access is via queries, rather than
- >| sequential access by waiting for the element you want to flow by. ...
- >
- >This is a good point, but an important distinction needs to be made (and
- >"at least in the abstract" needs to be stressed).
- >
- >The SGML document consists of one or more _entities_, each a character
- >sequence, which represents a fragmented linearization of the element
- >structure. This fragmented linearization needs to be parsed (and
- >reconstructed) before we can access the element structure. The element
- >structure can fruitfully be regarded as the _product_ of the parser (cf.
- >ESIS), and the parser has to parse the document "from start to finish" to
- >be able to build this structure.
-
- Erik's completely correct. In my mind, I've already abstracted away
- the entity structure, such that the only SGML-defined constructs I'm
- thinking about are element structures. You can do this by defining
- semantics that provide "entity-like" functions for elements. For
- example, if you use the HyTime content extraction link idea to access
- the content of SGML elements, you don't have to depend on entities
- to do data organization. All data organization can be done using
- only SGML elements. This allows you to view the SGML data without
- regard to how it might be organized into system entities, or rather,
- you view the data as a single, giant entity that is an entire "document"
- or set of documents. In addition, if you constrain all elements and
- semantics to the element hierarchy, you remove all requirements to
- process the document sequentially (ignoring the fact that the SGML
- must be *parsed* sequentially, which is not the same thing). This
- means that valid subtrees can be processed in isolation because
- they are, by this definition, completely self-contained. Or, thought
- of another way, there are no longer any processing states that
- change asynchronously from the element hierarchy itself. This is very
- powerful for things like re-use and modularization.
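-
- A minimal sketch of that tree-and-query view, in Python. The Element
- class and its walk, query and content methods are hypothetical names
- made up for illustration; they are not part of SGML, HyTime, or any
- real parser. The tree is built by hand and stands in for whatever a
- parser would hand over once it has read the entities sequentially.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Optional


@dataclass
class Element:
    """A node in the parsed element hierarchy (hypothetical, not an SGML API)."""
    gi: str                      # generic identifier, i.e. the element type name
    id: Optional[str] = None     # value of an ID attribute, if any
    text: str = ""               # character data directly inside this element
    children: List["Element"] = field(default_factory=list)

    def walk(self) -> Iterator["Element"]:
        """Yield this element and all its descendants in document order."""
        yield self
        for child in self.children:
            yield from child.walk()

    def query(self, predicate: Callable[["Element"], bool]) -> List["Element"]:
        """Return every element in this subtree that satisfies the predicate."""
        return [e for e in self.walk() if predicate(e)]

    def content(self) -> str:
        """Join the non-empty character data of this subtree with spaces."""
        return " ".join(e.text for e in self.walk() if e.text)


# Entity boundaries are no longer visible here: only element structure remains.
doc = Element("manual", children=[
    Element("chapter", id="intro", children=[
        Element("title", text="Introduction"),
        Element("p", text="SGML defines structure."),
    ]),
    Element("chapter", id="usage", children=[
        Element("title", text="Usage"),
        Element("p", text="Queries select elements."),
    ]),
])

# Access by query rather than by waiting for an element to flow by.
titles = [t.text for t in doc.query(lambda e: e.gi == "title")]

# A subtree carries everything it needs, so it can be processed in isolation.
usage = doc.query(lambda e: e.id == "usage")[0]
usage_text = usage.content()
```

- Nothing in query or content depends on state outside the subtree it
- is called on, which is the property that makes re-use workable.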
-
- New documents are built using content
- references to re-use objects rather than using entity references
- (except for data entities, of course). You can thus think of
- the database as a "containment-only document" that serves only as
- a repository of elements for use elsewhere, at its simplest,
- a one-element document type with a content-model of ANY. If you
- define the first level of containment as defining the object
- boundaries, hey presto, a database of objects. If you make this
- structure recursive (which it already is in SGML by dint of using ANY),
- you can have arbitrarily complex sets of containers within containers.
- This enables the data organization to be completely different from
- the presentation or retrieval organization, letting you
- modularize your data to any granularity desired (down to the
- character level, unless you use HyTime addressing to address
- bits within characters--just joking). At all levels of granularity,
- the same basic SGML features or application semantics are used
- to manage structure and addressing (e.g., containment hierarchy
- and ID or query-based hyperlinks).
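-
- The repository idea can be sketched the same way. Everything named
- here (Element, ContentRef, index_objects, resolve) is an assumption
- made for illustration, not HyTime syntax or a real API. ContentRef
- stands in for a content reference that addresses an object by ID, and
- the first level of containment in the repository is taken to define
- the object boundaries.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class ContentRef:
    """Hypothetical stand-in for a content reference: names an object by ID."""
    target: str


@dataclass
class Element:
    """An element whose content is ANY: elements or references, to any depth."""
    gi: str
    id: Optional[str] = None
    text: str = ""
    children: List[Union["Element", ContentRef]] = field(default_factory=list)


def index_objects(repository: Element) -> Dict[str, Element]:
    """Treat each first-level child that has an ID as one stored object."""
    return {
        child.id: child
        for child in repository.children
        if isinstance(child, Element) and child.id is not None
    }


def resolve(node: Union[Element, ContentRef], objects: Dict[str, Element]) -> Element:
    """Return a copy of the tree with each ContentRef replaced by its target.

    Assumes the references do not form a cycle; a cycle would recurse forever.
    """
    if isinstance(node, ContentRef):
        return resolve(objects[node.target], objects)
    return Element(node.gi, node.id, node.text,
                   [resolve(child, objects) for child in node.children])


# A "containment-only document": its only job is to hold re-usable objects.
repository = Element("repository", children=[
    Element("warning", id="w1", text="Unplug before servicing."),
    Element("section", id="s1", children=[
        Element("title", text="Safety"),
        ContentRef("w1"),            # objects may themselves re-use objects
    ]),
])
objects = index_objects(repository)

# A new document is assembled from references, not from entity references.
guide = Element("guide", children=[ContentRef("s1"), ContentRef("w1")])
built = resolve(guide, objects)
```

- The organization of the repository (two sibling objects) is unrelated
- to the organization of the built document, where the warning appears
- both inside the section and on its own.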
-
- In other words, you can define all data organization of SGML
- data purely in terms of element structure without ever using
- external entities (not counting data entities, of course). HyTime
- defines the processing semantics you need in terms of hyperlinks,
- which provides a nice formalism, but you don't need HyTime to
- define those semantics for a given application.
-
- One of the big revelations for me as I've worked at designing
- SGML applications is that SGML puts very few limits on what
- you *can* do. I found that I tended to limit my own thinking by
- what I was used to doing, and when I stopped doing that, I
- realized I had a lot of freedom to define whatever processing
- and application semantics I wanted to. This is because SGML does
- not define the processing semantics in any way. Charles Goldfarb
- has always been very adamant on this point and I finally see
- why. Of course, there's no reason you can't define another
- standard set of semantics in SGML terms, but the possible number of
- such definitions is infinite. I also found that I could define,
- for my application, which happens to be a text processing
- application, all my semantics in terms of hyperlinks, which means
- they can all be expressed in terms of HyTime, for the
- most part. I now have an application whose data structures
- and semantics are expressed almost entirely in terms of existing
- international standards, which provides, at the very minimum,
- a coherent, "universal" language for expressing my application
- design, one that is completely divorced from implementation specifics.
- I find this to be pretty darn cool and very liberating.
-
- Eliot Kimber Internet: drmacro@ralvm13.vnet.ibm.com
- Dept E14/B500 IBMMAIL: USIB2DK9@IBMMAIL
- Network Programs Information Development Phone: 1-919-543-7091
- IBM Corporation
- Research Triangle Park, NC 27709
-