NetNews Usenet Archive 1993 #1


Internet Message Format | 1993-01-07 | 5.8 KB

Path: sparky!uunet!ralvm13.VNET.IBM.COM
From: drmacro@ralvm13.VNET.IBM.COM
Message-ID: <19930107.141259.567@almaden.ibm.com>
Date: Thu, 7 Jan 93 17:11:39 EST
Newsgroups: comp.text.sgml
Subject: Re: SGML for data querying
Disclaimer: This posting represents the poster's views, not those of IBM
News-Software: UReply 3.0
References: <1993Jan6.221136.27780@twg.com> <19930106.155234.608@almaden.ibm.com> <19930107.006@erik.naggum.no>
Lines: 96

In <19930107.006@erik.naggum.no> Erik Naggum <SGML@ifi.uio.no> writes:
>[Eliot Kimber]
>:
>| <digression>
>|
>| The examples above, and most SGML applications in existence today,
>| process SGML documents sequentially from start to finish. However,
>| SGML does not require sequential processing, and it can make just as
>| much sense to define applications that work with SGML documents
>| wholistically as a tree, rather than sequentially, at least in the
>| abstract. In this processing model, access is via queries, rather than
>| sequential access by waiting for the element you want to flow by. ...
>
>This is a good point, but an important distinction needs to be made (and
>"at least in the abstract" needs to be stressed).
>
>The SGML document consists of one or more _entities_, each a character
>sequence, which represents a fragmented linearization of the element
>structure. This fragmented linearization needs to be parsed (and
>reconstructed) before we can access the element structure. The element
>structure can fruitfully be regarded as the _product_ of the parser (cf.
>ESIS), and the parser has to parse the document "from start to finish" to
>be able to build this structure.

Erik's completely correct. In my mind, I've already abstracted away the entity structure, such that the only SGML-defined constructs I'm thinking about are element structures. You can do this by defining semantics that provide "entity-like" functions for elements.
For example, if you use the HyTime content extraction link idea to access the content of SGML elements, you don't have to depend on entities to do data organization. All data organization can be done using only SGML elements. This allows you to view the SGML data without regard to how it might be organized into system entities, or rather, you view the data as a single, giant entity that is an entire "document" or set of documents.

In addition, if you constrain all elements and semantics to the element hierarchy, you remove all requirements to process the document sequentially (ignoring the fact that the SGML must be *parsed* sequentially, which is not the same thing). This means that valid subtrees can be processed in isolation because they are, by this definition, completely self-contained. Or, thought of another way, there are no longer any processing states that change asynchronously from the element hierarchy itself.

This is very powerful for things like re-use and modularization. New documents are built using content references to re-use objects rather than using entity references (except for data entities, of course).

You can thus think of the database as a "containment-only document" that serves only as a repository of elements for use elsewhere: at its simplest, a one-element document type with a content model of ANY. If you define the first level of containment as defining the object boundaries, hey presto, a database of objects. If you make this structure recursive (which it already is to SGML by dint of using ANY), you can have arbitrarily complex sets of containers within containers. This enables the data organization to be completely different from the presentation or retrieval organization, letting you modularize your data to any granularity desired (down to the character level, unless you use HyTime addressing to address bits within characters--just joking).

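To make the "containment-only document" idea concrete, here is a minimal sketch in Python. It rests on several assumptions that are not part of the discussion above: XML and the standard-library `xml.etree.ElementTree` parser stand in for SGML and an SGML parser, and the element names (`repo`, `use`, `manual`), the `ref` attribute, and the helpers `index_by_id` and `resolve` are all hypothetical. It is not HyTime; it only illustrates parsing once, then working with the element tree by ID and by query, and building a new document from references to repository objects.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical repository: a single container element whose first-level
# children are the reusable "objects". XML stands in for SGML here.
REPOSITORY = """
<repo>
  <para id="warn1">Disconnect power before servicing.</para>
  <section id="intro">
    <title>Introduction</title>
    <para id="p1">Containers may nest to any depth.</para>
  </section>
</repo>
"""

# Hypothetical new document that re-uses objects by reference
# (an element with a ref attribute) instead of entity references.
DOCUMENT = """
<manual>
  <use ref="intro"/>
  <use ref="warn1"/>
</manual>
"""


def index_by_id(root):
    """Map each id attribute value to its element, at any depth."""
    return {el.get("id"): el for el in root.iter() if el.get("id") is not None}


def resolve(doc_root, ids):
    """Return a copy of doc_root in which each <use ref="..."/> is replaced
    by a copy of the referenced subtree. References nested inside the
    copied subtrees are not followed in this sketch."""
    out = copy.deepcopy(doc_root)
    for parent in list(out.iter()):
        for i, child in enumerate(list(parent)):
            if child.tag == "use":
                replacement = copy.deepcopy(ids[child.get("ref")])
                replacement.tail = child.tail
                parent[i] = replacement
    return out


# The parse itself is sequential; everything after it works on the tree.
repo = ET.fromstring(REPOSITORY)
ids = index_by_id(repo)

# Query access: ask for elements instead of waiting for them to flow by.
paras = repo.findall(".//para")

# A subtree is handled on its own, without the rest of the repository.
intro_text = " ".join("".join(ids["intro"].itertext()).split())

# Build a new document out of references to repository objects.
manual = resolve(ET.fromstring(DOCUMENT), ids)
print([child.tag for child in manual])
```

The repository is never modified: `resolve` copies the referenced subtrees, so the same object can be re-used in any number of documents, and the organization of the repository stays independent of the organization of the documents built from it.
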
At all levels of granularity, the same basic SGML features or application semantics are used to manage structure and addressing (e.g., containment hierarchy and ID or query-based hyperlinks). In other words, you can define all data organization of SGML data purely in terms of element structure without ever using external entities (not counting data entities, of course). HyTime defines the processing semantics you need in terms of hyperlinks, which provides a nice formalism, but you don't need HyTime to define those semantics for a given application.

One of the big revelations for me as I've worked at designing SGML applications is that SGML puts very few limits on what you *can* do. I found that I tended to limit my own thinking by what I was used to doing, and when I stopped doing that, I realized I had a lot of freedom to define whatever processing and application semantics I wanted to. This is because SGML does not define the processing semantics in any way. Charles Goldfarb has always been very adamant on this point and I finally see why. Of course, there's no reason you can't define another standard set of semantics in SGML terms, but the possible number of such definitions is infinite.

I also found that I could define, for my application, which happens to be a text processing application, all my semantics in terms of hyperlinks, which means they can all be expressed in terms of HyTime, for the most part. I now have an application whose data structures and semantics are expressed almost entirely in terms of existing international standards, which provides, at the very minimum, a coherent, "universal" language for expressing my application design, and that is completely divorced from implementation specifics. I find this to be pretty darn cool and very liberating.

Eliot Kimber                               Internet: drmacro@ralvm13.vnet.ibm.com
Dept E14/B500                              IBMMAIL:  USIB2DK9@IBMMAIL
Network Programs Information Development   Phone:    1-919-543-7091
IBM Corporation
Research Triangle Park, NC 27709