NetNews Usenet Archive 1993 #1

home *** CD-ROM | disk | FTP | other *** search

/ NetNews Usenet Archive 1993 #1 / NN_1993_1.iso / spool / comp / database / 8905 < prev next >

Wrap

Internet Message Format | 1993-01-07 | 5.4 KB

Path: sparky!uunet!nntp1.radiomail.net!fernwood!autodesk!tropic!tchild From: tchild@autodesk.com (Timothy Child) Newsgroups: comp.databases Subject: Re: Hierarchical Structures in RDBMS Message-ID: <18335@autodesk.COM> Date: 7 Jan 93 23:10:08 GMT References: <1993Jan6.094340.5870@devon.co.uk> Sender: news@Autodesk.COM Reply-To: tchild@autodesk.com Organization: Autodesk, Inc. Lines: 140 In article 5870@devon.co.uk, don@duncan (Don Radvan) writes: > > I have an interesting problem. I need to represent hierarchical data > structures within a relational db, but I need to do it in a way that > permits the user to create his own hierarchy. That is, I do not know the > structure of the hierarchy before hand. > > For example: > A user wants to organize various objects. These objects will > appear as leaf nodes in the user's own hierarchy. The user may choose many > different ways of organizing these objects, indeed, s/he may have many > simultaneous hierarchies of organization for each set of objects. What > results is a sort of directory tree. Each directory is a category of > objects and each "file" or non-directory entry in a directory is an > object. Given the objects of apples and oranges, we may get the following: > > Stuff > | > -------------------------------- > Food Non-Food > | > ---------------------- . . . > Fruit Veggies > | > ------------- > Apples Oranges > > And so when s/he does a query such as the following: > select name where Food = "Fruit" > or > select name where Stuff = "Food" AND Food = "Fruit" > > we would get Apples and Oranges. > It's possible to model this problem with an RDBMS, but I don't know of a STANDARD way to express the appropriate SQL to return the information and I don't think there is one! What you have in the hierarchy above is a specialized form of directed acyclic graph (DAG). The solution proposed below applies cyclic and acyclic graphs so the system also must contain extra logic to assure the appropriate specialized form of graph. A graph can be modeled in an RDBMS with two tables Nodes and Links. Nodes are the entities of the hierarchy like: Stuff, Fruit, Food, etc, . . . A node relation would be something like: Node id (prime key), node attributes, . . . Links are the connections between Nodes and contain the keys of the parent and sibling nodes. A link relation would be something like . . . Parent Node id, Sibling Node id, (together making the primary key), link attributes, etc . . . In the above example the tables would look like, . . . Nodes Links Stuff Stuff, Food Food Stuff, Non_Food Non_Food Food, Fruit Fruit Food, Veggies Veggies Fruit, Apples Apples Fruit, Oranges Oranges The query strategy is to start with a given Node id and search the Link table for all entires that contain that Node id as the parent. The parents sibling node ids can be used to search the Links table for their siblings. The search continues recursively until the desired number of levels has been reached. Standard SQL queries have no way of dealing with this recursive search. Some vendors provide work arounds with non standard features. Another solution is to have specific queries for a specific number of levels. There is one piece of semantics this relational approach doesn't capture and that is sub-entity synonyms. Because all entries in the hierarchy are nodes they appear in the node table as unique names, no duplicates are allowed. It is possible to have a hierarchy with two sub-entities with the same name that are different, my above solution doesn't support this. For example consider the hierarchy .... Stuff | ------------------- | | Food Non-Food | | ------- --------- | | | | Fruit Paint | | -------- ----------- | | | | | Orange Apple Blue Red Orange Orange is both a kind of fruit and a paint. To support this sort of arrangement you need to introduce an extra key field to identify node types. Let's call it the type id. The Node and Link relations now look like: Node Id, Type Id, (a compound primary key) ... Parent Node Id, Parent Type Id, Sibling Node Id, Sibling Type Id, (all 4 fields are the primary key) . . . I would also recommend adding an additional table of Type Ids. The type table would allow for referential integrity checks on the type id. This solution only works if the nodes are of different types. It wouldn't work if the hierarchy was a UNIX directory structure for example. Another problem with this approach and the relation model is the issue that a Node is a Node and only a Node. Therefore all nodes have an identical number and kinds of attributes. This problem can be dealt with in a couple of ways, including extra tables which contain specific attributes for each type. It is considered bad practice to need to add new tables because of new subtype instances. In general graph like structures are one of the toughest things to work with in the relational model. A network or an object oriented data base has a more natural affinity for solving this kind of problem. -Tim Child Autodesk Inc. (Exercising an individual first amendment right.)