home *** CD-ROM | disk | FTP | other *** search
- Tech Chapter 2 - Binary tree pseudocode MetalBase 5.1
- -------------------------------------------------------------------------------
-
- MetalBase 3.2 used straight, unbalanced binary-trees to keep track of data.
- 4.0 to 5.0 migrated to weight-limited AVL-balancing; the routines for
- keeping a tree in balance, and rebalancing it when needed, are below. 5.1
- uses a less restrictive version of 5.0's algorithm; the compile-time
- definition MAX_FRAG allows for a choice of a tradeoff between speed of
- updates and queries.
-
- I developed these algorithms by hand, just from common sense; if there's a
- flaw (other than rebalancing speed), it hasn't shown up after hundreds of
- thousands of uses, so I'm happy. I am, however, interested in a faster
- rebalancing algorithm (as long as it can be used with existing relations)...
- please lemme know if something strikes you.
-
- The balancing algorithm used for 5.0 (and for 5.1 if MAX_FRAG==1) requires
- that, at any node in a tree, the number of records to the left of a node
- is almost exactly equal to the number on the right; to make it work, they
- can be off by 1 (MAX_FRAG). So the following are balanced:
-
- ___5___ ___4___
- _2_ _8_ _2_ _6_
- 1 4 7 1 3 5 7
-
- And the following aren't (the first is too heavy to the left at 5, the second
- is too heavy to the right at 4):
-
- ___5___ ___4___
- _2_ 8 6_
- 1 4 7
-
- This kind of balancing ensures, in effect, that only the last two levels of
- the tree can ever have any empty nodes... so a tree will always stay as
- shallow as possible, providing the following speeds for algorithms:
-
- Worst-Case Average-Case Best-Case
- ------------ ------------ ---------
- SEARCH...........O(ln(n))........O(ln (ln(n)))...O(1)
- ADD..............O(n * ln (n))...O(n * ln (n))...O(ln (n))
- UPDATE...........O(n * ln (n))...O(n * ln (n))...O(ln (n))
- DELETE...........O(n * ln (n))...O(n * ln (n))...O(ln (n))
-
- MetalBase 5.1 is shipped with MAX_FRAG set to five; this provides the following
- timings:
-
- Worst-Case Average-Case Best-Case
- ------------ ------------ ---------
- SEARCH...........O(ln(n))........O(ln (n)).......O(1)
- ADD..............O(n * ln (n))...O(n)............O(ln (n))
- UPDATE...........O(n * ln (n))...O(n)............O(ln (n))
- DELETE...........O(n * ln (n))...O(n)............O(ln (n))
-
- I've never really liked O() notation; in the real world, the constants that
- drop out can make a big difference. In essence, searching is almost
- instantaneous for thousands of records; with MAX_FRAG at one, adds and
- updates are a painfully slow exponentially-increasing curve... with
- MAX_FRAG at five, average-case adds and updates are a fast-constant
- linear slope.
-
-
- PSEUDOCODE --------------------------------------------------------------------
-
- (*) indicates a procedure must be called once for each index
-
- -------------------------------------------------------------------------------
-
- mb_upd (rcd) :
- * _delete (rcd, <index>)
- change records instance on disk
- * _link (rcd, <index>)
-
- mb_del (rcd) :
- * _delete (rcd, <index>)
- _remove (rcd)
-
- mb_add (data) :
- rcd = _append (data)
- * _link (rcd, <index>)
-
- _link (rcd) :
- *| _drop (rcd, <index>) (these are called, one after the other,
- *| _check (rcd, top, <index>) for each index in the relation)
-
- _drop (pos, index) :
- for (loc = top; ; )
- { dir = _compare (rec (loc), rec(pos), index) -- (-1,0,1)
- loc->balance += dir;
- if ( loc->child[dir] == 0 )
- { loc->child[dir] = rec
- rec->parent = loc
- break
- }
- }
-
- _check (st, ed, index) :
- for (loc = st; ; loc=loc->parent)
- { if ( rec (loc) ->unbalanced(index) )
- _balance (loc, index)
- if (loc == ed) break;
- };
-
- _dislink (loc, index) :
- ch=loc->left || loc->right
- if (loc->pardir) loc->parent->right = ch;
- else loc->parent->left = ch;
- if (ch) ch->parent = loc->parent, ch->pardir = loc->pardir;
- /* DO NOT re-read loc */
- for (dir=loc->pardir,tmp=loc->parent;tmp!=0;dir=tmp->pardir,tmp=tmp->parent)
- tmp->balance -= (dir == 1) ? 1 : -1;
-
- _balance (loc, index) :
- if (! rep = find_seq (loc, rec(loc)->balance) )->ERROR!!! -- bal>0?Next:Prev
- rp=rep->parent
- _dislink (rep, index)
- _replace (loc, rep, index) -- Replace LOC with REP
- _drop (loc, index)
- if (rp != loc) _check (rp, rep)
- /* re-read loc */
- _check (loc->parent, rep)
-
- _delete (bad) :
- bp = bad->parent
- if (bad->balance != 0)
- rep = find_seq (bad, bad->balance) --bal>0?Next:Prev
- else
- if (! rep = find_seq (bad, toggle (lastmove)))
- rep = find_seq (bad, toggle (lastmove))
- if (! rep)
- _dislink (bad)
- else
- { rp= rep->parent;
- _dislink (rep)
- _replace (bad, rep)
- if (rp != bad) _check (rp, bp)
- }
- _check (bp, top)
-
- _replace (old, new) :
- new->left_c = old->left_c;
- new->right_ = old->right_;
- new->parent = old->parent; par=new->parent;
- new->pardir = old->pardir;
- par->[pardir]_c = new;
- new->left_c->parent = new;
- new->right_c->parent = new;
-
-