- Newsgroups: comp.theory
- Path: sparky!uunet!ukma!netsys!decwrl!ads.com!marcel
- From: marcel@ADS.COM (Marcel Schoppers)
- Subject: Re: #nodes in avg.tree of given depth?
- Message-ID: <1992Nov5.224938.25597@ads.com>
- Keywords: binary tree size
- Sender: Marcel Schoppers
- Organization: Advanced Decision Systems, Mtn. View, CA (415)960-7300
- References: <96222@netnews.upenn.edu>
- Date: Thu, 5 Nov 1992 22:49:38 GMT
- Lines: 55
-
- >In article <96222@netnews.upenn.edu> rymon@linc.cis.upenn.edu (Ron Rymon) writes:
- >I second the naturality of random (uniformly distributed) binary trees of
- >depth at most d. However, the analysis of the NUMBER of such trees suggests
- >a way to draw such trees.
-
- Hmmm, OK... Here's what I'm up to. I'm doing an analysis of the expected
- size of a certain kind of program that happens to be analyzable as a
- decision tree. Hence binary trees are a suitable abstraction of those
- programs. Read program size as number of nodes in the corresponding tree.
- We are told the number of world-states (or objects) that the decision tree
- must classify; call that number W.
-
- It can be shown that, on average (uniform distribution over all possible
- partitions of W objects), the W states to be classified fall into
- O(W/log W) classes, i.e. that's how many leaves the binary tree should have.
- Since a binary tree whose internal nodes all have two children has exactly
- one fewer internal node than it has leaves, we immediately know how many
- internal nodes it has too. Further, we also know (from Knuth) that the
- expected height/depth of the tree is
- D = O( sqrt(W / log W) )
-
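- To get a feel for these magnitudes, here is a quick numeric sketch (my own
- illustration; the O() bounds above hide constants, which I simply take to
- be 1, so the absolute numbers are only indicative):

```python
import math

def expected_leaves(W):
    # Expected number of classes = leaves, O(W / log W); constant taken as 1.
    return W / math.log(W)

def expected_depth(W):
    # Knuth's expected height, D = O(sqrt(W / log W)); constant taken as 1.
    return math.sqrt(W / math.log(W))

for W in (10**3, 10**4, 10**5):
    L = expected_leaves(W)
    # A binary tree whose internal nodes all have two children has
    # exactly L - 1 internal nodes for L leaves.
    print(W, round(L), round(L) - 1, round(expected_depth(W), 1))
```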
- However, the real classification programs I am dealing with turn out to be
- very much smaller, i.e. they are not average, and I've managed to prove
- mathematically that the cause of their small size is that the programs contain
- dynamically bindable variables. That results (at least for some domains I
- have analyzed) in trees that have
- O( (log W / loglog W)^P )
- leaves, where P is something small like 2 or 3.
-
- So, now I have an average-case estimate (way too large), some actual-case
- figures (much smaller), and the restriction causing the difference (the
- existence of variables in the programs being modelled); I now want to find
- a way to do an average-case analysis for the *restricted* set of programs.
-
- It turns out that the presence of variables in my programs has the following
- effect on tree structure: the depth of the trees stays about the same, but
- the trees become much thinner "horizontally", because the variables
- effectively allow us to replace a list of identical subtrees by a single
- such subtree (OK, so the length of the list decreases by the number of
- subtrees in the list minus 1).
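- The thinning effect can be mimicked by subtree sharing ("hash-consing").
- This little sketch is my own illustration, not my actual programs: trees
- are nested tuples, and identical subtrees are stored only once, so depth
- is preserved while the stored node count collapses.

```python
def count_nodes(tree):
    """Total nodes if every subtree is materialized separately."""
    if tree is None:
        return 0
    label, left, right = tree
    return 1 + count_nodes(left) + count_nodes(right)

def count_shared(tree, seen=None):
    """Nodes actually stored when identical subtrees are shared."""
    if seen is None:
        seen = set()
    if tree is None or tree in seen:
        return 0
    seen.add(tree)
    label, left, right = tree
    return 1 + count_shared(left, seen) + count_shared(right, seen)

def perfect(d):
    """Perfect tree of depth d whose two subtrees are identical."""
    if d == 0:
        return None
    sub = perfect(d - 1)
    return ("test", sub, sub)

t = perfect(10)
# Materialized: 2**10 - 1 = 1023 nodes; shared: one node per depth level.
print(count_nodes(t), count_shared(t))
```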
-
- Hence, my first idea was to find out the number of nodes in the average tree
- of a fixed depth D. The answer, O(2^D), for D = O( sqrt(W / log W) ) as above,
- would give me N = O(2 ^ sqrt(W / log W)) nodes, which is way too big, and so
- reveals that indeed, the restricted set of thinned trees I'm concerned about
- is very far away from being a uniformly distributed set of trees with depth D.
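- For concreteness, here is how badly that estimate blows up (my own
- back-of-the-envelope numbers, constants inside the O() ignored):

```python
import math

def naive_avg_nodes(W):
    # N = O(2**D) with D = sqrt(W / log W); hidden constants taken as 1.
    D = math.sqrt(W / math.log(W))
    return 2.0 ** D

# Already at W = 10**5 this gives roughly 10**28 nodes -- astronomically
# larger than the O((log W / loglog W)**P) leaf counts seen in practice.
for W in (10**3, 10**4, 10**5):
    print(W, naive_avg_nodes(W))
```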
-
- Obviously, I need to come up with a suitably restricted set of binary trees,
- and then figure out the expected number of nodes in that set as a function of:
- the known depth, the number of subtrees in each list, and the density (and
- placements?) of such lists. (Such lists can be nested, i.e. a subtree in a
- list can contain another list.)
-
- Marcel