NetNews Usenet Archive 1992 #26

home *** CD-ROM | disk | FTP | other *** search

/ NetNews Usenet Archive 1992 #26 / NN_1992_26.iso / spool / comp / theory / 2344 < prev next >

Wrap

Text File | 1992-11-06 | 3.2 KB | 68 lines

Newsgroups: comp.theory Path: sparky!uunet!ukma!netsys!decwrl!ads.com!marcel From: marcel@ADS.COM (Marcel Schoppers) Subject: Re: #nodes in avg.tree of given depth? Message-ID: <1992Nov5.224938.25597@ads.com> Keywords: binary tree size Sender: Marcel Schoppers Organization: Advanced Decision Systems, Mtn. View, CA (415)960-7300 References: <96222@netnews.upenn.edu> Date: Thu, 5 Nov 1992 22:49:38 GMT Lines: 55 >In article <96222@netnews.upenn.edu> rymon@linc.cis.upenn.edu (Ron Rymon) > writes: >I second the naturality of random (uniformly distributed) binary trees of >depth at most d. However, the analysis of the NUMBER of such trees suggest >a way to draw such trees. Hmmm, OK... Here's what I'm up to. I'm doing an analysis of the expected size of a certain kind of program that happens to be analyzable as a decision tree. Hence binary trees are a suitable abstraction of those programs. Read program size as number of nodes in the corresponding tree. We are told the number of world-states (or objects) that the decision tree must classify; call that number W. It can be shown that on average (uniform distribution over all possible partitions of W objects) the W states to be classified will fall into O(W/log W) classes, i.e. that's how many leaves the binary tree should have. Hence we immediately know how many internal nodes it has. Further, we also know (from Knuth) that the expected height/depth of the tree is D = O( sqrt(W / log W) ) However, the real classification programs I am dealing with turn out to be very much smaller, i.e. they are not average, and I've managed to prove mathematically that the cause of their small size is that the programs contain dynamically bindable variables. That results (at least for some domains I have analyzed) in trees that have O( (log W / loglog W)^P ) leaves, where P is something small like 2 or 3. So, now I have an average-case estimate (way too large) and some actual-case figures (much smaller) and the restriction causing the difference (the existence of variables in the programs being modelled), and now want to find a way to do an average-case analysis for the *restricted* set of programs. It turns out that the presence of variables in my programs has the following effect on tree structure: the depth of the trees stays about the same, but the trees become much thinner "horizontally" because the variables effectively allow to replace a list of identical subtrees by only one such subtree (OK, so the depth of the list decreases by the number of subtrees in the list minus 1). Hence, my first idea was to find out the number of nodes in the average tree of a fixed depth D. The answer, O(2^D), for D = O( sqrt(W / log W) ) as above, would give me N = O(2 ^ sqrt(W / log W)) nodes, which is way too big, and so reveals that indeed, the restricted set of thinned trees I'm concerned about is very far away from being a uniformly distributed set of trees with depth D. Obviously, I need to come up with a suitably restricted set of binary trees, and then figure out the expected number of nodes in that set as a function of: the known depth, the number of subtrees in each list, and the density (and placements?) of such lists. (Such lists can be nested, i.e. a subtree in a list can contain another list.) Marcel