Generating Functions, Counting Trees

Ordered Rooted Trees

A rooted tree is a tree with one vertex designated as the root. If you've been to agricultural school, you'll place the root at the bottom of the page and draw the tree going up. But if you've studied math or computer science, the root is at the top, and the tree grows down from there.

A rooted tree is ordered if the edges below each node are ordered in some way, perhaps from left to right on the page. Assume the root has two edges, numbered 1 and 2. Let the subtree below the first edge have six nodes, while the subtree below the second edge has 8 nodes. Swap the two subtrees, and the result is a new ordered tree, because the first edge now has 8 nodes below it, and the second edge has six. However, when trees are unordered, swapping the two subtrees produces the same tree. It is still a root with the same two subtrees hanging off of it.

For now, we're going to look at ordered trees, because that's easier.

Here is a procedure for turning an ordered tree into a string of nested parentheses. Traverse the tree recursively, traveling through each subtree in turn, and visiting the branches in order from left to right. Print a left parenthesis as you move down an edge, and print a right parenthesis when you come back up an edge. As a result, the string associated with a subtree will be enclosed in the set of parentheses associated with the edge just above that subtree. This is true at every level, including the top. The edges incident to the root correspond to the outermost parentheses in the string. These are not enclosed in any other parentheses; they are at the top level. Conversely, the outermost parentheses establish the edges, in order, below the root. Look at the substring between the first set of parentheses. It defines the subtree below the first edge of the root. Do this for the entire string and you have reversed the process. We can encode each tree using parentheses, and then decode the parentheses to recover the original tree. The map is a bijection.

We know how to count strings of nested parentheses, hence the number of ordered rooted trees with n edges is the n^th catalan number, written cat(n), and having the formula (2n:n)/(n+1).

Binary Trees

Binary trees can also be placed in 1-1 correspondence with strings of parentheses. Traverse the tree in order, visiting the left edge and then the right. At each node, including the root, print a left parenthesis, follow the left edge and traverse that subtree, print a right parenthesis, and follow the right edge. Note that even a single edge must be designated left or right. Verify that this map is a bijection as well, hence the number of binary trees is cat(n).

Wait a minute! Surely there are more ordered trees than binary trees. How can they both be cat(n)?

When counting ordered trees, n was the number of edges; with binary trees n is the number of nodes. Also, the binary trees overcount the ordered trees of degree 2. There is one ordered tree with two vertices, the root and its child; but there are two such binary trees, since the edge must be designated left or right. In any case, the number of ordered rooted trees on n nodes is cat(n-1), and the number of binary trees, where each edge is labeled left or right, is cat(n).

Rooted Trees

How many (unordered) rooted trees are possible on n nodes?

Let r(x) be the generating function that answers this question. Set r(0) = 0; there are no trees with zero nodes.

Consider a rooted tree on n nodes. Below the root, there are some edges, we don't know how many, and each has a rooted tree below it. Thus the rooted tree on n nodes is synonymous with the rooted forest, on n-1 nodes, that lies below.

Apply polya's enumeration theorem. Let d(x) = r(x) + r(x²)/2 + r(x³)/3 + r(x⁴)/4 etc, and take the exponential of this function. The result is the number of rooted forests. Connect the trees in this forest back to the root to give the corresponding rooted tree. Here is the equation that relates r to a shifted version of itself.

r(x) = xE^d(x)

This doesn't tell us what r(x) is, nor its taylor coefficients, but it's the best we can do for now.

Edge Rooted Trees

Let an edge rooted tree be a tree with a designated edge. To count edge rooted trees on n nodes, place a rooted tree at one end and a rooted tree at the other, which is done in r² ways. But swapping the two rooted trees, one for the other, is the same edge rooted tree, so we're double counting. Divide by 2. Finally, the trees that are perfectly symmetric on this edge were not double counted, and we cut those in half. Add them back in again, giving this formula.

e(x) = ½(r(x)² + r(x²))

Counting Trees

Given r(x), and e(x), can we find a generating function t(x) that counts the distinct trees on n nodes?

Establish an order on rooted trees, starting with the number of nodes, and then using any criteria you like after that. Now, given an edge rooted tree, consider the two rooted subtrees incident to our designated edge. If one is lighter, let its root be the root for the entire tree. If both subtrees are the same, select either root to act as root for the entire tree. It doesn't matter which point you select, since the resulting rooted tree comes out the same. Thus we have a map from edge rooted trees into rooted trees.

Suppose two edges of a given tree select the same vertex v. Draw two arrows into v, along the two incident edges; the arrows pointing to the lighter half of the tree. Back up through one of the arrows and find at least half the nodes. There is no way the other arrow can point to v, since that part of the tree includes v, and at least half the nodes, and is the heavier portion.

Next assume different edges point to different vertices, housed in the same tree, and when we mark these vertices as roots, the two rooted trees are isomorphic. The heaviest branches of the rooted trees are isomorphic, so the two subtrees behind our designated edges are isomorphic. The other branches of the two rooted trees correspond, and that means the subtrees ahead of our designated edges are isomorphic. Put this all together and the two edge rooted trees are isomorphic as well. They are really the same edge rooted tree after all. For a given tree, the map from edge rooted into rooted is 1-1.

Suppose two trees, not isomorphic, lead to the same rooted tree. Take the label off the root to resurrect the original unrooted tree. Thus distinct trees always lead to distinct rooted trees.

Given a rooted tree, look for an incident edge that leads to a heavier branch. In other words, look for an arrow pointing up to the root, towards the lighter side. If there is one, that is the edge rooted tree that leads to this rooted tree. But if all edges point down to lighter branches, then this root is the "center" of the tree, and this rooted tree does not come from an edge rooted tree.

If there are two centers c and d, draw a unique path between them. The heavier portion of the tree lies behind c, as we move towards d, and it also lies behind d as we move towards c. We can't have more than half the tree behind two distinct vertices, so if there is a center, it is unique.

Start at a leaf and move in the opposite direction of the arrow, into the tree. At the next node, look for an arrow that points towards the node and move up the arrow, going further into the tree. Continue this process, always moving against the arrows. If you run into a node you saw before, you have made a cycle, which is impossible. The process must stop sometime, hence there are no arrows pointing into your node. This is the center.

Edge rooted trees map 1-1 into rooted trees. They almost map onto rooted trees, except every tree has a rooted tree that is omitted - when that tree is rooted at the center. The number of trees is almost r(x)-e(x). We still have a problem with the trees that are symmetric about an edge. The rooted tree that is missing is actually covered by the root at the other end of the designated edge. So the edge symmetric trees have to be put back in.

t(x) = r(x) - e(x) + r(x²)

t(x) = r(x) - ½(r(x)²+r(x²)) + r(x²)

t(x) = r(x) - ½r(x)² + ½r(x²)

This will serve us well, as soon as someone comes up with a nice formula for r(x).