- Independent Events
- Permutations
- Circular Permutations
- Combinations
- Pascal's Triangle
- Binomial / Multinomial Theorem
- Binomial Theorem and Differentiation
- Continuous Binomial Theorem
- Factors of p in n!
- Poker Hands
- The Birthday Paradox
- The Monty Hall Dilemma
- Inclusion Exclusion
- No-Match Permutations
- The Marriage Problem
- Catalan Numbers

There is a catch. The individual events must be independent. If someone decides to drop the coin flat on the table, heads up, whenever the die comes up 6, there are only 11 possible outcomes, not 12. This disclaimer is obvious, but in the real world, seemingly unrelated events are often connected in subtle ways.

The branch of mathematics known as combinatorics combines independent events and counts, or at least estimates, the number of outcomes. Probability brings in the concept of randomness, so that one of these outcomes can be chosen at random. Then we can compute the odds of a royal flush, 4 chances in 2598960. That will come later. For now, let's start counting events.

A permutation is a specific linear arrangement of items, such as books on a shelf.
The number of n element permutations is the product of the integers from 1 to n,
which is called n factorial, and is written n!.
Select the first (leftmost) book in n different ways,
then the next book in n-1 ways, and so on until there is but 1 way to choose the last book.
The number of outcomes is n × (n-1) × … × 2 × 1 = n!.
Note that the n events are numerically independent, not physically independent. After selecting David Copperfield, we cannot select it again. Yet after selecting any of the n books, there are n-1 books to choose from, then n-2 books, and so on. We only need numerical independence to multiply outcomes together. For convenience, define 0! = 1. |

When k out of n elements are indistinguishable, e.g. k copies of the same book, the number of different permutations is n!/k!. Temporarily distinguish the k elements, e.g. number the copies of David Copperfield, so that there are again n! permutations. The order of the k copies doesn't really matter, so k! permutations map onto 1. Thus we obtain n!/k!.

A similar factor must be included for each group of repeated elements. For example, The number of permutations of the letters "JJJKLMMN" is 8!/3!/2! = 3360.

In a truncated permutation, only the first k elements of an n-element permutation are significant. As before, there are n elements to choose from, then n-1, then n-2, down to n-k+1. This gives n!/(n-k)!.

There is no simple formula for a truncated permutation with repeated elements, though one can count them out systematically. Consider arranging 4 letters from ABCDEEE. If you select at most one E, that's a truncated permutation, 4 out of 5, giving 120. With three E's on the shelf, there is one other letter, A B C or D, and it is placed in one of 4 positions, hence 16. Finally 2 E's on the shelf calls for 2 other letters, which can be done in 6 ways. Then we are permuting 4 elements with 2 repeated E's, so 6 times 4!/2! = 72. The sum is 208, and there just isn't a nice formula for that.

A designated person, person number 1, sits somewhere; call this the head of the table. Let the permutation run clockwise from there. There are n-1 people to choose from, then n-2, then n-3, and so on down to the last. This is (n-1)!.

[ people around a table, numbered 1 at the top then 5 3 6 4 2 around, or some such ]

A box may have 13 different books in it, and you're selecting 4 to take to school. How many 4 book sets are possible?

Start by calling this a truncated permutation. You're going to put the four books on the shelf, left to right, and there are 13!/9! ways to do that. As you gather up the four books and put them in your backpack, the order doesn't matter, so the 4! arrangements of the 4 books all map onto the same set. The number of combinations is 13!/(9!×4!), or 715.

In general, a set of n elements produces n!/(k!×(n-k)!) k-element subsets. This is a common expression, called "n choose k", and written (n:k) for convenience. (Note, (n:k) is my own ascii notation; the standard notation is two dimensional, and shown on the right.) |

Although most applications partition a set into two subsets, e.g. the backpack and the box, the formula can be generalized to handle many subsets. Place four books in the backpack, three in a desk drawer, and two on the mantle, leaving 4 in the box, and the number of book distributions is 13!/(4!×3!×2!×4!) = 900900.

Leaving a combination's size unspecified seems, at first, to complicate the problem. Select an arbitrary number of books from a box containing 13 books, and stuff them into a backpack, and the number of possibilities is (13:0) + (13:1) + … + (13:13) = 8192. The size of the subset is considered first, then the possible combinations for each size. But there is an easier way. Each book either goes into the backpack or remains in the box. Two choices for each of 13 independent events yields 213 = 8192 possibilities. Of course you can have more than two containers. The 13 books can be placed into k different containers in k13 ways.

When the elements are indistinguishable (say 13 marbles), they can be placed in k containers in (13+k-1:13) ways. To see this, picture each marble distribution as a row of marbles with k-1 container dividers inserted into the row. The marbles between divider i-1 and i are located in the ith container. Set k = 2, for instance, and there are (14:13) = 14 possible outcomes. Any of 0 to 13 marbles can be placed in the first box, leaving the rest for the second box.

Start with (x+y)n-1, and expand by the binomial theorem, giving:

a0xn-1 + a1xn-2y + a2xn-3y2 + … + an-2xyn-2 + an-1yn-1

Where the coefficients ai come from row n-1 of Pascal's triangle. Then multiply by x+y. Each coefficient in the product, other than xn and yn, is the sum of two coefficients from the previous product. For example, x3y6 in (x+y)9 comes from x2y6×x and x3y5×y. Hence we add the coefficients on x2y6 and x3y5 from (x+y)8 to get the coefficient on x3y6 in (x+y)9. The coefficients on the terms xn and yn are always 1. This mirrors the construction of Pascal's triangle perfectly.

At the same time, there is another formula for the jth coefficient in the product.
By the distributive law, there are 2n terms,
selecting x or y from each factor x+y.
The number of terms with j instances of x and n-j instances of y
is the number of ways we can choose j factors out of n (these j factors presenting x rather than y),
which is n choose j, or (n:j),
or n!/(j!×(n-j)!).
This means the jth number
in the nth row of Pascal's triangle is also (n:j).
Since these numbers are the coefficients on the nth power of a binomial,
they are often called binomial coefficients.
Set x=y=1 and obtain the immediate corollary: 2n is the sum of (n:i) as i runs from 0 to n. The sum of all the binomial coefficients from 0 to n, or the sum of the entries in the nth row of Pascal's triangle, is 2n. When x and y are 1 and -1, the alternating sum of binomial coefficients is 0, except for the top row. |

Remember that the binomial theorem fails if multiplication does not commute. In the quaternions, (i+j)2 is not i2+2ij+j2. It is in fact i2+ij+ji+j2, which equals -2.

The binomial theorem generalizes to the multinomial theorem when the original expression has more than two variables, although there isn't a triangle of numbers to help us picture it. When x+y+z is raised to the n, there are n!/(i!×j!×k!) ways to select i instances of x, j instances of y, and k instances of z. Hence this combination becomes the coefficient on xiyjzk.

Let's look at (x+y+z)7. The coefficient on xy2z4 is the number of ways you can arrange xyyzzzz. That's 7! divided by (1!×2!×4!), or 105.

Remember that (1+x)n is the sum of (n:k) × xk, by the binomial theorem. Differentiate this with respect to x, then substitute x = 1. The right side is the sum of k×(n:k), which is exactly what we want. The left side becomes n×2n-1.

If you are interested in the sum of k2×(n:k), take the second derivative and replace x with 1. This isn't quite right, because you get the sum of k×(k-1)×(n:k), but you can always ad n×2n-1 back in. This cancels the -k in k×(k-1). The answer is (n2+n)×2n-2. Note that similar formulas can be derived for the sum of k3×(n:k), and so on.

Just to give you a hint of where we are headed, we want a binomial expansion, similar to the binomial theorem, for the square root of x+y, that is, (x+y)½.

Let c be a constant, and let f be the differentiable function (c+x)t, where t is a positive real number. What is the Taylor series for f?

Evaluate f at 0 and get ct.

The first derivative is t×(c+x)t-1, hence f′(0) = tct-1.

The second derivative evaluated at 0 is t(t-1)ct-2, and so on.

Divide these derivatives by k! to get the taylor coefficients. This gives a factor of the form t×(t-1)×(t-2)×…(t-k+1) over k!. Use the binomial coefficient (t:k) to represent this expression. When t is a positive integer n, the result is the traditional binomial coefficient (n:k). We are simply generalizing the notation to any real number t.

Now we are ready to write the power series for f.

f = ∑{k=0,∞} (t:k)ct-kxk

When t is a positive integer, such as n, the nth derivative of (c+x)n is a constant for all x, and all higher derivatives are 0. The power series truncates to a finite polynomial, as one would expect when f itself is a polynomial. In fact, the power series reproduces the traditional binomial theorem.

To evaluate convergence, write f as exp(t×log(c+x)). This is analytic at 0, hence f agrees with its power series on a circle of convergence. Since log(c+x) converges out to |x| = c, and exp() converges everywhere, f converges at least as far as c. If x is even slightly larger than c, say c+ε, move far out in the power series and consider the ratio of adjacent terms, (t-k)/(k+1) times (c+ε)/c. The first fraction approaches 1 for large k, and the second fraction is a fixed number greater than 1. By the ratio test, this series does not converge, hence the radius of convergence is c.

Note, When t = n, f is a polynomial that converges across the entire plane.

Since f is analytic it extends uniquely to complex numbers, i.e. (c+z)t. Even the exponent t could be complex as well. This is the continuous binomial theorem in its full generality.

Apply this to sqrt(c+x). Rewrite this as (c+x)½, then build the following power series.

f = ∑(k=0,∞) (½:k)c½-kxk

Is there a convenient formula for ½ choose k?

Set the denominator of k! aside for the moment, and consider the product of 1/2 times -1/2 times -3/2 times -5/2 etc. Multiply by 2k and find 1×-1×-3×-5… out to -2k+3.

Setting the signs aside for the moment, the product becomes (2m)! over (m!×2m), where m = k-1.

Divide by 2k (since we multiplied by 2k earlier), and obtain the following.

(2m)! over m!×2m×2k

2×(2m)! over m!×4k

2k×(2m)! over k!×4k

2k×(2k-1)×2m)!/(2k-1) over k!×4k

(2k)! over (2k-1)×k!×4k

Bring the denominator of k! back in, and get this.

(½:k) = (2k)! over (2k-1)×k!×k!×4k

(½:k) = (2k:k) over (2k-1)×4k

This isn't quite right when k = 0, as it produces -1 instead of 1. But remember, we set the signs aside. Let's bring them back in now. When k is 1 the sign is positive, and it alternates from there, so multiply by (-1)k+1, and it all works out. Verify the above formula, with alternating sign, for k = 0 1 2 3 and 4.

With this in hand, the power series for sqrt(c+x) is as follows.

sqrt(c+x) = ∑(k=0,∞) (-1)k+1 × (2k:k) / ((2k-1)×4k) × c½-kxk

If you like, pull sqrt(c) out of the sum, and replace c-kxk with (x/c)k.

sqrt(c+x) = -sqrt(c) × ∑(k=0,∞) (2k:k) / ((2k-1)×4k) × (-x/c)k

There are n/p integers, up to n, that are divisible by p. Here n/p is an integer operation, n divided by p with the remainder thrown away. To this we must add n/p2, since this many integers are divisible by p2. Continue all the way up to pt. However, there is another way to write this, more compact than a sum of quotients.

Write n in base p. In other words:

n = a0 + a1p + a2p2 … atpt.

The kth quotient from the previous sum is n/pk, with remainder discarded, which can be written in base p as follows.

ak + ak+1p + ak+2p2 … atpt-k.

Add these rewritten quotients together, and regroup terms according to the coefficients, giving:

a1 + a2(1+p) + a3(1+p+p2) … at (1+p+…+pt-1).

Multiply this by p-1, giving:

a1(p-1) + a2(p2-1) + a3(p3-1) + … + at(pt-1).

Throw in a0-a0. Group a0 with the sum of akpk to rebuild n. Group -a0 with the sum of -ak to build a number that we will call digits(n), the sum of the digits of n in base p. Therefore the number of factors of p in n! is (n-digits(n)) ÷ (p-1).

Note that p-1 goes in evenly; there is no need to round or truncate.

For instance, there are (17-(2+2+1))/(3-1) = 6 powers of 3 in 17!. These come in via 3, 6, 9, 12, and 15.

There are 15 factors of 2 in 17!

To find the powers of p in the binomial coefficient (n:k), the integers n, k, and n-k cancel, leaving (digits(k) + digits(n-k) - digits(n)) / (p-1).

If n is a power of p, digits(n) is 1. Digits(k) + digits(n-k) will contribute more than 1, unless k is 0 or n. Other than the boundaries, (n:k) contains factors of p.

When n = p, each binomial coefficient has precisely one factor of p, (digits(k) + digits(p-k) - 1) / (p-1), except for k = 0 or p. Of course the special case of n = p has a simpler proof, insofar as p! has exactly one factor of p, and k! and (p-k)! have none, unless k = 0 or p.

Binary coefficients for 7 choose k: 1 7 21 35 35 21 7 1

By the binomial theorem , (x+y)p has all its terms divisible by p, except for xp and yp. If x is not divisible by p, and y has k factors of p, then (x+y)p begins with xp, which is not divisible by p, then pxp-1y, which has k+1 factors of p, then a lot of terms with at least k+2 factors of p. Thus the minimum power of p, ignoring xp, has increased by one. This is called ratcheting the exponent.

Note that this breaks down when y has one factor of 2, as in (x+2w)2 = x2+4xw+4w2. The binomial coefficients stop before the higher powers of 2 kick in, and the second and third terms combine to give a multiple of 8, rather than a multiple of 4. Ratcheting works properly when p is odd, or when the second term has more than one factor of 2.

There are 52 choose 5 or 2598960 possible hands. With this as a common denominator, we only need find the number of hands in each category.

A straight flush consists of 5 cards in sequence, all of the same suit. There are 4 suits, and the sequence can start with anything from ace to 10 inclusive. That's 4×10 = 40 possibilities.

A straight is merely an ascending sequence, with the suit unconstrained. There are 10×45 = 10240 straights, but we should really subtract the straight flushes, leaving 10200 proper straights.

A flush consists of 5 cards of the same suit. There are 4 suits, and 13 choose 5 combinations in each. Multiply this out and subtract 40, giving 5108 flushes.

If you want four of a kind, pick a rank with all four suits represented, then any other card. That's 13×48 = 624.

A full house requires 3 of one rank and 2 of another. That's 13×4 times 12×6 = 3744.

Three of a kind selects a rank, and 3 from that rank, which is 13×4. Then select any two other cards, 48 choose 2, then take away the full houses. The result is 54912.

For two pair, select two ranks, 13 choose 2, and 2 cards for each, 6×6. Then bring in one of the remaining 44 cards, giving 123552.

Finally we have a pair. Choose a rank and two cards from that rank, 13×6. Then choose 3 ranks from the remaining 12, and any of 4 cards for each. This gives 1098240.

Here is a summary of the hands, starting with the best. Remember the common denominator of 2.6 million.

Straight flush = 40 = 0.0015%

Four of a kind = 624 = 0.024%

Full house = 3744 = 0.14%

Flush = 5108 = 0.19%

Straight = 10200 = 0.39%

Three of a kind = 54912 = 2.1%

Two pair = 123552 = 4.7%

Pair = 1098240 = 42%

The math is easy if you turn the problem around. What are the odds that every birthday is unique? The first one is free. The second person has a new birthday with probability 364/365. (I'm ignoring leap year.) The third person has a new birthday with probability 363/365, then 362/365, and so on. By the time you get to 343/365, the ratio is 49.27%. So you have a 50 50 shot at finding a duplicate birthday.

Assume there are n choices, and you are looking for duplicates. The odds of having no duplicates after k selections is the product of 1 - i/n, as i runs from 0 to k-1. If r and s are quite small, then 1-r times 1-s is approximately 1-(r+s). Thus 1-0.001 times 1-0.002 is about 1-0.003. Apply this shortcut here. Since i is small relative to n, the product is approximately 1 minus ∑ i/n. This in turn is 1 minus (the sum of the first k integers) divided by n. The sum of 1 through k is (k2+k)/2. This is well known, but will be proved and generalized in chapter 6. Set k to the square root of n, and the sum of the first k integers is half of n. Divide this by n to get ½, whence 1 - ½ = ½. Therefore you have an even chance of finding a duplicate amongst k selections, where k is the square root of n.

This is approximate of course. The true answer is slightly larger, but not much larger. If the pool is 10,000, you must select 119 to expect a duplicate; that's just a bit more than 100. Here is a program that does the math.

This principle has practical applications, such as Pollard rho factorization, described in the next chapter.

The answer is clear if there are 500 doors. You pick one, Monty shows you 498 fabulous prizes, and there is one last closed door to consider. You certainly didn't select the one "bad" door at the outset. If you switch now, you're sure to bring home a goat. Stick with your original choice!

The math is not as extreme with three doors, but the answer is the same. Let's say you switch. You selected a good door at the start with probability 2/3, and now you blew it! Only 1/3 of the time, when you selected the bad door, do you become a winner. Now - let's say you stand pat. You're a winner 2/3 of the time, and a loser 1/3 of the time.

The situation is reversed if there is only one fabulous prize. After Monty reveals a worthless pile of empty boxes, you should switch. Again, this is clear with 500 doors, but still true with 3.

First consider each set independently, and compute the total number of elements therefrom. This is an upper bound, since elements appearing in multiple sets will be counted more than once. Next subtract the elements appearing in pairwise intersections. Then add the elements appearing in three way set intersections. Continue alternating until the set intersection criterion reaches n.

To prove that every element is counted exactly once, consider an arbitrary element x that belongs to exactly r sets. To start the algorithm, add all the elements in all the sets, disregarding overlap. Thus x is counted r times. Next, x appears in (r:2) (r choose 2) pairs of sets, so subtract (r:2) from r. Next, x appears in (r:3) (r choose 3) set triples, So add (r:3) to the total. Continue alternating until (r:r) is reached. Of course r choose r, the last term, is 1. Our element x is only counted once at step r, since there is precisely one collection of r sets that contains x. The element cannot be a member of more than r sets by definition.

Add -1 to the beginning of this series, -1+r-(r:2)+(r:3)-(r:4)+…, and apply the binomial theorem in reverse to get -(1-1)r. This is zero, so the original series added up to 1, and x was counted exactly once.

Inclusion exclusion also works for non-discrete applications, such as determining the area consumed by many intersecting regions. Partition the regions into squares of size ε, and invoke the theorem. Then take the limit as ε approaches 0. In this illustration, the numbers indicate the overlap count. |

T is covered by the set of permutations that fix 1, the set of permutations that fix 2, the set of permutations that fix 3, and so on up to the permutations that fix n. Add up the sizes of these n sets, then subtract the pairwise intersections, add the three way intersections, and so on. An example of a 3 way intersection fixes 123, and leaves the rest unconstrained. This is a set of size (n-3)!. And there are (n:3) such sets. Multiply these together to get n!/3!, or in general, n!/i!. T is the alternating sum of n!/i!, starting with i = 1. Subtract this from U and find the alternating sum of n!/i!, starting with 0. Factoring out n! leaves the beginning of the taylor expansion for E-1. Thus the number of no-match permutations approaches n!/E for large n. When n is 10 or greater, the error is less than 1 part per million.

Sometimes the alternating restriction is dropped in order to simplify the problem. In this case we can compute the number of permutations using inclusion exclusion. U is all the circular permutations of 2n people, or (2n-1)!. T is the set with at least one couple together.

To seat at least i couples together, select i couples out of n, order them male-female or female-male, and place them at the table along with the other 2n-2i people. This yields a formula:

(n:i) × 2i × (2n-i-1)!

Take the alternating sum of this expression, as i runs from 1 to n, to find T. Subtract this from U to get the no-couple permutations.

When n = 3, U = 120, T = 144 - 72 + 16 = 88, and U-T = 32. Do this manually and find 18 with BBBGGG (B is boy and G is girl), 6 with BBGBGG, 6 with BBGGBG (by symmetry), and 2 with BGBGBG.

1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796, 58786, 208012, …

This is the number of strings of balanced pairs of parentheses. When n = 3 there are 5 possibilities: ((())), (()()), (())(), ()(()), and ()()(). I offer this without proof for now, but the proof will come later, using generating functions and the continuous binomial theorem.

Catalan numbers appear again and again. Here are a few examples.

A binary tree has at most two edges descending from each node, designated left and right. Even a single edge must be designated left or right. For every node, including the root, print a left parenthesis, travel down the left edge and traverse that tree, print a right parenthesis, and travel down the right edge and traverse that tree. The tree can be reconstructed from the string of parentheses, so the correspondence is 1 for 1. Thus cat(n) is the number of binary trees with n nodes.

[ binary tree, root at the top, two branches left and right, a few more branches going down from these. ]

An ordered tree can have many edges descending from each node, but they are numbered left to right. Catalan numbers count the ordered trees with n edges. Traverse the tree recursively, traveling through each subtree in turn, and visiting the branches in order from left to right. Print a left parenthesis as you move down an edge, and print a right parenthesis when you come back up an edge. As a result, the string associated with a subtree will be enclosed in the set of parentheses on the edge just above that subtree. The edges incident to the root correspond to the outermost parentheses in the string. Conversely, the outermost parentheses establish the edges, in order, below the root. Look at the substring between the first set of parentheses. It defines the subtree below the first edge of the root. Do this for the entire string and get the original tree back again.

Consider a polygon that is fixed in the plane. How many ways can you triangulate this polygon?

Let n be 2 less than the number of sides, so that a triangle has n = 1. Of course a triangle is already triangulated, so this can be done in one way.

A square has n = 2, and there are 2 ways to triangulate; draw the diagonal or the other diagonal.

A pentagon has n = 3, and there are 5 ways to triangulate; draw two lines from any of 5 vertices to the two points on the other side. So far the number of triangulations equals cat(n).

Build a correspondence between triangulations and strings of parentheses, starting with a triangle, which corresponds to (). For a larger polygon, number the points clockwise, with 1 at the top. The point just to the left of 1 is n+2. If no edges are incident to 1, then a segment joins n+2 and 2. Print a left parenthesis, convert the smaller polygon, with 2 as its apex, into parentheses, then print a right parenthesis.

[ Octagon with vertex at the top numbered 1, then 2 through 8 around clockwise, edge from 8 to 2, edges from 2 to 6 and 7, edges from 6 to 3 and 4 ]

Assume j is the first vertex that is connected to 1. If j = 3 then we have our triangle. Print a set of parentheses, then encode the rest of the polygon with 1 as the apex.

If j > 3, j and 2 must be connected. Print a left parenthesis, encode the polygon with 2 as apex, print a right parenthesis, and encode the polygon with 1 as apex.

This process can be reversed, giving the original triangulation. Thus cat(n) counts the triangulations of a labeled polygon with n+2 edges.