As you recall, the geometric distribution measures the number of trials needed to reach the first success: how many times must you roll a die before a 3 comes up, and so on. If each trial succeeds with probability p and fails with probability q = 1-p, the geometric distribution says your first success occurs on the kth trial with probability $pq^{k-1}$. If a trial is run every minute, the experiment stops at k minutes with probability $pq^{k-1}$.
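As a quick check of the formula, here is a minimal Python sketch of the geometric probabilities for the die example. The function name geometric_pmf is chosen for illustration only.

```python
def geometric_pmf(p, k):
    """Probability that the first success occurs on trial k (k = 1, 2, ...)."""
    q = 1.0 - p
    return p * q ** (k - 1)


if __name__ == "__main__":
    p = 1.0 / 6.0  # chance of rolling a 3 with a fair die
    for k in (1, 2, 3, 10):
        print(k, geometric_pmf(p, k))
    # The probabilities over all k should add up to (nearly) 1.
    print(sum(geometric_pmf(p, k) for k in range(1, 500)))
```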
We want the experiment to run continuously, like an unstable atom ready to decay. We don't have to wait for a minute boundary; it could decay at any time. Interpolate the geometric distribution, with finer and finer resolution, to obtain the poisson distribution.
Instead of running a trial on the minute, run a trial every 30 seconds. However, the mean should be the same. On average you have to wait just as long for the experiment to end. Cut p in half, thus doubling 1/p, the average number of 30-second trials. Since trials are run twice as often, you'll be waiting just as long as you did before.
But why stop at two trials per minute? Pack n trials into each minute, and divide p by n. Once again the mean is preserved, and you'll be waiting just as long, on average. What does this high resolution distribution look like?
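The claim that the mean is preserved can be checked numerically. The sketch below sums the geometric series directly, with n trials per minute and success probability p/n per trial, and reports the mean wait in minutes. The function name and the truncation point max_trials are assumptions made for this illustration.

```python
def mean_wait_minutes(p, n, max_trials=200000):
    """Mean waiting time in minutes, with n trials per minute.

    Each trial succeeds with probability p/n. The infinite sum is
    truncated at max_trials, where the remaining tail is negligible
    for the values used here.
    """
    small_p = p / n
    q = 1.0 - small_p
    total = 0.0
    no_success_yet = 1.0  # q^(k-1): probability the first k-1 trials failed
    for k in range(1, max_trials + 1):
        total += (k / n) * small_p * no_success_yet  # trial k ends at k/n minutes
        no_success_yet *= q
    return total


if __name__ == "__main__":
    for n in (1, 2, 10, 100):
        print(n, mean_wait_minutes(0.5, n))
```

With p = 0.5 every row should come out very close to 2 minutes, which is 1/p.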
The density function is rather awkward in this case, so derive the distribution function. The experiment continues past the kth trial with probability $q^k$. In other words, the first k trials all yield failure. So the experiment stops at k trials or less with probability $1-q^k$. This is the distribution function.
Now we're ready to pack n little trials into each minute. Divide p by n, adjust q accordingly, and multiply k by n. Here is the probability of termination at or before k minutes.
$1 - (1-p/n)^{kn}$
What happens to this expression as n approaches infinity? The 1 doesn't bother us, but the next term is a puzzle. Let's see what happens to its log.
kn × log(1-p/n)
For large n, log(1-p/n) is very nearly -p/n. Multiply this by kn and get -kp, so the term itself approaches $e^{-kp}$. If the experiment runs continuously, the distribution function, for k minutes, is $1-e^{-kp}$. This rises monotonically from 0 to 1, as k runs from 0 to infinity. And that's what a distribution function should do.
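The limit can be watched numerically. This sketch compares the discrete distribution function, with n trials per minute, against the continuous one. The function names and the sample values p = 0.3 and k = 2 are chosen for illustration only.

```python
import math


def discrete_cdf(p, k, n):
    """Probability of stopping within k minutes, with n trials per minute."""
    return 1.0 - (1.0 - p / n) ** (k * n)


def continuous_cdf(p, t):
    """Limiting distribution function, 1 - e^(-pt)."""
    return 1.0 - math.exp(-p * t)


if __name__ == "__main__":
    p, k = 0.3, 2
    limit = continuous_cdf(p, k)
    for n in (1, 10, 100, 1000, 10**6):
        value = discrete_cdf(p, k, n)
        print(n, value, abs(value - limit))
```

The gap in the last column should shrink steadily as n grows.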
We don't usually think of k as a real variable, but now we want to interpolate to all points in time. So replace k with t. The distribution function is now $1-e^{-pt}$, where t is the number of minutes. This is well defined for all real values of t ≥ 0.
At the outset, p was constrained to be less than 1, but this is no longer necessary. For example, change the units of time from minutes to hours. This replaces t with 60t in the distribution function. In other words, the coefficient p is multiplied by 60. There's no need to keep p below 1 any more; it all depends on how you measure time. Every positive p creates a valid poisson distribution, and in some sense, there is but one poisson distribution, once p is normalized to 1.
Take the derivative to get the density function: $pe^{-pt}$. If p is large, the density function starts high, at p, but it drops off abruptly. This corresponds to an atom that decays quickly. Conversely, a long-lived isotope has a tiny value of p. Its curve begins lower on the y axis, but does not drop off as quickly, so that its integral remains 1.
What is the mean? How long will you wait, on average, for the atom to decay? When the geometric distribution was cut into n pieces, the mean was always 1/p. And the poisson distribution is the limit of these geometric distributions. So it ought to have the same mean. Right? Let's find out.
$\int pte^{-pt}\,dt = -te^{-pt} - e^{-pt}/p$ (by parts)
Evaluate at t = 0 and t = ∞ and get 1/p. All's right with the world.
Now for the variance. Find the weighted integral of $t^2$, then subtract the square of the mean.
$\int pt^2e^{-pt}\,dt$ = (by parts)
$-t^2e^{-pt} + \int 2te^{-pt}\,dt$
We know that $t^2$ times a decaying exponential becomes 0 at 0 and infinity, and the second integral is 2/p times the mean, so the result is $2/p^2$. Subtract the square of the mean, and the variance is $1/p^2$. The standard deviation is 1/p.
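Both results can be confirmed by brute force. The sketch below integrates the density numerically with a simple midpoint rule and returns the mean and variance. The function name, the step count, and the cutoff at 40/p are assumptions made for this illustration.

```python
import math


def moments(p, steps=200000):
    """Approximate mean and variance of the density p*e^(-pt) by the midpoint rule."""
    upper = 40.0 / p  # the tail beyond this point is on the order of e^(-40)
    h = upper / steps
    mean = 0.0
    second = 0.0
    for i in range(steps):
        t = (i + 0.5) * h  # midpoint of the ith slice
        weight = p * math.exp(-p * t) * h  # density times slice width
        mean += t * weight
        second += t * t * weight
    return mean, second - mean * mean


if __name__ == "__main__":
    for p in (0.1, 1.0, 2.0):
        mean, variance = moments(p)
        print(p, mean, 1.0 / p, variance, 1.0 / (p * p))
```

For each p the computed mean should sit next to 1/p, and the computed variance next to the square of 1/p.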
In the geometric distribution, the standard deviation was often close to the mean. In the poisson distribution the standard deviation always equals the mean.
Consider the decay of an unstable atom. The atom has no memory. As long as it is still with us, it has the same instantaneous probability of decay. Therefore, the probability of a decay event at or near time t only depends on whether the atom is still present at time t. Take the probability that it has not decayed prior to time t, multiply by a fixed constant, and find the probability of decay at time t. Hence the density function and the distribution function are interrelated.
Let f be the distribution function, whence f′ is the density function. Let p be the constant of proportionality, which measures the "instability" of the atom. The following equation holds.
f′(t) = p × (1-f(t))
This differential equation, with f(0) = 0, has the solution $1-e^{-pt}$, which is the same formula we saw earlier.
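For readers who want the intermediate step, the equation can be solved by substituting g = 1-f, the probability that the atom survives to time t. The initial condition g(0) = 1, meaning the atom is present at the start, is an assumption made explicit here.

```latex
\begin{aligned}
g(t) &= 1 - f(t), \qquad g(0) = 1 \\
g'(t) &= -f'(t) = -p\,g(t) \\
g(t) &= e^{-pt} \\
f(t) &= 1 - e^{-pt}
\end{aligned}
```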
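Now consider two unstable atoms, decaying independently with constants p and r, and wait for the first decay. The first atom has not decayed by time t with probability $e^{-pt}$, and the second with probability $e^{-rt}$. Both atoms are still present at time t with probability $e^{-pt}e^{-rt}$. Therefore the first decay occurs at or before time t with probability $1-e^{-(p+r)t}$. This is another poisson distribution function.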
Waiting for the first of two poisson events is poisson. In general, waiting for the first of n poisson events is poisson.
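A short simulation illustrates the two-atom case. It draws independent decay times with constants p and r, keeps the earlier of each pair, and compares the result with the formula for the combined constant p+r. The function names, the seed, and the sample size are assumptions for this sketch; random.expovariate from the Python standard library supplies the individual decay times.

```python
import math
import random


def first_decay_times(p, r, samples, seed=0):
    """Time of the first decay among two independent atoms, repeated many times."""
    rng = random.Random(seed)
    return [min(rng.expovariate(p), rng.expovariate(r)) for _ in range(samples)]


def empirical_cdf(times, t):
    """Fraction of simulated runs in which the first decay came at or before t."""
    return sum(1 for x in times if x <= t) / len(times)


if __name__ == "__main__":
    p, r = 0.5, 1.5
    times = first_decay_times(p, r, 200000)
    print("simulated mean:", sum(times) / len(times), "predicted:", 1.0 / (p + r))
    t = 0.4
    print("simulated cdf:", empirical_cdf(times, t),
          "predicted:", 1.0 - math.exp(-(p + r) * t))
```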
There is another way to merge poisson events, one that relies on the law of large numbers. Consider this example.
A sizable chunk of radioactive material contains an unthinkably large number of atoms. After t seconds each atom is still around (in its original form) with probability $e^{-pt}$. Assign it an outcome of 1 if it remains, 0 if it has decayed. The average number of atoms remaining is now the total number of atoms times $e^{-pt}$. Furthermore, the variance, relative to the total aggregate, is practically 0. We can be quite confident that the amount of "stuff" remaining after time t is the original mass times $e^{-pt}$. The same formula was derived, in "integral calculus", using a differential equation.
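The law of large numbers at work here can be sketched in Python. One function simulates a modest number of atoms and reports the fraction remaining. The other computes the standard deviation of that fraction from the variance of a 0/1 outcome, showing how quickly the spread vanishes as the number of atoms grows. The function names, the seed, and the sample values are assumptions made for this illustration.

```python
import math
import random


def fraction_remaining(p, t, atoms, seed=0):
    """Simulate independent atoms; return the fraction still present after time t."""
    rng = random.Random(seed)
    survive = math.exp(-p * t)  # chance that a single atom is still present
    remaining = sum(1 for _ in range(atoms) if rng.random() < survive)
    return remaining / atoms


def spread_of_fraction(p, t, atoms):
    """Standard deviation of the remaining fraction, from the 0/1 outcome variance."""
    survive = math.exp(-p * t)
    return math.sqrt(survive * (1.0 - survive) / atoms)


if __name__ == "__main__":
    p, t = 0.1, 5.0
    print("predicted fraction:", math.exp(-p * t))
    print("simulated fraction:", fraction_remaining(p, t, 100000))
    for atoms in (10**2, 10**6, 10**24):
        print(atoms, spread_of_fraction(p, t, atoms))
```

The spread falls off as one over the square root of the number of atoms, so for a macroscopic sample it is far too small to notice.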