Probability, Uniform Distribution

Uniform Distribution

If x is a discrete random variable with a uniform distribution, it takes an integer value from a to b inclusive, and all of these integers are equally likely.  If there are n integers in the interval, i.e. n = b-a+1, then each is selected with probability 1/n.  For instance, the throw of a single die is uniform, with integers ranging from 1 to 6.  Even the toss of a coin is uniform, with integers ranging from 0 (tails) to 1 (heads).  (We're ignoring the Twilight Zone episode where the coin landed on its edge.)

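The mean is (a+b)/2.  This is intuitive, but you can prove it by balancing the integers on either side of the mean.  The outside numbers, a and b, add up to a+b.  The same is true for a+1 and b-1.  Continue all the way to the middle.  If n is even there are n/2 such pairs; if n is odd there are (n-1)/2 pairs plus a middle value of (a+b)/2.  Either way the total is n(a+b)/2, and dividing by n gives (a+b)/2.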
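If you would rather check the pairing argument than trust it, here is a minimal sketch in Python.  The function names are my own, chosen for illustration, and the test intervals are arbitrary.  It averages the integers directly and compares the result with (a+b)/2, using exact fractions so there is no rounding.

```python
from fractions import Fraction

def discrete_uniform_mean(a, b):
    """Average the integers a..b directly, using exact rational arithmetic."""
    n = b - a + 1                      # number of integers in [a, b]
    return Fraction(sum(range(a, b + 1)), n)

def paired_mean(a, b):
    """Closed form from the pairing argument: each outside pair sums to a + b."""
    return Fraction(a + b, 2)

# A die, a coin, and an arbitrary interval that straddles 0.
for a, b in [(1, 6), (0, 1), (-3, 9)]:
    assert discrete_uniform_mean(a, b) == paired_mean(a, b)
    print(a, b, discrete_uniform_mean(a, b))
```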

To find the variance, assume the mean is 0; shifting every outcome by the same amount does not change the variance.  The derivation depends on whether n is even or odd.  If n is odd then the "middle" outcome is 0.  Start at the origin and head right, down the positive axis, and apply the formula for variance.  Compute the sum of the squares from 1^2 to c^2, where c is (n-1)/2, then divide by n.  Since x could be positive or negative, you need to double this result.  For example, set n = 7, whence c = 3, and the variance is 2(1+4+9)/7 = 4.  The standard deviation is 2.
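The recipe for odd n is short enough to write out as code.  This is only a sketch of the steps just described, with a function name I made up; it sums the squares on the positive side, doubles, and divides by n.

```python
from fractions import Fraction

def centered_variance_odd(n):
    """Variance of n equally likely integers centered on 0 (n odd)."""
    if n % 2 == 0:
        raise ValueError("this sketch handles odd n only")
    c = (n - 1) // 2
    one_side = sum(k * k for k in range(1, c + 1))   # 1^2 + 2^2 + ... + c^2
    return Fraction(2 * one_side, n)                 # both signs, each outcome has probability 1/n

variance = centered_variance_odd(7)
print(variance)           # 2*(1 + 4 + 9)/7 = 4
```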

Can we find the variance in general?  There is a branch of mathematics known as "difference equations", similar to differential equations.  It tells us the sum of the first c squares is (2c^3+3c^2+c)/6.  (If you don't want to explore difference equations at this time, you can prove this result by induction on c.)  Anyway, replace c with (n-1)/2, multiply by 2, and divide by n, giving (n^2-1)/12.  For large n, the standard deviation approaches n over the square root of 12.  This approximation is surprisingly accurate, even when n is as small as 7, where it gives about 2.02 against the exact value of 2.
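Short of a proof, both claims can at least be tested numerically.  The sketch below (function names are mine, and the sample values of n are arbitrary) compares the closed form for the sum of squares against a brute-force sum, then prints the exact standard deviation next to the n/sqrt(12) approximation along with the relative error.

```python
import math

def sum_of_squares_closed(c):
    """Closed form (2c^3 + 3c^2 + c)/6 for 1^2 + 2^2 + ... + c^2."""
    return (2 * c**3 + 3 * c**2 + c) // 6     # numerator is c(c+1)(2c+1), always divisible by 6

def exact_sd(n):
    """Standard deviation from the variance (n^2 - 1)/12."""
    return math.sqrt((n * n - 1) / 12)

def approx_sd(n):
    """Large-n approximation n/sqrt(12)."""
    return n / math.sqrt(12)

# Closed form versus brute force for the first several values of c.
for c in range(0, 50):
    assert sum_of_squares_closed(c) == sum(k * k for k in range(1, c + 1))

# How good is the approximation for small and moderate n?
for n in (3, 7, 21, 101):
    rel_error = (approx_sd(n) - exact_sd(n)) / exact_sd(n)
    print(n, round(exact_sd(n), 4), round(approx_sd(n), 4), f"{rel_error:.2%}")
```

This is a spot check, not a substitute for the induction argument; it only confirms the formula over the range tested.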

What happens when n is even?  The mean lies halfway between the two "middle" integers, and when the mean is shifted over to 0, x attains half integer values, from -(n-1)/2 to +(n-1)/2.  To find the variance, take the sum of the squares of the positive half integers, double the result, and divide by n.  The algebra is similar to that described above.  Use the theory of difference equations, or induction, to show the sum of the odd squares from 1^2 to (n-1)^2 equals (n^3-n)/6.  Each half integer is an odd number over 2, so its square is an odd square over 4.  Multiply by 2 and divide by 4n, and the variance is (n^2-1)/12, just as it was when n was odd.  Once again the standard deviation approaches n over the square root of 12 for large n.
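Here is a brute-force check that the even and odd cases really do land on the same formula.  It is a sketch with names of my own invention: it builds the centered outcomes (half integers when n is even), averages their squares exactly, and compares against (n^2-1)/12.  It also tests the odd-squares identity for even n.

```python
from fractions import Fraction

def centered_values(n):
    """The n equally spaced outcomes after shifting the mean to 0."""
    # For even n these are half integers; for odd n they are integers.
    return [Fraction(2 * k - (n - 1), 2) for k in range(n)]

def brute_force_variance(n):
    """Mean of the squared centered outcomes, each with probability 1/n."""
    return sum(x * x for x in centered_values(n)) / n

def closed_form_variance(n):
    """The formula derived in the text: (n^2 - 1)/12."""
    return Fraction(n * n - 1, 12)

def odd_squares_sum(n):
    """1^2 + 3^2 + ... + (n-1)^2 for even n."""
    return sum(k * k for k in range(1, n, 2))

for n in range(2, 41, 2):
    assert odd_squares_sum(n) == (n**3 - n) // 6
for n in range(1, 41):
    assert brute_force_variance(n) == closed_form_variance(n)

print(centered_values(4))          # the four half integers for n = 4
print(brute_force_variance(6))     # 35/12
```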

If a fair die is thrown, there are six possible outcomes from 1 to 6, and each is equally likely.  The mean is 3.5, the variance is 35/12, and the standard deviation is 1.71.

Continuous Model

If x is a continuous variable with a uniform distribution, it attains all real numbers on the closed interval [a,b] with a constant probability density of 1/(b-a).  The outcome is just as likely to appear here as there - anywhere in the interval is fair game.

Integrate x/(b-a) from a to b, and get the mean, (a+b)/2.  The average is smack in the middle, as we would expect.

Assume the mean is 0, and let the interval run from -c to +c.  To find the variance, integrate x^2/(2c) from -c to +c, giving c^2/3.  The standard deviation is c over the square root of 3.  In the discrete case, we used the variable n, the number of integers from a to b.  Here the corresponding quantity is the length of the interval, n = 2c, and the standard deviation becomes n/sqrt(12).  We saw the same formula when we let n grow large in the discrete model.  We expect the discrete formula to approach the continuous formula for large n, and it does.
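The two integrals are easy to do by hand, but a numerical check doesn't hurt.  The sketch below uses a plain midpoint rule, which is my choice and not anything special; the function names and the interval [2,10] are arbitrary assumptions for illustration.  It prints each numerical result beside the closed form it should match.

```python
import math

def midpoint_integral(f, lo, hi, steps=100_000):
    """Approximate the integral of f over [lo, hi] with the midpoint rule."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

def uniform_mean_and_variance(a, b):
    """Numerically integrate x and (x - mean)^2 against the constant density."""
    density = 1.0 / (b - a)                                   # constant density on [a, b]
    mean = midpoint_integral(lambda x: x * density, a, b)
    variance = midpoint_integral(lambda x: (x - mean) ** 2 * density, a, b)
    return mean, variance

a, b = 2.0, 10.0          # arbitrary example interval
mean, variance = uniform_mean_and_variance(a, b)
c = (b - a) / 2           # half-width, as in the text

print(mean, (a + b) / 2)                              # numerical mean vs (a+b)/2
print(variance, c * c / 3)                            # numerical variance vs c^2/3
print(math.sqrt(variance), (b - a) / math.sqrt(12))   # standard deviation vs length/sqrt(12)
```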

Nomenclature

The title of this page is "uniform distribution", yet we've never talked about a distribution function at all.  We described the density function in great detail, and if you want the distribution function you have to take the sum or integral yourself.  For instance, the distribution function for a continuous uniform variable on [a,b] starts at the point (a,0) and rises linearly to (b,1).
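To make the distinction concrete, here is a small sketch of both functions for the continuous case.  The names and the sample interval are my own assumptions.  The distribution function is what you get by integrating the density from the left up to x, which for a constant density is a straight line.

```python
def uniform_density(x, a, b):
    """Density of the continuous uniform variable on [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """Distribution function: the integral of the density from -infinity to x."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)      # straight line from (a, 0) to (b, 1)

a, b = 2.0, 10.0                  # arbitrary example interval
for x in (0.0, 2.0, 4.0, 6.0, 10.0, 12.0):
    print(x, uniform_density(x, a, b), uniform_cdf(x, a, b))
```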

The same nomenclature is used in subsequent pages.  The "foo" distribution is described in terms of its density function, while its actual distribution function, i.e. the integral of the density function, is left as an exercise for the reader.  I realize this may cause some confusion, but I'm just following the textbooks.