Probability, Density and Distribution Functions

Density and Distribution Functions

Trials are not always binary (success or failure). When a trial has many possible outcomes, we associate each outcome with an integer, and let f(x) be the probability that the trial produces outcome x. Note that the sum of f(x) over all outcomes must equal 1. Here are the probabilities for the toss of two fair dice.

f(1) = 0
f(2) = 1/36
f(3) = 2/36
f(4) = 3/36
f(5) = 4/36
f(6) = 5/36
f(7) = 6/36
f(8) = 5/36
f(9) = 4/36
f(10) = 3/36
f(11) = 2/36
f(12) = 1/36
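
As a sanity check, a few lines of Python (a sketch of mine, not part of the original presentation) can enumerate the 36 equally likely rolls and tally these probabilities.

from fractions import Fraction
from collections import Counter

# Tally the 36 equally likely outcomes of two fair dice.
counts = Counter(a + b for a in range(1, 7) for b in range(1, 7))
f = {x: Fraction(counts.get(x, 0), 36) for x in range(1, 13)}

assert sum(f.values()) == 1   # the probabilities sum to 1
print(f[7])                   # 1/6, i.e. 6/36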

The distribution function d(x) is the probability that a trial will produce an outcome less than or equal to x. Thus d(x) is the sum of f(j), as j runs from its minimum value up to x. Here is the distribution function for the roll of the dice.

d(1) = 0
d(2) = 1/36
d(3) = 3/36
d(4) = 6/36
d(5) = 10/36
d(6) = 15/36
d(7) = 21/36
d(8) = 26/36
d(9) = 30/36
d(10) = 33/36
d(11) = 35/36
d(12) = 36/36

As you can see, the distribution function begins at 0 and rises monotonically to 1.
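
Since d(x) is just a running sum of f, a few more lines of Python, continuing the sketch above, reproduce this table and confirm the monotone climb from 0 to 1.

from fractions import Fraction
from collections import Counter
from itertools import accumulate

counts = Counter(a + b for a in range(1, 7) for b in range(1, 7))
f = {x: Fraction(counts.get(x, 0), 36) for x in range(1, 13)}

# d(x) is the running sum of f(j) for j up to x.
d = dict(zip(range(1, 13), accumulate(f[x] for x in range(1, 13))))

assert d[12] == 1                                    # ends at 1
assert all(d[x] <= d[x + 1] for x in range(1, 12))   # never decreases
print(d[7])                                          # 7/12, i.e. 21/36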

Density and distribution functions have analogs when the spectrum of possible outcomes is continuous. Let f be a nonnegative function of the real variable x. The probability of finding x between a and b is now the integral of f(x) from a to b. The integral from -∞ to +∞ is necessarily 1. These density functions are assumed to be piecewise continuous, since point fluctuations wouldn't change the integrals, and would only lead to confusion.
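
To make this concrete, here is a small Python sketch using one particular density, the exponential e^(-x) on [0,∞), chosen purely for illustration; a crude Riemann sum approximates the integrals.

import math

# A sample density, assumed for illustration: f(x) = e^(-x) for x >= 0, else 0.
def f(x):
    return math.exp(-x) if x >= 0 else 0.0

# Midpoint Riemann sum approximation to the integral of fn from a to b.
def integral(fn, a, b, n=100_000):
    h = (b - a) / n
    return h * sum(fn(a + (i + 0.5) * h) for i in range(n))

print(integral(f, 0, 50))   # ~1.0: total probability is 1
print(integral(f, 1, 3))    # ~0.3181: the probability of landing in [1, 3]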

The distribution function d(x) is the integral of f(t) as t runs from -∞ to x. In other words, d(x) is the probability of getting an outcome less than or equal to x.

The simplest example is the floating point random number generator in your computer, which produces a random number between 0 and 1. If there is no bias, the density function equals 1 from 0 to 1, and is zero elsewhere. The distribution function starts at the origin and rises linearly to the point (1,1). Of course d(x) remains 1 thereafter, since you are guaranteed to receive a number less than 1.
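
A quick Monte Carlo sketch illustrates this: the fraction of generated numbers at or below x estimates d(x), and it tracks the line d(x) = x.

import random

random.seed(1)
samples = [random.random() for _ in range(100_000)]

# Empirical distribution function: the fraction of samples at or below x.
def d_hat(x):
    return sum(s <= x for s in samples) / len(samples)

for x in (0.25, 0.5, 0.75, 1.0):
    print(x, round(d_hat(x), 3))   # each estimate is close to x itself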

Let f be a density function, continuous at x. In fact let y = f(x). The probability of finding a value between x and x+h is the integral of f(t) from x to x+h. This is bounded between h×y1 and h×y2, where y1 and y2 are the minimum and maximum of f over this interval. By continuity, both bounds approach y as h shrinks, hence the probability approaches h×y for small h. The density function accurately measures probability at every point, as long as you use small intervals.
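
This approximation is easy to test numerically. The sketch below uses the standard normal density, an arbitrary choice on my part, and compares the exact probability of [x, x+h] against h×f(x) as h shrinks.

import math

# Standard normal density, used here only as a convenient test case.
def f(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Exact probability of [a, b] via the error function.
def prob(a, b):
    Phi = lambda t: 0.5 * (1 + math.erf(t / math.sqrt(2)))
    return Phi(b) - Phi(a)

x = 0.7
for h in (0.1, 0.01, 0.001):
    print(h, prob(x, x + h), h * f(x))   # the two columns converge as h shrinks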

Higher dimensional density functions allow trials to produce multiple values. Thus the probability that x lies between 1 and 3 while y lies between 5 and 9 is the double integral of f(x,y), as x runs from 1 to 3 and y runs from 5 to 9. For small regions, f is an accurate indicator of probability. Combine this with the definition of independence to show that x and y are independent variables iff f(x,y) = g(x)×h(y), where g and h are the individual (marginal) density functions.
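
Here is a small Monte Carlo sketch of that factorization, with a uniform density for x and an exponential density for y, both arbitrary choices for illustration; the joint probability of a rectangle matches the product of the two marginal probabilities.

import random

random.seed(2)
N = 200_000
xs = [random.random() for _ in range(N)]          # uniform density on [0, 1]
ys = [random.expovariate(1.0) for _ in range(N)]  # exponential density e^(-y)

# For independent draws, the joint probability of a rectangle should
# factor into the product of the two marginal probabilities.
in_x = [0.2 <= x <= 0.6 for x in xs]
in_y = [1.0 <= y <= 2.0 for y in ys]

p_joint = sum(a and b for a, b in zip(in_x, in_y)) / N
p_prod = (sum(in_x) / N) * (sum(in_y) / N)
print(p_joint, p_prod)   # both near 0.4 * (1/e - 1/e^2), about 0.0930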