Unit 13.3 Some common probability densities
There are a zillion different functions commonly used for probability densities. Three of the most common are introduced in this section: the exponential, the uniform, and the normal. They turn up so often for reasons analogous to why exponential behavior is common in evolving systems: each is constructed to have a particular useful property.
The uniform distribution, as the name implies, arises when a random quantity is uniformly likely to be anywhere in an interval. It is often used as an "uninformed" model when all you know is that a quantity has to be somewhere in a fixed interval. The normal arises when many small independent contributions are summed. It is often used to model observational error. The exponential is the so-called memoryless distribution. It arises when the probability of finding \(X\) in the next small interval, given that you haven’t already found it, is always constant.
All three of these are parametrized families of distributions; once values are picked for the parameters you get a particular distribution. The rest of this section defines each family and discusses typical applications.
The exponential distribution.
The exponential distribution has a parameter \(\mu\text{,}\) which can be any positive real number. Its density is \((1/\mu) e^{-x/\mu}\) on the positive half-line \([0,\infty)\text{.}\) This is obviously the same as the density \(C e^{-Cx}\) (just take \(C = 1/\mu\)), but we use the parameter \(\mu\) rather than \(C\) because a quick computation shows that \(\mu\) is the mean of the distribution, a more natural property than the rate \(C\text{.}\)
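Here is that quick computation: integrating by parts with \(u = x\) and \(dv = \mu^{-1} e^{-x/\mu} \, dx\) gives
\begin{equation*}
\int_0^\infty x \, \mu^{-1} e^{-x/\mu} \, dx = \left[ - x e^{-x/\mu} \right]_0^\infty + \int_0^\infty e^{-x/\mu} \, dx = 0 + \mu \, ,
\end{equation*}
so the mean is indeed \(\mu\text{.}\)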
Checkpoint 189.
The exponential distribution has a very important "memoryless" property. If \(X\) has an exponential density with any parameter and is interpreted as a waiting time, then once you know it didn’t happen by a certain time \(t\text{,}\) the amount of further time it will take to happen has the same distribution as \(X\) had originally. It doesn’t get any more or any less likely to happen in the interval \([t,t+1]\) than it was originally to happen in the interval \([0,1]\text{.}\)
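To see why, note that \(P(X > t) = \int_t^\infty \mu^{-1} e^{-x/\mu} \, dx = e^{-t/\mu}\text{,}\) so for any \(s, t \geq 0\text{,}\)
\begin{equation*}
P(X > t + s \mid X > t) = \frac{P(X > t+s)}{P(X > t)} = \frac{e^{-(t+s)/\mu}}{e^{-t/\mu}} = e^{-s/\mu} = P(X > s) \, .
\end{equation*}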
The median of the exponential distribution with mean \(\mu\) is also easy to compute. Solving \(\int_0^M \mu^{-1} e^{-x/\mu} \, dx = 1/2\) gives \(M = \mu \cdot \ln 2\text{.}\) When \(X\) is a random waiting time, the interpretation is that it is equally likely to occur before \(\ln 2\) times its mean as after. Because \(\ln 2 \approx 0.7\text{,}\) the median is significantly less than the mean. When modeling with exponentials, it is good to remember that they produce values that are unbounded but always positive.
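Filling in the step: the integral evaluates to \(1 - e^{-M/\mu}\text{,}\) so setting this equal to \(1/2\) gives \(e^{-M/\mu} = 1/2\text{,}\) hence \(M/\mu = \ln 2\text{.}\)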
Any of you who have studied radioactive decay know that each atom acts randomly and independently of the others, decaying at a random time with an exponential distribution. The fraction remaining after time \(t\) is the same as the probability that each individual atom remains undecayed at time \(t\text{,}\) namely \(e^{-t/\mu}\text{,}\) so another interpretation for the median is the half-life: the time at which only half the original substance remains. Other examples are the life span of an organism that faces environmental hazards but does not age, or the time for an electronic component to fail (they don’t seem to age either).
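For example, carbon-14 has a half-life of roughly 5730 years, so the mean lifetime of a carbon-14 atom is \(\mu = 5730 / \ln 2 \approx 8270\) years.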
The uniform distribution.
The uniform distribution on the interval \([a,b]\) is the probability distribution whose density is a constant on this interval: the constant must be \(1/(b-a)\text{.}\) This is often thought of as the least informative distribution if you know only that the quantity must be between the values \(a\) and \(b\text{.}\) The mean and median are both \((a+b)/2\text{.}\)
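The mean, for instance, is the one-line integral
\begin{equation*}
\int_a^b \frac{x}{b-a} \, dx = \frac{b^2 - a^2}{2(b-a)} = \frac{a+b}{2} \, .
\end{equation*}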
Checkpoint 190.
Use calculus to prove that a constant function \(C\) on an interval \([a,b]\) is a probability density if and only if \(C = \frac{1}{b-a}\text{.}\)
Example 13.7.
In your orienteering class you are taken to a faraway location and spun around blindfolded when you arrive. When the blindfold comes off, you are facing in a random compass direction (usually measured as an angle clockwise from due north). It would be reasonable to model this angle as a uniform random variable on the interval \([0,360]\) in units of degrees.
Checkpoint 191.
In Example 13.7, the mean and median are both \(180^\circ\text{.}\) Why are these not meaningful measures of the center of the distribution in this case?
The normal distribution.
The normal density with mean \(\mu\) and standard deviation \(\sigma\) is the density
\begin{equation*}
\frac{1}{\sigma \sqrt{2 \pi}} e^{- (x - \mu)^2 / (2 \sigma^2)} \, .
\end{equation*}
The standard normal is the one with \(\mu = 0\) and \(\sigma = 1\text{.}\) There is a very cool mathematical reason for this formula (the Central Limit Theorem), which we will not go into. When a random variable is the result of summing a bunch of smaller random variables all acting independently, the result is usually well approximated by a normal. It is possible (though very tricky) to show that the definite integral of this density over the whole real line is in fact 1 (in other words, that we have chosen the right constant to make it a probability density).
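If you are curious, the trick is to square the integral and change to polar coordinates:
\begin{equation*}
\left( \int_{-\infty}^\infty e^{-x^2/2} \, dx \right)^2 = \int_{-\infty}^\infty \int_{-\infty}^\infty e^{-(x^2+y^2)/2} \, dx \, dy = \int_0^{2\pi} \int_0^\infty e^{-r^2/2} \, r \, dr \, d\theta = 2 \pi \, ,
\end{equation*}
so \(\int_{-\infty}^\infty e^{-x^2/2} \, dx = \sqrt{2\pi}\text{,}\) which is where the constant comes from.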
Annoyingly, there is no elementary antiderivative, so no exact formula in general for the probability of finding a normal between specified values \(a\) and \(b\text{.}\) Because the normal is so important in statistical applications, they made up a notation for the antiderivative in the case \(\mu = 0, \sigma = 1\text{,}\) using the capital Greek letter \(\Phi\) ("phi", pronounced "fee" or "fie"):
\begin{equation*}
\Phi (x) := \int_{-\infty}^x \frac{1}{\sqrt{2 \pi}} e^{-t^2/2} \, dt \, .
\end{equation*}
So now you can say that the probability of finding a standard normal between \(a\) and \(b\) is exactly \(\Phi (b) - \Phi (a)\text{.}\) In the old, pre-computer days, they published tables of values of \(\Phi\text{.}\) It was reasonably efficient to do this because you can get the antiderivative \(F\) for any other normal from the one for the standard normal by a linear substitution: \(F(x) = \Phi \left(\frac{x-\mu}{\sigma}\right)\text{.}\)
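For example, the tables give \(\Phi(1) \approx 0.8413\text{.}\) By the symmetry of the density, \(\Phi(-1) = 1 - \Phi(1)\text{,}\) so the probability that a standard normal lands within one standard deviation of its mean is
\begin{equation*}
\Phi(1) - \Phi(-1) = 2 \Phi(1) - 1 \approx 0.68 \, .
\end{equation*}
More generally, a normal with mean \(\mu\) and standard deviation \(\sigma\) lands in \([a,b]\) with probability \(\Phi \left(\frac{b-\mu}{\sigma}\right) - \Phi \left(\frac{a-\mu}{\sigma}\right)\text{.}\)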
