
Unit 13.2 Summary statistics

Several important quantities associated with a probability distribution are the mean, the variance, the standard deviation and the median. We call these summary statistics because they tell us about the probability density as a whole. Again, a couple of paragraphs don’t do justice to these ideas, but we hope they explain the concepts at least a little and make the math seem more motivated and relevant.
Probably the simplest summary statistic, intuitively, is the median. This is the \(50^{th}\) percentile of the distribution -- the value of the measured variable that splits the distribution into two equal pieces.

Definition 13.4.

median
If \(X\) has probability density \(f\text{,}\) a median of \(X\) is a value \(m\) such that
\begin{equation} \mathbb{P} (X \gt m) = \mathbb{P} (X \lt m) = \frac{1}{2} \, .\tag{13.1} \end{equation}
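In practice, a median can be computed by solving \(F(m) = 1/2\text{,}\) where \(F\) is the cumulative distribution function. The following is a minimal sketch using bisection; the exponential density \(e^{-x}\) on \([0,\infty)\) is used as an example, with CDF \(F(x) = 1 - e^{-x}\text{,}\) so the median is \(\ln 2\text{.}\)

```python
import math

def median(cdf, lo, hi, tol=1e-10):
    """Bisection search for m with cdf(m) = 1/2 on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Density e^{-x} on [0, infinity) has CDF 1 - e^{-x},
# so the median solves 1 - e^{-m} = 1/2, i.e. m = ln 2.
m = median(lambda x: 1 - math.exp(-x), 0.0, 10.0)
print(m)  # ~0.6931
```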

Checkpoint 184.

Definition 13.5.

mean
If \(X\) has probability density \(f\text{,}\) the mean or expectation of \(X\) (the two terms are synonyms) is the quantity
\begin{equation} \mathbb{E} X := \int_{-\infty}^\infty x \, f(x) \, dx.\tag{13.2} \end{equation}
A variable commonly used for the mean of a distribution is \(\mu\text{.}\)
variance
If \(X\) has probability density \(f\) and mean \(\mu\text{,}\) the variance of \(X\) is the quantity
\begin{equation*} \operatorname{Var} (X) := \int_{-\infty}^\infty (x - \mu)^2 \, f(x) \, dx \, . \end{equation*}
standard deviation
The standard deviation of \(X\) is just the square root of the variance
\begin{equation*} \sigma := \sqrt{\operatorname{Var} (X)} \, . \end{equation*}
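Each of these definitions is an integral, so each can be approximated numerically. Here is a sketch using midpoint Riemann sums, with the density \(f(x) = 2x\) on \([0,1]\) as an illustrative example (for which the exact values are \(\mu = 2/3\) and \(\operatorname{Var}(X) = 1/18\)).

```python
def riemann(g, a, b, n=100_000):
    """Midpoint Riemann sum approximating the integral of g over [a, b]."""
    dx = (b - a) / n
    return sum(g(a + (i + 0.5) * dx) for i in range(n)) * dx

f = lambda x: 2 * x  # density on [0, 1]

mu = riemann(lambda x: x * f(x), 0, 1)               # mean: 2/3
var = riemann(lambda x: (x - mu) ** 2 * f(x), 0, 1)  # variance: 1/18
sigma = var ** 0.5                                   # standard deviation
print(mu, var, sigma)
```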

Checkpoint 185.

Let’s say our random variable \(X\) represents a length: we pick a board at random from a stack at the hardware store and measure its length in feet. What are the units of:
  1. the median\(\text{:}\)
  2. the mean \(\mu=\mathbb{E}(X)\text{:}\)
  3. the variance \(\operatorname{Var}(X)\text{:}\)
  4. the standard deviation \(\sigma\text{:}\)
Why do you think we take the square root of the variance?
Answer 1.
\(\text{feet}\) (or \(\mathrm{ft}\))
Answer 2.
\(\text{feet}\) (or \(\mathrm{ft}\))
Answer 3.
\(\text{square feet}\) (or \(\mathrm{ft}^2\))
Answer 4.
\(\text{feet}\) (or \(\mathrm{ft}\))
Each of these definitions looks like the corresponding formula from discrete probability. By way of example, consider the formula for expectation. You might recall what happens when rolling a die. Each of the six numbers comes up about \(1/6\) of the time, so in a large number \(N\) of dice rolls you will get (approximately) \(N/6\) of each of the six outcomes. The average will therefore be (approximately)
\begin{equation*} \frac{1}{N} \left [ (N/6) \cdot 1 + (N/6) \cdot 2 + (N/6) \cdot 3 + (N/6) \cdot 4 + (N/6) \cdot 5 + (N/6) \cdot 6 \right ] \, . \end{equation*}
We can write this in summation notation as
\begin{equation*} \displaystyle\sum_{j=1}^6 j \cdot \mathbb{P} (X = j) \, . \end{equation*}
If we think of this sum as a Riemann sum for something, and do our Greek-to-Latin trick, we get back the formula (13.2).
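The dice-roll reasoning above is easy to check empirically: simulate many rolls and compare the running average to the exact sum \(\sum_j j \cdot \mathbb{P}(X = j) = 3.5\text{.}\) A quick sketch:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible
N = 100_000
rolls = [random.randint(1, 6) for _ in range(N)]
empirical = sum(rolls) / N

# Exact expectation: sum over j of j * P(X = j), with P(X = j) = 1/6
exact = sum(j * (1 / 6) for j in range(1, 7))  # 3.5

print(exact, round(empirical, 3))  # the empirical average is close to 3.5
```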

Checkpoint 186.

A carnival game that costs a dollar to play gives you a quarter for each spot on a roll of a die (e.g., 75 cents if you roll a 3).
When you have spent \(N\) dollars, about how much money (in dollars) will you have received? Give a formula involving the variable \(N\text{.}\)
Should you play this game?
  • yes -- you will tend to make money
  • no -- you will tend to lose money
  • who cares? In the long run you’ll break even.
Answer 1.
\(0.875N\)
Answer 2.
\(\text{no -- you will tend to lose money}\)
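The carnival answer comes from the same expectation formula: the payout per play is a quarter times the expected roll, \(0.25 \times 3.5 = \$0.875\text{,}\) so each dollar spent returns about 87.5 cents. A sketch of the arithmetic (the function name is ours, for illustration):

```python
# Expected payout per $1 play: a quarter per spot on a fair die.
per_play = sum(0.25 * j * (1 / 6) for j in range(1, 7))  # $0.875

def total_received(N):
    """Approximate dollars received after spending N dollars."""
    return per_play * N

print(per_play, total_received(100))
```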
When instead there are infinitely many possible outcomes spread over an interval, the sum is replaced by an integral
\begin{equation} \int_{-\infty}^\infty x \cdot f(x) \, dx \, .\tag{13.3} \end{equation}
A famous theorem in probability theory, called the Strong Law of Large Numbers, says that the formula (13.3) still computes the long term average: the long term average of independent draws from a distribution with probability density function \(f\) will converge to \(\int x \cdot f(x) \, dx\text{.}\)

Checkpoint 187.

The random variable \(X\) has probability density \(2x\) on \([0,1]\text{.}\) If you sample 68 times and take the average of the samples, roughly what will you get?
Answer.
\(0.666667\)
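The Strong Law can be watched in action. One way to draw from the density \(2x\) on \([0,1]\) is inverse-CDF sampling: the CDF is \(F(x) = x^2\text{,}\) so \(\sqrt{U}\) has the right distribution when \(U\) is uniform on \([0,1]\text{.}\) A sketch showing the running average approach \(2/3\) as the sample size grows:

```python
import math
import random

random.seed(0)  # fixed seed so the run is reproducible

def sample():
    # Inverse-CDF sampling: F(x) = x^2, so F^{-1}(u) = sqrt(u)
    return math.sqrt(random.random())

for n in (68, 1_000, 100_000):
    avg = sum(sample() for _ in range(n)) / n
    print(n, round(avg, 4))  # tends toward 2/3 as n grows
```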
It is more difficult to understand why the variance has the precise definition it does, but it is easy to see that the formula produces bigger values when the random variable \(X\) tends to be farther from its mean value \(\mu\text{.}\) The standard deviation is another measure of dispersion. To see why it might be more physically relevant, consider the units.
Probabilities such as \(\mathbb{P} (X \in [a,b])\) can be considered to be unitless because they represent ratios of like things: frequency of occurrences within the interval \([a,b]\) divided by frequency of all occurrences. Probability densities, integrated against the variable \(x\) (which may have units of length, time, etc.) give probabilities. Therefore, probability densities have units of "probability per unit \(x\)-value", or in other words, inverse units to the independent variable.
The units of the mean are units of \(\int x f(x) \, dx\text{,}\) which is units of \(f\) times \(x^2\text{;}\) but \(f\) has units of inverse \(x\text{,}\) so the mean has units of \(x\text{.}\) This makes sense because the mean represents a point on the \(x\)-axis. Similarly, the variance has units of \(x^2\text{.}\) It is hard to see what the variance represents physically. The standard deviation, however, has units of \(x\text{.}\) Therefore, it is a measure of dispersion having the same units as the mean. It represents a distance on the \(x\)-axis which is some kind of average discrepancy from the mean.
Figure 13.6. Three densities with mean 1.

Checkpoint 188.

Figure 13.6 shows three probability densities with mean 1. Rank them in order from least to greatest standard deviation. You don’t have to compute precisely unless you want to; just state an answer and justify it intuitively. The three densities are graphed below.
  1. \(f(x) := 1\) on \([1/2,3/2]\) (blue)
  2. \(f(x) := 10 - 100 \lvert x-1\rvert\) on \([0.9,1.1]\) (red)
  3. \(f(x) := e^{-x}\) on \([0,\infty)\) (black)
Answer 1.
\(\text{red distribution}\)
Answer 2.
\(\text{blue distribution}\)
Answer 3.
\(\text{black distribution}\)
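The intuitive ranking can be confirmed by computing each standard deviation numerically (a sketch, not part of the checkpoint; all three densities have mean 1, and the infinite tail of the exponential is truncated for the computation):

```python
import math

def std_dev(f, a, b, mu=1.0, n=200_000):
    """Standard deviation via a midpoint Riemann sum for
    the integral of (x - mu)^2 f(x) over [a, b]."""
    dx = (b - a) / n
    var = sum((a + (i + 0.5) * dx - mu) ** 2 * f(a + (i + 0.5) * dx)
              for i in range(n)) * dx
    return math.sqrt(var)

blue  = std_dev(lambda x: 1.0, 0.5, 1.5)                    # ~0.289
red   = std_dev(lambda x: 10 - 100 * abs(x - 1), 0.9, 1.1)  # ~0.041
black = std_dev(lambda x: math.exp(-x), 0.0, 50.0)          # ~1.0

print(red, blue, black)  # ranking matches: red < blue < black
```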