The basic concept in probability theory is that of a random variable. A random variable is a function of the basic outcomes in a probability space. To define a probability space (a la Kolmogorov) one needs three ingredients:

1. A sample space S of possible outcomes.
2. A collection $\Sigma$ of subsets of S, called events (technically, a sigma algebra: a collection closed under complements and countable unions and intersections).
3. A probability function p that assigns to each event A a number p(A) in the interval [0,1], subject to the axioms p(S) = 1 and p(A ∪ B) = p(A) + p(B) whenever A and B are disjoint events (and, more generally, countable additivity over pairwise disjoint events).
These axioms imply that if $A^c$ is the complement of A, then $p(A^c) = 1 - p(A)$, and the principle of inclusion and exclusion: $p(A \cup B) = p(A) + p(B) - p(A \cap B)$, even if A and B are not disjoint.
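As a quick sanity check, here is a minimal Python sketch (not part of the original notes; the sample space and the events A and B are chosen arbitrarily) that verifies these two consequences of the axioms for a finite sample space with equally likely outcomes:

```python
from fractions import Fraction

# A finite sample space with ten equally likely outcomes.
S = set(range(1, 11))

def p(event):
    """Probability of an event (a subset of S) under the uniform distribution."""
    return Fraction(len(event & S), len(S))

A = {1, 2, 3, 4}
B = {3, 4, 5, 6, 7}

# Complement rule: p(A^c) = 1 - p(A).
assert p(S - A) == 1 - p(A)

# Inclusion-exclusion: p(A u B) = p(A) + p(B) - p(A n B), even though A and B overlap.
assert p(A | B) == p(A) + p(B) - p(A & B)
print("complement rule and inclusion-exclusion verified")
```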
Discrete Random Variables -- take on only isolated (discrete) values, such as when you are counting something. Usually the values are (a subset of) the integers, and we can assign a probability to any subset of the sample space, as soon as we know the probability of any set containing one element, i.e., p({k}) for all k. Usually, we are sloppy about the notation and express this as a function p(k) --- and we set p(k)=0 for all numbers not in the sample space. We repeat: for discrete random variables, the value p(k) represents the probability that the event {k} occurs. So any function from the integers to the (real) interval [0,1] that has the property that

$$\sum_{k} p(k) = 1$$

defines a discrete probability distribution.
For example, the possible outcomes of rolling one die are {1}, {2}, {3}, {4}, {5} and {6}. Each of these outcomes has the same probability, namely 1/6. We can express this by making a table, or by specifying a function f(k)=1/6 for k = 1,2,3,4,5,6 and f(k)=0 otherwise. Using the disjoint union rule, we find for example that p({1,2,5})=1/2, p({2,3})=1/3, and so on.
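Here is a short Python sketch of that computation (the function names are just for illustration):

```python
from fractions import Fraction

def f(k):
    """pmf of a fair die: 1/6 on {1,...,6}, and 0 elsewhere."""
    return Fraction(1, 6) if k in range(1, 7) else Fraction(0)

def prob(event):
    """Probability of any subset of outcomes, by the disjoint union rule."""
    return sum(f(k) for k in event)

print(prob({1, 2, 5}))  # 1/2
print(prob({2, 3}))     # 1/3
```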
For a more interesting example, flip a fair coin n times and count the number of heads (or poll n people and count the number who favor the Republican candidate). The sample space S is {0, 1, 2, ..., n}, since these are the possible outcomes (number of heads, number of people favoring the Republican [n=100 in this case]). As before, the sigma algebra is the set of all subsets of S. The function p is more interesting this time:

$$p(\{k\}) = \frac{1}{2^n}\binom{n}{k},$$

where $\binom{n}{k}$ is the binomial coefficient

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!},$$

which equals the number of subsets of an n-element set that have exactly k elements.

Note that this is an honest probability function, since by the binomial theorem we will have

$$\sum_{k=0}^{n} \frac{1}{2^n}\binom{n}{k} = \frac{1}{2^n}\sum_{k=0}^{n}\binom{n}{k} = \frac{2^n}{2^n} = 1.$$
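For concreteness, here is a small Python check (a sketch; math.comb is the standard-library binomial coefficient) that these probabilities really do sum to 1, say for n = 100:

```python
from math import comb

n = 100  # e.g., 100 coin flips, or 100 people polled

def p(k):
    """Probability of exactly k heads in n fair coin flips."""
    return comb(n, k) / 2**n

# The probabilities over the whole sample space {0, 1, ..., n} sum to 1.
assert abs(sum(p(k) for k in range(n + 1)) - 1.0) < 1e-12
print(p(50))  # the single most likely outcome, roughly 0.08
```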
Continuous Random Variables: can take on any real value, such as when you are measuring something. Usually the values are (a subset of) the reals, and for technical reasons, we can only assign a probability to certain subsets of the sample space (but there are a lot of them). These subsets, either the collection of Borel sets (sets that can be obtained by taking countable unions and intersections of intervals) or the Lebesgue-measurable sets (the Borel sets plus a few other exotic sets), comprise the set $\Sigma$. As soon as we know the probability of any interval, i.e., p([a,b]) for all a and b, we can calculate the probability of any Borel set. In fact, it is enough to know the probabilities of "very small" intervals of the form [x,x+dx]. In other words, we can calculate continuous probabilities as integrals of "probability density functions" (pdf's).
A pdf is a function p(x) that takes on only nonnegative values (they don't have to be between 0 and 1, though), and whose integral over the whole sample space (we can use the whole real line if we assign the value p(x)=0 for points x outside the sample space) is equal to 1. In this case, we have (for small dx) that p(x)dx represents (approximately) the probability of the set (interval) [x,x+dx] (with error that goes to zero faster than dx does). More generally, the probability of the set (interval) [a,b] is:

$$p([a,b]) = \int_a^b p(x)\,dx.$$

So any nonnegative function on the real numbers that has the property that

$$\int_{-\infty}^{\infty} p(x)\,dx = 1$$

defines a continuous probability distribution.
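As an illustration (a sketch using an arbitrarily chosen density), the probability of an interval can be approximated by a Riemann sum over small subintervals, exactly as the [x,x+dx] picture suggests:

```python
def pdf(x):
    """An example density: uniform on [0, 2], zero elsewhere."""
    return 0.5 if 0 <= x <= 2 else 0.0

def prob(a, b, n=100_000):
    """Approximate p([a,b]), the integral of the pdf from a to b, by a Riemann sum."""
    dx = (b - a) / n
    return sum(pdf(a + (i + 0.5) * dx) for i in range(n)) * dx

print(prob(0.5, 1.0))  # about 0.25
```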
We will use a trick that goes back (at least) to Liouville to evaluate the integral $I = \int_{-\infty}^{\infty} e^{-x^2/2}\,dx$. First, note that

$$I^2 = \left(\int_{-\infty}^{\infty} e^{-x^2/2}\,dx\right)\left(\int_{-\infty}^{\infty} e^{-y^2/2}\,dy\right) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2+y^2)/2}\,dx\,dy,$$

because we can certainly change the name of the variable in the second integral, and then we can convert the product of single integrals into a double integral. Now (the critical step), we'll evaluate the integral in polar coordinates (!!) -- note that over the whole plane, r goes from 0 to infinity as $\theta$ goes from 0 to $2\pi$, and dx dy becomes $r\,dr\,d\theta$:

$$I^2 = \int_0^{2\pi}\!\int_0^{\infty} e^{-r^2/2}\,r\,dr\,d\theta = \int_0^{2\pi}\left[-e^{-r^2/2}\right]_{r=0}^{\infty}d\theta = \int_0^{2\pi} 1\,d\theta = 2\pi.$$

Therefore, $I = \sqrt{2\pi}$. We need to arrange things so that the integral is 1, and for reasons that will become apparent later, we arrange this as follows: define

$$N(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}.$$

Then N(x) defines a probability distribution, called the standard normal distribution; its graph is the familiar bell-shaped curve centered at x = 0.
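A quick numerical sanity check of the Liouville computation (a sketch; truncating at |x| = 10 is an assumption justified by the rapidly decaying tails):

```python
import math

# Riemann-sum approximation of I = integral of e^{-x^2/2} over the real line.
dx = 0.001
I = sum(math.exp(-(k * dx) ** 2 / 2) for k in range(-10_000, 10_000)) * dx
print(I, math.sqrt(2 * math.pi))  # both approximately 2.5066
```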
More generally, we define the normal distribution with parameters $\mu$ and $\sigma$ to be

$$N_{\mu,\sigma}(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2/(2\sigma^2)}.$$
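The same style of check works for the general normal density (a sketch, with sample parameters chosen arbitrarily):

```python
import math

def N(x, mu, sigma):
    """Normal density with parameters mu and sigma."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

# Riemann sum over mu +/- 10 sigma, where essentially all the mass lives.
mu, sigma, dx = 3.0, 2.0, 0.002
total = sum(N(mu + k * dx, mu, sigma) for k in range(-10_000, 10_000)) * dx
print(total)  # approximately 1.0
```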
The expectation of a random variable is essentially the average value it is expected to take on. Therefore, it is calculated as the weighted average of the possible outcomes of the random variable, where the weights are just the probabilities of the outcomes. As a trivial example, consider the (discrete) random variable X (the outcome of some probabilistic experiment) whose sample space is the set {1,2,3} with probability function given by p(1)=0.3, p(2)=0.1 and p(3)=0.6. If we repeated this experiment 100 times, we would expect to get about 30 occurrences of X=1, 10 of X=2 and 60 of X=3. The average X would then be ((30)(1)+(10)(2)+(60)(3))/100 = 2.3. In other words, (1)(0.3)+(2)(0.1)+(3)(0.6) = 2.3. This reasoning leads to the defining formula:

$$E(X) = \sum_{k} k\,p(k)$$

for any discrete random variable. The notation E(X) for the expectation of X is standard; also in use is the notation $\mu_X$.
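In Python, both the defining formula and the frequency reasoning behind it are easy to check (a sketch; the Monte Carlo sample size is arbitrary):

```python
import random

# The toy distribution from the text: sample space {1, 2, 3}.
p = {1: 0.3, 2: 0.1, 3: 0.6}

# Defining formula: E(X) = sum over k of k * p(k).
E = sum(k * pk for k, pk in p.items())
print(E)  # 2.3

# Frequency check: the average of many simulated draws approaches E(X).
draws = random.choices(list(p), weights=list(p.values()), k=100_000)
print(sum(draws) / len(draws))  # close to 2.3
```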
For continuous random variables, the situation is similar, except that the sum is replaced by an integral (think of summing up the average values of x by dividing the sample space into small intervals [x,x+dx] and calculating the probability p(x)dx that X falls into each interval). By reasoning similar to that of the previous paragraph, the expectation should be

$$E(X) = \int_{-\infty}^{\infty} x\,p(x)\,dx.$$

This is the formula for the expectation of a continuous random variable.
For example, if X is uniformly distributed on the interval [a,b] (so p(x) = 1/(b-a) for x in [a,b] and p(x) = 0 otherwise), then

$$E(X) = \int_a^b \frac{x}{b-a}\,dx = \frac{b^2 - a^2}{2(b-a)} = \frac{a+b}{2},$$

the midpoint of the interval, which is what we should have expected from the uniform distribution.
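A numerical version of the uniform computation (a sketch; the endpoints a and b are arbitrary):

```python
# Riemann-sum approximation of E(X) for the uniform density on [a, b].
a, b, n = 2.0, 5.0, 100_000
dx = (b - a) / n
E = sum((a + (i + 0.5) * dx) / (b - a) for i in range(n)) * dx
print(E, (a + b) / 2)  # both 3.5
```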
To compute the expectation of a discrete distribution that takes on infinitely many values, we often need to sum a series. Recall the Taylor series of the exponential function:

$$e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!}.$$

Note that we can take the derivative of both sides to get the formula:

$$e^x = \sum_{k=1}^{\infty} \frac{k\,x^{k-1}}{k!}.$$

If we multiply both sides of this formula by x we get

$$x\,e^x = \sum_{k=1}^{\infty} \frac{k\,x^{k}}{k!}.$$

We will use this formula with x replaced by $\lambda$.

If X is a discrete random variable with a Poisson distribution (with parameter $\lambda$), then its expectation is:

$$E(X) = \sum_{k=0}^{\infty} k\,\frac{\lambda^k e^{-\lambda}}{k!} = e^{-\lambda}\sum_{k=1}^{\infty} \frac{k\,\lambda^k}{k!} = e^{-\lambda}\,\lambda\,e^{\lambda} = \lambda.$$
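A quick check of the Poisson expectation (a sketch; the value of λ is arbitrary, and the series is truncated where its terms become negligible):

```python
from math import exp, factorial

lam = 4.2  # an arbitrary value of the parameter lambda

def p(k):
    """Poisson pmf: lambda^k e^{-lambda} / k!."""
    return lam**k * exp(-lam) / factorial(k)

# Terms beyond k = 60 are vanishingly small for this lambda.
E = sum(k * p(k) for k in range(60))
print(E, lam)  # both approximately 4.2
```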
Note the difference between the expectations of the Poisson and exponential distributions: $\lambda$ for the Poisson, but $1/\lambda$ for the exponential!
By the symmetry of the respective distributions around their "centers", it is pretty easy to conclude that the expectation of the binomial distribution (with parameter n) is n/2, and the expectation of the normal distribution (with parameters $\mu$ and $\sigma$) is $\mu$.
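The binomial claim is easy to confirm directly from the defining formula (a sketch; n = 100 chosen arbitrarily):

```python
from math import comb

n = 100
E = sum(k * comb(n, k) for k in range(n + 1)) / 2**n
print(E, n / 2)  # both 50.0
```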
Exercise: Show that the given function is really a probability density function on the real line (i.e., that it is nonnegative and that its integral from -infinity to infinity is 1). Then calculate the expectation of this random variable.