Math 480 Course Notes -- May 22, 1996

Crash Course on Probability

The basic concept in probability theory is that of a random variable. A random variable is a function of the basic outcomes in a probability space. To define a probability space (à la Kolmogorov) one needs three ingredients:

  1. A set S of "outcomes" for some experiment. This is the set of all "basic" things that can happen. This set can be a discrete set (such as the set of 5-card poker hands, or the possible outcomes of rolling two dice) or it can be a continuous set (such as an interval of the real number line for measuring temperature, etc).
  2. A sigma-algebra Σ of subsets of S. This means a set Σ of subsets of S (so Σ is itself a subset of the power set of S) that contains the empty set, contains S itself, and is closed under complements and countable unions (and hence also under countable intersections). When the basic set S is finite (or countably infinite), Σ is taken to be all subsets of S. When S is a continuous subset of the real line, this is not possible, and one usually restricts attention to the set of subsets that can be obtained by starting with all open intervals and taking countable intersections and unions of them -- the so-called Borel sets [or, more generally, the Lebesgue measurable subsets of S]. Σ is the collection of sets to which we will assign probabilities.
  3. A function p from Σ to the real numbers that assigns probabilities to events. The function p must have the properties that:
    1. p(S)=1, p(∅)=0.
    2. 0 ≤ p(A) ≤ 1 for every A in Σ.
    3. If A_1, A_2, A_3, ... is a countable (finite or infinite) collection of disjoint sets (i.e., A_i ∩ A_j = ∅ for all i different from j), then p(A_1 ∪ A_2 ∪ ...) = p(A_1) + p(A_2) + ... .

These axioms imply that if A^c is the complement of A, then p(A^c) = 1 - p(A), and the principle of inclusion and exclusion: p(A ∪ B) = p(A) + p(B) - p(A ∩ B), even if A and B are not disjoint.
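The axioms and the two consequences above can be checked concretely on a small finite example. The sketch below is a hypothetical illustration (not from the notes), using the roll of two fair dice with Σ taken to be all subsets of S:

```python
from fractions import Fraction

# Sample space for rolling two fair dice; every subset is an event.
S = {(i, j) for i in range(1, 7) for j in range(1, 7)}

def p(event):
    """Uniform probability: |event| / |S|."""
    return Fraction(len(event), len(S))

A = {(i, j) for (i, j) in S if i + j == 7}  # the sum is 7
B = {(i, j) for (i, j) in S if i == 3}      # the first die shows 3

# Axiom 1: p(S) = 1 and p(empty set) = 0
assert p(S) == 1 and p(set()) == 0

# Complement rule: p(A^c) = 1 - p(A)
assert p(S - A) == 1 - p(A)

# Inclusion-exclusion: p(A union B) = p(A) + p(B) - p(A intersect B)
assert p(A | B) == p(A) + p(B) - p(A & B)

print(p(A), p(B), p(A | B))  # prints 1/6 1/6 11/36
```

Note that A and B are not disjoint (they share the outcome (3,4)), which is exactly the case where the subtracted term matters.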


Discrete Random Variables -- take on only isolated (discrete) values, such as when you are counting something. Usually the values are (a subset of) the integers, and we can assign a probability to any subset of the sample space as soon as we know the probability of any set containing one element, i.e., p({k}) for all k. Usually we are sloppy about the notation and express this as a function p(k) --- and we set p(k)=0 for all numbers not in the sample space. We repeat: for discrete random variables, the value p(k) represents the probability that the event {k} occurs. So any function p from the integers to the (real) interval [0,1] that has the property that

    Σ p(k) = 1    (the sum taken over all integers k)

defines a discrete probability distribution.

Finite Discrete Random Variables:
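As a sketch of a finite discrete random variable, consider the sum of two fair dice (this particular example is an illustrative assumption, not taken from the notes); the distribution takes nonzero values only on k = 2, ..., 12, and those values sum to 1:

```python
from fractions import Fraction

# Distribution of the sum of two fair dice, written as a function p(k)
# with p(k) = 0 for all integers outside the sample space {2, ..., 12}.
def p(k):
    if 2 <= k <= 12:
        return Fraction(6 - abs(k - 7), 36)
    return Fraction(0)

# The defining property of a discrete distribution: the values sum to 1.
assert sum(p(k) for k in range(2, 13)) == 1

print(p(7))  # the most likely sum: prints 1/6
```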

Infinite Discrete Random Variables:
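For an infinite discrete random variable, a standard illustration (again an assumed example, not one from the notes) is a geometric-style distribution p(k) = (1/2)^k on k = 1, 2, 3, ...; the infinite sum of these values is 1, which the sketch below checks via partial sums:

```python
# Geometric-style distribution on the positive integers:
# p(k) = (1/2)**k for k >= 1, and 0 elsewhere.
def p(k):
    return 0.5 ** k if k >= 1 else 0.0

# The partial sums equal 1 - (1/2)**n, so they approach 1 as n grows.
partial = sum(p(k) for k in range(1, 61))
assert abs(partial - 1.0) < 1e-12
```

Here the sample space is infinite, but the normalization condition Σ p(k) = 1 is unchanged; it simply becomes a convergent infinite series.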

Continuous Random Variables: can take on any real values, such as when you are measuring something. Usually the values are (a subset of) the reals, and for technical reasons we can only assign a probability to certain subsets of the sample space (but there are a lot of them). These subsets, either the collection of Borel sets (sets that can be obtained by taking countable unions and intersections of intervals) or the Lebesgue-measurable sets (the Borel sets plus a few other exotic sets), comprise the set Σ. As soon as we know the probability of any interval, i.e., p([a,b]) for all a and b, we can calculate the probability of any Borel set. In fact, it is enough to know the probabilities of "very small" intervals of the form [x,x+dx]. In other words, we can calculate continuous probabilities as integrals of "probability density functions" (pdf's).

A pdf is a function p(x) that takes on only nonnegative values (they don't have to be between 0 and 1, though), and whose integral over the whole sample space (we can use the whole real line if we assign the value p(x)=0 for points x outside the sample space) is equal to 1. In this case, p(x)dx represents (approximately, for small dx) the probability of the set (interval) [x,x+dx], with error that goes to zero faster than dx does. More generally, the probability of the set (interval) [a,b] is:

    p([a,b]) = ∫_a^b p(x) dx
So any nonnegative function p(x) on the real numbers that has the property that

    ∫_{-∞}^{∞} p(x) dx = 1

defines a continuous probability distribution.
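This definition can be sketched numerically. Assuming the exponential pdf p(x) = e^(-x) for x ≥ 0 (an illustrative choice, not an example from the notes), the probability of an interval [a,b] is approximated below by a midpoint Riemann sum, which is exactly the "sum of p(x)dx over small intervals" picture:

```python
import math

# Exponential pdf: p(x) = e**(-x) for x >= 0, and 0 for x < 0.
# Its integral over the whole real line is 1.
def pdf(x):
    return math.exp(-x) if x >= 0 else 0.0

def prob(a, b, n=100_000):
    """Approximate p([a, b]) by a midpoint Riemann sum of the pdf."""
    dx = (b - a) / n
    return sum(pdf(a + (i + 0.5) * dx) for i in range(n)) * dx

# p([0, 1]) should be close to the exact value 1 - e**(-1).
assert abs(prob(0, 1) - (1 - math.exp(-1))) < 1e-6

print(round(prob(0, 1), 6))  # prints 0.632121
```

Extending the interval far to the right (e.g., prob(0, 50)) gives a value very close to 1, reflecting the normalization condition above.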