Lecture 14: Probability Theory
\( \newcommand{\set}[1]{\{#1\}} \newcommand{\comprehension}[2]{\{#1\,\vert\,#2\}} \newcommand{\size}[1]{\left\vert#1\right\vert} \newcommand{\true}{\top} \newcommand{\false}{\bot} \newcommand{\limplies}{\rightarrow} \newcommand{\divides}{\mathbin{\vert}} \newcommand{\mult}{\mathbin{\cdot}} \newcommand{\xor}{\oplus} \newcommand{\union}{\cup} \newcommand{\intersect}{\cap} \newcommand{\complement}[1]{\overline#1} \newcommand{\powerset}{\mathcal{P}} \newcommand{\ixUnion}{\bigcup} \newcommand{\ixIntersect}{\bigcap} \newcommand{\Div}{\mathrm{div}} \newcommand{\gcd}{\mathrm{gcd}} \newcommand{\divmod}{\mathop{\mathbf{divmod}}} \newcommand{\div}{\mathop{\mathbf{div}}} \newcommand{\mod}{\mathop{\mathbf{mod}}} \newcommand{\ceiling}[1]{\lceil#1\rceil} \newcommand{\floor}[1]{\lfloor#1\rfloor} \DeclareMathOperator{\Pr}{Pr} \)The midterm will be in class, Wednesday of 6th week, February 13. If you have an examination accommodation, please make the necessary arrangements with Student Disability Services to schedule your exam, and please choose a time that substantially overlaps with the in-class midterm.
The midterm will cover material through Lecture 13, inclusive.
These notes are based largely on Chapter 7 of Professor Babai's notes.
Definition 14.1 A probability space is a finite set $\Omega \not= \emptyset$ together with a function $\Pr : \Omega \to \mathbb{R}$ such that:
- $\forall \omega \in \Omega, \Pr(\omega) \gt 0$, and
- $\sum_{\omega \in \Omega} \Pr(\omega) = 1$.
We will refer to $\Omega$ as the sample space, and $\Pr$ as the probability distribution.
An event is a subset $A \subseteq \Omega$, and we lift $\Pr : \mathcal{P}(\Omega) \to \mathbb{R}$ via $\Pr(A) = \sum_{\omega\in A}\Pr(\omega)$, which we read as the probability of $A$. The atomic events are singleton sets $\set{\omega}$, for which $\Pr(\set{\omega}) = \Pr(\omega)$. The trivial events are those with probability $0$ or $1$, i.e., $\emptyset$ or $\Omega$ itself.
Definition 14.2 The uniform distribution over a sample space $\Omega$ defines $\Pr(\omega) = 1 / \size{\Omega}$ for all $\omega \in \Omega$, and therefore $\Pr(A) = \size{A} / \size{\Omega}$. In the uniform distribution, computing a probability amounts to counting.
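To make the counting interpretation concrete, here is a minimal sketch in Python; the helper name `pr_uniform` is ours, not from the notes.

```python
from fractions import Fraction

def pr_uniform(event, omega):
    """Probability of `event` under the uniform distribution on `omega`:
    Pr(A) = |A| / |Omega|, computed exactly as a rational number."""
    return Fraction(len(event), len(omega))

# Example: rolling one fair die, the event "roll is even".
omega = {1, 2, 3, 4, 5, 6}
evens = {2, 4, 6}
print(pr_uniform(evens, omega))  # -> 1/2
```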
Example 14.3 In the game of bridge, a standard $52$ card deck is dealt to four players, known as North, East, South, and West. What is the probability that North holds all of the aces?
First, we'll reason somewhat informally. We can re-imagine the process of the deal as one in which there are $52$ slots, $13$ assigned to each hand, and the cards are distributed from the top uniformly among the remaining slots. We'll start by putting the four aces on top of the deck. The probability that the first ace is dealt to North is $13/52$. For the second card, also an ace, there are $51$ remaining slots. Given that the first ace was dealt to North, only $12$ of North's slots remain, so the probability that the second card also goes to North is $12/51$. In like manner, the probability that the third card goes to North is $11/50$, and for the fourth card, $10/49$. So the probability of this happening is
\begin{equation*} {13 \over 52} \mult {12 \over 51} \mult {11 \over 50} \mult {10 \over 49} \approx 0.264\% \end{equation*}Of course, while the result seems reasonable, and this may be a familiar way of calculating the probability, it doesn't map very well onto our discussion of probability spaces—at least, not without a fair amount of mathematical justification which we haven't provided. So let's try a formal approach.
First, we have to identify our probability space. For us, the atomic events are deals of the deck, i.e., ordered $4$-tuples of $13$-card sets whose union is the full deck, with the uniform distribution. We can count the total number of atomic events via the multinomial coefficient
\begin{equation*} \size{\Omega} = {52 \choose 13,13,13,13} \end{equation*}Now, the event of interest $N_{\text{4-aces}}$ consists of all deals of the deck in which the first coordinate (the North player) gets all the aces. To count the number of such hands, we consider how the remaining $48$ cards are dealt: $9$ to North, and $13$ to each of the remaining players. So
\begin{equation*} \size{N_{\text{4-aces}}} = {48 \choose 9,13,13,13} \end{equation*} This allows us to compute \begin{align*} \Pr(N_{\text{4-aces}}) &= {\size{N_{\text{4-aces}}} \over \size{\Omega}} \\ &= {{48 \choose 9,13,13,13} \over {52 \choose 13,13,13,13}}\\ &= {{13 \mult 12 \mult 11 \mult 10} \over {52 \mult 51 \mult 50 \mult 49}}\\ &\approx 0.264\% \end{align*}as before. So our informal approach resulted in the correct answer, at least, this time.
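As a sanity check, this computation can be verified numerically; the `multinomial` helper below is our own, built from Python's `math.comb`:

```python
from fractions import Fraction
from math import comb

def multinomial(n, parts):
    """Multinomial coefficient n choose (parts), as a product of binomials."""
    result, remaining = 1, n
    for p in parts:
        result *= comb(remaining, p)
        remaining -= p
    return result

total_deals = multinomial(52, [13, 13, 13, 13])
favorable = multinomial(48, [9, 13, 13, 13])

p = Fraction(favorable, total_deals)
# Agrees with the informal product (13*12*11*10)/(52*51*50*49).
assert p == Fraction(13 * 12 * 11 * 10, 52 * 51 * 50 * 49)
print(float(p))  # approximately 0.00264, i.e., 0.264%
```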
*Exercise 14.4 Calculate the probability that each of the four hands in bridge will be dealt an ace. Provide both an informal analysis, and an analysis based on counting deals.
Theorem 14.5 $\Pr(A \union B) + \Pr(A \intersect B) = \Pr(A) + \Pr(B)$.
This is essentially the inclusion-exclusion theorem (cf., Theorem 2.12), but where each atomic event may have its own weight.
Definition 14.6 Events $A$ and $B$ are disjoint if $A \intersect B = \emptyset$.
Theorem 14.7 $\Pr(\ixUnion_{i=1}^k A_i) \leq \sum_{i=1}^k \Pr(A_i)$, with equality holding if and only if the $A_i$ are pairwise disjoint.
This follows by iterating Theorem 14.5 and dropping the (nonnegative) intersection terms; equality holds exactly when all of those terms are $0$.
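A quick numerical illustration of Theorem 14.7 on the one-die space (the event names are ours):

```python
from fractions import Fraction

# Uniform space of one die roll.
omega = {1, 2, 3, 4, 5, 6}
pr = lambda E: Fraction(len(E), len(omega))

A1, A2 = {1, 2, 3}, {3, 4}          # overlapping: strict inequality
assert pr(A1 | A2) < pr(A1) + pr(A2)

B1, B2 = {1, 2}, {5, 6}             # pairwise disjoint: equality
assert pr(B1 | B2) == pr(B1) + pr(B2)
```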
Definition 14.8 If $A$ and $B$ are events with $\Pr(B) \gt 0$, then the conditional probability of $A$ relative to $B$ is
\begin{equation*} \Pr(A \vert B) = {\Pr(A \intersect B) \over \Pr(B)} \end{equation*}
Picture a Venn diagram in which $A$ occupies a small part of $\Omega$ but most of $B$: conditional probability can be quite different from (unconditional) probability, e.g., $\Pr(A)$ can be much less than $1/2$ while $\Pr(A \vert B)$ is much more than $1/2$.
Exercise 14.9 Show that if $\Omega$ together with $\Pr$ is a probability space, and $B \subseteq \Omega$ with $\Pr(B) \gt 0$, then if we define $\Pr_B(b) = \Pr(b) / \Pr(B)$ for $b \in B$, then $B$ together with $\Pr_B$ is also a probability space.
Remark 14.10 $\Pr(A \intersect B) = \Pr(A \vert B) \mult \Pr(B)$.
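Definition 14.8 and Remark 14.10 can be illustrated on the one-die space; the helper name `pr_given` and the example events are ours:

```python
from fractions import Fraction

# Uniform space of one die roll.
omega = {1, 2, 3, 4, 5, 6}
pr = lambda E: Fraction(len(E), len(omega))

def pr_given(A, B):
    """Conditional probability Pr(A | B), assuming Pr(B) > 0."""
    return pr(A & B) / pr(B)

A = {2, 4, 6}   # "roll is even"
B = {4, 5, 6}   # "roll is at least 4"
assert pr_given(A, B) == Fraction(2, 3)      # (2/6) / (3/6)
assert pr(A & B) == pr_given(A, B) * pr(B)   # Remark 14.10
```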
Example 14.11 Consider the roll of $3$ dice. (a) What is the probability that the sum of the dice is $9$? (b) What is the probability that the first die shows $5$? (c) What is the conditional probability that the first die shows $5$, given that the sum of the dice is $9$?
First, we have to clarify what our sample space is. We'll start by assuming that the dice are distinguishable, so that there are $6^3 = 216$ points in our uniform sample space.
For (a), we want to count the number of dice rolls that sum to $9$. This amounts to determining the number of solutions to $x_1 + x_2 + x_3 = 9$, where each $x_i \in \set{1,2,3,4,5,6}$. We can rephrase this as the number of solutions to $x_1 + x_2 + x_3 = 6$, where each $x_i \in \set{0,1,2,3,4,5}$. We can count this as the number of $6$-combinations with replacement from a set of size $3$, noting that three of the solutions to that equation, $(6,0,0)$, $(0,6,0)$, and $(0,0,6)$, don't meet our constraints, so the count is ${6 + 3 - 1 \choose 6} - 3 = {8 \choose 2} - 3 = 25$, and the probability is $25/216 \approx 11.6\%$.
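This count can be double-checked by brute-force enumeration of all $216$ rolls:

```python
from itertools import product
from fractions import Fraction

# Enumerate all ordered rolls of three distinguishable dice.
rolls = list(product(range(1, 7), repeat=3))
assert len(rolls) == 216

# Count those summing to 9.
sum_is_9 = [r for r in rolls if sum(r) == 9]
assert len(sum_is_9) == 25

print(Fraction(len(sum_is_9), len(rolls)))  # -> 25/216
```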
For (b), the answer is clearly $1/6$, which we can compute formally as ${1 \mult 6 \mult 6 \over 6 \mult 6 \mult 6}$.
For (c), we have another counting problem. Because we know the probability that the sum of the dice is $9$, it suffices to compute the probability that the sum of the dice is $9$, and the first die is $5$. If the first die is $5$ and all three sum to $9$, the sum of the last two must be $4$, whereupon by inspection there are only three solutions possible: $(5,1,3)$, $(5,2,2)$ and $(5,3,1)$. So the conditional probability is:
\begin{align*} \Pr(\text{the first die is $5$} \vert \text{the sum of the dice is $9$}) &= { \Pr(\text{the first die is $5$} \intersect \text{the sum of the dice is $9$}) \over \Pr(\text{the sum of the dice is $9$}) }\\ &= { \size{\text{the first die is $5$} \intersect \text{the sum of the dice is $9$}} \over \size{\text{the sum of the dice is $9$}} }\\ &= {3 \over 25}\\ &\lt {1 \over 6} \end{align*}

Definition 14.12 A partition of $\Omega$ is a family of pairwise disjoint events $H_1,\ldots,H_m$ covering $\Omega$, i.e.,
- $\ixUnion_{i=1}^m H_i = \Omega$, and
- $\forall i \not= j, H_i \intersect H_j = \emptyset$.
Theorem 14.13 (The Law of Complete Probability) Given a partition $H_1,\ldots,H_m$ of $\Omega$ with each $\Pr(H_i) \gt 0$, and an event $A$,
\begin{equation*} \Pr(A) = \sum_{i=1}^m \Pr(A \vert H_i) \mult \Pr(H_i) \end{equation*}The significance of Theorem 14.13 is that it is sometimes easier to calculate the right-hand side of the equality than the left-hand side.
Example 14.14 This example is adapted from the Wikipedia article Law of total probability. Suppose that two factories supply light bulbs to the market. Factory X's bulbs work for over 5000 hours in 99% of cases, whereas factory Y's bulbs work for over 5000 hours in 95% of cases. It is known that factory X supplies 60% of the total bulbs available. What is the chance that a purchased bulb will work for longer than 5000 hours?
We compute using the Law of Total Probability:
\begin{align*} \Pr(\text{light bulb lasts $\gt$ 5000 hours}) &= \Pr(\text{light bulb lasts $\gt$ 5000 hours}\vert\text{Factory X})\Pr(\text{Factory X})\\ &\phantom{= }+ \Pr(\text{light bulb lasts $\gt$ 5000 hours}\vert\text{Factory Y})\Pr(\text{Factory Y})\\ &= 0.99 \mult 0.6 + 0.95 \mult 0.4\\ &= 0.974\\ &= 97.4\% \end{align*}

Exercise 14.15 Prove Theorem 14.13.
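The computation in Example 14.14 can be reproduced directly; the variable names below are ours:

```python
# Law of total probability with the partition {Factory X, Factory Y}.
p_long_given_X, p_X = 0.99, 0.60
p_long_given_Y, p_Y = 0.95, 0.40

p_long = p_long_given_X * p_X + p_long_given_Y * p_Y
assert abs(p_long - 0.974) < 1e-9

print(round(p_long, 3))  # -> 0.974
```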
Definition 14.16 Events $A$ and $B$ are independent if $\Pr(A \intersect B) = \Pr(A) \Pr(B)$.
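As a quick check of Definition 14.16 on two distinguishable dice (the event names are ours): "first die is even" and "second die is even" are independent, while "first die is even" and "sum is $4$" are not.

```python
from fractions import Fraction
from itertools import product

# Uniform space of 36 ordered pairs of die rolls.
omega = set(product(range(1, 7), repeat=2))
pr = lambda E: Fraction(len(E), len(omega))

A = {w for w in omega if w[0] % 2 == 0}  # first die even
B = {w for w in omega if w[1] % 2 == 0}  # second die even
C = {w for w in omega if sum(w) == 4}    # sum is 4

assert pr(A & B) == pr(A) * pr(B)        # independent
assert pr(A & C) != pr(A) * pr(C)        # not independent
```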
Exercise 14.17 Show that if $\Pr(B) \gt 0$, then $A$ and $B$ are independent if and only if $\Pr(A\vert B) = \Pr(A)$.
Exercise 14.18 Let $A$ and $B$ be independent events. Show $\overline{A}$ and $\overline{B}$ are also independent, where $\overline{A} = \Omega \setminus A$.
*Exercise 14.19 Prove: $\Pr(\overline{A}) = 1-\Pr(A)$.