Lecture 15
\( \newcommand{\set}[1]{\{#1\}} \newcommand{\comprehension}[2]{\{#1\,\vert\,#2\}} \newcommand{\size}[1]{\left\vert#1\right\vert} \newcommand{\true}{\top} \newcommand{\false}{\bot} \newcommand{\limplies}{\rightarrow} \newcommand{\divides}{\mathbin{\vert}} \newcommand{\mult}{\mathbin{\cdot}} \newcommand{\xor}{\oplus} \newcommand{\union}{\cup} \newcommand{\intersect}{\cap} \newcommand{\complement}[1]{\overline#1} \newcommand{\powerset}{\mathcal{P}} \newcommand{\ixUnion}{\bigcup} \newcommand{\ixIntersect}{\bigcap} \newcommand{\Div}{\mathrm{div}} \newcommand{\gcd}{\mathrm{gcd}} \newcommand{\divmod}{\mathop{\mathbf{divmod}}} \newcommand{\div}{\mathop{\mathbf{div}}} \newcommand{\mod}{\mathop{\mathbf{mod}}} \newcommand{\ceiling}[1]{\lceil#1\rceil} \newcommand{\floor}[1]{\lfloor#1\rfloor} \DeclareMathOperator{\Pr}{Pr} \DeclareMathOperator{\rng}{rng} \)Independence, Correlation, and Random Variables
Example 15.1 (The Monty Hall Problem) There was (and is again) a TV show called “Let's Make a Deal.” The original host (way back when I was in high school) was a fellow named Monty Hall. The basic shtick of the program was that contestants were offered a choice between something of value, and a hidden item that about a third of the time was considerably more valuable than the visible item, and about two-thirds of the time was a gag gift (usually of little value) called a “zonk.” E.g., imagine a choice between a TV set and a hidden item, where the hidden item might be a very nice laptop computer, or a gift certificate to Noodles (not a bad thing of itself, but 12 out of 12 contestants would rather have the TV). Particularly successful contestants were offered the opportunity to participate in the “Big Deal,” in which they'd trade their winnings for a shot at the Grand Prize, which was hidden behind one of three doors. They were asked to pick a door, and then the host would reveal what was behind one of the other doors (invariably, a zonk). The contestant was then offered the opportunity to switch doors, and then given the contents of whatever was behind that door.
The question is whether to switch or not. This has been subject to a lot of bad analysis, and is a case where your instincts might easily lead you astray. Counter-intuitively, the correct answer is “switch.” Why?
The probability of getting the door right in the first phase is $1/3$. If you don't switch, that's your probability of winning: you gain no useful information from the revealed door, since there is always a gag-gift door available for the host to show. But if you guessed wrong (which happens with probability $2/3$), the revealing of the other gag-gift door implicitly identifies the Grand Prize door. So switching wins with probability $2/3$. It's a big difference!
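If the argument still feels slippery, a small simulation settles it empirically. The sketch below (in Python; the helper name `play_round` is just a label chosen here) plays the three-door game many times under the rules described above and estimates the win rate for each strategy; it should report roughly $1/3$ for staying and $2/3$ for switching.

```python
import random

def play_round(switch: bool) -> bool:
    """Simulate one round of the game; return True if the contestant wins."""
    doors = [0, 1, 2]
    prize = random.choice(doors)          # door hiding the Grand Prize
    pick = random.choice(doors)           # contestant's initial pick
    # Host opens a door that is neither the pick nor the prize (always a zonk).
    opened = random.choice([d for d in doors if d != pick and d != prize])
    if switch:
        # Switch to the one remaining unopened door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize

trials = 100_000
for strategy, switch in [("stay", False), ("switch", True)]:
    wins = sum(play_round(switch) for _ in range(trials))
    print(f"{strategy}: {wins / trials:.3f}")   # ~0.333 vs ~0.667
```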
Recall:
Definition 15.2 Events $A$ and $B$ are independent if $\Pr(A \intersect B) = \Pr(A) \mult \Pr(B)$.
Exercise 15.3 Show that if $A$ and $B$ are events in a probability space $(\Omega,\Pr)$, and $\Pr(A \vert B) = \Pr(A)$, then $\Pr(B \vert A) = \Pr(B)$.
Independence can be a slippery notion, and people are often misled in their instinctive judgments, cf., “the odds of guessing right in the Monty Hall game are one-in-three, and just because Monty shows you a zonk is no reason to change.” Because it is. Think about it this way. Let $A$ be the event “Door $A$ is the winning door,” and likewise $B$ and $C$:
\begin{align*} \Pr(B \vert \overline{A}) &= \Pr(B \vert \set{B,C})\\ &= 1/2;\\ \end{align*}but
\begin{align*} \Pr(B \vert \overline{A} \intersect \overline{C}) &= \Pr(B \vert \set{B,C} \intersect \set{A,B})\\ &= \Pr(B \vert \set{B}) \\ &= 1.\\ \end{align*}There is information in the door that Monte reveals!
Definition 15.4 The events $A$ and $B$ are said to be positively correlated if $\Pr(A \intersect B) \gt \Pr(A) \mult \Pr(B)$. They are negatively correlated if $\Pr(A \intersect B) \lt \Pr(A) \mult \Pr(B)$.
Note that negative correlation, independence, and positive correlation of events $A$ and $B$ correspond to whether $\Pr(A \intersect B)$ is less than, equal to, or greater than $\Pr(A) \mult \Pr(B)$, respectively; since exactly one of these comparisons must hold, these three properties are trichotomous.
Example 15.5 Consider the roll of a standard 6-sided die, and the events $A = \text{“the roll is even”}$ and $B=\text{“the roll is prime”}$. We can re-express $A$ and $B$ in terms of elementary events, $1,2,\ldots,6$, which indicate that the corresponding number is rolled. Thus, $A = \set{2,4,6}$ and $B=\set{2,3,5}$, so $\Pr(A) = 3/6 = 1/2$, and $\Pr(B) = 3/6 = 1/2$. Now,
\begin{align*} \Pr(A \intersect B) &= \Pr(\set{2}) \\ &= 1/6\\ &\lt 1/2 \mult 1/2\\ &= \Pr(A) \mult \Pr(B)\\ \end{align*}and so $A$ and $B$ are negatively correlated, i.e., if we know that $B$ occurred (the die roll was prime) it becomes less likely that $A$ occurred (the die roll was even), and conversely.
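A quick enumeration of the six equally likely die faces confirms the computation (a minimal Python sketch, with the events defined exactly as in the example):

```python
from fractions import Fraction

omega = set(range(1, 7))                      # outcomes of a fair die roll
A = {n for n in omega if n % 2 == 0}          # "the roll is even"
B = {n for n in omega if n in (2, 3, 5)}      # "the roll is prime"

def pr(event):                                # uniform distribution on omega
    return Fraction(len(event), len(omega))

print(pr(A & B), pr(A) * pr(B))               # 1/6 vs 1/4: negatively correlated
```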
*Exercise 15.6 Let $(\Omega,\Pr)$ be a probability space where $\size{\Omega}$ is prime, and $\Pr$ is the uniform probability distribution on $\Omega$. Show that any two non-trivial events $A$ and $B$ (i.e., events other than $\emptyset$ and $\Omega$) cannot be independent, i.e., they must be either positively or negatively correlated.
Example 15.7 Consider families with three children. Let $A$ be the event “the family has children of both genders,” and $B$ be the event “at most one child is a boy.” We describe atomic events by a three-letter sequence, e.g., MFM to denote a family in which the children are, in birth order, male, female, and male. Assume (contrary to the real world) that all eight possible birth orders are equally likely. We can describe the events $A$ and $B$ by enumerating their elements. Thus, $A = \set{\text{MMF}, \text{MFM}, \text{MFF}, \text{FMM}, \text{FMF}, \text{FFM}}$, $B=\set{\text{MFF},\text{FMF},\text{FFM},\text{FFF}}$, and $A \intersect B = \set{\text{MFF},\text{FMF},\text{FFM}}$. Thus
\begin{align*} \Pr(A) \mult \Pr(B) &= 6/8 \mult 4/8\\ &= 3/4 \mult 1/2\\ &= 3/8\\ &= \Pr(A \intersect B) \end{align*}So $A$ and $B$ are independent, which is not intuitively obvious (unless you have a very refined intuition for probabilistic judgments).
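Since the conclusion is surprising, it is worth confirming by brute force. The following sketch enumerates the eight equally likely birth orders (under the same uniformity assumption) and checks the defining equation for independence exactly, using rational arithmetic:

```python
from fractions import Fraction
from itertools import product

omega = [''.join(seq) for seq in product("MF", repeat=3)]   # the 8 birth orders
A = {w for w in omega if "M" in w and "F" in w}             # both genders present
B = {w for w in omega if w.count("M") <= 1}                 # at most one boy

def pr(event):
    return Fraction(len(event), len(omega))

print(pr(A & B) == pr(A) * pr(B))    # True: A and B are independent
```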
Definition 15.8 Events $A_1, A_2, \ldots, A_k$ are pairwise independent if for all $i,j$, $A_i$ and $A_j$ are independent, i.e., $\Pr(A_i \intersect A_j) = \Pr(A_i)\Pr(A_j)$. These events are mutually independent if for every $I \subseteq \set{1..k}$, $\Pr(\ixIntersect_{i\in I} A_i) = \prod_{i\in I} \Pr(A_i)$. Note that mutual independence is stronger than pairwise independence.
Exercise 15.9 Give an example of a probability space $(\Omega,\Pr)$ and pairwise independent events $A$, $B$, and $C$ which are not mutually independent.
Random Variables
Definition 15.10 A random variable is a function $f: \Omega \to \mathbb{R}$ where $\Omega$ is the sample space of a probability space.
The term “random variable” seems a bit odd, because they're revealed to be (a) not random, and (b) not variables, but functions. Still, it is useful to imagine a random variable as being driven by events chosen at random from the underlying probability space.
Definition 15.11 If $X$ is a random variable, we define $\Pr(X=r) = \Pr(\comprehension{\omega\in \Omega}{X(\omega) = r})$.
Example 15.12 Consider the probability space that consists of flipping a fair coin three times. We can name the elements of the space by sequences, e.g., HTT corresponds to a first flip of heads, and a second and third flip of tails. We assume the flips are mutually independent, so that $\Pr(\text{HTT}) = 1/2 \mult 1/2 \mult 1/2 = 1/8$, and indeed, this is so for all the atomic events.
Consider the random variable $X$ that counts the number of heads in an atomic event, e.g., $X(\text{HTT}) = 1$. We can readily compute $\Pr(X=r)$ for $r \in \rng(X)$ as follows:
$i$ | $\comprehension{\omega\in \Omega}{X(\omega) = i}$ | $\Pr(X=i)$ |
---|---|---|
0 | $\set{\text{TTT}}$ | 1/8 |
1 | $\set{\text{HTT},\text{THT},\text{TTH}}$ | 3/8 |
2 | $\set{\text{THH},\text{HTH},\text{HHT}}$ | 3/8 |
3 | $\set{\text{HHH}}$ | 1/8 |
Of course, there is something very suggestive about this table...
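For what it's worth, the table can be generated mechanically: enumerate the eight equally likely flip sequences, apply $X$ to each, and tally. A small sketch:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

omega = [''.join(seq) for seq in product("HT", repeat=3)]   # the 8 flip sequences
X = lambda w: w.count("H")                                   # number of heads

counts = Counter(X(w) for w in omega)
for r in sorted(counts):
    print(r, Fraction(counts[r], len(omega)))    # 0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8
```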
Bernoulli Trials
Definition 15.13 A Bernoulli Trial is a random variable $B$ whose range is restricted to $\set{0,1}$, where the event $\comprehension{\omega\in \Omega}{B(\omega) = 1}$ is the success event, and $\comprehension{\omega\in \Omega}{B(\omega) = 0}$ is the failure event.
The point behind Bernoulli Trials is to repeat them. This raises some points ordinarily glossed over. The usual setup is that we want to consider $n$ mutually independent trials. But how can an event be independent of itself?
The idea here isn't all that hard, but it's not covered in the text. We pass from a probability space $(\Omega,\Pr)$ to the iterated product space $(\Omega^n,\Pr_\pi)$, where $\Pr_{\pi}((a_1,a_2,\ldots,a_n)) = \Pr(a_1) \Pr(a_2) \cdots \Pr(a_n)$. Then the $i$-th trial is based on the $i$-th coordinate, i.e., if $X_i$ is the random variable corresponding to the $i$-th Bernoulli trial, then $X_i((a_1,a_2,\ldots,a_n)) = X(a_i)$. This actually speaks to an important intuition. We understand the trials to be independent because they depend on pairwise disjoint sets of coordinates. As there is no interaction between the coordinates (our definition of the product distribution is the very embodiment of the definition of mutual independence), we expect that the corresponding random variables will be independent too, and they are.
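To make the construction concrete, here is a small sketch (the names are mine, not the text's) that builds the product distribution $\Pr_\pi$ on $\Omega^2$ for a biased coin and checks that the two coordinate trials satisfy the defining equation for independence:

```python
from fractions import Fraction
from itertools import product

# Base space: a biased coin, Pr(1) = 3/4, Pr(0) = 1/4.
base = {1: Fraction(3, 4), 0: Fraction(1, 4)}

# Product space Omega^2 with the product distribution Pr_pi.
pr_pi = {(a1, a2): base[a1] * base[a2] for a1, a2 in product(base, repeat=2)}

def pr(event):                        # probability of an event (a set of points)
    return sum(pr_pi[w] for w in event)

# X_i is the i-th coordinate trial; its success event fixes the i-th coordinate.
X1_success = {w for w in pr_pi if w[0] == 1}
X2_success = {w for w in pr_pi if w[1] == 1}

print(pr(X1_success & X2_success) == pr(X1_success) * pr(X2_success))   # True
```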
The key question we deal with in Bernoulli trials is to count the number of successes $k$ in $n$ trials. This is itself a random variable, and one that we can easily compute. If $\Pr(\text{success}) = p$, and $\Pr(\text{failure}) = q = 1-p$ (remember, success and failure partition $\Omega$), then we can compute the probability of $k$ successes as follows. Points $a = (a_1,\ldots,a_n) \in \Omega^n$ can be mapped to strings $s$ of length $n$ over the alphabet $\set{\text{S},\text{F}}$, where
\begin{equation*} s_i = \begin{cases} \text{S}, &\text{if $X(a_i) = 1$} \\ \text{F}, &\text{otherwise} \end{cases} \end{equation*}If we view such an $s$ as an event (consisting of the points of $\Omega^n$ that map to it), then we can compute $\Pr_{\pi}(s)$ as a product of $p$'s and $q$'s correspondingly, i.e., $\Pr_{\pi}(\text{SFFS}) = p\mult q \mult q \mult p = p^2q^2$. Now, if a sequence of $n$ Bernoulli trials has $k$ successes, with a probability of success of $p$, then to compute the probability of this event ($b(k;n,p)$ in Rosen) we'll want to consider all $n \choose k$ strings over $\set{\text{S},\text{F}}$ of length $n$ containing $k$ S's, multiplied by the probability of a single event of that form ($p^kq^{n-k} = p^k(1-p)^{n-k}$), so,
\begin{equation*} b(k;n,p) = {n \choose k} p^k(1-p)^{n-k} \end{equation*}which should look like a term in the expansion of a power of a binomial. Which, of course, it is.
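The formula translates directly into code. A minimal sketch (the function name `binomial_prob` is just a label chosen here) uses Python's `math.comb` for the binomial coefficient; note that for fixed $n$ and $p$ the values sum to $1$, as a probability distribution must:

```python
from math import comb

def binomial_prob(k: int, n: int, p: float) -> float:
    """b(k; n, p): probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

print([binomial_prob(k, 3, 0.5) for k in range(4)])        # [0.125, 0.375, 0.375, 0.125]
print(sum(binomial_prob(k, 10, 0.3) for k in range(11)))   # 1.0 (up to rounding)
```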
With this in mind, let's briefly revisit Example 15.7 and Example 15.12. We can now understand both as working with iterated product spaces, and the probabilities induced on them. Indeed, we can understand both as Bernoulli trials (although I'll leave aside the loaded question of whether having a male or female child counts as “success”).
In both cases, $p=1/2$ (no matter which choice we make regarding success), and so the probabilities are going to be
$k$ | $b(k;3,1/2)$ |
---|---|
0 | ${3 \choose 0} (1/2)^0(1/2)^3$ = 1/8 |
1 | ${3 \choose 1} (1/2)^1(1/2)^2$ = 3/8 |
2 | ${3 \choose 2} (1/2)^2(1/2)^1$ = 3/8 |
3 | ${3 \choose 3} (1/2)^3(1/2)^0$ = 1/8 |
As Yogi Berra said, “It's like deja vu all over again.”
*Exercise 15.14 Suppose we have a loaded coin, one for which $\Pr(\text{heads}) = 3/4$. Produce the table of probabilities $b(k;4,3/4)$ as above, corresponding to the probabilities of getting $0,1,\ldots,4$ heads in 4 trials.