Lecture 10: The Pigeonhole Principle

\( \newcommand{\set}[1]{\{#1\}} \newcommand{\comprehension}[2]{\{#1\,\vert\,#2\}} \newcommand{\size}[1]{\left\vert#1\right\vert} \newcommand{\true}{\top} \newcommand{\false}{\bot} \newcommand{\limplies}{\rightarrow} \newcommand{\divides}{\mathbin{\vert}} \newcommand{\mult}{\mathbin{\cdot}} \newcommand{\xor}{\oplus} \newcommand{\union}{\cup} \newcommand{\intersect}{\cap} \newcommand{\complement}[1]{\overline#1} \newcommand{\powerset}{\mathcal{P}} \newcommand{\ixUnion}{\bigcup} \newcommand{\ixIntersect}{\bigcap} \newcommand{\Div}{\mathrm{div}} \newcommand{\gcd}{\mathrm{gcd}} \newcommand{\divmod}{\mathop{\mathbf{divmod}}} \newcommand{\div}{\mathop{\mathbf{div}}} \newcommand{\mod}{\mathop{\mathbf{mod}}} \newcommand{\ceiling}[1]{\lceil#1\rceil} \newcommand{\floor}[1]{\lfloor#1\rfloor} \)

The midterm will be in class, Wednesday of 6th week, February 13. If you have an examination accommodation, please make the necessary arrangements with Student Disability Services to schedule your exam, and please choose a time that substantially overlaps with the in-class final.

The lecture will cover material through Lecture 13, inclusive.

The pigeonhole principle is based on the following thought experiment: if you have $n$ pigeons distributed among $m$ pigeonholes, with $n \gt m$, then at least one of the pigeonholes must contain more than a single pigeon.

Theorem 10.1 (The Pigeonhole Principle, aka, PHP) Let $a_1,a_2,\ldots,a_m$ be disjoint sets, such that $\size{\ixUnion_{i=1}^m a_i} > m$. Then there exists an $a_i$ such that $\size{a_i} \geq 2$.

Proof This is the contrapositive of the logically equivalent statement, “if all $\size{a_i} \leq 1$, then $\size{\ixUnion_{i=1}^m a_i} \le m$.” The latter follows easily seen by iterating the sum rule:

\begin{align*} \size{\ixUnion_{i=1}^m a_i} &= \sum_{i=1}^m \size{a_i} \\ &\leq \sum_{i=1}^m 1\\ &= m\\ \end{align*}

$\Box$ Theorem 10.1

There are a number of corollaries of the PHP, e.g.

Corollary 10.2 Suppose $A$ and $B$ are finite sets where $\size{A} > \size{B}$, and $f:A \to B$. Then $f$ cannot be one-one.

Proof Let $b_1, b_2, \ldots, b_k$ be the elements of $B$, and consider $a_i = f^{-1}(b_i)$. The pigeonhole principle applies to the $a_i$'s, and so there exists an $a_i$ such that $\size{a_i} \geq 2$. It immediately follows that $f$ is not one-one.

Example 10.3 (Picking Socks) This is a classic example: suppose you have three colors of socks, jumbled together in an unsorted sock bag. How many socks do you need to pull out (without seeing them) to guarantee you'll get a matching pair? By the pigeonhole principle, four.

Example 10.4 (Hand shaking) This is a somewhat surprising result. Suppose there are $n$ people at a meeting. As the meeting breaks up, there's a round of hand-shaking. Show that there exist two people who shook the same number of hands.

On first appearance, the pigeonhole principle does not help us here, as each person could have shook anywhere between $0$ and $n-1$ hands: thus, $n$ people, $n$ alternatives. But the extremes can't co-exist: it is not possible for there to have been one person who didn't shake anyone else's hand, and another person who shook everyone else's hand: either those two shook hands or they didn't. Thus, we can consider two cases: one in which everyone shook someone else's hand, and another in which no-one shook everyone else's hand. Either way, there are only $n-1$ alternatives, and the pigeonhole principle applies in each case.

Example 10.5 ($1$'s and $0$'s) For any number $n \gt 0$, there exists a $k$ such the decimal representation of $k\mult n$ consists only of zeros and ones. Let $f(m)$ be the natural number whose decimal representations consists of precisely $m$ ones, i.e., $f(3) = 111$. Consider the set of numbers $f(1), f(2), \ldots, f(n+1)$. This set contains $n+1$ elements, and by the pigeonhole principle, there must be two distinct elements $f(i)$ and $f(j)$ which have the same remainder when divided by $n$. Without loss of generality, $i \gt j$, and so $f(i) - f(j)$ is a multiple of $n$ whose decimal representation consists only of zeros and ones.

Exercise 10.6 Calculate a multiple of $12$ whose decimal representation consists only of ones and zeros. Calculate a multiple of $12$ whose decimal representation consists only of twos and zeros. [Hint: you can do these exercises by hand if you think about them.]

The pigeonhole principle reflects, in a mathematically precise and “tight” way, the intuition that we can't build a big thing out of a few small things. We can push on this intuition, and obtain a number of similar results.

Theorem 10.7 (The Generalized Pigeonhole Principle) Let $a_1,a_2,\ldots,a_k$ be disjoint sets, such that $\size{\ixUnion_{i=1}^k a_i} = n$. Then there exists an $a_i$ such that $\size{a_i} \geq \ceiling{n/k}$.

Note here $\ceiling{\cdot} : \mathbb{R} \to \mathbb{Z}$ where $\ceiling{r}$ is the least integer greater than or equal to $r$. Similarly, we define $\floor{r}$ to be the greatest integer that is less than or equal to $r$.

Proof Again, we view the stated theorem as the contrapositive of a simpler statement. Note that if $a_i \lt \ceiling{n/k}$, it immediately follows that $a_i \lt n/k$ (otherwise we'd contradict the definition of $\ceiling{\cdot}$).

\begin{align*} \size{\ixUnion_{i=1}^k a_i} &= \sum_{i=1}^k \size{a_i}\\ &\lt \sum_{i=1}^k n/k\\ &= k (n/k)\\ &= n\\ \end{align*}

as required.

$\Box$ Theorem 10.7

Example 10.8 (The Sock Problem for Centipedes) The sock problem is a bit harder for centipedes, who have to pick out enough socks to guarantee that there are at least $100$ socks of the same color. Assuming again that there are three colors of socks, the centipede must take the least number $n$ such that $\ceiling{n/3} = 100$. This is equivalent to requiring that $n/3 \gt 99$, i.e., that $n \gt 297$. So the centipede has to choose $298$ socks to be guaranteed to have $100$ socks of the same color. At some level, this is completely obvious: the most unsuccessful pick for the centipede is when he has 99 socks of each color, at which point, one more sock of any color puts him over the top.

Example 10.9 Assuming that telephone numbers take the form $(dnn) dnn-nnnn$ where $n$ can be any digit, and $d$ can be any digit except for $0$ or $1$, how many area codes are necessary to support a metropolis that has $25$ million phones?

First, we compute the number of phone numbers in a given area code: $8,000,000$. If we associate each phone number with its exchange and station (i.e., ignore the area code), we know that there must be at least one number that requires $\ceiling{25,000,000 / 8,000,000} = 4$ distinct area codes.

Example 10.10 (Subset sum) Given any six distinct natural numbers between 1 and 9, there must be a pair that sums to $10$. To see this, we divide the numbers from $1$ to $9$ into five sets: $\set{1,9}$, $\set{2,8}$, $\set{3,7}$, $\set{4,6}$, $\set{5}$. As we have six distinct numbers, at least two of them must belong to the same set. But all of the sets in this partition containing at least two distinct numbers consist of two distinct numbers that sum to $10$.

*Exercise 10.11 Show that the subset sum property above also holds for $11$, i.e., given $6$ distinct natural numbers between $1$ and $10$, there must exist a pair which sum to $11$. Then propose and prove a generalization of the subset sum property that holds for an arbitrary $n$. [Note that the generalized result is vacuous for $n \lt 3$.]

Example 10.12 Among any set $S$ of $n+1$ positive numbers not exceeding $2n$, there must be one number that is a multiple of another.

Let $f: \mathbb{N^+}\to \mathbb{N^+}$ be such that $f(k)$ is the greatest odd divisor of $k$. There are only $n$ odd numbers less than $2n$, so necessarily, there are two numbers $a,b\in S$ such that $f(a) = f(b) = m$ for some odd number $m$. Without loss of generality, $a \gt b$, and we easily see that $b \divides a$, as both numbers are of the form $2^k \cdot m$.

Example 10.13 Any sequence $a_1,a_2,\ldots,a_k$ of distinct numbers of length $n^2+1$ must have a strictly monotonic (either strictly increasing or strictly decreasing) subsequence of length at least $n+1$.

The idea is to map each $a_k$ to a pair $(i_k,d_k)$ representing the length of the longest strictly increasing ($i_k$) and decreasing ($d_k$) subsequence beginning with $a_k$. Assume for a contradiction that there are no strictly monotonic subsequences of length $n$ or greater. Then all of the $(i_k,d_k)$ pairs are drawn from $\set{0,\ldots,n-1} \times \set{0,\ldots,n-1}$, and there are $n^2$ such pairs. By the pigeonhole principle, there are numbers $a_s$ and $a_t$ such that $(i_s,d_s) = (i_t,d_t)$. We know that $a_s \not= a_t$, as the elements of the sequence were distinct. Therefore $a_s \lt a_t$ or $a_s \gt a_t$. Without loss of generality, $a_s \lt a_t$. But now, there's a increasing sequence that begins with $a_s$, and continues with $a_t$ and $i_t-1$ additional terms, hence an increasing sequence of length $i_s+1$ starting at $a_s$. This is a contradiction.

Our next example uses the pigeonhole principle to establish a basic of Ramsey Theory. Consider complete graphs $G$, in which the edges have two colors. The basic object of interest in Ramsey Theory is monochromatic cliques, i.e., complete subgraphs $G'$ of $G$, in which all edges have the same color. The Ramsey Number $R(m,n)$ is the smallest number $r$ necessary to guarantee that blue/red edge-colored complete graph with $r$ vertices will have monochromatic blue clique of size $m$, or red clique of size $n$.

Theorem 10.14 $R(3,3) \leq 6$. Let $G$ be a graph containing at 6 vertices, and let $v$ be a designated vertex. As $v$ has 5 neighbors, there must be a set vertices of size $3$ connected to $v$ by edges of the same color, without loss of generality, blue. Now, either this set of three vertices contains two connected by a blue edge, in which case those two together with $v$ form a blue monochromatic triangle, or the three together form a monochromatic red triangle. Either way, $G$ contains a monochromatic clique of size $3$ as required.

My particular experience with $R(3,3)$ began with a game proposed by one of my colleagues: the game of Hex. The idea was simple, you have two players, one with a blue pen, and one with a red. They start with six dots on a sheet of paper, arranged in a hexagon. The players alternate turns, connecting two unconnected dots with either a blue or red line, according the the pen they were given. The first player to create a monochromatic triangle loses. The problem was to determine which player had a winning strategy. I remember writing a very clever program that answered the question, at least to my satisfaction, more than 30 years ago. But time's moved on, and I no longer remember the answer, nor can I find the code. So I leave it as a challenge to you guys. Just remember, I did it using a computer with a few tens of megabytes of main memory.