Lecture 9

\( \newcommand{\set}[1]{\{#1\}} \newcommand{\comprehension}[2]{\{#1\,\vert\,#2\}} \newcommand{\size}[1]{\left\vert#1\right\vert} \newcommand{\true}{\top} \newcommand{\false}{\bot} \newcommand{\limplies}{\rightarrow} \newcommand{\divides}{\mathbin{\vert}} \newcommand{\mult}{\mathbin{\cdot}} \newcommand{\xor}{\oplus} \newcommand{\union}{\cup} \newcommand{\intersect}{\cap} \newcommand{\complement}[1]{\overline#1} \newcommand{\powerset}{\mathcal{P}} \newcommand{\ixUnion}{\bigcup} \newcommand{\ixIntersect}{\bigcap} \newcommand{\Div}{\mathrm{div}} \newcommand{\gcd}{\mathrm{gcd}} \newcommand{\divmod}{\mathop{\mathbf{divmod}}} \newcommand{\div}{\mathop{\mathbf{div}}} \newcommand{\mod}{\mathop{\mathbf{mod}}} \)

The midterm is scheduled for 7:00pm-8:15pm on Monday, February 6th, in Kent 120. Please try to make every possible arrangement necessary to be able to take the exam at that time. If there is an impossible conflict, please let me know as soon as possible.

Counting

Counting is ubiquitous in the design and analysis of communication and computing systems.

Example 9.1 (Illinois License Plates) Let's consider a real-world motivating problem, which illustrates a number of standard issues. The State of Illinois has issued license plates in a variety of formats. For the sake of simplicity, let's consider just passenger vehicles, and standard issues (i.e., we won't consider vanity or other special series plates). There have been several different series used (in what follows, “n” denotes a digit, and "a" denotes a letter):

We might ask the question, “How many distinct Illinois passenger vehicle license plates are possible?” Or perhaps even the question, “Why are there so freaking many different formats?!” As it turns out, these are not unrelated problems. And it also turns out that we're going to have to make some educated guesses along the way, as the documentation available isn't perfect.

Theorem 9.2 (The Product Rule) For all finite sets $A$ and $B$, $\size{A \times B} = \size{A} \mult \size{B}$, i.e., the number of ordered pairs of the form $(a,b)$ is the product of the number of $a$'s and the number of $b$'s.

This was actually Exercise 2.10.

The product rule generalizes to tuples of arity greater than $2$, in the expected way: if we define the iterated cartesian product $\prod_{i=1}^n A_i$ to mean the set of all $n$-tuples whose $i$-th coordinate is drawn from $A_i$, then

Corollary 9.3 (The Iterated Product Rule)

\begin{equation*} \size{\prod_{i=1}^n A_i} = \prod_{i=1}^n \size{A_i}. \end{equation*}

Note that we're using the notation $\prod$ ambiguously here: it refers to the iterated cartesian product on the left-hand side of the equation, and to iterated multiplication on the right-hand side.
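As a quick sanity check of the iterated product rule, here is a minimal Python sketch. The three sets are arbitrary illustrative choices, not anything from the lecture; the point is only that enumerating the tuples and multiplying the sizes give the same number.

```python
from itertools import product
from math import prod

# Three small example sets (hypothetical, chosen only for illustration).
sets = [{"a", "b"}, {0, 1, 2}, {"x", "y", "z", "w"}]

# Left-hand side: enumerate the iterated cartesian product and count its tuples.
tuples = list(product(*sets))
iterated_product_size = len(tuples)

# Right-hand side: multiply the sizes of the individual sets.
product_of_sizes = prod(len(s) for s in sets)

print(iterated_product_size, product_of_sizes)
```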

For example, let's consider how many license plates could have been issued with the first format: nnnnn. We know that each digit takes on one of $10$ values, and so by the product rule, there are $10^5$ possible digit strings. But the real world has a constraint that this doesn't account for: leading zeros were suppressed, introducing an “elision rule” for this format. License plate numbers really were numbers! But this meant that there could be no plate that corresponded to 00000! So the actual count is $10^5-1 = 99,999$.

Our second rule of counting is:

Theorem 9.4 (Inclusion-Exclusion) For all finite sets $A$ and $B$, \begin{equation*} \size{A \union B} = \size{A} + \size{B} - \size{A \intersect B}. \end{equation*}

We saw this before as Theorem 2.12. An important special case comes when the sets $A$ and $B$ are disjoint, i.e., when $A \intersect B = \emptyset$:

Theorem 9.5 (The Sum Rule) Let $A$ and $B$ be disjoint finite sets. \begin{equation*} \size{A \union B} = \size{A} + \size{B} \end{equation*}

This, by the way, gives us an alternative way to understand the count of the nnnnn series. Let's use d to indicate a non-zero digit. We can understand the nnnnn series as being a disjoint union of the formats d, dn, dnn, dnnn, and dnnnn.

And then we use the product rule to compute the number of such plates as $9$, $9 \mult 10$, $9 \mult 10^2$, $9\mult 10^3$, and $9\mult 10^4$ respectively, and the disjointness of the series to sum these counts, resulting in $99,999$.
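The two ways of counting the nnnnn series can be compared in a short Python sketch. It assumes, as in the text, that a plate is a number from 1 to 99,999 written without leading zeros; the variable names are illustrative.

```python
# Direct count: every number from 1 to 99999 is a plate (00000 is excluded).
direct = len(range(1, 10**5))

# Sum rule: split by length. A plate with k free digits after its leading
# non-zero digit contributes 9 * 10**k plates, by the product rule.
by_length = [9 * 10**k for k in range(5)]
sum_rule = sum(by_length)

print(by_length, sum_rule, direct)
```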

The format annnn was introduced for reasons of vanity rather than capacity: people wanted to have low numbers, and they were willing to pay for them. If you couldn't get $1$, with a little money and a little luck, you could get $A1$. The records available through the Illinois Secretary of State suggest that only a limited number of letters were actually used, and they don't include "O" or "I," either of which is likely to be confused with a digit, and so it seems reasonable that the omission was intentional. So let's introduce "b" as a format symbol for a letter other than “I” or “O.” As a first cut, the number of license plates of the form bnnnn was $24 \mult 10^4$. But leading zeros were suppressed, making an “all zero” choice impossible, so there were $24 \mult 9,999$ possible plates in the annnn series, for a total of $99,999 + 24 \mult 9,999 = 339,975$ between the two early series.
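The arithmetic for the two early series can be reproduced with a small Python sketch. It assumes the reading given in the text: 24 usable letters (no "I" or "O") and a numeric part between 1 and 9,999.

```python
from string import ascii_uppercase

# Letters allowed in the "b" position: everything except I and O.
allowed_letters = [c for c in ascii_uppercase if c not in "IO"]

# With leading zeros suppressed, the numeric part is one of 1 .. 9999.
numeric_parts = range(1, 10**4)

# Product rule for the lettered series, sum rule across the two series.
lettered_series = len(allowed_letters) * len(numeric_parts)
numeric_series = len(range(1, 10**5))  # the nnnnn series: 1 .. 99999
early_total = numeric_series + lettered_series

print(lettered_series, early_total)
```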

*Exercise 9.6 Give a more precise analysis of the bnnnn format, much as we did for the nnnnn format in the remarks immediately after Theorem 9.5, dividing the format into distinct forms without elisions, and carrying out the count of each format by the product rule, and of the whole by the sum rule.

We now want to accommodate the additional series

A reasonable question is whether or not these represent truly distinct series, and the answer appears to be “no.” The precise details of how a number was formatted varied somewhat from year to year, but there weren't distinct examples of these series used within a single year. Moreover, as before, leading zeros were suppressed, so at this point, it seems that we need only consider two series to handle the first few years:

i.e., $9,999,999 + 24\mult 9,999 = 10,239,975$.

Of course, this gets at another large ambiguity in such systems, and that is the re-issuance of numbers that have fallen into disuse, as well as obsolete formats (like an nnn), and how they should be accounted for. We're trying to make reasonable decisions here, but the real world isn't always as neat as we'd like it to be.

We now have to account for the “modern” formats, which have been in use since the 1960s:

In these formats, the n's are not subject to elision rules, thus AA 0000 is a possible license plate, but AA 000 is not. But at the same time, there's the issue of distinguishing closely related formats which use the same block sizes (i.e., accounting for white space). In particular, it seems likely that the second letter in the aan nnnn format can't be an "O" or an "I," lest such a plate be confused with a plate from the ann nnnn series. We'll assume that the last letter in a sequence of two or more letters can't be an "O" or an "I" for the same reason.

So we have $(26-3) \mult (26-2) \mult 10^4$ plates with the format aa nnnn, we have $(26-3) \mult 26 \mult (26-2) \mult 10^3$ plates with the aaa nnn format, we have $(26-3) \mult 10^5$ plates with the a nn nnn format, $(26-3) \mult 10^6$ plates with the a nnn nnn format (which we'll assume is the same as the ann nnnn format), and $(26 - 3) \mult (26-2) \mult 10^5$ plates with the aan nnnn format, for a total (including the two earlier formats) of

\begin{align*} &(10^7-1) \tag{n nnn nnn}\\ &+ 24 \mult (10^4-1)\tag{annnn}\\ &+ (26-3) \mult (26-2) \mult 10^4 \tag{aa nnnn}\\ &+ (26-3) \mult 26 \mult (26-2) \mult 10^3 \tag{aaa nnn}\\ &+ (26-3) \mult 10^6 \tag{a nnn nnn,ann nnnn}\\ &+ (26-3) \mult (26-2) \mult 10^5 \tag{aan nnnn}\\ &= 108,311,975 \end{align*}
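The displayed sum is easy to re-evaluate mechanically. The following Python sketch simply transcribes the tagged terms of the equation above into a dictionary (the dictionary itself is an illustrative device) and applies the sum rule across the formats.

```python
# One entry per tagged line of the displayed sum; each value is a product-rule count.
counts = {
    "n nnn nnn": 10**7 - 1,
    "annnn": 24 * (10**4 - 1),
    "aa nnnn": (26 - 3) * (26 - 2) * 10**4,
    "aaa nnn": (26 - 3) * 26 * (26 - 2) * 10**3,
    "a nnn nnn / ann nnnn": (26 - 3) * 10**6,
    "aan nnnn": (26 - 3) * (26 - 2) * 10**5,
}

# Sum rule: the formats are treated as pairwise disjoint.
total = sum(counts.values())

for fmt, n in counts.items():
    print(f"{fmt:>22}: {n:>12,}")
print(f"{'total':>22}: {total:>12,}")
```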

Just remember, there was a time when $99,999$ seemed like more vehicles than could possibly exist in the state, too.

Exercise 9.7 What is the next format that is likely to be used, and how many additional plates will it make possible?

Some simpler examples

Example 9.8 (Grandma's Calendars) We tend to think of recycling as a relatively recent phenomenon, but folks have been doing it for a long time. My grandmother had a set of calendars that were printed on linen, and which hung on the kitchen door. If you think about it for a moment, you don't need a calendar for every year: you only need $14 = 7 \mult 2$ calendars, which accounts for the day of the week of January 1st, and whether or not a given year has a leap day. She reused those calendars year after year, e.g., using the 1935 calendar (January 1st is Tuesday, no leap day) again in 1974 (which had the same pattern).

Although a simple example, this exemplifies the product rule in action: to specify a particular calendar, you have to specify two independent pieces of information: the day that January 1st falls on (7 choices), and whether or not the year has a leap day (2 choices). Every pair of choices corresponds to a calendar, and every calendar to a pair of choices. There is a simple 1-1 correspondence, allowing us to count via the product rule.
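The 1-1 correspondence between calendars and pairs of choices can be explored with Python's standard library. In this sketch, `calendar_type` is a hypothetical helper name, and the choice of the 28-year window starting in 1928 is only an illustration.

```python
from calendar import isleap
from datetime import date

def calendar_type(year):
    """Return (weekday of January 1st, has leap day); Monday is 0, Sunday is 6."""
    return (date(year, 1, 1).weekday(), isleap(year))

# Product rule: 7 choices of weekday times 2 choices of leap status.
all_types = {(d, leap) for d in range(7) for leap in (False, True)}

# The calendar types that actually occur in a 28-year window.
seen = {calendar_type(y) for y in range(1928, 1928 + 28)}

# The pair of choices that identifies the calendar for 1974.
type_1974 = calendar_type(1974)

print(len(all_types), len(seen), type_1974)
```

Two years can share a linen calendar exactly when `calendar_type` returns the same pair for both.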

Exercise 9.9 Assume that grandma was given her set of 14 calendars in 1928. What calendar was least frequently used? Assuming she lived until 1980, how many times was that calendar used?

A somewhat more complicated issue arises when there's a dependence among the pieces of information needed to specify a particular object in a set to be counted.

Example 9.10 (Teaching Discrete Math) There are three sections of undergraduate discrete math this year (counting both honors and regular variants). There are three different faculty who currently teach undergraduate discrete math: Professors Kurtz, Simon, and Razborov. How many different assignments are possible?

Let's assume we first assign a class to Professor Kurtz. There are three choices. Next, we assign a class to Professor Razborov, but we can't assign him the class already assigned to Professor Kurtz, so there will only be two choices for him. Finally, we assign to Professor Simon the only class that remains, a single choice. [Note: The process doesn't exactly work this way!] The total number of possible assignments is fixed by the product rule as $3 \mult 2 \mult 1 = 6$.

This is actually a special case of a more general counting principle.

Theorem 9.11 (Counting 1-1 functions) If $\size{A} = m$ and $\size{B} = n$, then the number of 1-1 functions from $A$ to $B$ is $n \mult (n-1) \mult \cdots \mult (n-m+1)$.

This is easily seen by the product rule: we have $n$ choices as to where to send the $1$-st element of $A$, $n-1$ choices as to where to send the $2$-nd, up to $n-m+1$ choices as to where to send the $m$-th.

$\Box$ Theorem 9.11
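The formula of Theorem 9.11 can be checked against a brute-force enumeration. In this Python sketch, `count_injections` is a hypothetical helper name, and the sizes $m = 3$, $n = 5$ are arbitrary; a 1-1 function from $A$ to $B$ is represented as an ordered selection of $m$ distinct elements of $B$.

```python
from itertools import permutations
from math import perm

def count_injections(m, n):
    """Compute n * (n-1) * ... * (n-m+1), one factor per element of A."""
    result = 1
    for k in range(m):
        result *= n - k
    return result

# Brute force: list every ordered choice of 3 distinct targets out of 5.
brute = sum(1 for _ in permutations(range(5), 3))

print(count_injections(3, 5), brute, perm(5, 3))
# With m = n = 3 this is the teaching-assignment count from Example 9.10.
print(count_injections(3, 3))
```

Note that when $m > n$ one of the factors is $0$, so the formula correctly reports that no 1-1 function exists.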

Example 9.12 (Explore the Pour) Chris, Joe, and Stu go to “Explore the Pour,” a beer tasting event, with the goal of trying many different beers. (It could happen...) The event has $10$ brewers, each of whom is offering $4$ different varieties. For each round, they agree to sample the wares of different brewers. How many different ways could they do it? Let's send our drinkers to beers in the order given. Chris has $40$ choices: $10$ brewers, and $4$ varieties per brewer. Joe has only $36$ choices, because Chris's choice doesn't just rule out the beer he chose, it rules out the brewer, so Joe has $9$ brewers to choose from, and $4$ varieties per brewer. Stu, who goes last, has to console himself with only $32$ choices, as Chris and Joe have ruled out $2$ brewers, leaving $8$, times $4$ varieties per brewer. So, there are $40 \times 36 \times 32 = 46,080$ possible distinct rounds.
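Since the numbers are small, the count of rounds can be confirmed by brute force. This Python sketch models a beer as a (brewer, variety) pair, which is an encoding chosen here for illustration, and counts the ordered triples in which all three brewers differ.

```python
from itertools import product

# 10 brewers, 4 varieties each: 40 beers, each a (brewer, variety) pair.
beers = list(product(range(10), range(4)))

# A round assigns one beer each to Chris, Joe, and Stu, with three distinct brewers.
rounds = sum(
    1
    for chris, joe, stu in product(beers, repeat=3)
    if len({chris[0], joe[0], stu[0]}) == 3
)

print(len(beers), rounds)
```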

Example 9.13 (Counting Passwords) Suppose we have a computer system in which passwords can be 6, 7, or 8 characters long, where each character is either a lower-case letter or a digit, and where at least one digit is used. Moreover, suppose that an intruder is able to mount a brute-force attack against a user, trying $1000$ passwords per second. How much time does it take for the intruder to try all legal passwords?

Let's approach this by first considering a password of length $k$. There are $36^k$ distinct sequences of digits and letters, of which $26^k$ are forbidden because they consist only of letters. Therefore, the total number of passwords is $$\sum_{k=6}^8 (36^k-26^k) = 2,684,483,063,360.$$ We then compute as follows:

\begin{align*} &\text{$2,684,483,063,360$ passwords}\\ &\times \text{$1$ second / $1000$ passwords} \\ &\times \text{$1$ minute/ $60$ seconds}\\ &\times \text{$1$ hour / $60$ minutes}\\ &\times \text{$1$ day / $24$ hours}\\ &\times \text{$1$ year / $365.2422$ days}\tag{Gregorian Calendar}\\ &= \text{$85.07$ years} \end{align*}

Unless our intruder is very patient, he may want to find a better strategy.
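The password count and the unit conversion can be reproduced in a few lines of Python. This is a minimal sketch using the figures from the example: $1000$ guesses per second and a year of $365.2422$ days; the variable names are illustrative.

```python
# Passwords of length 6, 7, or 8 over 36 symbols, minus the all-letter ones.
total_passwords = sum(36**k - 26**k for k in range(6, 9))

# Unit conversion: passwords -> seconds -> years.
guesses_per_second = 1000
seconds = total_passwords / guesses_per_second
seconds_per_year = 60 * 60 * 24 * 365.2422
years = seconds / seconds_per_year

print(f"{total_passwords:,} passwords, about {years:.2f} years")
```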

*Exercise 9.14 Now, let's say that our intruder is able (by a clever bit of social engineering) to get a copy of the password file, which contains the passwords for all the accounts in an encrypted format (following standard practice). The basic problem is unchanged, but now the intruder has a cluster of 64 machines, each of which can try $10^6$ passwords per second. How long does it take our intruder now to try all legal passwords?