Lecture 2: A Brief Introduction to Set Theory
\( \newcommand{\set}[1]{\{#1\}} \newcommand{\comprehension}[2]{\{#1\,\vert\,#2\}} \newcommand{\size}[1]{\vert#1\vert} \newcommand{\true}{\top} \newcommand{\false}{\bot} \newcommand{\limplies}{\rightarrow} \newcommand{\divides}{\mathbin{\vert}} \newcommand{\mult}{\mathbin{\cdot}} \newcommand{\xor}{\oplus} \newcommand{\union}{\cup} \newcommand{\intersect}{\cap} \newcommand{\complement}[1]{\overline#1} \newcommand{\powerset}{\mathcal{P}} \newcommand{\ixUnion}{\bigcup} \newcommand{\ixIntersect}{\bigcap} \newcommand{\gcd}{\mathrm{gcd}} \newcommand{\div}{\mathop{\mathrm{div}}} \newcommand{\mod}{\mathop{\mathrm{mod}}} \)The Ur-structure of classical (i.e., contemporary) mathematics is the universe $\mathbb{V}$ of sets, together with the binary membership relation $\in$. Mathematicians (for the most part) accept set theory (more specifically, Zermelo-Fraenkel or the very similar Gödel-Bernays-von Neumann set theories) as a foundation for mathematics because they've become adept at representing other mathematical structures within set theory, all but reducing the foundational questions of mathematics to the consistency of set theory and variations thereof, and because it provides a common and expressive language for discussing mathematical concepts. Our interest today is going to be primarily in that common, expressive language.
Definition 2.1 A set is an unordered collection of objects, which are called the members or elements of the set. If $A$ is a set, and $a$ is an element of that set, then we write $a \in A$, and likewise $a \not\in A$ if $a$ is not an element of $A$.
In pure set theory, the only objects are sets, and therefore a set can only contain (have as elements) other sets. In pure set theory, it's “sets all the way down,” until you hit the empty set, i.e., the set that doesn't contain any elements. For the most part, we won't be working in pure set theory, but instead will be using set theory as a tool as we work on other systems. In particular, we'll often form sets out of objects that we don't think of as sets, even if, from a foundational point of view, they're ultimately represented by sets.
Definition 2.2 The set which contains no elements is commonly called either the empty set or the null set, and is typically denoted by either $\emptyset$ or $\{\}$.
We'll often denote finite sets by simply enumerating their elements between set-braces, e.g., $\{0,2,4,6,8\}$ is the set of even natural numbers less than $10$. We'll often abuse this notation, using ellipses, to describe sets too large to enumerate conveniently, e.g., $\{2,4,6,\ldots,98\}$ to denote the set of even natural numbers less than $100$, or even $\{1,3,5,\ldots\}$ to denote the infinite set of odd natural numbers.
Another common approach is to use comprehensions (note that Rosen uses the nonstandard terminology set-builder notation), e.g., $\comprehension{x}{\text{$x$ is an even natural number less than $1000$}}$. In principle, it should be possible to reduce the definitional part of the comprehension to a formula in the first order predicate calculus, e.g., $\comprehension{x}{x \lt 1000 \land \exists n,\, 2 \cdot n = x}$. Careless use of comprehensions can result in contradictions, and so formal versions of set theory restrict the use of comprehensions, usually limiting them to selecting a subset of a given set, e.g., $\comprehension{x\in\mathbb{N}}{\exists n,\, 2 \cdot n = x}$. In most circumstances, the intended universe of the comprehension is clear from context, and is often omitted in practice. As a further, informal practice, mathematicians will often (in circumstances where the intended universe is implicit) combine a comprehension with mapping, e.g., $\comprehension{n^2}{n\in\{0,2,4,\ldots,100\}}$ to denote the squares of all even natural numbers less than or equal to $100$.
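Comprehensions have a direct computational analogue: Python's set comprehensions were modeled on this very notation, with the intended universe made explicit. A small illustration (the variable names are mine):

```python
# The even naturals less than 1000; range(1000) plays the role of the
# implicit universe of the comprehension.
evens = {x for x in range(1000) if x % 2 == 0}
assert 998 in evens and 999 not in evens

# A comprehension combined with mapping: the squares of the even
# naturals up to 100.
squares = {n ** 2 for n in range(0, 101, 2)}
assert 100 in squares and 0 in squares
```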
Definition 2.3 Two sets $A$ and $B$ are equal if and only if they have the same elements, i.e.,
\begin{equation*} A = B \iff (\forall x,\, x \in A \iff x \in B) \end{equation*}We'll take this as a given, but there is something interesting lurking behind the scenes here. The forward direction of this definition (i.e., that equal sets have the same elements) follows from the logical properties of equality. But the reverse direction (i.e., that two sets that have the same elements are necessarily equal) is not a logical property, but instead is the extensionality axiom of formal set theory.
Thus, for example, the sets $A = \{1,3,5,7,9\}$ and $B = \comprehension{2x+1}{x \in \set{0,1,2,3,4}}$ are equal, because they contain the same elements. Indeed, both are equal to $C = \set{9,7,5,3,1}$, because sets are unordered as collections, and only set membership matters.
Definition 2.4 A set $A$ is a subset of a set $B$ if all elements of $A$ are also elements of $B$, in notation,
\begin{equation*} A \subseteq B \iff (\forall x,\, x \in A \limplies x \in B) \end{equation*}Note here that Rosen uses $\subset$ to denote a proper subset, i.e., $A \subset B \iff A \subseteq B \land A \not= B$, in an obvious analogy to $\leq$ vs. $\lt$. Unfortunately, there's a second notational system in common use, in which $\subset$ denotes a subset, and $\subsetneq$ is used to denote a proper subset, and it is this alternative system that your instructor grew up with. For my part, I'll try to be extra-fastidious, writing $\subseteq$ or $\subsetneq$, and avoiding $\subset$ altogether as being ambiguous. The moral is that if you're not sure what $\subset$ means on the chalkboard or in these notes, ask, because it means that I've let my guard down one way or the other.
Definition 2.5 The cardinality of a set $A$ (in notation, $\size{A}$) is the number of elements that $A$ contains. If the cardinality is finite, then it will be a natural number (indeed, the natural numbers are just mathematical abstractions of the sizes of finite sets). For infinite sets, the notion of cardinality is somewhat fraught, but we will use $\aleph_0$ to denote the cardinality of the natural numbers (and of all other countably infinite sets). The cardinality of the real numbers $\mathbb{R}$ is $2^{\aleph_0}$, and even if cardinal exponentiation goes beyond this course, your intuitions from ordinary arithmetic should give the impression that it is much, much larger than $\aleph_0$. It is.
Note that the symbol $\aleph$ is “aleph,” the first letter of the Hebrew alphabet. Mathematicians can never have enough alphabets, although to the best of my knowledge modern mathematicians haven't gotten around to the Akkadian cuneiform alphabet. Yet. In case you're wondering, there's an alternative system of infinite cardinal notations built around $\beth$, the “bet” or “beth” numbers, after the second letter of the Hebrew alphabet. In this system $\beth_0 = \aleph_0$, and $\beth_1 = 2^{\beth_0} = 2^{\aleph_0}$, which might lull you into believing that if you describe the cardinality of the continuum as $\beth_1$, people will know what you're talking about. Don't count on it! The standard notation for the cardinality of the continuum is $2^{\aleph_0}$, as only set theorists care about $\beth$-numbers, let alone $\gimel$-functions, etc.
As examples,
- $\size{\emptyset} = 0$,
- $\size{\set{1,3,5}} = 3$, and
- $\size{\set{2,4,6,8,\ldots}} = \aleph_0$.
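For finite sets, cardinality corresponds exactly to Python's len; a quick check of the first two examples:

```python
assert len(set()) == 0        # the empty set has cardinality 0
assert len({1, 3, 5}) == 3
# Infinite sets like {2, 4, 6, ...} have no finite Python representation.
```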
Definition 2.6 The power set of a set $A$ (in notation, $\powerset(A)$) is the set of all its subsets, i.e.,
\begin{equation*} a \in \powerset(A) \iff a \subseteq A, \end{equation*}or, equivalently,
\begin{equation*} \mathcal{P}(A) = \comprehension{a}{a \subseteq A}. \end{equation*}Theorem 2.7 For finite sets $A$, $\size{\powerset(A)} = 2^{\size{A}}$.
Proof We prove this by induction on $\size{A}$. If inductive arguments aren't familiar to you, don't worry about it yet. We'll consider them more systematically later.
If $\size{A} = 0$, then $A = \emptyset$, and we can easily see that $\powerset(A) = \set{\emptyset}$, whence $\size{\powerset(\emptyset)} = \size{\set{\emptyset}} = 1$, as required.
If $\size{A} = n+1$, we may assume inductively that if $\size{B} = n$, then $\size{\powerset(B)} = 2^n$, and must show $\size{\powerset(A)} = 2^{n+1}$. To that end, fix $a \in A$, and define $A' = \comprehension{x\in A}{x \not= a}$. The cardinality of $A'$ will be one less than the cardinality of $A$, since we've removed exactly one element ($a$) to form $A'$ from $A$, and therefore $\size{A'} = n$ and by our inductive hypothesis, $\size{\powerset(A')} = 2^n$.
Notice that $\powerset(A) = \powerset(A') \union \comprehension{\set{a} \union x}{x \in \powerset(A')}$. There is a natural one-to-one correspondence between elements of $\powerset(A')$ and $\comprehension{\set{a} \union x}{x \in \powerset(A')}$ (simply add or remove $a$, depending on which way you're going), and so $\size{\comprehension{\set{a} \union x}{x \in \powerset(A')}} = \size{\powerset(A')} = 2^n$; moreover, these two sets are disjoint, whence
\begin{align*} \size{\powerset(A)} &= \size{\powerset(A') \union \comprehension{\set{a} \union x}{x \in \powerset(A')}}\\ &= \size{\powerset(A')} + \size{\comprehension{\set{a} \union x}{x \in \powerset(A')}}\\ &= 2^n + 2^n\\ &= 2^{n+1} \end{align*}as required.
$\Box$ Theorem 2.7
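If you're programming-minded, the inductive proof translates almost line-by-line into a recursive program. A minimal sketch in Python (the function name and the choice of frozenset as the representation are mine):

```python
def powerset(A):
    # Mirrors the induction: P({}) = {{}}; otherwise fix a in A and
    # P(A) = P(A') together with {a} union x for each x in P(A').
    A = set(A)
    if not A:
        return [frozenset()]
    a = A.pop()                  # A is now A'
    rest = powerset(A)           # P(A'), inductively
    return rest + [x | {a} for x in rest]

for n in range(6):
    assert len(powerset(range(n))) == 2 ** n   # |P(A)| = 2^|A|
```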
Theorem 2.7 suggests something about why we represent the cardinality of the continuum by $2^{\aleph_0}$, but following up on that suggestion goes beyond this course.
Definition 2.8 The ordered pair $(x,y)$ can be defined as the set $\set{\set{x},\set{x,y}}$. This is not beautiful, but it is effective: we can prove $(a,b) = (c,d)$ if and only if $a = c$ and $b = d$, which is the property we care about. We can define tuples of higher arity via several different representation hacks, e.g., $(x,y,z) = (x,(y,z))$, a definition that would naturally occur to a functional programmer, and is often used.
Given the concept of an ordered pair, we can define the cartesian product of two sets via $A \times B = \comprehension{(a,b)}{a \in A \land b \in B}$.
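For finite sets, the cartesian product is a two-variable comprehension. A sketch in Python (the function name is mine; the standard library's itertools.product computes the same set of pairs):

```python
def cartesian_product(A, B):
    # A x B = {(a, b) | a in A and b in B}
    return {(a, b) for a in A for b in B}

A, B = {1, 2}, {'x', 'y'}
assert cartesian_product(A, B) == {(1, 'x'), (1, 'y'), (2, 'x'), (2, 'y')}
```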
*Exercise 2.9 In general, the cartesian product is not commutative, i.e., there are sets $A$ and $B$ such that $A \times B \not= B \times A$.
- Give examples of $A$ and $B$ such that
- $A \times B = B \times A$, and
- $A \times B \not= B \times A$.
- State and prove a necessary and sufficient condition on $A$ and $B$ such that $A \times B = B \times A$. While it is sufficient that $A = B$, it is not necessary.
Exercise 2.10 Exercise 2.9 shows that cartesian products are not necessarily commutative. Nevertheless, there is a nice relationship between cartesian products and (commutative) multiplication on the natural numbers: $\size{A \times B} = \size{A} \cdot \size{B}$. Prove this for finite sets $A$ and $B$. Infer that $\size{A \times B} = \size{B \times A}$.
Set Operations
Definition 2.11 If $A$ and $B$ are sets, then the union of $A$ and $B$ is $A \union B = \comprehension{x}{x \in A \lor x \in B}$. Likewise, the intersection of $A$ and $B$ is $A \intersect B = \comprehension{x}{x \in A \land x \in B}$.
Theorem 2.12 (Simple Inclusion-Exclusion). For all finite sets $A$ and $B$, $\size{A \union B} = \size{A} + \size{B} - \size{A \intersect B}$.
Proof. We can either count each element of $A \union B$ once (resulting in $\size{A \union B}$), or we can count the elements of $A$, and then the elements of $B$, which counts the elements in common ($A \intersect B$) twice, and then correct for where we double-counted.
$\Box$ Theorem 2.12.
Definition 2.13 The difference of two sets $A$ and $B$ is $A \setminus B = \comprehension{x}{x \in A \land x \not\in B}$.
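All three of these operations are built into Python's set type, which also lets us spot-check Theorem 2.12 on a small example (evidence, of course, not proof):

```python
A, B = {1, 2, 3, 4}, {3, 4, 5, 6}

assert A | B == {1, 2, 3, 4, 5, 6}  # union
assert A & B == {3, 4}              # intersection
assert A - B == {1, 2}              # difference

# Spot-check of simple inclusion-exclusion (Theorem 2.12).
assert len(A | B) == len(A) + len(B) - len(A & B)
```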
The notion of the difference of two sets gives us another way to understand the inclusion-exclusion principle. First we begin with a simple counting principle, which we've already relied on in our proof of the cardinality of the power set:
Definition 2.14 Two sets $A$ and $B$ are said to be disjoint if $A \intersect B = \emptyset$.
Lemma 2.15 If $A$ and $B$ are disjoint sets, then $\size{A \union B} = \size{A} + \size{B}$.
This is so because whether we count the elements of $A\union B$ together, or count them separately and sum the results, each element is counted exactly once, and so we get the same result.
Lemma 2.16 For all sets $A$ and $B$, $A = (A \setminus B) \union (A \intersect B)$; moreover, $A \setminus B$ and $A \intersect B$ are disjoint.
We're now set up to give an alternative proof of the inclusion-exclusion principle:
It is easy to see that the sets $A \setminus B$, $B\setminus A$, and $A \intersect B$ partition $A \union B$, i.e., these sets are pairwise disjoint, and their union is $A \union B$. By applying Lemma 2.15 twice, we see
\begin{align*} \size{A \union B} &= \size{A\setminus B} + \size{B \setminus A} + \size{A \intersect B}\\ &= \size{A\setminus B} + \size{B \setminus A} + \size{A \intersect B} + \size{A \intersect B} - \size{A \intersect B}\\ &= (\size{A\setminus B} + \size{A \intersect B}) + (\size{B \setminus A} + \size{A \intersect B}) - \size{A \intersect B}\\ &= \size{A} + \size{B} - \size{A \intersect B} \end{align*}as required.
You may wonder why I'd bother to prove a result twice. Mathematicians care about both theorems and proofs, but at the end of the day, they view proofs as more important than theorems, because different proofs convey different insights, and so provide different tools that may be used to attack future problems. This gets at a very important point: the reason for taking mathematics, for taking this class, isn't so much to learn a specific body of knowledge, as it is to develop a flexible skill that enables you to tackle new problems that aren't covered in the texts.
Rosen discusses the notion of the complement of a set, relative to the universal set $U$. This is dangerous language, and can lead to contradictions. In day-to-day mathematics, we usually work within some fixed structure, and can assume that the universe of that structure is a set; but the universe of set theory cannot itself be a set (a famous theorem due to Bertrand Russell). Logicians make a distinction between small structures, i.e., structures where the universe is a set, like $\mathfrak{N}$, and large structures, like set theory itself, where it is not.
Within small structures $\mathfrak{A}$, which are by far the usual kind, we can define the complement of a set $A$ as $\complement A = U^{\mathfrak{A}} \setminus A = \comprehension{x \in U^{\mathfrak{A}}}{x \not\in A}$.
There are a large number of set-theoretic identities which can be thought of as “liftings” of propositional tautologies. For example, we know that $\lor$ is associative, thus
\begin{align*} A \union (B \union C) &= A \union \comprehension{x}{x \in B \lor x \in C}\\ &= \comprehension{x}{x \in A \lor (x \in B \lor x \in C)}\\ &= \comprehension{x}{(x \in A \lor x \in B) \lor x \in C}\\ &= (A \union B) \union C \end{align*}and so $\union$ is associative as well. Similarly, the associativity of $\land$ “lifts” to the associativity of $\intersect$.
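Lifted identities like these are easy to test empirically; here is a quick randomized sanity check in Python, which of course is evidence rather than proof:

```python
import random

# Randomized check of the lifted associativity laws.
for _ in range(100):
    A, B, C = ({random.randrange(10) for _ in range(5)} for _ in range(3))
    assert A | (B | C) == (A | B) | C   # associativity of union
    assert A & (B & C) == (A & B) & C   # associativity of intersection
```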
Exercise 2.17 Prove the following two distributive laws for $\union$ and $\intersect$, via suitable propositional tautologies:
- $A \union (B \intersect C) = (A \union B) \intersect (A \union C)$.
- $A \intersect (B \union C) = (A \intersect B) \union (A \intersect C)$.
*Exercise 2.18 State and prove set-theoretic analogues of de Morgan's laws.
Much as we can think of $\exists$ as a (potentially infinite) kind of $\lor$ that's indexed over the elements of a universe, and $\forall$ as a (potentially infinite) kind of $\land$, we introduce indexed versions of $\union$ and $\intersect$ as follows.
Definition 2.19 Let $A_i$ be a family of sets indexed over a set $I$.
- $\ixUnion\limits_{i \in I} A_i = \comprehension{x}{\exists i \in I,\, x \in A_i}$, and
- $\ixIntersect\limits_{i \in I} A_i = \comprehension{x}{\forall i \in I,\, x \in A_i}$.
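For a finite index set, the indexed operations amount to folding the binary ones over the family. A sketch in Python (note that, just as $\ixIntersect$ is problematic over an empty index set, set.intersection needs a nonempty family):

```python
# The family A_i = {i, i+1, ..., 5}, indexed over I = {1, 2, 3}.
family = {i: set(range(i, 6)) for i in (1, 2, 3)}

union = set().union(*family.values())              # union over i in I
intersection = set.intersection(*family.values())  # intersection over i in I

assert union == {1, 2, 3, 4, 5}
assert intersection == {3, 4, 5}
```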
Functions
There are a lot of muddled presentations of the notion of a function, because (a) functions are very important, (b) there are a number of (not quite equivalent) ways to think about functions, and (c) these different ways of thinking about functions capture different intuitions about what functions are, and even of what mathematics is.
Intuitively, a function $f$ associates to every element of a set $A$ an element of (possibly the same) set $B$. In this case, we write $f: A \to B$, and if $a \in A$, we write $f(a)$ to denote the element of $B$ associated with $a$. In a common case, there is a simple rule or process that defines the function, cf., $f: \mathbb{N} \to \mathbb{N}$, given by $f(x) = x^2 + 2 x + 1$. This sort of definition by expression is extremely non-problematic, as it manifestly associates a unique element of $B$ to every element of $A$. The practical problem is that expressions-as-functions are inadequate to express the full extent of mathematicians' imaginations. There are all sorts of ways of associating elements of $A$ with elements of $B$ that can't be captured via expressions. A mathematician might naturally propose an implicit definition, e.g., attempting to define the square-root function via $\sqrt x = y \iff y \geq 0 \land y^2 = x$. Such an implicit definition raises questions about both totality, i.e., is every element of $A$ associated with some element of $B$?, as well as single-valuedness, i.e., is every element of $A$ associated with at most one element of $B$? A computer scientist might naturally think to define a function by writing a program, raising questions of convergence, i.e., totality, if not single-valuedness.
Set theory wanders into this epistemological breach with a few definitions of its own, in the attempt to give mathematicians as free a rein for their intuition as possible, while still keeping them out of foundational trouble.
Definition 2.20 A relation $R$ with domain $X$ and co-domain $Y$ is simply a subset of $X \times Y$. We'll usually write $R(x,y)$ rather than $(x,y) \in R$, or use infix notation for binary relations, e.g., $3 \lt 4$, or even $x \mathbin{R} y$.
As relations are simply sets, we can manipulate them using the usual boolean operations of set theory, i.e., $\union$, $\intersect$, and complementation (relative to $A \times B$). In addition, there are operations that are specific to relations, cf.,
Definition 2.21 If $R \subseteq A \times B$ and $S \subseteq B \times C$ are relations, then the join of $R$ and $S$ is
\begin{equation*} S \circ R = \comprehension{(a,c) \in A \times C}{\exists b \in B,\, R(a,b) \land S(b,c)}, \end{equation*}a relation on $A \times C$. Note here that if the order of the operands seems backwards, it won't once we relate join to function composition in Exercise 2.24.
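For finite relations represented as sets of pairs, join is a one-line comprehension. A sketch (the function name is mine):

```python
def join(R, S):
    # S o R = {(a, c) | there is a b with R(a, b) and S(b, c)}
    return {(a, c) for (a, b1) in R for (b2, c) in S if b1 == b2}

R = {(1, 'x'), (2, 'y')}         # R, a subset of A x B
S = {('x', True), ('y', False)}  # S, a subset of B x C
assert join(R, S) == {(1, True), (2, False)}
```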
Of course, our interest is in functions, not relations. To that end, we define:
Definition 2.22 A relation $R \subseteq A \times B$ is total if $\forall a \in A,\, \exists b \in B,\, R(a,b)$. A relation $R$ is single-valued if
\begin{equation*} \forall a \in A, \, \forall b_1 \, b_2 \in B,\, R(a,b_1) \land R(a,b_2) \limplies b_1 = b_2. \end{equation*}Definition 2.23 A function $f: A \to B$ is a total, single-valued relation with domain $A$ and co-domain $B$.
Here we get to the crux of the matter. To a set theorist, the definition above is perfectly satisfactory: a function is a total, single-valued relation, i.e., a set of ordered pairs that satisfies a few additional properties. To many mathematicians, this is a dry definition, which squeezes the very life out of their more organic understanding of what a function is. To them, the set of pairs defined above is the graph of the function, and while they'll reluctantly stipulate a one-to-one correspondence between such graphs and the function they represent, they'll deny as sterile the notion that the graph is the function. As a practical matter, it makes little difference. Sometimes, we'll define a function explicitly via an expression, mooting questions of totality and single-valuedness; and sometimes we'll define a relation (typically via a mathematical argument that amounts to a comprehension), and argue that the relation implicitly defines a function, by proving that it is well-defined as a function, i.e., it is total and single-valued.
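For finite relations, well-definedness can be checked by brute force. A sketch in Python (the function names are mine):

```python
def is_total(R, A):
    # R is total iff every a in A is related to at least one b.
    return all(any(a == x for (x, _) in R) for a in A)

def is_single_valued(R):
    # R is single-valued iff no a is related to two distinct b's.
    return all(b1 == b2 for (a1, b1) in R for (a2, b2) in R if a1 == a2)

R = {(1, 'x'), (2, 'y')}
assert is_total(R, {1, 2}) and is_single_valued(R)  # R is a function
assert not is_single_valued(R | {(1, 'z')})         # (1,'x') and (1,'z') clash
```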
Exercise 2.24 Let $f: A \to B$ and $g: B \to C$ be functions. Argue that $g \circ f$ is a function, and indeed that $(g \circ f)(a) = g(f(a))$, and so $\circ$ restricted to functions is just composition.
The exercise above gives us a fruitful way of thinking about relational join. If we turn the set-theoretic world on its head, and rather than viewing functions as special cases of relations, view relations as generalizations of functions, then we might say that a relation can be thought of as a partial, multi-valued generalization of a function, and that join is the natural generalization of composition on total, single-valued functions to partial, multi-valued functions.
There are several important kinds of functions:
Definition 2.25 A function $f:A \to B$ is one-one (or injective) if $\forall x \, y\in A,\, f(x) = f(y) \limplies x = y$. For example, the function $f:\mathbb{N} \to \mathbb{N}$ given by $f(x) = 0$ for all $x$ fails very badly to be one-one (cf., $f(0) = f(1)$, but $0 \not= 1$), but $g:\mathbb{N} \to \mathbb{N}$ given by $g(x) = x^2$ is one-one. Note that $g$ is not one-one if defined as a function from $\mathbb{Z}$ to $\mathbb{Z}$, because $g(-1) = 1 = g(1)$.
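On a finite domain, one-one-ness can be tested by comparing the size of the domain with the size of the image; a sketch in Python (the function name is mine) that exercises the examples above:

```python
def is_injective(f, A):
    # On a finite domain, f is one-one iff it has as many distinct
    # outputs as inputs.
    return len({f(x) for x in A}) == len(A)

assert not is_injective(lambda x: 0, range(10))         # constant function
assert is_injective(lambda x: x * x, range(10))         # squaring on N
assert not is_injective(lambda x: x * x, range(-1, 2))  # on Z: g(-1) = g(1)
```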
Exercise 2.26 Prove that a function $f$ is one-one if and only if $\forall x\, y \in A, x \not= y \limplies f(x) \not= f(y)$.
Definition 2.27 A (left-) inverse of a function $f$ is a function $g$ such that $\forall x,\, g(f(x)) = x$.
Working with inverse functions isn't as straightforward as some presentations suggest, because of potential mismatches between the image of $f$ and its range. Let's tease this out. If $f: A \to B$, then the range of $f$ is $B$. But the image (or more generally, the image of $A$ under $f$) is $\comprehension{f(x)}{x\in A}$. These sets need not coincide! But if they do, things are straightforward:
Definition 2.28 A function $f:A \to B$ is onto (or surjective) if $\forall b \in B,\, \exists a \in A,\, f(a) = b$.
Exercise 2.29 Assume $f:A \to B$ is a correspondence (or bijection), i.e., $f$ is one-one and onto. Show that $f$ has a unique inverse (which we'll denote by $f^{-1}$), and that $\forall x \in B,\, f(f^{-1}(x)) = x$, i.e., that $f$ is $f^{-1}$'s inverse.
For example, the squaring function $g$ above is not onto if considered as a function from $\mathbb{N}$ to $\mathbb{N}$, because there is no square root of $2$ in the natural numbers; but if viewed as a function from $\mathbb{R}^+$ (the set of positive reals) to $\mathbb{R}^+$, then it is onto. These examples should make clear that a function isn't just a rule of correspondence (like squaring); it also includes a specification of its domain and range.
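These distinctions are easy to experiment with on finite sets; a final sketch in Python (the function and variable names are mine), including inverting a bijection represented as a dict:

```python
def is_surjective(f, A, B):
    # f : A -> B is onto iff the image {f(a) | a in A} is all of B.
    return {f(a) for a in A} == set(B)

def square(x):
    return x * x

assert not is_surjective(square, range(4), range(10))  # e.g., 2 is not a square

# A bijection represented as a dict can be inverted by swapping pairs.
f = {0: 'a', 1: 'b', 2: 'c'}
f_inv = {b: a for a, b in f.items()}
assert all(f_inv[f[x]] == x for x in f)        # f_inv is a left inverse
assert all(f[f_inv[y]] == y for y in f_inv)    # ... and a right inverse
```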