We usually associate arithmetic with the infinite set of integer numbers. However, modular arithmetic on finite sets is commonly used in our daily life. As an example, if it is now 1 am and we let 1000 hours pass, what time will it be? We can use the division algorithm to see that $1000 = 41 \times 24 + 16$ and conclude that adding 1000 hours is like adding 16 hours, since the clock returns to the same position every 24 hours. So after 1000 hours it will be 5 pm (17 hours after midnight).
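In code, the whole computation reduces to a single remainder operation. Here is a minimal Python sketch of the clock example (the variable names are ours):

```python
# Clock arithmetic: adding hours modulo 24.
hour_now = 1                      # 1 am, measured in hours after midnight
elapsed = 1000
print(elapsed % 24)               # 16 -- adding 1000 hours is like adding 16
print((hour_now + elapsed) % 24)  # 17, i.e., 5 pm
```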
There are many examples in which it is natural and useful to limit our number system to a finite range of integers, such as 0 through $n-1$, for some $n$. This number system is denoted by $\mathbb{Z}_n$. Days of the week, hours of the day, minutes in an hour are all familiar examples of finite number systems, as are numbers in microprocessor registers, commonly limited to 32 binary digits.
Modular arithmetic allows us to add, subtract, multiply, and sometimes divide numbers while staying within the finite set $\mathbb{Z}_n$. The number $n$ is called the modulus. A central notion in modular arithmetic is congruence. We say that two integers are congruent modulo $n$ if they leave the same remainder when divided by $n$. Here is the formal definition:
Definition: Two integers $a, b \in \mathbb{Z}$ are said to be congruent modulo $n$, written as $a \equiv_n b$ or $a \equiv b \pmod{n}$, if and only if they leave the same remainder when divided by $n$, that is, $a \bmod n = b \bmod n$.
This definition captures our intuition that the day of the week will be the same whether we let 10, 17, or 80 days pass. There is an equivalent definition of congruence that is often useful in proofs:
Lemma 6.1.1. $a \equiv_n b$ if and only if $n \mid (a - b)$.
Proof. If $a \equiv_n b$ then $a \bmod n = b \bmod n$. Put $r = a \bmod n = b \bmod n$. Then there exist two integers $q_1$ and $q_2$, such that $a = q_1 n + r$ and $b = q_2 n + r$. Subtracting the second equation from the first, we get $a - b = (q_1 - q_2) n$ and $n \mid (a - b)$.
On the other hand, if $n \mid (a - b)$ then there exists an integer $d$, such that $a - b = nd$. By the division algorithm, there exist integers $q_1, q_2 \in \mathbb{Z}$, and $0 \leq r_1, r_2 < n$, such that $a = q_1 n + r_1$ and $b = q_2 n + r_2$. Thus $(q_1 - q_2) n + (r_1 - r_2) = nd$, and $r_1 - r_2 = (q_2 - q_1 + d)n$. Thus $n \mid (r_1 - r_2)$. However, $|r_1 - r_2| < n$, so necessarily $r_1 - r_2 = 0$, which implies that $a \bmod n = b \bmod n$, and $a \equiv_n b$.
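Both characterizations are easy to check mechanically. Here is a small Python sketch (ours; note that for positive $n$, Python's `%` operator already returns a remainder in $\{0, \ldots, n-1\}$, even for negative $a$):

```python
# Two equivalent tests for a ≡ b (mod n), per the definition and Lemma 6.1.1.
def congruent_by_remainder(a: int, b: int, n: int) -> bool:
    return a % n == b % n          # same remainder when divided by n

def congruent_by_divisibility(a: int, b: int, n: int) -> bool:
    return (a - b) % n == 0        # n divides a - b

# Spot-check that the two tests agree, as the lemma asserts.
assert all(
    congruent_by_remainder(a, b, 7) == congruent_by_divisibility(a, b, 7)
    for a in range(-30, 30) for b in range(-30, 30)
)
```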
You should use the definition to verify that for any $a, b, c \in \mathbb{Z}$, congruence modulo $n$ is
(a) reflexive: $a \equiv_n a$;
(b) symmetric: if $a \equiv_n b$ then $b \equiv_n a$;
(c) transitive: if $a \equiv_n b$ and $b \equiv_n c$ then $a \equiv_n c$.
The operations of addition, subtraction, and multiplication on $\mathbb{Z}_n$ are defined by first doing the corresponding operation in $\mathbb{Z}$ and then taking the remainder modulo $n$. That is, if we denote these respective operations by $+_n, -_n,$ and $\cdot_n$, then
\[a +_n b = (a + b) \bmod n\]
\[a -_n b = (a - b) \bmod n\]
\[a \cdot_n b = (ab) \bmod n\]
Exponentiation is defined through repeated multiplication.
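These definitions transcribe directly into Python (a sketch of ours; $n > 0$ is assumed). For exponentiation, the built-in `pow(a, k, n)` performs the repeated multiplication modulo $n$ efficiently.

```python
# The three operations on Z_n, as defined above.
def add_mod(a: int, b: int, n: int) -> int:
    return (a + b) % n

def sub_mod(a: int, b: int, n: int) -> int:
    return (a - b) % n

def mul_mod(a: int, b: int, n: int) -> int:
    return (a * b) % n

print(add_mod(5, 4, 6), sub_mod(2, 5, 6), mul_mod(4, 5, 6))  # 3 3 2
print(pow(2, 10, 6))  # exponentiation in Z_6: 2^10 mod 6 = 4
```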
Lemma 6.1.2. Properties of congruence:
(a) $(a \bmod n) \bmod n = a \bmod n$
(b) $(a \bmod n) \equiv_n a$
(c) $(ab) \bmod n = (a \bmod n)(b \bmod n) \bmod n$
(d) $(a \bmod n)(b \bmod n) \equiv_n ab$
(e) $\prod_{i=1}^k (a_i \bmod n) \equiv_n \prod_{i=1}^k a_i$
(f) If $a_1 \equiv_n a_2$ and $b_1 \equiv_n b_2$ then
\[a_1 + b_1 \equiv_n a_2 + b_2\]
\[a_1 - b_1 \equiv_n a_2 - b_2\]
\[a_1 b_1 \equiv_n a_2 b_2\]
Proof. (b) is just a restatement of (a). To prove these we need to show that $n \mid (a - (a \bmod n))$. Put $r = a \bmod n$. By the division algorithm, there exists $q \in \mathbb{Z}$, such that $a = qn + r$. Thus $a - r = qn$, which implies that $n \mid (a - r)$ and concludes the proof.
(d) is a restatement of (c), and (e) can be proved from (d) by induction. To prove (c) we need to show that $n \mid (ab - (a \bmod n)(b \bmod n))$. Use the division algorithm to represent $a = q_1 n + r_1$ and $b = q_2 n + r_2$. Then
\[ab - (a \bmod n)(b \bmod n) = (q_1 n + r_1)(q_2 n + r_2) - r_1 r_2 = (q_1 q_2 n + r_1 q_2 + q_1 r_2)n,\]
which implies the claim.
We now prove (f). We know that $n \mid (a_1 - a_2)$ and $n \mid (b_1 - b_2)$. That is, there exist integers $q$ and $s$, such that $a_1 - a_2 = qn$ and $b_1 - b_2 = sn$. Adding these equations gives $(a_1 + b_1) - (a_2 + b_2) = (q + s)n$, which yields the first part of the claim. Subtracting similarly gives the second part. Writing $a_1 = a_2 + qn$ and $b_1 = b_2 + sn$ and multiplying the equations gives
\[a_1 b_1 = a_2 b_2 + b_2 qn + a_2 sn + qsn^2\]
\[a_1 b_1 - a_2 b_2 = (b_2 q + a_2 s + qsn)n,\]
which yields the third part.
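The properties in Lemma 6.1.2 can also be spot-checked numerically. A small Python sketch (ours), exercising parts (c) and (f) for $n = 7$:

```python
import itertools

n = 7
for a, b in itertools.product(range(-20, 20), repeat=2):
    # (c): (ab) mod n = ((a mod n)(b mod n)) mod n
    assert (a * b) % n == ((a % n) * (b % n)) % n

for a1, b1 in itertools.product(range(-10, 10), repeat=2):
    a2, b2 = a1 + 3 * n, b1 - 5 * n   # a2 ≡ a1 and b2 ≡ b1 (mod n)
    assert (a1 + b1) % n == (a2 + b2) % n   # (f), addition
    assert (a1 - b1) % n == (a2 - b2) % n   # (f), subtraction
    assert (a1 * b1) % n == (a2 * b2) % n   # (f), multiplication
```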
You might have noticed that we defined addition, subtraction, and multiplication, but not division. This might not be surprising, since the division operation is not defined for the integers in general: There is no integer that corresponds to 5 divided by 4, for instance. (In other words, there is no $x \in \mathbb{Z}$, such that $4x = 5$.) This distinguishes $\mathbb{Z}$ from sets like $\mathbb{Q}$ or $\mathbb{R}$, which are closed under division by nonzero elements.
Division in $\mathbb{Z}_n$ appears even more unruly. For example, in $\mathbb{Z}_6$, the equation $2x = 4$ is satisfied by both $x = 2$ and $x = 5$, while the equation $2x = 3$ has no solutions. So the notion of “b divided by a” can be undefined or even ambiguous in $\mathbb{Z}_n$. In particular, we cannot generally cancel a multiplier from both sides of a congruence, that is, if $ab \equiv_n ac$ we cannot reason that $b \equiv_n c$. To take the above illustration, $2 \cdot 2 \equiv_6 2 \cdot 5$, but $2 \not\equiv_6 5$.
Quite remarkably, however, the division operation is well-defined when $n$ is a prime $p$. Thus $\mathbb{Z}_p$ is in a sense as well-behaved as the real numbers, despite being a finite set! After a small digression that explores what “well-behaved” actually means here, we will state an even more general result on modular division.
Digression (notions from abstract algebra): There is a way to precisely state what we mean by “well-behaved” above. Jumping the gun, I’ll say that $\mathbb{Z}_p$ is a field, not just a ring. Now let me tell you what this means. The notion of a ring in algebra is meant to abstract our intuition concerning the essential properties of the integers. Given a set $S$ equipped with two operations, $+$ (addition) and $\cdot$ (multiplication), we say that $S$ is a ring if the following all hold for any $a, b, c \in S$:
(a) Closure: $a + b \in S$ and $a \cdot b \in S$.
(b) Associativity: $(a + b) + c = a + (b + c)$ and $(a \cdot b) \cdot c = a \cdot (b \cdot c)$.
(c) Commutativity: $a + b = b + a$ and $a \cdot b = b \cdot a$.
(d) Identities: there exist elements $0, 1 \in S$, such that $a + 0 = a$ and $a \cdot 1 = a$.
(e) Additive inverses: there exists $x \in S$, such that $a + x = 0$.
(f) Distributivity: $a \cdot (b + c) = a \cdot b + a \cdot c$.
All the number systems we have encountered so far are rings, including $\mathbb{Z}$, $\mathbb{Q}$, $\mathbb{R}$, and $\mathbb{Z}_n$. However, some of them possess additional structure that allows the division operation. Namely, a ring is said to be a field if, in addition to the above, the following holds for any $a \in S$ with $a \neq 0$:
(g) Multiplicative inverses: there exists an element $a^{-1} \in S$, such that $a \cdot a^{-1} = 1$.
The number systems $\mathbb{R}$ and $\mathbb{Q}$, as well as $\mathbb{Z}_p$ when $p$ is prime, are fields. In fields the division operation is well-defined, and $b/a = b \cdot a^{-1}$, as can be verified by plugging $x = b \cdot a^{-1}$ into the equation $ax = b$. A field with a finite number of elements is called a Galois field, after the French mathematician Évariste Galois. (A feisty young man who died in a duel at the age of 20, after making significant enough contributions to mathematics to have a whole field (sic) named in his honor!) Anyway, now that we know what fields are, let’s see why $\mathbb{Z}_p$ is one. In fact, we prove something more general:
Theorem 6.2.1. If $a$ and $n$ are coprime then there exists exactly one $x \in \mathbb{Z}_n$ for which $ax \equiv_n b$, for any $b \in \mathbb{Z}$.
Proof. We need to prove existence and uniqueness of $x$ as described in the theorem. $ax \equiv_n b$ if and only if there exists $q \in \mathbb{Z}$, such that $ax - b = nq$, or $ax - nq = b$. Now, since $\gcd(a, n) = 1$, any integer, including $b$, is a linear combination of $a$ and $n$: there exist $s, t \in \mathbb{Z}$ with $as + nt = b$. Taking $x = s \bmod n$, which lies in $\mathbb{Z}_n$ and satisfies $ax \equiv_n as \equiv_n b$ by Lemma 6.1.2, proves existence.
To prove uniqueness, assume that for $x, y \in \mathbb{Z}_n$ it holds that $ax \equiv_n b$ and $ay \equiv_n b$. Thus $ax - ay \equiv_n 0$, or $n \mid a(x - y)$. As you proved in one of the homework assignments, since $n$ and $a$ are coprime, this implies that $n \mid (x - y)$, and therefore that $x - y \equiv_n 0$. Thus $x \equiv_n y$, which proves uniqueness.
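The proof is effective: the linear combination can be computed with the extended Euclidean algorithm, which yields a concrete solver. A Python sketch (ours, under the theorem's assumption that $\gcd(a, n) = 1$):

```python
# Solve a·x ≡ b (mod n) when gcd(a, n) = 1, via extended Euclid.
def extended_gcd(a: int, b: int):
    """Return (g, s, t) with g = gcd(a, b) = s*a + t*b."""
    if b == 0:
        return a, 1, 0
    g, s, t = extended_gcd(b, a % b)
    return g, t, s - (a // b) * t

def solve_congruence(a: int, b: int, n: int) -> int:
    g, s, _ = extended_gcd(a, n)
    assert g == 1, "a and n must be coprime"
    return (s * b) % n   # s·a ≡ 1 (mod n), so x = s·b works

print(solve_congruence(5, 3, 7))  # 2, since 5·2 = 10 ≡ 3 (mod 7)
```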
Corollary 6.2.2. For a prime $p$ and any $a, b \in \mathbb{Z}$, such that $a \not\equiv_p 0$, there exists exactly one $x \in \mathbb{Z}_p$ for which $ax \equiv_p b$.
The fact that division is well-defined in $\mathbb{Z}_p$ when $p$ is prime also means that cancellations become valid. Thus if $a \not\equiv_p 0$ and $ab \equiv_p ac$ we can safely conclude that $b \equiv_p c$.
We now know that $b/a$ is well-defined in $\mathbb{Z}_p$, but how do we find it? That is, how do we find $x \in \mathbb{Z}_p$ for which $ax \equiv_p b$? This question is particularly important when $p$ is large and it takes too long to simply enumerate all the elements of $\mathbb{Z}_p$. Fortunately, the following result, known as Fermat’s Little Theorem, can help us:
Theorem 6.2.3. For a prime $p$ and any $a \not\equiv_p 0$,
\[a^{p-1} \equiv_p 1.\]
Proof. Consider the set $S$, defined as $1 \cdot a, 2 \cdot a, \ldots, (p - 1) \cdot a$. No two of these $p - 1$ integers are congruent modulo $p$, since we have seen that if $ia \equiv_p ja$ then $i \equiv_p j$. Moreover, none of them is congruent to 0: if $ia \equiv_p 0 = a \cdot 0$ for some $1 \leq i \leq p - 1$, then cancelling $a$ would give $i \equiv_p 0$, which is impossible. However, each element of $S$ is congruent to some element of $\mathbb{Z}_p$. Since there are $p - 1$ elements in $S$ and $p - 1$ nonzero elements in $\mathbb{Z}_p$, the elements of $S$ must be congruent to each of $1, 2, \ldots, (p - 1)$ in some order. Therefore,
\[1 \cdot 2 \cdot \ldots \cdot (p - 1) \equiv_p 1a \cdot 2a \cdot \ldots \cdot (p - 1)a,\]
or
\[1 \cdot 2 \cdot \ldots \cdot (p - 1) \equiv_p 1 \cdot 2 \cdot \ldots \cdot (p - 1) \cdot a^{p-1}.\]
We can cancel each of $1, 2, \ldots, (p - 1)$ from both sides of the congruence, obtaining
\[a^{p-1} \equiv_p 1.\]
Fermat’s Little Theorem allows us to quickly perform division in $\mathbb{Z}_p$. The element $x \in \mathbb{Z}_p$ for which $ax \equiv_p b$ is simply $(a^{p-2} b \bmod p)$: indeed, $a \cdot a^{p-2} b = a^{p-1} b \equiv_p b$.
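In Python this is a one-liner, since the built-in `pow` accepts a modulus as a third argument and performs fast modular exponentiation. A sketch (ours):

```python
# Division in Z_p via Fermat's Little Theorem.
def div_mod_p(b: int, a: int, p: int) -> int:
    """Return x in Z_p with a·x ≡ b (mod p); p prime, a not ≡ 0."""
    return (pow(a, p - 2, p) * b) % p

p = 101
x = div_mod_p(17, 5, p)    # "17 divided by 5" in Z_101
assert (5 * x) % p == 17
```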
The definition of a set explicitly disregards the order of the set elements; all that matters is who’s in, not who’s in first. However, sometimes the order is important. This leads to the notion of an ordered pair of two elements $x$ and $y$, denoted $(x, y)$. The crucial property is:
\[(x, y) = (u, v) \text{ if and only if } x = u \text{ and } y = v.\]
This notion can be extended naturally to define an ordered $n$-tuple as the ordered counterpart of a set with $n$ elements.
Given two sets $A$ and $B$, their Cartesian product $A \times B$ is the set of all ordered pairs $(x, y)$, such that $x \in A$ and $y \in B$:
\[A \times B = \{(x, y) : x \in A, y \in B\}.\]
Here is a useful special case:
\[A^2 = A \times A = \{(x, y) : x, y \in A\}.\]
And here is a general definition: $A^1 = A$, and for $n \geq 2$,
\[A^n = \{(x_1, x_2, \ldots, x_n) : x_1, x_2, \ldots, x_n \in A\}.\]
For example, $\mathbb{R}^2$ is the familiar Cartesian plane, and $\mathbb{R}^n$ is often referred to as the $n$-dimensional Euclidean space. If we omit the parentheses and the commas, $\{a, b\}^4$ consists of child babble and a 70s pop band:
\[\{aaaa, baba, abab, baaa, baab, aaab, aaba, abaa, abba, bbaa, bbba, bbbb, aabb, abbb, babb, bbab\}.\]
Proposition 7.1.1. $(A \cup B) \times C = (A \times C) \cup (B \times C)$
Proof. Recall that for two sets $X$ and $Y$, $X = Y$ if and only if $X \subseteq Y$ and $Y \subseteq X$. Consider any element $(u, v) \in (A \cup B) \times C$. By definition, $u \in A \cup B$ and $v \in C$.
Thus, $u \in A$ or $u \in B$. If $u \in A$ then $(u, v) \in A \times C$ and if $u \in B$ then $(u, v) \in B \times C$. Thus $(u, v)$ is in $A \times C$ or in $B \times C$, and $(u, v) \in (A \times C) \cup (B \times C)$. This proves that $(A \cup B) \times C \subseteq (A \times C) \cup (B \times C)$.
Now consider any element $(u, v) \in (A \times C) \cup (B \times C)$. This implies that $(u, v) \in A \times C$ or $(u, v) \in B \times C$. In the first case $u \in A$ and $v \in C$ and in the second case $u \in B$ and $v \in C$. Thus $u \in A \cup B$ and $v \in C$, which implies $(u, v) \in (A \cup B) \times C$.
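Cartesian products of small finite sets can be enumerated directly, which also lets us spot-check Proposition 7.1.1 on examples. A Python sketch (ours, using `itertools.product`):

```python
from itertools import product

# Spot-check Proposition 7.1.1 on small sets.
A, B, C = {1, 2}, {2, 3}, {'x', 'y'}
lhs = set(product(A | B, C))                   # (A ∪ B) × C
rhs = set(product(A, C)) | set(product(B, C))  # (A × C) ∪ (B × C)
assert lhs == rhs

# The "child babble" example: {a, b}^4 has 2^4 = 16 elements.
print([''.join(t) for t in product('ab', repeat=4)])
```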
Given a set $A$, a relation on $A$ is some property that is either true or false for any ordered pair $(x, y) \in A^2$. For example, “greater than” is a relation on $\mathbb{Z}$, denoted by $>$. It is true for the pair $(3, 2)$, but false for the pairs $(2, 2)$ and $(2, 3)$. In more generality,
Definition 7.2.1. Given sets $A$ and $B$, a relation between $A$ and $B$ is a subset of $A \times B$.
By this definition, a relation $R$ is simply a specification of which pairs are related by $R$, that is, which pairs the relation $R$ is true for. For the relation $>$ on the set $\{1, 2, 3\}$,
\[> = \{(2, 1), (3, 1), (3, 2)\}.\]
This notation might look weird because we do not often regard the symbol “$>$” as a meaningful entity in itself. It is, at least from the vantage point of the foundations of mathematics: This symbol is a particular relation.
The common usage of the symbol “$>$” (as in $3 > 2$) is an instance of a useful notational convention: For a relation $R$, $(a, b) \in R$ can also be specified as $aRb$. Thus, in the above example, $(2, 1) \in >$ can be written as $2 > 1$. How convenient!
Common mathematical relations that will concern us include
\[<, >, \leq, \geq, =, \neq, \mid, \equiv_n, \subset, \subseteq, \text{ etc.}\]
For example, the relation $=$ on the set $\mathbb{Z}$ is precisely the set $\{(n, n) : n \in \mathbb{Z}\}$ and the relation $\leq$ on $\mathbb{R}$ is the set $\{(x, x + |y|) : x, y \in \mathbb{R}\}$.
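Viewing a relation literally as a set of ordered pairs is easy to do in code. A Python sketch (ours) of the relation $>$ on $\{1, 2, 3\}$:

```python
# A relation as an explicit set of ordered pairs: ">" on {1, 2, 3}.
A = {1, 2, 3}
greater_than = {(x, y) for x in A for y in A if x > y}
print(greater_than)            # {(2, 1), (3, 1), (3, 2)}
print((3, 2) in greater_than)  # True -- i.e., 3 > 2
```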
A relation $R$ on a set $A$ is called:
(a) reflexive if $aRa$ for all $a \in A$;
(b) symmetric if $aRb$ implies $bRa$ for all $a, b \in A$;
(c) transitive if $aRb$ and $bRc$ imply $aRc$ for all $a, b, c \in A$.
Equivalence relations. A relation that is reflexive, symmetric, and transitive is called an equivalence relation. Clearly, the common relation $=$ on the set $\mathbb{R}$, say, is an equivalence relation. Also, we have seen earlier that the congruence relation $\equiv_n$ on the set $\mathbb{Z}$ is reflexive, symmetric, and transitive, thus it is also an equivalence relation. The similarity relation on the set of triangles in the plane is another example.
Equivalence relations are special in that they naturally partition the underlying set into equivalence classes. For example, the relation $\equiv_2$ partitions the integers into even and odd ones. These are, respectively, the integers that are related (by $\equiv_2$) to 0, and the ones related to 1. Let’s formalize these concepts.
Definition 7.3.1. A partition of a set $A$ is a set $X \subseteq 2^A \setminus \{\emptyset\}$, such that
(a) Each $a \in A$ belongs to some $S \in X$.
(b) If $S, T \in X$, either $S = T$ or $S \cap T = \emptyset$.
Stated differently, this definition says that the set $A$ is the union of the members of $X$, and these members are disjoint. Now, given an equivalence relation $R$ on $A$, the equivalence class of $a \in A$ is defined as
\[R[a] = \{b \in A : aRb\}.\]
Theorem 7.3.2. Let $R$ be an equivalence relation on a set $A$. Then $\{R[a] : a \in A\}$ is a partition of $A$.
Proof. Consider an equivalence relation $R$ on $A$. Due to reflexivity, every element $a \in A$ belongs to $R[a]$, which implies (a). Now, consider two equivalence classes $R[a]$ and $R[b]$. If $aRb$, then for any $c \in R[a]$, by transitivity and symmetry, $bRc$ and $c \in R[b]$. This shows $R[a] \subseteq R[b]$. We can symmetrically argue that $R[b] \subseteq R[a]$, which together implies $R[a] = R[b]$.
Otherwise, if $a \notin R[b]$, consider some $c \in R[a]$. If $c \in R[b]$ then $aRc$ and $bRc$, which imply, by symmetry and transitivity, $aRb$, that is, $a \in R[b]$, a contradiction. Thus no element of $R[a]$ belongs to $R[b]$, so $R[a] \cap R[b] = \emptyset$. This shows (b) and concludes the theorem.
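Theorem 7.3.2 can be watched in action on a small example: computing the classes $R[a]$ of $\equiv_3$ on $\{0, \ldots, 11\}$ produces exactly the partition into three residue classes. A Python sketch (ours):

```python
# Equivalence classes of ≡_3 on {0, ..., 11}, computed directly from
# the definition R[a] = {b : aRb}.
A = range(12)
def related(a: int, b: int) -> bool:
    return (a - b) % 3 == 0        # a ≡ b (mod 3)

classes = {frozenset(b for b in A if related(a, b)) for a in A}
for cls in classes:                # printing order may vary
    print(sorted(cls))
# Three disjoint classes covering A: {0,3,6,9}, {1,4,7,10}, {2,5,8,11}
```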
There are a few ways to define new relations from existing ones, and we describe two important such ways below.
Restrictions of relations. Here is one notion that is sometimes useful: Given a relation $R$ on a set $A$, and a subset $S \subseteq A$, we can use $R$ to define a relation on $S$ called the restriction of $R$ to $S$. Denoted by $R|S$, it is defined as
\[R|S = \{(x, y) \in S \times S : xRy\}.\]
Compositions of relations. For three sets $A$, $B$, $C$, consider a relation $R$ between $A$ and $B$, and a relation $S$ between $B$ and $C$. The composition of $R$ and $S$ is a relation $T$ between $A$ and $C$, defined as follows: $aTc$ if and only if there exists some $b \in B$, such that $aRb$ and $bSc$. The composition of $R$ and $S$ is commonly denoted by $R \circ S$.
Note that by this definition, the composition of relations on the same set $A$ is always well-defined. In particular, given a relation $R$ on $A$ we can recursively define $R^1 = R$ and $R^n = R^{n-1} \circ R$ for all $n \geq 2$. Now consider the infinite union
\[T = \bigcup_{i \in \mathbb{N}^+} R^i.\]
This relation $T$ is called the transitive closure of $R$.
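Both the composition of relations and the transitive closure translate directly into code, with the union stopping as soon as composing contributes no new pairs (by part (c) of Proposition 7.4.1 below, this happens after at most $|A|$ steps). A Python sketch (ours; relations are sets of pairs):

```python
# Composition of relations and the transitive closure.
def compose(R, S):
    """R ∘ S = {(a, c) : aRb and bSc for some b}."""
    return {(a, c) for (a, b1) in R for (b2, c) in S if b1 == b2}

def transitive_closure(R):
    T, power = set(R), set(R)
    while True:
        power = compose(power, R)   # R^{i+1} = R^i ∘ R
        if power <= T:              # no new pairs: all later powers repeat
            return T
        T |= power

R = {(1, 2), (2, 3), (3, 4)}
print(sorted(transitive_closure(R)))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```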
Proposition 7.4.1. Important properties of transitive closure:
(a) $T$ is transitive.
(b) $T$ is the smallest transitive relation that contains $R$. (That is, if $U$ is a transitive relation on $A$ and $R \subseteq U$, then $T \subseteq U$.)
(c) If $|A| = n$ then
\[T = \bigcup_{i=1}^n R^i.\]
The general concept of a function in mathematics is defined very similarly to relations. In fact, as far as the definitions go, functions are relations, of a special type:
Definition 7.5.1. Given two sets $A$ and $B$, a function $f: A \to B$ is a subset of $A \times B$ such that
(a) If $x \in A$, there exists $y \in B$ such that $(x, y) \in f$.
(b) If $(x, y) \in f$ and $(x, z) \in f$ then $y = z$.
A function is sometimes called a map or mapping. The set $A$ in the above definition is the domain and $B$ is the codomain of $f$.
A function $f: A \to B$ is effectively a special kind of relation between $A$ and $B$, which relates every $x \in A$ to exactly one element of $B$. That element is denoted by $f(x)$.
If the above definition is followed rigidly, particular functions should be defined by specifying all the pairs $(x, f(x))$. This is often cumbersome and unnecessary, and we will mostly continue describing a function from $A$ to $B$ as we did before: as a rule for picking an element $f(x) \in B$ for every element $x \in A$. As in, “Consider a function $f: \mathbb{R} \to \mathbb{R}$, where $f(x) = x^2$ for all $x \in \mathbb{R}$.”
Kinds of Functions. For a function $f: A \to B$, the set $f(A) = \{f(x) : x \in A\}$ is called the range of $f$. The range is a subset of the codomain but may be different from it. If $f(A) = B$ then we say that $f$ is onto. More precisely, a function $f: A \to B$ is a surjection (or surjective), or onto, if each element of $B$ is of the form $f(x)$ for at least one $x \in A$.
Today is the day of weird names, so: A function $f: A \to B$ is an injection (or injective), or one-to-one if for all $x, y \in A$, $f(x) = f(y)$ implies $x = y$. Put differently, $f: A \to B$ is one-to-one if each element of $B$ is of the form $f(x)$ for at most one $x \in A$.
As if this wasn’t enough: A function $f: A \to B$ is a bijection (or bijective), or a one-to-one correspondence if it is both one-to-one and onto. Alternatively, $f: A \to B$ is a bijection if each element of $B$ is of the form $f(x)$ for exactly one $x \in A$.
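For finite sets, all three properties can be tested directly from the definitions. A Python sketch (ours), representing a function as a dictionary of pairs $(x, f(x))$:

```python
# Testing the three kinds of functions on small finite sets.
def is_injective(f: dict) -> bool:
    return len(set(f.values())) == len(f)   # no two inputs collide

def is_surjective(f: dict, codomain: set) -> bool:
    return set(f.values()) == codomain      # range equals codomain

def is_bijective(f: dict, codomain: set) -> bool:
    return is_injective(f) and is_surjective(f, codomain)

f = {0: 'a', 1: 'b', 2: 'c'}
print(is_bijective(f, {'a', 'b', 'c'}))        # True
print(is_surjective(f, {'a', 'b', 'c', 'd'}))  # False: 'd' is not hit
```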
Compositions and Inverse Functions. Given two functions $f : A \to B$ and $g : B \to C$, we can define a new function $g \circ f : A \to C$ by $(g \circ f)(x) = g(f(x))$ for all $x \in A$. One useful function that can be defined for any set $A$ is the identity function $i_A : A \to A$, defined by $i_A(x) = x$ for all $x \in A$. We can use identity functions to define inverse functions. Specifically, if $f : A \to B$ is a bijection, then its inverse $f^{-1} : B \to A$ is defined so that $f^{-1} \circ f = i_A$ and $f \circ f^{-1} = i_B$. Of course, we haven’t shown that $f^{-1}$ even exists or that it is unique, but these properties do hold, assuming that $f : A \to B$ is a bijection. (This assumption is necessary.)
Another result that is sometimes used is the following: If $f : A \to B$ and $g : B \to C$ are bijections then $g \circ f : A \to C$ is a bijection, and
\[(g \circ f)^{-1} = f^{-1} \circ g^{-1}.\]
We omit the proof.
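Representing bijections between small finite sets as dictionaries, the inverse is just the dictionary with keys and values swapped, and the identity above can be spot-checked. A Python sketch (ours):

```python
# Inverting bijections and checking (g ∘ f)^{-1} = f^{-1} ∘ g^{-1}.
f = {1: 'a', 2: 'b', 3: 'c'}        # bijection A -> B
g = {'a': 'x', 'b': 'y', 'c': 'z'}  # bijection B -> C

def invert(h: dict) -> dict:
    return {v: k for k, v in h.items()}

g_of_f = {k: g[f[k]] for k in f}    # g ∘ f : A -> C
lhs = invert(g_of_f)                # (g ∘ f)^{-1}
rhs = {c: invert(f)[invert(g)[c]] for c in g_of_f.values()}  # f^{-1} ∘ g^{-1}
assert lhs == rhs
print(lhs)  # {'x': 1, 'y': 2, 'z': 3}
```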
Bijections and Cardinality. Bijections allow us to rigorously define when two sets are of the same cardinality:
Definition 7.5.2. Two sets $A$ and $B$ have the same number of elements if and only if there exists a bijection $f : A \to B$.
A proposition is a statement that is either true or false. For example, “It will rain tomorrow” and “It will not rain tomorrow” are propositions, but “It will probably rain tomorrow” is not, pending a more precise definition of “probably”.
A predicate is a statement that contains a variable, such that for any specific value of the variable, the statement is a proposition. Usually, the allowed values for the variable will come from a specific set, sometimes called the universe of the variable, which will be either explicitly mentioned or clear from context. A simple example of a predicate is $x \geq 2$ for $x \in \mathbb{R}$. Clearly, for any real value of $x$, this statement is either true or false. We denote predicates in a similar way to functions, as in $P(x)$. In fact, the connection to functions runs deep: A predicate $P(x)$ can be considered a function, $P: U \to \{0, 1\}$, where $U$ is the universe of the variable $x$, 1 represents truth, and 0 represents falsehood.
A predicate may have more than one variable, in which case we speak of predicates in two variables, three variables, and so on, denoted as $Q(x, y)$, $S(x, y, z)$, etc.
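Under the functional view, a predicate is just a Boolean-valued function on its universe. A Python sketch (ours), with one predicate in one variable and one in two:

```python
# Predicates as functions into {True, False}.
def P(x: float) -> bool:          # "x ≥ 2", universe R
    return x >= 2

def Q(x: int, y: int) -> bool:    # a two-variable predicate on Z × Z
    return x * x == y

print(P(3.0), P(1.5))    # True False
print(Q(3, 9), Q(3, 8))  # True False
```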
Given a predicate $P(x)$ that is defined for all elements in a set $A$, we can reason about whether $P(x)$ is true for all $x \in A$, or if it’s at least true for some $x \in A$. We can state propositions to this effect using the universal quantifier $\forall$ and the existential quantifier $\exists$.
Given a predicate in more than one variable, we can quantify each (or some) of the variables. For example, the statement “For every real $x$ and $y$, it holds that $x^2 - y^2 = (x - y)(x + y)$” can be formalized as
\[\forall x, y \in \mathbb{R} : x^2 - y^2 = (x - y)(x + y).\]
Somewhat more interestingly, the statement “There is no greatest integer” might be formulated as
\[\forall n \in \mathbb{Z} \, \exists m \in \mathbb{Z} : m > n.\]
It is crucial to remember that the meaning of a statement may change if the existential and universal quantifiers are exchanged. For example, $\exists m \in \mathbb{Z} \, \forall n \in \mathbb{Z} : m > n$ means “There is an integer strictly greater than all integers.” This is not only contrary to the spirit of the original statement, but is patently wrong as it asserts in particular that there is an integer that is strictly greater than itself.
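Over a finite universe, the quantifiers $\forall$ and $\exists$ correspond to Python's `all` and `any`, and exchanging them visibly changes truth values. A sketch (ours; the predicate $x + y = 0$ is chosen for illustration):

```python
# Quantifier order matters.
U = range(-2, 3)   # the universe {-2, -1, 0, 1, 2}

# ∀x ∃y : x + y = 0  -- true: every element has a negative in U
print(all(any(x + y == 0 for y in U) for x in U))   # True

# ∃y ∀x : x + y = 0  -- false: no single y works for every x
print(any(all(x + y == 0 for x in U) for y in U))   # False
```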
Exchanging the order of two quantifiers of the same type (either universal or existential) does not change the truth value of a statement. We do not prove this here.
Given a proposition $P$, the negation of $P$ is the proposition “$P$ is false”. It is true if $P$ is false, and false if $P$ is true. The negation of $P$ is denoted by $\neg P$, read as “not $P$.” If we know the meaning of $P$, such as when $P$ stands for “It will rain tomorrow,” the proposition $\neg P$ can be stated more naturally than “not $P$,” as in “It will not rain tomorrow.” The truth-value of $\neg P$ can be represented by the following truth table:
$P$ | $\neg P$ |
---|---|
true | false |
false | true |
A truth table simply lists the truth values of particular statements in all possible cases. Something interesting can be observed if we consider the truth values of $\neg \neg Q$, which can be obtained by using the above table once with $P = Q$ and once with $P = \neg Q$:
$Q$ | $\neg Q$ | $\neg \neg Q$ |
---|---|---|
true | false | true |
false | true | false |
We see that the statements $Q$ and $\neg \neg Q$ have the same truth values. In this case, we say that the two statements are equivalent, and write $Q \Leftrightarrow \neg \neg Q$. If $A \Leftrightarrow B$, we can freely use $B$ in the place of $A$, or $A$ instead of $B$ in our logical derivations.
Negation gets really interesting when the negated proposition is quantified. Then we can assert that
\[\neg \forall x \in A : P(x) \Leftrightarrow \exists x \in A : \neg P(x)\]
\[\neg \exists x \in A : P(x) \Leftrightarrow \forall x \in A : \neg P(x)\]
These can be interpreted as the claim that if $P(x)$ is not true for all $x \in A$ then it is false for some $x \in A$ and vice versa, and the claim that if $P(x)$ is not false for any $x \in A$ then it is true for all $x \in A$ and vice versa. What this means, in particular, is that if we want to disprove a statement that asserts something for all $x \in A$, it is sufficient to demonstrate one such $x$ for which the statement does not hold. On the other hand, if we need to disprove a statement that asserts the existence of an $x \in A$ with a certain property, we actually need to show that for all such $x$ this property does not hold.
Looked at another way, the above equivalences imply that if we negate a quantified statement, the negation can be “pushed” all the way inside, so that no negated quantifiers are left. Indeed, leaving any negated quantifiers is often considered a mark of poor style. Here is how this elimination is done in a particular example:
\[\neg \forall n \in \mathbb{Z} \, \exists m \in \mathbb{Z} : m > n \Leftrightarrow \exists n \in \mathbb{Z} \, \neg \exists m \in \mathbb{Z} : m > n \Leftrightarrow \exists n \in \mathbb{Z} \, \forall m \in \mathbb{Z} : m \leq n\]
This can be read as “There exists an integer that is greater than or equal to every integer,” which is the proper negation of the original statement.
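On a finite universe, both negation equivalences can be verified mechanically with `all` and `any`. A Python sketch (ours):

```python
# Checking the quantifier-negation laws on a finite universe.
U = range(10)
P = lambda x: x % 2 == 0   # an example predicate: "x is even"

assert (not all(P(x) for x in U)) == any(not P(x) for x in U)
assert (not any(P(x) for x in U)) == all(not P(x) for x in U)
```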
The symbol $\neg$ is an example of a connective. Other connectives combine two propositions (or predicates) into one. The most common are $\land$, $\lor$, $\oplus$, $\rightarrow$ and $\leftrightarrow$. $P \land Q$ is read as “$P$ and $Q$”; $P \lor Q$ as “$P$ or $Q$”; $P \oplus Q$ as “$P$ xor $Q$”; $P \rightarrow Q$ as “$P$ implies $Q$” or “if $P$ then $Q$”; and $P \leftrightarrow Q$ as “$P$ if and only if $Q$.” The truth-value of these compound propositions (sometimes called sentences) depends on the truth values of $P$ and $Q$ (which are said to be the terms of these sentences), in a way that is made precise in the truth-table below.
$P$ | $Q$ | $P \land Q$ | $P \lor Q$ | $P \oplus Q$ | $P \rightarrow Q$ | $P \leftrightarrow Q$ |
---|---|---|---|---|---|---|
T | T | T | T | F | T | T |
T | F | F | T | T | F | F |
F | T | F | T | T | T | F |
F | F | F | F | F | T | T |
We will not concern ourselves much with the $\oplus$ and $\leftrightarrow$ connectives, as they are encountered somewhat less frequently.
One interesting thing about the above table is that the proposition $P \rightarrow Q$ is false only when $P$ is true and $Q$ is false. This is what we would expect: If $P$ is true but $Q$ is false then, clearly, $P$ does not imply $Q$. The important thing to remember is that if $P$ is false, then $P \rightarrow Q$ is true. One way this can be justified is by remembering that we expect a proposition to be either false or true. Now, $P \rightarrow Q$ being false says that $P$ does not imply $Q$, which means precisely that $P$ is true but $Q$ is still false. In all other cases we expect $P \rightarrow Q$ to be true. (Did I succeed in turning something obvious into a confusing mess? Well, we all know what is paved with good intentions…)
Now, there is another statement involving $P$ and $Q$ that is false precisely when $P$ is true and $Q$ is false. It is, of course, $\neg P \lor Q$. As the following truth table demonstrates, the proposition $\neg P \lor Q$ is equivalent to $P \rightarrow Q$:
$P$ | $Q$ | $P \rightarrow Q$ | $\neg P \lor Q$ |
---|---|---|---|
T | T | T | T |
T | F | F | F |
F | T | T | T |
F | F | T | T |
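This equivalence can be confirmed by brute force: transcribe the truth table of $\rightarrow$ row by row and compare it against $\neg P \lor Q$ on all four rows. A Python sketch (ours):

```python
from itertools import product

# P → Q, transcribed directly from its truth table.
IMPLIES = {(True, True): True, (True, False): False,
           (False, True): True, (False, False): True}

for p, q in product([True, False], repeat=2):
    assert IMPLIES[(p, q)] == ((not p) or q)   # matches ¬P ∨ Q
print("P → Q and ¬P ∨ Q agree on all four rows")
```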
This means something rather interesting: We can replace a proposition that involves implication by an equivalent one that instead has negation ($\neg$) and disjunction ($\lor$). Also, since $P \rightarrow Q$ is false only when $P$ is true and $Q$ is false, the proposition $\neg (P \rightarrow Q)$ is equivalent to $P \land \neg Q$:
\[\neg (P \rightarrow Q) \Leftrightarrow P \land \neg Q.\]
This means that in a negated implication, the negation can be “pushed inside,” somewhat like with quantifiers. In fact, similar equivalences exist for other negated compound statements, as can be verified using truth tables (do it!):
\[\neg (P \lor Q) \Leftrightarrow \neg P \land \neg Q\]
\[\neg (P \land Q) \Leftrightarrow \neg P \lor \neg Q\]
These are the famous De Morgan’s laws. What they mean is that we can eliminate negated compounds (sounds like a military operation, doesn’t it?) just as we can eliminate negated quantifiers.
Here is another important logical equivalence: The implication $P \rightarrow Q$ is equivalent to the contrapositive implication $\neg Q \rightarrow \neg P$:
\[(P \rightarrow Q) \Leftrightarrow (\neg Q \rightarrow \neg P).\]
This is demonstrated by the following truth table:
$P$ | $Q$ | $P \rightarrow Q$ | $\neg Q$ | $\neg P$ | $\neg Q \rightarrow \neg P$ |
---|---|---|---|---|---|
T | T | T | F | F | T |
T | F | F | T | F | F |
F | T | T | F | T | T |
F | F | T | T | T | T |
Indeed, an implication of the form “If $P$ then $Q$” is sometimes proved by assuming that $Q$ does not hold and showing that under this assumption $P$ does not hold. This is called a proof by contrapositive. (Despite the similarity, it is different from a proof by contradiction.)
A sentence that is true regardless of the values of its terms is called a tautology, while a statement that is always false is a contradiction. Another terminology says that tautologies are valid statements and contradictions are unsatisfiable statements. A statement that is true for at least one assignment of truth values to its terms is satisfiable; in particular, statements that are neither tautologies nor contradictions can be either true or false, depending on their terms.
Easy examples of a tautology and a contradiction are provided by $P \lor \neg P$ and $P \land \neg P$, as demonstrated by the following truth table:
$P$ | $\neg P$ | $P \lor \neg P$ | $P \land \neg P$ |
---|---|---|---|
T | F | T | F |
F | T | T | F |
Note that by our definition of logical equivalence, all tautologies are equivalent. It is sometimes useful to keep a “special” proposition $T$ that is always true, and a proposition $F$ that is always false. Thus any tautology is equivalent to $T$ and any contradiction is equivalent to $F$.
Here is another tautology: $(P \land Q) \rightarrow P$:
$P$ | $Q$ | $P \land Q$ | $(P \land Q) \rightarrow P$ |
---|---|---|---|
T | T | T | T |
T | F | F | T |
F | T | F | T |
F | F | F | T |
The statement $(P \land Q) \rightarrow P$ is read “$P$ and $Q$ implies $P$.” The fact that this is a tautology means that the implication is always true. Namely, if we know the truth of $P \land Q$, we can legitimately conclude the truth of $P$. In such cases, the symbol $\Rightarrow$ is used, and we can write $(P \land Q) \Rightarrow P$. There is a crucial difference between $(P \land Q) \rightarrow P$ and $(P \land Q) \Rightarrow P$. The former is a single statement, while the latter indicates a relationship between two statements. Such a relationship is called an inference rule. A similar inference rule, $P \Rightarrow P \lor Q$, can be established analogously.
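Since a sentence in two variables has only four possible assignments of truth values, tautologies can be recognized by exhaustive checking. A Python sketch (ours):

```python
from itertools import product

# A brute-force tautology checker for sentences in two variables.
def is_tautology(sentence) -> bool:
    return all(sentence(p, q) for p, q in product([True, False], repeat=2))

print(is_tautology(lambda p, q: (not (p and q)) or p))  # (P ∧ Q) → P: True
print(is_tautology(lambda p, q: (not p) or (p or q)))   # P → (P ∨ Q): True
print(is_tautology(lambda p, q: p or q))                # satisfiable, not valid
```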