Lecture notes for Mathematical Logic I Phil 513 — Kevin C. Klement Spring 2017

CONTENTS

Introduction
   A. The Topic
   B. Metalanguage and Object Language
   C. Set Theory
   D. Mathematical Induction

1 Metatheory for Propositional Logic
   A. The Syntax of Propositional Logic
   B. The Semantics of Propositional Logic
   C. Reducing the Number of Connectives
   D. Axiomatic Systems and Natural Deduction
   E. Axiomatic System L
   F. The Deduction Theorem
   G. Soundness and Consistency
   H. Completeness
   I. Independence of the Axioms

2 Metatheory for Predicate Logic
   A. The Syntax of Predicate Logic
   B. The Semantics of Predicate Logic
   C. Countermodels and Semantic Trees
   D. An Axiom System
   E. The Deduction Theorem in Predicate Logic
   F. Doing without Existential Instantiation
   G. Metatheoretic Results for System PF
   H. Identity Logic

3 Peano Arithmetic and Recursive Functions
   A. The System S
   B. The Quasi-Fregean System F
   C. Numerals
   D. Ordering, Complete Induction and Divisibility
   E. Expressibility and Representability
   F. Primitive Recursive and Recursive Functions
   G. Number Sequence Encoding
   H. Representing Recursive Functions in System S

4 Gödel's Results and their Corollaries
   A. The System ,
   B. System S as its Own Metalanguage
   C. Arithmetization of Syntax
   D. Robinson Arithmetic
   E. Diagonalization
   F. ω-Consistency, True Theories and Completeness
   G. Gödel's First Incompleteness Theorem
   H. Church's Thesis
   I. Löb's Theorem / Gödel's Second Theorem
   J. Recursive Undecidability
   K. Church's Theorem

INTRODUCTION

A. The Topic

This is a course in logical metatheory, i.e., the study of logical systems. It is probably very different from (and considerably harder than) any logic courses you may have taken before. Those courses (such as our Phil 110 and 310) may have involved learning logical systems and the symbols they employ: what they are, how they are used, and how they relate to English. You learned how to construct formal deductions or proofs within certain logical systems. What you were proving was not anything about a logical system; it was instead either not about anything at all (because the problem never told you what the symbols meant in that context), or about some made-up people or things. (E.g., perhaps you had to prove something about some wacky folks named 'Jay' and 'Kay'.)

If you have taken Intermediate Logic you have mastered the "boring" part of logic: classical propositional logic and first-order predicate logic. You may have been exposed to some relatively more advanced and difficult topics: free logic, basic set theory, and if you're lucky, modal logic. However, the more advanced logical systems become, the more controversial they get. For example, I think "free logic" is a philosophical disaster and should be taught only as something to avoid. Obviously, my colleagues don't always agree with me. Philosophers (and others) widely disagree about what the right form of modal logic is. So if you plan on continuing your logical education, it's probably about high time you started thinking about what makes a logical system a good one. Does it need to conform to natural language? Does it need to conform to the metaphysical structure of the world? Does it need to conform to the ordinary reasoning habits of philosophers and mathematicians when they're not self-consciously thinking about logic? These are difficult questions.

Minimally, we can pretty much all agree on the following. For a logical system to be a good one, it has to have the features it was designed to have. For example, the derivation rules for propositional logic you learned in your first logic course were designed so that any argument for which it is possible to construct a derivation of the conclusion from the premises is a valid one according to truth tables. If a system of derivation were set up with this aim but included, along with modus ponens and modus tollens, additionally the inference rule of affirming the consequent, i.e.,

From A ⇒ B and B infer A

clearly, the system would be inadequate, because there would be invalid arguments for which one could construct a derivation.

Logical metatheory is the branch of logic that studies logical systems themselves. In this course, rather than using a logical system to prove things about Jay and Kay, we'll be proving things about logical systems. However, it's best not to start with the controversial ones. People disagree about how to do relevance logic, or deontic logic, or paraconsistent logic, and even whether or not these branches of logic are worth doing at all. These are not the ideal places to begin to learn how to prove things about logical systems; it's best to start at the beginning.

We'll be starting with propositional logic. In our first unit, we'll be proving about our logical system for propositional logic that every deduction possible within it is valid according to truth tables, and conversely, that every argument valid according to truth tables has a corresponding deduction, or in other words, that it is sound and complete. We'll then move on to proving things about first-order predicate logic. Lastly, we'll move on to the logic of mathematics, the basic reasoning patterns involved in mathematics and the basic principles of arithmetic. We'll show how, as the logical system under study gets more complex, so does the apparatus one needs in order to prove things about it. We'll also discover some interesting results about logical systems of a certain sort, specifically that they don't always quite live up to their original intent. For example, we will be studying the attempt made in the late 1890s and early 1900s to fully capture all truths of elementary number theory within a single deductive system, and show that the attempt failed, and even that what they had hoped for is impossible! This is one of the results of Gödel's incompleteness theorems. But let's start at the beginning: metatheory for simple propositional logic.

B. Metalanguage and Object Language

Modern logical systems tend to make use of their own symbolic languages; hence one of the things that gets studied in logical metatheory is the languages of logical systems.

Definition: The object language is the language being studied, or the language under discussion.

Definition: The metalanguage is the language used when studying the object language.

In this course, the object languages will be the symbolic languages of propositional and predicate logic. The metalanguage is English. To be more precise, it is a slightly more technical variant of English than ordinary English. This is because in addition to the symbols of our object language, we'll be adding some technical terms and even some symbols to ordinary English to make our lives easier.

The use/mention distinction

English already has some handy devices that make it a good metalanguage. Specifically it has things like quotation marks that we can use for mentioning an expression as opposed to using it. Kevin is not a name, but "Kevin" is. Many words are verbs, but "verbs" itself is only one word and it is not a verb. This sentence mentions the word "however". This sentence, however, both uses and mentions the word "however". You get the idea.

The logic of the metalanguage

We'll be using the metalanguage to prove things about the object language, and proving anything requires logical vocabulary. Luckily, English has handy words like "all", "or", "and", "not", "if", and it allows us to add new words if we want, like "iff" for "if and only if". Of course, our object languages also have logical vocabularies, and have signs like "⇒", "¬", "∨", "∀". But we'd better restrict those signs to the object language unless we want to get ourselves confused. But we do want our metalanguage to be very clear and precise. For that reason, when we use the word "or", unless noted otherwise, we mean by this the inclusive meaning of "or". Similarly, if we use the phrase "if . . . then . . . " in this class we always mean the material conditional unless stated otherwise. (This makes our metalanguage slightly more precise than ordinary English.) The same sorts of logical inferences that apply in the object language also apply in the metalanguage. So

If blah blah blah then yadda yadda. Blah blah blah. Therefore, yadda yadda.

. . . is a valid inference form. You have to use logic to study logic. There's no getting away from it. However, I'm not going to bother stating all the logical rules that are valid in the metalanguage, since I'd need to do that in the metametalanguage, and that would just get me started on an infinite regress. The rule of thumb is: if it's OK in the object language, it's OK in the metalanguage too.

Metalinguistic variables

Ordinary English doesn't really use variables, but they make our lives a lot easier. Since the metalanguage is usually used in this course to discuss the object language, the variables we use most often in the metalanguage are variables that are used to talk about all or some expressions of the object language. Especially when we get to predicate logic, where the object language itself contains variables, again, we don't want to get the variables of the object language confused with those of the metalanguage. Since predicate logic uses letters like 'x' and 'y' as variables in the object language, it is important to be clear which language a variable is part of. This can be done by making the metalanguage's variables distinctive. For example, I use fancy script letters like 'A' and 'B' in the metalanguage to mean any object-language expression of a certain specified type. For example, I might write something like:

If A is a sentence of predicate logic, then A contains no variables not bound by a quantifier.

Notice that, in that statement, the variable 'A' is used, not mentioned. The letter 'A' is not itself used in predicate logic, and contains no variables bound or free. It's something I use in the metalanguage in place of mentioning a particular object language expression. So A might be "F a" or it might be "(∀x)(F x ⇒ Gx)", etc.

A typical use of these is to represent any object language expression or set of expressions matching certain patterns. This happens for example in stating the inference rules of the object language. Just look at the lists of rules you used when learning logic. Whichever book you used, modus ponens didn't look (or shouldn't have looked) like:

P ⇒ Q
P
Q

Instead, it looked something like:

A ⇒ B
A
B

Why? Well, if you used the object-language version, modus ponens would only apply when the antecedent is 'P' and the consequent is 'Q', and so the following wouldn't have counted as an instance of the rule:

(S ∨ T) ⇒ R
S ∨ T
R

The only way to get the rule to cover an infinite number of possible cases is to state it schematically, i.e., using variables of the metalanguage to describe any object language expressions of certain forms. Hence, variables in the metalanguage used in this way are called schematic letters. In your homework and exams, you may prefer to use Greek letters instead of script letters, which may be easier to draw in a more distinctive way. You may do whatever you wish provided I can tell the difference between object language and metalanguage variables. Schematic letters will be used every single day in this class. Better make friends with them quick.
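To see concretely why rules must be stated schematically, here is a minimal Python sketch of the idea. (This is my illustration only; the nested-tuple representation of wffs and the function name are assumptions of the sketch, not anything from Mendelson's book.) The rule is written as a pattern over arbitrary wffs, so it applies to complex instances just as well as simple ones:

```python
# Wffs as nested tuples: a statement letter is a string like 'P';
# ('not', A) and ('imp', A, B) build complex wffs.

def modus_ponens(conditional, minor):
    """Schematic modus ponens: from A 'imp' B and A, infer B,
    for ANY wffs A and B, not just particular letters."""
    if isinstance(conditional, tuple) and conditional[0] == 'imp':
        antecedent, consequent = conditional[1], conditional[2]
        if minor == antecedent:
            return consequent
    raise ValueError("modus ponens does not apply")

# A simple instance and a complex one, both covered by one schema:
print(modus_ponens(('imp', 'P', 'Q'), 'P'))            # Q
print(modus_ponens(('imp', ('or', 'S', 'T'), 'R'),
                   ('or', 'S', 'T')))                  # R
```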

C. Set Theory

Generally, in order to do logical metatheory for a given logical system, the logical apparatus of the metalanguage has to be at least as complex as, and usually more complex than, that of the object language. So in order to do metatheory for propositional and predicate logic, we'll need something stronger, and in particular, we'll need some set theory. Note that this course is not a course on set theory; we're not going to be studying logical systems for set theory. Instead, we're going to presuppose or use some set-theoretical notation in our metalanguage, i.e., English. Therefore, you should think of all the signs and variables in this section as an expansion of English. This semester at least, set theory will be something we use when we study propositional and predicate logic; not something we are studying. This means that we can be relatively informal about it. This is good because the exact rules and principles of set theory are still controversial. There are different systems, e.g., "ZF set theory", "NBG set theory", "the theory of types", and so on. Luckily we don't need to get into those details, because all we'll need for this course is the rudiments they all share.

Sets

Definition: A set is a collection of entities for which it is determined, for every entity of a given type, that the entity either is or is not included in the set.

Definition: An urelement is a thing that is not a set.

Definition: An entity A is a member of set Γ iff it is included in that set.

We write this as: "A ∈ Γ". We write "A ∉ Γ" to mean that A is not a member of Γ. Sets are determined entirely by their members: for sets Γ and ∆, Γ = ∆ iff for all A, A ∈ Γ iff A ∈ ∆.

Definition: A singleton or unit set is a set containing exactly one member.

"{A}" means the set containing A alone. Generally, "{A1, . . . , An}" means the set containing all of A1, . . . , An, but nothing else. The members of sets are not ordered, so from {A, B} = {C, D} one cannot infer that A = C, only that either A = C or A = D.

Definition: If ∆ and Γ are sets, ∆ is said to be a subset of Γ, written "∆ ⊆ Γ", iff all members of ∆ are members of Γ; and ∆ is said to be a proper subset of Γ, written "∆ ⊂ Γ", iff all members of ∆ are members of Γ, but not all members of Γ are members of ∆.

Definition: If ∆ and Γ are sets, the union of ∆ and Γ, written "∆ ∪ Γ", is the set that contains everything that is a member of either ∆ or Γ.

Definition: The intersection of ∆ and Γ, written "∆ ∩ Γ", is the set that contains everything that is a member of both ∆ and Γ.

Definition: The relative complement of ∆ and Γ, written "∆ − Γ", is the set containing all members of ∆ that are not members of Γ.

Definition: The empty set or null set, written "∅", "Λ" or "{ }", is the set with no members.

Definition: If Γ and ∆ are sets, then they are disjoint iff they have no members in common, i.e., iff Γ ∩ ∆ = ∅.

Ordered n-tuples and relations

Definition: An ordered n-tuple, written "⟨A1, . . . , An⟩", is something somewhat like a set, except that the elements are given a fixed order, so that ⟨A1, . . . , An⟩ = ⟨B1, . . . , Bn⟩ iff Ai = Bi for all i such that 1 ≤ i ≤ n.

An ordered 2-tuple, e.g., ⟨A, B⟩, is also called an ordered pair. An entity is identified with its 1-tuple.

Definition: If Γ and ∆ are sets, then the Cartesian product of Γ and ∆, written "Γ × ∆", is the set of all ordered pairs ⟨A, B⟩ such that A ∈ Γ and B ∈ ∆.

Generally, "Γⁿ" is used to represent the set of all ordered n-tuples consisting entirely of members of Γ. Notice that Γ² = Γ × Γ.

The following definition is philosophically problematic, but a common way of speaking in mathematics.

Definition: An n-place relation (in extension) on set Γ is any subset of Γⁿ.

A 2-place relation is also called a binary relation. Binary relations are taken to be sets of ordered pairs. A 1-place relation is also called (the extension of) a property.

Definition: If R is a binary relation, then the domain of R is the set of all A for which there is a B such that ⟨A, B⟩ ∈ R.

Definition: If R is a binary relation, the range of R is the set of all B for which there is an A such that ⟨A, B⟩ ∈ R.

Definition: The field of R is the union of the domain and range of R.

Definition: If R is a binary relation, R is reflexive iff ⟨A, A⟩ ∈ R for all A in the field of R.

Definition: If R is a binary relation, R is symmetric iff for all A and B in the field of R, ⟨A, B⟩ ∈ R only if ⟨B, A⟩ ∈ R.

Definition: If R is a binary relation, R is transitive iff for all A, B and C in the field of R, if ⟨A, B⟩ ∈ R and ⟨B, C⟩ ∈ R then ⟨A, C⟩ ∈ R.

Definition: A binary relation R is an equivalence relation iff R is symmetric, transitive and reflexive.

Definition: If R is an equivalence relation, then the R-equivalence class on A, written "[A]R", is the set of all B such that ⟨A, B⟩ ∈ R.

Functions

Definition: A function (in extension) is a binary relation which, for all A, B and C, if it includes ⟨A, B⟩ then it does not also include ⟨A, C⟩ unless B = C.

So if F is a function and A is in its domain, then there is a unique B such that ⟨A, B⟩ ∈ F; this unique B is denoted by "F(A)".

Definition: An n-place function is a function whose domain consists of n-tuples. For such a function, we write "F(A1, . . . , An)" to abbreviate "F(⟨A1, . . . , An⟩)".

Definition: An n-place operation on Γ is a function whose domain is Γⁿ and whose range is a subset of Γ.

Definition: If F is a function, then F is one-one iff for all A and B in the domain of F, F(A) = F(B) only if A = B.

Cardinal numbers

Definition: If Γ and ∆ are sets, then they are equinumerous, written "Γ ≅ ∆", iff there is a one-one function whose domain is Γ and whose range is ∆.

Definition: Sets Γ and ∆ have the same cardinality or cardinal number if and only if they are equinumerous.

Definition: If Γ and ∆ are sets, then the cardinal number of Γ is said to be smaller than the cardinal number of ∆ iff there is a set Z such that Z ⊆ ∆ and Γ ≅ Z, but there is no set W such that W ⊆ Γ and W ≅ ∆.

Definition: If Γ is a set, then Γ is denumerable iff Γ is equinumerous with the set of natural numbers {0, 1, 2, 3, 4, . . . (and so on ad inf.)}.

Definition: Aleph null, also known as aleph naught, written "ℵ0", is the cardinal number of any denumerable set.

Definition: If Γ is a set, then Γ is finite iff either Γ = ∅ or there is some positive integer n such that Γ is equinumerous with the set {1, . . . , n}.

Definition: A set is infinite iff it is not finite.

Definition: A set is countable iff it is either finite or denumerable.

Homework
Assuming that Γ, ∆ and Z are sets, R is a relation, F is a function, and A and B are any entities, informally verify the following (a few of these are spot-checked computationally in the sketch after this list):
(1) A ∈ {B} iff A = B
(2) if Γ ⊆ ∆ and ∆ ⊆ Z then Γ ⊆ Z
(3) if Γ ⊆ ∆ and ∆ ⊆ Γ then Γ = ∆
(4) (Γ ∪ ∆) ∪ Z = Γ ∪ (∆ ∪ Z)
(5) (Γ ∩ ∆) ∩ Z = Γ ∩ (∆ ∩ Z)
(6) Γ ∩ ∅ = ∅ and Γ ∪ ∅ = Γ
(7) Γ − Γ = ∅
(8) (Γ ∩ ∆) ∪ (Γ − ∆) = Γ
(9) Γ¹ = Γ
(10) If R is an equivalence relation, then ([A]R = [B]R iff ⟨A, B⟩ ∈ R) and (if [A]R ≠ [B]R then [A]R and [B]R are disjoint).
(11) Addition can be thought of as a 2-place operation on the set of natural numbers.
(12) Γ ≅ Γ
(13) The set of even non-negative integers is denumerable.
(14) The set of all integers, positive and negative, is denumerable.
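Since we use set theory only informally, nothing stops us from spot-checking some of these identities on small finite examples. Here is a quick Python sketch (my own illustration; checks on a few finite samples are of course no substitute for the informal verification the homework asks for):

```python
G = {1, 2, 3}        # Γ
D = {2, 3, 4}        # ∆
Z = {3, 4, 5}

assert (G | D) | Z == G | (D | Z)                 # (4) union is associative
assert (G & D) & Z == G & (D & Z)                 # (5) intersection too
assert G & set() == set() and G | set() == G      # (6)
assert G - G == set()                             # (7)
assert (G & D) | (G - D) == G                     # (8)

# Toward (13): n -> 2n is a one-one map from the naturals onto the even
# non-negative integers; spot-check one-oneness on an initial segment.
evens = [2 * n for n in range(1000)]
assert len(set(evens)) == len(evens)
print("all sample checks passed")
```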

D. Mathematical Induction

We'll also be expanding the logic of the metalanguage by allowing ourselves the use of mathematical induction, a powerful tool of mathematics.

Definition: The principle of mathematical induction states the following: If (φ is true of 0), then if (for all natural numbers n, if φ is true of n, then φ is true of n + 1), then φ is true of all natural numbers.

To use the principle of mathematical induction to arrive at the conclusion that something is true of all natural numbers, one needs to prove the two antecedents, i.e.:

Base step. φ is true of 0.
Induction step. For all natural numbers n, if φ is true of n, then φ is true of n + 1.

Typically, the induction step is proven by means of a conditional proof in which it is assumed that φ is true of n, and from this assumption it is shown that φ must be true of n + 1. In the context of this conditional proof, the assumption that φ is true of n is called the inductive hypothesis.

From the principle of mathematical induction, one can derive a related principle:

Definition: The principle of complete (or strong) induction states that: If (for all natural numbers n, whenever φ is true of all numbers less than n, φ is also true of n) then φ is true of all natural numbers.

In this class, we rarely use these principles in the metalanguage. Instead, we use some corollaries that come in handy in the study of logical systems. Mendelson does not give these principles special names, but I will.

Definition: The principle of wff induction states that: For a given logical language, if φ holds of the simplest well-formed formulas (wffs) of that language, and φ holds of any complex wff provided that φ holds of those simpler wffs out of which it is constructed, then φ holds of all wffs.

This principle is often used in logical metatheory. It is a corollary of mathematical induction. Actually, it is a version of it. Let φ′ be the property a number has if and only if all wffs of the logical language having that number of logical operators have φ. If φ is true of the simplest well-formed formulas, i.e., those that contain zero operators, then 0 has φ′. Similarly, if φ holds of any wffs that are constructed out of simpler wffs provided that those simpler wffs have φ, then whenever a given natural number n has φ′ then n + 1 also has φ′. Hence, by mathematical induction, all natural numbers have φ′, i.e., no matter how many operators a wff contains, it has φ. In this way wff induction simply reduces to mathematical induction.

Similarly, this principle is usually utilized by proving the antecedents, i.e.:

Base step. φ is true of the simplest well-formed formulas (wffs) of that language; and
Induction step. φ holds of any wffs that are constructed out of simpler wffs provided that those simpler wffs have φ.

Again, the assumption made when establishing the induction step that φ holds of the simpler wffs is called the inductive hypothesis.

We'll also be using:

Definition: The principle of proof induction: In a logical system that contains derivations or proofs, if φ is true of a given step of the proof whenever φ is true of all previous steps of the proof, then φ is true of all steps of the proof.

The principle of proof induction is an obvious corollary of the principle of complete induction. The steps in a proof can be numbered; we're just applying complete induction to those numbers.

Homework
Answer all of these we don't get to in class:
(1) Let φ be the property a number x has just in case the sum of all numbers leading up to and including x is x(x+1)/2. Use the principle of mathematical induction to show that φ is true of all natural numbers.
(2) Let φ be the property a number x has just in case it is either 0 or 1 or it is evenly divisible by a prime number greater than 1. Use the principle of complete induction to show that φ is true of all natural numbers.
(3) Let φ be the property a wff A of propositional logic has if and only if it has an even number of parentheses. Use the principle of wff induction to show that φ holds of all wffs of propositional logic. (If needed, consult the next page for a definition of a wff in propositional logic.)
(4) Consider a logical system for propositional logic that has only one inference rule: modus ponens. Use the principle of proof induction to show that every line of a proof in this system is true if the premises are true.
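As a sanity check on homework problem (1), and on how base and induction steps fit together, here is a small Python sketch. (My own illustration; a numerical check on an initial segment is not the inductive proof the problem asks for, it only confirms we are not trying to prove something false.)

```python
def phi(x):
    """Does 0 + 1 + ... + x equal x(x+1)/2?"""
    return sum(range(x + 1)) == x * (x + 1) // 2

assert phi(0)                        # base step: phi is true of 0

# The arithmetic behind the induction step: if 0+...+n = n(n+1)/2,
# then adding n+1 gives n(n+1)/2 + (n+1) = (n+1)(n+2)/2.
for n in range(1000):
    assert n * (n + 1) // 2 + (n + 1) == (n + 1) * (n + 2) // 2

assert all(phi(n) for n in range(1000))
print("phi holds for all n < 1000")
```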

UNIT 1 METATHEORY FOR PROPOSITIONAL LOGIC

A. The Syntax of Propositional Logic

We finally turn to our discussion of the logical metatheory for propositional logic (also known as sentential logic). In particular, we shall limit our study to classical (bivalent) truth-functional propositional logic. We first sketch the make-up of the object language under study. The syntax of a language consists of the rules governing how its expressions can and cannot be combined. The basic building blocks are statement letters, connectives and parentheses.

Definition: A statement letter is any uppercase letter of the Roman alphabet written with or without a numerical subscript.

Examples: 'A', 'B', 'P', 'Q1', 'P13', and 'N123' are all statement letters. The numerical subscripts are used in case we would ever need to deal with more than 26 simple statements at once. Hence 'P1' and 'P2' are counted as different statement letters.

Definition: A propositional connective is any of the signs '¬', '⇒', '⇔', '∧' and '∨'.

Definition: A well-formed formula* (abbreviated wff) is defined recursively as follows:
(i) any statement letter is a wff;
(ii) if A is a wff then so is ¬A;¹
(iii) if A and B are wffs then so is (A ⇒ B);
(iv) if A and B are wffs then so is (A ⇔ B);
(v) if A and B are wffs, then so is (A ∨ B);
(vi) if A and B are wffs, then so is (A ∧ B);
(vii) nothing that cannot be constructed by repeated applications of the above steps is a wff.

* The above definition is provisional; we shall later amend it.

This tells us everything we need to know about the "syntax" or "grammar" of propositional logic.

You may be familiar with a slightly different notation. I am sticking with the book.

                         Mendelson's sign    Alternatives
Negation                 ¬                   ∼, −
Conjunction              ∧                   &, •
Disjunction              ∨                   +
Material conditional     ⇒                   →, ⊃
Material biconditional   ⇔                   ↔, ≡

Feel free to use whatever signs you prefer. I might not even notice.

¹ Here we are not really using the phrase "¬A", since this definition is in the metalanguage and '¬' is not part of English. Nor, however, are we mentioning it, since 'A' is not a part of the object language. Really we should be using special "quasi-quotation" marks, also known as Quine corners, where ⌜¬A⌝ is the object language expression formed by concatenating '¬' to whatever expression A is. Although imprecise, I forgo Quine corners and rely just on context, to avoid a morass of these marks, and to allow for another use of the same notation Mendelson uses in chap. 3.
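A recursive definition like this one translates directly into a recursive procedure. Here is a Python sketch of a wff-checker for the (provisional) five-connective language; the nested-tuple representation and names are my own illustrative choices:

```python
import re

CONNECTIVES = {'not': 1, 'imp': 2, 'iff': 2, 'or': 2, 'and': 2}

def is_statement_letter(e):
    # An uppercase Roman letter, with or without a numerical subscript.
    return isinstance(e, str) and re.fullmatch(r'[A-Z][0-9]*', e) is not None

def is_wff(e):
    if is_statement_letter(e):                          # clause (i)
        return True
    if isinstance(e, tuple) and e and e[0] in CONNECTIVES:
        if len(e) == CONNECTIVES[e[0]] + 1:             # clauses (ii)-(vi)
            return all(is_wff(part) for part in e[1:])
    return False                                        # clause (vii)

assert is_wff(('imp', 'P1', ('not', 'Q')))   # (P1 ⇒ ¬Q)
assert not is_wff(('imp', 'P'))              # a conditional needs two wffs
assert not is_wff('p')                       # lowercase: not a statement letter
```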

Parentheses Conventions

The chart above also gives the ranking of the connectives used when omitting parentheses. Sometimes when a wff gets really complicated, it's easier to leave off some of the parentheses. Because this leads to ambiguities, we need conventions regarding how to read them. When parentheses are omitted, and it is unclear which connective has greater scope, the operator nearer the top of the list above should be taken as having narrower scope, and the operator nearer the bottom of the list should be taken as having wider scope. For example:

A ⇒ B ∨ C

is an abbreviation of:

(A ⇒ (B ∨ C))

whereas:

A ⇒ B ⇔ C

is an abbreviation of:

((A ⇒ B) ⇔ C)

When the operators are the same, the convention is association to the left, i.e., the leftmost occurrence is taken to have narrowest scope. So

A ⇒ B ⇒ C

is an abbreviation of:

((A ⇒ B) ⇒ C)

Obviously, for '∨' and '∧', this last convention is less important, since (A ∨ B) ∨ C is logically equivalent with A ∨ (B ∨ C), and similarly, (A ∧ B) ∧ C is equivalent with A ∧ (B ∧ C). Sometimes parentheses cannot be left off without making the wff mean something else:

A ⇒ (B ⇔ C)

cannot be written

A ⇒ B ⇔ C.

B. The Semantics of Propositional Logic

To give a semantic theory for a language is to specify the rules governing the meanings of the expressions of that language. In truth-functional propositional logic, however, nothing regarding the meaning of the statement letters over and above their truth or falsity is relevant for determining the truth conditions of the complex wffs in which they appear. Moreover, the meanings of the connectives are thought to be exhausted by the rules governing how the truth-value of the wffs they are used to construct depends on the truth values of the statement letters out of which they are constructed. In short, everything relevant to the logical semantics of a wff of propositional logic is given by its truth table. I assume you already know how to construct truth tables. E.g.:

(P ∨ Q) ∨ ¬ (Q ⇒ (P ⇔ Q))
 T T T  T F  T T  T T T
 T T F  T F  F T  T F F
 F T T  T T  T F  F F T
 F F F  F F  F T  F T F

Roughly, this shows us everything we need to know about the meaning of the wff "(P ∨ Q) ∨ ¬(Q ⇒ (P ⇔ Q))". To get serious with our study, we need a number of definitions.

Definition: A truth-value assignment is any function whose domain is the statement letters of propositional logic, and whose range is a nonempty subset of the truth values {TRUTH, FALSITY} (T and F for short).

Informally, each row of a truth table represents a different truth-value assignment. Each row represents a different possible assignment of truth values to the statement letters making up the wff or wffs in question.

In virtue of the way it is constructed out of truth-functional connectives, every wff is determined to be either true or false (and not both) for any given truth-value assignment to the statement letters making it up.

The truth value of a statement for a given truth-value assignment is represented in the final column of a truth table, underneath its main connective.

Definition: A wff is a tautology iff it is true for every possible truth-value assignment to its statement letters.

The wff "(P ∨ Q) ∨ ¬(Q ⇒ (P ⇔ Q))" is not a tautology, because it is false for the truth-value assignment that makes both 'P' and 'Q' false, despite the fact that all the other truth-value assignments make it true. However, the wff "(P ∨ Q) ∨ (Q ⇒ (P ⇔ Q))" is a tautology, because it is true for every truth-value assignment, i.e., on every row of a truth table.

Abbreviation: The notation:

⊨ A

means that A is a tautology.

Definition: A wff is a self-contradiction iff it is false for every possible truth-value assignment.

Definition: A wff is contingent iff it is true for some possible truth-value assignments and false for others.

Definition: A wff A is said to logically imply a wff B iff there is no possible truth-value assignment to the statement letters making them up that makes A true and B false.

Abbreviation: The notation:

A ⊨ B

means that A logically implies B.

Note that this sign is part of the metalanguage; it is an abbreviation of the English words ". . . logically implies . . . ". The sign '⊨' is not used in the object language. So "A ⊨ (B ⊨ D)" and "A ⇒ (B ⊨ C)" are nonsense.

Definition: Two wffs A and B are logically equivalent if and only if every possible truth-value assignment to the statement letters making them up gives them the same truth value.

Abbreviation: The notation

A ⟚ B

is used to mean that A and B are logically equivalent.

Definition: If Γ is a set of wffs and A is a wff, then A is a logical consequence of Γ if and only if there is no truth-value assignment to the statement letters making up the wffs in Γ and A that makes every member of Γ true but makes A false.

To say that A is a logical consequence of Γ is the same as saying that an argument with the members of Γ as its premises and A as its conclusion is valid by truth tables.

Abbreviation: The notation:

Γ ⊨ A

is used to mean that A is a logical consequence of Γ.

These four uses of the sign "⊨" are related in intuitive ways. A tautology can in effect be thought of as something that is true in virtue of logic alone, or the conclusion of a logically valid argument that begins without any premises at all! I.e., "⊨ A" means the same as "∅ ⊨ A".

Definition: Two wffs A and B are said to be consistent or mutually satisfiable if and only if there is at least one truth-value assignment to the statement letters making them up that makes both A and B true.
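All of these definitions quantify over truth-value assignments, and for wffs with finitely many statement letters the assignments can simply be enumerated. Here is a Python sketch of that idea (my own illustration, reusing the nested-tuple wff representation from the earlier sketches):

```python
from itertools import product

OPS = {
    'not': lambda a: not a,
    'and': lambda a, b: a and b,
    'or':  lambda a, b: a or b,
    'imp': lambda a, b: (not a) or b,
    'iff': lambda a, b: a == b,
}

def letters(wff):
    return {wff} if isinstance(wff, str) else set().union(*map(letters, wff[1:]))

def value(wff, v):
    """Truth value of wff under the truth-value assignment v (a dict)."""
    if isinstance(wff, str):
        return v[wff]
    return OPS[wff[0]](*(value(p, v) for p in wff[1:]))

def assignments(wffs):
    ls = sorted(set().union(*map(letters, wffs)))
    for vals in product([True, False], repeat=len(ls)):
        yield dict(zip(ls, vals))

def is_tautology(a):
    return all(value(a, v) for v in assignments([a]))

def implies(a, b):   # A logically implies B
    return all(value(b, v) for v in assignments([a, b]) if value(a, v))

# The two examples from the text:
w1 = ('or', ('or', 'P', 'Q'), ('not', ('imp', 'Q', ('iff', 'P', 'Q'))))
w2 = ('or', ('or', 'P', 'Q'), ('imp', 'Q', ('iff', 'P', 'Q')))
assert not is_tautology(w1)   # false when 'P' and 'Q' are both false
assert is_tautology(w2)
# And a preview of the first Result below:
assert implies(w1, w2) == is_tautology(('imp', w1, w2))
```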

It is time to get our first practice proving things in the metalanguage. Again, we're going to use English to prove something about the logical language of propositional logic. We can be somewhat informal about the logical structure of our proof, since we haven't fully laid out a deductive system for the metalanguage. But it's usually best to number the steps of the proof just like an object language deduction and be as clear as possible about how the proof works. Here's what we're going to prove:

Result: For any wffs A and B, A ⊨ B iff ⊨ (A ⇒ B). (Logical implication is equivalent with tautologyhood of material implication.)

What we're proving is a biconditional; in particular, we're proving that one statement logically implies another iff the corresponding object language conditional statement is a tautology. We'll prove this biconditional using the same strategy we'd use if we were going to prove a biconditional in some object language deduction system. In particular, we prove the conditional going one way, and then the other. So the proof goes like this:

Proof:
(1) Assume that A ⊨ B. We need to show that (A ⇒ B) is a tautology.
(2) Suppose for reductio ad absurdum (indirect proof) that (A ⇒ B) is not a tautology. This means that there is some truth-value assignment that does not make (A ⇒ B) true. It must make (A ⇒ B) false.
(3) According to the truth table rules for the sign '⇒', this means it must make A true and B false.
(4) However, this contradicts the assumption that A ⊨ B, since that rules out any truth-value assignment making A true and B false.
(5) Our supposition at line (2) must be mistaken, and so ⊨ (A ⇒ B) after all.
Lines (1)–(5) represent a "conditional proof" in the metalanguage that if A ⊨ B then ⊨ (A ⇒ B). We need to go the other way as well.
(6) Assume that ⊨ (A ⇒ B).
(7) Assume for reductio ad absurdum that it is not true that A ⊨ B. This means that there is at least one truth-value assignment that makes A true but B false.
(8) Since there is at least one truth-value assignment that makes A true and B false, there is at least one truth-value assignment that makes (A ⇒ B) false, given the rules for constructing truth tables for the sign '⇒'.
(9) However, this contradicts our assumption at line (6). Hence, A ⊨ B after all.
Lines (6)–(9) represent a "conditional proof" that if ⊨ (A ⇒ B) then A ⊨ B. Putting the two together we get that A ⊨ B iff ⊨ (A ⇒ B). This is what we were aiming to prove. □

(Here on out, I use "□" to demarcate the end of a proof in the metalanguage.)

Be careful about not mixing up the object language and the metalanguage. Assuming that "not ⊨ A" is not the same as assuming that "⊨ ¬A". After all, if A is contingent, neither it nor its negation is a tautology. The sign "¬" should never be used for negation in the metalanguage, nor "⇔" used instead of "iff", etc. If you wish, you can write "⊭ A" to mean "not-⊨ A", but never "¬⊨ A". That's not even meaningful!

Result: A ⟚ B iff ⊨ (A ⇔ B).

Proof: Similar to previous example. □

Result: For any wffs A and B, if ⊨ A and ⊨ (A ⇒ B) then ⊨ B.

First, to be clear about what we're doing, we're not proving that modus ponens is a valid reasoning form. That would be to prove that

{(A ⇒ B), A} ⊨ B

The above is true, and easily proven, but it's not what we're after. Instead, we're proving something a bit stronger, namely that modus ponens preserves tautologyhood, i.e., that if both A and A ⇒ B are tautologies, then B is a tautology as well.

Proof: What we're proving is a conditional. We assume the antecedent and attempt to prove the consequent.
(1) Assume that ⊨ A and ⊨ (A ⇒ B).
(2) This means that both A and (A ⇒ B) are tautologies, i.e., that every possible truth-value assignment to the statement letters making them up makes them true.
(3) Suppose for reductio ad absurdum that there were some truth-value assignment (row of a truth table) making B false.
(4) Notice that because every truth-value assignment makes (A ⇒ B) true, if it makes B false it must make A false as well.
(5) From lines (3) and (4) we get the result that there is a truth-value assignment making A false.
(6) However, it follows from line (2) that no truth-value assignment makes A false.
(7) Lines (5) and (6) are a contradiction, and so our assumption at line (3) is false, and so ⊨ B.
(8) Therefore, by conditional proof, if ⊨ A and ⊨ (A ⇒ B) then ⊨ B. □

In your book, there are also proofs of the following:

Result: If ⊨ A, and B is the result of (uniformly) replacing certain statement letters in A by complex wffs, then ⊨ B.

Result: If A is a wff containing wff C in one or more places, and B is just like A except containing wff D in those places where A contains C, then if C ⟚ D then A ⟚ B.

C. Reducing the Number of Connectives

When working within a given language, usually the more complex it is, the easier it is to say what you want, because you have more vocabulary in which to say it. However, when you're trying to prove something about the language, it's usually easier if it is simpler, because the more complex the language is, the more there is to say about it. When doing logical metatheory, it's usually to our advantage to whittle down our object language (and the logical calculi we develop in it) to as small as possible. To that end, we ask, do we really need all five connectives (¬, ⇒, ⇔, ∧ and ∨)? After all, our object language is not inadequate in any way by not including a sign for the exclusive sense of disjunction, since we can represent it using other signs, e.g., as (A ∨ B) ∧ ¬(A ∧ B) or ¬(A ⇔ B), etc. And no, we don't need all five of the ones we have. First we'll show that we could get by with just three, and later two, and finally one.

Result (Adequate Connectives): Every possible truth function can be represented by means of the connectives '∧', '∨' and '¬' alone.

Proof: We'll prove this somewhat informally.
(1) Assume that A is some wff built using any set of truth-functional connectives, including, if you like, connectives other than our five. (A might make use of some three or four-place truth-functional connectives, or connectives such as the exclusive or, or any others you might imagine for bivalent logic.)
(2) What we're going to show is that there is a wff B formed only with the connectives '∧', '∨' and '¬' that is logically equivalent with A.
(3) In order for it to be logically equivalent to A, the wff B that we construct must have the same final truth value for every possible truth-value assignment to the statement letters making up A, or in other words, it must have the same final column in a truth table.
(4) Let P1, P2, . . . , Pn be the distinct statement letters making up A. For some possible truth-value assignments to these letters, A may be true, and for others A may be false. The only hard case would be the one in which A is contingent. Clearly tautologies and self-contradictions can be constructed with the signs '∧', '∨' and '¬', and all tautologies are logically equivalent to one another, and all self-contradictions are equivalent to one another, so in those cases, our job is easy. Let us suppose instead that A is contingent.
(5) Let us construct a wff B in the following way.
a) Consider in turn each possible truth-value assignment to the letters P1, P2, . . . , Pn. For each truth-value assignment, construct a conjunction made up of those letters the truth-value assignment makes true, along with the negations of letters the truth-value assignment makes false.
Example: Suppose the letters involved are 'A', 'B' and 'C'. This means that there are eight possible truth-value assignments, corresponding to the eight rows of a truth table. We construct an appropriate conjunction for each.

A B C   Conjunction
T T T   A ∧ B ∧ C
T T F   A ∧ B ∧ ¬C
T F T   A ∧ ¬B ∧ C
T F F   A ∧ ¬B ∧ ¬C
F T T   ¬A ∧ B ∧ C
F T F   ¬A ∧ B ∧ ¬C
F F T   ¬A ∧ ¬B ∧ C
F F F   ¬A ∧ ¬B ∧ ¬C

b) From the resulting conjunctions, form a complex disjunction formed from those conjunctions formed in step a) for which the corresponding truth-value assignment makes A true.
Example: Suppose for the example above that the final column of the truth table for A is as follows (just at random):

A B C   A
T T T   T
T T F   T
T F T   F
T F F   F
F T T   T
F T F   F
F F T   F
F F F   F

This means that we form a disjunction using as the disjuncts those conjunctions formed in step a) for those rows that make A true. The others are left out. In this case:

(A ∧ B ∧ C) ∨ (A ∧ B ∧ ¬C) ∨ (¬A ∧ B ∧ C)

The three conjunctions in the disjunction conform to the three truth-value assignments that make A true.
(6) The wff B constructed in step (5) is logically equivalent to A. Consider that for those truth-value assignments making A true, one of the conjunctions making up the disjunction B is true, and hence the whole disjunction is true as well. For those truth-value assignments making A false, none of the conjunctions making up B is true, because each conjunction will contain at least one conjunct that is false on that truth-value assignment.
Example: Let us construct a truth table for the formula we constructed during our last step:

A B C   A∧B∧C   A∧B∧¬C   ¬A∧B∧C   whole disjunction
T T T     T        F        F            T
T T F     F        T        F            T
T F T     F        F        F            F
T F F     F        F        F            F
F T T     F        F        T            T
F T F     F        F        F            F
F F T     F        F        F            F
F F F     F        F        F            F

By examining the final column for this truth table, we see that it has the same final column as that given for A.
(7) This establishes our result. The example was arbitrary; the same process would work regardless of the number of statement letters or final column for the statement involved. □
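The construction in step (5) is entirely mechanical, which is part of why the proof works: nothing about it depended on which final column we happened to pick. Here is a Python sketch of the procedure (my own illustration; the function name and string output format are assumptions of the sketch):

```python
from itertools import product

def dnf_from_column(letters, column):
    """Build the disjunction of the step-a) conjunctions for exactly
    those truth-table rows on which the target column is True."""
    disjuncts = []
    rows = product([True, False], repeat=len(letters))
    for row, target in zip(rows, column):
        if target:
            conj = [p if val else '¬' + p for p, val in zip(letters, row)]
            disjuncts.append('(' + ' ∧ '.join(conj) + ')')
    if not disjuncts:   # the self-contradiction case handled in step (4)
        return '(' + letters[0] + ' ∧ ¬' + letters[0] + ')'
    return ' ∨ '.join(disjuncts)

# The example from the text, with final column T T F F T F F F:
print(dnf_from_column(['A', 'B', 'C'],
                      [True, True, False, False, True, False, False, False]))
# prints: (A ∧ B ∧ C) ∨ (A ∧ B ∧ ¬C) ∨ (¬A ∧ B ∧ C)
```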

Reducing Further

The above result means that any set of connectives in which we can always find equivalent forms for (A ∨ B), (A ∧ B) and ¬A is an adequate set of connectives. This means we can reduce still further. We don't need all three. We can get by with two in any of three ways.

Corollary: All truth-functions can be defined using only ¬ and ∨.

Proof: The form ¬(¬A ∨ ¬B) is equivalent to (A ∧ B) and could be used instead of the latter in the proof above. □

Corollary: All truth-functions can be defined using only ¬ and ∧.

Proof: The form ¬(¬A ∧ ¬B) is equivalent with (A ∨ B) and could be used instead of the latter in the proof above. □

Corollary: All truth-functions can be defined using only ¬ and ⇒.

Proof: Note that (¬A ⇒ B) ⟚ (A ∨ B) and ¬(A ⇒ ¬B) ⟚ (A ∧ B), and so the former forms can be used in place of the latter forms in the proof above. □

Reducing Still Further

Actually, if we started from a different basis, we could get by with just one connective. The most common way to do this is with the Sheffer stroke, written '|'. It has the following truth table:

A B   (A | B)
T T      F
T F      T
F T      T
F F      T

"A | B" could be read "not both A and B", and indeed is equivalent to ¬(A ∧ B). However, as our aim is to reduce all operators to '|', it is best not to think of the meanings of '¬' or '∧' as playing a role.

Corollary: All truth-functions can be defined using only the Sheffer stroke.

Proof: Note that:

(A | A) ⟚ ¬A
((A | A) | (B | B)) ⟚ (A ∨ B)
((A | B) | (A | B)) ⟚ (A ∧ B)

and just for kicks, we can add:

(A | (B | B)) ⟚ (A ⇒ B)
(((A | A) | (B | B)) | (A | B)) ⟚ (A ⇔ B)

Hence, forms using the Sheffer stroke can be substituted in the proof above. □

Another way is with the Sheffer/Peirce dagger, written '↓' ("neither . . . nor . . . "), which has the truth table:

A B   (A ↓ B)
T T      F
T F      F
F T      F
F F      T

Corollary: All truth-functions can be defined using only the Sheffer/Peirce dagger.

Proof: It suffices to note that:

(A ↓ A) ⟚ ¬A
((A ↓ A) ↓ (B ↓ B)) ⟚ (A ∧ B)
((A ↓ B) ↓ (A ↓ B)) ⟚ (A ∨ B)

□
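Each displayed equivalence involves at most the two letters A and B, so each can be confirmed by brute force over the four relevant truth-value assignments. A quick Python check (my own sketch):

```python
from itertools import product

stroke = lambda a, b: not (a and b)   # A | B: "not both A and B"
dagger = lambda a, b: not (a or b)    # A ↓ B: "neither A nor B"

for a, b in product([True, False], repeat=2):
    # Sheffer stroke equivalences
    assert stroke(a, a) == (not a)                          # ¬A
    assert stroke(stroke(a, a), stroke(b, b)) == (a or b)   # A ∨ B
    assert stroke(stroke(a, b), stroke(a, b)) == (a and b)  # A ∧ B
    assert stroke(a, stroke(b, b)) == ((not a) or b)        # A ⇒ B
    assert stroke(stroke(stroke(a, a), stroke(b, b)),
                  stroke(a, b)) == (a == b)                 # A ⇔ B
    # Peirce dagger equivalences
    assert dagger(a, a) == (not a)                          # ¬A
    assert dagger(dagger(a, a), dagger(b, b)) == (a and b)  # A ∧ B
    assert dagger(dagger(a, b), dagger(a, b)) == (a or b)   # A ∨ B
print("all equivalences verified")
```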

But that's it. '|' and '↓' are the only binary connectives from which all truth functions can be derived. In fact, we can prove this.

Result: No binary operator besides '|' and '↓' is by itself sufficient to capture all truth functions.

Proof:
(1) Suppose there were some other binary connective # that was adequate by itself.
(2) We know immediately that (A # B) must be false when A and B are both true. If not, then it would be impossible to form something equivalent to a contradiction, since the "top row" of the truth table (the truth-value assignment making all statement letters true) would always make a wff true.
(3) For similar reasons, (A # B) must be true when A and B are both false, or else it would be impossible to form something equivalent to a tautology.
(4) Lines (2) and (3) give us this much of the table for #:

A B   A # B
T T     F
T F     ?
F T     ?
F F     T

The question is how to fill in the remaining ?'s.
(5) If we fill both in with T's, we get the Sheffer stroke. If we fill both in with F's, we get the Sheffer/Peirce dagger. That rules out two of the four remaining possibilities.
(6) If we fill them in with T and F respectively, the result is equivalent with ¬B, and if we fill them in with F and T, the result is equivalent with ¬A.
(7) Negation is clearly insufficient for defining all other truth functions (by itself, it can define only two truth functions). So the remaining options are inadequate. There are no possibilities left. Our # is impossible. □

There are, however, triadic connectives and 4+ place connectives that work.

Austere Syntax

We noted earlier that having a reduced vocabulary in the object language makes proving things about it in the metalanguage easier, because there is less to say. So we might decide to revise our definition of a well-formed formula, and make it just this simple:
(i) Any statement letter is a wff;
(ii) if A and B are wffs then so is (A | B);
(iii) nothing that cannot be constructed by repeated applications of the above is a wff.
However, there are trade-offs. The Sheffer stroke is less psychologically natural, and the rules of inference governing the Sheffer stroke are far less intuitive than anything as simple as modus ponens and modus tollens. In this course, we take an intermediate route, and take '⇒' and '¬' as our only primitive connectives. Therefore, we now officially revise our definition of a wff as follows:

Definition: A(n official) well-formed formula (wff) is defined recursively as follows:
(i) Any statement letter is a wff;
(ii) if A is a wff then so is ¬A;
(iii) if A and B are wffs then so is (A ⇒ B);
(iv) nothing that cannot be constructed by repeated applications of the above is a wff.

We can continue to use the signs '⇔', '∧' and '∨', but treat them as mere abbreviations. They are definitional shorthands, just like the conventions we adopted regarding parentheses:

Abbreviations:
(A ∨ B) abbreviates (¬A ⇒ B)
(A ∧ B) abbreviates ¬(A ⇒ ¬B)
(A ⇔ B) abbreviates ¬((A ⇒ B) ⇒ ¬(B ⇒ A))

Whenever one of these signs appears, what is really meant is the wff obtained by replacing the definiendum with the definiens, e.g.,

(P ∧ Q) ∧ (R ∨ S)

is just a shorthand abbreviation for

¬(¬(P ⇒ ¬Q) ⇒ ¬(¬R ⇒ S))

Similarly, (P ∧ ¬P) means ¬(P ⇒ ¬¬P), and (P ∨ ¬P) means (¬P ⇒ ¬P). A mechanical rewriting of this kind is sketched below.
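Because the defined signs are mere shorthands, there is a rote procedure for rewriting any abbreviated wff into official primitive notation, as just mentioned. A Python sketch of that rewriting (my own illustration, using the nested-tuple representation from earlier sketches):

```python
def expand(wff):
    """Rewrite 'or', 'and', 'iff' away, leaving only 'not' and 'imp'."""
    if isinstance(wff, str):
        return wff
    op, *args = wff
    args = [expand(a) for a in args]
    if op in ('not', 'imp'):
        return (op, *args)
    a, b = args
    if op == 'or':    # (A ∨ B) abbreviates (¬A ⇒ B)
        return ('imp', ('not', a), b)
    if op == 'and':   # (A ∧ B) abbreviates ¬(A ⇒ ¬B)
        return ('not', ('imp', a, ('not', b)))
    if op == 'iff':   # (A ⇔ B) abbreviates ¬((A ⇒ B) ⇒ ¬(B ⇒ A))
        return ('not', ('imp', ('imp', a, b), ('not', ('imp', b, a))))
    raise ValueError('unknown connective: ' + op)

# (P ∨ ¬P) expands to (¬P ⇒ ¬P), as in the text:
assert expand(('or', 'P', ('not', 'P'))) == ('imp', ('not', 'P'), ('not', 'P'))
```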

D. Axiomatic Systems and Natural Deduction

Our next topic is proofs or deductions in the object language. You learned a deduction system for propositional logic in your first logic course. Most likely, it was what is called a natural deduction system, and contained 15 or more rules of inference. There are many competing natural deduction systems out there. The following are based loosely on the systems of Kalish and Montague, Gentzen and Fitch, respectively. Examples:

(1) Hardegree's System

Inference rules
⇒O: From A ⇒ B and A infer B. From A ⇒ B and ¬B infer ¬A.
∨O: From A ∨ B and ¬A infer B. From A ∨ B and ¬B infer A.
∧O: From A ∧ B infer A. From A ∧ B infer B.
⇔O: From A ⇔ B infer A ⇒ B. From A ⇔ B infer B ⇒ A.
DN: From ¬¬A infer A. From A infer ¬¬A.
∨I: From A infer A ∨ B. From A infer B ∨ A.
∧I: From A and B infer A ∧ B.
⇔I: From A ⇒ B and B ⇒ A infer A ⇔ B.
⊥I: From A and ¬A infer ⊥.
⊥O: From ⊥ infer A.
¬⇒O: From ¬(A ⇒ B) infer A ∧ ¬B.
¬∨O: From ¬(A ∨ B) infer ¬A. From ¬(A ∨ B) infer ¬B.
¬∧O: From ¬(A ∧ B) infer A ⇒ ¬B.
¬⇔O: From ¬(A ⇔ B) infer ¬A ⇔ B.

Additional proof techniques
CD: Start a subderivation assuming A. If you derive B, you may end the subderivation and infer A ⇒ B.
ID: Start a subderivation assuming A. If you derive ⊥, you may end the subderivation and infer ¬A.
OR: Start a subderivation assuming ¬A. If you derive ⊥, you may end the subderivation and infer A.

Here there are 21 inference rules and 3 additional proof techniques.

(2) Copi's System

Inference rules
MP: From A ⇒ B and A infer B.
MT: From A ⇒ B and ¬B infer ¬A.
DS: From A ∨ B and ¬A infer B.
HS: From A ⇒ B and B ⇒ C infer A ⇒ C.
Simp: From A ∧ B infer A.
Conj: From A and B infer A ∧ B.
Add: From A infer A ∨ B.
CD: From A ∨ B and (A ⇒ D) ∧ (B ⇒ C) infer D ∨ C.
Abs: From A ⇒ B infer A ⇒ (A ∧ B).

Replacement rules
DN: Replace A with ¬¬A or vice versa.
Com: Replace A ∨ B with B ∨ A or vice versa. Replace A ∧ B with B ∧ A or vice versa.
Assoc: Replace A ∨ (B ∨ C) with (A ∨ B) ∨ C or vice versa. Replace A ∧ (B ∧ C) with (A ∧ B) ∧ C or vice versa.
Dist: Replace A ∧ (B ∨ C) with (A ∧ B) ∨ (A ∧ C) or vice versa. Replace A ∨ (B ∧ C) with (A ∨ B) ∧ (A ∨ C) or vice versa.
Trans: Replace A ⇒ B with ¬B ⇒ ¬A or vice versa.
Impl: Replace A ⇒ B with ¬A ∨ B or vice versa.
Equiv: Replace A ⇔ B with (A ⇒ B) ∧ (B ⇒ A) or vice versa. Replace A ⇔ B with (A ∧ B) ∨ (¬A ∧ ¬B) or vice versa.
Exp: Replace (A ∧ B) ⇒ C with A ⇒ (B ⇒ C) or vice versa.
Taut: Replace A with A ∧ A or vice versa. Replace A with A ∨ A or vice versa.

Additional proof techniques
CP: Start a subderivation assuming A. If you derive B, you may end the subderivation and infer A ⇒ B.
IP: Start a subderivation assuming A. If you derive B ∧ ¬B, you may end the subderivation and infer ¬A.

Here we have 23 rules and two additional proof techniques.

A natural deduction system is a system designed to include as its inference rules those steps of reasoning that are most psychologically simple and easy. Usually, this means that some of the rules are redundant. Consider, e.g., modus tollens (MT) in the Copi system. It is redundant given the rules of transposition and modus ponens. Instead of using MT, one could always use them instead.

1. P ⇒ Q
2. ¬Q
3. ¬Q ⇒ ¬P     1 Trans
4. ¬P          2, 3 MP

Natural deduction systems contrast with axiomatic systems. Axiomatic systems aim to be as minimal as possible. They employ as few basic principles and rules as possible. For them, sticking to what is psychologically most natural or convenient is not the prime goal.

Generally, when working within a deduction system, proofs are easier when the system is more complex, because you have more rules to work with. However, when proving things about a deduction system, it's much easier when the system is as simple and minimal as possible. Therefore, in what follows we attempt to construct a relatively minimalistic deduction system for propositional logic; a system, moreover, that is custom made for our new revised definition of a well-formed formula. In that system, officially, all wffs are built up only using the signs '⇒' and '¬'. The other signs can be utilized as abbreviations or shorthand notations, but they are not parts of the official symbolism.

E. Axiomatic System L

This system uses the restricted definition of a wff in which ⇒ and ¬ are the only primitive connectives. First we need some definitions:

Definition: An axiom of L is any wff of one of the following three forms:
(A1) A ⇒ (B ⇒ A)
(A2) (A ⇒ (B ⇒ C)) ⇒ ((A ⇒ B) ⇒ (A ⇒ C))
(A3) (¬A ⇒ ¬B) ⇒ ((¬A ⇒ B) ⇒ A)

Note: strictly speaking, there are an infinite number of axioms, because every instance of these forms is an axiom. Instances of (A1) include not only "P ⇒ (Q ⇒ P)" but also complicated instances such as "(¬A ⇒ B) ⇒ (¬(¬D ⇒ ¬¬M) ⇒ (¬A ⇒ B))". Hence (A1) is not itself an axiom; it is an axiom schema. System L has an infinity of axioms, but three axiom schemata.

System L has only one inference rule, viz., modus ponens: from A ⇒ B and A infer B.

Definition: A proof in L of a conclusion A from a set of premises Γ is a finite ordered sequence of wffs B1, B2, . . . , Bn, such that the last member of the sequence, Bn, is the conclusion A, and for each member of the sequence Bi, where 1 ≤ i ≤ n, either (1) Bi is a member of the premise set Γ, or (2) Bi is an axiom of L, or (3) Bi follows from previous members of the sequence by modus ponens.

To put this less formally, L is a deduction system in which each step must be either a premise, an axiom, or a modus ponens inference. There are no other rules. All proofs are direct. There are no indirect or conditional proofs. Contrast the simplicity of this system with the natural deduction systems on the previous page. Yet, this system is no less powerful. Indeed, in a week or two we will prove that it is complete, i.e., that everything that should be provable in it is provable in it.

Abbreviation: We use the notation

Γ ⊢ A    (or Γ ⊢L A)

to mean that there is at least one proof (in L) of A from Γ, or that A is provable from the set of premises Γ. If Γ has one or just a few members, we write simply: B ⊢ A or B, C ⊢ A, etc. (The sign '⊢' is called the turnstile.)

Definition: A theorem of L is any wff A such that ∅ ⊢ A.

In other words, a theorem is a wff that can be proven without using any premises.

Abbreviation: We use the notation

⊢ A

to mean that A is a theorem.

Here is a proof showing that "P ⇒ P" is a theorem of L:

1. P ⇒ ((P ⇒ P) ⇒ P)    instance of A1
2. P ⇒ (P ⇒ P)    instance of A1
3. (P ⇒ ((P ⇒ P) ⇒ P)) ⇒ ((P ⇒ (P ⇒ P)) ⇒ (P ⇒ P))    instance of A2
4. (P ⇒ (P ⇒ P)) ⇒ (P ⇒ P)    1, 3 MP
5. P ⇒ P    2, 4 MP

Here we see with line 1 and line 2 that different instances of the same axiom schema are quite often used within the same proof. Line 3 is a typical instance of (A2), making A and C into 'P' and B into "(P ⇒ P)".

In general, proofs in an axiomatic system are longer and less natural than in natural deduction. We make it up to ourselves by never proving the same thing twice. Notice that the above proof suffices for the particular theorem "P ⇒ P". However, the exact same line of reasoning would work for any statement of the form A ⇒ A. Whatever A is, there is a proof of the form:

1. A ⇒ ((A ⇒ A) ⇒ A)    A1
2. A ⇒ (A ⇒ A)    A1
3. (A ⇒ ((A ⇒ A) ⇒ A)) ⇒ ((A ⇒ (A ⇒ A)) ⇒ (A ⇒ A))    A2
4. (A ⇒ (A ⇒ A)) ⇒ (A ⇒ A)    1, 3 MP
5. A ⇒ A    2, 4 MP

Not only that, but we could introduce the appropriate five steps into any derivation whenever we wanted something of the form A ⇒ A. Just like every instance of A ⇒ (B ⇒ A) is an axiom, every instance of A ⇒ A is a theorem. Hence we call it a theorem schema. Let us call this schema "Self Implication" (Self-Imp). Once we have given a proof for a theorem schema, from then on we treat its instances as though they were axioms, and allow ourselves to make use of it in any later proof just by citing the previous proof. This is allowable since, if need be, we could always just repeat the steps of the original proof in the middle of the new proof.

Here's a proof showing that for any wff A, it holds that: ¬¬A ⊢ A

1. ¬¬A    Premise
2. ¬¬A ⇒ (¬A ⇒ ¬¬A)    A1
3. ¬A ⇒ ¬¬A    1, 2 MP
4. (¬A ⇒ ¬¬A) ⇒ ((¬A ⇒ ¬A) ⇒ A)    A3
5. (¬A ⇒ ¬A) ⇒ A    3, 4 MP
6. ¬A ⇒ ¬A    (Self-Imp)
7. A    5, 6 MP

Strictly speaking, steps such as 6 are not allowed; however, we could remedy this by simply inserting the appropriate steps from our previous proof schema. This gets tedious. Our motto in this class is to never prove something again once you've already proven it once. Not only that, but the result that for any wff A, we have ¬¬A ⊢ A is also the sort of thing that might come in handy down the road.

a theorem schema, since it involved a premise, and does not show anything to be a theorem. However, what it does show is that whenever we have arrived at something of the form ¬¬A within a proof, we could do the same steps above to arrive at A . We’re allowed to skip these steps, and cite and above result. In effect, we’ve added a new inference rule to our system. We haven’t really added to the system, since we could always fill in the missing steps. Hence a result of this form is called a derived rule. Let us call this derived rule double negation (DN) for obvious reasons. (Actually, it is only half of double negation. We’d also need to show that A ` ¬¬A , which is different.) Your book just gives theorem schemata and derived rules generic names like “Prop. 1.11a”. Personally, I find them easier to remember if I make up my own descriptive names and abbreviations like “(Self-Imp)” and “(DN)”. You can more or less do as you like. When I do my grading I won’t really be looking at how you annotate your proofs. I’ll be looking more at the content of the proofs themselves. Another thing I find helpful that the book doesn’t do is recognize that each step in a proof is itself a result, since we could have stopped the proof there. Hence I like to use the sign “`” before any step of a proof I arrive at without using any premises, and similarly, for those steps that did require a premise, I like to make note of this by writing the premise before the sign “`”. So for the first example, I prefer:

7. ¬¬A ` A 5, 6 MP Written this way, every single line becomes a metatheoretic result. Moreover, it shows which lines in a proof are justified by which premises, and which lines were justified without using any premises. (When a premise is introduced, it is its own justification.) Here we see that in the first proof every line was a theorem, but in the second proof, some lines were theorems, but others required the assumption at line 1. The disadvantage of this notation is that it is more to write, which gets tedious especially when more than one premise is involved. You can do much the same thing by abbreviating using line numbers, e.g., by writing line 3 instead as: 3. [1] ` ¬A ⇒ ¬¬A 1, 2 MP with the “[1]” representing line 1, and so on.

F.

The Deduction Theorem

If you do your homework, you’ll be chugging away at a number of interesting and worthwhile new theorem schemata and derived rules. Today we show something more radical: we show that the natural deduction method of conditional proof, while not strictly speaking allowed in System L, is unnecessary in L, because there is a rote procedure for transforming a would-be conditional proof into a direct proof. To be more precise, we’re going to prove the following meta-theoretic result:

1. ` A ⇒ ((A ⇒ A ) ⇒ A ) A1 2. ` A ⇒ (A ⇒ A ) A1 3. ` (A ⇒ ((A ⇒ A ) ⇒ A )) ⇒ ((A ⇒ (A ⇒ A )) ⇒ (A ⇒ A )) A2 4. ` (A ⇒ (A ⇒ A )) ⇒ (A ⇒ A ) 1, 3 MP 5. ` A ⇒ A 2, 4 MP And for the second, I prefer to write: 1. 2. 3. 4.

Result (The Deduction Theorem): If Γ∪{C } ` A , then Γ ` C ⇒ A . Or, in other words, if we can construct a proof for a certain result A using a set of premises Γ along with an additional premise or assumption, C , then it is always possible, using the original set alone, to construct a proof for the conditional statement C ⇒A.

¬¬A ` ¬¬A Premise ` ¬¬A ⇒ (¬A ⇒ ¬¬A ) A1 ¬¬A ` ¬A ⇒ ¬¬A 1, 2 MP Proof: ` (¬A ⇒ ¬¬A ) ⇒ ((¬A ⇒ ¬A ) ⇒ A ) (1) Assume that Γ ∪ {C } ` A . This means that A3 there is a proof, i.e., an ordered sequence of wffs 5. ¬¬A ` (¬A ⇒ ¬A ) ⇒ A 3, 4 MP B1 , B2 , . . . , Bn that satisfies the definition of 6. ` ¬A ⇒ ¬A (Self-Imp) being a proof of A from Γ ∪ {C }. 19

(2) We’re going to use the technique of proof induction (see page 6) to show that for every step in this proof, Bi , where 1 ≤ i ≤ n, it holds that Γ ` C ⇒ Bi . (3) An argument by proof induction works by first making an inductive hypothesis. Let Bi be an arbitrary step in the proof. We’re allowed to assume as an inductive hypothesis that for all earlier steps in the proof Bj such that j < i it holds that Γ ` C ⇒ Bj . We need to show that the same holds for Bi given this assumption. (4) Because Bi is a step in a proof of A from Γ ∪ {C }, Bi is any one of these three things: a) Bi is a premise, i.e., it is a member of Γ ∪ {C }. b) Bi is an axiom of L. c) Bi followed from previous steps in the proof by modus ponens. We will show for any of these cases, it holds that Γ ` C ⇒ Bi . Case a) : Bi is a premise. This means that either Bi is C or it is a member of Γ. If Bi is C , then C ⇒ Bi is the same as C ⇒ C , and hence an instance of (Self-Imp), which can be introduced into any proof. In that case, Γ ` C ⇒ Bi . If Bi is a member of Γ then clearly Γ ` Bi . We can introduce the axiom Bi ⇒ (C ⇒ Bi ) as an instance of (A1), and so by MP we can conclude Γ ` C ⇒ Bi . Case b) : Bi is an axiom. Hence we can introduce Bi into any proof at any time. By (A1), Bi ⇒ (C ⇒ Bi ) is also an axiom. Hence by MP we get ` C ⇒ Bi , and a fortiori Γ ` C ⇒ Bi . Case c) : Bi followed from previous steps in the proof by modus ponens. This is the hard case. By the definition of modus ponens, there must be two previous members of the sequence, Bj and Bk from which it followed, with Bj taking the form Bk ⇒ Bi . By the inductive hypothesis, it holds that Γ ` C ⇒ Bj and Γ ` C ⇒ Bk . Because Bj takes the form Bk ⇒ Bi , this means Γ ` C ⇒ (Bk ⇒ Bi ). We can then introduce the axiom (C ⇒ (Bk ⇒ Bi )) ⇒ ((C ⇒ Bk ) ⇒ (C ⇒ Bi )) as an instance of (A2). By two applications of MP, we get Γ ` C ⇒ Bi . (5) Hence, for every step Bi in the original proof,

we can push the assumption C through to make it an antecedent. This is true of the last step in the proof, Bn , which must be A , since A was the conclusion of the original proof. Hence, Γ ` C ⇒ Bn means that Γ ` C ⇒ A . e The above proof of the deduction theorem is fairly hard to follow in the abstract, but the idea behind it is actually very simple. What it means is that for every proof making use of some number of assumptions or premises, we can eliminate one of the premises and make it an antecedent on each line of the original proof. There is a rote procedure for transforming each line into a line with the eliminated assumption or premise as an antecedent. We follow this procedure for the example given on the next page. The deduction theorem works almost as a substitute for conditional proof; more precisely, however, it shows that conditional proof in the object language is not needed.

Applying the Deduction Theorem Last class we covered this proof schema: 1. ¬¬A ` ¬¬A Premise 2. ` ¬¬A ⇒ (¬A ⇒ ¬¬A ) A1 3. ¬¬A ` ¬A ⇒ ¬¬A 1, 2 MP 4. ` (¬A ⇒ ¬¬A ) ⇒ ((¬A ⇒ ¬A ) ⇒ A ) A3 5. ¬¬A ` (¬A ⇒ ¬A ) ⇒ A 3, 4 MP 6. ` ¬A ⇒ ¬A (Self-Imp) 7. ¬¬A ` A 5, 6 MP We used a premise to arrive at our conclusion; the deduction theorem tells us that there is a proof not making use of the premise, in which the premise of the original argument becomes an antecedent on the result, i.e.: ` ¬¬A ⇒ A The proof of the deduction theorem provides us with a way of transforming the above proof schema into one for the result that ` ¬¬A ⇒ A . We take the steps of the original proof one by one, and depending on what kind of case it is, we treat it appropriately. In transforming each step, the goal is to push the discharged premise to the other side of the turnstyle, and arrive at “` ¬¬A ⇒ . . .”.

20

Line 1 is the discharged premise. It falls in “case a)” from the previous page. It becomes: 1. ` ¬¬A ⇒ ¬¬A (Self-Imp) Line 2 appeals to an axiom. It falls in “case b)”. So, it becomes: 2. ` ¬¬A ⇒ (¬A ⇒ ¬¬A ) A1 3. ` (¬¬A ⇒ (¬A ⇒ ¬¬A )) ⇒ (¬¬A ⇒ (¬¬A ⇒ (¬A ⇒ ¬¬A ))) A1 4. ` ¬¬A ⇒ (¬¬A ⇒ (¬A ⇒ ¬¬A ))2,3 MP Line 3 is gotten by MP. It falls in “case c)”: 5. ` (¬¬A ⇒ (¬¬A ⇒ (¬A ⇒ ¬¬A ))) ⇒ ((¬¬A ⇒ ¬¬A ) ⇒ (¬¬A ⇒ (¬A ⇒ ¬¬A ))) A2 6. ` (¬¬A ⇒ ¬¬A ) ⇒ (¬¬A ⇒ (¬A ⇒ ¬¬A )) 4, 5 MP 7. ` ¬¬A ⇒ (¬A ⇒ ¬¬A ) 1, 6 MP Line 4 also appeals to an axiom. We treat it just like we treated line 2: 8. ` (¬A ⇒ ¬¬A ) ⇒ ((¬A ⇒ ¬A ) ⇒ A ) A3 9. ` ((¬A ⇒ ¬¬A ) ⇒ ((¬A ⇒ ¬A ) ⇒ A )) ⇒ (¬¬A ⇒ ((¬A ⇒ ¬¬A ) ⇒ ((¬A ⇒ ¬A ) ⇒ A ))) A1 10. ` ¬¬A ⇒ ((¬A ⇒ ¬¬A ) ⇒ ((¬A ⇒ ¬A ) ⇒ A )) 8, 9 MP Line 5 is gotten at by MP, so “case c)” again: 11. ` (¬¬A ⇒ ((¬A ⇒ ¬¬A ) ⇒ ((¬A ⇒ ¬A ) ⇒ A ))) ⇒ ((¬¬A ⇒ (¬A ⇒ ¬¬A )) ⇒ (¬¬A ⇒ ((¬A ⇒ ¬A ) ⇒ A ))) A2 12. ` (¬¬A ⇒ (¬A ⇒ ¬¬A )) ⇒ (¬¬A ⇒ ((¬A ⇒ ¬A ) ⇒ A )) 10,11 MP 13. ` ¬¬A ⇒ ((¬A ⇒ ¬A ) ⇒ A ) 7, 12 MP Line 6 appeals to a theorem schema. Strictly speaking we should write out the intermediate steps, but to save time we can treat it like an axiom, and use the method for case b): 14. ` ¬A ⇒ ¬A (Self-Imp) 15. ` (¬A ⇒ ¬A ) ⇒ (¬¬A ⇒ (¬A ⇒ ¬ A )) A1 16. ` ¬¬A ⇒ (¬A ⇒ ¬A ) 14, 15 MP Line 7 is another MP step: 17. ` (¬¬A ⇒ ((¬A ⇒ ¬A ) ⇒ A )) ⇒ ((¬¬A ⇒ (¬A ⇒ ¬A )) ⇒ (¬¬A ⇒ A )) A2

18. ` (¬¬A ⇒ (¬A ⇒ ¬A )) ⇒ (¬¬A ⇒ A ) 13, 17 MP 19. ` ¬¬A ⇒ A 16, 18 MP We’ve transformed our original 7 step proof into a 19 step proof for the result we were after. Notice that in the new proof, every single step is a theorem; the hypothesis is removed entirely. The final line shows that all wffs of the form ¬¬A ⇒ A are theorems. This procedure can be lengthy, but it sure is effective! The proofs that result from the transformation procedure are not usually the most elegant ones possible Notice, e.g., that lines 2 and 7 are identical, so we could have skipped lines 3–7! However, we followed the recipe provided on the previous page blindly, since we know that that procedure will work in every case. Since we know we can always transform the one kind of proof into the other, from here on out (well, except in tonight’s homework), whenever you have a result of the form A ` B, just go ahead and conclude ` A ⇒ B, annotating with “DT”. (In effect, this allows you to do “conditional proofs” in our System L.)

Important Derived Rules for System L The following are either proven in your book, assigned for homework, or not worth our time to bother proving now. Remember that (A ∨ B) is defined as (¬A ⇒ B) and (A ∧ B) is defined as ¬(A ⇒ ¬B), etc. Derived rules: My name/abbreviation: A ⇒ B, B ⇒ C ` A ⇒ C Syllogism (Syll) A ⇒ (B ⇒ C ) ` B ⇒ (A ⇒ C ) Interchange (Int) A ⇒ B ` ¬B ⇒ ¬A Transposition (Trans) ¬A ⇒ ¬B ` B ⇒ A Transposition (Trans) A ⇒ B, ¬B ` ¬A Modus Tollens (MT) ¬¬A ` A Double Negation (DN) A ` ¬¬A Double Negation (DN) ¬A ` A ⇒ B False Antecedent (FA) A `B⇒A True Consequent (TC) A , ¬B ` ¬(A ⇒ B) True Ant/False C.(TAFC) ¬(A ⇒ B) ` A True Antecedent (TA) ¬(A ⇒ B) ` ¬B False Consequent (FC)

21

A ⇒ B, ¬A ⇒ B ` B A ∨A `A A `A ∧A A `A ∨B A `B∨A A ∨B`B∨A A ∧B`B∧A A ⇔B`B⇔A (A ∧ B) ∧ C ` A ∧ (B

Inevitability (Inev) Redundancy (Red) Redundancy (Red) Addition (Add) Addition (Add) Commutativity (Com) Commutativity (Com) Commutativity (Com) ∧ C) Associativity (Assoc) A ∧ (B ∧ C ) ` (A ∧ B) ∧ C ” (Assoc) (A ∨ B) ∨ C ` A ∨ (B ∨ C ) ” (Assoc) A ∨ (B ∨ C ) ` (A ∨ B) ∨ C ” (Assoc) A ∧B`A Simplification (Simp) A ∧B`B Simplification (Simp) A ,B ` A ∧ B Conjunction Intro (Conj) A ⇒ B, B ⇒ A ` A ⇔ B Biconditional Intro (BI) A ,B ` A ⇔ B Biconditional Intro (BI) ¬A , ¬B ` A ⇔ B Biconditional Intro (BI) A ⇔B`A ⇒B Biconditional Elim (BE) A ⇔B`B⇒A Biconditional Elim (BE) A ⇔ B, A ` B Bic. Modus Ponens (BMP) A ⇔ B, B ` A Bic. Modus Ponens (BMP) A ⇔ B, ¬A ` ¬B Bic. Modus Tollens (BMT) A ⇔ B, ¬B ` ¬A Bic. Modus Tollens (BMT) As you see, I prefer Copi’s abbreviations. You can use whatever abbreviations you prefer, provided that you don’t use a derived rule until you’ve given a proof schema for it! Once you do it, you can always refer back to it.

G.

Soundness and Consistency

Result (Soundness): System L is sound, i.e., for any wff A , if ` A then  A . In other words, every theorem of L is a tautology.

Proof: (1) Assume ` A . This means that there is a sequence of wffs B1 , B2 , . . . , Bn constituting proof of A in which every step is either an axiom or derived from previous steps by MP. (2) We shall show by proof induction that every step of such a proof is a tautology. We assume as inductive hypothesis that all the steps prior to a given step Bi are tautologies. We now need to show that Bi is a tautology. (3) Bi is either an axiom or derived from previous steps by MP. If it is an axiom, then it is a tautology. (A simple truth table for the three axiom schemata shows that all instances are tautologies.) By an earlier result (see p. 11), anything derived from MP from tautologies is also a tautology. Hence, Bi is a tautology. (4) By proof induction, all steps of the proof are tautologies, including the last step, which is A . Hence  A . e

Corollary (Consistency): System L is consistent, i.e., there is no wff A such that both ` A and ` ¬A .

Proof:

Time to get to the really good stuff—the important results of this chapter. Generally, we say that a logical system is sound if and only if everything provable in it ought to be provable in it given the intended semantics for the signs utilized in the language. Generally, we say that a logical system is consistent if and only if there is no wff A such that both A and ¬A are provable in the system. We now show that L has these features.

Suppose for reductio that there is some A such that ` A and ` ¬A . Since L is sound,  A and  ¬A . By the definition of a tautology, every truth-value assignment makes both A true and ¬A true. However, no truth-value assignment can make both A and ¬A true, and so our assumption is impossible. e

22

Here we see that consistency is a corollary of sound- Composition Lemma ness. Here’s another. (Something that is proven only as a means towards proving something else is called a lemma.) Most likely, one of the first things you learned Corollary: If {B1 , B2 , . . . , Bn } ` A then about propositional logic is how to compute the {B1 , B2 , . . . , Bn }  A . truth value of a given statement if you are given the truth values of all its statement-letters. This is what you do when you fill in a row of a truth table. Proof: P Q R P ⇒ ¬ (Q ⇒ R) The reason is that if T T F T T T (T F F) {B1 , B2 , . . . , Bn } ` A In system L, this corresponds to the result that, if, for every statement letter in A , you are given then by multiple applications of the deduction theeither it or its negation as a premise, you should orem, be able to derive either the truth or the falsity of ` (B1 ⇒ (B2 ⇒ . . . (Bn ⇒ A ))). A . For the example just given, we should have: {P, Q, ¬R} ` P ⇒ ¬(Q ⇒ R). We do! Then, by soundness, we can conclude: 1. {P, Q, ¬R} ` P Premise  (B1 ⇒ (B2 ⇒ . . . (Bn ⇒ A ))) 2. {P, Q, ¬R} ` Q Premise 3. {P, Q, ¬R} ` ¬R Premise Then by simple reflections on the rules governing 4. {P, Q, ¬R} ` ¬(Q ⇒ R) 2, 3 (TAFC) truth tables, it is obvious that: 5. ` ¬(Q ⇒ R) ⇒ [P ⇒ ¬(Q ⇒ R)] A1 {B1 , B2 , . . . , Bn }  A 6. {P, Q, ¬R} ` P ⇒ ¬(Q ⇒ R) 4, 5 MP In other words, only logically valid arguments have Let us prove this result in a general form. proofs in L. e

H.

Result (Composition Lemma): If A is a wff whose statement letters are P1 , . . . , Pn , and there is a truth-value assignment f such that set Γ contains Pi iff f assigns the truth value T to Pi , and Γ contains ¬Pi iff f assigns the truth value F to Pi , then if f makes A true, then Γ ` A , and if f makes A false, then Γ ` ¬A .

Completeness

Our next task is to prove the converse of soundness, i.e., that if  A then ` A . Unfortunately, the word “complete” is used with two different meanings in mathematical logic. On one meaning (used by Emil Post), a system is said to be complete if and only if for every wff A , either A or ¬A is a theorem of the system. System L is obviously not complete in this sense, since for a contingent statement, neither it nor its negation is a tautology, and hence neither it nor its negation is a theorem. The other sense of completeness is the converse of soundness, i.e., that everything that should be provable in the system given the semantics of the signs it employs is in fact provable. This notion of “completeness” was first used by Kurt G¨odel, and is sometimes called “semantic completeness”. System L is complete in this sense. Before we prove this, we first need to prove something else.

Proof: We show this by wff induction. Base step: Let A be a statement letter. Then the only statement letter making up A is A itself. If f assigns T to A , then A ∈ Γ and hence Γ ` A . Similarly, if f assigns F to A , then ¬A ∈ Γ, and hence Γ ` ¬A . Induction step: Because all complex wffs are built up using the signs ¬ and ⇒, we need to show two

23

things, (a) if A takes the form ¬B then the above Example: Consider the truth table for A ; it is a holds for A assuming it holds of B, and (b) if A tautology, true on every row. Each row gives us takes the form B ⇒ C , then the above holds of A a different result from the Composition Lemma, assuming it holds of B and C . but always a different way of proving A . First let’s show part (a). P Q R A Result of lemma Suppose A takes the form ¬B. If f makes A true, T T T T {P, Q, R} ` A then it must make B false. By our assumption, T T F T {P, Q, ¬R} ` A Γ ` ¬B, which is the same as Γ ` A , which is T F T T {P, ¬Q, R} ` A what we want. If f makes A false, it must make T F F T {P, ¬Q, ¬R} ` A B true. By our assumption Γ ` B, and by (DN) F T T T {¬P, Q, R} ` A Γ ` ¬¬B, which is the same as Γ ` ¬A . F T F T {¬P, Q, ¬R} ` A Now let’s show part (b). F F T T {¬P, ¬Q, R} ` A F F F T {¬P, ¬Q, ¬R} ` A Suppose A takes the form B ⇒ C . If f makes A true, it must make either B false or C true. If f (4) By the Deduction Theorem, we can conclude makes B false, then by our assumption Γ ` ¬B, that if ∆ is a set containing either Pi or ¬Pi and by the derived rule (FA), it follows that Γ ` for each i such that 1 ≤ i ≤ n−1, we have both B ⇒ C , i.e., Γ ` A . If f makes C true, by our as∆ ` Pn ⇒ A and ∆ ` ¬Pn ⇒ A . By the sumption Γ ` C and so by the derived rule (TC), we derived rule (Inev), we can conclude ∆ ` A . get Γ ` B ⇒ C , i.e., Γ ` A . On the other hand, if Example: What we’re doing here taking the last f makes A false, it must make B true and C false. statement letter or negation in each premise By the assumption, Γ ` B and Γ ` ¬C . Then by set and removing it by the deduction theorem, the derived rule (TAFC), we get Γ ` ¬(B ⇒ C ), thereby making it an antecedent. However, or in other words, Γ ` ¬A . since we have both the case with the affirmaThis completes the induction step, and hence the tive antecedent and the case with the negative Composition Lemma follows by wff induction. e antecedent, they drop off by (Inev). We are now ready to tackle completeness.  {P, Q} ` R ⇒ A so {P, Q} ` A {P, Q} ` ¬R ⇒ A  {P, ¬Q} ` R ⇒ A Result (Completeness): System L is semantiso {P, ¬Q} ` A {P, ¬Q} ` ¬R ⇒ A cally complete, i.e., for any wff A , if  A then  {¬P, Q} ` R ⇒ A `A. so {¬P, Q} ` A {¬P, Q} ` ¬R ⇒ A  {¬P, ¬Q} ` R ⇒ A so {¬P, ¬Q} ` A {¬P, ¬Q} ` ¬R ⇒ A Proof: (1) Assume that  A , and let the statement letters (5) By continued application of the same process making it up be P1 , . . . , Pn . described in step (4), we can successively elimiExample: For illustration purposes only, we’ll assume A contains only three statement letters ‘P ’, ‘Q’ and ‘R’. (2) As a tautology, every truth-value assignment to those statement letters makes A true. (3) By the Composition Lemma, it follows that for every for set Γ that contains either Pi or ¬Pi but not both for each i such that 1 ≤ i ≤ n, we have Γ ` A . 24

nate the members of the premise sets, arriving ultimately at the results that ` P1 ⇒ A and ` ¬P1 ⇒ A . Again, by (Inev), it follows that `A. Example: We just continue the same process:  {P, Q} ` A so P ` Q ⇒ A P `A {P, ¬Q} ` A so P ` ¬Q ⇒ A  {¬P, Q} ` A so ¬P ` Q ⇒ A ¬P ` A {¬P, ¬Q} ` A so ¬P ` ¬Q ⇒ A

Finally we get both ` P ⇒ A and ` ¬P ⇒ A , This may seem like a difficult task: how does and can conclude ` A by (Inev). one prove that something is not provable from something else? Consider: if we expanded L by Actually, from the above proof, the proof of the adding axioms that are not tautologies, then obvicomposition lemma and the proof of the deduction ously, we could prove that the new axioms were theorem, we could write an algorithm for teaching independent because everything provable from the a computer how to construct a derivation for any axioms of L alone is a tautology. We can’t use this given tautology of our language. (Most such proofs, method to establish the independence of A1 from however, would be several thousand steps long.) A2 and A3, however, because all are tautologies. Instead, we focus on a different, made-up property a wff can have, called selectness. Corollary: if B1 , . . . , Bn  A , then Definition: A schmuth-value assignment is B1 , . . . , Bn ` A . For every valid argument, any function mapping statement letters of the lanthere is a deduction for it in our very minimal guage of propositional logic to the set {0, 1, 2}. System L. In effect, such an assignment is something that assigns either 0, 1 or 2 to each statement letter. This is rather like a truth-value assignment, which maps Proof: statement letters to T and F, except that here there See the proof of the converse of the above, given are more possibilities. as a corollary to Soundness on p. 23, and run it in Indirectly a schmuth-value assignment deterthe other direction. e mines a schmuth-value for complex wffs according to the following charts: Corollary: For any wff A , ` A iff  A . (All and only tautologies are theorems of L.) A 0 1 2

Proof: Combine Soundness and Completeness.

I.

Independence of the Axioms

¬A 1 1 0

A 0 0 0 1 1 1 2 2 2

B 0 1 2 0 1 2 0 1 2

A ⇒B 0 2 2 2 2 0 0 0 0

These charts allow us to construct schmuth tables. You might think that in establishing soundness and Let us see what one looks like for an instance of completeness, we have shown that System L is ex- (A1). actly what it was intended to be: all and only logical A ⇒ (B ⇒ A) truths are provable in it, and all and only valid ar0 0 0 0 0 guments have proofs in it. The only way in which 0 2 1 2 0 it might be criticized if it is were redundant, i.e., if 0 0 2 0 0 it contained more axioms than necessary. Our task 1 0 0 2 1 today it to show that we needed all three axiom 1 0 1 2 1 schemata. We’ll show that it is impossible to de1 2 2 0 1 rive any one of the axiom schemata as a theorem 2 0 0 2 2 schema using only the other two axiom schemata 2 0 1 0 2 and MP. 2 0 2 0 2 25

The final schmuth-value of this formula for each schmuth-value assignment is given underneath Result: Axiom schema (A1) is independent of the main connective (the first “⇒”). Here we see (A2) and (A3). that this wff is schmtingent, i.e., it has different schmuth-values for different schmuth-value assignments. Contrast this with the possible instances of (A3): Proof: Suppose we had an axiom system in which our only (¬ A ⇒ ¬ B) ⇒ ((¬ A ⇒ B) ⇒ A ) axiom schemata were (A2) and (A3) and our only 1 0 2 1 0 0 1 0 2 0 0 0 inference rule were modus ponens. If so, then every 0 1 0 2 1 0 0 1 0 2 1 1 theorem of the system would be select, since the ax1 0 2 0 2 0 1 0 0 2 0 0 ioms are select and everything derived from select 0 1 1 2 0 0 1 1 1 2 1 0 wffs by MP is also select. Because some instances 1 1 2 1 1 0 1 1 2 1 0 1 of (A1) are not select, this means some instances of 1 1 2 0 2 0 1 1 0 2 2 1 (A1) would not be theorems of this system. Hence, 0 0 2 0 0 2 2 0 2 2 1 0 not all instances of (A1) are derivable from (A2), 0 2 2 1 1 0 0 2 2 1 0 2 (A3) and MP alone. e 0 0 2 2 2 0 2 0 2 0 0 2 A similar procedure can show that (A2) is inInstances of (A3) are schmtologies, i.e., have the dependent of (A1) and (A3). We again consider schmuth-value 0 for any possible schmuth-value functions assigning one of {0, 1, 2} to each stateassignment. ment letter, but instead use the different rules below for complex wffs: Definition: We say that a wff is select if and only A B A ⇒B if it is a schmtology, i.e., it has schmuth-value 0 for 0 0 0 any possible schmuth-value assignment. 0 1 2 A ¬A 0 2 1 Similarly, all the instances of (A2) are select. I’ll 0 1 1 0 0 spare you the 27 row table. You’ll just have to take 1 0 1 1 2 my word for it. 2 1 1 2 0 2 0 0 2 1 0 Result: Modus ponens preserves selectness. 2 2 0

Proof: Suppose A is select, i.e., has schmuth value 0 for every possible schmuth-value assignment. Similarly, suppose that A ⇒ B is select, i.e., has 0 for every possible schmuth-value assignment. Then B must be select as well. We can see this by the schmuth table rules for ‘⇒’. If B were not select, then it would have 1 or 2 as value for some assignment. If so, then A ⇒ B and A could not both be select, because A ⇒ B has value 2 when A has 0 and B gets 1 or 2 as value. e

We then define a notion of grotesqueness. A complex wff is grotesque if and only if it comes out with value 0 using these revised rules for any possible assignment of 0, 1 or 2 to all its statement letters. It turns out that all instances of (A1) and (A3) are grotesque, but some instances of (A2) are not. Modus ponens preserves grotesqueness. So (A2) is independent of (A1) and (A3). For homework, you’ll be proving the independence of (A3) from (A1) and (A2). Relatively, that’s the easiest, since it doesn’t require three values, and can be done with assignments into {0, 1}, provided

26

that one changes the rule governing how the value for ¬A is determined by the value of A . These independence results establish that there is no redundancy in our axiom schemata; we couldn’t simply remove one of them and be left with a complete system. In one sense, our system is “as minimal as possible,” but in another sense it isn’t. We can’t simply remove any of the ones we have, but we could start with completely different axiom schemata. Several “rival” axiomatizations are possible; you can find a list of some of them in your book pp. 45–46. Axiomatizations have been found in which there is only one axiom schema. Just like the decision to use both ‘⇒’ and ‘¬’ instead of ‘|’, however, there are diminishing returns to minimalism. The proofs in such systems for even the most mundane results often require an insane number of steps and insanely complicated axioms. However, in case you’re curious, the first complete system for propositional logic using a single axiom schema was discovered by Jean Nicod in 1917, and it uses the Sheffer stroke instead of ‘⇒’ and ‘¬’. An axiom is any instance of the single schema: (A | (B | C )) | ((D | (D | D)) | ((E | B) | ((A | E ) | (A | E )))) The only inference rule is: From A | (C | B) and A infer B. Are you glad I didn’t make you use that system?

27

UNIT 2 METATHEORY FOR PREDICATE LOGIC

A.

The Syntax of Predicate Logic

variable one: the difference between ‘x2 ’ and ‘xi ’. It simplifies some things when we get to the semantics to use only one letter ‘x’, but I find statements Onwards and upwards. Our first task is to describe with multiple variables much easier to read with ‘x’, ‘y’ and ‘z’ instead of ‘x1 ’, ‘x2 ’ and ‘x3 ’. our new language. Definition: An individual variable is one of the Definition: An individual constant is one of the lowercase letters ‘ x’, ‘ y’, or ‘ z’, written with or with- lowercase letters from ‘ a’ to ‘ e’, written with or without a numerical subscript. out a numerical subscript: Examples: ‘x’, ‘x1 ’, ‘x12 ’, ‘y’, ‘y2 ’, ‘z’, ‘z13 ’, etc.

Examples: ‘a’, ‘a2 ’, ‘b’, ‘c124 ’, etc.

I use the unitalicized letters ‘x’, ‘y’ and ‘z’ as object Again, I use them unitalicized for object-language language variables, and italicized letters in the same constants, and italicized, when (very rarely) I need range—‘x’, ‘y’, ‘z’ —as metalinguistic schematic let- to make a schematic statement about any constant. Again, Mendelson only uses ‘an ’. ters for any object-language variables. Thus, e.g., Definition: A predicate letter is one of the uppercase letters from ‘A’ to ‘ T ’, written with a numerical Schematically represents all of “(∀x)(F x ⇒ Gx)” superscript ≥ 1, and with or without a numerical and “(∀y)(F y ⇒ Gy)” and “(∀x3 )(F x3 ⇒ Gx3 )”, subscript. and so on. The difference is subtle, and usually not so important to keep straight. After all, object lan- Examples: ‘A1 ’, ‘R2 ’, ‘H 4 ’, ‘F21 ’, ‘G34 ’, etc. guage variables tend to be interchangeable; these do not mean anything different. This is why I’m us- Even when italicized, take these to be object laning notation that does not emphasize the difference. guage constants; script letters such as P are used Still, we do need a technical means for differentiat- in their place schematically if need be. ing between the two when it is necessary. The superscript indicates how many terms the Officially, Mendelson only uses ‘xn ’, and not ‘yn ’ predicate letter takes to form a statement. A predior ‘zn ’, although he doesn’t stick to this. His vari- cate letter with a superscript ‘1’ is called a monadic ables are always italicized; the only difference be- predicate letter. A predicate letter with a superween object language and metalanguage is whether script ‘2’ is called a binary or dyadic predicate a particular numerical subscript occurs, or only a letter. (∀x)(F x ⇒ Gx)

28

It is customary to leave these superscripts off However, I adopt the convention that if the when it is obvious from context what they must be. terms in an atomic formula contain no function E.g., “R2 (a, b)” can be written simply “R(a, b)”. letters, the parentheses and commas may be rem Officially Mendelson only uses ‘An ’. moved. Definition: A function letter is one of the lower- Examples: “F x” is shorthand for “F 1 (x)”, and case letters from ‘f ’ to ‘ l’, written with a numerical “Rab” is shorthand for “R2 (a, b)”. superscript ≥ 1, and with or without a numerical subscript. Examples: ‘f 1 ’, ‘g 2 ’, ‘h33 ’, etc.

Definition: A well-formed formula (wff) is reThe numerical superscript indicates how many cursively defined as follows: argument places the function letter has. A func(i) any atomic formula is a wff; tion letter with a superscript ‘1’ is called a monadic (ii) if A is a wff, then ¬A is a wff; function letter; a function letter with a superscript (iii) if A and B are wffs, then (A ⇒ B) is a wff; ‘2’ is called a binary/dyadic function letter, etc. (iv) If A is a wff and x is an individual variable, Here too, it is customary to leave these superthen ((∀x) A ) is a wff; scripts off when it is obvious from context what (v) nothing that cannot be constructed by repeated they must be. E.g., “f 1 (x)” can be written simply applications of the above is a wff. “f (x)”. Definition: A term of the language is defined re- Mendelson puts parentheses around quantifiers. Other notations for “(∀x)” include “(x)”, “∀x”, and cursively as follows: V “ x”. Again, you can use whatever notation you (i) all individual variables are terms; want, and I might not even notice. (ii) all individual constants are terms; We continue to use the same conventions as (iii) if F is a function letter with superscript n, and t1 , . . . , tn are terms, then F (t1 , . . . , tn ) is last unit for dropping parentheses. In Mendelson’s practice, the quantifier is taken to fall between ¬, a term; (iv) nothing that cannot be constructed by repeated ∧, ∨ and ⇒ , ⇔ in the ranking. In other words: applications of the above is a term. (∀x) F x ⇔ Ga

Examples: ‘a’, ‘x’, “f (a)”, “g(x, f (y))”, etc. As evinced above, I use italicized lowercase letters from later on in (but not the end of) the alphabet, such as ‘t’, ‘r’, etc., schematically for any terms.

abbreviates: (((∀x) F x) ⇔ Ga)

Definition: An atomic formula is any expression Whereas: of the form P(t1 , . . . , tn ) where P is a predicate letter with superscript n, and t1 , . . . , tn are all terms. abbreviates: Examples: “F 1 (a)”, “F 1 (f (x))”, “H 4 (x, b, y, g(a, x))”, etc.

“R43 (a, b, c)”,

If you used Hardegree’s Intermediate textbook, you may be used to using hard brackets ‘[’ and ‘]’ instead of soft brackets for atomic formulas. Mendelson uses soft brackets for both, as does almost everyone else. Here, I follow Mendelson, though I’ll put hard brackets to another use in a minute.

(∀x) F x ∨ Ga

((∀x)(F x ∨ Ga)) This is unusual on Mendelson’s part and I shall avoid making use of this convention in cases similar to this last one. The existential quantifier is introduced by definition. The definitions for the other connectives remain unchanged.

29

Abbreviations:

Definition: A wff A is said to be closed iff A contains no free variables; otherwise A is said to be open.

((∃x) A ) abbreviates ¬((∀x) ¬A ) (A ∧ B) abbreviates ¬(A ⇒ ¬B) (A ∨ B) abbreviates (¬A ⇒ B) (A ⇔ B) abbreviates ¬((A ⇒ B) ⇒ ¬(B ⇒ A ))

Open formulas may be very unfamiliar to some of you. In Hardegree’s books, you never see something like “F x” by itself as a line of a proof: there, variables are always bound by quantifiers. Only constants appear on their own. To not be thrown by wffs including free variDefinition: A first-order language is any logical ables, try not to equate the notion of a true/false language that makes use of the above definition of sentence with the notation of a wff. In fact: a wff, or modifies it at most by restricting which constants, function letters and predicate letters are Definition: A sentence is a closed wff. utilized (provided that it uses at least one predicate letter). E.g., a language that does not use function Normally, we’ll only call sentences “true” or “false”. For wffs containing free variables, we say they are letters still counts as a first-order language. satisfied by some values of the variables, and not satisfied by others. Free and Bound Variables In a derivative sense, however, we’ll say that an Definition: When a quantifier (∀x) occurs as part open wff is “true” iff any values we choose for the of a wff A , the scope of the quantifier is defined as free variable(s) would satisfy them. However, these the smallest part ((∀x) B) of A such that ((∀x) B) are semantic issues and we’re still doing syntax. is itself a wff. Examples: Definition: If x is a variable that occurs within a wff A , then an occurrence of x in A is said to be a bound occurrence iff it occurs in the scope of a quantifier of the form (∀x) within A ; otherwise, the occurrence of x is said to be a free occurrence. Examples: 1. All three occurrences of ‘x’ in “(∀x)(F x ⇒ F x)” are bound. 2. The (solitary) occurrence of ‘x’ in “F x ⇒ (∀y) Gy” is free. Definition: A variable x that occurs within a wff A is said to a bound variable, or simply bound, iff there is at least one bound occurrence of x in A . Definition: A variable x that occurs within a wff A is said to be a free variable, or simply free, iff there is at least one free occurrence of x in A .

1. If ‘R2 ’ means “. . . is taller than . . . ”, and ‘b’ is an individual constant standing for Barack Obama, then the open wff, “R2 (x, b)”, is satisfied by all values of the variable that are things taller than Obama, and in our derivative sense, we say that this wff is not true because it is not satisfied by every value of the variable. 2. The open wff “R2 (x, b) ⇒ R2 (x, b)”, (whatever our interpretation) is satisfied by all values of the variable, and, derivatively, is regarded as true. Why do we need both “(∀x) R2 (x, b)” and “R2 (x, b)”? This will become clearer when we get to the system of deduction. (Actually this is in part historical accident; axiom systems that do without free variables have been devised, but they are more complicated.) The difference is roughly the same as between any and all.

Definition: If A is a wff, t is a term and x is a variNotice that ‘x’ is both bound and free in “F x ⇒ able, then t is free for x in A iff no free occurrence (∀x) Gx”, because some occurrences are bound and of x in A lies within the scope of some quantifier one is free. (∀y) where y is a variable occurring in t . 30

Basically, this means that if you substitute t for all the free occurrences of x within A , you won’t end up inadvertently “binding” any of the variables that happen to occur in t. Examples: 1. ‘a’ is free for ‘x’ in “(∀y) Gy ⇒ Gx”. 2. “f 2 (x, z)” is free for ‘x’ in “(∀y) Gy ⇒ Gx”. 3. ‘z’ is not free for ‘x’ in “(∀y) Gy ⇒ (∀z) Rxz”. 4. “f 2 (a, z)” is not free for ‘x’ in “(∀y) Gy ⇒ (∀z) Rxz”. 5. “f 2 (a, z)” is free for ‘x’ in “(∀y) Gy ⇒ (∀z) (∀x) Rxz”. 6. All terms are free for ‘x’ in “(∀y) Gy”. 7. All terms are free for ‘y’ in “(∀y) Gy”. I write A [x] for an arbitrary wff that may or may not contain x free. If the notation A [t] is used in the same context, it means the result of substituting the term t for all free occurrences of x (assuming there are any) in A [x].

B.

The Semantics of Predicate Logic

Over the next couple weeks, we’ll introduce an axiomatic system of deduction for predicate logic, and prove it complete, consistent and sound, just like we did for propositional logic. In other words, we’ll prove that every theorem is logically true, and that every logically true wff is a theorem. But, what is a logically true wff in predicate logic? In propositional logic, a logically true wff is just a tautology, and we had a decision procedure for determining which wffs are logically true and which are logically false, and which arguments are valid and which are invalid, viz., truth tables. However, you may never have learned anything similar for determining whether a given predicate logic statement is valid or invalid according to its semantics (i.e., its form and the meaning of its logical constants). We don’t teach it here at UMass in Intro or Intermediate Logic. We must make up for this glaring omission immediately! The notion of a “truth-value assignment” from propositional logic is replaced with the notion of an interpretation or model.

Examples: 1. If A [x] is “F x”, then A [y] is “F y”. 2. If A [x] is “(∀y) R(y, x)” then A [f (b)] is “(∀y) R(y, f (b))”. 3. If A [z] is “F z ⇒ Gz”, then A [d] is “F d ⇒ Gd”. Definition: An interpretation M consists of the 4. If A [x] is “F x ⇒ (∀x) Gx”, then A [d] is “F d ⇒ following four things: (∀x) Gx”. 1. The specification of some non-empty set D to serve 5. If A [x] is “F a” then A [y] is “F a”. as the domain of quantification for the language. Mendelson writes A (x) and A (t) instead of A [x] This set is the sum total of entities the quantiand A [t]. I think my notation makes it clearer that fiers are interpreted to “range over”. The domain these signs are parts of the metalanguage, and that might include numbers only, or people only, or the parentheses that appear here are not the parenanything else you might imagine. The domain theses used in atomic formula or in function terms. of quantification is sometimes also known as the Similarly, I write A [x, y] for an arbitrary wff universe of discourse. that may or may not contain x and y free, and in 2. An assignment, for each individual constant in the same context I use A [t, s] for the result of subthe language, some fixed member of D for which stituting t for all free occurrences of x, and s for all it is taken to stand. free occurrences of y, in A [x, y]. For a given constant c, this member is denoted in the metalanguage by “(c)M ”. Examples: 3. An assignment, for each predicate letter with su1. If A [x, y] is “Rxy”, then A [a, b] is “Rab”. perscript n in the language, some subset of Dn . 2. If A [x, y] is “(∀z)(Rzx ∧ Ryz)”, then A [a, b] That is, the interpretation assigns to each predicate is “(∀z)(Rza ∧ Rbz)”. letter a set of n-tuples from D. 31

For a given predicate letter P n , this set is denoted in the metalanguage by “(P n )M ”. This set can be thought of as the extension of the predicate letter under the interpretation. 4. An assignment, for each function letter with superscript n in the language, some n-place operation on D. In other words, each function letter is assigned a set of ordered pairs, the first member of which is itself some n-tuple of members of D, and the second member of which is some member of D. This set of ordered pairs is a function, so for each n-tuple in its domain, there is a unique element of D in its range. So if D is the set of natural numbers, a two-place function letter F 2 might be assigned the addition operation, i.e., the set of all ordered pairs of the form hhn, mi, pi such that n + m = p. This operation can be thought of as the mapping, or function-in-extension, represented by the function letter under the interpretation. In a sense, the four parts of a model fix the meanings of the quantifiers, constants, predicate letters and function letters, respectively. (Or at the very least, they fix as much of their meanings as is relevant in an extensional logical system such as first-order predicate logic.) This leaves only something to be said about variables.

Sequences Each model is associated with a certain domain or universe of discourse. Variables are allowed to take different values within that domain. A variable is given a value by what is called a (denumerable) sequence.

order and assign them a numbered position in that ordering. I will utilize the ordering: x, y, z, x1 , y1 , z1 , x2 , y2 , z2 , . . . For any given variable x, its position in this order can be determined with the formula: p = 3n + k Where p is the number of the position, n is the number of the subscript on the variable (or 0 if it has none), and k is either 1, 2 or 3 depending on whether the letter used is ‘x’, ‘y’ or ‘z’. (Because your book does not use ‘y’ or ‘z’ officially, it can simply order the variables according to their subscripts.) For each interpretation M, there will be some (usually infinite) number of sequences of the elements of its domain D. A denumerable sequence can be thought of as an ordered list of elements of the domain D that has a beginning but has no end, in which the members of D are arranged in any order, with or without repetition or patterns. So if D is the set containing the members of the Beatles, each of the columns below represents a different denumerable sequence. s1 s2 s3 s4 1 John John Ringo Ringo 2 John Paul Paul Ringo 3 John George John Ringo 4 John Ringo John Paul 5 John John Ringo Ringo 6 John Paul George Ringo 7 John George John Ringo 8 John Ringo George Paul 9 John John Ringo George .. .. .. .. .. . . . . .

Definition: A denumerable sequence or vari- There are infinitely many more such sequences in able assignment for domain D is a function whose addition to those listed. Think of each member of a sequence as the domain is the set of positive natural numbers, and value of the variable that occupies the correspondwhose range is a subset of D. ing position. The variable ‘x’ is correlated with the What does this have to do with assigning values to first position in such sequences; ‘y’ is correlated the variables? with the second position and so on. So s1 makes We first note that while there are an infinite John the value of every variable. s2 different Beatnumber of variables (since we can always use dif- les the values of different variables in a patterned ferent subscripts), we can arrange them in a fixed way. And s3 does so in an unpatterned way. 32

Therefore, each sequence correlates every variable of the language with a member of the domain of that interpretation. For a given variable x and sequence s, in the metalanguage, we use “s(x)” to denote the member of D which s correlates with x. Hence, s3 (‘y’) is Paul. Given the assignment made for the constants and function letters in M, derivatively, each sequence correlates every term of the language with a member of D. If c is an individual constant, then let s(c) be (c)M . Then for function terms, let s(F (t1 , . . . , tn )) be the entity ε in D such that hhs(t1 ), . . . , s(tn )i, εi ∈ (F )M . A sequence acts just as an assignment of values to the variables. Within a given interpretation M, an open wff might be satisfied by some and not others.

Satisfaction Definition: The notion of satisfaction is defined recursively. For a given interpretation M with domain D: (i) If A is an atomic wff P(t1 , . . . , tn ), then sequence s satisfies A iff the ordered n-tuple formed by those entities in the domain D that s correlates with t1 , . . . , tn is in the extension of P for interpretation M, or more precisely, hs(t1 ), . . . , s(tn )i ∈ (P)M . (ii) Sequence s satisfies a wff of the form ¬A iff s does not satisfy A . (iii) Sequence s satisfies a wff of the form (A ⇒ B) iff either s does not satisfy A or s does satisfy B. (iv) Sequence s satisfies a wff of the form (∀x) A iff every sequence s∗ that differs from s at most with regard to what entity of D it correlates with the variable x satisfies A .

Truth and other Semantic Notions The truth of a wff of predicate logic is relative to an interpretation, just like whether or not a wff of propositional logic is true is relative to a given truth-value assignment. Definition: A wff A is true for interpretation M iff every denumerable sequence one can form from the domain D of M satisfies A . Abbreviation: Used in the metalanguage: M A Means that A is true for M. (The subscript on  is necessary here.) Definition: A wff A is false for interpretation M iff no denumerable sequence one can form from the domain D of M satisfies A . Notice that while closed wffs are always either true or false (but not both) for an interpretation M; open wffs can be neither. However, for all wffs A , A is true for M iff ¬A is false, and A is false iff ¬A is true. Here are some other important consequences of these definitions (elaborated upon in your book, pp. 61–64):

Roughly speaking, each sequence assigns a member of D to each free variable, and from there, one can determine whether or not the wff is satisfied by that variable assignment as one would expect. The notion of satisfaction is important because it is used to define the notion of truth. (This is the heart of Tarski’s formal semantics.) 33

• Modus ponens preserves truth in an interpretation, i.e., for all interpretations M, if M A and M (A ⇒ B) then M B. • If A is an open wff, and B is obtained from A by binding all free variables of A with initial quantifiers ranging over the whole wff, then for any interpretation M, M A iff M B. • Every instance of a truth-table tautology is true for every interpretation. • If A is a sentence, then for every interpretation M, either M A or M ¬A . • If t is free for x in A [x], then any wff of the form (∀x) A [x] ⇒ A [t] is true for all interpretations. • If A does not contain x free, then any wff of the form (∀x)(A ⇒ B) ⇒ (A ⇒ (∀x) B) is true for all interpretations.

Definition: If M is an interpretation, and Γ is a set Definition: A wff A is said to be contradictory of wffs all of which are true for M, then M is called a iff it is not satisfiable. Hence, A is contradictory iff model for Γ.  ¬A . Notice that every interpretation will be a model for some sets of wffs. So it is appropriate to equate the notion of a model with the notion of an interpretation. In fact, the study of formal semantics for artificial languages is sometimes called “model theory”. I tend to use the words “model” and “interpretation” interchangeably.

Definition: A wff A is said to logically imply a wff B iff in every interpretation, every sequence that satisfies A also satisfies B. Abbreviation: The notation: A B

Definition: A wff A is said to be logically true or logically valid iff A is true for every possible means that wff A logically implies B. interpretation. Definition: A wff A is a logical consequence of a set of wffs Γ iff in every interpretation, every sequence that satisfies every member of Γ also satisfies A.

Abbreviation: The notation: A

(leaving off any subscript) means that A is logically valid. Because interpretations are analogous for truth-value assignments in propositional logic, Abbreviation: Similarly this definition is analogous to the definition of a ΓA tautology given in our last unit; this is why the notation “” is appropriate. means that A is a logical consequence of Γ. What interpretations are possible? Do we know how many? (In a footnote, Mendelson, rather dubiously, equates interpretations with “possible Definition: A wff A is said to be logically equivworlds”. This is misleading in many ways, but it alent to a wff B iff in every interpretation M, A can sometimes be helpful to think of it in this way.) and B are satisfied by the same sequences. • It is impossible that both  A and  ¬A .

Abbreviation: The notation

Definition: A wff A is said to be satisfiable iff there is at least one interpretation for which there is at least one sequence that satisfies A .

A  B means that A and B are logically equivalent.

• Hence,  A iff ¬A is not satisfiable, and  ¬A iff A is not satisfiable.

• It follows that A  B iff A  B and B A.

Derivatively, a set of wffs Γ is said to be (mutually) satisfiable iff there is at least one interpretation for which there is at least one sequence that satisfies This rounds out our presentation of the important every member of Γ. semantic concepts for predicate logic. 34

C.

Countermodels and Semantic Trees

one interpretation that satisfies all the premises of the argument but does not satisfy the conclusion, the conclusion is not a logical consequence of the If you are like many students, in your introduc- premises. The same reasoning shows that the wff: tory logic courses, you were taught the truth-table method for showing the validity or invalidity of F a ⇒ ((Ga ⇒ F a) ⇒ Ga) an argument in propositional logic, but were never taught an analogous method for showing the inva- is not logically valid. lidity of an argument in predicate logic. To be sure, you were probably taught a deduction system for Definition: A model in which there is a sequence predicate logic; but such deductions can only be that does not satisfy a given wff A (or set of wffs Γ) used to show that an argument is valid, not that an is called a countermodel to A (or to Γ.) argument is invalid. The definitions above tell us more or less what In propositional logic, there is an effective procethe process should be like. Just like showing dure that always identifies a “counter truth-value a propositional logic argument to be invalid in- assignment” if one exists, or shows that there are volves finding a truth-value assignment making none. (Truth tables.) When a wff is logically valid, the premises true and the conclusion false, the ap- is there always a method for proving that there are propriate method for predicate logic involves find- no countermodels? If a wff is not valid, is there ing an interpretation containing a sequence that always an effective method for finding its countermodels? satisfies all the premises but not the conclusion. As it turns out, no, there isn’t. There is a method Consider the following argument: that works a lot of the time, but it isn’t always efFa fective. This procedure involves constructing what Ga ⇒ F a are called semantic trees. Ga Semantic trees work very similarly to abbreviThe conclusion of this argument is not a logical con- ated truth tables, i.e., those you do by simply assumsequence of its premises. The reason is that there ing that the complex wff is F and attempting either are sequences in some interpretations that satisfy to find a truth-value assignment in line with this the premises but not the conclusion. All we have assumption, or to show that no truth-value assignto do is describe one. ment ever could be in line with this assumption. Consider the model B in which (1) the domain To test whether a certain wff is satisfiable we of quantification is the set {Britney Spears}, (2) the write ‘T’ next to it. To test whether it is a logical assignment to all constants, including ‘a’, is Brit- truth we write ‘F’ next to it to determine whether it ney Spears, (3) all predicate letters are assigned an is possible for it not to be satisfied. To test whether empty extension except the predicate letter ‘F 1 ’, a certain group of wffs could be satisfiable while which is assigned the extension {Britney Spears}, another is not, we write ‘T’ next to those which (4) all function letters are assigned operations map- are to be satisfied and ‘F’ next to those which are ping ordered n-tuples of the members of the set not. (This could be useful in testing the validity of {Britney Spears} onto Britney Spears. an argument.) The book does not write ‘T’s and This model has only one sequence, that which ‘F’s, but just wffs and their negations, but I think it assigns every variable to Britney Spears. Call this better to stress the semantic nature of this exercise. sequence s. 
Since s(‘a’) ∈ (‘F 1 ’)B , s satisfies “F a”. (This is not meant as a replacement for a system of Hence, s also satisfies “Ga ⇒ F a”. However, s deduction.) does not satisfy “Ga”, since s(‘a’) ∈ / (‘G1 ’)B . (ReWe then apply the rules below to the statements, 1 B call that (‘G ’) is ∅.) depending on their main connectives. These break Because there is at least one sequence in at least down how the satisfaction of a given wff depends 35

on its parts, to see if the proposal is possible. When a certain possibility might be true in more than one way, the tree branches to explore both possibilities.

Atomic Formulas T P(t1 , . . . , tn ) Check to see whether F P(t1 , . . . , tn ) appears previously on branch. If so, close branch with 6. If not, do nothing.

Semantic Tree Rules Here are the rules for the primitive connectives.

F P(t1, ..., tn)   Check to see whether T P(t1, ..., tn) appears previously on the branch. If so, close the branch with ✗. If not, do nothing.

Negations
From "T (¬A)": write "F A".
From "F (¬A)": write "T A".

Conditionals
From "T (A ⇒ B)": branch into "F A" and "T B".
From "F (A ⇒ B)": write both "T A" and "F B".

Universal Quantifier
From "T (∀x) A[x]": write "T A[t1]", ..., "T A[tn]", for all closed terms ti occurring on this branch of the tree.
From "F (∀x) A[x]": write "F A[c]", where c is some new constant unused in the tree. Afterwards, reapply the rule for any "T (∀y) B" (or "F (∃y) B") lines that were applied earlier on the current branch.

Strictly speaking, the above rules suffice, since we could rewrite any wff containing other connectives, or the existential quantifier, in unabbreviated form. However, it can be seen that the rules will be equivalent to the following additional tree rules. You may choose either to use or not use these.

Disjunctions
From "T (A ∨ B)": branch into "T A" and "T B".
From "F (A ∨ B)": write both "F A" and "F B".

Conjunctions
From "T (A ∧ B)": write both "T A" and "T B".
From "F (A ∧ B)": branch into "F A" and "F B".

Biconditionals
From "T (A ⇔ B)": branch into "T A, T B" and "F A, F B".
From "F (A ⇔ B)": branch into "T A, F B" and "F A, T B".

Existential Quantifiers
From "T (∃x) A[x]": write "T A[c]", where c is some new constant unused in the tree. Afterwards, reapply the rule for any "T (∀y) B" (or "F (∃y) B") lines on the current branch.
From "F (∃x) A[x]": write "F A[t1]", ..., "F A[tn]", for all closed terms ti occurring on this branch of the tree.

When a tree branches, you're considering different ways of making good on your original assumption. "The current branch" is considered everything you can reach by tracing upwards, but not downwards, from the current location. Rules can be applied in any order, but generally, it's more helpful to apply other rules before the "T (A ⇒ B)" and "T (∀x) A[x]" rules.

If you continue this procedure, you will achieve one of three results:

(1) Every branch of the tree will close. In this case, the initial assumption turned out to be impossible. Therefore, ⊨ B. The tree itself can be transformed into a proof in the metalanguage that B has no countermodels.

(2) You will have applied the rules to every wff in some branch without it closing. In this case, the branch remaining open can be used to construct a model and sequence for the original hypothesis. (This will be a countermodel to B if you assumed B was unsatisfied, etc.) Choose a domain with as many entities as there are closed terms on the branch, and assign each term ti to one of the entities of the domain, (ti)M, and, for each n-place predicate letter P on the branch, include ⟨(t1)M, ..., (tn)M⟩ in (P)M iff the assumption "T P(t1, ..., tn)" occurs on that branch. The described model will have a sequence that satisfies all and only those wffs that have a T next to them in the branch.

(3) You will be stuck in an infinite loop of steps, and never finish the tree. In this case, there is likely a model that will have a sequence that satisfies all the initial assumptions, but it may be one with an infinitely large domain. With creative insight, you may be able to determine what this model will be like, but there is no algorithm for doing this.

By changing the initial assumption, we can use trees also to test whether or not a sentence is contradictory (by, e.g., assuming T B at the start), or whether two sentences are equivalent (by determining whether their biconditional can be unsatisfiable), and so on.

Examples:

1. Let us first use a tree to show that

(∀x)(Fx ⇒ Gx), (∀x)(Gx ⇒ Hx) ⊨ (∀x)(Fx ⇒ Hx)

We do this by exploring the possibility of a sequence satisfying the premises but not the conclusion, and showing that this is impossible.

T (∀x)(Fx ⇒ Gx)
T (∀x)(Gx ⇒ Hx)
F (∀x)(Fx ⇒ Hx)
F (Fa ⇒ Ha)
T Fa
F Ha
T (Fa ⇒ Ga)
  left branch: F Fa ✗
  right branch: T Ga
    T (Ga ⇒ Ha)
      left branch: F Ga ✗
      right branch: T Ha ✗

The above can easily be transformed into a metalanguage proof. Such a proof would begin: "Suppose for reductio ad absurdum there is some sequence s in some model M such that s satisfies '(∀x)(Fx ⇒ Gx)' and '(∀x)(Gx ⇒ Hx)' but not '(∀x)(Fx ⇒ Hx)'. By the definition of satisfaction, there must be some other sequence s′, differing from s by at most what it assigns to the variable 'x', that does not satisfy '(Fx ⇒ Hx)'; let us call the entity s′ assigns to 'x', α. [α plays the role of 'a' in the tree, though we should not assume anything about the constant 'a'.] Any sequence that assigns α to 'x' will satisfy 'Fx' but not 'Hx' ...", and so on, matching the lines of the tree. Branching will result in a proof by cases in the metalanguage, where each case leads to a different contradiction.

2. We now show that

⊭ (((∃x) Fx) ∧ ((∃y) Gy)) ⇒ (∃x)(Fx ∧ Gx)

We do this by constructing a countermodel via a tree, which we begin by assigning 'F' to the above.

F (((∃x) Fx) ∧ ((∃y) Gy)) ⇒ (∃x)(Fx ∧ Gx)
T ((∃x) Fx) ∧ ((∃y) Gy)
F (∃x)(Fx ∧ Gx)
T (∃x) Fx
T (∃y) Gy
T Fa
T Gb
F Fa ∧ Ga
  left branch: F Fa ✗
  right branch: F Ga
    F Fb ∧ Gb
      left branch: F Fb (open)
      right branch: F Gb ✗

Although two branches closed, one remains open. We can use this to construct a countermodel, M. Let D = {α, β}, ('a')M = α, ('b')M = β, ('F1')M = {α}, ('G1')M = {β}. In any such model, no sequence satisfies the above. Hence this wff is not logically valid.

3. For an example of an infinite tree, consider one attempting to show that

⊭ (∀x)(∃y) Rxy ⇒ Ga

which looks like this:

F (∀x)(∃y) Rxy ⇒ Ga
T (∀x)(∃y) Rxy
F Ga
T (∃y) Ray
T Rab
T (∃y) Rby
T Rbc
T (∃y) Rcy
T Rcd
T (∃y) Rdy
T Rde
⋮

Clearly, there is no end to this tree, but it’s also pretty clear that it does describe a model. Let D = the set of natural numbers, (‘R2 ’)M be the less than relation, (‘G1 ’)M be the property of being odd, and the constants stand for the natural numbers in order, beginning with (‘a’)M = 0.

Mendelson goes so far as to give metatheoretic proofs that whenever a tree closes, the wff(s) in question is (are) unsatisfiable (or logically true, if F was assumed), and that the appropriate kind of model exists if the tree doesn’t close, and so on.
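Since the branching rules for the propositional connectives are purely mechanical, they are easy to implement. Below is a minimal Python sketch (my own illustration, not Mendelson's construction) of the signed-tableau test restricted to the quantifier-free rules above; the tuple encoding of wffs and the function names are invented for the example.

```python
# Signed tableaux for the propositional connectives only (a toy sketch).
# A wff is a string (atomic letter) or a tuple: ('~', A), ('&', A, B),
# ('v', A, B), ('->', A, B). A branch is a set of signed wffs (sign, wff),
# with sign True for "T" and False for "F".

def expand(sign, wff):
    """Return the alternative sets of signed wffs a rule adds,
    one set per resulting branch."""
    op = wff[0]
    if op == '~':
        return [{(not sign, wff[1])}]
    a, b = wff[1], wff[2]
    if op == '&':
        return [{(True, a), (True, b)}] if sign else [{(False, a)}, {(False, b)}]
    if op == 'v':
        return [{(True, a)}, {(True, b)}] if sign else [{(False, a), (False, b)}]
    if op == '->':
        return [{(False, a)}, {(True, b)}] if sign else [{(True, a), (False, b)}]

def satisfiable(branch):
    """True iff some fully expanded branch stays open."""
    branch = set(branch)
    if any((not s, w) in branch for (s, w) in branch):
        return False                    # T A and F A together: close with X
    for s, w in branch:
        if not isinstance(w, str):      # pick any unexpanded compound wff
            rest = branch - {(s, w)}
            return any(satisfiable(rest | ext) for ext in expand(s, w))
    return True                         # only signed letters remain: open

# A wff is a tautology iff the tree starting from "F wff" closes everywhere:
print(satisfiable({(False, ('->', 'p', ('v', 'p', 'q')))}))   # False: valid
print(satisfiable({(False, ('->', ('v', 'p', 'q'), 'p'))}))   # True: invalid
```

The quantifier rules resist this treatment precisely because of outcome (3): the "T (∀x)" and "F (∃x)" rules may need to be reapplied forever as new closed terms appear on a branch.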

D. An Axiom System

As our system of deduction for predicate logic, we introduce the following. (Below, ‘A ’, ‘B’, and ‘C ’ are used as schematic letters representing wffs, ‘x’ as a schematic letter for individual variables, and ‘t’ for terms. ‘Γ’ is used as a metalinguistic variable ranging over sets of object-language wffs.)

The First-Order Predicate Calculus (System PF)¹

Definition: An axiom of PF or logical axiom is any wff of one of the following five forms:
(A1) A ⇒ (B ⇒ A)
(A2) (A ⇒ (B ⇒ C)) ⇒ ((A ⇒ B) ⇒ (A ⇒ C))
(A3) (¬A ⇒ ¬B) ⇒ ((¬A ⇒ B) ⇒ A)
(A4) (∀x) A[x] ⇒ A[t], for all instances such that t is free for x in A[x]
(A5) (∀x)(A ⇒ B) ⇒ (A ⇒ (∀x) B), for all instances such that A contains no free occurrences of x.

Definition: The inference rules of PF are:
Modus ponens (MP): From (A ⇒ B) and A, infer B.
Generalization (Gen): From A, infer (∀x) A.

Abbreviation: Γ ⊢ A and simply ⊢ A are defined as you might expect. In this unit, unless otherwise specified, '⊢' means '⊢PF'.

Definition: A first-order theory K is an axiomatic system in the language of predicate logic that can be obtained from the above by adding zero or more proper or non-logical axioms. Proper axioms are added to represent the basic principles of a certain area of thought. E.g., we might form a first-order theory for the study of the solar system by using the constants 'a1', ..., 'a9' for the nine planets (and Pluto), 'b1' for the sun, etc., using 'O2' for the orbiting relation, etc., and adding as axioms certain laws of physics stated in the language of predicate logic, and so on.

Note that every theorem schema of L corresponds to a theorem schema of PF (or any other first-order theory). Since L is complete, if A is a truth-table tautology, then ⊢PF A. You may cite this in your proofs by writing "Taut" as justification. Similarly, every derived rule of L corresponds to a derived rule of PF. You may make use of this in your derivations by using the abbreviations on p. 21, or simply writing "SL" [System L] as justification. Alternatively, you may utilize the notation used within your favorite natural deduction system for propositional logic, or abbreviate the names given with the derived rules listed in sec. 2.5 of your textbook.

All first-order theories, including the barebones PF, have the following additional derived rules.

Result (UI, ∀O or rule A4): (∀x) A[x] ⊢ A[t], where t is free for x in A[x]. (Universal instantiation.)

Proof: Follows directly from (A4) by MP. ∎

¹ PF stands for "Full Predicate calculus", i.e., the calculus within a syntax including all possible constants, predicate letters and function letters. The "Pure Predicate calculus", PP, is the same, but excluding all constants or function letters from the syntax. Mendelson gives these abbreviations in Chapter 3. There are predicate calculi that are neither pure nor full.


Result (EG, ∃I or E4): A[t, t] ⊢ (∃x) A[t, x], where t is free for x in A[t, x]. (Existential generalization; the repetition of t here indicates that not all the occurrences of t need to change.)

Proof: The following schema shows the object-language steps necessary.
1. A[t, t] ⊢ A[t, t]   (Premise)
2. A[t, t] ⊢ ¬¬A[t, t]   1 SL (DN)
3. ⊢ (∀x) ¬A[t, x] ⇒ ¬A[t, t]   A4
4. A[t, t] ⊢ ¬(∀x) ¬A[t, x]   2, 3 SL (MT)
5. A[t, t] ⊢ (∃x) A[t, x]   4 definition of ∃ ∎

Result (Sub or Repl): A[x] ⊢ A[t], where t is free for x in A[x]. (The rule of substitution or replacement of free variables.)

Proof: Schematically:
1. A[x] ⊢ A[x]   (Premise)
2. A[x] ⊢ (∀x) A[x]   1 Gen
3. A[x] ⊢ A[t]   2 UI ∎

E. The Deduction Theorem in Predicate Logic

The deduction theorem does not hold generally in the first-order predicate calculus PF, nor would we want it to. After all, in the semantics of predicate logic, it is not the case that ⊨ Fx ⇒ (∀x) Fx, and similarly in the system of deduction, while we have Fx ⊢ (∀x) Fx by Gen, we should not have ⊢ Fx ⇒ (∀x) Fx. We therefore state and prove the deduction theorem in the following restricted form:

Result (DT): If Γ ∪ {C} ⊢ A and in the proof B1, ..., Bn of A from Γ ∪ {C}, no step is obtained by an application of Gen that both (i) is applied to a previous step that depends upon having C in the premise set, and (ii) uses a variable occurring free in C, then Γ ⊢ C ⇒ A.

Proof:
(1) Assume the complex antecedent of DT. We will show, using proof induction, that for every step Bi in the proof B1, ..., Bn of A from Γ ∪ {C}, it holds that Γ ⊢ C ⇒ Bi. We are entitled to assume that we have already gotten Γ ⊢ C ⇒ Bj for all steps Bj prior to Bi.
(2) Because Bi is a step in the proof of A from Γ ∪ {C}, the cases we have to consider are that: (a) Bi is a member of Γ, (b) Bi is C, (c) Bi is an axiom, (d) Bi follows from previous steps in the proof by MP, and (e) Bi follows from a previous step by an application of Gen obeying the restriction mentioned above. We consider each case.
Case (a). Bi is a member of Γ. Hence Γ ⊢ Bi, and by SL, Γ ⊢ C ⇒ Bi.
Case (b). Bi is C. Then C ⇒ Bi is simply C ⇒ C, a simple tautology, whence Γ ⊢ C ⇒ Bi.
Case (c). Bi is an axiom. Hence ⊢ Bi and by SL, ⊢ C ⇒ Bi. A fortiori, Γ ⊢ C ⇒ Bi.
Case (d). Bi follows from previous members of the series by MP. Therefore there are previous members of the series Bj and Bk such that Bj takes the form Bk ⇒ Bi. By the inductive hypothesis, we already have both Γ ⊢ C ⇒ Bk and Γ ⊢ C ⇒ (Bk ⇒ Bi). By SL, Γ ⊢ C ⇒ Bi.
Case (e). Bi follows from a previous member of the series by an application of Gen obeying the restriction mentioned above. Therefore, there is a previous step Bj such that Bi takes the form (∀x) Bj for some variable x. Because of the restriction, either obtaining Bj did not depend on having C in the premise set, or C does not contain x free. In the first subcase, Γ ⊢ Bj and hence by Gen, we have Γ ⊢ (∀x) Bj, i.e., Γ ⊢ Bi. By SL, then, Γ ⊢ C ⇒ Bi, as usual. In the second subcase, we first note that we have Γ ⊢ C ⇒ Bj by the inductive hypothesis. By Gen, we obtain Γ ⊢ (∀x)(C ⇒ Bj). Because C does not contain x free, as an instance of (A5) we have ⊢ (∀x)(C ⇒ Bj) ⇒ (C ⇒ (∀x) Bj). By MP, Γ ⊢ C ⇒ (∀x) Bj, i.e., Γ ⊢ C ⇒ Bi. ∎

Obviously, in the proof establishing that Fx ⊢ (∀x) Fx, Gen is applied to a step that both depends on Fx and makes use of a variable occurring free in Fx. (Note that invoking the "Sub" or "Repl" derived rule requires the same.) So we cannot conclude ⊢ Fx ⇒ (∀x) Fx. However, such is not the case with the proof:

1. (∀x) Fx ⊢ (∀x) Fx   Premise
2. ⊢ (∀x) Fx ⇒ Fy   (A4)
3. (∀x) Fx ⊢ Fy   1, 2 MP
4. (∀x) Fx ⊢ (∀y) Fy   3 Gen

This we transform as follows:

1. ⊢ (∀x) Fx ⇒ (∀x) Fx   (Taut)
2. ⊢ (∀x) Fx ⇒ Fy   A4
3. ⊢ (∀x) Fx ⇒ ((∀x) Fx ⇒ Fy)   2 SL
4. ⊢ (∀x) Fx ⇒ Fy   1, 3 SL
5. ⊢ (∀y)((∀x) Fx ⇒ Fy)   4 Gen
6. ⊢ (∀y)((∀x) Fx ⇒ Fy) ⇒ ((∀x) Fx ⇒ (∀y) Fy)   A5
7. ⊢ (∀x) Fx ⇒ (∀y) Fy   5, 6 MP

Again, this is not the most elegant proof. (Getting line 4 by SL is silly, since it's an axiom, and indeed, the same one introduced at line 2.) However, that's what case (d) called for, and following the rote procedure always works. From here on out, you can use this to shorten your proofs, but bear in mind the restrictions. You need to make sure that you don't apply it when you've used Gen on a variable appearing free in an assumption!

F. Doing without Existential Instantiation

The natural deduction rule of "Existential Instantiation" or "Existential Elimination" (EI, ∃O) recommends that from a given existentially quantified statement, one should infer the corresponding statement with the quantifier removed, and some new or unused constant in place of the variable. (Mendelson calls this "Rule C", for choice.) However, note that for most wffs A[x] it is not the case that

(∃x) A[x] ⊨ A[c]

for any constant c. Thus, e.g., we ought not have (∃x) Fx ⊢ F(c) for any constant c, even an unused one. Within an interpretation M, every constant c is assigned a fixed entity of the domain, viz., (c)M. There simply is no inferring that (c)M is in the extension of the predicate letter 'F', viz., ('F')M, simply on the assumption that something is. This so-called "rule" of natural deduction is logically invalid, and should be done away with. Luckily, we don't need it. Bearing in mind Exercise 2.32d from your homework, we have:

(New-DR) If A does not contain x free, then (∀x)(B ⇒ A) ⊢ (∃x) B ⇒ A.

With this, we have the following "conversion" from a pseudo-proof that uses EI to a proof that doesn't.

PSEUDO-PROOF: (∃x) Fx ⊢* (∃x)(Fx ∨ Gx)
1. (∃x) Fx ⊢ (∃x) Fx   Premise
2. (∃x) Fx ⊢* Fa   1 EI/∃O
3. (∃x) Fx ⊢* Fa ∨ Ga   2 SL
4. (∃x) Fx ⊢* (∃x)(Fx ∨ Gx)   3 EG

CONVERSION: (∃x) Fx ⊢ (∃x)(Fx ∨ Gx)
1. (∃x) Fx ⊢ (∃x) Fx   Premise
2. Fx ⊢ Fx   Premise
3. Fx ⊢ Fx ∨ Gx   2 SL
4. Fx ⊢ (∃x)(Fx ∨ Gx)   3 EG
5. ⊢ Fx ⇒ (∃x)(Fx ∨ Gx)   4 DT
6. ⊢ (∀x)(Fx ⇒ (∃x)(Fx ∨ Gx))   5 Gen
7. ⊢ (∃x) Fx ⇒ (∃x)(Fx ∨ Gx)   6 New-DR
8. (∃x) Fx ⊢ (∃x)(Fx ∨ Gx)   1, 7 MP

We will show that whenever a pseudo-proof is possible with EI/∃O, a conversion to a real proof similar to the above is always possible. Stated very simply: replace every step arrived at by EI with a premise similar to it, except containing a variable not occurring free in any lines of the pseudo-proof instead of the "new" constant. Continue the proof as normal, then push the new premise through with the deduction theorem. Generalize, and apply New-DR. Then, along with the existential statement to which you applied EI, and MP, you get the result. Let us state this result more formally.

Definition: Pseudo-derivability or ⊢*: Γ ⊢* B iff there is an ordered series of wffs A1, ..., An where An is B, and for each step Ai where 1 ≤ i ≤ n, either:
(a) Ai is an axiom;
(b) Ai is a member of Γ;
(c) there is some previous step in the series, Aj, such that Aj takes the form (∃x) C[x], and Ai takes the form C[c], where c is a constant that does not occur in any previous step of the pseudo-proof, nor in B, nor in any premise in Γ (i.e., Ai was derived by the pseudo-rule, EI);
(d) Ai follows from previous steps by MP;
(e) Ai follows from a previous step by Gen, but not using a variable that is free in some previous step of the series C[c] arrived at by EI.

Result: If Γ ⊢* B then Γ ⊢ B. (The non-necessity of EI.)

Proof:
(1) Assume Γ ⊢* B, and let A1, ..., An be the steps of the pseudo-proof.
(2) Let (∃x1) C1[x1], ..., (∃xm) Cm[xm] be the members of the pseudo-proof to which EI is applied (in order), and let C1[c1], ..., Cm[cm] be the results of these EI steps (in order).
(3) It is obvious that if we expand Γ by adding {C1[c1], ..., Cm[cm]}, we can prove B without EI, or, in other words, Γ ∪ {C1[c1], ..., Cm[cm]} ⊢ B, since we are simply adding the results of our EI steps to our premise set.
(4) Because no application of Gen is made to a free variable of Cm[cm] after it is introduced, by the deduction theorem we have: Γ ∪ {C1[c1], ..., Cm−1[cm−1]} ⊢ Cm[cm] ⇒ B.
(5) Pick some variable y that does not occur free anywhere in the series A1, ..., An (preferably xm). Replace cm with y everywhere in the proof for Γ ∪ {C1[c1], ..., Cm−1[cm−1]} ⊢ Cm[cm] ⇒ B. The result will also be a proof. Notice that cm does not occur anywhere in the set Γ ∪ {C1[c1], ..., Cm−1[cm−1]}, since it was new when we introduced it. So there is no reason it must be used rather than the variable.
(6) Hence, Γ ∪ {C1[c1], ..., Cm−1[cm−1]} ⊢ Cm[y] ⇒ B.
(7) By Gen, Γ ∪ {C1[c1], ..., Cm−1[cm−1]} ⊢ (∀y)(Cm[y] ⇒ B).
(8) Because y does not occur free in the proof, and B is An, B does not contain y free. Hence, by (New-DR), Γ ∪ {C1[c1], ..., Cm−1[cm−1]} ⊢ (∃y) Cm[y] ⇒ B.
(9) Because Cm[cm] was arrived at in the original pseudo-derivation by EI on some wff of the form (∃xm) Cm[xm], which either is (∃y) Cm[y], or can be used to get it, it must be that Γ ∪ {C1[c1], ..., Cm−1[cm−1]} ⊢ (∃y) Cm[y]. Thus, by MP, Γ ∪ {C1[c1], ..., Cm−1[cm−1]} ⊢ B.
(10) By the same procedure described in steps (5)–(9), we can eliminate Cm−1[cm−1] from the premise set, and so on, until we have eliminated everything except the members of Γ. Hence, Γ ⊢ B. ∎

This proof shows us that we don't need Existential Instantiation, very much like we don't need Conditional Proof. However, because we have this metatheoretic result, we now know that whenever we are able to carry out a pseudo-proof making use of the rule, we could transform it into a proof that does not make use of the rule. Hence, it is innocuous to pretend as if we do have such a rule. From here on out we allow ourselves to make use of "Rule C" or "EI" in our proofs, even though strictly speaking there is no such rule. We must be careful, however, to obey the restrictions for what counts as a "pseudo-derivation", as defined above. I like to mark the pseudo-steps with '⊢*' rather than '⊢'; you can stop using the '*' when the dummy constant no longer appears.
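As a sanity check on the semantic point that motivated this section, it is worth seeing concretely why (∃x) Fx ⊭ F(c). Here is a toy model check (the model data is invented for illustration):

```python
# A two-element countermodel: something is F, but the fixed denotation
# of the constant c is not in the extension of F.
D = {0, 1}          # domain of quantification
F_ext = {0}         # ('F')M
c_denot = 1         # (c)M, fixed once and for all by the interpretation

print(any(o in F_ext for o in D))   # (Ex) F x is true in M  -> True
print(c_denot in F_ext)             # F(c) is false in M     -> False
```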

G. Metatheoretic Results for System PF

Result (Soundness): For all wffs A, if ⊢ A then ⊨ A.

Proof: Every instance of the axiom schemata is logically valid. (This can be verified using semantic trees.) MP and Gen preserve logical validity. (In the case of Gen, note that a wff is logically valid iff it is satisfied by all sequences in all models. If an open wff is satisfied by all sequences in a model, then the corresponding wff with one of the variables bound with an initial quantifier will also be satisfied by all sequences in that model.) If ⊢ A, then A is derivable from the axioms by some finite number of steps of MP and Gen, each preserving validity, and hence, ⊨ A. ∎

Result (Consistency): There is no wff A such that both ⊢ A and ⊢ ¬A.

Proof: Suppose for reductio ad absurdum that there is a wff A such that ⊢ A and ⊢ ¬A. By soundness, ⊨ A and ⊨ ¬A. In other words, every sequence in every model satisfies both A and ¬A. But a sequence satisfies ¬A iff it does not satisfy A, so any arbitrary sequence will both satisfy and not satisfy A, which is absurd. ∎

Ultimately, we also want to prove the completeness of PF. We'll get there, but we first need to prove a number of lemmas.

Result (Denumerability of wffs): The set of wffs of the language of predicate logic is denumerable, i.e., we can generate a one–one correspondence between the set of natural numbers and the set of wffs.

Proof: All wffs are built up of the simple signs: '(', ',', ')', '⇒', '¬', '∀', as well as the individual constants, variables, predicate letters and function letters.
A. We define a function g that associates each simple sign with a different natural number.
(1) Let g('(') = 3, g(')') = 5, g(',') = 7, g('¬') = 9, g('⇒') = 11, and g('∀') = 13.
(2) If c is a constant, and n is the number of its subscript (if c has no subscript, then n = 0), then depending on which letter of the alphabet is used, let k be either 1, 2, 3, 4 or 5 (1 for 'a', 2 for 'b', etc.), and let g(c) = 7 + 8(5n + k).
(3) If x is a variable, and n is the number of its subscript (if x has no subscript, then n = 0), then depending on which letter of the alphabet is used, let k be either 1, 2, or 3 (1 for 'x', 2 for 'y' and 3 for 'z'), and let g(x) = 13 + 8(3n + k).
(4) If F is a function letter, and n is the number of its subscript (if F has no subscript, then n = 0) and m is the number of its superscript, then depending on which letter of the alphabet is used ('f' through 'l'), let k be one of 1 through 7, and let g(F) = 1 + 8(2^m · 3^(7n+k)).
(5) If P is a predicate letter, and n is the number of its subscript (if P has no subscript, then n = 0) and m is the number of its superscript, then depending on which letter of the alphabet is used ('A' through 'T'), let k be one of 1 through 20, and let g(P) = 3 + 8(2^m · 3^(20n+k)).
We can now define the value of g for formulas in virtue of its value for simple signs.
(6) Let (p0, p1, p2, p3, ...) be the sequence of prime integers in order starting with 2. (There is no greatest prime.) Hence p0 = 2, p1 = 3, p2 = 5, and so on.
(7) Let µ0µ1µ2···µr be some string of signs from the syntax of predicate logic. It might be something ill-formed like "→)x∀a12(", or it might be a well-formed formula like "(∀x1)(F(x1) ⇒ F(x1))". Here, µ0 is the first sign in the string, µ1 is the second sign, and so on. For all such strings, let g(µ0µ1µ2...µr) = p0^g(µ0) · p1^g(µ1) · p2^g(µ2) · ... · pr^g(µr).
(8) For a given expression A, the number g(A) is called the Gödel number of A. Notice that because the Gödel numbers of the different simple signs are all different, so are the Gödel numbers of strings of signs, since for different strings, these numbers will have different prime factorizations.
(9) Let N − {0} be the set of natural numbers greater than zero, C the subset of natural numbers that are Gödel numbers of wffs, and W the set of all wffs. Consider now the function w(x) from N − {0} onto C, whose value for x as argument is the xth smallest natural number that is the Gödel number of a wff of predicate logic. Consider also the function s(x) from C onto W, whose value for any Gödel number of a wff is that wff. Then the function s(w(n + 1)) is a 1–1 correspondence between the set of natural numbers and the set of wffs. ∎
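The numbering just defined is entirely concrete, and it may help to compute a couple of values. Here is a Python sketch of clauses (1)–(3) and (7) (the helper names are mine, and only a fragment of the sign inventory is coded):

```python
# Goedel numbering, following clauses (1)-(3) and (7) above.
SIMPLE = {'(': 3, ')': 5, ',': 7, 'neg': 9, 'imp': 11, 'all': 13}

def g_constant(letter, n=0):      # clause (2): k = 1 for 'a', ..., 5 for 'e'
    k = 'abcde'.index(letter) + 1
    return 7 + 8 * (5 * n + k)

def g_variable(letter, n=0):      # clause (3): k = 1 for 'x', 2 'y', 3 'z'
    k = 'xyz'.index(letter) + 1
    return 13 + 8 * (3 * n + k)

def primes():                     # p0 = 2, p1 = 3, p2 = 5, ...
    n = 2
    while True:
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            yield n
        n += 1

def g_string(codes):              # clause (7): p0^g(mu0) * p1^g(mu1) * ...
    result, ps = 1, primes()
    for c in codes:
        result *= next(ps) ** c
    return result

# The Goedel number of the three-sign string "(x)":
print(g_string([SIMPLE['('], g_variable('x'), SIMPLE[')']]))  # 2^3 * 3^21 * 5^5
```

Since distinct strings receive distinct prime factorizations, decoding is just the unique-factorization step run in reverse.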

Corollary: The set of closed wffs is also denumerable.

Proof: As above, with the set of closed wffs (and their Gödel numbers) substituted for W (and C). ∎

We're inching closer to completeness. Before moving on, I want to make note of some differences between my proof of completeness and Mendelson's. Mendelson prefers to speak of different first-order theories. Remember that a first-order theory is an axiomatic system gotten by adding additional axioms to the axioms of PF. Really, talking about what theorems are provable in a given system K, where the additional axioms of K are the members of a set Γ, is equivalent to speaking about what is provable in the barebones system PF beginning with Γ as a set of premises, since clearly:

⊢K A iff Γ ⊢PF A

It's really a matter of taste whether we view the proof as being about different theories or as being about different premise sets. I prefer to speak about premise sets, since then we don't have to deal with any sense of '⊢' other than '⊢PF'. But the differences are trivial.

Before moving on, let us introduce some new metalinguistic definitions.

Definition: A set of wffs Γ is said to be consistent iff there is no wff B such that both Γ ⊢ B and Γ ⊢ ¬B. (Otherwise, Γ is inconsistent.)

Definition: A set of wffs Γ is said to be maximal iff for every closed wff B, either B ∈ Γ or ¬B ∈ Γ.

Definition: A set of wffs Γ is said to be universal iff for every wff B[x] that contains at most x free, if it is the case for all closed terms t that B[t] ∈ Γ, then (∀x) B[x] ∈ Γ.

We now move on to our next important Lemma on the way to completeness.

Result (LEL): If Γ is a consistent set of closed wffs, then there is a set of closed wffs ∆ such that: (a) Γ ⊆ ∆, (b) ∆ is consistent, (c) ∆ is maximal, and (d) ∆ is universal. (The Lindenbaum Extension Lemma.)

Proof:
(1) Assume that Γ is a consistent set of closed wffs.



(2) For convenience, we assume that none of the constants 'e', 'e1', 'e2', 'e3', ..., etc., occur anywhere in the wffs in Γ. (If this assumption is not warranted, we could use another denumerable sequence of constants, e.g., the 'b's or the 'c's, or even add a new sequence of constants 'o', 'o1', 'o2', 'o3', ..., to the language if need be.)
(3) By the denumerability of the set of closed wffs of the language, we can arrange them in an infinite sequence: A1, A2, A3, ..., etc. Making use of this sequence, let us recursively define an infinite sequence of sets of wffs: Γ0, Γ1, Γ2, ..., etc., as follows:
a) Let Γ0 = Γ.
b) We define Γn+1 in terms of Γn in one of the following three ways:
(i) if Γn ∪ {An+1} is consistent, then let Γn+1 = Γn ∪ {An+1};
(ii) if Γn ∪ {An+1} is inconsistent and An+1 does not take the form (∀x) B[x], then let Γn+1 = Γn ∪ {¬An+1};
(iii) if Γn ∪ {An+1} is inconsistent and An+1 does take the form (∀x) B[x], then let Γn+1 = Γn ∪ {¬An+1} ∪ {¬B[ex]}, where 'ex' is the first member of the sequence 'e', 'e1', 'e2', 'e3', ..., that does not occur in Γn.
(4) Let ∆ be the union of all of the members of the Γ-sequence (i.e., Γ0 ∪ Γ1 ∪ Γ2 ∪ ... etc.)
(5) Obviously, Γ ⊆ ∆. This establishes part (a) of the consequent of the Lemma.
(6) Every member of the Γ-sequence is consistent. We prove this by mathematical induction.
Base step: Γ0 is Γ, and it is consistent ex hypothesi.
Induction step: Suppose Γn is consistent. It follows that Γn+1 is consistent by a proof by cases:
Case (i): Γn+1 = Γn ∪ {An+1} and Γn ∪ {An+1} is consistent, so Γn+1 is consistent.
Case (ii): Γn+1 = Γn ∪ {¬An+1}, and Γn ∪ {An+1} is inconsistent.
• Hence there is some B such that both Γn ∪ {An+1} ⊢ B and Γn ∪ {An+1} ⊢ ¬B.
• By SL, Γn ∪ {An+1} ⊢ B ∧ ¬B.
• Because An+1 is closed, it follows by DT that Γn ⊢ An+1 ⇒ (B ∧ ¬B).
• But ⊢ ¬(B ∧ ¬B) by SL.
• By MT, Γn ⊢ ¬An+1.
• Suppose for reductio that Γn+1 is inconsistent.
• So there is some C such that Γn ∪ {¬An+1} ⊢ C and Γn ∪ {¬An+1} ⊢ ¬C.
• By reasoning parallel to the above, by SL and the deduction theorem, we also have Γn ⊢ An+1.
• So Γn is inconsistent.
• This contradicts the inductive hypothesis. Hence Γn+1 is consistent.
Case (iii): Γn+1 = Γn ∪ {¬An+1} ∪ {¬B[ex]}, Γn ∪ {An+1} is inconsistent and An+1 takes the form (∀x) B[x].
• By the same reasoning as in the previous case, Γn ⊢ ¬An+1.
• Suppose for reductio that Γn+1 is inconsistent.
• So there is some C such that Γn ∪ {¬An+1} ∪ {¬B[ex]} ⊢ C and Γn ∪ {¬An+1} ∪ {¬B[ex]} ⊢ ¬C.
• By SL, Γn ∪ {¬An+1} ∪ {¬B[ex]} ⊢ C ∧ ¬C.
• By DT, Γn ∪ {¬B[ex]} ⊢ ¬An+1 ⇒ (C ∧ ¬C).
• By MP, Γn ∪ {¬B[ex]} ⊢ C ∧ ¬C.
• Because An+1 is closed and it takes the form (∀x) B[x], the wff B[ex] is also closed.
• By DT, Γn ⊢ ¬B[ex] ⇒ (C ∧ ¬C).
• ⊢ ¬(C ∧ ¬C), and so by SL, Γn ⊢ B[ex].
• 'ex' is not included in Γn. Hence, we can replace 'ex' with the variable x throughout the proof for Γn ⊢ B[ex] and the result will also be a proof. Hence Γn ⊢ B[x].
• By Gen, Γn ⊢ (∀x) B[x], which is the same as Γn ⊢ An+1.
• So Γn is inconsistent, which contradicts the inductive hypothesis. Hence Γn+1 is consistent.
(7) It follows from (6) that ∆ is consistent.
a) Note that the Γ-sequence is constantly expanding: for all j and k such that j < k, Γj ⊆ Γk. Crudely, ∆ can be thought of as the upper limit of the expansion.
b) So every finite subset of ∆ is a subset of some Γi for some suitably large i.
c) However, every proof from ∆ has only a finite number of steps, and hence only makes use of a finite subset of ∆.
d) If there were some B such that both ∆ ⊢ B and ∆ ⊢ ¬B, for some suitably large i, it would have to be that both Γi ⊢ B and Γi ⊢ ¬B.
e) This is impossible because all the members of the Γ-sequence are consistent by (6).
f) Hence, ∆ is consistent.
g) This establishes part (b) of the consequent of the Lemma.
(8) ∆ is obviously maximal as well.
a) All closed wffs are members of the sequence A1, A2, ..., etc.
b) For each Ai, either it or its negation is a member of Γi, and Γi ⊆ ∆.
c) So for all closed wffs A1, A2, ..., etc., either it or its negation is included in ∆.
d) This establishes part (c) of the consequent of the Lemma.
(9) Finally, ∆ is also universal.
a) We show this by reductio. Suppose otherwise, i.e., suppose that there is a wff B[x] that contains at most x free, such that for all closed terms t, B[t] ∈ ∆, but (∀x) B[x] ∉ ∆.
b) (∀x) B[x] is closed, so because ∆ is maximal, it must be that ¬(∀x) B[x] ∈ ∆.
c) Because (∀x) B[x] is closed, it also follows that (∀x) B[x] is a member of the A-sequence, i.e., (∀x) B[x] is An+1 for some number n.
d) Obviously, however, since (∀x) B[x] ∉ ∆, it follows that Γn+1 is not obtained from Γn using case (i).
e) Nor was it obtained using case (ii), since An+1 is of the form (∀x) B[x].
f) This leaves case (iii), so Γn+1 is Γn ∪ {¬An+1} ∪ {¬B[ex]}.
g) Hence for some x, ¬B[ex] ∈ Γn+1 and so ¬B[ex] ∈ ∆.
h) But by our assumption, it holds for all closed terms t that B[t] ∈ ∆.
i) All constants, 'ex' included, are closed terms, so B[ex] ∈ ∆.
j) Hence, both ∆ ⊢ ¬B[ex] and ∆ ⊢ B[ex].
k) However, this is impossible, because we have already shown ∆ to be consistent. Our supposition has been shown to be impossible, hence ∆ is universal.
l) This establishes part (d) of the consequent of the Lemma.
(10) By suitably defining ∆, we have shown each of parts (a)–(d) of the consequent of the Lemma on the basis of the assumption of its antecedent. Hence, the Lemma is established. ∎
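The construction in step (3) is effective enough to simulate in miniature. The following Python sketch is my own finite, propositional toy: two sentence letters, "consistency" checked by brute-force truth tables, and the universality clause (iii) omitted. It walks an enumeration and adds each sentence or its negation, just as cases (i) and (ii) dictate:

```python
from itertools import product

def holds(wff, val):
    """Evaluate a wff (a letter, ('~', A) or ('->', A, B)) at a valuation."""
    if isinstance(wff, str):
        return val[wff]
    if wff[0] == '~':
        return not holds(wff[1], val)
    return (not holds(wff[1], val)) or holds(wff[2], val)

def consistent(gamma, letters=('p', 'q')):
    # In this toy setting, consistency is just truth-table satisfiability.
    return any(all(holds(w, dict(zip(letters, row))) for w in gamma)
               for row in product((True, False), repeat=len(letters)))

def lindenbaum(gamma, enumeration):
    delta = set(gamma)
    for a in enumeration:             # the sequence A1, A2, A3, ...
        if consistent(delta | {a}):   # case (i)
            delta.add(a)
        else:                         # case (ii)
            delta.add(('~', a))
    return delta

print(lindenbaum({('~', 'q')}, ['p', 'q', ('->', 'p', 'q')]))
# {('~', 'q'), 'p', ('~', ('->', 'p', 'q'))}: maximal over the enumeration
```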

We’ve just shown that beginning with any consistent set of sentences, we can keep adding to it ad infinitum to get a maximally consistent set of sentences of the language. We pause again for a new definition: Definition: A model or interpretation M is a denumerable model iff its domain of quantification D is denumerable (as defined on p. 5).


Result (MCL): If ∆ is a consistent, maximal, and universal set of closed wffs, then there is at least one denumerable model for ∆. (The Maximal Consistency Lemma.)

Proof:
(1) Assume that ∆ is a consistent, maximal, and universal set of closed wffs. We can then describe a denumerable model M for ∆ using the following procedure.
(2) Essentially, we'll let all the closed terms of the language stand for themselves. (Another possible way of constructing a model would be to let each closed term stand for its Gödel number. However, let us proceed using the former method.)
(3) Let the domain of quantification D of M be the set of closed terms of the language of first-order predicate logic. Note that there are denumerably many closed terms, so M is a denumerable model.
(4) For each constant c, let (c)M be c itself. So, for example, ('a')M is 'a', ('b12')M is 'b12', etc.
(5) For each function letter F with superscript n, let (F)M be that n-place operation on D which includes all ordered pairs of the form ⟨⟨t1, ..., tn⟩, F(t1, ..., tn)⟩, i.e., the operation that has the closed term F(t1, ..., tn) as value for ⟨t1, ..., tn⟩ as argument. Example: the operation ('f1')M, which M assigns to the monadic function letter 'f1', will contain such ordered pairs as ⟨'a', "f1(a)"⟩, ⟨'b12', "f1(b12)"⟩, and ⟨"f1(a)", "f1(f1(a))"⟩, and so on.
(6) For each predicate letter P with superscript n, let (P)M be that subset of Dn that includes the n-tuple ⟨t1, ..., tn⟩ iff the atomic wff P(t1, ..., tn) is included in ∆. Example: the extension of 'F1' under M, viz., ('F1')M, will include the term 'a' just in case "F1(a)" ∈ ∆, and will exclude 'a' just in case "¬F1(a)" ∈ ∆, and so on.

(7) We must now prove that this interpretation M is a model for ∆, i.e., that for all wffs A, if A ∈ ∆, then ⊨M A. We will actually prove something stronger, i.e., that for all closed wffs A, A ∈ ∆ iff ⊨M A. (∆ only contains closed wffs, so we need not worry about open wffs.) We prove this by wff induction.
Base step: A is a closed atomic formula.
• Hence, A takes the form P(t1, ..., tn) where P is a predicate letter with superscript n and t1, ..., tn are closed terms.
• Because closed terms contain no variables, all sequences in M will associate each ti with itself.
• So by the definition of satisfaction, all sequences in M will satisfy A iff ⟨t1, ..., tn⟩ ∈ (P)M.
• By the definition of truth in an interpretation, ⊨M A iff ⟨t1, ..., tn⟩ ∈ (P)M.
• By our characterization of M under (6) above, ⟨t1, ..., tn⟩ ∈ (P)M iff P(t1, ..., tn) ∈ ∆.
• So P(t1, ..., tn) ∈ ∆ iff ⊨M A, i.e., A ∈ ∆ iff ⊨M A.
Induction step: Assume as inductive hypothesis that it holds for all closed wffs B with fewer connectives than A that B ∈ ∆ iff ⊨M B. We will then show that it holds for the complex closed wff A that A ∈ ∆ iff ⊨M A. This proceeds by a proof by cases on the makeup of A.
Case (a): A takes the form ¬B, where B is also closed and has one fewer connective than A.
• By the inductive hypothesis, B ∈ ∆ iff ⊨M B.
• Because ∆ is consistent, if A ∈ ∆, then B ∉ ∆.
• Because ∆ is maximal, if B ∉ ∆, then A ∈ ∆.
• So B ∉ ∆ iff A ∈ ∆.
• Hence A ∈ ∆ iff not ⊨M B.
• Since B is closed, ⊨M ¬B iff not ⊨M B.
• Hence, A ∈ ∆ iff ⊨M ¬B, i.e., A ∈ ∆ iff ⊨M A.
Case (b): A takes the form B ⇒ C, where B and C are closed wffs with fewer connectives.
First we prove that if A ∈ ∆ then ⊨M A.
• Suppose A ∈ ∆.
• Since ∆ is maximal and consistent, B ∈ ∆ or ¬B ∈ ∆, but not both, and likewise with C.
• However, because B ⇒ C ∈ ∆ and ∆ is consistent, it cannot be that both B ∈ ∆ and ¬C ∈ ∆, so either ¬B ∈ ∆ or C ∈ ∆.
• By the inductive hypothesis, B ∈ ∆ iff ⊨M B, and C ∈ ∆ iff ⊨M C.
• By the same reasoning given for the previous case, ¬B ∈ ∆ iff ⊨M ¬B.
• So either ⊨M ¬B or ⊨M C.
• By the definition of satisfaction for conditionals, it follows that ⊨M B ⇒ C, i.e., ⊨M A.
Now we prove that if ⊨M A then A ∈ ∆.
• Suppose ⊨M A, i.e., ⊨M B ⇒ C.
• Because B and C are closed, by the definition of satisfaction for conditionals, we have either ⊨M ¬B or ⊨M C.
• By the inductive hypothesis, B ∈ ∆ iff ⊨M B and C ∈ ∆ iff ⊨M C.
• Again, by the reasoning given for the previous case, ¬B ∈ ∆ iff ⊨M ¬B.
• So either ¬B ∈ ∆ or C ∈ ∆.
• Because ∆ is maximal, either B ⇒ C ∈ ∆ or ¬(B ⇒ C) ∈ ∆.
• If ¬(B ⇒ C) ∈ ∆, then ∆ would be inconsistent, because ⊢ ¬(B ⇒ C) ⇒ B and ⊢ ¬(B ⇒ C) ⇒ ¬C.
• So B ⇒ C ∈ ∆, i.e., A ∈ ∆.
Putting these two results together, we get that A ∈ ∆ iff ⊨M A.
Case (c): A takes the form (∀x) B[x], where B[x] contains fewer connectives, and B[x] contains at most x free.
First we prove that if A ∈ ∆ then ⊨M A.
• Suppose A ∈ ∆, i.e., (∀x) B[x] ∈ ∆.
• Because B[x] contains at most x free, for all closed terms t, B[t] is a closed wff.
• Because ∆ is maximal, for all closed terms t, either B[t] ∈ ∆ or ¬B[t] ∈ ∆.
• However, since ∆ is consistent, it must be that for all closed terms t, B[t] ∈ ∆.
• By the inductive hypothesis, for all closed terms t, ⊨M B[t].
• Because the domain of quantification for M is D, and D consists of the set of closed terms, and every closed term is interpreted as standing for itself, a sequence of M will satisfy B[x] iff it satisfies B[t] for that closed term t that gets assigned to x in that sequence.
• Because all sequences of M satisfy B[t] for all closed terms t, all sequences of M will satisfy B[x], and hence all sequences of M will satisfy (∀x) B[x].
• Hence, ⊨M (∀x) B[x], i.e., ⊨M A.
We now prove that if ⊨M A then A ∈ ∆.
• Suppose ⊨M A, i.e., all sequences of M satisfy (∀x) B[x].
• Hence, all sequences of M satisfy B[x], regardless of what entity in the domain gets assigned to x.
• Because the domain of quantification for M is D, and D consists of the set of closed terms, and every closed term is interpreted as standing for itself, a sequence of M will satisfy B[x] iff it satisfies B[t] for that closed term t that gets assigned to x in that sequence.
• So, for all closed terms t, ⊨M B[t].
• By the inductive hypothesis, it follows that, for all closed terms t, B[t] ∈ ∆.
• Because ∆ is universal, it follows that (∀x) B[x] ∈ ∆, i.e., A ∈ ∆.
Putting these together, we get that A ∈ ∆ iff ⊨M A.
(8) By induction, regardless of A's length, A ∈ ∆ iff ⊨M A. So M is a model for ∆. This establishes the Lemma. ∎

If we follow Mendelson and think of a model as a sort of possible world, a maximally consistent set of sentences can be thought of as a maximally descriptive yet consistent description of a possible world. This lemma says roughly that for every maximally descriptive consistent description of a possible world, one exists for which that description is true.
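Clause (6) of the construction can be pictured very concretely: the extension of each predicate letter is simply read off the atomic members of ∆, with closed terms denoting themselves. A small sketch (the sample ∆ is invented):

```python
# Reading predicate extensions off the atomic sentences in Delta.
# Closed terms are represented by themselves (strings), as in the proof.
delta_atoms = {('F', ('a',)), ('R', ('a', 'b')), ('R', ('b', 'f(b)'))}

def extension(pred, atoms):
    return {args for (p, args) in atoms if p == pred}

print(extension('R', delta_atoms))             # {('a','b'), ('b','f(b)')}
print(('a',) in extension('F', delta_atoms))   # "F(a)" in Delta: True
```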

Result (The Modeling Lemma): A set of closed wffs Γ is consistent iff it has a denumerable model (i.e., there is at least one denumerable model for Γ).

Proof: This biconditional breaks down into:
(MLa) If a set of closed wffs Γ has a denumerable model, then Γ is consistent.
(MLb) If a set of closed wffs Γ is consistent, then Γ has a denumerable model.

Instead of proving (MLa) directly, we shall prove the following stronger thesis:
(MLa)* If a set of closed wffs Γ has any model, then Γ is consistent.

Proof of (MLa)* and (MLa):
(1) Assume the opposite for reductio ad absurdum. I.e., assume that Γ is a set of closed wffs, and there is at least one model M for Γ, but that Γ is inconsistent.
(2) Hence, there is some A such that Γ ⊢ A and Γ ⊢ ¬A.
(3) This means that A and ¬A are each derivable from the members of Γ along with the axioms of PF by zero or more applications of MP and Gen.
(4) All the axioms of PF are logically valid, and hence true in M.
(5) Similarly, all the members of Γ are true in M by hypothesis.
(6) However, both MP and Gen preserve truth in an interpretation, so it must be that both ⊨M A and ⊨M ¬A.
(7) By the definition of truth in an interpretation, every sequence in M satisfies both A and ¬A.
(8) However, a sequence satisfies ¬A iff it does not satisfy A, so any arbitrary sequence of M will both satisfy and not satisfy A, which is absurd.
(9) Hence (MLa)* must be true. Regardless of the size of the domain, any set of closed wffs that can be modeled is consistent. This includes those with denumerable models, so (MLa)* entails (MLa).

Proof of (MLb):
(1) Assume that Γ is a consistent set of closed wffs.
(2) By LEL, there is a set of closed wffs ∆ such that: (a) Γ ⊆ ∆, (b) ∆ is consistent, (c) ∆ is maximal, and (d) ∆ is universal.
(3) By MCL, there is an interpretation M that is a denumerable model for ∆.
(4) So for all closed wffs A, if A ∈ ∆, then ⊨M A.
(5) Because Γ ⊆ ∆, for all closed wffs A, if A ∈ Γ then A ∈ ∆.
(6) So for all closed wffs A, if A ∈ Γ then ⊨M A.
(7) Therefore, M is also a denumerable model for Γ. ∎

The following is not needed for completeness, but is an interesting and surprising result of (MLa)* and (MLb).

Corollary (The Skolem-Löwenheim Theorem): If a set Γ of closed wffs of first-order predicate logic has any sort of model, then it has a denumerable model.

Proof: By the stronger (MLa)*, if Γ has any sort of model, then it is consistent. By (MLb), if it is consistent, it has a denumerable model. ∎

Finally we turn to completeness:

Result (Completeness): For all wffs A, if ⊨ A then ⊢ A.

Proof:
(1) Suppose ⊨ A, but suppose for reductio ad absurdum that it is not the case that ⊢ A.
(2) Let B be the universal closure of A, i.e., if the free variables of A are x1, ..., xn, then B is (∀x1) ... (∀xn) A.
(3) Universal closure preserves truth in an interpretation, so ⊨ B.
(4) B has no free variables left, so B is closed.
(5) The singleton set containing ¬B alone, {¬B}, must be consistent. Here's a proof of this by reductio:
a) Suppose there were some C such that {¬B} ⊢ C and {¬B} ⊢ ¬C.
b) By SL, {¬B} ⊢ C ∧ ¬C.
c) Since B is closed, so is ¬B, and so by DT, we have ⊢ ¬B ⇒ (C ∧ ¬C).
d) But ⊢ ¬(C ∧ ¬C), so by SL, ⊢ B.
e) But A is derivable from B by universal instantiation, so it would follow that ⊢ A, which contradicts our earlier assumption.
f) Hence, {¬B} is consistent.
(6) By the Modeling Lemma, {¬B} has a denumerable model. Hence there is an interpretation M such that ⊨M ¬B.
(7) But we also have ⊨ B, and hence ⊨M B.
(8) By the definition of truth in an interpretation, every sequence in M satisfies both B and ¬B.
(9) However, a sequence satisfies ¬B iff it does not satisfy B, so any arbitrary sequence of M will both satisfy and not satisfy B, which is absurd.
(10) We've shown our supposition to be impossible, thereby establishing completeness indirectly. ∎

Corollary: If Γ ⊨ A then Γ ⊢ A.

Proof: Follows from minor modifications on the above proof. ∎

Unfortunately, this proof does not, as in the Propositional Calculus (System L), provide a "recipe" for constructing a proof of any given logical truth in PF. We have simply proven that any given logical truth must be derivable, because if it were not, there would exist a countermodel to its logical validity. The completeness of the first-order predicate calculus was first proven by Kurt Gödel in 1930, and so this is sometimes called "Gödel's Completeness Theorem," although his way of proving it was actually very different from ours. (It was first proven our way by Leon Henkin in 1949.) However, Gödel is much more famous for his incompleteness theorems than his completeness theorem.

H. Identity Logic

To add identity to first-order predicate logic, we simply pick a 2-place predicate letter—say ‘I 2 ’—to use to stand for the identity relation, and make the appropriate additions to our logical system.


Syntax

Officially, the syntax is unchanged. We already had 'I2' as a predicate letter. We are simply fixing its intended meaning. However, it is useful to introduce abbreviations such as the following.

Abbreviations:
t = u abbreviates I2(t, u)
t ≠ u abbreviates ¬I2(t, u)
(∃1x) A[x] abbreviates (∃x) A[x] ∧ [(∀x)(∀y)(A[x] ∧ A[y] ⇒ x = y)], where y is the first variable not occurring in A[x].
(∃n+1 x) A[x] abbreviates (∃y)(A[y] ∧ (∃n x)(x ≠ y ∧ A[x])), where y is the first variable not occurring in A[x].

The above definition defines (∃2x) A[x] in terms of (∃1x) A[x], and (∃3x) A[x] in terms of (∃2x) A[x], and so on. (We could do even better by beginning with (∃0x) A[x] for (∀x) ¬A[x].) Because the syntax is unchanged, the set of wffs and the set of closed wffs remain denumerable.

System of Deduction

Definition: The first-order predicate calculus with identity, or System PF=, is the system obtained from PF by adding the following axiom and axiom schema:
(A6) (∀x) x = x
(A7) x = y ⇒ (A[x, x] ⇒ A[x, y]), for all instances in which y is free for x in A[x, x], and A[x, y] is obtained from A[x, x] by replacing some, but not necessarily all, free occurrences of x with y.

Definition: A first-order theory with identity [equality] is any first-order theory that has all theorems of PF= formulable in its syntax as theorems (i.e., it is a theory built on PF in which (A6) is either an axiom or theorem, and all instances of (A7) are either axioms or theorems). This includes PF= itself.

The deduction theorem and replacement for existential instantiation are unchanged by the addition. Some easy theorems and derived rules:

Result (Ref=): ⊢PF= t = t, for any term t. (Reflexivity of identity.)

Proof: Direct from (A6) by universal instantiation. ∎

Result (LL/Sub=): t = u, A[t, t] ⊢PF= A[t, u], for all terms t and u that are free for all variables in A[x, y], and where A[t, u] arises from A[t, t] by replacing some or all occurrences of t with u. (Leibniz's law)

Proof: Derived from (A7) by Gen on both x and y, then universal instantiation to t and u, and MP × 2. It may be necessary to do some bound variable juggling, but this is no problem. ∎

Result (Sym=): t = u ⊢PF= u = t, for any terms t and u. (Symmetry of identity.)

Proof: t = u, t = t ⊢PF= u = t is an instance of LL, and we have ⊢PF= t = t by reflexivity. ∎

Result (Trans=): t = u, u = v ⊢PF= t = v, for any terms t, u and v. (Transitivity of identity.)

Proof: u = v, t = u ⊢PF= t = v is an instance of LL. ∎

Semantics for Identity Logic

We intend 'I2' to stand for the identity relation. So:

Definition: An interpretation M is a normal model iff, for M, ('I2')M is the set of all and only ordered pairs of the form ⟨o, o⟩ of objects o included in the domain of quantification D of M.

Definition: A wff A is identity-valid iff it is true for all normal models. I abbreviate this as: ⊨= A.

Note that all wffs that are logically valid simpliciter (⊨ A) are identity-valid (⊨= A), but not vice-versa. Note that (A6) and all allowed instances of (A7) are identity-valid. (Proving this is homework.)

Some Important Metatheoretic Results for PF= and other Theories with Identity

Result (Soundness): For all wffs A, if ⊢PF= A then ⊨= A.

(Proof is left as part of an exam question.)

Result (Consistency): There is no wff A such that both ⊢PF= A and ⊢PF= ¬A.

(Proof is left as part of an exam question.)

Result: Any first-order theory K in which (A6) is an axiom or theorem, and all instances of (A7) in which A [x, x] is an atomic formula with no individual constants are either axioms or theorems, is a first-order theory with identity. (The possibility of reducing (A7).)

Proof:
(1) Assume that K is a first-order theory in which (A6) is an axiom or theorem, and all instances of (A7) in which A[x, x] is an atomic formula with no individual constants are either axioms or theorems. We shall prove that all instances of (A7) can be derived regardless of the complexity of A[x, x], by wff induction.
(2) Base step: A[x, x] is atomic. By hypothesis, (A7) is a theorem of K for all cases in which A[x, x] is an atomic formula with no individual constants. All others can be obtained by Gen and universal instantiation.
(3) Induction step: We assume that all instances of (A7) hold for instances of A[x, x] that are simpler than a given instance, and need to show that for the given instance of A[x, x], (A7) holds as well. This proceeds by a proof by cases of the possible make-up of the instance of A[x, x] in question.
Case (a): A[x, x] takes the form ¬B[x, x].
i) Let C[x] be B[z, x]. Clearly, C[x] is the same complexity as B[x, x].
ii) By the inductive hypothesis, we have this instance of (A7): ⊢K x = y ⇒ (C[x] ⇒ C[y]).
iii) By manipulating variables with Gen and UI, we get: ⊢K y = x ⇒ (C[y] ⇒ C[x]).
iv) Because we have atomic instances, we have: ⊢K x = y ⇒ (x = x ⇒ y = x), and so with (A6) and SL we get: ⊢K x = y ⇒ y = x.
v) So by SL: ⊢K x = y ⇒ (C[y] ⇒ C[x]).
vi) That is: ⊢K x = y ⇒ (B[z, y] ⇒ B[z, x]).
vii) By Gen on z and UI to x we get: ⊢K x = y ⇒ (B[x, y] ⇒ B[x, x]).
viii) By SL: ⊢K x = y ⇒ (¬B[x, x] ⇒ ¬B[x, y]), i.e., ⊢K x = y ⇒ (A[x, x] ⇒ A[x, y]).
Case (b): A[x, x] is of the form (B[x, x] ⇒ C[x, x]).
i) By the inductive hypothesis, we have: ⊢K x = y ⇒ (C[x, x] ⇒ C[x, y]).
ii) By the same procedure described in the previous case: ⊢K x = y ⇒ (B[x, y] ⇒ B[x, x]).
iii) By MP: x = y ⊢K B[x, y] ⇒ B[x, x].
iv) Similarly: x = y ⊢K C[x, x] ⇒ C[x, y].
v) Clearly, if we add B[x, x] ⇒ C[x, x] as a further premise, we could complete a syllogism, i.e.: x = y, B[x, x] ⇒ C[x, x] ⊢K B[x, y] ⇒ C[x, y].
vi) By DT × 2, we have: ⊢K x = y ⇒ ((B[x, x] ⇒ C[x, x]) ⇒ (B[x, y] ⇒ C[x, y])), which is: ⊢K x = y ⇒ (A[x, x] ⇒ A[x, y]).
Case (c): A[x, x] takes the form (∀z) B[x, x, z].
i) By the inductive hypothesis: ⊢K x = y ⇒ (B[x, x, z] ⇒ B[x, y, z]).
ii) By MP: x = y ⊢K B[x, x, z] ⇒ B[x, y, z].
iii) Hence, by UI and MP: x = y, (∀z) B[x, x, z] ⊢K B[x, y, z].
iv) By Gen: x = y, (∀z) B[x, x, z] ⊢K (∀z) B[x, y, z].
v) By DT × 2, we have: ⊢K x = y ⇒ ((∀z) B[x, x, z] ⇒ (∀z) B[x, y, z]), which is: ⊢K x = y ⇒ (A[x, x] ⇒ A[x, y]).
(4) Hence, regardless of the complexity of A[x, x], we have the appropriate instance of (A7). Therefore, all instances of (A7) are theorems of K.
(5) K is a first-order theory (one built by expanding PF by adding proper axioms). Hence K has all instances of (A1)–(A5) as axioms. (A6) is either an axiom or a theorem of K, and all instances of (A7) are theorems. Hence all axioms of PF= are theorems of K. Moreover, K has all the inference rules of PF, and hence all the inference rules of PF=.
(6) All theorems of PF= are derived from (A1)–(A7) by the inference rules. Therefore, all theorems of PF= are theorems of K.
(7) Therefore, K is a first-order theory with identity. ∎

Result: If M is a model for the set of axioms of PF=, then there is a normal model M* such that for all wffs A, ⊨M A iff ⊨M* A. (Contracting Models to Normal Models.)

Proof:
(1) Assume that M is a model for the set of axioms of PF=.
(2) It does not follow from this that M is a normal model, i.e., it does not follow that ('I2')M only consists of ordered pairs of the form ⟨o, o⟩ of objects o included in the domain of quantification D of M. However, we do know the following things about ('I2')M:
a) Because M makes (A6) true, ('I2')M must be a reflexive relation in the set-theoretic sense.
b) Because M makes the instance of (A7), x = y ⇒ (x = x ⇒ y = x), true, and because it is reflexive (so all sequences satisfy x = x), ('I2')M must also be a symmetric relation in the set-theoretic sense.
c) Because M makes the instance of (A7), x = y ⇒ (x = z ⇒ y = z), true, and because it is symmetric, ('I2')M must also be a transitive relation in the set-theoretic sense.
d) So, ('I2')M must be an equivalence relation.
e) Let us call this equivalence relation E. For any object o in the domain D of M, [o]E is the E-equivalence class on o, i.e., the set of p such that ⟨o, p⟩ ∈ E.
(3) We can then construct a normal model M* in the following way.


a) Let the domain of quantification for M*, viz., D*, be the set of all E-equivalence classes formed from members of D. I.e., if D is {o1, o2, o3, ...} then let D* be {[o1]E, [o2]E, [o3]E, ...}.
b) For all constants c, let (c)M* be [(c)M]E.
c) For all function letters F with superscript n, let (F)M* be the n-place operation on D* that includes the ordered pair ⟨⟨[o1]E, ..., [on]E⟩, [oq]E⟩ iff (F)M includes ⟨⟨o1, ..., on⟩, oq⟩.
d) For all predicate letters P with superscript n, let (P)M* be the subset of D*n that includes the ordered n-tuple ⟨[o1]E, ..., [on]E⟩ iff (P)M includes ⟨o1, ..., on⟩.
(4) It follows that M* is normal. Because ('I2')M is E, ('I2')M* is the set of ordered pairs that contains ⟨[o]E, [p]E⟩ iff ⟨o, p⟩ ∈ E. Because E is an equivalence relation, ⟨o, p⟩ ∈ E iff [o]E = [p]E.
(5) We now prove that ⊨M A iff ⊨M* A for all wffs A. Note that this will be the case so long as A is satisfied by all sequences in one interpretation iff it is satisfied by all sequences in the other. Each sequence s of M of the form o1, o2, o3, ... corresponds to a sequence s′ of M* of the form [o1]E, [o2]E, [o3]E, .... For each such sequence pair, s and s′, it is apparent that for any term t, s′(t) is [s(t)]E. We now prove that for all wffs A, for all such sequence pairs s and s′, sequence s (of M) will satisfy A iff the corresponding sequence s′ (of M*) satisfies A, by wff induction.

Base step: A is atomic, i.e., it takes the form P(t1, ..., tn). Then s satisfies A iff ⟨s(t1), ..., s(tn)⟩ ∈ (P)M, and s′ satisfies A iff ⟨[s(t1)]E, ..., [s(tn)]E⟩ ∈ (P)M*. By our description of M* above, ⟨[s(t1)]E, ..., [s(tn)]E⟩ ∈ (P)M* iff ⟨s(t1), ..., s(tn)⟩ ∈ (P)M, so s satisfies A iff s′ satisfies A.
Induction step: Assume as inductive hypothesis that it holds for all wffs B simpler than A that, for all such sequence pairs s and s′, s satisfies B iff s′ satisfies B. We must show that it holds for A as well. Proof by cases.
Case (a): A takes the form ¬B. By the inductive hypothesis, s satisfies B iff s′ satisfies B. By the definition of satisfaction, s satisfies A iff s does not satisfy B, and the same holds for s′. So, s does not satisfy A iff s′ does not satisfy A, and hence s satisfies A iff s′ satisfies A.
Case (b): A takes the form B ⇒ C. By the inductive hypothesis, s satisfies B iff s′ satisfies B and s satisfies C iff s′ satisfies C. s satisfies A iff it either does not satisfy B or it does satisfy C, and similarly for s′, so s satisfies A iff s′ satisfies A.
Case (c): A takes the form (∀x) B[x]. Now, s will satisfy A iff all sequences ς in M differing from s at most with regard to what entity gets assigned to x satisfy B[x], and s′ will satisfy A iff all sequences ς′ in M* differing from s′ at most with regard to what entity gets assigned to x satisfy B[x]. Each such sequence ς in M corresponds to such a sequence ς′ of M* and vice-versa. By the inductive hypothesis, it will hold that ς satisfies B[x] iff ς′ satisfies B[x], so s satisfies A iff s′ satisfies A.
So regardless of the length of A, for such a sequence pair s and s′, s satisfies A iff s′ satisfies A. Such sequence pairs will exhaust the sequences of M and M*, so it follows that for all wffs A, ⊨M A iff ⊨M* A.
(6) Obviously, it follows from this that M* is also a model for the set of axioms of PF=. This establishes the result. ∎

Result (Completeness): For all wffs A, if ⊨= A then ⊢PF= A.

This proof is left as an exam question, but it requires the above possibility of contracting models to normal models.
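The contraction construction above is, at bottom, just a quotient of the domain by the equivalence relation E. A toy Python sketch (the model data is invented) of steps (2e) and (3a):

```python
# Quotient a domain by an interpreted identity relation E.
def eq_class(o, E):
    """[o]E: the set of p such that (o, p) is in E."""
    return frozenset(p for (q, p) in E if q == o)

D = {0, 1, 2}
E = {(0, 0), (1, 1), (2, 2), (0, 1), (1, 0)}   # 0 and 1 count as "identical"

D_star = {eq_class(o, E) for o in D}
print(D_star)   # two classes: frozenset({0, 1}) and frozenset({2})
```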

UNIT 3 PEANO ARITHMETIC AND RECURSIVE FUNCTIONS

A. The System S

Definition: Stated in English, the Peano postulates (also called the Peano axioms or the Peano-Dedekind Axioms) are the following five principles:
(P1) Zero is a natural number.
(P2) Every natural number has a successor which is also a natural number.
(P3) Zero is not the successor of any natural number.
(P4) No two natural numbers have the same successor.
(P5) If something is both true of zero, and true of the successor of a number whenever it is true of that number, then it is true of all natural numbers (i.e., the principle of mathematical induction).

Your book discusses the history of these principles in more detail, but in the late 19th and early 20th century it was widely believed that all the truths of number theory (pure arithmetic) could be derived as theorems from these principles. But so long as they are simply stated in English, and not introduced within a precisely formulated logical calculus, this is a difficult supposition to test. If these are the only truths we take as axiomatic, to get truths regarding addition, multiplication, etc., we'd also need certain principles of set theory, and the proper axiomatization of set theory is still very controversial. However, without set theory, we can obtain more or less the same results by taking the notions of addition and multiplication as primitive functions within a more or less standard first-order predicate theory, and adding a few additional axioms. Any system that has the same mathematical theorems is called a Peano arithmetic.

Mendelson's System S: Syntax

The new system does not require the addition of anything new to the syntax of standard first-order predicate logic. In fact, we give the system S a less complicated syntax than PF by making the following restrictions:
A. There is only one predicate letter: I2. Again, instead of I2(t, u) we write (t = u).
B. There is only one constant: a. But as an alternative, we use the numeral '0'.
C. There are three function letters: f1, f12 and f22. But instead of writing f1(t), we write t′; instead of writing f12(t, u), we write (t + u); and instead of writing f22(t, u), we write (t · u).

D. There are still denumerably many variables, as before.
Hence, all atomic wffs are identity statements. Other wffs are built from atomic ones as before.

Mendelson's System S: Semantics

The system S has a single intended interpretation, its so-called "standard model". (Although it does have other models.)

Definition: The standard model for S can be characterized as follows:
1. The domain of quantification is the set of natural numbers {0, 1, 2, 3, ...}.
2. The interpretation of the constant 'a' is the number zero.
3. The interpretation of the predicate letter 'I2' is the identity relation on the set of natural numbers.
4. The interpretation of the function letter 'f1' is the set of ordered pairs in which the second element is the number one greater than the first element, e.g., ⟨0, 1⟩, ⟨1, 2⟩ and ⟨2, 3⟩, etc. The interpretation of 'f12' is the set of ordered pairs in which the first element is itself an ordered pair of natural numbers, and the second element is the sum of those two numbers, e.g., ⟨⟨2, 3⟩, 5⟩, etc. The interpretation of 'f22' is the set of ordered pairs in which the first element is itself an ordered pair of natural numbers, and the second element is the product of those two numbers, e.g., ⟨⟨2, 3⟩, 6⟩, etc.

The axioms of system S (listed below) are true in the standard model. Because S has a model, by the Modeling Lemma, it is consistent. However, because the proof of the Modeling Lemma requires mathematical methods such as the principle of mathematical induction in the metalanguage, and system S contains object-language translations of these very principles, this 'proof' of consistency appears somewhat circular. It is customary therefore to state this result in a somewhat weaker way: e.g., assuming that ordinary mathematical reasoning (as reflected in the metalanguage) is consistent, so is system S.
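The standard model makes every closed term of S denote a particular natural number, and that denotation can be computed. A short Python sketch (the tuple encoding of terms is my own):

```python
# Evaluate closed terms of S in the standard model: '0' denotes zero,
# ('s', t) is the successor t', and ('+', t, u), ('*', t, u) as expected.
def value(term):
    if term == '0':
        return 0
    op = term[0]
    if op == 's':
        return value(term[1]) + 1
    if op == '+':
        return value(term[1]) + value(term[2])
    return value(term[1]) * value(term[2])      # op == '*'

two = ('s', ('s', '0'))      # the numeral 0''
three = ('s', two)           # the numeral 0'''
print(value(('*', two, three)))   # 0'' . 0''' denotes 6
```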

Axiomatization of S

The system is built upon the predicate calculus; bearing in mind the restrictions on the syntax mentioned above, its axioms include instances of axiom schemata (A1) through (A5) of the predicate calculus. Its only two primitive inference rules are Gen and MP. (The deduction theorem, etc., holds in the same form.) We add the following:

Definition: A Proper Axiom of S is any one of (S1)–(S8) listed below, or any instance of (S9).
(S1) x = y ⇒ (x = z ⇒ y = z)
(S2) x = y ⇒ x′ = y′
(S3) 0 ≠ x′
(S4) x′ = y′ ⇒ x = y
(S5) x + 0 = x
(S6) x + y′ = (x + y)′
(S7) x · 0 = 0
(S8) x · y′ = (x · y) + x
(S9) A[0] ⇒ ((∀x)(A[x] ⇒ A[x′]) ⇒ (∀x) A[x])
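Read operationally, (S5)–(S8) are a recursive program for computing sums and products from the successor operation alone, which the following sketch makes explicit (ordinary Python integers stand in for the numerals):

```python
# (S5)/(S6): x + 0 = x  and  x + y' = (x + y)'
def add(x, y):
    return x if y == 0 else add(x, y - 1) + 1

# (S7)/(S8): x . 0 = 0  and  x . y' = (x . y) + x
def mul(x, y):
    return 0 if y == 0 else add(mul(x, y - 1), x)

print(add(2, 3), mul(2, 3))   # 5 6
```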

Result: S is a first-order theory with identity.

Although (A6) and (A7) of the predicate calculus with identity are not taken as axioms, they are derivable as theorems in this system from the above. It will be recalled from our last unit that we proved that if (A6) is a theorem, and those instances of (A7) involving atomic formulas are theorems, then other instances of (A7) follow. Some of the principles necessary for getting instances of (A7) involving atomic wffs are proved below, some are proved in the book, and some are left as homework.


Result: The theorems and derived rules governing reflexivity, symmetry and transitivity of identity hold in S, i.e.:
(A6) ⊢S (∀x) x = x
(Ref=) ⊢S t = t, for any term t.
(Sym=T) ⊢S (∀x)(∀y)(x = y ⇒ y = x)
(Sym=) t = u ⊢S u = t, for any terms t, u.
(Trans=T) ⊢S (∀x)(∀y)(∀z)(x = y ⇒ (y = z ⇒ x = z))
(Trans=) t = u, u = s ⊢S t = s, for all terms t, u, s.

Proof:
Demonstration of (A6):
1. ⊢S x + 0 = x   (S5)
2. ⊢S x = y ⇒ (x = z ⇒ y = z)   (S1)
3. ⊢S (∀x)(∀y)(∀z)(x = y ⇒ (x = z ⇒ y = z))   2 Gen×3
4. ⊢S x + 0 = x ⇒ (x + 0 = x ⇒ x = x)   3 UI×3
5. ⊢S x = x   1, 4 MP×2
6. ⊢S (∀x) x = x   5 Gen
(Ref=) follows directly from (A6) by UI.
For (Sym=T):
1. ⊢S x = y ⇒ (x = z ⇒ y = z)   (S1)
2. ⊢S (∀x)(∀y)(∀z)(x = y ⇒ (x = z ⇒ y = z))   1 Gen×3
3. ⊢S x = y ⇒ (x = x ⇒ y = x)   2 UI×3
4. ⊢S x = x ⇒ (x = y ⇒ y = x)   3 SL
5. ⊢S x = x   Ref=
6. ⊢S x = y ⇒ y = x   4, 5 MP
7. ⊢S (∀x)(∀y)(x = y ⇒ y = x)   6 Gen×2
(Sym=) follows by UI×2 and MP.
For (Trans=T):
1. ⊢S (∀x)(∀y)(∀z)(x = y ⇒ (x = z ⇒ y = z))   (S1) Gen×3
2. ⊢S x1 = y1 ⇒ (x1 = z1 ⇒ y1 = z1)   1 UI×3
3. ⊢S (∀x1)(∀y1)(∀z1)(x1 = y1 ⇒ (x1 = z1 ⇒ y1 = z1))   2 Gen×3
4. ⊢S y = x ⇒ (y = z ⇒ x = z)   3 UI×3
5. ⊢S x = y ⇒ y = x   (Sym=T) UI×2
6. ⊢S x = y ⇒ (y = z ⇒ x = z)   4, 5 SL
7. ⊢S (∀x)(∀y)(∀z)(x = y ⇒ (y = z ⇒ x = z))   6 Gen×3
(Trans=) follows by UI×3 and MP×2. ∎

Result (MI): A[0], (∀x)(A[x] ⇒ A[x′]) ⊢S (∀x) A[x], for any variable x and wff A[x]. (Derived rule for mathematical induction.)

Proof: This follows from (S9) and MP×2. ∎

(From here on out I shall often ignore the fact that (S1)–(S9) are stated with particular variables, rather than schematically for all variables or all terms, and treat, e.g., anything of the form x = x + 0 as if it counted as (S5); obviously it takes only Gen and UI to move from 'x' to any other variable x.)

Result (Sub+): ⊢S (∀x)(∀y)(∀z)(x = y ⇒ x + z = y + z). (Substitution of identicals for addition.)

Proof:
1. x = y ⊢S x = y   (Premise)
2. ⊢S x = x + 0   (S5)
3. ⊢S y = y + 0   (S5)
4. x = y ⊢S x = y + 0   1, 3 Trans=
5. ⊢S x + 0 = x   2 Sym=
6. x = y ⊢S x + 0 = y + 0   4, 5 Trans=
7. ⊢S x = y ⇒ x + 0 = y + 0   6 DT
8. x = y ⇒ x + z = y + z ⊢S x = y ⇒ x + z = y + z   (Premise)
9. x = y ⇒ x + z = y + z, x = y ⊢S x + z = y + z   1, 8 MP
10. ⊢S (∀x)(∀y)(x = y ⇒ x′ = y′)   (S2) Gen×2
11. ⊢S x + z = y + z ⇒ (x + z)′ = (y + z)′   10 UI×2
12. x = y ⇒ x + z = y + z, x = y ⊢S (x + z)′ = (y + z)′   9, 11 MP
13. ⊢S x + z′ = (x + z)′   (S6)
14. ⊢S y + z′ = (y + z)′   13 Gen, UI
15. x = y ⇒ x + z = y + z, x = y ⊢S x + z′ = y + z′   12, 13, 14 Trans=, Sym=
16. ⊢S (x = y ⇒ x + z = y + z) ⇒ (x = y ⇒ x + z′ = y + z′)   15 DT×2
17. ⊢S (∀z)[(x = y ⇒ x + z = y + z) ⇒ (x = y ⇒ x + z′ = y + z′)]   16 Gen
18. ⊢S (∀z)(x = y ⇒ x + z = y + z)   7, 17 MI
19. ⊢S (∀x)(∀y)(∀z)(x = y ⇒ x + z = y + z)   18 Gen×2 ∎

The proofs of the following are in the book:

Result: Analogues of (S5), (S6) and (Sub+), flipped.
⊢S (∀x) x = 0 + x
⊢S (∀x)(∀y) x′ + y = (x + y)′
⊢S (∀x)(∀y)(∀z)(x = y ⇒ z + x = z + y)

Result (Com+/Assoc+): Commutativity and associativity of addition.
⊢S (∀x)(∀y) x + y = y + x
⊢S (∀x)(∀y)(∀z) ((x + y) + z) = (x + (y + z))

Tonight’s homework includes proving analogous results for multiplication. (I.e., you’ll prove substitution within multiplication, (Sub·), flipped versions of (S7), (S8) and (Sub·), as well as (Com·).) We get from these that S is a theory with identity, as substitution is allowed in all contexts.

B. The Quasi-Fregean System F

Suppose you wanted to construct an axiomatic system for mathematics, but did not want to take ‘·’, ‘+’, ‘′’, and ‘0’ as primitive, and instead wanted to define them. One initially attractive way would be to do this within an axiomatic set theory, such as the following, which I’m calling system F. This system is not Frege’s system, but a crude oversimplification thereof.

Syntax
1. We add to the syntax of predicate logic the following subnective, which yields a term for any variable x and wff A[x]:
{x | A[x]}
This is read, “the set of all x such that A[x]”.
2. All occurrences of x in a term of the form {x | A[x]} are considered to be bound.
3. We also choose a two-place predicate letter E² to use for the membership relation. An expression of the form (t ∈ u) is shorthand for E²(t, u), and (t ∉ u) is shorthand for ¬E²(t, u).

Axiomatization

The system F contains analogues of axiom schemata (A1) through (A7) of the predicate calculus with identity (PF=), the inference rules MP and Gen, and the following two additional axiom schemata:
(A8) (∀x)(A[x] ⇔ x ∈ {y | A[y]}), for all cases in which the variable y is free for x in A[x].
(A9) (∀x)(x ∈ {y | A[y]} ⇔ x ∈ {z | B[z]}) ⇒ {y | A[y]} = {z | B[z]}, where {y | A[y]} and {z | B[z]} do not contain x free.
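Read over a fixed domain, (A8) is just the familiar set-builder: every condition carves out a set. A hedged Python sketch (the restriction to a fixed finite domain is my addition, and is precisely what the unrestricted (A8) lacks):

    DOMAIN = range(10)   # a fixed domain; my addition, not part of F

    def set_of(pred):
        # {x | A[x]} relativized to DOMAIN; unrestricted comprehension,
        # as in (A8), is exactly what leads to the disaster below.
        return frozenset(x for x in DOMAIN if pred(x))

    evens = set_of(lambda x: x % 2 == 0)
    print(sorted(evens))   # [0, 2, 4, 6, 8]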

Some Intuitive Definitions / Abbreviations
(In the following, x, y and z are the first three variables that do not occur in the terms t and u; note also that some of these are abbreviations of terms, others are abbreviations of wffs.)
Set theoretic definitions
(t ∩ u) for {x | x ∈ t ∧ x ∈ u}
(t ∪ u) for {x | x ∈ t ∨ x ∈ u}
t̄ for {x | x ∉ t}
(t ⊆ u) for (∀x)(x ∈ t ⇒ x ∈ u)
V for {x | x = x}
∅ for {x | x ≠ x}


{t} for {x | x = t}
{t, u} for {x | x = t ∨ x = u}
⟨t, u⟩ for {{t}, {t, u}}
(t × u) for {x | (∃y)(∃z)(x = ⟨y, z⟩ ∧ y ∈ t ∧ z ∈ u)}
Dom(t) for {x | (∃y)(⟨x, y⟩ ∈ t)}
Rng(t) for {x | (∃y)(⟨y, x⟩ ∈ t)}
Fld(t) for (Dom(t) ∪ Rng(t))
Inv(t) for {x | (∃y)(∃z)(x = ⟨y, z⟩ ∧ ⟨z, y⟩ ∈ t)}
Fnct(t) for (∀x)(∀y)(∀z)(⟨x, y⟩ ∈ t ∧ ⟨x, z⟩ ∈ t ⇒ y = z)
Biject(t) for (Fnct(t) ∧ Fnct(Inv(t)))
Mathematical definitions
(t ≅ u) for (∃x)(Biject(x) ∧ Dom(x) = t ∧ Rng(x) = u)
Card(t) for {x | x ≅ t}
0 for Card(∅)
t′ for {x | (∃y)(y ∈ x ∧ (x ∩ {y}̄) ∈ t)} (i.e., those x that, with one element removed, yield a member of t)
1 for 0′
2 for 1′
3 for 2′
[. . . and so on for other numerals]
N for {x | (∀y)((0 ∈ y ∧ (∀z)(z ∈ y ⇒ z′ ∈ y)) ⇒ x ∈ y)}
Fin(t) for (∃x)(x ∈ N ∧ t ∈ x)
Infin(t) for ¬Fin(t)
Denum(t) for (t ≅ N)
Ctbl(t) for (Fin(t) ∨ Denum(t))
(t ≤ u) for (∃x)(∃y)(∃z)(x ∈ t ∧ y ∈ u ∧ z ⊆ y ∧ x ≅ z)
(t < u) for ((t ≤ u) ∧ ¬(u ≤ t))
§(t) for {x | x ∈ N ∧ x < t}
(t + u) for Card((§(t) × {0}) ∪ (§(u) × {1}))
(t · u) for Card(§(t) × §(u))
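The cardinal-number definitions can be “run” over a small finite universe to see that 0, 1, 2, . . . come out as the classes of 0-, 1-, 2-membered sets. A minimal sketch, assuming a three-element universe in place of V (names are mine):

    from itertools import combinations

    # A three-element universe standing in for V (an assumption made so
    # the Fregean definitions can be evaluated over finite sets).
    U = {"a", "b", "c"}

    def powerset(s):
        s = list(s)
        return [frozenset(c) for r in range(len(s) + 1)
                             for c in combinations(s, r)]

    SETS = powerset(U)

    # 0 = Card(∅): the class of sets equinumerous with the empty set.
    ZERO = frozenset(x for x in SETS if len(x) == 0)

    def succ(t):
        # t′ = {x | (∃y)(y ∈ x ∧ (x ∩ {y}̄) ∈ t)}: x is in t′ iff deleting
        # some element of x leaves a member of t.
        return frozenset(x for x in SETS if any(x - {y} in t for y in x))

    ONE = succ(ZERO)     # exactly the one-element subsets of U
    TWO = succ(ONE)      # exactly the two-element subsets of U
    print(all(len(x) == 2 for x in TWO), len(TWO))   # True 3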

Results
With these definitions in place, one can derive Peano’s postulates as theorems in the following forms:
(P1) ⊢F 0 ∈ N
(P2) ⊢F x ∈ N ⇒ x′ ∈ N
(P3) ⊢F x ∈ N ⇒ 0 ≠ x′
(P4) ⊢F x ∈ N ∧ y ∈ N ⇒ (x′ = y′ ⇒ x = y)
(P5) ⊢F A[0] ∧ (∀x)(A[x] ⇒ A[x′]) ⇒ (∀x)(x ∈ N ⇒ A[x])
As well as analogues of Mendelson’s other axioms:

(S5F) ⊢F x ∈ N ⇒ x + 0 = x
(S6F) ⊢F x ∈ N ∧ y ∈ N ⇒ (x + y′) = (x + y)′
(S7F) ⊢F x · 0 = 0
(S8F) ⊢F x ∈ N ∧ y ∈ N ⇒ (x · y′) = ((x · y) + x)

Also, we have, e.g.:
⊢F ((∃₁x) A[x]) ⇔ ({x | A[x]} ∈ 1)
⊢F ((∃₂x) A[x]) ⇔ ({x | A[x]} ∈ 2)
⊢F ((∃₃x) A[x]) ⇔ ({x | A[x]} ∈ 3)
And so on.

Disaster
The system F, unfortunately, is inconsistent due to Russell’s paradox:
⊢F {x | x ∉ x} ∉ {x | x ∉ x} ⇔ {x | x ∉ x} ∈ {x | x ∉ x}
Proof: Direct from (A8) and universal instantiation. Whence both ⊢F {x | x ∉ x} ∈ {x | x ∉ x} and ⊢F {x | x ∉ x} ∉ {x | x ∉ x}. Hence ⊢F A for all wffs A, making the system entirely unsuitable for mathematics. In this system we have both ⊢F 1 + 1 = 2 and ⊢F 1 + 1 = 3! Poor Frege.
Homework
Without using Russell’s paradox or other contradiction, prove ⊢F {x} = {y} ⇒ x = y.
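The flip-flop in the paradox can be mimicked computationally by modeling a naive “set” as its membership predicate; this is only an analogy, not part of system F:

    # A naive "set" modeled as its membership predicate: x ∈ s iff s(x).
    R = lambda x: not x(x)   # R = {x | x ∉ x}

    # Asking whether R ∈ R demands R(R) == not R(R); Python diverges
    # instead of settling on a truth value.
    try:
        R(R)
    except RecursionError:
        print("R ∈ R iff R ∉ R — no stable answer")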

Some History
In the late 19th century, Euclid’s axiomatization of geometry came under new scrutiny, and many mathematicians began to investigate the axiomatization of arithmetic as well. In 1888 the German mathematician Richard Dedekind argued that five principles formed the basis of all pure arithmetic. In the 1880s and 1890s, the adequacy of those five principles (and others) was studied in depth, most importantly by a group of Italian mathematicians led by Giuseppe Peano. In order to consider them more systematically, Peano urged that the principles be written using a rigorously defined


symbolic notation for logic and set theory, which he was still developing at that time. Given Peano’s role in systematizing and popularizing the above principles, they have since come to be called the “Peano axioms” or “Peano postulates”.

In the year 1900, Peano presented some of his findings at the International Congress of Philosophy in Paris. In the audience was a young English polyglot whose main contribution to academia to date was a fellowship thesis on the compatibility of non-Euclidean geometry with Hegelian idealism. He was so impressed with Peano’s work that over the next few months he had not only mastered Peanist logic, but had suggested several improvements. This was the 28-year-old Bertrand Russell.

Russell suggested that it wasn’t enough to state the axioms of arithmetic in logical notation. One needed also to be explicit about the rules and principles governing that logical notation, because only then could one really test what is and what isn’t provable. The axioms of arithmetic needed to be supplemented with the axioms of logic. However, in attempting to axiomatize logic and set theory, bearing in mind Georg Cantor’s definition of cardinal numbers in terms of one-one correspondences, Russell became convinced that given suitable definitions of the notions of ‘zero’, ‘successor’ and ‘natural number’, the Peano (so-called) ‘axioms’ could actually be derived as theorems from the axioms of logic alone. Russell, having just taught a course on Leibniz, saw this as vindication of Leibniz’s theory that mathematical truths are simply more complicated truths of logic: a theory now known as logicism.

Russell began work on what he imagined to be a two-volume work called The Principles of Mathematics. In volume one, he would explain the reduction of mathematics to logic informally (in English), and in volume two, he would set out to derive all of pure mathematics within an axiomatic system whose only axioms were logical axioms. However, in mid-1901, after he had finished the bulk of vol. I, he discovered the paradox of sets that now bears his name. Realizing that this made the most natural axiomatization of set theory inconsistent, Russell started to look for a philosophically adequate solution that would nevertheless salvage

most of the work he (and others such as Dedekind and Peano) had done. Not finding an easy solution, Russell decided to publish vol. I with only a preliminary discussion of the contradiction and possible ways of avoiding it, leaving a complete solution of the inconsistency within the formal system for further development in vol. II.

While finishing vol. I in 1901–1902, Russell did a search of recent literature on the foundations of mathematics, and in so doing rediscovered the works of Gottlob Frege. Frege, working in almost complete isolation, had already in his 1884 Grundlagen der Arithmetik (trans. Foundations of Arithmetic) given a list of basic principles of arithmetic very similar to Dedekind’s, but had also suggested, like Russell, that given suitable definitions in terms of notions of pure logic, these principles could be derived from logical principles alone. In fact, Frege had already developed the core of an axiomatic system for logic in his 1879 work Begriffsschrift, and in his later 1893 magnum opus, Grundgesetze der Arithmetik, vol. I (trans. Basic Laws of Arithmetic), Frege expanded that system by adding axioms for “value-ranges” (in effect, class theory), and had begun to derive the elementary truths of number theory.

While Russell was delighted to find such common ground between his work and Frege’s, he also discovered that Frege’s system fell prey to his paradox, and was therefore inconsistent. He broke the news gently to Frege in a letter. Here is a translation of that letter, as well as Frege’s response (both originally written in German):


Dear Colleague: [16 June 1902]

I have known your Grundgesetze der Arithmetik for a year and a half, but only now have I been able to find the time for the thorough study I intend to devote to your writings. I find myself in full accord with you on all main points, especially in your rejection of any psychological element in logic and in the value you attach to a conceptual notation for the foundations of mathematics and of formal logic, which, incidentally, can hardly be distinguished. On many questions of detail, I find discussions, distinctions and definitions in your writings for which one looks in vain in other logicians . . . I have encountered a difficulty only on one point. You assert (p. 17) that a function could also constitute the indefinite element. This is what I used to believe, but this view now seems to me dubious because of the following contradiction: Let w be the predicate of being a predicate which cannot be predicated of itself. Can w be predicated of itself? From either answer, the contradictory follows. We must therefore conclude that w is not a predicate. Likewise, there is no class (as a whole) of those classes which, as wholes, are not members of themselves . . . On the fundamental questions where symbols fail, the exact treatment of logic has remained very backward; I find yours to be the best treatment I know in our time; and this is why I have allowed myself to express my deep respect for you. It is very regrettable that you did not get around to publishing the second volume of your Grundgesetze; but I hope that this will still be done.

Yours sincerely,
Bertrand Russell

Dear Colleague: [22 June 1902]

Many thanks for your interesting letter of 16 June. I am glad that you agree with me in many things and that you intend to discuss my work in detail . . . Your discovery of the contradiction has surprised me beyond words and, I would almost say, left me thunderstruck, because it has rocked the ground on which I intended to build arithmetic. It seems accordingly that . . . my Basic Law [axiom] V is false . . . I must give some further thought to the matter. It is all the more serious as the

collapse of my Law V seems to undermine not only the foundations of my arithmetic but the only possible foundations for arithmetic as such . . . Your discovery is at any rate a very remarkable one, and it may perhaps lead to a great advance in logic, undesirable as it may seem at first sight . . . The second volume of my Grundgesetze is to appear shortly. I shall have to give it an appendix where I will do justice to your discovery. If only I could find the right way of looking at it!

Yours sincerely,
Gottlob Frege

Unfortunately, the hastily prepared solution Frege included in an Appendix to vol. II of Grundgesetze was unsuccessful, and leads to a similar, but more complicated, contradiction. For Russell’s part, it took him seven more years to find a solution he was happy with. By then, volume II of the Principles had grown so big, and had deviated so far from the plan laid out in vol. I, that Russell, along with his new collaborator, Alfred North Whitehead, decided to rename it Principia Mathematica, which was itself split into three volumes, published in 1910, 1911 and 1913. (Principia dropped set theory as such, and instead re-interpreted talk of ‘classes’ in mathematics using notions not involving sets, but instead higher-order quantification over ‘propositional functions’ divided into ramified ‘types’.)

Meanwhile, other mathematicians had developed consistent set theories whose axioms, however, did not seem to have the character of self-evidence usually thought to be required of logical truths. Such mathematicians still thought much of mathematics could be reduced to set theory, but denied that set theory was a branch of logic. The first such system was developed by Ernst Zermelo in 1908, and was added to, and made more rigorous, by Adolf Frænkel in 1922. Their system is now called ZF or Zermelo-Frænkel set theory. Another was suggested by John von Neumann in 1925, and expanded by Paul Bernays and Kurt Gödel in the 1930s; it is now called NBG set theory. Two more set theories, NF and ML, were developed by W. V. Quine in 1937

and 1940. New versions continue to be discovered, such as George Boolos’s “New V” which stays close to Frege’s original system with only a slight modification to Frege’s Basic Law V. However, let us return to Mendelson’s System S for the time being.

C. Numerals

Definition: Numerals are the primary or canonical terms used in a given language to stand for specific natural numbers.

We have numerals in both the object language and the metalanguage. In standard English (our metalanguage) the numerals are the signs:
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, . . . , etc.
In the language of system S, the numerals constitute the following series of closed terms:
0, 0′, 0′′, 0′′′, 0′′′′, 0′′′′′, 0′′′′′′, 0′′′′′′′, . . . , etc.
Let us now introduce a metalanguage function, n̄, that yields, for a given number n, the numeral of S for that number. We define this function recursively in the metalanguage as follows:
Abbreviation:
0̄ is the constant ‘0’
(n + 1)‾ is n̄′ (This is the metalanguage ‘+’; where the barred expression is complex, we write the bar after it, as in (n + 1)‾.)
So 2̄ is “0′′”, 5̄ is “0′′′′′”, and 25‾ is “0′′′′′′′′′′′′′′′′′′′′′′′′′”.
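The recursion defining n̄ is exactly the structural recursion a program would use; a minimal sketch (the function name is mine):

    def numeral(n: int) -> str:
        """Return the numeral of S denoting n: '0' followed by n strokes."""
        if n == 0:
            return "0"                 # base clause: 0̄ is the constant '0'
        return numeral(n - 1) + "′"    # recursion clause: (n + 1)‾ is n̄′

    print(numeral(5))   # 0′′′′′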

The following were either proven in your homework, or follow from those results:
(LL) t = u, A[t, t] ⊢S A[t, u], and u = t, A[t, t] ⊢S A[t, u], for all terms t and u that are free for x in A[x, x].
(Canc+T) ⊢S (∀x)(∀y)(∀z)(x + z = y + z ⇒ x = y)
(Canc+) t + s = u + s ⊢S t = u

Result (+1): ⊢S x + 1 = x′
Proof:
1. ⊢S x + 0′ = (x + 0)′ (S6)
2. ⊢S x + 0 = x (S5)
3. ⊢S x + 0′ = x′ 1, 2 LL
4. ⊢S x + 1 = x′ 3 def. 1 ∎

Result (·1): ⊢S x · 1 = x
Proof:
1. ⊢S x · 0′ = (x · 0) + x (S8) Gen, UI
2. ⊢S x · 0 = 0 (S7)
3. ⊢S x · 0′ = 0 + x 1, 2 LL
4. ⊢S 0 + x = x (S5), (Com+)
5. ⊢S x · 0′ = x 3, 4 Trans=
6. ⊢S x · 1 = x 5 def. 1 ∎

Result (·2): ⊢S x · 2 = x + x
Proof:
1. ⊢S x · 0′′ = (x · 0′) + x (S8) Gen, UI
2. ⊢S x · 0′ = x Above, def. 1
3. ⊢S x · 0′′ = x + x 1, 2 LL
4. ⊢S x · 2 = x + x 3 def. 2 ∎

Result (0+): ⊢S x + y = 0 ⇒ (x = 0 ∧ y = 0)
Proof:
1. x + 0 = 0 ⊢S x + 0 = 0 (Premise)
2. ⊢S 0 + 0 = 0 (S5) Gen, UI
3. x + 0 = 0 ⊢S x + 0 = 0 + 0 1, 2 LL
4. x + 0 = 0 ⊢S x = 0 3 Canc+
5. ⊢S 0 = 0 Ref=
6. x + 0 = 0 ⊢S x = 0 ∧ 0 = 0 4, 5 SL
7. ⊢S x + 0 = 0 ⇒ (x = 0 ∧ 0 = 0) 6 DT
8. ⊢S x + y′ = (x + y)′ (S6)
9. ⊢S 0 ≠ (x + y)′ (S3) Gen, UI
10. ⊢S (x + y)′ = 0 ⇒ 0 = (x + y)′ (Sym=T) UI×2
11. ⊢S (x + y)′ ≠ 0 9, 10 MT
12. ⊢S x + y′ ≠ 0 8, 11 LL
13. ⊢S x + y′ = 0 ⇒ (x = 0 ∧ y′ = 0) 12 SL
14. ⊢S [x + y = 0 ⇒ (x = 0 ∧ y = 0)] ⇒ [x + y′ = 0 ⇒ (x = 0 ∧ y′ = 0)] 13 SL
15. ⊢S (∀y){[x + y = 0 ⇒ (x = 0 ∧ y = 0)] ⇒ [x + y′ = 0 ⇒ (x = 0 ∧ y′ = 0)]} 14 Gen
16. ⊢S (∀y)[x + y = 0 ⇒ (x = 0 ∧ y = 0)] 7, 15 MI
17. ⊢S x + y = 0 ⇒ (x = 0 ∧ y = 0) 16 UI ∎

Proofs of the following are left as homework or are given in the book:
(0·) ⊢S x ≠ 0 ⇒ (x · y = 0 ⇒ y = 0)
(1+) ⊢S x + y = 1 ⇒ [(x = 0 ∧ y = 1) ∨ (x = 1 ∧ y = 0)]
(1·) ⊢S x · y = 1 ⇒ (x = 1 ∧ y = 1)
(∃Succ) ⊢S x ≠ 0 ⇒ (∃y)(x = y′)
(Canc·) ⊢S x ≠ 0 ⇒ (y · x = z · x ⇒ y = z)
(∃Succs) ⊢S x ≠ 0 ⇒ [x ≠ 1 ⇒ (∃y)(x = y′′)]

We also get the following very important results, stated in the metalanguage.

Result (Num+): For all natural numbers n and m, ⊢S (n + m)‾ = n̄ + m̄.

Proof: We use induction on m in the metalanguage. First, for the base, (n + 0)‾ is simply n̄, and we have ⊢S n̄ = n̄ + 0 by (S5) and Sym=. For the induction step, assume ⊢S (n + m)‾ = n̄ + m̄. We need ⊢S (n + (m + 1))‾ = n̄ + (m + 1)‾, i.e., ⊢S (n + m)‾′ = n̄ + m̄′. This follows by (S2), (S6) and Sym=. ∎

Result (Num·): For all natural numbers n and m, ⊢S (n · m)‾ = n̄ · m̄.

You will prove the above as part of your homework.

Result (Num≠): For all natural numbers n and m, if n ≠ m, then ⊢S n̄ ≠ m̄.

Proof: Assume that n ≠ m. Hence what we need to prove is a statement of the form:
⊢S 0′...(n times)...′ ≠ 0′...(m times)...′
where one side has more ′-signs than the other. Perform a reductio in the object language, taking as a premise the wff 0′...(n times)...′ = 0′...(m times)...′. By successive applications of (S4) and MP, depending on whether n < m or m < n, you’ll get either that:
0′...(n times)...′ = 0′...(m times)...′ ⊢S 0 = 0′...(m−n times)...′
or that:
0′...(n times)...′ = 0′...(m times)...′ ⊢S 0′...(n−m times)...′ = 0
However, the negations of these follow from (S3), or (S3) and Sym=, and UI. So, by DT and MT, we get that ⊢S 0′...(n times)...′ ≠ 0′...(m times)...′, i.e., ⊢S n̄ ≠ m̄. ∎
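Viewed in the metalanguage, (Num+) and (Num≠) are claims about strings of symbols, and can be sanity-checked as such; a rough sketch (the helper is a non-recursive restatement of the numeral function sketched earlier):

    # String-level sanity checks for (Num+) and (Num≠); illustrative only.
    def numeral(n: int) -> str:      # the function n̄ sketched earlier
        return "0" + "′" * n

    # (Num+): the numeral for n + m is n̄ extended by m successor strokes,
    # mirroring ⊢S (n + m)‾ = n̄ + m̄.
    def check_num_plus(n: int, m: int) -> bool:
        return numeral(n + m) == numeral(n) + "′" * m

    # (Num≠): distinct numbers receive distinct numerals.
    def check_num_neq(n: int, m: int) -> bool:
        return n == m or numeral(n) != numeral(m)

    print(all(check_num_plus(n, m) and check_num_neq(n, m)
              for n in range(10) for m in range(10)))   # True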

D. Ordering, Complete Induction and Divisibility

Abbreviations:
(t < u) for (∃x)(x ≠ 0 ∧ t + x = u), where x is the first variable not in t or u
(t ≤ u) for (t < u) ∨ (t = u)
(t > u) for (u < t)
(t ≥ u) for (u ≤ t)
(t ≮ u) for ¬(t < u)
and we define (t ≰ u), (t ≯ u) and (t ≱ u) similarly.
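In the standard model the defined (t < u) is decidable by bounded search, since any witness x with t + x = u is at most u; a small sketch of that reading (function names are mine):

    def lt(t: int, u: int) -> bool:
        # (t < u) := (∃x)(x ≠ 0 ∧ t + x = u); in the standard model any
        # witness x is at most u, so a bounded search decides it.
        return any(x != 0 and t + x == u for x in range(u + 1))

    def le(t: int, u: int) -> bool:    # (t ≤ u) := (t < u) ∨ (t = u)
        return lt(t, u) or t == u

    print(lt(2, 5), lt(5, 5), le(5, 5))   # True False True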


Others (many assigned as homework): Result (Irref