Second Ode on Induction: The Natural Numbers

Last time we went over an example of induction. Yes, even after I promised that we’d dispense with the examples and really dig in to the meaning behind induction. I did that because I wanted to show you something a little more complicated and a little more computer sciencey than the usual examples you see in discrete math for programmers, like proving that \sum_{r=0}^{n}r = \frac{n(n+1)}{2}. Binary trees are fairly simple and very computer sciencey, so I thought they’d be nice. I know I didn’t see a decent induction proof involving a data structure until my upper-division algorithms class, and that’s just too late, in my opinion.

Anyway, I said last time that the three parts of induction could be explained in terms of an argument you make to your friend when he says he was abducted by aliens who stripped him naked and looked at him. Like this:

  1. Base case: “Here’s an alien from the species you say abducted you, let’s call them the Borg. This Borg says he personally would never travel across the entire galaxy from the Delta Quadrant just to see you naked.”
  2. Inductive hypothesis: “Let’s assume that if you polled n Borg drones, they would agree with this drone—they wouldn’t fly across the galaxy to see you naked.”
  3. Inductive conclusion: “Given that n drones concur, n+1 drones concur. Therefore the entire collective concurs. The Borg would never fly from the Delta Quadrant so they could abduct you and look at you naked.”

Why does the inductive conclusion work? Why can we assume that n concurring drones means n+1 concurring drones, and that this property means a unanimous collective? If you’ve seen Star Trek, you know that it’s because the Borg have a hive mind. If one drone holds an opinion, that opinion is shared by the entire collective. But why does this work for induction?

The Natural Numbers

The natural numbers are just the normal counting numbers that you learned in kindergarten (1, 2, 3…). Zero technically isn’t a natural number, but sometimes it’s convenient to include it (our binary tree example from last time was one instance where it was useful).

Imagine that we lined up all the Borg drones in the collective in a big line that stretches off to infinity, and gave each of them a number, starting at 1.

The Borg can assimilate other species and make them into Borg drones, so the collective is always growing. Suppose a poor dumb Bolian gets assimilated and has to join the line, but we don’t want to make him walk all the way to the end, so we let him cut in right here in the middle, at position 498. Since we want our Borg line to be in order by number, we just give our new Bolian drone the number 498, and the other drones’ numbers all go up by one. No matter how many drones were in our line before we let the Bolian cut in at position 498, there are always enough natural numbers that we can bump everyone up by one without running out. And no matter how many new drones we allow to cut into the line at the middle, we can always repeat this move and bump everyone’s number up by one. Everyone always gets an index.

This is what mathematicians mean when they say the natural numbers are infinite: no matter how many numbers we assign to drones, there are always more of them. The natural numbers never run out.

The Peano axioms

But how do we know that bumping everyone’s number up by one doesn’t somehow change someone’s number into something that’s not a natural number? How do we know that we won’t bump up number 5354 and somehow end up with number 5354.12?

Your first reaction to that was probably “Of course we won’t. How could that happen? That’s stupid!” But mathematicians like to be extra careful about things like that. They like to prove that it can’t happen. But there’s really no obvious way to actually prove that in the mathematical sense. All we know is that if we have five mangos and we get another mango, we have six mangos; we don’t have 5.5 unless we cut the sixth mango in half and eat one half.

When mathematicians can’t prove something, but it seems obvious or self-evident that it should be true, they often make it into an axiom. Axioms are rules that we can’t prove, but that seem like they should be true, so we just assume they are and use them to prove things. Even the most skeptical mind can’t constantly be in doubt about everything, or knowledge would never get anywhere. For example, when scientists run experiments, they usually assume that they aren’t hallucinating the entire experiment from the inside of a padded cell. You can’t really prove this; we could all be in the Matrix right now. But it seems unproductive to worry about that, so we just assume that isn’t the case and go about our lives. In the same way, mathematicians are allowed to assume certain things are true, if it seems like it just couldn’t work any other way. Non-Euclidean geometry is an interesting example of what can happen if you change your axioms a little, but we won’t go into that here (maybe in a future post).

So there are certain things we just assume to be true about the natural numbers, certain axioms that define them. These are called the Peano axioms, after Italian mathematician Giuseppe Peano, who worked on mathematical induction. See Wikipedia for more information.

For induction, we mostly care about just three of the Peano axioms. Those are:

  1. 1 is a natural number.
  2. If n is a natural number, then n+1 is a natural number.
  3. If K is a set which contains 1 and which contains n+1 whenever it contains n, then K contains every natural number.

The first two give us a recursive definition (!) of the natural numbers. 1 is a natural number. Since 1 is a natural number, 1+1 = 2 is also a natural number. Since 2 is a natural number, 2+1 = 3 is also a natural number. So on. We can go as high as we want, but hopefully you see what’s happening; if we start with one Borg drone, labeled 1, and keep adding drones to the front of the line, the original drone’s label will keep going higher and higher. According to the second axiom above, no matter how high the first drone’s number gets, it’s always a natural number, because it was a natural number when we started and we just bumped it up by one.

Mathematicians often say that n+1 is the successor of n. Rather than saying that you add 1 to a natural number to get the next natural number, they think of it as applying a successor function, S(n). This way of thinking about it is useful later, when we start applying the ideas behind the Peano axioms to things other than numbers, like binary trees.

The third axiom above is called the inductive axiom. It’s what allows us to write proofs by induction. But we need one more idea to see exactly why induction is valid: the isomorphism. We’ll go over that next time. Isomorphism will show us how we can create a set of natural numbers that represent some other object, like sums or binary trees, prove things about that set, and translate those proofs into proofs about our original objects.


  • The natural numbers are infinite; no matter how many of them you have, there’s always another one.
  • The natural numbers can be defined recursively by starting at 1 and taking the successor.
  • One of the rules of the natural numbers, the inductive axiom, combines with the idea of isomorphism to give us proofs by induction.

First Ode on Induction

Induction is one of the weirdest proof techniques around. It took me forever to fully understand it. It’s complicated; it has all these different parts; and it’s difficult to understand how it even accomplishes its stated goal of proving that something is true for all natural numbers. Most people probably have the following three questions the first time they see induction:

  1. WTF did I just see?!
  2. How did that even work?!
  3. How did they ever come up with this?

I know I did. It’s unfortunate that induction is so hard to understand, because it’s also extremely important for computer science; in fact, probably 90% of all the proofs I’ve ever seen in theoretical computer science were proofs by induction. That’s because induction captures the idea of doing something repeatedly, just like the loops and recursion that are used all over the place in programs. But when my school covered induction (in a lower-division discrete math grab bag course), the teacher just slapped it on the board and said “There it is.” Most of the class just memorized how to do simple induction proofs, and by the time I was in upper-division theory of computation and algorithms classes, the professors were expecting proofs out of us that most of the students had no idea how to write, because they never learned how to apply induction to anything more complicated than \sum_{i=0}^{n} i = \frac{n(n+1)}{2}.

That’s why I’m going to try and give a little insight into induction. Once I knew what I’d just seen, how it even worked, and how anyone ever came up with it, I saw that induction is actually a fascinating and elegant proof technique. I also, eventually, saw why it’s so important for computer science, but most CS classes treat math as a tool, like vi or Git, that you learn on your own time; throw up some formulas and some examples to copy, and everyone goes home and memorizes it. I learned what I learned about induction from math classes, and I’m going to try and put it out there so all CS majors can learn to appreciate induction.

An Example, just to set the stage

First, though, I am going to make a total liar of myself and walk you through an example. That’s because induction has so many parts to it that I want to make sure we’re all on the same page about that before I start going into the background. I’m going to assume that you’ve seen all the simple examples involving sums, and that you’re ready for something a bit more complicated. So let’s talk about binary trees. Actually, let’s talk about full or perfect binary trees. And here one is:


(Source: Go there if you’re unclear at all about trees.)

In textbooks, the statements to be proved are always provided for you, free of charge, but in general you have to actually do some digging to figure out just what you’re going to prove. To make this more real, let’s suppose that our binary tree represents something—for example, we can say it’s a decision tree for some kind of simple AI program that decides whether it likes a car or not based on what features the car has. Each node is a feature; taking the left branch means yes, the car has the feature, while taking the right branch means no, the car doesn’t have the feature. Something like this:


(I know, no one uses CD players anymore, but fitting “stylish adaptive media center with voice-activated controls and iCharging Station” onto the picture just wasn’t worth it.)

You can see that when we add a feature, we don’t just add one new node for that feature. We had just one node for leather seats, but we needed two for air conditioning—one for the reality where we had leather seats, and one for the reality where we didn’t. Then when we added CD player, we needed four nodes—one for the reality where we had leather seats and air conditioning, one for the reality where we had leather seats and no air conditioning, one for the reality where we didn’t have leather seats but had air conditioning, and one for the reality where we had neither leather seats nor air conditioning. You can probably see that if we add another feature—say, power windows—we’ll need eight new nodes: one for leather seats, air conditioning,CD player, power windows; one for leather seats, air conditioning, CD player, no power windows; one for leather seats, air conditioning, no CD player, power windows; one for…

Let’s say that the children of the bottom feature nodes are decision nodes–each one represents a decision of yes, buy the car, or no, don’t buy it. Then we could have something like this:


In this case the decision of whether to buy or not is based only on whether the car has air conditioning, but we could have made a separate decision for every possible combination of features, and then we’d have eight decision nodes down there at the bottom.

We can see just from this small example that the number of nodes we have to store gets big pretty quickly as we add more and more choices, but just how quickly? What we’d like is some kind of formula where we plug in the number of features we want to consider and get back how many nodes that will require. That way, if we know how big a node is and how much memory our computer has, we can figure out how many features we can have before it crashes the computer.

(I hope it’s obvious that having a formula is better than just trying some random number of features, seeing if the program crashes, and trying some random smaller number if it does. Even an approximation formula that gets you in the right ballpark is helpful.)

If you try out some examples—adding features, drawing in the new nodes, and counting them—you’ll probably notice a pattern: each feature requires exactly twice as many nodes as the feature before it. Leather seats only required one node, but air conditioning required 2 \times 1 = 2 nodes, and CD player required 2 \times 2 = 4 nodes.  This makes sense, because each node leads to two children—a yes child and a no child—and all the child nodes of one feature represent the next feature. So how do we turn that into a formula? Well, if we want to know how many nodes there are in the whole tree, we can just add up the number of nodes for all the features. And we know that each feature has twice as many nodes as the feature before it. So if we have f features, we can use something like N = 1 + 2\times 1 + 2\times 2 \times 1 +\ldots + 2\times 2 \times \ldots (f times) \times 1, for N the number of nodes in the tree, right? Or in somewhat cleaned up, more mathy notation:

N = \sum_{i = 0}^{f} 2^i.

Let’s not be hasty

Now that you’re a big time computer scientist, you’re not allowed to assume that something always works just because it seems to work for a few small examples. This is where induction comes in.

While it might seem obvious that every level of the tree (which corresponds to a feature in our example; again, see a tutorial on trees if you’re drawing a blank on this term) has twice as many nodes as the level before it, we can’t actually be sure of this. There are some truly bizarre patterns and arbitrary cutoff points in math, where things that seem the same are totally different, or things that seem different are really the same thing in disguise. We’ll see an example of this later when we talk about the PCI and the well-ordering principle. For other examples, check out finding the roots of polynomials (something magical happens between a polynomial of degree 4 and one of degree 5) or graph coloring (between 3 and 4 here).

We have two assumptions in the formula above: that every level of a binary tree has twice as many nodes as the level above it, and that the sum of nodes in every level of the tree is the total number of nodes in the tree. Both of these need to be proved before we can say for sure that the formula above is always true, no matter how many levels our tree has.

So first of all, let’s prove that every level of a tree has twice as many nodes as the level before it, or equivalently, the ith level of the tree has 2 \times 2^{i-1} = 2^i nodes, using induction. If you’re familiar with the sum examples I assumed you’d seen, this should be fairly straightforward. First we need to decide what to induct on. This decision is pretty much made for us here—it’s the levels—but in more complicated examples, it’s often not obvious what to induct on. Next we need a base case. The obvious candidate is the root. You might be tempted to make the root level 1, but the root level has one node, and 2^1 = 2. If we make the root level 0, then 2^0 = 1, and we’re good.

This is a fine point to keep in mind: the base case can start anywhere. It does not have to start at 1 or 0. It could start at 22, or 69 (if you’re that sort of person), but remember that if you put the base case at 69, you have only proved the theorem for when the relevant quantity is 69 or higher. If we chose to start this example at 69, then our proof would only show that if the tree has 69 or more levels, every level past level 69 has twice as many nodes as the previous level. So while you can bring dorm room humor into theoretical CS if you want to, you’re really just hurting yourself in the end, because you haven’t proved that the theorem is true for a tree with 68 or fewer levels. Here we’re starting at 0, so every tree that has a root (i.e. pretty much all of them) has the property that level i has twice the number of nodes of level i-1.

So we have our base case. Next is our inductive hypothesis. This is the weirdest step for beginners to understand. It blew my brain for several months after I first learned about induction. That’s because it doesn’t actually involve anything factual. Instead, the inductive hypothesis and inductive conclusion together do pretty much what you do when your friend says he was abducted by aliens and you reply “Assuming aliens existed” (your hypothesis), “they wouldn’t fly across the galaxy just to see you naked” (your conclusion). Induction, though, does more than this, because we have the base case, so we know that the theorem is true in at least one case; it’s as if we said “Here’s an alien. He says he wouldn’t fly across the galaxy to see you naked (base case). Assuming more aliens exist (hypothesis), they wouldn’t fly across the galaxy to see you naked either (conclusion).” In future parts, we’ll see the mathematical formalism behind this, which will show why an induction is a valid proof and not just an invalid generalization based on a single example.

So for our inductive hypothesis for the tree level example, we assume that N_k, the number of nodes in level k, is equal to 2 \times N_{k-1}, twice the number of nodes in level k-1. And we assume that this was the case for every level back down to level 0, so that N_{k-1} = 2 \times 2 \times \ldots = 2^{k-1} and N_k = 2 \times 2^{k-1} = 2^k. Now we want to show that, given this assumption, N_{k+1} = 2 \times N_k; i.e. there are twice as many nodes in level k+1 as in level k.

From here it’s really just algebra. We’re trying to prove that N_{k+1} = 2^{k+1}. We can pick one side of this equation and try to transform it into a new form that involves the inductive hypothesis; then we can use the assumption of the inductive hypothesis to show that the conclusion is true. We know that 2^{k+1} = 2 \times 2^k; given that we assumed N_k was equal to 2^k, we can now see that N_{k+1} = 2^{k+1} = 2 \times 2^{k} = 2 \times N_{k}, which is exactly what we wanted to show: that level k+1 has twice as many nodes as level k. We have proven the first thing we needed to prove.

Now we need to show the second thing, that in a tree with n levels, there are \sum_{i = 0}^{n}2^i nodes. First the base case: a tree with zero levels (i.e. only a root) has \sum_{i = 0}^{0} 2^i = 2^0 = 1 node. Now, for the inductive hypothesis, we assume that in a tree with k levels, there are \sum_{i=0}^{k} 2^i nodes. We then try to show that a tree with k+1 levels has \sum_{i = 0}^{k+1}2^i nodes. If N_{k+1} is the number of nodes in a tree with k+1 levels, we can show this as follows:

N_{k+1} = \sum_{i = 0}^{k+1} 2^i is what we’re trying to prove.

\sum_{i = 0}^{k+1} 2^i = \sum_{i=0}^{k} 2^i + 2^{k+1}

We know that \sum_{i=0}^{k} 2^i = N_k because of our inductive hypothesis, and we know that level k has 2^{k} nodes because of what we proved above, so adding the {k+1}st new level must have added 2^{k+1} nodes. This proves the formula. We can now take it and use it to find out how many features our decision tree can have before it’s too big to fit in memory.

What was the point?

I walked through that whole example, from conception to culmination, because I wanted to give you some kind of idea about why induction is used in computer science. We saw that the binary tree was something we could keep building by applying the same rules repeatedly (building a new level by giving each leaf in the previous level two children). That sounds a lot like a loop! Indeed, if we were building that tree in a program, we’d probably have a loop that would read a list of car features and build the tree up by adding children, making levels for each feature. All we had to do was number each level in our tree; then we could use induction, and prove that no matter how many levels the tree has, the same properties still apply.

As we’ll see in the next part, you can use induction any time your objects can be numbered in some way using the natural numbers so that there’s an operation to transform the kth object into the k+1st object, for any value of k. This is what makes induction so useful in computer science: it’s basically a way of showing that algorithms with loops or recursion in them are mathematically valid—that if we write a for-loop with 1001 iterations, it does exactly the same thing as a loop with 1000 iterations, just on a slightly larger scale.


  • Induction is a very important proof technique in computer science, because it captures the idea of doing something repeatedly. Loops and recursion can both be proven correct using induction.
  • Proofs by induction have three parts: the base case (“This alien says he wouldn’t travel across the galaxy to see you naked.”), the inductive hypothesis (“Assuming that more than one alien exists…”), and the inductive conclusion (“…they wouldn’t travel across the galaxy to see you naked either.”).