Third Ode on Induction: The Isomorphism

In the first post, I mentioned the three thoughts that occurred to me the first time I ever saw an induction proof; to wit:

  1. WTF did I just see?
  2. How did that even work?
  3. How did they ever come up with this?

The first post was meant to address the first question more fully by making the process of proving something by induction more clear. The second post and this one are meant to address the second and third questions.

Last time we went over the Peano axioms, which include the induction axiom:

  1. For any set of natural numbers K, if 1 \in K and n \in K \Rightarrow n+1 \in K, then the set K contains all natural numbers.

Isomorphism will show us how we can create a set of natural numbers that represent some other object, like sums or binary trees, prove things about that set, and translate those proofs into proofs about our original objects. We assume that this set includes just the natural numbers corresponding to objects with some quality we like, such as having 2^{k} nodes in level k. Then we can prove that the set has the properties asked for in the inductive axiom, which shows that it includes all natural numbers. This means the set of all natural numbers corresponding to objects with the quality we like is, in fact, the set of all natural numbers. The isomorphism allows us to conclude that every one of the objects has the quality we like.

If you were to use Unlimited Rulebook: Secession Version and jump a thousand feet into the sky, that’s the view you’d see of how an induction proof really works. Let’s get a little bit closer and examine how isomorphisms really let us do all this amazing stuff.

The Missing Link: Isomorphism

Isomorphism is a very deep mathematical idea, one that I won’t go into too much, because it’s very profound and I would just embarrass myself. The best explanation I’ve found of the idea for non-mathematicians is in Douglas Hofstadter’s Gödel, Escher, Bach. But here’s the “For Dummies” version of the idea.

An isomorphism is a transformation between two sets. It’s basically a function, but a special kind: it preserves all the properties of the set.

Say we have a list of items and we’ve chosen to number the items, like this:

  1. Hitagi Crab
  2. Mayoi Snail
  3. Suruga Monkey
  4. Nadeko Snake
  5. Tsubasa Cat
  6. Karen Bee
  7. Tsukihi Phoenix

We could have just as easily given each item a letter, like this:

  1. Hitagi Crab
  2. Mayoi Snail
  3. Suruga Monkey
  4. Nadeko Snake
  5. Tsubasa Cat
  6. Karen Bee
  7. Tsukihi Phoenix

It’s pretty obvious that that’s the same list. It has the same items in the same order; nothing has actually changed except for the labels we use to refer to items. In other words, there is an isomorphism between the first list and the second list, a function that maps numbers 1–7 to letters a–g. If we know Nadeko Snake is number 4 in the first list and want to know where Nadeko Snake falls in the second list, we can calculate I(4) and see that it’s d, so we know Nadeko Snake is at position d in the second list.

When you do an induction proof and choose what to “induct on”, you’re actually constructing an isomorphism between the object the proof is about and the natural numbers. In our first binary tree example, we wanted to prove that level n of a binary tree has 2^n nodes. We inducted on the exponent of 2. We can also say that we mapped a level of a binary tree to a natural number label, and then showed that there’s a formula for the number of nodes in a level that depends on the label that level gets in our labeling scheme, and that the formula works for any label, as long as the label is a natural number.

In the second example, we wanted to show that a binary tree with n levels has \sum_{r=0}^{n}2^r nodes. Here we chose to induct on the number of levels; we mapped the tree to a natural number which represented the number of levels it had. We then showed there was a formula that would let you calculate how many nodes the tree had based on the number of levels it had in our labeling scheme. The labeling of trees according to how many levels they have was the isomorphism.

In the classic example, proving that \sum_{r=0}^{n}r = \frac{n(n+1)}{2}, we induct on n. It helps here if you can think of the sum \sum_{r=0}^{n}r as a single object with no internal structure that changes according to what n is set to. Think of it like a Borg drone, say. A drone is just a drone; they’re basically all the same, except for whatever label we gave them when we lined them up. Think of these sums like that; they’re basically all the same, except for the label we give them.

Next time we’ll use isomorphisms and the Peano axioms to finish answering the second point from the above list: how does induction even work? How does it actually prove that some pattern remains valid off into the infinity of the natural numbers? We’ll see.


  • Isomorphisms are basically functions that “relabel” a set of objects. Although the objects are reanamed, they still have the same properties they did before.
  • When you write an induction proof and choose what to induct on, you are constructing an isomorphism from the objects you care about (binary trees, sums, regions of a circle, horses, whatever) to the natural numbers.

Second Ode on Induction: The Natural Numbers

Last time we went over an example of induction. Yes, even after I promised that we’d dispense with the examples and really dig in to the meaning behind induction. I did that because I wanted to show you something a little more complicated and a little more computer sciencey than the usual examples you see in discrete math for programmers, like proving that \sum_{r=0}^{n}r = \frac{n(n+1)}{2}. Binary trees are fairly simple and very computer sciencey, so I thought they’d be nice. I know I didn’t see a decent induction proof involving a data structure until my upper-division algorithms class, and that’s just too late, in my opinion.

Anyway, I said last time that the three parts of induction could be explained in terms of an argument you make to your friend when he says he was abducted by aliens who stripped him naked and looked at him. Like this:

  1. Base case: “Here’s an alien from the species you say abducted you, let’s call them the Borg. This Borg says he personally would never travel across the entire galaxy from the Delta Quadrant just to see you naked.”
  2. Inductive hypothesis: “Let’s assume that if you polled n Borg drones, they would agree with this drone—they wouldn’t fly across the galaxy to see you naked.”
  3. Inductive conclusion: “Given that n drones concur, n+1 drones concur. Therefore the entire collective concurs. The Borg would never fly from the Delta Quadrant so they could abduct you and look at you naked.”

Why does the inductive conclusion work? Why can we assume that n concurring drones means n+1 concurring drones, and that this property means a unanimous collective? If you’ve seen Star Trek, you know that it’s because the Borg have a hive mind. If one drone holds an opinion, that opinion is shared by the entire collective. But why does this work for induction?

The Natural Numbers

The natural numbers are just the normal counting numbers that you learned in kindergarten (1, 2, 3…). Zero technically isn’t a natural number, but sometimes it’s convenient to include it (our binary tree example from last time was one instance where it was useful).

Imagine that we lined up all the Borg drones in the collective in a big line that stretches off to infinity, and gave each of them a number, starting at 1.

The Borg can assimilate other species and make them into Borg drones, so the collective is always growing. Suppose a poor dumb Bolian gets assimilated and has to join the line, but we don’t want to make him walk all the way to the end, so we let him cut in right here in the middle, at position 498. Since we want our Borg line to be in order by number, we just give our new Bolian drone the number 498, and the other drones’ numbers all go up by one. No matter how many drones were in our line before we let the Bolian cut in at position 498, there are always enough natural numbers that we can bump everyone up by one without running out. And no matter how many new drones we allow to cut into the line at the middle, we can always repeat this move and bump everyone’s number up by one. Everyone always gets an index.

This is what mathematicians mean when they say the natural numbers are infinite: no matter how many numbers we assign to drones, there are always more of them. The natural numbers never run out.

The Peano axioms

But how do we know that bumping everyone’s number up by one doesn’t somehow change someone’s number into something that’s not a natural number? How do we know that we won’t bump up number 5354 and somehow end up with number 5354.12?

Your first reaction to that was probably “Of course we won’t. How could that happen? That’s stupid!” But mathematicians like to be extra careful about things like that. They like to prove that it can’t happen. But there’s really no obvious way to actually prove that in the mathematical sense. All we know is that if we have five mangos and we get another mango, we have six mangos; we don’t have 5.5 unless we cut the sixth mango in half and eat one half.

When mathematicians can’t prove something, but it seems obvious or self-evident that it should be true, they often make it into an axiom. Axioms are rules that we can’t prove, but that seem like they should be true, so we just assume they are and use them to prove things. Even the most skeptical mind can’t constantly be in doubt about everything, or knowledge would never get anywhere. For example, when scientists run experiments, they usually assume that they aren’t hallucinating the entire experiment from the inside of a padded cell. You can’t really prove this; we could all be in the Matrix right now. But it seems unproductive to worry about that, so we just assume that isn’t the case and go about our lives. In the same way, mathematicians are allowed to assume certain things are true, if it seems like it just couldn’t work any other way. Non-Euclidean geometry is an interesting example of what can happen if you change your axioms a little, but we won’t go into that here (maybe in a future post).

So there are certain things we just assume to be true about the natural numbers, certain axioms that define them. These are called the Peano axioms, after Italian mathematician Giuseppe Peano, who worked on mathematical induction. See Wikipedia for more information.

For induction, we mostly care about just three of the Peano axioms. Those are:

  1. 1 is a natural number.
  2. If n is a natural number, then n+1 is a natural number.
  3. If K is a set which contains 1 and which contains n+1 whenever it contains n, then K contains every natural number.

The first two give us a recursive definition (!) of the natural numbers. 1 is a natural number. Since 1 is a natural number, 1+1 = 2 is also a natural number. Since 2 is a natural number, 2+1 = 3 is also a natural number. So on. We can go as high as we want, but hopefully you see what’s happening; if we start with one Borg drone, labeled 1, and keep adding drones to the front of the line, the original drone’s label will keep going higher and higher. According to the second axiom above, no matter how high the first drone’s number gets, it’s always a natural number, because it was a natural number when we started and we just bumped it up by one.

Mathematicians often say that n+1 is the successor of n. Rather than saying that you add 1 to a natural number to get the next natural number, they think of it as applying a successor function, S(n). This way of thinking about it is useful later, when we start applying the ideas behind the Peano axioms to things other than numbers, like binary trees.

The third axiom above is called the inductive axiom. It’s what allows us to write proofs by induction. But we need one more idea to see exactly why induction is valid: the isomorphism. We’ll go over that next time. Isomorphism will show us how we can create a set of natural numbers that represent some other object, like sums or binary trees, prove things about that set, and translate those proofs into proofs about our original objects.


  • The natural numbers are infinite; no matter how many of them you have, there’s always another one.
  • The natural numbers can be defined recursively by starting at 1 and taking the successor.
  • One of the rules of the natural numbers, the inductive axiom, combines with the idea of isomorphism to give us proofs by induction.

Finally writing something in Clojure part 3: The Experience (part 1: The Bad)

So once all that setup is finally done, what is it like actually writing something in Clojure?

In some ways, my long delay before actually writing something was an advantage. I learned more about the functional style of programming before I was actually called on to use it. For the same reason, when I took Programming Languages Part II: With a Vengeance at UC Davis, I had a bit less trouble writing the Sisal project than some of my classmates, and my code turned out less imperative.

Namespaces: You’d think a space for names wouldn’t be this complicated

Leiningen strongly encourages you to write your Clojure code in namespaces, which are one of just two things I really didn’t like about Clojure as I was using it for this project. Don’t get me wrong—I believe in organizing your code, and the Clojure namespace is a good way to do it, much less formal and more lightweight than a class system, and yet much better at keeping things organized than Common Lisp’s I-never-figured-out-what. They’re about like Python’s modules, in that you can just shove a bunch of functions and variables in a file and they’ll all be nicely sequestered in their own little cloister, easily available to the outside world if needed, but never clashing with what you’re doing elsewhere in the program.

But jeez, working with them is a pain.

The idiomatic way to work with namespaces is through the ns macro, which is also how you’re supposed to do all imports, references, requirements, and inclusions. The ns macro is kind of confusing and the syntax can be really weird (I kept forgetting what parts needed to be in vectors and whether the namespace names needed to be quoted), but it’s not that bad once you figure it out. It’s marginally better than Java’s import. Except that it includes Java’s import, as well as two other Clojure functions—require and use—that basically all do the same thing slightly differently. Also, all three of these are available as standalone functions. There’s also yet another function, load, that does something similar to these. That seems like a lot of functions just to import things into namespaces.

Here’s the deal (I learned this by Googling, so don’t be afraid to Google if you don’t like my explanation—there’s lots of good resources by real Clojure experts, which I am not, out there).
All three of the importing functions (require, use, and import) can be invoked as keyword arguments to the ns function, so that you can define a namespace and say which other namespaces it references at the same time. Here’s an example:

  (:require [clojure [string :as s] set]
            [ :refer [file]])
  (:import [java.util Date Timer])
  (:use [clojure test]))

The :as argument alias the string library to s, so in your code you can write (s/blank? " ") instead of (clojure.string/blank? " "). The :refer option is like Python’s from module import thing1, thing2; it says “Import the library, but only refer to the file function.”

The ns macro is the documentation’s suggested way to import things into your namespaces, though you can also import with standalone functions (useful in the REPL). Here’s an example:

(require '(clojure set [string :as s] set))

This does just what the first line under :require does above.

require is used for libraries written in Clojure. In the early days, the use function was for importing an entire namespace without qualifying the names. So use was like Java’s import java.blah.bloo.blergh.blagh.*; or Python’s from module import *. Nowadays, require can take an :all option that lets you do that, so use is redundant. So the example above should really be

  (:require [clojure [string :as s] set [test :refer :all]]
            [ :refer [file]])
  (:import [java.util.concurrent Thread]))

or something along those lines.

The import function is used for Java code. It would be nice if we could import Java code in the same way as Clojure code, but I’m glad they made it fairly easy, because Clojure’s Java interop is amazing (more on that later).

As for load, it’s a lot like running a Python module at the interactive prompt; you can just load up a file of code at the REPL and it will execute in the context of your REPL session. As I was working on my project, I maintained a file called context.clj that had a bunch of variable definitions for test data. Then I could start a REPL, type

(load "context")

and start playing with my data in the REPL. (Note there’s no .clj—it will balk if you include the extension.)

In summary, you’d begin a file like this:

    (:require [clojure.string :as s]
              [clojure.set :refer :all]
              [ :refer [pprint]]
              [clojure.test :refer :all])
    (:import [java.util.concurrent Thread))

That make it so everything you define in this file is part of the namespace, so you can have both the function and the function, and they won’t conflict because they’re in separate namespaces.

By the way, Leiningen expects your directory structure to match your namespace. So say you had two files, world.clj and world_star.clj. The following shows what’s at the top of world.clj:

    (:require [clojure.walk :as w])
    (:import [java.util [StringBuffer]]
             [javax.swing [JFrame JLabel]]))

Leiningen will search your classpath for a directory called vanishment. It can be anywhere on your classpath; let’s assume it’s under src. Under src/vanishment, it expects a folder called this. And inside src/vanishment/this, you can put your two files, world.clj and world_star.clj. Notice that Leiningen also expects underscores in the filename wherever you have hyphens in the namespace name.

This contradicts what I’ve seen in C++ programs, where a single namespace is defined across multiple files. Unlike C++, the idiomatic way to do things in Clojure seems to be to make one file for one namespace. In C++ you can define a namespace across several files, but Clojure makes this difficult (though not impossible).

Having a bijection between namespaces and files has its advantages. For one thing, it makes things a lot easier for the build tool. It also made it easier for me to write fixtures for unit testing. When I wrote my unit tests, each file in my program had a corresponding .test file (with its own .test namespace) in the test folder. All the unit tests for functions in the namespace were in the namespace Since the functions in my namespaces were similar to each other or each made up one stage of a larger computation, they all needed similar test data. So I could set up a test environment that would cover all the functions in one namespace, using a fixture inside the .test namespace. This was a huge improvement over the first thing I tried, putting all my tests in the same file and using a let to set up the data. Not only was there a lot of tedious parenthesis matching with that approach, it was also hard to change the data later when I modified a function.

Errors were implemented erroneously

The second thing I disliked about Clojure was the way the compiler handles errors. It basically gives you the error message that the Java runtime throws on the Java code that implements the compiler, so the error messages aren’t really error messages for your code—they’re error messages for the Clojure compiler’s code. So they’re really hard to read: not only do they give you almost no information about what your code did wrong, they also print a three foot-long stack trace. And there’s not much of a debugger that I could find.

This is how I figured out where errors were whenever I got one of those. First, scan the stack trace and look for the namespace prefix of your code (in our examples above, vanishment.this). I had a REPL open in Emacs, so I used C-s to do a search on my namespace prefix. This should tell you what namespace the error came from and what line it came from in that namespace’s file. Then flick your stare between the error message, the line it gave you, surrounding lines, and the code of any functions called by the bad function, until you see what’s wrong. Keeping up on your unit tests is also very helpful, and if all else fails, print statements work pretty well. (I’m one of those awful people who hates debuggers and usually uses assertions and print statements to debug, although I’ve recently gotten on the unit testing bandwagon after seeing how many assertions and print statements it lets me cut out.)

Clojure code, though, tends to be really small, so it’s not too hard to pin down errors. And its functional style makes unit testing and REPL debugging really easy; just write some new unit tests to cover the area that’s giving you trouble, or plunk the code in a REPL session and ply it with different inputs until you see where it’s going bad. That’s powerful enough that it almost makes a debugger redundant. In fact, I’ll go so far as to say that Clojure probably doesn’t need a full GDB-style debugger, but some kind of tool that could load random pieces of code into a REPL at one click might be nice. (Eclipse has something that approaches this, though as I said before, I can’t use Eclipse.) It also might be nice if you could import stuff from your REPL straight into unit tests.

[EDIT: Emacs can plunk code into the REPL. Start up an nrepl session, then go into the file and hit C-x C-e on the function you want to load. See this page for more, and also more on the setup phase that I talked about in previous parts.]

Forward Declarations: A Minor Annoyance

So that’s pretty much the bad side of it, aside from a few minor annoyances like forward declarations. Make sure you declare all functions and variables before the point in the file where they are first used, like this:

(declare later-defined-function)
;; Later
(defn earlier-defined-function
;; Even later
(defn later-defined-function

This will be familiar to C and C++ programmers, but it might seem pretty rinky-dink if you mostly do Java or Python. This is actually necessary because of a design decision made by Rich Hickey, the creator of Clojure.

When the Java compiler compiles your code, it first reads the text, then massages it into data structures, and then does multiple passes over the data structures, first finding everything you’ve declared, and then evaluating it. So Java knows where your stuff is, no matter what order you declared it in. If a Java function had some code like above, it would run across the call to later-defined-function and say “Oh, we need later-defined-function to finish evaluating this. I haven’t evaluated later-defined-function yet, but I did see it during my earlier passes, and I took note of where it was. I’ll go evaluate that first, then come back and finish with earlier-defined-function.”

Clojure, on the other hand, has a single-pass compiler, which means that it only loops over your code once after it’s massaged it all into data structures. So without the forward declaration it runs into the call to later-defined-function and says “What is this?! You never told me this function existed!” By using (declare later-defined-function), you have told the Clojure compiler about this function, so it can fill in a pointer to later-defined-function when it evaluates earlier-defined-function, which will point to valid code later on when it evaluates later-defined-function.

The single-pass compiler has led to some pretty big arguments among Clojure users. Rich Hickey says this was a necessary evil, while Steve Yegge begs to differ. I’m just a humble nobody, so I’ll refrain from commenting about whether it was necessary. You can find Hickey’s arguments for the single-pass compiler by doing a Google search, along with lots of arguments against it.

This was just a minor annoyance for me because I tend to code bottom-up anyway, using the REPL to try things out before I add them in, so defining everything before its first use matches the way I think pretty well. But some people like to work top-down, so they hate having to use forward declarations to get the compiler to accept their style, and I can certainly sympathize with that viewpoint—I tend to write code in Java, for instance, top-down. Nonetheless, the two big annoyances were namespaces and error messages, and I have hope that both things will be fixed in future releases.

Next time we’ll talk about what I liked. Anyway, some of what I liked, because I liked almost everything.

First Ode on Induction

Induction is one of the weirdest proof techniques around. It took me forever to fully understand it. It’s complicated; it has all these different parts; and it’s difficult to understand how it even accomplishes its stated goal of proving that something is true for all natural numbers. Most people probably have the following three questions the first time they see induction:

  1. WTF did I just see?!
  2. How did that even work?!
  3. How did they ever come up with this?

I know I did. It’s unfortunate that induction is so hard to understand, because it’s also extremely important for computer science; in fact, probably 90% of all the proofs I’ve ever seen in theoretical computer science were proofs by induction. That’s because induction captures the idea of doing something repeatedly, just like the loops and recursion that are used all over the place in programs. But when my school covered induction (in a lower-division discrete math grab bag course), the teacher just slapped it on the board and said “There it is.” Most of the class just memorized how to do simple induction proofs, and by the time I was in upper-division theory of computation and algorithms classes, the professors were expecting proofs out of us that most of the students had no idea how to write, because they never learned how to apply induction to anything more complicated than \sum_{i=0}^{n} i = \frac{n(n+1)}{2}.

That’s why I’m going to try and give a little insight into induction. Once I knew what I’d just seen, how it even worked, and how anyone ever came up with it, I saw that induction is actually a fascinating and elegant proof technique. I also, eventually, saw why it’s so important for computer science, but most CS classes treat math as a tool, like vi or Git, that you learn on your own time; throw up some formulas and some examples to copy, and everyone goes home and memorizes it. I learned what I learned about induction from math classes, and I’m going to try and put it out there so all CS majors can learn to appreciate induction.

An Example, just to set the stage

First, though, I am going to make a total liar of myself and walk you through an example. That’s because induction has so many parts to it that I want to make sure we’re all on the same page about that before I start going into the background. I’m going to assume that you’ve seen all the simple examples involving sums, and that you’re ready for something a bit more complicated. So let’s talk about binary trees. Actually, let’s talk about full or perfect binary trees. And here one is:


(Source: Go there if you’re unclear at all about trees.)

In textbooks, the statements to be proved are always provided for you, free of charge, but in general you have to actually do some digging to figure out just what you’re going to prove. To make this more real, let’s suppose that our binary tree represents something—for example, we can say it’s a decision tree for some kind of simple AI program that decides whether it likes a car or not based on what features the car has. Each node is a feature; taking the left branch means yes, the car has the feature, while taking the right branch means no, the car doesn’t have the feature. Something like this:


(I know, no one uses CD players anymore, but fitting “stylish adaptive media center with voice-activated controls and iCharging Station” onto the picture just wasn’t worth it.)

You can see that when we add a feature, we don’t just add one new node for that feature. We had just one node for leather seats, but we needed two for air conditioning—one for the reality where we had leather seats, and one for the reality where we didn’t. Then when we added CD player, we needed four nodes—one for the reality where we had leather seats and air conditioning, one for the reality where we had leather seats and no air conditioning, one for the reality where we didn’t have leather seats but had air conditioning, and one for the reality where we had neither leather seats nor air conditioning. You can probably see that if we add another feature—say, power windows—we’ll need eight new nodes: one for leather seats, air conditioning,CD player, power windows; one for leather seats, air conditioning, CD player, no power windows; one for leather seats, air conditioning, no CD player, power windows; one for…

Let’s say that the children of the bottom feature nodes are decision nodes–each one represents a decision of yes, buy the car, or no, don’t buy it. Then we could have something like this:


In this case the decision of whether to buy or not is based only on whether the car has air conditioning, but we could have made a separate decision for every possible combination of features, and then we’d have eight decision nodes down there at the bottom.

We can see just from this small example that the number of nodes we have to store gets big pretty quickly as we add more and more choices, but just how quickly? What we’d like is some kind of formula where we plug in the number of features we want to consider and get back how many nodes that will require. That way, if we know how big a node is and how much memory our computer has, we can figure out how many features we can have before it crashes the computer.

(I hope it’s obvious that having a formula is better than just trying some random number of features, seeing if the program crashes, and trying some random smaller number if it does. Even an approximation formula that gets you in the right ballpark is helpful.)

If you try out some examples—adding features, drawing in the new nodes, and counting them—you’ll probably notice a pattern: each feature requires exactly twice as many nodes as the feature before it. Leather seats only required one node, but air conditioning required 2 \times 1 = 2 nodes, and CD player required 2 \times 2 = 4 nodes.  This makes sense, because each node leads to two children—a yes child and a no child—and all the child nodes of one feature represent the next feature. So how do we turn that into a formula? Well, if we want to know how many nodes there are in the whole tree, we can just add up the number of nodes for all the features. And we know that each feature has twice as many nodes as the feature before it. So if we have f features, we can use something like N = 1 + 2\times 1 + 2\times 2 \times 1 +\ldots + 2\times 2 \times \ldots (f times) \times 1, for N the number of nodes in the tree, right? Or in somewhat cleaned up, more mathy notation:

N = \sum_{i = 0}^{f} 2^i.

Let’s not be hasty

Now that you’re a big time computer scientist, you’re not allowed to assume that something always works just because it seems to work for a few small examples. This is where induction comes in.

While it might seem obvious that every level of the tree (which corresponds to a feature in our example; again, see a tutorial on trees if you’re drawing a blank on this term) has twice as many nodes as the level before it, we can’t actually be sure of this. There are some truly bizarre patterns and arbitrary cutoff points in math, where things that seem the same are totally different, or things that seem different are really the same thing in disguise. We’ll see an example of this later when we talk about the PCI and the well-ordering principle. For other examples, check out finding the roots of polynomials (something magical happens between a polynomial of degree 4 and one of degree 5) or graph coloring (between 3 and 4 here).

We have two assumptions in the formula above: that every level of a binary tree has twice as many nodes as the level above it, and that the sum of nodes in every level of the tree is the total number of nodes in the tree. Both of these need to be proved before we can say for sure that the formula above is always true, no matter how many levels our tree has.

So first of all, let’s prove that every level of a tree has twice as many nodes as the level before it, or equivalently, the ith level of the tree has 2 \times 2^{i-1} = 2^i nodes, using induction. If you’re familiar with the sum examples I assumed you’d seen, this should be fairly straightforward. First we need to decide what to induct on. This decision is pretty much made for us here—it’s the levels—but in more complicated examples, it’s often not obvious what to induct on. Next we need a base case. The obvious candidate is the root. You might be tempted to make the root level 1, but the root level has one node, and 2^1 = 2. If we make the root level 0, then 2^0 = 1, and we’re good.

This is a fine point to keep in mind: the base case can start anywhere. It does not have to start at 1 or 0. It could start at 22, or 69 (if you’re that sort of person), but remember that if you put the base case at 69, you have only proved the theorem for when the relevant quantity is 69 or higher. If we chose to start this example at 69, then our proof would only show that if the tree has 69 or more levels, every level past level 69 has twice as many nodes as the previous level. So while you can bring dorm room humor into theoretical CS if you want to, you’re really just hurting yourself in the end, because you haven’t proved that the theorem is true for a tree with 68 or fewer levels. Here we’re starting at 0, so every tree that has a root (i.e. pretty much all of them) has the property that level i has twice the number of nodes of level i-1.

So we have our base case. Next is our inductive hypothesis. This is the weirdest step for beginners to understand. It blew my brain for several months after I first learned about induction. That’s because it doesn’t actually involve anything factual. Instead, the inductive hypothesis and inductive conclusion together do pretty much what you do when your friend says he was abducted by aliens and you reply “Assuming aliens existed” (your hypothesis), “they wouldn’t fly across the galaxy just to see you naked” (your conclusion). Induction, though, does more than this, because we have the base case, so we know that the theorem is true in at least one case; it’s as if we said “Here’s an alien. He says he wouldn’t fly across the galaxy to see you naked (base case). Assuming more aliens exist (hypothesis), they wouldn’t fly across the galaxy to see you naked either (conclusion).” In future parts, we’ll see the mathematical formalism behind this, which will show why an induction is a valid proof and not just an invalid generalization based on a single example.

So for our inductive hypothesis for the tree level example, we assume that N_k, the number of nodes in level k, is equal to 2 \times N_{k-1}, twice the number of nodes in level k-1. And we assume that this was the case for every level back down to level 0, so that N_{k-1} = 2 \times 2 \times \ldots = 2^{k-1} and N_k = 2 \times 2^{k-1} = 2^k. Now we want to show that, given this assumption, N_{k+1} = 2 \times N_k; i.e. there are twice as many nodes in level k+1 as in level k.

From here it’s really just algebra. We’re trying to prove that N_{k+1} = 2^{k+1}. We can pick one side of this equation and try to transform it into a new form that involves the inductive hypothesis; then we can use the assumption of the inductive hypothesis to show that the conclusion is true. We know that 2^{k+1} = 2 \times 2^k; given that we assumed N_k was equal to 2^k, we can now see that N_{k+1} = 2^{k+1} = 2 \times 2^{k} = 2 \times N_{k}, which is exactly what we wanted to show: that level k+1 has twice as many nodes as level k. We have proven the first thing we needed to prove.

Now we need to show the second thing, that in a tree with n levels, there are \sum_{i = 0}^{n}2^i nodes. First the base case: a tree with zero levels (i.e. only a root) has \sum_{i = 0}^{0} 2^i = 2^0 = 1 node. Now, for the inductive hypothesis, we assume that in a tree with k levels, there are \sum_{i=0}^{k} 2^i nodes. We then try to show that a tree with k+1 levels has \sum_{i = 0}^{k+1}2^i nodes. If N_{k+1} is the number of nodes in a tree with k+1 levels, we can show this as follows:

N_{k+1} = \sum_{i = 0}^{k+1} 2^i is what we’re trying to prove.

\sum_{i = 0}^{k+1} 2^i = \sum_{i=0}^{k} 2^i + 2^{k+1}

We know that \sum_{i=0}^{k} 2^i = N_k because of our inductive hypothesis, and we know that level k has 2^{k} nodes because of what we proved above, so adding the {k+1}st new level must have added 2^{k+1} nodes. This proves the formula. We can now take it and use it to find out how many features our decision tree can have before it’s too big to fit in memory.

What was the point?

I walked through that whole example, from conception to culmination, because I wanted to give you some kind of idea about why induction is used in computer science. We saw that the binary tree was something we could keep building by applying the same rules repeatedly (building a new level by giving each leaf in the previous level two children). That sounds a lot like a loop! Indeed, if we were building that tree in a program, we’d probably have a loop that would read a list of car features and build the tree up by adding children, making levels for each feature. All we had to do was number each level in our tree; then we could use induction, and prove that no matter how many levels the tree has, the same properties still apply.

As we’ll see in the next part, you can use induction any time your objects can be numbered in some way using the natural numbers so that there’s an operation to transform the kth object into the k+1st object, for any value of k. This is what makes induction so useful in computer science: it’s basically a way of showing that algorithms with loops or recursion in them are mathematically valid—that if we write a for-loop with 1001 iterations, it does exactly the same thing as a loop with 1000 iterations, just on a slightly larger scale.


  • Induction is a very important proof technique in computer science, because it captures the idea of doing something repeatedly. Loops and recursion can both be proven correct using induction.
  • Proofs by induction have three parts: the base case (“This alien says he wouldn’t travel across the galaxy to see you naked.”), the inductive hypothesis (“Assuming that more than one alien exists…”), and the inductive conclusion (“…they wouldn’t travel across the galaxy to see you naked either.”).

Finally writing something in Clojure: The Setup

I’m going to take us back a little bit, to the time before I decided to write something in Clojure, and explain how I got Clojure to run. It was most certainly not trivial. It’s not that any single step of the process is insurmountable, but figuring out what to install can be a pain.

I think Python is a model of how programming language setup should go. You download the binary from the site or install the package from the manager, set your path, and then you have a new command, python, that you can run on .py files. Just run the command without giving a file, and you get an interactive prompt.

I think Clojure is a model of how programming language setup should not go. If you download the jar file from the web site, you basically get a Java library. You have to build the rest of the environment yourself—right down to a repl that has command history.

Other people, fortunately, have done that work for you, by integrating Clojure with other development environments. You can pick the 1970s route, using Emacs with clojure-mode and paredit or Vim with Vim-Clojure, or you can pick the 1990s route of giant, heavy, desktop-crashing IDE with Clojure plugin. Since I was going through an “I hate the 70s” phase when I first installed Clojure, I took the 90s route and tried using Eclipse with the Counterclockwise plugin.

It’s a pretty nice environment, as far as functionality. It includes a nice repl with command history, as well as the Clojure compiler and the usual project management and code completion tools. But as usual, Eclipse barfed on me and couldn’t get off my couch, forcing me to cut the couch into kindling. In fact, part of the reason I didn’t do anything with Clojure for so long was that Eclipse got drunk, barfed and crashed my computers (all of them) nine times out of ten, and would have also done so the tenth time except I was called to dinner before it had the chance.

After my operating systems teacher basically forced us to use Emacs for our assignments, I decided maybe it wasn’t so awful, and after succumbing to peer pressure from reading Coders at Work, I declared an “I Love the 70s!” phase and started using Emacs. Emacs has nice syntax highlighting, and Paredit is awesome once you get used to it. (It’s a cussed pain before you get used to it, and it has an annoying habit of deleting all your text if you hit the wrong key combi, so make sure to keep backups.)

Emacs doesn’t include the Clojure compiler, so you need something else to build with. The most commonly used Clojure build tools are Leiningen and Maven; I went with Leiningen, because it’s the most sparkly and mustachioed build tool. (If Colonel Mustang had a Clojure build tool, I might have gone with that one instead, though.) Unlike many Clojure utilities, Leiningen seems to be pretty well documented. There’s a very nice tutorial on its GitHub page, and lots of help on Stack Overflow and elsewhere. It does basically what Eclipse does as far as managing projects and files, except without barfing. In particular, Leiningen will download the Clojure jar for you and make sure your classpath is set up correctly. (If you don’t understand Java classpaths, don’t feel bad, because neither do I. I let Leiningen handle that stuff for me. But it’s similar to the operating system’s path variable.) It can also add any other libraries you might want–I used it to add Clojure’s Enlive library for HTML manipulation, and the data.json library for reading JSON files.

One of the other nice features of Leiningen is that its configuration files are all written in Clojure. When you make a new project, the root directory will contain a project.clj file that has Clojure code you can edit to change the properties of your project. This is where you specify what libraries you want, including the version of Clojure you want to use. (If you need to know the exact syntax of the version string, most Clojure libraries are on GitHub, and most of the developers have a nice habit of putting the string for the latest version at the top of their readme.) Leiningen will download them all for you and make sure they’re accessible from your own code; all you have to do is import the namespaces (which can also be nontrivial, but we’ll talk about that later). Java’s main build system is Ant, which uses XML to specify the project configuration. XML is pretty verbose compared to Clojure, and you have to learn Ant’s dialect of XML to get anywhere. That’s still preferable to the abomination that is Make, though. But with Leiningen it’s all Clojure, so there’s no new syntax to deal with.

So, I went the total Linux weenie route and used Emacs and Leiningen. My hardware is usually sort of outdated, so I just couldn’t run Eclipse, but if you can and aren’t a total Linux weenie, you might try Eclipse. Netbeans also has a plugin, Enclojure, although I don’t know anything about it. Leiningen’s not too bad, though, so if you want something lighter than a big giant IDE, but don’t want to relive the days of disco and Charles Manson, using Leiningen with another editor (such as JEdit, which also has a Clojure repl plugin) is also an option.

The difficulty of getting everything set up is, I think, a sign of Clojure’s immaturity. It’s still a very new language, even if it is building on the ancient Lisp legacy. But things are improving; Leiningen is very nice, and I’m interested in checking out Light Table sometime. Someday I imagine we’ll have a Clojure setup process that’s as easy as Python’s.

Finally writing something in Clojure: The Decision

I’ve been interested in functional languages for a while. My first exposure to functional programming was in Alan Gauld’s tutorial, where he introduced some of the ideas from functional programming used in Python. I’d like to say my mind was blown by this incredible new programming paradigm, but at that time I had been programming for all of three months and barely understood functions (and classes were right out) so it was more like my mind was a steel wall and functional programming was the ball from a medieval cannon. It just bounced right off.

But back when I was doing linguistics, I was always attracted to obscure languages, especially the ones with baroque, logical, and elegant grammars. I’m talking about Latin, Turkish, Lakhota, Sanskrit, that kind of thing: languages with complex yet consistent morphology, that combine a small number of concepts in a logical way to build up short, expressive sentences. Functional languages are sort of the programming language equivalent of that, so it was almost inevitable that I’d end up being attracted to them. (Perl fans apparently say that Perl is like the English of programming languages. You can take that comparison whatever direction you like.)

Anyway, I bought the book Seven Languages in Seven Weeks, which included four functional languages: Erlang, Scala, Clojure, and Haskell. Scala looked too much like Java crossed with Pascal for my tastes, and while Haskell seemed really cool, monads seemed really uncool. At first I liked the look of Erlang. I loved Prolog, also covered in the book, and Erlang is loosely based on Prolog, so it includes some of Prolog’s most ingenious ideas, like the atom datatype. The concurrency stuff was also pretty awesome. I had never done any concurrency before, aside from a few tiny book examples with Python’s threads, but I came to really like Erlang’s “smoke signals on a mountaintop” approach to representing synchronous processes.

There was just one problem: I write a lot of utility programs for working with text, for both my linguistics hobby and my writing hobby. Erlang is awful at dealing with text. Its strings are actually lists of characters, and its characters are actually integers. I thought about going for it anyway with Erlang, but then I read the chapter on Clojure. While I wasn’t too excited about Lisp and the explanation of concurrency in Clojure was total gibberish to me at the time, I decided it was probably the best way to go, since it has Java backing it up. Java is okay at dealing with text; it’s no Icon, but it does at least have a string type.

So that was that, and I learned Lisp, and everything was good.

NO! No one ever learns Lisp that easily! In fact, I had already been struggling for most of two years just to understand its syntax. Reading the Clojure chapter in the Seven Languages book (and learning more about prefix notation) made the syntax itself pretty clear, and I just pounded the concepts of s-expressions, macros, and working with lists into my heading by reading introductions to them in every book and on every website I could find. I later took a programming languages class at UC Davis where we used Common Lisp, and that made everything pretty clear (and also made me appreciate Clojure’s little bits of syntax, like [] for vectors and {} for hash maps). By the way, the professor was a total Java lover, so I’m surprised he didn’t have us use Clojure.

I installed Clojure, messed around with it, got bored, decided I’d better stop trying to learn functional languages and should focus on stuff that would get me a job, like PHP (ugh), and tried and failed several more times to do something with Clojure. I used it once to implement the Lucas-Lehmer algorithm for primality testing, but that was about it. (My number theory teacher had asked us to test some rather large numbers using Lucas-Lehmer. It would have been plausible to do it by hand with a calculator, but instead I had fun writing the algorithm in Clojure, and presented the code as my answer, along with its results for the given numbers. Full marks! If only the rest of my number theory homework could have been solved by Clojure.)

Recently I decided to publish a book to the Kindle Store. Amazon likes HTML in one big file. I had two problems:  I saved everything in separate ODT files (one per chapter), and Open Office spits out the most awful HTML ever, and includes formatting flaws caused by weird invisible control characters. As an example, if you hold down shift and hit return in Open Office, you get some kind of weird pseudo-newline. When viewing the file in Open Office, it’s indistinguishable from a newline; in the HTML output, it gets turned into a <br /> tag, whereas a normal newline becomes a <p> tag. So when you view the document in a browser, there’s an empty line between every paragraph, except for the ones where I was thinking about my writing and not whether Open Office was going to mangle my formatting later, and held down shift when I hit return.

I tried using PyUNO to make Open Office merge my chapters into one big file, but there’s barely any documentation on PyUNO (or any of the other options to access the API) that I could find, and even getting it to start was a challenge. When I did get it started, I had a fun API of funness to master, with all the functionality inside classes which were inside packages which were named with like five layers of qualification. (The API seems to be written in Java, and Java code is all like that.) I found some scripts that claimed to do what I wanted, but they all crashed when I ran them, so I gave up on PyUNO and decided to just convert the files to HTML using Libre Office’s command line batch conversion powers (which Open Office claims to also have, but which didn’t work for me) and work on them in that state.

A while ago I wrote a truly awful Python script to convert an HTML page into a PDF. I used Python’s built-in HTML parser class to massage the text into a usable form, and the ReportLab library to do the conversion. The script’s user interface was totally braindead. For a while you had to open the source code and change the value of a global variable to change the title. Yes, I know that you should never make your user modify your source code to get basic functionality, and using global variables is a slippery slope that will lead us back to the days of BASIC and goto statements. I had to interact with a ReportLab call which required a function as input; this function had to make some more API calls that would set various parameters, including the title. I couldn’t make a function that took the title as an argument because the ReportLab function was going to be the one calling it, so it had to conform to ReportLab’s expectations. Then my brain finally turned on and I remembered closures; I made a factory function that would take the title as an argument and close another function around it. That closure could then be passed to the ReportLab API.

I went into that story for two reasons: closures are useful after all (no other method could have solved this problem so cleanly), and in the end, no matter how awful it was, I got my HTML pages as PDFs, so I figured I could do that kind of thing again. I was going to use Python this time too, but I’d been thinking about picking up Clojure again, so I decided to put in some effort and try to do the project in Clojure. While I knew Python had the libraries and documentation to handle the problem, I felt pretty secure with Clojure because you can always dip into Java. So far I haven’t had to, though.