Finally writing something in Clojure part 3: The Experience (part 1: The Bad)

So once all that setup is finally done, what is it like actually writing something in Clojure?

In some ways, my long delay before actually writing something was an advantage. I learned more about the functional style of programming before I was actually called on to use it. For the same reason, when I took Programming Languages Part II: With a Vengeance at UC Davis, I had a bit less trouble writing the Sisal project than some of my classmates, and my code turned out less imperative.

Namespaces: You’d think a space for names wouldn’t be this complicated

Leiningen strongly encourages you to write your Clojure code in namespaces, which are one of just two things I really didn’t like about Clojure as I was using it for this project. Don’t get me wrong—I believe in organizing your code, and the Clojure namespace is a good way to do it, much less formal and more lightweight than a class system, and yet much better at keeping things organized than Common Lisp’s I-never-figured-out-what. They’re about like Python’s modules, in that you can just shove a bunch of functions and variables in a file and they’ll all be nicely sequestered in their own little cloister, easily available to the outside world if needed, but never clashing with what you’re doing elsewhere in the program.

But jeez, working with them is a pain.

The idiomatic way to work with namespaces is through the ns macro, which is also how you’re supposed to do all imports, references, requirements, and inclusions. The ns macro is kind of confusing and the syntax can be really weird (I kept forgetting what parts needed to be in vectors and whether the namespace names needed to be quoted), but it’s not that bad once you figure it out. It’s marginally better than Java’s import. Except that it includes Java’s import, as well as two other Clojure functions—require and use—that basically all do the same thing slightly differently. Also, all three of these are available as standalone functions. There’s also yet another function, load, that does something similar to these. That seems like a lot of functions just to import things into namespaces.

Here’s the deal. (I learned this by Googling, so don’t be afraid to Google if you don’t like my explanation; there are lots of good resources out there by real Clojure experts, which I am not.)
All three of the importing functions (require, use, and import) can be invoked as keyword clauses inside the ns macro, so that you can define a namespace and say which other namespaces it references at the same time. Here’s an example:

(ns vanishment.this.world
  (:require [clojure [string :as s] set]
            [clojure.java.io :refer [file]])
  (:import [java.util Date Timer])
  (:use [clojure test]))

The :as argument aliases the string library to s, so in your code you can write (s/blank? " ") instead of (clojure.string/blank? " "). The :refer option is like Python’s from module import thing1, thing2; it says “Load the clojure.java.io library, but only refer to the file function.”

The ns macro is the documentation’s suggested way to import things into your namespaces, though you can also import with standalone functions (useful in the REPL). Here’s an example:

(require '(clojure [string :as s] set))

This does just what the first line under :require does above.
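
To see what the alias and the referred name buy you, here is a small REPL-style sketch. The file name notes.txt is made up for illustration, and nothing here touches the disk; it only builds a java.io.File object.

```clojure
;; Standalone require, as you would type it at the REPL.
(require '[clojure.string :as s]
         '[clojure.java.io :refer [file]])

(s/blank? " ")                 ; => true, same function as clojure.string/blank?
(s/upper-case "lucifer")       ; => "LUCIFER", reached through the s alias
(.getName (file "notes.txt"))  ; => "notes.txt", file needs no prefix thanks to :refer
```

Anything in clojure.java.io that you did not list under :refer is still loaded; you just have to write it out in full, as in clojure.java.io/reader.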

require is used for libraries written in Clojure. In the early days, the use function was for importing an entire namespace without qualifying the names. So use was like Java’s import java.blah.bloo.blergh.blagh.*; or Python’s from module import *. Nowadays, require accepts :refer :all, which lets you do the same thing, so use is redundant. So the example above should really be

(ns vanishment.this.world
  (:require [clojure [string :as s] set [test :refer :all]]
            [clojure.java.io :refer [file]])
  (:import [java.util Date Timer]))

or something along those lines.

The import function is used for Java code. It would be nice if we could import Java code in the same way as Clojure code, but I’m glad they made it fairly easy, because Clojure’s Java interop is amazing (more on that later).

As for load, it’s a lot like running a Python module at the interactive prompt; you can just load up a file of code at the REPL and it will execute in the context of your REPL session. As I was working on my project, I maintained a file called context.clj that had a bunch of variable definitions for test data. Then I could start a REPL, type

(load "context")

and start playing with my data in the REPL. (Note there’s no .clj—it will balk if you include the extension.)

In summary, you’d begin a file like this:

(ns vanishment.this.world-star
    (:require [clojure.string :as s]
              [clojure.set :refer :all]
              [clojure.java.io :refer [file]]
              [clojure.test :refer :all])
    (:import [java.util Date Timer]))

That makes it so everything you define in this file is part of the namespace vanishment.this.world-star, so you can have both the function vanishment.this.world-star/lucifer-fire and the function vanishment.this.world/lucifer-fire, and they won’t conflict because they’re in separate namespaces.

By the way, Leiningen expects your directory structure to match your namespace. So say you had two files, world.clj and world_star.clj. The following shows what’s at the top of world.clj:

(ns vanishment.this.world
    (:require [clojure.walk :as w])
    (:import [java.util Date]
             [javax.swing JFrame JLabel]))

Leiningen will search your classpath for a directory called vanishment. It can be anywhere on your classpath; let’s assume it’s under src. Under src/vanishment, it expects a folder called this. And inside src/vanishment/this, you can put your two files, world.clj and world_star.clj. Notice that Leiningen also expects underscores in the filename wherever you have hyphens in the namespace name.
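
The naming rule is mechanical enough to write down. Here is a minimal sketch of it; ns->relative-path is a helper I made up for illustration, not something Leiningen or Clojure provides, and it only covers the dots and hyphens discussed above.

```clojure
(require '[clojure.string :as s])

;; Hypothetical helper: turn a namespace name into the file path that the
;; loader would look for, relative to a classpath root such as src.
(defn ns->relative-path
  [ns-sym]
  (-> (str ns-sym)
      (s/replace "-" "_")   ; hyphens in the namespace become underscores
      (s/replace "." "/")   ; each dot becomes a directory separator
      (str ".clj")))

(ns->relative-path 'vanishment.this.world)
;; => "vanishment/this/world.clj"
(ns->relative-path 'vanishment.this.world-star)
;; => "vanishment/this/world_star.clj"
```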

This is different from what I’ve seen in C++ programs, where a single namespace is often defined across multiple files. The idiomatic way to do things in Clojure seems to be one file per namespace; splitting a namespace across several files is difficult (though not impossible).

Having a bijection between namespaces and files has its advantages. For one thing, it makes things a lot easier for the build tool. It also made it easier for me to write fixtures for unit testing. When I wrote my unit tests, each file in my program had a corresponding .test file (with its own .test namespace) in the test folder. All the unit tests for functions in the namespace vanishment.this.world were in the namespace vanishment.this.world.test. Since the functions in my namespaces were similar to each other or each made up one stage of a larger computation, they all needed similar test data. So I could set up a test environment that would cover all the functions in one namespace, using a fixture inside the .test namespace. This was a huge improvement over the first thing I tried, putting all my tests in the same file and using a let to set up the data. Not only was there a lot of tedious parenthesis matching with that approach, it was also hard to change the data later when I modified a function.
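
For the curious, a per-namespace fixture looks roughly like the sketch below. It uses clojure.test’s use-fixtures, but the test data, the *souls* var, and the test itself are invented for illustration; they are not the ones from my project.

```clojure
(ns vanishment.this.world.test
  (:require [clojure.test :refer :all]))

;; Hypothetical shared test data, rebound by the fixture for every test.
(def ^:dynamic *souls* nil)

(defn with-test-data
  "Fixture: sets up the shared data, runs one test, then lets the binding go."
  [run-test]
  (binding [*souls* [{:name "Faust" :damned? true}
                     {:name "Gretchen" :damned? false}]]
    (run-test)))

;; :each wraps every deftest in this namespace; :once would wrap the whole run.
(use-fixtures :each with-test-data)

(deftest counts-the-damned
  (is (= 1 (count (filter :damned? *souls*)))))
```

Changing the test data now means editing one fixture instead of hunting through a let that wraps every test.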

Errors were implemented erroneously

The second thing I disliked about Clojure was the way the compiler handles errors. It basically gives you the exception that the Java runtime throws from the Java code that implements the compiler, so the error messages aren’t really error messages for your code; they’re error messages for the Clojure compiler’s code. So they’re really hard to read: not only do they give you almost no information about what your code did wrong, they also print a three-foot-long stack trace. And there’s not much of a debugger that I could find.

This is how I figured out where errors were whenever I got one of those. First, scan the stack trace and look for the namespace prefix of your code (in our examples above, vanishment.this). I had a REPL open in Emacs, so I used C-s to do a search on my namespace prefix. This should tell you what namespace the error came from and what line it came from in that namespace’s file. Then flick your stare between the error message, the line it gave you, surrounding lines, and the code of any functions called by the bad function, until you see what’s wrong. Keeping up on your unit tests is also very helpful, and if all else fails, print statements work pretty well. (I’m one of those awful people who hates debuggers and usually uses assertions and print statements to debug, although I’ve recently gotten on the unit testing bandwagon after seeing how many assertions and print statements it lets me cut out.)
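
If you’d rather not search the trace by eye, the same scan can be done from the REPL. This is only a sketch: frames-in is a helper name I made up, and it relies on the fact that compiled Clojure functions get class names like vanishment.this.world_star$lucifer_fire (hyphens show up as underscores).

```clojure
;; Hypothetical helper: keep only the stack frames whose class name starts
;; with the given prefix, and report where each one lives.
(defn frames-in
  [prefix ^Throwable ex]
  (->> (.getStackTrace ex)
       (filter #(.startsWith (.getClassName ^StackTraceElement %) prefix))
       (map (fn [^StackTraceElement frame]
              {:class (.getClassName frame)
               :file  (.getFileName frame)
               :line  (.getLineNumber frame)}))))

;; At the REPL, *e holds the most recent exception:
;; (frames-in "vanishment.this" *e)
```

Depending on the error, the interesting frames may be on the exception’s cause rather than on the exception itself, so you may need to try (.getCause *e) as well.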

Clojure code, though, tends to be really small, so it’s not too hard to pin down errors. And its functional style makes unit testing and REPL debugging really easy; just write some new unit tests to cover the area that’s giving you trouble, or plunk the code in a REPL session and ply it with different inputs until you see where it’s going bad. That’s powerful enough that it almost makes a debugger redundant. In fact, I’ll go so far as to say that Clojure probably doesn’t need a full GDB-style debugger, but some kind of tool that could load random pieces of code into a REPL at one click might be nice. (Eclipse has something that approaches this, though as I said before, I can’t use Eclipse.) It also might be nice if you could import stuff from your REPL straight into unit tests.

[EDIT: Emacs can plunk code into the REPL. Start up an nrepl session, then go into the file and hit C-x C-e on the function you want to load. See this page for more, and also more on the setup phase that I talked about in previous parts.]

Forward Declarations: A Minor Annoyance

So that’s pretty much the bad side of it, aside from a few minor annoyances like forward declarations. Make sure you declare all functions and variables before the point in the file where they are first used, like this:

(declare later-defined-function)
;; Later
(defn earlier-defined-function
  []
  (later-defined-function))
;; Even later
(defn later-defined-function
  ...

This will be familiar to C and C++ programmers, but it might seem pretty rinky-dink if you mostly do Java or Python. This is actually necessary because of a design decision made by Rich Hickey, the creator of Clojure.

When the Java compiler compiles your code, it first reads the text, then massages it into data structures, and then does multiple passes over the data structures, first finding everything you’ve declared, and then evaluating it. So Java knows where your stuff is, no matter what order you declared it in. If a Java function had some code like above, it would run across the call to later-defined-function and say “Oh, we need later-defined-function to finish evaluating this. I haven’t evaluated later-defined-function yet, but I did see it during my earlier passes, and I took note of where it was. I’ll go evaluate that first, then come back and finish with earlier-defined-function.”

Clojure, on the other hand, has a single-pass compiler, which means that it only loops over your code once after it’s massaged it all into data structures. So without the forward declaration it runs into the call to later-defined-function and says “What is this?! You never told me this function existed!” By using (declare later-defined-function), you have told the Clojure compiler about this function, so it can fill in a pointer to later-defined-function when it evaluates earlier-defined-function, which will point to valid code later on when it evaluates later-defined-function.
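
Here is the earlier snippet filled out into something you can paste into a REPL. The function bodies are placeholders I made up.

```clojure
(declare later-defined-function)   ; creates the var, with nothing bound to it yet

(defn earlier-defined-function
  []
  (later-defined-function))        ; compiles because the var already exists

(defn later-defined-function
  []
  :vanished)

(earlier-defined-function)
;; => :vanished
```

Leave out the declare line and the compiler stops at earlier-defined-function with an “Unable to resolve symbol: later-defined-function” error.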

The single-pass compiler has led to some pretty big arguments among Clojure users. Rich Hickey says this was a necessary evil, while Steve Yegge begs to differ. I’m just a humble nobody, so I’ll refrain from commenting about whether it was necessary. You can find Hickey’s arguments for the single-pass compiler by doing a Google search, along with lots of arguments against it.

This was just a minor annoyance for me because I tend to code bottom-up anyway, using the REPL to try things out before I add them in, so defining everything before its first use matches the way I think pretty well. But some people like to work top-down, so they hate having to use forward declarations to get the compiler to accept their style, and I can certainly sympathize with that viewpoint—I tend to write code in Java, for instance, top-down. Nonetheless, the two big annoyances were namespaces and error messages, and I have hope that both things will be fixed in future releases.

Next time we’ll talk about what I liked. Anyway, some of what I liked, because I liked almost everything.
