Three Haiku on Clojure

These are three random topics about Clojure that are sort of connected, in that all three address Clojure’s incredible flexibility and genericity from different angles. They were born from a few recent small projects that I started.

My favorite facility for polymorphism in Clojure

Object-oriented programming, at least as presented in C++ and Java, lets you do three things much more easily than languages like C:

  1. Encapsulation. You can hide data from certain parts of the code.
  2. Inheritance. You can create hierarchies of related objects which share some behaviors and modify other behaviors to suit themselves.
  3. Polymorphism. You can implement a function which behaves differently depending on the type of its arguments.

I’m being deliberately vague. Even though these three principles were all over the multiple choice section of the exam on Java when I was in school, they actually transcend object-oriented programming. They’re things which are helpful in all languages. Languages differ in how they support these principles, though.

Encapsulation lets us limit the number of things we have to worry about by only allowing certain parts of the code to modify data. If we have a global variable and we discover in debugging that its value is wrong, the piece of code that made its value wrong could be anywhere. If we have a private member of a class, the piece of code that made its value wrong can only be in that class.

Clojure doesn’t really support encapsulation. Like JavaScript and C, data in Clojure is local or global. You can do weird contortions with closures to make things private, like people do in JavaScript. I wouldn’t, but you can. Since Clojure data is almost all immutable, it’s not nearly as necessary to have encapsulation.

Clojure doesn’t really support inheritance either. But how much of the inheritance we see in Java is about avoiding code duplication, and how much is about the type system? Clojure is dynamically typed, so it doesn’t need inheritance for a type system. And Clojure has macros, the ultimate way of reducing code duplication, so it doesn’t really need inheritance for that reason either. You can build your own inheritance using derive and the polymorphism tools, but the language doesn’t support it directly, not really.

Clojure supports all kinds of sophisticated mechanisms for polymorphism. Records, types, protocols, multimethods. Probably some more that I’m forgetting or that only exist in the upcoming 1.8.1 alpha pre-release snapshot. All of them are useful and interesting, but my favorite example of Clojure polymorphism is probably tree-seq.

Why do I love tree-seq so much? Because this is the only polymorphic function in any language I can think of that both does not care, in any way, in the slightest, not even a little, how you represent the data it works on, and is also so simple that you can make a call to it in one line with practically no boilerplate:

(tree-seq #(not (nil? %))
  #(get % 1)
      [:root
          [[:a [[:b nil] [:c nil]]]
              [:d [[:e
                    [:f nil]]]
                  [:g nil]]]])

It works on this horror of nested vectors, producing a lazy sequence of the child vectors of each node. Even though this thing is the most ghetto way I can imagine to represent a binary tree, tree-seq doesn’t care and still processes it.

It also works on slightly more sophisticated structures:

(defrecord Node [name children])
(def t
    (->Node :root
      [(->Node :a
               [(->Node :b nil) (->Node :c nil)])
       (->Node :d [(->Node :e
                           [(->Node :f nil)])
                   (->Node :g nil)])]))
(tree-seq #(not (nil? %)) :children t)

tree-seq is so cool, you can even do weird things like storing the tree’s structural information in one place and its data in a completely different place:

(def children {:root [:a :d], :a [:b :c], :d [:e :g], :e [:f]})
(tree-seq #(not (nil? (get children % nil)))
      #(get children % nil)
      [:root :a :b :c :d :e :f :g])

Sure, this example is weird and pointless. But it works. tree-seq does not give a hot toddy how your tree is represented, as long as there’s some way to tell which nodes are leaf nodes and some way to get the children of non-leaf nodes.

There’s something refreshing about that freedom. It’s nice to think “For the rest of my code, it would be really nice to represent a tree as XXX” and know that Clojure’s built-in functions can work with that representation. No need to create abstract anonymous protected thread-non-mutable generic reified construction apparati for your new tree representation, or write five hundred lines of XML to register it with your dependency injection framework.

You Can Only Luminus Once Per Project

When I wrote Odyssey Through Three Web Frameworks, I was under the mistaken impression that Luminus was a web framework like Rails. It’s not, and you will be disappointed if you expect it to be.

Facts about Luminus:

  1. It’s implemented as a Leiningen plugin, and in the Java world it would probably be part of Eclipse.
  2. It generates a bunch of useful namespaces for you, and includes a bunch of useful libraries for you, and it does give you some stub functions, but it has nothing as extensive as Rails’s ActiveRecord or Django’s admin interface.
  3. If you don’t like the libraries that Luminus includes for you, it’s easy to pick a different one. Luminus uses Selmer, which is like a Clojure implementation of Django’s template language. I use Enlive. Luminus was okay with that, and we were still able to work together.
  4. Luminus still does not do security.
  5. Luminus is Homu Homu’s favorite Clojure technology, even if she can only use it once a day.

In short, Luminus is still just as do-it-yourself as Clojure always was, while helping out with boilerplate and giving just a bit more guidance than you would get without it.

The thought-polluting effects of Clojure

Recently, I wrote some Python code. It’s been a while, and it was surprisingly good to be back, even though I was definitely not loving on LXML.

A few years ago, whenever I started a new program, I would write a class. I would think about how to carve things up into classes. A few times, I wrote C programs, and I felt lost and confused because I couldn’t put things in classes.

I did use standalone functions in C++ and Python, but only when the function was completely unrelated to the rest of the program. If it was some random utility, I would write it as a standalone function. And if I was overloading an operator that C++ requires to be overloaded as a standalone function, I would write it as a standalone function, after first trying to write it as a method, staring at the bizarre compiler errors for five minutes, and then remembering that you can only overload certain operators as standalone functions. But I usually wanted everything in a class, even when I wasn’t really using the facilities of classes for anything. Both versions of PySounds, for example, stick all the logic in a class, then have a main module that instantiates an object of that class. In PySounds2, the main module does other UI-related things as well, but in PySounds1, it was almost literally just the code if __name__ == "__main__": PySounds().runStuff(lexfile, scfile).

Then suddenly I ran into Clojure. In Clojure, there are no classes. You write everything as standalone functions and stick it in a namespace. I wouldn’t say I felt lost and confused over this. Unlike C, Clojure has namespaces, so I didn’t feel like I was making program soup. But it did take some getting used to. I proceeded to get used to it.

Now, when I go to write some Python, I don’t jump right to classes. I go as far as I can with standalone functions and modules. If I write some code that feels twisted because of the data access patterns, I’ll consider whether it seems classy or not. If it does, I might write a class. Or I might not; I might just add some keyword arguments to a function. I might just stick two pieces of data in a tuple and pass that in or out. I organize Python like it’s Clojure, with modules as namespaces.

This is still pretty idiomatic in Python, but a while back, when I unconsciously applied the same pattern to Java, things got ugly. Java wants, needs, can’t live without, everything in classes. I wanted things not to be in classes. I wrote a class with a bunch of static methods. When I put it up on Code Review.SE, someone complained that it wasn’t object-oriented. And that was just, because it wasn’t, because I didn’t want it to be and Java did and we fought and ended up at a lousy compromise, as usually happens when you disagree with Java.

I’m not sure if there’s really any significance to this change in approach. I’ve felt for at least the past couple years that object-oriented programming was a lot more heavy machinery than I needed for most of the programs I work on. Especially in Java, when you start getting into generics and things get crazy-complicated.

I kind of appreciate Martin Odersky’s approach to Scala, where he knowingly cordoned off all the crazy type rules and made them library implementors’ problem, both at the language level and culturally in the Scala community. Actually, I appreciate a lot of things Odersky has said and done, which is why I have kind of a soft spot for Scala even though the language, on first inspection, doesn’t really match my aesthetic taste. I would like to spend some more time with Scala someday. That might revive my interest in object-oriented programming a bit. Right now it’s at an all-time low.

On the other hand, I recently went over some posts on Code Review.SE where the posters seemed to have totally ignored Clojure’s rudimentary object-oriented features, either knowingly or from ignorance, even though those features were perfect for their problem domain. (Both games, as I remember.) I realized that I really don’t know these features that well, so I got started on a simulation program where I would use them. I haven’t gotten far yet, but I’ve been quite pleased with records and protocols so far. They seem to be simple, flexible, and performant, while also being more sophisticated than regular maps or C structs.

Conclusion

I guess the only takeaway here is that Clojure has had a positive effect in code I write in other languages, by pushing me to make things more polymorphic and generic, and by spurring me to resist needless complexity. Paradoxically, by refusing to be object-oriented, Clojure has brought me greater understanding of object-oriented programming, both its upsides and its downsides. Now, instead of mindlessly classifying everything, I see how far I can go without classes, and when some particular code seems like it could really use a class, I use one. Rather than heavyweight classes with fifty member variables and hundreds of methods, I prefer lightweight classes with a single obvious purpose.

It’s tempting when you first learn about OOP to use it all the time, for everything, especially when your professors and your languages are expecting it. But it’s good to find out how things would go if you didn’t have OOP, whether you’re using C, JavaScript, or Clojure.

Advertisements

Rhapsody in 4Clojure: map-indexed, mapcat, lazy-seq, and lazy-cat

No, I haven’t lost my house. I’ve just become addicted to the site 4clojure.com. If you’ve ever used any of the “koan” sites for learning programming languages, it’s similar to those. I never had before I came to it, and I never will again. It’s too dangerous.

4clojure offers a series of small problems to solve in Clojure. It starts with the most basic of the basic, things like learning which values are truthy and which are falsy; goes through easy stuff like reversing a sequence or finding the penultimate value of a vector; leads into uses for advanced functions like trampoline and juxt; has you implement many of the most important library functions on your own; and finally starts you on harder problems like finding out whether a graph is connected, generating the language of a DFA, and implementing Conway’s Game of Life. I’ve become obsessive about this site, spending hours working on the problems when I should be doing something else. It’s ruining my life.

Ever since I first learned Clojure, I’ve been discouraged from using loop and recur, so I tried really hard to avoid them when I was doing the problems. They’re a lot uglier than direct recursion in Scheme or Haskell, and they can make the code grow in both size and complexity, which are both good reasons to stay away from them when you can. Sometimes you just can’t, but the problems on 4clojure are a great way to learn what your alternatives are. And that’s the theme of this post: four functions I’d never heard of that can get you out of scrapes where it seems like you just have to use loop and recur or Scheme-style direct recursion.

map-indexed

map is a great function. It’s simple, quick, convenient. for is also a great function, although you have to work pretty hard to get to know it—it’s more of a dark, complex loner that you’re always finding out new things about.

Sometimes, despite map‘s greatness, you really do need to know what the index of your item was in the collection you’re mapping over. I used to cover those cases with for, like this:

; Get every fourth item from the collection s
(for [i (range (count s))
      :when (== 0 (mod i 4))]
  (get s i))

That’s kind of hard to read, though, and it seems wasteful somehow—we’re generating a whole other collection just to index our first collection. And we’re looking up every item we want using get. If s is a vector, it’s probably okay, but Clojure lets you treat all sequence types almost identically, so what if it’s something with horrible linear-time lookups?

This is where map-indexed comes in. map-indexed works just like map, except that it takes a function of two arguments. The first argument is the second argument’s index in the collection. To see clearly how it works, run this code:

(map-indexed #(vector %1 %2) '(a b c d e))
; Returns ([0 a] [1 b] [2 c] [3 d] [4 e])

To get every fourth item like we did above using for, you can do this:

(->> s
  (map-indexed #(when (== 0 (mod %1 4)) %2)
  (remove nil?))

To me, it’s more obvious what that code is doing than the version with for. We’re avoiding the creation of a separate indexing collection, and we even got to use our snazzy ->> macro. There’s probably still a better way to do this, but this shows off the usefulness of map-indexed.

mapcat

As evidence for what a lame-brain I am, I didn’t even know about the concat function before I started doing the 4clojure problems. This function is like Common Lisp’s append: it takes a bunch of collections as arguments, and combines them all into one collection.

Now, suppose you wanted to use map for something, and you need the function you’re mapping to return a collection, but you want to have a single collection containing all the items, rather than a collection of collections. For instance, say you have a list containing several vectors, each containing some sets, like this:

(def td '([#{:a, :b}, #{:c, :delete-me}], 
          [#{:and :all :our :yesterdays :have :lighted :fools},    
           #{:the :way :to :dusty :death}], 
          [#{:delete-me, :goodbye-cruel-world}, 
           #{:out, :brief :candle :!}]))

and you want to map over the vectors and get rid of all the sets with the key :delete-me, returning a sequence of all the sets that don’t have that key. Here’s some code that does that.

(map (fn [v]
         (remove #(contains? % :delete-me) v))
         td)

That’s all right, but look at the output:

((#{:a :b}) 
 (#{:lighted :yesterdays :and :have :our :all :fools} 
  #{:to :way :dusty :death :the}) 
 (#{:brief :candle :! :out}))

Aside from the fact that it jumbled the words and ruined Shakespeare’s poetry, every item is still inside a list. So we can do this to fix it:

(apply concat
             (map (fn [v]
                      (remove #(contains? % :delete-me) v))
                  td))
=> (#{:a :b} 
    #{:lighted :yesterdays :and :have :our :all :fools} 
    #{:to :way :dusty :death :the} #{:brief :candle :! :out})

mapcat does exactly what the code above does: it first maps a function (which is expected to return a collection) over something, and then applies concat to the resulting sequence. So we could write the code as

(mapcat (fn [v]
        (remove #(contains? % :delete-me) v)) 
        td)
=> (#{:a :b} 
    #{:lighted :yesterdays :and :have :our :all :fools} 
    #{:to :way :dusty :death :the} #{:brief :candle :! :out})

and get the same effect.

mapcat is also useful when you need the effect of a function that returns multiple values. Just have your function return a collection, and mapcat that function over a sequence. Then the items from all the individual return values will be strung together into a single collection, as if your function did a regular return, but gave back multiple values. An example, which we’ll see again later with lazy-cat:

(mapcat #(repeat % %) (range 5))
=> (1 2 2 3 3 3 4 4 4 4)
;; For each n in (range 5), generates a list containing 
;; n copies of n. E.g. (2 2)---two 2s, (3 3 3)---three 3s, etc. 
;; Concats them all into a single sequence.

Lazy sequences and how to make them

One of Clojure’s selling points is lazy evaluation. Haskell was the first language I ever knew to incorporate this somewhat bizarre feature.

Most popular languages use eager evaluation—everything gets evaluated as soon as it appears. For example, here’s a Python function that crashes the interpreter:

def fib_seq(a, b):
    return [fib_seq(b, a+b)].insert(0, a)

It crashes the interpreter because when you call it, it says “Hold on, before I can get you the fib_seq of a and b, I have to get you a list containing the fib_seq of b and a + b, and then put a on it. But before I can do that, I have to get a list containing the fib_seq of a + b and b + a + b and put b on it. But before I can do that…” This leads to infinite recursion, because the interpreter never finishes getting what it needs; it eventually crashes when you max out the recursion limit.

It’s probably the case that you didn’t really want a list of all the Fibonacci numbers anyway. You probably just wanted a few of them. In eager languages, you’d usually solve this by passing in some kind of limit so the function can check whether it’s done. The limit has to be passed as an argument to every recursive call of the function, so it takes up space on the call stack and clutters up calls to the function. Still, it’s better than crashing the interpreter.

In Python itself, generators are a much nicer way to solve this problem. In fact, they let you achieve behavior very much like Clojure’s laziness. But I just used Python as an example for eager languages in general; in Java or C you don’t have laziness or generators, so you’d have to pass in limits somehow. Still, let’s imagine that Python doesn’t have generators. Wouldn’t it be nice if you could do something like this?

def fib_seq(a, b):
    return [fib_seq(b, a+b)].laze().insert(0, a)

The effect here would be that calling laze() on the list causes any recursive calls inside the list to go unevaluated until they’re actually wanted. It’s as if you’re telling the interpreter “I’d like a Fibonacci sequence, don’t know how big just yet, so could you hold off until I do know?”

Clojure’s version of the fictional laze() method is the lazy-seq function. You define a function that would ordinarily have infinite recursion, like this:

(defn fib-seq [a b]
    (cons a (fib-seq b (+ a b))))

But you wrap the recursive call in lazy-seq, like this:

(defn fib-seq [a b]
    (cons a (lazy-seq (fib-seq b (+ a b)))))

This example comes from the lazy-seq page on ClojureDocs. You wrap the infinite recursion in lazy-seq, telling the compiler “Yes, I’d like infinite recursion, but no, I wouldn’t like it all at once. Hold off on that until later, wouldn’t you?” Later, when you know how much recursion you actually do want, you can use various functions to get it. take grabs the number of items you give it:

(take 17 (fib-seq 0 1))
=> (0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987)

take-while grabs items until the predicate you give it becomes untrue:

(take-while (partial > 1000) (fib-seq 0 1))
=> (0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987)

Here we generate Fibonacci numbers until we find one that’s larger than 1000.

The big deal with lazy evaluation is that absolutely nothing is evaluated until it’s explicitly asked for. Lazy sequences in Clojure have to be realized—this means that some other code has to ask for them to be evaluated. If you just called (fib-seq 0 1), nothing would happen just yet; you’d create a lazy sequence, which you could save somewhere and evaluate later using other functions. This is somewhat similar to the way arguments to macros work—they aren’t evaluated unless you specifically ask for it. Things are different in the REPL, though. The REPL always asks for things to be evaluated right away so it can print them. So if you called (fib-seq 0 1) in the REPL, you’d keep getting Fibonacci numbers until your JVM crashed, just like in the fictional Python example above.

You can also use lazy-seq to create recursive functions that produce sequences, without using loop or stack-consuming direct recursion. One of the 4clojure problems is to implement map. Here’s my solution, which you should not look at if you want to solve the problem on your own.

(fn my-map [f s]
  (if (empty? s)
    nil
    (cons (f (first s)) (lazy-seq (my-map f (rest s))))))

Haskell, to my knowledge, uses lazy evaluation for everything by default, but Clojure uses eager evaluation most of the time, so you have to tell it when you want laziness. Using lazy evaluation for everything can let you do some amazing compiler optimizations, but it can also make reasoning about your code very hard, and it can mess with timing-based things like IO. One reason Haskell uses monads for IO is so that the compiler can make sure to evaluate that code eagerly, so that your IO gets done exactly when you wanted it. Otherwise the compiler might decide that it doesn’t need that code right now, and never evaluate it.

lazy-cat: Not orange, doesn’t eat lasagna

lazy-cat is to lazy-seq as mapcat is to map. If you have a recursive function whose return value is a collection, but you want the items to be part of the outer collection, you can use lazy-cat to produce the sequence.

Here’s an example from ClojureDocs’s lazy-cat page

(defn n-times-n [n]
    (lazy-cat (repeat n n) (n-times-n (inc n))))
(take 12 (n-times-n 1))
=> (1 2 2 3 3 3 4 4 4 4 5 5)

That function starts by taking 1 and calling (repeat 1 1). This returns (1) (i.e. a list with one 1 in it). The function then recurses, with 2, and calls (repeat 2 2), which produces (2 2) (i.e. a list with two 2s in it). If we’d used (cons (repeat n n) (lazy-seq... like before, we’d end up with ((1) (2 2)) as a result. But lazy-cat calls concat, so we instead get (1 2 2) out of this.

Conclusion

There are lots of other useful things I learned on 4clojure, but I’ve already written almost two thousand words, so I’ll leave you to discover them on your own. A lot of the early problems are easier if you know about iterate and intimately understand how for and reduce work. Early on I found myself falling back on loop in cases where one of those three would have solved everything.

Since 4clojure doesn’t let you use defn, it also helps if you know these two things about anonymous functions:

  1. You can give names to anonymous functions created with fn. You can use the name as a recursion point. See my above map example.
  2. You can’t nest two anonymous functions created with the #(...) syntax, but you can do whatever you want with the (fn [arg1, arg2] ...) syntax. To wit:
    ;; Illegal
    #(map #(when (< 20 %) %) %)
    ;; Peachy
    (fn [s] (map #(when (< 20 %) %) s))
    ;; Hunky-dory
    (fn [ks, vs] (map #(when (and (< 20 %1) (> 20 %2)) %2) ks vs))
    ;; Totally kosher
    (fn [ks, vs] (filter #(and (pos? %) (< 100000 %))
                         (map #(when 
                                 (and (< 20 %1) (> 20 %2)) %2) 
                                ks vs)))
    

    And of course, any of the #(...)-using functions could also have used the (fn [args] ...) syntax, say to allow destructuring inside the binding form.

That’s it for now. Just remember to stay out of 4clojure; it’ll ruin your life.