Rhapsody in 4Clojure: map-indexed, mapcat, lazy-seq, and lazy-cat

No, I haven’t lost my house. I’ve just become addicted to the site 4clojure.com. If you’ve ever used any of the “koan” sites for learning programming languages, it’s similar to those. I never had before I came to it, and I never will again. It’s too dangerous.

4clojure offers a series of small problems to solve in Clojure. It starts with the most basic of the basic, things like learning which values are truthy and which are falsy; goes through easy stuff like reversing a sequence or finding the penultimate value of a vector; leads into uses for advanced functions like trampoline and juxt; has you implement many of the most important library functions on your own; and finally starts you on harder problems like finding out whether a graph is connected, generating the language of a DFA, and implementing Conway’s Game of Life. I’ve become obsessive about this site, spending hours working on the problems when I should be doing something else. It’s ruining my life.

Ever since I first learned Clojure, I’ve been discouraged from using loop and recur, so I tried really hard to avoid them when I was doing the problems. They’re a lot uglier than direct recursion in Scheme or Haskell, and they can make the code grow in both size and complexity, which are both good reasons to stay away from them when you can. Sometimes you just can’t, but the problems on 4clojure are a great way to learn what your alternatives are. And that’s the theme of this post: four functions I’d never heard of that can get you out of scrapes where it seems like you just have to use loop and recur or Scheme-style direct recursion.

map-indexed

map is a great function. It’s simple, quick, convenient. for is also a great function, although you have to work pretty hard to get to know it—it’s more of a dark, complex loner that you’re always finding out new things about.

Sometimes, despite map‘s greatness, you really do need to know what the index of your item was in the collection you’re mapping over. I used to cover those cases with for, like this:

; Get every fourth item from the collection s
(for [i (range (count s))
      :when (== 0 (mod i 4))]
  (get s i))

That’s kind of hard to read, though, and it seems wasteful somehow—we’re generating a whole other collection just to index our first collection. And we’re looking up every item we want using get. If s is a vector, it’s probably okay, but Clojure lets you treat all sequence types almost identically, so what if it’s something with horrible linear-time lookups?

This is where map-indexed comes in. map-indexed works just like map, except that it takes a function of two arguments. The first argument is the second argument’s index in the collection. To see clearly how it works, run this code:

(map-indexed #(vector %1 %2) '(a b c d e))
; Returns ([0 a] [1 b] [2 c] [3 d] [4 e])

To get every fourth item like we did above using for, you can do this:

(->> s
  (map-indexed #(when (== 0 (mod %1 4)) %2)
  (remove nil?))

To me, it’s more obvious what that code is doing than the version with for. We’re avoiding the creation of a separate indexing collection, and we even got to use our snazzy ->> macro. There’s probably still a better way to do this, but this shows off the usefulness of map-indexed.

mapcat

As evidence for what a lame-brain I am, I didn’t even know about the concat function before I started doing the 4clojure problems. This function is like Common Lisp’s append: it takes a bunch of collections as arguments, and combines them all into one collection.

Now, suppose you wanted to use map for something, and you need the function you’re mapping to return a collection, but you want to have a single collection containing all the items, rather than a collection of collections. For instance, say you have a list containing several vectors, each containing some sets, like this:

(def td '([#{:a, :b}, #{:c, :delete-me}], 
          [#{:and :all :our :yesterdays :have :lighted :fools},    
           #{:the :way :to :dusty :death}], 
          [#{:delete-me, :goodbye-cruel-world}, 
           #{:out, :brief :candle :!}]))

and you want to map over the vectors and get rid of all the sets with the key :delete-me, returning a sequence of all the sets that don’t have that key. Here’s some code that does that.

(map (fn [v]
         (remove #(contains? % :delete-me) v))
         td)

That’s all right, but look at the output:

((#{:a :b}) 
 (#{:lighted :yesterdays :and :have :our :all :fools} 
  #{:to :way :dusty :death :the}) 
 (#{:brief :candle :! :out}))

Aside from the fact that it jumbled the words and ruined Shakespeare’s poetry, every item is still inside a list. So we can do this to fix it:

(apply concat
             (map (fn [v]
                      (remove #(contains? % :delete-me) v))
                  td))
=> (#{:a :b} 
    #{:lighted :yesterdays :and :have :our :all :fools} 
    #{:to :way :dusty :death :the} #{:brief :candle :! :out})

mapcat does exactly what the code above does: it first maps a function (which is expected to return a collection) over something, and then applies concat to the resulting sequence. So we could write the code as

(mapcat (fn [v]
        (remove #(contains? % :delete-me) v)) 
        td)
=> (#{:a :b} 
    #{:lighted :yesterdays :and :have :our :all :fools} 
    #{:to :way :dusty :death :the} #{:brief :candle :! :out})

and get the same effect.

mapcat is also useful when you need the effect of a function that returns multiple values. Just have your function return a collection, and mapcat that function over a sequence. Then the items from all the individual return values will be strung together into a single collection, as if your function did a regular return, but gave back multiple values. An example, which we’ll see again later with lazy-cat:

(mapcat #(repeat % %) (range 5))
=> (1 2 2 3 3 3 4 4 4 4)
;; For each n in (range 5), generates a list containing 
;; n copies of n. E.g. (2 2)---two 2s, (3 3 3)---three 3s, etc. 
;; Concats them all into a single sequence.

Lazy sequences and how to make them

One of Clojure’s selling points is lazy evaluation. Haskell was the first language I ever knew to incorporate this somewhat bizarre feature.

Most popular languages use eager evaluation—everything gets evaluated as soon as it appears. For example, here’s a Python function that crashes the interpreter:

def fib_seq(a, b):
    return [fib_seq(b, a+b)].insert(0, a)

It crashes the interpreter because when you call it, it says “Hold on, before I can get you the fib_seq of a and b, I have to get you a list containing the fib_seq of b and a + b, and then put a on it. But before I can do that, I have to get a list containing the fib_seq of a + b and b + a + b and put b on it. But before I can do that…” This leads to infinite recursion, because the interpreter never finishes getting what it needs; it eventually crashes when you max out the recursion limit.

It’s probably the case that you didn’t really want a list of all the Fibonacci numbers anyway. You probably just wanted a few of them. In eager languages, you’d usually solve this by passing in some kind of limit so the function can check whether it’s done. The limit has to be passed as an argument to every recursive call of the function, so it takes up space on the call stack and clutters up calls to the function. Still, it’s better than crashing the interpreter.

In Python itself, generators are a much nicer way to solve this problem. In fact, they let you achieve behavior very much like Clojure’s laziness. But I just used Python as an example for eager languages in general; in Java or C you don’t have laziness or generators, so you’d have to pass in limits somehow. Still, let’s imagine that Python doesn’t have generators. Wouldn’t it be nice if you could do something like this?

def fib_seq(a, b):
    return [fib_seq(b, a+b)].laze().insert(0, a)

The effect here would be that calling laze() on the list causes any recursive calls inside the list to go unevaluated until they’re actually wanted. It’s as if you’re telling the interpreter “I’d like a Fibonacci sequence, don’t know how big just yet, so could you hold off until I do know?”

Clojure’s version of the fictional laze() method is the lazy-seq function. You define a function that would ordinarily have infinite recursion, like this:

(defn fib-seq [a b]
    (cons a (fib-seq b (+ a b))))

But you wrap the recursive call in lazy-seq, like this:

(defn fib-seq [a b]
    (cons a (lazy-seq (fib-seq b (+ a b)))))

This example comes from the lazy-seq page on ClojureDocs. You wrap the infinite recursion in lazy-seq, telling the compiler “Yes, I’d like infinite recursion, but no, I wouldn’t like it all at once. Hold off on that until later, wouldn’t you?” Later, when you know how much recursion you actually do want, you can use various functions to get it. take grabs the number of items you give it:

(take 17 (fib-seq 0 1))
=> (0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987)

take-while grabs items until the predicate you give it becomes untrue:

(take-while (partial > 1000) (fib-seq 0 1))
=> (0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987)

Here we generate Fibonacci numbers until we find one that’s larger than 1000.

The big deal with lazy evaluation is that absolutely nothing is evaluated until it’s explicitly asked for. Lazy sequences in Clojure have to be realized—this means that some other code has to ask for them to be evaluated. If you just called (fib-seq 0 1), nothing would happen just yet; you’d create a lazy sequence, which you could save somewhere and evaluate later using other functions. This is somewhat similar to the way arguments to macros work—they aren’t evaluated unless you specifically ask for it. Things are different in the REPL, though. The REPL always asks for things to be evaluated right away so it can print them. So if you called (fib-seq 0 1) in the REPL, you’d keep getting Fibonacci numbers until your JVM crashed, just like in the fictional Python example above.

You can also use lazy-seq to create recursive functions that produce sequences, without using loop or stack-consuming direct recursion. One of the 4clojure problems is to implement map. Here’s my solution, which you should not look at if you want to solve the problem on your own.

(fn my-map [f s]
  (if (empty? s)
    nil
    (cons (f (first s)) (lazy-seq (my-map f (rest s))))))

Haskell, to my knowledge, uses lazy evaluation for everything by default, but Clojure uses eager evaluation most of the time, so you have to tell it when you want laziness. Using lazy evaluation for everything can let you do some amazing compiler optimizations, but it can also make reasoning about your code very hard, and it can mess with timing-based things like IO. One reason Haskell uses monads for IO is so that the compiler can make sure to evaluate that code eagerly, so that your IO gets done exactly when you wanted it. Otherwise the compiler might decide that it doesn’t need that code right now, and never evaluate it.

lazy-cat: Not orange, doesn’t eat lasagna

lazy-cat is to lazy-seq as mapcat is to map. If you have a recursive function whose return value is a collection, but you want the items to be part of the outer collection, you can use lazy-cat to produce the sequence.

Here’s an example from ClojureDocs’s lazy-cat page

(defn n-times-n [n]
    (lazy-cat (repeat n n) (n-times-n (inc n))))
(take 12 (n-times-n 1))
=> (1 2 2 3 3 3 4 4 4 4 5 5)

That function starts by taking 1 and calling (repeat 1 1). This returns (1) (i.e. a list with one 1 in it). The function then recurses, with 2, and calls (repeat 2 2), which produces (2 2) (i.e. a list with two 2s in it). If we’d used (cons (repeat n n) (lazy-seq... like before, we’d end up with ((1) (2 2)) as a result. But lazy-cat calls concat, so we instead get (1 2 2) out of this.

Conclusion

There are lots of other useful things I learned on 4clojure, but I’ve already written almost two thousand words, so I’ll leave you to discover them on your own. A lot of the early problems are easier if you know about iterate and intimately understand how for and reduce work. Early on I found myself falling back on loop in cases where one of those three would have solved everything.

Since 4clojure doesn’t let you use defn, it also helps if you know these two things about anonymous functions:

  1. You can give names to anonymous functions created with fn. You can use the name as a recursion point. See my above map example.
  2. You can’t nest two anonymous functions created with the #(...) syntax, but you can do whatever you want with the (fn [arg1, arg2] ...) syntax. To wit:
    ;; Illegal
    #(map #(when (< 20 %) %) %)
    ;; Peachy
    (fn [s] (map #(when (< 20 %) %) s))
    ;; Hunky-dory
    (fn [ks, vs] (map #(when (and (< 20 %1) (> 20 %2)) %2) ks vs))
    ;; Totally kosher
    (fn [ks, vs] (filter #(and (pos? %) (< 100000 %))
                         (map #(when 
                                 (and (< 20 %1) (> 20 %2)) %2) 
                                ks vs)))
    

    And of course, any of the #(...)-using functions could also have used the (fn [args] ...) syntax, say to allow destructuring inside the binding form.

That’s it for now. Just remember to stay out of 4clojure; it’ll ruin your life.

Advertisements

2 thoughts on “Rhapsody in 4Clojure: map-indexed, mapcat, lazy-seq, and lazy-cat

Comments are closed.