Finally writing something in Clojure: The Decision

I’ve been interested in functional languages for a while. My first exposure to functional programming was in Alan Gauld’s tutorial, where he introduced some of the ideas from functional programming used in Python. I’d like to say my mind was blown by this incredible new programming paradigm, but at that time I had been programming for all of three months and barely understood functions (and classes were right out) so it was more like my mind was a steel wall and functional programming was the ball from a medieval cannon. It just bounced right off.

But back when I was doing linguistics, I was always attracted to obscure languages, especially the ones with baroque, logical, and elegant grammars. I’m talking about Latin, Turkish, Lakhota, Sanskrit, that kind of thing: languages with complex yet consistent morphology, that combine a small number of concepts in a logical way to build up short, expressive sentences. Functional languages are sort of the programming language equivalent of that, so it was almost inevitable that I’d end up being attracted to them. (Perl fans apparently say that Perl is like the English of programming languages. You can take that comparison whatever direction you like.)

Anyway, I bought the book Seven Languages in Seven Weeks, which included four functional languages: Erlang, Scala, Clojure, and Haskell. Scala looked too much like Java crossed with Pascal for my tastes, and while Haskell seemed really cool, monads seemed really uncool. At first I liked the look of Erlang. I loved Prolog, also covered in the book, and Erlang is loosely based on Prolog, so it includes some of Prolog’s most ingenious ideas, like the atom datatype. The concurrency stuff was also pretty awesome. I had never done any concurrency before, aside from a few tiny book examples with Python’s threads, but I came to really like Erlang’s “smoke signals on a mountaintop” approach to representing concurrent processes.

There was just one problem: I write a lot of utility programs for working with text, for both my linguistics hobby and my writing hobby. Erlang is awful at dealing with text. Its strings are actually lists of characters, and its characters are actually integers. I thought about going for it anyway with Erlang, but then I read the chapter on Clojure. While I wasn’t too excited about Lisp and the explanation of concurrency in Clojure was total gibberish to me at the time, I decided it was probably the best way to go, since it has Java backing it up. Java is okay at dealing with text; it’s no Icon, but it does at least have a string type.

So that was that, and I learned Lisp, and everything was good.

NO! No one ever learns Lisp that easily! In fact, I had already been struggling for most of two years just to understand its syntax. Reading the Clojure chapter in the Seven Languages book (and learning more about prefix notation) made the syntax itself pretty clear, and I just pounded the concepts of s-expressions, macros, and working with lists into my head by reading introductions to them in every book and on every website I could find. I later took a programming languages class at UC Davis where we used Common Lisp, and that made everything pretty clear (and also made me appreciate Clojure’s little bits of syntax, like [] for vectors and {} for hash maps). By the way, the professor was a total Java lover, so I’m surprised he didn’t have us use Clojure.

I installed Clojure, messed around with it, got bored, decided I’d better stop trying to learn functional languages and should focus on stuff that would get me a job, like PHP (ugh), and tried and failed several more times to do something with Clojure. I used it once to implement the Lucas-Lehmer algorithm for primality testing, but that was about it. (My number theory teacher had asked us to test some rather large numbers using Lucas-Lehmer. It would have been possible to do it by hand with a calculator, but instead I had fun writing the algorithm in Clojure, and presented the code as my answer, along with its results for the given numbers. Full marks! If only the rest of my number theory homework could have been solved by Clojure.)
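
As a rough illustration (this is not the original Clojure code, which isn't shown here), a minimal Python sketch of the standard Lucas-Lehmer test might look like the following. It assumes the input p is an odd prime and checks whether the Mersenne number 2**p - 1 is prime; the function name is made up for this example.

```python
def lucas_lehmer(p):
    """Return True if the Mersenne number 2**p - 1 is prime.

    Assumes p is an odd prime; the test is only meaningful for such p.
    """
    m = (1 << p) - 1          # the Mersenne number 2**p - 1
    s = 4                     # conventional starting value of the sequence
    for _ in range(p - 2):    # iterate s -> s*s - 2 (mod m), p - 2 times
        s = (s * s - 2) % m
    return s == 0             # m is prime exactly when the sequence ends at 0


# Try a handful of odd prime exponents.
for p in [3, 5, 7, 11, 13]:
    print(p, lucas_lehmer(p))
```

Reducing modulo m at every step keeps the intermediate values small, which is what makes the test practical for large exponents.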

Recently I decided to publish a book to the Kindle Store. Amazon likes HTML in one big file. I had two problems: I had saved everything in separate ODT files (one per chapter), and Open Office spits out the most awful HTML ever, complete with formatting flaws caused by weird invisible control characters. As an example, if you hold down shift and hit return in Open Office, you get some kind of weird pseudo-newline. When viewing the file in Open Office, it’s indistinguishable from a newline; in the HTML output, it gets turned into a <br /> tag, whereas a normal newline becomes a <p> tag. So when you view the document in a browser, there’s an empty line between every paragraph, except for the ones where I was thinking about my writing and not whether Open Office was going to mangle my formatting later, and held down shift when I hit return.
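
To make the difference concrete, here is a small Python sketch that counts the two kinds of break in a snippet of HTML. The sample markup is invented for illustration (it is not actual Open Office output), and the class name is hypothetical; the only library used is Python's standard html.parser module.

```python
from html.parser import HTMLParser


class BreakCounter(HTMLParser):
    """Counts <p> tags (real paragraphs) and <br> tags (shift-return breaks)."""

    def __init__(self):
        super().__init__()
        self.paragraphs = 0
        self.line_breaks = 0

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <br /> are also routed through here
        # by HTMLParser's default handle_startendtag.
        if tag == "p":
            self.paragraphs += 1
        elif tag == "br":
            self.line_breaks += 1


# Made-up sample: two real paragraphs, one of which hides a shift-return.
sample = "<p>First paragraph.</p><p>Second paragraph.<br />Typed after shift-return.</p>"

counter = BreakCounter()
counter.feed(sample)
print(counter.paragraphs, counter.line_breaks)
```

A nonzero count of line breaks is a quick hint that some "paragraphs" in the document won't get the usual spacing in a browser.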

I tried using PyUNO to make Open Office merge my chapters into one big file, but there’s barely any documentation on PyUNO (or any of the other options to access the API) that I could find, and even getting it to start was a challenge. When I did get it started, I had a fun API of funness to master, with all the functionality inside classes which were inside packages which were named with like five layers of qualification. (The API seems to be written in Java, and Java code is all like that.) I found some scripts that claimed to do what I wanted, but they all crashed when I ran them, so I gave up on PyUNO and decided to just convert the files to HTML using Libre Office’s command line batch conversion powers (which Open Office claims to also have, but which didn’t work for me) and work on them in that state.

A while ago I wrote a truly awful Python script to convert an HTML page into a PDF. I used Python’s built-in HTML parser class to massage the text into a usable form, and the ReportLab library to do the conversion. The script’s user interface was totally braindead. For a while you had to open the source code and change the value of a global variable to change the title. Yes, I know that you should never make your user modify your source code to get basic functionality, and using global variables is a slippery slope that will lead us back to the days of BASIC and goto statements. I had to interact with a ReportLab call which required a function as input; this function had to make some more API calls that would set various parameters, including the title. I couldn’t make a function that took the title as an argument because the ReportLab function was going to be the one calling it, so it had to conform to ReportLab’s expectations. Then my brain finally turned on and I remembered closures; I made a factory function that would take the title as an argument and close another function around it. That closure could then be passed to the ReportLab API.
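
The closure trick can be sketched in a few lines of Python. This is not the original script: FakeCanvas and build_document below are hypothetical stand-ins for the library side, so the example runs without ReportLab installed, and make_page_setup is an invented name for the factory function.

```python
class FakeCanvas:
    """Hypothetical stand-in for the library's drawing object."""

    def __init__(self):
        self.title = None

    def set_title(self, title):
        self.title = title


def build_document(on_page):
    """Hypothetical stand-in for the library call.

    The library decides which arguments the callback receives, so the
    callback cannot simply take the title as an extra parameter.
    """
    canvas = FakeCanvas()
    on_page(canvas)
    return canvas


def make_page_setup(title):
    """Factory: returns a callback with the signature the library expects."""

    def page_setup(canvas):
        # 'title' is captured from the enclosing call; that is the closure.
        canvas.set_title(title)

    return page_setup


canvas = build_document(make_page_setup("My Book"))
print(canvas.title)
```

The title now comes in as an ordinary argument to the factory, so nobody has to edit a global variable in the source to change it.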

I went into that story for two reasons: closures are useful after all (no other method could have solved this problem so cleanly), and in the end, no matter how awful it was, I got my HTML pages as PDFs, so I figured I could do that kind of thing again. I was going to use Python this time too, but I’d been thinking about picking up Clojure again, so I decided to put in some effort and try to do the project in Clojure. While I knew Python had the libraries and documentation to handle the problem, I felt pretty secure with Clojure because you can always dip into Java. So far I haven’t had to, though.