Notes on Modern Operating Systems, Chapter 2, Part 2: Interprocess Communication

Notes on the second part of Chapter 2 of Modern Operating Systems, on interprocess communication. These sections cover the basic ideas of mutual exclusion and coordination of concurrent processes. There wasn’t a ton in this section I hadn’t seen before, but it was a good refresher and a good chance to nail down some of the terminology.


2.3 Interprocess Communication

Three issues to consider in interprocess communication:

  1. Passing information between processes
  2. Preventing concurrent processes from interfering with each other
  3. Proper sequencing when dependencies are present (e.g. process A produces data that process B consumes)

2) and 3) also apply to threads. 1) less so because threads can share memory.

2.3.1 Race Conditions

  • A race condition occurs when two processes are reading or writing shared data and the result depends on the order of execution.
  • The shared data can be in memory, a file, a directory, whatever.

2.3.2 Critical Regions

  • Need mutual exclusion to lock other processes out of shared data when it’s being used.
  • The part of the program that uses the shared data, requiring mutual exclusion, is called the critical region.
  • If you ensure that two processes are never executing the critical region at the same time, you can avoid race conditions.

Four conditions must hold for a good solution to this:

  1. No two processes are simultaneously in the critical region.
  2. No assumptions may be made about speed of processes or the number of CPUs.
  3. No process outside its critical region should be able to block other processes.
  4. No process should have to wait forever to enter its critical region.

2.3.3 Mutual Exclusion with Busy Waiting

In practice busy waiting is rarely useful, so I’m not going to write down any notes about it, but the book lists several bad solutions for mutual exclusion using busy waiting and goes over how they introduce race conditions and other badness.

A lock acquired by sitting in a loop that does nothing until the lock is free (busy waiting) is called a spin lock.

The one useful thing the section covers is the TSL (test and set lock) instruction, a CPU instruction that atomically reads a lock variable and sets it to a nonzero value. This instruction is useful in other, less terrible kinds of mutual exclusion.
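
As a concrete illustration (mine, not the book's): C11's atomic_flag provides the same atomic read-and-set behavior as a TSL instruction, so a busy-waiting spin lock can be sketched like this:

    #include <stdatomic.h>

    /* clear = unlocked, set = locked */
    static atomic_flag lock = ATOMIC_FLAG_INIT;

    static void spin_lock(void) {
        /* test-and-set returns the old value: true means someone else holds the lock */
        while (atomic_flag_test_and_set(&lock)) {
            /* do nothing; this empty loop is the "spin" in spin lock */
        }
    }

    static void spin_unlock(void) {
        atomic_flag_clear(&lock);
    }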

2.3.4 Sleep and Wakeup

Want to get rid of busy waiting.

Producer-Consumer Problem

  • We’ll use this problem as an example for the mutual exclusion techniques we’re about to discuss.
  • Two processes share a fixed-size buffer
  • One process produces things and puts them on the buffer.
  • The other consumes things from the buffer and does something with them.
  • Access to the shared buffer must be carefully managed.
  • Conceptually, we want the producer to sleep when the buffer is full and wake up and add things to it when there is space. Conversely, we want the consumer to sleep when the buffer is empty and wake up and take things from it when it is not empty.

2.3.5 Semaphores

  • One way to achieve sleep and wakeup is with semaphores.
  • A semaphore is an integer variable that supports two operations: up and down.
  • down decrements the semaphore. If the semaphore is zero, the process that called down blocks until the semaphore is no longer zero.
  • up increments the semaphore.
  • Semaphores can represent “tickets” or “passes” to do a specific operation.
  • You can use them like this to synchronize two processes, imposing a certain ordering on their actions.
  • For example, in the producer-consumer problem, you can use empty and full semaphores to track how many spaces are empty and full in the buffer.
  • Then you can force the consumer to do a down on full so that it blocks when there’s nothing in the buffer to take, and force the producer to do a down on empty so that it blocks when there’s no space for it to add to the buffer (sketched in the code below).
  • This ensures that consumption only happens after production, and that production only gets a certain amount ahead before some consumption happens.
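
Here’s a minimal sketch (mine, not the book’s exact code) of the semaphore solution to producer-consumer using POSIX semaphores, where sem_wait plays the role of down and sem_post plays the role of up; the items are just ints and produce_item/consume_item are stand-ins:

    #include <pthread.h>
    #include <semaphore.h>

    #define N 100                       /* number of slots in the buffer */

    static int buffer[N];
    static int in = 0, out = 0;         /* next slot to fill / next slot to drain */

    static sem_t empty;                 /* counts empty slots, starts at N */
    static sem_t full;                  /* counts full slots, starts at 0 */
    static sem_t mutex;                 /* binary semaphore guarding the buffer */

    static int produce_item(void) { static int n = 0; return n++; }   /* stand-in */
    static void consume_item(int item) { (void)item; }                /* stand-in */

    static void *producer(void *arg) {
        (void)arg;
        for (;;) {
            int item = produce_item();
            sem_wait(&empty);           /* down(empty): block if there's no free slot */
            sem_wait(&mutex);
            buffer[in] = item;
            in = (in + 1) % N;
            sem_post(&mutex);
            sem_post(&full);            /* up(full): wake a sleeping consumer */
        }
        return NULL;
    }

    static void *consumer(void *arg) {
        (void)arg;
        for (;;) {
            sem_wait(&full);            /* down(full): block if the buffer is empty */
            sem_wait(&mutex);
            int item = buffer[out];
            out = (out + 1) % N;
            sem_post(&mutex);
            sem_post(&empty);           /* up(empty): wake a sleeping producer */
            consume_item(item);
        }
        return NULL;
    }

    int main(void) {
        sem_init(&empty, 0, N);
        sem_init(&full, 0, 0);
        sem_init(&mutex, 0, 1);
        pthread_t p, c;
        pthread_create(&p, NULL, producer, NULL);
        pthread_create(&c, NULL, consumer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }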

2.3.6 Mutexes

  • A mutex is another way of using a semaphore. It only has the values 0 and 1.
  • If the value is 1, processes can enter the critical region. If 0, they block on trying to enter the critical region.
  • If your processor has a TSL instruction, you can implement mutex locking and unlocking in user space.
  • Importantly, this means you can implement mutexes so that when a thread fails to acquire a mutex, it calls thread_yield to give up the CPU to another thread instead of busy waiting or trapping to the kernel.
  • Most modern OSes allow limited sharing of memory between processes, which allows mutex sharing.
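
A rough sketch (mine, not the book’s) of the user-space mutex_lock the book describes, again using C11 atomics for the TSL part and sched_yield standing in for thread_yield: try to grab the lock, and if that fails, give up the CPU instead of spinning:

    #include <stdatomic.h>
    #include <sched.h>                  /* sched_yield(), standing in for thread_yield */

    static atomic_flag mutex = ATOMIC_FLAG_INIT;

    static void mutex_lock(void) {
        /* atomic test-and-set: returns true if the lock was already held */
        while (atomic_flag_test_and_set(&mutex)) {
            sched_yield();              /* yield the CPU instead of busy waiting */
        }
    }

    static void mutex_unlock(void) {
        atomic_flag_clear(&mutex);
    }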

2.3.7 Monitors

  • A monitor is a collection of variables and functions (procedures) grouped so that only one thread at a time can be active inside it.
  • Monitors are a programming language-specific feature. The compiler handles mutual exclusion, often using a mutex or semaphore under the hood.
  • Monitors still need a way to block processes that can’t proceed anymore so that another process can start using the monitor. They do this with…

Condition Variables

  • Two operations: wait and signal.
  • If a process can’t continue, it waits on a condition variable and blocks.
  • Then another process can enter the monitor and wake up the sleeping process by doing a signal on the condition variable.
  • You can either immediately suspend the signaling process, resuming it later, and let the awakened process run; or you can require a signal to be the last thing in a function.
  • Without the monitor’s guarantee of only one process running the code at once, wait and signal are subject to race conditions where signals cross and a wakeup signal disappears into the void just as a process is about to go to sleep, causing it to never wake up.

Java objects can also be used as monitors with the synchronized keyword and the wait and notifyAll methods. This combined with threads is the basis for all Java concurrency.
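
Outside of languages with built-in monitors, the same wait/signal pattern shows up as condition variables paired with a mutex, e.g. in POSIX threads. A minimal sketch (mine): the mutex plays the monitor’s one-thread-at-a-time role, and pthread_cond_wait atomically releases the mutex and sleeps, which is what prevents the lost-wakeup race described above:

    #include <pthread.h>

    static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t nonempty = PTHREAD_COND_INITIALIZER;
    static int count = 0;                        /* number of items available */

    void put_item(void) {
        pthread_mutex_lock(&m);                  /* "enter the monitor" */
        count++;
        pthread_cond_signal(&nonempty);          /* wake one waiter, if any */
        pthread_mutex_unlock(&m);
    }

    void get_item(void) {
        pthread_mutex_lock(&m);
        while (count == 0)                       /* re-check: wakeups can be spurious */
            pthread_cond_wait(&nonempty, &m);    /* atomically unlock m and sleep */
        count--;                                 /* "consume" an item */
        pthread_mutex_unlock(&m);
    }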

2.3.8 Message Passing

  • Semaphores and monitors require shared memory, so they can’t be used in distributed systems.
  • Message passing, on the other hand, is more general and works even across machines.
  • Uses two operations: send and receive. These are implemented as system calls.
  • send gets a process ID and a message.
  • receive also gets a process ID, and in C implementations it gets passed an empty message to fill in (hydrate).
  • A receiver can block until a message arrives, or return immediately with an error code.

Message passing raises several design challenges.

  • Messages can be lost by the network. You can have receivers send back an acknowledgement message, and have senders retry the message if they don’t get an acknowledgement within a certain timeout.
  • Local message passing is slower than semaphores or monitors.
  • You can implement message passing with a mailbox that buffers received messages, or you can have threads block and wait on each other. The latter is called a rendezvous.
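
For a concrete (Linux/POSIX) version of the mailbox flavor, here’s a rough sketch using POSIX message queues; the queue name, sizes, and messages are arbitrary, and a real program would split the send and receive across two processes:

    #include <fcntl.h>
    #include <mqueue.h>
    #include <stdio.h>
    #include <string.h>

    /* Compile on Linux with: gcc mailbox.c -lrt */
    int main(void) {
        struct mq_attr attr = { .mq_maxmsg = 10, .mq_msgsize = 64 };
        mqd_t mq = mq_open("/demo_mailbox", O_CREAT | O_RDWR, 0600, &attr);
        if (mq == (mqd_t)-1) { perror("mq_open"); return 1; }

        /* send: copy a message into the mailbox */
        const char *msg = "hello from the producer";
        mq_send(mq, msg, strlen(msg) + 1, 0);

        /* receive: blocks until a message is available */
        char buf[64];
        mq_receive(mq, buf, sizeof buf, NULL);
        printf("got: %s\n", buf);

        mq_close(mq);
        mq_unlink("/demo_mailbox");
        return 0;
    }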

2.3.9 Barriers

  • Block a group of processes all at the same point in the program to coordinate their start on the next phase.
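
A minimal sketch with POSIX barriers (thread-level rather than process-level, but the idea is the same): every thread blocks at pthread_barrier_wait until all of them have arrived, and then they all move on to the next phase together:

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4

    static pthread_barrier_t barrier;

    static void *worker(void *arg) {
        long id = (long)arg;
        printf("thread %ld: finished phase 1\n", id);
        pthread_barrier_wait(&barrier);          /* block until all NTHREADS arrive */
        printf("thread %ld: starting phase 2\n", id);
        return NULL;
    }

    int main(void) {
        pthread_t t[NTHREADS];
        pthread_barrier_init(&barrier, NULL, NTHREADS);
        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, worker, (void *)i);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);
        pthread_barrier_destroy(&barrier);
        return 0;
    }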

2.4 Classical IPC Problems

These are all well-known and there’s a ton of information out there on them, so I didn’t take detailed notes about them.

2.4.1 Dining Philosophers

Models a situation where there is possible contention for a set of shared resources, which can lead to deadlock or starvation.

2.4.2 Readers and Writers

A simplistic model of a database, where multiple concurrent processes can read but only one can write at a time.

2.4.3 Sleeping Barber

Models a situation where a bounded size queue of tasks is being serviced by a single process but filled by multiple processes.

Notes on Modern Operating Systems, Chapter 2, Part 1: Processes and Threads

These are my notes on Chapter 2, Sections 2.1 and 2.2 of Modern Operating Systems, Second Edition, by Andrew Tanenbaum. The chapter is quite long, and dense with text, so I’m splitting my notes into three parts. Part 1, this part, is the basics of processes and threads. Part 2 will be Sections 2.3 and 2.4 on interprocess communication and concurrency control. Part 3 will be Section 2.5 on scheduling.

As with Computer Networking: A Top-Down Approach, I bought an old edition of this book for cheap about six years ago and happened to have it lying around. The second edition is from the year 2000, so it’s now twenty years old and somewhat out of date. But operating systems haven’t changed as much in twenty years as the Internet has, so although some of the chapters talk as if single-core processors are still the status quo, and the Case Studies section in the back talks about Windows 2000, I’ve yet to stumble on anything that makes me suspicious.

The writing style is clear, but the sections aren’t always the best organized and sometimes make references to material that hasn’t been covered yet. Partly that’s just inherent to the way operating system concepts all connect into a whole, so although it does bother me a little, it was a reasonable choice. If you can find a newer edition for under fifty dollars it might be worth picking up. There are a couple sections towards the end of the book where Tanenbaum talks about his work on MINIX and his opinions on Linux and Linus Torvalds that read like aggrieved Usenet posts cleaned up for publication, and he also constantly dunks on other people’s research, writing in Chapter 3 that research on distributed deadlock detection is not “even remotely practical in real systems” and that its “main function seems to be keeping otherwise unemployed graph theorists off the streets”. I find it highly amusing, not least because Tanenbaum must have known that the undergraduate students who would mostly be reading this book didn’t care.

Section 2.1 Processes

  • Processes pretend to run sequentially, but behind the scenes they run discontinuously as the CPU swaps between them.
  • Each process maintains its own logical program counter which is loaded into the real program counter when the CPU picks it up to run.
  • Run order and run time of processes might be non-deterministic, so you cannot make any assumptions about when processes will run or how long they will run for.

2.1.1 Process Creation

Four events cause process creation:

  1. System initialization
  2. System call by a running process (fork on Unix).
  3. User request to create a process (e.g. running a terminal command, clicking an icon).
  4. Batch job initialization (on mainframes, submitted by users).
  • fork on Unix creates new processes.
    • The child process starts with the same memory, environment strings, and open files as its parent.
    • Then it runs execve or some similar system call to change the memory image and run a new program.
    • The child’s memory image is a copy of the parent’s into a new address space. Children do not share address spaces with their parents, so the parent and child cannot access each other’s memory.
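
A tiny sketch of the fork-then-exec pattern described above, using execlp (one of the execve-family wrappers) and with most error handling omitted:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        pid_t pid = fork();              /* child starts as a copy of the parent */
        if (pid == 0) {
            /* child: replace the copied memory image with a new program */
            execlp("ls", "ls", "-l", (char *)NULL);
            perror("execlp");            /* only reached if exec fails */
            exit(1);
        } else if (pid > 0) {
            int status;
            waitpid(pid, &status, 0);    /* parent waits for the child to finish */
            printf("child exited with status %d\n", WEXITSTATUS(status));
        } else {
            perror("fork");
        }
        return 0;
    }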

2.1.3 Process Destruction

Processes can terminate for the following reasons:

  1. Normal exit (voluntary), triggered on Unix by the exit system call
  2. Error exit (voluntary), e.g. program cannot finish, such as compiler called on nonexistent file
  3. Fatal error (involuntary), e.g. divide by zero error, attempt to access protected memory (null pointer dereference in C)
  4. Murder by another process (involuntary), e.g. someone calls kill on it.

2.1.4 Process Hierarchies

  • Do not exist on Windows.
  • On Unix a process and its descendants form a group. Process groups share some signals and other things.
  • Since all processes on Unix are descendants of the init process, the whole system effectively forms one big tree of processes rooted at init.

2.1.5 Process States

Three basic states:

  1. Running (using the CPU)
  2. Ready (stopped, but could be picked up and run at any time)
  3. Blocked (unable to run until some event happens)
  • Running ⬌ Blocked can happen due to a system call made by the running process, or automatically if the process reads from a pipe or a special file like a terminal or socket when there’s no input available.
  • Running ⬌ Ready is managed by the process scheduler, which decides what process to run next.

2.1.6 Implementation of Processes

  • The operating system has a process table that stores the following data for each process:
    • Process state (running, ready, blocked)
    • Program counter (i.e. address of the next instruction to execute)
    • Stack pointer (i.e. address of the top of the runtime stack)
    • Memory allocation
    • Status of open files
    • Accounting and scheduling information
    • Anything else that needs to be saved between process switches

The process table is what makes interrupt handling possible (i.e. hardware interrupts, which “interrupt” the currently running process to deliver something from a hardware device):
1. Hardware pushes the program counter, registers, state, etc. to the stack
2. Hardware loads a new program counter from the interrupt vector, a special location in memory assigned to each class of IO device (e.g. hard disk, timer, terminal devices in mainframes).
3. Assembly language procedure saves the current values of the registers in the process table.
4. Assembly language procedure sets up a new stack.
5. Interrupt service written in C runs. It reads input from the IO device and buffers it into memory.
6. Now that the input from the hardware has been pulled in, the scheduler runs and decides which process to run next.
7. The C procedure returns control to assembly
8. Assembly starts the new current process; it loads up the registers, resets the program counter and stack pointer, etc.

2.2 Threads

2.2.1 The Thread Model

  • A process can have multiple threads of execution
  • Processes group resources together; threads are the thing that actually runs on the CPU.
  • Each thread has its own:
    • Program counter
    • Registers
    • Runtime stack with execution history
  • All the threads in a process share:
    • An address space (so they can access each other’s memory)
    • Global variables
    • Open files
    • Child processes
    • Pending alarms
    • Signals and signal handlers
  • All threads in a process share memory so they can read and write each other’s variables and even stacks. This is part of the benefit of threads; it allows more efficient cooperation between concurrent entities (compared to processes, which can’t read each other’s memory and thus must use other means of communicating).
  • Threads have four states:
    1. Running
    2. Blocked
    3. Ready
    4. Terminated
  • Processes start with one thread and create more using library calls.
  • Threads can exit by calling another library procedure, e.g. thread_exit.
  • In some systems threads can wait for other threads to exit by calling thread_wait.
  • thread_yield gives up the CPU for another thread. The CPU won’t actually stop a thread until a process switch occurs, so threads in the same process have to yield to each other.
  • Threads introduce many complications, including synchronizing their shared memory and coordinating the yields so you don’t waste time running blocked threads.
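
The book uses generic names like thread_create, thread_exit, thread_wait, and thread_yield; with POSIX threads (one common library implementing this model), the same calls look roughly like this:

    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    static void *worker(void *arg) {
        printf("worker running, arg = %s\n", (const char *)arg);
        sched_yield();                   /* thread_yield: give up the CPU voluntarily */
        pthread_exit(NULL);              /* thread_exit */
    }

    int main(void) {
        pthread_t t;
        pthread_create(&t, NULL, worker, "hello");   /* thread_create */
        pthread_join(t, NULL);                       /* thread_wait: block until it exits */
        return 0;
    }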

2.2.2 Thread Usage

  • Threads let a single application do multiple things, modeled as sequential jobs running side by side instead of as a mess of interrupts and signals.
  • On multiprocessor systems or multi-core CPUs, multiple threads can actually run simultaneously. On single-core systems threads will switch on and off the one core.
  • Threads can start up faster than processes since they require fewer resources to get going.
  • Threads can block and cooperatively switch off since they can signal each other using their shared memory.

2.2.3 Threads in User Space

  • Processes manage their own threads in user space.
  • The kernel knows nothing of threads.
  • Each process has a thread table with the threads’ program counters, stack pointers, registers, etc.
  • The threads run on a run-time system, a library of thread-managing procedures

Advantages:

  • Thread switching is very fast since applications don’t need to trap to the kernel to switch.
  • Applications can implement custom thread scheduling.
  • User-space threads scale better since they don’t require space in the kernel.

Drawbacks:

  • If a user-space thread makes a blocking system call, it will block without giving you a chance to switch threads before handing off control to the kernel, because the kernel knows nothing of threads and cannot give you that chance. This means the whole process is blocked.
  • If a thread causes a page fault, e.g. because it needs to run some code that isn’t loaded into memory yet, it gets blocked, again without allowing any chance to switch threads.

Possible solutions to the drawbacks (that all suck):

  • The select system call can detect when a system call will block. You can use it before making a possibly blocking system call, and if it detects that the system call will block, you can switch to another thread and make the system call later. E.g. if the current thread needs to read some data, but the read call will block on a disk read, you can switch threads until the interrupt executes later and the data is buffered in memory, and then switch back to that thread.
  • Using select sucks because you have to wrap all your blocking system calls to do the select check, which means rewriting parts of the system call library, and it’s also annoying and tedious logic to implement. (A rough sketch of the check appears after this list.)
  • You can also use a clock interrupt to stop the currently running thread every so often and check if it’s been blocked and should be swapped for another thread.
  • Using a clock interrupt sucks because it’s an ugly hack (you might still let the blocked thread have the CPU for a while before the clock interrupt stops it), and apparently each process only gets one clock so if you use it to manage thread swapping you can’t use it for anything else.
  • System calls can all be rewritten to be non-blocking. E.g. rather than waiting for input to be available, a read call would just return 0 (for 0 bytes read) and you could try again later.
  • Rewriting system calls sucks because you have to rewrite a bunch of system calls around threads and that’s not cool.
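
Here’s a rough sketch of the select-based check described above; run_another_thread is a hypothetical stand-in for the run-time system handing the CPU to a different user-level thread:

    #include <sys/select.h>
    #include <unistd.h>

    /* Hypothetical helper: switch to another ready user-level thread. */
    void run_another_thread(void);

    /* Read from fd without letting a blocking read() stall the whole process:
     * check with a zero timeout first, and only call read() once it won't block. */
    ssize_t careful_read(int fd, void *buf, size_t len) {
        for (;;) {
            fd_set readfds;
            FD_ZERO(&readfds);
            FD_SET(fd, &readfds);
            struct timeval zero = { 0, 0 };      /* poll: don't wait at all */

            if (select(fd + 1, &readfds, NULL, NULL, &zero) > 0)
                return read(fd, buf, len);       /* data is ready, so this won't block */

            run_another_thread();                /* let some other thread run first */
        }
    }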

2.2.4 Threads in the Kernel

  • You can instead put threads in the kernel. The kernel then manages the thread table and thread swapping instead of the user program.
  • All blocking calls are system calls and when a thread blocks, the kernel can choose another thread to run. If there is no suitable thread in the same process, the kernel can choose to let a different process run for a while.
  • Creating, destroying, waiting, etc. are all more expensive since they are system calls.

2.2.5 Hybrid Implementations

  • The idea is to combine the lightweight nature of user-space threads with the ease of swapping out blocked threads that comes with kernel threads.

2.2.6 Scheduler Activations

  • The kernel assigns each process some virtual CPUs (or they can request and release them). [It’s not actually clear to me from the text how the virtual CPUs play into this process.]
  • The kernel can detect when a thread blocks and notify the process through a mechanism called an upcall.
  • Upcalls work by activating the process’s runtime system at a known starting address, similar to a Unix signal. Once activated, the runtime system can choose to run another thread instead of the blocked one.
  • When a hardware interrupt runs, the process will be resumed and the runtime system can decide which thread to run. It can run the thread that was interrupted, or, if the hardware interrupt completed something that a blocked thread cares about (like reading in some data and buffering it in memory), it can choose to run the blocked thread, or it can just choose to run a completely different thread.
  • This system isn’t great because you now have two-way calls between the kernel and user space, whereas a normal operating system has a nice layered model where user space calls the kernel but not the reverse.

To be continued next time with interprocess communication. And then…in the epic final part of Chapter 2…SCHEDULING!!!

Aphorisms from The Pragmatic Programmer(‘s first half)

I can’t remember the last time I read more than halfway through a tech book. Tech books tend to peak early. You get more than half the value from the first half of the book. A lot of times the second half is specialized or optional topics, and a lot of times those topics are out of date unless you buy the book the second it’s off the presses. Of course, I’m also lazy, and about halfway through a tech book I’m usually starting to lose interest and decide I’m okay with building CRUD web apps for the rest of my life if it means I don’t have to read any more of this.

I still haven’t made it all the way through The Pragmatic Programmer either, but I’ve made it further than I usually do—about two-thirds of the way through. And I actually want to read the rest of it. I don’t always like its metaphors or cutesy terminology like “binary chop”, but it’s memorable and easy to read.

The Pragmatic Programmer is structured as a list of 100 tips. Some of them cover code structure and design (“Tip 47: Avoid global data”). Some cover development practices (“Tip 28: Always use version control”). And some relate to personal development (“Tip 11: English is just another programming language”), philosophy (“Tip 3: You have agency”), and ethics (“Tip 99: Don’t enable scumbags”). Each tip cross-references related tips. The end result feels more like reading a bunch of blog posts than a single coherent book. The tips are grouped broadly by topic, but a single tip can go from explaining a technical concept to coding advice to rats chewing through cables. As “pragmatic” suggests, the book imposes no hard lines on its topics and the commentary on the tips goes wherever it needs to go.

In the rest of this post, I’ll go through some of the tips and my idiotic hot takes on them.

Tips 1–4

I tried to read The Pragmatic Programmer for the first time about six years ago. I looked at the Kindle sample, which only had Tips 1–4. I thought, “Wow, these guys are smug as hell”, deleted the sample, and didn’t come back to the book for another six years.

Tips 1–4 are basically the preamble that some college professors put in their syllabi where they lecture you about how you’re an adult now, you need to take responsibility for your own actions, only you benefit from attending this course and finishing your assignments, so on, yada yada, etc. If you’re a bit of a beaver-cleaver, you’ll read these four tips, pat yourself on the back for being so responsible and never making lame excuses, and a deep contentment will warm you from the heart outward for the rest of the day. Otherwise you can just skip to Tip 5. Go on, be a rebel. I won’t tell.

Tip 5: Don’t Live with Broken Windows

When it comes to software rot (or “technical debt”), there is definitely some truth to this idea. If developers feel the code is badly designed and full of hacks, they will contemptuously pile more hacks on top. I’ve seen it happen firsthand. We spent seven months talking about rewriting the entire codebase because the whole thing was such a cancerous dumpster fire that it could never be improved. Then most of the team left because the code was such a cancerous dumpster fire, and about a year after that, management finally gave the go-ahead to rewrite the code.

On the other hand, The Pragmatic Programmer pushes an absolute zero tolerance policy towards broken windows, which has not been realistic in my experience. All the codebases I’ve worked on were written as quickly as possible by people who are long gone. They’re usually bad in various ways. Sometimes in ways that are easy to fix—bad formatting can be fixed by automated tools, and fancy IDEs like IntelliJ IDEA can stick a big glowing flag on certain kinds of anti-patterns and code smells. Sometimes you can go back and patch up the rot and make sure to do better going forward. But sometimes the broken windows are so fundamental or widespread that you can’t fix them without tearing down huge swathes of code and rebuilding deep-seated parts of it. And sometimes you simply aren’t allowed the time to clean up bad code, because your team is tiny and you can’t afford to put a whole developer on a cleanup job for a month unless customers are complaining about it.

However, sometimes, even if you can’t fix the broken windows, you can at least quarantine them so most of your daily work doesn’t touch that code. One codebase I worked on had an absolutely hideous subsystem where a developer wrote all the code in a single god class of 500-line methods that mutated local variables and then returned them to be passed as arguments into other methods that also mutated them. It was horrid, and so brittle that with the original developer gone there was no chance of fixing it without breaking something, but on the bright side, it was all in a single god class, so we could just never look at that class until we had to.

Tip 9: Invest Regularly in Your Knowledge Portfolio

Steve Yegge points out in several of his blog rants that the half-life of technical knowledge can vary a lot. So while it might be tempting to study the hip new language or framework that everyone on Hacker News is talking about, it’s a good idea to invest in long-term assets as well.

Algorithms, data structures, math, and statistics rarely, if ever, go out of date. Unix tools are practical, have been around forever, and show no sign of going away, so getting familiar with grep and sed can be a great investment. It’ll pay off when you have some weird bug that only happens on the third Thursday of every month and you can bust out a grep command to find all those logs in your log files without even looking at the man pages. find, tr, cut, awk, curl, nc, jq, and screen are also great commands to look into, as well as the shell language itself. Recently I finished a script in four hours that I thought was going to take two days, because I realized I could replace a bunch of API calls and ugly JSON structure rearrangement that I was going to write out in Python with a short bash script using curl and jq. Editors like Vim and Emacs have also been around forever, and learning some of the more powerful commands can save you a ton of time. For a while, I had to debug data pipelines, which usually meant receiving a CSV of data that someone wasn’t satisfied with, taking the UUIDs out of it, and querying a bunch of tables to find out where in the pipeline data got dropped or corrupted. cut and Emacs saved me a ton of time; I could use cut to extract the UUIDs from the CSV, then paste it in Emacs and use replace-regexp to quote them and reformat them into a comma-separated list that I could just paste into an IN clause of an SQL query. Which brings me to SQL—it’s been around since the 70’s, and unlike disco, it doesn’t seem to be going anywhere, so you could do worse than to study it. So many of the weird bugs ORMs cause only become clear once you understand the SQL it must be generating.

The Pragmatic Programmer also suggests investing time in some of the weirder languages to have a bigger stock of ideas to draw on. I think this idea bears repeating. I see too many people online who know Python, Ruby, and Javascript and want to have a language war over which of those three is better. I also see people who know Python, Ruby, and Javascript complain that Java is ancient, inscrutable gibberish that somehow is also the most widely used language in the world. I was this person. When I first started learning to program, I wrote a huge rant (that I thankfully never published) comparing Visual Basic, Java, and Python. My arguments were that Visual Basic was better than Java because you have to write End If, so you know it ends an If, whereas in Java it’s always just a brace, but Python was the best because whitespace. I was stupid. What I was doing was equivalent to reviewing three books which all cover the same material in the same order and only comparing what fonts the text was printed in. I definitely support spending time with Icon, Prolog, OCaml, Io, or Idris, and learning just how different a language can be, before discussing differences between languages.

Tip 10: Critically Analyze What You Read and Hear

I majored in computer science, but before that, I was an English major, and before that, I spent ten years wanting to be an English major. One of the most important things you learn studying humanities is exactly what this tip says—how to critically analyze what you read and hear.

When you’re evaluating a new technology, be critical about the claims. Don’t just accept flashy benchmarks; probe the methodology behind those flashy benchmarks. Think about the context behind claims: maybe a group at Amazon is building great things with an experimental JIT compiler for Lua, but does that mean your team will be able to do the same? Maybe the software actually isn’t business critical and only has five users. Maybe the creator of the experimental JIT compiler is on the team and can fix any bugs they run into. Maybe Amazon, being Amazon, can make it worth the creator’s while to solve their problems.

Being critical doesn’t mean being constantly and unreservedly cynical. You can be positive about things. You can love Rust if you want. But you should love Rust because you critically analyzed the claims made about it and decided they were reasonably accurate, and you should always be on the lookout for new information that contradicts your current point of view. When you find that new information, critically analyze it and decide if it might be true or not. Then decide if you need to care. Even if your favorite language chokes on 5 HTTP requests per second, you can still enjoy hacking together a hobby project with it. But be honest; don’t convince your company to build something in your favorite language when there’s a strong chance the system will need to handle 1,000 HTTP requests per second.

Tip 14: Good Design is Easier to Change Than Bad Design

This is true, but no design is infinitely flexible. You always have to make some choices that will lock you in. I’ve seen (and written) a ton of code that was easy to extend in a way that turned out to never be necessary. I’ve also seen (and written) a ton of code that was built to be extensible, but not quite extensible enough to capture a later use case. This is where the tips about making your design orthogonal and decoupled come in. If you can rip out a whole module and replace it with something else, then you can change it, even if it’s not as easy to change as it would be if you’d thought of something when you first designed it.

Tip 17: Eliminate Effects between Unrelated Things

This is the larger principle behind a ton of well-known best practices. Don’t use global variables, because they provide a channel for two unrelated things to affect each other. The C preprocessor is a pain because unrelated files can affect each other if they happen to be compiled together. I also find a lot of semi-common practices in hyper-dynamic languages like Ruby, Javascript, and Python to have a lot of potential to create effects between unrelated things. Javascript didn’t use to have real modules, so libraries would shove things in the global namespace. It was a complete nightmare to deal with libraries fighting for symbol real estate in the window object. (JS for some reason allows just $ as a function name, so everybody fought over $ until jQuery came in and definitively staked claim to it.) I’ve had some infuriating bugs due to Ruby’s ability to define a class across several files combined with Rails’s implicit global imports. In Ruby, if you define a class with the same name in different files and load both of those files, they will be combined into a single class. Rails will automatically load every file you put in its blessed directories. So I’ve encountered ridiculous bugs where two programmers made classes with the same name in different contexts, and Rails helpfully imported both files, causing Ruby to combine those two unrelated classes into a single class and breaking code in two places that had nothing to do with each other.

Tip 19: Forgo Following Fads

This tip is pretty useless. The idea is good, but it offers no guidance on figuring out what’s a fad and should be ignored. I’ll offer a pointer back to Tip 10, though: critically analyze what you read and hear.

Tip 20: Use Tracer Bullets to Find the Target

This tip is awesome and the book is worth the cover price just for this.

The basic idea is that when you start a new system, you build an end-to-end skeleton of the entire architecture with some example flows. Stub out anything you have to, just get the end-to-end working as quickly as possible. This will show you what the architecture you have planned will look like in practice. It’s called a “tracer bullet” because it lets you see where you’re aiming, and you can assess how close your shot is to the target.

The book suggests showing your skeleton to customers. This probably depends on your organization’s culture and relationship to customers. Nothing I’ve worked on benefited from being shown to customers. The customers would get hung up on details like the color of the banner on mocked up web pages, or they would seize on random technical details that they happened to know about (“It won’t allow escalation of privilege, right?”, “The JSON responses are small and streamlined, right? That makes it fast.”, “Did you use Angular? Angular makes rich interfaces.”), or they would look at it, grunt, and go back to whatever they were doing. But it can be a great benefit to show it to engineering leadership, or to engineers on other teams, or product managers or UX experts. And if you happen to have customers able to give useful feedback, sure, show it to them.

Tip 32: Read the Damn Error Message

Part of this tip discusses rubber ducking, which is where you explain your problem to a rubber duck because your problems are idiotic and should not be inflicted on other humans. Because your problem is so idiotic, the rubber duck’s silence will make you feel shame, causing you to quit the industry and take up competitive bass fishing.

In case you don’t want to quit the industry yet, I’ve found that a more productive strategy than talking to a duck is writing out what the buggy code is doing as if you’re manually running through something you’ve written for an interview. If you don’t know why something happens, note that and the result and move on. Start a little before the bug, and keep going to a little bit after—the moment where the exception is thrown or the erroneous result returned is usually a good place. For me this usually causes something to click eventually, and I see what I’m missing. It can also help, once you’ve isolated the buggy code and the bad result it outputs, to treat it like a brainstorming session—what are five ways this bad result could have been created? Get completely crazy; “solar wind knocking an electron out of alignment, causing the bits to swap” is on the table. How do those five ways relate to the surrounding code? What else is going on in the program at the same time this code is executing?

Tip 37: Design with Contracts

I’ve come around a little on statically typed languages. Java is still kinda painful to write, but at least it’s also kinda painful to write for that guy who was three months out of university when he was asked to write an entire analytics tool in a week and didn’t have time to write comments or real error handling—his code is probably a mess, but at least Java forced him to leave a few more breadcrumbs about what the hell he was doing than Python or Javascript would have. And newer languages like Go and Rust reduce a lot of the annoying parts of Java by adding better type inference and other goodies.

Statically typed languages force you to write basic contracts for every function. It’s nicer to write contracts that express some real invariant about your data—“x represents an age, so it must be between 0 and 200”—but at least “x is an integer” is more than nothing.

But contracts are most important for system boundaries, where different people’s code comes together, and nowadays that often happens over a network. So that’s where statically typed data transfer formats, like protocol buffers, become useful.

Conclusion

There’s a lot of good advice in The Pragmatic Programmer, and most of the tips I didn’t bother to discuss because I just nodded when I read them. But there’s also some stuff implicit in the book’s worldview that I don’t agree with—I have a whole rant about “people are not their code” that I might go into when I finish up the book. I recommend you read it, but that you keep in mind Tip 10. Don’t be a beaver-cleaver. Be a cool rebel who studies humanities. Donne is subversive, man.

A Comparison of Four Algorithms Textbooks

At some point, you can’t get any further with linked lists, selection sort, and voodoo Big O, and you have to go get a real algorithms textbook and learn all that horrible math, at least a little. But which book? There are tons of them.

I haven’t read every algorithms book out there, but I have read four of them. Maybe my experience with these four can help guide your decision. The four books are Algorithms, by Dasgupta, Papadimitriou, and Vazirani (hereafter called Dasgupta); Introduction to Algorithms, by Cormen, Leiserson, Rivest, and Stein (hereafter called CLRS); The Algorithm Design Manual, by Steven Skiena (hereafter called Skiena); and The Art of Computer Programming, Volumes 1–3, by Donald Knuth. I’ll do a five-point comparison, going over each book’s prose style, code use, mathematical heaviness, breadth and depth of topics, and position on the continuum between theoretical and practical.

There’s one thing you should know before I start: Dasgupta is available for free online, while Knuth and CLRS aren’t (well, they probably are, but not legally). So that might make your decision for you. I haven’t been able to determine if Skiena is legally available online. The book is posted in either PDF or HTML on a few legit-ish looking sites, but Skiena’s own page for it doesn’t mention anything about it being freely available, so proceed with caution.

Does it have to be one of these four?

Not at all. There are lots of good algorithms books that I’ve never read. You can even learn a lot by just reading CS.SE and Wikipedia. But these four are the ones I’ve personally used. Plenty of other people like these four too, but they do reflect my taste and biases. But even if you decide to go with a different book, this overview might at least tell you what features to notice.

Prose style

A good algorithms book will usually explain its topics in three ways: with a natural language prose explanation, with some kind of implementation code, and with mathematical notation. When you’re starting out, the prose is usually the most important part, because it takes you from “What the hell is a binary search?” to “Got it, how do I write actual code for a binary search?”

For me, the winner on prose style has to be Knuth. He’s just a masterful writer. You might have to read some of his sentences twice, but that’s only because he gets across twice as much information as a lesser writer. Knuth’s prose explanations got me to understanding on several topics I’d despaired of ever getting with other books, like B-trees and merge sort.

Skiena is also an excellent prose stylist. Where Knuth is elegant and flowing, like the John Milton of algorithms, Skiena is direct and sharp, like the Ernest Hemingway of algorithms. Knuth and Skiena are good at different things: Skiena is good at finding a simple, direct way to explain things which are traditionally made more complicated than they need to be. His overview of Big O is amazingly clear and simple, as long as you have at least some memory of calculus. On the other hand, some topics are inherently really complicated, and this is where Knuth shines: he’ll give you several different ways to view a complicated topic, and chances are at least one of them will work for you.

Knuth is much quicker to jump to math, while Skiena mostly eschews math and tries to keep everything intuitive. We’ll discuss that more in the section on mathematical heaviness.

Dasgupta has a pretty good prose style too, more patient and beginner-friendly than either Skiena or Knuth. Sometimes, I found Dasgupta too verbose and felt the authors belabored the obvious too much, but other times that was an asset rather than a weakness (e.g. in the material on Fast Fourier Transforms).

CLRS probably has the weakest prose style of these four books. It’s closer to the standard math text style, written with a formal and distant air, sometimes favoring precision over clarity. It’s strictly functional, but tolerable if you already have some understanding of the topic.

Code use

Knuth’s big weakness is code. He uses two different notations, pseudocode and MIX assembly language, his imaginary assembly for his imaginary computer. I find both of them extremely difficult to follow.

The problems with MIX should be pretty obvious: it’s assembly language, so it’s just hard to read. Not only that, it’s pretty different from the MIPS, ARM, or x86 assembly that modern readers might have seen. It’s designed to be run on either a decimal or binary architecture and assumes six-bit bytes (grouped into five-byte words). Knuth put a ton of effort into creating this imaginary machine and its assembly language. Since it’s made up, MIX assembly is still technically pseudocode; but MIX is to pseudocode as Quenya is to pseudo-languages. Newer editions of TAoCP use MMIX, which was redesigned to reflect modern assembly languages. I haven’t seen it yet, but I imagine it’s still hard to read since it’s assembly language.

Knuth also uses high-level pseudocode, but I find that hard to read too because it’s organized like an unstructured language with goto statements. If I were planning to implement the algorithms in an unstructured language, it would probably be fine, but I’ve always found that there’s a nontrivial translation process between Knuth’s unstructured pseudocode and structured pseudocode suitable for implementation in a modern language.

CLRS and Dasgupta both use high-level pseudocode that resembles Pascal, although not too slavishly. CLRS expresses some things as if they were methods or fields of an object, in a semi-object oriented way. Skiena also does some of this, but in addition he uses real C code.

A lot of modern books use C or Java throughout, which might be more to some people’s tastes. These books reflect my taste, and I like my algorithms language-independent. I don’t mind how Skiena uses C—he uses it mainly to show how to implement something when there’s a significant gap between theory and practice. But I’m glad he stuck to pseudocode for the most part.

Mathematical Heaviness

In this category, going from heaviest to lightest, we have roughly Knuth > CLRS > Dasgupta > Skiena. Dasgupta and Skiena are pretty close, while there’s a significant gap between them and CLRS. There’s a minor gap between CLRS and Knuth, and it really depends on the topic, but CLRS just didn’t take on as many inherently mathematical topics as Knuth did (especially in Knuth’s Volume 2, Seminumerical Algorithms, where he deals with random number generation and efficient arithmetic).

In general, Dasgupta is geared towards people (like students) who just recently learned a little math and are going to learn a little more, but haven’t internalized it all yet. Skiena is geared towards people (like working programmers, or graduate students) who’ve learned some math and forgotten a lot of it. Skiena is also student friendly, but he stresses practicality more than anything, and sometimes, in practice, you gotta do some math.

CLRS is also for students, but it’s more at the graduate student level, especially the later “Selected Topics” chapters that can sometimes get pretty heavy. Knuth seems more geared towards graduate students near the end of their Ph.Ds and practicing researchers.

All four books require you to know some combinatorics and a little number theory, and to have at least a vague memory of limits and function growth from calculus. That’s about it for Skiena and Dasgupta, though Skiena seems to expect you to have internalized the math a little more than Dasgupta does. The basic chapters of CLRS are just a little beyond Skiena in math background, but some topics get quite heavily mathematical. Knuth gets pretty heavily mathematical in almost every topic.

I’ve been conflating two things in the discussion so far, since they’re sort of linked: actual knowledge of theorems and axioms in mathematics, and the ability to read mathematical writing and proofs and follow an author’s reasoning (what’s often called “mathematical maturity”). Knuth requires the most technical knowledge of mathematics of the four here, and he’ll also stretch your reading ability pretty far; but CLRS will also stretch your reading ability, and Dasgupta is no cakewalk either, although I do think Dasgupta qualifies as a picnic. Skiena doesn’t have any proofs, but he kind of sneaks in proof-like things with his “war stories”, which usually read quite a bit like proofs, extended with discussion of some of the approaches he tried that didn’t work out.

If you’re getting your first algorithms book and you’re a CS undergrad, Dasgupta or Skiena is easiest to follow. CLRS will stretch you a little, but it’s manageable. If you’re a math or physics undergrad, CLRS shouldn’t be too hard and Knuth might be doable.

Breadth and Depth of Topics

Dasgupta is the loser here; not only does the book cover fewer topics than the others, the topics it chooses to cover are poorly organized and somewhat eccentric. I’m not sure why the authors threw in the awful chapter on quantum computing at the end; it’s totally incomprehensible unless you already understand quantum mechanics, which is no mean feat.

CLRS has the best balance of breadth and depth: it covers basic data structures; some more advanced ones like red-black trees, B-trees, and the union-find data structure; algorithmic paradigms like greediness and dynamic programming; graphs, shortest paths, and network flows; sorting; amortized analysis; and an assortment of other topics such as number theoretic algorithms, cryptography, linear programming, string matching, and NP completeness.

Skiena covers about the first third of CLRS, but he does a lot more with NP complete problems and how to design approximation schemes for them than CLRS does.

Knuth, of course, is the master of depth. Volume 1 covers math background and fundamental data structures; Volume 2 covers random number generation and arithmetic; Volume 3 covers searching and sorting, going through various sort routines and some more advanced data structures, such as B-trees, as well as developing the whole theory behind hash tables and how to choose hashing functions and table sizes; and Volume 4 covers combinatorial algorithms. Volume 4 is split into three subvolumes; right now, only Volume 4A has actually come out.

If you want to get just one book, I would get Skiena or CLRS. They include all the most important topics both in practice and for undergraduate classes. Skiena, as a bonus, even includes some computational geometry material, in case you’re doing video games or computer graphics.

Theoretical vs Practical

Skiena is the most relentlessly practical of this bunch. Knuth was actually pretty practical at the time Volume 1 came out, but he became less and less practical as time went on because people stopped writing things in assembly or using tape drives. Both give you implementation tips to make your code perform better, although Knuth’s can be hard to capitalize on since you’re not writing in assembly.

CLRS and Dasgupta are both theoretical in the sense that they mostly ignore the computer. Everything in CLRS will work, but sometimes their approach is too “straight from the theory”, as I discovered when I implemented Karp-Rabin according to their discussion and put it on Code Review.SE after struggling with numerous overflow-related bugs, only to have someone suggest a different approach that rectified all the performance issues and handled overflow elegantly.

Dasgupta and CLRS are both still good books, and if you’re just starting out learning algorithms, don’t get too caught up on this issue. Write your code in Python or Ruby, some quick and easy language that also ignores the metal. You’ll get the idea of the implementation, and you can deal with the issues around implementing it in some other language later. (Python even looks like the pseudocode in CLRS.)

Conclusion

I probably use CLRS the most, because of its breadth of coverage and because its pseudocode is so much easier to follow than Knuth’s. If I don’t understand the concept or the mathematical underpinnings, I’ll go to Knuth for the deep discussion, and the excellent prose style.

I just got Skiena recently, but in the future, I expect him to usurp CLRS somewhat. He covers most of the same material, but his prose style is more direct and his approach is more practical. Skiena is excellently organized and has a catalogue of problems and their algorithmic solutions in the second half of his book, good for browsing when you’re stumped.

I don’t use Dasgupta that much. Dasgupta was the text for my undergrad algorithms class, and while it was good in that capacity, it’s not really a book that continues to be useful once you’re past the first-semester course, mainly because of the lack of breadth in coverage and the eccentric organization and choice of topics.