Understanding Dynamic Arrays and Linked Lists

Almost all sequential data structures are implemented using either dynamic arrays or linked lists. Both of them are linear structures that grow as you add new items, but sometimes you can really torpedo your program if you use the wrong one. Here I’ll talk about the differences between them, and when one or the other makes sense.

Arrays: Like the Cereal Aisle

Arrays are like the cereal aisle at the grocery store.

cereal-aisle

When you’re shopping and you see the cereal you want, it’s easy to just reach out and grab a box, no matter where it is on the shelf. If we want a box of Honey Nut Cheerios, we just find Honey Nut Cheerios and grab it; we don’t have to even glance at other cereals, and we don’t have to move anything or reorganize anything. As we’ll see later, linked lists do not have this property.

If we worked in the store and wanted to switch the positions of two cereals—say we wanted to swap Frosted Cheerios and Multi-Grain Cheerios—that’s also pretty easy. We just take out every box of Frosted Cheerios, put them on a cart, then take out every box of Multi-Grain Cheerios and put them in Frosted Cheerios’s former spot, then take the boxes of Frosted Cheerios off the cart and put them where Multi-Grain was.

On the other hand, it’s hard to add new things to the cereal aisle. Say that General Mills comes out with a new Crispy Baklava Cheerios, and we wanted to put it next to Honey Nut since they both have a honey theme to them. We’d have to move down whatever that is next to Honey Nut in the picture (Chocolate Cheerios?). But then to move down Chocolate, we’d have to move down what’s next to it, in the purple box. And to move down the purple boxes, we’d have to move down the blue boxes next to them, and so on. If the shelf has space left at the end, and we’re not too fussy about where the new cereal goes, it’s pretty fast to stick it in at the end. It won’t be with the other Cheerios, but maybe we can do a swap so that it’s at least in the Cheerios section, even if it’s not right next to Honey Nut. If the shelf is full, we just can’t do it; we have to replace something, or move everything to a new shelf with more space.

This is also like arrays. Arrays are fixed-size. If you run out of space, you have to allocate a new array with more space and move everything over to it. If you want to put something in the middle, you have to move down everything that comes afterward. Both of these things can end up being huge wastes of time if you do a lot of them.

There’s another way you could deal with arrays having a fixed size: make an array that’s way bigger than you think you’ll ever need. Then you shouldn’t ever run out of space. But this wastes memory; when you make an array of 200 elements to hold a string that contains someone’s first and last names, which usually comes out to less than 40 characters, you’re holding on to memory that could be used for other things. Sure, modern computers have lots and lots of memory, but there’s no need to squander it when there are better options, just for the off chance that you’ll have to handle Finnish names.

Dynamic arrays are arrays that deal with these problems automatically. They grow bigger as you add elements to them, so you don’t have to worry about running out of space. They also balance speed and memory usage; a typical dynamic array implementation sometimes has to allocate new chunks of memory, but it also leaves itself room to grow, while not squandering memory by holding on to tons of space that it won’t need.

C++’s STL vector, Java’s ArrayList, and Python’s list all use dynamic arrays. In Python, you can do this

ls = []
for k in range(1000000):
    ls.append(k)

and it isn’t horrendously slow or wasteful of memory, because Python uses a clever trick behind the scenes. (On my machine, that code runs in about half a second, and my machine is pretty second-rate.) For one thing, every time a Python list needs to expand, it doesn’t just allocate enough space for one more item; it allocates enough for a few. 1 That way we’re not wasting too much memory on taking space which might never be used. On the other hand, we don’t need to allocate new memory and move everything over as often. Since allocating and moving over are time-consuming, this makes a big difference. If we only allocated three extra spaces for each allocation, for example, we’d have to reallocate and move a third as often.

Python also saves on memory allocation by reusing blocks of memory. Sometimes, when a data structure isn’t used anymore in the program, Python doesn’t actually get rid of the memory it uses. Instead, Python puts the memory in a “recycling bin”. The next time it needs memory, it looks in the recycling bin to see if any of the memory blocks there are the size it needs. If so, Python just erases the data in that memory block and stores the new data in it.

C++ and Java use similar schemes to save space and time. If you look at the methods for the vector and ArrayList classes, you’ll find that some of them refer to both a size and a capacity. The size is the number of things currently stored in the array that backs up the structure. The capacity is the amount of memory space in that array. When size == capacity, it’s time to allocate a new block of memory and move everything over.

Linked Lists: Like being a Train Conductor

Linked lists are like being on a train.

4carLinkTrain

Say you’re the conductor on a train and you need to find a specific passenger because it turns out he’s a stowaway with fake papers and you need to throw him off. You don’t know exactly which car he’s in. You’re at the front of the train. In order to find this passenger, you have to walk back through the cars until you see him. He might be in the second car. He might be in the last car. In the worst case, he might have jumped the train a few stops back because he knew you were on to him, and you have to search the entire train for a passenger who’s not even there.

This is how it is searching in a linked list. Since the nodes are hooked together by pointers and are stored at arbitrary places in memory, you have no choice but to look at the data in every node, following the pointers along the list, until you find what you’re looking for or come to the end. It’s not like the cereal aisle where you can just scan for what you want, then grab it right off the shelf.

But being on a train has some advantages, too. Say our engine blows and we need a new locomotive. Just pull into the train yard, unhook the locomotive, drag it into the depot, and then drag on a new one. The caboose is also easy to replace. And if we want to add a new first-class sleeper car in the middle, that’s also not so bad; we can uncouple the cars at the point we want to insert the new car, drag them apart, and use the shunting rails to get the new car moved in between. Then we just couple the new car to both halves of the old train, and we’re done.

Inserting new items in the middle or at the front of a linked list is quick and easy, because all you have to to do is reassign some pointers. The pointers are like the train couplings in the example; you just have to reconnect them, and you have a new node at the center of the list.

Unlike a dynamic array, which can grow, but makes you pay the huge cost of reallocating and moving everything, a linked list can keep on growing, one node at a time, until it’s filled all of memory, without incurring too much of a penalty. You never have to move anything, because it’s easy to attach a new node at the end or wherever you want it.

Where you use them

The dynamic array types (Python’s list, Java’s ArrayList, and C++’s vector) are usually the workhorse sequential collection of their language. Linked lists are usually used for more specialized collections. Since it’s quick and easy to add things at the beginning or end, and you can keep growing them without moving anything, they’re good for stacks and queues.

Linked lists are also good because their structure is flexible. You can do all sorts of weird things with them. For instance, there is the circular linked list. This is a linked list whose final node has a pointer to the first node, so when you traverse the list, you end up at the beginning again when you’re done.

One advantage of doing things this way is that you reduce the risk of dereferencing a null pointer in languages like C and C++. Rather than checking whether the next node is null, you can check whether it’s the same as the head; if you accidentally access the null pointer while you’re doing this, your program will crash, but if you happen to access the head of the list it’s no big deal.

Another advantage of circular linked lists is how clean and quick it is to reuse space. Memory allocation is fairly expensive, so there might be times when you want to keep the same nodes around and just reassign their data. You can make a queue that reuses nodes with a circular linked list. Keep around pointers to the head and the node where the next item will be added. When you remove an item from the queue, just point the head pointer at the next node. Now your queue has another “empty” node at the end. When you insert, check that you’re not trying to insert at the head. This sort of deletion is called lazy deletion, because you haven’t really deleted anything; you’ve just marked it unused (by switching the position of the head pointer).

When it comes to weird things you can do with linked lists, though, nothing beats the skip list. But you’ll have to read about that on your own, because I’m out of time. I hope this post has shed some light on the differences between the two common linear data structures.

  1. See this blog for more on Python’s list implementation.
Advertisements