Akka anti-patterns: being out of touch with the hardware

Choosing Akka as a tool is often - if not always - driven by the need for good performance. Surely, the actor model itself is appealing as a means for organizing and reasoning about code, but this isn’t in itself a good reason enough to use the Akka toolkit. If all you are concerned about is a nice way to organize code and build modular applications you might as well pick the Spring Framework which has a very rich and clear component model and provides very good support for building software where performance isn’t one of the driving factors.

If we’re concerned with performance, arguably one of the most important aspect to be concerned about is the hardware on which we run our software. Which is why the following observation I keep on making in the context of Akka project strikes me as particularly relevant and important to mention as part of the Akka anti-patterns series: knowing the target hardware really matters when it comes to building and deploying Akka applications.

Or, as Martin Thompson puts it,

Hardware tries so hard to make software fast; software tries so hard to make hardware slow.— Martin Thompson (@mjpt777) May 5, 2016

At the core is the amount of available processors

Dispatchers is what makes Akka actors tick. They’re like a magic giant squid handing threads over to actors for a limited amount of messages (and optionally, time) - a bit like Cthulhu, but nicer.

And in order to understand why knowing your hardware is a pretty big deal when it comes to configuring your Akka application, you will first need to understand how that friendly Cthulhu works. Because not tuning your dispatchers to your application and hardware is like using a relational database without indexes - it will work, but boy could it work faster if you were adding those few important indexes!

The most popular dispatcher is backed by the ForkJoinPool, the most important bit of configuration for this type of dispatcher looks as follows:

1
2
3
4
5
6
7
8


fork-join-executor {
    # Min number of threads to cap factor-based parallelism number to
    parallelism-min = 2
    # Parallelism (threads) ... ceil(available processors * factor)
    parallelism-factor = 2.0
    # Max number of threads to cap factor-based parallelism number to
    parallelism-max = 10
  }

What’s particularly important here is the setting of parallelism-factor used in the following calculation to figure out how many threads there should be in the pool by default:

1

ceil(available processors * factor)

In other words, we round up the multiple of the parallelism-max factor by the amount of available processors. But what’s an available processor?

Available processors is the number of processors available to the Java virtual machine. This value relates directly to the number of cores. If you have a quad-core CPU, then according to this definition the amount of available processor is 4. If you have two quad-core CPUs with hyperthreading, then the amount is 2 * 4 * 2 = 16.

Now, if all your Akka actors were homogenous, receiving homogeneous load (messages) and taking the same amount of CPU cycles to process a message, then you’d be best of to have just as many threads in your pool as you had CPU cores. The cores would always be as busy as possible and the overhead caused by the squid’s thread-juggling would be minimal. Chances are though that your actors aren’t homogenous, the loads they receive aren’t homogenous either and that as a result, if your program has more actors than there are threads in the pool, having just as many threads as there are cores may not be the most performant solution. Enter parallelism-factor which lets you tune how many threads there will be available by default in the pool and thus maximize the utilization of all of your cores.

That value, as you now may have guessed, cannot be guessed but has instead to be figured out by means of measuring the performance of your application under load.

And this is where knowing your hardware is really, really important. Here is a few things that you can get wrong.

Load-test on the target hardware

This seems plausible enough, but I’ve seen load-tests performed on staging environments that had very little in common with the production environment hardware. Needless to say, any configuration values that may work well on one machine may be fatal on another machine - amount of processors is a key value there, but also disk speed, network interfaces etc. play an important role.

Virtualization is your worst enemy

Sometime about 2007 the industry at large got into a virtualization frenzy that isn’t quite over yet. To the contrary, we’ve only now reached peak cloud, with even Oracle entering the race now (pretty much as the last big player).

The trouble with virtual hardware is, well, that you don’t know what your software really has at its disposal. If the hypervisor tells you that you have 8 cores, but in reality there are only 4 cores and they’re shared with two other VMs, then you’re kind of out of luck. This is especially a problem in corporate environments where operations may be done by an entirely different set of people than the ones writing the software. And even though “devops” ranks pretty high in the list of trending buzzwords, a lack of communication between the folks who write code and the ones that run the code is quite a problem when it comes to making the most out of the hardware.

Ergo, if you know that your brand new, shiny Akka application will be deployed on virtual hardware, make sure that you fight for dedicated resources as much as possible. At the very least, get the number of cores right.

Make your net work

It is not uncommon to forget or to take the network as granted, whilst having solid hardware on that end plays an important role.

When I built an Akka Cluster prototype on Raspberry PIs last summer, I had resolved to switching to wifi from LAN because I thought that the power supplies I was using for each PI wasn’t good enough - packets were dropped and under load the whole cluster just started to behave erratically. It took me up until a few weeks ago when I worked again with the same router to notice that I was using the wrong power supply for the router! Somehow the D-Link DIR-825 is pretty forgiving when you do this to it, until you start having more data to take care of. Note to self: clean-up and organize the networking hardware boxes.

If you build and load test a distributed application with Akka Cluster, then one of the first things you should be aware of is a scenario known as “split brain” wherein your nice cluster turns into separate islands not talking to each other. Under normal circumstances such a condition is caused by network partitions - think for example of a switch receiving a weird packet that causes it to question its fundamental perception of reality, not processing any more traffic as a consequence.

When you are load testing, however, you may be causing this condition by pumping so much traffic into your network that the control messages that Akka Cluster uses to determine network health no longer make it across the wire in time, causing nodes to form separate islands and leaving you wondering about your life choices (especially if you did not see this coming).

Here’s a few things you can do to mitigate this problem:

Remember that this is a load test, then prepare for real-life network partitions

The purpose of a load test is to stress the system until it starts showing odd behavior. If you manage to do this, great! If you can fix a few things while doing it, even better. In any case, don’t forget about that the question you are trying to answer is whether your system can withhold the load it was designed for. Does it hold the desired charge? Great! If not, throw in more hardware and iterate until you’re there.

You should however be ready for the day when that switch goes all emo over a weird packet. Make sure that you have set up an appropriate strategy for downing parts of the cluster that should go down. Only you know which ones these should be. And be aware of the fact that auto-downing of nodes with the default settings may not be the best strategy for most applications.

Load-test on the target hardware

Same as above. This is common sense, but I prefer to mention it. Always test on the target hardware - this includes the network.

There’s one tricky thing about networks that spices up the game: networks change over time. This is rule #5 of the fallacies of distributed computing: topology doesn’t change. Wrong! The network may change, and this may affect the way your application performs over load.

Virtualization is your worst enemy

Do you know what’s worse than having several VMs share the same underlying CPU? Having them share the same network adapter and wire. Especially if you’re load testing the application and flooding the network. What’s even more funny is if you’re load-testing an Akka Cluster application and all of your nodes are co-hosted as VMs on the same physical machine (or a small amount of machines). Make no mistake, this is not uncommon and kind of makes sense from the perspective of an operations team if they’re not told otherwise. The trouble of course is that the behavior under load may not quite be what you’d hope for.

Conclusion

“Know thy hardware”. When you’re working with Akka, you’re in the business of building high-performance applications and forgetting about hardware is not an option.

Also, hardware is nice:

Contents

At the core is the amount of available processors

Load-test on the target hardware

Virtualization is your worst enemy

Make your net work

Remember that this is a load test, then prepare for real-life network partitions

Load-test on the target hardware

Virtualization is your worst enemy

Conclusion