A handful Akka techniques

Published on 23 April 2014

I’ve been using Akka for 3 years in a variety of projects and by now, I have a hard time imagining to deal with some of the parts of my work without it. Sure, Scala provides other powerful paradigms for dealing with concurrency, but I find Actors to be one of the most elegant concept when it comes to reasoning about it.

There are a few techniques that I’ve used repeatedly in projects and that I would like to share. Make no mistake though – I am by no means an Akka expert, so some of them may turn out to be sub-optimal techniques or even anti-patterns – use them at your own risk, and make sure you understand their limitations. Also, if you haven’t done so yet, I would by all means recommend to sit down with a fresh cup of coffee and read the excellent Akka documentation.

The cuckoo clock

Cukoo Clock

I’m using this technique mainly in Play framework projects. The first version of the framework had a mechanism to schedule tasks repeatedly, and it dropped that mechanism in version 2 with the recommendation of using Akka’s schedulers.

The main idea behind this technique is to encapsulate the logic of a task that needs to be done repeatedly or at a certain time in one actor, and to wake the actor up by sending it a message:

Let’s look at how this mechanism works in detail:

Now, one last thing we need to do is to set the mechanism in motion. For a Play application, a good place for this would be the infamous Global:

And that’s it! Our cukoo clock is ready to go off every 5 minutes.

Note: the scheduler API is well-suited for tasks that involve repetition, less so for tasks that need to go off at certain time or date. It is however possible to achieve this with the API by calculating the duration between the scheduler initialization and the planned time. For these cases, I’m making use of the nscala-time library, which is a wrapper around the excellent Joda Time library.

Batch processor

One thing I find myself doing rather often with Akka is to massively parallize a given task, which often involves data crunching of sorts. This can be a simple set-up that only involves one supervisor and many workers of the same kind, or be something more complex involving a pipeline of sorts.

One supervisor and its army of workers

One supervisor and its army of workers

Multi-step structure

Multi-step structure

In most cases this kind of pattern involves a producer (a database such as MongoDB, MySQL, BaseX, Amazon SimpleDB; a multi-gigabyte XML file, etc.) from which items need to be picked and sent off for processing.

Let’s look at an example of a simple supervisor – workers chain. We’ll make a few assumption for this example:

Note: The code is intentionally left abstract, as we’re not interested in how data is being fetched from the source or processed, but about what is done with it.

Let’s go through the example step by step:

The method depicted above is of course very generic. In practice, the process often needs to be fine-tuned or can be adapted to be more performant for the specific use-case at hand. The nature of the data source often dictates how the details work. However, this is one of the things I like about Akka – it only provides the bare-bones infrastructure and leaves it up to the developer to choose which mechanism will be used for implementing the pipeline. Some improvements that we’d probably need in a real-life application include:

Poor man’s back-pressure

If you are a member of the Akka core team, skip this section. For the rest of humankind, this topic relates directly to one assumption we’ve done in the previous example: “our JVM has a decent amount of memory available so that the workers are able to queue all items in their mailbox”. In reality, this may not always be the case. However, if you continue to fetch data from a producer and pass it on to the workers, you will eventually run out of memory: by default, Akka uses an UnboundedMailbox which will keep on filling up.

In the example above, this problem is easy to fix, because we’re only ever passing on to the next batch when all items of the current one have been processed, which means that choosing an appropriate batch size is sufficient. However, we may have chosen a different implementation which pulls data out of the source continuously, because doing so is somewhat more expensive, or because you’re interested in running the system at maximum capacity.

In such a situation, we need a means to tell the supervisor to slow down. This concept is also known as back-pressure, and I invite you to read how to properly implement it. Sometimes, however, you may not be interested in optimally solving the back-pressure problem, and in this case, it may be ok to violate Akka’s prime directive: don’t block inside an actor.

Seriously?

It may not be optimal, nor elegant, but I’ve found this mechanism to work pretty well, provided that you have a rough idea of the memory limitation of the application at hand and that you’re not interested in scaling dynamically.

In order to be able to slow down appropriately, we need to decouple the supervisor and the data producer. This is how an implementation may look like:

So how does this work? Let’s have a closer look:

Some comments about this technique:
– as said, you need to have a good sense of what the maximum should be. This mechanism is everything else than dynamic and won’t work if items are highly heterogenous (in the sense that processing them does take very different amounts of time)
– it may be best to wait for a bunch of items to be processed before requesting a new batch, depending on the case at hand
– the error handling in the example above would need to be improved for a real-life usage

There are many other mechanisms to deal with work distribution that can easily be implemented with Akka. One other mechanism I’d like to try out when I get a chance is work stealing, wherein the workers take an active role of retrieving work instead of having it pushed to them.

Gotchas

During my time with Akka I’ve run in a few gotchas. They mostly amount to not having read the documentation well enough, but I’ve seen other do some of those mistakes, so I think they might be useful to list here.

Create too many root actors

Root actors, created directly via the ActorSystem instead of the context, should be used with care: creating such actors is expensive as it requires to synchronize at the level of the guardian. Also, in most cases, you don’t need to have many root actors, but rather one actor for the task at hand that itself has a number of children.

Closing over a future

This is an easy reasoning mistake to make and is explained in detail in this post. Let’s consider this example:

So what’s the problem? computeResult produces a Future, thus it is switching to a different execution context and liberating the main thread. This means that another ComputeResult message may come in, replacing the sender reference. When the first future completes, we would then answer to the wrong sender.

An easy fix for this is to capture the sender in a val like so:

Misinterpreting the meaning of Future.andThen

This is mainly related to Futures rather than Actors, but I’ve encountered it while mixing both paradigms. The Future API has a method called andThen “used purely for side-effecting purposes”. In other words, it won’t transform a future and chain itself to it, but rather be executed independently.

I was wrongly assuming that andThen would return a new, transformed, Future, so I tried doing the following:

pipeTo is a quite powerful tool in Akka’s toolkit which allows to send the result of a future to an actor (under the hood it creates an anonymous actor that waits for the future to complete).

The correct implementation would be:

Conclusion

In conclusion, I would like to re-iterate that Akka is a real useful tool to have in your toolset, and I think it’s going to stay around for a bit longer, so I can only advise anyone who hasn’t done so yet to go and try it out (even if you’re in Java-land, there’s a Java API as well). Another nice paradigm that is getting standardised as we speak are reactive streams, which deal with back-pressure as integral part of the concept.


Liked this post? Subscribe to the mailing list to get regular updates on similar topics.

4 Comments

  1. Great Article, one small tip, instead of


    object Global extends GlobalSettings {

    override def onStart(app: Application) {
    Akka.system.actorOf(Props(new ScheduledOrderSynchronizer), name = "orderSynchronizer")
    }

    You should be doing below, because the above has been deprecated as of 2.2.x, and potentially unsafe
    http://doc.akka.io/docs/akka/2.2.4/project/migration-guide-2.1.x-2.2.x.html


    object Global extends GlobalSettings {

    override def onStart(app: Application) {
    Akka.system.actorOf(Props[ScheduledOrderSynchronizer], name = "orderSynchronizer")
    }

    If you need to pass arguments to ScheduledOrderSynchronizer’s constructor then


    object Global extends GlobalSettings {

    override def onStart(app: Application) {
    Akka.system.actorOf(Props(classof[ScheduledOrderSynchronizer], arg1, arg2), name = "orderSynchronizer")
    }

    Or even better yet, as recommended in
    http://doc.akka.io/docs/akka/2.2.4/scala/actors.html


    object DemoActor {
    /**
    * Create Props for an actor of this type.
    * @param name The name to be passed to this actor’s constructor.
    * @return a Props for creating this actor, which can then be further configured
    * (e.g. calling
    .withDispatcher() on it)
    */
    def props(name: String): Props = Props(classOf[DemoActor], name)
    }

    class DemoActor(name: String) extends Actor {
    def receive = {
    case x ? // some behavior
    }
    }

    // And then in Global

    Akka.system.actorOf(DemoActor.props("hello"))

Leave a Reply