Akka anti-patterns: Java serialization

Akka makes use of serialization whenever messages cross JVM boundaries. This happens mainly in two scenarios: sending messages over the network when using Akka Cluster (do not use Akka Remote directly), or persisting messages when using Akka Persistence.

Now here’s the catch: the default serialization technology configured in Akka is nothing but the infamous Java serialization, which Mark Reinhold called a “horrible mistake” and which Oracle plans to dump in the near future anyway.

When using Java serialization, you get warning messages in your logs:
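
The exact wording varies between Akka versions, but the warning looks roughly like this (the message class name is illustrative):

    Using the default Java serializer for class [foo.bar.SomeMessage] which is not recommended
    because of performance implications. Use another serializer or disable this warning using
    the setting 'akka.actor.warn-about-java-serializer-usage'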

That’s nice, but it’s not enough – warnings are, after all, easy to ignore (and they can be turned off, which can be tempting).

The Akka documentation discourages the use of Java serialization – there is a note in the serialization documentation, though it mainly points out the performance issue.

Now, performance is one reason not to use Java serialization – but there are stronger reasons to stay away from it, especially in the context of Akka:

  • security: Java serialization has been a known attack vector for some time now
  • schema evolution: when using Akka Persistence for event sourcing, you will run into situations in which you need to evolve the stored messages (unless the project never makes it to production and the requirements never change). Java serialization is ill-suited for dealing with changes in message encoding; there are ways to remain backwards compatible, but they are cumbersome at best. If you don’t plan ahead and have a running production system on Java serialization, you’re in for a bad surprise: you’ll have to switch serialization technology mid-flight in a running production system, which needs to be coordinated very carefully
  • message protocol evolution: when you run a clustered application and redeploy it in a rolling fashion (i.e. upgrading one node after the other), you’ll have nodes running different protocol versions at the same time. There is no silver bullet for this – for incompatible changes, you’ll likely need to build message adapters anyway – but for more common changes such as adding or renaming a field, binary protocols make life a lot easier than Java serialization (see the sketch after this list)
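
To make the last point concrete, here is a sketch (with made-up message and field names) of the kind of change that is trivially backwards compatible in Protocol Buffers but painful with Java serialization:

    // v1 of the message, as deployed on the old nodes:
    //
    //   message AssignWork {
    //     required string workId = 1;
    //   }

    // v2: the new field is optional and gets a fresh tag number. Old nodes
    // simply skip tag 2 when decoding; new nodes fall back to the default
    // value when the field is absent.
    message AssignWork {
      required string workId = 1;
      optional int32 priority = 2 [default = 0];
    }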

What to do instead?

When starting a new project, the first thing you should do is to disable Java serialization by adding this line to application.conf:
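
In recent Akka versions, the relevant setting is the following (a sketch – check the serialization documentation of your Akka version for the exact name):

    akka.actor.allow-java-serialization = off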

Next, you will need a binary serialization protocol. You really don’t want to use JSON – unless, as Martin Thompson puts it, you are the type of person who enjoys load-testing their garbage collector. There are quite a few binary protocols around by now, among them Protocol Buffers, Avro and Thrift.

Martin Kleppmann wrote a nice comparison of Avro, Protocol Buffers and Thrift with a focus on message evolution. Binary protocols take different approaches – some tag each field individually, others rely on a full schema – and both approaches have their pros and cons, so they really need to be evaluated in light of your project and organizational structure.

Akka uses Protocol Buffers internally, which has the advantage that integrating them is quite straightforward. For example, if you use ScalaPB to generate case classes from the Protocol Buffer definitions, you only need the following configuration to get things to work:
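
A sketch of what this looks like in application.conf, relying on the fact that all ScalaPB-generated case classes extend scalapb.GeneratedMessage and binding that type to Akka’s built-in protobuf serializer:

    akka {
      actor {
        serializers {
          proto = "akka.remote.serialization.ProtobufSerializer"
        }
        serialization-bindings {
          "scalapb.GeneratedMessage" = proto
        }
      }
    }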

One more thing: with the rise of Akka Typed, the best practice of defining one protocol per actor (instead of sharing messages) becomes all the more relevant. One approach I’ve used successfully in a number of projects (in combination with ScalaPB) follows this pattern:

  • define one protocol file for each actor
  • the proto file is placed in the same directory structure as the package structure of the actor, e.g. src/main/protobuf/foo/bar/SomeActorProtocol.proto for the actor foo.bar.SomeActor
  • enable the right ScalaPB options so that a single Scala source file is generated for all of the actor’s messages

For example, the SomeActorProtocol.proto file would look as follows:
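
A sketch – the trait, message and field names are made up for illustration; single_file and preamble are the ScalaPB options that matter here:

    syntax = "proto2";

    package foo.bar;

    import "scalapb/scalapb.proto";

    // generate everything into a single Scala source file, with a sealed
    // trait in the preamble that all messages of this actor extend
    option (scalapb.options) = {
      single_file: true
      preamble: "sealed trait SomeActorProtocol"
    };

    message DoSomething {
      option (scalapb.message).extends = "SomeActorProtocol";
      required string payload = 1;
    }

    message SomethingDone {
      option (scalapb.message).extends = "SomeActorProtocol";
      required bool success = 1;
    }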

That’s it! If you now realize that you’ve been doing it all wrong and are likely to run into trouble with Java serialization, I invite you to check out the Akka anti-patterns overview. Chances are that it won’t lift your mood, but you might pick up a useful thing or two.

7 Comments

  1. I’m curious how you do your serialization of the typed ActorRef. You have defined the sealed traits in the preamble, so it seems like you are skipping creating a serializer using something like SerializerWithStringManifest and using the generated case classes from scalapb directly throughout your actors. Is that correct? If so, how do you go about unmarshaling the resulting typed ActorRef, since it’s a String in the protobuf message?

  2. Pingback: Aecor — Purely functional event sourcing in Scala. Part 3 – Vladimir Pavkin

  3. Hi Manuel, thanks for another great blog post. While I completely agree on not using Java serialization, I’m still looking for the definitive answer on “JSON vs. Protobuf”. My understanding is that in most use cases e.g. the Jackson JSON serializer is more or less on par with Protobuf, and that JSON serialization is not likely to become the bottleneck. JSON has some things going for it, too, like human-readability. So I wonder what your assessment – only use JSON if you want to load-test the garbage collector – is based on. Could you point me to an article explaining the drawbacks of JSON, e.g. regarding memory consumption?

    1. Post author:

      Hi Lutz! I don’t have a specific article I could point to. I could probably quote Martin Thompson on this, though – I’m sure he must’ve said something along those lines in one of his talks. If you think about it, it makes intuitive sense: with protobuf you can encode data in a much smaller number of bytes than with JSON. With binary formats, the data sits at known positions, whereas with JSON you encode and send around the names of the various fields, which then also have to be GCed eventually. So for each JSON record you send, there’s automatically overhead / garbage being created. Not to mention that there are quite a few tricks that can be applied in binary formats to speed up reads – e.g. if you only want to read a specific field, you don’t need to evaluate the entire record, since you know where the data should be. That’s not the case for JSON-based records.

  4. Hello guys! How have you been handling Akka Typed ActorRef[T] serialization with ScalaPB? I’ve tried some things and got to a solution, but I didn’t like it very much. I’ve used the ActorRefResolver to map ActorRef[T] => String, but to do this it needs an ActorSystem. If it weren’t for that, I could’ve created a ScalaPB TypeMapper for ActorRef[T].

    My solution involves creating some traits to define an interface with a method to convert a proto command class to an external command class with the deserialized ActorRef[T].

    Any help would be appreciated!

    1. Hi Bruno Guilhermo,

      I have the same problem. My command has an ActorRef as a parameter, but with ScalaPB it has to be a string.

      message WGCommand {
        oneof sealed_value {
          EnqueueWorkDone enq = 1;
          QueueFull qF = 2;
        }
      }

      message AssignWork {
        required Work work = 1;
        required string actorRef = 2; // serialized actorRef
      }

      “My solution involves creating some traits to define an interface with a method to convert a proto command class to an external command class with the deserialized ActorRef[T].”

      I want to understand this portion: from a proto command, how can your external command deserialize the actorRef without the actual ActorSystem?
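
For reference, a minimal sketch of the ActorRefResolver round-trip discussed in this thread – ActorRefResolver lives in akka.actor.typed and, as noted above, it does require an ActorSystem; the names below are illustrative:

    import akka.actor.typed.{ActorRef, ActorRefResolver, ActorSystem}

    object ActorRefSerialization {

      // the resolver is obtained from the typed actor system – this is why
      // ActorRef[T] (de)serialization cannot be done without a system in scope
      def serialize[T](system: ActorSystem[_])(ref: ActorRef[T]): String =
        ActorRefResolver(system).toSerializationFormat(ref)

      // turns the string field of the proto command back into a typed ActorRef
      def deserialize[T](system: ActorSystem[_])(str: String): ActorRef[T] =
        ActorRefResolver(system).resolveActorRef[T](str)
    }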
