Akka anti-patterns: Java serialization
Contents
Akka makes use of serialization when messages leave the JVM boundaries. This can happen in mainly two scenarios: sending messages over the network when using Akka Cluster (do not use Akka Remote directly) or using Akka Persistence.
Now here’s the catch: the default serialization technology configured in Akka is nothing but the infamous Java serialization, which Mark Reinhold called a “horrible mistake” and which Oracle plans to dump in the near future anyway.
When using Java serialization, you get warning messages in your logs:
[WARN] [03/29/2016 16:53:40.137] [LookupSystem-akka.remote.default-remote-dispatcher-8]
[akka.serialization.Serialization(akka://LookupSystem)] Using the default Java serializer for class
[sample.remote.calculator.Subtract] which is not recommended because of performance implications.
Use another serializer or disable this warning using the setting 'akka.actor.warn-about-java-serializer-usage'
That’s nice, but it’s not enough - they are, after all, easy to ignore (and the warnings can be turned off - which can be tempting).
The Akka documentation discourages the use of Java serialization - there’s a note in the serialization documentation that mainly points out the performance issue:
“It is highly encouraged to disable java serialization, so please plan to do so at the earliest possibility you have in your project. One may think that network bandwidth and latency limit the performance of remote messaging, but serialization is a more typical bottleneck.”
Now, performance is one reason to not use Java serialization - but there are stronger reasons to not engage with it, especially in the context of Akka:
- security: Java serialization is a known attack vector for some time now
- schema evolution: when using Akka Persistence for event sourcing, you will encounter situations in which you need to evolve the stored messages (unless you don’t use the project in production and the requirements never evolve). Java serialization is ill-suited for dealing with changes in message encoding, there are ways to be backwards compatible but they are cumbersome at best. If you don’t plan ahead and have a running production system with Java serialization, you’re in for a bad surprise - you’ll have to switch the serialization technology mid-flight in a running production system, which needs to be very carefully coordinated
- message protocol evolution: when you’re running a clustered application and want to redeploy it in a rolling fashion (i.e. upgrading one node after the other), you’ll be faced with having nodes at different protocol versions running. There is no silver bullet for this - for incompatible changes, you’ll likely need to build message adapters anyway - but for more common changes such as adding or renaming a field, binary protocols make life a lot easier than Java serialization
What to do instead?
When starting a new project, the first thing you should do is to disable Java serialization by adding this line to application.conf
:
akka.actor.allow-java-serialization = off
Next, you will need a binary serialization protocol. You really don’t want to use JSON, unless you are the type of person who enjoys load-testing their garbage collector as Martin Thompson puts it. There are quite a few around by now:
- Avro
- Protocol Buffers
- Thrift
- Kryo
- SBE (if you care about high performance)
Martin Kleppmann wrote a nice comparison of Avro, Protocol Buffers and Thirft with a focus on message evolution. There are different approaches in binary protocols, the ones that use a per-field approach and the ones that go for a full schema - both approaches have their pros and cons and really need to be evaluated in the light of your project and organizational structure.
Akka uses Protocol Buffers internally, which has the advantage that integrating those is quite straightforward. For example, if you use ScalaPB to generate case classes from the Protocol Buffer definitions you only need the following configuration to get things to work:
akka {
actor {
# repeated here, because you really should have this line in your project
akka.actor.allow-java-serialization = off
# which serializers are available under which key
serializers {
proto = "akka.remote.serialization.ProtobufSerializer"
}
# which interfaces / traits / classes should be handled by which serializer
serialization-bindings {
"scalapb.GeneratedMessage" = proto
}
}
}
One more thing: with the rise of Akka Typed, the best practice of defining one protocol per actor (instead of sharing messages) becomes even the more relevant. As such, one approach I’ve used successfully in a number of projects now uses the following pattern (in combination with ScalaPB):
- define one protocol file for each actor
- the
proto
file is placed in the same directory structure as the package structure of the actor, e.g.src/main/protobuf/foo/bar/SomeActorProtocol.proto
for the actorfoo.bar.SomeActor
- enable the right options to have only one scala source file generated for all actor messages
For example, the SomeActorProtoco.proto
file would look as follows:
|
|
That’s it! If you realize now that you’ve been doing it all wrong and are likely going to run into trouble with Java serialization, I invite you to check out the Akka anti-patterns overview. Chances are that this won’t lift your mood, but you might find out one useful thing or two.