CI / CD pipelines that run for each new and merged pull-request (or even for each commit on a branch) are today’s de-facto standard when it comes to developing software. Running the complete test suite automatically ensures that regressions are rapidly identified and do not make it into production.

As applications grow and their test suites grow with them (well, ideally, of course), build times tend to grow as well. Without care, these increasing build times can become a productivity bottleneck as the feedback loop provided by the automated CI/CD pipeline grows longer and longer. And yet I've observed that teams tend to get used to long build times (20 minutes or more) without realizing just how much time is spent waiting for the tests to pass. It's a bit of a "boiling frog" situation: the build times increase gradually and people simply get used to them. The trouble is that this has a potentially large impact on team productivity - waiting for the CI pipeline to finish before submitting a pull-request for review (because you wouldn't want to submit it if the tests fail), waiting for the pipeline to finish after merging to the master branch before deployment, and so on, is just wasted time - not to mention the context switch that inevitably occurs when you have to wait 20 minutes for a build to finish.

Over the years I’ve developed a few techniques that help decouple the overall build time from the number of tests. I’ve recently had a chance to put these techniques into practice (and to develop a few new ones) while helping to build a new payment system at MOIA. To say that testing is taken seriously at MOIA would be an understatement. The company’s engineering culture puts a very high emphasis on quality and reliability, which results in services having many tests of all kinds: unit tests, property-based tests, integration tests, cross-service end-to-end tests, load tests, you name it. And while this is of course laudable, it also means that the execution time of tests can grow significantly if not tamed.

This article covers the most important techniques for keeping build times low.

Do not wait unnecessarily

The biggest time sink is letting your tests idle longer than necessary. All those pauses add up, making your build take much longer than it should. Let’s take a look at a few common offenders.

Default poll times for Futures

This one is particularly relevant if you’re testing code involving Futures using a polling approach.

In ScalaTest, for example, there are two flavours of working with Futures: fully embracing the asynchronous style in asynchronous specs, or turning a Future into a result using the ScalaFutures trait, which provides methods such as whenReady, isReadyWithin and futureValue:

val response: Future[HttpResponse] = ...
response.futureValue.resultCode shouldBe 200

The futureValue construct is still quite popular as it allows writing sequential test code, which ends up being easier to maintain, so many teams will prefer that style over the asynchronous one. Using it requires an implicit PatienceConfiguration in scope (provided by default by the ScalaFutures trait), which specifies the polling interval and maximum timeout used by the methods above. Now, whilst the default values of a 150 millisecond timeout and a 15 millisecond polling interval might be sufficient in unit tests, this is not necessarily the case in tests with a larger scope (integration tests, for example).

Now, whatever you do, be very careful when altering the default values, especially the polling interval. For example, if one of the future invocations under test may take up to 15 seconds to execute (because you’re setting up some kind of database mock or similar), it might be tempting to define a scaled time span of 100 or to provide a trait for integration tests like the following:

trait IntegrationTest {
  implicit val patienceConfig = PatienceConfig(Span(15, Seconds), Span(1, Second))
}

If you do this, however, you’ve just made your overall test execution one or two orders of magnitude slower. A polling interval of 1 second (or of 1.5 seconds when using the scaled time span) means that Futures will only be checked for readiness at that interval - so if they are ready after 20 milliseconds, you’ll end up waiting an extra 980 ms or 1480 ms for no good reason.
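
A better alternative is to stretch the timeout while keeping the polling interval short - slow operations still get the time they need, but Futures that complete quickly are picked up almost immediately. A minimal sketch (the exact values are of course specific to your application):

import org.scalatest.concurrent.ScalaFutures
import org.scalatest.time.{Millis, Seconds, Span}

trait IntegrationTest extends ScalaFutures {
  // generous timeout for slow operations, short interval so that fast
  // Futures don't spend time waiting for the next poll
  implicit override val patienceConfig: PatienceConfig =
    PatienceConfig(timeout = Span(15, Seconds), interval = Span(15, Millis))
}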

Thread.sleep

Let’s admit it, this has happened to all of us. You run a piece of test code and notice it fails because X is not ready. X is a database schema update, a Kinesis stream setup, a DynamoDB table creation, or any other operation that doesn’t provide the capability of saying “yes, the resource you need is now really ready for use”. If you’re using LocalStack for testing systems that make use of AWS services, you’ve most likely noticed that created resources aren’t always ready right after the call that is supposed to create them returns.

Thread.sleep is a particularly appealing tool in this case because of its sheer simplicity. The issue however is that the value it needs to be called with is entirely arbitrary. It may take 100 milliseconds for a resource to get ready on your latest shiny MacBook Pro, or it may take up to 5 seconds on the CI environment that runs inside of a container on top of a VM where the schedulers end up allotting time slices of 50 milliseconds to each container.

Now I am not going to lie to you, ridding your test suite of Thread.sleep requires a decent amount of work - especially if you have many different kinds of resources to wait for. All I can say is that it really does pay off in the long run. The following techniques may help.

Use APIs to check for readiness

Let’s take the example of DynamoDB table creation in AWS. When you request the creation of a table, there’s going to be some time between the completion of the request call and the table actually being ready. You can check the status of a table using a DescribeTable call.

The following utility method can be used to check for the readiness of a set of tables that have just been created. Note that this is using the Java client of DynamoDB and that the tests have an Akka ActorSystem at their disposal:

// the code below assumes an SLF4J-style logger in the enclosing test class
// as well as the following imports:
import scala.concurrent.{Future, Promise}
import scala.concurrent.duration._
import scala.util.{Failure, Success}
import akka.Done
import akka.actor.ActorSystem
import com.amazonaws.AmazonWebServiceRequest
import com.amazonaws.handlers.AsyncHandler
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBAsync
import com.amazonaws.services.dynamodbv2.model.{DescribeTableRequest, DescribeTableResult, TableStatus}

protected def waitForTablesToBecomeReady(dynamo: AmazonDynamoDBAsync, tables: String*)(implicit system: ActorSystem): Future[Done] = {
  import system.dispatcher

  def loop(remainingTables: Set[String]): Future[Done] = {
    logger.debug(s"Wait for tables to become ready: ${remainingTables.mkString(", ")}")

    val futureDescriptions: Future[Set[DescribeTableResult]] =
      Future.sequence(
        remainingTables.map { table =>
          futureOf[DescribeTableRequest, DescribeTableResult](dynamo.describeTableAsync, new DescribeTableRequest(table))
        }
      )

    futureDescriptions.flatMap { descriptions =>
      // only keep those tables that don't have an ACTIVE status yet
      val newRemaining = descriptions.filter(_.getTable.getTableStatus != TableStatus.ACTIVE.toString).map(_.getTable.getTableName)
      if (newRemaining.isEmpty) {
        Future.successful(Done)
      } else {
        // poll again later - the interval is rather large because this operation takes its time
        akka.pattern.after(250.milliseconds, system.scheduler)(loop(newRemaining))
      }
    }
  }

  loop(tables.toSet)
}

private def futureOf[X <: AmazonWebServiceRequest, T](call: (X, AsyncHandler[X, T]) => java.util.concurrent.Future[T], req: X): Future[T] = {
  val p = Promise[T]()
  val h = new AsyncHandler[X, T] {
    def onError(exception: Exception): Unit = {
      p.complete(Failure(exception))
      ()
    }

    def onSuccess(request: X, result: T): Unit = {
      p.complete(Success(result))
      ()
    }
  }
  call(req, h)
  p.future
}

(kudos to MOIA for allowing me to make this code available)

As you can see, this type of approach is slightly more involved than simply calling Thread.sleep(2000) - and yet if you’re lucky and your tables are ready within 250 milliseconds, you will end up waiting a lot less at each table creation.
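
In a test, this could then be combined with ScalaFutures along these lines (a sketch - the table names and the createTables helper are hypothetical placeholders for your own set-up code):

// hypothetical set-up: request the tables, then block until they are ACTIVE
createTables(dynamoClient, "payments", "ledger")
waitForTablesToBecomeReady(dynamoClient, "payments", "ledger").futureValue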

Polling

A more abstract variation of the case above is to poll until a condition is true. This is often needed in integration tests where a component won’t publish an event when a state (or intermediate state) is reached. In this case, the following technique is quite useful:

// assumes an implicit ExecutionContext, a ScalaTest patienceConfig and an
// actorSystem are in scope in the surrounding test suite
def pollUntilTrue(condition: () => Future[Boolean], waitedAlready: FiniteDuration = Duration.Zero): Future[Done] =
    if (waitedAlready.toMillis >= patienceConfig.timeout.millisPart) Future.failed(new TimeoutException)
    else
      condition() flatMap {
        case true => Future.successful(Done)
        case false =>
          akka.pattern.after(patienceConfig.interval, actorSystem.scheduler) {
            pollUntilTrue(condition, waitedAlready + patienceConfig.interval.millisPart.millis)
          }
      }
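
An integration test could for example use this to wait until a record written by the system under test becomes visible (a sketch - tableContainsRecord is a hypothetical helper around your DynamoDB client):

// poll until the record shows up, failing with a TimeoutException once
// patienceConfig.timeout is exceeded
pollUntilTrue(() => tableContainsRecord(dynamoClient, "payments", paymentId)).futureValue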

Parallelize test execution

This never gets old

Setting this up is likely going to have one of the largest impacts on build execution time you can possibly get. I’m not talking about running unit tests in parallel - most test frameworks do this by default - but about running the various suites in parallel. This usually has the largest impact for integration tests, although I’ve also used it for unit tests (because the execution speed of the parallelized integration tests got faster than the execution speed of the unit tests).

So far I’ve had the most success using this technique with CircleCI; there’s a useful blog post detailing the process. Note that the post describes the process for unit tests; if you want to use it for integration tests (where it has the largest impact), you just need to use the IntegrationTest scope:

val printIntegrationTests =
  taskKey[Unit]("Print full class names of integration tests to the file `integration-test-full-class-names.log`.")

printIntegrationTests := {
  import java.io._

  println("Print full class names of integration tests to the file `integration-test-full-class-names.log`.")

  val pw = new PrintWriter(new File("integration-test-full-class-names.log"))
  // replace "theProject" with your project name below
  (theProject / IntegrationTest / definedTests).value
    .sortBy(_.name)
    .foreach { t =>
      pw.println(t.name)
    }
  pw.close()
}

And then alter your .circleci/config.yml to split the integration tests across several containers:

...
  - run: sbt printIntegrationTests
  - run: sbt "it:testOnly  $(circleci tests split --split-by=timings --timings-type=classname integration-test-full-class-names.log | tr '\n' ' ')
...

Don’t forget to change the parallelism in the configuration once you’ve integrated those changes:

integration-test:
      ...
      parallelism: 8

Tune mock configurations for test performance

When writing integration tests, some of the tests will be aimed at checking reliability in the face of failure whilst others will be aimed at testing the normal flow of execution. Presumably, you’ll want to abstract the failure-tolerance mechanisms such that they can be reused by many components of your application, in which case the coping mechanism doesn’t need to be tested in each component test.

Some mocking tools, such as LocalStack, make it possible to inject random failures into service calls. Make sure to turn this off for tests that are not aimed at checking recovery mechanisms:

KINESIS_ERROR_PROBABILITY: 0.0
DYNAMODB_ERROR_PROBABILITY: 0.0
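
If you start LocalStack via Docker Compose, this could look roughly as follows (a sketch - the service list, image tag and port mapping are assumptions that depend on your setup):

localstack:
  image: localstack/localstack
  environment:
    SERVICES: kinesis,dynamodb
    KINESIS_ERROR_PROBABILITY: 0.0
    DYNAMODB_ERROR_PROBABILITY: 0.0
  ports:
    - "4566:4566"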

Tune connectors for test performance

Another source of unnecessary waiting is caused by what I call connectors, which is to say components that interface with some type of external resource: database clients, Kafka clients, HTTP clients, you name it. You’ll likely have a number of them in any application and their default configuration is optimized for production.

Tuning these configurations pays off, especially for connectors that employ polling strategies to fetch results. You’ll need to spend a bit of time understanding how the connector works in order to tune the right parameters (which you should’ve done anyway, since you’re using it).

Taking the example of the reactive-kinesis client, it is worth checking out the default configuration and looking at anything that resembles a polling interval, timeout or graceful shutdown interval (beware, do this for tests only):

kinesis {
  default-consumer {
    kcl {
      idleTimeBetweenReadsInMillis = 100
      shardSyncIntervalMillis = 500
      parentShardPollIntervalMillis = 500
      shutdownGraceMillis = 1000
    }
    checkpointer {
      notificationDelayMillis = 100
      intervalMillis = 100
      backoffMillis = 0
    }
    worker {
      shutdownTimeoutSeconds = 5
      gracefulShutdownHook = false
    }
  }
}
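
To make sure such overrides only apply to tests, one option (a sketch, assuming the Typesafe config library and a forked test JVM - the file name is a placeholder) is to point the integration tests at a dedicated configuration file:

// build.sbt: load a test-only configuration containing the tuned values above
IntegrationTest / fork := true
IntegrationTest / javaOptions += "-Dconfig.resource=application-it.conf"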

CI performance improvements

Make sure to know your CI tool well, as there are many ways to increase the overall build speed.

This should go without saying, but if you are using a container-based CI pipeline, make sure to use images that have a small footprint. All those additional megabytes flying across the network take time to download and to uncompress.

If you use CircleCI, you can use dependency caching to cache the .ivy2 and .sbt directories and avoid downloading artifacts at every build.
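
A minimal sketch of such a caching setup (the cache key and paths are assumptions and depend on your project layout):

# .circleci/config.yml (excerpt): reuse resolved dependencies between builds
- restore_cache:
    keys:
      - sbt-deps-v1-{{ checksum "build.sbt" }}
      - sbt-deps-v1-
- run: sbt it:compile
- save_cache:
    key: sbt-deps-v1-{{ checksum "build.sbt" }}
    paths:
      - ~/.ivy2/cache
      - ~/.sbt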

If you’re using a pipeline / workflow-based CI, be careful not to overdo the number of pipeline steps - setting up each container has a cost and the extra granularity is not necessarily worth it.

Optimizing compilation speed

For Scala-based projects that grow large, compilation speed can become an issue. There are various techniques for improving it. I’ve personally never had the time to dig into this type of optimization, and since there’s a drop-in replacement for the compiler that does just that, I’ve always ended up recommending that one.

Use dedicated hardware

Finally, if your tests are still too slow, consider running part of them on dedicated hardware (and I mean hardware, not virtual machines). I’ve used Hetzner for this in the past. One nice optimization you can do with a real server is to set up a ramdisk for the build directories (which you back up regularly, of course, and load from disk at startup). For builds or tests that do intensive disk I/O, this dramatically reduces the overall execution time.
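
Setting up such a ramdisk could look roughly like this (a sketch - mount point, size and backup location are placeholders):

# mount a tmpfs ramdisk for the build directory (contents are lost on reboot)
sudo mkdir -p /mnt/build-ram
sudo mount -t tmpfs -o size=16G tmpfs /mnt/build-ram

# restore the last backup at boot, and back the directory up periodically
rsync -a /var/backups/build-ram/ /mnt/build-ram/
rsync -a /mnt/build-ram/ /var/backups/build-ram/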

And that’s it for now. If you’re aware of more useful techniques, please let me know, I’d love to add them here!