Tuning Akka

With a dose of idealism and realism

Jan Macháček @honzam399  |   Alex Lashford @alexlashford

Congratulations! You've made it!

The users love your system - interactions increase

100s → 1000s → 10,000s of users!

And you worry

The basics

  • Event-driven
  • Scalable

The basics

  • Your actors form hierarchies, and you have only a few top-level actors
  • You have defined supervisor strategies
  • When creating actors, you set the dispatcher, router and mailbox
  • You do not allow an actor's state to escape

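A minimal sketch of these basics in classic Akka (the actor names and the supervision decisions are made up for illustration): one top-level supervisor owns the workers and declares explicitly what happens when they fail.

```scala
import akka.actor._
import akka.actor.SupervisorStrategy._
import scala.concurrent.duration._

class Worker extends Actor {
  def receive = {
    case msg => // handle the message; never let mutable state escape the actor
  }
}

class WorkSupervisor extends Actor {
  // Explicit supervisor strategy: resume on benign errors, restart otherwise,
  // and give up on children that keep failing
  override val supervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1.minute) {
      case _: ArithmeticException => Resume
      case _: Exception           => Restart
    }

  private val worker = context.actorOf(Props[Worker], "worker")

  def receive = {
    case msg => worker forward msg
  }
}

object Main extends App {
  val system = ActorSystem("tuned")
  // Only a few top-level actors; the rest of the hierarchy lives below them
  val supervisor = system.actorOf(Props[WorkSupervisor], "work")
}
```
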
Tune to achieve

You are event-driven and therefore more easily scalable. You are using Akka properly, so you just need to tweak some settings to achieve the best responsiveness and resilience.

Mailboxes, dispatchers and routers

  • Dispatchers provide the threads on which your actors' code runs
  • A mailbox holds the messages waiting for an actor
  • Routers spread incoming messages across a number of actors (all three are wired together in the sketch below)

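A hedged sketch of setting all three at actor-creation time; the dispatcher and mailbox ids ("crunch-dispatcher", "bounded-mailbox") are assumed entries in your application.conf, and the actor is illustrative.

```scala
import akka.actor._
import akka.routing.RoundRobinPool

class CruncherActor extends Actor {
  def receive = {
    case work => // crunch numbers or strings here
  }
}

object Wiring extends App {
  val system = ActorSystem("tuned")

  // The routee Props carry the dispatcher and mailbox configuration;
  // the router then spreads incoming messages over eight such actors
  val cruncherProps = Props[CruncherActor]
    .withDispatcher("crunch-dispatcher")   // assumed config entry
    .withMailbox("bounded-mailbox")        // assumed config entry

  val crunchers = system.actorOf(RoundRobinPool(8).props(cruncherProps), "crunchers")
}
```
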
Classes of problems

  • Number or string crunching
  • I/O
  • Memory

Number or string crunching

  • Consumes the thread
  • Avoid context switching
  • Set the number of threads in the pool to match your cores (see the sketch below)

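A sketch of what such a dispatcher configuration might look like (the dispatcher name and throughput value are illustrative): size the pool to the machine's cores, and raise throughput so each actor processes a batch of messages before its thread moves on.

```scala
import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

object CrunchDispatcher extends App {
  val cores = Runtime.getRuntime.availableProcessors()

  val config = ConfigFactory.parseString(
    s"""
    crunch-dispatcher {
      type = Dispatcher
      executor = "fork-join-executor"
      fork-join-executor {
        parallelism-min = $cores
        parallelism-max = $cores
      }
      # up to 100 messages per actor before switching: fewer context switches
      throughput = 100
    }
    """)

  val system = ActorSystem("tuned", config.withFallback(ConfigFactory.load()))
}
```
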
I/O

  • Favour non-blocking APIs
  • Be aware of back-pressure
  • Carefully configure timeouts and remember to react to errors
  • If you must use blocking calls, bulkhead them (see the sketch below)

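A sketch of the bulkhead (the dispatcher id and pool size are placeholders to tune): blocking calls get their own fixed-size pool, so a stuck database can exhaust at most those threads, never the main dispatcher.

```scala
import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory
import scala.concurrent.{ExecutionContext, Future}

object Bulkhead extends App {
  val config = ConfigFactory.parseString(
    """
    blocking-io-dispatcher {
      type = Dispatcher
      executor = "thread-pool-executor"
      thread-pool-executor {
        fixed-pool-size = 16
      }
    }
    """)

  val system = ActorSystem("tuned", config.withFallback(ConfigFactory.load()))

  // Dedicated execution context just for blocking work
  implicit val blockingEc: ExecutionContext =
    system.dispatchers.lookup("blocking-io-dispatcher")

  def queryDatabase(sql: String): Future[Int] = Future {
    // a blocking JDBC call would go here; it can only ever
    // occupy the 16 bulkheaded threads
    Thread.sleep(50)
    42
  }
}
```
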
Memory

  • Use bounded mailboxes (BoundedMailbox, BoundedPriorityMailbox, BoundedControlAwareMailbox)
  • If your actor behaviour is processing a lot of data, consider using off-heap structures (direct ByteBuffers)
  • Memory / GC pressure will make your application die with a whimper, not a bang

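A sketch of a bounded mailbox in action (capacity and timeout are placeholders): once the mailbox is full, senders wait up to the push timeout instead of the queue, and the heap, growing without limit.

```scala
import akka.actor._
import com.typesafe.config.ConfigFactory

class Consumer extends Actor {
  def receive = {
    case msg => // process the message
  }
}

object BoundedDemo extends App {
  val config = ConfigFactory.parseString(
    """
    bounded-mailbox {
      mailbox-type = "akka.dispatch.BoundedMailbox"
      mailbox-capacity = 1000
      mailbox-push-timeout-time = 10ms
    }
    """)

  val system = ActorSystem("tuned", config.withFallback(ConfigFactory.load()))
  val consumer = system.actorOf(Props[Consumer].withMailbox("bounded-mailbox"), "consumer")
}
```
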
Sometimes things aren't ideal

Your code is reactive and tuned, but reality brings components that are:

  • not responsive
  • not resilient
  • not scalable


Divide your application not only by functional area, but also by the classification of the problem



It is far better for your system to know what its dependencies can cope with than to deal with the big bang

Not responsive

  • The system does not react to the messages as soon as they arrive
  • Blocking I/O
  • Synchronisation

Not resilient

  • Failures start and then never stop
  • Failures spread throughout the system

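One way to stop the spread is Akka's CircuitBreaker (the limits below are placeholders): after maxFailures consecutive failures the breaker opens and rejects further calls immediately, giving the broken dependency time to recover.

```scala
import akka.actor.ActorSystem
import akka.pattern.CircuitBreaker
import scala.concurrent.Future
import scala.concurrent.duration._

object Guarded extends App {
  val system = ActorSystem("tuned")
  import system.dispatcher

  val breaker = new CircuitBreaker(
    system.scheduler,
    maxFailures = 5,
    callTimeout = 2.seconds,
    resetTimeout = 30.seconds)

  // stand-in for a call to a flaky remote dependency
  def flakyCall(): Future[String] = Future { "ok" }

  // while the breaker is open, this fails fast with CircuitBreakerOpenException
  val result: Future[String] = breaker.withCircuitBreaker(flakyCall())
}
```
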
Not scalable

  • Cannot deal with the load you are putting on it
  • Cannot report any back-pressure
  • Costs real money to run under extreme load

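When a dependency cannot report back-pressure itself, you can impose it at the actor level. A sketch of work pulling (all names are made up): the master hands out work only when a worker asks for it, so a slow worker throttles the flow instead of overflowing its mailbox.

```scala
import akka.actor._
import scala.collection.mutable

case object GiveMeWork
final case class Work(payload: String)

class Master extends Actor {
  private val pending = mutable.Queue.empty[Work]
  private val idle    = mutable.Queue.empty[ActorRef]

  def receive = {
    case w: Work if idle.nonEmpty       => idle.dequeue() ! w
    case w: Work                        => pending.enqueue(w)
    case GiveMeWork if pending.nonEmpty => sender() ! pending.dequeue()
    case GiveMeWork                     => idle.enqueue(sender())
  }
}

class PullingWorker(master: ActorRef) extends Actor {
  master ! GiveMeWork   // announce readiness on startup

  def receive = {
    case Work(payload) =>
      // process the payload, then ask for the next unit of work
      master ! GiveMeWork
  }
}
```
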
So, tell me again

how everything's fine in production!

Monitoring

Record just enough information. Too much slows down the monitored system, too little lets events go unnoticed.

What to monitor

  • Actor creation & destruction
  • Message types, message rates, failures and performance at the actor level
  • Queue size at the (local) actor level
  • The number of available and running threads in the thread pools

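The tools on the next slide do this via instrumentation; as a hedged do-it-yourself sketch, you can wrap an actor's receive block to record per-message processing time and message type (the trait and metric format here are made up).

```scala
import akka.actor._

trait Measured extends Actor with ActorLogging {
  // Wraps a Receive block, recording the processing time of every message
  def measured(behaviour: Receive): Receive = {
    case msg if behaviour.isDefinedAt(msg) =>
      val start = System.nanoTime()
      behaviour(msg)
      val micros = (System.nanoTime() - start) / 1000
      log.debug("{} handled {} in {}us", self.path, msg.getClass.getSimpleName, micros)
  }
}

class MeasuredWorker extends Measured {
  def receive = measured {
    case msg => // do the real work here
  }
}
```
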
Monitoring Options

  • Typesafe Console – development-focused monitoring
  • Reactive Monitor – lightweight, configurable, open source
  • Kamon.io – lightweight, configurable, open source

Summary

  • Be reactive, isolate the non-reactive components
  • Measure and then measure again, do not guess
  • Find out how your application breaks under extreme load

Thank you!

Law of Murphy for devops: "if thing can able go wrong, is mean is already wrong but you not have Nagios alert of it yet." (DevOps Borat)