Tuning Akka

With a dose of idealism and realism

Jan Macháček @honzam399  |   Alex Lashford @alexlashford

Congratulations! You've made it!

The users love your system - interactions increase

100s → 1000s → 10,000s of users!

And you worry

The basics

  • Event-driven
  • Scalable

The basics

  • Your actors form hierarchies, and you have only a few top-level actors
  • Defined supervisor strategies
  • When creating actors, set the dispatcher, router and mailbox
  • You do not allow an actor's state to escape

Tune to achieve

You are event-driven and therefore more easily scalable. You are using Akka properly, so you only need to tweak some settings to achieve the best responsiveness and resilience.

Mailboxes, dispatchers and routers

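  • Dispatchers supply the threads that execute arbitrary code, such as your actors' message handling and Futures
  • A mailbox holds the messages waiting to be processed by an actor
  • Routers spread the incoming messages across a number of actors (routees)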
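The following toy model in plain Java (8+) makes the three terms concrete. It is a sketch, not how Akka implements them. ToyActor, ToyRouter and runDemo are hypothetical names. Each ToyActor owns a mailbox. A shared ExecutorService plays the dispatcher and guarantees that at most one thread drains a given mailbox at a time. The router forwards messages round-robin.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.concurrent.atomic.*;
import java.util.function.Consumer;

// Toy model of mailbox + dispatcher + router (hypothetical, NOT Akka's implementation).
final class ToyActors {

    // An "actor": a private mailbox drained by at most one dispatcher thread at a time.
    static final class ToyActor<M> {
        private final Queue<M> mailbox = new ConcurrentLinkedQueue<>(); // unbounded mailbox
        private final AtomicBoolean scheduled = new AtomicBoolean(false);
        private final Executor dispatcher;
        private final Consumer<M> behaviour;

        ToyActor(Executor dispatcher, Consumer<M> behaviour) {
            this.dispatcher = dispatcher;
            this.behaviour = behaviour;
        }

        void tell(M msg) {
            mailbox.offer(msg);
            trySchedule();
        }

        private void trySchedule() {
            // The CAS ensures only one "turn" for this actor is queued on the dispatcher.
            if (!mailbox.isEmpty() && scheduled.compareAndSet(false, true)) {
                dispatcher.execute(this::processTurn);
            }
        }

        private void processTurn() {
            try {
                int budget = 5; // messages per turn before giving the thread back (illustrative value)
                M msg;
                while (budget-- > 0 && (msg = mailbox.poll()) != null) {
                    behaviour.accept(msg);
                }
            } finally {
                scheduled.set(false);
                trySchedule(); // messages may have arrived during this turn
            }
        }
    }

    // A round-robin "router": each message goes to the next routee in turn.
    static final class ToyRouter<M> {
        private final List<ToyActor<M>> routees;
        private final AtomicInteger next = new AtomicInteger();

        ToyRouter(List<ToyActor<M>> routees) { this.routees = routees; }

        void tell(M msg) {
            routees.get(Math.floorMod(next.getAndIncrement(), routees.size())).tell(msg);
        }
    }

    // Sends `messages` through a router with `routeeCount` routees; returns per-routee counts.
    static int[] runDemo(int routeeCount, int messages) throws InterruptedException {
        ExecutorService dispatcher = Executors.newFixedThreadPool(4);
        AtomicIntegerArray counts = new AtomicIntegerArray(routeeCount);
        CountDownLatch done = new CountDownLatch(messages);
        List<ToyActor<Integer>> routees = new ArrayList<>();
        for (int i = 0; i < routeeCount; i++) {
            final int id = i;
            routees.add(new ToyActor<Integer>(dispatcher, msg -> {
                counts.incrementAndGet(id);
                done.countDown();
            }));
        }
        ToyRouter<Integer> router = new ToyRouter<>(routees);
        for (int m = 0; m < messages; m++) router.tell(m);
        done.await(5, TimeUnit.SECONDS);
        dispatcher.shutdown();
        int[] result = new int[routeeCount];
        for (int i = 0; i < routeeCount; i++) result[i] = counts.get(i);
        return result;
    }
}
```

The per-turn budget is similar in spirit to a dispatcher's throughput setting. It stops one busy actor from holding a thread forever.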

Classes of problems

  • Number or string crunching
  • I/O
  • Memory

Number or string crunching

  • Consumes the thread
  • Avoid context switching
  • Set the number of threads in the pool to match your cores
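Here is a minimal JDK-only sketch of the same principle. It assumes plain Java rather than Akka, where the equivalent knob lives in the dispatcher's executor configuration. CpuBoundPool and sumOfSquares are hypothetical stand-ins for your own number crunching. The pool has exactly as many threads as the JVM reports cores, so the threads do not fight over CPUs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

// Sketch: CPU-bound work on a pool sized to the number of cores (hypothetical names).
final class CpuBoundPool {

    // Stand-in for "number crunching": sum of i*i over [from, to).
    static long sumOfSquares(long from, long to) {
        long acc = 0;
        for (long i = from; i < to; i++) acc += i * i;
        return acc;
    }

    static long parallelSumOfSquares(long n) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores); // one thread per core
        try {
            List<Future<Long>> parts = new ArrayList<>();
            long chunk = (n + cores - 1) / cores; // ceiling division: one chunk per thread
            for (long start = 0; start < n; start += chunk) {
                long from = start, to = Math.min(n, start + chunk);
                parts.add(pool.submit(() -> sumOfSquares(from, to)));
            }
            long total = 0;
            for (Future<Long> f : parts) total += f.get();
            return total;
        } finally {
            pool.shutdown();
        }
    }
}
```

More threads than cores would not make this finish sooner. The extra threads would only be time-sliced against each other, which adds the context switching the slide warns about.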

I/O

  • Favour non-blocking APIs
  • Be aware of back-pressure
  • Carefully configure timeouts and remember to react to errors
  • If you must use blocking calls, bulkhead them
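The next sketch (Java 9+, JDK only) combines the last three points. It assumes a hypothetical blocking dependency, slowLookup. The call runs on its own small daemon-thread pool (the bulkhead). It is bounded by a timeout, and every failure, the timeout included, is turned into an explicit fallback instead of being ignored. GuardedIo, the pool size of 8 and the fallback value are illustrative choices, not recommendations.

```java
import java.util.concurrent.*;

// Sketch: blocking I/O confined to a dedicated pool, with a timeout and an explicit error path.
final class GuardedIo {

    // The bulkhead: blocking calls can exhaust only these 8 threads, nothing else.
    static final ExecutorService BLOCKING_IO = Executors.newFixedThreadPool(8, r -> {
        Thread t = new Thread(r, "blocking-io");
        t.setDaemon(true); // do not keep the JVM alive for stuck calls
        return t;
    });

    // Hypothetical blocking dependency; sleeps to simulate a slow remote call.
    static String slowLookup(String key, long delayMillis) {
        try {
            Thread.sleep(delayMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
        return "value-for-" + key;
    }

    static CompletableFuture<String> lookup(String key, long delayMillis, long timeoutMillis) {
        return CompletableFuture
            .supplyAsync(() -> slowLookup(key, delayMillis), BLOCKING_IO)
            // orTimeout fails the future but does NOT interrupt the blocked thread
            .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
            // react to errors: in real code, log `err` and choose a meaningful fallback
            .exceptionally(err -> "fallback-for-" + key);
    }
}
```

The timeout frees the caller, but the blocked thread stays occupied until the call returns. That is why the blocking call needs its own bulkhead as well as a timeout.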

Memory

  • Use bounded mailboxes (BoundedMailbox, BoundedPriorityMailbox, BoundedControlAwareMailbox)
  • If your actors process a lot of data, consider using off-heap structures (direct ByteBuffers) to keep it out of the GC's way
  • Memory / GC pressure will make your application die with a whimper, not a bang
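Akka's bounded mailboxes are configured with a capacity and a push timeout (mailbox-capacity and mailbox-push-timeout-time). The JDK-only sketch below mimics that idea. It is not Akka's class. BoundedMailboxSketch is a hypothetical name, and the dropped counter stands in for wherever the rejected messages end up.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Sketch of a bounded mailbox: fixed capacity plus a push timeout (hypothetical class).
final class BoundedMailboxSketch<M> {
    private final BlockingQueue<M> queue;
    private final long pushTimeoutMillis;
    private final AtomicLong dropped = new AtomicLong(); // stand-in for "sent elsewhere / dead letters"

    BoundedMailboxSketch(int capacity, long pushTimeoutMillis) {
        this.queue = new ArrayBlockingQueue<>(capacity); // memory use is capped by construction
        this.pushTimeoutMillis = pushTimeoutMillis;
    }

    // Waits up to the push timeout for space; if the mailbox stays full, the message is dropped.
    boolean enqueue(M msg) throws InterruptedException {
        boolean accepted = queue.offer(msg, pushTimeoutMillis, TimeUnit.MILLISECONDS);
        if (!accepted) dropped.incrementAndGet();
        return accepted;
    }

    M dequeue() { return queue.poll(); } // null when empty
    int size() { return queue.size(); }
    long droppedCount() { return dropped.get(); }
}
```

The trade-off is explicit. An unbounded mailbox turns a slow consumer into heap growth and GC pressure, the "whimper". A bounded one turns it into visible, countable drops.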

Sometimes things aren't ideal

Your code is reactive and tuned, but reality brings things that are:

  • not responsive
  • not resilient
  • not scalable

Bulkheading

Divide your application not only by functional area, but also by the class of problem it solves (number crunching, I/O, memory)
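This sketch assumes plain Java, where one executor per class of problem gives the bulkhead. In Akka the same separation comes from giving different actors different dispatchers (for example with Props.withDispatcher). Bulkheads, Workload, the pool sizes and cpuStillResponsive are hypothetical names and values. The demo fills the I/O bulkhead with stuck calls and checks that CPU work still completes.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.*;

// Sketch: one pool per class of problem (hypothetical names and sizes).
final class Bulkheads {
    enum Workload { CPU, BLOCKING_IO }

    static ExecutorService daemonPool(int size, String name) {
        return Executors.newFixedThreadPool(size, r -> {
            Thread t = new Thread(r, name);
            t.setDaemon(true);
            return t;
        });
    }

    static Map<Workload, ExecutorService> create() {
        Map<Workload, ExecutorService> pools = new EnumMap<>(Workload.class);
        pools.put(Workload.CPU, daemonPool(Runtime.getRuntime().availableProcessors(), "cpu"));
        pools.put(Workload.BLOCKING_IO, daemonPool(4, "blocking-io"));
        return pools;
    }

    // Saturate the I/O bulkhead with calls that never return, then run CPU work.
    static boolean cpuStillResponsive() throws Exception {
        Map<Workload, ExecutorService> pools = create();
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < 10; i++) {
            pools.get(Workload.BLOCKING_IO).submit(() -> { release.await(); return null; });
        }
        Future<Integer> cpu = pools.get(Workload.CPU).submit(() -> 6 * 7);
        boolean ok;
        try {
            ok = cpu.get(1, TimeUnit.SECONDS) == 42;
        } catch (TimeoutException e) {
            ok = false; // the CPU work was starved: the bulkhead leaked
        } finally {
            release.countDown();
            pools.values().forEach(ExecutorService::shutdown);
        }
        return ok;
    }
}
```

With a single shared pool of four threads, the same ten stuck calls would have occupied every thread, and the CPU task would have waited behind them.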

Back-pressure

It is far better for your system to know what its dependencies can cope with, than to deal with the big bang
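One simple way to encode "what its dependencies can cope with" is a cap on in-flight calls. This JDK-only sketch assumes the limit is known in advance. InFlightLimiter and its methods are hypothetical names. When no permit frees up in time, the caller gets an immediate "no", a back-pressure signal, instead of silently queuing more work.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: never exceed maxInFlight concurrent calls to a dependency (hypothetical class).
final class InFlightLimiter {
    private final Semaphore permits;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger maxObserved = new AtomicInteger(); // for verification only

    InFlightLimiter(int maxInFlight) { this.permits = new Semaphore(maxInFlight); }

    // false = the dependency is saturated; the caller must slow down, retry later or shed load.
    boolean tryCall(Runnable dependencyCall, long waitMillis) throws InterruptedException {
        if (!permits.tryAcquire(waitMillis, TimeUnit.MILLISECONDS)) return false;
        try {
            int now = inFlight.incrementAndGet();
            maxObserved.accumulateAndGet(now, Math::max);
            dependencyCall.run();
            return true;
        } finally {
            inFlight.decrementAndGet();
            permits.release();
        }
    }

    int maxObservedInFlight() { return maxObserved.get(); }
}
```

The important part is the false return value. It travels back to the caller, who can propagate it further upstream instead of letting the dependency discover its limits the hard way.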

Responsive

  • The system does not react to the messages as soon as they arrive
  • Blocking I/O
  • Synchronisation

Resilient

  • Failures start and then never stop
  • Failures spread throughout the system

Scalable

  • Cannot deal with the load you are putting on it
  • Cannot report any back-pressure
  • Becomes expensive to run under heavy load

So, tell me again how everything's fine in production!

Monitor

Record just enough information. Too much slows down the monitored system, too little lets events go unnoticed.

  • Actor creation & destruction
  • Message types, message rates, failures and performance at the actor level
  • Queue size at the (local) actor level
  • The number of available and running threads in the ThreadPools
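For the last bullet, the JDK's ThreadPoolExecutor already exposes cheap counters. This sketch samples them on a timer. It is not how Typesafe Console, Kamon or similar tools collect data. PoolMetrics, Snapshot and startSampling are hypothetical names. Each sample only reads a few fields, in line with "record just enough information".

```java
import java.util.concurrent.*;

// Sketch: periodic, low-overhead snapshots of thread-pool metrics (hypothetical names).
final class PoolMetrics {

    static final class Snapshot {
        final int poolSize;   // threads currently in the pool
        final int active;     // approximate number of threads running tasks
        final int queued;     // tasks waiting for a thread
        final long completed; // approximate number of finished tasks

        Snapshot(ThreadPoolExecutor pool) {
            this.poolSize = pool.getPoolSize();
            this.active = pool.getActiveCount();
            this.queued = pool.getQueue().size();
            this.completed = pool.getCompletedTaskCount();
        }

        @Override
        public String toString() {
            return "pool=" + poolSize + " active=" + active + " queued=" + queued + " completed=" + completed;
        }
    }

    // One tiny task per period on a separate scheduler thread.
    static ScheduledFuture<?> startSampling(ThreadPoolExecutor pool,
                                            ScheduledExecutorService scheduler,
                                            long periodMillis) {
        return scheduler.scheduleAtFixedRate(
            () -> System.out.println(new Snapshot(pool)), 0, periodMillis, TimeUnit.MILLISECONDS);
    }
}
```

A steadily growing queued value with active pinned at poolSize is the thread-pool equivalent of a filling mailbox. It usually shows up in this data well before users notice.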

Monitoring Options

  • Typesafe Console – development-focused monitoring
  • Reactive Monitor – lightweight, configurable, open source
  • Kamon.io – lightweight, configurable, open source

Remember!

  • Be reactive, isolate the non-reactive components
  • Measure and then measure again, do not guess
  • Find out how your application breaks under extreme load

Thank you!


Law of Murphy for devops: if thing can able go wrong, is mean is already wrong but you not have Nagios alert of it yet.