How many workers in a Docker cluster

This post explains how to size the workers in your cluster when you need failover capacity for container failures and worker failures.


We recently created a Kubernetes (k8s) cluster to host our microservices architecture. In this post I would like to share how we calculated the number of worker machines we needed and which considerations to take into account.

The microservices architecture consists of multiple small applications, each running in its own container. The containers come in three sizes: small, medium and large. Each container has a CPU and memory size assigned. We want to handle two failover scenarios:

  • Failure of a container (restart)
  • Failure of a worker (worker stops working)

One resource type
A container failure is handled by having a worker available with enough CPU and memory to restart the container. This means that you always need at least one worker with enough free resources to restart the largest container. The simplest way to guarantee this is to reserve room for the largest container on every worker. The following formula gives the number of workers needed to meet this requirement:

Worker_Count = Total_Requested_Resources / (Worker_Node_Size - Max_Container_Size)

To keep the calculation simple, we only take CPU into account. Assume we have three CPU sizes (t-shirt sizes): large (2 CPU), medium (1 CPU) and small (0.5 CPU), and 12 containers: 2 large, 4 medium and 6 small. Total_Requested_Resources is then 11 CPU (2 * 2 + 4 * 1 + 6 * 0.5). Suppose we can choose workers with either 4 or 8 CPUs:

Worker_Count = 11 / (4 - 2) = 5.5

Worker_Count = 11 / (8 - 2) = 1.83

Since you cannot run a fraction of a worker, round up: 6 workers of 4 CPUs or 2 workers of 8 CPUs. Let's say that 1 CPU costs one credit. Running on 4-CPU workers then costs 6 * 4 = 24 credits, and running on 8-CPU workers costs 2 * 8 = 16 credits.
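The calculation above can be sketched in a few lines of Python. This is only an illustration of the formula in this post: the function name `worker_count` and the list layout are my own choices, and rounding up with `math.ceil` reflects that you cannot run a fraction of a worker.

```python
import math

def worker_count(total_requested, worker_size, max_container_size):
    """Workers needed when every worker keeps room free for the largest container."""
    usable = worker_size - max_container_size
    if usable <= 0:
        raise ValueError("a worker must be larger than the largest container")
    return math.ceil(total_requested / usable)

# Container mix from the example: (number of containers, CPU per container)
containers = [(2, 2.0), (4, 1.0), (6, 0.5)]
total_cpu = sum(count * cpu for count, cpu in containers)  # 11.0
max_cpu = max(cpu for _, cpu in containers)                # 2.0

for size in (4, 8):
    n = worker_count(total_cpu, size, max_cpu)
    # One credit per CPU, as assumed in the text
    print(f"{size}-CPU workers: {n} needed, {n * size} credits")
```

With the numbers from the example this prints 6 workers (24 credits) for 4-CPU workers and 2 workers (16 credits) for 8-CPU workers.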

To handle the failure of a worker, we can simply add one extra worker to the cluster. The formula then becomes:

Worker_Count = 1 + (Total_Requested_Resources / (Worker_Node_Size - Max_Container_Size))

This gives the following costs: with 4-CPU workers, 4 * (6 + 1) = 28 credits; with 8-CPU workers, 8 * (2 + 1) = 24 credits.
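To compare worker sizes including the spare worker, a small sketch like the following can help. The helper names `workers_with_spare` and `cluster_cost` and the `credits_per_cpu` parameter are hypothetical; the one-credit-per-CPU price is the assumption used in this post.

```python
import math

def workers_with_spare(total_requested, worker_size, max_container_size, spare_workers=1):
    """Workers for container restarts, plus spare workers to survive a worker failure."""
    base = math.ceil(total_requested / (worker_size - max_container_size))
    return base + spare_workers

def cluster_cost(worker_count, worker_size, credits_per_cpu=1):
    """Total cost when every CPU on every worker is paid for."""
    return worker_count * worker_size * credits_per_cpu

# 11 requested CPUs, largest container 2 CPUs, worker sizes 4 and 8
options = {size: cluster_cost(workers_with_spare(11, size, 2), size) for size in (4, 8)}
cheapest = min(options, key=options.get)

print(options)   # {4: 28, 8: 24}
print(cheapest)  # 8
```

In this example the larger worker is cheaper, but that depends on the container mix, so it is worth rerunning the comparison with your own numbers.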

Configure resource limits

Resource limits are not set in the Dockerfile itself but in the container section of your Kubernetes pod or deployment manifest, by adding the following lines:

    resources:
      limits:
        cpu: "1"
        memory: "4Gi"

Handle multiple resource types (CPU, Memory)
More than one resource type can vary. When containers have different resource needs, you have to be sure that all the resources a container needs are available together on at least one machine. Otherwise, even when the load is perfectly balanced across the cluster, your largest containers may not be able to restart. A container that needs only a little CPU but a lot of memory complicates the calculation. T-shirt sizing your containers makes the calculation easier: you define the same three sizes and give each size CPU and memory in a fixed proportion that makes sense for your application:

General purpose:

T-shirt size    CPU (cores)    Memory (GB)
Small           0.5            2
Medium          1              4
Large           2              8

The table above shows the CPU-to-memory proportion (1 CPU to 4 GB) of general-purpose SKUs such as Dsv3-series, Dv3-series, DSv2-series, Dv2-series, B-series and DC-series. Memory-optimized SKUs give you a ratio of 1 CPU to 8 GB of memory or more; examples are Esv3-series, Ev3-series, M-series, GS-series, G-series, DSv2-series 11-15 and Dv2-series 11-15. CPU-optimized SKUs have a ratio of 1 CPU to 2 GB of memory; examples are Fsv2-series, Fs-series and F-series.

All workers in a cluster have to be from the same SKU, so choosing the SKU fixes the ratio between the properties you can vary. Other properties, such as IO or network capacity, can also be important for the performance of a container. If that is the case, add them to the comparison.

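One way to extend the single-resource formula to CPU and memory is to apply it per resource type and take the largest result. That rule is my assumption, not something derived in this post, and it gives a lower bound only: it ignores how containers actually pack onto workers. The names `SIZES` and `worker_count_multi` are hypothetical.

```python
import math

# T-shirt sizes from the table: (CPU cores, memory in GB)
SIZES = {"small": (0.5, 2), "medium": (1, 4), "large": (2, 8)}

def worker_count_multi(containers, worker_cpu, worker_mem_gb, spare_workers=1):
    """containers maps a t-shirt size name to the number of containers of that size."""
    counts = []
    # Index 0 is CPU, index 1 is memory; apply the same formula to each resource
    for idx, capacity in enumerate((worker_cpu, worker_mem_gb)):
        total = sum(n * SIZES[name][idx] for name, n in containers.items())
        largest = max(SIZES[name][idx] for name, n in containers.items() if n > 0)
        usable = capacity - largest
        if usable <= 0:
            raise ValueError("worker too small for the largest container")
        counts.append(math.ceil(total / usable))
    # The scarcest resource decides the worker count
    return max(counts) + spare_workers

mix = {"large": 2, "medium": 4, "small": 6}
print(worker_count_multi(mix, worker_cpu=8, worker_mem_gb=32))  # 1:4 ratio -> 3
print(worker_count_multi(mix, worker_cpu=8, worker_mem_gb=16))  # 1:2 ratio -> 7
```

With containers sized at 1 CPU to 4 GB, an 8-CPU worker with a 1:2 ratio runs out of memory long before it runs out of CPU, which is why matching the SKU ratio to your t-shirt sizes matters.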
Final thoughts
Running your containers with failover capacity adds a lot of resources to your system. In this example, running 11 CPUs of containers takes 24 CPUs of worker nodes. That seems like a lot of overhead. Running more containers or workloads on the same cluster reduces the relative overhead of unused CPUs. Sometimes it is better to downsize your largest container and run it as multiple smaller containers, which gives you more usable capacity in the cluster: your waste in CPUs is at least Max_Container_Size * Worker_Count. Choosing larger workers means you need fewer workers, which can bring the costs down. A good next step can be containers as a service, where the cluster is fully managed for you.
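The effect of downsizing the largest container can be illustrated with a short sketch. It assumes, hypothetically, that each 2-CPU container can be split into two 1-CPU containers so that the total stays at 11 CPUs; the function name `plan` and the dictionary keys are my own.

```python
import math

def plan(total_requested, worker_size, max_container_size, spare_workers=1):
    """Worker count and overhead for one worker size and one largest-container size."""
    workers = math.ceil(total_requested / (worker_size - max_container_size)) + spare_workers
    provisioned = workers * worker_size
    return {
        "workers": workers,
        "provisioned_cpu": provisioned,
        "unused_cpu": provisioned - total_requested,
        # Lower bound on the waste: room kept free on every worker
        "reserved_for_restart": max_container_size * workers,
    }

# Largest container of 2 CPUs (as in the post) versus 1 CPU (hypothetical split)
for max_size in (2, 1):
    for worker_size in (4, 8):
        print(f"max container {max_size} CPU, {worker_size}-CPU workers:",
              plan(11, worker_size, max_size))
```

Under these assumptions, 4-CPU workers drop from 28 to 20 provisioned CPUs once the largest container is 1 CPU, while 8-CPU workers stay at 24, so the cheapest worker size can change when you resize your containers.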


Photo by frank mckenna

