We recently created a Kubernetes (k8s) cluster to host our microservices architecture. In this blog post, I'd like to share how we calculated the number of worker machines we needed and which considerations to take into account.
The microservices architecture consists of multiple small applications, each running in its own container. Every container comes in one of three sizes (small, medium, or large) and has a CPU and memory size assigned. We want to handle two failover scenarios:
- Failure of a container (the container is restarted)
- Failure of a worker (the whole worker node goes down)
One resource type
A container failure is handled by having a worker available with enough free CPU and memory to restart the container. This means you always need enough spare capacity on a worker to fit the largest container. If we reserve that headroom on every worker, the number of workers we need follows from this formula, rounded up to a whole worker:
Worker_Count = ceil(Total_Requested_Resources / (Worker_Node_Size - Max_Container_Size))
To keep the calculation simple, we only take CPU into account. Assume we have three CPU sizes (t-shirt sizes): large (2 CPU), medium (1 CPU), and small (0.5 CPU). We run 12 containers: 2 large, 4 medium, and 6 small, so Total_Requested_Resources is 11 CPU (2 * 2 + 4 * 1 + 6 * 0.5). If we can choose between workers with 4 or 8 CPUs, we get:
Worker_Count = 11 / (4 - 2) = 5.5, rounded up to 6 workers
Worker_Count = 11 / (8 - 2) = 1.83, rounded up to 2 workers
Let's say that 1 CPU costs one credit. Running on 4-CPU workers costs 6 * 4 = 24 credits, and running on 8-CPU workers costs 2 * 8 = 16 credits.
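The calculation above can be sketched in a few lines of Python. This is a minimal sketch, assuming CPU is the only resource and that the result is rounded up to whole workers; the function name `worker_count` is hypothetical, while the container mix and the one-credit-per-CPU price come from the example.

```python
import math


def worker_count(total_requested: float, node_size: float, max_container: float) -> int:
    """Workers needed when every worker keeps max_container of free headroom."""
    usable_per_worker = node_size - max_container
    if usable_per_worker <= 0:
        raise ValueError("worker too small: no capacity left after reserving headroom")
    # Round up: a fraction of a worker still means one more machine.
    return math.ceil(total_requested / usable_per_worker)


# Example mix: 2 large (2 CPU), 4 medium (1 CPU), 6 small (0.5 CPU).
containers = [2.0] * 2 + [1.0] * 4 + [0.5] * 6
total_cpu = sum(containers)    # 11.0 CPU requested
largest = max(containers)      # 2.0 CPU

CREDITS_PER_CPU = 1  # price from the example: 1 CPU costs one credit
for node_cpu in (4, 8):
    workers = worker_count(total_cpu, node_cpu, largest)
    print(f"{node_cpu}-CPU workers: {workers} workers, {workers * node_cpu * CREDITS_PER_CPU} credits")
```

This prints 6 workers (24 credits) for 4-CPU workers and 2 workers (16 credits) for 8-CPU workers, matching the numbers above.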
To handle the failure of a worker, we simply add one extra worker to the cluster. The formula then becomes:
Worker_Count = 1 + ceil(Total_Requested_Resources / (Worker_Node_Size - Max_Container_Size))
This gives the following costs: 4 * (6 + 1) = 28 credits on 4-CPU workers and 8 * (2 + 1) = 24 credits on 8-CPU workers.
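The same calculation with one spare worker can be extended to pick the cheaper worker size automatically. This is a sketch: `worker_count_with_spare` and `cheapest_node_size` are hypothetical helper names, and `credits_per_cpu` assumes the flat one-credit-per-CPU price used in this example.

```python
import math
from typing import Sequence, Tuple


def worker_count_with_spare(total_requested: float, node_size: float,
                            max_container: float, spare_workers: int = 1) -> int:
    """Headroom formula plus spare workers that absorb the load of a failed worker."""
    usable_per_worker = node_size - max_container
    if usable_per_worker <= 0:
        raise ValueError("worker too small: no capacity left after reserving headroom")
    return spare_workers + math.ceil(total_requested / usable_per_worker)


def cheapest_node_size(total_requested: float, max_container: float,
                       node_sizes: Sequence[float],
                       credits_per_cpu: float = 1.0) -> Tuple[float, float, int]:
    """Return (cost, node_size, worker_count) for the cheapest candidate worker size."""
    options = []
    for size in node_sizes:
        workers = worker_count_with_spare(total_requested, size, max_container)
        options.append((workers * size * credits_per_cpu, size, workers))
    # Tuples compare by cost first, then by node size.
    return min(options)


cost, size, workers = cheapest_node_size(11.0, 2.0, [4, 8])
print(f"cheapest: {workers} workers of {size} CPU for {cost:g} credits")
```

For the example this selects three 8-CPU workers for 24 credits, against seven 4-CPU workers for 28 credits.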
Configure resource limits
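Resource sizes are not set in the Dockerfile. In Kubernetes you set them per container in the pod spec, for example in the pod template of a Deployment. The scheduler places pods based on their requests, so the requests are the values that add up to Total_Requested_Resources; setting the limits equal to the requests keeps a container from using more than its size. For a medium container:

    resources:
      requests:
        cpu: "1"
        memory: "4Gi"
      limits:
        cpu: "1"
        memory: "4Gi"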
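If you have many services, generating the `resources` block from a t-shirt size keeps the sizes consistent. This is a sketch under assumptions: `TSHIRT_SIZES`, `cpu_quantity`, and `resources_for` are hypothetical names, the sizes mirror the t-shirt table in this post, and "GB" is interpreted as GiB (`Gi`) in the output.

```python
import json

# Hypothetical mapping of t-shirt sizes to (CPU cores, memory in GiB).
TSHIRT_SIZES = {
    "small": (0.5, 2),
    "medium": (1, 4),
    "large": (2, 8),
}


def cpu_quantity(cores: float) -> str:
    """Format CPU cores as a Kubernetes quantity: whole cores as-is, fractions in millicores."""
    millicores = round(cores * 1000)
    if millicores % 1000 == 0:
        return str(millicores // 1000)
    return f"{millicores}m"


def resources_for(size: str) -> dict:
    """Build the value of a container's `resources` field with requests equal to limits."""
    cpu, memory_gib = TSHIRT_SIZES[size]
    quantities = {"cpu": cpu_quantity(cpu), "memory": f"{memory_gib}Gi"}
    # The scheduler sums requests; limits cap what the container may use at runtime.
    return {"requests": dict(quantities), "limits": dict(quantities)}


# Printed as JSON for readability; the same structure goes under `resources:` in the container spec.
print(json.dumps(resources_for("small"), indent=2))
```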
Handle multiple resource types (CPU, Memory)
Usually more than one resource type varies between containers. When containers have different resource needs, you have to make sure that all the resources a container needs are available together on at least one machine. Otherwise, if your system is perfectly balanced, your largest containers will not be able to restart. A container that needs little CPU but a lot of memory complicates the calculation. T-shirt sizing your containers makes it easier: define the same three sizes and give each size CPU and memory in a fixed ratio that makes sense for your application (here 1 CPU to 4 GB):
T-shirt size (general purpose) | CPU | Memory (GB) |
---|---|---|
Small | 0.5 | 2 |
Medium | 1 | 4 |
Large | 2 | 8 |
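One way to extend the formula to several resource types, which is our assumption rather than a rule derived above, is to apply it to CPU and to memory separately and keep the larger worker count, plus the spare worker. The sketch ignores bin-packing effects, and the worker shapes (8 CPU with 32 GB or 16 GB) are hypothetical.

```python
import math

# T-shirt sizes from the table: (CPU cores, memory in GB).
TSHIRT_SIZES = {"small": (0.5, 2), "medium": (1, 4), "large": (2, 8)}


def worker_count_multi(containers, node_cpu, node_memory_gb, spare_workers=1):
    """Apply the headroom formula per resource and keep the largest worker count."""
    cpus = [TSHIRT_SIZES[c][0] for c in containers]
    memories = [TSHIRT_SIZES[c][1] for c in containers]
    counts = []
    for total, node, largest in ((sum(cpus), node_cpu, max(cpus)),
                                 (sum(memories), node_memory_gb, max(memories))):
        usable_per_worker = node - largest
        if usable_per_worker <= 0:
            raise ValueError("worker too small: no capacity left after reserving headroom")
        counts.append(math.ceil(total / usable_per_worker))
    # The scarcest resource decides; add the spare worker for worker failure.
    return spare_workers + max(counts)


containers = ["large"] * 2 + ["medium"] * 4 + ["small"] * 6
for node_cpu, node_memory_gb in ((8, 32), (8, 16)):
    workers = worker_count_multi(containers, node_cpu, node_memory_gb)
    print(f"{node_cpu} CPU / {node_memory_gb} GB workers: {workers} workers")
```

With 8 CPU / 32 GB workers, which have the same 1:4 ratio as the t-shirt sizes, CPU and memory both need 2 workers, so the answer is 3, as in the CPU-only case. With 8 CPU / 16 GB workers, memory becomes the bottleneck and the count rises to 7. This is why proportional sizing keeps the calculation simple.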
Running your containers with failover capacity adds a lot of resources to your system. In this example, running 11 CPUs of containers takes 24 CPUs on the worker nodes. That is a lot of overhead.

Running more containers or workloads on the same cluster reduces the share of unused CPUs, because the spare worker and the rounding up to whole workers are spread over more workloads. Sometimes it is also better to downsize your largest container and run it as multiple smaller containers: a smaller Max_Container_Size leaves more usable capacity on every worker. Your waste in CPUs is at least Max_Container_Size * Worker_Count, so choosing larger workers, which means you need fewer of them, can bring the costs down. A good next step can be containers as a service, where the cluster is fully managed for you.
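The effect of downsizing the largest container can be checked with the CPU-only formula. This sketch assumes that splitting each 2-CPU container into two 1-CPU containers keeps the total request at 11 CPU (no per-replica overhead); `total_workers` is a hypothetical helper name.

```python
import math


def total_workers(total_requested, node_size, max_container, spare_workers=1):
    """Headroom formula plus spare workers, as used for the CPU-only example."""
    usable_per_worker = node_size - max_container
    if usable_per_worker <= 0:
        raise ValueError("worker too small: no capacity left after reserving headroom")
    return spare_workers + math.ceil(total_requested / usable_per_worker)


TOTAL_CPU = 11.0  # assumption: splitting a container keeps the total request unchanged
scenarios = {"largest container 2 CPU": 2.0, "large containers split into 1 CPU": 1.0}

for label, largest in scenarios.items():
    for node_cpu in (4, 8):
        workers = total_workers(TOTAL_CPU, node_cpu, largest)
        capacity = workers * node_cpu
        print(f"{label}, {node_cpu}-CPU workers: {workers} workers, "
              f"{capacity} CPU provisioned, {capacity - TOTAL_CPU:g} CPU unused")
```

Under these assumptions, splitting the large containers brings the 4-CPU option down from 7 workers (28 credits) to 5 workers (20 credits), which is cheaper than the 24 credits of 8-CPU workers; the 8-CPU option stays at 3 workers.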