Problem
As customers run more VMs and assign generous numbers of vCPUs to them, complaints of performance degradation, CPU contention, and high CPU Ready time have become frequent.
This article briefly covers the following topics.
CPU scheduling
Strict Co-scheduling
Relaxed Co-scheduling
CPU skew in virtual machines
Reducing the vCPU count
Let's look at how CPU scheduling works in a virtual machine (VM) environment. Imagine a system with 1 CPU socket and 4 physical cores, with no hyper-threading. You have two virtual machines: one with 4 vCPUs (VM1) and another with 2 vCPUs (VM2).
Strict Co-Scheduling
In strict co-scheduling, all of a VM's vCPUs must be placed on physical cores at the same time, so cores are handed out to VMs in whole time slots. In our example, during the first slot all 4 cores go to VM1; VM2, with its 2 vCPUs, has to wait, accruing "CPU Ready" time. In the next slot the roles reverse. This alternating pattern makes both VMs spend time waiting, and cores can also sit idle when there aren't enough free cores left to run a larger VM, leaving gaps in utilization.
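To make the alternating pattern concrete, here is a minimal Python sketch of the idea, a toy model rather than ESXi's real scheduler; the 4-core host, the two VMs, and the ten time slots are simply the assumptions from the example above.

# A minimal sketch (not ESXi's real scheduler) of strict co-scheduling:
# a VM runs in a time slot only if ALL of its vCPUs can get a physical
# core at once; otherwise the whole VM waits and accrues "CPU Ready" time.

PHYSICAL_CORES = 4
VMS = {"VM1": 4, "VM2": 2}   # VM name -> vCPU count


def strict_coschedule(slots: int = 10):
    ready = {name: 0 for name in VMS}   # slots spent waiting per VM
    idle = 0                            # core-slots left unused
    order = list(VMS)
    for _ in range(slots):
        free = PHYSICAL_CORES
        for name in order:
            need = VMS[name]
            if need <= free:
                free -= need            # the whole gang of vCPUs runs together
            else:
                ready[name] += 1        # runnable, but not enough free cores
        idle += free                    # cores wasted in this slot
        order.append(order.pop(0))      # rotate which VM gets first pick
    return ready, idle


ready, idle = strict_coschedule()
print(ready, idle)  # both VMs wait in some slots, and cores sit idle

Running the sketch shows both effects from the paragraph above: each VM accumulates ready time in the slots it loses, and cores go unused whenever the larger VM cannot fit.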
Relaxed Co-Scheduling
In relaxed co-scheduling, a VM's vCPUs run on whichever cores happen to be free. If a VM doesn't get cores for all of its vCPUs, it keeps running, but the vCPUs that missed their turn fall behind the ones that ran. This growing gap is called "CPU skew."
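The same toy model can illustrate the relaxed variant and how skew builds up; again, this is only a sketch of the concept, not VMware's implementation, and the per-slot bookkeeping is an assumption made for illustration.

# A toy sketch of relaxed co-scheduling: each slot, as many vCPUs as fit
# get a core; vCPUs that miss a slot fall behind their siblings, and the
# gap is tracked here as "skew".

PHYSICAL_CORES = 4
VMS = {"VM1": 4, "VM2": 2}


def relaxed_coschedule(slots: int = 10):
    # progress[vm][i] = how many slots vCPU i of that VM has actually run
    progress = {name: [0] * n for name, n in VMS.items()}
    order = list(VMS)
    for _ in range(slots):
        free = PHYSICAL_CORES
        for name in order:
            for i in range(VMS[name]):
                if free > 0:
                    progress[name][i] += 1   # this vCPU got a core
                    free -= 1
                # else: this vCPU simply skips the slot
        order.append(order.pop(0))
    # skew = gap between the most- and least-advanced vCPU of a VM
    return {name: max(p) - min(p) for name, p in progress.items()}


print(relaxed_coschedule())  # VM1's later vCPUs lag behind, so its skew grows

Notice that no VM is ever fully blocked, but the 4-vCPU VM ends up with some vCPUs well ahead of others, which is exactly the skew the next section is about.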
Co-Stop
If CPU skew grows too large, ESXi triggers a co-stop: the vCPUs that have run ahead are temporarily descheduled until the lagging vCPUs catch up. This shows up as co-stop (%CSTP) time and reduces the VM's overall performance.
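As a rough illustration of the trigger (the actual ESXi threshold and accounting are internal and not reproduced here), a co-stop style check could look like the following, with an arbitrary skew limit and hypothetical per-vCPU run counts chosen purely for the sketch.

# Illustrative only: how a co-stop style check could react to skew.
# The limit of 3 "slots" is an arbitrary value for this sketch.

SKEW_LIMIT = 3

def costop_needed(progress: list[int]) -> bool:
    """True when the fastest vCPU is too far ahead of the slowest."""
    return max(progress) - min(progress) > SKEW_LIMIT

# Leading vCPUs would be descheduled (co-stopped) until the laggards catch up:
vcpu_progress = [9, 9, 5, 5]          # hypothetical per-vCPU run counts
if costop_needed(vcpu_progress):
    leaders = [i for i, p in enumerate(vcpu_progress) if p == max(vcpu_progress)]
    print(f"co-stop vCPUs {leaders} until skew shrinks")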
Reducing vCPUs
Right-sizing, i.e. reducing the vCPU count on oversized VMs, is often the simplest remedy: with fewer vCPUs to co-schedule, the VM accrues less CPU Ready time and less skew, and cores are freed for other VMs.
For instance, if a VM with 4 vCPUs averages only 30% CPU usage, cutting it to 2 vCPUs raises per-vCPU utilization to roughly 60%, improving efficiency without hurting the workload and preventing cores from being wasted.
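The arithmetic behind that example is simple: the VM's actual demand stays the same, it is just spread over fewer vCPUs. A quick check:

# The same amount of CPU work spread over fewer vCPUs -> higher per-vCPU use.
vcpus_before, util_before = 4, 0.30
work = vcpus_before * util_before        # 1.2 "cores worth" of actual demand

vcpus_after = 2
util_after = work / vcpus_after          # 0.60 -> 60% per vCPU
print(f"{util_after:.0%} utilization on {vcpus_after} vCPUs")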
References:
https://kloudkonnect.wordpress.com/2019/08/23/understanding-esxi-cpu-scheduling/