Problem

As customers run more VMs and assign them generous numbers of vCPUs, complaints of performance degradation, CPU contention, and high CPU Ready time are becoming frequent.


This article briefly covers the following topics:

CPU scheduling

Strict Co-scheduling

Relaxed Co-scheduling

CPU skew in virtual machines

Co-stop

Reducing the vCPU count

 


CPU Scheduling

Let's look at how CPU scheduling works in a virtual machine (VM) environment. Imagine a host with 1 CPU socket and 4 physical cores, with no hyper-threading, running two virtual machines: VM1 with 4 vCPUs and VM2 with 2 vCPUs.
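As a rough illustration (plain Python, nothing ESXi-specific; the numbers are simply the ones from the example above), the host is already overcommitted: six vCPUs have to time-share four physical cores, so the scheduler must decide who waits.

```python
# The example host and VMs described above.
physical_cores = 4                      # 1 socket x 4 cores, no hyper-threading
vm_vcpus = {"VM1": 4, "VM2": 2}

total_vcpus = sum(vm_vcpus.values())    # 6 vCPUs competing for 4 physical cores
print(f"vCPU:pCPU overcommitment = {total_vcpus}:{physical_cores} "
      f"({total_vcpus / physical_cores:.1f}x)")
```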



Strict Co-Scheduling

In strict co-scheduling, all of a VM's vCPUs must be placed on physical cores at the same time, so the cores are handed to VMs in alternating time slots. In our example, the first slot gives all 4 cores to VM1; VM2, with 2 vCPUs, has to wait, accruing "CPU Ready" time. This alternating pattern makes both VMs wait, hurting performance. Cores are also wasted: when VM2 runs alone, 2 cores sit idle because there are not enough free cores to dispatch the larger VM1, leaving gaps in utilization.
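To make the pattern concrete, here is a small toy simulation in Python. It is only a sketch of the idea, not the actual ESXi scheduler: a VM is dispatched in a slot only if every one of its vCPUs can get a core, otherwise the whole VM waits.

```python
# Toy model of strict co-scheduling (an illustration, not the real ESXi algorithm).
# A VM runs in a slot only if there is a free physical core for every one of its
# vCPUs; otherwise the whole VM waits and accrues "CPU Ready" time.
PHYSICAL_CORES = 4
vms = [("VM1", 4), ("VM2", 2)]           # (name, vCPU count) from the example above
ready = {name: 0 for name, _ in vms}

for slot in range(4):
    free = PHYSICAL_CORES
    for name, vcpus in vms:
        if vcpus <= free:
            free -= vcpus                # all vCPUs dispatched together
        else:
            ready[name] += 1             # not enough cores for the whole VM -> wait
    print(f"slot {slot}: {free} core(s) left idle")
    vms.append(vms.pop(0))               # rotate dispatch order between slots

print("slots spent in CPU Ready:", ready)
```

Running it shows both effects described above: each VM spends half its slots waiting, and two cores sit idle in the slots where only VM2 fits.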



Relaxed Co-Scheduling

In relaxed co-scheduling, VMs use whatever cores are available. If a VM cannot get physical cores for all of its vCPUs, it keeps running on the cores it does get, but the unscheduled vCPUs make less progress. The resulting difference in progress is "CPU skew": some vCPUs lag behind their siblings.
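The same toy model can be relaxed so that a VM takes whatever cores happen to be free in a slot; vCPUs that miss a slot simply fall behind. Again, this is only a sketch of the concept, not ESXi's implementation.

```python
# Toy model of relaxed co-scheduling (an illustration, not the real ESXi algorithm).
# A VM runs even if only some of its vCPUs get a physical core; the vCPUs that miss
# out fall behind their siblings, and the difference in progress is the "skew".
PHYSICAL_CORES = 4
vms = [("VM1", 4), ("VM2", 2)]
progress = {name: [0] * vcpus for name, vcpus in vms}   # work completed per vCPU

for slot in range(4):
    free = PHYSICAL_CORES
    for name, vcpus in vms:
        scheduled = min(vcpus, free)     # take whatever cores are left
        free -= scheduled
        for i in range(scheduled):       # only the scheduled vCPUs make progress
            progress[name][i] += 1
    vms.append(vms.pop(0))               # rotate dispatch order between slots

for name, per_vcpu in progress.items():
    print(f"{name}: per-vCPU progress {per_vcpu}, skew {max(per_vcpu) - min(per_vcpu)}")
```

With these numbers, VM1 ends up with two vCPUs ahead of the other two, which is exactly the skew the next section is about.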



Co-Stop

If the skew grows too large, ESXi triggers a co-stop: the vCPUs that have run ahead are temporarily descheduled until the lagging vCPUs catch up. The VM keeps making progress, but its overall performance drops.
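Continuing the sketch above, co-stop can be modeled as a cap on skew: vCPUs that are too far ahead are held back until the laggards catch up. The SKEW_LIMIT threshold below is made up for illustration; the real thresholds are internal to the ESXi scheduler.

```python
# Toy co-stop check (illustration only; the threshold is hypothetical).
SKEW_LIMIT = 2                            # made-up skew limit for this sketch

def runnable_vcpus(per_vcpu_progress, cores_available):
    """Pick vCPUs allowed to run this slot, co-stopping those too far ahead."""
    slowest = min(per_vcpu_progress)
    allowed = [i for i, p in enumerate(per_vcpu_progress)
               if p - slowest < SKEW_LIMIT]   # leaders at/over the limit are co-stopped
    return allowed[:cores_available]

# With the per-vCPU progress [4, 4, 2, 2] from the relaxed example above,
# only the two lagging vCPUs are dispatched until they catch up:
print(runnable_vcpus([4, 4, 2, 2], cores_available=2))   # -> [2, 3]
```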



Reducing vCPUs

A common remedy is to reduce the number of vCPUs on oversized VMs. Fewer vCPUs are easier to co-schedule, which lowers CPU Ready time and skew and usually improves performance.


For instance, if a VM with 4 vCPUs averages only 30% CPU utilization, reducing it to 2 vCPUs raises utilization to about 60%, improving efficiency without affecting performance. It also frees physical cores for other VMs.
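The arithmetic behind that example, as a quick sanity check (plain Python, using the figures above):

```python
# The same amount of work spread over fewer vCPUs raises utilization
# without reducing throughput (figures from the example above).
vcpus_before, util_before = 4, 0.30
demand = vcpus_before * util_before       # 1.2 vCPU-equivalents of actual work

vcpus_after = 2
util_after = demand / vcpus_after         # 0.60 -> 60% on 2 vCPUs
print(f"demand = {demand:.1f} vCPU-equivalents; "
      f"{vcpus_after} vCPUs run at {util_after:.0%}")
```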


