Cgroups: add cpu_quota_us limit
What
- Add a new configuration under cgroups called cpu_quota to configure cfs_quota_us for the parent cgroup: https://docs.kernel.org/scheduler/sched-bwc.html?highlight=cfs_quota_us
- Add a new configuration under cgroups.repositories called cpu_quota to configure cfs_quota_us for the repository cgroup: https://docs.kernel.org/scheduler/sched-bwc.html?highlight=cfs_quota_us
- Add metrics (see https://docs.kernel.org/scheduler/sched-bwc.html#statistics):
  - gitaly_cgroup_cpu_cfs_periods_total: read from cpu.stat nr_periods
  - gitaly_cgroup_cpu_cfs_throttled_periods_total: read from cpu.stat nr_throttled
  - gitaly_cgroup_cpu_cfs_throttled_seconds_total: read from cpu.stat throttled_time
- Add more test coverage when only specific values are set.
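To make the metric mapping above concrete, here is a minimal Python sketch of reading the three counters out of cpu.stat. It is not Gitaly's implementation (Gitaly is written in Go); the function name parse_cpu_stat and the sample content are hypothetical. It assumes the cgroup v1 cpu.stat layout of one "name value" pair per line, with throttled_time in nanoseconds as described in the kernel documentation, and it assumes the seconds metric is derived by dividing by 1e9.

```python
def parse_cpu_stat(text):
    """Map cgroup v1 cpu.stat counters onto the metric names listed above.

    Hypothetical helper for illustration only.
    """
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return {
        "gitaly_cgroup_cpu_cfs_periods_total": stats.get("nr_periods", 0),
        "gitaly_cgroup_cpu_cfs_throttled_periods_total": stats.get("nr_throttled", 0),
        # Assumption: throttled_time is in nanoseconds, so divide to get seconds.
        "gitaly_cgroup_cpu_cfs_throttled_seconds_total": stats.get("throttled_time", 0) / 1e9,
    }


# Made-up sample content, in the shape the kernel writes to cpu.stat.
sample = "nr_periods 1200\nnr_throttled 300\nthrottled_time 4500000000\n"
metrics = parse_cpu_stat(sample)
```

All three values are monotonically increasing counters, which is why the metric names end in _total.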
Why
At the moment we limit memory, and we limit CPU via cpu.shares, which
only throttles a cgroup when there is contention on the CPU. This means
a single repository can still hog all of the CPU on a gitaly node. We've
seen a case of this in gitlab-com/gl-infra/production#8318 (closed),
where a single repository saturated the CPU and the scheduler couldn't
balance the CPU for other tasks/requests to be scheduled.
We hoped CPU shares would be enough, but we need an upper CPU quota for gitaly cgroups so no single repository can fully saturate the CPU.
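The difference between shares and a quota can be illustrated with a simplified model. The Python sketch below is not how the kernel scheduler works internally; it only models the observable behaviour that shares are work-conserving (an otherwise idle machine gives a busy cgroup everything), while a quota caps a cgroup regardless of contention. All names (allocate_by_shares, apply_quota, the repo-a/repo-b cgroups and their numbers) are hypothetical.

```python
def allocate_by_shares(demands, shares, total_cpus):
    """Simplified work-conserving weighted split of total_cpus.

    demands and the result are in CPUs; shares are relative weights.
    """
    alloc = {name: 0.0 for name in demands}
    active = {name for name, demand in demands.items() if demand > 0}
    remaining = float(total_cpus)
    while active and remaining > 1e-9:
        total_shares = sum(shares[name] for name in active)
        # Offer each still-hungry cgroup its weighted slice of what is left.
        offered = {name: remaining * shares[name] / total_shares for name in active}
        satisfied = {name for name in active if demands[name] - alloc[name] <= offered[name]}
        if not satisfied:
            # Everyone wants more than their slice: hand out the slices and stop.
            for name in active:
                alloc[name] += offered[name]
            break
        # Fully serve the cgroups that need less, then redistribute the rest.
        for name in satisfied:
            remaining -= demands[name] - alloc[name]
            alloc[name] = float(demands[name])
        active -= satisfied
    return alloc


def apply_quota(demands, quota_cpus):
    """Cap each cgroup's demand at its quota (in CPUs), if it has one."""
    return {name: min(demand, quota_cpus.get(name, demand)) for name, demand in demands.items()}


shares = {"repo-a": 1024, "repo-b": 1024}
demands = {"repo-a": 8, "repo-b": 0}  # repo-b is idle, repo-a wants 8 CPUs

shares_only = allocate_by_shares(demands, shares, total_cpus=4)
with_quota = allocate_by_shares(apply_quota(demands, {"repo-a": 2}), shares, total_cpus=4)
```

In this model, shares_only gives repo-a all 4 CPUs because nothing else is competing, whereas with_quota holds it at 2 CPUs and leaves headroom for other work.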
There are a few concerns, each addressed below.
Concern 1: cfs_period_us
cfs_period_us is the period against which cfs_quota_us (what we are
setting now) is measured. The default value seems to be hardcoded in
the Linux kernel, but it can be updated, so Gitaly explicitly sets it
to 100ms (the default value).
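As a worked example of how the two values relate: the quota is the CPU time a cgroup may use per period, so quota divided by period gives the number of CPUs it can use. A minimal Python sketch follows; the helper names are hypothetical and only the 100ms period comes from this description.

```python
CFS_PERIOD_US = 100_000  # 100ms, the period Gitaly sets explicitly


def quota_to_cpus(cfs_quota_us, cfs_period_us=CFS_PERIOD_US):
    """Number of CPUs' worth of time a cgroup may use per period."""
    return cfs_quota_us / cfs_period_us


def cpus_to_quota(cpus, cfs_period_us=CFS_PERIOD_US):
    """cfs_quota_us value that allows the given number of CPUs."""
    return int(cpus * cfs_period_us)


two_cpus = quota_to_cpus(200_000)  # 200ms of CPU time per 100ms period
half_cpu_quota = cpus_to_quota(0.5)
```

Pinning the period is what makes a configured quota mean the same thing on every node: if the period changed, the same quota value would translate to a different CPU limit.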
Concern 2: not using cfs_burst_us
cfs_burst_us could allow CPU bursts even when they exceed
cfs_quota_us. We don't set it because it's only available on newer
kernel versions (5.15). Users can instead avoid throttling by
oversubscribing cfs_quota_us.
Concern 3: Wasting available resources
When the user sets these, we artificially limit the CPU that the cgroups can consume. This can leave performance on the table when a repository is using all of its quota and no other process is using the CPU. This is the only drawback, and one we are willing to accept since it adds more reliability in the long run. We can reduce the effect by oversubscribing.
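Oversubscribing here means letting the sum of the repository quotas exceed what the host can actually provide, on the expectation that not every repository is busy at once. A rough Python sketch of that ratio is below; the function name, the host size, and the cgroup count are all made-up values for illustration.

```python
def oversubscription_ratio(repo_quota_us, repo_cgroup_count, host_cpus, cfs_period_us=100_000):
    """Sum of repository quotas relative to the host's CPU time per period.

    A value above 1.0 means the quotas are oversubscribed.
    """
    total_quota_us = repo_quota_us * repo_cgroup_count
    host_capacity_us = host_cpus * cfs_period_us
    return total_quota_us / host_capacity_us


# Hypothetical node: 8 CPUs, 10 repository cgroups, each allowed 4 CPUs.
ratio = oversubscription_ratio(repo_quota_us=400_000, repo_cgroup_count=10, host_cpus=8)
```

With these numbers a single repository can use at most half of the node, but the quotas together add up to five times the node's capacity, so a lone busy repository is throttled less than it would be under a strict 1/10 split.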
Concern 4: Observability
The kernel already exports stats, which Gitaly exposes as metrics; cadvisor exposes them as well.
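One way these stats could be used is to compute how often a cgroup was throttled between two scrapes. The Python sketch below is an assumption about usage, not something this change ships; the function name and sample values are hypothetical, and the inputs are the raw nr_periods and nr_throttled counters from cpu.stat.

```python
def throttled_ratio(prev, curr):
    """Fraction of enforcement periods in which the cgroup was throttled,
    computed from two samples of the cumulative counters."""
    periods = curr["nr_periods"] - prev["nr_periods"]
    if periods <= 0:
        # No new periods elapsed (or the counters were reset): nothing to report.
        return 0.0
    return (curr["nr_throttled"] - prev["nr_throttled"]) / periods


# Made-up counter samples taken some time apart.
earlier = {"nr_periods": 1000, "nr_throttled": 100}
later = {"nr_periods": 1600, "nr_throttled": 250}
ratio_between_samples = throttled_ratio(earlier, later)
```

A ratio that stays high for a repository cgroup suggests its quota is too low for its workload, which is the signal to raise the quota or oversubscribe further.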
Reference: https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/17332