External project imports generate Gitaly host CPU saturation
Summary
GitHub imports create many long-running transactions that degrade Gitaly apdex and trigger SLO incidents for the engineers on call. Because CPU appears to be the primary bottleneck, this can create a noisy-neighbor situation on the affected Gitaly host. These incidents are also unactionable for those on call and create unnecessary work.
Impact
- Causes a degradation in our Gitaly host apdex.
- Creates unactionable alerts and incidents for engineers on call.
- CPU saturation can lead to poor performance for some other projects on the same Gitaly host.
Relevant incidents:
Example apdex metrics:
Example Kibana logs for the most recent incident:
Recommendation
Throttle or rate limit the remote import process to minimize the impact on a Gitaly host node.
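One common way to throttle is a token bucket, sketched below for illustration only. This is not how the importer or Gitaly currently limits work. The `TokenBucket` class, its parameters, and the idea of keeping one bucket per Gitaly host are all assumptions made for this example.

```python
import time


class TokenBucket:
    """Hypothetical limiter: allows bursts up to `capacity`, refills at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.clock = clock          # injectable so the limiter can be tested deterministically
        self.tokens = float(capacity)
        self.last = clock()

    def try_acquire(self, cost=1.0):
        """Return True and spend `cost` tokens if available, otherwise return False."""
        now = self.clock()
        # Refill based on elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

In this sketch, an import worker would call `try_acquire()` before each request it sends to a given Gitaly host and back off (for example, requeue the job with a delay) when it returns `False`. The `rate` and `capacity` values would need to be tuned against observed CPU headroom on the host.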
Verification
The best way I can think of is to perform our own imports of large public projects and check whether the solution is working. We can also refer to logs for long-running and high-volume UserCreateBranch and CommitDiff method calls in Gitaly.
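The log check could be scripted along these lines. This is a rough sketch that assumes the Gitaly logs are exported as JSON lines and that each entry carries the method name in a `grpc.method` field and the duration in a `grpc.time_ms` field; both field names, the `summarize` helper, and the 1000 ms threshold are assumptions to adjust against the real log schema.

```python
import json
from collections import defaultdict

# The two RPCs named in this issue.
WATCHED = {"UserCreateBranch", "CommitDiff"}


def summarize(lines, slow_ms=1000.0):
    """Count calls, slow calls, and the max duration per watched method."""
    stats = defaultdict(lambda: {"calls": 0, "slow": 0, "max_ms": 0.0})
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue
        method = entry.get("grpc.method")      # assumed field name
        if method not in WATCHED:
            continue
        ms = float(entry.get("grpc.time_ms") or 0.0)  # assumed field name
        s = stats[method]
        s["calls"] += 1
        if ms >= slow_ms:
            s["slow"] += 1
        s["max_ms"] = max(s["max_ms"], ms)
    return dict(stats)
```

Running this over logs captured during a test import, before and after the throttling change, would give a simple comparison of call volume and slow-call counts for the two methods.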