API endpoint /api/:version/groups/:id has N+1 calls across postgres, gitaly, and redis-cache
The /api/:version/groups/:id endpoint can be used to amplify traffic to both redis-cache and gitaly.
Example incident:
Example request:
{
  "route": "/api/:version/groups/:id",
  "status": 200,
  "duration_s": 4.48373,
  "cpu_s": 2.86766,
  "db_replica_count": 142,
  "db_replica_duration_s": 0.239,
  "gitaly_calls": 198,
  "gitaly_duration_s": 1.329452,
  "redis_cache_calls": 601,
  "redis_cache_duration_s": 0.529326
}
https://log.gprd.gitlab.net/goto/7574c8f8102082b3c79ab87d67db3f7f
The request above made ~150 queries against the postgres replica, ~200 calls to gitaly, and ~600 calls to redis-cache. A single API request can therefore fan out into hundreds of backend calls, which makes this endpoint a vector for amplifying traffic and a scalability and DoS risk.
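To make the amplification concrete, here is a minimal sketch that sums the backend call counters from the log entry above. The counter values are copied from the example request; the choice to treat these three fields as "downstream calls", and the helper name `downstream_calls`, are assumptions made for illustration.

```python
import json

# Counters copied from the example request above (duration fields omitted).
LOG_ENTRY = """
{
  "route": "/api/:version/groups/:id",
  "status": 200,
  "db_replica_count": 142,
  "gitaly_calls": 198,
  "redis_cache_calls": 601
}
"""

# Assumption: these three counters are the backend calls of interest.
DOWNSTREAM_FIELDS = ("db_replica_count", "gitaly_calls", "redis_cache_calls")


def downstream_calls(entry: dict) -> int:
    """Sum the backend call counters in one structured log entry."""
    return sum(entry.get(field, 0) for field in DOWNSTREAM_FIELDS)


entry = json.loads(LOG_ENTRY)
total = downstream_calls(entry)
print(f"{entry['route']}: 1 request -> {total} downstream calls")
# prints: /api/:version/groups/:id: 1 request -> 941 downstream calls
```

For this one request, the amplification factor is roughly 940 backend calls per API call.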
Related: &3533 (closed).
Verification
We want to vastly reduce, or eliminate, requests to this endpoint that make 100 or more gitaly calls.
https://log.gprd.gitlab.net/goto/a963966f642f7b9a18e667bb664aff31
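The verification criterion can be expressed as a filter over structured log entries. The sketch below assumes entries are available as dictionaries with the same `route` and `gitaly_calls` fields as the example request; the helper name `heavy_requests` and the sample entries other than the first are hypothetical.

```python
from typing import Iterable, List

ROUTE = "/api/:version/groups/:id"
GITALY_CALL_THRESHOLD = 100  # the "100+ gitaly calls" criterion


def heavy_requests(entries: Iterable[dict],
                   threshold: int = GITALY_CALL_THRESHOLD) -> List[dict]:
    """Return entries for ROUTE that made at least `threshold` gitaly calls."""
    return [
        e for e in entries
        if e.get("route") == ROUTE and e.get("gitaly_calls", 0) >= threshold
    ]


sample = [
    {"route": ROUTE, "gitaly_calls": 198},  # from the example request
    {"route": ROUTE, "gitaly_calls": 3},    # hypothetical light request
    {"route": "/api/:version/projects/:id", "gitaly_calls": 250},  # hypothetical, other route
]

heavy = heavy_requests(sample)
print(len(heavy))  # prints: 1
```

Success would mean this count trending toward zero over a representative window of production logs.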
Problem to solve
Reduce currently heavy usage of /api/:version/groups/:id when performing queries against the projects field. This field is already deprecated and will be removed in %15.0.
Proposal
Add a rate limit to this endpoint, which can be revisited when the projects field is removed in %15.0.
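As an illustration of the idea only, here is a minimal in-memory fixed-window rate limiter. It is not the mechanism GitLab uses; the class name, the per-caller key, and the limit and window values are all hypothetical. A production limiter would need shared storage across application nodes.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per key in each `window_s`-second window."""

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock  # injectable so the limiter can be tested
        self._state: Dict[str, Tuple[int, int]] = {}  # key -> (window index, count)

    def allow(self, key: str) -> bool:
        window = int(self.clock() // self.window_s)
        seen_window, count = self._state.get(key, (window, 0))
        if seen_window != window:
            count = 0  # a new window has started, so reset the counter
        if count >= self.limit:
            return False
        self._state[key] = (window, count + 1)
        return True


# Hypothetical policy: 2 requests per caller per 60 seconds, with a frozen clock.
limiter = FixedWindowRateLimiter(limit=2, window_s=60, clock=lambda: 0.0)
results = [limiter.allow("user:42") for _ in range(3)]
print(results)  # prints: [True, True, False]
```

A fixed window is the simplest scheme to reason about, but it permits bursts at window boundaries. A sliding window or token bucket would smooth that out at the cost of more state.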