SQL n+1 in `GET /api/:version/groups/:id/projects`
The endpoint GET /api/:version/groups/:id/projects
appears to have a SQL n+1 issue.
At a guess, it appears that the problem is related to the include_subgroups=true
parameter, especially when a large page size is used. Worst case is around 1000 sql calls per request, but 95th percentile is over 400 per request.
- p95: 475.014 db calls
- max: 1,072 db calls
- samples 256781 (24 hours)
Mechanical Sympathy alert: https://gitlab.slack.com/archives/CM5EQH125/p1620310206029700
- Kibana Search of Rails log for db counts > 100
- Kibana Visualization for P50/75/95/99 for db count
- Kibana search of postgres log for query duration > 8 seconds
- Kibana search of View Durations > 5 seconds
- Rails Dashboard
Current status 2022/03/14
-
!80937 (merged) merged
- Feature flag for above will be enabled this week
- !81838 (merged) approaching final review
Next steps
-
Measure effects of removing N+1 for first_auto_devops_config
- !80937 (merged)- Improves maybe another 100 ms or so. Not huge, but it does reduce DB queries by about 100, also, depending on group hierarchy depth/breadth. We will measure full results in production using the dashboards.
-
Perform another analysis of the endpoint to determine next bottleneck, whether N+1 or slow query. Done with Stan's help - see #330140 (comment 848497207) -
Parse the data above to determine next steps.
Exit criteria
Dashboard: https://log.gprd.gitlab.net/goto/ca3934e0-8e74-11ec-a649-b7cbb8e4f62e
- Reduce % queries with DB count >100 from 14% to 5%
- Reduce 99th percentile of DB count from 779 to 400
- Reduce 99th percentile of DB duration from 2.2s to 1.5s
Edited by Drew Blessing