Create optimal indexes for created_at order
What does this MR do?
The context for this is #36390 (comment 250204180).
I'm going to add a few indexes to projects
that optimally support the projects API requests. That is mostly driven by the fact that we allow for various order_by
and sort
options:
order_by: Return projects ordered by id, name, path, created_at, updated_at, or last_activity_at fields. Default is created_at sort: Return projects sorted in asc or desc order. Default is desc
See https://docs.gitlab.com/ee/api/projects.html.
In this MR, we concentrate on the created_at
order.
Now, for unauthenticated requests, we always have the visibility_level=20
filter in place to match public projects. We have an existing index on visibility_level, created_at, id
.
Let's look at a query for the first page of GET /api/v4/projects?order_by=created_at&sort=desc
SELECT
"projects".*
FROM
"projects"
WHERE
"projects"."visibility_level" = 20
AND "projects"."pending_delete" = FALSE
ORDER BY
"projects"."created_at" DESC,
"projects"."id" DESC
LIMIT 20 OFFSET 0;
This nicely uses the existing index with a backward scan. Note we order by DESC, DESC
and the existing index is ordered by ASC, ASC
, so this is the exact opposite. This allows for the backwards scan: https://explain.depesz.com/s/760Tc
Let's try this for sort=asc
:
SELECT
"projects".*
FROM
"projects"
WHERE
"projects"."visibility_level" = 20
AND "projects"."pending_delete" = FALSE
ORDER BY
"projects"."created_at" ASC,
"projects"."id" DESC
LIMIT 20 OFFSET 0;
Note that now we order by ASC, DESC
and that is incompatible with any (forward/backward) scan on the index with order asc, asc
. Hence this results in reading all the data from the index and sorting it on the fly: https://explain.depesz.com/s/H1rE
I suspect that this is the primary reason we see these really high response times for some parameter combinations for the project API endpoint.
My aim is to one-by-one create optimal indexes for the sorting options we have. Measure the impact, rinse and repeat.
The alternative option is to adjust the tie breaker in https://gitlab.com/gitlab-org/gitlab/blob/ab/projects-api-created-at-indexes/lib/api/helpers.rb#L317 to match the sort-order of the actual order_by
field. In this case, we would only need one index in asc, asc
order and we would be able to scan it backwards in desc, desc
order.
The issue for the alternative solution is #195881 (closed). I would still like to go forward with this MR, as this is a low hanging fruit anyways. If we decide to change the tie breaker order, we can simply drop one of the indexes created here.