Investigate slow /api/:version/users query
Summary
Slow API requests due to offset-based pagination. A switch to keyset-based pagination woulp help.
More details on approach to fix
- We should change this to not use offset. Since offset inhibits paginating via an index and thus puts a lot of load on postgres.
- We should use keyset pagination
- Examples of where we use keyset pagination for the rest api:
- https://docs.gitlab.com/ee/api/rest/index.html#keyset-based-pagination
lib/gitlab/pagination/cursor_based_keyset.rb
- The remaining challenge here is that it would be a breaking change if we removed offset pagination so we will need to support both for some period of time and hope that customers move over to keyset pagination.
- In the meantime, there is a way to automatically "switching" from offset pagination to keyset pagination if the following conditions are present:
- The integration uses the next page headers for loading the next page. (https://docs.gitlab.com/ee/api/rest/#pagination-link-header)
- The integration doesn't use the count and the current-page headers.
- Integration requests a page.
- In the next page LINK header generate add an extra parameter to the next page query: ?uses_link_header=true
- If an API call has the uses_link_header=true query parameter, we know that they take the next page url from the link headers so we can generate the keyset-paginated url and provide it to the integration: ?cursor=xyz
- The next request will start using keyset pagination
Details on problem
Query:
SELECT "users".* FROM "users"
WHERE "users"."state" NOT IN ($1, $2, $3)
AND ("users"."state" IN ($4))
AND "users"."user_type" IN ($5, $6, $7, $8)
AND "users"."user_type" IN ($9, $10, $11, $12)
AND "users"."user_type" IN ($13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26)
ORDER BY "users"."id" DESC
LIMIT $27
OFFSET $28
Example of a single user scraping /api/v4/users
and paginating a lot:
Edited by Jessie Young