Implement generic keyset pagination
What does this MR do?
This MR implements utility classes for building generic keyset paginated ActiveRecord
queries.
Where we could use this?
- GraphQL API, replacing the existing keyset pagination implementation.
- REST API
- Batch processing, background jobs. Iterating over large volume of data.
Why keyset pagination?
https://use-the-index-luke.com/no-offset
Current implementation(s)
- For the REST API we have an implementation for paginating by project id. (https://gitlab.com/gitlab-org/gitlab/-/tree/master/lib/gitlab/pagination/keyset.rb)
- GraphQL uses keyset pagination by default, there are some limitations though:
- Only two columns are supported
- For expressions (
LOWER
or other functions) extra parsing of the order values are required: see OrderInfo class - Some orders cannot be reversed easily, so extra checks are needed: see LastItems class
- Paginating backwards might not work: !50579 (merged)
- Tie breaker column is always the primary key
This MR contains
- Keyset pagination utility classes
- Testing the query builder with different ordering options using an "in-memory" table
- Simple iterator class to loop over records
- Update the GraphQL
merged_at
ordering forMergeRequest
to use a different tie breaker column - Update the GraphQL
similarity
ordering for projects
Compatibility with our GraphQL code
In the Gitlab::Graphql::Pagination::Keyset::KeysetExperiment
module overrides functionality in the Keyset::Connection
class to make the experimental implementation work if the ActiveRecord
scope is ordered by the new Order
class.
Idea for integration:
The OrderInfo
class could inspect the current ActiveRecord
scope and generate the Order
configuration (check primary key, order column, nullable) using the new implementation. If generating the configuration is not possible, raise an error and inform the user that the order needs to be configured manually, like for the MergeRequest#order_merged_at
scope.
Performance
Similar to the existing GraphQL pagination implementation.
Looking at the queries (original GraphQL keyset implementation and this MR), I noticed that we use OR
conditions when paginating and ordering by two columns. The last column is usually a tie-breaker which is used for non-distinct fields, for example: ORDER BY created_at, id
.
Unfortunately this makes keyset pagination as slow as a standard OFFSET + LIMIT
query and won't scale well. Alternatively we could try transforming the OR
queries into UNION
to leverage index scans.
Usage example
Keyset pagination based on two columns
merge_request_metrics_merged_at, merge_request_metrics.id
columns.
Ci::Pipeline
Keyset pagination with Snippet: ci_pipelines_keyset.rb
Does this MR meet the acceptance criteria?
Conformity
-
Changelog entry -
Documentation (if required) -
Code review guidelines -
Merge request performance guidelines -
Style guides -
Database guides -
Separation of EE specific content
Availability and Testing
-
Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process. -
Tested in all supported browsers -
Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
-
Label as security and @ mention @gitlab-com/gl-security/appsec
-
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods -
Security reports checked/validated by a reviewer from the AppSec team
Related to !50579 (merged) and #281152 (closed), among others.