Support SQL literal keyset values
What does this MR do?
This MR changes the condition building logic in the keyset pagination library in a way that eliminates the conditions (in ruby) on the passed in cursor values. The changes will mostly affect queries where NULL
values are anticipated.
Why?
The groupoptimize plans to implement efficient group level queries based on this PoC. The optimization heavily relies on recursive queries and SQL literal keyset values. This requires the underlying library to generate SQL queries that will work with "unknown" values: actual value or NULL. This makes things a bit more complicated because we cannot just do column=x
because column=NULL
returns NULL
.
A cursor value can be the following SQL expression: ARRAY[1,2,3][2]
(2nd element of an array which is defined somewhere in the query.
Example
For example, when the nulls are located at the beginning of the resultset (ASC
), we add the following condition to capture the potential rows with actual values: (1,3,4...):
if column_definition.nulls_first? && value.blank?
conditions << column_definition.column_expression.not_eq(nil) # relative_position IS NOT NULL
Relative positions according to the sorting (relative_position ASC
):
NULL
NULL
NULL
NULL
1
3
4
5
6
The value.blank?
check needs to be eliminated and replaced with an SQL equivalent. Let's assume that the value is 3. This is how the condition would look like:
relative_position IS NOT NULL AND 3 IS NULL
This evaluates the expression as false and no rows will be returned. Query when the relative position value is nil
:
relative_position IS NOT NULL AND NULL IS NULL
This returns the rows: 1, 3, 4, 5, 6
Query changes
Keyset OR
query, old:
SELECT "issues".*
FROM "issues"
WHERE (("issues"."id" > 147
AND "issues"."relative_position" = 30)
OR ("issues"."relative_position" > 30))
ORDER BY "issues"."relative_position" ASC,
"issues"."id" ASC
LIMIT 10
Keyset OR
query, new:
SELECT "issues".*
FROM "issues"
WHERE (("issues"."id" > 147
AND "issues"."relative_position" = 30
AND 30 IS NOT NULL)
OR ("issues"."id" > 147
AND "issues"."relative_position" IS NULL
AND 30 IS NULL)
OR ("issues"."relative_position" > 30)
OR ("issues"."relative_position" IS NOT NULL
AND 30 IS NULL))
ORDER BY "issues"."relative_position" ASC,
"issues"."id" ASC
LIMIT 10
The same query but with UNION
, old:
SELECT "issues".*
FROM (
(SELECT "issues".*
FROM "issues"
WHERE ("issues"."id" > 147
AND "issues"."relative_position" = 30)
ORDER BY "issues"."relative_position" ASC, "issues"."id" ASC
LIMIT 10)
UNION ALL
(SELECT "issues".*
FROM "issues"
WHERE ("issues"."relative_position" > 30)
ORDER BY "issues"."relative_position" ASC, "issues"."id" ASC
LIMIT 10)) issues
ORDER BY "issues"."relative_position" ASC,
"issues"."id" ASC
LIMIT 10
The same query but with UNION
, new:
SELECT "issues".*
FROM (
(SELECT "issues".*
FROM "issues"
WHERE ("issues"."id" > 147
AND "issues"."relative_position" = 30
AND 30 IS NOT NULL)
ORDER BY "issues"."relative_position" ASC, "issues"."id" ASC
LIMIT 10)
UNION ALL
(SELECT "issues".*
FROM "issues"
WHERE ("issues"."id" > 147
AND "issues"."relative_position" IS NULL
AND 30 IS NULL)
ORDER BY "issues"."relative_position" ASC, "issues"."id" ASC
LIMIT 10)
UNION ALL
(SELECT "issues".*
FROM "issues"
WHERE ("issues"."relative_position" > 30)
ORDER BY "issues"."relative_position" ASC, "issues"."id" ASC
LIMIT 10)
UNION ALL
(SELECT "issues".*
FROM "issues"
WHERE ("issues"."relative_position" IS NOT NULL
AND 30 IS NULL)
ORDER BY "issues"."relative_position" ASC, "issues"."id" ASC
LIMIT 10)) issues
ORDER BY "issues"."relative_position" ASC,
"issues"."id" ASC
LIMIT 10
Performance
The query change will not affect the performance significantly (only parsing time). We do have more conditions however these conditions will be turned to a no-op, for example: 30 IS NULL
will make the whole subquery a no-op.
Screenshots or Screencasts (strongly suggested)
Does this MR meet the acceptance criteria?
Conformity
-
I have included changelog trailers, or none are needed. (Does this MR need a changelog?) -
I have added/updated documentation, or it's not needed. (Is documentation required?) -
I have properly separated EE content from FOSS, or this MR is FOSS only. (Where should EE code go?) -
I have added information for database reviewers in the MR description, or it's not needed. (Does this MR have database related changes?) -
I have self-reviewed this MR per code review guidelines. -
This MR does not harm performance, or I have asked a reviewer to help assess the performance impact. (Merge request performance guidelines) -
I have followed the style guides. -
This change is backwards compatible across updates, or this does not apply.
Availability and Testing
-
I have added/updated tests following the Testing Guide, or it's not needed. (Consider all test levels. See the Test Planning Process.) -
I have tested this MR in all supported browsers, or it's not needed. -
I have informed the Infrastructure department of a default or new setting change per definition of done, or it's not needed.
Security
Does this MR contain changes to processing or storing of credentials or tokens, authorization and authentication methods or other items described in the security review guidelines? If not, then delete this Security section.
-
Label as security and @ mention @gitlab-com/gl-security/appsec
-
The MR includes necessary changes to maintain consistency between UI, API, email, or other methods -
Security reports checked/validated by a reviewer from the AppSec team
Related to #335388 (closed)