Do not consider ci_runners ordering for job queue statistics
What does this MR do and why?
Describe in detail what your merge request does and why.
This MR makes a small fix to the recently reviewed !107694 (merged) so that only id: :desc
is used for sorting when taking the latest 100 CI jobs to compute statistics. Sorting by runner_id does not make sense, but instead of changing the original MR which is already in final approval, I've created this follow-up MR.
Part of #377963 (closed). The downstream MR !107999 (merged) which is based on this branch already has all approvals, and can be merged after this one.
Screenshots or screen recordings
Screenshots are required for UI changes, and strongly recommended for all other merge requests.
How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.
-
Go to the shell in your GDK
gitlab
directory and runbundle exec rake "gitlab:seed:runner_fleet[root]"
. This will seed your GDK with some runners and jobs required for testing this MR. -
Log into http://gdk.test:3000 as an administrator. Make sure you have an Ultimate license enabled.
-
Visit http://gdk.test:3000/-/graphql-explorer and enter the following query:
{ runners(type: INSTANCE_TYPE) { count jobsStatistics { queuedDuration { p50 p75 p90 p95 p99 } } } }
-
Upon running the query, you should see the different percentile values.
Database query plans
Old query
SQL query
SELECT PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY builds.duration) AS p50, PERCENTILE_DISC(0.75) WITHIN GROUP (ORDER
BY builds.duration) AS p75, PERCENTILE_DISC(0.9) WITHIN GROUP (ORDER BY builds.duration) AS p90,
PERCENTILE_DISC(0.95) WITHIN GROUP (ORDER BY builds.duration) AS p95,
PERCENTILE_DISC(0.99) WITHIN GROUP (ORDER BY builds.duration) AS p99
FROM (
SELECT ("ci_builds"."started_at" - "ci_builds"."queued_at") AS duration
FROM (WITH RECURSIVE "array_cte" AS MATERIALIZED (
SELECT "ci_runners"."id"
FROM "ci_runners"
WHERE "ci_runners"."runner_type" = 1), "recursive_keyset_cte" AS ((
SELECT NULL::ci_builds AS records, array_cte_id_array, ci_builds_runner_id_array,
ci_builds_id_array, 0::bigint AS count
FROM (
SELECT ARRAY_AGG("array_cte"."id") AS array_cte_id_array, ARRAY_AGG("ci_builds"."runner_id") AS
ci_builds_runner_id_array, ARRAY_AGG("ci_builds"."id") AS ci_builds_id_array
FROM (
SELECT "array_cte"."id"
FROM array_cte) array_cte
LEFT JOIN LATERAL (
SELECT "ci_builds"."runner_id" AS runner_id, "ci_builds"."id" AS id
FROM "ci_builds"
WHERE "ci_builds"."runner_id" = "array_cte"."id"
ORDER BY "ci_builds"."runner_id" DESC, "ci_builds"."id" DESC
LIMIT 1) ci_builds ON TRUE
WHERE "ci_builds"."runner_id" IS NOT NULL
AND "ci_builds"."id" IS NOT NULL) array_scope_lateral_query
LIMIT 1)
UNION ALL (
SELECT (
SELECT ci_builds
FROM "ci_builds"
WHERE "ci_builds"."id" = recursive_keyset_cte.ci_builds_id_array[position]
LIMIT 1), array_cte_id_array, recursive_keyset_cte.ci_builds_runner_id_array[:position_query.position - 1] ||
next_cursor_values.runner_id || recursive_keyset_cte.ci_builds_runner_id_array[position_query.position + 1:],
recursive_keyset_cte.ci_builds_id_array[:position_query.position - 1] || next_cursor_values.id ||
recursive_keyset_cte.ci_builds_id_array[position_query.position + 1:],
recursive_keyset_cte.count + 1
FROM recursive_keyset_cte, LATERAL (
SELECT runner_id, id, position
FROM UNNEST(ci_builds_runner_id_array, ci_builds_id_array)
WITH ORDINALITY AS u (runner_id, id, position)
WHERE runner_id IS NOT NULL
AND id IS NOT NULL
ORDER BY 1 DESC, 2 DESC
LIMIT 1) AS position_query, LATERAL (
SELECT "record"."runner_id", "record"."id"
FROM (
VALUES (NULL, NULL)) AS nulls
LEFT JOIN (
SELECT "ci_builds"."runner_id" AS runner_id, "ci_builds"."id" AS id
FROM "ci_builds"
WHERE "ci_builds"."runner_id" = recursive_keyset_cte.array_cte_id_array[position]
AND (("ci_builds"."runner_id", "ci_builds"."id") <
(recursive_keyset_cte.ci_builds_runner_id_array[position],
recursive_keyset_cte.ci_builds_id_array[position]))
ORDER BY "ci_builds"."runner_id" DESC, "ci_builds"."id" DESC
LIMIT 1) record ON TRUE
LIMIT 1) AS next_cursor_values))
SELECT (records).*
FROM "recursive_keyset_cte" AS "ci_builds"
WHERE (count <> 0)) ci_builds
WHERE "ci_builds"."started_at" IS NOT NULL
LIMIT 100) builds
Execution plan
https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/14068/commands/49515
Aggregate (cost=627.94..627.95 rows=1 width=80) (actual time=942.373..942.387 rows=1 loops=1)
Buffers: shared hit=1230 read=477 dirtied=62
I/O Timings: read=914.891 write=0.000
-> Limit (cost=622.16..625.68 rows=100 width=16) (actual time=538.522..941.438 rows=100 loops=1)
Buffers: shared hit=1221 read=476 dirtied=62
I/O Timings: read=914.327 write=0.000
-> Subquery Scan on ci_builds (cost=622.16..625.68 rows=100 width=16) (actual time=538.520..941.399 rows=100 loops=1)
Buffers: shared hit=1221 read=476 dirtied=62
I/O Timings: read=914.327 write=0.000
-> CTE Scan on recursive_keyset_cte ci_builds_1 (cost=622.16..624.43 rows=100 width=3967) (actual time=538.461..941.212 rows=100 loops=1)
Filter: (((ci_builds_1.records).started_at IS NOT NULL) AND (ci_builds_1.count <> 0))
Rows Removed by Filter: 1
Buffers: shared hit=1221 read=476 dirtied=62
I/O Timings: read=914.327 write=0.000
CTE array_cte
-> Index Scan using index_ci_runners_on_runner_type on public.ci_runners (cost=0.43..81.64 rows=96 width=4) (actual time=4.942..136.997 rows=73 loops=1)
Index Cond: (ci_runners.runner_type = 1)
Buffers: shared hit=5 read=75 dirtied=26
I/O Timings: read=134.382 write=0.000
CTE recursive_keyset_cte
-> Recursive Union (cost=75.55..540.52 rows=101 width=136) (actual time=526.379..940.333 rows=101 loops=1)
Buffers: shared hit=1221 read=476 dirtied=62
I/O Timings: read=914.327 write=0.000
-> Limit (cost=75.55..75.57 rows=1 width=136) (actual time=526.373..526.378 rows=1 loops=1)
Buffers: shared hit=255 read=262 dirtied=51
I/O Timings: read=518.192 write=0.000
-> Subquery Scan on array_scope_lateral_query (cost=75.55..75.57 rows=1 width=136) (actual time=526.372..526.377 rows=1 loops=1)
Buffers: shared hit=255 read=262 dirtied=51
I/O Timings: read=518.192 write=0.000
-> Aggregate (cost=75.55..75.56 rows=1 width=96) (actual time=526.371..526.374 rows=1 loops=1)
Buffers: shared hit=255 read=262 dirtied=51
I/O Timings: read=518.192 write=0.000
-> Nested Loop (cost=0.70..74.82 rows=96 width=16) (actual time=18.264..525.968 rows=60 loops=1)
Buffers: shared hit=255 read=262 dirtied=51
I/O Timings: read=518.192 write=0.000
-> CTE Scan on array_cte (cost=0.00..1.92 rows=96 width=4) (actual time=4.946..137.314 rows=73 loops=1)
Buffers: shared hit=5 read=75 dirtied=26
I/O Timings: read=134.382 write=0.000
-> Subquery Scan on ci_builds_3 (cost=0.70..0.75 rows=1 width=12) (actual time=5.319..5.319 rows=1 loops=73)
Filter: ((ci_builds_3.runner_id IS NOT NULL) AND (ci_builds_3.id IS NOT NULL))
Rows Removed by Filter: 0
Buffers: shared hit=250 read=187 dirtied=25
I/O Timings: read=383.811 write=0.000
-> Limit (cost=0.70..0.74 rows=1 width=12) (actual time=5.316..5.316 rows=1 loops=73)
Buffers: shared hit=250 read=187 dirtied=25
I/O Timings: read=383.811 write=0.000
-> Index Only Scan using index_ci_builds_on_runner_id_and_id_desc on public.ci_builds ci_builds_4 (cost=0.70..1566.28 rows=45567 width=12) (actual time=5.313..5.313 rows=1 loops=73)
Index Cond: (ci_builds_4.runner_id = array_cte.id)
Heap Fetches: 36
Buffers: shared hit=250 read=187 dirtied=20
I/O Timings: read=383.811 write=0.000
-> Nested Loop (cost=0.87..46.29 rows=10 width=136) (actual time=4.133..4.135 rows=1 loops=100)
Buffers: shared hit=966 read=214 dirtied=11
I/O Timings: read=396.135 write=0.000
-> Nested Loop (cost=0.16..1.98 rows=10 width=112) (actual time=0.066..0.067 rows=1 loops=100)
Buffers: shared hit=6
I/O Timings: read=0.000 write=0.000
-> WorkTable Scan on recursive_keyset_cte (cost=0.00..0.20 rows=10 width=104) (actual time=0.000..0.001 rows=1 loops=100)
I/O Timings: read=0.000 write=0.000
-> Limit (cost=0.16..0.16 rows=1 width=20) (actual time=0.063..0.063 rows=1 loops=100)
Buffers: shared hit=6
I/O Timings: read=0.000 write=0.000
-> Sort (cost=0.16..0.18 rows=10 width=20) (actual time=0.062..0.062 rows=1 loops=100)
Sort Key: u.runner_id DESC, u.id DESC
Sort Method: top-N heapsort Memory: 25kB
Buffers: shared hit=6
I/O Timings: read=0.000 write=0.000
-> Function Scan on u (cost=0.01..0.11 rows=10 width=20) (actual time=0.018..0.035 rows=60 loops=100)
Filter: ((u.runner_id IS NOT NULL) AND (u.id IS NOT NULL))
Rows Removed by Filter: 0
I/O Timings: read=0.000 write=0.000
-> Limit (cost=0.71..0.78 rows=1 width=12) (actual time=2.539..2.539 rows=1 loops=100)
Buffers: shared hit=559 read=113 dirtied=9
I/O Timings: read=248.235 write=0.000
-> Nested Loop Left Join (cost=0.71..0.78 rows=1 width=12) (actual time=2.538..2.538 rows=1 loops=100)
Buffers: shared hit=559 read=113 dirtied=9
I/O Timings: read=248.235 write=0.000
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.000 rows=1 loops=100)
I/O Timings: read=0.000 write=0.000
-> Limit (cost=0.71..0.75 rows=1 width=12) (actual time=2.536..2.536 rows=1 loops=100)
Buffers: shared hit=559 read=113 dirtied=9
I/O Timings: read=248.235 write=0.000
-> Index Only Scan using index_ci_builds_on_runner_id_and_id_desc on public.ci_builds ci_builds_5 (cost=0.71..560.47 rows=15189 width=12) (actual time=2.533..2.533 rows=1 loops=100)
Index Cond: ((ROW(ci_builds_5.runner_id, ci_builds_5.id) < ROW((recursive_keyset_cte.ci_builds_runner_id_array)[u."position"], (recursive_keyset_cte.ci_builds_id_array)[u."position"])) AND (ci_builds_5.runner_id = (recursive_keyset_cte.array_cte_id_array)[u."position"]))
Heap Fetches: 255
Buffers: shared hit=559 read=113 dirtied=7
I/O Timings: read=248.235 write=0.000
SubPlan 2
-> Limit (cost=0.58..3.60 rows=1 width=1272) (actual time=1.510..1.510 rows=1 loops=100)
Buffers: shared hit=401 read=101 dirtied=2
I/O Timings: read=147.900 write=0.000
-> Index Scan using index_ci_builds_on_id_partition_id_unique on public.ci_builds ci_builds_2 (cost=0.58..3.60 rows=1 width=1272) (actual time=1.506..1.506 rows=1 loops=100)
Index Cond: (ci_builds_2.id = (recursive_keyset_cte.ci_builds_id_array)[u."position"])
Buffers: shared hit=399 read=101
I/O Timings: read=147.900 write=0.000
New query
SQL query
SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY builds.duration) AS p50, PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER
BY builds.duration) AS p75, PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY builds.duration) AS p90,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY builds.duration) AS p95,
PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY builds.duration) AS p99
FROM (
SELECT ("ci_builds"."started_at" - "ci_builds"."queued_at") AS duration
FROM (WITH RECURSIVE "array_cte" AS MATERIALIZED (
SELECT "ci_runners"."id"
FROM "ci_runners"
WHERE "ci_runners"."runner_type" = 1), "recursive_keyset_cte" AS ((
SELECT NULL::ci_builds AS records, array_cte_id_array, ci_builds_id_array,
0::bigint AS count
FROM (
SELECT ARRAY_AGG("array_cte"."id") AS array_cte_id_array, ARRAY_AGG("ci_builds"."id") AS ci_builds_id_array
FROM (
SELECT "array_cte"."id"
FROM array_cte) array_cte
LEFT JOIN LATERAL (
SELECT "ci_builds"."id" AS id
FROM "ci_builds"
WHERE "ci_builds"."runner_id" = "array_cte"."id"
ORDER BY "ci_builds"."id" DESC
LIMIT 1) ci_builds ON TRUE
WHERE "ci_builds"."id" IS NOT NULL) array_scope_lateral_query
LIMIT 1)
UNION ALL (
SELECT (
SELECT ci_builds
FROM "ci_builds"
WHERE "ci_builds"."id" = recursive_keyset_cte.ci_builds_id_array[position]
LIMIT 1), array_cte_id_array, recursive_keyset_cte.ci_builds_id_array[:position_query.position - 1] ||
next_cursor_values.id || recursive_keyset_cte.ci_builds_id_array[position_query.position + 1:],
recursive_keyset_cte.count + 1
FROM recursive_keyset_cte, LATERAL (
SELECT id, position
FROM UNNEST(ci_builds_id_array)
WITH ORDINALITY AS u (id, position)
WHERE id IS NOT NULL
ORDER BY 1 DESC
LIMIT 1) AS position_query, LATERAL (
SELECT "record"."id"
FROM (
VALUES (NULL)) AS nulls
LEFT JOIN (
SELECT "ci_builds"."id" AS id
FROM "ci_builds"
WHERE "ci_builds"."runner_id" = recursive_keyset_cte.array_cte_id_array[position]
AND ("ci_builds"."id" < recursive_keyset_cte.ci_builds_id_array[position])
ORDER BY "ci_builds"."id" DESC
LIMIT 1) record ON TRUE
LIMIT 1) AS next_cursor_values))
SELECT (records).*
FROM "recursive_keyset_cte" AS "ci_builds"
WHERE (count <> 0)) ci_builds
WHERE "ci_builds"."started_at" IS NOT NULL
AND "ci_builds"."runner_id" IS NOT NULL
LIMIT 100) builds
Execution plan
https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/14190/commands/49783
Aggregate (cost=555.99..556.00 rows=1 width=80) (actual time=7.218..7.224 rows=1 loops=1)
Buffers: shared hit=1628
I/O Timings: read=0.000 write=0.000
-> Limit (cost=550.24..553.75 rows=99 width=16) (actual time=1.728..7.112 rows=100 loops=1)
Buffers: shared hit=1618
I/O Timings: read=0.000 write=0.000
-> Subquery Scan on ci_builds (cost=550.24..553.75 rows=99 width=16) (actual time=1.726..7.095 rows=100 loops=1)
Buffers: shared hit=1618
I/O Timings: read=0.000 write=0.000
-> CTE Scan on recursive_keyset_cte ci_builds_1 (cost=550.24..552.51 rows=99 width=3967) (actual time=1.681..7.021 rows=100 loops=1)
Filter: (((ci_builds_1.records).started_at IS NOT NULL) AND ((ci_builds_1.records).runner_id IS NOT NULL) AND (ci_builds_1.count <> 0))
Rows Removed by Filter: 1
Buffers: shared hit=1618
I/O Timings: read=0.000 write=0.000
CTE array_cte
-> Index Scan using index_ci_runners_on_runner_type on public.ci_runners (cost=0.43..45.03 rows=54 width=4) (actual time=0.070..0.216 rows=73 loops=1)
Index Cond: (ci_runners.runner_type = 1)
Buffers: shared hit=78
I/O Timings: read=0.000 write=0.000
CTE recursive_keyset_cte
-> Recursive Union (cost=42.31..505.21 rows=101 width=104) (actual time=1.386..6.625 rows=101 loops=1)
Buffers: shared hit=1618
I/O Timings: read=0.000 write=0.000
-> Limit (cost=42.31..42.33 rows=1 width=104) (actual time=1.385..1.387 rows=1 loops=1)
Buffers: shared hit=505
I/O Timings: read=0.000 write=0.000
-> Subquery Scan on array_scope_lateral_query (cost=42.31..42.33 rows=1 width=104) (actual time=1.384..1.387 rows=1 loops=1)
Buffers: shared hit=505
I/O Timings: read=0.000 write=0.000
-> Aggregate (cost=42.31..42.32 rows=1 width=64) (actual time=1.384..1.386 rows=1 loops=1)
Buffers: shared hit=505
I/O Timings: read=0.000 write=0.000
-> Nested Loop (cost=0.70..42.04 rows=54 width=12) (actual time=0.197..1.335 rows=60 loops=1)
Buffers: shared hit=505
I/O Timings: read=0.000 write=0.000
-> CTE Scan on array_cte (cost=0.00..1.08 rows=54 width=4) (actual time=0.072..0.239 rows=73 loops=1)
Buffers: shared hit=78
I/O Timings: read=0.000 write=0.000
-> Subquery Scan on ci_builds_3 (cost=0.70..0.75 rows=1 width=8) (actual time=0.014..0.015 rows=1 loops=73)
Filter: (ci_builds_3.id IS NOT NULL)
Rows Removed by Filter: 0
Buffers: shared hit=427
I/O Timings: read=0.000 write=0.000
-> Limit (cost=0.70..0.74 rows=1 width=8) (actual time=0.014..0.014 rows=1 loops=73)
Buffers: shared hit=427
I/O Timings: read=0.000 write=0.000
-> Index Only Scan using index_ci_builds_on_runner_id_and_id_desc on public.ci_builds ci_builds_4 (cost=0.70..1522.52 rows=45431 width=8) (actual time=0.014..0.014 rows=1 loops=73)
Index Cond: (ci_builds_4.runner_id = array_cte.id)
Heap Fetches: 26
Buffers: shared hit=427
I/O Timings: read=0.000 write=0.000
-> Nested Loop (cost=0.86..46.09 rows=10 width=104) (actual time=0.051..0.051 rows=1 loops=100)
Buffers: shared hit=1113
I/O Timings: read=0.000 write=0.000
-> Nested Loop (cost=0.15..1.95 rows=10 width=80) (actual time=0.029..0.029 rows=1 loops=100)
Buffers: shared hit=3
I/O Timings: read=0.000 write=0.000
-> WorkTable Scan on recursive_keyset_cte (cost=0.00..0.20 rows=10 width=72) (actual time=0.000..0.000 rows=1 loops=100)
I/O Timings: read=0.000 write=0.000
-> Limit (cost=0.15..0.16 rows=1 width=16) (actual time=0.028..0.028 rows=1 loops=100)
Buffers: shared hit=3
I/O Timings: read=0.000 write=0.000
-> Sort (cost=0.15..0.18 rows=10 width=16) (actual time=0.028..0.028 rows=1 loops=100)
Sort Key: u.id DESC
Sort Method: top-N heapsort Memory: 25kB
Buffers: shared hit=3
I/O Timings: read=0.000 write=0.000
-> Function Scan on unnest u (cost=0.00..0.10 rows=10 width=16) (actual time=0.006..0.014 rows=60 loops=100)
Filter: (u.id IS NOT NULL)
Rows Removed by Filter: 0
I/O Timings: read=0.000 write=0.000
-> Limit (cost=0.71..0.78 rows=1 width=8) (actual time=0.012..0.012 rows=1 loops=100)
Buffers: shared hit=610
I/O Timings: read=0.000 write=0.000
-> Nested Loop Left Join (cost=0.71..0.78 rows=1 width=8) (actual time=0.011..0.011 rows=1 loops=100)
Buffers: shared hit=610
I/O Timings: read=0.000 write=0.000
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.000 rows=1 loops=100)
I/O Timings: read=0.000 write=0.000
-> Limit (cost=0.71..0.75 rows=1 width=8) (actual time=0.011..0.011 rows=1 loops=100)
Buffers: shared hit=610
I/O Timings: read=0.000 write=0.000
-> Index Only Scan using index_ci_builds_on_runner_id_and_id_desc on public.ci_builds ci_builds_5 (cost=0.71..547.05 rows=15144 width=8) (actual time=0.010..0.010 rows=1 loops=100)
Index Cond: ((ci_builds_5.runner_id = (recursive_keyset_cte.array_cte_id_array)[u."position"]) AND (ci_builds_5.id < (recursive_keyset_cte.ci_builds_id_array)[u."position"]))
Heap Fetches: 100
Buffers: shared hit=610
I/O Timings: read=0.000 write=0.000
SubPlan 2
-> Limit (cost=0.58..3.60 rows=1 width=1271) (actual time=0.006..0.006 rows=1 loops=100)
Buffers: shared hit=500
I/O Timings: read=0.000 write=0.000
-> Index Scan using index_ci_builds_on_id_partition_id_unique on public.ci_builds ci_builds_2 (cost=0.58..3.60 rows=1 width=1271) (actual time=0.006..0.006 rows=1 loops=100)
Index Cond: (ci_builds_2.id = (recursive_keyset_cte.ci_builds_id_array)[u."position"])
Buffers: shared hit=500
I/O Timings: read=0.000 write=0.000
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.