Periodically reconcile ci_runner_versions
What does this MR do and why?
Describe in detail what your merge request does and why.
This MR adds a cron job that runs every hour to ensure that the new ci_runner_versions
table is correctly populated based on the distinct values of ci_runners.version
and the calculated values of Gitlab::Ci::RunnerUpgradeCheck#check_runner_upgrade_status
for each version.
In the future, we may want to react to changes in Ci::Runner#version
in order to react more quickly instead of a fixed hourly schedule.
Part of GraphQL: Expose filter on GitLab Runner upgrade... (#358406 - closed)
Related MRs
- Add cron_job for updating of ci_runner_versions (gitlab-org/charts/gitlab!2666 - merged)
- Add ci_runner_versions_reconciliation_worker se... (omnibus-gitlab!6198 - merged)
Screenshots or screen recordings
These are strongly recommended to assist reviewers and reduce the time to merge your change.
How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.
- Open the GDK console
- Run
Ci::Runners::ReconcileExistingRunnerVersionsService.new.execute
- You should see the results in the form of
{:total_inserted=>0, :total_updated=>0, :total_deleted=>0, :status=>:success}
. Make sure that the returned values make sense for the state of your instance. Running a second time should return 0 for all hashtotal_*
values.
Database plans
Check if runners exist (/app/models/concerns/bulk_insert_safe.rb:138)
SELECT 1 AS one
FROM "ci_runners"
LIMIT 1
Limit (cost=0.00..0.38 rows=1 width=4) (actual time=12.213..12.216 rows=1 loops=1)
Buffers: shared read=1 dirtied=1
I/O Timings: read=9.402 write=0.000
-> Seq Scan on public.ci_runners (cost=0.00..649728.87 rows=1704087 width=4) (actual time=12.210..12.211 rows=1 loops=1)
Buffers: shared read=1 dirtied=1
I/O Timings: read=9.402 write=0.000
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/10848/commands/38991
Getting distinct versions from ci_runners (/app/models/concerns/bulk_insert_safe.rb:157)
SELECT DISTINCT "ci_runners"."version"
FROM "ci_runners"
Unique (cost=0.43..44379.97 rows=726 width=7) (actual time=0.164..842.805 rows=1288 loops=1)
Buffers: shared hit=73065 read=8226
I/O Timings: read=213.883 write=0.000
-> Index Only Scan using index_ci_runners_on_version on public.ci_runners (cost=0.43..40520.25 rows=1543885 width=7) (actual time=0.163..558.817 rows=1543885 loops=1)
Heap Fetches: 108786
Buffers: shared hit=73065 read=8226
I/O Timings: read=213.883 write=0.000
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/10848/commands/38994
Insert new versions into ci_runner_versions (/app/models/concerns/bulk_insert_safe.rb:163)
INSERT INTO "ci_runner_versions" ("version", "status")
VALUES ('14.10.0', NULL), ('14.10.0~beta.35.g0cb1a1ef', NULL), ('14.10.1', NULL), ('14.11.0~beta.29.gd0c550e3', NULL), ('development version', NULL)
ON CONFLICT
DO NOTHING
RETURNING "version"
ModifyTable on public.ci_runner_versions (cost=0.00..0.06 rows=5 width=34) (actual time=6.747..6.893 rows=5 loops=1)
Buffers: shared hit=39 read=3 dirtied=7 written=4
I/O Timings: read=0.236 write=0.000
-> Values Scan on "*VALUES*" (cost=0.00..0.06 rows=5 width=34) (actual time=0.004..0.012 rows=5 loops=1)
I/O Timings: read=0.000 write=0.000
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/10848/commands/38999
Deleting stale versions (reconcile_existing_runner_versions_service.rb:47)
DELETE FROM "ci_runner_versions"
WHERE "ci_runner_versions"."version" NOT IN ( SELECT DISTINCT "ci_runners"."version"
FROM "ci_runners")
ModifyTable on public.ci_runner_versions (cost=44381.78..44433.88 rows=644 width=6) (actual time=671.158..671.161 rows=0 loops=1)
Buffers: shared hit=81300
I/O Timings: read=0.000 write=0.000
-> Seq Scan on public.ci_runner_versions (cost=44381.78..44433.88 rows=644 width=6) (actual time=671.156..671.158 rows=0 loops=1)
Filter: (NOT (hashed SubPlan 1))
Rows Removed by Filter: 1288
Buffers: shared hit=81300
I/O Timings: read=0.000 write=0.000
SubPlan 1
-> Unique (cost=0.43..44379.97 rows=726 width=7) (actual time=0.047..668.919 rows=1288 loops=1)
Buffers: shared hit=81291
I/O Timings: read=0.000 write=0.000
-> Index Only Scan using index_ci_runners_on_version on public.ci_runners (cost=0.43..40520.25 rows=1543885 width=7) (actual time=0.046..385.292 rows=1543885 loops=1)
Heap Fetches: 108786
Buffers: shared hit=81291
I/O Timings: read=0.000 write=0.000
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/10848/commands/39001
NOTE: This query contains a SeqScan, but the table of unique runner versions is bound to be a small table (< 2000 items in .com), so it should not be a problem.
Search for potential stale ci_runner_versions rows - step 1 (/app/models/concerns/each_batch.rb:62)
SELECT "ci_runner_versions"."version"
FROM "ci_runner_versions"
WHERE ("ci_runner_versions"."status" IN (1, 2, 0)
OR "ci_runner_versions"."status" IS NULL)
ORDER BY "ci_runner_versions"."version" ASC
LIMIT 1
Limit (cost=0.28..0.33 rows=1 width=21) (actual time=0.037..0.038 rows=1 loops=1)
Buffers: shared hit=3
I/O Timings: read=0.000 write=0.000
-> Index Scan using index_ci_runner_versions_on_version on public.ci_runner_versions (cost=0.28..70.66 rows=1288 width=21) (actual time=0.035..0.035 rows=1 loops=1)
Filter: ((ci_runner_versions.status = ANY ('{1,2,0}'::integer[])) OR (ci_runner_versions.status IS NULL))
Rows Removed by Filter: 0
Buffers: shared hit=3
I/O Timings: read=0.000 write=0.000
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/10848/commands/39002
Search for potential stale ci_runner_versions rows - step 2 (first window of 2000 versions, /app/models/concerns/each_batch.rb:81)
SELECT "ci_runner_versions"."version"
FROM "ci_runner_versions"
WHERE ("ci_runner_versions"."status" IN (1, 2, 0)
OR "ci_runner_versions"."status" IS NULL)
AND "ci_runner_versions"."version" >= '14.10.1'
ORDER BY "ci_runner_versions"."version" ASC
LIMIT 1 OFFSET 2000
Limit (cost=14.52..14.58 rows=1 width=21) (actual time=0.210..0.211 rows=0 loops=1)
Buffers: shared hit=14
I/O Timings: read=0.000 write=0.000
-> Index Scan using index_ci_runner_versions_on_version on public.ci_runner_versions (cost=0.28..14.52 rows=256 width=21) (actual time=0.078..0.191 rows=256 loops=1)
Index Cond: (ci_runner_versions.version >= '14.10.1'::text)
Filter: ((ci_runner_versions.status = ANY ('{1,2,0}'::integer[])) OR (ci_runner_versions.status IS NULL))
Rows Removed by Filter: 0
Buffers: shared hit=14
I/O Timings: read=0.000 write=0.000
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/10848/commands/39003
Search for potential stale ci_runner_versions rows - step 3 (reconcile_existing_runner_versions_service.rb:55)
SELECT "ci_runner_versions".*
FROM "ci_runner_versions"
WHERE ("ci_runner_versions"."status" IN (1, 2, 0)
OR "ci_runner_versions"."status" IS NULL)
AND "ci_runner_versions"."version" >= '14.10.1'
Index Scan using index_ci_runner_versions_on_version on public.ci_runner_versions (cost=0.28..14.52 rows=256 width=23) (actual time=0.086..0.226 rows=256 loops=1)
Index Cond: (ci_runner_versions.version >= '14.10.1'::text)
Filter: ((ci_runner_versions.status = ANY ('{1,2,0}'::integer[])) OR (ci_runner_versions.status IS NULL))
Rows Removed by Filter: 0
Buffers: shared hit=14
I/O Timings: read=0.000 write=0.000
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/10848/commands/39004
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.