Nullify duplicate project runners tokens
## What does this MR do and why?
This MR adds background migrations to ensure that there are no duplicate runner registration tokens in the `projects` table. The goal is to follow up later by adding unique indices on two columns of the `projects` table: `runners_token` and `runners_token_encrypted`. While the application already has logic to prevent new duplicates, existing tables may still contain duplicates (as seen in the .com database).
This MR performs the following action for each of the two columns:

- Detect and nullify any duplicate runner registration tokens. Nullifying a token causes a new registration token to be generated the next time one is needed. Given the timing of https://gitlab.com/gitlab-com/gl-security/security-operations/sirt/operations/-/issues/2010 and the subsequent decision to rotate all registration tokens, the additional impact on customers should be minimal to none. The per-batch logic is sketched below.
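In outline, each scheduled job does something like the following. This is a minimal sketch only, assuming an ActiveRecord model over `projects`; the method name is illustrative (not the actual `Gitlab::BackgroundMigration` class), and handling of duplicates that span batch windows is glossed over:

```ruby
# Illustrative sketch, not the real background migration class.
# Nullifies every token value that occurs more than once in the batch window;
# a NULL token is regenerated the next time it is requested.
def nullify_duplicates(start_id, stop_id, column)
  batch = Project.where(id: start_id..stop_id).where.not(column => nil)

  # Mirrors the GROUP BY ... HAVING (COUNT(*) > 1) query in the plans below
  duplicate_values = batch.group(column).having('COUNT(*) > 1').pluck(column)
  return if duplicate_values.empty?

  # Clear every row holding a duplicated value so all affected projects
  # regenerate a fresh token on next use
  Project.where(column => duplicate_values).update_all(column => nil)
end
```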
## Screenshots or screen recordings
## How to set up and validate locally
### Migration output
These migrations are expected to take about 30 hours to run: the ~17.8M relevant records currently on .com at 10,000 records per batch yield roughly 1,800 batches, and with a 1-minute delay between batches that works out to about 30 hours, most of which is spent waiting. Without the wait times, the migrations themselves would take less than an hour.
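For context, the two scheduling migrations below follow GitLab's usual shape for queueing batched background jobs. This is a hedged sketch inferred from the output (only the class and index names come from the log; the constants and the batched relation are illustrative):

```ruby
# Sketch inferred from the migration output below; not the exact source.
class ScheduleResetDuplicateCiRunnersTokenEncryptedValuesOnProjects < Gitlab::Database::Migration[1.0]
  MIGRATION = 'ResetDuplicateCiRunnersTokenEncryptedValuesOnProjects'
  INDEX_NAME = 'tmp_index_projects_on_id_and_runners_token_encrypted'
  BATCH_SIZE = 10_000
  DELAY_INTERVAL = 2.minutes # the 120-second interval shown in the output

  disable_ddl_transaction!

  def up
    # Temporary index so the batch-range queries can use index-only scans
    # (see the query plans further down)
    add_concurrent_index :projects, [:id, :runners_token_encrypted], name: INDEX_NAME

    queue_background_migration_jobs_by_range_at_intervals(
      define_batchable_model('projects').where.not(runners_token_encrypted: nil),
      MIGRATION,
      DELAY_INTERVAL,
      batch_size: BATCH_SIZE,
      track_jobs: true
    )
  end

  def down
    # No-op: nullified tokens are regenerated on demand and cannot be restored,
    # which matches the instant "reverted (0.0000s)" in the rollback output
  end
end
```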
```shell
bin/rails db:migrate
== 20220308115219 ScheduleResetDuplicateCiRunnersTokenEncryptedValuesOnProjects: migrating
-- transaction_open?()
-> 0.0000s
-- index_exists?("projects", [:id, :runners_token_encrypted], {:unique=>false, :name=>"tmp_index_projects_on_id_and_runners_token_encrypted", :algorithm=>:concurrently})
-> 0.0254s
-- execute("SET statement_timeout TO 0")
-> 0.0007s
-- add_index("projects", [:id, :runners_token_encrypted], {:unique=>false, :name=>"tmp_index_projects_on_id_and_runners_token_encrypted", :algorithm=>:concurrently})
-> 0.0047s
-- execute("RESET statement_timeout")
-> 0.0007s
-- Scheduled 1 ResetDuplicateCiRunnersTokenEncryptedValuesOnProjects jobs with a maximum of 10000 records per batch and an interval of 120 seconds.
The migration is expected to take at least 120 seconds. Expect all jobs to have completed after 2022-03-14 14:43:17 UTC.
-- transaction_open?()
-> 0.0000s
-- index_exists?("projects", {:name=>"tmp_index_projects_on_id_and_runners_token_encrypted"}, {:algorithm=>:concurrently})
-> 0.0215s
== 20220308115219 ScheduleResetDuplicateCiRunnersTokenEncryptedValuesOnProjects: migrated (0.2702s)
== 20220308115502 ScheduleResetDuplicateCiRunnersTokenValuesOnProjects: migrating
-- transaction_open?()
-> 0.0000s
-- index_exists?("projects", [:id, :runners_token], {:unique=>false, :name=>"tmp_index_projects_on_id_and_runners_token", :algorithm=>:concurrently})
-> 0.0207s
-- add_index("projects", [:id, :runners_token], {:unique=>false, :name=>"tmp_index_projects_on_id_and_runners_token", :algorithm=>:concurrently})
-> 0.0036s
-- Scheduled 0 ResetDuplicateCiRunnersTokenValuesOnProjects jobs with a maximum of 10000 records per batch and an interval of 120 seconds.
The migration is expected to take at least 0 seconds. Expect all jobs to have completed after 2022-03-14 14:41:17 UTC.
-- transaction_open?()
-> 0.0000s
-- index_exists?("projects", {:name=>"tmp_index_projects_on_id_and_runners_token"}, {:algorithm=>:concurrently})
-> 0.0211s
== 20220308115502 ScheduleResetDuplicateCiRunnersTokenValuesOnProjects: migrated (0.0565s)
```

```shell
bin/rails db:rollback RAILS_ENV=development STEP=2
== 20220308115502 ScheduleResetDuplicateCiRunnersTokenValuesOnProjects: reverting
== 20220308115502 ScheduleResetDuplicateCiRunnersTokenValuesOnProjects: reverted (0.0000s)
== 20220308115219 ScheduleResetDuplicateCiRunnersTokenEncryptedValuesOnProjects: reverting
== 20220308115219 ScheduleResetDuplicateCiRunnersTokenEncryptedValuesOnProjects: reverted (0.0000s)
```
### Database query plans
#### `runners_token_encrypted`
##### Fetch initial record
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/9157/commands/32465
```sql
SELECT
  "projects"."id"
FROM
  "projects"
WHERE
  "projects"."runners_token_encrypted" IS NOT NULL
ORDER BY
  "projects".id ASC
LIMIT 1
```
```
Limit (cost=0.56..0.60 rows=1 width=4) (actual time=0.331..0.332 rows=1 loops=1)
  Buffers: shared hit=1 read=4
  I/O Timings: read=0.243 write=0.000
  -> Index Only Scan using tmp_index_projects_on_id_and_runners_token_encrypted on public.projects (cost=0.56..621254.58 rows=17816628 width=4) (actual time=0.329..0.329 rows=1 loops=1)
       Heap Fetches: 0
       Buffers: shared hit=1 read=4
       I/O Timings: read=0.243 write=0.000
```
##### Fetch end of batch
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/9157/commands/32466
```sql
SELECT
  "projects".id
FROM
  "projects"
WHERE
  "projects".runners_token_encrypted IS NOT NULL
  AND "projects".id >= 1
ORDER BY
  "projects".id ASC
LIMIT 1 OFFSET 10000
```
```
Limit (cost=374.26..374.29 rows=1 width=4) (actual time=41.750..41.752 rows=1 loops=1)
  Buffers: shared hit=9360 read=562 dirtied=174
  I/O Timings: read=34.669 write=0.000
  -> Index Only Scan using tmp_index_projects_on_id_and_runners_token_encrypted on public.projects (cost=0.56..665796.15 rows=17816628 width=4) (actual time=0.024..41.118 rows=10001 loops=1)
       Index Cond: (projects.id >= 1)
       Heap Fetches: 464
       Buffers: shared hit=9360 read=562 dirtied=174
       I/O Timings: read=34.669 write=0.000
```
##### Fetch records in window
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/9157/commands/32468
```sql
SELECT
  "projects".id
FROM
  "projects"
WHERE
  "projects".runners_token_encrypted IS NOT NULL
  AND "projects".id >= 1
  AND "projects".id < 116681
```
```
Index Only Scan using tmp_index_projects_on_id_and_runners_token_encrypted on public.projects (cost=0.56..5500.76 rows=51725 width=4) (actual time=0.029..4.774 rows=10000 loops=1)
  Index Cond: ((projects.id >= 1) AND (projects.id < 116681))
  Heap Fetches: 464
  Buffers: shared hit=9921
  I/O Timings: read=0.000 write=0.000
```
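Taken together, the three queries above are the standard keyset-batching pattern (as in `EachBatch`): find the first relevant id, find the id 10,000 rows ahead, then operate on the half-open window. A rough reconstruction of the loop, with illustrative names:

```ruby
# Illustrative reconstruction of the batching loop behind the three plans above.
scope = Project.where.not(runners_token_encrypted: nil)

# "Fetch initial record"
start_id = scope.order(:id).limit(1).pluck(:id).first

while start_id
  # "Fetch end of batch": first id beyond the next 10,000 rows (nil on the last batch)
  stop_id = scope.where('id >= ?', start_id).order(:id).offset(10_000).limit(1).pluck(:id).first

  # "Fetch records in window": the half-open range [start_id, stop_id)
  window = scope.where('id >= ?', start_id)
  window = window.where('id < ?', stop_id) if stop_id

  process(window) # e.g. the duplicate-token query below runs over these ids

  start_id = stop_id
end
```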
##### Find the duplicate tokens from the batch
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/9157/commands/32473
```sql
SELECT
  "projects"."id"
FROM
  "projects"
WHERE
  "projects"."id" IN (<10000 tokens>)
GROUP BY
  "projects"."id"
HAVING (COUNT(*) > 1)
```
```
Aggregate (cost=0.56..21079.67 rows=3333 width=4) (actual time=134.946..134.948 rows=0 loops=1)
  Group Key: projects.id
  Filter: (count(*) > 1)
  Rows Removed by Filter: 10000
  Buffers: shared hit=49623 read=201 dirtied=7
  I/O Timings: read=101.612 write=0.000
  -> Index Only Scan using projects_pkey on public.projects (cost=0.56..20904.67 rows=10000 width=4) (actual time=13.279..129.595 rows=10000 loops=1)
       Index Cond: (projects.id = ANY ('{...}'::integer[]))
       Heap Fetches: 472
       Buffers: shared hit=49623 read=201 dirtied=7
       I/O Timings: read=101.612 write=0.000
```
The case for the non-encrypted `runners_token` column is the same, except that the number of rows containing values on .com is much smaller (193,688 for `runners_token` vs. 17,597,284 for `runners_token_encrypted`).
## MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- [x] I have evaluated the MR acceptance checklist for this MR.
Part of #354027