Nullify duplicate runner authentication tokens
What does this MR do and why?
Describe in detail what your merge request does and why.
This MR adds background migrations to ensure that there are no duplicate runner tokens in the ci_runners
table. The goal is to later follow up with adding unique indices for two columns in the ci_runners
table: token
and token_encrypted
. While we do have logic in the application to prevent new duplicates, there is a chance that existing tables have duplicates (as seen in the .com database). It is similar to a previous MR that performed a similar action on the projects
table (runners_token*
columns).
This MR performs the following actions for each of the columns:
- Detect and nullify any duplicate runner registration tokens. This will cause the affected runners to become unusable. This should not be a problem since the only situation that could have explained this scenario is manual handling of the database rows a long time ago. These runners don't seem to be in use anymore;
Screenshots or screen recordings
These are strongly recommended to assist reviewers and reduce the time to merge your change.
description | screenshot |
---|---|
creating duplicates and running migration | |
batched_background_migration_jobs table after migration |
How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.
-
On the GDK Postgres console connected to the
ci
database (gdk psql -d gitlabhq_development_ci
), perform the following commands:SELECT id, token FROM "ci_runners" ORDER BY id LIMIT 10; INSERT INTO ci_runners (runner_type, token) VALUES (1, 'duplicate token'); INSERT INTO ci_runners (runner_type, token) VALUES (1, 'duplicate token');
-
Now that we have a couple of records with duplicate tokens, let's run the migration from this MR:
bundle exec rails db:migrate
in
log/application.log
you should be able to confirm that the migrations went through:17027:2022-09-23T11:37:58.405Z: {:message=>"BatchedJob transition", :batched_job_id=>12, :previous_state=>:pending, :new_state=>:running, :batched_migration_id=>12, :job_class_name=>"ResetDuplicateCiRunnersTokenEncryptedValues", :job_arguments=>[], :exception_class=>nil, :exception_message=>nil} 17028:2022-09-23T11:37:58.520Z: {:message=>"BatchedJob transition", :batched_job_id=>12, :previous_state=>:running, :new_state=>:succeeded, :batched_migration_id=>12, :job_class_name=>"ResetDuplicateCiRunnersTokenEncryptedValues", :job_arguments=>[], :exception_class=>nil, :exception_message=>nil} 17280:2022-09-23T11:42:03.052Z: {:message=>"BatchedJob transition", :batched_job_id=>14, :previous_state=>:pending, :new_state=>:running, :batched_migration_id=>14, :job_class_name=>"ResetDuplicateCiRunnersTokenEncryptedValues", :job_arguments=>[], :exception_class=>nil, :exception_message=>nil} 17281:2022-09-23T11:42:03.166Z: {:message=>"BatchedJob transition", :batched_job_id=>14, :previous_state=>:running, :new_state=>:succeeded, :batched_migration_id=>14, :job_class_name=>"ResetDuplicateCiRunnersTokenEncryptedValues", :job_arguments=>[], :exception_class=>nil, :exception_message=>nil}
-
Once the migration is complete, the new runners should have their tokens nullified. To verify:
SELECT id, token FROM "ci_runners" ORDER BY id LIMIT 10; SELECT * FROM batched_background_migration_jobs;
Migration output
These migrations are expected to take about 30 hours to run (based on ~1800 batches given 17.8M records currently in .com and a 1-minute delay between batches), most of which is spent waiting. Without the wait times, it would be less than one hour to run the migrations.
bin/rails db:migrate
main: == 20220922143143 ScheduleResetDuplicateCiRunnerTokenValues: migrating ========
main: -- The migration is skipped since it modifies the schemas: [:gitlab_ci].
main: -- This database can only apply migrations in one of the following schemas: [:gitlab_main, :gitlab_shared, :gitlab_internal].
main: == 20220922143143 ScheduleResetDuplicateCiRunnerTokenValues: migrated (0.0001s)
main: == 20220922143634 ScheduleResetDuplicateCiRunnerTokenEncryptedValues: migrating
main: -- The migration is skipped since it modifies the schemas: [:gitlab_ci].
main: -- This database can only apply migrations in one of the following schemas: [:gitlab_main, :gitlab_shared, :gitlab_internal].
main: == 20220922143634 ScheduleResetDuplicateCiRunnerTokenEncryptedValues: migrated (0.0000s)
ci: == 20220922143143 ScheduleResetDuplicateCiRunnerTokenValues: migrating ========
ci: == 20220922143143 ScheduleResetDuplicateCiRunnerTokenValues: migrated (0.0292s)
ci: == 20220922143634 ScheduleResetDuplicateCiRunnerTokenEncryptedValues: migrating
ci: == 20220922143634 ScheduleResetDuplicateCiRunnerTokenEncryptedValues: migrated (0.0136s)
bin/rails db:rollback RAILS_ENV=development STEP=2
main: == 20220922143634 ScheduleResetDuplicateCiRunnerTokenEncryptedValues: reverting
main: -- The migration is skipped since it modifies the schemas: [:gitlab_ci].
main: -- This database can only apply migrations in one of the following schemas: [:gitlab_main, :gitlab_shared, :gitlab_internal].
main: == 20220922143634 ScheduleResetDuplicateCiRunnerTokenEncryptedValues: reverted (0.0001s)
ci: == 20220922143634 ScheduleResetDuplicateCiRunnerTokenEncryptedValues: reverting
ci: == 20220922143634 ScheduleResetDuplicateCiRunnerTokenEncryptedValues: reverted (0.0196s)
main: == 20220922143143 ScheduleResetDuplicateCiRunnerTokenValues: reverting ========
main: -- The migration is skipped since it modifies the schemas: [:gitlab_ci].
main: -- This database can only apply migrations in one of the following schemas: [:gitlab_main, :gitlab_shared, :gitlab_internal].
main: == 20220922143143 ScheduleResetDuplicateCiRunnerTokenValues: reverted (0.0001s)
ci: == 20220922143143 ScheduleResetDuplicateCiRunnerTokenValues: reverting ========
ci: == 20220922143143 ScheduleResetDuplicateCiRunnerTokenValues: reverted (0.0228s)
Database query plans
https://gitlab.com/gitlab-org/gitlab/blob/pedropombeiro%2F375140%2F1-nullify-duplicate-tokens/lib/gitlab/background_migration/reset_duplicate_ci_runners_token_values.rb#L17-21
Job link: https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/12487/commands/44123
SELECT "ci_runners"."token"
FROM "ci_runners"
WHERE "ci_runners"."token" IN ( SELECT DISTINCT "ci_runners"."token"
FROM "ci_runners"
WHERE "ci_runners"."id" BETWEEN 1 AND 2000
AND "ci_runners"."id" >= 1
AND "ci_runners"."id" < 2000)
GROUP BY "ci_runners"."token"
HAVING (COUNT(*) > 1)
Query plan:
HashAggregate (cost=2661.16..2670.73 rows=255 width=27) (actual time=556.107..556.109 rows=0 loops=1)
Group Key: ci_runners.token
Filter: (count(*) > 1)
Rows Removed by Filter: 654
Buffers: shared hit=1741 read=707 dirtied=107
I/O Timings: read=533.858 write=0.000
-> Nested Loop (cost=1149.89..2657.34 rows=765 width=27) (actual time=121.841..554.178 rows=654 loops=1)
Buffers: shared hit=1741 read=707 dirtied=107
I/O Timings: read=533.858 write=0.000
-> HashAggregate (cost=1149.46..1157.11 rows=765 width=27) (actual time=118.307..118.913 rows=654 loops=1)
Group Key: ci_runners_1.token
Buffers: shared hit=185 read=142 dirtied=107
I/O Timings: read=109.985 write=0.000
-> Index Scan using ci_runners_pkey on public.ci_runners ci_runners_1 (cost=0.43..1147.55 rows=766 width=27) (actual time=3.695..117.538 rows=654 loops=1)
Index Cond: ((ci_runners_1.id >= 1) AND (ci_runners_1.id <= 2000) AND (ci_runners_1.id >= 1) AND (ci_runners_1.id < 2000))
Buffers: shared hit=185 read=142 dirtied=107
I/O Timings: read=109.985 write=0.000
-> Index Only Scan using index_ci_runners_on_token on public.ci_runners (cost=0.43..1.94 rows=1 width=27) (actual time=0.663..0.664 rows=1 loops=654)
Index Cond: (ci_runners.token = (ci_runners_1.token)::text)
Heap Fetches: 158
Buffers: shared hit=1556 read=565
I/O Timings: read=423.873 write=0.000
https://gitlab.com/gitlab-org/gitlab/blob/pedropombeiro%2F375140%2F1-nullify-duplicate-tokens/lib/gitlab/background_migration/reset_duplicate_ci_runners_token_encrypted_values.rb#L17-21
Job link: https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/12487/commands/44124
SELECT "ci_runners"."token_encrypted"
FROM "ci_runners"
WHERE "ci_runners"."token_encrypted" IN ( SELECT DISTINCT "ci_runners"."token_encrypted"
FROM "ci_runners"
WHERE "ci_runners"."id" BETWEEN 1 AND 2000
AND "ci_runners"."id" >= 1
AND "ci_runners"."id" < 2000)
GROUP BY "ci_runners"."token_encrypted"
HAVING (COUNT(*) > 1)
Query plan:
HashAggregate (cost=2772.39..2781.97 rows=255 width=56) (actual time=585.707..585.710 rows=0 loops=1)
Group Key: ci_runners.token_encrypted
Filter: (count(*) > 1)
Rows Removed by Filter: 654
Buffers: shared hit=2411 read=703
I/O Timings: read=567.887 write=0.000
-> Nested Loop (cost=1150.02..2768.56 rows=766 width=56) (actual time=7.371..583.747 rows=654 loops=1)
Buffers: shared hit=2411 read=703
I/O Timings: read=567.887 write=0.000
-> HashAggregate (cost=1149.46..1157.12 rows=766 width=56) (actual time=1.077..1.718 rows=654 loops=1)
Group Key: ci_runners_1.token_encrypted
Buffers: shared hit=327
I/O Timings: read=0.000 write=0.000
-> Index Scan using ci_runners_pkey on public.ci_runners ci_runners_1 (cost=0.43..1147.55 rows=766 width=56) (actual time=0.041..0.754 rows=654 loops=1)
Index Cond: ((ci_runners_1.id >= 1) AND (ci_runners_1.id <= 2000) AND (ci_runners_1.id >= 1) AND (ci_runners_1.id < 2000))
Buffers: shared hit=327
I/O Timings: read=0.000 write=0.000
-> Index Only Scan using index_ci_runners_on_token_encrypted on public.ci_runners (cost=0.55..2.08 rows=1 width=56) (actual time=0.884..0.888 rows=1 loops=654)
Index Cond: (ci_runners.token_encrypted = (ci_runners_1.token_encrypted)::text)
Heap Fetches: 158
Buffers: shared hit=2084 read=703
I/O Timings: read=567.887 write=0.000
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Part of Replace runner authentication token indices wit... (#375140 - closed) https://gitlab.com/gitlab-org/security/gitlab/-/issues/616+