Add background migration to backfill casted_cluster_agent_id
What does this MR do and why?
In GraphQL filter for vulnerabilities by container... (#358763 - closed) we want to add filter for vulnerabilities by container images with vulnerabilities, to do so we need to change the column type that was erroneously set to text
instead of bigint
. This MR is changing that. In this MR we are adding background migration to handle backfilling casted_cluster_agent_id
with data from cluster_agent_id
column. The last MR will remove old column and rename casted_cluster_agent_id
to cluster_agent_id
. This is second MR of planned 3 (first was Change column type of vulnerability_reads.clust... (!88333 - merged)).
⋊> env VERBOSE=true bundle exec rake db:migrate:up:main VERSION=20220525201022
main: == 20220525201022 AddTemporaryIndexForVulnerabilityReadsClusterAgentIdMigration: migrating
main: -- transaction_open?()
main: -> 0.0000s
main: -- index_exists?(:vulnerability_reads, [:id, :report_type], {:name=>"tmp_index_vulnerability_reads_on_report_type", :algorithm=>:concurrently})
main: -> 0.0148s
main: -- execute("SET statement_timeout TO 0")
main: -> 0.0005s
main: -- add_index(:vulnerability_reads, [:id, :report_type], {:name=>"tmp_index_vulnerability_reads_on_report_type", :algorithm=>:concurrently})
main: -> 0.0266s
main: -- execute("RESET statement_timeout")
main: -> 0.0006s
main: -- transaction_open?()
main: -> 0.0000s
main: -- index_exists?(:cluster_agents, [:id, :project_id], {:name=>"tmp_index_cluster_agents_on_project_id", :algorithm=>:concurrently})
main: -> 0.0042s
main: -- add_index(:cluster_agents, [:id, :project_id], {:name=>"tmp_index_cluster_agents_on_project_id", :algorithm=>:concurrently})
main: -> 0.0041s
main: == 20220525201022 AddTemporaryIndexForVulnerabilityReadsClusterAgentIdMigration: migrated (0.0696s)
⋊> env VERBOSE=true bundle exec rake db:migrate:down:main VERSION=20220525201022
main: == 20220525201022 AddTemporaryIndexForVulnerabilityReadsClusterAgentIdMigration: reverting
main: -- transaction_open?()
main: -> 0.0000s
main: -- indexes(:vulnerability_reads)
main: -> 0.0175s
main: -- execute("SET statement_timeout TO 0")
main: -> 0.0006s
main: -- remove_index(:vulnerability_reads, {:algorithm=>:concurrently, :name=>"tmp_index_vulnerability_reads_on_report_type"})
main: -> 0.0050s
main: -- execute("RESET statement_timeout")
main: -> 0.0008s
main: -- transaction_open?()
main: -> 0.0000s
main: -- indexes(:cluster_agents)
main: -> 0.0055s
main: -- remove_index(:cluster_agents, {:algorithm=>:concurrently, :name=>"tmp_index_cluster_agents_on_project_id"})
main: -> 0.0025s
main: == 20220525201022 AddTemporaryIndexForVulnerabilityReadsClusterAgentIdMigration: reverted (0.0482s)
Queries
explain SELECT MIN(vulnerability_reads.id), MAX(vulnerability_reads.id) FROM vulnerability_reads;
Time: 17ms
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/10292/commands/36465
We have around ~14 millions records in vulnerability_reads
table (https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/10292/commands/36430), however we will perform updates for ~300000 of them (https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/10292/commands/36432 and https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/10292/commands/36475):
select vulnerability_reads.id from vulnerability_reads INNER JOIN cluster_agents ON CAST(vulnerability_reads.cluster_agent_id AS bigint) = cluster_agents.id AND vulnerability_reads.project_id = cluster_agents.project_id INNER JOIN projects ON vulnerability_reads.project_id = projects.id WHERE vulnerability_reads.report_type = 7;
Time: 9.225 s
- planning: 18.630 ms
- execution: 9.207 s
- I/O read: 6.139 s
- I/O write: 0.000 ms
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/10292/commands/36432
We have batches of size 2_500
=> 14_000_000 / 2_500 = 5600 loops; we have set interval delay for 2 minutes, so we have 5600 * 2.minutes = 11_200.minutes ~ 187.hours
.
Each batch will perform update in subbatches of 100:
explain select vulnerability_reads.id from vulnerability_reads INNER JOIN cluster_agents ON CAST(vulnerability_reads.cluster_agent_id AS bigint) = cluster_agents.id AND vulnerability_reads.project_id = cluster_agents.project_id INNER JOIN projects ON vulnerability_reads.project_id = projects.id WHERE vulnerability_reads.report_type = 7 AND vulnerability_reads.id BETWEEN 13750000 AND 13752000;
Time: 323.135 ms
- planning: 22.626 ms
- execution: 300.509 ms
- I/O read: 298.628 ms
- I/O write: 0.000 ms
https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/11077/commands/39693
Update query for subbatch
explain update vulnerability_reads SET casted_cluster_agent_id = CAST(vulnerability_reads.cluster_agent_id AS bigint) FROM cluster_agents WHERE CAST(vulnerability_reads.cluster_agent_id AS bigint) = cluster_agents.id AND vulnerability_reads.project_id = cluster_agents.project_id AND vulnerability_reads.report_type = 7 AND vulnerability_reads.id BETWEEN 13740000 AND 13740100;
Summary:
Time: 62.326 ms
- planning: 46.800 ms
- execution: 15.526 ms
- I/O read: 13.950 ms
- I/O write: 0.000 ms
https://console.postgres.ai/gitlab/gitlab-production-tunnel-pg12/sessions/11078/commands/39694
We are performing 20 subbatches for single batch (2_500 / 100 = 25
), each batch should finish within 2 minutes, so max time for single update is 2.minutes / 25 = 120.seconds / 25 = 4.8.seconds
, with additional indexes we are way below that value.
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Related to GraphQL filter for vulnerabilities by container... (#358763 - closed)