Store number of affected rows in metrics for batched background migrations
What does this MR do and why?
Update Gitlab::Database::BackgroundMigration::BatchMetrics and add #instrument_operation. This new method stores not only the execution time of each sub-batch but also the number of affected records, assuming the background migration returns that number.
This is needed so that automatic batch size optimization can take into account not only duration but also the amount of work done.
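As a rough illustration of the idea, the following is a minimal, self-contained sketch of a metrics collector that records both the duration and the returned row count of an instrumented block. The class name SketchBatchMetrics, the affected_rows reader, and the rows_per_second helper are hypothetical and chosen for this example only; the actual BatchMetrics implementation in this MR may differ.

```ruby
# Hypothetical stand-in for illustration only; not GitLab's BatchMetrics.
class SketchBatchMetrics
  attr_reader :timings, :affected_rows

  def initialize
    # Each label maps to an array with one entry per instrumented sub-batch.
    @timings = Hash.new { |hash, label| hash[label] = [] }
    @affected_rows = Hash.new { |hash, label| hash[label] = [] }
  end

  # Runs the block, records its duration under `label`, and records the
  # block's return value as a row count when it is an Integer
  # (ActiveRecord's update_all returns the number of affected rows).
  def instrument_operation(label)
    started_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = yield
    timings[label] << (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started_at)
    affected_rows[label] << result if result.is_a?(Integer)
    result
  end

  # Assumed helper: combines duration and work done into a single rate.
  def rows_per_second(label)
    elapsed = timings[label].sum
    return 0.0 if elapsed.zero?

    affected_rows[label].sum / elapsed
  end
end

metrics = SketchBatchMetrics.new

# Simulate five sub-batches whose blocks return a row count.
[2, 3, 2, 3, 0].each do |row_count|
  metrics.instrument_operation(:update_all) { row_count }
end

puts metrics.affected_rows[:update_all].inspect # prints [2, 3, 2, 3, 0]
puts metrics.timings[:update_all].size          # prints 5
```

A rate such as rows_per_second is one possible way an optimizer could weigh duration against work done; it is shown here only as an assumption about how the stored counts might be used.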
How to set up and validate locally
- Create a batched background migration like the following:
# frozen_string_literal: true

# lib/gitlab/background_migration/dummy_bbm.rb
module Gitlab
  module BackgroundMigration
    class DummyBbm < BaseJob
      include Gitlab::Database::DynamicModelHelpers

      def perform(start_id, end_id, batch_table, batch_column, sub_batch_size, pause_ms)
        parent_batch_relation = relation_scoped_to_range(batch_table, batch_column, start_id, end_id)

        parent_batch_relation.each_batch(column: batch_column, of: sub_batch_size) do |sub_batch|
          batch_metrics.instrument_operation(:update_all) do
            sub_batch.where('id % 2 = 0').update_all('id = id')
          end
        end
      end

      def batch_metrics
        @batch_metrics ||= Gitlab::Database::BackgroundMigration::BatchMetrics.new
      end

      private

      def relation_scoped_to_range(source_table, source_key_column, start_id, stop_id)
        define_batchable_model(source_table, connection: connection).where(source_key_column => start_id..stop_id)
      end
    end
  end
end
- In the Rails console, execute the following:
helpers = ActiveRecord::Migration.new.extend(Gitlab::Database::MigrationHelpers)
helpers.queue_batched_background_migration(
  'DummyBbm',
  :projects,
  :id,
  job_interval: 1,
  batch_size: 100,
  max_batch_size: 100,
  sub_batch_size: 5
)
active_migration = Gitlab::Database::BackgroundMigration::BatchedMigration.find 12 # Use the id returned from the previous command
Gitlab::Database::BackgroundMigration::BatchedMigrationRunner.new.run_entire_migration(active_migration)
pp active_migration.batched_jobs.map(&:metrics)
[{"timings"=>
   {"update_all"=>
     [0.00703599996631965,
      0.001476000004913658,
      0.0024839999969117343,
      0.0013969999854452908,
      0.0019730000058189034]},
  "cmd_tuples"=>{"update_all"=>[2, 3, 2, 3, 0]}}] # <- number of rows updated for each sub-batch
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #351786
Edited by Krasimir Angelov