Sample batched background migrations more quickly
What does this MR do and why?
Solves Batched background migration sampling times out... (gitlab-org/database-team/gitlab-com-database-testing#72 - closed) by changing how batches are chosen for sampling.
Previously batches were chosen by:
- Iterating over the entire table using the batching strategy (which always runs `each_batch`)
- Shuffling all the jobs
- Running jobs until a timeout
Instead, this strategy chooses jobs uniformly without iterating across the entire table.
It samples batches that start at fixed fractions of the way through the table: first at the beginning and the end, then at points found by repeatedly subdividing the gaps between the points already sampled.
By choosing the batches without scanning the table, we avoid a very costly operation that, for large tables, sometimes exceeded the 10-hour testing pipeline limit.
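The subdivision order can be sketched as follows. This is a minimal illustration, not the code in this MR: it assumes each gap is split in half and that the table has an integer primary key, and the method names `sampling_fractions` and `batch_start_id` are hypothetical.

```ruby
# Hypothetical helper (not the MR's actual code): enumerates fractions of the
# way through a table in the order 0, 1, 1/2, 1/4, 3/4, 1/8, 3/8, ...
def sampling_fractions
  return to_enum(:sampling_fractions) unless block_given?

  # Start with the beginning and the end of the table.
  yield 0r
  yield 1r

  denominator = 2
  loop do
    # Only odd numerators are new; even ones were visited at a coarser level.
    1.step(denominator - 1, 2) do |numerator|
      yield Rational(numerator, denominator)
    end
    denominator *= 2
  end
end

# Hypothetical helper: maps a fraction to a starting id within
# [min_id, max_id], assuming an integer primary key.
def batch_start_id(fraction, min_id, max_id)
  min_id + (fraction * (max_id - min_id)).floor
end

fractions = sampling_fractions.first(7)
starts = fractions.map { |fraction| batch_start_id(fraction, 1, 1001) }
```

Because the enumerator is lazy, a caller can keep taking start points until the sampling time budget runs out. The points stay spread across the whole id range however early the loop stops, and only the minimum and maximum id are needed up front.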
How to set up and validate locally
This new batching strategy was used in this report, since !95631 (merged) was running into timeouts: !95631 (comment 1069771729)
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.