Improve background migrations for primary key conversions
This is an issue that spun out of the work of #288004 (closed), although there has been a long-standing desire to improve background migrations.
Background
The first primary key targeted for conversion was `events`. Due to the large size of the table, the current estimate for the background migration to complete is ~6 months. This estimate is based on timings from database-lab, where the limiting factor is the number of records that can be updated per batch. Due to differences between database-lab and production hardware, we feel confident that the batch size can be increased for production, but this is difficult to verify. By improving background migrations, we can solve this problem without investing a lot of manual effort, and lay the groundwork for a better system.
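To make the relationship between batch size, job interval, and total runtime concrete, here is a back-of-envelope calculation. All numbers are hypothetical and chosen only for illustration, not actual production figures:

```ruby
# Back-of-envelope migration duration, driven by batch size and job interval.
# All numbers below are hypothetical, for illustration only.
rows       = 1_000_000_000 # rows to migrate
batch_size = 10_000        # rows updated per background job
interval   = 120           # seconds between scheduled jobs

jobs = (rows.to_f / batch_size).ceil
days = (jobs * interval) / 86_400.0

puts format('%d jobs, ~%.0f days', jobs, days)
# => 100000 jobs, ~139 days
```

Because runtime scales inversely with batch size, being able to raise the batch size on a running migration (once production monitoring shows headroom) directly shortens the multi-month estimate.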
There are several drawbacks to the background migration approach:
- Scheduling migrations on large tables is a heavy process. It requires batching over the entire table in a migration, and can result in tens or possibly even hundreds of thousands of jobs being enqueued, with many not executing for weeks.
- Migration execution is completely static. Once the jobs are enqueued, they are frozen. There is no way to reconfigure, pause, or abort the migration.
- Most migrations provide no insight on their execution. While we did add some rudimentary tracking with partitioning, we can improve on this.
Goals
For the primary key conversion, we have to migrate some of the largest tables in the database, and any tables with foreign keys referencing those columns. We can't easily reason about how these migrations will perform, and once kicked off can run for a long period of time. The two biggest requirements for these conversion migrations are:
- Allow migrations to be configurable. We should be able to adjust the batch sizes of migrations on the fly, based on monitoring of the production environment.
- Allow migrations to be parallelizable. This means that not only can multiple background migrations be run in parallel, but also a single migration can be broken up and run in parallel as well. This is a necessity given the size of the tables that need to be processed in order to convert all columns.
But this also requires changes that will improve the efficiency of scheduling and provide better tracing over in-progress migrations.
Design
The approach we discussed:
- Introduce a new table which tracks background migrations. This would include details like:
  - migration class name
  - details for batching: table, min/max values for the migration, batch size
  - delay interval between jobs
  - status
- Build on the existing job tracking table, adding details like:
  - job start/stop id
  - tracking for number of attempts
- At the time the migration is "enqueued", rather than walking the table and enqueueing each job, simply create the record(s) in the migration tracking table.
- Add a new `sidekiq` cron job which periodically scans the above table and enqueues necessary jobs based on their interval.
- The enqueued job calculates its own batch range based on previous executions and the currently configured batch size, executes, and updates the status in the database when complete. We can also use an `ExclusiveLease` here to ensure we only have one active job at a time.
- Failed jobs can be automatically retried N times based on their database status, to keep consistent behavior with current `sidekiq` retries.
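The batching behavior described above can be sketched in plain Ruby. The names (`TrackedMigration`, `next_batch`) and record shape are hypothetical; a real implementation would use ActiveRecord models and a `sidekiq` cron worker:

```ruby
# Minimal in-memory sketch of self-batching from a tracking record.
# Each job derives its own range from the previous execution, so nothing
# has to be enqueued up front and batch_size can change mid-migration.
TrackedMigration = Struct.new(
  :job_class, :table, :min_id, :max_id, :batch_size, :interval, :status,
  :next_min_id, keyword_init: true
)

def next_batch(migration)
  return nil unless migration.status == :active

  start_id = migration.next_min_id || migration.min_id
  return nil if start_id > migration.max_id

  stop_id = [start_id + migration.batch_size - 1, migration.max_id].min
  migration.next_min_id = stop_id + 1
  migration.status = :finished if migration.next_min_id > migration.max_id
  [start_id, stop_id]
end

m = TrackedMigration.new(
  job_class: 'CopyColumn', table: 'events',
  min_id: 1, max_id: 25, batch_size: 10, interval: 120, status: :active
)

batches = []
while (range = next_batch(m))
  batches << range
end
# batches == [[1, 10], [11, 20], [21, 25]]; m.status == :finished
```

Because `next_batch` re-reads `batch_size` on every call, updating the tracking record reconfigures the migration in flight, and setting `status` to a paused/aborted value stops it, which covers the two control gaps listed under drawbacks.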
Parallelization
This adds to the complexity of the initial changes, so we should avoid tackling this directly at the outset. For the time being, we can simulate this by creating multiple entries in the migration tracking table with different start/stop ranges to batch over. For example, with a table containing 1B records, we can create four records in the migration tracking table, with ids of 1..250M, 250M..500M, 500M..750M, 750M..1B.
These will still be picked up and executed as four separate migrations that run in parallel, and we can build on this idea later.
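Splitting one logical migration into equal sub-ranges for the tracking table could look like the following (the helper name `split_ranges` is hypothetical):

```ruby
# Divide [min_id, max_id] into `parts` contiguous sub-ranges, one per
# tracking-table record, so each can be scheduled as its own migration.
def split_ranges(min_id, max_id, parts)
  span = ((max_id - min_id + 1).to_f / parts).ceil
  (0...parts).map do |i|
    start_id = min_id + i * span
    [start_id, [start_id + span - 1, max_id].min]
  end
end

# Four records for a 1B-row table, matching the example above:
p split_ranges(1, 1_000_000_000, 4)
```

Each resulting `[start_id, stop_id]` pair becomes one row in the migration tracking table, and the cron job treats them as independent migrations.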