Remove possible infinite loop from BackgroundMigrationWorker
What does this MR do?
We had an issue in production where scheduled background migration jobs were re-scheduled every time they ran. See gitlab-com/gl-infra/production#2820 (closed). These were very quick jobs (only a second or two each), and there were a lot of them, about 70,000. This turned into an infinite loop because the minimum_interval in background_migration_worker.rb keeps migrations of the same class from running within 2 minutes of each other. As a result, each job that ran too soon after another job of its class simply rescheduled itself again.
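The arithmetic behind the loop can be sketched with a small, hypothetical Ruby model. The method and constant names are illustrative and do not come from the GitLab code base. The model assumes that all jobs of one class are due at the same moment, that exactly one of them runs in each 2-minute window, and that every other job reschedules itself for the next window with no limit.

```ruby
# Hypothetical model of the old behaviour: a job that cannot run because
# another job of the same class ran within MINIMUM_INTERVAL reschedules
# itself, with no limit on how often that can happen.
MINIMUM_INTERVAL = 120 # seconds (the 2-minute interval from the worker)

# Returns [total_reschedules, seconds_until_last_job_runs] for +jobs+ jobs
# of the same migration class that are all due at t = 0.
def simulate_unbounded(jobs)
  reschedules = 0
  clock = 0
  pending = jobs

  while pending > 1
    pending -= 1              # one job obtains the lease and runs...
    reschedules += pending    # ...and every other due job reschedules itself
    clock += MINIMUM_INTERVAL # for the moment the interval has elapsed
  end

  [reschedules, clock]
end

p simulate_unbounded(5) # => [10, 480]
```

Under these assumptions, `simulate_unbounded(70_000)` returns `[2449965000, 8399880]`. That is about 2.4 billion reschedules and roughly 97 days before the last job runs, which behaves like an infinite loop.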
We originally considered removing minimum_interval. However, it is still important, so instead we added a parameter called lease_attempts to BackgroundMigrationWorker#perform. It is a counter that is decremented each time we try, and fail, to obtain a lease on the migration class. By default we try 5 times, with a 2-minute delay between attempts.
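A minimal, hypothetical sketch of the bounded retry follows. `SketchWorker` and the `lease_available` lambda are illustrative stand-ins, not GitLab's `BackgroundMigrationWorker`. Rescheduling is modelled as a recursive call that moves the clock forward by 2 minutes instead of a real Sidekiq `perform_in`. The sketch also assumes that a counter which has run down to 0 behaves like an explicit 0, meaning the job then runs without the lease check. The real worker may handle exhaustion differently.

```ruby
# Hypothetical sketch: a worker whose retries are bounded by lease_attempts.
# SketchWorker and the lease lambda are illustrative, not GitLab code.
MINIMUM_INTERVAL = 120 # seconds (the 2-minute delay between attempts)

class SketchWorker
  attr_reader :log

  # lease_available: callable(time) -> true if the class's lease can be taken.
  def initialize(lease_available)
    @lease_available = lease_available
    @log = []
  end

  # lease_attempts defaults to 5, as described in the MR.
  # ASSUMPTION: 0 (or less) skips the lease check entirely. Because the
  # counter is decremented on every failure, a job that fails 5 times is
  # rescheduled with 0 and then runs unconditionally in this sketch.
  def perform(class_name, time = 0, lease_attempts = 5)
    if lease_attempts <= 0 || @lease_available.call(time)
      @log << [time, class_name, :ran]
    else
      @log << [time, class_name, :rescheduled]
      # Stand-in for rescheduling after MINIMUM_INTERVAL: try again later
      # with one attempt fewer, so the chain always ends.
      perform(class_name, time + MINIMUM_INTERVAL, lease_attempts - 1)
    end
  end
end

# The lease is never free, e.g. because other jobs of the class keep winning.
worker = SketchWorker.new(->(_time) { false })
worker.perform("ExampleMigration")
p worker.log.map(&:last) # => [:rescheduled, :rescheduled, :rescheduled, :rescheduled, :rescheduled, :ran]
p worker.log.last        # => [600, "ExampleMigration", :ran]
```

If the lease frees up early, for example `SketchWorker.new(->(t) { t >= 200 })`, the job runs on the first retry after that point and the remaining attempts are never used.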
Because the counter eventually runs out, a job can no longer keep rescheduling itself forever, which keeps us from getting into an infinite loop. We can also pass a lease_attempts value of 0 to bypass the check entirely. This lets us run many very quick migrations in rapid succession, as discussed in the updated docs.
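To illustrate why skipping the check helps with very quick jobs, here is a rough, hypothetical comparison. It ignores concurrency and the retry cap: jobs of one class run one after another, and with the lease check each start must be at least 2 minutes after the previous one. `start_times` is an illustrative helper, not part of GitLab.

```ruby
# Hypothetical comparison of a batch of very quick jobs of one class.
MINIMUM_INTERVAL = 120 # seconds

# Returns the start time (in seconds) of each job when they run one after
# another. With the lease check, consecutive jobs of the same class start
# at least MINIMUM_INTERVAL apart; with lease_attempts = 0 the check is
# skipped and each job starts as soon as the previous one finishes.
def start_times(count, duration, lease_attempts)
  gap = lease_attempts <= 0 ? duration : [duration, MINIMUM_INTERVAL].max
  Array.new(count) { |i| i * gap }
end

p start_times(4, 2, 5) # => [0, 120, 240, 360]
p start_times(4, 2, 0) # => [0, 2, 4, 6]
```

For 1,000 two-second jobs, the last one would start after 119,880 s (about 33 hours) with the check, versus 1,998 s (about 33 minutes) without it.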
Does this MR meet the acceptance criteria?
Conformity
- Changelog entry
- Documentation (if required)
- Code review guidelines
- Merge request performance guidelines
- Style guides
- Database guides
- Separation of EE specific content
Availability and Testing
- Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
- Tested in all supported browsers
- Informed Infrastructure department of a default or new setting change, if applicable per definition of done
Security
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
- Label as security and @ mention @gitlab-com/gl-security/appsec
- The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- Security reports checked/validated by a reviewer from the AppSec team
Related to #267828 (closed)