Remove possible infinite loop from BackgroundMigrationWorker

What does this MR do?

We had a production issue, gitlab-com/gl-infra/production#2820 (closed), in which scheduled background migration jobs were re-scheduled every time they ran. The jobs were very quick (a second or two each), and there were a lot of them, ~70,000. This became an infinite loop because the minimum_interval in background_migration_worker.rb keeps jobs of the same migration class from running within 2 minutes of each other.

We originally considered removing minimum_interval, but it is still important. Instead, we added a parameter to BackgroundMigrationWorker#perform called lease_attempts. It is a counter that is decremented each time we try to obtain a lease on the migration class and fail. By default we try 5 times, with a 2-minute delay between attempts.

In general, this keeps us from getting into an infinite loop. We can also specify lease_attempts of 0 to bypass the check, which lets us run many very quick migrations in rapid succession, as discussed in the updated docs.
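
The counter behaviour described above can be sketched roughly as follows. This is a simplified, in-memory model and not the actual worker: the class name `SketchMigrationWorker`, the `now:` keyword (a stand-in for the clock), the symbols returned, and the `performed`/`rescheduled`/`dropped` lists are all hypothetical. It also assumes that a job is simply dropped once its attempts run out, and that `lease_attempts` of 0 skips the lease check entirely.

```ruby
# Hypothetical, simplified model of the retry logic; NOT the real GitLab worker.
class SketchMigrationWorker
  MINIMUM_INTERVAL = 120        # seconds; the 2-minute interval described above
  DEFAULT_LEASE_ATTEMPTS = 5    # default number of tries described above

  attr_reader :performed, :rescheduled, :dropped

  def initialize
    @lease_expiry = {} # migration class name => time (seconds) the lease expires
    @performed = []
    @rescheduled = []
    @dropped = []
  end

  # `now:` is an illustrative stand-in for the current time in seconds.
  def perform(class_name, arguments = [], lease_attempts = DEFAULT_LEASE_ATTEMPTS, now: 0)
    # Assumption: 0 bypasses the lease check, so the job always runs.
    if lease_attempts.zero? || obtain_lease(class_name, now)
      @performed << [class_name, arguments]
      return :performed
    end

    # The lease is held by another job of the same class: use up one attempt.
    attempts_left = lease_attempts - 1

    if attempts_left.zero?
      # Assumption: give up instead of re-scheduling forever.
      @dropped << [class_name, arguments]
      :dropped
    else
      # Retry after the minimum interval, carrying the decremented counter.
      @rescheduled << [class_name, arguments, attempts_left, now + MINIMUM_INTERVAL]
      :rescheduled
    end
  end

  private

  # Returns true and takes the lease if no unexpired lease exists for the class.
  def obtain_lease(class_name, now)
    expiry = @lease_expiry[class_name]
    return false if expiry && expiry > now

    @lease_expiry[class_name] = now + MINIMUM_INTERVAL
    true
  end
end
```

In this model, a job that keeps finding the lease taken is re-scheduled four times and dropped on the fifth failed attempt, while a job enqueued with `lease_attempts` of 0 runs immediately even if another job of the same class holds the lease.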

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

Related to #267828 (closed)

Edited by Brett Walker
