Better documentation and procedures for cleaning up background migrations
Today @smcgivern, @stanhu, @DouweM, @sitschner, @pcarranza, and I had a call about background migrations, in particular about how to deal with cleanups that may take a long time depending on how quickly users upgrade. GitLab employees can see the full notes here: https://docs.google.com/document/d/1xeyoo7cbDx1FHMFa5owg2ks_HsnFpoCRhDK9Ll_fWrw/edit
We determined that the following actions need to be taken:
- Merge the migration (https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/12463) for 9.5.
- In the 9.5 release post announce the migration, and perhaps other background migrations (e.g. MR diffs)
- In the 10.0 release post, warn people that they should first upgrade to 9.5, as they will otherwise need downtime that could last hours depending on their database size
- Document that downtime deploys can take a very long time now that we have background migrations
- Document that users should upgrade to the next minor version as soon as possible, and that if they can't, they should leave at least 1 week between every minor release to allow background migrations to complete
- Document that stealing/background migration cleanups can only occur in major or minor releases, but never in patch releases
- Add some kind of check to the Omnibus package so a big warning is produced when upgrading more than 1 minor release at a time (e.g. from 9.4 straight to 10.0), with a link to the online upgrade guide (https://docs.gitlab.com/ee/update/README.html#upgrading-without-downtime)
@yorickpeterse can take care of the documentation side of things, but we'll need somebody with better Omnibus understanding to take care of the check (if we can even include that in Omnibus in the first place).
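
To make the proposed upgrade check more concrete, here is a rough sketch of the version comparison it would need. This is only an illustration of the logic in Python, not how Omnibus would implement it. The function names and the `LAST_MINOR_OF_MAJOR` table are hypothetical; the only entry in the table (9.5 being followed by 10.0) comes from the release plan above. When the check can't tell how many releases are being skipped, it errs on the side of warning.

```python
from typing import Optional, Tuple

# Hypothetical table: the last minor release of each major series.
# The 9 -> 5 entry reflects the plan above (9.5 is followed by 10.0).
LAST_MINOR_OF_MAJOR = {9: 5}

UPGRADE_GUIDE_URL = (
    "https://docs.gitlab.com/ee/update/README.html#upgrading-without-downtime"
)


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '9.4.2'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def minor_release_steps(installed: str, target: str) -> Optional[int]:
    """Count how many minor releases lie between installed and target.

    Returns 0 for patch upgrades or downgrades, and None when a major
    series on the way is missing from LAST_MINOR_OF_MAJOR.
    """
    major, minor = parse_major_minor(installed)
    new_major, new_minor = parse_major_minor(target)
    if (new_major, new_minor) <= (major, minor):
        return 0

    steps = 0
    while major < new_major:
        last = LAST_MINOR_OF_MAJOR.get(major)
        if last is None:
            return None
        # Finish the current series, then take one step to the next X.0.
        steps += max(last - minor, 0) + 1
        major, minor = major + 1, 0
    return steps + (new_minor - minor)


def upgrade_warning(installed: str, target: str) -> Optional[str]:
    """Return a warning message if more than one minor release is skipped."""
    steps = minor_release_steps(installed, target)
    if steps is not None and steps <= 1:
        return None
    return (
        f"WARNING: upgrading from {installed} to {target} skips minor releases. "
        "Background migrations may not have finished, which can require "
        f"lengthy downtime. See {UPGRADE_GUIDE_URL}"
    )
```

With this sketch, `upgrade_warning("9.4.0", "10.0.0")` produces a warning, while `upgrade_warning("9.5.1", "10.0.0")` returns `None`.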