Deactivate mirrors that do not have the correct license
Background
We used to offer pull mirroring for free to private projects on GitLab.com. We stopped doing that in March 2020: https://about.gitlab.com/releases/2020/03/12/free-period-for-cicd-external-repositories/
When we did that, we ran into performance issues because of the sheer number of free private mirrors we were loading into memory, only to reject them: #212074 (closed)
We addressed this by checking the plan in the query, and only searching for mirrors that are either:
- On paid plans.
- On public projects. (Public projects got Gold features for free. This is no longer valid as of 2022-02-17 for new projects and as of 2022-07-01 for all free public projects.)
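To make the rule above concrete, here is a minimal in-memory sketch. It is not the real query: the actual check happens in SQL, and the `Mirror` struct, the `PAID_PLANS` list, and `eligible_for_pull_mirroring?` are hypothetical names used only for illustration. The public-project branch reflects the original rule, before the 2022 change.

```ruby
# Hypothetical model of a pull mirror; the real data lives in the database.
Mirror = Struct.new(:id, :plan, :visibility, keyword_init: true)

# Assumed plan names, for illustration only.
PAID_PLANS = %w[bronze silver gold].freeze

# A mirror is picked up if its project is on a paid plan or is public.
def eligible_for_pull_mirroring?(mirror)
  PAID_PLANS.include?(mirror.plan) || mirror.visibility == :public
end

mirrors = [
  Mirror.new(id: 1, plan: "free", visibility: :private), # rejected
  Mirror.new(id: 2, plan: "gold", visibility: :private), # paid plan
  Mirror.new(id: 3, plan: "free", visibility: :public)   # public project
]

eligible_ids = mirrors.select { |m| eligible_for_pull_mirroring?(m) }.map(&:id)
```

Doing this filter in the query rather than in Ruby is what avoided loading the rejected mirrors into memory.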
Problem
This query got slower over time, because the database still had to inspect all of the 'old' pull mirrors: #216252 (closed)
This is because we search for all pull mirrors and order them so that the least recently updated one comes first. As all the free private pull mirrors stopped being processed in March 2020, they were always at the front of this set, and the query performance degraded.
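A rough way to picture the degradation, as a simplified model rather than the real query plan: if rows are visited in schedule order and filtered as we go, every stale mirror at the front has to be inspected before the first eligible one is found. `MirrorRow` and `rows_inspected_for_batch` are hypothetical names, and the row counts are made up.

```ruby
# Hypothetical row: when the mirror is next due, and whether it passes the plan check.
MirrorRow = Struct.new(:id, :next_execution_at, :eligible, keyword_init: true)

# Visits rows in schedule order and counts how many are inspected
# before `batch_size` eligible rows have been found.
def rows_inspected_for_batch(rows, batch_size)
  inspected = 0
  found = 0
  rows.sort_by(&:next_execution_at).each do |row|
    inspected += 1
    found += 1 if row.eligible
    break if found == batch_size
  end
  inspected
end

# 1000 free private mirrors that were last scheduled in March 2020.
stale = (1..1000).map do |i|
  MirrorRow.new(id: i, next_execution_at: Time.utc(2020, 3, 1) + i, eligible: false)
end

# 10 mirrors that are still being processed.
live = (1001..1010).map do |i|
  MirrorRow.new(id: i, next_execution_at: Time.utc(2021, 1, 1) + i, eligible: true)
end

with_stale = rows_inspected_for_batch(stale + live, 5)
without_stale = rows_inspected_for_batch(live, 5)
```

In this model, the cost of finding a batch grows with the number of stale mirrors, not with the batch size.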
This query is listed among the top 50 queries by total time in https://console.postgres.ai/gitlab/gitlab_production/reports/425/files/113266/md#postgres-checkup_K003
This performance issue was explored in #325503 (closed).
Workaround
#216252 (comment 334514544) describes a hacky workaround where we only look for mirrors whose next update is scheduled after the date we disabled free pull mirroring. This buys us some time, but has its own problems.
New problems
- If a user had a free private pull mirror that was due to be updated before 2020-03-28, and then either makes the project public or pays for a plan, the mirror won't resume automatically. They will need to force an update.
- The query was already complex, and now it has an extra wrinkle specifically for GitLab.com.
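The first problem can be reproduced with a small sketch of the workaround. Everything here is illustrative: `PullMirror`, `CUTOFF`, and `due_with_workaround` are hypothetical names, and the assumption that forcing an update resets the next scheduled time is mine, not something stated in the workaround comment.

```ruby
# Date taken from this issue; the exact time of day is an assumption.
CUTOFF = Time.utc(2020, 3, 28)

PullMirror = Struct.new(:id, :next_execution_at, :eligible, keyword_init: true)

# Workaround: only consider mirrors scheduled on or after the cutoff.
def due_with_workaround(mirrors, now)
  mirrors.select do |m|
    m.eligible && m.next_execution_at >= CUTOFF && m.next_execution_at <= now
  end
end

# Was a free private mirror due before the cutoff; the project has since upgraded.
upgraded = PullMirror.new(id: 1, next_execution_at: Time.utc(2020, 3, 20), eligible: true)
# A mirror that has been eligible all along.
regular = PullMirror.new(id: 2, next_execution_at: Time.utc(2021, 6, 1), eligible: true)

now = Time.utc(2021, 6, 2)
picked = due_with_workaround([upgraded, regular], now).map(&:id)

# Assumed effect of a forced update: the mirror is rescheduled to now.
upgraded.next_execution_at = now
picked_after_force = due_with_workaround([upgraded, regular], now).map(&:id)
```

The upgraded mirror is eligible but stays invisible to the scheduler until something moves its schedule past the cutoff.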
Proposal
Somehow mark these mirrors as inactive, in such a way that:
- It's cheap to do. (Maybe just a migration that only runs for GitLab.com?)
- It's easy for a user to restart their mirror.
Then we can remove all of the block inside if check_mirror_plans_in_query? in UpdateAllMirrorsWorker#pull_mirrors_batch, making the query much simpler and easier to reason about.
We can also remove the part of UpdateAllMirrorsWorker#schedule_mirrors! that links to this issue in a comment.
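One possible shape for the proposal, as a sketch only. The `active` flag, `MirrorState`, `deactivate_unlicensed!`, `restart!`, and `due_mirrors` are all hypothetical; the issue deliberately leaves open how mirrors would be marked inactive.

```ruby
MirrorState = Struct.new(:id, :eligible, :active, :next_execution_at, keyword_init: true)

# One-off pass (for example a GitLab.com-only migration):
# deactivate every mirror that does not have the right license.
def deactivate_unlicensed!(mirrors)
  mirrors.each { |m| m.active = false unless m.eligible }
end

# What a user-triggered restart might do: reactivate and reschedule.
def restart!(mirror, now)
  mirror.active = true
  mirror.next_execution_at = now
end

# The scheduling filter no longer needs a plan check or a date cutoff.
def due_mirrors(mirrors, now)
  mirrors.select { |m| m.active && m.next_execution_at <= now }
end

now = Time.utc(2021, 6, 2)
mirrors = [
  MirrorState.new(id: 1, eligible: false, active: true, next_execution_at: Time.utc(2020, 3, 20)),
  MirrorState.new(id: 2, eligible: true, active: true, next_execution_at: Time.utc(2021, 6, 1))
]

deactivate_unlicensed!(mirrors)
due_before = due_mirrors(mirrors, now).map(&:id)

# The project behind mirror 1 upgrades, and the user restarts the mirror.
mirrors[0].eligible = true
restart!(mirrors[0], now)
due_after = due_mirrors(mirrors, now).map(&:id)
```

With an explicit flag, the stale mirrors drop out of the scheduling set entirely, and restarting becomes a single user action instead of a side effect of a forced update.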