Draft: Add cron job for scheduling seat refreshes
What does this MR do and why?
Adds a cron job to perform GitLab subscription seat refreshes via a limited capacity worker (added previously).
For more info/context, see below.
How to setup and validate locally
- Enable the feature flag: `Feature.enable :limited_capacity_seat_refresh_worker`
- Ensure you have at least one Namespace with a GitlabSubscription (i.e. purchase a plan for a group)
- Check how many subscriptions require a seat attribute refresh: `GitlabSubscription.where('last_seat_refresh_at IS NULL OR last_seat_refresh_at <= ?', 1.day.ago).count # => 1, or however many you have`
- Optional: if the previous command returned 0, update a subscription so that it qualifies: `GitlabSubscription.last.update(last_seat_refresh_at: 2.months.ago)`
- Enqueue the limited capacity job: `GitlabSubscriptions::ScheduleRefreshSeatsWorker.new.perform`
- Confirm the subscriptions were updated: `GitlabSubscription.where('last_seat_refresh_at IS NULL OR last_seat_refresh_at <= ?', 1.day.ago).count # => 0 if it was successful`
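The eligibility condition used in the queries above (never refreshed, or last refreshed more than a day ago) can be sketched in plain Ruby. `requires_seat_refresh?` is a hypothetical helper for illustration, not a method on the real model:

```ruby
# Hypothetical illustration of the refresh-eligibility predicate behind the
# WHERE clause above: a subscription needs a refresh when it has never been
# refreshed, or when its last refresh is older than 24 hours.
def requires_seat_refresh?(last_seat_refresh_at, now: Time.now)
  last_seat_refresh_at.nil? || last_seat_refresh_at <= now - 24 * 60 * 60
end

now = Time.now
requires_seat_refresh?(nil, now: now)              # => true  (never refreshed)
requires_seat_refresh?(now - 2 * 86_400, now: now) # => true  (stale)
requires_seat_refresh?(now - 3_600, now: now)      # => false (fresh)
```

Note that in SQL the "never refreshed" case must be expressed as `last_seat_refresh_at IS NULL`, since `= NULL` never matches.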
Background
Every subscription for GitLab.com is represented by a `GitlabSubscription`. The `GitlabSubscription` contains 3 key pieces of information:
- `max_seats_used` - the maximum number of billable seats the `Namespace` has used
- `seats_in_use` - the current number of billable seats the `Namespace` is using
- `seats_owed` - the number of seats the customer needs to pay for
To keep these attributes up to date, an existing worker runs every day at midnight UTC that:
- iterates over every single `GitlabSubscription`
- refreshes the seat attributes for each one
- updates the DB records via a manual SQL `UPDATE` to be more performant (one `UPDATE` query for each batch of subscriptions)
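The batch-per-`UPDATE` idea above can be sketched as follows. This is an illustrative simulation only: the SQL text, table name, and batch size are stand-ins, not the existing worker's actual implementation:

```ruby
# Illustrative sketch: issue one UPDATE statement per batch of subscription
# ids rather than one per row. The SQL here is a placeholder; the real
# worker builds its statements differently.
def batched_update_statements(ids, batch_size: 2)
  ids.each_slice(batch_size).map do |batch|
    "UPDATE gitlab_subscriptions SET /* recalculated seat attributes */ " \
      "WHERE id IN (#{batch.join(', ')})"
  end
end

statements = batched_update_statements([1, 2, 3, 4, 5])
statements.size # => 3 (batches of 2, 2, and 1)
```

Fewer round trips make the nightly run faster, but as the MR notes, hand-written SQL like this bypasses the model's callbacks.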
The Problem
- The worker that runs each day has historically been prone to error (gitlab-com/gl-infra/scalability#1116 (closed)) due to timeouts
- The existing job is very long running, so it is at risk of being interrupted (e.g. by a pod or process restart), leaving some namespaces without updated seat attributes, and its run time will only increase as the number of subscriptions on GitLab.com grows
- The manual SQL means we bypass any callbacks defined in the model
The Solution
The solution is to replace the single job with limited capacity jobs: Sidekiq limited capacity worker.
Doing so will allow us to have:
- One quick-running job per `GitlabSubscription`
- Loop over all `GitlabSubscription` records without fear of interruption
- Use "normal" update methods and avoid bypassing the regular lifecycle hooks/callbacks
Recalculating the seat attributes is important for billing and usage statistics, so the plan is to add the new limited capacity worker behind a feature flag (rollout issue) so that we can have both running at the same time initially.
Once we have confirmed the new job is working as expected, we can remove the old job and the feature flag.
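The limited capacity pattern described above can be simulated in plain Ruby. This is a hypothetical sketch, not GitLab's actual `LimitedCapacity::Worker` concern: each job performs one small unit of work and reports whether more work remains, so the scheduler can enqueue follow-up jobs up to a fixed cap:

```ruby
# Plain-Ruby simulation of the limited capacity pattern (illustrative only).
# Each quick job processes a single subscription; the return value of
# perform_work tells the scheduler whether to enqueue another job.
class SimulatedRefreshWorker
  MAX_RUNNING_JOBS = 3 # hypothetical cap on concurrent jobs

  def initialize(queue_of_work)
    @queue = queue_of_work # ids still needing a seat refresh
  end

  def remaining_work_count
    @queue.size
  end

  def perform_work
    id = @queue.shift
    return false unless id

    # ... refresh seats for subscription `id` here ...
    remaining_work_count.positive?
  end
end

worker = SimulatedRefreshWorker.new([10, 11, 12])
worker.perform_work # => true  (more work remains)
worker.perform_work # => true
worker.perform_work # => false (queue drained)
```

Because each unit of work is small and self-contained, an interrupted job loses at most one subscription's refresh instead of the whole nightly run.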
How will it work?
The limited capacity setup will essentially do the following:
- A cron job will schedule the seat attribute refresh every 6 hours
- The refresh worker will:
  - Look for the next `GitlabSubscription` that has not been refreshed in the last 24 hours
  - Immediately update the last refreshed timestamp (`last_seat_refresh_at`) so that it doesn't get picked up by a parallel job
  - Refresh the seats for that subscription
- The scheduler will queue a new job if there is remaining work and the maximum number of running jobs hasn't already been queued
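The claim-then-refresh flow above can be sketched with a plain Ruby struct. The names here are hypothetical stand-ins for the real `GitlabSubscription` model and its refresh logic:

```ruby
# Hypothetical sketch of the worker's per-job flow: find the next stale
# subscription, stamp its timestamp immediately so a parallel job skips
# it, then refresh its seat attributes.
Subscription = Struct.new(:id, :last_seat_refresh_at, :seats_in_use)

def refresh_next(subscriptions, now: Time.now)
  stale = subscriptions.find do |s|
    s.last_seat_refresh_at.nil? || s.last_seat_refresh_at <= now - 86_400
  end
  return nil unless stale

  stale.last_seat_refresh_at = now # claim first, so parallel jobs move on
  stale.seats_in_use = 5           # stand-in for the real seat recalculation
  stale
end

subs = [Subscription.new(1, Time.now, 2), Subscription.new(2, nil, 0)]
refresh_next(subs)&.id # => 2 (the never-refreshed subscription)
refresh_next(subs)     # => nil (everything is now fresh)
```

Stamping the timestamp before doing the refresh is what lets multiple limited capacity jobs run in parallel without picking up the same subscription.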
The MRs
Replacing the existing job involves adding 2 workers and a database change. So to make it easier to review, it’s been split into the following MRs:
| Title | Link | Stage |
|---|---|---|
| Add the required DB column | !103937 (merged) | in review |
| Add the new LimitedCapacity worker | !104099 (merged) | blocked |
| Add the scheduler worker | !104705 (closed) | |
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- [ ] I have evaluated the MR acceptance checklist for this MR.