Improve the efficiency of non-strictly monotonic (non-gapless) IIDs for pipelines
Introduction
In gitlab-com/gl-infra/production#4051, the primary database on GitLab.com suffered from contention on the internal_ids table.
This led to downstream saturation in pgbouncer, sidekiq, web, api, and git services.
Cause
The cause of this contention was slow client transactions locking internal_ids with SELECT FOR UPDATE, which then blocked other transactions from obtaining an ID.
This is because the blocked transactions cannot progress until the previous IID has been committed to the database, since we rely on strictly monotonic IDs: that is, each ID follows the previous one by exactly 1 and there are never any gaps.
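To make the contention concrete, here is a minimal in-memory Ruby model of the current strict (gapless) allocation. It assumes that the row lock taken by SELECT FOR UPDATE can be represented by a Mutex held until the caller's transaction ends. StrictIidAllocator and its methods are hypothetical names and are not GitLab's AtomicInternalId code.

```ruby
# Hypothetical in-memory model of strict (gapless) IID allocation.
# A Mutex stands in for the internal_ids row lock taken by SELECT FOR UPDATE;
# it is held until the caller's transaction commits or rolls back.
class StrictIidAllocator
  attr_reader :last_value

  def initialize
    @last_value = 0
    @row_lock = Mutex.new
  end

  def locked?
    @row_lock.locked?
  end

  # Runs the caller's "transaction": the ID is taken under the lock, and the
  # lock is only released once the block (the rest of the transaction) ends.
  def transaction
    @row_lock.synchronize do
      previous = @last_value
      @last_value += 1
      begin
        yield @last_value
      rescue StandardError
        @last_value = previous # rollback: the ID will be reissued, so no gap
        raise
      end
    end
  end
end
```

While one caller's block is running, for example during a slow transaction, every other thread calling transaction waits on the same lock. This is the kind of queueing described above.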
Proposal
- For some high-contention classes, allow non-strict monotonic IDs: that is, each ID is greater than the previous one, but occasionally there may be gaps, for example when an ID is issued to a transaction that later rolls back.
- Allow certain classes, such as Pipeline, to use non-strict monotonic sequences. It's unlikely that any user would notice the occasional missing pipeline ID.
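In this document's terminology, "strict" means gapless. The two guarantees can be stated as simple checks over a list of IIDs in the order they were issued. This is an illustrative sketch, and the helper names gapless? and increasing? are hypothetical.

```ruby
# Strict in this document's sense: each ID is exactly the previous one + 1.
def gapless?(ids)
  ids.each_cons(2).all? { |prev, cur| cur == prev + 1 }
end

# Non-strict: each ID is greater than the previous one; gaps are allowed.
def increasing?(ids)
  ids.each_cons(2).all? { |prev, cur| cur > prev }
end
```

For example, [1, 2, 4] fails gapless? but passes increasing?. That is exactly the relaxation proposed for pipelines.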
This could potentially be done with a new mixin, for example NonStrictAtomicInternalId.
```ruby
module Ci
  class Pipeline < ApplicationRecord
    extend Gitlab::Ci::Model

    # For scaling purposes, allow non-strict monotonic sequences
    include NonStrictAtomicInternalId # instead of AtomicInternalId
  end
end
```
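As an illustration only, here is a plain-Ruby sketch of what such a mixin might provide, with no ActiveRecord or database involved. It assumes IIDs are counted per scope (for pipelines, presumably the project). The names next_iid, ensure_iid!, and FakePipeline are hypothetical and are not GitLab's AtomicInternalId API.

```ruby
# Hypothetical sketch of a NonStrictAtomicInternalId mixin (no database).
module NonStrictAtomicInternalId
  def self.included(base)
    base.extend(ClassMethods)
    # One counter per scope (e.g. per project) and one lock per class,
    # created eagerly so first use is not racy.
    base.instance_variable_set(:@iid_counters, Hash.new(0))
    base.instance_variable_set(:@iid_lock, Mutex.new)
  end

  module ClassMethods
    # The lock covers only the increment, standing in for a short
    # allocation that is committed immediately on its own.
    def next_iid(scope)
      @iid_lock.synchronize { @iid_counters[scope] += 1 }
    end
  end

  attr_reader :iid

  # Assigns an IID the first time it is called; later calls keep it.
  def ensure_iid!(scope)
    @iid ||= self.class.next_iid(scope)
  end
end

# Illustrative stand-in for Ci::Pipeline.
class FakePipeline
  include NonStrictAtomicInternalId
end
```

Each scope gets its own sequence, so pipelines in different projects do not share or contend for a counter in this sketch.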
Implementation
Obtaining the ID would need to happen in a separate, non-nested transaction, and therefore over a separate connection to Postgres. A nested transaction on the same connection is only a savepoint: its changes and row locks are not committed until the outer transaction commits.
This separate transaction would commit immediately, and the new ID would be passed back to the caller (or, even better, the single statement could run in an implicit transaction).
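A minimal in-memory Ruby model of this flow follows. It assumes the allocation can be represented as one short, immediately committed step (on a real database, for example, a single UPDATE ... RETURNING statement on the separate connection). The caller's transaction is modelled as a block whose row is kept only if the block finishes. InternalIdCounter and create_pipeline are hypothetical names.

```ruby
# Hypothetical in-memory model of the proposed flow.
class InternalIdCounter
  def initialize
    @last_value = 0
    @row_lock = Mutex.new # stands in for the internal_ids row lock
  end

  # Stands in for one autocommitted statement on a separate connection;
  # the lock is released as soon as this returns.
  def allocate
    @row_lock.synchronize { @last_value += 1 }
  end

  def locked?
    @row_lock.locked?
  end
end

# The caller's transaction on the main connection. The pipeline row is only
# "committed" (appended) if the block finishes; raising inside the block
# models a rollback, which discards the row but not the already-committed IID.
def create_pipeline(counter, committed_pipelines)
  iid = counter.allocate # already committed; lock no longer held
  row = { iid: iid }
  yield row # slow work here does not block other allocations
  committed_pipelines << row
  row
end
```

If the second of three calls rolls back, the committed IIDs are 1 and 3. That gap is the price of releasing the lock before the caller's work runs, which the Proposal section argues users are unlikely to notice.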
I'm not completely sure that this approach will work, hence this discussion. However, even if it's a little complicated, there's a good chance that it will help us reduce locking on our primary instance and potentially avoid other incidents like the one we saw today.