Allow idempotent jobs to use load balancing
What does this MR do?
In #322452 (closed) we allowed Sidekiq jobs to use read-only database replicas.
When a job for an idempotent worker is enqueued while another unstarted job is already in the queue, GitLab drops the second job. If the worker uses LB capabilities, the remaining job is executed taking into account only the first write-ahead location.
Deduplication should always take into account the latest write-ahead location, not the first one.
Proposed solution:
- Replication strategy will store the latest WAL locations in Redis in case the job is dropped due to the deduplication strategy.
- We store this WAL location only if the job is configured to utilize LB capabilities (data_consistency != :always) and the write-ahead location has changed.
- On job execution, the replication strategy will update the job's hash with dedup_wal_locations containing the latest location from the Redis cache (if it exists).
- SidekiqServerMiddleware will use the updated dedup_wal_locations location when checking whether the replica has caught up.
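The flow above can be sketched in plain Ruby as follows. This is a minimal illustration under stated assumptions, not the code in this MR: the FakeCache class stands in for Redis, and the class name DedupWalTracker, the method names on_duplicate_dropped and before_perform, the idempotency_key argument, and the cache key format are all hypothetical. Only data_consistency, wal_locations, and dedup_wal_locations come from the description.

```ruby
# Hypothetical in-memory stand-in for the Redis cache used by deduplication.
class FakeCache
  def initialize
    @store = {}
  end

  def get(key)
    @store[key]
  end

  def set(key, value)
    @store[key] = value
  end
end

# Hypothetical helper; the real strategy classes are not shown in this MR text.
class DedupWalTracker
  def initialize(cache)
    @cache = cache
  end

  # Called when a duplicate job is dropped. Remembers the dropped job's WAL
  # locations, but only for workers that use load balancing and only when
  # the locations differ from what is already cached.
  def on_duplicate_dropped(idempotency_key, data_consistency, wal_locations)
    return false if data_consistency == :always

    key = cache_key(idempotency_key)
    return false if @cache.get(key) == wal_locations

    @cache.set(key, wal_locations)
    true
  end

  # Called before the surviving job runs. Copies the cached locations, if
  # any, into the job hash so the server middleware can wait for them.
  def before_perform(idempotency_key, job)
    latest = @cache.get(cache_key(idempotency_key))
    job['dedup_wal_locations'] = latest if latest
    job
  end

  private

  def cache_key(idempotency_key)
    "#{idempotency_key}:wal_locations" # assumed key format
  end
end

tracker = DedupWalTracker.new(FakeCache.new)
job = { 'wal_locations' => { main: 'A0/A12FB' } }

# A duplicate job was enqueued after a later write and then dropped.
tracker.on_duplicate_dropped('abc', :delayed, { main: 'A0/A13000' })
tracker.before_perform('abc', job)
# job now carries both the original and the deduplicated locations.
```

Keeping the original wal_locations untouched and adding a separate dedup_wal_locations key lets the middleware tell which location came from the job itself and which came from a dropped duplicate.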
Support for multiple databases
For Sidekiq workers that use load balancing capabilities, we currently store the write-ahead log location in the job hash:
job['database_write_location'] # contains primary write-ahead log location
job['database_replica_location'] # contains replica write-ahead log location in case we schedule a job from replica
To support multiple databases, we will update this format and add a structure that maps each WAL location to the corresponding database configuration name:
job['wal_locations'] = {
main: 'A0/A12FB',
ci: '0/B12EA'
}
For deduplication, each time a job gets rejected, we will keep track of the latest wal_location for each database name.
Note: Until we add support for multiple databases to our load balancer, we will use a single hardcoded key, for the main database only:
job['wal_locations'] = {
main: 'A0/A12FB'
}
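Keeping the latest location per database means comparing WAL positions. PostgreSQL prints an LSN as two hexadecimal numbers separated by a slash (the high and low 32 bits of a 64-bit position), so plain string comparison is not reliable. The sketch below shows one possible way to do the per-database merge; lsn_to_i and merge_latest are hypothetical helper names and are not taken from this MR.

```ruby
# Convert a PostgreSQL LSN string such as 'A0/A12FB' into an integer.
def lsn_to_i(lsn)
  high, low = lsn.split('/').map { |part| part.to_i(16) }
  (high << 32) | low
end

# Merge two wal_locations hashes, keeping the most advanced LSN per database.
# Databases present in only one of the hashes are carried over unchanged.
def merge_latest(current, incoming)
  current.merge(incoming) do |_db, old_lsn, new_lsn|
    lsn_to_i(new_lsn) > lsn_to_i(old_lsn) ? new_lsn : old_lsn
  end
end

stored   = { main: 'A0/A12FB', ci: '0/B12EA' }
incoming = { main: 'A0/A13000', ci: '0/B0000' }

latest = merge_latest(stored, incoming)
# latest == { main: 'A0/A13000', ci: '0/B12EA' }
```

With only the main key in use, the same merge reduces to a single comparison, so the structure would not need to change once more databases are added.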
Scheduling jobs in the future:
GitLab doesn’t skip jobs scheduled in the future, as we assume that the state will have changed by the time the job is scheduled to execute. Jobs for workers that use load balancing capabilities to read from replicas are scheduled 1 second in the future, to give the replication process time to complete (!61501 (merged)).
I created a separate MR to add a RuboCop rule that ensures we include scheduled jobs when deduplicating workers that utilise LB capabilities.
Screenshots or Screencasts (strongly suggested)
Does this MR meet the acceptance criteria?
Conformity
- I have included changelog trailers, or none are needed. (No need, the feature is behind the ff, which is disabled by default. Does this MR need a changelog?)
- I have added/updated documentation, or it's not needed. (Documentation will be updated when we roll out the ff. Is documentation required?)
- I have properly separated EE content from FOSS, or this MR is FOSS only. (Where should EE code go?)
- I have added information for database reviewers in the MR description, or it's not needed. (Does this MR have database related changes?)
- I have self-reviewed this MR per code review guidelines.
- This MR does not harm performance, or I have asked a reviewer to help assess the performance impact. (Merge request performance guidelines)
- I have followed the style guides.
- This change is backwards compatible across updates, or this does not apply.
Availability and Testing
- I have added/updated tests following the Testing Guide, or it's not needed. (Consider all test levels. See the Test Planning Process.)
- I have tested this MR in all supported browsers, or it's not needed.
- I have informed the Infrastructure department of a default or new setting change per definition of done, or it's not needed.
Security
Does this MR contain changes to processing or storing of credentials or tokens, authorization and authentication methods or other items described in the security review guidelines? If not, then delete this Security section.
- Label as security and @ mention @gitlab-com/gl-security/appsec
- The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- Security reports checked/validated by a reviewer from the AppSec team
Related to #325291 (closed)