Fix Geo verification state backfill job can exceed batch size
What does this MR do and why?
In certain cases,
Geo::VerificationStateBackfillWorker
can operate on a batch of rows which exceeds the specified batch size. Causing or additionally contributing to memory spikes #429242 (closed).
Fixes #430913 (closed)
The edge case is when there are a lot of unused IDs to skip over.
This MR fixes the underlying code so that it does not return a batch that exceeds the specified batch size.
It also fixes Gitlab::Geo::RegistryConsistencyWorker
, but it may be that we didn't notice a performance issue with that worker because it does a bulk insert rather than iterating over each row and instantiating an ActiveRecord object and saving it.
How to set up and validate locally
This is a bug in a background job that runs every minute. It's hard to test as a black box. The following validates the fix in Rails console.
Demonstrate bug on master branch:
- Already have GDKs with Geo
- Already have 5 or more projects which are checksummed on the primary
- Change to the primary GDK's gitlab directory:
cd /path/to/primary/gdk/gitlab
- Stop GDK
gdk stop
- Start only the backend services
gdk start gitaly postgresql postgresql-geo redis
- In Rails console,
Geo::ProjectState.count
should return at least5
- In Rails console, delete 4
project_states
rows with the lowestproject_id
s. Run:Geo::ProjectState.order(:project_id).limit(4).delete_all
- In Rails console,
Geo::ProjectState.count
should now return 4 fewer than before - In Rails console, run
VerificationStateBackfillService
once for a batch of 2 projects, starting from ID 1:s = Geo::VerificationStateBackfillService.new(Project, batch_size: 2); b = Gitlab::Geo::BaseBatcher.new(s.replicable_model, s.verification_state_table_class, s.verification_state_model_key, key: s.send(:batcher_key)); b.send(:reset); s.execute
- Notice that all 4
project_states
rows got inserted, even though batch size is 2
Demonstrate fix on this MR's branch:
- Already have GDKs with Geo
- Already have 5 or more projects which are checksummed on the primary
- Change to the primary GDK's gitlab directory:
cd /path/to/primary/gdk/gitlab
- Stop GDK
gdk stop
- Start only the backend services
gdk start gitaly postgresql postgresql-geo redis
- In Rails console,
Geo::ProjectState.count
should return at least5
- In Rails console, delete 4
project_states
rows with the lowestproject_id
s. Run:Geo::ProjectState.order(:project_id).limit(4).delete_all
- In Rails console,
Geo::ProjectState.count
should now return 4 fewer than before - In Rails console, run
VerificationStateBackfillService
once for a batch of 2 projects, starting from ID 1:s = Geo::VerificationStateBackfillService.new(Project, batch_size: 2); s.send(:batcher).send(:reset); s.execute
- Notice that 2
project_states
rows got inserted, as expected
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.