Draft: Improve delete object specs
This MR depends on !129214 (merged) being merged first. Will rebase on master once the first MR is merged. For now, only look at the last commit.
These specs were extracted from the original MR (!128279 (merged)) that was reverted (!128995 (merged)) because of these specs intermittently failing.
After further investigation on what was causing the flakiness, it turns out it was of inconsistent behavior of the query planner during the loading of batch to update here:
module Ci
class DeleteObjectsService
# ...
def load_next_batch
# `find_by_sql` performs a write in this case and we need to wrap it in
# a transaction to stick to the primary database.
Ci::DeletedObject.transaction do
Ci::DeletedObject.find_by_sql([next_batch_sql, new_pick_up_at: RETRY_IN.from_now])
end
end
def next_batch_sql
<<~SQL.squish
UPDATE "ci_deleted_objects"
SET "pick_up_at" = :new_pick_up_at
WHERE "ci_deleted_objects"."id" IN (#{locked_object_ids_sql})
RETURNING *
SQL
end
# ...
end
end
Actual query:
TRANSACTION (0.2ms) BEGIN
UPDATE "ci_deleted_objects" SET "pick_up_at" = '2023-08-12 14:34:22.294497' WHERE "ci_deleted_objects"."id" IN (SELECT "ci_deleted_objects"."id" FROM "ci_deleted_objects" WHERE (pick_up_at < '2023-08-12 14:24:22.290522') ORDER BY "ci_deleted_objects"."pick_up_at" ASC LIMIT 100 FOR UPDATE SKIP LOCKED) RETURNING *
TRANSACTION (0.1ms) COMMIT
In the case of the flaky specs:
# spec/services/ci/delete_objects_service_spec.rb
it 'limits the number of records removed' do
stub_const("#{described_class}::BATCH_SIZE", 1)
expect { execute }.to change { Ci::DeletedObject.count }.by(-1)
end
it 'removes records in order' do
stub_const("#{described_class}::BATCH_SIZE", 1)
execute
expect { past_ready.reload }.to raise_error(ActiveRecord::RecordNotFound)
expect(ready.reload.present?).to be_truthy
end
These were intermittently failing because even though we stub BATCH_SIZE
to 1
, the LIMIT 1
condition is not consistently followed by the UPDATE
query. Reference https://stackoverflow.com/questions/75755863/limit-1-not-respected-when-used-with-for-update-skip-locked-in-postgres-14.
So in some cases depending on which route the query planner decides to use based on the state of the test database, it would delete more than what was expected.