Use the DatabaseCleaner 'deletion' strategy instead of 'truncation'
What does this MR do?
Uses the DatabaseCleaner deletion strategy instead of the truncation strategy
Are there points in the code the reviewer needs to double check?
Performance is a question. Truncation is a fixed cost, whereas deletion costs scale with the number of rows. Mostly, we have only a few rows per spec, and others have seen performance improvements by switching to deletion in that case, e.g. http://sevenseacat.net/posts/2015/use-database-cleaners-deletion-strategy/
Some gitlab-specific examples:
spec/models/merge_request_spec.rb
Mostly transaction specs, a couple of truncation specs.
# Before (truncation)
Finished in 1 minute 10.82 seconds (files took 10.23 seconds to load)
236 examples, 0 failures
real 1m21.612s
user 0m52.324s
sys 0m3.740s
# After (deletion)
Finished in 1 minute 8.01 seconds (files took 9.84 seconds to load)
236 examples, 0 failures
real 1m18.482s
user 0m52.084s
sys 0m3.332s
Performance of transaction-using specs is unaffected.
spec/features/users_spec.rb
All :js
specs
# Before
Finished in 47.15 seconds (files took 10.01 seconds to load)
10 examples, 0 failures
real 0m57.846s
user 0m20.140s
sys 0m3.772s
# After
Finished in 23.16 seconds (files took 10.21 seconds to load)
10 examples, 0 failures
real 0m34.076s
user 0m20.616s
sys 0m3.848s
Performance is improved significantly \o/
The overall effect on the test suite remains to be seen, but I'm hopeful that it is a significant improvement. If it's neutral, or even a slight regression, I still think it's worth it for solving the sequence issue. I've tried several approaches now, and they've all had drawbacks.
Why was this MR needed?
Most of our specs run using the DatabaseCleaner 'transaction' strategy, which is fast and good. It also has the property of not resetting sequences, so project IDs (and so, with hashed storage, on-disk repositories) never get re-used.
However, some - notably capybara, javacript, database migration, multi-threaded and (for Geo) FDW - specs require the data to be visible from multiple database connections, so we can't use transactions.
Currently, we use the truncation strategy instead. However, this has the undesirable property of resetting sequences, leading to order-dependent spec failures (some specs will fail if they happen to come after a spec that uses the truncation strategy).
The 'deletion' strategy allows data to be seen from multiple connections, is faster than truncation for our use case (but slower than transaction), and leaves sequences untouched, so is a better choice for those specs that are currently using truncation.
Screenshots (if relevant)
Does this MR meet the acceptance criteria?
-
Documentation created/updated -
Tests added for this feature/bug - Review
-
Has been reviewed by Backend -
Has been reviewed by Database
-
-
Conform by the merge request performance guides -
Conform by the style guides -
Squashed related commits together -
Internationalization required/considered
What are the relevant issue numbers?
Related to #40744 (closed)
Closes #30783 (closed)