Optimize CSV export to use find_each
As seen in #360012 (closed),
exporting issues to CSV using find_each
can hit a database timeout
since an ORDER BY id is needed. Use each_batch
instead.
This commit adds a feature flag, export_csv_preload_in_batches
, that
will preload associations in the batch load itself to reduce the work
the database needs to load all issues.
How to test locally
-
Create lots of issues for a project:
project = Project.last user = User.last 100000.times do |index| issue = Issue.new(author: user, description: 'test issue', title: "test issue #{index}", project: project) issue.save! end
-
Attempt to export the CSV.
service = Issues::ExportCsvService.new(project.issues, project); nil output = File.open("/tmp/test.csv", "w") output.write(service.csv_data); nil output.close
-
Before the feature flag is enabled, you'll see this query:
Issue Load (1.3ms) SELECT "issues".* FROM "issues" WHERE "issues"."project_id" = 52 ORDER BY "issues"."id" ASC LIMIT 1000 /*application:console,db_config_name:main,console_hostname:jet-arm.local,console_username:stanhu,line:/lib/csv_builder.rb:85:in `each'*/
-
Enable the feature flag:
Feature.enable(:export_csv_preload_in_batches)
-
After the feature flag is enabled, you'll see these queries:
Issue Load (3.4ms) SELECT "issues"."id" FROM "issues" WHERE "issues"."project_id" = 52 ORDER BY "issues"."id" ASC, "issues"."created_at" ASC LIMIT 1 /*application:console,db_config_name:main,console_hostname:jet-arm.local,console_username:stanhu,line:/app/models/concerns/each_batch.rb:62:in `each_batch'*/ Issue Load (3.0ms) SELECT "issues"."id" FROM "issues" WHERE "issues"."project_id" = 52 AND "issues"."id" >= 503 ORDER BY "issues"."id" ASC, "issues"."created_at" ASC LIMIT 1 OFFSET 1000 /*application:console,db_config_name:main,console_hostname:jet-arm.local,console_username:stanhu,line:/app/models/concerns/each_batch.rb:81:in `block in each_batch'*/ Issue Load (1.4ms) SELECT "issues".* FROM "issues" WHERE "issues"."project_id" = 52 AND "issues"."id" >= 503 AND "issues"."id" < 1503 /*application:console,db_config_name:main,console_hostname:jet-arm.local,console_username:stanhu,line:/lib/csv_builder.rb:82:in `block in each'*/
gitlab-org/gitlab
issues
Database Lab comparison with Original query
SELECT "issues".* FROM "issues" WHERE "issues"."project_id" = 278964 ORDER BY "issues"."id" ASC LIMIT 1000
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/14996/commands/52157 shows 8 minutes!
Time: 8.086 min
- planning: 6.708 ms
- execution: 8.086 min
- I/O read: 20.263 min
- I/O write: 0.000 ms
Shared buffers:
- hits: 224924 (~1.70 GiB) from the buffer pool
- reads: 930960 (~7.10 GiB) from the OS file cache, including disk I/O
- dirtied: 2724 (~21.30 MiB)
- writes: 0
each_batch
queries:
New Start query
SELECT "issues"."id" FROM "issues" WHERE "issues"."project_id" = 278964 ORDER BY "issues"."id" ASC, "issues"."created_at" ASC LIMIT 1
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/14996/commands/52155
Time: 2.129 s
- planning: 0.444 ms
- execution: 2.128 s
- I/O read: 5.569 s
- I/O write: 0.000 ms
Shared buffers:
- hits: 116824 (~912.70 MiB) from the buffer pool
- reads: 7334 (~57.30 MiB) from the OS file cache, including disk I/O
- dirtied: 1699 (~13.30 MiB)
- writes: 0
https://gitlab.com/gitlab-org/gitlab/-/blob/f50075762cf33d3841b88bb191770776b07ede77/app/models/concerns/each_batch.rb#L81
Stop:SELECT "issues"."id" FROM "issues" WHERE "issues"."project_id" = 278964 AND "issues"."id" >= 382515 ORDER BY "issues"."id" ASC, "issues"."created_at" ASC LIMIT 1 OFFSET 1000
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/14996/commands/52156
Time: 634.853 ms
- planning: 0.704 ms
- execution: 634.149 ms
- I/O read: 1.393 s
- I/O write: 0.000 ms
Shared buffers:
- hits: 116158 (~907.50 MiB) from the buffer pool
- reads: 6176 (~48.30 MiB) from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0
Batched query
SELECT "issues".* FROM "issues" WHERE "issues"."project_id" = 278964 AND "issues"."id" >= 382515 AND "issues"."id" < 383515 ORDER BY "issues"."id" ASC
https://postgres.ai/console/gitlab/gitlab-production-tunnel-pg12/sessions/14996/commands/52158
Time: 34.173 ms
- planning: 2.983 ms
- execution: 31.190 ms
- I/O read: 28.256 ms
- I/O write: 0.000 ms
Shared buffers:
- hits: 55 (~440.00 KiB) from the buffer pool
- reads: 461 (~3.60 MiB) from the OS file cache, including disk I/O
- dirtied: 0
- writes: 0