Speed up project exports by moving the archive to the cache dir
What does this MR do and why?
As described in https://github.com/carrierwaveuploader/carrierwave#large-files, CarrierWave will first copy a file into the cache and then copy the file to its final store.
Previously the move_to_cache value was always set to false, which meant that CarrierWave would always copy the generated archive file (e.g. /path/tmp/work) to the cache dir (e.g. /path/tmp/cache). Since both paths are on the same filesystem, this copy is unnecessary and slows down the generation of project exports.
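The cost difference between copying and moving can be illustrated in plain Ruby, outside CarrierWave. This is only a minimal sketch, assuming both directories sit on the same filesystem (here both are created under one temporary directory). The method name, directory names, and file names are made up for illustration. FileUtils.cp rewrites every byte, while FileUtils.mv within one filesystem is a rename:

```ruby
require 'fileutils'
require 'tmpdir'
require 'benchmark'

# Hypothetical helper: builds work/ and cache/ under one temp dir (so they
# share a filesystem), then times a copy and a move of the same archive.
def compare_copy_and_move(size_bytes)
  Dir.mktmpdir do |root|
    work  = File.join(root, 'work')
    cache = File.join(root, 'cache')
    FileUtils.mkdir_p([work, cache])

    # Dummy archive filled with zero bytes; stands in for export.tar.gz.
    archive = File.join(work, 'export.tar.gz')
    File.binwrite(archive, "\0" * size_bytes)

    moved = File.join(cache, 'moved.tar.gz')

    # Copy: every byte is read and written again.
    copy_time = Benchmark.realtime { FileUtils.cp(archive, File.join(cache, 'copied.tar.gz')) }
    # Move: on the same filesystem this is a rename, no data is rewritten.
    move_time = Benchmark.realtime { FileUtils.mv(archive, moved) }

    {
      copy_time: copy_time,
      move_time: move_time,
      source_still_exists: File.exist?(archive),
      moved_size: File.size(moved)
    }
  end
end

result = compare_copy_and_move(8 * 1024 * 1024)
puts format('copy: %.4fs, move: %.4fs', result[:copy_time], result[:move_time])
```

Note that after the move the source file no longer exists in the work directory, which is why the optimization is only safe when nothing else needs the original.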
To ensure files are moved instead of copied, we can just inherit from the GitlabUploader implementation of move_to_cache, which returns true if it's a local file, false otherwise. We have to be careful to only allow this optimization for project/group exports because imports might be importing a static template.
For my test with the Linux kernel, this change saved 47 seconds of unnecessary I/O with a 3.4 GB archive.
Relates to #349425 (closed)
How to set up and validate locally
- Create a new project by importing the Linux kernel or some large repository.
- Generate a project export.
- Observe the total time for the Sidekiq job to complete.
You can see the effect with a simple test. Download the export and try to store it manually:
Before
With a 3.4 GB export.tar.gz on disk, notice how this took 47.5 s:
upload = ImportExportUpload.find_by(project_id: <your exported project>)
Benchmark.measure { upload.export_file = File.open('/tmp/export.tar.gz') }
=> #<Benchmark::Tms:0x00007f4fe0a3b470 @label="", @real=47.504550284007564, @cstime=0.0, @cutime=0.0, @stime=4.314954, @utime=1.8556370000000015, @total=6.170591000000002>
After
Notice how the time drops to 0.001 s:
upload = ImportExportUpload.find_by(project_id: <your exported project>)
Benchmark.measure { upload.export_file = File.open('/tmp/export.tar.gz') }
=> #<Benchmark::Tms:0x00007f4fe1ba3370 @label="", @real=0.0015618450124748051, @cstime=0.0, @cutime=0.0, @stime=0.0003949999999974807, @utime=0.0011679999999998358, @total=0.0015629999999973165>
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.