
Speed up project exports by moving the archive to the cache dir

Stan Hu requested to merge sh-speed-up-export into master

What does this MR do and why?

As described in https://github.com/carrierwaveuploader/carrierwave#large-files, CarrierWave will first copy a file into the cache and then copy the file to its final store.
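The two-phase flow (cache first, then store) can be modelled in a few lines of plain Ruby. This is a hypothetical simplification. The keyword names mirror CarrierWave's move_to_cache/move_to_store settings, but cache_then_store and the directory layout are illustrative and are not CarrierWave's implementation.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical model of CarrierWave's two-phase upload. Keyword names mirror
# CarrierWave's move_to_cache/move_to_store settings (both default to false,
# i.e. copy); the method itself is an illustration, not CarrierWave code.
def cache_then_store(source, cache_dir, store_dir, move_to_cache: false, move_to_store: false)
  cached = File.join(cache_dir, File.basename(source))
  # Phase 1: put the file into the cache (copy by default, move if enabled).
  move_to_cache ? FileUtils.mv(source, cached) : FileUtils.cp(source, cached)

  stored = File.join(store_dir, File.basename(source))
  # Phase 2: put the cached file into its final store.
  move_to_store ? FileUtils.mv(cached, stored) : FileUtils.cp(cached, stored)
  stored
end

Dir.mktmpdir do |root|
  dirs = %w[work cache store].map { |name| File.join(root, name) }
  FileUtils.mkdir_p(dirs)
  work, cache, store = dirs

  source = File.join(work, "export.tar.gz")
  File.write(source, "archive bytes")

  stored = cache_then_store(source, cache, store, move_to_cache: true)
  puts File.exist?(source)   # prints false: moved into the cache, not copied
  puts File.read(stored)     # prints archive bytes
end
```

With both flags left at false, every byte of the archive is copied twice, which is the cost this MR targets for the cache step.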

Previously, move_to_cache always returned false, so CarrierWave always copied the generated archive file (e.g. /path/tmp/work) to the cache dir (e.g. /path/tmp/cache). Since both paths are on the same filesystem, the file could instead be moved with a cheap rename, so the copy is unnecessary I/O that slows down the generation of project exports.
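Why a move is so much cheaper than a copy on the same filesystem can be checked directly: a rename only updates directory entries, so the file keeps its inode, while a copy creates a new inode and rewrites all the data. The sketch below assumes a POSIX filesystem where File::Stat#ino is meaningful (e.g. Linux or macOS); inode_preserved? is a hypothetical helper written for this illustration.

```ruby
require "fileutils"
require "tmpdir"

# Returns true if `operation` leaves the destination with the same inode as
# the source, i.e. no file data was duplicated. Assumes a POSIX filesystem.
def inode_preserved?(operation)
  Dir.mktmpdir do |root|
    work  = File.join(root, "work")
    cache = File.join(root, "cache")
    FileUtils.mkdir_p([work, cache])

    source = File.join(work, "export.tar.gz")
    dest   = File.join(cache, "export.tar.gz")
    File.write(source, "x" * 1024)
    inode_before = File.stat(source).ino

    operation.call(source, dest)
    File.stat(dest).ino == inode_before
  end
end

copy = ->(src, dst) { FileUtils.cp(src, dst) }
move = ->(src, dst) { File.rename(src, dst) }

puts inode_preserved?(copy)  # prints false: new inode, every byte rewritten
puts inode_preserved?(move)  # prints true: only directory entries change
```

The cost of the copy grows with the archive size, while the rename stays roughly constant, which matches the before/after numbers reported in this MR.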

To ensure files are moved instead of copied, we can inherit the GitlabUploader implementation of move_to_cache, which returns true for local files and false otherwise. We have to be careful to allow this optimization only for project/group exports: a move removes the file from its original location, and an import might be reading a static template that must stay in place.
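The override logic can be sketched without GitLab or CarrierWave loaded. CarrierWave uploaders do expose a move_to_cache hook, but the class names SketchGitlabUploader and SketchImportExportUploader, their constructor arguments, and the export? check on :export_file are hypothetical stand-ins, not GitLab's actual code.

```ruby
# Hypothetical stand-in for GitlabUploader: move into the cache only when
# the file lives on local storage.
class SketchGitlabUploader
  def initialize(file_storage:)
    @file_storage = file_storage
  end

  def move_to_cache
    @file_storage
  end
end

# Hypothetical stand-in for the import/export uploader.
class SketchImportExportUploader < SketchGitlabUploader
  def initialize(file_storage:, mounted_as:)
    super(file_storage: file_storage)
    @mounted_as = mounted_as
  end

  def move_to_cache
    # Imports may point at a bundled static template; moving it would remove
    # it from its original location, so always copy in that case.
    return false unless export?

    super
  end

  private

  # Illustrative check; the real uploader may decide this differently.
  def export?
    @mounted_as == :export_file
  end
end

puts SketchImportExportUploader.new(file_storage: true, mounted_as: :export_file).move_to_cache   # prints true
puts SketchImportExportUploader.new(file_storage: true, mounted_as: :import_file).move_to_cache   # prints false
puts SketchImportExportUploader.new(file_storage: false, mounted_as: :export_file).move_to_cache  # prints false
```

Delegating to super keeps the "local files only" rule in one place, so the subclass only adds the export-versus-import restriction.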

In my test with the Linux kernel, this change saved 47 seconds of unnecessary I/O on a 3.4 GB archive.

Relates to #349425 (closed)

How to set up and validate locally

  1. Create a new project by importing the Linux kernel or another large repository.
  2. Generate a project export.
  3. Observe the total time for the Sidekiq job to complete.

You can also see the effect with a simple test in a Rails console. Download the export and assign it to the upload manually:

Before

With a 3.4 GB export.tar.gz on disk, notice how this took 47.5 s:

```ruby
upload = ImportExportUpload.find_by(project_id: <your exported project>)
Benchmark.measure { upload.export_file = File.open('/tmp/export.tar.gz') }
=> #<Benchmark::Tms:0x00007f4fe0a3b470 @label="", @real=47.504550284007564, @cstime=0.0, @cutime=0.0, @stime=4.314954, @utime=1.8556370000000015, @total=6.170591000000002>
```

After

Notice how the time drops to under 2 ms (0.0016 s):

```ruby
upload = ImportExportUpload.find_by(project_id: <your exported project>)
Benchmark.measure { upload.export_file = File.open('/tmp/export.tar.gz') }
=> #<Benchmark::Tms:0x00007f4fe1ba3370 @label="", @real=0.0015618450124748051, @cstime=0.0, @cutime=0.0, @stime=0.0003949999999974807, @utime=0.0011679999999998358, @total=0.0015629999999973165>
```
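To reproduce the shape of this before/after difference without a GitLab instance, the sketch below times a full copy against a rename inside one temporary directory. It is a stand-in for the CarrierWave code path, not a measurement of it; time_copy_vs_move and the 64 MB size are arbitrary choices made for this illustration, and results will vary with disk speed and the page cache.

```ruby
require "benchmark"
require "fileutils"
require "tmpdir"

# Writes a zero-filled file of size_bytes, then times copying it versus
# renaming it within the same directory tree (same filesystem).
def time_copy_vs_move(size_bytes)
  Dir.mktmpdir do |root|
    source = File.join(root, "export.tar.gz")
    File.open(source, "wb") do |f|
      chunk = "\0" * (1024 * 1024)
      (size_bytes / chunk.bytesize).times { f.write(chunk) }
      f.write("\0" * (size_bytes % chunk.bytesize))
    end

    copy = Benchmark.measure { FileUtils.cp(source, File.join(root, "cache-copy.tar.gz")) }
    move = Benchmark.measure { File.rename(source, File.join(root, "cache-move.tar.gz")) }
    { copy: copy.real, move: move.real }
  end
end

timings = time_copy_vs_move(64 * 1024 * 1024)
puts format("copy: %.3f s, move: %.6f s", timings[:copy], timings[:move])
```

The copy time should scale with the file size while the rename stays near constant, mirroring the 47.5 s versus 0.0016 s result above.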

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Stan Hu
